+Design notes for a language - 2013 version
+==========================================
+
+Design Philosophy
+-----------------
+
+A large part of design is compromise. Once the good ideas are formed,
+they must then be blended nicely, and this blending always involves
+compromise.
+
+So we need a clear philosophy to guide this compromise. This
+philosophy does not serve to provide simple answers, but rather
+provides some unifying themes that can be used to motivate, or at
+least to justify, the answer.
+
+1/
+ The compiler needs maximal information so that it can detect errors
+ and optimise implementation.
+
+ The syntax needs to be clean and non-distracting so that a human
+ reader can easily see what the code is trying to say.
+
+ These two principles are clearly in conflict. One wants to add
+ more to the program, the other - less. Finding a balance between
+ these two for each design element is key to success. The best answer
+ will involve finding a syntax which provides information without
+ distraction. This will not always be possible (if ever) but is a
+ worthy goal.
+
+2/
+ Language should enable a feature to be provided, rather than provide
+ it directly.
+ This means that the feature can be in a library, and can be changed
+ using the language itself. This is inherently more flexible.
+ However if the language doesn't "know" about a feature, it is less
+ able to type-check and optimise it.
+
+
+3/
+ Things that are different should look different.
+ This allows the compiler to correct silly mistakes more easily, and
+ reduces the chance of making them.
+
+ So trying to use just a few core concepts is a mistake. It muddies
+ human thinking and makes life harder for the compiler.
+
+4/
+ familiar is better than novel
+ mnemonic is better than obscure
+ short is better than long
+ consistent is better than irregular
+ pronounceable is better than unpronounceable
+
+
+Design Concerns
+---------------
+
+Name of language
+token syntax
+comments
+expressions or statements
+ if/then/else -> ? :
+ But what form could we use for a case/switch? Maybe we shouldn't.
+
+syntax of structure blocks
+How types are constructed and described
+ - scalar: int, float, char, Bool, enum
+ - vector: zero based. with size
+ - struct
+ - tuples - how first-class are they?
+ - union/variant/alternate
+ - non-discriminated unions?
+ - lookup table, stack, queue,
+ - functions and methods
+ - lambda forms
+ - closures
+ - references
+Does a "pointer to a pointer" make sense in a strongly statically
+ typed language?
+Should enum values be pervasive, or require typename: prefixes?
+ 'import' should be allowed to import an enum namespace.
+Slices
+ - pointer + length. What is the role of 'capacity' in Go?
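+
+ Aside: present-day Rust draws the same pointer/length/capacity split,
+ with capacity living only in the owning vector; a sketch (Rust here is
+ purely for comparison, not the language being designed):

```rust
fn main() {
    // A Vec is (pointer, length, capacity): `len` counts initialised
    // elements, `capacity` says how many fit before reallocation.
    let mut v: Vec<i32> = Vec::with_capacity(4);
    assert_eq!(v.len(), 0);
    let cap0 = v.capacity();
    assert!(cap0 >= 4);

    v.push(1);
    v.push(2);
    // Pushing within capacity does not reallocate.
    assert_eq!(v.capacity(), cap0);

    // A borrowed slice is just (pointer, length): capacity belongs to
    // the owner, not to views of it.
    let s: &[i32] = &v;
    assert_eq!(s.len(), 2);
}
```
+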
+formal parameters - in/out/inout? Just a struct.
+ are the names of parameters part of the type, or not.
+ Probably not. But names in struct are part of the type. So not
+ really a struct.
+return values from function: 'return', or name=value
+closures for loops
+ what does 'break' mean in a closure? or 'continue'? or 'return'.
+ I guess the closure includes implicit labels. _loop _func
+ So "return label value,value" could return from enclosing block named
+ 'label'.
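+
+ Present-day Rust has since made this concrete with labelled blocks:
+ "break 'label value" exits the named enclosing block with a value,
+ even from inside nested loops. A sketch:

```rust
// `break 'search Some(x)` returns from the enclosing block named
// 'search, not just from the innermost loop.
fn first_even(grid: &[&[i32]]) -> Option<i32> {
    'search: {
        for row in grid {
            for &x in row.iter() {
                if x % 2 == 0 {
                    break 'search Some(x);
                }
            }
        }
        None
    }
}

fn main() {
    assert_eq!(first_even(&[&[1, 3], &[5, 6]]), Some(6));
    assert_eq!(first_even(&[&[1, 3]]), None);
}
```
+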
+break/continue/goto
+__reserved_names__
+case/typecase/match/switch
+exceptions/errors
+defer/finally/after...
+ how is this scoped?
+ - does it create a block: try BLOCK finally STUFF
+ - does it run on function exit
+ - can it attach to the lifetime of a value:
+ x = take_lock(semaphore)
+ unlock implicitly happens when x binding is broken.
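+
+ The value-lifetime option is what Rust's destructors give; a sketch
+ using the standard Mutex, whose guard unlocks when its binding dies:

```rust
use std::sync::{Mutex, MutexGuard};

// take_lock/unlock tied to a value's lifetime: the guard returned by
// lock() releases the mutex in its destructor.
fn add_one(counter: &Mutex<i32>) {
    let mut x: MutexGuard<i32> = counter.lock().unwrap();
    *x += 1;
} // <- `x` dropped here; the unlock happens implicitly

fn main() {
    let counter = Mutex::new(0);
    add_one(&counter);
    add_one(&counter);
    assert_eq!(*counter.lock().unwrap(), 2);
}
```
+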
+printf - formatting, input/output in general
+manifest constants of various types
+What are strings? What operations are in the language?
+What sorts of numbers are built in?
+ Sized ints, unlimited ints, float, fixed, imaginary, complex
+Initialization and default values
+Polymorphism:
+ subtype
+ parametric
+ If parameter is untyped, then it can still be useful if
+ any needed function (e.g. compare, add) are passed in as well.
+ These functions could be in a structure ... which could be a vtable.(virgil)
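+
+ A sketch of that idea in Rust: the needed function is passed in a
+ structure, which is exactly a hand-built vtable (all names here are
+ invented for illustration):

```rust
use std::cmp::Ordering;

// The operations a generic function needs, packaged as a struct of
// function pointers - an explicit, hand-rolled vtable.
struct Ops<T> {
    compare: fn(&T, &T) -> Ordering,
}

fn max_by<T: Copy>(items: &[T], ops: &Ops<T>) -> Option<T> {
    let mut it = items.iter().copied();
    let mut best = it.next()?;
    for x in it {
        if (ops.compare)(&x, &best) == Ordering::Greater {
            best = x;
        }
    }
    Some(best)
}

fn main() {
    let int_ops = Ops { compare: |a: &i32, b: &i32| a.cmp(b) };
    assert_eq!(max_by(&[3, 9, 4], &int_ops), Some(9));
    assert_eq!(max_by::<i32>(&[], &int_ops), None);
}
```
+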
+Scope of bindings
+overloading of bindings.
+ - functions/methods with different arg types
+ - operators for different objects
+ - function with different first arg type is like a method
+ and can be useful for sin(), min() abs() etc.
+ Certain names could be declared as methods and converted to
+ method syntax when called.
+subrange types
+ - useful for validating array references
+ - useful for avoiding 'default' case in switch
+ - can be deduced from some arithmetic: 'mod' particularly
+overriding of bindings
+ - relates to inheritance.
+Nature of bindings: immutable, const, var, computed
+target: script, jit, pre-compile, ...
+"eval"
+memory allocation and management
+static or dynamic typing
+threads/tasks
+co-routines, generators
+macros
+equality tests
+assignment semantics - references, copies
+  binding versus assignment: particularly relevant with tuples
+Type inferences
+ avoid redundancy, but redundancy can be good.
+modules
+foreign function interface
+pattern matching for destructuring
+ This is the general case for "tuple = tuple" with the question of
+ which bits on the left are literal and which are formal names.
+ This is only an issue if the appearance of a value starts like
+ an identifier "Some(value)" "None".
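+
+ Rust's answer, for comparison: constructors (capitalised by
+ convention) match structurally, bare lowercase identifiers bind. A
+ sketch:

```rust
// Which bits on the left are literal and which are formal names:
// None and Some(...) are matched, `n` is bound.
fn describe(v: Option<i32>) -> String {
    match v {
        None => "nothing".to_string(),     // literal constructor
        Some(0) => "zero".to_string(),     // constructor + literal
        Some(n) => format!("value {}", n), // constructor + binding
    }
}

fn main() {
    assert_eq!(describe(None), "nothing");
    assert_eq!(describe(Some(0)), "zero");
    assert_eq!(describe(Some(7)), "value 7");
}
```
+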
+'unsafe' at different levels
+ - bit manipulation
+ - memory allocation/casting
+ - concurrent access.
+tracing/debugging
+ can the language/compiler make it easier to trace code execution?
+ flag every time a field changes
+
+dependent types
+reflection
+ tags for reflection
+
+structure embedding using "import struct" or "import fields from
+ struct" ??
+
+function attributes: optimise_options, inline, ....
+
+destructors?
+ - go doesn't have them - relies on 'after'.
+
+introspection
+ why is this so much fun? I never use it in C!!! or Python.
+  introspection in Go looks weird and is probably related to its lame type
+ system.
+
+nullable pointer: builtin functionality or via a union?
+array indexing: should checks be explicit where not resolved by type?
+operators: type must match. A constant can be one of several types.
+ iterate through options until expression is type-correct. If
+ there are several constants .... might need to be careful.
+ a = 1 << n '1' must match 'a'
+ if (1<<n) < 5:
+
+test case support
+
+Design Details
+--------------
+
+ Lexer:
+ Parsing of keywords and identifiers and comments are fairly
+ obvious:
+ keywords and identifiers are [_\alpha][_\alpha\num]*
+ comments are //..EOL or /*..*/
+ symbols are longest string that is known
+ spaces separate as needed
+ NEWLINE and indent changes are tokens which are handled at the next
+ level
+ That leaves strings and numbers.
+
+ Strings bounded by '' or "" may not contain newlines. I would
+  rather they didn't contain the delimiter either. \q and \Q can be used
+  if the quote is really needed.
+
+ Need something that can contain newlines and is (almost) completely
+  literal. Everything up to the close must be included verbatim.
+  Go uses ``, Python uses ''' '''
+ ''' is nice because it is more unique and less likely to want to be
+ in the string. But I would really like a multi-line string to be
+ completely verbatim. That would require a uniquifier at the start:
+ '''unique123
+ stuff goes here
+ '''unique123
+  Also, do the newlines immediately after the ''' go into the
+ string?
+ What about just before the closing '''??
+
+ I think:
+ "" and '' both enclose strings in which various \escapes are
+ interpreted and newlines are not permitted.
+  ''' must come at end of line and following text up to ''' on a
+  line by itself forms a string. Trailing newline is included. The
+ leading one isn't.
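+
+  For comparison, Rust's raw strings solve the verbatim problem with a
+  variable number of '#' marks playing the unique123 role; a sketch:

```rust
fn main() {
    // Raw strings: no \escapes are interpreted.
    let s = r#"no \escapes here, and "quotes" are fine"#;
    assert!(s.contains(r"\escapes"));
    assert!(s.contains('"'));

    // If the text itself contains `"#`, add more '#'s - the delimiter
    // is made "more unique" instead of using a leading uniquifier.
    let t = r##"contains "# safely"##;
    assert!(t.contains("\"#"));

    // Raw strings may span lines; everything between the delimiters,
    // newlines included, is kept verbatim.
    let m = r"line one
line two";
    assert_eq!(m.lines().count(), 2);
}
```
+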
+
+ And what about numbers?
+ These are combinations of:
+ digits
+ decimal point
+ 'e' or 'E' possibly followed by + or -
+ 'x' or 'b' or 'o' for base
+ '_' for separator
+
+ Maybe 'i' or 'j' to indicate 'imaginary'.
+ But what if I want polar co-ordinates?
+ Should I require 12 + Im(12)
+
+  A type would need to be able to declare that it can be
+ initialised by a number, with one of several suffixes?
+
+ So at tokenisation we accept any string starting with a digit,
+ containing [0-9._exboEXBO] or [-+] immediately after [Ee].
+
+ Note that decimals cannot start with '.', so 0.9 is needed
+ rather than .9
+
+ Also there are no length suffixes. We have untyped constants
+ like Go does. Though we could allow [iufl] as extra letter to
+ support that if we wanted to.
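+
+  Rust's literal grammar is close to this sketch (though it kept the
+  C-style type suffixes that are dropped here); for comparison:

```rust
fn main() {
    // Bases, '_' separators, and exponents much as proposed above.
    assert_eq!(0x_ff, 255);
    assert_eq!(0b1010, 10);
    assert_eq!(0o17, 15);
    assert_eq!(1_000_000, 1000000);
    assert_eq!(1e3, 1000.0);

    // Like the note, Rust rejects `.9`; `0.9` is required.
    // Unlike the note, it kept suffixes as extra letters:
    let n = 255u8;
    let f = 0.5f32;
    assert_eq!(n as u32 + 1, 256);
    assert!(f < 1.0);
}
```
+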
+
+
+ type names:
+ Arrays/vectors:
+ Pascal: array [0..5] of string;
+ C: string[5] (sort of)
+ Go: [5]string
+ Rust: [string, ..5]
+ Except for Pascal which is too verbose, it is hard to choose
+ between them. Putting 'string' at the end makes sense and
+ is more pronounceable (Pascal reads well but looks ugly).
+ But I find it hard to get used to the look of Go.
+ That probably leaves Rust which looks a bit like an array
+ constant: [ onestring, anotherstring, etc]
+
+
+ Structures:
+ Pascal
+ record
+ a: t1;
+ b: t2;
+ end
+ C:
+ struct name {
+ t1 a;
+ t2 b;
+ }
+ Go:
+ struct {
+ a t1
+ b t2
+ }
+ Rust:
+ struct NAME {
+ a: t1,
+ b: t2,
+ }
+ Python:
+ class foo:
+ def xx():
+ pass
+
+ I prefer the ': type' of Pascal and Rust. I like semicolons to
+ separate as comma might separate names with same type (maybe)
+
+ Functions:
+ (arguments) : type
+ would be consistent if ':' introduced the type, but that would
+ mean arrays should be "[]:string" which is unnecessary.
+ So maybe the ':' terminates the list of names. Then we need a
+ special separator: (arguments) -> (return type)
+
+ Do we want a name to introduce? "struct" "fun"/"fn"? If so we
+ should for consistency have array ... like Pascal.
+
+ Without a name:
+ { name:type, name:type} is a struct
+ [length:type ] is an array ([5:int] reads "array of 5 integers").
+ (from:type,from:type -> to:type, to:type) is a function
+ { tag -> name:type, name:type; tag2 -> name:type, name:type } is
+ a union
+
+ That is really rather elegant. May be a bit too terse though, don't
+ know.
+
+ 'tags' get assigned globally unique numbers, so multiple tagged
+ sets can be embedded in another.
+ However if the first one is "tag = 0 -> ... ,", then zero etc
+ is assigned and embedding is more limited.
+
+ Does that just leave pointers?
+ - closures have same apparent type as functions
+ - methods do too.
+ - tuples only have implicit types - unless they are structs or
+ function parameter/return
+ - vtables ? interfaces
+
+ So: pointers.
+ Maybe just a leading '*'.
+ But do we want to distinguish among:
+ borrowed
+ owned
+ counted
+ collected
+ These are properties of the object in some cases, the pointer in
+ others.
+ An object can be: stack, owned, counted, collected
+ A pointer can be: strong or weak.
+ But a strong pointer needs to know what to do when dropped:
+ - nothing (so may as well be weak)
+ - destroy/free
+ - dec ref and maybe destroy/free
+
+ So a pointer needs to be one of:
+ borrowed, owned, counted, collected
+ * ~ + ?
+ Pointers to stack are always borrowed. Raw pointers are borrowed
+ too, but can only exist in unsafe code.
+
+ And vtables. These are usually created automatically from
+ method and interface definitions. So they don't need a syntax.
+ Interfaces do though. They are sets of named functions. So
+ they look just like structures but all the fields are functions.
+ In fact they *are* structures. They use embedding for
+ inheritance.
+
+ Pointers to different values are quite different.
+ - pointer to scalar or record is just an address
+ - pointer to fixed-sized array is just an address
+ - pointer to var-sized array is address plus size
+ - pointer to implementation is a pointer as above plus a pointer
+ to the vtable.
+ This makes var-sized arrays with vtables a little awkward.
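+
+  Rust implements exactly this split with thin and fat pointers, so
+  the representations can be checked; a sketch:

```rust
use std::mem::size_of;

trait Speak {
    fn hi(&self) -> &'static str;
}
struct Dog;
impl Speak for Dog {
    fn hi(&self) -> &'static str { "woof" }
}

fn main() {
    let word = size_of::<usize>();
    // pointer to scalar or record: just an address
    assert_eq!(size_of::<&i32>(), word);
    // pointer to fixed-size array: just an address (size is in the type)
    assert_eq!(size_of::<&[i32; 5]>(), word);
    // pointer to var-sized array: address plus size
    assert_eq!(size_of::<&[i32]>(), 2 * word);
    // pointer to an implementation: address plus vtable pointer
    assert_eq!(size_of::<&dyn Speak>(), 2 * word);

    let d = Dog;
    let s: &dyn Speak = &d; // vtable created automatically
    assert_eq!(s.hi(), "woof");
}
```
+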
+
+ But is there really any value in allowing implementations on
+ var sized arrays? Yes, I think that would be nice. I guess
+ they get converted to struct for a vtable pointer, with the struct
+ stored ... in the 'once' space.
+
+ Should it be possible to add to a struct after it is defined?
+ This sort-of makes sense for unions where extra cases could be
+ added.
+
+ With structures we can embed old in new. Maybe that way around is
+ best.
+ This new union has all the value in the old one, plus some more
+ values.
+ A value from the old union can be assigned to a variable of the
+  new union, but not vice versa without range checking.
+ How does multiple embedding work with unions? With an offset?
+
+
+ Initialisers for different types:
+ Array: [ value, value, num: value, ]
+ struct: { name: value }
+ union: { tag -> name:value, name:value
+ | tag -> name:value
+ }
+ tuple: (value, value, value) - can map to several known types.
+ function: ....
+
+ functions are more awkward. If the type is given explicitly as
+ def name : type = value
+ then we don't want to repeat any of it, but if we don't have an
+ explicit type, we need to at least list parameter names.
+ def foo = (arg, arg, arg) {
+ code
+ return rv,rv
+ }
+
+ or assume '_' is the argument list
+ def foo = {
+ my, args, = _
+ return args, my
+ }
+ Real asymmetry. What is the inverse of 'return' ?
+ import? with? receive?
+ 'receive' might work with co-routines? Maybe not.
+
+
+ Another thought about structs.
+ There are 2 sorts: Those which are used for external
+ communication and must be in exact order with no padding,
+ and those which are only used internally and the compiler is free
+ to reorder as it chooses. The second sort could usefully be
+ extended later. This would allow a literate programming style
+ where elements are added to the structure at much the same time
+ as the code which uses it.
+ External communication structures could be flagged as big-endian
+ or little-endian (or host-endian). or should each field have
+  an endianness? Probably not.
+
+
+ Pointers:
+ I don't really like the '*' prefix operator in C, nor the need for
+ '->' after pointers where '.' is enough after a struct.
+ Pointers to arrays behave the same as the array. It would be nice
+ if pointers to all things behaved just like the things.
+
+ There are some exceptions though:
+ 1/ p = foo
+ this could make the pointer point to foo, or copy the contents
+ of 'foo' into the thing pointed to by 'p'.
+ Both are valid so we need a simple syntax to differentiate.
+ 2/ if p < q:
+ Could compare the addresses of the boxes for e.g. double-lock
+ deadlock avoidance, or could compare the values themselves.
+ 3/ if p == q:
+ Does this say "the same box" or "the boxes hold the same
+ values"? Both are necessary.
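+
+  Rust's resolution of cases 2 and 3, for comparison: operators
+  auto-dereference and compare values; comparing the boxes themselves
+  takes an explicit, differently-spelt operation:

```rust
fn main() {
    let a = 5;
    let b = 5;
    let p: &i32 = &a;
    let q: &i32 = &b;

    // `==` on references compares the values in the boxes...
    assert!(p == q);
    // ...and `<` likewise compares values, not addresses.
    let r: &i32 = &3;
    assert!(r < p);

    // "the same box" needs an explicit operation:
    assert!(!std::ptr::eq(p, q));
    assert!(std::ptr::eq(p, &a));
}
```
+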
+
+ i.e. these are the things that you can do with a pointers: assign
+ it and compare it.
+ if p.foo and p[foo] and p(foo) automatically dereference, what
+ syntax do I want for the occasional explicit dereference. Or do I
+ want a syntax to inhibit auto-dereference?
+ p. = foo
+ could assign to the box rather than assign a new box. Or
+ p = foo
+ could assign to the box, while
+ p = &foo
+ could assign a new box. But pointers to pointers become
+  problematic.... or do they? In general the language imposes
+  whatever referencing or dereferencing is needed to match types.
+ If & is present, then it skips up one level.
+
+ But parameter passing should be like assignment, and the default
+ should be by-reference if possible. So assignment should be
+ by-reference. So we need
+ *p = *q
+ or
+ p. = q.
+ or decide that annotations on the right are deduced based on type,
+ *p = q
+ p. = q
+ p[] = q
+
+ p < q should normally compare the values though.
+ p == q should compare pointers
+
+ If p is a pointer to pointer to int, then
+ p = 5
+  means a double dereference on the left, which I think is wrong.
+ p =
+  should always cause p to point to something new.
+ p. =
+ can change the thing p points to. so
+ p.. = 5
+ would be needed. That's a bit ugly.
+ (p) = 5
+ does that help? It shows something odd is happening.
+ **p = 5
+ At least we are familiar with that.
+
+ As pointers are distinct from other values (e.g. they cannot have
+ methods), they could have a different assignment operator.
+ p = value # assigns the pointer
+ p := value # copies the value into p.
+
+ Maybe I'm trying too hard here. 'p' is the pointer. Any operation
+ on 'p' other than comparison dereferences it. That is ugly.
+ p === q
+ could be pointer comparison
+ or
+ p is q
+ like in 'crack': http://www.mindhog.net/~mmuller/projects/crack/Manual-0.3.html
+
+ Maybe "p." means "dereference as many times as needed".
+ So it never receives a pointer, only a non-pointer value.
+ To change a pointer you could
+ newp = thing where newp :int* = p
+
+ i.e. explicitly name the type of the value being copied.
+ p =(int*) q
+
+ Thought: assignment is different from function parameter binding,
+ as the latter can never mean a 'copy' if the target is a pointer.
+
+
+
+ There might be a bigger picture here. 'pointers' are very
+ different values to most others. They cannot be the receiver of
+ methods, only the thing they point to can. But what about pointers
+ to arrays aka slices. Can they be the receivers of methods?
+ Yes in exactly the same way that arrays can.
+ Which is like saying that 'pointers' can receive methods, but only
+ the methods that their targets receive.
+ This is a little awkward for slices because they need to carry a
+ parameter: the size. Unless we support parameters more broadly and
+ decide that any type might have arbitrary runtime parameters.
+ So a slice becomes a hidden object with a size and a pointer.
+ Does that help???
+
+ What I would like is:
+ a < b
+ compares the pointed-to values. It is exactly a.cmp(b) < 0
+ a == b
+ does that same: a.cmp(b) == 0
+ a = b
+ assigns the values
+ a = &b
+ assigns the pointer.
+
+
+  My biggest problem is that the mapping "x < y => x.lt(y)" combined
+ with the fact that field selection auto-dereferences, means that
+  "a < b" compares content. So "a == b" must also compare content.
+ So let's go for 'a === b' asks 'are these as indistinguishable as
+ possible'.
+ And in 'unsafe-pointer' mode, a <<< b can compare pointers.
+
+ Given that, what does "a = b" do? It really should assign
+ pointers.
+ So remaining question is how to assign values.
+ a = .... // always changes a, not what a points to
+ a. = .... // always changes exactly what a points to
+ a.. = ....// changes the thing pointed to by what a points to.
+
+ a = copy b ???
+ This is like "swap a, b" in a way.
+
+ Getting the 'type' of that would be interesting. owned? borrowed?
+
+
+ Structures can be internal (no implied order or padding) or
+ external (order and alignment rules apply, endian can be
+ specified).
+ Unions can define a local enum, or use a predefined enum (Boolean
+ might be useful)
+ For unions, the enum might be implicitly stored or a stage
+ parameter, or it might be an explicit structure member.
+ Do we want names for these things like 'struct' in C, or do we want
+ to use 'traits' to distinguish, or something implicit?
+
+ Enums:
+ In C, the enum values become global names and the variable has one
+ of those values, which are integers.
+ In Go, enum values are local to the type: type.foo type.bar.
+ In Rust, the values are global function names which can return
+ a suitably qualified union value.
+
+ I like namespace control. I also like enums being tied closely to
+ unions. So if an enum is actually a union, then ??
+  If 'v' is a variable, and 'e' is an enumerated value with members x,y
+ v can be compared with typename.e
+ v.e is true or false
+ v.e.x and v.e.y are only accessible if v is known to be e
+
+ or v.e could be one of these things that is a value or nothing, and
+ can be used with 'else'.
+ If there is no ambiguity, v.x is valid and can be used if v.e is
+ known. Or can be used with an implicit test of v.e
+ if v.x == 3
+ is implicitly "if v.x && v.x==3"
+ or if a := v.x
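+
+  The "implicit test plus binding" idea is close to Rust's `if let`; a
+  sketch (the Shape type is invented for illustration):

```rust
// Testing the tag and binding the payload in one step.
enum Shape {
    Circle { radius: f64 },
    Square { side: f64 },
}

fn area_if_circle(v: &Shape) -> Option<f64> {
    // Roughly "if a := v.circle": succeeds only when the tag matches.
    if let Shape::Circle { radius } = v {
        Some(3.14159 * radius * radius)
    } else {
        None
    }
}

fn main() {
    let c = Shape::Circle { radius: 1.0 };
    let s = Shape::Square { side: 2.0 };
    assert!(area_if_circle(&c).is_some());
    assert!(area_if_circle(&s).is_none());
}
```
+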
+
+
+ Pointer traits
+ The compiler understands 2 types of pointers: strong and weak.
+ Weak pointers are held in the context of some strong pointer. The
+ strong pointer may be to the same object, or it maybe to some
+ container object or allocation domain which is known not to free
+ anything for the time being.
+ Weak pointers can only be stored in life-limited variables which
+ are subordinate in lifetime to the strong reference. This includes
+ local variables in a lower scope, and variables inside the
+ strong-referenced value.
+
+ Strong pointers can be moved, copied, or destroyed.
+ When moved, the previous storage must become invalid.
+    When copied, the storage manager must be told. This might
+ copy the object, add a refcount, or do nothing and leave it
+ up to mark/sweep garbage collection.
+ When destroyed, the storage manager again must know. It
+ might decrement the reference count, free the object, or again
+ do nothing.
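+
+  Rust's Rc makes this move/copy/destroy bookkeeping observable; a
+  sketch:

```rust
use std::rc::Rc;

fn main() {
    // "When copied, the storage manager must be told": cloning an Rc
    // increments the count, dropping decrements it.
    let a = Rc::new(String::from("box"));
    assert_eq!(Rc::strong_count(&a), 1);

    let b = Rc::clone(&a); // copy: the count goes up
    assert_eq!(Rc::strong_count(&a), 2);

    drop(b); // destroy: the count goes down
    assert_eq!(Rc::strong_count(&a), 1);

    // "When moved, the previous storage must become invalid": a move
    // changes no count, and the compiler rejects later uses of `a`.
    let c = a;
    assert_eq!(Rc::strong_count(&c), 1);
}
```
+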
+
+ "same object" is a fairly vague term which needs to be made
+ specific.
+ It could be a structure - holding two pointers, one strong and one
+ weak. If they were to the same type that would be pointless. If
+ the strong were to a container and the weak were to an object that
+ is somehow 'inside' that container, it would make sense.
+ It could also be a linked collection of structures such as a linked
+ list or a binary tree. In order for the language to be able to
+ "reason" about such a thing it would be necessary for it to "know"
+ about such a thing, so there would need to be a way to describe
+ it. There need to be rules for when two structures (including
+ arrays) are part of the 'same' object and when they are part of
+ different objects. We also need to know if these things are
+ hierarchical: whether something can be a 'sub-object'.
+
+  The important rule is that all weak references in an object must
+ be invalidated before or as the superior strong reference is
+ moved.
+ This has many interesting cases, one of which is the doubly linked
+ list. In such a list the forward link might be seen as strong, so
+ the backward link would need to be seen as weak. It is a
+ distinguished weak link as it can always be found from the object.
+ So to move the strong reference, we need to be sure all weak
+ references are gone. That doesn't mean we need to invalidate all
+ weak references in the whole object, as we know how to get to the
+ only one.
+  So it seems some weak references are 'findable' - probably in
+ various ways. The language needs to know if an object can
+ contain:
+ - no weak references (so moving strong references is trivial)
+ - findable weak references (so moving a strong reference just
+ requires focused removal), or
+ - general weak references (so moving a strong reference requires
+ invalidating all references)
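+
+  The doubly linked case, sketched with Rust's Rc/Weak split: the
+  forward link is strong, the findable back-link is weak:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    value: i32,
    next: RefCell<Option<Rc<Node>>>, // strong forward link
    prev: RefCell<Weak<Node>>,       // weak, findable back link
}

fn new_node(value: i32) -> Rc<Node> {
    Rc::new(Node {
        value,
        next: RefCell::new(None),
        prev: RefCell::new(Weak::new()),
    })
}

fn main() {
    let first = new_node(1);
    let second = new_node(2);
    *first.next.borrow_mut() = Some(Rc::clone(&second));
    *second.prev.borrow_mut() = Rc::downgrade(&first);

    // The weak back-link must be checked (upgraded) before use.
    assert_eq!(second.prev.borrow().upgrade().unwrap().value, 1);

    // Once the strong reference goes, the weak one is invalid.
    drop(first);
    assert!(second.prev.borrow().upgrade().is_none());
}
```
+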
+
+ That case of doubly linked lists is also interesting as a structure
+ can be on multiple lists. This can be managed in a couple of ways:
+ - refcount the structure so it can have multiple strong links for
+ the forward links
+ - have one list subordinate to another so that the struct must be
+ removed from the subordinate one (or verified not present in)
+ before being removed from primary one. This is probably the
+    sort of assertion that would need to be specified in the language.
+
+
+ When we have a weak reference stored in a structure we need to be
+ able to explain why it is safe. And when we remove a strong
+ reference, we need to explain why we know that all dependant weak
+ references are known to be resolved.
+
+Attributes of things.
+ Different fields, locals, globals etc may have different
+ attributes beyond their type, or maybe as part of their type.
+ Some of these can change during the lifetime of the name or value.
+ I need to collect them all together to try to understand them.
+
+ constant - a value assigned (or bound) once and never changed.
+ if it is scoped then it is bound once whenever the scope is
+ entered though possibly to a different value each time.
+ A name might get reused within a scope if it is
+ only assigned when dead. e.g.
+ a = foo
+ print a
+ if bar:
+ a = 1
+ else:
+ a = 2
+ print a
+                here 'a' has been reused as the 'foo' value dies
+ after the first print (it can never be seen again).
+ This 'a' could still be 'constant'
+
+ There is a question here: when would 'foo' be
+ destroyed if we had destructors? It feels like it
+ should be destroyed as soon as it dies. However if
+ it holds a lock or other resource, we might want it
+ to only be destroyed when it goes out of scope.
+
+ stable - This is like 'constant' but applies to a value
+ rather than a name. It means the value (which
+ might be a large data structure) will not change
+ for the moment. 'stable' needs to be asserted
+ while being the sole owner of a value.
+ add-only- This too applies to a value. It says that other
+ values can be added to the main value, but not
+ removed.
+ This only applies to values included by dependant
+ references.
+ readonly- This attribute of a borrowed reference means that
+ it will only be used to read. The value might
+ change, but this reference won't be used to change
+ it.
+ A value can only become stable while all borrowed
+ references are readonly.
+ mutable - This is maybe just the absence of the above.
+ A mutable field can be changed as long as we have a
+ mutable pointer to the structure.
+ A mutable local variable can be updated when not
+                dead and can possibly have its address taken.
+
+ owned - This is a strong reference - when it dies the
+ object dies.
+ Maybe we can have 'indirect' references through
+ e.g. a refcounter. Then when the ref dies, the
+ refcounter is decremented. And if it dies, the
+ target dies.
+ borrowed - A weak reference. Must be in the context of some
+ strong (owned) reference to something. The strong
+ reference might be to the same thing, or it might
+ be to a container object which is add-only
+ dependant- A reference in a structure cannot be borrowed but
+ it can be 'dependant'. This means that the
+ referred value must exist somewhere else.... need
+ to clarify this.
+ pure function
+
+
+ I think we will need user-defined attributes. These record
+ important aspects of the state of a value. These can be
+ asserted by unsafe code, and required by functions etc and the
+ compiler can check that requirements are always met.
+ So a list_head might have an 'empty' flag which is cleared by
+ 'list_add' and has a known value after the "list_empty" test.
+ The 'stable' flag in one data structure might be implied by
+ the 'held' flag in a lock.
+
+ Integers could have a 'non-zero' flag which allows them to be
+ the divisor. They might also have an upper bound - like
+ arrays?
+ I sometimes "know" that a number is a power of two, which is
+    easy to test for.
+
+ A similar flag would apply to pointers which may, or may not, be NULL.
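+
+  Rust carries both of these flags in its types, and even exploits
+  them for layout; a sketch:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // A 'non-zero' flag in the type: a NonZeroU32 is always a safe
    // divisor, and Option<NonZeroU32> needs no extra tag byte because
    // zero is free to mean None.
    assert_eq!(size_of::<NonZeroU32>(), 4);
    assert_eq!(size_of::<Option<NonZeroU32>>(), 4);
    assert_eq!(size_of::<Option<u32>>(), 8); // ordinary u32 needs a tag

    let d = NonZeroU32::new(4).unwrap();
    assert_eq!(100 / d.get(), 25);

    // The same trick covers "pointers which may, or may not, be NULL":
    // references are never null, so Option<&T> stays pointer-sized.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
}
```
+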
+
+Names for types?
+ I don't like typedef in C. It hides stuff.
+ I see a type name and I don't immediately know if it is a
+ pointer or a structure.
+ Structures need names as do enums and unions.
+ Pointers don't need names.
+ Scalars (uint32 etc) have names and don't need more, unless
+    some sort of 'unit' concept were included.
+    What about arrays? They are like pointers so maybe not.
+    But an array contains a size. It is like a record in that it
+    combines two things - member type and size. Is that enough to
+    justify a name? I suspect not, so in the first instance, only
+    structures, enums, and unions can have names.
+
+Polymorphism:
+
+ Parametric polymorphism is quite important.
+ Various types can be parameterised by other types. This
+ sets the types of fields in structs and members of arrays etc.
+ Also want values as parameters for the size of arrays.
+ And internal parameterisation so the size of an array can match
+ some int field, or an enum can select an option.
+
+ Functions/methods are very important here. The type
+ parameters may be implied by the actuals. This then needs
+ to imply the types of the results.
+ Each result can be an intersection?? Or maybe it can be
+ a type-function of the parameters - but do we allow
+ intersections?
+
+ A parametric type is like a function. It needs a name and
+ some parameters:
+ type foo<a:comparable,b:bucket<a>> = {
+ thing : a
+ stuff:b
+ }
+ A method for a parametric type is then a parameteric-typed
+ function
+ interface foo<a,b> = {
+ fn foodo(first:a, second:a -> bool)
+ }
+ methods might want an extra type parameter, 'self'.
+ Using this would require the compiler to know that two objects
+ didn't just conform to the same interfaces, but were of the
+ same type. Shouldn't be too hard.
+
+ But we also want a function to be parameterised.
+ choose<a,b>(which:Bool, first:a, second:b -> a|b)
+ which could be:
+ choose<a:A, b:A>(which:Bool, first:a, second:b ->
+ result:A)
+
+ When this is called, the actual parameters have a list of types,
+ including any interfaces. Each use of a type-parm intersects
+    possible types until we have a minimum set. Then the result is
+    a possible list of types... needs to be the intersection..
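+
+    The choose() sketch above, rendered in Rust with one interface
+    bound on both parameters (boxing the result, since the two
+    concrete types can differ in size):

```rust
use std::fmt::Display;

// choose<a:A, b:A>(which:Bool, first:a, second:b -> result:A)
fn choose<A: Display + 'static, B: Display + 'static>(
    which: bool,
    first: A,
    second: B,
) -> Box<dyn Display> {
    if which { Box::new(first) } else { Box::new(second) }
}

fn main() {
    assert_eq!(choose(true, 42, "forty-two").to_string(), "42");
    assert_eq!(choose(false, 42, "forty-two").to_string(), "forty-two");
}
```
+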
+
+
+Interfaces:
+ Interfaces list methods that can be called on a value.
+ Often we want an interface with a single method. This is
+ particularly useful for the methods that operators map to,
+ but there are lots of others. Constructor, destructor etc
+ etc.
+ Go effectively does this as you only need to declare the
+ methods that you will use - the effective type is the sum
+ of all these.
+ One advantage is you don't need two names for everything
+ Comparable:compare, addable:add, destroyable:destroy.
+
+ But it can be good documentation to collect related
+ methods together into an 'interface'. It can also provide
+ namespace control.
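+
+  Rust's operator traits are exactly the single-method interface
+  described here; a sketch:

```rust
use std::ops::Add;

// `+` maps to the one method of the Add interface.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Vec2 { x: i32, y: i32 }

impl Add for Vec2 {
    type Output = Vec2;
    fn add(self, other: Vec2) -> Vec2 {
        Vec2 { x: self.x + other.x, y: self.y + other.y }
    }
}

fn main() {
    let a = Vec2 { x: 1, y: 2 };
    let b = Vec2 { x: 3, y: 4 };
    assert_eq!(a + b, Vec2 { x: 4, y: 6 });
    // The operator and the explicit method call are the same thing:
    assert_eq!(a.add(b), Vec2 { x: 4, y: 6 });
}
```
+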
+
+ method<A> foo(self:A, b:int -> bob:rune);
+
+ This declares a method which can then be realised for
+ different concrete types.
+
+ interface bar {
+ foo(self:bar, ...)
+ }
+ declares the same but "bar.foo", not "foo".
+ Maybe syntax should be more similar.
+    interface foo(self:Self, b:int -> bob:rune); ??
+
+ But now we have a preference for polluting the global
+ namespace.
+ If I want a method that is just for my type, it should not
+ require an interface. I guess:
+ method mytype.mymethod(thing, thing -> thong)
+ is a local method:
+ method foo.bar<mytype>(...)
+ is an interface method.
+ impl foo for mytype {
+ method bar(....)
+ }
+ can save some typing..
+
+ Every type effectively defines an interface which applies only
+ to itself using its name.
+
+ Syntax is still a bit ugly. I don't want to assume 'self'
+ is the parameter name, and I don't want it to appear as
+ first arg. So where do I put it?
+ Go does
+ func (name Type) methodname (parameters) (result)
+ Which works because all methodnames are global ... sort of.
+ I want two methodname namespaces:
+ - local to the Type
+ - inherited from an interface. I could use
+ .interface.method??
+ func (name Type) .compare(other Type)
+ matches
+ interface .compare<T>(other T)
+
+ Another thought. I sometimes want an object to respond to
+ the same interface in two different ways. A bit like being on two lists.
+ I might have a table with byname and bynum lookup methods which both
+ use [].
+ mytable.byname['fred'] = entry
+ print mytable.byid[42].name
+ Sometimes ".foo" is an attribute, sometimes a view...
+
+ So an interface can be bound to the object, or to a name in the object.
+
+Garbage collection:
+ Supposing that I thought this was a good idea, how would it
+ work?
+ The general idea is you keep track of what memory is allocated
+ in what sized blocks (as you have to for 'free' anyway) and
+ keep one spare bit for each piece of memory.
+
+ Then you :
+ - clear all those bits.
+ - scan all allocated memory looking for anything that
+ might be a pointer into the target area, and
+ setting the relevant bit.
+ - when you set a bit, you scan that memory as well
+ - when finished, any memory which doesn't have the bit
+ set is ipso-facto free.
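+
+    The steps above, sketched over a toy index-based heap (all names
+    invented; a real collector scans raw memory rather than explicit
+    child lists):

```rust
// Minimal mark/sweep: clear the bits, scan from the roots setting
// bits (scanning newly-marked cells too), then everything unmarked
// is ipso facto free.
struct Heap {
    children: Vec<Vec<usize>>, // outgoing "pointers" of each cell
    marked: Vec<bool>,         // the one spare bit per cell
}

impl Heap {
    fn mark_from(&mut self, roots: &[usize]) {
        for m in self.marked.iter_mut() {
            *m = false;
        }
        let mut stack: Vec<usize> = roots.to_vec();
        while let Some(i) = stack.pop() {
            if !self.marked[i] {
                self.marked[i] = true;
                stack.extend(&self.children[i]); // scan that cell too
            }
        }
    }

    // Returns the cells found to be free.
    fn sweep(&self) -> Vec<usize> {
        (0..self.marked.len()).filter(|&i| !self.marked[i]).collect()
    }
}

fn main() {
    // 0 -> 1 -> 2; 3 points at itself but is unreachable; 4 is isolated.
    let mut h = Heap {
        children: vec![vec![1], vec![2], vec![], vec![3], vec![]],
        marked: vec![false; 5],
    };
    h.mark_from(&[0]);
    assert_eq!(h.sweep(), vec![3, 4]);
}
```
+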
+
+ Variations involve breaking the heap up into sections,
+ possible with different age properties, and only cleaning
+ one section at a time. That might mean keeping a record of
+ all pointers that cross a boundary from one section to
+ another.
+
+ Another variation involves copying boxes to make free space
+ more compact. This requires finding all inward pointers.
+
+
+ It is really hard to see this as a valuable thing. The
+ programmer should know what is going on in general, and the
+ language should be able to track and enforce proper handling.
+
+	It seems useful for languages which encourage lots of
+	short-lived objects, e.g. for string and list manipulations.
+	The language should be able to track these, free them when
+	no longer needed, and copy them to a more stable area when
+	they are made more permanent.
+
+ I think I'm going to have to write some code and see what
+ happens.
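A minimal sketch of that mark phase in Python, assuming a toy heap of block ids rather than raw memory (so this traces pointers precisely, instead of the conservative "anything that might be a pointer" scan described above):

```python
def mark_sweep(heap, roots):
    """heap: block id -> list of ids it points to (toy layout).
    roots: externally reachable ids.  Returns ids left unmarked."""
    marked = set()              # the one spare bit per piece of memory
    todo = list(roots)
    while todo:                 # when you set a bit, scan that memory too
        block = todo.pop()
        if block not in marked:
            marked.add(block)
            todo.extend(heap[block])
    # anything which doesn't have the bit set is ipso-facto free
    return sorted(b for b in heap if b not in marked)

free = mark_sweep({1: [2], 2: [], 3: [1], 4: []}, roots=[1])
```

Blocks 3 and 4 are unreachable from the root, so they come back as free; a cycle that is still rooted stays marked.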
+
+
+ Type Equality:
+ When are two types "the same" ?
+ Some languages use "same name". Others use "same structure".
+ Both have their appeal.
+ Go uses structure unless both types have names - then it requires
+ 'same name'.
+ I wonder if "int" is a name. Probably it is.
+ I'm cautious of structural equivalence of structures. The 'names'
+ of the fields are important and I don't think two fields are the
+ same just because they are spelt the same.
+ 'arrays' don't have names, so structure works there.
+ Parameterised types should match structurally too - array is just a
+ parameterised type. Ditto for pointers.
+ Two structures declared separately are different though, even if
+ the names and types are the same.
+ What about functions: Are the parameter/return names part of the
+ type, or are they merely internal name bindings? They look like
+	structures but aren't usually used as structures.  Yet when arg
+ lists are long it could make code more readable to tag some params.
+ I guess we fall back on "unnamed things match named things".
+ If both function declarations have names, they must be the same
+ declaration. If either doesn't they match on structure.
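Python shows both flavours at once: classes are nominal (isinstance cares about the declared name), while tuples, named or not, compare structurally. A sketch (Point and Vec are hypothetical):

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

class Vec(NamedTuple):
    x: int
    y: int

# structural: both are just 2-tuples underneath, so they compare equal
same_shape = Point(1, 2) == Vec(1, 2)
# nominal: the declared names still distinguish the types
same_name = isinstance(Point(1, 2), Vec)
```

So "same structure" and "same name" give opposite answers on the same pair of values, which is exactly the tension above.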
+
+ A manifest value doesn't have a type. So if I say:
+
+ a = { foo=1, bar=2 };
+ then there is no type. That value could be of any type for which
+ it is a match. If the type of 'a' is known, it must match. If
+ not, the binding is illegal. There is some finite list of types
+ which this could match and we effectively try them all until one is
+ type-legal, or we run out.
+
+
+ Automatic casting:
+ casting numbers to larger types is probably safe??
+ What about casting a Foo to a union which has Foo as one branch.
+	This can be rewritten at compile time as a simple union creation.
+
+ Expressions or statements?
+ i.e. do I want to be able to say:
+ a = if x then b else c fi
+ so a becomes 'b' or 'c'.
+ I tend to think not. if/then/else/fi is a statement and making
+ it an expression as well can be confusing and make it hard for the
+ compiler to determine what your mistake is.
+ This also applies to 'switch/case' but not to while/for.
+
+ For 'if', can use
+ a = x ? b : c
+ but that doesn't generalise to switch very well. Does that matter?
+
+ Another syntax is :
+ a = b if x else c;
+ which is quite different to the reader - both human and machine.
+ For switch:
+ a = b if x,
+ c if y,
+ d if z,
+ e otherwise
+ using true
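The "b if x else c" form is exactly Python's conditional expression, and it already chains into a switch-ish shape, if a little awkwardly:

```python
x, y, z = False, True, False
a = (1 if x else
     2 if y else
     3 if z else
     0)   # the 'otherwise' arm
```

With y true and x false, the chain picks the second arm.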
+
+Case/switch:
+	Should this just be syntactic sugar over if/then/else?
+ I don't like the idea of typecase or Rust:match as it
+ provides functionality that is 'only' in the case statement,
+ not stand alone. To that extent I like Go:type assertion as
+ it is stand alone.
+
+ But what about fall-through. It is implicit in C and explicit
+ in Go with yukky rules.
+ Why do we find that we need effective 'gotos' there, but not
+ so much elsewhere?
+ Maybe it is like dropping an else.
+ if a:
+ b
+ else if c:
+ d
+
+ or
+
+ if a:
+ b
+ if a or c:
+ d
+
+ No, it isn't quite like that.
+
+ switch (num_bits) { /* fall through... */
+ case 8: if (byte & 128) num++;
+ case 7: if (byte & 64) num++;
+ case 6: if (byte & 32) num++;
+ case 5: if (byte & 16) num++;
+ case 4: if (byte & 8) num++;
+ case 3: if (byte & 4) num++;
+ case 2: if (byte & 2) num++;
+ case 1: if (byte & 1) num++;
+ default: break;
+ }
+
+ readdir uses it: switch is on 'state' and we might handle
+ several consecutive states.
+
+ case TypeSegmentMap:
+ case TypeQuota:
+ /* These only get merged while in a checkpoint. */
+ if (fs->qphase == fs->phase)
+ break;
+ printk("Suitably in-phase\n");
+ /* FALL THROUGH */
+ case TypeFile:
+ case TypeSymlink:
+
+ one case is a special-case of another. The enum
+ doesn't show full structure. Really want to repeat
+		the TypeSegmentMap and TypeQuota with TypeFile etc.
+
+ switch(why) {
+ case NewSpace:
+ watermark += RELEASE_RESERVED * fs->max_segment;
+ /* FALL THROUGH */
+ case ReleaseSpace:
+ watermark += ACCOUNT_RESERVED * fs->max_segment;
+ /* FALL THROUGH */
+ case CleanSpace:
+ case AccountSpace:
+ /* Definitely no water mark here. */
+ break;
+ }
+
+ Again, there is a progression of values, like the first case.
+
+ So two options:
+ 1/ state-machine and we handle a state and move on
+ 2/ values have a specific/general relationship.
+ and probably other less general things.
+ FIXME
+
+ Maybe "switch" chooses just one branch, while "select" chooses
+ all relevant branches? Or something
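A sketch of the "select chooses all relevant branches" reading, using the bit-count example: spell out the ranges and let every case whose range covers the switch value fire, which mirrors the C fall-through chain without any gotos (count_low_bits is a hypothetical name):

```python
def count_low_bits(byte, num_bits):
    # each 'case k..8' fires when k <= num_bits, so for num_bits = n
    # we test masks 1, 2, 4, ... up to 1 << (n - 1), as the C version does
    num = 0
    for bit in range(1, 9):            # cases 1..8
        if bit <= num_bits and byte & (1 << (bit - 1)):
            num += 1
    return num
```

This is the "progression of values" structure made explicit: the relationship lives in the range notation rather than in statement order.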
+
+
+Break / continue / return
+ Do these have labels? I almost never need labels. But I
+ often use "continue" inside a 'switch' to break out and
+ continue the surrounding loop. A 'break' from nested 'for'
+ loops can be good.
+
+ But labels are ugly - how are they indented? Or are they part
+ of the 'loop' syntax?
+ Sometimes I want to break out of an 'if' so I make it a
+ 'while', which looks odd.
+ But breaking out of an 'if' would mean that to conditionally
+ break a while would need a label. But break from 'if' is
+ also conditional, so maybe not needed.... gets ugly.
+
+ The label could be implicit. 'break if' ???
+
+ This is really a controlled 'goto'. We can go to:
+ - end of block. 'continue' for while, 'break' for switch
+	 - out of block.  'break' for while, 'break' for switch
+
+ What if we tagged blocks as being 'breakable'? Too
+ surprising.
+
+ What about "break variable" which leave the scope of the
+ variable?
+	This would make it hard to break out of a while loop.
+ Or "next i" would continue a for loop
+ Or any variable bound once in the loop.
+ then "last i" could break out. or 'done' or 'have' or return!
+ let v =:
+ if asa
+ return
+ if sdfsfd
+ have v2
+
+Anonymous/blank variable.
+	Go uses '_' as a 'blank' identifier.  It can be assigned or bound
+	to anything; the value is evaluated but then ignored.  This makes
+	some sense.
+ For 'switch' it might be nice to have a similar thing which holds
+ the value which is being switched on.
+ If the name is used, the value must be 'true', else it must match
+ the value. This is a rather odd semantic. And what is the scope
+ of the name? Including blocks within switch would be useful, but
+ then nested switches redefine a name, which is less good.
+
+ So maybe it should be
+ switch _1 = value:
+ case _1 == 3: whatever
+ case _1 == 4: otherthing
+
+
+ Structured blocks:
+ { curly brackets are common, but a bit ugly }
+	BEGIN begin/end is rather verbose END
+ COMMENT ending with reverse-spelled open is cute but silly TNEMMOC
+ proposal:
+ indenting stuff after a colon is really cool
+ and works rather well I think.
+ It doesn't easily allow do { } while(); constructs, though
+ do:
+ stuff
+ until done
+ would work.
+
+ There is no harm in allowing a '{' instead of the ':', and then
+ ending when a '}' is found.
+
+
+ Indenting.... how far can I push this?
+ - two consecutive non-empty lines must have compatible indents
+ meaning the strings of leading space/tabs are either identical,
+ or one is a prefix of the other. Anything else is an error.
+
+ If the second is longer, then second line continues the first.
+	  If they are the same, the first line is terminated if that makes
+	  sense.
+	  If the second is shorter it must match an earlier indent and
+	  everything parsed back to that point is terminated.
+ This is all much like python
+
+	- Can we require that smaller units must be indented correctly?
+ i.e. if a nonterminal starts half way into a line and wraps,
+ then the next line must indent the same depth:
+ if a == b and a ==
+ d:
+ for a fairly ugly example. i.e. the cfactor is still open, so
+ the next line must be indented to the start of the cfactor.
+ This isn't needed for syntax but avoids confusing looking code.
+ Need to be careful to allow opening bracket at end of line.
+ i.e. if the only part of the nonterminal is a literal, then that
+ non-terminal doesn't count.
+
+ A "pragma" or "safety feature" would select among:
+ - ignore indent
+ - indent for termination.
+ - check all indents
+
+ Go uses newlines without worrying about indents. I don't think I
+ like this. Indents are important sometimes and newlines aren't
+ always.
+
+	So yes: options, which specify tab size for 'check all indents'.
+
+ Colons:
+ I seem to be using the humble ':' too much:
+ 1/ The conditional expression a?b:c
+ 2/ The indented block opener
+ if a == b:
+ do_something
+ 3/ path separator for modules
+ io:printf()
+ 4/ To introduce a type:
+ var foo : type = value;
+
+
+ so if I wrote:
+ if a == b ? io:printf() : print='yes':
+ Then I could get confused.
+ The first ':' can't be the block opener because the ?: isn't
+ finished.
+ But the second ':' could introduce the global module :print
+ So ?: doesn't conflict with if:, but we can't use ':' for modules.
+ So either use '::' like Rust and C++, or '.' like other namespaces.
+ :: is very noisy. Maybe that is good?
+	. is very quiet and makes the lhs look like a value, not a module.
+ Does that matter? Different things should look different.
+
+ Large scale structure:
+ Key elements of a program are:
+ declarations
+ these define types, methods, functions,
+ and give names to modules .. and other things
+ statements
+ These always live in functions/methods I think.
+ expressions
+ These mostly live in statements or some declarations.
+
+ So a program should be all declarations. But that isn't much fun
+	for "scripts".  Or would scripts look better if they were forced to
+	keep code together with a low-syntax 'main' function?
+
+ func main():
+ code goes here
+
+ declarations start with the thing being declared:
+ func type mod const var impl
+ each is followed by a name and a value - maybe.
+ Not sure where imports fit in
+ import name-to-import 'path-to-module'
+ ??
+
+
+ Assignments/bindings.
+ There are 3 things one might want to say:
+ 1/ This new name is bound to this value for the rest of
+ this scope
+ 2/ This new name is bound to this value now, but it
+ might get changed later in this scope
+ 3/ This name is already present in this scope and I'm
+ now changing the value
+ We could use 'var' 'let' or 'set' or 'const' or 'let mut' or 'with'
+ to introduce one or another.
+ We could use "=" vs ":=" for different assignments.
+	We could not care about the difference between 2 and 3
+ We could use a sigil to differentiate 'new' from 'old'.
+ In a tuple "a,b,c = 1,2,3" we could conceivably have all 3.
+ We could add :type to name to introduce new name.
+
+ a = b // only allowed if 'a' already defined and mutable
+ a: = b // defines new 'a'
+
+ DON'T KNOW about constants....
+
+ We could also distinguish "this is a default value" and "this
+ is an initial value" where the former can be replaced without
+ use. Probably not useful though.
+
+ Least typing should be "new constant name". Next should be
+ "reused mutable name". Most typing for "new mutable name".
+
+ += -= etc mean pre-existing mutable. So := should too?
+
+ I really don't like the
+ a,b,c = function()
+	thing.  Having different bindings has got to be ugly: either
+	we mark them, which is visually ugly, or we assume them,
+	which is conceptually ugly.
+ a:=, b+=, c= function()
+ a:, b+, c = function()
+
+ Does
+ a[xx] = b
+ need to know if a[xx] has been assigned yet?
+ If we don't want to pre-initialise arrays, then probably yes.
+ For 'hash' arrays, there might be requirements about checking
+ first.
+
+ I like: ":=" means "replace current value".
+	So '=' means 'assign initial value'.  But how do we indicate
+ 'const' or 'var'.
+ Could require 'var' in front?
+
+ var (a, b, c) = function()
+ (var a), b+, c: = function()
+ a:, b+, var c = function()
+
+
+ New thought: The matching idea from Rust could make
+ interesting assignment.
+ if 4*5 = fred
+ could always be made true by binding 'fred' to 20.
+ if action(state, NEWLINE) = REDUCE(production)
+ will fail if the action is SHIFT, or bind production
+ if the action is a REDUCE.
+ if (a,b,c) = (1,2,3)
+ if a and b are already bound but c is not, this tests
+ a and b and if they match 1 and 2, binds c to 3 and continues.
+ Probably need a sigil on the 'c' or 'production'. '?'
+ if 4 * 5 = fred?
+ if action(state, NEWLINE) = REDUCE(production?)
+ switch (action) {
+ REDUCE(production?) :
+ SHIFT:
+ ACCEPT:
+
+ This 'binding' is very different to assigning to an array
+ element or a struct field.
+
+
+ There could be quite a lot of different assignment-like
+ things.
+
+ - 'global' constants which have a fixed value, at least for
+ the invocation of a program
+ - 'fields', both in structures and stack frames, which can be
+ updated and have their address taken, so are mutable
+ - 'constants' which are assigned just once in their scope -
+ though possibly in multiple code locations due to branches
+ if foo() { a = 1} else {a = 2}
+ 'a' could be a constant
+ - 'reused constants' - one name which is used repeatedly but
+ each time it is really a different variable. There are
+	  clear points before each 'assignment' where it can be
+ declared 'invalid'. e.g. a 'status' return value that is
+ only used for immediate operations.
+ These are never assigned while 'live'. So
+ a = 1; if b {a = 2}; print a
+ is not an option as the value from 'a=1' could be printed
+ and so is live during the if statement.
+ These can even be 'updated'.
+ a = 1; a = 2*a; print a;
+ Semantics are identical to
+ a = 1; A = 2*a; print A;
+ so it is a reused-constant.
+	  A for-loop control variable should be a reused constant.
+ - 'variable'. This is a value that can be changed in
+ different ways on different paths. It is conceptually one
+ value that changes. Possibly an accumulator
+
+ We can assign to:
+ - a name (local variable)
+ - a value.name (field)
+ - a value[expression] (array slot)
+ Can we also assign to a function call?
+ - foo(a,b,c) = 42
+ That would mean that 'foo' returns an lvalue. That is not
+ unreasonable.
+
+ But I'm still not happy with how lvalues work.
+ a = lvalue
+ Does that make 'a' point to the thing or does it copy the
+ thing?
+	I think in that form it must make 'a' the lvalue.
+ *a = lvalue
+ would copy the content.
+ If 'a' and 'b' are pointers to pointers to integers, then
+ a = b
+ *a = *b
+ **a = **b
+ would all do different things in C. Is there any other way to
+ achieve those same things with less syntax?
+ Type enforcement can make the '*' on the right redundant.
+
+ How do I move a reference, as opposed to borrow a reference?
+	Or copy an object?  In Rust it depends on the type.  If there
+ are mutable fields or owning pointers, it cannot be copied so
+ it is moved. '&' is used to borrow a reference.
+	I'm not sold on the type-dependent answer.  I think the syntax
+	should make it obvious.
+
+ I think I want syntax for "take the value and replace with
+ NULL".
+ a <= foo
+ then we could swap with
+ a <=> foo
+ But that is a comparison! Could use "<-" but it doesn't look
+ like '='. a =< b ?? Then I could pass "<b" to a function
+ to move out the value.
+
+ "move a value", "copy a value", "borrow a reference to a value"
+ <a a &a
+
+ "receive a .... what?
+
+ Pointers really are quite special so we need to be careful how
+ they move. Given the specialness, what can we say about
+ pointers to pointers? I think they must always be borrowed.
+ Does that help?
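The "take the value and replace with NULL" move can at least be simulated without new syntax; a Python sketch with a hypothetical take helper (Box and payload are made-up names):

```python
def take(obj, field):
    # move the value out, leaving None behind so the old owner
    # can no longer reach it through this field
    val = getattr(obj, field)
    setattr(obj, field, None)
    return val

class Box:
    def __init__(self, payload):
        self.payload = payload

b = Box([1, 2, 3])
moved = take(b, "payload")
```

This is the semantics the `<b` notation would get for free, but enforced only at runtime; the appeal of dedicated syntax is that the compiler can check the old name is never used afterwards.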
+
+
+ a,b,c = function(args)
+ is a bit ugly because we might want to do different things
+ with the returned values, and we might have trouble
+ remembering their names. However I don't mind
+ a,b,c = e,f,g
+ as it is just a short-hand: except for 'swap'.
+ So we could do something totally different:
+ r = function(args)
+ a,b,c = r.foo, r.bar, r.baz
+ where foo, bar, baz are the names of the return values in the
+ function signature.
+ Usually you would just use the r.foo etc as needed.
+ Alternately something like:
+ import function(args):
+ if err:
+ go cry()
+ print result
+
+	The requirement of no shadowing means we have to be careful
+ with names, and probably allow:
+ import function(args) as result, errors
+ The syntax is a bit horrible but I'm warm to the concept.
+ It is like the "with" statement in Pascal.
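The "r = function(args); use r.foo" option maps directly onto Python's NamedTuple, with the result names carried in the signature (all names here are hypothetical):

```python
from typing import NamedTuple

class Parsed(NamedTuple):
    foo: int
    bar: str
    baz: bool

def function() -> Parsed:
    return Parsed(foo=1, bar="x", baz=True)

r = function()
# usually just use r.foo etc. as needed...
a, b, c = r.foo, r.bar, r.baz   # ...or unpack by the signature's names
```

The caller never has to remember positional order, only the declared names.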
+
+
+Pattern matching:
+ Pattern matching seems cool, particularly for variant records,
+ but also for popping things off arrays:
+ while ra = [first, rest]:
+ print first
+ ra = rest
+ which requires the pattern to match a single-element list.
+ while first = ra.pop():
+ print first
+ A pattern match seems to be a short hand for multiple tests
+ and some assignments:
+ if ra = Cons(a, Cons(b, _))
+ is
+ if ra.type == Cons && ra.next.type == Cons:
+ a, b = ra.first, ra.next.first
+ We could achieve the same brevity with
+ if a,b ~= ra.first, ra.next.first
+ which is expected to fail if the runtime types are wrong.
+ or
+ if a,x = ra.Cons &&
+ b,_ = x.Cons
+
+ Matching is a lot like regexp matching (surprised?)
+ But the language knows about it (like perl and regexp).
+ However variables have to match whatever is there, inside the
+ given context. They cannot specify what to match.
+ You can have "Cons(a,b) where a < b", but you cannot do
+ "string matches a[]+b[]" where a in hash"
+	i.e. the common match syntax cannot let a variable match a
+	range of components from which a choice must be made.
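The desugaring above runs as-is in Python, with Cons defined as a plain class for illustration:

```python
class Cons:
    def __init__(self, first, next):
        self.first = first
        self.next = next

ra = Cons(1, Cons(2, None))
# desugared form of:  if ra = Cons(a, Cons(b, _))
if isinstance(ra, Cons) and isinstance(ra.next, Cons):
    a, b = ra.first, ra.next.first
```

The pattern really is just the conjunction of type tests plus the bindings, which is why it can be treated as shorthand.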
+
+
+ Constants:
+ Constants are numbers or strings or declared constant words.
+ Numbers and strings can be tagged with a single letter suffix.
+ A type may 'claim' certain suffixes, so "complex" could claim
+ 'i' on numbers, and "regexp" could claim "r" on strings.
+ They can also claim unadorned constants, but then only one
+ non-internal type can be type-correct.
+
+ 'e' and 'p' are used for exponent and power and aren't
+ allowed as number suffixes.
+
+ Types can provide constant functions which are evaluated
+ entirely at compile time.
+
+	"Ceylon" allows SI suffixes: k M G T P for big,
+ m u n p f for small. It's cute but I think I prefer 'eN' suffixes:
+ k == e3 M == e6 etc.
+
+ If we allow a constant to have a suffix like 'r' for strings
+ or j, i, th for complex numbers, how does a type specify that
+	it can use that?  It needs to be an initialisation function.
+ "I can be a string if 'r'" "I can be a number if 'th'".
+
+ This is an implementation for ''r or 0th
+ implement new<regexp> for ''r {
+ typify(n:''r -> regexp)
+ }
+
+ The interface 'n' is a parameterised interface
+ interface new<result> {
+ typify(n:self -> r:result)
+ }
+
+ These need to be 'constant' or 'referentially transparent'
+ or 'pure' functions that can be evaluated at compile time.
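For comparison, Python already claims a number suffix and a string prefix in roughly this spirit: 'j' on a numeric literal builds a complex constant, and the r'' prefix hands raw text to the regexp library (by convention rather than by a typify interface):

```python
import re

z = 3 + 4j                   # 'j' suffix claimed by the complex type
pat = re.compile(r"\d+")     # 'r' prefix: raw string, conventionally for regexps
digits = pat.findall("a12b3")
```

The difference from the proposal above is that Python hard-codes these in the language; the notes want a type to be able to register its own suffix via a compile-time pure function.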
+
+ Loops:
+ while loops are easy-ish. What about list loop and more
+ generic things. 'foreach' etc.
+ Co-routines can have a function which returns successive
+ values, via 'yield' or lazy evaluation.
+ Or a closure can be passed to a looping function.
+	Or an 'iteration object' can be returned on which we call 'next'
+ repeatedly.
+ Loops: (repeating myself?)
+ So many possibilities
+ for loop while repeat/until foreach etc etc
+ Really like exit in the middle
+ do:
+ preparation
+ while condition:
+ actual work
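That exit-in-the-middle shape is the classic loop-and-a-half; a language without it spells the same thing with break, as this Python sketch does (doubled is a hypothetical example):

```python
def doubled(items):
    results = []
    it = iter(items)
    while True:
        x = next(it, None)        # preparation
        if x is None:             # the 'while condition', mid-loop
            break
        results.append(x * 2)     # actual work
    return results
```

The do:/while:/work layout above makes this shape first-class instead of hiding the condition inside the body.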
+
+
+ For loops like to initialise and increment:
+ with var = val
+ then var++
+ while var < last:
+ print var
+
+ foreach loops need a variable(set) and a list of values, which
+ might be generated lazily.
+ Alternate to lazy generation, could pass the body as a closure
+ to an 'each' function like Rust does.
+ Lazy generation could involve a closure being returned from a
+ function. The closure would need some state, so the 'return'
+ couldn't discard it completely. These could be 'inout'
+ parameters? or
+
+ How to address the need for "search and perform special case if
+ not found?
+ for each possibility
+ if match:
+ break
+ then
+ Do this if loop completed normally
+ else
+ Do this if loop aborted.
+
+ if:
+ for each possibility:
+ if match:
+ use true
+ else:
+ handle not-found case.
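Python's for/else is precisely this construct: the else arm runs only when the loop completed normally, i.e. without break. A sketch:

```python
def find(possibilities, target):
    for i, p in enumerate(possibilities):
        if p == target:
            break            # loop aborted: found a match
    else:
        return None          # loop completed normally: not-found case
    return i
```

The syntax is widely considered confusing (else reads like "if the loop didn't run"), which argues for the more explicit then/else spelling sketched above.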
+
+ http://www.scribd.com/doc/38873257/Knuth-1974-Structured-Programming-With-Go-to-Statements
+ talks about 'events'
+
+ loop until event1, event2, ...
+ stuff
+ ... event1;
+ ... event
+ then event1 => ....
+ event2 => ....
+
+ This is a lot like exceptions. However 'until' has a different feel to it than 'try'.
+ 'use' is better than 'raise'.
+
+	I like this as a generalisation of switch:
+
+ switch:
+ .... use foo;
+ ... use bar;
+ then:
+ foo -> ....
+ bar -> ....
+
+ All 'use' values must be either unique identifiers or
+ expressions with values that have a common type.
+	If unknown identifiers, then they comprise values in a
+	new anonymous enum.
+	The labels in the 'then' part are values of that type.
+ A value *can* appear twice with some annotation(?) in which
+ case all relevant branches are taken.
+ Maybe the notation is 'next' at the end of the statement
+ block?
+ If there is no 'next', then the labels cannot appear again.
+ Actually we allow 'next' and 'last' if and only if at least
+ one label occurs again. If there is any next/last, then there
+ must be one at the end of the block.
+
+ if statements:
+
+	Go has a nice feature where you can perform an action before an 'if'
+ test.
+ if a = something(); a > 3 { use a }
+
+ This is particularly useful in elseif constructs
+ a = something()
+ if a > 3 { use a }
+ else if b = somethingelse(); b < 5 { use b }
+ else if ....
+
+ without the if action, we need an extra level of indenting.
+ if a > 3 { use a}
+ else {
+		b = somethingelse();
+ if b < 5 { }
+ else ...
+
+ But only a simple statement is allowed.
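Python 3.8's assignment expression gets the same flattening effect from inside the condition itself; a sketch with the two producers passed in as hypothetical callables:

```python
def classify(something, somethingelse):
    # the binding happens inside the test, so the else-if chain
    # stays flat, as with Go's if-statement action
    if (a := something()) > 3:
        return a
    elif (b := somethingelse()) < 5:
        return b
    return 0
```

Note the scoping wart this buys: a and b leak into the whole function, not just their branch, which is one of the questions the "with"/"where" ideas below try to answer.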
+ Could I use "with"
+
+ with: a = something()
+ if a > 3 :
+ use a
+ else with b = somethingelse():
+ if b < 5:
+ use b
+
+ Looks a bit clumsy and probably ambiguous. Maybe:
+
+ if with:
+ a = something
+ a < 5
+
+ no, that is much worse.
+
+ We can make "with stuff" unambiguous by requiring a second body
+ with:
+ commands that can start new scope
+ do/if/while/match
+
+
+ Another idea. We don't allow assignments in expressions, but
+ we allow a "where" clause at the end which contains a list of
+ bindings. The expressions are only evaluated where needed:
+ if a < 5 && b > 6 where b := n/2:
+
+ will not divide n by 2 if a >= 5.
+	This means that the list can only contain bindings, not general statements.
+ If they are new bindings, how are they scoped? At least to the
+ bodies of the if or while and the rest of the expression.
+ If a binding were optional, it couldn't carry through:
+ if a < 5 || b > 6 where b := n/2:
+ if a<5, b would not be bound, but it would be bound in the 'else'.
+ This lazy evaluation could work with a prefixed "with" clause too.
+ Would assignments be OK? I think they could be confusing.
+ Lazy evaluated binding could allow two different paths which want
+ to evaluate at different times to just use one piece of source code.
+
+
+ Another idea:
+ As well as
+ if expression:
+ statements
+ We could have
+ if:
+ function block with 'return'
+ then:
+ statements
+ else...
+ Maybe 'return' looks a bit heavy and a synonym might be better.
+ yield? use? done? I like 'use' for now.
+ The 'if' is one big scope, so binding created in the function
+ block remain in the 'then' statements.
+ This allows:
+ if:
+ a = 4
+ use a > 3
+ then:
+ ....
+ else if:
+ b = a*a
+ use b < 0
+ ....
+ How would that look with {}?
+
+ if {
+ a = 4
+ use a > 3
+ } then {
+ ....
+ } else if {
+ b = a*a
+ use b < 0
+ }
+
+ The same syntax could allow conditional expressions:
+ a = if a = 4:
+ use 4
+ else:
+ use 3
+	though that is rather verbose and part of the appeal of ?: is brevity.
+
+ -----
+	I do want statements in the condition of 'if' and 'while', optionally.
+
+ if cond BLOCK
+ or
+ if:
+ BLOCK
+ then:
+ BLOCK
+
+ An expression block may contain "use" directives: "use True".
+ They act a bit like a "return" - we have the answer.
+ For while, it means that we can avoid "do {} while" as a separate case
+ while:
+ BLOCK
+ do:
+ skip
+
+	We may also be able to avoid 'break' in some cases with "use False"
+	at arbitrary places.  'continue' isn't quite so easy, though a
+	while/switch could do it:-)
+
+ In a 'switch' we can "use" values other than True or False.
+ If a name is used which isn't defined, then it is assumed to be a new
+ enum, so all 'use' statements must use undefined names, and all
+	cases must use the same set of names.
+
+ switch:
+ use do_it
+ case do_it: print "Hello"
+ case skip_it: abort()
+
+ case names can appear multiple times with mcase(?)
+
+ switch bit:
+ case 8: if byte & 128 {num += 1}
+ case 7, 8: if byte & 64 {num += 1}
+ case 6..8: if byte & 32 {num += 1}
+ case 5..8: if byte & 16 {num += 1}
+ case 4..8: if byte & 8 {num += 1}
+ case 3..8: if byte & 4 {num += 1}
+ case 2..8: if byte & 2 {num += 1}
+ case 1..8: if byte & 1 {num += 1}
+
+
+ break? or last?
+ "last i" exits the enclosing loop that modifies 'i'
+	"next i" retries
+
+ 'for' loops can be:
+
+	for assignment:
+ while condition:
+ then statement:
+ do:
+ stuff
+ else:
+
+ or
+ for assignment ; condition ; statement:
+
+ or
+ foreach varlist = object
+
+ So a general while statement is
+ for initialisation (can introduce variables)
+ while condition
+ then block
+ do block
+ else block
+
+	First do 'for', then 'condition'.
+ if 'condition' is true, do 'do' and 'then' and try cond again
+ if 'condition' ever false, do 'else' and stop.
+
+ Maybe:
+ 'if' and 'switch' are much the same
+ 'then' and 'case true' ditto
+ 'else' and 'case false' as well
+	'while' can be followed by cases as well as by 'else'.
+
+	With a 'for' I want a clause to run if we didn't 'break'.
+ I think 'else' does that.
+
+ Do I want to use "else" for "default" in case?
+ "then" happens if "use true" or no "use" value
+ "case value" happens if "use value"
+ "else" happens if "use false" or "use value" with no match.
+
+
+ Print formatting.
+ Only values with Interface 'printable' can be printed.
+ The function is passed a width, a precision, and stuff.
+ So string can just have '%v' for a value, or '%20l' for a left
+ justified value.
+ But what about '*' parameters - and checking there are the right
+ number?
+ Could allow a function to have a compile-time preliminary section
+ which can throw errors and possibly infer types.
+ This probably requires there to be compile-time functions which
+ run fully at compile time. Though any functions that can, should
+ unless tagged run-time-only.
+	A function must be able to flag a compile-time error.
+ At runtime this is ignored, so a runtime error or coping strategy
+ is also needed.
+
+ So: function runs until non-constant is needed, or heap change etc.
+ If no compile error triggered, all runs again at run time.
+ But for libraries to work, the compile-time part must be easily
+ available in the library without the need to rebuild the object code.
+
+ This doesn't really answer the problem..
+ print "%20l", foo
+ or
+ print "%v", foo.fmt(20,0,'l')
+
+ It would be nice to have curried functions here so that the 'fmt'
+ function could take a stream from which it could get a buffer to fill.
+ Or foo.fmt could return a function which acts on a stream ... that might be
+ complicated.
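Python's format protocol works much like this 'printable' interface: the value's type receives the width/precision/flags string and renders itself. A sketch with a hypothetical type:

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # the type interprets its own format spec, like the 'printable'
        # interface being passed a width, a precision, and stuff
        if spec.endswith("F"):
            return format(self.celsius * 9 / 5 + 32, spec[:-1] + "f") + "F"
        return format(self.celsius, spec or ".1f") + "C"

s = format(Temperature(20), ".0F")
```

The format-string checking problem remains, though: nothing validates "%20l"-style strings against the argument list at compile time, which is what the compile-time-function idea above is for.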
+
+ Exceptions and errors.
+ Errors need to be easy to catch, but even that isn't really enough.
+ They need to be hard to not-catch.
+ So any possible error must be caught or explicitly passed over.
+ That suggests that the type of a method should include the list of
+ exceptions that can be returned.
+
+ "try" is really pointless. An 'except' clause can simply be
+ available to every statement. Maybe spell it 'catch'.
+
+ {
+ fd = open();
+ fd.write(thing);
+ catch file_not_found:
+ whatever
+ }
+
+ -or-
+
+ {
+ fd = open();
+ fd.write(thing);
+ } catch file_not_found {
+ whatever
+ }
+
+ A 'catch' can occur anywhere. It will catch any unhandled errors
+ that happen before the location. So:
+ fd = open(file)
+ catch file_not_found:
+ return "fool"
+ fd.write(thing)
+ return "wise"
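Python still insists on the explicit try, but the flow of that sketch translates directly (write_report and its strings are hypothetical):

```python
def write_report(path, thing):
    try:
        fd = open(path, "w")
    except FileNotFoundError:     # the 'catch file_not_found' arm
        return "fool"
    with fd:                      # later errors are not caught here
        fd.write(thing)
    return "wise"
```

The catch-anywhere proposal removes the try block but keeps exactly this property: the handler only applies to the operations before it that could raise that error.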
+
+ One thing the 'try' does provide is a synch point. in 'catch' we
+	know that everything up to 'try' succeeded, so names bound there are
+ certain to be bound.
+ Without 'try' we only know that the current scope was entered, and
+ that any code since the last equivalent catch which could not have
+ thrown the error must have been executed.
+ This is probably enough providing that 'equivalent catch' is well
+ defined.
+ At any point in the code there is a set of possible exceptions.
+ A 'catch' subtracts from the set and any op that can raise adds to
+ it.
+ Within a 'catch' we know that any that happened before these
+ exceptions last became possible has happened.
+
+ This requires that exceptions can live in a set. Maybe there is a
+ big discriminated union of exceptions? Or an hierarchy of them?
+ I think an inheritance tree .. though that is hard to merge. i.e
+ if two modules each declare some new exceptions ... not a big deal.
+ We know the type of the exception set for each imported module.
+ But wait: If I import two modules and want to pass up exceptions
+ inherited from either, what is my exception type? I either need to
+ offset enums when inheriting them, which is messy, or have
+ globally unique enum values, which are like type names.
+
+
+ So when I say:
+ catch exception_name(parameters):
+ that only applies to previous operations that could return an
+ exception with that value
+
+
+ Do I want an 'else' clause for 'catch' or should 'catch' clause
+ break/continue/return to get past the real work-to-do.
+
+ Can destructors throw exceptions?
+ How do exceptions work in constructors? There needs to be a point
+ where values transition from 'local' to 'in object' much like
+ the 'return' in a function.
+
+
+ Should break/continue/return be defined as exceptions? The
+ (implicit) label they go to is the expression 'type'
+
+
+
+
+Scope:
+
+ 'scope' applies to various things:
+ - how to have narrow scope for bindings in block. Just
+ having declarations at the top which go out-of-scope at the
+ bottom is too restrictive
+ - which names in a module are visible outside that module.
+ I think they must all be visible within the module, where that
+ is meaningful
+ - how to import names from modules. "import xx from yy" is nice,
+ "import * from yy" is good but risky if 'yy' changes, so maybe
+	  "import v1* from yy" which means "all names listed in v1".  Or
+ "import * from yy.v1".
+ - what is the scope for 'finally' or 'later', 'defer', 'after' ?
+
+
+ let a = b in c
+ or
+ c where a = b
+
+ or
+ with: a = b
+ do:
+ c
+
+	These introduce bindings but they are either for a single statement
+	or for a block.  We can limit indents a bit by combining
+	an if-list or loop with the 'with'.  I wonder if that is enough.
+
+ Having completely free scoping would require
+ let a := b
+ anywhere, and "unlet a" anywhere too. I think that is probably too
+ free.
+
+
+ For extra-module visibility there are options:
+ - 'public' tag for all public names
+ - 'private' tag for all non-public names
+ - Some convention with capitals or underscores to direct which
+ is which, but with UTF I don't trust capitals, and I hate
+ imposing underscores except for "don't use this" cases.
+ - 'export' lists (or 'hide' lists)
+ 'export' lists would be good for wild-card imports.
+ - external/internal section of the module
+ Would need section in each 'struct' too.
+
+ I think top-level symbols that might be imported should be
+	explicitly exported, and exported to a list if a wild-card import
+ is allowed.
+ Names in a struct which should be hidden can be marked 'private'
+ as hiding them is less critical.
+
+ All symbols should be available even without import, using e.g.
+ :module:symbol
+ This allows instant access in simple scripts without pre-planning.
+ If 'module' is imported, then just "module:symbol" is sufficient.
+ Possible ":module" searches current directory, then system path,
+ while "::module" only searches system path.
+
+
+
+
+ I think 'after' scope should be tied to a name so when the name
+ goes out of scope, the action triggers.
+ So it could be:
+ after name:
+ actions
+ though most such actions are to free memory - which must be
+ automagic, or to release resources (locks, files, etc) which
+ should be the destructor of some value.
+ i.e. 'open' returns an 'owned' value so it is auto-closed when
+ forgotten.
+
+ 'name' could be a loop label. That might be interesting.
+
+	But should 'after' fire when the value changes, or when the name
+	disappears?  I think it should be when the value changes, where
+	that is meaningful.
+ Maybe it only makes sense on 'owned' values?
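+	The "'open' returns an 'owned' value so it is auto-closed when
+	forgotten" idea can be sketched with a Python context manager;
+	the Owned class and its 'release' hook are illustrative, not part
+	of any design here.

```python
# Sketch: an 'after name' action tied to a binding's lifetime.
# Leaving the 'with' block plays the role of the name going out of
# scope, which triggers the release action (the destructor).
class Owned:
    def __init__(self, resource, release):
        self.resource = resource
        self.release = release            # action to run when the name goes away

    def __enter__(self):
        return self.resource

    def __exit__(self, *exc):
        self.release(self.resource)       # fires when the scope ends
        return False

log = []
with Owned("file-handle", lambda r: log.append("closed " + r)) as f:
    log.append("using " + f)
# at this point the owned value has been auto-released
```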
+
+
+
+ Tuples:
+ Tuples are present as the parameters to a "function". But where
+ else?
+	Tuples are usually (a,b,c) but can be useful as a:b:c.  This is
+	briefer.  As long as 'tuple' isn't a first class type,
+	"a" can be a 1-element tuple or an expression, depending on
+	context.
+ This is useful for a '%' function for string conversion.
+ The args to '%' are tuples where first value is sent 'stringify'
+ with remaining bits. so 3:'x' will print in hex ... bit ugly
+ though.
+ Maybe not needed then.
+
+	Mapping between tuples and structures is probably useful:
+	assigning a tuple to a tuple with matching shape, or
+	returning a tuple from a function .. well, return a struct but allow
+	matching to a tuple of names.
+	A tuple could equally be an array if the length matches.
+
+	Virgil maps all tuples to lists of values, so nesting disappears.
+ This is pleasingly simple. It means that you cannot have tuples
+ as individual arguments or return values but if you cannot query
+ the size (which would make them arrays) that doesn't matter.
+
+ A tuple can be assigned (bound) to an array if the size is
+ flexible, but binding contents of an array to a tuple is only
+ possible if the array size is known at compile time.
+ So the type of printf could (almost) be
+ printf(string, printable[])
+ and it could still be called as
+ printf("I have %d and %d", foo, bar);
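+	The printf sketch maps directly onto argument flattening in
+	Python, where a trailing *args tuple plays the role of the
+	flexible printable[] array (the function here is illustrative):

```python
# Sketch: a tuple of trailing arguments bound to a flexible-size
# array parameter, as in printf(string, printable[]).
def printf(fmt, *printable):
    # 'printable' arrives as one tuple; '%' formats against it
    return fmt % printable

out = printf("I have %d and %d", 3, 4)
```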
+
+
+ What about
+ if a in (1,2,4,8):
+ as a use of tuples. It expands to a disjunction of separate tests.
+
+
+ Expressions with side effects.
+ "functions" will often have side effects, but what about
+ a++
+ a += 5
+ etc
+
+	They are problematic because they interact badly with
+	order-of-operation rules.
+ They are useful because they allow brevity.
+ if x && (a += 5) < 7 ...
+ rather than
+ if x:
+ a += 5
+ if a < 7 ...
+ order doesn't apply immediately after && because && is a
+ serialisation point.
+ if with a=5: a < 6 and then with b=6:
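+	The condensed form can be sketched with Python's assignment
+	expression; 'and' acts as the serialisation point just as '&&'
+	does, so the update to 'a' only happens when 'x' is true:

```python
# Sketch: a side-effecting expression guarded by a serialisation point.
def check(x, a):
    if x and (a := a + 5) < 7:      # a += 5 happens only when x is true
        return ("took-branch", a)
    return ("skipped", a)
```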
+
+
+ Type specifiers:
+ Do we want:
+ type name, name
+ or
+ name, name type
+ possibly with punctuation?
+ Do we want annotations on the name:
+ char foo[]
+ or on the type
+ [char] foo
+ i.e. can we embed an identifier inside the type
+ struct foo *(*ID)();
+
+ I think the C experiment shows that to be a failure. You still need
+ the types without the identifiers, so it is best to always have them
+ separate.
+ i.e. a type should always be identified by a set phrase.
+ type name = phrase
+ var name: phrase = value
+
+
+
+	Dependent types:
+	http://beust.com/weblog/2011/08/18/a-gentle-introduction-to-dependent-types/
+	suggests they only work for immutable values and I don't doubt
+	they are easier there.  With mutable values we would need to allow
+	the type of a name to change ... or be dependent on some variable.
+ So if "array of N integers" is the type of A, then
+ A[N++] = foo
+ doesn't change the type. i.e. if the parameter is variable, the
+ value can be mutable.
+ So this boils down to describing invariants which the compiler will
+ validate for us. i.e. a proof assistant.
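+	A runtime sketch of the "array of N integers" invariant: a real
+	dependent-type system would prove this statically; here it is
+	merely asserted (SizedArray is a made-up name for illustration).

```python
# Sketch: the length of 'data' is dependent on the variable 'n'.
# Mutation is fine as long as the invariant still holds afterwards.
class SizedArray:
    def __init__(self):
        self.n = 0
        self.data = []

    def append(self, x):
        self.data.append(x)               # A[n++] = x : value changes,
        self.n += 1                       # type (len == n) does not
        assert len(self.data) == self.n   # what the compiler would prove

a = SizedArray()
a.append(10)
a.append(20)
```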
+
+ An object can be in a number of different stages. Some of these
+ happen during initialisation, but others can be just part of the
+ overall life-cycle of the object. During different stages,
+ different properties can persist. For example an object may be
+ immutable for a certain well defined period of 'time'. If the type
+ system can know of this, then any references that only exist during
+ that time can be known to be immutable, while references that last
+ longer are not.
+
+ There is scope for more than 'immutable' and 'mutable'.
+ e.g. an object might be in a 'growth' state, where new components
+ are added, but none are removed.
+	The set of objects in a container might be unchanging, while the objects
+ themselves may change.
+ At some stage, the set of contained objects might transition from
+ one type to another type - or from one stage to another stage.
+ For the language to be able to affirm this it would need to know
+ that some operation visits all member elements of some type/stage.
+	This might need to be (unsafely) affirmed by the programmer rather
+ than mathematically proven by the compiler.
+
+
+	So there are two ideas here.  One is more explicit statements
+	about lifetimes.  An object can identify several 'stages'
+	which can be transitioned by various operations.
+ Each component can have different type properties in different
+ stages.
+ The type system can restrict certain methods to certain stages.
+	The compiler must *know* the stage of a pointer.  It is static
+	info stored in the lifetime analysis.  When an object is known to
+ 'own' certain borrowed pointers, the lifetime of those borrowed
+ pointers is affected by changes in the object's stage.
+
+ stages can affect:
+ - mutability: shallow or deep. Addition or subtraction or both or neither.
+ - values of scalars
+ - NULL, nullable, or non-null for pointers
+ - size of arrays as related to other values - variable or
+ constant.
+ - 'once' or counted or collected for some pointers.
+
+ Some stages are exported, some are internal only.
+
+ It might be helpful to have different names for different aspects
+ of type variability.
+ We have
+ Parameterised types: The type depends on some type parameter
+ Dependent types: the type depends on some nearby value
+ Temporal types: the type depends on a state which changes over
+ time. Such a state might be represented in some variable,
+ but it shouldn't need to be
+ Linear types: The number of strong references is dependent on
+ some variable or age thing. So this is an aspect of a type
+ that can vary.
+
+ These need to be specified and then enforced.
+ Parameterised types make sense where-ever a type has multiple
+ subtypes.
+ This includes a struct with multiple elements and a function with
+ arguments and returns.
+ struct <f>:
+ next: f
+ others: array of f
+ func max<f:comparable>(a,b:f):f
+
+ Type checking this should be fairly easy. Implementing would be
+ easy if f was always a pointer...
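+	The func max<f:comparable>(a,b:f):f signature can be sketched
+	with Python's bounded type variables; the 'comparable' bound is
+	checked by a type checker, not at runtime:

```python
# Sketch: a parameterised function whose type parameter is bounded
# by a 'comparable' interface, as in max<f:comparable>(a,b:f):f.
from typing import Protocol, TypeVar

class Comparable(Protocol):
    def __lt__(self, other): ...

F = TypeVar("F", bound=Comparable)

def maximum(a: F, b: F) -> F:
    return b if a < b else a
```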
+
+ Dependent types work in a structure or a function call.
+ We probably want arbitrary computations on values(?).
+ If changing the value means changing the type we need types of
+ different precision e.g. mutable and immutable.
+ Optional and non-optional.
+ Or we need multi-assignment to change the value and the typed name
+ at the same time.
+ Or we can allow that the type is flexible between uses.
+ So the type assertions are only verified when the type is used.
+ So I can change inter-dependent fields in any order and it will
+ work.
+
+ Temporal types might be tricky to describe. We need to identify a
+ clock. It might be an internal field or it might be an external
+ value.
+	By "external" we must mean within a super-ordinate object.
+ e.g. a 'container' might contain lots of 'members' and the type
+ of each member might be dependent on a field in the container.
+ The clock could even be the program counter.
+
+
+ Boot Strapping
+ An interpreter written in C, with a compiler written in target,
+ that links to llvm, seems like a good plan.
+
+
+
+ Modules / packages / crates.
+ This is the primary mechanism for namespace control and for
+ libraries.
+	Within a package, all names defined in that package are visible
+	provided the right path is used.  Outside the package, only
+	exported names are visible.  Some names are exported by default
+	but can be marked "private".  Others are private by default but
+	can be marked "public".
+
+ Packages can define:
+ - types: struct, enum, array, function, interface etc
+ - constants
+ - functions
+ - methods
+ - other packages
+
+	Does it make sense for a package to export a method for a type that
+	the package imported?  As the name of the method is in the
+ namespace of the type, it probably isn't polite to add something to
+ someone else's namespace.
+ But a new name for a type could be defined, added to, and exported.
+
+	How are name conflicts handled?
+	Certainly you cannot define the same name twice in the same scope.
+	However:
+	 you might import a name with a wildcard, then define it.
+	 You might embed a struct/enum in a local struct/enum and
+	 add a name which already exists.
+
+ This is interesting because the library that you import might get
+ a new revision. It would be a pain for old code to break when
+ that happens.
+
+	So importing a library should specify a version.  Only names from
+	that version are imported.
+
+ You don't need to explicitly import if you don't want namespace
+ changes.
+ Just use ":modulename:symbol" to get any symbol from any module.
+ If you want to avoid using ":modulename:" you can
+ import name, name from modulename
+ or
+ import version_number from modulename
+
+	Exported symbols need to be version-tagged.  That could get ugly.
+
+ private-by-default could be "public.2" rather than "public" which
+ isn't too bad, but what about public by default?
+ I guess I need to decide which symbols are which!
+ There are "items": type/constant/function/package
+ And there is fields, which include methods.
+
+ Go: exports only things that start with upper case letter
+ Rust: exports all field/methods but nothing else.
+
+ We could have an "export" clause for items:
+ export version item, item, item, item
+ This makes "static" the default which is safest, and makes the
+ "interface" easy to see.
+
+	For fields, we could have an "export version" block, or
+	"export version" exports all fields/methods listed *before* the
+	pseudo-declaration.
+ So when you add more to the end, they aren't exported.
+ Though it is probably more consistent to have
+ export version field,field,field
+ inside the struct
+
+
+ Should modules have "variables"? or any state?
+ If so, they can usefully have an 'init' routine.
+
+
+ Module naming:
+ modules live in files with related name. If file name is a valid
+ identifier, we can just use that name in the code.
+ If it isn't, or if we want to change it for some other reason, we
+ would need to 'import identifier "filename"'
+ Also, the file might be in a directory hierarchy, where '/' might
+ not be the right separator.
+	So we need to import a path, possibly into a module, and provide
+	a local name for that, if the default doesn't suit.
+ import "file" = name
+ import "file" { import "file" {import identifier=new, id, id}}
+
+
+ Overloading
+ Overloading seems to be largely disliked. Hard to be sure why.
+
+ Overloading + - * / % is already done for numbers.
+ Overloading [] could be very useful for hashes
+ Overloading () && || ?: would be very bad because they have
+ strong meanings. Ditto for =
+ Overloading -> . prefix-* prefix-& could be weird
+ Overloading & | ^ << >> &^ ~ might be OK
+ Overloading == would be bad, it has to apply to all types anyway?
+ Overloading < <= > >= ???
+
+ So:
+ some operators need boolean. Possibly a 'testable' interface
+ could allow anything to be tested. ?: && || !
+ some operators compare. So a 'comparable' interface?
+ < <= > >= == !=
+ some operators are limited and can freely be overloaded:
+ + - * / % & | ^ << >> &^ prefix-~ prefix-`-'
+ some operators are special:
+ []
+
+ [[ &^ is not really different from "& ^" except that "&^=" is
+ legal ]]
+ [[ what about bit operations without having to explicitly shift?
+ |< &^< ?? |+ &-
+
+ Maybe it would be cleaner to have a simple opp to convert a
+ number to a bit. #1 Then we clear a bit with "a &^#= 5" ??
+
+ Or a 'bitset' type could define '+' and '-' to add/remove bits.
+ ]]
+
+ Overloading of a binary operator is a method-call on the first
+ argument passing the second.
+
+ So for binary operations like + - * / , left and right side must
+ be identical, like with Go.
+ But what about '<<' and '>>'. Go allows the RHS to be any unsigned
+ int type. This isn't nicely consistent. To be consistent we need
+ to required either a type or an interface. The type would need to
+ be unsigned int. The interface would need to be
+ to_unsigned_int_clipping which returns 0 or MAXINT if it cannot
+ give a faithful unsigned int. This is much the same type as the
+ index to an array.
+
+ So having an 'int' and 'uint' interface might be defensible
+
+ So each operator is defined in terms of a method call.
+ e.g. a + b => a.plus(b)
+ a && b => if a.test() then a else b fi
+ a < b => a.compare(b) == less_than
+ a[b] => *a.lookup(b, create:Bool)
+
+ All of the first set of operators can also be combined with '=',
+ e.g. +=
+	++ -- : are they really needed?  +=1 -=1 are just as good?
+		Though they could call a.incr() and a.decr()
+
+ Do we want a += b to be different from a = a.plus(b) ???
+ For a list it would be good to know that we don't need to duplicate.
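+	Python answers the "a += b versus a = a.plus(b)" question with
+	two distinct methods, which shows what the distinction buys for
+	list-like types (Vec is a made-up example):

```python
# Sketch: '+' builds a new value, '+=' mutates in place, so a
# list-like type can avoid the duplicate copy.
class Vec:
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):             # a + b : fresh value
        return Vec(self.items + other.items)

    def __iadd__(self, other):            # a += b : in place, no copy
        self.items.extend(other.items)
        return self

a = Vec([1, 2])
b = Vec([3])
c = a + b                                 # new object
a += b                                    # same object, mutated
```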
+
+
+ Wait... for 'plus' to be a valid method it would need to define the
+ type of the second arg, and the return type. Don't like that.
+ So treat it not as a known method, but a simple textual
+ substitution.
+ a + b is translated to a.plus(b) which is in any particular
+ interface.
+
+ On the other hand, test and compare are from an interface.
+ What about lookup? Do parameterized methods help here?
+
+ Type of plus is <X>(X,X)->X
+ Type of shiftleft is <X>(X,uint)->X
+ Type of compare is <X>(X,X)-> (-1,0,1)
+
+List of (possible) operators.
+
+ + - * / % & | ^
+ < <= > >= != ==
+ && || !
+ << >>
+ "in" "not in"
+ "is" "!is" "is<=" pointer comparison.
+ []
+ [x:y] ??
+
+ >>> signed right shift?
+
+	String concatenation?  '+'??  '++' ??
+
+
+ "in" dispatches on the right-hand-op.
+ "string" does substring search
+ "int" does bit test.
+
+	if 'in' can do bit-test, how do I do bit set/clear?
+ |- |+ &- &+ Not good as +- can be prefix
+ -| +| ?? Hard to type...
+ What is good for set-addition and set-subtraction?
+ foo[] = bar
+ is niceish for adding an element to foo, at end I guess.
+ But it doesn't generalise to subtraction.
+ :+ and :- ??
+
+ Could use ++ and -- except that I want ++ for concat.
+ What is -- ??
+
+ 'else' - like || or ?: "a else b" == a if !!a, or b if !a
+ 'then' ?? a then b is either b or NULL.
+ 'and then' == &&
+ 'or else' == ||
+ 'and' == &
+ 'or' == |
+ 'not' == !
+
+ 'as' for cast: "value as type".
+ I think I prefer "type(value)", though the 'as' allows
+ separate name spaces for types and functions
+
+ 'div' for integer division? Probably not.
+
+ 'if'. "a if b" is either 'a' or False.
+ This isn't always type safe, but
+ "a if b else c"
+ becomes 'a' if 'b' is true, or 'c' if not.
+ so "a else b" is short hand for "a if a else b"
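+	For comparison, Python already reads this way: 'a or b' is
+	exactly the "a else b" shorthand, and 'a if b else c' is the full
+	conditional expression:

```python
# Sketch: "a else b" == "a if a else b", which is Python's 'a or b'.
def pick(a, b):
    return a or b                 # a when truthy, otherwise b

def cond(a, b, c):
    return a if b else c          # the full three-way form
```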
+
+ Prefix operators:
+ - negate
+ + abs value?
+ / reciprocal?
+ ~ invert
+ ! not
+
+ & takes the address of a field?
+ [n:] takes address of an array member
+ Can we take address of a local variable? Or do we define
+ the local variable as a pointer and allocate from stack.
+ * dereference a pointer. This is normally automatic, but not
+ for assignment
+
+ < extract the value and set variable to nil.
+
+tasks/IPC
+
+	Co-operatively scheduled threads across N processes with
+	all OS calls non-blocking seems like the way to go.
+ Questions are:
+ - how to slice?
+ time?
+ on blocking action?
+ explicit yield?
+ - when to clone?
+ always?
+ - how to create new threads?
+ library or builtin?
+ function or closure?
+ - how to control threads
+ handle for 'kill'?
+ - how to wait for threads?
+ maybe don't. When they die, they die. Any waiting
+ uses IPC.
+ - What to do for IPC?
+ - message passing
+ - shared memory with locks
+ - monitor?
+ - synchronous interfaces?
+ - how much control of shared memory is needed?
+
+ Message passing is clearly a good tool but I'm not sure it should
+ be the only one. Something that abstracts locking could be good.
+ E.g. a pointer that needs to be 'activated' before it can be used.
+ i.e. a refcounted object which has "lock" which returns a 'once'
+ pointer to a subtype which provide more access. The original
+ allows read access to some things.
+
+ Or a type where some/all methods are required to own a particular
+ lock. Actually it should be that some fields are only accessible
+ when some lock is held. So the type is parameterised by a lock.
+ By default fields are only accessible on a single thread. If
+ a lock is provided, then they become accessible when that lock is
+ held.
+
+
+ I wonder if we need more pointer types, or use dependant types..
+ - 'once' references
+ - 'one thread'
+ - 'ref counted' - needs locking.. so:
+ - counted-onethread
+ - managed-onethread
+ - borrowed references might be onethread or need-locks
+
+
+ ParaSail goes even further and parallelises everything. Any two
+ expressions that aren't interdependent are potentially run in
+	parallel.  This requires the compiler to know about interdependence.
+ ParaSail puts in lots of requirements to avoid aliasing etc.
+ You can never (I think) have two pointers to the one thing. This
+ feels overly restrictive to me, but could often happen in practice.
+ If the compiler could detect this - or at least make good use of
+ owned pointers..
+
+
+ (may2014)
+ I really like the idea that locking is done with 'once' pointers.
+ The locks are released just like memory is freed.
+
+ Weird idea: I wonder if a hash table of spinlocks would work so the same
+ data structs can be used with or without locks.
+	The obvious problem is that aliasing can cause stacked locks, and I deadlock
+	waiting for a lock that I hold.
+ With little effort I could tell if it was me.
+ What would I gain though? hashing pointers does take some time...
+ It would remove the need to store spinlocks in any data structure.
+
+
+ The base hardware locking primitive seems to be
+ load-linked/store-conditional, or cmp-swap. If I were to expose
+ that in the language I would really need to understand all the
+ stuff about memory models, which might be fun to learn I guess....
+
+
+Starting tasks
+ go BLOCK
+ starts BLOCK in another task. When it completes, thread is done.
+ Any non-shareable values mentioned there become unavailable.
+ Any refcounted values that are also used after BLOCK are increfed first.
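+	'go BLOCK' can be sketched with an ordinary Python thread; the
+	join at the end is only there to make the example deterministic,
+	since the notes suggest not waiting for threads:

```python
# Sketch: 'go BLOCK' starts the block in another task; when the
# block completes, the thread is done.
import threading

results = []
t = threading.Thread(target=lambda: results.append("done"))
t.start()        # 'go' : the block now runs concurrently
t.join()         # only for determinism here; 'go' itself would not wait
```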
+
+
+Scheduling
+ if we have multiple threads then we need to have a scheduler.
+ We could use OS threads and use the OS scheduler, but that is likely
+ to impose an overhead that we don't want.
+ So even if we schedule only when a thread blocks, we still need some
+ priority info to decide who goes next, and some way to share
+ threads across a pool of OS threads.
+ Explicitly assigning numbers is very boring and error prone.
+ Putting things in order somehow would be nicer.
+ With event-loop code we can have an ordered list of what
+	to do next.  This is effectively an explicitly coded scheduler.
+ To do something like that means putting all routines into some
+ data structure in the language. That would be good for introspection
+ and killing etc.
+ Some threads are blocked, some are ready. A thread can be placed in
+ a group for round robin. A group can be ....(don't know what I was
+ going to say).
+
+
+Memory allocation:
+ How do I get a new 'box' to hold a value?
+ For 'stack' boxes, it just happens automagically as needed.
+ For various heaps we need some syntax, like
+ new(type)
+ but we need to know which heap (per-thread, common, whatever)
+ so:
+ new(type, heapname)
+
+	But I'm not very keen on builtin functions - they pollute the
+	namespace.
+ var name : type = @heapname
+ I don't think I just want a sigil like Rust as I suspect there
+ could be multiple sources of allocation, or styles of
+ allocation - enough that names might be useful.
+
+ Whatever it is, we need to indicate:
+ - a type - could come from the variable being assigned
+ - parameters to the type such as array size or union
+ discriminant (does that make sense?)
+ - region to allocate in
+ - type of pointer: unsafe, counted, owned, managed,
+ never-free
+ These could be associated with the region.
+ 'Managed' (garbage collected) and 'never-free'
+ really need to be in their own regions, but
+ counted, owned, unsafe can be together.
+
+ However they may well be managed separately.
+ The compiler only knows about 'owned' and
+ 'borrowed' references. When an owned reference
+ is dropped the manager knows whether to dec a ref
+ or explicit free or do nothing.
+ Allocating from a 'never-free' manager will return a
+ borrowed reference from the manager.
+
+ Given that the type might need to be parameterised it is best
+ to have it explicit and not deduced from the var. Let the var
+ deduce from the allocated value.
+
+ So we want some syntax that includes allocator and type and
+ says "allocate it". new(type, heap) is obvious, but we cannot
+ pass types to any other functions.
+ heap(type) has same problem.
+ typename<parameters>.alloc(heap) is too noisy
+ We could include the heap as a parameter?
+ typename<heap>
+ when there are no parameters
+
+ But.... in many cases an allocation goes straight into a
+ struct field, so the type is known and repeating it is boring
+ and while it can be type checked and so not error prone, it
+ shouldn't be necessary.
+ But we still need parameters......
+
+
+Global variables.
+ Opinions vary. They are like gotos : dangerous but useful.
+ Can we constrain them?
+
+ 1/ total-global is a bad idea. module-global at most. This
+ avoids namespace pollution. Still accessible from other
+ modules as "that_module.globalvar".
+ 2/ makes referential transparency hard: so include 'read/writes
+ mod-static' and 'read/writes foreign static' properties of
+ a function
+ 3/ local variables must never 'hide' global variables ... or
+ anything in a greater scope.
+ 4/ it should be easy to pass lots of function parameters.
+	   @struct means "take all the names in this struct and if
+	   any are names of function arguments, use them as such".
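+	The '@struct' idea in point 4 can be emulated with keyword
+	expansion: every name in the struct that matches a parameter name
+	is bound to it (the function and its parameters are invented for
+	illustration):

```python
# Sketch: passing "all the names in this struct" as function
# arguments, matched by name, like the proposed '@struct'.
def connect(host, port, timeout=30):
    return f"{host}:{port}/{timeout}"

cfg = {"host": "db", "port": 5432}     # the 'struct'
result = connect(**cfg)                # '@cfg' : names matched to args
```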
+
+	Globals are bad because very few things are 'once only'.  Even
+	a random number generator context could be per-thread.
+
+ If two functions don't touch globals, and operate on
+ unconnected data structures, or are read-only on shared one,
+ then they can run in parallel. Detecting this would be good.
+ Warning when it cannot be done would be nice. Programmer
+ would need to suggest it.
+
+ A global that is set once in an early 'stage' of the program
+ and never changed again does not break referential
+ transparency for most of the life of the program.
+
+ Analysing these lifetime issues might be difficult. Static
+ analysis of all known types might be part of the problem.
+ Loaded code (.so) could not be known about at compile time so
+ types would need to be explicit, but they must always be in a
+ stage *after* they are loaded.
+
+ Key lessons here:
+ - function need 'access globals' in their type
+ - program stages might help isolate these.
+ So global gets type 'read-only is stage X', and function
+ gets 'called in stage X'. But what is 'X'.
+
+
+Stuff:
+	If a name is never changed but not declared constant, is that an error?
+ if a value is set but never used, is that an error?
+ What about:
+ a = 1;
+ a = 2;
+ ??
+ probably.
+
+
+Closures:
+ Closures are a cool idea - a little inline function with
+ access to the current stack frame - a bit like a method for
+ the current frame rather than for a separate object.
+ It can be used
+ - to create arbitrary code structures like maps
+	  and thread bodies and innovative loops
+ - to pass compare functions to sorts code etc
+ - I wonder what else..
+ - by holding state they can work as generators
+ - this requires the closure to outlive the frame!
+ - closures are good for 'after' functions.
+
+ We need to know what happens to names in the closure.
+ Are the values copied, or are references taken, or what?
+ The answers are probably obvious in various contexts, but need
+ to be clearly enumerated.
+
+ A possible trouble with closures is that they can play tricks
+ with the lifetime of a variable. e.g. if we create a closure
+ inside a loop, using a variable that was declared in the loop,
+ then does each closure get the *same* variable, or do they
+ each get separate variables -- effectively duplicating the
+ variable.
+ I think that neither is really appropriate as either could be
+ confusing. If however the name binding was a constant, then
+ copying the value into the closure may well be appropriate.
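+	The two capture behaviours can be demonstrated directly in
+	Python, which picks "same variable" and so shows exactly the
+	confusion described above:

```python
# Sketch: closures created in a loop.  'shared' closes over the one
# loop variable; 'copied' snapshots it via a default argument,
# giving each closure its own copy.
shared = [lambda: i for i in range(3)]
copied = [lambda i=i: i for i in range(3)]

shared_results = [f() for f in shared]   # all see the final value
copied_results = [f() for f in copied]   # each keeps its own value
```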
+
+ In general I don't think that closing over a variable should
+ extend its life time. Go does this and allows the address of
+ a local variable to be passed out - the variable is
+ automatically made into a heap variable. I think it best to
+ require an explicit heap allocation if you want to return a
+ pointer. It is less likely to be confusing.
+ A pointer obtained with '&' should only ever be a 'borrowed'
+ pointer. There must be a strong reference which protects it.
+
+
+ Syntax for closures?
+ Go uses func (args) result { body }
+ Rust uses |arg list| { body }
+ Python uses lambda args : body
+
+ For small closures, listing the args is boring.
+ Using "$1 $2" is simple and brief.
+ Still need a syntax to say "this is a closure".
+ { use $1 < $2 }
+ where expression is expected could do it.
+ Then
+ { a,b: use a < b }
+ could be a more general syntax. Not sure it can be
+ differentiated on an LR basis though...
+ That is a similar syntax to list comprehensions
+ [ a : for a in range(50) if a % 2 ]
+ No: list comprehensions are in [] - the array syntax.
+ But {} is like a structure... or do we use tuples for
+ structures?
+
+ I think I want in-line closures to be expression only.
+ I think burying statements inside expressions is wrong.
+ This argues for "x if b else c".
+ If you want a more complex closure, you need a local
+ function declaration.
+ This means that you cannot pass a large closure directly
+ to a function like Rust does. Given issues with 'break'
+ and 'continue', this is probably a good thing. But are
+ there times when it would be good? If there were it should
+ probably be handled by something like 'do' which has
+ a closure as a distinct syntactic object. Does it need
+ args? All local variables are available, but they aren't
+ available to the function which is passed the closure.
+ Passing local vars in would be clumsy - really do need
+ lambda concept.
+
+Type analysis.
+	I've never done anything like type analysis, so my first
+	thought is that it sounds easy, but it probably is not, else more
+	would be done.
+
+	The first and probably most difficult task is to create a good
+	type model.  This will allow us to annotate every node in the
+	code graph with appropriate type information, and then make
+	correct deductions.
+	The deductions we want to be able to make are:
+ - is this correct? or at least consistent
+ - what are the unspecified or underspecified types.
+
+ Scalars, strings, Arrays, structs, unions, functions, closures
+ are different arrangements of data.
+ Scalars, strings, functions, closures are immutable, as can be
+ the others. Immutable values can be copied instead of
+ referenced.
+
+ Values can be:
+ mutable or immutable - can I change them
+ volatile or non-volatile - others can change them
+ once, borrowed(wrt some once)
+ compile-time, or runtime.
+
+ Every variable has a definite type, as does any function
+ result. The args to a function determine the types of the
+ results, though those types may be a complex function of the
+ arg types. e.g. intersection, union.
+
+ Constants have a list of possible types. We choose the first
+ type for which there is no type error.
+ If there are multiple constants, they are tested in
+ depth-first, left to right order.
+
+Literate programming
+ There should be more of this. Language should make it easy by
+ allowing order of code to match order of presentation. This
+ means we should be able to add to any namespace at a later
+ date by reintroducing it with some sort of additive syntax:
+ type foo += { .... stuff }
+ Maybe could allow previous stuff to be repeated, which could
+ be filled in by tool and formatted differently by
+ presentation.
+
+ In literate mode, everything is a comment except between
+ [[[ and ]]] each all alone on a line.
+
+ It would be nice to allow the description to be processed as
+ markdown. For this, the compiler needs to extract precisely
+ the code blocks (not the span-level code though).
+	I think that any line indented 4 or more spaces following a
+	blank line starts a code block, which ends when a non-empty line
+	has fewer than 4 leading spaces.
+	First line in a code block cannot start with a list char: - * + 1 >
+
+ Github markdown also allows ``` to start a code block, and ```
+ to end it. ```plato would turn on 'plato' syntax
+ highlighting.
+ stackoverflow would use <!-- language: plato -->
+
+ What about:
+ reStructuredText
+ Asciidoc http://www.methods.co.nz/asciidoc/ - nah, ugly
+ pandoc http://johnmacfarlane.net/pandoc/README.html#pandocs-markdown
+
+ reStructuredText is similar to markdown. It uses "::" at the
+ end of a line to suggest following lines until undent are
+ code.
+
+ pandoc is like markdown. One thing it adds is
+ ~~~~~~ { .plato }
+ Here is some code
+ ~~~~~~~
+ as well as ``` like github
+
+ When writing a program in literate style, is there a need to
+ write code out of order? I suspect that there is, as we might
+ want to add a second task for a loop to do, or refine a test.
+	Actually - that is a tricky one.  Can we *change* previous
+	code?
+
+ We could deliberately leave ellipses everywhere with names?
+ The block would have a name as well as the ellipsis.
+
+ This really needs to be tried.
+
+
+Extension
+ If I want this to be very general programming language, it
+ should be possible to write in various different styles.
+	This includes:
+ - easy access to regexps like perl
+ - dynamic typed programming
+ - lisp-like programming?
+
+ A string could have an 'r' suffix to make it behave as a
+ regexp:
+ if "foo.*bar"r < mystring
+ if (a,b) = "foo\([bar]*\)x\(.\)"r < mystring
+
+ Dynamic typing needs a universal type which must be
+ extensible.
+ This is like an extensible union which I already considered
+ with universal enums. But that doesn't really work...
+ Need to think more.
+
+ Lisp? That is really a totally different engine with a totally
+ different syntax. Hard to see how it could possibly fit.
+ Any 'eval' based operation is really impossible to fit within
+ the syntax.
+
+ List processing more generally? If we have arrays that can be
+ chopped up and joined, we start to have something a bit
+ general.
+	Add list comprehensions and map/reduce functions and it might get
+	interesting.  But then memory management gets interesting too.
+ Could 'tuples' be claimed by a type like literals can?
+
+
+Random stuff - don't know where this fits.
+
+	If a variable is known to be of some interface type, and we assign
+ a value of a more refined type, then the variable can be known to
+ have that more refined type until the value could change.
+ This might be useful for struct fields if we have exclusive access,
+ or named function return values.
+
+Strings
+ Strings need to be a built-in type. But how should they look?
+ NUL termination has problems - counting.
+ Storing a count in front can be wasteful if we want 64bit counts
+ and short strings.
+ Could take the utf8 approach:
+
+ 0-127 : 1 byte 7bit string length. (128 bytes)
+ 128-191: 2 byte 14bit string length (16 Kilobytes)
+ 192-255: 8 byte 62bit string length (4 exabytes)
+
+ We always nul terminate so strings can be passed to 'C'.
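+	A sketch of this variable-width length header, with the trailing
+	NUL for C interoperability; the exact bit layout is one
+	interpretation of the table above, not a settled format.

```python
# Sketch: UTF-8-style string length headers.
#   first byte 0-127   : 1-byte header, 7-bit length
#   first byte 128-191 : 2-byte header, 14-bit length
#   first byte 192-255 : 8-byte header, 62-bit length
def encode(s):
    data = s.encode("utf-8")
    n = len(data)
    if n < 1 << 7:
        head = bytes([n])                               # 1-byte header
    elif n < 1 << 14:
        head = bytes([0x80 | (n >> 8), n & 0xFF])       # lead byte 128-191
    else:
        assert n < 1 << 62
        head = bytes([0xC0 | (n >> 56)]) + (n & ((1 << 56) - 1)).to_bytes(7, "big")
    return head + data + b"\x00"                        # NUL-terminate for C

short = encode("hi")
longer = encode("x" * 300)
```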
+
+Numbers:
+ I really don't like the fact that ints can overflow without me
+ knowing.
+ I think any possible overflow must be checked, unless the var is
+ explicitly declared as cyclic (modulus).
+ This means the language must know the range of values, or include
+ lots of tests.
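+	A sketch of checked versus cyclic arithmetic for an 8-bit value;
+	in the imagined language the 'cyclic' choice would be part of the
+	variable's declaration rather than a call-site flag.

```python
# Sketch: any possible overflow is checked unless the value is
# declared cyclic (modular), in which case it wraps silently.
def add8(a, b, cyclic=False):
    r = a + b
    if cyclic:
        return r % 256                    # declared modulus 2^8: wraps
    if not 0 <= r < 256:
        raise OverflowError("8-bit add overflowed")
    return r
```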
+
+
+runtime assistance.
+ - profiling. collecting statistics of where time is spent
+	- coverage.  Count number of times each block is entered and allow
+ this to be accumulated over multiple runs.
+ Then format code with highlighting of un-executed and hot-spot code.
+
+ For interpreter, this can all be internal.
+ For compiler, we need an external data file. Probably a log that we append
+ on completion and process later, using uuid hashes of functions.
+
+ Unit-tests, with mocking. This needs lots of experimentation.
+
+
+Literate help.
+ Sometimes it might be nice to assert that the order of things mustn't matter.
+	When building a function up during literate programming you might want to
+	say that certain things happen, but you want to be sure their order isn't important.
+ i.e. they all appear to happen at once.
+ [Other times there are dependencies that must be honoured.
+ For these, "after foo" or "before foo" might be useful
+ ]
+	In particular, when updating a data structure a number of steps might
+ be needed. Each should be able to access the 'before' field and assign the
+ 'after' fields.
+ Maybe an explicit syntax for 'before' or 'orig', and an error if there is
+ room for uncertainty.
+ We probably want a name for a block of code in which a value is varying.
+	Invariants apply at start and end but not during.  Values assigned are not
+ visible until the end.