Design notes for a language - 2013 version
==========================================

Design Philosophy
-----------------

A large part of design is compromise.  Once the good ideas are formed,
they must then be blended nicely, and this blending always involves
compromise.  So we need a clear philosophy to guide this compromise.
This philosophy does not serve to provide simple answers, but rather
provides some unifying themes that can be used to motivate, or at
least to justify, the answer.

1/ The compiler needs maximal information so that it can detect errors
   and optimise the implementation.  The syntax needs to be clean and
   non-distracting so that a human reader can easily see what the code
   is trying to say.  These two principles are clearly in conflict.
   One wants to add more to the program, the other less.  Finding a
   balance between these two for each design element is key to
   success.  The best answer will involve finding a syntax which
   provides information without distraction.  This will not always be
   possible (if ever) but is a worthy goal.

2/ The language should enable a feature to be provided, rather than
   provide it directly.  This means that the feature can be in a
   library, and can be changed using the language itself.  This is
   inherently more flexible.  However if the language doesn't "know"
   about a feature, it is less able to type-check and optimise it.

3/ Things that are different should look different.  This allows the
   compiler to correct silly mistakes more easily, and reduces the
   chance of making them.  So trying to use just a few core concepts
   is a mistake.  It muddies human thinking and makes life harder for
   the compiler.

4/ familiar is better than novel
   mnemonic is better than obscure
   short is better than long
   consistent is better than irregular
   pronounceable is better than unpronounceable

Design Concerns
---------------

Name of language

token syntax

comments

expressions or statements
  if/then/else -> ? :
  But what form could we use for a case/switch?  Maybe we shouldn't.
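One possible expression form for a case/switch is simply a chain of
conditionals.  A minimal sketch in Python (used here only because the
notes' language doesn't exist; the function name is illustrative):

```python
# An expression-level "switch" built from chained conditional
# expressions, one arm per case plus a final "otherwise" arm.

def classify(n):
    return ("negative" if n < 0 else
            "zero" if n == 0 else
            "positive")

assert classify(-5) == "negative"
assert classify(0) == "zero"
assert classify(3) == "positive"
```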
syntax of structure blocks

How types are constructed and described
  - scalar: int, float, char, Bool, enum
  - vector: zero based, with size
  - struct
  - tuples - how first-class are they?
  - union/variant/alternate - non-discriminated unions?
  - lookup table, stack, queue
  - functions and methods
  - lambda forms
  - closures
  - references

Does a "pointer to a pointer" make sense in a strongly statically
typed language?

Should enum values be pervasive, or require typename: prefixes?
'import' should be allowed to import an enum namespace.

Slices - pointer + length.  What is the role of 'capacity' in Go?

formal parameters
  - in/out/inout?  Just a struct.
  - are the names of parameters part of the type, or not?  Probably
    not.  But names in a struct are part of the type.  So not really
    a struct.

return values from function: 'return', or name=value

closures

for loops

what does 'break' mean in a closure?  or 'continue'?  or 'return'?
I guess the closure includes implicit labels: _loop _func
So "return label value,value" could return from an enclosing block
named 'label'.

break/continue/goto __reserved_names__

case/typecase/match/switch

exceptions/errors

defer/finally/after... how is this scoped?
  - does it create a block: try BLOCK finally STUFF
  - does it run on function exit
  - can it attach to the lifetime of a value:
        x = take_lock(semaphore)
    unlock implicitly happens when the x binding is broken.

printf - formatting, input/output in general

manifest constants of various types

What are strings?  What operations are in the language?

What sorts of numbers are built in?
  Sized ints, unlimited ints, float, fixed, imaginary, complex

Initialization and default values

Polymorphism: subtype, parametric
  If a parameter is untyped, then it can still be useful if any
  needed functions (e.g. compare, add) are passed in as well.  These
  functions could be in a structure ... which could be a vtable
  (Virgil).

Scope of bindings

overloading of bindings
  - functions/methods with different arg types
  - operators for different objects
  - a function with a different first arg type is like a method and
    can be useful for sin(), min(), abs() etc.  Certain names could
    be declared as methods and converted to method syntax when
    called.

subrange types
  - useful for validating array references
  - useful for avoiding the 'default' case in a switch
  - can be deduced from some arithmetic: 'mod' particularly

overriding of bindings - relates to inheritance

Nature of bindings: immutable, const, var, computed

target: script, jit, pre-compile, ...  "eval"

memory allocation and management

static or dynamic typing

threads/tasks

co-routines, generators

macros

equality tests

assignment semantics - references, copies
  binding versus assignment: particularly relevant with tuples

Type inference avoids redundancy, but redundancy can be good.

modules

foreign function interface

pattern matching for destructuring
  This is the general case of "tuple = tuple", with the question of
  which bits on the left are literal and which are formal names.
  This is only an issue if the appearance of a value starts like an
  identifier: "Some(value)", "None".

'unsafe' at different levels
  - bit manipulation
  - memory allocation/casting
  - concurrent access

tracing/debugging
  can the language/compiler make it easier to trace code execution?
  flag every time a field changes

dependent types

reflection - tags for reflection

structure embedding
  using "import struct" or "import fields from struct" ??

function attributes: optimise_options, inline, ...

destructors? - Go doesn't have them - relies on 'after'.

introspection
  why is this so much fun?  I never use it in C!!!  or Python.
  introspection in Go looks weird and is probably related to the lame
  type system.

nullable pointer: builtin functionality or via a union?

array indexing: should checks be explicit where not resolved by type?

operators: types must match.  A constant can be one of several types:
iterate through the options until the expression is type-correct.
If there are several constants .... might need to be careful.
    a = 1 << n
'1' must match 'a'
    if (1< (return type)

Do we want a name to introduce types?  "struct"?  "fun"/"fn"?  If so
we should, for consistency, have "array" ... like Pascal.

Without a name:
    { name:type, name:type }   is a struct
    [ length:type ]            is an array ([5:int] reads "array of
                               5 integers")
    (from:type,from:type -> to:type, to:type)   is a function
    { tag -> name:type, name:type ; tag2 -> name:type, name:type }
                               is a union

That is really rather elegant.  May be a bit too terse though, don't
know.  'tags' get assigned globally unique numbers, so multiple
tagged sets can be embedded in another.  However if the first one is
"tag = 0 -> ... ,", then zero etc. is assigned and embedding is more
limited.

Does that just leave pointers?
  - closures have the same apparent type as functions
  - methods do too
  - tuples only have implicit types - unless they are structs or
    function parameter/return
  - vtables?  interfaces?

So: pointers.  Maybe just a leading '*'.  But do we want to
distinguish among:
    borrowed  owned  counted  collected
These are properties of the object in some cases, the pointer in
others.  An object can be: stack, owned, counted, collected.
A pointer can be: strong or weak.  But a strong pointer needs to know
what to do when dropped:
  - nothing (so may as well be weak)
  - destroy/free
  - dec ref and maybe destroy/free
So a pointer needs to be one of: borrowed, owned, counted, collected
    *  ~  +  ?
Pointers to the stack are always borrowed.  Raw pointers are borrowed
too, but can only exist in unsafe code.

And vtables.  These are usually created automatically from method and
interface definitions, so they don't need a syntax.  Interfaces do
though.  They are sets of named functions, so they look just like
structures where all the fields are functions.  In fact they *are*
structures.  They use embedding for inheritance.

Pointers to different values are quite different:
  - a pointer to a scalar or record is just an address
  - a pointer to a fixed-sized array is just an address
  - a pointer to a var-sized array is an address plus a size
  - a pointer to an implementation is a pointer as above plus a
    pointer to the vtable

This makes var-sized arrays with vtables a little awkward.  But is
there really any value in allowing implementations on var-sized
arrays?  Yes, I think that would be nice.  I guess they get converted
to a struct for a vtable pointer, with the struct stored ... in the
'once' space.

Should it be possible to add to a struct after it is defined?  This
sort-of makes sense for unions where extra cases could be added.
With structures we can embed old in new.  Maybe that way around is
best.  The new union has all the values in the old one, plus some
more.  A value from the old union can be assigned to a variable of
the new union, but not vice versa without range checking.  How does
multiple embedding work with unions?  With an offset?

Initialisers for different types:
    Array:    [ value, value, num: value, ]
    struct:   { name: value }
    union:    { tag -> name:value, name:value | tag -> name:value }
    tuple:    (value, value, value) - can map to several known types
    function: ....

Functions are more awkward.  If the type is given explicitly as
    def name : type = value
then we don't want to repeat any of it, but if we don't have an
explicit type, we need to at least list parameter names:
    def foo = (arg, arg, arg) {
        code
        return rv,rv
    }
or assume '_' is the argument list:
    def foo = {
        my, args, = _
        return args, my
    }

Real asymmetry.  What is the inverse of 'return'?  import?  with?
receive?  'receive' might work with co-routines?  Maybe not.

Another thought about structs.  There are 2 sorts: those which are
used for external communication and must be in exact order with no
padding, and those which are only used internally, where the compiler
is free to reorder as it chooses.  The second sort could usefully be
extended later.
This would allow a literate programming style where elements are
added to the structure at much the same time as the code which uses
them.  External communication structures could be flagged as
big-endian or little-endian (or host-endian).  Or should each field
have an endianness?  Probably not.

Pointers:

I don't really like the '*' prefix operator in C, nor the need for
'->' after pointers where '.' is enough after a struct.  Pointers to
arrays behave the same as the array.  It would be nice if pointers to
all things behaved just like the things.  There are some exceptions
though:

1/ p = foo
   This could make the pointer point to foo, or copy the contents of
   'foo' into the thing pointed to by 'p'.  Both are valid so we need
   a simple syntax to differentiate.

2/ if p < q:
   Could compare the addresses of the boxes, e.g. for double-lock
   deadlock avoidance, or could compare the values themselves.

3/ if p == q:
   Does this say "the same box" or "the boxes hold the same values"?
   Both are necessary.

i.e. these are the things that you can do with a pointer: assign it
and compare it.

If p.foo and p[foo] and p(foo) automatically dereference, what syntax
do I want for the occasional explicit dereference?  Or do I want a
syntax to inhibit auto-dereference?
    p. = foo
could assign to the box rather than assign a new box.  Or
    p = foo
could assign to the box, while
    p = &foo
could assign a new box.  But pointers to pointers become
problematic.... or do they?  If, in general, the language imposes
whatever referencing or dereferencing is needed to match types, then
when '&' is present it skips up one level.

But parameter passing should be like assignment, and the default
should be by-reference if possible.  So assignment should be
by-reference.  So we need
    *p = *q   or   p. = q.
or decide that annotations on the right are deduced based on type:
    *p = q    p. = q    p[] = q

p < q should normally compare the values though.
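The comparison question here - "the same box" versus "the boxes hold
the same values" - is the identity/equality split that Python already
makes with `is` and `==`; a small illustration:

```python
# '==' compares contents, 'is' compares identity (the box itself).

a = [1, 2, 3]
b = [1, 2, 3]   # a distinct box holding equal contents
c = a           # another name for the very same box

assert a == b       # values match
assert a is not b   # but they are different boxes
assert a is c       # the same box
```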
p == q should compare pointers.

If p is a pointer to a pointer to int, then "p = 5" would mean a
double dereference on the left, which I think is wrong.  "p =" should
always cause p to point to something new.  "p. =" can change the
thing p points to, so
    p.. = 5
would be needed.  That's a bit ugly.
    (p) = 5
Does that help?  It shows something odd is happening.
    **p = 5
At least we are familiar with that.

As pointers are distinct from other values (e.g. they cannot have
methods), they could have a different assignment operator:
    p = value    # assigns the pointer
    p := value   # copies the value into p

Maybe I'm trying too hard here.  'p' is the pointer.  Any operation
on 'p' other than comparison dereferences it.  That is ugly.
    p === q
could be pointer comparison, or
    p is q
like in 'crack':
http://www.mindhog.net/~mmuller/projects/crack/Manual-0.3.html

Maybe "p." means "dereference as many times as needed".  So it never
yields a pointer, only a non-pointer value.  To change a pointer you
could write
    newp = thing
where
    newp :int* = p
i.e. explicitly name the type of the value being copied.
    p =(int*) q

Thought: assignment is different from function parameter binding, as
the latter can never mean a 'copy' if the target is a pointer.

There might be a bigger picture here.  'pointers' are very different
values from most others.  They cannot be the receivers of methods;
only the things they point to can.  But what about pointers to
arrays, aka slices?  Can they be the receivers of methods?  Yes, in
exactly the same way that arrays can.  Which is like saying that
'pointers' can receive methods, but only the methods that their
targets receive.

This is a little awkward for slices because they need to carry a
parameter: the size.  Unless we support parameters more broadly and
decide that any type might have arbitrary runtime parameters.  So a
slice becomes a hidden object with a size and a pointer.  Does that
help???

What I would like is:
    a < b    compares the pointed-to values.
             It is exactly a.cmp(b) < 0
    a == b   does the same: a.cmp(b) == 0
    a = b    assigns the values
    a = &b   assigns the pointer

My biggest problem is that the mapping "x < y => x.lt(y)", combined
with the fact that field selection auto-dereferences, means that
"a < b" compares content.  So "a == b" must also compare content.
So let's go for
    a === b
asking "are these as indistinguishable as possible?", and in
'unsafe-pointer' mode,
    a <<< b
can compare pointers.

Given that, what does "a = b" do?  It really should assign pointers.
So the remaining question is how to assign values:
    a = ....    // always changes a, not what a points to
    a. = ....   // always changes exactly what a points to
    a.. = ....  // changes the thing pointed to by what a points to
    a = copy b  ???
This is like "swap a, b" in a way.  Getting the 'type' of that would
be interesting.  owned?  borrowed?

Structures can be internal (no implied order or padding) or external
(order and alignment rules apply, endianness can be specified).

Unions can define a local enum, or use a predefined enum (Boolean
might be useful).  For unions, the enum might be implicitly stored or
a stage parameter, or it might be an explicit structure member.

Do we want names for these things like 'struct' in C, or do we want
to use 'traits' to distinguish, or something implicit?

Enums: in C, the enum values become global names and the variable has
one of those values, which are integers.  In Go, enum values are
local to the type: type.foo, type.bar.  In Rust, the values are
global function names which can return a suitably qualified union
value.

I like namespace control.  I also like enums being tied closely to
unions.  So if an enum is actually a union, then ??

If 'v' is a variable, and 'e' is an enumerated value with members
x,y:
  - v can be compared with typename.e
  - v.e is true or false
  - v.e.x and v.e.y are only accessible if v is known to be e
or v.e could be one of these things that is a value or nothing, and
can be used with 'else'.
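That last idea - v.e as "a value or nothing", usable with 'else' -
can be sketched with a tagged union in Python.  The class and method
names here are my own, not from the notes:

```python
# A tagged union whose accessor returns the payload when the tag
# matches, and 'nothing' (None) otherwise, so it composes with else.

class Union:
    def __init__(self, tag, **fields):
        self.tag = tag
        self.fields = fields

    def get(self, tag):
        # "v.e": the fields if v is known to be e, else nothing
        return self.fields if self.tag == tag else None

v = Union("e", x=3, y=4)

assert v.get("e")["x"] == 3    # v is known to be 'e'
assert v.get("f") is None      # wrong tag: nothing

got = v.get("e")
if got:
    result = got["x"] + got["y"]
else:
    result = 0
assert result == 7
```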
If there is no ambiguity, v.x is valid and can be used if v.e is
known.  Or it can be used with an implicit test of v.e:
    if v.x == 3
is implicitly "if v.x && v.x == 3", or
    if a := v.x

Pointer traits

The compiler understands 2 types of pointers: strong and weak.  Weak
pointers are held in the context of some strong pointer.  The strong
pointer may be to the same object, or it may be to some container
object or allocation domain which is known not to free anything for
the time being.  Weak pointers can only be stored in life-limited
variables which are subordinate in lifetime to the strong reference.
This includes local variables in a lower scope, and variables inside
the strong-referenced value.

Strong pointers can be moved, copied, or destroyed.  When moved, the
previous storage must become invalid.  When copied, the storage
manager must be told.  This might copy the object, add a refcount, or
do nothing and leave it up to mark/sweep garbage collection.  When
destroyed, the storage manager again must know.  It might decrement
the reference count, free the object, or again do nothing.

"Same object" is a fairly vague term which needs to be made specific.
It could be a structure holding two pointers, one strong and one
weak.  If they were to the same type that would be pointless.  If the
strong one were to a container and the weak one were to an object
that is somehow 'inside' that container, it would make sense.  It
could also be a linked collection of structures such as a linked list
or a binary tree.  In order for the language to be able to "reason"
about such a thing it would be necessary for it to "know" about such
a thing, so there would need to be a way to describe it.

There need to be rules for when two structures (including arrays) are
part of the 'same' object and when they are part of different
objects.  We also need to know if these things are hierarchical:
whether something can be a 'sub-object'.
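One concrete case of such a linked collection is a doubly linked list
with strong forward links and weak back links.  A Python sketch of
the discipline (the strong/weak split lives only in the comments;
names are illustrative):

```python
# Doubly linked list: 'next' is the strong, owning link; 'prev' is a
# weak back link that is always findable from the node it refers to.
# Before a strong link is moved or dropped, the dependent weak link
# is invalidated, per the rule being developed in these notes.

class ListNode:
    def __init__(self, name):
        self.name = name
        self.next = None   # strong link
        self.prev = None   # weak, findable link

def insert_after(node, new):
    new.next = node.next
    if new.next:
        new.next.prev = new
    node.next = new
    new.prev = node

def remove(node):
    # invalidate the findable weak reference first, then relink the
    # strong chain, then clear the removed node's own links
    if node.next:
        node.next.prev = node.prev
    if node.prev:
        node.prev.next = node.next
    node.next = node.prev = None

a, b, c = ListNode("a"), ListNode("b"), ListNode("c")
insert_after(a, b)
insert_after(b, c)
remove(b)
assert a.next is c and c.prev is a
assert b.next is None and b.prev is None
```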
The important rule is that all weak references in an object must be
invalidated before, or as, the superior strong reference is moved.

This has many interesting cases, one of which is the doubly linked
list.  In such a list the forward link might be seen as strong, so
the backward link would need to be seen as weak.  It is a
distinguished weak link as it can always be found from the object.
So to move the strong reference, we need to be sure all weak
references are gone.  That doesn't mean we need to invalidate all
weak references in the whole object, as we know how to get to the
only one.

So it seems some weak references are 'findable' - probably in various
ways.  The language needs to know if an object can contain:
  - no weak references (so moving strong references is trivial)
  - findable weak references (so moving a strong reference just
    requires focused removal), or
  - general weak references (so moving a strong reference requires
    invalidating all references)

The case of doubly linked lists is also interesting as a structure
can be on multiple lists.  This can be managed in a couple of ways:
  - refcount the structure so it can have multiple strong links for
    the forward links
  - have one list subordinate to another, so that the struct must be
    removed from (or verified not present in) the subordinate one
    before being removed from the primary one.  This is probably the
    sort of assertion that would need to be specified in the
    language.

When we have a weak reference stored in a structure we need to be
able to explain why it is safe.  And when we remove a strong
reference, we need to explain why all dependent weak references are
known to be resolved.

Attributes of things.

Different fields, locals, globals etc. may have different attributes
beyond their type, or maybe as part of their type.  Some of these can
change during the lifetime of the name or value.  I need to collect
them all together to try to understand them.
constant - a value assigned (or bound) once and never changed.  If it
          is scoped then it is bound once whenever the scope is
          entered, though possibly to a different value each time.
          A name might get reused within a scope if it is only
          assigned when dead, e.g.
              a = foo
              print a
              if bar:
                  a = 1
              else:
                  a = 2
              print a
          Here 'a' has been reused, as the 'foo' value dies after the
          first print (it can never be seen again).  This 'a' could
          still be 'constant'.

          There is a question here: when would 'foo' be destroyed if
          we had destructors?  It feels like it should be destroyed
          as soon as it dies.  However if it holds a lock or other
          resource, we might want it to only be destroyed when it
          goes out of scope.

stable  - This is like 'constant' but applies to a value rather than
          a name.  It means the value (which might be a large data
          structure) will not change for the moment.  'stable' needs
          to be asserted while being the sole owner of a value.

add-only- This too applies to a value.  It says that other values can
          be added to the main value, but not removed.  This only
          applies to values included by dependent references.

readonly- This attribute of a borrowed reference means that it will
          only be used to read.  The value might change, but this
          reference won't be used to change it.  A value can only
          become stable while all borrowed references are readonly.

mutable - This is maybe just the absence of the above.  A mutable
          field can be changed as long as we have a mutable pointer
          to the structure.  A mutable local variable can be updated
          when not dead and can possibly have its address taken.

owned   - This is a strong reference - when it dies the object dies.
          Maybe we can have 'indirect' references through e.g. a
          refcounter.  Then when the ref dies, the refcounter is
          decremented.  And if it dies, the target dies.

borrowed- A weak reference.  Must be in the context of some strong
          (owned) reference to something.
          The strong reference might be to the same thing, or it
          might be to a container object which is add-only.

dependent- A reference in a structure cannot be borrowed but it can
          be 'dependent'.  This means that the referred-to value must
          exist somewhere else.... need to clarify this.

pure function

I think we will need user-defined attributes.  These record important
aspects of the state of a value.  These can be asserted by unsafe
code, and required by functions etc., and the compiler can check that
the requirements are always met.  So a list_head might have an
'empty' flag which is cleared by 'list_add' and has a known value
after the "list_empty" test.  The 'stable' flag in one data structure
might be implied by the 'held' flag in a lock.  Integers could have a
'non-zero' flag which allows them to be a divisor.  They might also
have an upper bound - like arrays?  I sometimes "know" that a number
is a power of two, which is easy to test for.  A similar flag would
apply to pointers which may, or may not, be NULL.

Names for types?

I don't like typedef in C.  It hides stuff.  I see a type name and I
don't immediately know if it is a pointer or a structure.  Structures
need names, as do enums and unions.  Pointers don't need names.
Scalars (uint32 etc.) have names and don't need more, unless some
sort of 'unit' concept were included.  What about arrays?  They are
like pointers so maybe not.  But an array contains a size.  It is
like a record in that it combines two things - member type and size.
Is that enough to justify a name?  I suspect not, so in the first
instance, only things with names can have names.

Polymorphism:

Parametric polymorphism is quite important.  Various types can be
parameterised by other types.  This sets the types of fields in
structs and members of arrays etc.  We also want values as parameters
for the size of arrays.  And internal parameterisation, so the size
of an array can match some int field, or an enum can select an
option.  Functions/methods are very important here.
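This kind of parameterisation can be modelled by treating each type
constructor as a function from parameters to a type descriptor.  A
Python sketch, with descriptor shapes invented for illustration:

```python
# A parameterised type as a function: array(5, "int") builds the
# descriptor for "array of 5 int".  Both the member type and the size
# are parameters, so they become part of the resulting type.

def array(size, member):
    return ("array", size, member)

assert array(5, "int") == array(5, "int")   # structural match
assert array(6, "int") != array(5, "int")   # the size is part of
                                            # the type
```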
The type parameters may be implied by the actuals.  This then needs
to imply the types of the results.  Each result can be an
intersection??  Or maybe it can be a type-function of the parameters
- but do we allow intersections?

A parametric type is like a function.  It needs a name and some
parameters:
    type foo<a,b> = {
        thing : a
        stuff : b
    }
A method for a parametric type is then a parametric-typed function:
    interface foo<a> = {
        fn foodo(first:a, second:a -> bool)
    }
Methods might want an extra type parameter, 'self'.  Using this would
require the compiler to know that two objects didn't just conform to
the same interfaces, but were of the same type.  Shouldn't be too
hard.

But we also want a function to be parameterised:
    choose(which:Bool, first:a, second:b -> a|b)
which could be:
    choose(which:Bool, first:a, second:b -> result:A)
When this is called, the actual parameters each have a list of types,
including any interfaces.  Each use of a type-parm intersects
possible types until we have a minimum set.  Then each result has a
possible list of types... needs to be the intersection.

Interfaces:

Interfaces list methods that can be called on a value.  Often we want
an interface with a single method.  This is particularly useful for
the methods that operators map to, but there are lots of others:
constructor, destructor etc. etc.  Go effectively does this as you
only need to declare the methods that you will use - the effective
type is the sum of all these.  One advantage is you don't need two
names for everything: Comparable:compare, addable:add,
destroyable:destroy.  But it can be good documentation to collect
related methods together into an 'interface'.  It can also provide
namespace control.

    method foo(self:A, b:int -> bob:rune);
This declares a method which can then be realised for different
concrete types.
    interface bar { foo(self:bar, ...) }
declares the same but "bar.foo", not "foo".  Maybe the syntax should
be more similar:
    interface foo(self:Self: b:int -> bob:rune);  ??
But now we have a preference for polluting the global namespace.  If
I want a method that is just for my type, it should not require an
interface.  I guess:
    method mytype.mymethod(thing, thing -> thong)
is a local method;
    method foo.bar(...)
is an interface method.
    impl foo for mytype { method bar(....) }
can save some typing.

Every type effectively defines an interface which applies only to
itself, using its name.

The syntax is still a bit ugly.  I don't want to assume 'self' is the
parameter name, and I don't want it to appear as the first arg.  So
where do I put it?  Go does
    func (name Type) methodname (parameters) (result)
which works because all methodnames are global ... sort of.  I want
two methodname namespaces:
  - local to the Type
  - inherited from an interface
I could use .interface.method??
    func (name Type) .compare(other Type)
matches interface .compare(other T).

Another thought: I sometimes want an object to respond to the same
interface in two different ways.  A bit like being on two lists.  I
might have a table with byname and bynum lookup methods which both
use []:
    mytable.byname['fred'] = entry
    print mytable.byid[42].name
Sometimes ".foo" is an attribute, sometimes a view...  So an
interface can be bound to the object, or to a name in the object.

Garbage collection:

Supposing that I thought this was a good idea, how would it work?
The general idea is that you keep track of what memory is allocated
in what sized blocks (as you have to for 'free' anyway) and keep one
spare bit for each piece of memory.  Then you:
  - clear all those bits
  - scan all allocated memory looking for anything that might be a
    pointer into the target area, and set the relevant bit
  - when you set a bit, you scan that memory as well
  - when finished, any memory which doesn't have the bit set is
    ipso facto free

Variations involve breaking the heap up into sections, possibly with
different age properties, and only cleaning one section at a time.
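The basic mark/sweep pass just described can be sketched over a toy
heap.  The dict representation and block names are mine, purely for
illustration:

```python
# Mark/sweep over a toy heap: clear all mark bits, scan from the
# roots setting bits (scanning each newly marked block too), and
# anything left unmarked is free.

heap = {
    "a": ["b"],   # block name -> names of blocks it points to
    "b": [],
    "c": ["d"],   # 'c' and 'd' are unreachable garbage
    "d": [],
}
roots = ["a"]

def mark_sweep(heap, roots):
    marked = set()               # the "one spare bit" per block
    todo = list(roots)
    while todo:
        block = todo.pop()
        if block in marked:
            continue
        marked.add(block)
        todo.extend(heap[block]) # scan the newly marked block too
    return [b for b in heap if b not in marked]   # ipso facto free

assert sorted(mark_sweep(heap, roots)) == ["c", "d"]
```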
That might mean keeping a record of all pointers that cross a
boundary from one section to another.  Another variation involves
copying boxes to make free space more compact.  This requires finding
all inward pointers.

It is really hard to see this as a valuable thing.  The programmer
should know what is going on in general, and the language should be
able to track and enforce proper handling.  It seems useful for
languages which encourage lots of short-lived objects, e.g. for
string and list manipulations.  The language should be able to track
these, free them when no longer needed, and copy them to a more
stable area when they are made more permanent.

I think I'm going to have to write some code and see what happens.

Type Equality:

When are two types "the same"?  Some languages use "same name".
Others use "same structure".  Both have their appeal.  Go uses
structure unless both types have names - then it requires 'same
name'.  I wonder if "int" is a name.  Probably it is.

I'm cautious of structural equivalence of structures.  The 'names' of
the fields are important, and I don't think two fields are the same
just because they are spelt the same.  'arrays' don't have names, so
structure works there.  Parameterised types should match structurally
too - an array is just a parameterised type.  Ditto for pointers.
Two structures declared separately are different though, even if the
names and types are the same.

What about functions?  Are the parameter/return names part of the
type, or are they merely internal name bindings?  They look like
structures but aren't usually used as structures.  Yet when arg lists
are long it could make code more readable to tag some params.  I
guess we fall back on "unnamed things match named things".  If both
function declarations have names, they must be the same declaration.
If either doesn't, they match on structure.

A manifest value doesn't have a type.  So if I say:
    a = { foo=1, bar=2 };
then there is no type.
That value could be of any type for which it is a match.  If the type
of 'a' is known, it must match.  If not, the binding is illegal.
There is some finite list of types which this could match and we
effectively try them all until one is type-legal, or we run out.

Automatic casting: casting numbers to larger types is probably
safe??  What about casting a Foo to a union which has Foo as one
branch?  This can be compile-time rewritten to a simple union
creation.

Expressions or statements?

i.e. do I want to be able to say:
    a = if x then b else c fi
so a becomes 'b' or 'c'?  I tend to think not.  if/then/else/fi is a
statement, and making it an expression as well can be confusing and
make it hard for the compiler to determine what your mistake is.
This also applies to 'switch/case' but not to while/for.  For 'if',
we can use
    a = x ? b : c
but that doesn't generalise to switch very well.  Does that matter?
Another syntax is:
    a = b if x else c;
which is quite different to the reader - both human and machine.
For switch:
    a = b if x, c if y, d if z, e otherwise
using true.

Case/switch:

Should this just be syntactic sugar over if/then/else?  I don't like
the idea of typecase or Rust's match as it provides functionality
that is 'only' in the case statement, not stand-alone.  To that
extent I like Go's type assertion as it is stand-alone.

But what about fall-through?  It is implicit in C and explicit in Go
with yukky rules.  Why do we find that we need effective 'gotos'
there, but not so much elsewhere?  Maybe it is like dropping an else:
    if a: b
    else if c: d
or
    if a: b
    if a or c: d
No, it isn't quite like that.

    switch (num_bits) {
	/* fall through... */
    case 8: if (byte & 128) num++;
    case 7: if (byte & 64)  num++;
    case 6: if (byte & 32)  num++;
    case 5: if (byte & 16)  num++;
    case 4: if (byte & 8)   num++;
    case 3: if (byte & 4)   num++;
    case 2: if (byte & 2)   num++;
    case 1: if (byte & 1)   num++;
    default: break;
    }

readdir uses it: the switch is on 'state' and we might handle several
consecutive states.
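As an aside, the num_bits switch above just counts the set bits among
the lowest num_bits bits of byte; without fall-through the same thing
is a loop.  A sketch (the function name is mine):

```python
# Equivalent of the fall-through switch: count set bits among the
# low 'num_bits' bits.  Like the switch, it counts nothing when
# num_bits is outside 1..8 (no case matches, so default runs).

def count_low_bits(byte, num_bits):
    num = 0
    if 1 <= num_bits <= 8:
        for bit in range(num_bits):
            if byte & (1 << bit):
                num += 1
    return num

assert count_low_bits(0b10110101, 8) == 5
assert count_low_bits(0b10110101, 3) == 2   # only cases 3..1 run
```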
    case TypeSegmentMap:
    case TypeQuota:
	/* These only get merged while in a checkpoint. */
	if (fs->qphase == fs->phase)
	    break;
	printk("Suitably in-phase\n");
	/* FALL THROUGH */
    case TypeFile:
    case TypeSymlink:

Here one case is a special case of another.  The enum doesn't show
the full structure.  Really we want to repeat the TypeSegmentMap and
TypeQuota handling with TypeFile etc.

    switch(why) {
    case NewSpace:
	watermark += RELEASE_RESERVED * fs->max_segment;
	/* FALL THROUGH */
    case ReleaseSpace:
	watermark += ACCOUNT_RESERVED * fs->max_segment;
	/* FALL THROUGH */
    case CleanSpace:
    case AccountSpace:
	/* Definitely no water mark here. */
	break;
    }

Again, there is a progression of values, like the first case.  So two
options:
  1/ state-machine: we handle a state and move on
  2/ values have a specific/general relationship
and probably other less general things.  FIXME

Maybe "switch" chooses just one branch, while "select" chooses all
relevant branches?  Or something.

Break / continue / return

Do these have labels?  I almost never need labels.  But I often use
"continue" inside a 'switch' to break out and continue the
surrounding loop.  A 'break' from nested 'for' loops can be good.
But labels are ugly - how are they indented?  Or are they part of the
'loop' syntax?

Sometimes I want to break out of an 'if', so I make it a 'while',
which looks odd.  But breaking out of an 'if' would mean that to
conditionally break a while would need a label.  But break from 'if'
is also conditional, so maybe not needed.... gets ugly.  The label
could be implicit.  'break if' ???

This is really a controlled 'goto'.  We can go to:
  - end of block: 'continue' for while, 'break' for switch
  - out of block: 'break' for while, 'break' for switch

What if we tagged blocks as being 'breakable'?  Too surprising.  What
about "break variable", which leaves the scope of the variable?  This
would be hard to use to break out of a while loop.  Or "next i" would
continue a for loop - or any variable bound once in the loop.  Then
"last i" could break out.
or 'done' or 'have' or return!
  let v =:
	if asa return
	if sdfsfd have v2

Anonymous/blank variable: Go uses '_' as a 'blank' identifier. It can be assigned or bound to anything, which evaluates the thing but ignores it. This makes some sense. For 'switch' it might be nice to have a similar thing which holds the value which is being switched on. If the name is used, the value must be 'true', else it must match the value. This is a rather odd semantic. And what is the scope of the name? Including blocks within switch would be useful, but then nested switches redefine a name, which is less good. So maybe it should be
  switch _1 = value:
  case _1 == 3: whatever
  case _1 == 4: otherthing

Structured blocks:
  { curly brackets are common, but a bit ugly }
  BEGIN begin/end is rather verbose END
  COMMENT ending with reverse-spelled open is cute but silly TNEMMOC
Proposal: indenting stuff after a colon is really cool and works rather well I think. It doesn't easily allow do { } while(); constructs, though
  do:
	stuff
  until done
would work. There is no harm in allowing a '{' instead of the ':', and then ending when a '}' is found.

Indenting.... how far can I push this?
- Two consecutive non-empty lines must have compatible indents, meaning the strings of leading space/tabs are either identical, or one is a prefix of the other. Anything else is an error. If the second is longer, then the second line continues the first. If they are the same, the first line is terminated if that makes sense. If the second is shorter, it must match an earlier indent, and all parses back to there are terminated. This is all much like Python.
- Can we require that smaller units must be indented correctly? i.e. if a nonterminal starts half way into a line and wraps, then the next line must indent to the same depth:
	if a == b and
	     a == d:
  for a fairly ugly example. i.e. the cfactor is still open, so the next line must be indented to the start of the cfactor. This isn't needed for syntax but avoids confusing-looking code.
Need to be careful to allow an opening bracket at end of line. i.e. if the only part of the nonterminal is a literal, then that non-terminal doesn't count.
A "pragma" or "safety feature" would select among:
- ignore indent
- indent for termination
- check all indents
Go uses newlines without worrying about indents. I don't think I like this. Indents are important sometimes and newlines aren't always. So yes: options, which specify tab size for 'check all indents'.

Colons: I seem to be using the humble ':' too much:
1/ The conditional expression: a?b:c
2/ The indented block opener: if a == b: do_something
3/ Path separator for modules: io:printf()
4/ To introduce a type: var foo : type = value;
So if I wrote:
  if a == b ? io:printf() : print='yes':
then I could get confused. The first ':' can't be the block opener because the ?: isn't finished. But the second ':' could introduce the global module :print. So ?: doesn't conflict with if:, but we can't use ':' for modules. So either use '::' like Rust and C++, or '.' like other namespaces. :: is very noisy. Maybe that is good? . is very quiet and makes the lhs look like a value, not a module. Does that matter? Different things should look different.

Large scale structure: Key elements of a program are:
  declarations - these define types, methods, functions, and give names to modules .. and other things
  statements - these always live in functions/methods, I think
  expressions - these mostly live in statements or some declarations
So a program should be all declarations. But that isn't much fun for "scripts". Or would scripts look better if they were forced to keep code together with a low-syntax 'main' function?
  func main():
	code goes here
Declarations start with the thing being declared:
  func type mod const var impl
each is followed by a name and a value - maybe. Not sure where imports fit in.
  import name-to-import 'path-to-module'
??
Assignments/bindings.
There are 3 things one might want to say:
1/ This new name is bound to this value for the rest of this scope
2/ This new name is bound to this value now, but it might get changed later in this scope
3/ This name is already present in this scope and I'm now changing the value
We could use 'var', 'let', 'set', 'const', 'let mut' or 'with' to introduce one or another. We could use "=" vs ":=" for different assignments. We could not care about the difference between 2 and 3. We could use a sigil to differentiate 'new' from 'old'. In a tuple "a,b,c = 1,2,3" we could conceivably have all 3. We could add :type to a name to introduce a new name.
  a = b   // only allowed if 'a' already defined and mutable
  a: = b  // defines new 'a'
DON'T KNOW about constants....
We could also distinguish "this is a default value" and "this is an initial value" where the former can be replaced without use. Probably not useful though.
Least typing should be "new constant name". Next should be "reused mutable name". Most typing for "new mutable name". += -= etc mean pre-existing mutable. So := should too?
I really don't like the a,b,c = function() thing. Having different bindings has got to be ugly, whether we make them so, and they are visually ugly, or assume them, and they are conceptually ugly.
  a:=, b+=, c= function()
  a:, b+, c = function()
Does a[xx] = b need to know if a[xx] has been assigned yet? If we don't want to pre-initialise arrays, then probably yes. For 'hash' arrays, there might be requirements about checking first.
I like: ":=" means "replace current value", so '=' means 'assign initial value'. But how do we indicate 'const' or 'var'? Could require 'var' in front?
  var (a, b, c) = function()
  (var a), b+, c: = function()
  a:, b+, var c = function()
New thought: the matching idea from Rust could make interesting assignment.
  if 4*5 = fred
could always be made true by binding 'fred' to 20.
  if action(state, NEWLINE) = REDUCE(production)
will fail if the action is SHIFT, or bind 'production' if the action is a REDUCE.
  if (a,b,c) = (1,2,3)
if a and b are already bound but c is not, this tests a and b and, if they match 1 and 2, binds c to 3 and continues. Probably need a sigil on the 'c' or 'production': '?'
  if 4 * 5 = fred?
  if action(state, NEWLINE) = REDUCE(production?)
  switch (action) {
  REDUCE(production?) :
  SHIFT:
  ACCEPT:
This 'binding' is very different to assigning to an array element or a struct field. There could be quite a lot of different assignment-like things:
- 'global' constants, which have a fixed value, at least for the invocation of a program
- 'fields', both in structures and stack frames, which can be updated and have their address taken, so are mutable
- 'constants', which are assigned just once in their scope - though possibly in multiple code locations due to branches:
	if foo() { a = 1 } else { a = 2 }
  'a' could be a constant
- 'reused constants' - one name which is used repeatedly, but each time it is really a different variable. There are clear points before each 'assignment' where it can be declared 'invalid'. e.g. a 'status' return value that is only used for immediate operations. These are never assigned while 'live'. So
	a = 1; if b {a = 2}; print a
  is not an option, as the value from 'a=1' could be printed and so is live during the if statement. These can even be 'updated':
	a = 1; a = 2*a; print a;
  Semantics are identical to
	a = 1; A = 2*a; print A;
  so it is a reused constant. A for-loop control variable should be a reused constant.
- 'variable': a value that can be changed in different ways on different paths. It is conceptually one value that changes. Possibly an accumulator.
We can assign to:
- a name (local variable)
- a value.name (field)
- a value[expression] (array slot)
Can we also assign to a function call?
- foo(a,b,c) = 42
That would mean that 'foo' returns an lvalue. That is not unreasonable.
But I'm still not happy with how lvalues work.
  a = lvalue
Does that make 'a' point to the thing, or does it copy the thing? I think in that form it must make 'a' the lvalue.
  *a = lvalue
would copy the content. If 'a' and 'b' are pointers to pointers to integers, then
  a = b
  *a = *b
  **a = **b
would all do different things in C. Is there any other way to achieve those same things with less syntax? Type enforcement can make the '*' on the right redundant.
How do I move a reference, as opposed to borrowing a reference? Or copy an object? In Rust it depends on the type. If there are mutable fields or owning pointers, it cannot be copied, so it is moved. '&' is used to borrow a reference. I'm not sold on the type-dependent answer. I think the syntax should make it obvious. I think I want syntax for "take the value and replace with NULL".
  a <= foo
Then we could swap with
  a <=> foo
But that is a comparison! Could use "<-" but it doesn't look like '='.
  a =< b
??
Then I could pass " for ''r { typify(n:''r -> regexp) }
The interface 'n' is a parameterised interface:
  interface new { typify(n:self -> r:result) }
These need to be 'constant' or 'referentially transparent' or 'pure' functions that can be evaluated at compile time.

Loops: while loops are easy-ish. What about list loops and more generic things, 'foreach' etc.? Co-routines can have a function which returns successive values, via 'yield' or lazy evaluation. Or a closure can be passed to a looping function. Or an 'iteration object' can be returned, on which we call 'next' repeatedly.

Loops: (repeating myself?) So many possibilities for loops:
  while
  repeat/until
  foreach
  etc etc
Really like exit-in-the-middle:
  do:
	preparation
  while condition:
	actual work
For loops like to initialise and increment:
  with var = val then var++ while var < last:
	print var
foreach loops need a variable(set) and a list of values, which might be generated lazily.
Alternate to lazy generation: could pass the body as a closure to an 'each' function like Rust does. Lazy generation could involve a closure being returned from a function. The closure would need some state, so the 'return' couldn't discard it completely. These could be 'inout' parameters? or
How to address the need for "search, and perform a special case if not found"?
  for each possibility
	if match: break
  then
	Do this if loop completed normally
  else
	Do this if loop aborted.
or
  if:
	for each possibility:
	  if match: use true
  else:
	handle not-found case.
http://www.scribd.com/doc/38873257/Knuth-1974-Structured-Programming-With-Go-to-Statements talks about 'events':
  loop until event1, event2, ...
	stuff ... event1; ... event
  then
	event1 => ....
	event2 => ....
This is a lot like exceptions. However 'until' has a different feel to it than 'try'. 'use' is better than 'raise'. I like this as a generalisation of switch:
  switch:
	.... use foo; ... use bar;
  then:
	foo -> ....
	bar -> ....
All 'use' values must be either unique identifiers or expressions with values that have a common type. If unknown identifiers, then they comprise values in a new anonymous enum. The labels in the 'then' part are values of that type. A value *can* appear twice with some annotation(?) in which case all relevant branches are taken. Maybe the notation is 'next' at the end of the statement block? If there is no 'next', then the labels cannot appear again. Actually, we allow 'next' and 'last' if and only if at least one label occurs again. If there is any next/last, then there must be one at the end of the block.

if statements: Go has a nice feature where you can perform an action before an 'if' test:
  if a = something(); a > 3 { use a }
This is particularly useful in elseif constructs:
  a = something()
  if a > 3 { use a }
  else if b = somethingelse(); b < 5 { use b }
  else if ....
Without the if action, we need an extra level of indenting:
  if a > 3 { use a }
  else {
	b = somethingelse();
	if b < 5 { } else ...
But only a simple statement is allowed. Could I use "with"?
  with: a = something
  if a > 3 :
	use a
  else with b = somethingelse():
	if b < 5:
	  use b
Looks a bit clumsy, and probably ambiguous. Maybe:
  if with:
	a = something
	a < 5
no, that is much worse. We can make "with stuff" unambiguous by requiring a second body:
  with:
	commands
that can start a new scope: do/if/while/match.
Another idea: we don't allow assignments in expressions, but we allow a "where" clause at the end which contains a list of bindings. The expressions are only evaluated where needed:
  if a < 5 && b > 6 where b := n/2:
will not divide n by 2 if a >= 5. This means that the list can only be bindings, not general statements. If they are new bindings, how are they scoped? At least to the bodies of the if or while and the rest of the expression. If a binding were optional, it couldn't carry through:
  if a < 5 || b > 6 where b := n/2:
if a<5, b would not be bound, but it would be bound in the 'else'. This lazy evaluation could work with a prefixed "with" clause too. Would assignments be OK? I think they could be confusing. Lazy-evaluated binding could allow two different paths which want to evaluate at different times to just use one piece of source code.
Another idea: as well as
  if expression:
	statements
we could have
  if:
	function block with 'return'
  then:
	statements
  else...
Maybe 'return' looks a bit heavy and a synonym might be better. yield? use? done? I like 'use' for now. The 'if' is one big scope, so bindings created in the function block remain in the 'then' statements. This allows:
  if:
	a = 4
	use a > 3
  then:
	....
  else if:
	b = a*a
	use b < 0
	....
How would that look with {}?
  if { a = 4 use a > 3 }
  then { .... }
  else if { b = a*a use b < 0 }
The same syntax could allow conditional expressions:
  a = if a = 4: use 4 else: use 3
though that is rather verbose, and part of the appeal of ?: is brevity.
-----
So I do want statements in the condition of 'if' and 'while', optionally:
  if cond BLOCK
or
  if: BLOCK then: BLOCK
An expression block may contain "use" directives: "use True". They act a bit like a "return" - we have the answer. For while, it means that we can avoid "do {} while" as a separate case:
  while: BLOCK do: skip
We may also be able to avoid 'break' in some cases with "use False" at arbitrary places. 'continue' isn't quite so easy, though a while-switch could do it :-)
In a 'switch' we can "use" values other than True or False. If a name is used which isn't defined, then it is assumed to be a new enum, so all 'use' statements must use undefined names, and all cases must use the same set of names.
  switch:
	use do_it
  case do_it: print "Hello"
  case skip_it: abort()
case names can appear multiple times with mcase(?)
  switch bit:
  case 8: if byte & 128 {num += 1}
  case 7, 8: if byte & 64 {num += 1}
  case 6..8: if byte & 32 {num += 1}
  case 5..8: if byte & 16 {num += 1}
  case 4..8: if byte & 8 {num += 1}
  case 3..8: if byte & 4 {num += 1}
  case 2..8: if byte & 2 {num += 1}
  case 1..8: if byte & 1 {num += 1}
break? or last? "last i" exits the enclosing loop that modifies 'i'; "next i" retries.
'for' loops can be:
  for assignment:
  while condition:
  then statement:
  do:
	stuff
  else:
or
  for assignment ; condition ; statement:
or
  foreach varlist = object
So a general while statement is
  for initialisation (can introduce variables)
  while condition
  then block
  do block
  else block
First do 'for', then 'condition'. If 'condition' is true, do 'do' and 'then' and try the condition again. If 'condition' is ever false, do 'else' and stop.
Maybe: 'if' and 'switch' are much the same; 'then' and 'case true' ditto; 'else' and 'case false' as well. 'while' can be followed by cases as well as by 'else'. With a 'for' I want a clause to run if we didn't 'break'. I think 'else' does that. Do I want to use "else" for "default" in case?
"then" happens if "use true" or no "use" value
"case value" happens if "use value"
"else" happens if "use false" or "use value" with no match.
Print formatting.
Only values with interface 'printable' can be printed. The function is passed a width, a precision, and stuff. So a string can just have '%v' for a value, or '%20l' for a left-justified value. But what about '*' parameters - and checking there are the right number? Could allow a function to have a compile-time preliminary section which can throw errors and possibly infer types. This probably requires there to be compile-time functions which run fully at compile time. Though any function that can, should, unless tagged run-time-only. A function must be able to flag a compile-time error. At runtime this is ignored, so a runtime error or coping strategy is also needed. So: the function runs until a non-constant is needed, or a heap change etc. If no compile error is triggered, it all runs again at run time. But for libraries to work, the compile-time part must be easily available in the library without the need to rebuild the object code. This doesn't really answer the problem..
  print "%20l", foo
or
  print "%v", foo.fmt(20,0,'l')
It would be nice to have curried functions here so that the 'fmt' function could take a stream from which it could get a buffer to fill. Or foo.fmt could return a function which acts on a stream ... that might be complicated.

Exceptions and errors: Errors need to be easy to catch, but even that isn't really enough. They need to be hard to not-catch. So any possible error must be caught or explicitly passed over. That suggests that the type of a method should include the list of exceptions that can be returned. "try" is really pointless. An 'except' clause can simply be available to every statement. Maybe spell it 'catch'.
  { fd = open(); fd.write(thing); catch file_not_found: whatever }
-or-
  { fd = open(); fd.write(thing); } catch file_not_found { whatever }
A 'catch' can occur anywhere. It will catch any unhandled errors that happen before the location.
So:
  fd = open(file)
  catch file_not_found: return "fool"
  fd.write(thing)
  return "wise"
One thing the 'try' does provide is a synch point: in 'catch' we know that everything up to 'try' succeeded, so names bound there are certain to be bound. Without 'try' we only know that the current scope was entered, and that any code since the last equivalent catch which could not have thrown the error must have been executed. This is probably enough, providing that 'equivalent catch' is well defined. At any point in the code there is a set of possible exceptions. A 'catch' subtracts from the set and any op that can raise adds to it. Within a 'catch' we know that anything that happened before these exceptions last became possible has happened.
This requires that exceptions can live in a set. Maybe there is a big discriminated union of exceptions? Or a hierarchy of them? I think an inheritance tree .. though that is hard to merge. i.e. if two modules each declare some new exceptions ... not a big deal. We know the type of the exception set for each imported module. But wait: if I import two modules and want to pass up exceptions inherited from either, what is my exception type? I either need to offset enums when inheriting them, which is messy, or have globally unique enum values, which are like type names.
So when I say:
  catch exception_name(parameters):
that only applies to previous operations that could return an exception with that value.
Do I want an 'else' clause for 'catch', or should the 'catch' clause break/continue/return to get past the real work-to-do?
Can destructors throw exceptions? How do exceptions work in constructors? There needs to be a point where values transition from 'local' to 'in object', much like the 'return' in a function.
Should break/continue/return be defined as exceptions? The (implicit) label they go to is the expression 'type'.

Scope: 'scope' applies to various things:
- how to have narrow scope for bindings in a block.
  Just having declarations at the top which go out of scope at the bottom is too restrictive.
- which names in a module are visible outside that module. I think they must all be visible within the module, where that is meaningful.
- how to import names from modules. "import xx from yy" is nice; "import * from yy" is good but risky if 'yy' changes, so maybe "import v1* from yy" which means "all names listed in v1". Or "import * from yy.v1".
- what is the scope for 'finally' or 'later', 'defer', 'after'?
  let a = b in c
or
  c where a = b
or
  with: a = b
  do: c
These introduce bindings, but they are either for a single statement or for a block. We can limit indents a bit by combining an if-list or loop with the 'with'. I wonder if that is enough. Having completely free scoping would require "let a := b" anywhere, and "unlet a" anywhere too. I think that is probably too free.
For extra-module visibility there are options:
- 'public' tag for all public names
- 'private' tag for all non-public names
- some convention with capitals or underscores to direct which is which; but with UTF I don't trust capitals, and I hate imposing underscores except for "don't use this" cases
- 'export' lists (or 'hide' lists). 'export' lists would be good for wild-card imports.
- external/internal sections of the module. Would need a section in each 'struct' too.
I think top-level symbols that might be imported should be explicitly exported, and exported to a list if a wild-card import is allowed. Names in a struct which should be hidden can be marked 'private', as hiding them is less critical.
All symbols should be available even without import, using e.g. :module:symbol. This allows instant access in simple scripts without pre-planning. If 'module' is imported, then just "module:symbol" is sufficient. Possibly ":module" searches the current directory, then the system path, while "::module" only searches the system path.
I think 'after' scope should be tied to a name, so when the name goes out of scope, the action triggers. So it could be:
  after name: actions
though most such actions are to free memory - which must be automagic - or to release resources (locks, files, etc), which should be the destructor of some value. i.e. 'open' returns an 'owned' value so it is auto-closed when forgotten. 'name' could be a loop label. That might be interesting. But should 'after' fire when the value changes, or when the name disappears? I think it should be when the value changes, where that is meaningful. Maybe it only makes sense on 'owned' values?

Tuples: Tuples are present as the parameters to a "function". But where else? Tuples are usually (a,b,c) but can be useful as a:b:c. This is briefer. As long as 'tuple' isn't a first-class type, "a" can be a 1-element tuple or an expression, depending on context. This is useful for a '%' function for string conversion. The args to '%' are tuples where the first value is sent 'stringify' with the remaining bits. So 3:'x' will print in hex ... bit ugly though. Maybe not needed then.
Mapping between tuples and structures is probably useful. Assigning tuple to tuple with matching; returning a tuple from a function .. well, return a struct but allow matching to a tuple of names. A tuple could equally be an array if the length matches.
virgil maps all tuples to lists of values, so nesting disappears. This is pleasingly simple. It means that you cannot have tuples as individual arguments or return values, but if you cannot query the size (which would make them arrays) that doesn't matter. A tuple can be assigned (bound) to an array if the size is flexible, but binding the contents of an array to a tuple is only possible if the array size is known at compile time. So the type of printf could (almost) be
  printf(string, printable[])
and it could still be called as
  printf("I have %d and %d", foo, bar);
What about
  if a in (1,2,4,8):
as a use of tuples?
It expands to a disjunction of separate tests.

Expressions with side effects: "functions" will often have side effects, but what about
  a++
  a += 5
etc? They are problematic because they interact badly with order-of-operation rules. They are useful because they allow brevity.
  if x && (a += 5) < 7 ...
rather than
  if x:
	a += 5
	if a < 7 ...
Order doesn't apply immediately after && because && is a serialisation point.
  if with a=5: a < 6 and then with b=6:

Type specifiers: Do we want
  type name, name
or
  name, name type
possibly with punctuation? Do we want annotations on the name:
  char foo[]
or on the type:
  [char] foo
i.e. can we embed an identifier inside the type:
  struct foo *(*ID)();
I think the C experiment shows that to be a failure. You still need the types without the identifiers, so it is best to always have them separate. i.e. a type should always be identified by a set phrase.
  type name = phrase
  var name: phrase = value

Dependent types: http://beust.com/weblog/2011/08/18/a-gentle-introduction-to-dependent-types/ suggests they only work for immutable values, and I don't doubt they are easier there. With mutable values we would need to allow the type of a name to change ... or be dependent on some variable. So if "array of N integers" is the type of A, then A[N++] = foo doesn't change the type. i.e. if the parameter is variable, the value can be mutable. So this boils down to describing invariants which the compiler will validate for us. i.e. a proof assistant.
An object can be in a number of different stages. Some of these happen during initialisation, but others can be just part of the overall life-cycle of the object. During different stages, different properties can persist. For example an object may be immutable for a certain well-defined period of 'time'. If the type system can know of this, then any references that only exist during that time can be known to be immutable, while references that last longer are not.
There is scope for more than 'immutable' and 'mutable'. e.g. an object might be in a 'growth' state, where new components are added but none are removed. The set of objects in a container can be unchanging, while the objects themselves may change. At some stage, the set of contained objects might transition from one type to another type - or from one stage to another stage. For the language to be able to affirm this, it would need to know that some operation visits all member elements of some type/stage. This might need to be (unsafely) affirmed by the programmer rather than mathematically proven by the compiler.
So there are two ideas here. One is more explicit statements about life lines. An object can identify several 'stages' which can be transitioned by various operations. Each component can have different type properties in different stages. The type system can restrict certain methods to certain stages. The compiler must *know* the stage of a pointer. It is static info stored in the lifeline analysis. When an object is known to 'own' certain borrowed pointers, the lifetime of those borrowed pointers is affected by changes in the object's stage.
Stages can affect:
- mutability: shallow or deep; addition or subtraction or both or neither
- values of scalars
- NULL, nullable, or non-null for pointers
- size of arrays as related to other values
- variable or constant
- 'once' or counted or collected for some pointers
Some stages are exported, some are internal only.
It might be helpful to have different names for different aspects of type variability. We have:
  Parameterised types: the type depends on some type parameter
  Dependent types: the type depends on some nearby value
  Temporal types: the type depends on a state which changes over time. Such a state might be represented in some variable, but it shouldn't need to be.
  Linear types: the number of strong references is dependent on some variable or age thing.
So this is an aspect of a type that can vary.
These need to be specified and then enforced.
Parameterised types make sense wherever a type has multiple subtypes. This includes a struct with multiple elements and a function with arguments and returns.
  struct :
	next: f
	others: array of f
  func max(a,b:f):f
Type-checking this should be fairly easy. Implementing would be easy if f was always a pointer...
Dependent types work in a structure or a function call. We probably want arbitrary computations on values(?). If changing the value means changing the type, we need types of different precision, e.g. mutable and immutable, optional and non-optional. Or we need multi-assignment to change the value and the typed name at the same time. Or we can allow that the type is flexible between uses, so the type assertions are only verified when the type is used. Then I can change inter-dependent fields in any order and it will work.
Temporal types might be tricky to describe. We need to identify a clock. It might be an internal field or it might be an external value. By "external" we must mean within a super-ordinate object. e.g. a 'container' might contain lots of 'members' and the type of each member might be dependent on a field in the container. The clock could even be the program counter.

Boot Strapping: An interpreter written in C, with a compiler written in the target language that links to llvm, seems like a good plan.

Modules / packages / crates: This is the primary mechanism for namespace control and for libraries. Within a package, all names defined in that package are visible providing the right path is used. Outside the package, only exported names are visible. Some names are exported by default but can be marked "private". Others are private by default but can be marked "public".
Packages can define:
- types: struct, enum, array, function, interface etc
- constants
- functions
- methods
- other packages
Does it make sense for a package to export a method for a type that the package imported?
As the name of the method is in the namespace of the type, it probably isn't polite to add something to someone else's namespace. But a new name for a type could be defined, added to, and exported.
How are name conflicts handled? Certainly you cannot define the same name twice in the same scope. However: you might import a name with a wildcard, then define it. You might embed a struct/enum in a local struct/enum and add a name which already exists. This is interesting because the library that you import might get a new revision. It would be a pain for old code to break when that happens. So importing a library should specify a version; only names from that version are imported. You don't need to explicitly import if you don't want namespace changes. Just use ":modulename:symbol" to get any symbol from any module. If you want to avoid using ":modulename:" you can
  import name, name from modulename
or
  import version_number from modulename
Exported symbols need to be version-tagged. That could get ugly. private-by-default could be "public.2" rather than "public", which isn't too bad, but what about public-by-default? I guess I need to decide which symbols are which!
There are "items": type/constant/function/package. And there are fields, which include methods.
Go: exports only things that start with an upper-case letter.
Rust: exports all fields/methods but nothing else.
We could have an "export" clause for items:
  export version item, item, item, item
This makes "static" the default, which is safest, and makes the "interface" easy to see. For fields, we could have an "export version" block, or "export version" exports all fields/methods listed *before* the pseudo. So when you add more to the end, they aren't exported. Though it is probably more consistent to have
  export version field,field,field
inside the struct.
Should modules have "variables"? Or any state? If so, they can usefully have an 'init' routine.
Module naming: modules live in files with a related name.
If the file name is a valid identifier, we can just use that name in the code. If it isn't, or if we want to change it for some other reason, we would need to 'import identifier "filename"'. Also, the file might be in a directory hierarchy, where '/' might not be the right separator. So we need to import a path, possibly into a module, and provide a local name for that if the default doesn't suit.
  import "file" = name
  import "file" { import "file" {import identifier=new, id, id}}

Overloading: Overloading seems to be largely disliked. Hard to be sure why.
Overloading + - * / % is already done for numbers.
Overloading [] could be very useful for hashes.
Overloading () && || ?: would be very bad because they have strong meanings. Ditto for =.
Overloading -> . prefix-* prefix-& could be weird.
Overloading & | ^ << >> &^ ~ might be OK.
Overloading == would be bad; it has to apply to all types anyway?
Overloading < <= > >= ???
So: some operators need boolean. Possibly a 'testable' interface could allow anything to be tested: ?: && || !
Some operators compare. So a 'comparable' interface? < <= > >= == !=
Some operators are limited and can freely be overloaded: + - * / % & | ^ << >> &^ prefix-~ prefix-'-'
Some operators are special: []
[[ &^ is not really different from "& ^" except that "&^=" is legal ]]
[[ what about bit operations without having to explicitly shift? |< &^< ?? |+ &- Maybe it would be cleaner to have a simple op to convert a number to a bit: #1. Then we clear a bit with "a &^#= 5" ?? Or a 'bitset' type could define '+' and '-' to add/remove bits. ]]
Overloading of a binary operator is a method call on the first argument, passing the second. So for binary operations like + - * /, left and right side must be identical, like with Go. But what about '<<' and '>>'? Go allows the RHS to be any unsigned int type. This isn't nicely consistent. To be consistent we need to require either a type or an interface. The type would need to be unsigned int.
The interface would need to be to_unsigned_int_clipping, which returns 0 or MAXINT if it cannot give a faithful unsigned int. This is much the same type as the index to an array. So having 'int' and 'uint' interfaces might be defensible.

So each operator is defined in terms of a method call, e.g.
    a + b  => a.plus(b)
    a && b => if a.test() then a else b fi
    a < b  => a.compare(b) == less_than
    a[b]   => *a.lookup(b, create:Bool)
All of the first set of operators can also be combined with '=', e.g. +=.

++ -- : are they really needed? +=1 and -=1 are just as good, though they could call a.incr() and a.decr().

Do we want a += b to be different from a = a.plus(b)? For a list it would be good to know that we don't need to duplicate.

Wait... for 'plus' to be a valid method it would need to define the type of the second arg, and the return type. Don't like that. So treat it not as a known method, but as a simple textual substitution: a + b is translated to a.plus(b), which isn't in any particular interface. On the other hand, test and compare are from an interface. What about lookup? Do parameterized methods help here?
    Type of plus is (X,X)->X
    Type of shiftleft is (X,uint)->X
    Type of compare is (X,X)->(-1,0,1)

List of (possible) operators:
    + - * / % & | ^
    < <= > >= != ==
    && || !
    << >>
    "in" "not in"
    "is" "!is" "is<=" - pointer comparison
    [] [x:y] ??
    >>> - signed right shift?

String concatenation? '+' ?? '++' ??

"in" dispatches on the right-hand op: "string" does substring search, "int" does a bit test. If 'in' can do bit-test, how do I do bit set/clear? |- |+ &- &+ are not good as + and - can be prefix. -| +| ?? Hard to type... What is good for set-addition and set-subtraction? foo[] = bar is nice-ish for adding an element to foo (at the end, I guess), but it doesn't generalise to subtraction. :+ and :- ?? Could use ++ and --, except that I want ++ for concat. What is -- ??

'else' - like || or ?:. "a else b" == a if !!a, or b if !a.
'then' ?? "a then b" is either b or NULL.
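The desugaring above ("a < b => a.compare(b) == less_than") implies that one compare method of type (X,X)->(-1,0,1) can drive every comparison operator. A Python sketch of that idea, with a hypothetical Version type; here the comparison methods are written out by hand where the compiler would generate them:

```python
class Version:
    """One compare() method; all comparison operators derive from it."""
    def __init__(self, parts):
        self.parts = parts

    def compare(self, other):
        # Returns -1, 0 or 1, like the (X,X)->(-1,0,1) signature above.
        return (self.parts > other.parts) - (self.parts < other.parts)

    # The compiler would synthesise these from compare():
    def __lt__(self, other): return self.compare(other) == -1
    def __le__(self, other): return self.compare(other) <= 0
    def __eq__(self, other): return self.compare(other) == 0

assert Version([1, 2]) < Version([1, 3])
assert Version([1, 2]) == Version([1, 2])
```

The appeal is that a type implements one method and gets six operators, all guaranteed mutually consistent.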
'and then' == &&; 'or else' == ||; 'and' == &; 'or' == |; 'not' == !.
'as' for cast: "value as type". I think I prefer "type(value)", though 'as' allows separate namespaces for types and functions.
'div' for integer division? Probably not.
'if': "a if b" is either 'a' or False. This isn't always type safe, but "a if b else c" becomes 'a' if 'b' is true, or 'c' if not. So "a else b" is shorthand for "a if a else b".

Prefix operators:
    - negate
    + abs value?
    / reciprocal?
    ~ invert
    ! not
    & takes the address of a field?
    [n:] takes the address of an array member
    * dereference a pointer. This is normally automatic, but not for assignment.
    < extract the value and set the variable to nil.
Can we take the address of a local variable? Or do we define the local variable as a pointer and allocate from the stack?

tasks/IPC
---------
Co-operatively scheduled threads across N processes, with all OS calls non-blocking, seems like the way to go. Questions are:
- how to slice? time? on blocking action? explicit yield?
- when to clone? always?
- how to create new threads? library or builtin? function or closure?
- how to control threads - a handle for 'kill'?
- how to wait for threads? Maybe don't. When they die, they die. Any waiting uses IPC.
- what to do for IPC?
  - message passing
  - shared memory with locks
  - monitor?
  - synchronous interfaces?
- how much control of shared memory is needed?

Message passing is clearly a good tool but I'm not sure it should be the only one. Something that abstracts locking could be good. E.g. a pointer that needs to be 'activated' before it can be used, i.e. a refcounted object which has a "lock" method which returns a 'once' pointer to a subtype which provides more access. The original allows read access to some things. Or a type where some/all methods are required to own a particular lock. Actually it should be that some fields are only accessible when some lock is held. So the type is parameterised by a lock. By default fields are only accessible on a single thread.
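The "activate before use" pointer just described can be sketched in Python with a context manager standing in for the 'once' pointer: the outer object only exposes reads, and lock() yields a short-lived handle that also allows writes. All names here are hypothetical illustrations, not the proposed syntax.

```python
import threading
from contextlib import contextmanager

class _Active:
    """The 'once' pointer: a subtype handle with write access."""
    def __init__(self, outer):
        self._outer = outer
    def write(self, value):
        self._outer._value = value

class Guarded:
    """Outer handle: read-only until lock() 'activates' it."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()
    def read(self):                 # always allowed
        return self._value
    @contextmanager
    def lock(self):
        # The yielded handle must not outlive the with-block;
        # releasing the lock is as automatic as freeing memory.
        with self._lock:
            yield _Active(self)

g = Guarded(1)
with g.lock() as h:
    h.write(2)
assert g.read() == 2
```

In the real language the compiler would enforce that the active handle cannot escape; Python can only enforce it by convention.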
If a lock is provided, then they become accessible when that lock is held. I wonder if we need more pointer types, or use dependent types:
- 'once' references
- 'one thread'
- 'ref counted' - needs locking... so:
  - counted-onethread
  - managed-onethread
- borrowed references might be onethread or need-locks

ParaSail goes even further and parallelises everything. Any two expressions that aren't interdependent are potentially run in parallel. This requires the compiler to know about interdependence. ParaSail puts in lots of requirements to avoid aliasing etc. You can never (I think) have two pointers to the one thing. This feels overly restrictive to me, but could often happen in practice. If the compiler could detect this - or at least make good use of owned pointers...

(may2014) I really like the idea that locking is done with 'once' pointers. The locks are released just like memory is freed.

Weird idea: I wonder if a hash table of spinlocks would work, so the same data structs can be used with or without locks. The obvious problem is that hash aliasing can produce stacked locks, so I deadlock waiting for a lock that I already hold. With little effort I could tell if it was me. What would I gain though? Hashing pointers does take some time... It would remove the need to store spinlocks in any data structure.

The base hardware locking primitive seems to be load-linked/store-conditional, or cmp-swap. If I were to expose that in the language I would really need to understand all the stuff about memory models, which might be fun to learn, I guess...

Starting tasks:
    go BLOCK
starts BLOCK in another task. When it completes, the thread is done. Any non-shareable values mentioned there become unavailable. Any refcounted values that are also used after BLOCK are increfed first.

Scheduling
----------
If we have multiple threads then we need to have a scheduler. We could use OS threads and use the OS scheduler, but that is likely to impose an overhead that we don't want.
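A minimal sketch of the co-operative model being considered: each task runs until it explicitly yields, and a scheduler keeps the ready tasks in an ordered round-robin queue. Python generators stand in for the language's tasks; the names are hypothetical.

```python
from collections import deque

def scheduler(tasks):
    """Round-robin: run each ready task to its next yield point."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))   # run until the task yields
            ready.append(task)         # back of the queue
        except StopIteration:
            pass                       # task finished: it just dies
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}{i}"             # an explicit yield point

# Two tasks interleave at their yield points:
assert scheduler([worker("a", 2), worker("b", 1)]) == ["a0", "b0", "a1"]
```

Even this toy shows why priority information is needed: the only policy here is queue order, which is exactly the "explicitly coded scheduler" an event loop amounts to.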
So even if we schedule only when a thread blocks, we still need some priority info to decide who goes next, and some way to share threads across a pool of OS threads. Explicitly assigning numbers is very boring and error prone. Putting things in order somehow would be nicer. With event-loop code we can have an ordered list of what to do next. This is effectively an explicitly coded scheduler. To do something like that means putting all routines into some data structure in the language. That would be good for introspection and killing etc. Some threads are blocked, some are ready. A thread can be placed in a group for round robin. A group can be... (don't know what I was going to say).

Memory allocation
-----------------
How do I get a new 'box' to hold a value? For 'stack' boxes, it just happens automagically as needed. For various heaps we need some syntax, like
    new(type)
but we need to know which heap (per-thread, common, whatever), so:
    new(type, heapname)
But I'm not very keen on builtin functions - they pollute the namespace.
    var name : type = @heapname
I don't think I just want a sigil like Rust, as I suspect there could be multiple sources of allocation, or styles of allocation - enough that names might be useful. Whatever it is, we need to indicate:
- a type - could come from the variable being assigned
- parameters to the type, such as array size or union discriminant (does that make sense?)
- region to allocate in
- type of pointer: unsafe, counted, owned, managed, never-free

These could be associated with the region. 'Managed' (garbage collected) and 'never-free' really need to be in their own regions, but counted, owned, and unsafe can be together. However they may well be managed separately. The compiler only knows about 'owned' and 'borrowed' references. When an owned reference is dropped, the manager knows whether to dec a ref, explicitly free, or do nothing. Allocating from a 'never-free' manager will return a borrowed reference from the manager.
Given that the type might need to be parameterised, it is best to have it explicit and not deduced from the var. Let the var deduce from the allocated value. So we want some syntax that includes allocator and type and says "allocate it".
    new(type, heap) is obvious, but we cannot pass types to any other functions.
    heap(type) has the same problem.
    typename.alloc(heap) is too noisy.
We could include the heap as a parameter? Just 'typename' when there are no parameters? But... in many cases an allocation goes straight into a struct field, so the type is known, and repeating it is boring; while it can be type checked and so is not error prone, it shouldn't be necessary. But we still need parameters...

Global variables
----------------
Opinions vary. They are like gotos: dangerous but useful. Can we constrain them?
1/ Total-global is a bad idea; module-global at most. This avoids namespace pollution. Still accessible from other modules as "that_module.globalvar".
2/ They make referential transparency hard: so include 'reads/writes mod-static' and 'reads/writes foreign static' properties of a function.
3/ Local variables must never 'hide' global variables... or anything in a greater scope.
4/ It should be easy to pass lots of function parameters. @struct means "take all the names in this struct and, if any are names of function arguments, use them as such".

Globals are bad because very few things are truly 'once only'. Even a random number generator context could be per-thread.

If two functions don't touch globals, and operate on unconnected data structures, or are read-only on a shared one, then they can run in parallel. Detecting this would be good. Warning when it cannot be done would be nice. The programmer would need to suggest it.

A global that is set once in an early 'stage' of the program and never changed again does not break referential transparency for most of the life of the program. Analysing these lifetime issues might be difficult. Static analysis of all known types might be part of the problem.
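The "@struct" idea in point 4/ above - take all the names in a struct and use any that match a function's parameters as arguments - can be approximated in Python. The helper name call_with and the Config type are hypothetical illustrations.

```python
import dataclasses
import inspect

@dataclasses.dataclass
class Config:
    host: str = "localhost"
    port: int = 80
    debug: bool = False

def call_with(fn, struct):
    """Pass struct fields whose names match fn's parameters; drop the rest."""
    wanted = inspect.signature(fn).parameters
    fields = dataclasses.asdict(struct)
    return fn(**{k: v for k, v in fields.items() if k in wanted})

def connect(host, port):
    return f"{host}:{port}"

# 'debug' is silently ignored because connect() doesn't name it:
assert call_with(connect, Config()) == "localhost:80"
```

Done in the language itself, this could be statically checked: a compile error if no field matches, rather than the silent filtering shown here.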
Loaded code (.so) could not be known about at compile time, so types would need to be explicit, but they must always be in a stage *after* the one in which they are loaded.
Key lessons here:
- functions need 'accesses globals' in their type
- program stages might help isolate these. So a global gets type 'read-only in stage X', and a function gets 'called in stage X'. But what is 'X'?

Stuff:
If a name is never changed but not declared constant, is that an error?
If a value is set but never used, is that an error? What about: a = 1; a = 2; ?? Probably.

Closures
--------
Closures are a cool idea - a little inline function with access to the current stack frame; a bit like a method for the current frame rather than for a separate object. They can be used:
- to create arbitrary code structures like maps and thread bodies and innovative loops
- to pass compare functions to sort code etc.
- I wonder what else...
- by holding state they can work as generators - this requires the closure to outlive the frame!
- closures are good for 'after' functions.

We need to know what happens to names in the closure. Are the values copied, or are references taken, or what? The answers are probably obvious in various contexts, but need to be clearly enumerated.

A possible trouble with closures is that they can play tricks with the lifetime of a variable. E.g. if we create a closure inside a loop, using a variable that was declared in the loop, then does each closure get the *same* variable, or do they each get separate variables - effectively duplicating the variable? I think that neither is really appropriate, as either could be confusing. If however the name binding was a constant, then copying the value into the closure may well be appropriate.

In general I don't think that closing over a variable should extend its lifetime. Go does this and allows the address of a local variable to be passed out - the variable is automatically made into a heap variable.
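The loop/closure lifetime question above is visible in Python, which takes the "same variable" answer: closures capture the variable, not its value, so every closure made in the loop sees the final value.

```python
# "Same variable": all three closures share one binding of i,
# which holds 2 after the loop finishes.
shared = [lambda: i for i in range(3)]
assert [f() for f in shared] == [2, 2, 2]

# "Separate variables": copying the value in (here via a default
# argument) gives each closure its own binding.
copied = [lambda i=i: i for i in range(3)]
assert [f() for f in copied] == [0, 1, 2]
```

That both behaviours are routinely wanted, and routinely confused, supports the note's view that neither should happen silently.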
I think it best to require an explicit heap allocation if you want to return a pointer. It is less likely to be confusing. A pointer obtained with '&' should only ever be a 'borrowed' pointer. There must be a strong reference which protects it.

Syntax for closures?
    Go uses: func (args) result { body }
    Rust uses: |arg list| { body }
    Python uses: lambda args : body
For small closures, listing the args is boring. Using "$1 $2" is simple and brief. We still need a syntax to say "this is a closure". { use $1 < $2 } where an expression is expected could do it. Then { a,b: use a < b } could be a more general syntax. Not sure it can be differentiated on an LR basis though... That is a similar syntax to list comprehensions:
    [ a : for a in range(50) if a % 2 ]
No: list comprehensions are in [] - the array syntax. But {} is like a structure... or do we use tuples for structures?

I think I want in-line closures to be expression-only. I think burying statements inside expressions is wrong. This argues for "x if b else c". If you want a more complex closure, you need a local function declaration. This means that you cannot pass a large closure directly to a function like Rust does. Given the issues with 'break' and 'continue', this is probably a good thing. But are there times when it would be good? If there were, it should probably be handled by something like 'do' which has a closure as a distinct syntactic object. Does it need args? All local variables are available, but they aren't available to the function which is passed the closure. Passing local vars in would be clumsy - really do need the lambda concept.

Type analysis
-------------
I've never done anything like type analysis, so my first thought is that it sounds easy, but it probably is not, else more would be done. The first and probably most difficult task is to create a good type model. This will allow us to annotate every node in the code graph with appropriate type information, and then make correct deductions.
The deductions we want to be able to make are:
- is this correct? or at least consistent?
- what are the unspecified or underspecified types?

Scalars, strings, arrays, structs, unions, functions, and closures are different arrangements of data. Scalars, strings, functions, and closures are immutable, as can be the others. Immutable values can be copied instead of referenced.
Values can be:
- mutable or immutable - can I change them?
- volatile or non-volatile - can others change them?
- once, borrowed (wrt some once)
- compile-time, or runtime

Every variable has a definite type, as does any function result. The args to a function determine the types of the results, though those types may be a complex function of the arg types, e.g. intersection, union. Constants have a list of possible types. We choose the first type for which there is no type error. If there are multiple constants, they are tested in depth-first, left-to-right order.

Literate programming
--------------------
There should be more of this. The language should make it easy by allowing the order of code to match the order of presentation. This means we should be able to add to any namespace at a later date by reintroducing it with some sort of additive syntax:
    type foo += { .... stuff }
Maybe we could allow previous stuff to be repeated, which could be filled in by a tool and formatted differently by presentation.

In literate mode, everything is a comment except between [[[ and ]]], each all alone on a line. It would be nice to allow the description to be processed as markdown. For this, the compiler needs to extract precisely the code blocks (not the span-level code though). I think that any line indented 4 or more spaces following a blank line starts a code block, which ends when a non-empty line has fewer than 4 leading blanks. The first line in a code block cannot start with a list char: - * + 1 >
Github markdown also allows ``` to start a code block, and ``` to end it. ```plato would turn on 'plato' syntax highlighting.
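The extraction rule just described - fenced ``` blocks plus 4-space-indented blocks that follow a blank line - can be prototyped in a few lines. This is a simplified sketch (real markdown has many more cases, e.g. the list-char restriction is omitted):

```python
def extract_code(text):
    """Return code blocks: fenced (```) or 4-space indented after a blank line."""
    blocks, cur, in_fence, prev_blank = [], [], False, True
    for line in text.splitlines():
        if line.startswith("```"):
            if in_fence:                      # closing fence: flush block
                blocks.append("\n".join(cur))
                cur = []
            in_fence = not in_fence
        elif in_fence:
            cur.append(line)
        elif line.startswith("    ") and (prev_blank or cur):
            cur.append(line[4:])              # strip the 4-space indent
        else:
            if cur:                           # end of an indented block
                blocks.append("\n".join(cur))
                cur = []
            prev_blank = (line.strip() == "")
    if cur:
        blocks.append("\n".join(cur))
    return blocks

md = "text\n\n    code1\n\nmore\n\n```plato\ncode2\n```\n"
assert extract_code(md) == ["code1", "code2"]
```

A compiler in literate mode would do roughly this, then feed only the extracted blocks to the parser.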
What would stackoverflow use? What about:
- reStructuredText
- Asciidoc: http://www.methods.co.nz/asciidoc/ - nah, ugly
- pandoc: http://johnmacfarlane.net/pandoc/README.html#pandocs-markdown

reStructuredText is similar to markdown. It uses "::" at the end of a line to suggest that the following lines, until undent, are code.
pandoc is like markdown. One thing it adds is
    ~~~~~~ { .plato }
    Here is some code
    ~~~~~~~
as well as ``` like github.

When writing a program in literate style, is there a need to write code out of order? I suspect that there is, as we might want to add a second task for a loop to do, or refine a test. Actually - that is a tricky one. Can we *change* previous code? We could deliberately leave ellipses everywhere, with names? The block would have a name as well as the ellipsis. This really needs to be tried.

Extension
---------
If I want this to be a very general programming language, it should be possible to write in various different styles. This includes:
- easy access to regexps, like perl
- dynamically typed programming
- lisp-like programming?

A string could have an 'r' suffix to make it behave as a regexp:
    if "foo.*bar"r < mystring
    if (a,b) = "foo\([bar]*\)x\(.\)"r < mystring

Dynamic typing needs a universal type which must be extensible. This is like an extensible union, which I already considered with universal enums. But that doesn't really work... Need to think more.

Lisp? That is really a totally different engine with a totally different syntax. Hard to see how it could possibly fit. Any 'eval'-based operation is really impossible to fit within the syntax.

List processing more generally? If we have arrays that can be chopped up and joined, we start to have something a bit general. Add list comprehensions and map/reduce functions and it might get interesting. But then memory management gets interesting too. Could 'tuples' be claimed by a type like literals can?

Random stuff - don't know where this fits.
If a variable is known to be of some interface type, and we assign a value of a more refined type, then the variable can be known to have that more refined type until the value could change. This might be useful for struct fields if we have exclusive access, or for named function return values.

Strings
-------
Strings need to be a built-in type. But how should they look? NUL termination has problems - counting. Storing a count in front can be wasteful if we want 64-bit counts and short strings. Could take the utf8 approach, keyed on the first header byte:
    0-127  : 1-byte, 7-bit string length (up to 128 bytes)
    128-191: 2-byte, 14-bit string length (up to 16 kilobytes)
    192-255: 8-byte, 62-bit string length (up to 4 exabytes)
We always NUL-terminate so strings can be passed to 'C'.

Numbers
-------
I really don't like the fact that ints can overflow without me knowing. I think any possible overflow must be checked, unless the var is explicitly declared as cyclic (modulus). This means the language must know the range of values, or include lots of tests.

Runtime assistance
------------------
- profiling: collecting statistics of where time is spent.
- coverage: count the number of times each block is entered and allow this to be accumulated over multiple runs. Then format code with highlighting of un-executed and hot-spot code. For an interpreter, this can all be internal. For a compiler, we need an external data file - probably a log that we append to on completion and process later, using uuid hashes of functions.
- unit tests, with mocking. This needs lots of experimentation.

Literate help
-------------
Sometimes it might be nice to assert that the order of things mustn't matter. When building a function up during literate programming you might want to say that certain things happen, but you want to be sure their order isn't important, i.e. they all appear to happen at once. [Other times there are dependencies that must be honoured. For these, "after foo" or "before foo" might be useful.] In particular, when updating a data structure a number of steps might be needed.
Each step should be able to access the 'before' fields and assign the 'after' fields. Maybe an explicit syntax for 'before' or 'orig', and an error if there is room for uncertainty. We probably want a name for a block of code in which a value is varying: invariants apply at the start and end, but not during, and values assigned are not visible until the end.
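The utf8-style length header proposed under "Strings" (first byte 0-127 for a 1-byte/7-bit length, 128-191 for 2-byte/14-bit, 192-255 for 8-byte/62-bit) can be checked with a quick sketch. The exact bit layout below is one plausible reading of the notes, not a settled design.

```python
def encode_len(n):
    """Variable-length string-length header, keyed on the first byte."""
    if n < 1 << 7:                      # 0-127: length in one byte
        return bytes([n])
    if n < 1 << 14:                     # 2-byte header, tag bits 10
        return bytes([0x80 | (n >> 8), n & 0xFF])
    if n < 1 << 62:                     # 8-byte header, tag bits 11
        rest = n.to_bytes(8, "big")     # top two bits are free for the tag
        return bytes([0xC0 | rest[0]]) + rest[1:]
    raise ValueError("length too large for 62-bit header")

def decode_len(buf):
    b0 = buf[0]
    if b0 < 0x80:                       # 1-byte header
        return b0
    if b0 < 0xC0:                       # 2-byte header
        return ((b0 & 0x3F) << 8) | buf[1]
    return int.from_bytes(bytes([b0 & 0x3F]) + buf[1:8], "big")

for n in (0, 127, 200, 16000, 1 << 40):
    assert decode_len(encode_len(n)) == n
```

Short strings pay one byte of overhead (plus the NUL), while huge strings remain representable, which is the trade-off the notes were after.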