Supposing I were to write a series documenting my language experiments.
What would I include:

1/ Design rules / guidelines
   - Serve the programmer, not the compiler. Obviously the compiler serves
     the programmer, and many things make life easier for both.
   - Allow the programmer to tell the compiler what they are thinking.
   - Builtin types should not be too special. Whatever is available to
     them should be available to others. So operators and literals should
     be available to all types.
   - Similar things should look similar. This encourages a uniform syntax
     which is easier to remember. It can also encourage mnemonics.
   - Different things should look different. Just because two distinct
     concepts can be implemented with a single syntax doesn't mean they
     should be. Code is read more than it is written, so it should be
     clear from the syntax what the intent is.
   - Build on previous languages. Programmers already know several
     languages. They shouldn't have to re-learn everything. Only do an
     old feature in a new way if it adds substantial value.
   - Allow a range of verbosity - let the programmer choose.

1a/ Ocean - better than C

2/ literate programming and md2c

2a/ lexical structure
   - simple - we have lots of history - use it.
   - general - use same scanner for multiple purposes
   - newlines and indents
   - comments and white space
   - identifiers: ID_START ID_CONT plus 2 lists
   - specials
   - reserved words - must be a subset of identifiers
   - strings  ' " ` ''' "
   - numbers  0 0x 0b e p

3/ LR grammar basics with a calculator and error recovery

4/ Adding indent/newline handling to LR Grammar

5/ Statements
   Structured control flow.
      : INDENT UNDENT   -or-   { }
      if EXPRESSION BLOCK
   or
      if BLOCK then BLOCK
   ditto for 'while'. Means 'do{}while' not needed;
   Do I want "while: then:" or "while: do:" ??
   ditto for 'switch' but this has a twist. The case labels can be a new
   enum. Also labels can appear multiple times! including default???
   Also for/then/else and while/do/then/else
   'next variable' instead of 'continue' or 'loop'
   'last variable' instead of 'break'
   'return value' and 'use value'

   'for ???'
   The point of 'for' is to allow a co-routine. So we want really simple
   heads just like go
      for initialise; next while condition
   or
      for varlist, iterator := value.each() then iterator.next()
   calls iterator.next()

   funcall
   assignment/binding
   defer?
   fallthrough?
   exceptions?

   for while/if/switch then do case else
   'for' requires 'while'
   'switch' excludes 'then'
      [for] while [then] do [case] [else]
      if then [else]
      switch {case} [else]
   'then' is implied after 'if expression'
   'do' is implied after 'while expression'
   But how do I get 'then' after 'while expression:' ??
      for a = 0; while a < 10; then a+= 1:
          print a
   I could have 'for then while' ...
      for SimpleStatements
      then SimpleStatements
      while Expression:
          statements
      else:
          something
   We don't need 'break' as we can 'use false' in the condition block.
   We probably don't need 'continue' as we can 'use true' and have the
   'but always execute this bit' in the statement block.

5.0/ variables and scope
   is hole-in-scope allowed? No
   Can we import a scope
   - from a module? - yes with version number or name list.
     No over-riding allowed.
   - from a struct? - nice, but not nice enough. might be useful in
     'switch'. Might be OK with explicit list.
   - does a syntax like $foo help? Why not just use binding.

   We introduce a variable with(?)
      name : type = value
   but could that be ambiguous with ':' being used to start a block?
   Probably not.
   Multiple variables could then be:
      name:type, name:type, name:type = val, val, val
   but that looks rather silly.
      name, name, name: type = val, val, val
   Then we cannot assign a new and an old in the one assignment, to a
   tuple (from a function).
   Could use "var!" to assert a new variable.
      a!, b = fred()
   a bit ugly?

   An issue is that we introduce new state incrementally through a block
   of code, but we don't really want to keep indenting.
   Yet it would be nice to mark where the variable's scope begins and
   ends.
   a! could introduce with assignment and close with usage (?)
      a! = function()
      if a >= 0: x += a?;

      if a = function()
         use a >= 0
      then: x += a;
   I guess passing 'a!' with pass-by-reference, or generally taking a
   reference would be bad.

   How much do we really need to declare variables? In most cases types
   can be computed. In those cases we just need to guard against typos.
   Also make it clear to the human reader what the intention is.
   For the former, ensuring each name is both assigned and used is
   probably enough.
   For the latter, we want to differentiate between "assign" and
   "re-assign".
   Maybe '=' normally declares a new var, and '!=' over-rides?. Or :=.
   i.e. ?= for any '?' changes an existing value. Just '=' defines a new
   name.
   But what about tuple assignment? only newnames are allowed.
   += doesn't work, nor does :=

   Swap assignment? Move assignment. They require lvalue on both sides.
      swap a, b   (b becomes a)
      move a, b   (b becomes nil)
   move shouldn't be needed for locals as data flow will figure it out.
   It is needed for members of structures.
   Can I use a symbol?  <=?? <<= ?= =< ==
   Or do I mark the lvalue
      '@a' means "return the value of 'a' and set 'a' to NULL"
      b := @a.sort()

   I like '=' for assigning an immutable binding, and 'x=' for some 'x'
   for mutable bindings. So '+=' adds and '*=' multiplies. ':=' replaces.
   But what introduces a new name?
      var x
   is a little bit noisy.
      x .= value
   means "has value now, but might change". The visual connection between
   "." and ":" might help.
   Maybe
      x := value
   does two things (two dots): declares the name and assigns the value.
      x .= value
   just does one thing - it assigns a value.
   To introduce a variable without giving it a value we assign '_'
      x := _
      if foo:
          x .= thing
      else:
          x .= other
      print x
   Type can be given in <>
      x<int> := 27
   Though that might not be good as is otherwise unbound.
   We could go with
      x:int := 27
   Though there are more colons than I would like. Probably <> is OK.
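
   The ':=' / '.=' idea, together with the "each name is both assigned
   and used" guard against typos, can be sketched as a small checker.
   This is only an illustration in Python, not Ocean: the function name
   check_bindings and the (kind, name) statement encoding are assumptions
   made up for this sketch, and it handles a flat statement list only.

```python
def check_bindings(statements):
    """Check a flat list of (kind, name) statements.

    kind is ':=' (declare and assign), '.=' (assign to an existing name)
    or 'use' (read the name).  Returns a list of error strings.
    The encoding is hypothetical; it is not part of the notes above.
    """
    declared = set()
    used = set()
    errors = []
    for kind, name in statements:
        if kind == ':=':
            # ':=' does two things: declares the name and assigns it.
            if name in declared:
                errors.append(f"'{name}' declared twice")
            declared.add(name)
        elif kind == '.=':
            # '.=' does one thing: it needs an existing name.
            if name not in declared:
                errors.append(f"'{name}' assigned before declaration")
        elif kind == 'use':
            if name not in declared:
                errors.append(f"'{name}' used before declaration")
            used.add(name)
        else:
            raise ValueError(f"unknown statement kind {kind!r}")
    # A name that is never read is probably a typo or dead code.
    for name in sorted(declared - used):
        errors.append(f"'{name}' declared but never used")
    return errors
```

   With this encoding a misspelt target such as 'totl' in
   [(':=', 'total'), ('.=', 'totl'), ('use', 'total')] is reported as
   assigned before declaration, which is the typo guard described above.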
   I like the idea of binding a name to a field rather than to a value.
   This is like by-reference function parameters. It is different from a
   pointer because it is really just a syntactic shorthand. i.e. the
   binding is constant. I'm not sure if this is really useful though.

   I also like the idea of the Pascal "with". I have occasionally missed
   it. It exposed fields from a structure into the namespace.
   Unfortunately it isn't obvious how to expose two structures,
   particularly of the same type. I guess
      x = &thing
      x.field
      y.field
   is good enough.

   Do I really need multiple assignment? It is useful for 'swap', but I
   think I prefer an explicit 'swap'. It allows unbinding of tuples, but
   is not
      a = tuple
      a.1 a.2 a.3
   just as good? I guess names are better, but if names are important
   maybe they should be declared in the tuple.
   One benefit of multiple assignment is that a "simple statement" can
   declare multiple variables, useful in a "for" clause. That could just
   as well be handled with a 'simple block' which is
   'simple_block ; simple_statement'

   Q: Do I really need "x:=_" in the above? As the "print x" usage is not
   an initial usage, there must be a prior assignment - or two. So I
   could make it work, but do I want to? It would mean that when reading
   code I cannot easily tell the lifetime of a name.
   Maybe I use a 'var' statement to declare names
      var a, b

   What if I allowed a suffix statement which maintained the scope of the
   cond statement, but could affect the more general scope.
      if foo:
          x := thing
      else:
          x := other
      finally:
          b := x
      print b

   I could declare that a name is bound throughout the whole block in
   which it appears and if it appears in multiple blocks, one of those
   must contain the others. On every path that leads to any usage, the
   name must be initially bound. So if it is defined in one branch of an
   'if' but not the other, then it must be local to that if. If it is
   bound with a do loop, it must be local. If it is bound in all cases
   and the 'else', then it could be more global.
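
   The "bound on every path" rule can be sketched as a walk over a tiny
   statement tree. This is a minimal illustration in Python, not Ocean:
   the node shapes ('bind', 'use', 'if', 'loop') and the function name
   bound_after are assumptions invented for the sketch, and it ignores
   the "one block must contain the others" condition.

```python
def bound_after(block, bound, errors):
    """Return the set of names bound after running `block`.

    `block` is a list of nodes (a hypothetical encoding):
      ('bind', name)              introduce or assign a name
      ('use', name)               read a name
      ('if', then_blk, else_blk)  two alternative branches
      ('loop', body_blk)          body may run zero or more times
    `bound` is the set of names bound on entry.  Uses of names that
    are not bound on every path are appended to `errors`.
    """
    bound = set(bound)
    for node in block:
        tag = node[0]
        if tag == 'bind':
            bound.add(node[1])
        elif tag == 'use':
            if node[1] not in bound:
                errors.append(f"'{node[1]}' may be unbound")
        elif tag == 'if':
            then_bound = bound_after(node[1], bound, errors)
            else_bound = bound_after(node[2], bound, errors)
            # Only names bound on every branch survive the 'if'.
            bound = then_bound & else_bound
        elif tag == 'loop':
            # Names bound inside the loop stay local to it.
            bound_after(node[1], bound, errors)
        else:
            raise ValueError(f"unknown node {tag!r}")
    return bound


program = [
    ('if', [('bind', 'x')], [('bind', 'x')]),
    ('use', 'x'),                        # fine: bound on both branches
    ('if', [('bind', 'y')], []),
    ('use', 'y'),                        # only bound on one branch
    ('loop', [('bind', 'z'), ('use', 'z')]),
    ('use', 'z'),                        # 'z' is local to the loop
]
errors = []
final = bound_after(program, set(), errors)
print(errors)
```

   Here 'x' escapes the first 'if' because both branches bind it, while
   'y' and 'z' are flagged because they stay local to the 'if' branch
   and the loop respectively.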
   multiple assignment is useful to collect procedure return
      a, x+, b: = myfunction()
      myfunction(): a=$1; x+=$2; b:= $next;
   i.e. after evaluating a function, all the return values are available
   as $N or $name until the next procedure call.
   In that case $$ could be an error
      myfunction()
      switch $$:
          case filenotfound: whatever.
   What about functions called inside expressions? The value cannot be
   used as there isn't one. So all code must be dead until $$ is tested.
   A function could identify return values as $xx names. A type might be
   'error' which has a special behaviour.
   I wonder about
      some_expression ? $$+1 : other
   no point
      (some_expression ?: other-1) + 1

5.0a - decision time.
   A new local name (variable) can be introduced with:
      name ::= value    // binding is constant
      name::type =
      name := value     // initial assignment
      name:type =
      name = value      // name must already be defined and is being replaced.
   This binding extends at least until the end of the enclosing
   statement.
   If the statement loops, the binding ends with the loop.
   If the statement is one of alternates, the binding continues only if
   all branches introduced it, or only into parallel branches in later
   conditionals.
   After the minimal extent, a new binding will over-ride. The name must
   have been used before it is over-ridden.
   Bindings can be changed with
      name op= value
   acts as
      name = name op value
   Multiple assignments are not supported. Use
      a=1; b:=2; c = 3;
   if you want.
   To swap two bindings we have
      swap lvalue, lvalue
   More than 2 can be given and they are rotated with first value landing
   in final lvalue.

   Assigning to record fields and array elements normally uses =, though
   := can be used for record fields when initialising a record before
   first use.
   Assigning to a reference normally makes the reference refer to
   something else. If the reference is to a struct or array, then
   foo.field or foo[index] can be used to assign to a member. So to
   assign to the whole thing being referenced, use
      foo. = value
   which is easily confused with
      foo .= value     // damn.

   For each name we need to identify places in the code where it is
   initialized and changed, and then each usage needs to link back to one
   of those. If a variable is changed conditionally:
      if x: a = 1
   subsequent usages of 'a' link back to the end of that whole statement.
   We can compare two expressions using the target of these links when
   comparing bindings. If they match, then the values from the
   expressions are the same.
   This can be used to determine where a name is valid.

   Q: what about concurrent memory models. Need to study this.

X/ types-1
   boolean. Needed for 'if' and comparisons etc. Maybe places that might
   expect boolean actually expect object with 'test' method
   'order': Less, Eql, Greater   a ?= b   or   a
   name:type, name:type)
   interfaces { name:functiontype, name:functiontype,...}
   Each function has an implied(?) first argument "self:self". The type
   'self' can be used in other args and return values.

X/ types 3
   algebraic types  <: :> &(intersection) |(union)
   parametric types - type or constant parameter
   value-dependent types. Value can be quite distant
   linear types: number of references is part of type and can depend on
   value
   temporal types: linear progression depends on value. "clock" concept
   needed.
   parallel types(?) can be accessed in multiple threads. Maybe atomic
   types(?). dependent types that depend on an atomic can also be
   parallel e.g. they become writable when an atomic has some value.
   a refcnt atomic could interface with 'linear'...

   A borrowed reference might need to indicate where it is borrowed from?
   There must be some strong reference which we "know" won't be dropped.
   e.g. it could be read-only for the lifetime of the borrow.

5a/ functions, procedures and tuples.
   A function can return 1 value. A procedure can return 0 or more - a
   tuple to be precise.
   So where can a procedure be used?
   - direct assignment
     no, all return values are available in $N until next procedure call.
     or maybe proc(a,b,c) -> x,y,z
     or proc(a,b,c, out:x,y,z)

Q/ dynamic dispatch and polymorphism.
   Dispatching a method call against a reference of incomplete type can
   be handled in various ways. We need to understand these to understand
   consequences of choices about how methods are attached to types.
   1/ If only one, or maybe 2, methods are needed for the apparent type,
      then they can be passed around with the pointer. This is exactly
      what qsort() allows. 'comparable' has a single method
   2/ If only one interface is needed, a pointer to that interface's
      implementation
      This requires interfaces to be separate well defined things, which
      isn't the case for 'go' I think
   3/ The object can contain a pointer to one of the above or to a
      function which finds and returns a given interface or method.
      This requires each interface or method to have a globally unique
      name, which isn't too hard to manage using the relocating linker

   I think that every object which has interfaces needs to have that
   lookup function. It may be in the object, or may need to be part of
   any reference. Arrays etc can be parameterised by a concrete type so
   they can hold one function for many references to small objects.

   A module can define an interface to some other type, or to an
   interface. So a module might define a "sort" interface to an "array of
   comparable" interface. If a module imports several modules which all
   add different interfaces to an external interface, then the importing
   module must define a lookup function which finds all the different
   methods by 'name'.

   Any data type can have a collection of methods. Some of these might
   belong to an interface. The others can only be used when you have a
   reference of the type, not of an interface.
   A data type can declare that it contains a dispatch function, or the
   compiler will use fat pointers. A data type might instead contain an
   enum - which might be much smaller.
   This assumes that all subtypes are in the one module and the compiler
   can create switch statements to handle all interface methods.

X/ Error returns from functions.
   Exceptions? -errno works surprisingly well with ERR_PTR(). But NaN
   works even better.
   Sometimes we might want to handle errors in normal flow. Sometimes we
   might want them to be exceptional. go distinguishes by allowing
   "foo, err := function" to catch the error.
   I could allow 'foo' to 'hold' the error and so
      foo := function
      except foo == errortype:
          do stuff
   Without an 'except', the block is aborted.
   How would that work for procedures? They explicitly return err if
   needed? Or any return can be conditional

6/ Declarations

7/ Expressions
   + - * / % & | ^ && || ! &^ &~ >> << < > <= >= !=
   += -= *= /= %= &= |= ^= (one token or 2?)
   and or not cand cor "and then" "or else" else then
   ?: if/else max min (link 'and' 'or', or max() min()
   lambda somehow? fn(
   x.y x() x[] x{} x<>
   .1 .2 .3 for tuple access.
   * as a prefix operator dereferences
   < as a prefix operator dereferences and sets to nil.
   precedence - how many levels
   with tuple: $1 $2 $3 ...

   Integer division?
   - overload '/'
   - 'div'
   - '\'
   - '//' - no, that's a comment
   - /_ ( divide to the floor)
   - /- (divide and discard remainder). then -/ could be 'keep
     remainder'. but /- could be 'divide by a negated value' 4/-5

   String concatenation?
   - overload '+'
   - use '++' as general 'join' operator

6/ Pointers are special
   The type can carry refcount info and locking info

7/ assignment and binding patterns. i.e. destructuring.
   If a structure is a namespace, then "with" might populate the active
   namespace... That could trigger hole-in-scope though, which is bad.
   Patterns are mostly used in switch/case.
      switch value:
          case pattern
   and the point of 'pattern' is that it might not match. e.g. it might
   assume some element is not NULL, and has a particular structure.
   e.g. if this is a cons-cell, do that, if it is nil, do the other.
   Ahhh, no. This is used for tagged structures.
   The case determines that a given tag is active, and makes the relevant
   fields easily available.
   Is that syntactic sugar needed? I don't think so. A switch might be
   useful, but it doesn't need syntax.
   Maybe the name '_' could be bound to the recent 'use'd value so
      switch some_funct(...):
          case _.tag = foo :
   ?? that's a bit ugly.
      tagswitch some_funct(..):
          case foo: print _.foo_name
   that is probably cleaner.
   So 'tagswitch' is a shorthand for
      switch: _ = X; use _.tag:

8/ Operator choices - is this part of Expressions?
      a else b         // same as a ?: b
      a if b else c    // b ? a : c

9/ Type syntax
   I'd like to use ':', but not sure if it is being over-used for blocks.
   However:
      field : type
   declares a field to have the type, so
      fred:int = 27
   creates an integer called fred with value 27.
   A type can be:
      A name "int"
      A struct "{field: type, field:type, ..}"
      An array "[ length : type ]"
      A procedure "(arg:type, arg:type -> result:type, result:type)"
      A function "(arg:type, arg:type): result_type"
      A tagged union in a struct
         "{ struct fields, tag -> name:type, name:type; tag2->name:type ...}"
      A borrowed pointer "*type".
      An owner pointer "~type"
      A counted pointer "+type"
      A collected pointer "?type".

xx/ output formatting

10/ modules, packages, exports/imports and linkage.
   ....

11/ standard library
   strings. hash, file, regexp, trees, channels/pipes

12/ reflection - useful for tracing written in-language
   unit tests and mock objects?