*Intro

Over a decade later I am thinking about language design again. It is probably the release of GCC 4.6.0 mentioned on lwn.net, and the ensuing discussion which touched on 'go' from google. Particularly see http://www.cowlark.com/2009-11-15-go/
So many thoughts swimming around, it is hard to know where to start...

*Type system

parameterised interfaces for subtyping, but no inheritance

numbers:
Type coercion is bad - it can hide errors too easily. But explicit casting can get very ugly. So:
A value - literal or named, or the result of an operation that must produce a precisely correct result - can be coerced into a value of greater range or precision automatically.
A value - the result of an operation that might overflow - cannot be coerced at all. A cast must be used to restrict range or precision.
Thus "a = b + c" cannot lose anything. If b and c are the same size but differ in sign, neither can be coerced into the other. They could both be coerced into a larger signed value, but then 'a' would have to encompass that value or an error would result. So if b is s32 and c is u32, the addition would be done at s64, so 'a' could be s64 or bigger.
If b and c have the same type, overflow can still occur but is expected. In that case the result can only be assigned to a variable of the same size. So if b and c are s32, 'a' must also be s32. If 'a' is s64, then one of b and c must first be cast to s64 before the addition.

tags/modifiers/whatever:
- 'nullable' / 'nonullable' on pointers; the default is the latter
- 'once' for reference counting
- function args can indicate if the function absorbs a reference, and return values might provide a reference?
- 'signed'?? Can I assign 'signed' to 'unsigned' with no check?

immutable and sharable types... Numbers and strings and bool are immutable. User-defined types are normally sharable, but can be immutable or 'once'.
.. something to allow an object to be embedded in another...

'types' describe particular implementations. Each object is of a particular type. This includes int, bool, etc. An object is created using the 'type' function against a set of values which might be manifest constants or might be other objects. Outside the module which defines a type, the internals of the object are not visible except through the interfaces.

'interfaces' describe the behaviour of an object. An interface identifies attributes and functions which the object provides. Attributes may be read-only or read-write, but whether they are fields or managed attributes is not externally visible .... except that would make implementation inefficient for the common 'field' case. Probably field attributes need to be directly readable, but writing should always be handled (though a NULL pointer might imply direct update). Of course if we did incremental compilation we could detect direct-access fields and optimise them(??).
An interface entry is a function name with input and output types. These can be gathered in structs - which unroll of course. So an interface can be defined as a list of functions and interfaces. The items listed in the input and output structs are names paired with either an interface or a type. The function can also be an operator, or get_indexed / put_indexed - which handle name[index] references.
Interface matching is strict inheritance matching. i.e. if two interfaces both define 'foo' they don't match. Rather they must both inherit the *same* foo from somewhere.
An object implements multiple interfaces, possibly with code in several files...
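An aside, since 'go' keeps coming up: its interfaces match structurally - exactly the "same name is enough" rule rejected above. The nearest it offers to a declared, compiler-checked conformance is a compile-time assertion idiom. A minimal sketch, with invented types:

    package shape

    // Hypothetical interface and type, invented for illustration.
    type Stringer interface {
        String() string
    }

    type Point struct{ X, Y int }

    func (p Point) String() string { return "point" }

    // Go matches interfaces structurally - the "same name is enough"
    // rule rejected above.  The closest it gets to a declared, checked
    // conformance is this compile-time assertion:
    var _ Stringer = Point{}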
Virtual functions - or templates, or something like them - can be written for an interface. The function can then be called on any object that implements the interface. The function uses other functions of the interface to implement the result. So an "array of comparable" might implement a qsort function.
What about a mergesort function on a list? What is the list? Yes.. what is a list? How can we implement linux/list.h lists in a fully type-safe way? Possibly we cannot, without tagging the list_head with a type indicator and requiring a check, which would be boring and pointless. I think that the only times we do list_entry, we know where the head of the list is, and it isn't here. So the type of a list_head must include the identity of the head of the list: next is a pointer to a list_head of this type embedded in a foo, or at X. When we insert-after or delete we don't need to care about X. In fact we only care when walking the list and calling list_entry. We could tag a 'head' with a low bit in one of the addresses. Ugly but effective. So everything on the list is either embedded in a foo, or is a lone list_head - the actual head - which is tagged.
If we allow "if X != Y" to change the type of X to exclude Y, as we might with "if X != NULL", we might be able to do something... But I suspect not. The assertions we are making are simply too rich to make in a comprehensible type system. So we probably want the 'list' module to be able to make 'unsafe' casts and assert that they are safe.
What about parameterised types etc.? A 'max' or 'min' function can work on any two items that are mutually ordered, but we don't care what particular type it is. So the type is an unknown:
    max : (a,b:X < TotalOrdered) -> X
When we call 'max' we don't tell it what type X is; it has to deduce it from the passed args. So that is an implicit parameter. But we could say intmax = max(X=int) to create an explicitly typed 'intmax' function(??)
Containers are the classic case for parameterised types:
    stack(X) : push(entry:X); pop() -> X; empty -> bool
This is an interface which could be provided by a type with an embedded linkage field, or by an expandable array.
---------------------
So, some concretish thoughts. We have two distinct things - interfaces and types.
An interface defines a set of function names and their types. An interface can include another interface, and may broaden the types of args or narrow the types of return values. Broadening may remove args. Narrowing may add extra return values.
A type may declare that it conforms to one or more interfaces, and by implication any included interface. There must be a strict connection. Just defining functions with the same names is not sufficient - the interface must be declared and the compiler will check it. If a type conforms to two interfaces which both have functions with the same name, the type should provide two functions and will need to indicate which serves which interface.
A type includes a data structure and a set of functions, and provides a namespace to enclose them. The structure may include objects of other types, either by reference or more directly by 'embedding', sometimes called inheritance. The functions of those types are available and may be used to implement a shared interface, either explicitly by declaring an interface function to be implemented by an imported function, or implicitly by marking a member object as a 'parent' - though a better name is needed.
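Go's struct embedding is roughly this 'parent' marking: the member's functions are promoted to the outer type, and the child can over-ride by name. A small sketch (Animal/Dog invented for illustration):

    package main

    import "fmt"

    type Animal struct{ Name string }

    func (a Animal) Describe() string { return "animal " + a.Name }

    // Embedding Animal marks it as the 'parent': its methods are
    // promoted, so Dog can be used wherever Animal's behaviour is wanted.
    type Dog struct {
        Animal
        Breed string
    }

    // The child over-rides by defining the same name itself.
    func (d Dog) Describe() string { return d.Animal.Describe() + ", a " + d.Breed }

    func main() {
        d := Dog{Animal{"Rex"}, "kelpie"}
        fmt.Println(d.Describe()) // animal Rex, a kelpie
    }

Note that Go's promotion is purely static - a promoted Animal method never calls back into Dog - which is exactly the dispatch question taken up next.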
When the member's function is called, a reference to the contained object is passed together with a set of function pointers that call into the relevant interface functions for the main type.
A 'set of functions' is a data structure which starts with a function pointer - the dispatching function. Args are set up with the identity of the called function first, the target object next, then any args. In the common case the remainder of the data structure is a sorted list of method identities and function pointers. The dispatching function does a binary search to find the target, possibly subtracts an offset from the target object, and jumps into the function.
But it would be best if not all functions had to go through this lookup. Of course the caller could cache the found function and not look it up again if it is called several times. But that is only part of the solution. The standard approach is to declare functions as 'final' if we know they will not be over-ridden. In our case that means that calls in the parent which expect the parent will never call the child's function. Possibly this should be the default, to encourage efficiency. If a function is marked 'virtual', then any call to it must go through a lookup table. If it isn't, then it doesn't have to. This marking is in the .... interface?
When an external caller knows the type of an object, normal function calls are direct and virtual function calls are dispatched. When a caller only knows an interface of an object, all calls are dispatched. So it is the type which determines whether functions are virtual or not.
Do we ever want to be able to make direct function calls based on an interface? i.e. could some feature of an interface be final? This could make sense for a template-like function, e.g. qsort. An interface 'sortable' requires an array with comparison and defines qsort. But why do that? Why not just define qsort as a function? I guess that is the thing: a final interface method is really just a function, possibly in some namespace. So an interface defines methods or 'static' functions. A type defines functions which can be 'virtual'ised.
An interface can be parameterised, so that some types in args or return values are taken from the parameters:
    interface comparable(base) { int lesseq(this:base, that:base); }
So numbers conform to comparable(number) and strings to comparable(string); int conforms to comparable(number).
    interface arrayof(base) {
        base get_elem(this, index:int)
        void set_elem(this, index:int, elem:base)
    }
A function can be written to a subtype of an interface:
    function base max(a,b:base) where base:comparable
    with X:object, base:comparable(X) function base median(a: array of base)
I want to think about closures - a function with some 'local' variables already set... a lot like an object, but with one method. This is created if you can take the address of a chunk of code in a function - it holds on to the context as you export it. I would either need to copy the stack frame, or have stack frames on the heap, which doesn't sound efficient. Copying the frame would be difficult if you had addresses of things, but maybe you wouldn't. Or maybe you only allocate a frame on the heap if it can possibly be part of a closure.
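That last option is what Go actually implements: escape analysis heap-allocates only the variables a closure captures; everything else stays on the stack. A minimal demonstration:

    package main

    import "fmt"

    // counter's local 'n' can outlive the call, so the compiler moves
    // it to the heap; locals that are not captured stay on the stack.
    // This is the "only heap-allocate the frame if it can possibly be
    // part of a closure" strategy.
    func counter() func() int {
        n := 0
        return func() int {
            n++
            return n
        }
    }

    func main() {
        next := counter()
        fmt.Println(next(), next(), next()) // 1 2 3
    }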
*statements, expressions, operators

Pascal made a strong distinction between statements and expressions. This seemed elegant but turned out to be somewhat clumsy. It created a strong distinction between procedures and functions which seems unnecessary.
It is sometimes nice to have expressions with side-effects, such as 'variable++', but this can cause problems with macros and order-of-evaluation. So don't have macros! And don't allow side-effects to affect other values in the same expression....
Some languages have a 'where' modifier to an expression or statement which can contain arbitrary code to set up the context for the expression. Typically it defines a bunch of names that are used in the expression. It doesn't seem to have caught on and I wonder why...
    while not finished where { foo; bar; baz; finished := whatever } do { stuff }
I sort-of like 'after' instead of 'where', as it doesn't imply that the body has to explicitly affect the expression. C allows ({ foo; bar; baz; !whatever }) to achieve a similar idea, but placing the 'whatever' at the end hides it a bit. With the expression at the start it is clearer where we are going. A for loop could then be:
    initialise; while condition do increment after { body }
which is kind-of interesting, though having the initialise out in front is a little bit horrible. Pascal's 'with' (which never quite was useful) could allow:
    with initialisation while condition do increment after { body }
    if condition after { context } do body; else other
I wonder what case/switch should look like; I probably want implied 'break'. In fact the 'go' switch is quite nice:
    switch expression { case expression: statements; case expression: statements; }
Part of me wants those statements inside { } so they are clearly bracketed - but that might be ugly:
    switch expression { case expr { code code } case expr { code code } }
The word 'case' becomes pointless... I think I want 'continue' rather than 'fall-through', meaning 'find the next case that matches'. I'm still not happy with the switch syntax.
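For reference, the 'go' switch admired above, as it really is: break is implied, 'fallthrough' must be explicit, and the switch expression can be omitted entirely so the cases become arbitrary conditions:

    package main

    import "fmt"

    func describe(n int) string {
        // Expressionless switch: each case is a boolean condition,
        // tried in order, with an implied break after the one taken.
        switch {
        case n < 0:
            return "negative"
        case n == 0:
            return "zero"
        case n < 10:
            return "small"
        default:
            return "large"
        }
    }

    func main() { fmt.Println(describe(7)) }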
Operators... I'm not sure what to say about those. Having unbounded overloading seems like a bad idea; having none is too restrictive. I think there should be a defined set of operators with arity and precedence fixed. Possibly precedence should be undefined when different definitions of operators are used, so that parentheses are needed. i.e. a + b * c binds OK when they are all in the same type family, but if a is a different type to b and c, then it must be a + (b * c). I'm not really sure what that means - I need to know more about how operators are overloaded.
Every operator which is infix could also be prefix. Postfix is less clear: in "arg op1 op2 arg", either op1 is postfix and op2 is infix, or op1 is infix and op2 is prefix. Best to exclude postfix ops from also being infix. Hmm.. there is more I want to think about here.
It would be cool if function calls could use a richer syntax to separate args than just a comma. e.g.
    append $value to $list
    distance from $x1,$y1 to $x2,$y2
    distance from $x1,$y1 to $x2,$y2 via $x3,$y3
Cobol lives again? There are lots of questions here... As the function words are not predefined, we need them to either be kept distinct from local names, or be syntactically distinguishable. $name works in the above, but not in real life. We probably want the function words to belong to a type, but the type isn't apparent until later, if at all. In the 'append' case the key type is at the very end. In the distance case there is no key type. We could require
    distance from point(x1,y1) to point(x2,y2)
though a name of type 'point' could also be used. If I defined a variable 'distance', with a postfix operator 'from' it could get awfully confusing.
Disallowing variables named for function starters might be too restrictive, as you probably cannot control which function names get defined. The alternative is for variables to over-ride functions... which is probably fine. So each word introduces a temporary namespace which over-rides what is there: 'distance' introduces 'from', 'to', 'via'. But a variable called 'distance' hides all of that.
If the name is followed immediately (no space) by a parenthesis, then the new namespace only exists inside the parenthetic space. That makes "point(x,y), z" different from "point (x,y), z". That would be bad. We could go lisp-like:
    (point x,y)
    (distance from here to there via elsewhere)
cf. distance(here, there, elsewhere). Not sure - the traditional version allows the functional bits to stand out, but doesn't make their relationship clear.
    number divided by number ???
    number div number

*Basic syntax

Lots of little questions here...
1/ Are parentheses just to over-rule precedence? Or do they have a real meaning? i.e. what is the function call syntax?
    name list
    print a,b,c   -- that works
    me.append a   -- looks a bit odd.
What is a list?
    a,b,c     -- clearly a list
    a         -- not so sure
    :a:b:c    -- different syntax, so lists can be nested
    :a:b,c:d  -- first and third are not lists.
    :a        -- list of one element
    :(:a):b   -- a list of a list of one element, and 'b'
    nil       -- list with zero elements - the empty list
    a,b,c,    -- trailing ',' is allowed and sometimes assumed.
So a function takes a single argument, which might be a list... But how then do you call a function which takes no args?
    me.tostring     -- Is that a function or a function call?
    me.tostring nil -- ugly
    me.tostring()   -- but now parentheses mean something.
Maybe getting the function (or closure) requires different syntax?
    &me.tostring
    lambda me.tostring
2/ When are {} required to group commands? Do we separate commands? I don't like too many {} - so only require them to disambiguate. Don't separate anything. Separators are ugly. We terminate things with ',' or ';' when syntactically necessary.

*strings

String constants and char constants are not syntactically different. The type of a constant is determined from context. They can be enclosed in "" or ''. In either case the terminal char can never appear in the string, not even quoted. This makes parsing simpler. \escapes are recognised in "" but not in ''. \q is a quote. Adjacent string constants are catenated, so "'" '"' is a string of the two different quote characters.
A multi-line string is introduced by """. This must be the first and last thing on the line, after initial blanks. Every subsequent line until another identical line must start with the same sequence of initial blanks. All the lines between are joined with the blank sequence removed.
The big question is: how are internal strings stored? UTF-8, UTF-16, and UTF-32 are obvious candidates. UTF-8 is best for European languages, UTF-16 is better for many Asian languages, and UTF-32 is fairly pointless for general storage. So we probably have two internal formats with auto-conversion, like for numbers. A pragma determines how constants are stored?? Or better, we allow a string to be either UTF-8 or UTF-16 and self-describing, like any good object. Each byte array is prefixed with a count that contains a bit flag which distinguishes between UTF-8 and a byte count, or UTF-16 and a word count. The bit is the lsb and is zeroed before using the count - so the count must be even. If there are an odd number of bytes/words it is simply padded.... Or maybe the bit is set to 1, so that with the count it is closer to a round number of bytes total.
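A minimal sketch of that tagged count word, assuming my reading of the proposal (the flag in the lsb, the true count padded to even so masking the flag off loses nothing - names invented). It even gets to use go's &^ and-not operator, which comes up again below:

    package main

    import "fmt"

    // Hypothetical self-describing string header: low bit selects the
    // encoding, remaining value is the (even, padded) element count.
    const utf16Flag = 1

    func pack(count int, utf16 bool) uint32 {
        if count%2 != 0 {
            count++ // pad to an even count, as proposed
        }
        w := uint32(count)
        if utf16 {
            w |= utf16Flag
        }
        return w
    }

    func unpack(w uint32) (count int, utf16 bool) {
        return int(w &^ utf16Flag), w&utf16Flag != 0
    }

    func main() {
        fmt.Println(unpack(pack(5, true))) // 6 true
    }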
*Operator Overloading

Operators in general... It would be nice if modules could define operators instead of just functions. Then 'int' could be just a module. But operators affect parsing at a fairly deep level, both by precedence and fix-order. If two modules both export the same operator with different precedence, that would be bad. But ditto with global names.
Anyway - can an operator be infix and prefix and postfix? C only has ++ and -- as postfix; .name ->name [] () are also sort-of postfix. For prefix: & * + - ~ ! ++ -- cast. If we expect a term and see an op, it must be prefix. If we expect an 'end' and see an op, it could be infix or postfix. If after an operator we see another operator, the pair could be infix prefix, or postfix infix. So an op cannot be both infix and postfix.
Operators like 'and' and 'or' and 'not' and 'or else' and 'and then' really need to be predefined names, or parsing gets too complex. Probably assert that all 'other' operators bind more tightly than the pre-defined ops, are left-to-right, equal in precedence, and are infix or prefix but never postfix... or at least when postfix they cannot be followed by something - just a close.
But do we really need new operators? Probably yes. Not many, but sometimes an operator is just soooo much neater. Look at &^ in 'go' - a really good idea I think. <: :> ... I wonder...
Overloading: the issue here is how to determine which instance rules. It is simplest if the type of the first operand must accept the operator. With int/float, the type of 'int' is actually 'number', which defines division. int/array would not work, as 'number' doesn't know about arrays.

*Declarations and assignments

Lots of interpreted languages don't require, or even allow, variables to be declared before use. This makes writing quick code easy (is that good?) but also makes typos easy. Checking for unused values and uninitialised name references can help catch a lot of the errors, but I'm not sure it is generally a good thing. Some C variants allow declarations to be mixed with code, but that seems to be unpopular and I cannot say that I like it much. Still - having the declaration close to first use can be nice. The 'with' statement I suggested earlier could make this work nicely:
    with name = value do stuff
could serve as a declaration of 'name' with exactly the type of 'value', though
    with int name = value do stuff
is not much harder so should be preferred.
I wonder if :name = value would be a good syntax to declare 'name'. int:name = value would disambiguate if needed. The same could be used in type matching for functions: max(:X:a,b) -> X means X is a type variable determined from the args, and becomes the type of the result.
I wonder if it would be nice to allow :(a,b,c) = 1,2,3 to initialise all 3. It would be the same as :a,:b,:c = 1,2,3. Or do we want
    int:a = 1
    int:b = 2
Do we want multiple names per type? How?
    int:(a,b) = 1,2
    int:a,:b = 1,2
    int:a,b = 1,2

*Type labels:

How, syntactically, do we give a type to a variable? Pascal uses "name, name : type", which doesn't work very well with initialisation, though it could I guess. C uses "type name, name" or "type modified-name" such as *name or name[num], which means some types don't have a nice closed name: in C, Pascal's "array [0..36] of int" becomes "int _[36]"?? Or use a typedef. I think types should have simple closed names so they can be passed around, particularly to parameterise interfaces.
Also, "int* a, b" looks like a and b are both int pointers, but they aren't. And "int *a=b; *a=b;" do two very different things.
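Go settled exactly this point: the type comes after the names, reads left to right, applies to every name in the list, and array/slice types are closed names. A small illustration:

    package main

    import "fmt"

    func main() {
        // Both really are pointers; the C pitfall cannot arise.
        var a, b *int
        // Pascal's "array [0..36] of int" is the closed name [37]int,
        // which can be passed around whole.
        var xs [37]int
        var ys []int // a slice: closed name for "array of int"
        a, b = &xs[0], &xs[1]
        fmt.Println(*a, *b, len(xs), ys == nil)
    }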
"array of" is very verbose.
    int[]     -- array of ints
    int*      -- reference to an int
    int&      -- formal parameter which takes the address of the passed value?? - no, maybe not.
    [int,err] -- union of two possibilities?

*Pointers:

Pointers are good. We like pointers. Pointer arithmetic is bad. It allows very silly things. It also allows cool things (container_of), but the language should provide those. Pointers can be cast to integers - that can help with arbitrary ordering or equality checks (but what about memory compaction?). But integers can never be cast back to pointers.
Array slices are a good idea. They require passing a size around with a pointer - though that could be optimised out in some cases, and would be needed anyway in others. If strings are array slices then we don't need nul termination, but we have the cost of a length... Probably not a big cost, and it gets rid of some horrible cut/paste issues with nul termination. We can pass a substring without copying or mutating.
Strings are very special. They are not arrays of characters. They cannot be indexed but can be iterated. Each element is a unicode point, which might have an ASCII value. Strings are not mutable. Byte arrays are normal arrays and are mutable. You might be able to convert a string to a byte array, but in general you cannot.
The language has values and references. 'References' are ways to refer to a value: often a name (i.e. a variable), or a field in an object, or a slot in an array. Values can be mutable or immutable. The reference to an immutable object might be a copy rather than a pointer.
Values have a lifetime. When the lifetime ends the value is destroyed, which might run a destructor function. Lifetime can be determined by:
- reference counting - when it hits 0 the value is destroyed
- single-reference - there is only one reference, and when it is destroyed, the value is destroyed
- garbage collection - references can be found in memory, and when there are no more, the value is destroyed.
A reference-counted value must contain a counter. A collected value must contain a single bit usable in mark/sweep.

*error returns and exception handlers

Returning and handling errors is the bane of any programmer's life. Exception handling is essential but feels heavyweight and clumsy. It doesn't have to. I like the "set -e" functionality of the Bourne shell: if anything returns an error status that isn't used, then execution aborts. A similar concept would be that any function can return a union of the normal return value, or an error. If the caller only accepts the normal return value, the error is raised up the stack. If the caller accepts the error, then it does whatever it likes. So:
    answer, err := function_call(args)
    if (err) {
        do whatever; answer is undefined.
        return err;
    }
Obviously the function must be declared as possibly returning an error. We could make the error type a part of the language, or we could just generically allow unions to be returned. This would allow structured error objects that are specific to the context.
Exploring 'go' suggests that there could be more than just error returns to consider. There are also timeouts. i.e. in some cases
    answer := function
will wait for an answer, while
    answer, ok := function
will return immediately, either with an answer or an error. Does that make sense? Should the calling context determine if a delay is appropriate or not? That really sounds like a different argument.
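Both halves of this are visible in real go. The error half below is ordinary go (lookup is invented for the sketch); the delay half is the comma-ok / select-with-default distinction, where the *call site* chooses whether to block:

    package main

    import (
        "errors"
        "fmt"
    )

    // Union-style return: the caller that binds the error handles it.
    // (In the proposed language an ignored error would be re-raised;
    // go only refuses to compile a declared-but-unused variable.)
    func lookup(key string) (string, error) {
        if key == "" {
            return "", errors.New("empty key")
        }
        return "value of " + key, nil
    }

    func main() {
        answer, err := lookup("x")
        if err != nil {
            fmt.Println("failed:", err)
            return
        }
        fmt.Println(answer)

        // The delay question, decided at the call site: a bare
        // receive waits; a select with default returns immediately.
        ch := make(chan string, 1)
        ch <- "ready"
        select {
        case v := <-ch:
            fmt.Println("got", v)
        default:
            fmt.Println("would have blocked")
        }
    }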
Either way, this is very different to the 'err' return. The function always throws an error if that is what it has to do; the call site either catches it or lets it flow on up the stack. An implementation might place a global setjmp, which is like passing an arg down, but it is a thread-global arg. So 'catch error' is a stacked thread-global value.
Could not 'timeout' be similar? Or probably 'nodelay', as timeouts would be implemented by killing off a thread that was no longer interesting... No - nodelay is too global. Some delays are small, some are large. We cannot put them all in the one basket. It makes more sense for the channel to be marked for delays or not - or rather, a channel endpoint. A channel could normally block, but channel.ndelay could be a channel that threw an error instead.
A problem with this approach is syntax. It requires a 'catch' to be written like:
    rv, err := { lots of function code here; }
    if (err) { handle error }
which is much worse than after_scope { handle error }. Or it could be:
    with error = stuff if error do handle error

*modularity

Namespace control - who owns namespaces and how are they made available? Obviously any block - or at least any 'with' statement - introduces a namespace which over-rides existing names.... or should conflicts be an error? That is safer.
A structure variable holds a namespace with local interpretation. This is traditionally accessed as {value of the type}.{name in local namespace}. It isn't just structures; any type, even the humble 'int', could have names attached. "$int.abs" might find the absolute value. But it would be nice if 'abs' were more visible:
    abs of $int (???)
as that extends to e.g.
    pop value off stack
    push value on stack
Is that really better than stack.push(value); value = stack.pop? Maybe not, but when the args are more complex and optional, something like table.find(value, hash, compare-fn, ....) loses something.. but then that is fairly contrived. In mdadm I have 'force' and 'verbose' flags, then for e.g. create: level, chunk, size, raid_disks, spare_disks, etc. Using lots of prepositions will quickly run out of steam here. Requiring tags for each field would be boring, as it would look like:
    Create(dev=devname, level=level, raid_disks=raid_disks, ....)
I really should use some little structures to gather stuff. It would be nice if I didn't have to have a common type, if "var = val" would work when they are different types, and become:
    for each field in var: do var.field = val.field
At least this could be allowed for arg passing to functions. Maybe something like "&val" expands to ...
A 'struct' is different from an object - or from a 'packed' layout, for that matter. A struct is simply a collection of distinct values or variables, each with a name and/or a position. A struct does not comprise a type - different structs are simply different things. Structs are compatible based on name or position matching. This is only relevant when a struct is assigned, either directly with '=' or when assigning actual parameters to formal parameters in a function call. In these cases every variable in the target must match a value from the source, either by name or position. The source may have extra values, but it may not be deficient. Multiple structs can be combined by listing them. A struct is really just a short-hand for a few names/values. A struct does not have an address, cannot be stored in an object, and cannot be passed around. A name can be defined as a struct in a local scope and the various fields can be assigned.
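For what it's worth, the usual go answer to the mdadm Create() problem above is an options struct, where the field names serve as the tags and unmentioned fields just take zero values. CreateOpts and its fields are invented here:

    package main

    import "fmt"

    // Hypothetical options struct gathering the Create() arguments.
    type CreateOpts struct {
        Dev        string
        Level      int
        Chunk      int
        RaidDisks  int
        SpareDisks int
    }

    func Create(o CreateOpts) {
        fmt.Printf("creating %s: level %d, %d disks\n", o.Dev, o.Level, o.RaidDisks)
    }

    func main() {
        // Field names act as the keyword tags; Chunk and SpareDisks
        // are simply left at their zero values.
        Create(CreateOpts{Dev: "/dev/md0", Level: 5, RaidDisks: 4})
    }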
The whole struct can be assigned to some other struct value, in which case the source must have matching names or positions. But here is the problem: what if it has both matching names and positions? And what if they don't align?
    (x,y) = (y:1, x:3)
I think that names must take priority. Maybe only use positions if one or the other doesn't have names. If both have names, then they must match. So in:
    a,b = fncall()
a,b doesn't have names, so it is purely positional, while
    (result:a, err:b) = fncall()
requires that fncall returns a result and an err.
So structs are primarily used for function call args and results. Values can be just listed and are then positional. If a struct is in the list, it is interpolated, not a separate object. I guess positional parameters take precedence: in "x, y, foo" where foo is a struct, we assign first from x, second from y, and the rest from foo. If foo has names they are used, else positions.
A type name is given a struct and creates the object. There might be several different structs that can be used, in which case name matching is important: Imaginary(x:float,y:float) vs Imaginary(mag:float,direction:float). If no labels are given, the first with enough matches wins. Also a typed object can fill in a struct if it has all the right names. Some might be attributes and some might be computed attributes, but true functions are not allowed because there is nowhere for args to come from.
So type names are in the global namespace (unless they are prefixed by a module name). So it must be OK for any name to be in the global namespace; it might be a type, it might just be a function.
Later.... after thinking (and writing) about object use in the kernel, one thing that really is missing from C is the variable namespace. C just has the fields in a struct but cannot have anything else. (The other big thing is a richer type system.) So what if an 'object' (the basic abstraction) was about defining a namespace as much as a set of storage fields? And are these really two different things? Or are they compatible?
So an 'object' creates a namespace. In it are names which can be:
- constant: a number or a string or even a function. The value might be stored in 'text' space or might just be known to the compiler.
- static: the name refers to a globally allocated variable which is known only through this object type. So like a 'class variable' ... which is probably a bad idea, for exactly the same reason that global variables are a bad idea.
- instance/auto: a field that gets included in each instance
- method: a function that gets stored in a vtable. The innermost object that has methods owns the vtable. Each object with any methods defines a vtable structure which embeds the vtable of the inner object. So there can only be one - multiple inheritance doesn't work. But an object can still store function pointers..
But maybe multiple inheritance is fine once we understand inheritance properly. Typically we embed a parent in a child and say that the behaviours of the parent apply to the child through the parent. So if you want to do X to the child, and X can be done to the parent, then you do it to the parent. But you might have adjusted that parent by giving it a vtable that understands the child. So a second parent just means a second vtable pointer in there somewhere. Of course if both parents can respond to X, then the child needs to explicitly disambiguate. Which leads to over-rides and name resolution.
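Go behaves just this way with two embedded parents: the embedding is legal, but an ambiguous selector will not compile until the child disambiguates or over-rides. A sketch with invented types:

    package main

    import "fmt"

    type Reader struct{}

    func (Reader) Close() { fmt.Println("reader closed") }

    type Writer struct{}

    func (Writer) Close() { fmt.Println("writer closed") }

    // Two parents that both respond to Close.  Without the over-ride
    // below, f.Close() would be rejected as an ambiguous selector.
    type File struct {
        Reader
        Writer
    }

    // The explicit over-ride resolves the ambiguity.
    func (f File) Close() {
        f.Reader.Close()
        f.Writer.Close()
    }

    func main() {
        File{}.Close()
    }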
If I include a parent as 'p', then everything in it is available as 'p.thing'. We might want some or all things to be available as 'thing' directly, or maybe as 'pthing' - i.e. with a different name. So this is just standard importing. We can import everything, or specific things with renaming. Do we want 'import if it doesn't cause a conflict'?? That is probably reasonable. So: import all, import none, import if unique, import X (as Y).
    struct foo bar {X, y as z, *unique}
Then X == bar.X, z == bar.y, and $1 == bar.$1 if it is unique.
When a method is declared, its vtable location must be identified. It can be:
- inline, so the method is just a function pointer
- attached, so a vtable pointer is in the structure
- attached through some parent
- delegated to some referenced object.

*parallelism

I've never really given this a lot of serious thought. In the kernel we simply use locks for protection, and a fairly heavy-weight mechanism for creating threads: fork/wait. 'go' has 'go statements' to run those statements in parallel, and uses channels (blocking or buffered) for synchronisation. They assume the library might provide locking primitives.
I guess channels as a builtin rather than a library function might allow some useful optimisations. You would think locks would too. I feel that the language should not preferentially define any synchronisation mechanisms. Locks, queues, signals etc. could all be provided by the library, and if necessary the compiler could 'know' about some and allow optimisations, just like the gcc C compiler 'knows' about strcpy from the library.
So the language just needs a way to start a new thread. If the termination of the thread is important, it can set some flag or whatever when it ends. It might be nice to be able to abort a thread - kill it. Rather than defining a naming system, we could allow a thread to be given a handle of some sort which it passes to the runtime to say "exit when this flag is set". This is a bit like a signal handler, which I decided was a bad idea... No, I think the 'thread' creation function should return a handle which can be killed. So:
    handle := do { commands }
    handle.kill()
    handle.running()
etc. This creates a closure that copies all visible local variables, but shares references to allocated and global variables.
Concurrency management is very easy to get wrong. Sometimes it is nice to be able to use low-level primitives like RCU and write barriers and spinlocks, but in many cases your needs are not so fine-grained and you want the language to do it for you. This could in part be through data structures that do the synchronisation for you (channels, bags, refcounts). However it is probably good to have some way to tag something to say that it must run locked.
Locking is something that must be optional. A structure needs locking when shared, but if it is always used in a locked context, then the locking is a waste. And while it may only be setting a bit, it has to be a bus-locked test-and-set, which is definitely slower than not doing it. If we had 2 bits for each lock, one could say if locking is active and the other if the lock is held. (There would be a completely separate wait-queue, found through a hash on the lock address.) If either bit is set, enter the spinlock protocol. So when allocating an object we enable locking or not, depending on something.
If an object is lockable, then certain fields should only be accessed while the lock is held. So they get to be 'private' or 'hidden'. The granularity of language-imposed locking is probably object and method, i.e. a method takes a lock on an object. Presumably this is what the 'synchronised' type attribute means.
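That object-and-method granularity is easy to sketch with today's go library locks - sync.Mutex standing in for the proposed per-object lock bits, Counter invented for illustration:

    package main

    import (
        "fmt"
        "sync"
    )

    // Every public method takes the object's lock, and the fields it
    // protects stay unexported - the 'private while lockable' rule.
    type Counter struct {
        mu sync.Mutex
        n  int // only touched with mu held
    }

    func (c *Counter) Inc() {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.n++
    }

    func (c *Counter) Value() int {
        c.mu.Lock()
        defer c.mu.Unlock()
        return c.n
    }

    func main() {
        var c Counter
        c.Inc()
        fmt.Println(c.Value())
    }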
*introspection or reflection...

In 'go' this allows the actual type of an object to be explored to some extent - the name and type etc. of each field of a struct can be extracted. I suspect that is used. I don't really like the typecase that it uses though. Maybe I need to understand how unions should work...

*Memory management

Garbage collection seems to be popular. I hate it. It makes this stuff "just work", which is good, but it feels very untidy. Also there is no guarantee when it happens, so destructors which free other resources like fds aren't always safe. Reference counting is a real overhead, particularly on small objects, but talloc has good results... One problem with refcounting is that you need it to be atomic in a multithreaded environment. But then GC would need to stop everything in multithread too. A program which was known to not run for long could be compiled with refcounting disabled and just never free anything... though that is bad for destructors too. Probably we should allow manual management with no refcounting, and 'once' tagging on pointer types so the compiler can check that no refs are taken, rather than counting them all.
If we had really good reference counting, then locks could make use of it too. A 'lock' function returns a 'locked' object, and when that object is released the lock is dropped by the destructor. So just going out-of-scope is enough to unlock, but
    var = get_lock(lock)
    do stuff
    var = None
would allow it to be explicit. This also allows lock ownership to be passed around. If a function type declares that it absorbs a reference, then it is thereby allowed to unlock something that other code locked.
Possibly the most interesting part of managing refcounts is determining which references in heap objects are counted, as references in the stack should be fairly easy to track. e.g. a doubly-linked list won't want two counts - the 'back' link should be ignored. And possibly it won't want to count at all: if it is e.g. in a hash table we might not want to count at all, and when the refcount becomes zero just remove it from the list. So, a reference can be:
- counted: meaning it contributes to the refcount
- clearable: meaning that when the object is deleted, the reference can be found and destroyed, as with the back link in a doubly-linked list
- dependant: meaning that it depends on another reference, and will only be held while that other reference exists.
Getting the compiler to check these is probably too hard. Dependant references might be do-able if we had a lock on the holder of the prime link.. but that only makes sense for transient dependant references. Checking that the destructor actually destroys all clearable references is asking too much. Maybe the 'clearable' declaration could include a label for the code in the destructor where the 'clear' happens - or vice-versa, i.e. annotate each clearing with the clearable field that it clears. The compiler could track movement of references and insert 'get' and 'put' where absolutely necessary, based on annotations. So the programmer could still make mistakes, but they are at a higher level.
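A minimal counted reference, to make the atomicity overhead concrete - Ref and its destroy callback are invented, the callback standing in for the destructor:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // Get and Put must be atomic in a multithreaded program, which is
    // exactly the refcounting overhead complained about above.
    type Ref struct {
        count int64
    }

    func (r *Ref) Get() { atomic.AddInt64(&r.count, 1) }

    func (r *Ref) Put(destroy func()) {
        if atomic.AddInt64(&r.count, -1) == 0 {
            destroy() // plays the destructor when the count hits 0
        }
    }

    func main() {
        r := &Ref{count: 1}
        r.Get()
        r.Put(func() { fmt.Println("destroyed") }) // count 2 -> 1
        r.Put(func() { fmt.Println("destroyed") }) // 1 -> 0: prints
    }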
*Higher order functions and first class functions

Functions are first class if they can be passed around like other values, and of course called - given args and allowed to return a value. Higher-order functions take functions as arguments, such as 'map', which takes a function and a list and applies the function to each member of the list.
Then there is currying, where you pass just one arg to a function and it returns a function which can be applied to the second arg. The type of 'curry' would be rather subtle:
    curry : function(x,..y)->z, x -> function(..y)->z
Then there are closures. A closure is a function completed with some bound variables - just like an object. i.e. a call frame *is* an object, and any section of code in the frame can be given a name and can then be called from outside the frame. After parsing the code in a function we can determine if there are any code blocks that might be exported, and determine which variables they use. These get placed in an object allocated on the heap. All other variables remain on the stack. i.e. we create an object to contain the closed-over variables.
But we don't pass the object around ... we pass the function around. That is a little odd. Unless a function *is* an object. But what does that mean? It doesn't obviously have state, and it only has one behaviour. i.e. referencing a label in it makes no sense; only calling it does. So that is its one behaviour - callable. So a "regular" object should be callable too? Why not? So foo.bar is an object which curries foo:
    x = foo.bar
    x(hello)
is the same as foo.bar(hello). What is 'x'? A pointer to an object and a function. It is foo with the interface narrowed down to a single function. The type doesn't have a name, though the interface might.
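Both ideas exist in go today: currying is just a closure over the first argument (written here with generics, which post-date this note), and a "method value" is literally "foo.bar curries foo":

    package main

    import (
        "fmt"
        "strings"
    )

    // The subtle curry type from above, for the two-argument case:
    // function(x,y)->z, x -> function(y)->z
    func curry[X, Y, Z any](f func(X, Y) Z, x X) func(Y) Z {
        return func(y Y) Z { return f(x, y) }
    }

    func main() {
        add := func(a, b int) int { return a + b }
        inc := curry(add, 1)
        fmt.Println(inc(41)) // 42

        // A method value: foo with the interface narrowed to one function.
        foo := strings.NewReplacer("l", "L")
        x := foo.Replace
        fmt.Println(x("hello")) // same as foo.Replace("hello"): heLLo
    }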