2 Over a decade later I am thinking about language design again.
4 It is probably the release of GCC 4.6.0 mention on lwn.net and the
5 ensuing discussion which touched on 'go' from google. Particularly
6 see http://www.cowlark.com/2009-11-15-go/
8 So many thoughts swimming around, it is hard to know where to start...
13 parameterised interfaces for subtyping, but no inheritance
16 type coersion is bad - it can hide errors too easily.
17 But explicit casting can get very ugly.
20 A value - literal or named, or an operation that must produce
21 a precisely correct results - can be coersed into a value of
22 greater range or precision automatically.
23 A value - the result of an operation that might overflow -
24 cannot be coersed at all.
25 A cast must be used to restrict range or precision
26 Thus "a = b + c" cannot lose anything.
27 If b and c are the same size but differ in sign, neither can be
28 coersed into the other. The could both be coersed into a
29 larger signed value, but the 'a' would have to encompass that
30 value or an error would result.
31 So if b is s32 and c is u32, the addition would be s64 so a
32 could be 's64' or bigger.
35 If b and c have the same type, overflow can still occur but is
36 expected. In that case it can only be assigned to a variable
37 of the same size. So if b and c are s32, 'a' must also be
38 s32. If 'a' is s64, then one of b and c must first be cast to
39 's64' before the addition.
42 tags/modifiers/whatever:
44 - 'nullable' 'nonullable' on pointers default is later
45 - 'once' for reference counting
46 - function args can indicate if function absorbs a reference
47 an return values might provide a reference?
48 - 'signed' ?? Can I assign 'signed' to 'unsigned' with no check?
51 immutable and sharable types...
52 Numbers and strings and bool are immutable. User defined are
53 normally sharable, but can be immutable or 'once'.
54 .. something to allow an object to be embedded in another...
57 'types' describe particular implementations. Each object is
58 of a particular type. This included int, bool, etc.
59 An object is created using the 'type' function against a set
60 of values which might be manifest constants or might be other
62 Outside the module which defines a type, the internals of the
63 object are not visible except through the interfaces..
65 'interfaces' describe the behaviour of an object.
66 It identifies attributes and functions which the object
68 Attributes maybe be read-only or re-write but whether they are
69 fields or managed attributes is not externally visible
70 .... except that would make implementation inefficient for the
72 Probably field attributes need to be directly readable, but
73 writing should always be handled (though a NULL pointer might
75 Of course if we did incremental compilation we could detect
76 direct-access fields and optimise them(??).
78 An interface is a function name with input and output types.
79 Then these can be gathered in structs - which unroll of
81 So an interface can be defined as a list of functions and
83 The items listed in the input and output structs are names
84 paired with either an interface or a type.
86 The function can also be an operator, or get_indexed,
87 put_indexed - which handle name[index] references.
90 interface matching is strict inheritance matching.
91 i.e. if two interface both define 'foo' they don't
92 match. Rather they must both inherit the *same* foo from
95 An object implements multiple interfaces, possibly with code
98 virtual functions.. or templates or something... can be
99 written for an interface. The function can then be called on
100 any object that implements the interface. The function uses
101 other functions of the interface to implement the result.
102 So an "array of comparable" might implement a qsort function
103 What about a mergesort function on a list? What is the list?
106 Yes.. what is a list?
107 How can we implement linux/list.h lists in a fully type-safe
109 Possibly we cannot without tagging the list_head with a type
110 indicator and requiring a check, which would be boring and
112 I think that the only times we do list_entry, we know where
113 the head of the list is, and it isn't here.
115 So the type of a list_head must include the identity of the
117 next is a pointer to a listhead of this type embedded in a foo, or at X
119 When insert-after or delete we don't need to care about X.
120 In fact we only care when walking the list and calling
122 We could tag a 'head' with a low bit in one of the addresses.
124 So everything on the list is either embedded in a foo, or
125 is a load-list-head which is tagged.
127 If we allow "if X != Y" to change the type of X to exclude Y,
128 as we might with "if X != NULL" we might be able to do
130 But I suspect not. The assertions we are making are simply to
131 rich to make in a comprehensible type system.
132 So we probably want the 'list' module to be able to make
133 'unsafe' casts and assert that they are safe.
136 What about parameterised types etc.
137 A 'max' or 'min' function can work on any two items that are
138 mutually ordered, but we don't care what particular type it
139 is. So the type is an unknown:
140 max : (a,b:X < TotalOrdered) -> X
142 When we call 'max' we don't tell it what type X is, it has to
143 deduce from the passed args. So that is an implicit
147 to create an explicitly typed 'intmax' function(??)
149 Containers are the classic case for parameterised types:
154 this is an interface which could be provided by a type with an
155 embedded linkage field, or by an expandable array
159 ---------------------
160 So, some concretish thoughts.
161 We have two distinct things - interfaces and types.
162 An interface defines a set of function names and their types.
163 An interface can include another interface and may broaden the
164 types args or narrow the types of return values.
165 Broadening may remove args. Narrowing may add extra return
168 A type may declare that it conforms to one or more interfaces,
169 and by implication any included interface. There must be a
170 strict connection. Just defining functions with the same
171 names it not sufficient - the interface must be declared and
172 the compiler will check it.
173 If a type conforms to two interfaces which both have functions
174 with the same name, the type should provide two functions and
175 will need to indicate which serves which interface.
177 A type includes a data structure and a set of functions and
178 provides a namespace to enclose them. The structure may
179 include objects of other types either by reference or more
180 directly by 'embedding', sometimes called inheritance.
181 The function of those types are available and may be used to
182 implement a shared interface, either explicitly by declaring
183 an interface function to implement by an imported function, or
184 implicitly by marking a member object as a 'parent' - though a
185 better name is needed.
187 When the member's function is called a reference to the
188 contained object is passed together with a set of function
189 pointers that call into the relevant interface functions for
192 A 'set of functions' is a data structure which starts with a
193 function pointer - the dispatching function. Args are set up
194 with the identity of the called function first, the target
195 object next, then any args. In the common case remainder of
196 the data structure is a sorted list of method identities and
197 function pointers. The dispatching function does a binary
198 search to find the target, possibly subtracts an offset from
199 the target object, and jumps into the function.
201 But it would be best if not all functions had to go through
202 this lookup. Of course the caller could cache the found
203 function and not look it up again if it is called several
204 times. But that is only part of the solution.
206 The standard approach is to declare functions as 'final' if we
207 know they will not be over-ridden. In our case that means
208 that calls in the parent which expect the parent will never
209 call the child's function.
210 Possibly this should be the default to encourage efficiency.
211 If a function is marked 'virtual', then any call to it must go
212 through a lookup table. If it isn't, then it doesn't have to.
213 This marking is in the .... interface?
215 When an external caller knows the type of an object, normal function
216 calls are direct, virtual function calls are dispatched.
217 When a caller only knows an interface of an object, all calls
219 So it is the type which determines whether functions are
222 Do we ever want to be able to make function calls based on an
223 interface? i.e. could some feature of an interface be final.
224 This could make sense for a template like function.
226 An interface 'sortable' requires an array with comparison and
227 defines qsort. But why do that? Why not just define qsort
228 as a function. I guess that is the thing. A final interface
229 method is really just a function, possibly in some namespace.
230 So an interface defines methods or 'static' functions.
231 A type defines functions which can be 'virtual'ised.
234 An interface can be parameterised so that some types in args
235 or return are taken from the parameters.
236 interface comparable(base) {
237 int lesseq(this:base, that:base);
239 So numbers conform to comparable(number) and strings to
241 int conforms to comparable(number)
243 interface arrayof(base) {
244 base get_elem(this, index:int)
245 void set_elem(this, index:int, elem:base)
248 A function can be written to a subtype of an interface.
249 function base max(a,b:base) where base:comparable
250 with X:object, base:comparable(X)
251 function base median(a: array of base)
255 I want to think about closures - a function with some 'local'
256 variables already set... a lot like an object, but with one
258 This is created if you can take the address of a chunk of code
259 in a function - it holds on to the context as you export it.
260 I would either need to copy the stack frame, or have stack
261 frames on the heap which doesn't sound efficient.
262 copying the frame would be difficult if you had addresses of
263 things, but maybe you wouldn't. Or maybe you only allocate a
264 frame on the heap if it can possibly be part of a closure.
266 *statements, expressions, operators
268 Pascal made a strong distinction between statements and
269 expressions. This seemed elegant but turned out to be somewhat
270 clumsy. It created a strong distinction between procedures and
271 function which seem unnecessary.
272 It is sometime nice to have expressions with side-effects such as
273 'variable++' but this can cause problems with macro and
274 order-of-execution. So don't have macros! and don't allow
275 side-effects to affect other values in the same expression....
276 Some languages have a 'where' modifier to an expression or
277 statement which can contain arbitrary code to set up the context
278 for the expression. Typically is defines a bunch of names that
279 are used in the expression. It doesn't seem to have caught on and
282 where { foo; bar; baz; finished := whatever }
285 I sort-of like 'after' instead of 'where' as it doesn't imply that
286 the body has to explicitly affect the function.
287 C allows {( for; bar; baz; not whatever )} to achieve a similar idea,
288 but placing the 'whatever' at the end hides it a bit. With the
289 expression at the start it is more clear where we are going.
292 initialise; while condition do increment after { body}
293 which is kind-of interesting, though having the initialise out the
294 front is a little bit horrible.
295 Pascal's 'with' (which never quite was useful) could allow:
296 with initialisation while condition do increment after {body}
297 if condition after {context}
300 I wonder what case/switch should look like;
301 I probably want implied 'break'. In fact the 'go' switch is quite
303 switch expression { case expression: statements ; case
304 expression: statements; }
305 Part of me wants those statements inside { } so they are clearly
306 bracketed - but that might be ugly:
308 case expr { code code }
309 case expr { code code }
311 The word 'case' becomes pointless...
312 I think I want 'continue' rather than 'fall-though' meaning 'find
313 the next case that matches. I'm still not happy with the switch
316 Operators... I'm not sure what to say about those. Having
317 unbounded overloading seems like a bad idea, having none is too
318 restrictive. I think there should be a defined set of operators
319 with arity and precedence fixed. Possibly precedence should be
320 undefined when different definitions of operators are used to that
321 parentheses are needed.
323 i.e. a + b * c binds OK when they are all in the same type family,
324 but is a different type to b and c, then it must be a + (b * c)
325 I'm not really sure what that means - need to know more about how
326 operators are overloaded.
328 Every operator which is infix could also be prefix.
329 Post fix.. less clear:
331 Either op1 is postfix and op2 is infix, or
332 op1 is infix and op2 is prefix.
333 Best to exclude postfix ops from being infix.
335 Hmm.. there is more i want to think about here.
336 I would be cool if function calls could use a richer syntax to
337 separate args than just comma.
339 append $value to $list
340 distance from $x1,$y1 to $x2,$y2
341 distance from $x1,$y1 to $x2,$y2 via $x3,$y3
343 There are lots of questions here...
344 As the function words are not predefined we need them to either
345 be kept distinct from local names, or be syntacticly
346 distinguishable. $name works in the above, but not in real life.
347 We probably want the function names to belong to a type, but the
348 type isn't apparent until later if at all.
349 In the 'append' case the key type is at the very end.
350 In the distance case there is no key type.
352 distance from point(x1,y1) to point(x2,y2)
353 though a name of type 'point' could also be used
354 If I defined a variable 'distance' with a postfix operator 'from'
355 it could get awfully confusing.
356 Disallowing variables named for function starters might be too
357 restrictive as you probably cannot control which function names
359 The alternate is for variables to over-ride functions... which is
362 So each word introduces a temporary name space which over-rides
364 'distance' introduces 'from', 'to', 'via'
365 But a variable called 'distance' hides all of that.
366 If the name is followed immediately (no space) by a parenthesis,
367 then the new namespace only exists inside the parenthetic space.
368 That makes "point(x,y), z" different from "point (x,y), z". That
370 We could go lisp-like: (point x,y)
371 (distance from here to there via elsewhere)
372 cf. distance(here, there, elsewhere)
374 Not sure - the traditional version allows the functional bits to
375 stand out, but doesn't make their relationship clear.
377 number divided by number ???
384 Lots of little questions here...
385 1/ are parentheses just to over-rule precedence? or do they
386 have a real meaning? i.e. what is the function call syntax
388 print a,b,c -- that works
389 me.append a -- looks a bit odd.
391 a,b,c -- clearly a list
393 :a:b:c -- different syntax,so lists can be nested
394 :a:b,c:d -- first and third are not lists.
395 :a -- list of one element
396 :(:a):b -- a list of a list of one element, and 'b'
397 nil -- list with zero elements - empty list
398 a,b,c, -- trailing , is allowed and sometimes assumed.
399 So a function takes a single argument, which might be a
400 list... But how then do you call a function which takes no
403 me.tostring -- Is that a function or a function call?
404 me.tostring nil -- ugly
405 me.tostring() -- but now parentheses mean something.
407 Maybe getting the function (or closure) requires different
412 2/ when are {} required to group commands? Do we separate commands?
413 I don't like too many {} - so only require them to
415 Don't separate anything. Separators are ugly. We
416 terminate this with ',' or ';' when syntactially necessary
419 String constants and char constants are not syntactically
420 different. The type of a constant is determined from context.
422 They can be enclosed in "" or ''. In either case the terminal
423 char can never appear in the string, not even quoted. This
424 makes parsing simpler.
425 \escapes are recognised in "" but not in ''. \q is a quote.
427 Adjacent string constants are catenated so: "'" '"' is a
428 string of the two different quote characters.
430 A multi-line string is introduced by """. This must be the
431 first and last thing on the line after initial blanks.
432 Every subsequent line until another identical line must
433 start with the same sequence of initial blanks. All the lines
434 between are joined with the blank sequence removed.
436 The big question is: how are internal strings stored.
437 UTF-8, UTF-16, and UTF-32 are obvious candidates.
438 UTF-8 is best for European languages
439 UTF-16 is better for many Asian languages
440 UTF-32 is fairly pointless for general storage.
442 So we probably have two internal formats with auto-conversion
443 like for numbers. A pragma determines how constants are stored??
445 Or better, we allow a string to be either UTF-8 or UTF-16 and
446 self-describing, like any good object.
447 Each byte array is prefixed with a count that contains a
448 bit flag which distinguished between UTF-8 on a byte count, or
449 UTF-16 and a word count.
450 The bit is the lsb and is zeroed before using the count - so
451 the count must be even. If there are an odd number of
452 bytes/words it is simply padded.... Or maybe it is set to 1 so
453 that with the count it is closer to a round number of bytes
459 *Operator Overloading.
461 Operators in general...
462 It would be nice if modules could define operators instead of
463 just functions. Then 'int' could be just a module.
465 But operators affect parsing at a fairly deep level, both by
466 precedence and fix-order.
468 If two modules both import the same operator with different
469 precedence, that would be bad. But ditto with global names.
471 Anyway - can an operator be infix and prefix and postfix?
472 C only has ++ and --. .name ->name [] () are also sort-of
473 For prefix: & * + - ~ ! ++ -- cast
475 If we expect a term and see an op, it must be prefix.
476 If we expect and 'end' and see an op, it could be infix or
478 If after an operator we see another operator, the could be
479 infix prefix or postfix infix
480 So op cannot be both infix and postfix.
482 Operators like 'and' and 'or' and 'not' and 'or else' and 'and
483 then' really need to be predefined names or parsing gets
486 Probably assert that all 'other' operators bind more tightly
487 than pre-defined ops, are left->right, equal in precedence
488 and are infix or prefix but never postfix... or at least when
489 postfix they cannot be followed by something - just a close.
491 But do we really need new operators? Probably yes. Not many,
492 but sometimes and operator is just soooo much neater.
493 Look at &^ in 'go' - a really good idea I think.
494 <: :> ... I wonder...
497 The issue here is who to determine which instance rules.
498 It is simplest if the type of the first operand must accept
499 the operator. With int/float, the type of 'int' is actually
500 'number' which defines division. int/array would not work as
501 'number' does know about array.
507 *Declarations and assignments
509 Lots of interpreted languages don't require, or even allow,
510 variables to be declared before use. This makes writing quick
511 code easy (is that good?) but also makes typos easy. Checking
512 for unused values and uninitialised name references can help
513 catch a lot of the error but I'm not sure it is generally a
516 While some C variants allow declarations to be mixed with code
517 but that seems to be unpopular and I cannot say that I like it
519 Still - having the declaration close to use first use can be
520 nice. The 'with' statement I suggested earlier could make
522 with name = value do stuff
523 could serve as a declaration of 'name' with exactly the type
525 with int name = value do stuff
526 is not much harder so should be preferred.
530 would be a good syntax to declare 'name'.
532 would disambiguate if needed
533 The same could be used in type matching for function.
535 means X is a type variable determined from the args, and
536 becomes the type of the result.
538 I wonder if it would be nice to allow
540 to initialise all 3. It would be the same as
545 Do we want multiple names per type? How?
551 How, syntactically, do we give a type to a variable.
554 which doesn't work very well with initialisation, though it
560 such as *name or name[num]
561 which means some types don't have a nice closed name.
562 in Pascal "array [0..36] of int" becomes "int _[36]" ??
564 I think types should have simple closed names so they can
565 be passed around, particularly to parameterise interfaces.
566 Also "int* a, b" looks like and b are both int pointers, but
567 they aren't. "int *a=b; *a=b;" do two very different things.
568 "array of" is very verbose.
570 int* reference to an int
571 int& formal parameter which takes the address of the passed
572 value?? - no, maybe not.
573 [int,err] union of two possibilities?
577 Pointers are good. we like pointers.
578 Pointer arithmetic is bad. It allows very silly things.
579 It also allows cool things (container_of), but the language
580 should provide those.
582 Pointers can be cast to integers - that can help with
583 arbitrary ordering or equality checks (but what about memory
584 compaction?). But integers can never be cast back to pointers.
586 Array slices are a good idea. It requires passing a size
587 around with a pointer - though that could be optimised out
588 in some cases, and would be needed anyway in others.
589 If strings are array slices then we don't need nul
590 termination, but have the cost of a length... Probably not a
591 big cost and gets right of some horrible cut/paste issues with
592 null termination. We can pass a substring without copying or
595 Strings are very special. They are not arrays of characters.
596 They cannot be indexed but can be iterated. Each element is a
597 unicode point, which might have an ASCII value..
598 Strings are not mutable.
600 Byte arrays are normal arrays and are mutable. You might be
601 able to convert a string to a byte array but in general you
605 The language has values and references. 'references' are ways
606 to refer to a value, often a name (i.e. a variable) or a field
607 in an object, or a slot in an array.
609 Values can be mutable or immutable. The reference to an
610 immutable object might be a copy rather than a pointer.
611 Values have a lifetime. When the lifetime ends the value is
612 destroyed, which might a destructor function.
613 Lifetime can be determined by:
614 - reference counting - when it hits 0 the value is destroyed
615 - single-reference. There is only one reference and when it
616 is destroyed, the value is destroyed
617 - garbage collection. References can be found in memory and
618 where there are no more, the value is destroyed.
620 A reference-counted value must container a counter.
621 A collected value must contain a single bit usable in mark/sweep.
625 *error returns and exception handlers
627 Returning and handling errors is the bane of any programmers
628 life. Exception handling is essential but feels heavy weight and
629 clumsy. It doesn't have to.
630 I like the "set -e" functionality of the Bourne shell: If
631 anything returns an error status that isn't used, then execution
634 As similar concept would be that any function can return a union
635 of the normal return value, or an error. If the caller only
636 accepts the normal return value, the error is raise up the stack.
637 If the caller accepts the error: then it does whatever it likes.
639 answer, err := funciton_call(args)
641 do whatever, answer is undefined.
645 Obviously the function must be declared as possibly returning an
646 error. We could make the error type a part of the language, or
647 we could just generically allow unions to be returned. This
648 would allow structured error objects that are specific to the
651 Exploring 'go' suggests that there could be more than just error
652 returns to consider. This is also timeouts.
655 will wait for an answer while
656 answer, ok := function
657 will return immediately, either with an answer or an error.
658 Does that make sense? Should the calling context determine if a
659 delay is appropriate or not? That really sounds like a different
661 This is very different to the 'err' return. The function always
662 throws an error if that it what it has to do, the call sight
663 either catches it or lets of flow on up the stack.
664 An implementation might place a global setjmp which is like
665 passing an arg down, but it is a thread-global arg.
666 So 'catch error' is a stacked thread-global value.
667 Could not 'timeout' be similar? Or probably 'nodelay' as
668 timeouts would be implemented by killing of a thread that was no
669 longer interesting...
670 No - nodelay is too global. Some delays are small, some are
671 large. We cannot put them all in the one basket.
672 It makes more sense for the channel to be marked for delays or
673 not, or rather a channel endpoint.
674 Or a channel could normally block, but channel.ndelay could be a
675 channel that threw an error instead.
677 A problem with this approach is syntax.
678 It requires a 'catch' to be written like:
680 lots of function code here;
685 Which is much worse than
686 after_scope{handle error}
690 if error do handle error
694 name space control - who owns namespaces and how are they made
697 Obviously any block - or at least any 'with' statement -
698 introduces a namespace which over-rides existing names....
699 or should conflicts be error? That is safer.
701 A structure variable holds a namespace with local
702 interpretation. This is traditionally introduced by
703 {value of the type}.{name in local namespace}
705 It isn't just structures, any type, even the humble 'int'
706 could have names attached. "$int.abs" might find the absolute
707 value. But would be nice if 'abs' were more visible
709 as that extends to e.g. pop value off stack, push value on stack
710 Is that really better than stack.push(value); value = stack.pop;
711 Maybe not, but as the args are more complex and optional,
713 table.find(value, hash, compare-fn, ....)
714 loses something.. but then that is fairly contrived.
715 In mdadm I have 'force', 'verbose' flags, then for
716 e.g. create, level, chunk, size, raid_disks, spare_disks,
718 Using lots of prepositions will quickly run out of steam here.
719 Requiring tags for each fields would be boring as it would
721 Create(dev=devname, level=level, raid_disks=raid_disks, ....)
722 I really should use some little structures to gather stuff.
723 It would be nice if I didn't have to have a common type.
726 would work when they are different types, and become:
727 for each field in var:
728 do var.field = val.field
729 At least this could be allowed for arg passing to function.
730 Maybe something like "&val" expands to
733 A 'struct' is different from an object .. or a 'packed'
734 layout for that matter.
736 A struct is simply a collection of distinct values or
737 variables, each with a name and/or a position.
738 A struct does not comprise a type - different structs are simply
739 different things. structs are compatible based on name or
741 This is only relevant when a struct is assigned, either
742 directly with '=' or when assigning actual parameter to formal
743 parameters in a function call.
744 In these cases every variable in the target must match a value
745 from the source, either by name or position. The source may
746 have extra values, but it may not be deficient.
747 Multiple structs can be combined by listing them.
749 A struct is really just a short-hand for a few names/values.
750 A struct does not have an address, cannot be stored in an
751 object, or passed around.
753 A name can be defined as a struct in a local scope and the
754 various fields can be assigned. The whole struct can be
755 assigned to some other struct value in which case the source
756 must have matching names or positions.
757 But here is the problem: what if it has both matching names
758 and positions? and what if they don't align?
760 I think that names must take priority. Maybe only use
761 positions if one or the other doesn't have names. If both
762 have names, then they must match.
765 a,b doesn't have names, so it is purely positional.
766 (result:a, err:b) = fncall()
767 requires that fncall returns a result and an err.
769 So structs are primarily used for function call args and
771 Values can be just listed and are then positional.
772 If a struct is in the list, it is interpolated, not a separate
775 I guess positional parameters take precedence..
777 where foo is a struct, assigns first from x, second from y,
778 and the rest from foo. If foo has names they are used, else
781 A type name is given a struct and creates the object. There
782 might be several different structs than can be used in which
783 case name matching is important.
784 Imaginary(x:float,y:float) vs Imaginary(mag:float,direction:float)
785 If no labels are given, the first with enough matches wins.
787 Also a typed object can fill in a struct if it has all the
788 right names. Some might be attributes and some might be
789 computed attributes, but true functions are not allowed
790 because there is nowhere for args to come from.
792 So type name are in the global namespace (unless they are
793 prefixed by a module name).
794 So it must be OK for any name to be in the global namespace,
795 it might be a type, it might just be a function.
799 after thinking (and writing) about object us in the kernel one
800 thing that really is missing from C is the variable namespace.
801 C just has the fields in a struct but cannot have anything
802 else. (The other big thing is a richer type system).
803 So what if an 'object' (the basic abstraction) was about
804 define a name space as much as a set of storage fields.
805 And are these really two different things? or are they
808 So an 'object' creates a name space. In it are names which
810 - constant: a number or a string or even a function. The
811 value might be stored in 'text' space or might just be
812 known to the compiler.
813 - static: The name refers to a globally allocated variable
814 which is known only through this object type.
815 So like a 'class variable' ... which is probably a bad
816 idea for exactly the same reason that global variables are
818 - instance/auto: field that gets included each instance
819 - method: This is a function that gets stored in a vtable.
820 The innermost object that has methods owns the vtable.
821 Each object with any methods devices a vtable structure
822 which embeds that vtable of the inner object.
823 So there can only be one - multi-inheritance doesn't work.
824 But an object can still store function pointers..
826 But maybe multi-inheritance is fine once we understand
827 inheritance properly.
828 Typically we embed a parent in a child and say that the
829 behaviours of the parent applied to the child through the
831 So if you want to do X to the child, and X can be done to the
832 parent, then you do it to the parent. But you might have
833 adjusted that parent through giving it a vtable that
834 understands the child.
835 So a second parent just means a second vtable pointer in there
836 somewhere. Of course is both parents can respond to X, then
837 the child needs to explicity disambiguate.
839 Which leads to over-rides and name resolution.
840 If I include a parent as 'p' then everything in it is
841 available as 'p.thing'. We might some or all things to be
842 available as 'thing' directly, or maybe 'pthing' - i.e. with a
844 So this is just standard importing. We can import everything,
845 or specific things with renaming.
846 Do we want 'import if it doesn't cause a conflict'?? That is
847 probably reasonable. So:
848 import all, import none, import if unique, import X (as Y).
850 struct foo bar {X, y as z, *unique}
852 Then X == bar.X, z == bar.y, $1 = bar.$1 if it is unique.
854 When a method is declared, its vtable location must be
857 - inline, so the method is just a function pointer
858 - attached, so a vtable pointer is in the structure
859 - attached through some parent
860 - delegated to some referenced object.
866 I've never really given this a lot of serious thoughts.
867 In the kernel, we simply use locks for protection and a fairly
868 heavy-weight mechanism for creation threads. fork/wait.
869 'go' has 'go statements' to run those statements in parallel,
870 and uses channels (blocking or buffered) for synchronisation.
871 They assume the library might provide locking primitives.
872 I guess channels as a builtin rather than a library function
873 might allow some useful optimisations. You would thinks locks
876 I feel that the language should not preferentially define any
877 synchronisation mechanisms. Locks, queues, signals etc could
878 all be provided by the library, and if necessary the compiler
879 could 'know' about some and allow optimisations just like the
880 gcc C compiler 'knows' about strcpy from the library.
882 So the language just needs a way to start a new thread. If
883 the termination of the thread is important, it can set some
884 flag or whatever when it ends. It might be nice to be able to
885 abort a thread - kill it. Rather than defining a naming
886 system we could allow a thread to be given a handle of some
887 sort which it passes to the runtime say "exit when this flag
888 is set". This is a bit like a signal handler which I decided
890 No, I think the 'thread' creation function should return a
891 handle which can be killed.
894 handle := do { commands }
899 This creates a closure that copies all visible local variable,
900 but shares references to allocated and global variables.
902 concurrency management is very easy to get wrong.
903 Sometimes it is nice to be able to use lowlevel primitives
904 like RCU and write barriers and spinlock, but in many cases
905 your needs are so fine grained and you want the language to
907 This could in part be through data structures that do the
908 synch for you (channels, bags, refcounts). However it is
909 probably good to have some way to tag something to say that it
912 Locking is something that must be optional. A structure needs
913 locking when shared, but if it is always used in a locked
914 context, then the locking is a waste. And while it may only
915 be setting a bit, it has to be bus-locked test-and-set which
916 is definitely slower than not doing it.
918 If we had 2 bits for each lock, one could say if locking is
919 active and the other if the lock is held. (There would be a
920 completely separate wait-queue found through a hash on the
921 lock address). If either bit is set, enter spinlock protocol.
923 So when allocating an object we enable locking or not
924 depending on something.
926 If an object is lockable, then certain fields should only be
927 accessed while the lock is held. So they get to be 'private'
930 Granularity of language-imposed locking is probably object and
931 method. i.e. a method takes a lock on an object. Presumably
932 this is what the 'synchronised' type attribute means.
940 In 'go' this allows the actual type of an object to be
941 explored to some extent - the name and type etc of each field
942 of a struct can be extracted. I suspect that is used.
943 I don't really like typecase that it uses though. Maybe
944 I need to understand how unions should work...
948 Garbage collections seems to be popular. I hate it.
949 It makes this stuff "just work" which is good, but it feels
950 very untidy. Also there is no guarantee when it happens
951 so destructors which free other resources like fds aren't
953 Reference counting is a real over head, particularly on small
954 objects but talloc has good results...
955 One problem with refcounting is that you need it to be atomic
956 in a multithread environment. But the GC would need
957 to stop everything in multithread too.
958 A program which was known to not run for long could be compiled
959 with refcounting disabled and just never to free anything...
960 Though that is bad for destructors too.
962 Probably should allow manual management with no refcounting,
963 and 'once' tagging on pointer types so the compiler can check
964 no refs are taken rather than counting them all.
967 If we had really good reference counting, then locks could
968 make use of them too. A 'lock' function returns a 'locked'
969 object and when that object is released the lock is dropped
971 So just going out-of-scope is enough to unlock, but
976 would allow it to be explicit.
977 This also allows lock ownership to be passed around.
978 If a function type declares that it absorbs a reference, then
979 it is thereby allowed to unlock something that other code
982 Possibly the most interesting part of managing refcount is
983 determining which references in heap objects are counted
984 as references in the stack should be fairly easy to track.
985 e.g. a double-linked list won't want two counts - the 'back'
986 link should be ignored.
987 And possible it won't want to count at all if it is e.g. in a
988 hash table we might not want to count at all - when the
989 refcount becomes zero just remove from the list.
991 So: a reference can be:
992 - counted: meaning it contributes to the refcount
993 - clearable: meaning that when the object is deleted, the
994 reference can be found and destroyed was with
995 the bask link in a double-link list
996 - dependant: meaning that it depends on another reference,
997 and will only be held while that other
1000 Getting the compiler to check these is probably too hard.
1001 Dependant references might be do-able if we had a lock on the
1002 holder of the prime link.. but that only makes sense for
1003 transient dependant references.
1004 Checking that the destructor actually destroys all clearable
1005 references is asking too much.
1006 Maybe the 'clearable' declaration could include a label for
1007 the code in the destructor where the 'clear' happens - or
1008 vice-versa. i.e. annotate each clearing with the clearable
1009 field that this clears.
1011 compiler could track movement of references and insert 'get' and
1012 'put' where absolutely necessary based on annotations.
1014 So the programmer could still make mistakes, but they are at a
1017 *Higher order functions and first class functions
1019 Functions are first class if they can be passed around like
1020 other values, and of course called - given args and allowed to
1022 Higher order function take functions as arguments, such as
1023 'map' which takes a function and a list and applies the
1024 function to each member of the list.
1025 Then there is currying where you just pass one arg to a
1026 function and it returns a function which can be applied to the
1027 second arg. The type of 'curry' would be rather subtle:
1028 curry : function(x,..y)->z, x -> function(..y)->z
1030 Then there are closures. This is a function completed with
1031 some bound variables, just like an object.
1033 i.e. a call frame *is* an object and any section of code in
1034 the frame can be given a name and can then be called from out
1035 side the frame. After parsing the code in the function we can
1036 determine if there are any code blocks that might be exported
1037 and determine which variables are in there.
1038 These get placed in an object allocated on the heap. All
1039 other variables remain on the stack.
1040 i.e. we create an object to contain the closed functions.
1041 But we don't pass the object around ... we pass the function
1042 around. That is a little odd.
1043 Unless a function *is* an object. But what does that mean?
1044 It doesn't obviously have state, and it only has one
1045 behaviour. i.e. referencing a label in it makes no sense.
1046 Only calling it does. So that is its one behaviour -
1048 So a "regular" object should be callable too? Why not?
1049 So foo.bar is an object which curries foo.
1054 What is 'x'? a pointer to an object and a function.
1055 It is X with the interface narrowed down to a single function.
1056 The type doesn't have a name, though the interface might.