ocean-lang.org Git - ocean-D/blob - 2011

   1 *Intro
   2 Over a decade later I am thinking about language design again.
   3
   4 It is probably the release of GCC 4.6.0 mention on lwn.net and the
   5 ensuing discussion which touched on 'go' from google.  Particularly
   6 see  http://www.cowlark.com/2009-11-15-go/
   7
   8 So many thoughts swimming around, it is hard to know where to start...
   9
  10
  11 *Type system
  12
  13   parameterised interfaces for subtyping, but no inheritance
  14
  15   numbers
  16     type coersion is bad - it can hide errors too easily.
  17     But explicit casting can get very ugly.
  18
  19     So:
  20        A value - literal or named, or an operation that must produce
  21        a precisely correct results - can be coersed into a value of
  22        greater range or precision automatically.
  23        A value - the result of an operation that might overflow -
  24        cannot be coersed at all.
  25        A cast must be used to restrict range or precision
  26        Thus "a = b + c"  cannot lose anything.
  27        If b and c are the same size but differ in sign, neither can be
  28        coersed into the other.  The could both be coersed into a
  29        larger signed value, but the 'a' would have to encompass that
  30        value or an error would result.
  31        So if b is s32 and c is u32, the addition would be s64 so a
  32        could be 's64' or bigger.
  33
  34
  35        If b and c have the same type, overflow can still occur but is
  36        expected.  In that case it can only be assigned to a variable
  37        of the same size.  So if b and c are s32, 'a' must also be
  38        s32.  If 'a' is s64, then one of b and c must first be cast to
  39        's64' before the addition.
  40
  41
  42    tags/modifiers/whatever:
  43
  44       - 'nullable' 'nonullable' on pointers default is later
  45       - 'once' for reference counting
  46       - function args can indicate if function absorbs a reference
  47         an return values might provide a reference?
  48       - 'signed' ?? Can I assign 'signed' to 'unsigned' with no check?
  49
  50
  51     immutable and sharable types...
  52     Numbers and strings and bool are immutable. User defined are
  53     normally sharable, but can be immutable or 'once'.
  54     .. something to allow an object to be embedded in another...
  55
  56
  57         'types' describe particular implementations.  Each object is
  58         of a particular type.  This included int, bool, etc.
  59         An object is created using the 'type' function against a set
  60         of values which might be manifest constants or might be other
  61         objects.
  62         Outside the module which defines a type, the internals of the
  63         object are not visible except through the interfaces..
  64
  65         'interfaces' describe the behaviour of an object.
  66         It identifies attributes and functions which the object
  67         provides.
  68         Attributes maybe be read-only or re-write but whether they are
  69         fields or managed attributes is not externally visible
  70         .... except that would make implementation inefficient for the
  71         common 'field' case.
  72         Probably field attributes need to be directly readable, but
  73         writing should always be handled (though a NULL pointer might
  74         imply direct update).
  75         Of course if we did incremental compilation we could detect
  76         direct-access fields and optimise them(??).
  77
  78         An interface is a function name with input and output types.
  79         Then these can be gathered in structs - which unroll of
  80         course.
  81         So an interface can be defined as a list of functions and
  82         interfaces.
  83         The items listed in the input and output structs are names
  84         paired with either an interface or a type.
  85
  86         The function can also be an operator, or get_indexed,
  87         put_indexed - which handle name[index] references.
  88
  89
  90         interface matching is strict inheritance matching.
  91         i.e. if two interface both define 'foo' they don't
  92         match. Rather they must both inherit the *same* foo from
  93         somewhere.
  94
  95         An object implements multiple interfaces, possibly with code
  96         in several files...
  97
  98         virtual functions.. or templates or something... can be
  99         written for an interface. The function can then be called on
 100         any object that implements the interface.  The function uses
 101         other functions of the interface to implement the result.
 102         So an "array of comparable" might implement a qsort function
 103         What about a mergesort function on a list?  What is the list?
 104
 105
 106         Yes.. what is a list?
 107         How can we implement linux/list.h lists in a fully type-safe
 108         way?
 109         Possibly we cannot without tagging the list_head with a type
 110         indicator and requiring a check, which would be boring and
 111         pointless.
 112         I think that the only times we do list_entry, we know where
 113         the head of the list is, and it isn't here.
 114
 115         So the type of a list_head must include the identity of the
 116         head of the list.
 117            next is a pointer to a listhead of this type embedded in a foo, or at X
 118
 119         When insert-after or delete we don't need to care about X.
 120         In fact we only care  when walking the list and calling
 121         list_entry.
 122         We could tag a 'head' with a low bit in one of the addresses.
 123         Ugly but effective.
 124         So everything on the list is either embedded in a foo, or
 125         is a load-list-head which is tagged.
 126
 127         If we allow "if X != Y" to change the type of X to exclude Y,
 128         as we might with "if X != NULL" we might be able to do
 129         something...
 130         But I suspect not.  The assertions we are making are simply to
 131         rich to make in a comprehensible type system.
 132         So we probably want the 'list' module to be able to make
 133         'unsafe' casts and assert that they are safe.
 134
 135
 136         What about parameterised types etc.
 137         A 'max' or 'min' function can work on any two items that are
 138         mutually ordered, but we don't care what particular type it
 139         is.  So the type is an unknown:
 140           max : (a,b:X < TotalOrdered) -> X
 141
 142         When we call 'max' we don't tell it what type X is, it has to
 143         deduce from the passed args.  So that is an implicit
 144         parameter.
 145         But we could say
 146                 intmax = max(X=int)
 147         to create an explicitly typed 'intmax' function(??)
 148
 149         Containers are the classic case for parameterised types:
 150           stack(X) :
 151                 push(entry:X);
 152                 pop() -> X
 153                 empty -> bool
 154         this is an interface which could be provided by a type with an
 155         embedded linkage field, or by an expandable array
 156
 157
 158
 159         ---------------------
 160         So, some concretish thoughts.
 161         We have two distinct things - interfaces and types.
 162         An interface defines a set of function names and their types.
 163         An interface can include another interface and may broaden the
 164         types args or narrow the types of return values.
 165         Broadening may remove args. Narrowing may add extra return
 166         values
 167
 168         A type may declare that it conforms to one or more interfaces,
 169         and by implication any included interface.  There must be a
 170         strict connection.  Just defining functions with the same
 171         names it not sufficient - the interface must be declared and
 172         the compiler will check it.
 173         If a type conforms to two interfaces which both have functions
 174         with the same name, the type should provide two functions and
 175         will need to indicate which serves which interface.
 176
 177         A type includes a data structure and a set of functions and
 178         provides a namespace to enclose them.  The structure may
 179         include objects of other types either by reference or more
 180         directly by 'embedding', sometimes called inheritance.
 181         The function of those types are available and may be used to
 182         implement a shared interface, either explicitly by declaring
 183         an interface function to implement by an imported function, or
 184         implicitly by marking a member object as a 'parent' - though a
 185         better name is needed.
 186
 187         When the member's function is called a reference to the
 188         contained object is passed together with a set of function
 189         pointers that call into the relevant interface functions for
 190         main type.
 191
 192         A 'set of functions' is a data structure which starts with a
 193         function pointer - the dispatching function.  Args are set up
 194         with the identity of the called function first, the target
 195         object next, then any args.  In the common case remainder of
 196         the data structure is a sorted list of method identities and
 197         function pointers.  The dispatching function does a binary
 198         search to find the target, possibly subtracts an offset from
 199         the target object, and jumps into the function.
 200
 201         But it would be best if not all functions had to go through
 202         this lookup.  Of course the caller could cache the found
 203         function and not look it up again if it is called several
 204         times.  But that is only part of the solution.
 205
 206         The standard approach is to declare functions as 'final' if we
 207         know they will not be over-ridden.  In our case that means
 208         that calls in the parent which expect the parent will never
 209         call the child's function.
 210         Possibly this should be the default to encourage efficiency.
 211         If a function is marked 'virtual', then any call to it must go
 212         through a lookup table.  If it isn't, then it doesn't have to.
 213         This marking is in the .... interface?
 214
 215         When an external caller knows the type of an object, normal function
 216         calls are direct, virtual function calls are dispatched.
 217         When a caller only knows an interface of an object, all calls
 218         are dispatched.
 219         So it is the type which determines whether functions are
 220         virtual or not.
 221
 222         Do we ever want to be able to make function calls based on an
 223         interface?  i.e. could some feature of an interface be final.
 224         This could make sense for a template like function.
 225         e.g. qsort.
 226         An interface 'sortable' requires an array with comparison and
 227         defines qsort.  But why do that?  Why not just define qsort
 228         as a function.  I guess that is the thing.  A final interface
 229         method is really just a function, possibly in some namespace.
 230         So an interface defines methods or 'static' functions.
 231         A type defines functions which can be 'virtual'ised.
 232
 233
 234         An interface can be parameterised so that some types in args
 235         or return are taken from the parameters.
 236         interface comparable(base) {
 237                 int lesseq(this:base, that:base);
 238         }
 239         So numbers conform to comparable(number) and strings to
 240         comparable(string)
 241         int conforms to comparable(number)
 242
 243         interface arrayof(base) {
 244                 base get_elem(this, index:int)
 245                 void set_elem(this, index:int, elem:base)
 246         }
 247
 248         A function can be written to a subtype of an interface.
 249         function base max(a,b:base)  where base:comparable
 250         with X:object, base:comparable(X)
 251                 function base median(a: array of base)
 252
 253
 254
 255         I want to think about closures - a function with some 'local'
 256         variables already set... a lot like an object, but with one
 257         method.
 258         This is created if you can take the address of a chunk of code
 259         in a function - it holds on to the context as you export it.
 260         I would either need to copy the stack frame, or have stack
 261         frames on the heap which doesn't sound efficient.
 262         copying the frame would be difficult if you had addresses of
 263         things, but maybe you wouldn't.  Or maybe you only allocate a
 264         frame on the heap if it can possibly be part of a closure.
 265
 266 *statements, expressions, operators
 267
 268     Pascal made a strong distinction between statements and
 269     expressions.   This seemed elegant but turned out to be somewhat
 270     clumsy.  It created a strong distinction between procedures and
 271     function which seem unnecessary.
 272     It is sometime nice to have expressions with side-effects such as
 273     'variable++' but this can cause problems with macro and
 274     order-of-execution.  So don't have macros! and don't allow
 275     side-effects to affect other values in the same expression....
 276     Some languages have a 'where' modifier to an expression or
 277     statement which can contain arbitrary code to set up the context
 278     for the expression.  Typically is defines a bunch of names that
 279     are used in the expression.  It doesn't seem to have caught on and
 280     I wonder why...
 281        while not finished
 282             where { foo;  bar;  baz; finished := whatever }
 283        do { stuff }
 284
 285     I sort-of like 'after' instead of 'where' as it doesn't imply that
 286     the body has to explicitly affect the function.
 287     C allows {( for; bar; baz; not whatever )} to achieve a similar idea,
 288     but placing the 'whatever' at the end hides it a bit.  With the
 289     expression at the start it is more clear where we are going.
 290
 291     A for could then be:
 292         initialise; while condition do increment  after { body}
 293     which is kind-of interesting, though having the initialise out the
 294         front is a little bit horrible.
 295     Pascal's 'with' (which never quite was useful) could allow:
 296        with initialisation while condition do increment after {body}
 297        if condition after {context}
 298        do body; else other
 299
 300     I wonder what case/switch should look like;
 301     I probably want implied 'break'. In fact the 'go' switch is quite
 302     nice.
 303       switch expression { case expression: statements ; case
 304           expression: statements; }
 305     Part of me wants those statements inside { } so they are clearly
 306     bracketed - but that might be ugly:
 307        switch expression {
 308            case expr { code code }
 309            case expr { code code }
 310        }
 311     The word 'case' becomes pointless...
 312     I think I want 'continue' rather than 'fall-though' meaning 'find
 313     the next case that matches.   I'm still not happy with the switch
 314     syntax.
 315
 316     Operators... I'm not sure what to say about those.  Having
 317     unbounded overloading seems like a bad idea, having none is too
 318     restrictive.  I think there should be a defined set of operators
 319     with arity and precedence fixed.  Possibly precedence should be
 320     undefined when different definitions of operators are used to that
 321     parentheses are needed.
 322
 323     i.e. a + b * c binds OK when they are all in the same type family,
 324     but is a different type to b and c, then it must be a + (b * c)
 325     I'm not really sure what that means - need to know more about how
 326     operators are overloaded.
 327
 328     Every operator which is infix could also be prefix.
 329     Post fix.. less clear:
 330         arg op1 op2 arg
 331     Either op1 is postfix and op2 is infix, or
 332            op1 is infix and op2 is prefix.
 333     Best to exclude postfix ops from being infix.
 334
 335     Hmm.. there is more i want to think about here.
 336     I would be cool if function calls could use a richer syntax to
 337     separate args than just comma.
 338     e.g.
 339          append $value to $list
 340          distance from $x1,$y1 to $x2,$y2
 341          distance from $x1,$y1 to $x2,$y2 via $x3,$y3
 342     Cobol lives again?
 343     There are lots of questions here...
 344      As the function words are not predefined we need them to either
 345      be kept distinct from local names, or be syntacticly
 346     distinguishable.  $name works in the above, but not in real life.
 347     We probably want the function names to belong to a type, but the
 348     type isn't apparent until later if at all.
 349     In the 'append' case the key type is at the very end.
 350     In the distance case there is no key type.
 351     We could require
 352          distance from point(x1,y1) to point(x2,y2)
 353     though a name of type 'point' could also be used
 354     If I defined a variable 'distance' with a postfix operator 'from'
 355     it could get awfully confusing.
 356     Disallowing variables named for function starters might be too
 357     restrictive as you probably cannot control which function names
 358     get defined..
 359     The alternate is for variables to over-ride functions... which is
 360     probably fine.
 361
 362     So each word introduces a temporary name space which over-rides
 363     what is there.
 364     'distance' introduces 'from', 'to', 'via'
 365     But a variable called 'distance' hides all of that.
 366     If the name is followed immediately (no space) by a parenthesis,
 367     then the new namespace only exists inside the parenthetic space.
 368     That makes "point(x,y), z" different from "point (x,y), z".  That
 369     would be bad.
 370     We could go lisp-like:  (point x,y)
 371     (distance from here to there via elsewhere)
 372     cf.  distance(here, there, elsewhere)
 373
 374     Not sure - the traditional version allows the functional bits to
 375     stand out, but doesn't make their relationship clear.
 376
 377      number divided by number ???
 378      number div number
 379
 380
 381
 382
 383 *Basic syntax
 384         Lots of little questions here...
 385         1/ are parentheses just to over-rule precedence? or do they
 386           have a real meaning? i.e. what is the function call syntax
 387            name list
 388            print a,b,c   -- that works
 389            me.append a   -- looks a bit odd.
 390          What is a list?
 391                 a,b,c    -- clearly a list
 392                 a        -- not so sure
 393                 :a:b:c   -- different syntax,so lists can be nested
 394                 :a:b,c:d -- first and third are not lists.
 395                 :a       -- list of one element
 396                 :(:a):b  -- a list of a list of one element, and 'b'
 397                 nil      -- list with zero elements - empty list
 398                 a,b,c,   -- trailing , is allowed and sometimes assumed.
 399          So a function takes a single argument, which might be a
 400          list...  But how then do you call a function which takes no
 401         args?
 402
 403           me.tostring      -- Is that a function or a function call?
 404           me.tostring nil  -- ugly
 405           me.tostring()    -- but now parentheses mean something.
 406
 407           Maybe getting the function (or closure) requires different
 408           syntax?
 409             &me.tostring
 410             lambda me.tostring
 411
 412         2/ when are {} required to group commands?  Do we separate commands?
 413            I don't like too many {} - so only require them to
 414            disambiguate.
 415            Don't separate anything.  Separators are ugly.  We
 416            terminate this with ',' or ';' when syntactially necessary
 417
 418 *strings
 419         String constants and char constants are not syntactically
 420         different.  The type of a constant is determined from context.
 421
 422         They can be enclosed in "" or ''.  In either case the terminal
 423         char can never appear in the string, not even quoted.  This
 424         makes parsing simpler.
 425         \escapes are recognised in "" but not in ''.  \q is a quote.
 426
 427         Adjacent string constants are catenated so:  "'" '"' is a
 428         string of the two different quote characters.
 429
 430         A multi-line string is introduced by """.  This must be the
 431         first and last thing on the line after initial blanks.
 432         Every subsequent line until another identical line must
 433         start with the same sequence of initial blanks.  All the lines
 434         between are joined with the blank sequence removed.
 435
 436         The big question is: how are internal strings stored.
 437         UTF-8, UTF-16, and UTF-32 are obvious candidates.
 438         UTF-8 is best for European languages
 439         UTF-16 is better for many Asian languages
 440         UTF-32 is fairly pointless for general storage.
 441
 442         So we probably have two internal formats with auto-conversion
 443         like for numbers.  A pragma determines how constants are stored??
 444
 445         Or better, we allow a string to be either UTF-8 or UTF-16 and
 446         self-describing, like any good object.
 447         Each byte array is prefixed with a count that contains a
 448         bit flag which distinguished between UTF-8 on a byte count, or
 449         UTF-16 and a word count.
 450         The bit is the lsb and is zeroed before using the count - so
 451         the count must be even.  If there are an odd number of
 452         bytes/words it is simply padded.... Or maybe it is set to 1 so
 453         that with the count it is closer to a round number of bytes
 454         total.
 455
 456
 457
 458
 459 *Operator Overloading.
 460
 461         Operators in general...
 462         It would be nice if modules could define operators instead of
 463         just functions.  Then 'int' could be just a module.
 464
 465         But operators affect parsing at a fairly deep level, both by
 466         precedence and fix-order.
 467
 468         If two modules both import the same operator with different
 469         precedence, that would be bad.  But ditto with global names.
 470
 471         Anyway - can an operator be infix and prefix and postfix?
 472         C only has ++ and --.   .name ->name [] () are also sort-of
 473         For prefix:  & * + - ~ !  ++ -- cast
 474
 475         If we expect a term and see an op, it must be prefix.
 476         If we expect and 'end' and see an op, it could be infix or
 477         postfix.
 478         If after an operator we see another operator, the could be
 479           infix prefix     or     postfix infix
 480         So op cannot be both infix and postfix.
 481
 482         Operators like 'and' and 'or' and 'not' and 'or else' and 'and
 483         then'  really need to be predefined names or parsing gets
 484         too complex.
 485
 486         Probably assert that all 'other' operators bind more tightly
 487         than pre-defined ops, are left->right, equal in precedence
 488         and are infix or prefix but never postfix... or at least when
 489         postfix they cannot be followed by something - just a close.
 490
 491         But do we really need new operators?  Probably yes.  Not many,
 492         but sometimes and operator is just soooo much neater.
 493         Look at &^ in 'go' - a really good idea I think.
 494         <: :> ...  I wonder...
 495
 496    Overloading
 497         The issue here is who to determine which instance rules.
 498         It is simplest if the type of the first operand must accept
 499         the operator.  With int/float, the type of 'int' is actually
 500         'number' which defines division.  int/array would not work as
 501         'number' does know about array.
 502
 503
 504
 505
 506
 507 *Declarations and assignments
 508
 509         Lots of interpreted languages don't require, or even allow,
 510         variables to be declared before use.  This makes writing quick
 511         code easy (is that good?) but also makes typos easy.  Checking
 512         for unused values and uninitialised name references can help
 513         catch a lot of the error but I'm not sure it is generally a
 514         good thing.
 515
 516         While some C variants allow declarations to be mixed with code
 517         but that seems to be unpopular and I cannot say that I like it
 518         much.
 519         Still - having the declaration close to use first use can be
 520         nice.  The 'with' statement I suggested earlier could make
 521         this work nicely.
 522            with name = value do stuff
 523         could serve as a declaration of 'name' with exactly the type
 524         of 'value', though
 525            with int name = value do stuff
 526         is not much harder so should be preferred.
 527
 528         I wonder if
 529                 :name = value
 530         would be a good syntax to declare 'name'.
 531                 int:name = value
 532         would disambiguate if needed
 533         The same could be used in type matching for function.
 534                 max(:X:a,b) -> X
 535         means X is a type variable determined from the args, and
 536         becomes the type of the result.
 537
 538         I wonder if it would be nice to allow
 539                 :(a,b,c) = 1,2,3
 540         to initialise all 3.  It would be the same as
 541                 :a,:b,:c = 1,2,3
 542         Or do we want
 543                 int:a = 1
 544                 int:b = 2
 545         Do we want multiple names per type?  How?
 546                 int:(a,b) = 1,2
 547                 int:a,:b = 1,2
 548                 int:a,b = 1,2
 549
 550 *Type labels:
 551         How, syntactically, do we give a type to a variable.
 552         Pascal uses
 553              name, name : type
 554         which doesn't work very well with initialisation, though it
 555         could I guess.
 556         C uses
 557              type name, name
 558         or
 559              type modified-name
 560         such as *name or name[num]
 561         which means some types don't have a nice closed name.
 562         in Pascal "array [0..36] of int" becomes "int _[36]" ??
 563         or use a typedef.
 564         I think types should have simple closed names so they can
 565         be passed around, particularly to parameterise interfaces.
 566         Also "int* a, b" looks like and b are both int pointers, but
 567         they aren't. "int *a=b; *a=b;"  do two very different things.
 568         "array of" is very verbose.
 569         int[]   array of ints
 570         int*    reference to an int
 571         int&    formal parameter which takes the address of the passed
 572                 value?? - no, maybe not.
 573         [int,err] union of two possibilities?
 574
 575
 576 *Pointers:
 577         Pointers are good.  we like pointers.
 578         Pointer arithmetic is bad.  It allows very silly things.
 579         It also allows cool things (container_of), but the language
 580         should provide those.
 581
 582         Pointers can be cast to integers - that can help with
 583         arbitrary ordering or equality checks (but what about memory
 584         compaction?). But integers can never be cast back to pointers.
 585
 586         Array slices are a good idea.  It requires passing a size
 587         around with a pointer - though that could be optimised out
 588         in some cases, and would be needed anyway in others.
 589         If strings are array slices then we don't need nul
 590         termination, but have the cost of a length...  Probably not a
 591         big cost and gets right of some horrible cut/paste issues with
 592         null termination.  We can pass a substring without copying or
 593         mutating.
 594
 595         Strings are very special.  They are not arrays of characters.
 596         They cannot be indexed but can be iterated.  Each element is a
 597         unicode point, which might have an ASCII value..
 598         Strings are not mutable.
 599
 600         Byte arrays are normal arrays and are mutable.  You might be
 601         able to convert a string to a byte array but in general you
 602         cannot.
 603
 604
 605         The language has values and references.  'references' are ways
 606         to refer to a value, often a name (i.e. a variable) or a field
 607         in an object, or a slot in an array.
 608
 609         Values can be mutable or immutable.  The reference to an
 610         immutable object might be a copy rather than a pointer.
 611         Values have a lifetime.  When the lifetime ends the value is
 612         destroyed, which might a destructor function.
 613         Lifetime can be determined by:
 614         - reference counting - when it hits 0 the value is destroyed
 615         - single-reference.  There is only one reference and when it
 616                 is destroyed, the value is destroyed
 617         - garbage collection.  References can be found in memory and
 618                 where there are no more, the value is destroyed.
 619
 620         A reference-counted value must container a counter.
 621         A collected value must contain a single bit usable in mark/sweep.
 622
 623
 624
 625 *error returns and exception handlers
 626
 627      Returning and handling errors is the bane of any programmers
 628      life.  Exception handling is essential but feels heavy weight and
 629      clumsy.   It doesn't have to.
 630      I like the "set -e" functionality of the Bourne shell:  If
 631      anything returns an error status that isn't used, then execution
 632      aborts.
 633
 634      As similar concept would be that any function can return a union
 635      of the normal return value, or an error.  If the caller only
 636      accepts the normal return value, the error is raise up the stack.
 637      If the caller accepts the error: then it does whatever it likes.
 638      So:
 639         answer, err := funciton_call(args)
 640         if (err) {
 641                 do whatever, answer is undefined.
 642                 return err;
 643         }
 644
 645      Obviously the function must be declared as possibly returning an
 646      error.  We could make the error type a part of the language, or
 647      we could just generically allow unions to be returned.  This
 648      would allow structured error objects that are specific to the
 649      context.
 650
 651      Exploring 'go' suggests that there could be more than just error
 652      returns to consider.  This is also timeouts.
 653      i.e. in some cases:
 654             answer := function
 655      will wait for an answer while
 656             answer, ok := function
 657      will return immediately, either with an answer or an error.
 658      Does that make sense?  Should the calling context determine if a
 659      delay is appropriate or not?  That really sounds like a different
 660      argument.
 661      This is very different to the 'err' return.  The function always
 662      throws an error if that it what it has to do, the call sight
 663      either catches it or lets of flow on up the stack.
 664      An implementation might place a global setjmp which is like
 665      passing an arg down, but it is a thread-global arg.
 666      So 'catch error' is a stacked thread-global value.
 667      Could not 'timeout' be similar?  Or probably 'nodelay' as
 668      timeouts would be implemented by killing of a thread that was no
 669      longer interesting...
 670      No - nodelay is too global.  Some delays are small, some are
 671      large.  We cannot put them all in the one basket.
 672      It makes more sense for the channel to be marked for delays or
 673      not, or rather a channel endpoint.
 674      Or a channel could normally block, but channel.ndelay could be a
 675      channel that threw an error instead.
 676
 677      A problem with this approach is syntax.
 678      It requires a 'catch' to be written like:
 679       rv, err := {
 680               lots of function code here;
 681       }
 682       if (err) {
 683                 handle error
 684       }
 685       Which is much worse than
 686            after_scope{handle error}
 687
 688         Or it could be:
 689         with error = stuff
 690         if error do handle error
 691
 692
 693 *modularity
 694      name space control - who owns namespaces and how are they made
 695      available?
 696
 697         Obviously any block - or at least any 'with' statement -
 698         introduces a namespace which over-rides existing names....
 699         or should conflicts be error?  That is safer.
 700
 701         A structure variable holds a namespace with local
 702         interpretation.  This is traditionally introduced by
 703            {value of the type}.{name in local namespace}
 704
 705         It isn't just structures, any type, even the humble 'int'
 706         could have names attached.  "$int.abs" might find the absolute
 707         value.  But would be nice if 'abs' were more visible
 708               abs of $int (???)
 709         as that extends to e.g. pop value off stack, push value on stack
 710         Is that really better than stack.push(value); value = stack.pop;
 711         Maybe not, but as the args are more complex and optional,
 712         something like:
 713              table.find(value, hash, compare-fn, ....)
 714         loses something.. but then that is fairly contrived.
 715         In mdadm I have 'force', 'verbose' flags, then for
 716         e.g. create, level, chunk, size, raid_disks, spare_disks,
 717         etc..
 718         Using lots of prepositions will quickly run out of steam here.
 719         Requiring tags for each fields would be boring as it would
 720         look like:
 721            Create(dev=devname, level=level, raid_disks=raid_disks, ....)
 722         I really should use some little structures to gather stuff.
 723         It would be nice if I didn't have to have a common type.
 724         if
 725                var = val
 726         would work when they are different types, and become:
 727             for each field in var:
 728                 do var.field = val.field
 729         At least this could be allowed for arg passing to function.
 730         Maybe something like "&val" expands to
 731
 732
 733         A 'struct' is different from an object .. or a 'packed'
 734         layout for that matter.
 735
 736         A struct is simply a collection of distinct values or
 737         variables, each with a name and/or a position.
 738         A struct does not comprise a type - different structs are simply
 739         different things.  structs are compatible based on name or
 740         position matching.
 741         This is only relevant when a struct is assigned, either
 742         directly with '=' or when assigning actual parameter to formal
 743         parameters in a function call.
 744         In these cases every variable in the target must match a value
 745         from the source, either by name or position.  The source may
 746         have extra values, but it may not be deficient.
 747         Multiple structs can be combined by listing them.
 748
 749         A struct is really just a short-hand for a few names/values.
 750         A struct does not have an address, cannot be stored in an
 751         object, or passed around.
 752
 753         A name can be defined as a struct in a local scope and the
 754         various fields can be assigned.  The whole struct can be
 755         assigned to some other struct value in which case the source
 756         must have matching names or positions.
 757         But here is the problem: what if it has both matching names
 758         and positions? and what if they don't align?
 759           (x,y) = (y:1, x:3)
 760         I think that names must take priority.  Maybe only use
 761         positions if one or the other doesn't have names.  If both
 762         have names, then they must match.
 763         So:
 764             a,b = fncall()
 765         a,b doesn't have names, so it is purely positional.
 766         (result:a, err:b) = fncall()
 767         requires that fncall returns a result and an err.
 768
 769         So structs are primarily used for function call args and
 770         results.
 771         Values can be just listed and are then positional.
 772         If a struct is in the list, it is interpolated, not a separate
 773         object.
 774
 775         I guess positional parameters take precedence..
 776          x, y, foo
 777         where foo is a struct, assigns first from x, second from y,
 778         and the rest from foo.  If foo has names they are used, else
 779         positions.
 780
 781         A type name is given a struct and creates the object.  There
 782         might be several different structs than can be used in which
 783         case name matching is important.
 784           Imaginary(x:float,y:float) vs Imaginary(mag:float,direction:float)
 785         If no labels are given, the first with enough matches wins.
 786
 787         Also a typed object can fill in a struct if it has all the
 788         right names.  Some might be attributes and some might be
 789         computed attributes, but true functions are not allowed
 790         because there is nowhere for args to come from.
 791
 792         So type name are in the global namespace (unless they are
 793         prefixed by a module name).
 794         So it must be OK for any name to be in the global namespace,
 795         it might be a type, it might just be a function.
 796
 797
 798         Later....
 799         after thinking (and writing) about object us in the kernel one
 800         thing that really is missing from C is the variable namespace.
 801         C just has the fields in a struct but cannot have anything
 802         else. (The other big thing is a richer type system).
 803         So what if an 'object' (the basic abstraction) was about
 804         define a name space as much as a set of storage fields.
 805         And are these really two different things? or are they
 806         compatible?
 807
 808         So an 'object' creates a name space.  In it are names which
 809         can be:
 810          - constant: a number or a string or even a function.  The
 811             value might be stored in 'text' space or might just be
 812             known to the compiler.
 813          - static: The name refers to a globally allocated variable
 814             which is known only through this object type.
 815             So like a 'class variable' ... which is probably a bad
 816             idea for exactly the same reason that global variables are
 817             a bad idea.
 818          - instance/auto: field that gets included each instance
 819          - method: This is a function that gets stored in a vtable.
 820             The innermost object that has methods owns the vtable.
 821             Each object with any methods devices a vtable structure
 822             which embeds that vtable of the inner object.
 823             So there can only be one - multi-inheritance doesn't work.
 824             But an object can still store function pointers..
 825
 826         But maybe multi-inheritance is fine once we understand
 827         inheritance properly.
 828         Typically we embed a parent in a child and say that the
 829         behaviours of the parent applied to the child through the
 830         parent.
 831         So if you want to do X to the child, and X can be done to the
 832         parent, then you do it to the parent.  But you might have
 833         adjusted that parent through giving it a vtable that
 834         understands the child.
 835         So a second parent just means a second vtable pointer in there
 836         somewhere.  Of course is both parents can respond to X, then
 837         the child needs to explicity disambiguate.
 838
 839         Which leads to over-rides and name resolution.
 840         If I include a parent as 'p' then everything in it is
 841         available as 'p.thing'.  We might some or all things to be
 842         available as 'thing' directly, or maybe 'pthing' - i.e. with a
 843         different name.
 844         So this is just standard importing.  We can import everything,
 845         or specific things with renaming.
 846         Do we want 'import if it doesn't cause a conflict'??  That is
 847         probably reasonable.  So:
 848          import all, import none, import if unique, import X (as Y).
 849
 850         struct foo bar {X, y as z, *unique}
 851
 852         Then  X == bar.X, z == bar.y,  $1 = bar.$1 if it is unique.
 853
 854         When a method is declared, its vtable location must be
 855         identified.
 856         It can be:
 857          - inline, so the method is just a function pointer
 858          - attached, so a vtable pointer is in the structure
 859          - attached through some parent
 860          - delegated to some referenced object.
 861
 862
 863
 864
 865 *parallelism
 866         I've never really given this a lot of serious thoughts.
 867         In the kernel, we simply use locks for protection and a fairly
 868         heavy-weight mechanism for creation threads.  fork/wait.
 869         'go' has 'go statements' to run those statements in parallel,
 870         and uses channels (blocking or buffered) for synchronisation.
 871         They assume the library might provide locking primitives.
 872         I guess channels as a builtin rather than a library function
 873         might allow some useful optimisations.  You would thinks locks
 874         would too.
 875
 876         I feel that the language should not preferentially define any
 877         synchronisation mechanisms.  Locks, queues, signals etc could
 878         all be provided by the library, and if necessary the compiler
 879         could 'know' about some and allow optimisations just like the
 880         gcc C compiler 'knows' about strcpy from the library.
 881
 882         So the language just needs a way to start a new thread.  If
 883         the termination of the thread is important, it can set some
 884         flag or whatever when it ends.  It might be nice to be able to
 885         abort a thread - kill it.  Rather than defining a naming
 886         system we could allow a thread to be given a handle of some
 887         sort which it passes to the runtime say "exit when this flag
 888         is set".  This is a bit like a signal handler which I decided
 889         was a bad idea...
 890         No, I think the 'thread' creation function should return a
 891         handle which can be killed.
 892
 893         So
 894            handle := do { commands }
 895            handle.kill()
 896            handle.running()
 897            etc.
 898
 899         This creates a closure that copies all visible local variable,
 900         but shares references to allocated and global variables.
 901
 902         concurrency management is very easy to get wrong.
 903         Sometimes it is nice to be able to use lowlevel primitives
 904         like RCU and write barriers and spinlock, but in many cases
 905         your needs are so fine grained and you want the language to
 906         do it for you.
 907         This could in part be through data structures that do the
 908         synch for you (channels, bags, refcounts).  However it is
 909         probably good to have some way to tag something to say that it
 910         must run locked.
 911
 912         Locking is something that must be optional.  A structure needs
 913         locking when shared, but if it is always used in a locked
 914         context, then the locking is a waste.  And while it may only
 915         be setting a bit, it has to be  bus-locked test-and-set which
 916         is definitely slower than not doing it.
 917
 918         If we had 2 bits for each lock, one could say if locking is
 919         active and the other if the lock is held. (There would be a
 920         completely separate wait-queue found through a hash on the
 921         lock address). If either bit is set, enter spinlock protocol.
 922
 923         So when allocating an object we enable locking or not
 924         depending on something.
 925
 926         If an object is lockable, then certain fields should only be
 927         accessed while the lock is held.  So they get to be 'private'
 928         or 'hidden'.
 929
 930         Granularity of language-imposed locking is probably object and
 931         method.  i.e. a method takes a lock on an object.  Presumably
 932         this is what the 'synchronised' type attribute means.
 933
 934
 935
 936
 937 *introspection
 938         or reflection...
 939
 940         In 'go' this allows the actual type of an object to be
 941         explored to some extent - the name and type etc of each field
 942         of a struct can be extracted.  I suspect that is used.
 943         I don't really like typecase that it uses though.  Maybe
 944         I need to understand how unions should work...
 945
 946
 947 *Memory management
 948         Garbage collections seems to be popular.  I hate it.
 949         It makes this stuff "just work" which is good, but it feels
 950         very untidy.  Also there is no guarantee when it happens
 951         so destructors which free other resources like fds aren't
 952         always safe.
 953         Reference counting is a real over head, particularly on small
 954         objects but talloc has good results...
 955         One problem with refcounting is that you need it to be atomic
 956         in a multithread environment.  But the GC would need
 957         to stop everything in multithread too.
 958         A program which was known to not run for long could be compiled
 959         with refcounting disabled and just never to free anything...
 960         Though that is bad for destructors too.
 961
 962         Probably should allow manual management with no refcounting,
 963         and  'once' tagging on pointer types so the compiler can check
 964         no refs are taken rather than counting them all.
 965
 966
 967         If we had really good reference counting, then locks could
 968         make use of them too.  A 'lock' function returns a 'locked'
 969         object and when that object is released the lock is dropped
 970         by the destructor.
 971         So just going out-of-scope is enough to unlock, but
 972          var = get_lock(lock)
 973          do stuff
 974          var = None
 975
 976         would allow it to be explicit.
 977         This also allows lock ownership to be passed around.
 978         If a function type declares that it absorbs a reference, then
 979         it is thereby allowed to unlock something that other code
 980         locked.
 981
 982         Possibly the most interesting part of managing refcount is
 983         determining which references in heap objects are counted
 984         as references in the stack should be fairly easy to track.
 985         e.g. a double-linked list won't want two counts - the 'back'
 986         link should be ignored.
 987         And possible it won't want to count at all if it is e.g. in a
 988         hash table we might not want to count at all - when the
 989         refcount becomes zero just remove from the list.
 990
 991         So:  a reference can be:
 992            - counted:  meaning it contributes to the refcount
 993            - clearable: meaning that when the object is deleted, the
 994                         reference can be found and destroyed was with
 995                         the bask link in a double-link list
 996            - dependant: meaning that it depends on another reference,
 997                         and will only be held while that other
 998                         reference exists.
 999
1000         Getting the compiler to check these is probably too hard.
1001         Dependant references might be do-able if we had a lock on the
1002         holder of the prime link..  but that only makes sense for
1003         transient dependant references.
1004         Checking that the destructor actually destroys all clearable
1005         references is asking too much.
1006         Maybe the 'clearable' declaration could include a label for
1007         the code in the destructor where the 'clear' happens - or
1008         vice-versa.  i.e. annotate each clearing with the clearable
1009         field that this clears.
1010
1011         compiler could track movement of references and insert 'get' and
1012         'put' where absolutely necessary based on annotations.
1013
1014         So the programmer could still make mistakes, but they are at a
1015         higher level
1016
1017 *Higher order functions and first class functions
1018
1019         Functions are first class if they can be passed around like
1020         other values, and of course called - given args and allowed to
1021         return a value.
1022         Higher order function take functions as arguments, such as
1023         'map' which takes a function and a list and applies the
1024         function to each member of the list.
1025         Then there is currying where you just pass one arg to a
1026         function and it returns a function which can be applied to the
1027         second arg.  The type of 'curry' would be rather subtle:
1028           curry : function(x,..y)->z, x  ->  function(..y)->z
1029
1030         Then there are closures.  This is a function completed with
1031         some bound variables, just like an object.
1032
1033         i.e. a call frame *is* an object and any section of code in
1034         the frame can be given a name and can then be called from out
1035         side the frame.  After parsing the code in the function we can
1036         determine if there are any code blocks that might be exported
1037         and determine which variables are in there.
1038         These get placed in an object allocated on the heap.  All
1039         other variables remain on the stack.
1040         i.e. we create an object to contain the closed functions.
1041         But we don't pass the object around ... we pass the function
1042         around.  That is a little odd.
1043         Unless a function *is* an object.  But what does that mean?
1044         It doesn't obviously have state, and it only has one
1045         behaviour.  i.e. referencing a label in it makes no sense.
1046         Only calling it does.  So that is its one behaviour -
1047         callable.
1048         So a "regular" object should be callable too? Why not?
1049         So foo.bar is an object which curries foo.
1050           x = foo.bar
1051           x(hello)
1052         is the same as
1053           for.bar(hell)
1054         What is 'x'?  a pointer to an object and a function.
1055         It is X with the interface narrowed down to a single function.
1056         The type doesn't have a name, though the interface might.