ocean-lang.org Git - ocean-D/blob - Blog-outline

   1
   2 Supposing I were to write a series documenting my language experiments.
   3 What would I include:
   4
   5
   6 1/ Design rules / guidelines
   7
   8    - serve programmer, not compiler.
   9       Obviously the compiler serves the programmer, and many things
  10       make life easier for both.
  11
  12    - Allow programmer to tell the compiler what they are thinking.
  13
  14    - Builtin types should not be too special.  Whatever is available to
  15      them should be available to others.  So operators and literals
  16      should be available to all types.
  17
  18    - Similar things should look similar.  This encourages a uniform
  19      syntax which is easier to remember.  It can also encourage
  20      mnemonics.
  21
  22    - Different things should look different.   Just because two
  23      distinct concepts can be implemented with a single syntax doesn't
  24      mean they should be.  Code is read more than it is written, so it
  25      should be clear from the syntax what the intent is.
  26
  27    - Build on previous languages.  Programmers already know several
  28      languages.  They shouldn't have to re-learn everything.  Only do
  29      an old feature in a new way if it adds substantial value.
  30
  31    - Allow a range of verbosity - let the programmer choose.
  32
  33 1a/ Ocean - better than C
  34
  35 2/ literate programming and md2c
  36
  37 2a/ lexical structure
  38
  39    - simple - we have lots of history - use it.
  40    - general - use same scanner for multiple purposes
  41    - newlines and  indents
  42
  43    - comments and white space
  44    - newlines and indents
  45    - identifiers: ID_START ID_CONT plus 2 lists
  46    - specials
  47    - reserved words - must be a subset of identifiers
  48    - strings ' " `   '''  "
  49    - numbers 0 0x 0b e p
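The number prefixes and exponent markers listed above could be recognised along these lines. This is a minimal Python sketch, not the scanner this series will describe: `parse_number` is a hypothetical name, and reading `p` as a binary exponent on a hex float (as C does) is an assumption.

```python
def parse_number(text):
    """Convert one numeric literal to an int or float (illustrative only)."""
    lowered = text.lower()
    if lowered.startswith("0x") and "p" in lowered:
        # Assumed meaning of 'p': binary exponent on a hex float, e.g. 0x1.8p1.
        return float.fromhex(text)
    if lowered.startswith(("0x", "0b")):
        # int(..., 0) honours the 0x / 0b prefix.
        return int(text, 0)
    if "." in lowered or "e" in lowered:
        # Decimal float with an optional 'e' exponent.
        return float(text)
    return int(text, 10)
```

Checking the hex prefix first matters, because a literal such as 0x1e contains an 'e' that is a digit, not an exponent.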
  50
  51 3/ LR grammar basics  with a calculator and error recovery
  52
  53 4/ Adding indent/newline handling to LR Grammar
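As a point of reference, the conventional way to expose indentation to a grammar is a stack of indent widths that emits INDENT and UNDENT markers. The Python sketch below shows only that conventional technique; whether the approach in this series works the same way is not stated here, and `indent_tokens` is a hypothetical name. Tabs and dedents to a width that was never opened are not handled.

```python
def indent_tokens(lines):
    """Turn leading spaces into INDENT / UNDENT markers around each line."""
    stack = [0]          # indentation widths of the currently open blocks
    tokens = []
    for line in lines:
        if not line.strip():
            continue     # blank lines do not change the indent level
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            tokens.append("UNDENT")
        tokens.append(("LINE", line.strip()))
    while len(stack) > 1:   # close any blocks still open at end of input
        stack.pop()
        tokens.append("UNDENT")
    return tokens
```

A parser can then treat INDENT and UNDENT much like '{' and '}'.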
  54
  55 5/ Statements
  56      Structured control flow.
  57      : INDENT UNDENT   -or-   { }
  58        if EXPRESSION BLOCK
  59      or
  60        if BLOCK then BLOCK
  61      ditto for 'while'.  Means 'do{}while' not needed;
  62      Do I want "while: then:" or "while: do:" ??
  63
  64      ditto for 'switch' but this has a twist.  The case labels can
  65      be a new enum.  Also labels can appear multiple times!
  66      including default???
  67      Also for/then/else and while/do/then/else
  68
  69      'next variable'   instead of 'continue' or 'loop'
  70      'last variable'   instead of 'break'
  71
  72      'return value' and 'use value'
  73
  74      'for ???'
  75      The point of 'for' is to allow a co-routine.
  76      So we want really simple heads just like go
  77       for initialise; next
  78       while condition
  79      or
  80       for varlist, iterator := value.each() then iterator.next()
  81      calls iterator.next()
  82
  83      funcall
  84      assignment/binding
  85
  86      defer? fallthrough? exceptions?
  87
  88
  89      for while/if/switch then do case else
  90
  91      'for' requires 'while'
  92      'switch' excludes 'then'
  93
  94      [for] while [then] do [case] [else]
  95      if then [else]
  96      switch {case} [else]
  97
  98      'then' is implied after 'if expression'
  99      'do' is implied after 'while expression'
 100
 101      But how do I get 'then' after 'while expression:' ??
 102
 103      for a = 0; while a < 10; then a+= 1:
 104         print a
 105
 106      I could have 'for then while' ...
 107         for SimpleStatements
 108         then SimpleStatements
 109         while Expression:
 110              statements
 111         else:
 112              something
 113
 114
 115      We don't need 'break' as we can 'use false' in the condition
 116      block.
 117      we probably don't need 'continue' as we can 'use true' and have
 118      the 'but always execute this bit' in the statement block.
 119
 120 5.0/
 121     variables and scope
 122
 123     is hole-in-scope allowed?  No
 124     Can we import a scope
 125         - from a module?  - yes with version number or name list.  No over-riding allowed.
 126         - from a struct? - nice, but not nice enough.
 127                         might be useful in 'switch'  Might be OK with explicit list.
 128         - does a syntax like $foo help? Why not just use binding.
 129
 130     We introduce a variable with(?)
 131         name : type = value
 132     but could that be ambiguous with ':' being used to start a block?  Probably not.
 133     Multiple variables could then be:
 134         name:type, name:type, name:type = val, val, val
 135     but that looks rather silly.
 136         name, name, name: type = val, val, val
 137     Then we cannot assign a new and an old in the one assignment, to a tuple (from a function).
 138     Could use "var!" to assert a new variable.
 139
 140      a!, b = fred()
 141
 142     a bit ugly?
 143
 144     An issue is that we introduce new state incrementally through a block
 145     of code, but we don't really want to keep indenting.
 146     Yet it would be nice to mark where the variable's scope begins and ends.
 147     a! could introduce with assignment and  close with usage (?)
 148
 149     a! = function()
 150     if a >= 0:
 151         x += a?;
 152
 153     if a = function()
 154        use a >= 0
 155     then: x += a;
 156
 157     I guess passing 'a!' with pass-by-reference, or generally taking a reference
 158     would be bad.
 159
 160     How much do we really need to declare variables?
 161     In most cases types can be computed.  In those cases we just need to guard
 162     against typos.  Also make it clear to human reader what intention is.
 163     For the former, ensuring each name is both assigned and used is probably enough.
 164     For the latter, we want to differentiate between "assign" and "re-assign"
 165
 166     Maybe '=' normally declares a new var, and '!=' over-rides?.  Or :=.
 167     i.e. ?= for any '?' changes an existing value.
 168     Just '=' defines a new name.
 169     But what about tuple assignment?  only new names are allowed.  += doesn't work, nor
 170     does :=
 171
 172     Swap assignment?  Move assignment.  They require lvalue on both sides.
 173         swap a, b (b becomes a)
 174         move a, b (b becomes nil)
 175
 176     move shouldn't be needed for locals as data flow will figure it out.
 177     It is needed for members of structures.  Can I use a symbol? <=?? <<= ?= =< ==
 178     Or do I mark the lvalue '@a' means "return the value of 'a' and set 'a' to NULL"
 179     b := @a.sort()
 180
 181     I like '=' for assigning an immutable binding, and 'x=' for some 'x' for
 182     mutable bindings. So '+=' adds and '*=' multiplies.
 183     ':=' replaces.
 184     But what introduces a new name?
 185        var x
 186     is a little bit noisy.
 187        x .= value
 188     means "has value now, but might change"  The visual connection between
 189     "." and ":" might help.
 190     Maybe
 191        x := value
 192     does two things (two dots): declares the name and assigns the value.
 193        x .= value
 194     just does one thing - it assigns a value.
 195
 196     To introduce a variable without giving it a value we assign '_'
 197
 198         x := _
 199         if foo:
 200                 x .= thing
 201         else:
 202                 x .= other
 203         print x
 204
 205
 206     Type can be given in <>
 207
 208         x<int> := 27
 209
 210     Though that might not be good as <T> is otherwise unbound.
 211     We could go with
 212         x:int := 27
 213
 214     Though there are more colons than I would like.  Probably <> is OK.
 215
 216     I like the idea of binding a name to a field rather than to a value.
 217     This is like by-reference function parameters.
 218     It is different from a pointer because it is really just a syntactic shorthand.
 219     i.e. the binding is constant.
 220     I'm not sure if this is really useful though.
 221
 222     I also like the idea of the Pascal "with".  I have occasionally missed it.
 223     It exposed fields from a structure into the namespace.
 224     Unfortunately it isn't obvious how to expose two structures, particularly
 225     of the same type.  I guess
 226         x = &thing
 227     x.field  y.field  is good enough.
 228
 229
 230     Do I really need multiple assignment?
 231     It is useful for 'swap', but I think I prefer an explicit 'swap'.
 232     It allows unbinding of tuples, but is not
 233        a = tuple
 234        a.1  a.2  a.3
 235     just as good?  I guess names are better, but if names are important
 236     maybe they should be declared in the tuple.
 237
 238     One benefit of multiple assignment is that a "simple statement"
 239     can declare multiple variables, useful in a "for" clause.
 240     That could just as well be handled with a 'simple block' which
 241     is 'simple_block ; simple_statement'
 242
 243
 244
 245     Q: Do I really need "x:=_" in the above?
 246     As the "print x" usage is not an initial usage, there must be
 247     a prior assignment - or two.  So I could make it work, but do I want to?
 248     It would mean that when reading code I cannot easily tell the lifetime
 249     of a name.
 250
 251     Maybe I use a 'var' statement to declare names
 252     var a<int>, b<str>
 253
 254     What if I allowed a suffix statement which maintained the scope
 255     of the cond statement, but could affect the more general scope.
 256
 257     if foo:
 258         x := thing
 259     else:
 260         x := other
 261     finally:
 262         b := x
 263     print b
 264
 265     I could declare that a name is bound throughout the whole block in which
 266     it appears and if it appears in multiple blocks, one of those must contain
 267     the others.
 268     On every path that leads to any usage, the name must be initially bound.
 269     So if it is defined in one branch of an 'if' but not the other, then it must
 270     be local to that if.
 271     If it is bound within a do loop, it must be local.
 272     If it is bound in all cases and the 'else', then it could be more global.
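The "bound on every path" rule can be sketched as a small checker. This is only an illustration in Python under assumed conventions: statements are encoded as hypothetical tuples ("bind", name), ("use", name) and ("if", then_block, else_block), and loops are left out.

```python
def check_block(stmts, bound):
    """Return the names bound after stmts; raise if a use can be reached unbound."""
    bound = set(bound)
    for stmt in stmts:
        kind = stmt[0]
        if kind == "bind":
            bound.add(stmt[1])
        elif kind == "use":
            if stmt[1] not in bound:
                raise NameError(f"{stmt[1]} is not bound on every path to this use")
        elif kind == "if":
            then_bound = check_block(stmt[1], bound)
            else_bound = check_block(stmt[2], bound)
            # A name survives the 'if' only when both branches bound it.
            bound = then_bound & else_bound
        else:
            raise ValueError(f"unknown statement kind {kind!r}")
    return bound
```

A name bound in only one branch drops out of the intersection, so it stays local to that branch.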
 273
 274     multiple assignment is useful to collect procedure return
 275     a, x+, b: = myfunction()
 276
 277     myfunction():
 278     a=$1; x+=$2; b:= $next;
 279
 280     i.e. after evaluating a function, all the return values are
 281     available as $N or $name until the next procedure call.
 282     In that case $$ could be an error
 283
 284         myfunction()
 285         switch $$:
 286         case filenotfound: whatever.
 287
 288     What about functions called inside expressions?
 289     The value cannot be used as there isn't one.  So all code must be
 290     dead until $$ is tested.
 291     A function could identify return values as $xx names.  A type
 292     might be 'error' which has a special behaviour.
 293
 294     I wonder about
 295         some_expression ? $$+1 : other
 296     no point
 297         (some_expression ?: other-1) + 1
 298
 299 5.0a - decision time.
 300
 301    A new local name (variable) can be introduced with:
 302         name ::= value  // binding is constant
 303          name::type =
 304         name := value // initial assignment
 305          name:type =
 306         name = value // name must already be defined and is being replaced.
 307
 308    This binding extends at least until the end of the enclosing
 309    statement.  If the statement loops, the binding ends with the loop
 310    If the statement is one of alternates, the binding continues only
 311    if all branches introduced it, or only into parallel branches
 312    in later conditionals.
 313
 314    After the minimal extent, a new binding will over-ride.
 315    The name must have been used before it is over-ridden.
 316
 317    Bindings can be changed with
 318          name op= value
 319    acts as
 320          name = name op value
 321
 322    Multiple assignments are not supported.  Use
 323       a=1; b:=2; c = 3;
 324    if you want.  To swap two bindings we have
 325         swap lvalue, lvalue
 326    More than 2 can be given and they are rotated with first value
 327    landing in final lvalue
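The rotation can be pictured with a short Python sketch. `rotate` is a hypothetical helper; the only thing taken from the text is that the first value ends up in the final lvalue, so the direction in which the other values shift is an assumption.

```python
def rotate(values):
    """Rotate left by one: the first value lands in the last slot."""
    values = list(values)
    return values[1:] + values[:1]
```

With two values this is an ordinary swap, and `a, b, c = rotate([a, b, c])` leaves the old value of `a` in `c`.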
 328
 329
 330    Assigning to record fields and array elements normally uses
 331    =, though := can be used for record fields when initialising
 332    a record before first use.
 333
 334    Assigning to a reference normally makes the reference refer to
 335    something else.  If the reference is to a struct or array,  then
 336    foo.field or foo[index] can be used to assign to a member.
 337    To assign to the whole thing being referenced, use
 338       foo. = value
 339    which is easily confused with
 340       foo .= value // damn.
 341
 342    For each name we need to identify places in the code where it is
 343    initialized and changed, and then each usage needs to link back to
 344    one of those.  If a variable is changed conditionally:
 345       if x: a = 1
 346    subsequent usages of 'a' link back to the end of that whole
 347    statement.
 348
 349    We can compare two expressions using the target of these links when
 350    comparing bindings.  If they match, then the values from the
 351    expressions are the same.  This can be used to determine where a
 352    name is valid.
 353
 354    Q: what about concurrent memory models.  Need to study this.
 355
 356
 357 X/ types-1
 358
 359         boolean.  Needed for 'if' and comparisons etc.
 360         Maybe places that might expect boolean actually expect object with 'test'
 361         method
 362
 363         'order': Less, Eql, Greater
 364                 a ?= b  or   a <? b
 365                 Can it fit any trinary need, like Boolean works for
 366                         yes/no, on/off, true/false, open/closed, ....
 367                         Given two, find the third?
 368                 trinary: True, False, Neutral/unknown/irrelevant/maybe
 369                         "a is the least" - true, false, or there isn't a least.
 370                         "Is this ordering correct?"  <?
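A three-valued 'order' result might look like the following in Python. This is a sketch only: `Order` and `compare` are hypothetical names standing in for the type and for the `a <? b` style operator, whose spelling is still undecided above.

```python
from enum import Enum

class Order(Enum):
    LESS = -1
    EQL = 0
    GREATER = 1

def compare(a, b):
    """Three-way comparison returning an Order instead of a boolean."""
    if a < b:
        return Order.LESS
    if a > b:
        return Order.GREATER
    return Order.EQL
```

Values that are neither less nor greater, such as a NaN, fall through to EQL here, which is one reason a fourth "unordered" outcome may be worth considering.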
 371
 372
 373         char, rune, string
 374                 For export, utf8 utf16 utf32 ASCII also available
 375
 376         Numbers.
 377                 signed, unsigned, cyclic,
 378                 widths: 8, 16, 32, 64
 379                 arbitrary precision, if compiler cannot determine width
 380
 381                 Rational(?)
 382                 IEEE754  floating point single,double,quad,float
 383
 384 X/ types 2
 385
 386         arrays: [len:member]
 387         enums
 388         structs { name:type; name:type }
 389         variant records - vary by enum or type  pointer
 390
 391         functions (name:type, name:type -> name:type, name:type)
 392         interfaces { name:functiontype, name:functiontype,...}
 393                 Each function has an implied(?) first argument
 394                 "self:self".  The type 'self' can be used in other
 395                 args and return values.
 396
 397 X/ types 3
 398
 399         algebraic types
 400                 <:  :>  &(intersection) |(union)
 401         parametric types - type or constant parameter
 402         value-dependent types.  Value can be quite distant
 403         linear types: number of references is part of type and can depend on value
 404         temporal types: linear progression depends on value.  "clock" concept
 405                 needed.
 406         parallel types(?) can be accessed in multiple threads.  Maybe
 407                 atomic types(?).
 408                 dependent types that depend on an atomic can also be parallel
 409                 e.g. they become writable when an atomic has some value.
 410                 a refcnt atomic could interface with 'linear'...
 411
 412         A borrowed reference might need to indicate where it is borrowed
 413         from?  There must be some strong reference which we "know" won't be
 414         dropped.  e.g. it could be read-only for the lifetime of the borrow.
 415
 416
 417 5a/ functions, procedures and tuples.
 418
 419    A function can return 1 value.  A procedure can return
 420    0 or more - a tuple to be precise.
 421    So where can a procedure be used?
 422      - direct assignment
 423      no, all return values are available in $N until next procedure call.
 424
 425    or maybe proc(a,b,c) -> x,y,z  or proc(a,b,c, out:x,y,z)
 426
 427 Q/
 428   dynamic dispatch and polymorphism.
 429
 430   Dispatching a method call against a reference of incomplete type can be handled
 431   in various ways.  We need to understand these to understand consequences of choices
 432   about how methods are attached to types.
 433
 434   1/ If only one, or maybe 2, methods are needed for the apparent type, then they
 435      can be passed around with the pointer.
 436      This is exactly what qsort() allows. 'comparable' has a single method
 437   2/ If only one interface is needed, a pointer to that interface's implementation can be passed.
 438      This requires interfaces to be separate well defined things, which isn't
 439      the case for 'go' I think
 440   3/ The object can contain a pointer to one of the above or to a function which
 441      finds and returns a given interface or method.  This requires each interface
 442      or method to have a globally unique name, which isn't too hard to manage using
 443      the relocating linker
 444
 445   I think that every object which has interfaces needs to have that lookup function.
 446   It may be in the object, or may need to be part of any reference.
 447   Arrays etc can be parameterised by a concrete type so they can hold one function
 448   for many references to small objects.
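The lookup-function idea (option 3, carried in a fat pointer) can be sketched in Python. Everything named here is hypothetical: `make_lookup`, `call`, and the method name "comparable.compare" are illustrations, and a reference is modelled as a plain (value, lookup) pair.

```python
def make_lookup(methods):
    """Build a per-type lookup function from {unique_name: function}."""
    def lookup(name):
        return methods.get(name)   # None when the type lacks that method
    return lookup

# One lookup function per concrete type; method names are assumed globally unique.
int_lookup = make_lookup({"comparable.compare": lambda a, b: (a > b) - (a < b)})

def call(ref, name, *args):
    """Dispatch through a fat pointer: ref is a (value, lookup) pair."""
    value, lookup = ref
    method = lookup(name)
    if method is None:
        raise TypeError(f"no method {name!r} for this object")
    return method(value, *args)
```

An array of small objects could hold `int_lookup` once and pair it with each element on access, rather than storing it in every element.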
 449
 450   A module can define an interface to some other type, or to an interface.
 451   So a module might define a "sort" interface to an "array of comparable" interface.
 452   If a module imports several modules which all add different interfaces to an
 453   external interface, then the importing module must define a lookup function
 454   which finds all the different methods by 'name'.
 455
 456   Any data type can have a collection of methods.  Some of these might belong
 457   to an interface.  The others can only be used when you have a reference of the
 458   type, not of an interface.
 459   A data type can declare that it contains a dispatch function, or the compiler
 460   will use fat pointers.
 461
 462   A data type might instead contain an enum - which might be much smaller.
 463   This assumes that all subtypes are in the one module and the compiler can
 464   create switch statements to handle all interface methods.
 465
 466 X/
 467   Error returns from functions.  Exceptions?
 468   -errno works surprisingly well with ERR_PTR().
 469   But NaN works even better.
 470
 471   Sometimes we might want to handle errors in normal flow.
 472   Sometimes we might want them to be exceptional.
 473   go distinguishes by allowing "foo, err := function"
 474   to catch the error.
 475   I could allow 'foo' to 'hold' the error and so
 476    foo := function
 477    except foo == errortype:
 478         do stuff
 479    Without an 'except', the block is aborted.
 480
 481    How would that work for procedures? They explicitly return err if needed?
 482    Or any return can be conditional
 483
 484
 485 6/ Declarations
 486
 487 7/ Expressions
 488     + - * / %     & | ^   && || !  &^  &~
 489     >>  <<  <  >  <= >= !=
 490     += -= *= /= %= &= |= ^=   (one token or 2?)
 491        and or not cand cor "and then" "or else"
 492        else  then
 493        ?:   if/else
 494        max min (like 'and' 'or'), or max() min()
 495     lambda somehow?  fn(
 496
 497       x.y   x()  x[]  x{}  x<>
 498      .1 .2 .3 for tuple access.
 499
 500     * as a prefix operator dereferences
 501     < as a prefix operator dereferences and sets to nil.
 502
 503     precedence - how many levels
 504
 505     with tuple:
 506       $1 $2 $3 ...
 507
 508     Integer division?
 509      - overload '/'
 510      - 'div'
 511      - '\'
 512      - '//' - no, that's a comment
 513      - /_  ( divide to the floor)
 514      - /-  (divide and discard remainder). then
 515         -/ could be 'keep remainder'.
 516       but /- could be 'divide by a negated value'  4/-5
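The difference between the candidate division operators only shows up with negative operands. The Python sketch below contrasts the two readings; `floor_div` and `trunc_div` are hypothetical names, and taking "discard remainder" to mean truncation toward zero is an assumption.

```python
def floor_div(a, b):
    """Divide and round toward negative infinity (the '/_' reading)."""
    return a // b

def trunc_div(a, b):
    """Divide and round toward zero, as C's integer '/' does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q
```

For the example 4/-5, flooring gives -1 while truncation gives 0.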
 517
 518     String concatenation?
 519      - overload '+'
 520      - use '++' as general 'join' operator
 521
 522 6/ Pointers are special
 523    The type can carry refcount info and locking info
 524
 525 7/ assignment and binding
 526     patterns. i.e. destructuring.
 527     If a structure is a namespace, then "with" might populate the active namespace...
 528     That could trigger hole-in-scope though, which is bad.
 529     Patterns are mostly used in switch/case.
 530      switch value: case pattern
 531     and the point of 'pattern' is that it might not match. e.g. it might assume
 532     some element is not NULL, and has a particular structure.
 533     e.g. if this is a cons-cell, do that, if it is nil, do the other.
 534
 535     Ahhh, no.  This is used for tagged structures.
 536     The case determines that a given tag is active, and makes the relevant
 537     fields easily available.  Is that syntactic sugar needed?  I don't think so.
 538     A switch might be useful, but it doesn't need syntax.
 539     Maybe the name '_' could be bound to the recent 'use'd value so
 540      switch some_funct(...):
 541         case _.tag = foo : ??
 542     that's a bit ugly.
 543       tagswitch some_funct(..):
 544         case foo: print _.foo_name
 545     that is probably cleaner.
 546       So 'tagswitch' is a shorthand for
 547            switch: _ = X; use _.tag:
 548
 549 8/ Operator choices - is this part of Expressions?
 550     a else b    // same as a ?: b
 551     a if b else c      // b ? a : c
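Both forms have close Python counterparts, which makes their intended meaning easy to try out. This is a sketch with hypothetical helper names; treating "truthy" as the test for 'a else b' is an assumption, since the notes do not say whether the test is for false, nil, or an error value.

```python
def else_op(a, b):
    """'a else b': a when it is truthy, otherwise b (like GNU C's a ?: b)."""
    return a or b

def if_else(a, b, c):
    """'a if b else c': a when b holds, otherwise c (b ? a : c)."""
    return a if b else c
```

As functions these evaluate all operands eagerly; as operators they would presumably evaluate only the operand that is needed.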
 552
 553 9/ Type syntax
 554
 555    I'd like to use ':', but not sure if it is being over-used for
 556    blocks.
 557    However:
 558      field : type
 559    declares a field to have the type, so
 560         fred:int = 27
 561    creates an integer called fred with value 27.
 562    A type can be:
 563      A name "int"
 564      A struct "{field: type, field:type, ..}"
 565      An array "[ length : type ]"
 566      A procedure "(arg:type, arg:type -> result:type, result:type)"
 567      A function "(arg:type, arg:type): result_type"
 568      A tagged union in a struct
 569         "{ struct fields, tag -> name:type, name:type; tag2->name:type ...}"
 570      A borrowed pointer "*type".
 571      An owner pointer "~type"
 572      A counted pointer "+type"
 573      A collected pointer "?type".
 574
 575 xx/ output formatting
 576
 577 10/ modules, packages, exports/imports and  linkage.
 578 ....
 579
 580 11/ standard library
 581     strings.  hash, file, regexp, trees, channels/pipes
 582
 583 12/ reflection
 584    - useful for tracing written in-language
 585 unit tests and  mock objects?