2 Supposing I were to write a series documenting my language experiments.
6 1/ Design rules / guidelines
8 - serve programmer, not compiler.
9 Obviously the compiler serves the programmer, and many things
10 make life easier for both.
12 - Allow programmer to tell the compiler what they are thinking.
14 - Builtin types should not be too special. Whatever is available to
15 them should be available to others. So operators and literals
16 should be available to all types.
18 - Similar things should look similar. This encourages a uniform
19 syntax which is easier to remember. It can also encourage
22 - Different things should look different. Just because two
23 distinct concepts can be implemented with a single syntax doesn't
24 mean they should be. Code is read more than it is written, so it
25 should be clear from the syntax what the intent is.
27 - Build on previous languages. Programmers already know several
28 languages. They shouldn't have to re-learn everything. Only do
29 an old feature in a new way if it adds substantial value.
31 - Allow a range of verbosity - let the programmer choose.
33 1a/ Ocean - better than C
35 2/ literate programming and md2c
39 - simple - we have lots of history - use it.
40 - general - use same scanner for multiple purposes
41 - newlines and indents
43 - comments and white space
44 - newlines and indents
45 - identifiers: ID_START ID_CONT plus 2 lists
47 - reserved words - must be a subset of identifiers
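A minimal sketch of the identifier rule above, in Python. It assumes that "ID_START ID_CONT" means the Unicode identifier classes and that the "2 lists" are extra characters allowed at the start and in the continuation. Python's own `str.isidentifier()` stands in for the Unicode tables, and the names `scan_identifier`, `extra_start`, `extra_cont` and the `RESERVED` set are hypothetical.

```python
# Hypothetical reserved-word list; every entry must itself scan as an identifier.
RESERVED = {"if", "while", "for", "then", "else"}

def is_id_start(ch, extra_start=""):
    # A one-character string is a Python identifier only if it can start one.
    return ch.isidentifier() or ch in extra_start

def is_id_cont(ch, extra_cont=""):
    # Prefix a known-good start character, so only the continuation is tested.
    return ("a" + ch).isidentifier() or ch in extra_cont

def scan_identifier(text, pos, extra_start="_", extra_cont="_"):
    """Return (identifier, new_pos), or (None, pos) if none starts at pos."""
    if pos >= len(text) or not is_id_start(text[pos], extra_start):
        return None, pos
    end = pos + 1
    while end < len(text) and is_id_cont(text[end], extra_cont):
        end += 1
    return text[pos:end], end

def classify(word):
    # Reserved words are found by looking up an already-scanned identifier.
    return "reserved" if word in RESERVED else "identifier"
```

Because reserved words are a subset of identifiers, the scanner needs no separate rule for them: it scans an identifier first and then consults the list.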
51 3/ LR grammar basics with a calculator and error recovery
53 4/ Adding indent/newline handling to LR Grammar
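One common way to feed indentation to a grammar is to have the scanner emit explicit tokens. The sketch below is in Python and is only an assumption about the approach: it keeps a stack of indent widths and emits INDENT, UNDENT and NEWLINE markers around each line. The function name `layout_tokens` and the token shapes are hypothetical, and an undent to a width that was never indented to is not diagnosed.

```python
def layout_tokens(source):
    """Turn indented text into a flat list with INDENT/UNDENT/NEWLINE markers."""
    stack = [0]          # indent widths of the currently open blocks
    tokens = []
    for raw in source.splitlines():
        if not raw.strip():
            continue     # blank lines carry no layout information
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            tokens.append("UNDENT")
        tokens.append(("LINE", raw.strip()))
        tokens.append("NEWLINE")
    while len(stack) > 1:    # close any blocks still open at end of input
        stack.pop()
        tokens.append("UNDENT")
    return tokens
```

With tokens like these, an LR grammar can treat INDENT and UNDENT much as it would treat '{' and '}'.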
56 Structured control flow.
57 : INDENT UNDENT -or- { }
61 ditto for 'while'. Means 'do{}while' not needed;
62 Do I want "while: then:" or "while: do:" ??
64 ditto for 'switch' but this has a twist. The case labels can
65 be a new enum. Also labels can appear multiple times!
67 Also for/then/else and while/do/then/else
69 'next variable' instead of 'continue' or 'loop'
70 'last variable' instead of 'break'
72 'return value' and 'use value'
75 The point of 'for' is to allow a co-routine.
76 So we want really simple heads just like go
80 for varlist, iterator := value.each() then iterator.next()
86 defer? fallthrough? exceptions?
89 for while/if/switch then do case else
91 'for' requires 'while'
92 'switch' excludes 'then'
94 [for] while [then] do [case] [else]
98 'then' is implied after 'if expression'
99 'do' is implied after 'while expression'
101 But how do I get 'then' after 'while expression:' ??
103 for a = 0; while a < 10; then a+= 1:
106 I could have 'for then while' ...
108 then SimpleStatements
115 We don't need 'break' as we can 'use false' in the condition
117 we probably don't need 'continue' as we can 'use true' and have
118 the 'but always execute this bit' in the statement block.
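A rough Python model of the idea that a multi-statement condition can replace 'break'. It assumes the loop condition is a block that yields its result with 'use'; here the block is modelled as a nested function and 'use' as 'return'. The name `first_negative` is hypothetical.

```python
def first_negative(values):
    """Return the first negative value, or None, without using 'break'."""
    index = 0
    found = None

    def condition():
        # Plays the role of a multi-statement 'while:' block.
        nonlocal found
        if index >= len(values):
            return False      # 'use false': nothing left to examine
        if values[index] < 0:
            found = values[index]
            return False      # 'use false' where C would 'break'
        return True           # 'use true': run the loop body

    while condition():
        index += 1            # the loop body
    return found
```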
123 is hole-in-scope allowed? No
124 Can we import a scope
125 - from a module? - yes with version number or name list. No over-riding allowed.
126 - from a struct? - nice, but not nice enough.
127 might be useful in 'switch' Might be OK with explicit list.
128 - does a syntax like $foo help? Why not just use binding.
130 We introduce a variable with(?)
132 but could that be ambiguous with ':' being used to start a block? Probably not.
133 Multiple variables could then be:
134 name:type, name:type, name:type = val, val, val
135 but that looks rather silly.
136 name, name, name: type = val, val, val
137 Then we cannot assign a new and an old in the one assignment, to a tuple (from a function).
138 Could use "var!" to assert a new variable.
144 An issue is that we introduce new state incrementally through a block
145 of code, but we don't really want to keep indenting.
146 Yet it would be nice to mark where the variable's scope begins and ends.
147 a! could introduce with assignment and close with usage (?)
157 I guess passing 'a!' with pass-by-reference, or generally taking a reference
160 How much do we really need to declare variables?
161 In most cases types can be computed. In those cases we just need to guard
162 against typos. Also make it clear to human reader what intention is.
163 For the former, ensuring each name is both assigned and used is probably enough.
164 For the latter, we want to differentiate between "assign" and "re-assign"
166 Maybe '=' normally declares a new var, and '!=' over-rides? Or :=.
167 i.e. ?= for any '?' changes an existing value.
168 Just '=' defines a new name.
169 But what about tuple assignment? only new names are allowed. += doesn't work, nor
172 Swap assignment? Move assignment. They require lvalue on both sides.
173 swap a, b (b becomes a)
174 move a, b (b becomes nil)
176 move shouldn't be needed for locals as data flow will figure it out.
177 It is needed for members of structures. Can I use a symbol? <=?? <<= ?= =< ==
178 Or do I mark the lvalue '@a' means "return the value of 'a' and set 'a' to NULL"
181 I like '=' for assigning an immutable binding, and 'x=' for some 'x' for
182 mutable bindings. So '+=' adds and '*=' multiplies.
184 But what introduces a new name?
186 is a little bit noisy.
188 means "has value now, but might change" The visual connection between
189 "." and ":" might help.
192 does two things (two dots): declares the name and assigns the value.
194 just does one thing - it assigns a value.
196 To introduce a variable without giving it a value we assign '_'
206 Type can be given in <>
210 Though that might not be good as <T> is otherwise unbound.
214 Though there are more colons than I would like. Probably <> is OK.
216 I like the idea of binding a name to a field rather than to a value.
217 This is like by-reference function parameters.
218 It is different from a pointer because it is really just a syntactic shorthand.
219 i.e. the binding is constant.
220 I'm not sure if this is really useful though.
222 I also like the idea of the Pascal "with". I have occasionally missed it.
223 It exposed fields from a structure into the namespace.
224 Unfortunately it isn't obvious how to expose two structures, particularly
225 of the same type. I guess
227 x.field y.field is good enough.
230 Do I really need multiple assignment?
231 It is useful for 'swap', but I think I prefer an explicit 'swap'.
232 It allows unbinding of tuples, but is not
235 just as good? I guess names are better, but if names are important
236 maybe they should be declared in the tuple.
238 One benefit of multiple assignment is that a "simple statement"
239 can declare multiple variables, useful in a "for" clause.
240 That could just as well be handled with a 'simple block' which
241 is 'simple_block ; simple_statement'
245 Q: Do I really need "x:=_" in the above?
246 As the "print x" usage is not an initial usage, there must be
247 a prior assignment - or two. So I could make it work, but do I want to?
248 It would mean that when reading code I cannot easily tell the lifetime
251 Maybe I use a 'var' statement to declare names
254 What if I allowed a suffix statement which maintained the scope
255 of the cond statement, but could affect the more general scope.
265 I could declare that a name is bound throughout the whole block in which
266 it appears and if it appears in multiple blocks, one of those must contain
268 On every path that leads to any usage, the name must be initially bound.
269 So if it is defined in one branch of an 'if' but not the other, then it must
271 If it is bound with a do loop, it must be local.
272 If it is bound in all cases and the 'else', then it could be more global.
274 multiple assignment is useful to collect procedure return
275 a, x+, b: = myfunction()
278 a=$1; x+=$2; b:= $next;
280 i.e. after evaluating a function, all the return values are
281 available as $N or $name until the next procedure call.
282 In that case $$ could be an error
286 case filenotfound: whatever.
288 What about functions called inside expressions?
289 The value cannot be used as there isn't one. So all code must be
290 dead until $$ is tested.
291 A function could identify return values as $xx names. A type
292 might be 'error' which has a special behaviour.
295 some_expression ? $$+1 : other
297 (some_expression ?: other-1) + 1
299 5.0a - decision time.
301 A new local name (variable) can be introduced with:
302 name ::= value // binding is constant
304 name := value // initial assignment
306 name = value // name must already be defined and is being replaced.
308 This binding extends at least until the end of the enclosing
309 statement. If the statement loops, the binding ends with the loop
310 If the statement is one of alternates, the binding continues only
311 if all branches introduced it, or only into parallel branches
312 in later conditionals.
314 After the minimal extent, a new binding will over-ride.
315 The name must have been used before it is over-ridden.
317 Bindings can be changed with
322 Multiple assignments are not supported. Use
324 if you want. To swap two bindings we have
326 More than 2 can be given and they are rotated with first value
327 landing in final lvalue
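A small Python illustration of the rotation rule as stated: each lvalue takes the value of the next one, and the first value lands in the final lvalue. The helper name `rotate` is hypothetical, and the example leans on Python's tuple assignment purely to demonstrate the effect.

```python
def rotate(*values):
    """Shift every value one place left; the first value moves to the end."""
    return values[1:] + values[:1]

a, b, c = 1, 2, 3
a, b, c = rotate(a, b, c)   # a takes b's value, b takes c's, c takes a's

x, y = "left", "right"
x, y = rotate(x, y)         # with two values this is an ordinary swap
```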
330 Assigning to record fields and array elements normally uses
331 =, though := can be used for record fields when initialising
332 a record before first use.
334 Assigning to a reference normally makes the reference refer to
335 something else. If the reference is to a struct or array, then
336 foo.field or foo[index] can be used to assign to a member.
337 To assign to the whole thing being referenced, use
339 which is easily confused with
340 foo .= value // damn.
342 For each name we need to identify places in the code where it is
343 initialized and changed, and then each usage needs to link back to
344 one of those. If a variable is changed conditionally:
346 subsequent usages of 'a' link back to the end of that whole
349 We can compare two expressions using the target of these links when
350 comparing bindings. If they match, then the values from the
351 expressions are the same. This can be used to determine where a
354 Q: what about concurrent memory models. Need to study this.
359 boolean. Needed for 'if' and comparisons etc.
360 Maybe places that might expect boolean actually expect object with 'test'
363 'order': Less, Eql, Greater
365 Can it fit any trinary need, like Boolean works for
366 yes/no, on/off, true/false, open/closed, ....
367 Given two, find the third?
368 trinary: True, False, Neutral/unknown/irrelevant/maybe
369 "a is the least" - true, false, or there isn't a least.
370 "Is this ordering correct?" <?
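A sketch in Python of an 'order' type with three values, plus one reading of "given two, find the third". The names `Order`, `compare` and `third` are hypothetical, and the member spellings follow the list above.

```python
from enum import Enum

class Order(Enum):
    LESS = "less"
    EQL = "eql"
    GREATER = "greater"

def compare(a, b):
    """Three-way comparison returning an Order rather than an integer."""
    if a < b:
        return Order.LESS
    if a > b:
        return Order.GREATER
    return Order.EQL

def third(first, second):
    """Given two distinct Order values, return the remaining one."""
    remaining = set(Order) - {first, second}
    if len(remaining) != 1:
        raise ValueError("need two distinct values")
    return remaining.pop()
```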
374 For export, utf8 utf16 utf32 ASCII also available
377 signed, unsigned, cyclic,
378 widths: 8, 16, 32, 64
379 arbitrary precision, if compiler cannot determine width
382 IEEE754 floating point single,double,quad,float
388 structs { name:type; name:type }
389 variant records - vary by enum or type pointer
391 functions (name:type, name:type -> name:type, name:type)
392 interfaces { name:functiontype, name:functiontype,...}
393 Each function has an implied(?) first argument
394 "self:self". The type 'self' can be used in other
395 args and return values.
400 <: :> &(intersection) |(union)
401 parametric types - type or constant parameter
402 value-dependent types. Value can be quite distant
403 linear types: number of references is part of type and can depend on value
404 temporal types: linear progression depends on value. "clock" concept
406 parallel types(?) can be accessed in multiple threads. Maybe
408 dependent types that depend on an atomic can also be parallel
409 e.g. they become writable when an atomic has some value.
410 a refcnt atomic could interface with 'linear'...
412 A borrowed reference might need to indicate where it is borrowed
413 from? There must be some strong reference which we "know" won't be
414 dropped. e.g. it could be read-only for the lifetime of the borrow.
417 5a/ functions, procedures and tuples.
419 A function can return 1 value. A procedure can return
420 0 or more - a tuple to be precise.
421 So where can a procedure be used?
423 no, all return values are available in $N until next procedure call.
425 or maybe proc(a,b,c) -> x,y,z or proc(a,b,c, out:x,y,z)
428 dynamic dispatch and polymorphism.
430 Dispatching a method call against a reference of incomplete type can be handled
431 in various ways. We need to understand these to understand consequences of choices
432 about how methods are attached to types.
434 1/ If only one, or maybe 2, methods are needed for the apparent type, then they
435 can be passed around with the pointer.
436 This is exactly what qsort() allows. 'comparable' has a single method
437 2/ If only one interface is needed, a pointer to that interface's implementation can be passed around with the pointer.
438 This requires interfaces to be separate well defined things, which isn't
439 the case for 'go' I think
440 3/ The object can contain a pointer to one of the above or to a function which
441 finds and returns a given interface or method. This requires each interface
442 or method to have a globally unique name, which isn't too hard to manage using
443 the relocating linker
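Options 1/ and 3/ above can be sketched in Python, which has no raw pointers, so functions and objects stand in for them. Everything named here (`sort_with`, `compare_length`, `Point`, `lookup`, `call` and the dotted method names) is hypothetical and chosen only for illustration.

```python
from functools import cmp_to_key

# Option 1: the single 'comparable' method travels alongside the data,
# much as a comparison function is handed to C's qsort().
def compare_length(a, b):
    return (len(a) > len(b)) - (len(a) < len(b))

def sort_with(items, compare):
    return sorted(items, key=cmp_to_key(compare))

# Option 3: the object carries a function that finds a method by a
# globally unique name.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def _compare(self, other):
        a, b = (self.x, self.y), (other.x, other.y)
        return (a > b) - (a < b)

    def _format(self):
        return "(%d, %d)" % (self.x, self.y)

    def lookup(self, name):
        table = {
            "comparable.compare": Point._compare,
            "printable.format": Point._format,
        }
        return table.get(name)

def call(obj, name, *args):
    """Dispatch through the object's own lookup function."""
    method = obj.lookup(name)
    if method is None:
        raise LookupError(name)
    return method(obj, *args)
```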
445 I think that every object which has interfaces needs to have that lookup function.
446 It may be in the object, or may need to be part of any reference.
447 Arrays etc can be parameterised by a concrete type so they can hold one function
448 for many references to small objects.
450 A module can define an interface to some other type, or to an interface.
451 So a module might define a "sort" interface to an "array of comparable" interface.
452 If a module imports several modules which all add different interfaces to an
453 external interface, then the importing module must define a lookup function
454 which finds all the different methods by 'name'.
456 Any data type can have a collection of methods. Some of these might belong
457 to an interface. The others can only be used when you have a reference of the
458 type, not of an interface.
459 A data type can declare that it contains a dispatch function, or the compiler
460 will use fat pointers.
462 A data type might instead contain an enum - which might be much smaller.
463 This assumes that all subtypes are in the one module and the compiler can
464 create switch statements to handle all interface methods.
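A sketch of the enum alternative in Python: the object stores a small tag instead of a dispatch function, and each interface method becomes one switch over that tag. This only works when every subtype is known in one place, as assumed above. The names `Kind`, `Shape` and `area` are hypothetical.

```python
import math
from enum import Enum

class Kind(Enum):
    CIRCLE = 1
    SQUARE = 2

class Shape:
    def __init__(self, kind, size):
        self.kind = kind    # the small tag stored in the object
        self.size = size    # radius for a circle, side for a square

def area(shape):
    """The 'switch statement' a compiler could generate for one method."""
    if shape.kind is Kind.CIRCLE:
        return math.pi * shape.size ** 2
    if shape.kind is Kind.SQUARE:
        return shape.size ** 2
    raise ValueError("unknown kind")
```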
467 Error returns from functions. Exceptions?
468 -errno works surprisingly well with ERR_PTR().
469 But NaN works even better.
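A short Python illustration of why a NaN-like error value is convenient: once produced it propagates through later arithmetic, so the caller can test once at the end. The helpers `safe_div` and `mean_ratio` are hypothetical. Plain Python division raises an exception on a zero divisor, so the sketch produces the NaN explicitly.

```python
import math

def safe_div(a, b):
    """Return a / b, or NaN instead of raising when b is zero."""
    return a / b if b != 0 else math.nan

def mean_ratio(pairs):
    """Average of a/b over all pairs; any failed division poisons the result."""
    total = sum(safe_div(a, b) for a, b in pairs)
    return total / len(pairs)

good = mean_ratio([(1, 2), (3, 2)])
bad = mean_ratio([(1, 2), (1, 0)])
failed = math.isnan(bad)    # a single check after the whole computation
```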
471 Sometimes we might want to handle errors in normal flow.
472 Sometimes we might want them to be exceptional.
473 go distinguishes by allowing "foo, err := function"
475 I could allow 'foo' to 'hold' the error and so
477 except foo == errortype:
479 Without an 'except', the block is aborted.
481 How would that work for procedures? They explicitly return err if needed?
482 Or any return can be conditional
488 + - * / % & | ^ && || ! &^ &~
490 += -= *= /= %= &= |= ^= (one token or 2?)
491 and or not cand cor "and then" "or else"
494 max min (like 'and' 'or'), or max() min()
498 .1 .2 .3 for tuple access.
500 * as a prefix operator dereferences
501 < as a prefix operator dereferences and sets to nil.
503 precedence - how many levels
512 - '//' - no, that's a comment
513 - /_ ( divide to the floor)
514 - /- (divide and discard remainder). then
515 -/ could be 'keep remainder'.
516 but /- could be 'divide by a negated value' 4/-5
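The candidate division operators differ only for negative operands. The Python sketch below contrasts a quotient rounded toward negative infinity with one rounded toward zero, and shows the remainder that pairs with each. Which spelling ('/_', '/-' or '-/') would map to which behaviour is left open here, and the function names are hypothetical.

```python
def floor_div(a, b):
    """Quotient rounded toward negative infinity (Python's own '//')."""
    return a // b

def trunc_div(a, b):
    """Quotient rounded toward zero, as C99 integer division does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def remainders(a, b):
    """Return the remainder matching each quotient: (floor, truncating)."""
    return a - b * floor_div(a, b), a - b * trunc_div(a, b)
```

For -7 and 2 the two quotients are -4 and -3, and the matching remainders are 1 and -1. For non-negative operands the two definitions agree.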
520 - use '++' as general 'join' operator
522 6/ Pointers are special
523 The type can carry refcount info and locking info
525 7/ assignment and binding
526 patterns. i.e. destructuring.
527 If a structure is a namespace, then "with" might populate the active namespace...
528 That could trigger hole-in-scope though, which is bad.
529 Patterns are mostly used in switch/case.
530 switch value: case pattern
531 and the point of 'pattern' is that it might not match. e.g. it might assume
532 some element is not NULL, and has a particular structure.
533 e.g. if this is a cons-cell, do that, if it is nil, do the other.
535 Ahhh, no. This is used for tagged structures.
536 The case determines that a given tag is active, and makes the relevant
537 fields easily available. Is that syntactic sugar needed? I don't think so.
538 A switch might be useful, but it doesn't need syntax.
539 Maybe the name '_' could be bound to the recent 'use'd value so
540 switch some_funct(...):
541 case _.tag = foo : ??
543 tagswitch some_funct(..):
544 case foo: print _.foo_name
545 that is probably cleaner.
546 So 'tagswitch' is a shorthand for
547 switch: _ = X; use _.tag:
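A Python model of the 'tagswitch' shorthand: bind the switched value once (the '_' of the notes), then select a case by its tag. Tagged structures are modelled as dictionaries with a "tag" key, which is an assumption made only for this sketch. The names `tagswitch` and `describe` and the field "bar_size" are hypothetical.

```python
def tagswitch(value, cases, otherwise=None):
    """Bind the value once, then dispatch on its tag."""
    underscore = value                                  # '_ = X'
    handler = cases.get(underscore["tag"], otherwise)   # 'use _.tag'
    if handler is None:
        raise KeyError(underscore["tag"])
    return handler(underscore)

def describe(node):
    # Each case sees the bound value, so the active fields are easy to reach.
    return tagswitch(node, {
        "foo": lambda _: "foo named " + _["foo_name"],
        "bar": lambda _: "bar sized " + str(_["bar_size"]),
    })
```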
549 8/ Operator choices - is this part of Expressions?
550 a else b // same as a ?: b
551 a if b else c // b ? a : c
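Python already spells the second form the same way, so it makes a convenient reference point. The sketch assumes 'a else b' means "a if it is truthy, otherwise b", which Python writes as `a or b` and which evaluates `a` only once, like `?:` in GNU C. The function names are hypothetical.

```python
def else_op(a, b):
    """'a else b': a when it is truthy, otherwise b."""
    return a or b

def if_else(a, b, c):
    """'a if b else c': the C expression 'b ? a : c'."""
    return a if b else c

display = else_op("", "anonymous")   # the empty string is falsy, so b is used
label = if_else("big", 10 > 3, "small")
```

Note that `else_op` and `if_else` evaluate all their arguments before the call. The built-in operators they wrap evaluate only the operand they select.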
555 I'd like to use ':', but not sure if it is being over-used for
559 declares a field to have the type, so
561 creates an integer called fred with value 27.
564 A struct "{field: type, field:type, ..}"
565 An array "[ length : type ]"
566 A procedure "(arg:type, arg:type -> result:type, result:type)"
567 A function "(arg:type, arg:type): result_type"
568 A tagged union in a struct
569 "{ struct fields, tag -> name:type, name:type; tag2->name:type ...}"
570 A borrowed pointer "*type".
571 An owner pointer "~type"
572 A counted pointer "+type"
573 A collected pointer "?type".
575 xx/ output formatting
577 10/ modules, packages, exports/imports and linkage.
581 strings. hash, file, regexp, trees, channels/pipes
584 - useful for tracing written in-language
585 unit tests and mock objects?