1 Design notes for a language - 2013 version
2 ===========================
7 A large part of design is compromise. Once the good ideas are formed,
8 they must then be blended nicely, and this blending always involves
11 So we need a clear philosophy to guide this compromise. This
12 philosophy does not serve to provide simple answers, but rather
13 provides some unifying themes that can but use to motivate, or at
14 least to justify, the answer.
17 The compiler needs maximal information so that it can detect errors
18 and optimise implementation.
20 The syntax needs to be clean and non-distracting so that a human
21 reader can easily see what the code is trying to say.
23 These two principles are clearly in conflict. One wants to add
24 more to the program, the other - less. Finding a balance between
25 these two for each design element is key to success. The best answer
26 will involve finding a syntax which provides information without
27 distraction. This will not always be possible (if ever) but is a
31 Language should enable a feature to be provided, rather than provide
33 This means that the feature can be in a library, and can be changed
34 using the language itself. This is inherently more flexible.
35 However if the language doesn't "know" about a feature, it is less
36 able to type-check and optimise it.
39 3/ Things that are different should look different.
40 This allows the compiler to correct silly mistakes more easily, and
41 reduces the chance of making them.
43 So trying to use just a few core concepts is a mistake. It muddies
44 human thinking and makes life harder for the compiler.
47 familiar is better than novel
48 mnemonic is better than obscure
49 short is better than long
50 consistent is better than irregular
51 pronounceable is better than unpronounceable
60 expressions or statements
62 But what form could we use for a case/switch? Maybe we shouldn't.
64 syntax of structure blocks
65 How types are constructed and described
66 - scalar: int, float, char, Bool, enum
67 - vector: zero based. with size
69 - tuples - how first-class are they?
70 - union/variant/alternate
71 - non-discriminated unions?
72 - lookup table, stack, queue,
73 - functions and methods
77 Does a "pointer to a pointer" make sense in a strongly statically
79 Should enum values be pervasive, or require typename: prefixes?
80 'import' should be allowed to import an enum namespace.
82 - pointer + length. What is the role of 'capacity' in Go?
83 formal parameters - in/out/inout? Just a struct.
84 are the names of parameters part of the type, or not.
85 Probably not. But names in struct are part of the type. So not
87 return values from function: 'return', or name=value
89 what does 'break' mean in a closure? or 'continue'? or 'return'.
90 I guess the closure includes implicit labels. _loop _func
91 So "return label value,value" could return from enclosing block named
95 case/typecase/match/switch
97 defer/finally/after...
99 - does it create a block: try BLOCK finally STUFF
100 - does it run on function exit
101 - can it attach to the lifetime of a value:
102 x = take_lock(semaphore)
103 unlock implicitly happens when x binding is broken.
104 printf - formatting, input/output in general
105 manifest constants of various types
106 What are strings? What operations are in the language?
107 What sorts of numbers are built in?
108 Sized ints, unlimited ints, float, fixed, imaginary, complex
109 Initialization and default values
113 If parameter is untyped, then it can still be useful if
114 any needed function (e.g. compare, add) are passed in as well.
115 These functions could be in a structure ... which could be a vtable.(virgil)
117 overloading of bindings.
118 - functions/methods with different arg types
119 - operators for different objects
120 - function with different first arg type is like a method
121 and can be useful for sin(), min() abs() etc.
122 Certain names could be declared as methods and converted to
123 method syntax when called.
125 - useful for validating array references
126 - useful for avoiding 'default' case in switch
127 - can be deduced from some arithmetic: 'mod' particularly
128 overriding of bindings
129 - relates to inheritance.
130 Nature of bindings: immutable, const, var, computed
131 target: script, jit, pre-compile, ...
133 memory allocation and management
134 static or dynamic typing
136 co-routines, generators
139 assignment semantics - references, copies
140 binding versus assignment: particular relevant with tuples
142 avoid redundancy, but redundancy can be good.
144 foreign function interface
145 pattern matching for destructuring
146 This is the general case for "tuple = tuple" with the question of
147 which bits on the left are literal and which are formal names.
148 This is only an issue if the appearance of a value starts like
149 an identifier "Some(value)" "None".
150 'unsafe' at different levels
152 - memory allocation/casting
155 can the language/compiler make it easier to trace code execution?
156 flag every time a field changes
162 structure embedding using "import struct" or "import fields from
165 function attributes: optimise_options, inline, ....
168 - go doesn't have them - relies on 'after'.
171 why is this so much fun? I never use it in C!!! or Python.
172 introspection in Go look weird and is probably related to lame type
175 nullable pointer: builtin functionality or via a union?
176 array indexing: should checks be explicit where not resolved by type?
177 operators: type must match. A constant can be one of several types.
178 iterate through options until expression is type-correct. If
179 there are several constants .... might need to be careful.
180 a = 1 << n '1' must match 'a'
189 Parsing of keywords and identifiers and comments are fairly
191 keywords and identifiers are [_\alpha][_\alpha\num]*
192 comments are //..EOL or /*..*/
193 symbols are longest string that is known
194 spaces separate as needed
195 NEWLINE and indent changes are tokens which are handled at the next
197 That leaves strings and numbers.
199 Strings bounded by '' or "" may not contain newlines. I would
200 rather they didn't contain delimeter either. \q and \Q can be used
201 if the quote is really need.
203 Need something that can contain newlines and is (almost) completely
204 literal. Everything upto the close must be included verbatim.
205 Go uses ``, Python use ''' '''
206 ''' is nice because it is more unique and less likely to want to be
207 in the string. But I would really like a multi-line string to be
208 completely verbatim. That would require a uniquifier at the start:
212 Also, does the newlines immediately after the ''' go into the
214 What about just before the closing '''??
217 "" and '' both enclose strings in which various \escapes are
218 interpreted and newlines are not permitted.
219 ''' must come at end of line and following text upto ''' one a
220 line by itself forms a string. Trailing newline is included. The
223 And what about numbers?
224 These are combinations of:
227 'e' or 'E' possibly followed by + or -
228 'x' or 'b' or 'o' for base
231 Maybe 'i' or 'j' to indicate 'imaginary'.
232 But what if I want polar co-ordinates?
233 Should I require 12 + Im(12)
235 A type would need to be able to declare that is can be
236 initialised by a number, with one of several suffixes?
238 So at tokenisation we accept any string starting with a digit,
239 containing [0-9._exboEXBO] or [-+] immediately after [Ee].
241 Note that decimals cannot start with '.', so 0.9 is needed
244 Also there are no length suffixes. We have untyped constants
245 like Go does. Though we could allow [iufl] as extra letter to
246 support that if we wanted to.
251 Pascal: array [0..5] of string;
252 C: string[5] (sort of)
255 Except for Pascal which is too verbose, it is hard to choose
256 between them. Putting 'string' at the end makes sense and
257 is more pronounceable (Pascal reads well but looks ugly).
258 But I find it hard to get used to the look of Go.
259 That probably leaves Rust which looks a bit like an array
260 constant: [ onestring, anotherstring, etc]
289 I prefer the ': type' of Pascal and Rust. I like semicolons to
290 separate as comma might separate names with same type (maybe)
294 would be consistent if ':' introduced the type, but that would
295 mean arrays should be "[]:string" which is unnecessary.
296 So maybe the ':' terminates the list of names. Then we need a
297 special separator: (arguments) -> (return type)
299 Do we want a name to introduce? "struct" "fun"/"fn"? If so we
300 should for consistency have array ... like Pascal.
303 { name:type, name:type} is a struct
304 [length:type ] is an array ([5:int] reads "array of 5 integers").
305 (from:type,from:type -> to:type, to:type) is a function
306 { tag -> name:type, name:type; tag2 -> name:type, name:type } is
309 That is really rather elegant. May be a bit too terse though, don't
312 'tags' get assigned globally unique numbers, so multiple tagged
313 sets can be embedded in another.
314 However if the first one is "tag = 0 -> ... ,", then zero etc
315 is assigned and embedding is more limited.
317 Does that just leave pointers?
318 - closures have same apparent type as functions
320 - tuples only have implicit types - unless they are structs or
321 function parameter/return
322 - vtables ? interfaces
325 Maybe just a leading '*'.
326 But do we want to distinguish among:
331 These are properties of the object in some cases, the pointer in
333 An object can be: stack, owned, counted, collected
334 A pointer can be: strong or weak.
335 But a strong pointer needs to know what to do when dropped:
336 - nothing (so may as well be weak)
338 - dec ref and maybe destroy/free
340 So a pointer needs to be one of:
341 borrowed, owned, counted, collected
343 Pointers to stack are always borrowed. Raw pointers are borrowed
344 too, but can only exist in unsafe code.
346 And vtables. These are usually created automatically from
347 method and interface definitions. So they don't need a syntax.
348 Interfaces do though. They are sets of named functions. So
349 they look just like structures but all the fields are functions.
350 In fact they *are* structures. They use embedding for
353 Pointers to different values are quite different.
354 - pointer to scalar or record is just an address
355 - pointer to fixed-sized array is just an address
356 - pointer to var-sized array is address plus size
357 - pointer to implementation is a pointer as above plus a pointer
359 This makes var-sized arrays with vtables a little awkward.
361 But is there really any value in allowing implementations on
362 var sized arrays? Yes, I think that would be nice. I guess
363 they get converted to struct for a vtable pointer, with the struct
364 stored ... in the 'once' space.
366 Should it be possible to add to a struct after it is defined?
367 This sort-of makes sense for unions where extra cases could be
370 With structures we can embed old in new. Maybe that way around is
372 This new union has all the value in the old one, plus some more
374 A value from the old union can be assigned to a variable of the
375 new union, but not viceversa without range checking.
376 How does multiple embedding work with unions? With an offset?
379 Initialisers for different types:
380 Array: [ value, value, num: value, ]
381 struct: { name: value }
382 union: { tag -> name:value, name:value
385 tuple: (value, value, value) - can map to several known types.
388 functions are more awkward. If the type is given explicitly as
389 def name : type = value
390 then we don't want to repeat any of it, but if we don't have an
391 explicit type, we need to at least list parameter names.
392 def foo = (arg, arg, arg) {
397 or assume '_' is the argument list
402 Real asymmetry. What is the inverse of 'return' ?
403 import? with? receive?
404 'receive' might work with co-routines? Maybe not.
407 Another thought about structs.
408 There are 2 sorts: Those which are used for external
409 communication and must be in exact order with no padding,
410 and those which are only used internally and the compiler is free
411 to reorder as it chooses. The second sort could usefully be
412 extended later. This would allow a literate programming style
413 where elements are added to the structure at much the same time
414 as the code which uses it.
415 External communication structures could be flagged as big-endian
416 or little-endian (or host-endian). or should each field have
417 an end? Probably not.
421 I don't really like the '*' prefix operator in C, nor the need for
422 '->' after pointers where '.' is enough after a struct.
423 Pointers to arrays behave the same as the array. It would be nice
424 if pointers to all things behaved just like the things.
426 There are some exceptions though:
428 this could make the pointer point to foo, or copy the contents
429 of 'foo' into the thing pointed to by 'p'.
430 Both are valid so we need a simple syntax to differentiate.
432 Could compare the addresses of the boxes for e.g. double-lock
433 deadlock avoidance, or could compare the values themselves.
435 Does this say "the same box" or "the boxes hold the same
436 values"? Both are necessary.
438 i.e. these are the things that you can do with a pointers: assign
440 if p.foo and p[foo] and p(foo) automatically dereference, what
441 syntax do I want for the occasional explicit dereference. Or do I
442 want a syntax to inhibit auto-dereference?
444 could assign to the box rather than assign a new box. Or
446 could assign to the box, while
448 could assign a new box. But pointers to pointers become
449 problematic.... or do they. If general the languages imposed
450 whatever referencing or dereferencing is needed to match types.
451 If & is present, then it skips up one level.
453 But parameter passing should be like assignment, and the default
454 should be by-reference if possible. So assignment should be
455 by-reference. So we need
459 or decide that annotations on the right are deduced based on type,
464 p < q should normally compare the values though.
465 p == q should compare pointers
467 If p is a pointer to pointer to int, then
469 means a double dereference on the else, which I think is wrong.
471 should alway cause p to point to something new.
473 can change the thing p points to. so
475 would be needed. That's a bit ugly.
477 does that help? It shows something odd is happening.
479 At least we are familiar with that.
481 As pointers are distinct from other values (e.g. they cannot have
482 methods), they could have a different assignment operator.
483 p = value # assigns the pointer
484 p := value # copies the value into p.
486 Maybe I'm trying too hard here. 'p' is the pointer. Any operation
487 on 'p' other than comparison dereferences it. That is ugly.
489 could be pointer comparison
492 like in 'crack': http://www.mindhog.net/~mmuller/projects/crack/Manual-0.3.html
494 Maybe "p." means "dereference as many times as needed".
495 So it never receives a pointer, only a non-pointer value.
496 To change a pointer you could
497 newp = thing where newp :int* = p
499 i.e. explicitly name the type of the value being copied.
502 Thought: assignment is different from function parameter binding,
503 as the latter can never mean a 'copy' if the target is a pointer.
507 There might be a bigger picture here. 'pointers' are very
508 different values to most others. They cannot be the receiver of
509 methods, only the thing they point to can. But what about pointers
510 to arrays aka slices. Can they be the receivers of methods?
511 Yes in exactly the same way that arrays can.
512 Which is like saying that 'pointers' can receive methods, but only
513 the methods that their targets receive.
514 This is a little awkward for slices because they need to carry a
515 parameter: the size. Unless we support parameters more broadly and
516 decide that any type might have arbitrary runtime parameters.
517 So a slice becomes a hidden object with a size and a pointer.
520 What I would like is:
522 compares the pointed-to values. It is exactly a.cmp(b) < 0
524 does that same: a.cmp(b) == 0
531 My biggest problem is that the mapping "x < y => a.lt(y)" combined
532 with the fact that field selection auto-dereferences, means that
533 "a < b" compare content. So "a == b" must also compare content.
534 So let's go for 'a === b' asks 'are these as indistinguishable as
536 And in 'unsafe-pointer' mode, a <<< b can compare pointers.
538 Given that, what does "a = b" do? It really should assign
540 So remaining question is how to assign values.
541 a = .... // always changes a, not what a points to
542 a. = .... // always changes exactly what a points to
543 a.. = ....// changes the thing pointed to by what a points to.
546 This is like "swap a, b" in a way.
548 Getting the 'type' of that would be interesting. owned? borrowed?
551 Structures can be internal (no implied order or padding) or
552 external (order and alignment rules apply, endian can be
554 Unions can define a local enum, or use a predefined enum (Boolean
556 For unions, the enum might be implicitly stored or a stage
557 parameter, or it might be an explicit structure member.
558 Do we want names for these things like 'struct' in C, or do we want
559 to use 'traits' to distinguish, or something implicit?
562 In C, the enum values become global names and the variable has one
563 of those values, which are integers.
564 In Go, enum values are local to the type: type.foo type.bar.
565 In Rust, the values are global function names which can return
566 a suitably qualified union value.
568 I like namespace control. I also like enums being tied closely to
569 unions. So if an enum is actually a union, then ??
570 If 'v' is a variable, a 'e' is an enumerate value with member x,y
571 v can be compared with typename.e
573 v.e.x and v.e.y are only accessible if v is known to be e
575 or v.e could be one of these things that is a value or nothing, and
576 can be used with 'else'.
577 If there is no ambiguity, v.x is valid and can be used if v.e is
578 known. Or can be used with an implicit test of v.e
580 is implicitly "if v.x && v.x==3"
585 The compiler understands 2 types of pointers: strong and weak.
586 Weak pointers are held in the context of some strong pointer. The
587 strong pointer may be to the same object, or it maybe to some
588 container object or allocation domain which is known not to free
589 anything for the time being.
590 Weak pointers can only be stored in life-limited variables which
591 are subordinate in lifetime to the strong reference. This includes
592 local variables in a lower scope, and variables inside the
593 strong-referenced value.
595 Strong pointers can be moved, copied, or destroyed.
596 When moved, the previous storage must become invalid.
597 When copied, the storage manager much be told. This might
598 copy the object, add a refcount, or do nothing and leave it
599 up to mark/sweep garbage collection.
600 When destroyed, the storage manager again must know. It
601 might decrement the reference count, free the object, or again
604 "same object" is a fairly vague term which needs to be made
606 It could be a structure - holding two pointers, one strong and one
607 weak. If they were to the same type that would be pointless. If
608 the strong were to a container and the weak were to an object that
609 is somehow 'inside' that container, it would make sense.
610 It could also be a linked collection of structures such as a linked
611 list or a binary tree. In order for the language to be able to
612 "reason" about such a thing it would be necessary for it to "know"
613 about such a thing, so there would need to be a way to describe
614 it. There need to be rules for when two structures (including
615 arrays) are part of the 'same' object and when they are part of
616 different objects. We also need to know if these things are
617 hierarchical: whether something can be a 'sub-object'.
619 This important rule is that all weak references in an object must
620 be invalidated before or as the superior strong reference is
622 This has many interesting cases, one of which is the doubly linked
623 list. In such a list the forward link might be seen as strong, so
624 the backward link would need to be seen as weak. It is a
625 distinguished weak link as it can always be found from the object.
626 So to move the strong reference, we need to be sure all weak
627 references are gone. That doesn't mean we need to invalidate all
628 weak references in the whole object, as we know how to get to the
630 So it seems some weak references as 'findable' - probably in
631 various ways. The language needs to know if an object can
633 - no weak references (so moving strong references is trivial)
634 - findable weak references (so moving a strong reference just
635 requires focused removal), or
636 - general weak references (so moving a strong reference requires
637 invalidating all references)
639 That case of doubly linked lists is also interesting as a structure
640 can be on multiple lists. This can be managed in a couple of ways:
641 - refcount the structure so it can have multiple strong links for
643 - have one list subordinate to another so that the struct must be
644 removed from the subordinate one (or verified not present in)
645 before being removed from primary one. This is probably the
646 sort assertion that would need to be specified in the language
649 When we have a weak reference stored in a structure we need to be
650 able to explain why it is safe. And when we remove a strong
651 reference, we need to explain why we know that all dependant weak
652 references are known to be resolved.
654 Attributes of things.
655 Different fields, locals, globals etc may have different
656 attributes beyond their type, or maybe as part of their type.
657 Some of these can change during the lifetime of the name or value.
658 I need to collect them all together to try to understand them.
660 constant - a value assigned (or bound) once and never changed.
661 if it is scoped then it is bound once whenever the scope is
662 entered though possibly to a different value each time.
663 A name might get reused within a scope if it is
664 only assigned when dead. e.g.
672 here 'a' has been reused as the foo' value dies
673 after the first print (it can never be seen again).
674 This 'a' could still be 'constant'
676 There is a question here: when would 'foo' be
677 destroyed if we had destructors? It feels like it
678 should be destroyed as soon as it dies. However if
679 it holds a lock or other resource, we might want it
680 to only be destroyed when it goes out of scope.
682 stable - This is like 'constant' but applies to a value
683 rather than a name. It means the value (which
684 might be a large data structure) will not change
685 for the moment. 'stable' needs to be asserted
686 while being the sole owner of a value.
687 add-only- This too applies to a value. It says that other
688 values can be added to the main value, but not
690 This only applies to values included by dependant
692 readonly- This attribute of a borrowed reference means that
693 it will only be used to read. The value might
694 change, but this reference won't be used to change
696 A value can only become stable while all borrowed
697 references are readonly.
698 mutable - This is maybe just the absence of the above.
699 A mutable field can be changed as long as we have a
700 mutable pointer to the structure.
701 A mutable local variable can be updated when not
702 dead and can possibly have it's address taken.
704 owned - This is a strong reference - when it dies the
706 Maybe we can have 'indirect' references through
707 e.g. a refcounter. Then when the ref dies, the
708 refcounter is decremented. And if it dies, the
710 borrowed - A weak reference. Must be in the context of some
711 strong (owned) reference to something. The strong
712 reference might be to the same thing, or it might
713 be to a container object which is add-only
714 dependant- A reference in a structure cannot be borrowed but
715 it can be 'dependant'. This means that the
716 referred value must exist somewhere else.... need
721 I think we will need user-defined attributes. These record
722 important aspects of the state of a value. These can be
723 asserted by unsafe code, and required by functions etc and the
724 compiler can check that requirements are always met.
725 So a list_head might have an 'empty' flag which is cleared by
726 'list_add' and has a known value after the "list_empty" test.
727 The 'stable' flag in one data structure might be implied by
728 the 'held' flag in a lock.
730 Integers could have a 'non-zero' flag which allows them to be
731 the divisor. They might also have an upper bound - like
733 I sometimes "know" that a number is a power of two, which is
736 A similar flag would apply to pointers which may, or may not, be NULL.
739 I don't like typedef in C. It hides stuff.
740 I see a type name and I don't immediately know if it is a
741 pointer or a structure.
742 Structures need names as do enums and unions.
743 Pointers don't need names.
744 Scalars (uint32 etc) have names and don't need more, unless
745 some sort of 'unit' concert were included.
746 What about arrays? That are like pointers so maybe not.
747 But array contains a size. It is like a record in that it
748 combines to things - member type and size. Is that enough to
749 justify a name? I suspect not, so in the first instance, only
750 things with names can have names.
754 Parametric polymorphism is quite important.
755 Various types can be parameterised by other types. This
756 sets the types of fields in structs and members of arrays etc.
757 Also want values as parameters for the size of arrays.
758 And internal parameterisation so the size of an array can match
759 some int field, or an enum can select an option.
761 Functions/methods are very important here. The type
762 parameters may be implied by the actuals. This then needs
763 to imply the types of the results.
764 Each result can be an intersection?? Or maybe it can be
765 a type-function of the parameters - but do we allow
768 A parametric type is like a function. It needs a name and
770 type foo<a:comparable,b:bucket<a>> = {
774 A method for a parametric type is then a parameteric-typed
776 interface foo<a,b> = {
777 fn foodo(first:a, second:a -> bool)
779 methods might want an extra type parameter, 'self'.
780 Using this would require the compiler to know that two objects
781 didn't just conform to the same interfaces, but were of the
782 same type. Shouldn't be too hard.
784 But we also want a function to be parameterised.
785 choose<a,b>(which:Bool, first:a, second:b -> a|b)
787 choose<a:A, b:A>(which:Bool, first:a, second:b ->
790 When this is called, the actual parameters have a list of types,
791 including any interfaces. Each use of a type-parm intersects
792 possible types until we have minimum set. Then results is
793 a possible list of types... needs to be the intersections..
797 Interfaces list methods that can be called on a value.
798 Often we want an interface with a single method. This is
799 particularly useful for the methods that operators map to,
800 but there are lots of others. Constructor, destructor etc
802 Go effectively does this as you only need to declare the
803 methods that you will use - the effective type is the sum
805 One advantage is you don't need two names for everything
806 Comparable:compare, addable:add, destroyable:destroy.
808 But it can be good documentation to collect related
809 methods together into an 'interface'. It can also provide
812 method<A> foo(self:A, b:int -> bob:rune);
814 This declares a method which can then be realised for
815 different concrete types.
820 declares the same but "bar.foo", not "foo".
821 Maybe syntax should be more similar.
822 interface foo(self:Self: b:int -> bob:rune); ??
824 But now we have a preference for polluting the global
826 If I want a method that is just for my type, it should not
827 require an interface. I guess:
828 method mytype.mymethod(thing, thing -> thong)
830 method foo.bar<mytype>(...)
831 is an interface method.
832 impl foo for mytype {
835 can save some typing..
837 Every type effectively defines an interface which applies only
838 to itself using its name.
840 Syntax is still a bit ugly. I don't want to assume 'self'
841 is the parameter name, and I don't want it to appear as
842 first arg. So where do I put it?
844 func (name Type) methodname (parameters) (result)
845 Which works because all methodnames are global ... sort of.
846 I want two methodname namespaces:
848 - inherited from an interface. I could use
850 func (name Type) .compare(other Type)
852 interface .compare<T>(other T)
854 Another thought. I sometimes want an object to respond to
855 the same interface in two different ways. A bit like being on two lists.
856 I might have a table with byname and bynum lookup methods which both
858 mytable.byname['fred'] = entry
859 print mytable.byid[42].name
860 Sometimes ".foo" is an attribute, sometimes a view...
862 So an interface can be bound to the object, or to a name in the object.
864 Supposing that I thought this was a good idea, how would it
866 The general idea is you keep track of what memory is allocated
867 in what sized blocks (as you have to for 'free' anyway) and
868 keep one spare bit for each piece of memory.
871 - clear all those bits.
872 - scan all allocated memory looking for anything that
873 might be a pointer into the target area, and
874 setting the relevant bit.
875 - when you set a bit, you scan that memory as well
876 - when finished, any memory which doesn't have the bit
877 set is ipso-facto free.
879 Variations involve breaking the heap up into sections,
880 possible with different age properties, and only cleaning
881 one section at a time. That might mean keeping a record of
882 all pointers that cross a boundary from one section to
885 Another variation involves copying boxes to make free space
886 more compact. This requires finding all inward pointers.
889 It is really hard to see this as a valuable thing. The
890 programmer should know what is going on in general, and the
891 language should be able to track and enforce proper handling.
893 It seems useful for language which encourage lots of
894 short-lived objects, e.g. for string and list manipulations.
895 The language should be able to track these, free them when
896 no-longer needed, and copy them to a more stable area when
897 they are made more permanent.
899 I think I'm going to have to write some code and see what
904 When are two types "the same" ?
905 Some languages use "same name". Others use "same structure".
906 Both have their appeal.
907 Go uses structure unless both types have names - then it requires
909 I wonder if "int" is a name. Probably it is.
910 I'm cautious of structural equivalence of structures. The 'names'
911 of the fields are important and I don't think two fields are the
912 same just because they are spelt the same.
913 'arrays' don't have names, so structure works there.
914 Parameterised types should match structurally too - array is just a
915 parameterised type. Ditto for pointers.
916 Two structures declared separately are different though, even if
917 the names and types are the same.
918 What about functions: Are the parameter/return names part of the
919 type, or are they merely internal name bindings? They look like
920 structures but aren't usually used as structured. Yet when arg
921 lists are long it could make code more readable to tag some params.
922 I guess we fall back on "unnamed things match named things".
923 If both function declarations have names, they must be the same
924 declaration. If either doesn't they match on structure.
926 A manifest value doesn't have a type. So if I say:
928 a = { foo=1, bar=2 };
929 then there is no type. That value could be of any type for which
930 it is a match. If the type of 'a' is known, it must match. If
931 not, the binding is illegal. There is some finite list of types
932 which this could match and we effectively try them all until one is
933 type-legal, or we run out.
937 casting numbers to larger types is probably safe??
938 What about casting a Foo to a union which has Foo as one branch.
939 This can be a compile-time rewritten to a simple union creation.
941 Expressions or statements?
942 i.e. do I want to be able to say:
943 a = if x then b else c fi
944 so a becomes 'b' or 'c'.
945 I tend to think not. if/then/else/fi is a statement and making
946 it an expression as well can be confusing and make it hard for the
947 compiler to determine what your mistake is.
948 This also applies to 'switch/case' but not to while/for.
952 but that doesn't generalise to switch very well. Does that matter?
956 which is quite different to the reader - both human and machine.
965 Should this just be syntactic sugar over if/then/else.
966 I don't like the idea of typecase or Rust:match as it
967 provides functionality that is 'only' in the case statement,
968 not stand alone. To that extent I like Go:type assertion as
971 But what about fall-through. It is implicit in C and explicit
972 in Go with yukky rules.
973 Why do we find that we need effective 'gotos' there, but not
975 Maybe it is like dropping an else.
988 No, it isn't quite like that.
990 switch (num_bits) { /* fall through... */
991 case 8: if (byte & 128) num++;
992 case 7: if (byte & 64) num++;
993 case 6: if (byte & 32) num++;
994 case 5: if (byte & 16) num++;
995 case 4: if (byte & 8) num++;
996 case 3: if (byte & 4) num++;
997 case 2: if (byte & 2) num++;
998 case 1: if (byte & 1) num++;
1002 readdir uses it: switch is on 'state' and we might handle
1003 several consecutive states.
1005 case TypeSegmentMap:
1007 /* These only get merged while in a checkpoint. */
1008 if (fs->qphase == fs->phase)
1010 printk("Suitably in-phase\n");
1015 one case is a special-case of another. The enum
1016 doesn't show full structure. Really want to repeat
1017 the TypeSegmentMap an TypeQuota with TypeFile etc.
1021 watermark += RELEASE_RESERVED * fs->max_segment;
1024 watermark += ACCOUNT_RESERVED * fs->max_segment;
1028 /* Definitely no water mark here. */
1032 Again, there is a progression of values, like the first case.
1035 1/ state-machine and we handle a state and move on
1036 2/ values have a specific/general relationship.
1037 and probably other less general things.
1040 Maybe "switch" chooses just one branch, while "select" chooses
1041 all relevant branches? Or something
1044 Break / continue / return
1045 Do these have labels? I almost never need labels. But I
1046 often use "continue" inside a 'switch' to break out and
1047 continue the surrounding loop. A 'break' from nested 'for'
1050 But labels are ugly - how are they indented? Or are they part
1051 of the 'loop' syntax?
1052 Sometimes I want to break out of an 'if' so I make it a
1053 'while', which looks odd.
1054 But breaking out of an 'if' would mean that to conditionally
1055 break a while would need a label. But break from 'if' is
1056 also conditional, so maybe not needed.... gets ugly.
1058 The label could be implicit. 'break if' ???
1060 This is really a controlled 'goto'. We can go to:
1061 - end of block. 'continue' for while, 'break' for switch
1062 - out of block. 'break' for while', 'break' for switch
1064 What if we tagged blocks as being 'breakable'? Too
1067 What about "break variable" which leave the scope of the
1069 This would be hard to break out of a while loop.
1070 Or "next i" would continue a for loop
1071 Or any variable bound once in the loop.
1072 then "last i" could break out. or 'done' or 'have' or return!
1079 Anonymous/blank variable.
1080 Go uses '_' as a 'blank' identifier. It can be assigned or bound
1081 to anything which evaluated the thing but ignores it. This makes
1083 For 'switch' it might be nice to have a similar thing which holds
1084 the value which is being switched on.
1085 If the name is used, the value must be 'true', else it must match
1086 the value. This is a rather odd semantic. And what is the scope
1087 of the name? Including blocks within switch would be useful, but
1088 then nested switches redefine a name, which is less good.
1090 So maybe it should be
1092 case _1 == 3: whatever
1093 case _1 == 4: otherthing
1097 { curly brackets are common, but a bit ugly }
1098 BEING begin/end is rather verbose END
1099 COMMENT ending with reverse-spelled open is cute but silly TNEMMOC
1101 indenting stuff after a colon is really cool
1102 and works rather well I think.
1103 It doesn't easily allow do { } while(); constructs, though
1109 There is no harm in allowing a '{' instead of the ':', and then
1110 ending when a '}' is found.
1113 Indenting.... how far can I push this?
1114 - two consecutive non-empty lines must have compatible indents
1115 meaning the strings of leading space/tabs are either identical,
1116 or one is a prefix of the other. Anything else is an error.
1118 If the second is longer, then second line continues the first.
1119 If they are the same, the first line is terminated if that make
1121 If the second is shorter it must match an earlier indent and
1122 all parse back to there is terminated.
1123 This is all much like python
1125 - Can we require that smaller units much be indented correctly.
1126 i.e. if a nonterminal starts half way into a line and wraps,
1127 then the next line must indent the same depth:
1130 for a fairly ugly example. i.e. the cfactor is still open, so
1131 the next line must be indented to the start of the cfactor.
1132 This isn't needed for syntax but avoids confusing looking code.
1133 Need to be careful to allow opening bracket at end of line.
1134 i.e. if the only part of the nonterminal is a literal, then that
1135 non-terminal doesn't count.
1137 A "pragma" or "safety feature" would select among:
1139 - indent for termination.
1142 Go uses newlines without worrying about indents. I don't think I
1143 like this. Indents are important sometimes and newlines aren't
1146 So yes: options. which specify tab size for 'check all indents'.
1149 I seem to be using the humble ':' too much:
1150 1/ The conditional expression a?b:c
1151 2/ The indented block opener
1154 3/ path separator for modules
1156 4/ To introduce a type:
1157 var foo : type = value;
1161 if a == b ? io:printf() : print='yes':
1162 Then I could get confused.
1163 The first ':' can't be the block opener because the ?: isn't
1165 But the second ':' could introduce the global module :print
1166 So ?: doesn't conflict with if:, but we can't use ':' for modules.
1167 So either use '::' like Rust and C++, or '.' like other namespaces.
1168 :: is very noisy. Maybe that is good?
1169 . is very quite and make the lhs look like a value, not a module.
1170 Does that matter? Different things should look different.
1172 Large scale structure:
1173 Key elements of a program are:
1175 these define types, methods, functions,
1176 and give names to modules .. and other things
1178 These always live in functions/methods I think.
1180 These mostly live in statements or some declarations.
1182 So a program should be all declarations. But that isn't much fun
1183 for "scripts". Or would scripts look better if they were forced to
1184 keep code together with a low-syntax 'main' function.
1189 declarations start with the thing being declared:
1190 func type mod const var impl
1191 each is followed by a name and a value - maybe.
1192 Not sure where imports fit in
1193 import name-to-import 'path-to-module'
1197 Assignments/bindings.
1198 There are 3 things one might want to say:
1199 1/ This new name is bound to this value for the rest of
1201 2/ This new name is bound to this value now, but it
1202 might get changed later in this scope
1203 3/ This name is already present in this scope and I'm
1204 now changing the value
1205 We could use 'var' 'let' or 'set' or 'const' or 'let mut' or 'with'
1206 to introduce one or another.
1207 We could use "=" vs ":=" for different assignments.
1208 We could not care about different between 2 and 3
1209 We could use a sigil to differentiate 'new' from 'old'.
1210 In a tuple "a,b,c = 1,2,3" we could conceivably have all 3.
1211 We could add :type to name to introduce new name.
1213 a = b // only allowed if 'a' already defined and mutable
1214 a: = b // defines new 'a'
1216 DON'T KNOW about constants....
1218 We could also distinguish "this is a default value" and "this
1219 is an initial value" where the former can be replaced without
1220 use. Probably not useful though.
1222 Least typing should be "new constant name". Next should be
1223 "reused mutable name". Most typing for "new mutable name".
1225 += -= etc mean pre-existing mutable. So := should too?
1227 I really don't like the
1229 thing. Having different bindings has got to be ugly, whether
1230 we make them so they are visually ugly, or assume them so
1231 they are conceptually ugly.
1232 a:=, b+=, c= function()
1233 a:, b+, c = function()
1237 need to know if a[xx] has been assigned yet?
1238 If we don't want to pre-initialise arrays, then probably yes.
1239 For 'hash' arrays, there might be requirements about checking
1242 I like: ":=" means "replace current value".
1243 So '=' means 'assign initial value. But how do we indicate
1245 Could require 'var' in front?
1247 var (a, b, c) = function()
1248 (var a), b+, c: = function()
1249 a:, b+, var c = function()
1252 New thought: The matching idea from Rust could make
1253 interesting assignment.
1255 could always be made true by binding 'fred' to 20.
1256 if action(state, NEWLINE) = REDUCE(production)
1257 will fail if the action is SHIFT, or bind production
1258 if the action is a REDUCE.
1259 if (a,b,c) = (1,2,3)
1260 if a and b are already bound but c is not, this tests
1261 a and b and if they match 1 and 2, binds c to 3 and continues.
1262 Probably need a sigil on the 'c' or 'production'. '?'
1264 if action(state, NEWLINE) = REDUCE(production?)
1266 REDUCE(production?) :
1270 This 'binding' is very different to assigning to an array
1271 element or a struct field.
1274 There could be quite a lot of different assignment-like
1277 - 'global' constants which have a fixed value, at least for
1278 the invocation of a program
1279 - 'fields', both in structures and stack frames, which can be
1280 updated and have their address taken, so are mutable
1281 - 'constants' which are assigned just once in their scope -
1282 though possibly in multiple code locations due to branches
1283 if foo() { a = 1} else {a = 2}
1284 'a' could be a constant
1285 - 'reused constants' - one name which is used repeatedly but
1286 each time it is really a different variable. There are
1287 clear points before each 'assignment' were it can be
1288 declared 'invalid'. e.g. a 'status' return value that is
1289 only used for immediate operations.
1290 These are never assigned while 'live'. So
1291 a = 1; if b {a = 2}; print a
1292 is not an option as the value from 'a=1' could be printed
1293 and so is live during the if statement.
1294 These can even be 'updated'.
1295 a = 1; a = 2*a; print a;
1296 Semantics are identical to
1297 a = 1; A = 2*a; print A;
1298 so it is a reused-constant.
1299 A for-loop control valuable should be a reused constant.
1300 - 'variable'. This is a value that can be changed in
1301 different ways on different paths. It is conceptually one
1302 value that changes. Possibly an accumulator
1305 - a name (local variable)
1306 - a value.name (field)
1307 - a value[expression] (array slot)
1308 Can we also assign to a function call?
1310 That would mean that 'foo' returns an lvalue. That is not
1313 But I'm still not happy with how lvalues work.
1315 Does that make 'a' point to the thing or does it copy the
1317 I think it that form it must make 'a' the lvalue.
1319 would copy the content.
1320 If 'a' and 'b' are pointers to pointers to integers, then
1324 would all do different things in C. Is there any other way to
1325 achieve those same things with less syntax?
1326 Type enforcement can make the '*' on the right redundant.
1328 How do I move a reference, as opposed to borrow a reference?
1329 Or copy an object? In Rust it depends one the type. If there
1330 are mutable fields or owning pointers, it cannot be copied so
1331 it is moved. '&' is used to borrow a reference.
1332 I'm not sold on the type-dependant answer. I think the syntax
1333 should make is obvious.
1335 I think I want syntax for "take the value and replace with
1338 then we could swap with
1340 But that is a comparison! Could use "<-" but it doesn't look
1341 like '='. a =< b ?? Then I could pass "<b" to a function
1342 to move out the value.
1344 "move a value", "copy a value", "borrow a reference to a value"
1347 "receive a .... what?
1349 Pointers really are quite special so we need to be careful how
1350 they move. Given the specialness, what can we say about
1351 pointers to pointers? I think they must always be borrowed.
1355 a,b,c = function(args)
1356 is a bit ugly because we might want to do different things
1357 with the returned values, and we might have trouble
1358 remembering their names. However I don't mind
1360 as it is just a short-hand: except for 'swap'.
1361 So we could do something totally different:
1363 a,b,c = r.foo, r.bar, r.baz
1364 where foo, bar, baz are the names of the return values in the
1366 Usually you would just use the r.foo etc as needed.
1367 Alternately something like:
1368 import function(args):
1373 The requirement of no shadowing mean we have to be careful
1374 with names, and probably allow:
1375 import function(args) as result, errors
1376 The syntax is a bit horrible but I'm warm to the concept.
1377 It is like the "with" statement in Pascal.
1381 Pattern matching seems cool, particularly for variant records,
1382 but also for popping things off arrays:
1383 while ra = [first, rest]:
1386 which requires the pattern to match a single-element list.
1387 while first = ra.pop():
1389 A pattern match seems to be a short hand for multiple tests
1390 and some assignments:
1391 if ra = Cons(a, Cons(b, _))
1393 if ra.type == Cons && ra.next.type == Cons:
1394 a, b = ra.first, ra.next.first
1395 We could achieve the same brevity with
1396 if a,b ~= ra.first, ra.next.first
1397 which is expected to fail if the runtime types are wrong.
1402 Matching is a lot like regexp matching (surprised?)
1403 But the language knows about it (like perl and regexp).
1404 However variables have to match whatever is there, inside the
1405 given context. They cannot specify what to match.
1406 You can have "Cons(a,b) where a < b", but you cannot do
1407 "string matches a[]+b[]" where a in hash"
1408 i.e the common match syntax cannot let a variable match a
1409 range from components from which a choice must be made.
1413 Constants are numbers or strings or declared constant words.
1414 Numbers and strings can be tagged with a single letter suffix.
1415 A type may 'claim' certain suffixes, so "complex" could claim
1416 'i' on numbers, and "regexp" could claim "r" on strings.
1417 They can also claim unadorned constants, but then only one
1418 non-internal type can be type-correct.
1420 'e' and 'p' are used for exponent and power and aren't
1421 allowed as number suffixes.
1423 Types can provide constant functions which are evaluated
1424 entirely at compile time.
1426 "Ceylon" allows iso suffixes: k M G T P for big,
1427 m u n p f for small. It's cute but I think I prefer 'eN' suffixes:
1428 k == e3 M == e6 etc.
1430 If we allow a constant to have a suffix like 'r' for strings
1431 or j, i, th for complex numbers, how does a type specify that
1432 is can use that? It needs to be an initialisation function.
1433 "I can be a string if 'r'" "I can be a number if 'th'".
1435 This is an implementation for ''r or 0th
1436 implement new<regexp> for ''r {
1437 typify(n:''r -> regexp)
1440 The interface 'n' is a parameterised interface
1441 interface new<result> {
1442 typify(n:self -> r:result)
1445 These need to be 'constant' or 'referentially transparent'
1446 or 'pure' functions that can be evaluated at compile time.
1449 while loops are easy-ish. What about list loop and more
1450 generic things. 'foreach' etc.
1451 Co-routines can have a function which returns successive
1452 values, via 'yield' or lazy evaluation.
1453 Or a closure can be passed to a looping function.
1454 Or a 'iteration object' can be returned on which we call 'next'
1456 Loops: (repeating myself?)
1457 So many possibilities
1458 for loop while repeat/until foreach etc etc
1459 Really like exit in the middle
1466 For loops like to initialise and increment:
1472 foreach loops need a variable(set) and a list of values, which
1473 might be generated lazily.
1474 Alternate to lazy generation, could pass the body as a closure
1475 to an 'each' function like Rust does.
1476 Lazy generation could involve a closure being returned from a
1477 function. The closure would need some state, so the 'return'
1478 couldn't discard it completely. These could be 'inout'
1481 How to address the need for "search and perform special case if
1483 for each possibility
1487 Do this if loop completed normally
1489 Do this if loop aborted.
1492 for each possibility:
1496 handle not-found case.
1498 http://www.scribd.com/doc/38873257/Knuth-1974-Structured-Programming-With-Go-to-Statements
1499 talks about 'events'
1501 loop until event1, event2, ...
1508 This is a lot like exceptions. However 'until' has a different feel to it than 'try'.
1509 'use' is better than 'raise'.
1511 I like this as a generisation of switch:
1520 All 'use' values must be either unique identifiers or
1521 expressions with values that have a common type.
1522 If unknown identifiers, then they comprrise values in a
1524 The labels in the 'then' part are value of that type.
1525 A value *can* appear twice with some annotation(?) in which
1526 case all relevant branches are taken.
1527 Maybe the notation is 'next' at the end of the statement
1529 If there is no 'next', then the labels cannot appear again.
1530 Actually we allow 'next' and 'last' if and only if at least
1531 one label occurs again. If there is any next/last, then there
1532 must be one at the end of the block.
1536 Go has nice feature where you can perform an action before an 'if'
1538 if a = something(); a > 3 { use a }
1540 This is particularly useful in elseif constructs
1543 else if b = somethingelse(); b < 5 { use b }
1546 without the if action, we need an extra level of indenting.
1549 b = something else();
1553 But only a simple statement is allowed.
1556 with: a = something()
1559 else with b = somethingelse():
1563 Looks a bit clumsy and probably ambiguous. Maybe:
1569 no, that is much worse.
1571 We can make "with stuff" unambiguous by requiring a second body
1573 commands that can start new scope
1577 Another idea. We don't allow assignments in expressions, but
1578 we allow a "where" clause at the end which contains a list of
1579 bindings. The expressions are only evaluated where needed:
1580 if a < 5 && b > 6 where b := n/2:
1582 will not divide n by 2 if a >= 5.
1583 This means that list can only be bindings, not general statements.
1584 If they are new bindings, how are they scoped? At least to the
1585 bodies of the if or while and the rest of the expression.
1586 If a binding were optional, it couldn't carry through:
1587 if a < 5 || b > 6 where b := n/2:
1588 if a<5, b would not be bound, but it would be bound in the 'else'.
1589 This lazy evaluation could work with a prefixed "with" clause too.
1590 Would assignments be OK? I think they could be confusing.
1591 Lazy evaluated binding could allow two different paths which want
1592 to evaluate at different times to just use one piece of source code.
1601 function block with 'return'
1605 Maybe 'return' looks a bit heavy and a synonym might be better.
1606 yield? use? done? I like 'use' for now.
1607 The 'if' is one big scope, so binding created in the function
1608 block remain in the 'then' statements.
1619 How would that look with {}?
1631 The same syntax could allow conditional expressions:
1636 though that is rather verbose and part of the appear of ?: is brevity.
1639 If do want statements in the condtion of 'if' and 'while', optionally.
1648 An expression block may contain "use" directives: "use True".
1649 They act a bit like a "return" - we have the answer.
1650 For while, it means that we can avoid "do {} while" as a separate case
1656 We may also be able to avoid 'break' in some cases with "use False"
1657 at arbitrary places. 'continue' isn't quite so easy, though a whileswtch
1660 In a 'switch' we can "use" values other than True or False.
1661 If a name is used which isn't defined, then it is assumed to be a new
1662 enum, so all 'use' statements must use undefined names, and all
1663 cases much use the same set of names.
1667 case do_it: print "Hello"
1668 case skip_it: abort()
1670 case names can appear multiple times with mcase(?)
1673 case 8: if byte & 128 {num += 1}
1674 case 7, 8: if byte & 64 {num += 1}
1675 case 6..8: if byte & 32 {num += 1}
1676 case 5..8: if byte & 16 {num += 1}
1677 case 4..8: if byte & 8 {num += 1}
1678 case 3..8: if byte & 4 {num += 1}
1679 case 2..8: if byte & 2 {num += 1}
1680 case 1..8: if byte & 1 {num += 1}
1684 "last i" exits the enclosing loop that modifies 'i'
1697 for assignment ; condition ; statement:
1700 foreach varlist = object
1702 So a general while statement is
1703 for initialisation (can introduce variables)
1709 First do 'for', then 'condtion'.
1710 if 'condition' is true, do 'do' and 'then' and try cond again
1711 if 'condition' ever false, do 'else' and stop.
1714 'if' and 'switch' are much the same
1715 'then' and 'case true' ditto
1716 'else' and 'case false' as well
1717 'while' case be followed be cases as well as by 'else'.
1719 With a 'for' I want a close to run if we didn't 'break'.
1720 I think 'else' does that.
1722 Do I want to use "else" for "default" in case?
1723 "then" happens if "use true" or no "use" value
1724 "case value" happens if "use value"
1725 "else" happens if "use false" or "use value" with no match.
1729 Only values with Interface 'printable' can be printed.
1730 The function is passed a width, a precision, and stuff.
1731 So string can just have '%v' for a value, or '%20l' for a left
1733 But what about '*' parameters - and checking there are the right
1735 Could allow a function to have a compile-time preliminary section
1736 which can throw errors and possibly infer types.
1737 This probably requires there to be compile-time functions which
1738 run fully at compile time. Though any functions that can, should
1739 unless tagged run-time-only.
1740 Function must be able to either flag a compile time errors.
1741 At runtime this is ignored, so a runtime error or coping strategy
1744 So: function runs until non-constant is needed, or heap change etc.
1745 If no compile error triggered, all runs again at run time.
1746 But for libraries to work, the compile-time part must be easily
1747 available in the library without the need to rebuild the object code.
1749 This doesn't really answer the problem..
1752 print "%v", foo.fmt(20,0,'l')
1754 It would be nice to have curried functions here so that the 'fmt'
1755 function could take a stream from which it could get a buffer to fill.
1756 Or foo.fmt could return a function which acts on a stream ... that might be
1759 Exceptions and errors.
1760 Errors need to be easy to catch, but even that isn't really enough.
1761 They need to be hard to not-catch.
1762 So any possible error must be caught or explicitly passed over.
1763 That suggests that the type of a method should include the list of
1764 exceptions that can be returned.
1766 "try" is really pointless. An 'except' clause can simply be
1767 available to every statement. Maybe spell it 'catch'.
1772 catch file_not_found:
1781 } catch file_not_found {
1785 A 'catch' can occur anywhere. It will catch any unhandled errors
1786 that happen before the location. So:
1788 catch file_not_found:
1793 One thing the 'try' does provide is a synch point. in 'catch' we
1794 know that everything upto 'try' succeeded, so names bound there are
1795 certain to be bound.
1796 Without 'try' we only know that the current scope was entered, and
1797 that any code since the last equivalent catch which could not have
1798 thrown the error must have been executed.
1799 This is probably enough providing that 'equivalent catch' is well
1801 At any point in the code there is a set of possible exceptions.
1802 A 'catch' subtracts from the set and any op that can raise adds to
1804 Within a 'catch' we know that any that happened before these
1805 exceptions last became possible has happened.
1807 This requires that exceptions can live in a set. Maybe there is a
1808 big discriminated union of exceptions? Or an hierarchy of them?
1809 I think an inheritance tree .. though that is hard to merge. i.e
1810 if two modules each declare some new exceptions ... not a big deal.
1811 We know the type of the exception set for each imported module.
1812 But wait: If I import two modules and want to pass up exceptions
1813 inherited from either, what is my exception type? I either need to
1814 offset enums when inheriting them, which is messy, or have
1815 globally unique enum values, which are like type names.
1819 catch exception_name(parameters):
1820 that only applies to previous operations that could return an
1821 exception with that value
1824 Do I want an 'else' clause for 'catch' or should 'catch' clause
1825 break/continue/return to get past the real work-to-do.
1827 Can destructors throw exceptions?
1828 How do exceptions work in constructors? There needs to be a point
1829 where values transition from 'local' to 'in object' much like
1830 the 'return' in a function.
1833 Should break/continue/return be defined as exceptions? The
1834 (implicit) label they go to is the expression 'type'
1841 'scope' applies to various things:
1842 - how to have narrow scope for bindings in block. Just
1843 having declarations at the top which go out-of-scope at the
1844 bottom is too restrictive
1845 - which names in a module are visible outside that module.
1846 I think they must all be visible within the module, where that
1848 - how to import names from modules. "import xx from yy" is nice,
1849 "import * from yy" is good but risky if 'yy' changes, so maybe
1850 "import v1* from yy" which means "all names listed in v1. Or
1851 "import * from yy.v1".
1852 - what is the scope for 'finally' or 'later', 'defer', 'after' ?
1864 The introduce bindings but they are either for a single statement
1865 or for a block. We can keep limit indents a bit by combining
1866 an if-list or loop with the 'with'. I wonder if that is enough.
1868 Having completely free scoping would require
1870 anywhere, and "unlet a" anywhere too. I think that is probably too
1874 For extra-module visibility there are options:
1875 - 'public' tag for all public names
1876 - 'private' tag for all non-public names
1877 - Some convention with capitals or underscores to direct which
1878 is which, but with UTF I don't trust capitals, and I hate
1879 imposing underscores except for "don't use this" cases.
1880 - 'export' lists (or 'hide' lists)
1881 'export' lists would be good for wild-card imports.
1882 - external/internal section of the module
1883 Would need section in each 'struct' too.
1885 I think top-level symbols that might be imported should be
1886 explictly exported, and exported to a list if a wild-card import
1888 Names in a struct which should be hidden can be marked 'private'
1889 as hiding them is less critical.
1891 All symbols should be available even without import, using e.g.
1893 This allows instant access in simple scripts without pre-planning.
1894 If 'module' is imported, then just "module:symbol" is sufficient.
1895 Possible ":module" searches current directory, then system path,
1896 while "::module" only searches system path.
1901 I think 'after' scope should be tied to a name so when the name
1902 goes out of scope, the action triggers.
1906 though most such actions are to free memory - which must be
1907 automagic, or to release resources (locks, files, etc) which
1908 should be the destructor of some value.
1909 i.e. 'open' returns an 'owned' value so it is auto-closed when
1912 'name' could be a loop label. That might be interesting.
1914 But should 'after' file when the value changes, or when the name
1915 disappears. I think it should be when the value changes where
1917 Maybe it only makes sense on 'owned' values?
1922 Tuples are present as the parameters to a "function". But where
1924 Tuples are usually (a,b,c) but can be useful as a:b:c. This is
1925 briefer. As long a 'tuple' isn't a first class type,
1926 "a" can be a 1-element tuple or an expression, depending on
1928 This is useful for a '%' function for string conversion.
1929 The args to '%' are tuples where first value is sent 'stringify'
1930 with remaining bits. so 3:'x' will print in hex ... bit ugly
1932 Maybe not needed then.
1934 mapping between tuples and structures is probably useful.
1935 Assigning tuple to tuple with matching
1936 return a tuple from a function .. well, return a struct but allow
1937 matching to a tuple of names.
1938 Tuple could equally be an array if length matches.
1940 virgil maps all tuples to lists of values, so nesting disappears.
1941 This is pleasingly simple. It means that you cannot have tuples
1942 as individual arguments or return values but if you cannot query
1943 the size (which would make them arrays) that doesn't matter.
1945 A tuple can be assigned (bound) to an array if the size is
1946 flexible, but binding contents of an array to a tuple is only
1947 possible if the array size is known at compile time.
1948 So the type of printf could (almost) be
1949 printf(string, printable[])
1950 and it could still be called as
1951 printf("I have %d and %d", foo, bar);
1956 as a use of tuples. It expands to a disjunction of separate tests.
1959 Expressions with side effects.
1960 "functions" will often have side effects, but what about
1965 They are problematic because the interact badly with
1966 order-of-operation rules.
1967 They are useful because they allow brevity.
1968 if x && (a += 5) < 7 ...
1973 order doesn't apply immediately after && because && is a
1974 serialisation point.
1975 if with a=5: a < 6 and then with b=6:
1983 possibly with punctuation?
1984 Do we want annotations on the name:
1988 i.e. can we embed an identifier inside the type
1989 struct foo *(*ID)();
1991 I think the C experiment shows that to be a failure. You still need
1992 the types without the identifiers, so it is best to always have them
1994 i.e. a type should always be identified by a set phrase.
1996 var name: phrase = value
2001 http://beust.com/weblog/2011/08/18/a-gentle-introduction-to-dependent-types/
2002 suggests they only work for immutable values and I don't doubt
2003 they are easier there. With mutable values we would need to allow
2004 the type of a name to change ... or be dependant on some variable.
2005 So if "array of N integers" is the type of A, then
2007 doesn't change the type. i.e. if the parameter is variable, the
2008 value can be mutable.
2009 So this boils down to describing invariants which the compiler will
2010 validate for us. i.e. a proof assistant.
2012 An object can be in a number of different stages. Some of these
2013 happen during initialisation, but others can be just part of the
2014 overall life-cycle of the object. During different stages,
2015 different properties can persist. For example an object may be
2016 immutable for a certain well defined period of 'time'. If the type
2017 system can know of this, then any references that only exist during
2018 that time can be known to be immutable, while references that last
2021 There is scope for more than 'immutable' and 'mutable'.
2022 e.g. an object might be in a 'growth' state, where new components
2023 are added, but none are removed.
2024 The set of objects in a container be unchanging, while the objects
2025 themselves may change.
2026 At some stage, the set of contained objects might transition from
2027 one type to another type - or from one stage to another stage.
2028 For the language to be able to affirm this it would need to know
2029 that some operation visits all member elements of some type/stage.
2030 This might need to be (unsafely) affirm by the programmer rather
2031 than mathematically proven by the compiler.
2034 So there are two ideas here. Once is more explicit statements
2035 about life lines. An object can identifier several 'stages'
2036 which can be transitioned by various operations.
2037 Each component can have different type properties in different
2039 The type system can restrict certain methods to certain stages.
2040 The compiler must *know* the stage of a pointer. It is static
2041 info store in the lifeline analysis. When an object is known to
2042 'own' certain borrowed pointers, the lifetime of those borrowed
2043 pointers is affected by changes in the object's stage.
2046 - mutability: shallow or deep. Addition or subtraction or both or neither.
2048 - NULL, nullable, or non-null for pointers
2049 - size of arrays as related to other values - variable or
2051 - 'once' or counted or collected for some pointers.
2053 Some stages are exported, some are internal only.
2055 It might be helpful to have different names for different aspects
2056 of type variability.
2058 Parameterised types: The type depends on some type parameter
2059 Dependent types: the type depends on some nearby value
2060 Temporal types: the type depends on a state which changes over
2061 time. Such a state might be represented in some variable,
2062 but it shouldn't need to be
2063 Linear types: The number of strong references is dependent on
2064 some variable or age thing. So this is an aspect of a type
2067 These need to be specified and then enforced.
2068 Parameterised types make sense where-ever a type has multiple
2070 This includes a struct with multiple elements and a function with
2071 arguments and returns.
2075 func max<f:comparable>(a,b:f):f
2077 Type checking this should be fairly easy. Implementing would be
2078 easy if f was always a pointer...
2080 Dependent types work in a structure or a function call.
2081 We probably want arbitrary computations on values(?).
2082 If changing the value means changing the type we need types of
2083 different precision e.g. mutable and immutable.
2084 Optional and non-optional.
2085 Or we need multi-assignment to change the value and the typed name
2087 Or we can allow that the type is flexible between uses.
2088 So the type assertions are only verified when the type is used.
2089 So I can change inter-dependent fields in any order and it will
2092 Temporal types might be tricky to describe. We need to identify a
2093 clock. It might be an internal field or it might be an external
2095 By "external" we must mean with a super-ordinate object.
2096 e.g. a 'container' might contain lots of 'members' and the type
2097 of each member might be dependent on a field in the container.
2098 The clock could even be the program counter.
2102 An interpreter written in C, with a compiler written in target,
2103 that links to llvm, seems like a good plan.
2107 Modules / packages / crates.
2108 This is the primary mechanism for namespace control and for
2110 Within a package, all names defined in that package are visible
2111 providing the right path is used. Outside the package, only
2112 exported names are visible. Some names are exported by default
2113 but can be marked "private". Others are private by default and but
2114 can be marked "public".
2116 Packages can define:
2117 - types: struct, enum, array, function, interface etc
2123 Does it make sense for a package to export a method for a type that
2124 the package imported. As the name of the method is in the
2125 namespace of the type, it probably isn't polite to add something to
2126 someone else's namespace.
2127 But a new name for a type could be defined, added to, and exported.
2129 How are name conflicts handled.
2130 Certainly you cannot define the same name twice in the same scope.
2132 you might import a name with a wildcard, then define it.
2133 You might embed a struct/enum in a local structure enum and
2134 add name which already exists.
2136 This is interesting because the library that you import might get
2137 a new revision. It would be a pain for old code to break when
2140 So importing a library should specify a version. only names from
2141 that version are imported.
2143 You don't need to explicitly import if you don't want namespace
2145 Just use ":modulename:symbol" to get any symbol from any module.
2146 If you want to avoid using ":modulename:" you can
2147 import name, name from modulename
2149 import version_number from modulename
2151 exported symbols needed to be version-tagged. That could get ugly.
2153 private-by-default could be "public.2" rather than "public" which
2154 isn't too bad, but what about public by default?
2155 I guess I need to decide which symbols are which!
2156 There are "items": type/constant/function/package
2157 And there is fields, which include methods.
2159 Go: exports only things that start with upper case letter
2160 Rust: exports all field/methods but nothing else.
2162 We could have an "export" clause for items:
2163 export version item, item, item, item
2164 This makes "static" the default which is safest, and makes the
2165 "interface" easy to see.
2167 For fields, we could have an "export version" block, or
2168 "export version" exports all fields/methods listed *before* the
2170 So when you add more to the end, they aren't exported.
2171 Though it is probably more consistent to have
2172 export version field,field,field
2176 Should modules have "variables"? or any state?
2177 If so, they can usefully have an 'init' routine.
2181 modules live in files with related name. If file name is a valid
2182 identifier, we can just use that name in the code.
2183 If it isn't, or if we want to change it for some other reason, we
2184 would need to 'import identifier "filename"'
2185 Also, the file might be in a directory hierarchy, where '/' might
2186 not be the right separator.
2187 So we need to import a path, possible into a module, and provide
2188 a local name for that, if default doesn't suit.
2189 import "file" = name
2190 import "file" { import "file" {import identifier=new, id, id}}
2194 Overloading seems to be largely disliked. Hard to be sure why.
2196 Overloading + - * / % is already done for numbers.
2197 Overloading [] could be very useful for hashes
2198 Overloading () && || ?: would be very bad because they have
2199 strong meanings. Ditto for =
2200 Overloading -> . prefix-* prefix-& could be weird
2201 Overloading & | ^ << >> &^ ~ might be OK
2202 Overloading == would be bad, it has to apply to all types anyway?
2203 Overloading < <= > >= ???
2206 some operators need boolean. Possibly a 'testable' interface
2207 could allow anything to be tested. ?: && || !
2208 some operators compare. So a 'comparable' interface?
2210 some operators are limited and can freely be overloaded:
2211 + - * / % & | ^ << >> &^ prefix-~ prefix-`-'
2212 some operators are special:
2215 [[ &^ is not really different from "& ^" except that "&^=" is
2217 [[ what about bit operations without having to explicitly shift?
2220 Maybe it would be cleaner to have a simple opp to convert a
2221 number to a bit. #1 Then we clear a bit with "a &^#= 5" ??
2223 Or a 'bitset' type could define '+' and '-' to add/remove bits.
2226 Overloading of a binary operator is a method-call on the first
2227 argument passing the second.
2229 So for binary operations like + - * / , left and right side must
2230 be identical, like with Go.
2231 But what about '<<' and '>>'. Go allows the RHS to be any unsigned
2232 int type. This isn't nicely consistent. To be consistent we need
2233 to required either a type or an interface. The type would need to
2234 be unsigned int. The interface would need to be
2235 to_unsigned_int_clipping which returns 0 or MAXINT if it cannot
2236 give a faithful unsigned int. This is much the same type as the
2239 So having an 'int' and 'uint' interface might be defensible
2241 So each operator is defined in terms of a method call.
2242 e.g. a + b => a.plus(b)
2243 a && b => if a.test() then a else b fi
2244 a < b => a.compare(b) == less_than
2245 a[b] => *a.lookup(b, create:Bool)
2247 All of the first set of operators can also be combined with '=',
2249 ++ -- : are the really needed? +=1 -=1 are just as good?
2250 Though the could call a.incr() and a.decr()
2252 Do we want a += b to be different from a = a.plus(b) ???
2253 For a list it would be good to know that we don't need to duplicate.
2256 Wait... for 'plus' to be a valid method it would need to define the
2257 type of the second arg, and the return type. Don't like that.
2258 So treat it not as a known method, but a simple textual
2260 a + b is translated to a.plus(b) which is in any particular
2263 On the other hand, test and compare are from an interface.
2264 What about lookup? Do parameterized methods help here?
2266 Type of plus is <X>(X,X)->X
2267 Type of shiftleft is <X>(X,uint)->X
2268 Type of compare is <X>(X,X)-> (-1,0,1)
2270 List of (possible) operators.
2277 "is" "!is" "is<=" pointer comparison.
2281 >>> signed right shift?
2283 String catentation? '+'?? '++' ??
2286 "in" dispatches on the right-hand-op.
2287 "string" does substring search
2288 "int" does bit test.
2290 if 'in' can to bit-test, how do I do bit set/clear?
2291 |- |+ &- &+ Not good as +- can be prefix
2292 -| +| ?? Hard to type...
2293 What is good for set-addition and set-subtraction?
2295 is niceish for adding an element to foo, at end I guess.
2296 But it doesn't generalise to subtraction.
2299 Could use ++ and -- except that I want ++ for concat.
2302 'else' - like || or ?: "a else b" == a if !!a, or b if !a
2303 'then' ?? a then b is either b or NULL.
2310 'as' for cast: "value as type".
2311 I think I prefer "type(value)", though the 'as' allows
2312 separate name spaces for types and functions
2314 'div' for integer division? Probably not.
2316 'if'. "a if b" is either 'a' or False.
2317 This isn't always type safe, but
2319 becomes 'a' if 'b' is true, or 'c' if not.
2320 so "a else b" is short hand for "a if a else b"
2329 & takes the address of a field?
2330 [n:] takes address of an array member
2331 Can we take address of a local variable? Or do we define
2332 the local variable as a pointer and allocate from stack.
2333 * dereference a pointer. This is normally automatic, but not
2336 < extract the value and set variable to nil.
2340 co-operatively schedules threads across N processes with
2341 all OS call non-blocking seems like the way to go.
2349 - how to create new threads?
2351 function or closure?
2352 - how to control threads
2354 - how to wait for threads?
2355 maybe don't. When they die, they die. Any waiting
2357 - What to do for IPC?
2359 - shared memory with locks
2361 - synchronous interfaces?
2362 - how much control of shared memory is needed?
2364 Message passing is clearly a good tool but I'm not sure it should
2365 be the only one. Something that abstracts locking could be good.
2366 E.g. a pointer that needs to be 'activated' before it can be used.
2367 i.e. a refcounted object which has "lock" which returns a 'once'
2368 pointer to a subtype which provide more access. The original
2369 allows read access to some things.
2371 Or a type where some/all methods are required to own a particular
2372 lock. Actually it should be that some fields are only accessible
2373 when some lock is held. So the type is parameterised by a lock.
2374 By default fields are only accessible on a single thread. If
2375 a lock is provided, then they become accessible when that lock is
2379 I wonder if we need more pointer types, or use dependant types..
2382 - 'ref counted' - needs locking.. so:
2385 - borrowed references might be onethread or need-locks
2388 ParaSail goes even further and parallelises everything. Any two
2389 expressions that aren't interdependent are potentially run in
2390 parallel. This requires the compile to know about interdependence.
2391 ParaSail puts in lots of requirements to avoid aliasing etc.
2392 You can never (I think) have two pointers to the one thing. This
2393 feels overly restrictive to me, but could often happen in practice.
2394 If the compiler could detect this - or at least make good use of
2399 I really like the idea that locking is done with 'once' pointers.
2400 The locks are released just like memory is freed.
2402 Weird idea: I wonder if a hash table of spinlocks would work so the same
2403 data structs can be used with or without locks.
2404 The obvious problem is that alias can catch stacked locks and I deadlock
2405 waiting for a lock that I hold.
2406 With little effort I could tell if it was me.
2407 What would I gain though? hashing pointers does take some time...
2408 It would remove the need to store spinlocks in any data structure.
2411 The base hardware locking primitive seems to be
2412 load-linked/store-conditional, or cmp-swap. If I were to expose
2413 that in the language I would really need to understand all the
2414 stuff about memory models, which might be fun to learn I guess....
2419 starts BLOCK in another task. When it completes, thread is done.
2420 Any non-shareable values mentioned there become unavailable.
2421 Any refcounted values that are also used after BLOCK are increfed first.
2425 if we have multiple threads then we need to have a scheduler.
2426 We could use OS threads and use the OS scheduler, but that is likely
2427 to impose an overhead that we don't want.
2428 So even if we schedule only when a thread blocks, we still need some
2429 priority info to decide who goes next, and some way to share
2430 threads across a pool of OS threads.
2431 Explicitly assigning numbers is very boring and error prone.
2432 Putting things in order somehow would be nicer.
2433 With event-loop code we can have an ordered list of what
2434 to do next. This is effectly an explicitly coded scheduler.
2435 To do something like that means putting all routines into some
2436 data structure in the language. That would be good for introspection
2438 Some threads are blocked, some are ready. A thread can be placed in
2439 a group for round robin. A group can be ....(don't know what I was
2444 How do I get a new 'box' to hold a value?
2445 For 'stack' boxes, it just happens automagically as needed.
2446 For various heaps we need some syntax, like
2448 but we need to know which heap (per-thread, common, whatever)
2452 But I'm not very keen on builtin function - they pollute the
2454 var name : type = @heapname
2455 I don't think I just want a sigil like Rust as I suspect there
2456 could be multiple sources of allocation, or styles of
2457 allocation - enough that names might be useful.
2459 Whatever it is, we need to indicate:
2460 - a type - could come from the variable being assigned
2461 - parameters to the type such as array size or union
2462 discriminant (does that make sense?)
2463 - region to allocate in
2464 - type of pointer: unsafe, counted, owned, managed,
2466 These could be associated with the region.
2467 'Managed' (garbage collected) and 'never-free'
2468 really need to be in their own regions, but
2469 counted, owned, unsafe can be together.
2471 However they may well be managed separately.
2472 The compiler only knows about 'owned' and
2473 'borrowed' references. When an owned reference
2474 is dropped the manager knows whether to dec a ref
2475 or explicit free or do nothing.
2476 Allocating from a 'never-free' manager will return a
2477 borrowed reference from the manager.
2479 Given that the type might need to be parameterised it is best
2480 to have it explicit and not deduced from the var. Let the var
2481 deduce from the allocated value.
2483 So we want some syntax that includes allocator and type and
2484 says "allocate it". new(type, heap) is obvious, but we cannot
2485 pass types to any other functions.
2486 heap(type) has same problem.
2487 typename<parameters>.alloc(heap) is too noisy
2488 We could include the heap as a parameter?
2490 when there are no parameters
2492 But.... in many cases an allocation goes straight into a
2493 struct field, so the type is known and repeating it is boring
2494 and while it can be type checked and so not error prone, it
2495 shouldn't be necessary.
2496 But we still need parameters......
2500 Opinions vary. They are like gotos : dangerous but useful.
2501 Can we constrain them?
2503 1/ total-global is a bad idea. module-global at most. This
2504 avoids namespace pollution. Still accessible from other
2505 modules as "that_module.globalvar".
2506 2/ makes referential transparency hard: so include 'read/writes
2507 mod-static' and 'read/writes foreign static' properties of
2509 3/ local variables must never 'hide' global variables ... or
2510 anything in a greater scope.
2511 4/ it should be easy to pass lots of function parameters.
2512 @struct means "take all the names in this struct and if
2513 any are names of function arguments, use them as such.
2515 Globals are bad because very few things are 'once only' Even
2516 a random number generator context could be per-thread.
2518 If two functions don't touch globals, and operate on
2519 unconnected data structures, or are read-only on shared one,
2520 then they can run in parallel. Detecting this would be good.
2521 Warning when it cannot be done would be nice. Programmer
2522 would need to suggest it.
2524 A global that is set once in an early 'stage' of the program
2525 and never changed again does not break referential
2526 transparency for most of the life of the program.
2528 Analysing these lifetime issues might be difficult. Static
2529 analysis of all known types might be part of the problem.
2530 Loaded code (.so) could not be known about at compile time so
2531 types would need to be explicit, but they must always be in a
2532 stage *after* they are loaded.
2535 - function need 'access globals' in their type
2536 - program stages might help isolate these.
2537 So global gets type 'read-only is stage X', and function
2538 gets 'called in stage X'. But what is 'X'.
2542 If a name is never changed but not declared constant,is that an error?
2543 if a value is set but never used, is that an error?
2552 Closures are a cool idea - a little inline function with
2553 access to the current stack frame - a bit like a method for
2554 the current frame rather than for a separate object.
2556 - to create arbitrary code structures like maps
2557 and threads bodys and innovative loops
2558 - to pass compare functions to sorts code etc
2559 - I wonder what else..
2560 - by holding state they can work as generators
2561 - this requires the closure to outlive the frame!
2562 - closures are good for 'after' functions.
2564 We need to know what happens to names in the closure.
2565 Are the values copied, or are references taken, or what?
2566 The answers are probably obvious in various contexts, but need
2567 to be clearly enumerated.
2569 A possible trouble with closures is that they can play tricks
2570 with the lifetime of a variable. e.g. if we create a closure
2571 inside a loop, using a variable that was declared in the loop,
2572 then does each closure get the *same* variable, or do they
2573 each get separate variables -- effectively duplicating the
2575 I think that neither is really appropriate as either could be
2576 confusing. If however the name binding was a constant, then
2577 copying the value into the closure may well be appropriate.
2579 In general I don't think that closing over a variable should
2580 extend its life time. Go does this and allows the address of
2581 a local variable to be passed out - the variable is
2582 automatically made into a heap variable. I think it best to
2583 require an explicit heap allocation if you want to return a
2584 pointer. It is less likely to be confusing.
2585 A pointer obtained with '&' should only ever be a 'borrowed'
2586 pointer. There must be a strong reference which protects it.
2589 Syntax for closures?
2590 Go uses func (args) result { body }
2591 Rust uses |arg list| { body }
2592 Python uses lambda args : body
2594 For small closures, listing the args is boring.
2595 Using "$1 $2" is simple and brief.
2596 Still need a syntax to say "this is a closure".
2598 where expression is expected could do it.
2601 could be a more general syntax. Not sure it can be
2602 differentiated on an LR basis though...
2603 That is a similar syntax to list comprehensions
2604 [ a : for a in range(50) if a % 2 ]
2605 No: list comprehensions are in [] - the array syntax.
2606 But {} is like a structure... or do we use tuples for
2609 I think I want in-line closures to be expression only.
2610 I think burying statements inside expressions is wrong.
2611 This argues for "x if b else c".
2612 If you want a more complex closure, you need a local
2613 function declaration.
2614 This means that you cannot pass a large closure directly
2615 to a function like Rust does. Given issues with 'break'
2616 and 'continue', this is probably a good thing. But are
2617 there times when it would be good? If there were it should
2618 probably be handled by something like 'do' which has
2619 a closure as a distinct syntactic object. Does it need
2620 args? All local variables are available, but they aren't
2621 available to the function which is passed the closure.
2622 Passing local vars in would be clumsy - really do need
2626 I've never done anything like type analysis, so my first
2627 thought is that it sounds easy, but is probably not else more
2630 The first and probably most difficult task is to create a good
2631 type model. This will allow us to annotate every node in the
2632 code graph with appropriate type information, and the make
2634 The deduction we want to be able to make are:
2635 - is this correct? or at least consistent
2636 - what are the unspecified or underspecified types.
2638 Scalars, strings, Arrays, structs, unions, functions, closures
2639 are different arrangements of data.
2640 Scalars, strings, functions, closures are immutable, as can be
2641 the others. Immutable values can be copied instead of
2645 mutable or immutable - can I change them
2646 volatile or non-volatile - others can change them
2647 once, borrowed(wrt some once)
2648 compile-time, or runtime.
2650 Every variable has a definite type, as does any function
2651 result. The args to a function determine the types of the
2652 results, though those types may be a complex function of the
2653 arg types. e.g. intersection, union.
2655 Constants have a list of possible types. We choose the first
2656 type for which there is no type error.
2657 If there are multiple constants, they are tested in
2658 depth-first, left to right order.
2660 Literate programming
2661 There should be more of this. Language should make it easy by
2662 allowing order of code to match order of presentation. This
2663 means we should be able to add to any namespace at a later
2664 date by reintroducing it with some sort of additive syntax:
2665 type foo += { .... stuff }
2666 Maybe could allow previous stuff to be repeated, which could
2667 be filled in by tool and formatted differently by
2670 In literate mode, everything is a comment except between
2671 [[[ and ]]] each all alone on a line.
2673 It would be nice to allow the description to be processed as
2674 markdown. For this, the compiler needs to extract precisely
2675 the code blocks (not the span-level code though).
2676 I think that and line indented 4 or more spaces following a
2677 block line starts a code block which ends when a non-empty has
2678 fewer than 4 leading blanks.
2679 First line in code block cannot start with list char: - * + 1 >
2681 Github markdown also allows ``` to start a code block, and ```
2682 to end it. ```plato would turn on 'plato' syntax
2684 stackoverflow would use <!-- language: plato -->
2688 Asciidoc http://www.methods.co.nz/asciidoc/ - nah, ugly
2689 pandoc http://johnmacfarlane.net/pandoc/README.html#pandocs-markdown
2691 reStructuredText is similar to markdown. It uses "::" at the
2692 end of a line to suggest following lines until undent are
2695 pandoc is like markdown. One thing it adds is
2699 as well as ``` like github
2701 When writing a program in literate style, is there a need to
2702 write code out of order? I suspect that there is, as we might
2703 want to add a second task for a loop to do, or refine a test.
2704 Actually - that is a trick one. Can we *change* previous
2707 We could deliberately leave ellipses everywhere with names?
2708 The block would have a name as well as the ellipsis.
2710 This really needs to be tried.
2714 If I want this to be very general programming language, it
2715 should be possible to write in various different styles.
2717 - easy access to regexps like perl
2718 - dynamic typed programming
2719 - lisp-like programming?
2721 A string could have an 'r' suffix to make it behave as a
2723 if "foo.*bar"r < mystring
2724 if (a,b) = "foo\([bar]*\)x\(.\)"r < mystring
2726 Dynamic typing needs a universal type which must be
2728 This is like an extensible union which I already considered
2729 with universal enums. But that doesn't really work...
2732 Lisp? That is really a totally different engine with a totally
2733 different syntax. Hard to see how it could possibly fit.
2734 Any 'eval' based operation is really impossible to fit within
2737 List processing more generally? If we have arrays that can be
2738 chopped up and joined, we start to have something a bit
2740 Add list comprehension and map/reduce functions it might get
2741 interesting. But then memory management gets interesting too.
2742 Could 'tuples' be claimed by a type like literals can?
2745 Random stuff - don't know where this fits.
2747 If a variable is know to be of some interface type, and we assign
2748 a value of a more refined type, then the variable can be known to
2749 have that more refined type until the value could change.
2750 This might be useful for struct fields if we have exclusive access,
2751 or named function return values.
2754 Strings need to be a built-in type. But how should they look?
2755 NUL termination has problems - counting.
2756 Storing a count in front can be wasteful if we want 64bit counts
2758 Could take the utf8 approach:
2760 0-127 : 1 byte 7bit string length. (128 bytes)
2761 128-191: 2 byte 14bit string length (16 Kilobytes)
2762 192-255: 8 byte 62bit string length (4 exabytes)
2764 We always nul terminate so strings can be passed to 'C'.
2767 I really don't like the fact that ints can overflow without me
2769 I think any possible overflow must be checked, unless the var is
2770 explicitly declared as cyclic (modulus).
2771 This means the language must know the range of values, or include
2776 - profiling. collecting statistics of where time is spent
2777 - coverage. Count number of types each block is entered and allow
2778 this to be accumulated over multiple runs.
2779 Then format code with highlighting of un-executed and hot-spot code.
2781 For interpreter, this can all be internal.
2782 For compiler, we need an external data file. Probably a log that we append
2783 on completion and process later, using uuid hashes of functions.
2785 Unit-tests, with mocking. This needs lots of experimentation.
2789 Sometimes it might be nice to assert that the order of things mustn't matter.
2790 When building a function up during literate programing you might want to
2791 say the certain things happen, but you want to be sure their order isn't important.
2792 i.e. they all appear to happen at once.
2793 [Other times there are dependencies that must be honoured.
2794 For these, "after foo" or "before foo" might be useful
2796 In particularly, when updating a data structure a number of steps might
2797 be needed. Each should be able to access the 'before' field and assign the
2799 Maybe an explicit syntax for 'before' or 'orig', and an error if there is
2800 room for uncertainty.
2801 We probably want a name for a block of code in which a value is varying.
2802 Invarient apply at start and end but not during. Values assigned are not
2803 visible until the end.