Oceani - Cataract Creek version

author NeilBrown <neil@brown.name>

Tue, 28 Dec 2021 00:44:56 +0000 (11:44 +1100)

committer NeilBrown <neil@brown.name>

Tue, 28 Dec 2021 04:45:05 +0000 (15:45 +1100)
author NeilBrown <neil@brown.name>
Tue, 28 Dec 2021 00:44:56 +0000 (11:44 +1100)
committer NeilBrown <neil@brown.name>
Tue, 28 Dec 2021 04:45:05 +0000 (15:45 +1100)
diff --git a/csrc/oceani-tests.mdc b/csrc/oceani-tests.mdc

index 182c77c8367b2e553c6619f410e2eee512a671cb..49274ebd20be61d88b0eff106f868cc6cc40156a 100644 (file)
--- a/csrc/oceani-tests.mdc
+++ b/csrc/oceani-tests.mdc
@@ -1080,7 +1080,7 @@ various places that `type_err()` are called.
  
  ###### output: type_err5
         .tmp.code:8:7: error: type already declared: foo
-       .tmp.code:2:7: info: this is location of declartion: foo
+       .tmp.code:2:7: info: this is location of declaration: foo
         .tmp.code:10:13: Syntax error in struct field: :
         .tmp.code:5:7: error: type has recursive definition: baz
         .tmp.code:2:7: error: type has recursive definition: foo
diff --git a/csrc/oceani.mdc b/csrc/oceani.mdc

index 4b1b32a042719f79719c97a6d4ab3e8e419957aa..98430c21e93b1951f7ce7a80f26338c2ab7a82ca 100644 (file)
--- a/csrc/oceani.mdc
+++ b/csrc/oceani.mdc
@@ -1,4 +1,4 @@
-# Ocean Interpreter - Jamison Creek version
+# Ocean Interpreter - Cataract Creek version
  
  Ocean is intended to be a compiled language, so this interpreter is
  not targeted at being the final product.  It is, rather, an intermediate
@@ -29,40 +29,55 @@ be.
  
  ## Current version
  
-This third version of the interpreter exists to test out some initial
-ideas relating to types.  Particularly it adds arrays (indexed from
-zero) and simple structures.  Basic control flow and variable scoping
-are already fairly well established, as are basic numerical and
-boolean operators.
-
-Some operators that have only recently been added, and so have not
-generated all that much experience yet are "and then" and "or else" as
-short-circuit Boolean operators (which have since been remove), and the
-"if ...  else" trinary operator which can select between two expressions
-based on a third (which appears syntactically in the middle).
-
-The "func" clause currently only allows a "main" function to be
-declared.  That will be extended when proper function support is added.
-
-An element that is present purely to make a usable language, and
-without any expectation that they will remain, is the "print" statement
-which performs simple output.
-
-The current scalar types are "number", "Boolean", and "string".
-Boolean will likely stay in its current form, the other two might, but
-could just as easily be changed.
+This fourth version of the interpreter showcases the latest iteration of
+the design for parsing indents and line breaks, provides a first cut of
+support for references and functions, and introduces some syntax for a
+array with run-time length - currently only used for "argv".
+
+Indents are now explicit elements of the grammar and we no longer
+require a colon before an indent.  The colon still appears on "if" and
+"while" statements and others but it now marks the end of the
+expression.  Where there is no expression, such as after "else", there
+is no colon - an indent can immediately follow the "else".
+
+References can refer objects of any type which has a name, and this
+explicitly excludes other pointers and arrays.  This makes a very clear
+difference between references and things that they refer to, which we
+will see makes the description of function parameters simpler.  As a
+structure can hold a reference it is still quite possible to have a
+reference to a reference, but there will be a structure which keeps the
+inner and outer clearly separate.
+
+Functions can receive by-value or by-reference parameters with by-ref being
+declared like references.  If a non-reference is passed to a reference
+parameter, it is passed by-reference.  Functions can return a single
+value, or can return a collections of values which acts like a structure.
+
+The only IO that is currently possible is that "input" can be received
+in the sense that command line arguments are available to `main()`, and
+"output" can be generated with the "print" statement.  It is quite
+possible that neither of these will remain in the final language.
+
+The current scalar types are "number", "Boolean", "string" and the new
+"reference".  Boolean will likely stay in its current form, "string" and
+"number" are still open to being revised.  Compound types are structures
+and arrays.
  
  ## Naming
  
  Versions of the interpreter which obviously do not support a complete
-language will be named after creeks and streams.  This one is Jamison
+language will be named after creeks and streams.  This one is Cataract
  Creek.
  
-Once we have something reasonably resembling a complete language, the
-names of rivers will be used.
+Once we have support for methods, the names of rivers will be used.
+With semantic analysis can start tracking changes to effective types
+typing within the code (e.g.  a ref becoming "known to not be NULL"),
+names of lakes will be used.
+
  Early versions of the compiler will be named after seas.  Major
  releases of the compiler will be named after oceans.  Hopefully I will
-be finished once I get to the Pacific Ocean release.
+be finished once I get to the Pacific Ocean release - otherwise I might
+need to use Lunar Maria.
  
  ## Outline
  
@@ -72,24 +87,33 @@ for validating the parsing.
  So the main requirements of the interpreter are:
  
  - Parse the program, possibly with tracing,
-- Analyse the parsed program to ensure consistency,
+- Analyse the parsed program to ensure consistency and deduce implicit
+  information.
  - Print the program,
  - Execute the "main" function in the program, if no parsing or
    consistency errors were found.
  
  This is all performed by a single C program extracted with
-`parsergen`.
+`parsergen`, using the `scanner` library.
+
+There will be two formats for printing the program: a default that uses
+indenting to show structure and an alternate that uses bracketing.  So a
+`--bracket` command line option is needed for that.
  
-There will be two formats for printing the program: a default and one
-that uses bracketing.  So a `--bracket` command line option is needed
-for that.  Normally the first code section found is used, however an
-alternate section can be requested so that a file (such as this one)
-can contain multiple programs.  This is effected with the `--section`
-option.
+The program appear in an "`mdcode`" file and is normally the first
+top-level code section found.  However an alternate section can
+be requested so that a file (such as this one) can contain multiple
+programs.  This is effected with the `--section` option.
  
  This code must be compiled with `-fplan9-extensions` so that anonymous
  structures can be used.
  
+The information gathered while parsing, and used while executing, is all
+stored in a single `parse_context` data structure.  And this exposition
+of the program progresses we will add various fields to this structure.
+It will be pass to many function, and all reduction code (called by the
+parsing engine) will have easy access to it.
+
  ###### File: oceani.mk
  
         myCFLAGS := -Wall -g -fplan9-extensions
@@ -167,6 +191,9 @@ structures can be used.
         };
         const char *options = "tpnbs";
  
+       /* pr_err() is used to report inconsistencies in the mdcode,
+        * particularly missing or duplicate section names.
+        */
         static void pr_err(char *msg)                   // NOTEST
         {
                 fprintf(stderr, "%s\n", msg);           // NOTEST
@@ -181,10 +208,7 @@ structures can be used.
                 char *section = NULL;
                 struct parse_context context = {
                         .config = {
-                               .ignored = (1 << TK_mark),
-                               .number_chars = ".,_+- ",
-                               .word_start = "_",
-                               .word_cont = "_",
+                               ## scanner configuration
                         },
                 };
                 int doprint=0, dotrace=0, doexec=1, brackets=0;
@@ -274,11 +298,25 @@ structures can be used.
                 exit(context.parse_error ? 1 : 0);
         }
  
+Minimal configuration is needed for the scanner.  Unknown marks
+(punctuation) are not permitted so we ignore that.  A wide range of
+character are permitted in numbers, so that both period and comma can
+be used for the decimal marker, and both space and underscore can be
+used to separate groups of digits.  Only the defauls are allowed in
+identifiers with the exception that underscore can both start and
+continue an identifier.
+
+###### scanner configuration
+       .ignored = (1 << TK_mark),
+       .number_chars = ".,_+- ",
+       .word_start = "_",
+       .word_cont = "_",
+
  ### Analysis
  
-The four requirements of parse, analyse, print, interpret apply to
-each language element individually so that is how most of the code
-will be structured.
+The four requirements of parse, analyse, print, and interpret apply to
+each language element individually so that is how most of the code will
+be structured.
  
  Three of the four are fairly self explanatory.  The one that requires
  a little explanation is the analysis step.
@@ -286,17 +324,21 @@ a little explanation is the analysis step.
  The current language design does not require the types of variables to
  be declared, but they must still have a single type.  Different
  operations impose different requirements on the variables, for example
-addition requires both arguments to be numeric, and assignment
-requires the variable on the left to have the same type as the
-expression on the right.
+addition requires both arguments to be numeric, and assignment requires
+the variable on the left to have the same type as the expression on the
+right, or to be a reference to that type.  There are currently no type
+that are distinct but compatible, though that will change when more
+numeric types are introduced and again when interfaces are added.  Until
+then the type or a variable is determined either from the declaration of
+the initial assignment, but the code tries not to assume that.
  
  Analysis involves propagating these type requirements around and
-consequently setting the type of each variable.  If any requirements
-are violated (e.g. a string is compared with a number) or if a
-variable needs to have two different types, then an error is raised
-and the program will not run.
+consequently setting the type of each variable.  If any requirements are
+violated (e.g.  a string is compared with a number) or if a variable
+needs to have two different types, then an error is raised and the
+program will not run.
  
-If the same variable is declared in both branchs of an 'if/else', or
+If the same variable is declared in both branches of an 'if/else', or
  in all cases of a 'switch' then the multiple instances may be merged
  into just one variable if the variable is referenced after the
  conditional statement.  When this happens, the types must naturally be
@@ -304,26 +346,32 @@ consistent across all the branches.  When the variable is not used
  outside the if, the variables in the different branches are distinct
  and can be of different types.
  
-Undeclared names may only appear in "use" statements and "case" expressions.
-These names are given a type of "label" and a unique value.
-This allows them to fill the role of a name in an enumerated type, which
-is useful for testing the `switch` statement.
+Local variables, global constants, and functions are all named in the
+same namespace.  If a name is used before it is declared, it is assumed
+to be global, either a constant or a function.  It must be declare
+eventually, and this is checked in the analysis phase after all code has
+been parsed.
  
  As we will see, the condition part of a `while` statement can return
-either a Boolean or some other type.  This requires that the expected
-type that gets passed around comprises a type and a flag to indicate
-that `Tbool` is also permitted.
-
-As there are, as yet, no distinct types that are compatible, there
-isn't much subtlety in the analysis.  When we have distinct number
-types, this will become more interesting.
+either a Boolean or some other type, and as we have seen an asignment to
+a reference allows either a reference or the refered-to type to be
+given.  This requires that the expected type that gets passed around is
+accompanied by some flags, one to indicate that `Boolean` is also
+permitted and one to indicate that a reference to the given type is also
+permitted.
+
+Possibly the most interesting part of analysis at present involves some
+flags can be set during analysis of an expression, such as whether it
+can be used and a "lvalue" (i.e.  it can be assigned to) and whether it
+can be computed at compile-time, or whether it must wait until runtime.
+These will be introduced in due course.
  
  #### Error reporting
  
  When analysis discovers an inconsistency it needs to report an error;
  just refusing to run the code ensures that the error doesn't cascade,
  but by itself it isn't very useful.  A clear understanding of the sort
-of error message that are useful will help guide the process of
+of error messages that are useful will help guide the process of
  analysis.
  
  At a simplistic level, the only sort of error that type analysis can
@@ -337,30 +385,33 @@ particular type, by indicating the location where the type was set,
  whether by declaration or usage.
  
  Using a recursive-descent analysis we can easily detect a problem at
-multiple locations. In "`hello:= "there"; 4 + hello`" the addition
-will detect that one argument is not a number and the usage of `hello`
-will detect that a number was wanted, but not provided.  In this
-(early) version of the language, we will generate error reports at
-multiple locations, so the use of `hello` will report an error and
-explain were the value was set, and the addition will report an error
-and say why numbers are needed.  To be able to report locations for
-errors, each language element will need to record a file location
-(line and column) and each variable will need to record the language
-element where its type was set.  For now we will assume that each line
-of an error message indicates one location in the file, and up to 2
-types.  So we provide a `printf`-like function which takes a format, a
-location (a `struct exec` which has not yet been introduced), and 2
-types. "`%1`" reports the first type, "`%2`" reports the second.  We
-will need a function to print the location, once we know how that is
-stored. e As will be explained later, there are sometimes extra rules for
-type matching and they might affect error messages, we need to pass those
-in too.
+multiple locations.  In "`hello := "there"; print 4 + hello`" the
+addition will detect that one argument is not a number and the usage of
+`hello` will detect that a number was wanted, but not provided.  We
+could generate an error at either location, or even at both.  In this
+version of the language, we pass down the expected type, and the handler
+for variables notices that `hello` is not the correct type and reports
+an error.  So errors are large reported at the leaves.
+
+To be able to report locations for errors, each language element will
+need to record a file location (line and column) and each variable will
+need to record the language element where its type was set.  For now we
+will assume that each line of an error message indicates one location in
+the file, and up to 2 types.  So we provide a `printf`-like function
+which takes a format, a location (a `struct exec` which has not yet been
+introduced), and 2 types.  "`%1`" reports the first type, "`%2`" reports
+the second.  We will need a function to print the location, once we know
+how that is stored.  As explained earlier, there are sometimes extra
+rules for type matching (to accept Bool or reference) and they might
+affect error messages, we need to pass those in too.
  
  As well as type errors, we sometimes need to report problems with
-tokens, which might be unexpected or might name a type that has not
-been defined.  For these we have `tok_err()` which reports an error
-with a given token.  Each of the error functions sets the flag in the
-context so indicate that parsing failed.
+tokens, which might be unexpected or badly formatted, or might be a name
+that has been defined twice or not at all.  For these we have
+`tok_err()` which reports an error with a given token.  Each of the
+error functions updates a count of error in the context to indicate that
+parsing failed.  We use a counter so it is easy to determine if a new
+error occurred during a particular stage of analysis.
  
  ###### forward decls
  
@@ -407,15 +458,14 @@ context so indicate that parsing failed.
                 c->parse_error += 1;
         }
  
-## Entities: declared and predeclared.
+## Entities: values, types, variables, and code.
  
-There are various "things" that the language and/or the interpreter
-needs to know about to parse and execute a program.  These include
-types, variables, values, and executable code.  These are all lumped
-together under the term "entities" (calling them "objects" would be
-confusing) and introduced here.  The following section will present the
-different specific code elements which comprise or manipulate these
-various entities.
+It could be said that the focus of a language is values, which are
+organised into types, stored in variables, and manipulated with
+executable code.  This section introduces each of these entities and
+provide a foundation for them.  Once they are all in place, the next
+section will flesh them out, particular the mode complex executable code
+entities.
  
  ### Executables
  
@@ -424,30 +474,39 @@ executable is just an operation combined with one or two other
  executables.  This allows for expressions and lists etc.  Other times an
  executable is something quite specific like a constant or variable name.
  So we define a `struct exec` to be a general executable with a type, and
-a `struct binode` which is a subclass of `exec`, forms a node in a
+a `struct binodes` which is a subclass of `exec`, forms a node in a
  binary tree, and holds an operation.  The simplest operation is "List"
  which can be used to combine several execs together.
  
+When parsing a list of binodes, whether with the `List` operator or some
+other, it is most convenient to append to the end, so a list is a list
+and a thin.  When using the list it is more convenient to consider
+a list to be a thing and a list.  So we need a function to re-order
+a list.  `reorder_bilist` serves this purpose.
+
  There will be other subclasses, and to access these we need to be able
  to `cast` the `exec` into the various other types.  The first field in
  any `struct exec` is the type from the `exec_types` enum.
  
  ###### macros
         #define cast(structname, pointer) ({            \
-               const typeof( ((struct structname *)0)->type) *__mptr = &(pointer)->type; \
+               const typeof( ((struct structname *)0)->type) *__mptr =         \
+                                                        &(pointer)->type;      \
                 if (__mptr && *__mptr != X##structname) abort();                \
                 (struct structname *)( (char *)__mptr);})
  
         #define new(structname) ({                                              \
-               struct structname *__ptr = ((struct structname *)calloc(1,sizeof(struct structname))); \
-               __ptr->type = X##structname;                                            \
-               __ptr->line = -1; __ptr->column = -1;                                   \
+               struct structname *__ptr = ((struct structname *)calloc(        \
+                                               1,sizeof(struct structname)));  \
+               __ptr->type = X##structname;                                    \
+               __ptr->line = -1; __ptr->column = -1;                           \
                 __ptr;})
  
-       #define new_pos(structname, token) ({                                           \
-               struct structname *__ptr = ((struct structname *)calloc(1,sizeof(struct structname))); \
-               __ptr->type = X##structname;                                            \
-               __ptr->line = token.line; __ptr->column = token.col;                    \
+       #define new_pos(structname, token) ({                                   \
+               struct structname *__ptr = ((struct structname *)calloc(        \
+                                               1,sizeof(struct structname)));  \
+               __ptr->type = X##structname;                                    \
+               __ptr->line = token.line; __ptr->column = token.col;            \
                 __ptr;})
  
  ###### ast
@@ -481,13 +540,33 @@ any `struct exec` is the type from the `exec_types` enum.
                 }
                 if (loc->type == Xbinode)
                         return __fput_loc(cast(binode,loc)->left, f) ||
-                              __fput_loc(cast(binode,loc)->right, f);  // NOTEST
-               return 0;       // NOTEST
+                              __fput_loc(cast(binode,loc)->right, f); // NOTEST
+               return 0;                                              // NOTEST
         }
         static void fput_loc(struct exec *loc, FILE *f)
         {
                 if (!__fput_loc(loc, f))
-                       fprintf(f, "??:??: ");  // NOTEST
+                       fprintf(f, "??:??: ");                  // NOTEST
+       }
+
+       // Move all nodes from 'b' to 'rv', reversing their order.
+       // In 'b' 'left' is a list, and 'right' is the last node.
+       // In 'rv', left' is the first node and 'right' is a list.
+       static struct binode *reorder_bilist(struct binode *b)
+       {
+               struct binode *rv = NULL;
+
+               while (b) {
+                       struct exec *t = b->right;
+                       b->right = rv;
+                       rv = b;
+                       if (b->left)
+                               b = cast(binode, b->left);
+                       else
+                               b = NULL;
+                       rv->left = t;
+               }
+               return rv;
         }
  
  Each different type of `exec` node needs a number of functions defined,
@@ -591,6 +670,10 @@ If `Erval` is set, then the value cannot be assigned to because it is
  a temporary result.  If `Erval` is clear but `Econst` is set, then
  the value can only be assigned once, when the variable is declared.
  
+Various propagate cases can pass "perr_local" to analyse components of
+an expression which do not affect the result type of the whole
+expression.
+
  ###### ast
  
         enum val_rules {Rboolok = 1<<0, Rrefok = 1<<1,};
@@ -598,13 +681,15 @@ the value can only be assigned once, when the variable is declared.
                        Emaycopy = 1<<3, Erval = 1<<4, Econst = 1<<5};
  
  ###### forward decls
-       static struct type *propagate_types(struct exec *prog, struct parse_context *c, enum prop_err *perr,
-                                           struct type *type, enum val_rules rules);
+       static struct type *propagate_types(
+               struct exec *prog, struct parse_context *c, enum prop_err *perr,
+               struct type *type, enum val_rules rules);
  ###### core functions
  
-       static struct type *__propagate_types(struct exec *prog, struct parse_context *c, enum prop_err *perr,
-                                             enum prop_err *perr_local,
-                                             struct type *type, enum val_rules rules)
+       static struct type *__propagate_types(
+               struct exec *prog, struct parse_context *c,
+               enum prop_err *perr, enum prop_err *perr_local,
+               struct type *type, enum val_rules rules)
         {
                 struct type *t;
  
@@ -626,12 +711,16 @@ the value can only be assigned once, when the variable is declared.
                 return Tnone;
         }
  
-       static struct type *propagate_types(struct exec *prog, struct parse_context *c, enum prop_err *perr,
-                                           struct type *type, enum val_rules rules)
+       static struct type *propagate_types(struct exec *prog,
+                                           struct parse_context *c,
+                                           enum prop_err *perr,
+                                           struct type *type,
+                                           enum val_rules rules)
         {
                 int pre_err = c->parse_error;
                 enum prop_err perr_local = 0;
-               struct type *ret = __propagate_types(prog, c, perr, &perr_local, type, rules);
+               struct type *ret = __propagate_types(prog, c, perr, &perr_local,
+                                                    type, rules);
  
                 *perr |= perr_local & (Efail | Eretry);
                 if (c->parse_error > pre_err)
@@ -641,28 +730,45 @@ the value can only be assigned once, when the variable is declared.
  
  #### Interpreting
  
-Interpreting an `exec` doesn't require anything but the `exec`.  State
-is stored in variables and each variable will be directly linked from
-within the `exec` tree.  The exception to this is the `main` function
-which needs to look at command line arguments.  This function will be
-interpreted separately.
+Interpreting an `exec` primarily requires the `exec` and the variable
+storage information stored in the parse state.  Apart from modifying
+those variables, and possibly performing other side-effects, an exec can
+return a value.  `struct value` is used for passing around small values
+and a pointer to that structure can be used for larger values.
+
+Specifically, each `exec` case can return a value combined with a type
+in `struct lrval`.  The type may be `Tnone` but must be non-NULL.  Some
+`exec`s will return the location of a value, which can be updated, in
+`lval`.  Others will set `lval` to NULL indicating that there is a value
+of appropriate type in `rval`.
  
-Each `exec` can return a value combined with a type in `struct lrval`.
-The type may be `Tnone` but must be non-NULL.  Some `exec`s will return
-the location of a value, which can be updated, in `lval`.  Others will
-set `lval` to NULL indicating that there is a value of appropriate type
-in `rval`.
+Callers call either `interp_exec()` if they just want the value, or
+`linterp_exec()~ if they need an lvalue.  `dinterp_exec()` is called
+when there is a destination for the value to go.  This is used for
+function calls which return a value that is not an lvalue, but is too
+large to store in `struct value`.
+
+Each of these call `_interp_exec()` which calls the appropriate exec case.
  
  ###### forward decls
         static struct value interp_exec(struct parse_context *c, struct exec *e,
                                         struct type **typeret);
-###### core functions
+###### ast
+
+       struct value {
+               union {
+                       char ptr[1];
+                       ## value union fields
+               };
+       };
  
         struct lrval {
                 struct type *type;
                 struct value rval, *lval;
         };
  
+###### core functions
+
         /* If dest is passed, dtype must give the expected type, and
          * result can go there, in which case type is returned as NULL.
          */
@@ -758,10 +864,10 @@ in `rval`.
  
  Values come in a wide range of types, with more likely to be added.
  Each type needs to be able to print its own values (for convenience at
-least) as well as to compare two values, at least for equality and
-possibly for order.  For now, values might need to be duplicated and
-freed, though eventually such manipulations will be better integrated
-into the language.
+least, and for printing manifest constants when generating code) as well
+as to compare two values, at least for equality and possibly for order.
+For now, values might need to be duplicated and freed, though eventually
+such manipulations will be better integrated into the language.
  
  Named type are stored in a simple linked list.  Objects of each type are
  "values" which are often passed around by value.
@@ -770,14 +876,15 @@ There are both explicitly named types, and anonymous types.  Anonymous
  cannot be accessed by name, but are used internally and have a name
  which might be reported in error messages.
  
-###### ast
-
-       struct value {
-               union {
-                       char ptr[1];
-                       ## value union fields
-               };
-       };
+The `prepare_type()` interface is called on a type in two circumstances.
+After the program has been parsed but before anything in executed it is
+called with `parse_time` set to one.  This can be used for processing
+information that was not fully available when the type description was
+parsed, such as types of fields in structures.  It is then called again
+at runtime when a variable declaration is processed.  This allows the
+details of a type to depend on runtime context, such as the size of an
+array being determined by a constant.  In this second case the
+`parse_time` parameter is set to zero.
  
  ###### ast late
         struct type {
@@ -787,14 +894,16 @@ which might be reported in error messages.
                 int size, align;
                 int anon;
                 void (*init)(struct type *type, struct value *val);
-               int (*prepare_type)(struct parse_context *c, struct type *type, int parse_time);
+               int (*prepare_type)(struct parse_context *c, struct type *type,
+                    int parse_time);
                 void (*print)(struct type *type, struct value *val, FILE *f);
                 void (*print_type)(struct type *type, FILE *f);
                 int (*cmp_order)(struct type *t1, struct type *t2,
                                  struct value *v1, struct value *v2);
                 int (*cmp_eq)(struct type *t1, struct type *t2,
                               struct value *v1, struct value *v2);
-               void (*dup)(struct type *type, struct value *vold, struct value *vnew);
+               void (*dup)(struct type *type, struct value *vold,
+                           struct value *vnew);
                 int (*test)(struct type *type, struct value *val);
                 void (*free)(struct type *type, struct value *val);
                 void (*free_type)(struct type *t);
@@ -1035,11 +1144,10 @@ when no longer needed.
  
  When propagating type information around the program, we need to
  determine if two types are compatible, where type `NULL` is compatible
-with anything.  There are two special cases with type compatibility,
-both related to the Conditional Statement which will be described
-later.  In some cases a Boolean can be accepted as well as some other
-primary type, and in others any type is acceptable except a label (`Vlabel`).
-A separate function encoding these cases will simplify some code later.
+with anything.  There are two special cases with type compatibility.
+In some cases a Boolean can be accepted as well as some other
+primary type.  In other cases a reference to the given type is
+acceptable in place of a value of the type itself.
  
  ###### type functions
  
@@ -1050,8 +1158,6 @@ A separate function encoding these cases will simplify some code later.
         static int type_compat(struct type *require, struct type *have,
                                enum val_rules rules)
         {
-               if ((rules & Rboolok) && have == Tbool)
-                       return 1;       // NOTEST
                 if (!require || !have)
                         return 1;
  
@@ -1327,31 +1433,10 @@ executable.
  ###### free exec cases
         case Xval: free_val(cast(val, e)); break;
  
-###### ast functions
-       // Move all nodes from 'b' to 'rv', reversing their order.
-       // In 'b' 'left' is a list, and 'right' is the last node.
-       // In 'rv', left' is the first node and 'right' is a list.
-       static struct binode *reorder_bilist(struct binode *b)
-       {
-               struct binode *rv = NULL;
-
-               while (b) {
-                       struct exec *t = b->right;
-                       b->right = rv;
-                       rv = b;
-                       if (b->left)
-                               b = cast(binode, b->left);
-                       else
-                               b = NULL;
-                       rv->left = t;
-               }
-               return rv;
-       }
-
  #### Labels
  
  Labels are a temporary concept until I implement enums.  There are an
-anonymous enum which is declared by usage.  Thet are only allowed in
+anonymous enum which is declared by usage.  They are only allowed in
  `use` statements and corresponding `case` entries.  They appear as a
  period followed by an identifier.  All identifiers that are "used" must
  have a "case".
@@ -1440,7 +1525,6 @@ match "case".
                 break;
         }
  
-
  ### Variables
  
  Variables are scoped named values.  We store the names in a linked list
@@ -1501,6 +1585,15 @@ cannot nest, so a declaration while a name is in-scope is an error.
                 ## variable fields
         };
  
+The parser will want to be able to free a variable, but as this will
+also be a reference to something stored in the parse context, there is
+not action needed.
+
+###### ast functions
+       void free_variable(struct variable *v)
+       {
+       }
+
  When a scope closes, the values of the variables might need to be freed.
  This happens in the context of some `struct exec` and each `exec` will
  need to know which variables need to be freed when it completes.  To
@@ -1883,7 +1976,7 @@ all pending-scope variables become conditionally scoped.
                         vp = &(*vp)->in_scope;
                 v->in_scope = *vp;
                 *vp = v;
-               return 0;               
+               return 0;
         }
  
         static void var_block_close(struct parse_context *c, enum closetype ct,
@@ -1915,7 +2008,9 @@ all pending-scope variables become conditionally scoped.
                         if (v->scope == InScope)
                                 v->scope_end = c->scope_count;
                         if (v->scope == InScope && e && !v->global) {
-                               /* This variable gets cleaned up when 'e' finishes */
+                               /* This variable gets cleaned up when
+                                * 'e' finishes
+                                */
                                 variable_unlink_exec(v);
                                 v->cleanup_exec = e;
                                 v->next_free = e->to_free;
@@ -1937,8 +2032,9 @@ all pending-scope variables become conditionally scoped.
                                         else
                                                 v->scope = OutScope;
                                         if (ct == CloseElse) {
-                                               /* All Pending variables with this name
-                                                * are now Conditional */
+                                               /* All Pending variables with
+                                                * this name are now Conditional
+                                                */
                                                 for (v2 = v;
                                                      v2 && v2->scope == PendingScope;
                                                      v2 = v2->previous)
@@ -1960,7 +2056,8 @@ all pending-scope variables become conditionally scoped.
                                 break;
                         case CloseFunction:
                                 if (v->scope == CondScope)
-                                       /* Condition cannot continue past end of function */
+                                       /* Condition cannot continue past end of
+                                        * function */
                                         v->scope = InScope;
                                 /* fallthrough */
                         case CloseSequential:
@@ -1989,7 +2086,7 @@ all pending-scope variables become conditionally scoped.
  
  #### Storing Values
  
-The value of a variable is store separately from the variable, on an
+The value of a variable is stored separately from the variable, on an
  analogue of a stack frame.  There are (currently) two frames that can be
  active.  A global frame which currently only stores constants, and a
  stacked frame which stores local variables.  Each variable knows if it
@@ -2000,7 +2097,7 @@ the frame needs to be reallocated as it grows so it can store those
  values.  The local frame doesn't get values until the interpreted phase
  is started, so there is no need to allocate until the size is known.
  
-We initialize the `frame_pos` to an impossible value, so that we can
+We initialise the `frame_pos` to an impossible value, so that we can
  tell if it was set or not later.
  
  ###### variable fields
@@ -2321,7 +2418,6 @@ correctly.
  ###### free exec cases
         case Xvar: free_var(cast(var, e)); break;
  
-
  ### Complex types
  
  Now that we have the shape of the interpreter in place we can add some
@@ -2367,7 +2463,8 @@ various places.
                 $0->right = $<1;
         }$
  
-Thus far the complex types we have are arrays and structs.
+Thus far the complex types we have are arrays, structs, functions and
+references.
  
  #### Arrays
  
@@ -2376,18 +2473,18 @@ Arrays can be declared by giving a size and a type, as `[size]type' so
  size can be either a literal number, or a named constant.  Some day an
  arbitrary expression will be supported.
  
-As a formal parameter to a function, the array can be declared with a
-new variable as the size: `name:[size::number]string`.  The `size`
-variable is set to the size of the array and must be a constant.  As
-`number` is the only supported type, it can be left out:
-`name:[size::]string`.
+As a formal parameter to a function, the array can be declared with
+unknown size `name:[]string`.  This is currently only supported for the
+"argv" parameter to "main" but will be extended more generally in a
+later version of the language.  The length of this array - or any array
+- can be found with the "[]" postfix operator.
  
-Arrays cannot be assigned.  When pointers are introduced we will also
-introduce array slices which can refer to part or all of an array -
-the assignment syntax will create a slice.  For now, an array can only
-ever be referenced by the name it is declared with.  It is likely that
-a "`copy`" primitive will eventually be define which can be used to
-make a copy of an array with controllable recursive depth.
+Arrays cannot be assigned.  When reference are extend to allow array
+slices which can refer to part or all of an array the assignment
+syntax will create a slice.  For now, an array can only ever be
+referenced by the name it is declared with.  It is likely that a
+"`copy`" primitive will eventually be defined which can be used to make a
+copy of an array with controllable recursive depth.
  
  For now we have two sorts of array, those with fixed size either because
  it is given as a literal number or because it is a struct member (which
@@ -2625,7 +2722,8 @@ with a const size by whether they are prepared at parse time or not.
                 propagate_types(b->right, c, perr_local, Tnum, 0);
                 t = propagate_types(b->left, c, perr, NULL, 0);
                 if (!t || t->compat != array_compat) {
-                       type_err(c, "error: %1 cannot be indexed", prog, t, 0, NULL);
+                       type_err(c, "error: %1 cannot be indexed", prog, t, 0,
+                                NULL);
                         return NULL;
                 } else {
                         if (!type_compat(type, t->array.member, rules)) {
@@ -2641,7 +2739,8 @@ with a const size by whether they are prepared at parse time or not.
                  */
                 t = propagate_types(b->left, c, perr, NULL, 0);
                 if (!t || t->compat != array_compat) {
-                       type_err(c, "error: %1 cannot provide length", prog, t, 0, NULL);
+                       type_err(c, "error: %1 cannot provide length", prog, t,
+                                0, NULL);
                         return NULL;
                 }
                 if (!type_compat(type, Tnum, rules))
@@ -2965,7 +3064,8 @@ function will be needed.
  
                 if (t && t->size >= 0) {
                         tok_err(c, "error: type already declared", &$ID);
-                       tok_err(c, "info: this is location of declartion", &t->first_use);
+                       tok_err(c, "info: this is location of declaration",
+                               &t->first_use);
                         t = NULL;
                 }
                 if (!t)
@@ -3081,8 +3181,9 @@ function will be needed.
  #### References
  
  References, or pointers, are values that refer to another value.  They
-can only refer to a `struct`, though as a struct can embed anything they
-can effectively refer to anything.
+can only refer to a type that is named, which excludes arrays or other
+references.  As these can be included in a struct which is named, it is
+still possible to reference an array or reference - though indirectly.
  
  References are potentially dangerous as they might refer to some
  variable which no longer exists - either because a stack frame
@@ -3128,14 +3229,14 @@ The same syntax is used for accessing fields both in structs and in
  references to structs.  It would be correct to use `ref@.a`, but not
  necessary.
  
-`@new()` creates an object of whatever type is needed for the program
-to by type-correct.  In future iterations of Ocean, arguments a
-constructor will access arguments, so the the syntax now looks like a
-function call.  `@free` can be assigned any reference that was returned
-by `@new()`, and it will be freed.  `@nil` is a value of whatever
-reference type is appropriate, and is stable and never the address of
-anything in the heap or on the stack.  A reference can be assigned
-`@nil` or compared against that value.
+`@new()` creates an object of whatever type is needed for the program to
+by type-correct.  In future iterations of Ocean, a constructor will
+access arguments, so the the syntax now looks like a function call.
+`@free` can be assigned any reference that was returned by `@new()`, and
+it will be freed.  `@nil` is a value of whatever reference type is
+appropriate, and is stable and never the address of anything in the heap
+or on the stack.  A reference can be assigned `nil` or compared against
+that value.
  
  ###### declare terminals
         $TERM @
@@ -3192,8 +3293,9 @@ anything in the heap or on the stack.  A reference can be assigned
                 return val->ref != NULL;
         }
  
-       static struct type *reference_fieldref(struct type *t, struct parse_context *c,
-                                              struct fieldref *f, struct value **vp)
+       static struct type *reference_fieldref(
+               struct type *t, struct parse_context *c, struct fieldref *f,
+               struct value **vp)
         {
                 struct type *rt = t->reference.referent;
  
@@ -3341,7 +3443,6 @@ anything in the heap or on the stack.  A reference can be assigned
                 break;  // NOTEST
         }
  
-
  ###### interp exec cases
         case Xref: {
                 struct ref *r = cast(ref, e);
@@ -3433,7 +3534,6 @@ anything in the heap or on the stack.  A reference can be assigned
                                         rvtype->name.len, rvtype->name.txt);
                 break;
  
-
  #### Functions
  
  A function is a chunk of code which can be passed parameters and can
@@ -3446,7 +3546,7 @@ list, such as
  
  ##### Example: function 1
  
-       func main(av:[ac::number]string; env:[envc::number]string)
+       func main(av:[]string; env:[]string)
                 code block
  
  or as an indented list of one parameter per line (though each line can
@@ -3455,8 +3555,8 @@ be a ';' separated list)
  ##### Example: function 2
  
         func main
-               argv:[argc::number]string
-               env:[envc::number]string
+               argv:[]string
+               env:[]string
         do
                 code block
  
@@ -3810,7 +3910,7 @@ it in the "SimpleStatement Grammar" which will be described later.
  
  ## Complex executables: statements and expressions
  
-Now that we have types and values and variables and most of the basic
+Now that we have types, values, variables, and most of the basic
  Terms which provide access to these, we can explore the more complex
  code that combine all of these to get useful work done.  Specifically
  statements and expressions.
@@ -3836,7 +3936,7 @@ first - to start the precedence list.
  Conditional expressions are of the form "value `if` condition `else`
  other_value".  They associate to the right, so everything to the right
  of `else` is part of an else value, while only a higher-precedence to
-the left of `if` is the if values.  Between `if` and `else` there is no
+the left of `if` is the if value.  Between `if` and `else` there is no
  room for ambiguity, so a full conditional expression is allowed in
  there.
  
@@ -4074,7 +4174,7 @@ expression operator, and the `CMPop` non-terminal will match one of them.
                 if (t)
                         propagate_types(b->right, c, perr, t, 0);
                 else {
-                       t = propagate_types(b->right, c, perr, NULL, 0);        // NOTEST
+                       t = propagate_types(b->right, c, perr, NULL, 0); // NOTEST
                         if (t)  // NOTEST
                                 t = propagate_types(b->left, c, perr, t, 0);    // NOTEST
                 }
@@ -4114,13 +4214,13 @@ expression operator, and the `CMPop` non-terminal will match one of them.
  The remaining expressions with the highest precedence are arithmetic,
  string concatenation, string conversion, and testing.  String concatenation
  (`++`) has the same precedence as multiplication and division, but lower
-than the uniary.
+than the unary.
  
-Testing comes in two forms.  A single question mark (`?`) is a uniary
+Testing comes in two forms.  A single question mark (`?`) is a unary
  operator which converts come types into Boolean.  The general meaning is
-"is this a value value" and there will be more uses as the language
-develops.  A double questionmark (`??`) is a binary operator (Choose),
-with same precedence as multiplication, which returns the LHS if it
+"is this a valid value" and there will be more uses as the language
+develops.  A double question-mark (`??`) is a binary operator (Choose),
+with the same precedence as multiplication, which returns the LHS if it
  tests successfully, else returns the RHS.
  
  String conversion is a temporary feature until I get a better type
@@ -4448,14 +4548,8 @@ contains simple statements.  So both of:
  
         if condition { a=b; print f }
  
-are valid.
-
-In either case the list is constructed from a `binode` list with
-`Block` as the operator.  When parsing the list it is most convenient
-to append to the end, so a list is a list and a statement.  When using
-the list it is more convenient to consider a list to be a statement
-and a list.  So we need a function to re-order a list.
-`reorder_bilist` serves this purpose.
+are valid.  In either case the list is constructed from a `binode` list
+with `Block` as the operator.
  
  The only stand-alone statement we introduce at this stage is `pass`
  which does nothing and is represented as a `NULL` pointer in a `Block`
@@ -4479,8 +4573,7 @@ the common header for all reductions to use.
         Block -> { IN OptNL Statementlist OUT OptNL } ${ $0 = $<Sl; }$
         |        { SimpleStatements } ${ $0 = reorder_bilist($<SS); }$
         |        SimpleStatements ; ${ $0 = reorder_bilist($<SS); }$
-       |        SimpleStatements EOL ${ $0 = reorder_bilist($<SS); 
-               }$
+       |        SimpleStatements EOL ${ $0 = reorder_bilist($<SS); }$
         |        IN OptNL Statementlist OUT ${ $0 = $<Sl; }$
  
         OpenBlock -> OpenScope { IN OptNL Statementlist OUT OptNL } ${ $0 = $<Sl; }$
@@ -4613,7 +4706,7 @@ expressions and prints the values separated by spaces and terminated
  by a newline.  No control of formatting is possible.
  
  `print` uses `ExpressionList` to collect the expressions and stores them
-on the left side of a `Print` binode unlessthere is a trailing comma
+on the left side of a `Print` binode unless there is a trailing comma
  when the list is stored on the `right` side and no trailing newline is
  printed.
  
@@ -5312,9 +5405,12 @@ casepart` to track a list of case parts.
                              cp && !t; cp = cp->next)
                                 t = propagate_types(cp->value, c, perr, NULL, 0);
                         if (!t && cs->condpart)
-                               t = propagate_types(cs->condpart, c, perr, NULL, Rboolok);      // NOTEST
+                               t = propagate_types(cs->condpart, c, perr, // NOTEST
+                                                   NULL, Rboolok);
                         if (!t && cs->looppart)
-                               t = propagate_types(cs->looppart, c, perr, NULL, Rboolok);      // NOTEST
+       
+                       t = propagate_types(cs->looppart, c, perr, NULL, // NOTEST
+                                                   Rboolok);
                         // Now we have a type (I hope) push it down
                         if (t) {
                                 for (cp = cs->casepart; cp; cp = cp->next)
@@ -5402,12 +5498,8 @@ All the language elements so far can be used in various places.  Now
  it is time to clarify what those places are.
  
  At the top level of a file there will be a number of declarations.
-Many of the things that can be declared haven't been described yet,
-such as functions, procedures, imports, and probably more.
-For now there are two sorts of things that can appear at the top
-level.  They are predefined constants, `struct` types, and the `main`
-function.  While the syntax will allow the `main` function to appear
-multiple times, that will trigger an error if it is actually attempted.
+These can be for predefined constants, `struct` types, and functions -
+particularly the `main` function.
  
  The various declarations do not return anything.  They store the
  various declarations in the parse context.
@@ -5448,9 +5540,9 @@ always `InScope`, even before(!) they have been declared.  The value of
  a top level constant can be given as an expression, and this is
  evaluated after parsing and before execution.
  
-A function call can be used to evaluate a constant, but it will not have
-access to any program state, once such statement becomes meaningful.
-e.g.  arguments and filesystem will not be visible.
+A function call can syntactically be used to evaluate a constant, but as
+yet we don't detect which functions are safe to use that way, so this
+does not actually work.
  
  Constants are defined in a section that starts with the reserved word
  `const` and then has a block with a list of assignment statements.
@@ -5869,11 +5961,6 @@ is a bit more interesting at this level.
                 c->local = NULL;
         }
  
-###### ast functions
-       void free_variable(struct variable *v)
-       {
-       }
-
  ## And now to test it out.
  
  Having a language requires having a "hello world" program.  I'll
@@ -5902,6 +5989,11 @@ things which will likely grow as the languages grows.
                 name:string
                 alive:Boolean
  
+       func fibonacci(n:number):number
+               if n <= 2:
+                       use 1
+               else use fibonacci(n-1) + fibonacci(n-2)
+
         func main(argv:[]string)
                 print "Hello World, what lovely oceans you have!"
                 print "Are there", five, "?"
@@ -5953,6 +6045,13 @@ things which will likely grow as the languages grows.
                         f1 = f2
                         f2 = f3
                 print ""
+               for
+                       f := 1
+                       print "Fibonacci:",
+               then f = f + 1
+               while f < 13:
+                       print "", fibonacci(f),
+               print
  
                 /* Binary search... */
                 for
author	NeilBrown <neil@brown.name>
	Tue, 28 Dec 2021 00:44:56 +0000 (11:44 +1100)
committer	NeilBrown <neil@brown.name>
	Tue, 28 Dec 2021 04:45:05 +0000 (15:45 +1100)
csrc/oceani-tests.mdc		patch \| blob \| history
csrc/oceani.mdc		patch \| blob \| history