1 # Ocean Interpreter - Cataract Creek version
3 Ocean is intended to be a compiled language, so this interpreter is
4 not targeted at being the final product. It is, rather, an intermediate
5 stage and fills that role in two distinct ways.
7 Firstly, it exists as a platform to experiment with the early language
8 design. An interpreter is easy to write and easy to get working, so
9 the barrier for entry is lower if I aim to start with an interpreter.
11 Secondly, the plan for the Ocean compiler is to write it in the
12 [Ocean language](http://ocean-lang.org). To achieve this we naturally
13 need some sort of boot-strap process and this interpreter - written in
14 portable C - will fill that role. It will be used to bootstrap the
17 Two features that are not needed to fill either of these roles are
18 performance and completeness. The interpreter only needs to be fast
19 enough to run small test programs and occasionally to run the compiler
20 on itself. It only needs to be complete enough to test aspects of the
21 design which are developed before the compiler is working, and to run
22 the compiler on itself. Any features not used by the compiler when
23 compiling itself are superfluous. They may be included anyway, but
26 Nonetheless, the interpreter should end up being reasonably complete,
27 and any performance bottlenecks which appear and are easily fixed, will
32 This fourth version of the interpreter showcases the latest iteration of
33 the design for parsing indents and line breaks, provides a first cut of
34 support for references and functions, and introduces some syntax for a
35 array with run-time length - currently only used for "argv".
37 Indents are now explicit elements of the grammar and we no longer
38 require a colon before an indent. The colon still appears on "if" and
39 "while" statements and others but it now marks the end of the
40 expression. Where there is no expression, such as after "else", there
41 is no colon - an indent can immediately follow the "else".
43 References can refer objects of any type which has a name, and this
44 explicitly excludes other pointers and arrays. This makes a very clear
45 difference between references and things that they refer to, which we
46 will see makes the description of function parameters simpler. As a
47 structure can hold a reference it is still quite possible to have a
48 reference to a reference, but there will be a structure which keeps the
49 inner and outer clearly separate.
51 Functions can receive by-value or by-reference parameters with by-ref being
52 declared like references. If a non-reference is passed to a reference
53 parameter, it is passed by-reference. Functions can return a single
54 value, or can return a collections of values which acts like a structure.
56 The only IO that is currently possible is that "input" can be received
57 in the sense that command line arguments are available to `main()`, and
58 "output" can be generated with the "print" statement. It is quite
59 possible that neither of these will remain in the final language.
61 The current scalar types are "number", "Boolean", "string" and the new
62 "reference". Boolean will likely stay in its current form, "string" and
63 "number" are still open to being revised. Compound types are structures
68 Versions of the interpreter which obviously do not support a complete
69 language will be named after creeks and streams. This one is Cataract
72 Once we have support for methods, the names of rivers will be used.
73 With semantic analysis can start tracking changes to effective types
74 typing within the code (e.g. a ref becoming "known to not be NULL"),
75 names of lakes will be used.
77 Early versions of the compiler will be named after seas. Major
78 releases of the compiler will be named after oceans. Hopefully I will
79 be finished once I get to the Pacific Ocean release - otherwise I might
80 need to use Lunar Maria.
84 As well as parsing and executing a program, the interpreter can print
85 out the program from the parsed internal structure. This is useful
86 for validating the parsing.
87 So the main requirements of the interpreter are:
89 - Parse the program, possibly with tracing,
90 - Analyse the parsed program to ensure consistency and deduce implicit
93 - Execute the "main" function in the program, if no parsing or
94 consistency errors were found.
96 This is all performed by a single C program extracted with
97 `parsergen`, using the `scanner` library.
99 There will be two formats for printing the program: a default that uses
100 indenting to show structure and an alternate that uses bracketing. So a
101 `--bracket` command line option is needed for that.
103 The program appear in an "`mdcode`" file and is normally the first
104 top-level code section found. However an alternate section can
105 be requested so that a file (such as this one) can contain multiple
106 programs. This is effected with the `--section` option.
108 This code must be compiled with `-fplan9-extensions` so that anonymous
109 structures can be used.
111 The information gathered while parsing, and used while executing, is all
112 stored in a single `parse_context` data structure. And this exposition
113 of the program progresses we will add various fields to this structure.
114 It will be pass to many function, and all reduction code (called by the
115 parsing engine) will have easy access to it.
117 ###### File: oceani.mk
119 myCFLAGS := -Wall -g -fplan9-extensions
120 CFLAGS := $(filter-out $(myCFLAGS),$(CFLAGS)) $(myCFLAGS)
121 myLDLIBS:= libparser.o libscanner.o libmdcode.o -licuuc
122 LDLIBS := $(filter-out $(myLDLIBS),$(LDLIBS)) $(myLDLIBS)
124 all :: $(LDLIBS) oceani
125 oceani.c oceani.h : oceani.mdc parsergen
126 ./parsergen -o oceani --LALR --tag Parser oceani.mdc
127 oceani.mk: oceani.mdc md2c
130 oceani: oceani.o $(LDLIBS)
131 $(CC) $(CFLAGS) -o oceani oceani.o $(LDLIBS)
133 ###### Parser: header
135 struct parse_context;
138 struct parse_context {
139 struct token_config config;
147 #define container_of(ptr, type, member) ({ \
148 const typeof( ((type *)0)->member ) *__mptr = (ptr); \
149 (type *)( (char *)__mptr - offsetof(type,member) );})
151 #define config2context(_conf) container_of(_conf, struct parse_context, \
154 ###### Parser: reduce
155 struct parse_context *c = config2context(config);
163 #include <sys/mman.h>
182 static char Usage[] =
183 "Usage: oceani --trace --print --noexec --brackets --section=SectionName prog.ocn\n";
184 static const struct option long_options[] = {
185 {"trace", 0, NULL, 't'},
186 {"print", 0, NULL, 'p'},
187 {"noexec", 0, NULL, 'n'},
188 {"brackets", 0, NULL, 'b'},
189 {"section", 1, NULL, 's'},
192 const char *options = "tpnbs";
194 /* pr_err() is used to report inconsistencies in the mdcode,
195 * particularly missing or duplicate section names.
197 static void pr_err(char *msg) // NOTEST
199 fprintf(stderr, "%s\n", msg); // NOTEST
202 int main(int argc, char *argv[])
207 struct section *s = NULL, *ss;
208 char *section = NULL;
209 struct parse_context context = {
211 ## scanner configuration
214 int doprint=0, dotrace=0, doexec=1, brackets=0;
216 while ((opt = getopt_long(argc, argv, options, long_options, NULL))
219 case 't': dotrace=1; break;
220 case 'p': doprint=1; break;
221 case 'n': doexec=0; break;
222 case 'b': brackets=1; break;
223 case 's': section = optarg; break;
224 default: fprintf(stderr, Usage);
228 if (optind >= argc) {
229 fprintf(stderr, "oceani: no input file given\n");
232 fd = open(argv[optind], O_RDONLY);
234 fprintf(stderr, "oceani: cannot open %s\n", argv[optind]);
237 context.file_name = argv[optind];
238 len = lseek(fd, 0, 2);
239 file = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
240 s = code_extract(file, file+len, pr_err);
242 fprintf(stderr, "oceani: could not find any code in %s\n",
247 ## context initialization
250 for (ss = s; ss; ss = ss->next) {
251 struct text sec = ss->section;
252 if (sec.len == strlen(section) &&
253 strncmp(sec.txt, section, sec.len) == 0)
257 fprintf(stderr, "oceani: cannot find section %s\n",
264 fprintf(stderr, "oceani: no code found in requested section\n"); // NOTEST
265 goto cleanup; // NOTEST
268 parse_oceani(ss->code, &context.config, dotrace ? stderr : NULL);
270 resolve_consts(&context);
271 prepare_types(&context);
272 if (!context.parse_error && !analyse_funcs(&context)) {
273 fprintf(stderr, "oceani: type error in program - not running.\n");
274 context.parse_error += 1;
282 if (doexec && !context.parse_error)
283 interp_main(&context, argc - optind, argv + optind);
286 struct section *t = s->next;
291 // FIXME parser should pop scope even on error
292 while (context.scope_depth > 0)
296 ## free context types
297 ## free context storage
298 exit(context.parse_error ? 1 : 0);
301 Minimal configuration is needed for the scanner. Unknown marks
302 (punctuation) are not permitted so we ignore that. A wide range of
303 character are permitted in numbers, so that both period and comma can
304 be used for the decimal marker, and both space and underscore can be
305 used to separate groups of digits. Only the defauls are allowed in
306 identifiers with the exception that underscore can both start and
307 continue an identifier.
309 ###### scanner configuration
310 .ignored = (1 << TK_mark),
311 .number_chars = ".,_+- ",
317 The four requirements of parse, analyse, print, and interpret apply to
318 each language element individually so that is how most of the code will
321 Three of the four are fairly self explanatory. The one that requires
322 a little explanation is the analysis step.
324 The current language design does not require the types of variables to
325 be declared, but they must still have a single type. Different
326 operations impose different requirements on the variables, for example
327 addition requires both arguments to be numeric, and assignment requires
328 the variable on the left to have the same type as the expression on the
329 right, or to be a reference to that type. There are currently no type
330 that are distinct but compatible, though that will change when more
331 numeric types are introduced and again when interfaces are added. Until
332 then the type or a variable is determined either from the declaration of
333 the initial assignment, but the code tries not to assume that.
335 Analysis involves propagating these type requirements around and
336 consequently setting the type of each variable. If any requirements are
337 violated (e.g. a string is compared with a number) or if a variable
338 needs to have two different types, then an error is raised and the
339 program will not run.
341 If the same variable is declared in both branches of an 'if/else', or
342 in all cases of a 'switch' then the multiple instances may be merged
343 into just one variable if the variable is referenced after the
344 conditional statement. When this happens, the types must naturally be
345 consistent across all the branches. When the variable is not used
346 outside the if, the variables in the different branches are distinct
347 and can be of different types.
349 Local variables, global constants, and functions are all named in the
350 same namespace. If a name is used before it is declared, it is assumed
351 to be global, either a constant or a function. It must be declare
352 eventually, and this is checked in the analysis phase after all code has
355 As we will see, the condition part of a `while` statement can return
356 either a Boolean or some other type, and as we have seen an asignment to
357 a reference allows either a reference or the refered-to type to be
358 given. This requires that the expected type that gets passed around is
359 accompanied by some flags, one to indicate that `Boolean` is also
360 permitted and one to indicate that a reference to the given type is also
363 Possibly the most interesting part of analysis at present involves some
364 flags can be set during analysis of an expression, such as whether it
365 can be used and a "lvalue" (i.e. it can be assigned to) and whether it
366 can be computed at compile-time, or whether it must wait until runtime.
367 These will be introduced in due course.
371 When analysis discovers an inconsistency it needs to report an error;
372 just refusing to run the code ensures that the error doesn't cascade,
373 but by itself it isn't very useful. A clear understanding of the sort
374 of error messages that are useful will help guide the process of
377 At a simplistic level, the only sort of error that type analysis can
378 report is that the type of some construct doesn't match a contextual
379 requirement. For example, in `4 + "hello"` the addition provides a
380 contextual requirement for numbers, but `"hello"` is not a number. In
381 this particular example no further information is needed as the types
382 are obvious from local information. When a variable is involved that
383 isn't the case. It may be helpful to explain why the variable has a
384 particular type, by indicating the location where the type was set,
385 whether by declaration or usage.
387 Using a recursive-descent analysis we can easily detect a problem at
388 multiple locations. In "`hello := "there"; print 4 + hello`" the
389 addition will detect that one argument is not a number and the usage of
390 `hello` will detect that a number was wanted, but not provided. We
391 could generate an error at either location, or even at both. In this
392 version of the language, we pass down the expected type, and the handler
393 for variables notices that `hello` is not the correct type and reports
394 an error. So errors are large reported at the leaves.
396 To be able to report locations for errors, each language element will
397 need to record a file location (line and column) and each variable will
398 need to record the language element where its type was set. For now we
399 will assume that each line of an error message indicates one location in
400 the file, and up to 2 types. So we provide a `printf`-like function
401 which takes a format, a location (a `struct exec` which has not yet been
402 introduced), and 2 types. "`%1`" reports the first type, "`%2`" reports
403 the second. We will need a function to print the location, once we know
404 how that is stored. As explained earlier, there are sometimes extra
405 rules for type matching (to accept Bool or reference) and they might
406 affect error messages, we need to pass those in too.
408 As well as type errors, we sometimes need to report problems with
409 tokens, which might be unexpected or badly formatted, or might be a name
410 that has been defined twice or not at all. For these we have
411 `tok_err()` which reports an error with a given token. Each of the
412 error functions updates a count of error in the context to indicate that
413 parsing failed. We use a counter so it is easy to determine if a new
414 error occurred during a particular stage of analysis.
418 static void fput_loc(struct exec *loc, FILE *f);
419 static void type_err(struct parse_context *c,
420 char *fmt, struct exec *loc,
421 struct type *t1, enum val_rules rules, struct type *t2);
422 static void tok_err(struct parse_context *c, char *fmt, struct token *t);
424 ###### core functions
426 static void type_err(struct parse_context *c,
427 char *fmt, struct exec *loc,
428 struct type *t1, enum val_rules rules, struct type *t2)
430 fprintf(stderr, "%s:", c->file_name);
431 fput_loc(loc, stderr);
432 for (; *fmt ; fmt++) {
439 case '%': fputc(*fmt, stderr); break; // NOTEST
440 default: fputc('?', stderr); break; // NOTEST
442 type_print(t1, stderr);
445 type_print(t2, stderr);
454 static void tok_err(struct parse_context *c, char *fmt, struct token *t)
456 fprintf(stderr, "%s:%d:%d: %s: %.*s\n", c->file_name, t->line, t->col, fmt,
457 t->txt.len, t->txt.txt);
461 ## Entities: values, types, variables, and code.
463 It could be said that the focus of a language is values, which are
464 organised into types, stored in variables, and manipulated with
465 executable code. This section introduces each of these entities and
466 provide a foundation for them. Once they are all in place, the next
467 section will flesh them out, particular the mode complex executable code
472 Executables can be lots of different things. In many cases an
473 executable is just an operation combined with one or two other
474 executables. This allows for expressions and lists etc. Other times an
475 executable is something quite specific like a constant or variable name.
476 So we define a `struct exec` to be a general executable with a type, and
477 a `struct binodes` which is a subclass of `exec`, forms a node in a
478 binary tree, and holds an operation. The simplest operation is "List"
479 which can be used to combine several execs together.
481 When parsing a list of binodes, whether with the `List` operator or some
482 other, it is most convenient to append to the end, so a list is a list
483 and a thin. When using the list it is more convenient to consider
484 a list to be a thing and a list. So we need a function to re-order
485 a list. `reorder_bilist` serves this purpose.
487 There will be other subclasses, and to access these we need to be able
488 to `cast` the `exec` into the various other types. The first field in
489 any `struct exec` is the type from the `exec_types` enum.
492 #define cast(structname, pointer) ({ \
493 const typeof( ((struct structname *)0)->type) *__mptr = \
495 if (__mptr && *__mptr != X##structname) abort(); \
496 (struct structname *)( (char *)__mptr);})
498 #define new(structname) ({ \
499 struct structname *__ptr = ((struct structname *)calloc( \
500 1,sizeof(struct structname))); \
501 __ptr->type = X##structname; \
502 __ptr->line = -1; __ptr->column = -1; \
505 #define new_pos(structname, token) ({ \
506 struct structname *__ptr = ((struct structname *)calloc( \
507 1,sizeof(struct structname))); \
508 __ptr->type = X##structname; \
509 __ptr->line = token.line; __ptr->column = token.col; \
518 enum exec_types type;
528 struct exec *left, *right;
533 static int __fput_loc(struct exec *loc, FILE *f)
537 if (loc->line >= 0) {
538 fprintf(f, "%d:%d: ", loc->line, loc->column);
541 if (loc->type == Xbinode)
542 return __fput_loc(cast(binode,loc)->left, f) ||
543 __fput_loc(cast(binode,loc)->right, f); // NOTEST
546 static void fput_loc(struct exec *loc, FILE *f)
548 if (!__fput_loc(loc, f))
549 fprintf(f, "??:??: "); // NOTEST
552 // Move all nodes from 'b' to 'rv', reversing their order.
553 // In 'b' 'left' is a list, and 'right' is the last node.
554 // In 'rv', left' is the first node and 'right' is a list.
555 static struct binode *reorder_bilist(struct binode *b)
557 struct binode *rv = NULL;
560 struct exec *t = b->right;
564 b = cast(binode, b->left);
572 Each different type of `exec` node needs a number of functions defined,
573 a bit like methods. We must be able to free it, print it, analyse it
574 and execute it. Once we have specific `exec` types we will need to
575 parse them too. Let's take this a bit more slowly.
579 The parser generator requires a `free_foo` function for each struct
580 that stores attributes and they will often be `exec`s and subtypes
581 there-of. So we need `free_exec` which can handle all the subtypes,
582 and we need `free_binode`.
586 static void free_binode(struct binode *b)
595 ###### core functions
596 static void free_exec(struct exec *e)
607 static void free_exec(struct exec *e);
609 ###### free exec cases
610 case Xbinode: free_binode(cast(binode, e)); break;
614 Printing an `exec` requires that we know the current indent level for
615 printing line-oriented components. As will become clear later, we
616 also want to know what sort of bracketing to use. It will also be used
617 to sometime print comments after an exec to explain some of the results
622 static void do_indent(int i, char *str)
629 ###### core functions
630 static void print_binode(struct binode *b, int indent, int bracket)
634 case List: abort(); // must be handled by parent NOTEST
635 ## print binode cases
639 static void print_exec(struct exec *e, int indent, int bracket)
645 print_binode(cast(binode, e), indent, bracket); break;
653 static void print_exec(struct exec *e, int indent, int bracket);
657 As discussed, analysis involves propagating type requirements around the
658 program and looking for errors.
660 So `propagate_types` is passed an expected type (being a `struct type`
661 pointer together with some `val_rules` flags) that the `exec` is
662 expected to return, and returns the type that it does return, either of
663 which can be `NULL` signifying "unknown". A `prop_err` flag set is
664 passed by reference. It has `Efail` set when an error is found, and
665 `Eretry` when the type for some element is set via propagation. If
666 any expression cannot be evaluated a compile time, `Eruntime` is set.
667 If the expression can be copied, `Emaycopy` is set.
669 If `Erval` is set, then the value cannot be assigned to because it is
670 a temporary result. If `Erval` is clear but `Econst` is set, then
671 the value can only be assigned once, when the variable is declared.
673 Various propagate cases can pass "perr_local" to analyse components of
674 an expression which do not affect the result type of the whole
679 enum val_rules {Rboolok = 1<<0, Rrefok = 1<<1,};
680 enum prop_err {Efail = 1<<0, Eretry = 1<<1, Eruntime = 1<<2,
681 Emaycopy = 1<<3, Erval = 1<<4, Econst = 1<<5};
684 static struct type *propagate_types(
685 struct exec *prog, struct parse_context *c, enum prop_err *perr,
686 struct type *type, enum val_rules rules);
687 ###### core functions
689 static struct type *__propagate_types(
690 struct exec *prog, struct parse_context *c,
691 enum prop_err *perr, enum prop_err *perr_local,
692 struct type *type, enum val_rules rules)
699 switch (prog->type) {
702 struct binode *b = cast(binode, prog);
704 case List: abort(); // NOTEST
705 ## propagate binode cases
709 ## propagate exec cases
714 static struct type *propagate_types(struct exec *prog,
715 struct parse_context *c,
718 enum val_rules rules)
720 int pre_err = c->parse_error;
721 enum prop_err perr_local = 0;
722 struct type *ret = __propagate_types(prog, c, perr, &perr_local,
725 *perr |= perr_local & (Efail | Eretry);
726 if (c->parse_error > pre_err)
733 Interpreting an `exec` primarily requires the `exec` and the variable
734 storage information stored in the parse state. Apart from modifying
735 those variables, and possibly performing other side-effects, an exec can
736 return a value. `struct value` is used for passing around small values
737 and a pointer to that structure can be used for larger values.
739 Specifically, each `exec` case can return a value combined with a type
740 in `struct lrval`. The type may be `Tnone` but must be non-NULL. Some
741 `exec`s will return the location of a value, which can be updated, in
742 `lval`. Others will set `lval` to NULL indicating that there is a value
743 of appropriate type in `rval`.
745 Callers call either `interp_exec()` if they just want the value, or
746 `linterp_exec()~ if they need an lvalue. `dinterp_exec()` is called
747 when there is a destination for the value to go. This is used for
748 function calls which return a value that is not an lvalue, but is too
749 large to store in `struct value`.
751 Each of these call `_interp_exec()` which calls the appropriate exec case.
754 static struct value interp_exec(struct parse_context *c, struct exec *e,
755 struct type **typeret);
761 ## value union fields
767 struct value rval, *lval;
770 ###### core functions
772 /* If dest is passed, dtype must give the expected type, and
773 * result can go there, in which case type is returned as NULL.
775 static struct lrval _interp_exec(struct parse_context *c, struct exec *e,
776 struct value *dest, struct type *dtype);
778 static struct value interp_exec(struct parse_context *c, struct exec *e,
779 struct type **typeret)
781 struct lrval ret = _interp_exec(c, e, NULL, NULL);
783 if (!ret.type) abort();
787 dup_value(ret.type, ret.lval, &ret.rval);
791 static struct value *linterp_exec(struct parse_context *c, struct exec *e,
792 struct type **typeret)
794 struct lrval ret = _interp_exec(c, e, NULL, NULL);
796 if (!ret.type) abort();
800 free_value(ret.type, &ret.rval);
804 /* dinterp_exec is used when the destination type is certain and
805 * the value has a place to go.
807 static void dinterp_exec(struct parse_context *c, struct exec *e,
808 struct value *dest, struct type *dtype,
811 struct lrval ret = _interp_exec(c, e, dest, dtype);
815 free_value(dtype, dest);
817 dup_value(dtype, ret.lval, dest);
819 memcpy(dest, &ret.rval, dtype->size);
822 static struct lrval _interp_exec(struct parse_context *c, struct exec *e,
823 struct value *dest, struct type *dtype)
825 /* If the result is copied to dest, ret.type is set to NULL */
827 struct value rv = {}, *lrv = NULL;
830 rvtype = ret.type = Tnone;
840 struct binode *b = cast(binode, e);
841 struct value left, right, *lleft;
842 struct type *ltype, *rtype;
843 ltype = rtype = Tnone;
845 case List: abort(); // NOTEST
846 ## interp binode cases
848 free_value(ltype, &left);
849 free_value(rtype, &right);
859 ## interp exec cleanup
865 Values come in a wide range of types, with more likely to be added.
866 Each type needs to be able to print its own values (for convenience at
867 least, and for printing manifest constants when generating code) as well
868 as to compare two values, at least for equality and possibly for order.
869 For now, values might need to be duplicated and freed, though eventually
870 such manipulations will be better integrated into the language.
872 Named type are stored in a simple linked list. Objects of each type are
873 "values" which are often passed around by value.
875 There are both explicitly named types, and anonymous types. Anonymous
876 cannot be accessed by name, but are used internally and have a name
877 which might be reported in error messages.
879 The `prepare_type()` interface is called on a type in two circumstances.
880 After the program has been parsed but before anything in executed it is
881 called with `parse_time` set to one. This can be used for processing
882 information that was not fully available when the type description was
883 parsed, such as types of fields in structures. It is then called again
884 at runtime when a variable declaration is processed. This allows the
885 details of a type to depend on runtime context, such as the size of an
886 array being determined by a constant. In this second case the
887 `parse_time` parameter is set to zero.
893 struct token first_use;
896 void (*init)(struct type *type, struct value *val);
897 int (*prepare_type)(struct parse_context *c, struct type *type,
899 void (*print)(struct type *type, struct value *val, FILE *f);
900 void (*print_type)(struct type *type, FILE *f);
901 int (*cmp_order)(struct type *t1, struct type *t2,
902 struct value *v1, struct value *v2);
903 int (*cmp_eq)(struct type *t1, struct type *t2,
904 struct value *v1, struct value *v2);
905 void (*dup)(struct type *type, struct value *vold,
907 int (*test)(struct type *type, struct value *val);
908 void (*free)(struct type *type, struct value *val);
909 void (*free_type)(struct type *t);
918 struct type *typelist;
925 static struct type *find_type(struct parse_context *c, struct text s)
927 struct type *t = c->typelist;
929 while (t && (t->anon ||
930 text_cmp(t->name, s) != 0))
935 static struct type *_add_type(struct parse_context *c, struct text s,
936 struct type *proto, int anon)
940 n = calloc(1, sizeof(*n));
947 n->next = c->typelist;
952 static struct type *add_type(struct parse_context *c, struct text s,
955 return _add_type(c, s, proto, 0);
958 static struct type *add_anon_type(struct parse_context *c,
959 struct type *proto, char *name, ...)
965 vasprintf(&t.txt, name, ap);
967 t.len = strlen(t.txt);
968 return _add_type(c, t, proto, 1);
971 static struct type *find_anon_type(struct parse_context *c,
972 struct type *proto, char *name, ...)
974 struct type *t = c->typelist;
979 vasprintf(&nm.txt, name, ap);
981 nm.len = strlen(name);
983 while (t && (!t->anon ||
984 text_cmp(t->name, nm) != 0))
990 return _add_type(c, nm, proto, 1);
993 static void free_type(struct type *t)
995 /* The type is always a reference to something in the
996 * context, so we don't need to free anything.
1000 static void free_value(struct type *type, struct value *v)
1003 type->free(type, v);
1004 memset(v, 0x5a, type->size);
1008 static void type_print(struct type *type, FILE *f)
1011 fputs("*unknown*type*", f); // NOTEST
1012 else if (type->name.len && !type->anon)
1013 fprintf(f, "%.*s", type->name.len, type->name.txt);
1014 else if (type->print_type)
1015 type->print_type(type, f);
1016 else if (type->name.len && type->anon)
1017 fprintf(f, "\"%.*s\"", type->name.len, type->name.txt);
1019 fputs("*invalid*type*", f); // NOTEST
1022 static void val_init(struct type *type, struct value *val)
1024 if (type && type->init)
1025 type->init(type, val);
1028 static void dup_value(struct type *type,
1029 struct value *vold, struct value *vnew)
1031 if (type && type->dup)
1032 type->dup(type, vold, vnew);
1035 static int value_cmp(struct type *tl, struct type *tr,
1036 struct value *left, struct value *right)
1038 if (tl && tl->cmp_order)
1039 return tl->cmp_order(tl, tr, left, right);
1040 if (tl && tl->cmp_eq)
1041 return tl->cmp_eq(tl, tr, left, right);
1042 return -1; // NOTEST
1045 static void print_value(struct type *type, struct value *v, FILE *f)
1047 if (type && type->print)
1048 type->print(type, v, f);
1050 fprintf(f, "*Unknown*"); // NOTEST
1053 static void prepare_types(struct parse_context *c)
1057 enum { none, some, cannot } progress = none;
1062 for (t = c->typelist; t; t = t->next) {
1064 tok_err(c, "error: type used but not declared",
1066 if (t->size == 0 && t->prepare_type) {
1067 if (t->prepare_type(c, t, 1))
1069 else if (progress == cannot)
1070 tok_err(c, "error: type has recursive definition",
1080 progress = cannot; break;
1082 progress = none; break;
1087 ###### forward decls
1089 static void free_value(struct type *type, struct value *v);
1090 static int type_compat(struct type *require, struct type *have, enum val_rules rules);
1091 static void type_print(struct type *type, FILE *f);
1092 static void val_init(struct type *type, struct value *v);
1093 static void dup_value(struct type *type,
1094 struct value *vold, struct value *vnew);
1095 static int value_cmp(struct type *tl, struct type *tr,
1096 struct value *left, struct value *right);
1097 static void print_value(struct type *type, struct value *v, FILE *f);
1099 ###### free context types
1101 while (context.typelist) {
1102 struct type *t = context.typelist;
1104 context.typelist = t->next;
1112 Type can be specified for local variables, for fields in a structure,
1113 for formal parameters to functions, and possibly elsewhere. Different
1114 rules may apply in different contexts. As a minimum, a named type may
1115 always be used. Currently the type of a formal parameter can be
1116 different from types in other contexts, so we have a separate grammar
1122 Type -> IDENTIFIER ${
1123 $0 = find_type(c, $ID.txt);
1125 $0 = add_type(c, $ID.txt, NULL);
1126 $0->first_use = $ID;
1131 FormalType -> Type ${ $0 = $<1; }$
1132 ## formal type grammar
1136 Values of the base types can be numbers, which we represent as
1137 multi-precision fractions, strings, Booleans and labels. When
1138 analysing the program we also need to allow for places where no value
1139 is meaningful (type `Tnone`) and where we don't know what type to
1140 expect yet (type is `NULL`).
1142 Values are never shared, they are always copied when used, and freed
1143 when no longer needed.
1145 When propagating type information around the program, we need to
1146 determine if two types are compatible, where type `NULL` is compatible
1147 with anything. There are two special cases with type compatibility.
1148 In some cases a Boolean can be accepted as well as some other
1149 primary type. In other cases a reference to the given type is
1150 acceptable in place of a value of the type itself.
1152 ###### type functions
1154 int (*compat)(struct type *this, struct type *other, enum val_rules rules);
1156 ###### ast functions
1158 static int type_compat(struct type *require, struct type *have,
1159 enum val_rules rules)
1161 if (!require || !have)
1164 if (require->compat)
1165 return require->compat(require, have, rules);
1167 return require == have;
1172 #include "parse_string.h"
1173 #include "parse_number.h"
1176 myLDLIBS := libnumber.o libstring.o -lgmp
1177 LDLIBS := $(filter-out $(myLDLIBS),$(LDLIBS)) $(myLDLIBS)
1179 ###### type union fields
1180 enum vtype {Vnone, Vstr, Vnum, Vbool, Vlabel} vtype;
1182 ###### value union fields
1188 ###### ast functions
1189 static void _free_value(struct type *type, struct value *v)
1193 switch (type->vtype) {
1195 case Vstr: free(v->str.txt); break;
1196 case Vnum: mpq_clear(v->num); break;
1202 ###### value functions
1204 static void _val_init(struct type *type, struct value *val)
1206 switch(type->vtype) {
1207 case Vnone: // NOTEST
1210 mpq_init(val->num); break;
1212 val->str.txt = malloc(1);
1219 val->label = 0; // NOTEST
1224 static void _dup_value(struct type *type,
1225 struct value *vold, struct value *vnew)
1227 switch (type->vtype) {
1228 case Vnone: // NOTEST
1231 vnew->label = vold->label; // NOTEST
1234 vnew->bool = vold->bool;
1237 mpq_init(vnew->num);
1238 mpq_set(vnew->num, vold->num);
1241 vnew->str.len = vold->str.len;
1242 vnew->str.txt = malloc(vnew->str.len);
1243 memcpy(vnew->str.txt, vold->str.txt, vnew->str.len);
1248 static int _value_cmp(struct type *tl, struct type *tr,
1249 struct value *left, struct value *right)
1254 switch (tl->vtype) {
1255 case Vlabel: cmp = left->label == right->label ? 0 : 1; break;
1256 case Vnum: cmp = mpq_cmp(left->num, right->num); break;
1257 case Vstr: cmp = text_cmp(left->str, right->str); break;
1258 case Vbool: cmp = left->bool - right->bool; break;
1259 case Vnone: cmp = 0; // NOTEST
1264 static void _print_value(struct type *type, struct value *v, FILE *f)
1266 switch (type->vtype) {
1267 case Vnone: // NOTEST
1268 fprintf(f, "*no-value*"); break; // NOTEST
1269 case Vlabel: // NOTEST
1270 fprintf(f, "*label-%d*", v->label); break; // NOTEST
1272 fprintf(f, "%.*s", v->str.len, v->str.txt); break;
1274 fprintf(f, "%s", v->bool ? "True":"False"); break;
1279 mpf_set_q(fl, v->num);
1280 gmp_fprintf(f, "%.10Fg", fl);
1287 static void _free_value(struct type *type, struct value *v);
1289 static int bool_test(struct type *type, struct value *v)
1294 static struct type base_prototype = {
1296 .print = _print_value,
1297 .cmp_order = _value_cmp,
1298 .cmp_eq = _value_cmp,
1300 .free = _free_value,
1303 static struct type *Tbool, *Tstr, *Tnum, *Tnone, *Tlabel;
1305 ###### ast functions
1306 static struct type *add_base_type(struct parse_context *c, char *n,
1307 enum vtype vt, int size)
1309 struct text txt = { n, strlen(n) };
1312 t = add_type(c, txt, &base_prototype);
1315 t->align = size > sizeof(void*) ? sizeof(void*) : size;
1316 if (t->size & (t->align - 1))
1317 t->size = (t->size | (t->align - 1)) + 1; // NOTEST
1321 ###### context initialization
1323 Tbool = add_base_type(&context, "Boolean", Vbool, sizeof(char));
1324 Tbool->test = bool_test;
1325 Tstr = add_base_type(&context, "string", Vstr, sizeof(struct text));
1326 Tnum = add_base_type(&context, "number", Vnum, sizeof(mpq_t));
1327 Tnone = add_base_type(&context, "none", Vnone, 0);
1328 Tlabel = add_base_type(&context, "label", Vlabel, sizeof(void*));
1332 We have already met values as separate objects. When manifest constants
1333 appear in the program text, that must result in an executable which has
1334 a constant value. So the `val` structure embeds a value in an
1347 ###### ast functions
1348 struct val *new_val(struct type *T, struct token tk)
1350 struct val *v = new_pos(val, tk);
1355 ###### declare terminals
1362 $0 = new_val(Tbool, $1);
1366 $0 = new_val(Tbool, $1);
1371 $0 = new_val(Tnum, $1);
1372 if (number_parse($0->val.num, tail, $1.txt) == 0) {
1373 mpq_init($0->val.num);
1374 tok_err(c, "error: unsupported number format", &$NUM);
1376 tok_err(c, "error: unsupported number suffix", &$1);
1380 $0 = new_val(Tstr, $1);
1381 string_parse(&$1, '\\', &$0->val.str, tail);
1383 tok_err(c, "error: unsupported string suffix",
1388 $0 = new_val(Tstr, $1);
1389 string_parse(&$1, '\\', &$0->val.str, tail);
1391 tok_err(c, "error: unsupported string suffix",
1395 ###### print exec cases
1398 struct val *v = cast(val, e);
1399 if (v->vtype == Tstr)
1401 // FIXME how to ensure numbers have same precision.
1402 print_value(v->vtype, &v->val, stdout);
1403 if (v->vtype == Tstr)
1408 ###### propagate exec cases
1411 struct val *val = cast(val, prog);
1412 if (!type_compat(type, val->vtype, rules))
1413 type_err(c, "error: expected %1 found %2",
1414 prog, type, rules, val->vtype);
1419 ###### interp exec cases
1421 rvtype = cast(val, e)->vtype;
1422 dup_value(rvtype, &cast(val, e)->val, &rv);
1425 ###### ast functions
1426 static void free_val(struct val *v)
1429 free_value(v->vtype, &v->val);
1433 ###### free exec cases
1434 case Xval: free_val(cast(val, e)); break;
1438 Labels are a temporary concept until I implement enums. There are an
1439 anonymous enum which is declared by usage. They are only allowed in
1440 `use` statements and corresponding `case` entries. They appear as a
1441 period followed by an identifier. All identifiers that are "used" must
1444 For now, we have a global list of labels, and don't check that all "use"
1456 ###### free exec cases
1460 ###### print exec cases
1462 struct label *l = cast(label, e);
1463 printf(".%.*s", l->name.len, l->name.txt);
1469 struct labels *next;
1473 ###### parse context
1474 struct labels *labels;
1476 ###### ast functions
1477 static int label_lookup(struct parse_context *c, struct text name)
1479 struct labels *l, **lp = &c->labels;
1480 while (*lp && text_cmp((*lp)->name, name) < 0)
1482 if (*lp && text_cmp((*lp)->name, name) == 0)
1483 return (*lp)->value;
1484 l = calloc(1, sizeof(*l));
1487 if (c->next_label == 0)
1489 l->value = c->next_label;
1495 ###### free context storage
1496 while (context.labels) {
1497 struct labels *l = context.labels;
1498 context.labels = l->next;
1502 ###### declare terminals
1506 struct label *l = new_pos(label, $ID);
1510 ###### propagate exec cases
1512 struct label *l = cast(label, prog);
1513 l->value = label_lookup(c, l->name);
1514 if (!type_compat(type, Tlabel, rules))
1515 type_err(c, "error: expected %1 found %2",
1516 prog, type, rules, Tlabel);
1520 ###### interp exec cases
1522 struct label *l = cast(label, e);
1523 rv.label = l->value;
1530 Variables are scoped named values. We store the names in a linked list
1531 of "bindings" sorted in lexical order, and use sequential search and
1538 struct binding *next; // in lexical order
1542 This linked list is stored in the parse context so that "reduce"
1543 functions can find or add variables, and so the analysis phase can
1544 ensure that every variable gets a type.
1546 ###### parse context
1548 struct binding *varlist; // In lexical order
1550 ###### ast functions
1552 static struct binding *find_binding(struct parse_context *c, struct text s)
1554 struct binding **l = &c->varlist;
1559 (cmp = text_cmp((*l)->name, s)) < 0)
1563 n = calloc(1, sizeof(*n));
1570 Each name can be linked to multiple variables defined in different
1571 scopes. Each scope starts where the name is declared and continues
1572 until the end of the containing code block. Scopes of a given name
1573 cannot nest, so a declaration while a name is in-scope is an error.
1575 ###### binding fields
1576 struct variable *var;
1580 struct variable *previous;
1582 struct binding *name;
1583 struct exec *where_decl;// where name was declared
1584 struct exec *where_set; // where type was set
1588 The parser will want to be able to free a variable, but as this will
1589 also be a reference to something stored in the parse context, there is
1592 ###### ast functions
1593 void free_variable(struct variable *v)
1597 When a scope closes, the values of the variables might need to be freed.
1598 This happens in the context of some `struct exec` and each `exec` will
1599 need to know which variables need to be freed when it completes. To
1600 improve visibility, we add a comment when printing any `exec` that
1601 embodies a scope to list the variables that must be freed when it ends.
1604 struct variable *to_free;
1606 ####### variable fields
1607 struct exec *cleanup_exec;
1608 struct variable *next_free;
1610 ####### interp exec cleanup
1613 for (v = e->to_free; v; v = v->next_free) {
1614 struct value *val = var_value(c, v);
1615 free_value(v->type, val);
1619 ###### print exec extras
1622 do_indent(indent, "/* FREE");
1623 for (v = e->to_free; v; v = v->next_free) {
1624 printf(" %.*s", v->name->name.len, v->name->name.txt);
1625 printf("[%d,%d]", v->scope_start, v->scope_end);
1626 if (v->frame_pos >= 0)
1627 printf("(%d+%d)", v->frame_pos,
1628 v->type ? v->type->size:0);
1633 ###### ast functions
1634 static void variable_unlink_exec(struct variable *v)
1636 struct variable **vp;
1637 if (!v->cleanup_exec)
1639 for (vp = &v->cleanup_exec->to_free;
1640 *vp; vp = &(*vp)->next_free) {
1644 v->cleanup_exec = NULL;
1649 While the naming seems strange, we include local constants in the
1650 definition of variables. A name declared `var := value` can
1651 subsequently be changed, but a name declared `var ::= value` cannot -
1654 ###### variable fields
1657 Scopes in parallel branches can be partially merged. More
1658 specifically, if a given name is declared in both branches of an
1659 if/else then its scope is a candidate for merging. Similarly if
1660 every branch of an exhaustive switch (e.g. has an "else" clause)
1661 declares a given name, then the scopes from the branches are
1662 candidates for merging.
1664 Note that names declared inside a loop (which is only parallel to
1665 itself) are never visible after the loop. Similarly names defined in
1666 scopes which are not parallel, such as those started by `for` and
1667 `switch`, are never visible after the scope. Only variables defined in
1668 both `then` and `else` (including the implicit then after an `if`, and
1669 excluding `then` used with `for`) and in all `case`s and `else` of a
1670 `switch` or `while` can be visible beyond the `if`/`switch`/`while`.
1672 Labels, which are a bit like variables, follow different rules.
1673 Labels are not explicitly declared, but if an undeclared name appears
1674 in a context where a label is legal, that effectively declares the
1675 name as a label. The declaration remains in force (or in scope) at
1676 least to the end of the immediately containing block and conditionally
1677 in any larger containing block which does not declare the name in some
1678 other way. Importantly, the conditional scope extension happens even
1679 if the label is only used in one parallel branch of a conditional --
1680 when used in one branch it is treated as having been declared in all
1683 Merge candidates are tentatively visible beyond the end of the
1684 branching statement which creates them. If the name is used, the
1685 merge is affirmed and they become a single variable visible at the
1686 outer layer. If not - if it is redeclared first - the merge lapses.
1688 To track scopes we have an extra stack, implemented as a linked list,
1689 which roughly parallels the parse stack and which is used exclusively
1690 for scoping. When a new scope is opened, a new frame is pushed and
1691 the child-count of the parent frame is incremented. This child-count
1692 is used to distinguish between the first of a set of parallel scopes,
1693 in which declared variables must not be in scope, and subsequent
1694 branches, whether they may already be conditionally scoped.
1696 We need a total ordering of scopes so we can easily compare to variables
1697 to see if they are concurrently in scope. To achieve this we record a
1698 `scope_count` which is actually a count of both beginnings and endings
1699 of scopes. Then each variable has a record of the scope count where it
1700 enters scope, and where it leaves.
1702 To push a new frame *before* any code in the frame is parsed, we need a
1703 grammar reduction. This is most easily achieved with a grammar
1704 element which derives the empty string, and creates the new scope when
1705 it is recognised. This can be placed, for example, between a keyword
1706 like "if" and the code following it.
1710 struct scope *parent;
1714 ###### parse context
1717 struct scope *scope_stack;
1719 ###### variable fields
1720 int scope_start, scope_end;
1722 ###### ast functions
1723 static void scope_pop(struct parse_context *c)
1725 struct scope *s = c->scope_stack;
1727 c->scope_stack = s->parent;
1729 c->scope_depth -= 1;
1730 c->scope_count += 1;
1733 static void scope_push(struct parse_context *c)
1735 struct scope *s = calloc(1, sizeof(*s));
1737 c->scope_stack->child_count += 1;
1738 s->parent = c->scope_stack;
1740 c->scope_depth += 1;
1741 c->scope_count += 1;
1747 OpenScope -> ${ scope_push(c); }$
1749 Each variable records a scope depth and is in one of four states:
1751 - "in scope". This is the case between the declaration of the
1752 variable and the end of the containing block, and also between
1753 the usage with affirms a merge and the end of that block.
1755 The scope depth is not greater than the current parse context scope
1756 nest depth. When the block of that depth closes, the state will
1757 change. To achieve this, all "in scope" variables are linked
1758 together as a stack in nesting order.
1760 - "pending". The "in scope" block has closed, but other parallel
1761 scopes are still being processed. So far, every parallel block at
1762 the same level that has closed has declared the name.
1764 The scope depth is the depth of the last parallel block that
1765 enclosed the declaration, and that has closed.
1767 - "conditionally in scope". The "in scope" block and all parallel
1768 scopes have closed, and no further mention of the name has been seen.
1769 This state includes a secondary nest depth (`min_depth`) which records
1770 the outermost scope seen since the variable became conditionally in
1771 scope. If a use of the name is found, the variable becomes "in scope"
1772 and that secondary depth becomes the recorded scope depth. If the
1773 name is declared as a new variable, the old variable becomes "out of
1774 scope" and the recorded scope depth stays unchanged.
1776 - "out of scope". The variable is neither in scope nor conditionally
1777 in scope. It is permanently out of scope now and can be removed from
1778 the "in scope" stack. When a variable becomes out-of-scope it is
1779 moved to a separate list (`out_scope`) of variables which have fully
1780 known scope. This will be used at the end of each function to assign
1781 each variable a place in the stack frame.
1783 ###### variable fields
1784 int depth, min_depth;
1785 enum { OutScope, PendingScope, CondScope, InScope } scope;
1786 struct variable *in_scope;
1788 ###### parse context
1790 struct variable *in_scope;
1791 struct variable *out_scope;
1793 All variables with the same name are linked together using the
1794 'previous' link. Those variable that have been affirmatively merged all
1795 have a 'merged' pointer that points to one primary variable - the most
1796 recently declared instance. When merging variables, we need to also
1797 adjust the 'merged' pointer on any other variables that had previously
1798 been merged with the one that will no longer be primary.
1800 A variable that is no longer the most recent instance of a name may
1801 still have "pending" scope, if it might still be merged with most
1802 recent instance. These variables don't really belong in the
1803 "in_scope" list, but are not immediately removed when a new instance
1804 is found. Instead, they are detected and ignored when considering the
1805 list of in_scope names.
1807 The storage of the value of a variable will be described later. For now
1808 we just need to know that when a variable goes out of scope, it might
1809 need to be freed. For this we need to be able to find it, so assume that
1810 `var_value()` will provide that.
1812 ###### variable fields
1813 struct variable *merged;
1815 ###### ast functions
1817 static void variable_merge(struct variable *primary, struct variable *secondary)
1821 primary = primary->merged;
1823 for (v = primary->previous; v; v=v->previous)
1824 if (v == secondary || v == secondary->merged ||
1825 v->merged == secondary ||
1826 v->merged == secondary->merged) {
1827 v->scope = OutScope;
1828 v->merged = primary;
1829 if (v->scope_start < primary->scope_start)
1830 primary->scope_start = v->scope_start;
1831 if (v->scope_end > primary->scope_end)
1832 primary->scope_end = v->scope_end; // NOTEST
1833 variable_unlink_exec(v);
1837 ###### forward decls
1838 static struct value *var_value(struct parse_context *c, struct variable *v);
1840 ###### free global vars
1842 while (context.varlist) {
1843 struct binding *b = context.varlist;
1844 struct variable *v = b->var;
1845 context.varlist = b->next;
1848 struct variable *next = v->previous;
1850 if (v->global && v->frame_pos >= 0) {
1851 free_value(v->type, var_value(&context, v));
1852 if (v->depth == 0 && v->type->free == function_free)
1853 // This is a function constant
1854 free_exec(v->where_decl);
1861 #### Manipulating Bindings
1863 When a name is conditionally visible, a new declaration discards the old
1864 binding - the condition lapses. Similarly when we reach the end of a
1865 function (outermost non-global scope) any conditional scope must lapse.
1866 Conversely a usage of the name affirms the visibility and extends it to
1867 the end of the containing block - i.e. the block that contains both the
1868 original declaration and the latest usage. This is determined from
1869 `min_depth`. When a conditionally visible variable gets affirmed like
1870 this, it is also merged with other conditionally visible variables with
1873 When we parse a variable declaration we either report an error if the
1874 name is currently bound, or create a new variable at the current nest
1875 depth if the name is unbound or bound to a conditionally scoped or
1876 pending-scope variable. If the previous variable was conditionally
1877 scoped, it and its homonyms becomes out-of-scope.
1879 When we parse a variable reference (including non-declarative assignment
1880 "foo = bar") we report an error if the name is not bound or is bound to
1881 a pending-scope variable; update the scope if the name is bound to a
1882 conditionally scoped variable; or just proceed normally if the named
1883 variable is in scope.
1885 When we exit a scope, any variables bound at this level are either
1886 marked out of scope or pending-scoped, depending on whether the scope
1887 was sequential or parallel. Here a "parallel" scope means the "then"
1888 or "else" part of a conditional, or any "case" or "else" branch of a
1889 switch. Other scopes are "sequential".
1891 When exiting a parallel scope we check if there are any variables that
1892 were previously pending and are still visible. If there are, then
1893 they weren't redeclared in the most recent scope, so they cannot be
1894 merged and must become out-of-scope. If it is not the first of
1895 parallel scopes (based on `child_count`), we check that there was a
1896 previous binding that is still pending-scope. If there isn't, the new
1897 variable must now be out-of-scope.
1899 When exiting a sequential scope that immediately enclosed parallel
1900 scopes, we need to resolve any pending-scope variables. If there was
1901 no `else` clause, and we cannot determine that the `switch` was exhaustive,
1902 we need to mark all pending-scope variable as out-of-scope. Otherwise
1903 all pending-scope variables become conditionally scoped.
1906 enum closetype { CloseSequential, CloseFunction, CloseParallel, CloseElse };
1908 ###### ast functions
1910 static struct variable *var_decl(struct parse_context *c, struct text s)
1912 struct binding *b = find_binding(c, s);
1913 struct variable *v = b->var;
1915 switch (v ? v->scope : OutScope) {
1917 /* Caller will report the error */
1921 v && v->scope == CondScope;
1923 v->scope = OutScope;
1927 v = calloc(1, sizeof(*v));
1928 v->previous = b->var;
1932 v->min_depth = v->depth = c->scope_depth;
1934 v->in_scope = c->in_scope;
1935 v->scope_start = c->scope_count;
1941 static struct variable *var_ref(struct parse_context *c, struct text s)
1943 struct binding *b = find_binding(c, s);
1944 struct variable *v = b->var;
1945 struct variable *v2;
1947 switch (v ? v->scope : OutScope) {
1950 /* Caller will report the error */
1953 /* All CondScope variables of this name need to be merged
1954 * and become InScope
1956 v->depth = v->min_depth;
1958 for (v2 = v->previous;
1959 v2 && v2->scope == CondScope;
1961 variable_merge(v, v2);
1969 static int var_refile(struct parse_context *c, struct variable *v)
1971 /* Variable just went out of scope. Add it to the out_scope
1972 * list, sorted by ->scope_start
1974 struct variable **vp = &c->out_scope;
1975 while ((*vp) && (*vp)->scope_start < v->scope_start)
1976 vp = &(*vp)->in_scope;
1982 static void var_block_close(struct parse_context *c, enum closetype ct,
1985 /* Close off all variables that are in_scope.
1986 * Some variables in c->scope may already be not-in-scope,
1987 * such as when a PendingScope variable is hidden by a new
1988 * variable with the same name.
1989 * So we check for v->name->var != v and drop them.
1990 * If we choose to make a variable OutScope, we drop it
1993 struct variable *v, **vp, *v2;
1996 for (vp = &c->in_scope;
1997 (v = *vp) && v->min_depth > c->scope_depth;
1998 (v->scope == OutScope || v->name->var != v)
1999 ? (*vp = v->in_scope, var_refile(c, v))
2000 : ( vp = &v->in_scope, 0)) {
2001 v->min_depth = c->scope_depth;
2002 if (v->name->var != v)
2003 /* This is still in scope, but we haven't just
2007 v->min_depth = c->scope_depth;
2008 if (v->scope == InScope)
2009 v->scope_end = c->scope_count;
2010 if (v->scope == InScope && e && !v->global) {
2011 /* This variable gets cleaned up when
2014 variable_unlink_exec(v);
2015 v->cleanup_exec = e;
2016 v->next_free = e->to_free;
2021 case CloseParallel: /* handle PendingScope */
2025 if (c->scope_stack->child_count == 1)
2026 /* first among parallel branches */
2027 v->scope = PendingScope;
2028 else if (v->previous &&
2029 v->previous->scope == PendingScope)
2030 /* all previous branches used name */
2031 v->scope = PendingScope;
2033 v->scope = OutScope;
2034 if (ct == CloseElse) {
2035 /* All Pending variables with
2036 * this name are now Conditional
2039 v2 && v2->scope == PendingScope;
2041 v2->scope = CondScope;
2045 /* Not possible as it would require
2046 * parallel scope to be nested immediately
2047 * in a parallel scope, and that never
2051 /* Not possible as we already tested for
2058 if (v->scope == CondScope)
2059 /* Condition cannot continue past end of
2063 case CloseSequential:
2066 v->scope = OutScope;
2069 /* There was no 'else', so we can only become
2070 * conditional if we know the cases were exhaustive,
2071 * and that doesn't mean anything yet.
2072 * So only labels become conditional..
2075 v2 && v2->scope == PendingScope;
2077 v2->scope = OutScope;
2080 case OutScope: break;
2089 The value of a variable is stored separately from the variable, on an
2090 analogue of a stack frame. There are (currently) two frames that can be
2091 active. A global frame which currently only stores constants, and a
2092 stacked frame which stores local variables. Each variable knows if it
2093 is global or not, and what its index into the frame is.
2095 Values in the global frame are known immediately they are relevant, so
2096 the frame needs to be reallocated as it grows so it can store those
2097 values. The local frame doesn't get values until the interpreted phase
2098 is started, so there is no need to allocate until the size is known.
2100 We initialise the `frame_pos` to an impossible value, so that we can
2101 tell if it was set or not later.
2103 ###### variable fields
2107 ###### variable init
2110 ###### parse context
2112 short global_size, global_alloc;
2114 void *global, *local;
2116 ###### forward decls
2117 static struct value *global_alloc(struct parse_context *c, struct type *t,
2118 struct variable *v, struct value *init);
2120 ###### ast functions
2122 static struct value *var_value(struct parse_context *c, struct variable *v)
2125 if (!c->local || !v->type)
2126 return NULL; // NOTEST
2127 if (v->frame_pos + v->type->size > c->local_size) {
2128 printf("INVALID frame_pos\n"); // NOTEST
2131 return c->local + v->frame_pos;
2133 if (c->global_size > c->global_alloc) {
2134 int old = c->global_alloc;
2135 c->global_alloc = (c->global_size | 1023) + 1024;
2136 c->global = realloc(c->global, c->global_alloc);
2137 memset(c->global + old, 0, c->global_alloc - old);
2139 return c->global + v->frame_pos;
2142 static struct value *global_alloc(struct parse_context *c, struct type *t,
2143 struct variable *v, struct value *init)
2146 struct variable scratch;
2148 if (t->prepare_type)
2149 t->prepare_type(c, t, 1); // NOTEST
2151 if (c->global_size & (t->align - 1))
2152 c->global_size = (c->global_size + t->align) & ~(t->align-1);
2157 v->frame_pos = c->global_size;
2159 c->global_size += v->type->size;
2160 ret = var_value(c, v);
2162 memcpy(ret, init, t->size);
2164 val_init(t, ret); // NOTEST
2168 As global values are found -- struct field initializers, labels etc --
2169 `global_alloc()` is called to record the value in the global frame.
2171 When the program is fully parsed, each function is analysed, we need to
2172 walk the list of variables local to that function and assign them an
2173 offset in the stack frame. For this we have `scope_finalize()`.
2175 We keep the stack from dense by re-using space for between variables
2176 that are not in scope at the same time. The `out_scope` list is sorted
2177 by `scope_start` and as we process a varible, we move it to an FIFO
2178 stack. For each variable we consider, we first discard any from the
2179 stack anything that went out of scope before the new variable came in.
2180 Then we place the new variable just after the one at the top of the
2183 ###### ast functions
2185 static void scope_finalize(struct parse_context *c, struct type *ft)
2187 int size = ft->function.local_size;
2188 struct variable *next = ft->function.scope;
2189 struct variable *done = NULL;
2192 struct variable *v = next;
2193 struct type *t = v->type;
2200 if (v->frame_pos >= 0)
2202 while (done && done->scope_end < v->scope_start)
2203 done = done->in_scope;
2205 pos = done->frame_pos + done->type->size;
2207 pos = ft->function.local_size;
2208 if (pos & (t->align - 1))
2209 pos = (pos + t->align) & ~(t->align-1);
2211 if (size < pos + v->type->size)
2212 size = pos + v->type->size;
2216 c->out_scope = NULL;
2217 ft->function.local_size = size;
2220 ###### free context storage
2221 free(context.global);
2223 #### Variables as executables
2225 Just as we used a `val` to wrap a value into an `exec`, we similarly
2226 need a `var` to wrap a `variable` into an exec. While each `val`
2227 contained a copy of the value, each `var` holds a link to the variable
2228 because it really is the same variable no matter where it appears.
2229 When a variable is used, we need to remember to follow the `->merged`
2230 link to find the primary instance.
2232 When a variable is declared, it may or may not be given an explicit
2233 type. We need to record which so that we can report the parsed code
2242 struct variable *var;
2245 ###### variable fields
2253 VariableDecl -> IDENTIFIER : ${ {
2254 struct variable *v = var_decl(c, $1.txt);
2255 $0 = new_pos(var, $1);
2260 v = var_ref(c, $1.txt);
2262 type_err(c, "error: variable '%v' redeclared",
2264 type_err(c, "info: this is where '%v' was first declared",
2265 v->where_decl, NULL, 0, NULL);
2268 | IDENTIFIER :: ${ {
2269 struct variable *v = var_decl(c, $1.txt);
2270 $0 = new_pos(var, $1);
2276 v = var_ref(c, $1.txt);
2278 type_err(c, "error: variable '%v' redeclared",
2280 type_err(c, "info: this is where '%v' was first declared",
2281 v->where_decl, NULL, 0, NULL);
2284 | IDENTIFIER : Type ${ {
2285 struct variable *v = var_decl(c, $1.txt);
2286 $0 = new_pos(var, $1);
2292 v->explicit_type = 1;
2294 v = var_ref(c, $1.txt);
2296 type_err(c, "error: variable '%v' redeclared",
2298 type_err(c, "info: this is where '%v' was first declared",
2299 v->where_decl, NULL, 0, NULL);
2302 | IDENTIFIER :: Type ${ {
2303 struct variable *v = var_decl(c, $1.txt);
2304 $0 = new_pos(var, $1);
2311 v->explicit_type = 1;
2313 v = var_ref(c, $1.txt);
2315 type_err(c, "error: variable '%v' redeclared",
2317 type_err(c, "info: this is where '%v' was first declared",
2318 v->where_decl, NULL, 0, NULL);
2323 Variable -> IDENTIFIER ${ {
2324 struct variable *v = var_ref(c, $1.txt);
2325 $0 = new_pos(var, $1);
2327 /* This might be a global const or a label
2328 * Allocate a var with impossible type Tnone,
2329 * which will be adjusted when we find out what it is,
2330 * or will trigger an error.
2332 v = var_decl(c, $1.txt);
2339 cast(var, $0)->var = v;
2342 ###### print exec cases
2345 struct var *v = cast(var, e);
2347 struct binding *b = v->var->name;
2348 printf("%.*s", b->name.len, b->name.txt);
2355 if (loc && loc->type == Xvar) {
2356 struct var *v = cast(var, loc);
2358 struct binding *b = v->var->name;
2359 fprintf(stderr, "%.*s", b->name.len, b->name.txt);
2361 fputs("???", stderr); // NOTEST
2363 fputs("NOTVAR", stderr); // NOTEST
2366 ###### propagate exec cases
2370 struct var *var = cast(var, prog);
2371 struct variable *v = var->var;
2373 type_err(c, "%d:BUG: no variable!!", prog, NULL, 0, NULL); // NOTEST
2374 return Tnone; // NOTEST
2377 if (v->type == Tnone && v->where_decl == prog)
2378 type_err(c, "error: variable used but not declared: %v",
2379 prog, NULL, 0, NULL);
2380 if (v->type == NULL) {
2381 if (type && !(*perr & Efail)) {
2383 v->where_set = prog;
2386 } else if (!type_compat(type, v->type, rules)) {
2387 type_err(c, "error: expected %1 but variable '%v' is %2", prog,
2388 type, rules, v->type);
2389 type_err(c, "info: this is where '%v' was set to %1", v->where_set,
2390 v->type, rules, NULL);
2392 if (!v->global || v->frame_pos < 0)
2399 ###### interp exec cases
2402 struct var *var = cast(var, e);
2403 struct variable *v = var->var;
2406 lrv = var_value(c, v);
2411 ###### ast functions
2413 static void free_var(struct var *v)
2418 ###### free exec cases
2419 case Xvar: free_var(cast(var, e)); break;
2423 Now that we have the shape of the interpreter in place we can add some
2424 complex types and connected them in to the data structures and the
2425 different phases of parse, analyse, print, interpret.
2427 Being "complex" the language will naturally have syntax to access
2428 specifics of objects of these types. These will fit into the grammar as
2429 "Terms" which are the things that are combined with various operators to
2430 form an "Expression". Where a Term is formed by some operation on another
2431 Term, the subordinate Term will always come first, so for example a
2432 member of an array will be expressed as the Term for the array followed
2433 by an index in square brackets. The strict rule of using postfix
2434 operations makes precedence irrelevant within terms. To provide a place
2435 to put the grammar for terms of each type, we will start out by
2436 introducing the "Term" grammar production, with contains at least a
2437 simple "Value" (to be explained later).
2439 We also take this opportunity to introduce the "ExpressionsList" which
2440 is a simple comma-separated list of expressions - it may be used in
2443 ###### declare terminals
2448 Term -> Value ${ $0 = $<1; }$
2449 | Variable ${ $0 = $<1; }$
2453 ExpressionList -> ExpressionList , Expression ${
2466 Thus far the complex types we have are arrays, structs, functions and
2471 Arrays can be declared by giving a size and a type, as `[size]type' so
2472 `freq:[26]number` declares `freq` to be an array of 26 numbers. The
2473 size can be either a literal number, or a named constant. Some day an
2474 arbitrary expression will be supported.
2476 As a formal parameter to a function, the array can be declared with
2477 unknown size `name:[]string`. This is currently only supported for the
2478 "argv" parameter to "main" but will be extended more generally in a
2479 later version of the language. The length of this array - or any array
2480 - can be found with the "[]" postfix operator.
2482 Arrays cannot be assigned. When reference are extend to allow array
2483 slices which can refer to part or all of an array the assignment
2484 syntax will create a slice. For now, an array can only ever be
2485 referenced by the name it is declared with. It is likely that a
2486 "`copy`" primitive will eventually be defined which can be used to make a
2487 copy of an array with controllable recursive depth.
2489 For now we have two sorts of array, those with fixed size either because
2490 it is given as a literal number or because it is a struct member (which
2491 cannot have a runtime-changing size), and those with a size that is
2492 determined at runtime - local variables with a const size. The former
2493 have their size calculated at parse time, the latter at run time.
2495 For the latter type, the `size` field of the type is the size of a
2496 pointer, and the array is reallocated every time it comes into scope.
2498 We differentiate struct fields with a const size from local variables
2499 with a const size by whether they are prepared at parse time or not.
2501 ###### type union fields
2504 int unspec; // size is unspecified - vsize must be set.
2507 struct variable *vsize;
2508 struct type *member;
2511 ###### value union fields
2512 void *array; // used if not static_size
2514 ###### value functions
2516 static int array_prepare_type(struct parse_context *c, struct type *type,
2519 struct value *vsize;
2521 if (type->array.static_size)
2522 return 1; // NOTEST - guard against reentry
2523 if (type->array.unspec && parse_time)
2524 return 1; // NOTEST - unspec is still incomplete
2525 if (parse_time && type->array.vsize && !type->array.vsize->global)
2526 return 1; // NOTEST - should be impossible
2528 if (type->array.vsize) {
2529 vsize = var_value(c, type->array.vsize);
2531 return 1; // NOTEST - should be impossible
2533 mpz_tdiv_q(q, mpq_numref(vsize->num), mpq_denref(vsize->num));
2534 type->array.size = mpz_get_si(q);
2539 if (type->array.member->size <= 0)
2540 return 0; // NOTEST - error caught before here
2542 type->array.static_size = 1;
2543 type->size = type->array.size * type->array.member->size;
2544 type->align = type->array.member->align;
2549 static void array_init(struct type *type, struct value *val)
2552 void *ptr = val->ptr;
2556 if (!type->array.static_size) {
2557 val->array = calloc(type->array.size,
2558 type->array.member->size);
2561 for (i = 0; i < type->array.size; i++) {
2563 v = (void*)ptr + i * type->array.member->size;
2564 val_init(type->array.member, v);
2568 static void array_free(struct type *type, struct value *val)
2571 void *ptr = val->ptr;
2573 if (!type->array.static_size)
2575 for (i = 0; i < type->array.size; i++) {
2577 v = (void*)ptr + i * type->array.member->size;
2578 free_value(type->array.member, v);
2580 if (!type->array.static_size)
2584 static int array_compat(struct type *require, struct type *have,
2585 enum val_rules rules)
2587 if (have->compat != require->compat)
2589 /* Both are arrays, so we can look at details */
2590 if (!type_compat(require->array.member, have->array.member, 0))
2592 if (have->array.unspec && require->array.unspec &&
2593 have->array.size != require->array.size)
2595 if (have->array.unspec || require->array.unspec)
2597 if (require->array.vsize == NULL && have->array.vsize == NULL)
2598 return require->array.size == have->array.size;
2600 return require->array.vsize == have->array.vsize;
2603 static void array_print_type(struct type *type, FILE *f)
2606 if (type->array.vsize) {
2607 struct binding *b = type->array.vsize->name;
2608 fprintf(f, "%.*s%s]", b->name.len, b->name.txt,
2609 type->array.unspec ? "::" : "");
2610 } else if (type->array.size)
2611 fprintf(f, "%d]", type->array.size);
2614 type_print(type->array.member, f);
2617 static struct type array_prototype = {
2619 .prepare_type = array_prepare_type,
2620 .print_type = array_print_type,
2621 .compat = array_compat,
2623 .size = sizeof(void*),
2624 .align = sizeof(void*),
2627 ###### declare terminals
2632 | [ NUMBER ] Type ${ {
2638 if (number_parse(num, tail, $2.txt) == 0)
2639 tok_err(c, "error: unrecognised number", &$2);
2641 tok_err(c, "error: unsupported number suffix", &$2);
2644 elements = mpz_get_ui(mpq_numref(num));
2645 if (mpz_cmp_ui(mpq_denref(num), 1) != 0) {
2646 tok_err(c, "error: array size must be an integer",
2648 } else if (mpz_cmp_ui(mpq_numref(num), 1UL << 30) >= 0)
2649 tok_err(c, "error: array size is too large",
2654 $0 = t = add_anon_type(c, &array_prototype, "array[%d]", elements );
2655 t->array.size = elements;
2656 t->array.member = $<4;
2657 t->array.vsize = NULL;
2660 | [ IDENTIFIER ] Type ${ {
2661 struct variable *v = var_ref(c, $2.txt);
2664 tok_err(c, "error: name undeclared", &$2);
2665 else if (!v->constant)
2666 tok_err(c, "error: array size must be a constant", &$2);
2668 $0 = add_anon_type(c, &array_prototype, "array[%.*s]", $2.txt.len, $2.txt.txt);
2669 $0->array.member = $<4;
2671 $0->array.vsize = v;
2674 ###### formal type grammar
2677 $0 = add_anon_type(c, &array_prototype, "array[]");
2678 $0->array.member = $<Type;
2680 $0->array.unspec = 1;
2681 $0->array.vsize = NULL;
2689 | Term [ Expression ] ${ {
2690 struct binode *b = new(binode);
2698 struct binode *b = new(binode);
2704 ###### print binode cases
2706 print_exec(b->left, -1, bracket);
2708 print_exec(b->right, -1, bracket);
2713 print_exec(b->left, -1, bracket);
2717 ###### propagate binode cases
2719 /* left must be an array, right must be a number,
2720 * result is the member type of the array
2722 propagate_types(b->right, c, perr_local, Tnum, 0);
2723 t = propagate_types(b->left, c, perr, NULL, 0);
2724 if (!t || t->compat != array_compat) {
2725 type_err(c, "error: %1 cannot be indexed", prog, t, 0,
2729 if (!type_compat(type, t->array.member, rules)) {
2730 type_err(c, "error: have %1 but need %2", prog,
2731 t->array.member, rules, type);
2733 return t->array.member;
2738 /* left must be an array, result is a number
2740 t = propagate_types(b->left, c, perr, NULL, 0);
2741 if (!t || t->compat != array_compat) {
2742 type_err(c, "error: %1 cannot provide length", prog, t,
2746 if (!type_compat(type, Tnum, rules))
2747 type_err(c, "error: have %1 but need %2", prog,
2752 ###### interp binode cases
2758 lleft = linterp_exec(c, b->left, <ype);
2759 right = interp_exec(c, b->right, &rtype);
2761 mpz_tdiv_q(q, mpq_numref(right.num), mpq_denref(right.num));
2765 if (ltype->array.static_size)
2768 ptr = *(void**)lleft;
2769 rvtype = ltype->array.member;
2770 if (i >= 0 && i < ltype->array.size)
2771 lrv = ptr + i * rvtype->size;
2773 val_init(ltype->array.member, &rv); // UNSAFE
2778 lleft = linterp_exec(c, b->left, <ype);
2779 mpq_set_ui(rv.num, ltype->array.size, 1);
2787 A `struct` is a data-type that contains one or more other data-types.
2788 It differs from an array in that each member can be of a different
2789 type, and they are accessed by name rather than by number. Thus you
2790 cannot choose an element by calculation, you need to know what you
2793 The language makes no promises about how a given structure will be
2794 stored in memory - it is free to rearrange fields to suit whatever
2795 criteria seems important.
2797 Structs are declared separately from program code - they cannot be
2798 declared in-line in a variable declaration like arrays can. A struct
2799 is given a name and this name is used to identify the type - the name
2800 is not prefixed by the word `struct` as it would be in C.
2802 Structs are only treated as the same if they have the same name.
2803 Simply having the same fields in the same order is not enough. This
2804 might change once we can create structure initializers from a list of
2807 Each component datum is identified much like a variable is declared,
2808 with a name, one or two colons, and a type. The type cannot be omitted
2809 as there is no opportunity to deduce the type from usage. An initial
2810 value can be given following an equals sign, so
2812 ##### Example: a struct type
2818 would declare a type called "complex" which has two number fields,
2819 each initialised to zero.
2821 Struct will need to be declared separately from the code that uses
2822 them, so we will need to be able to print out the declaration of a
2823 struct when reprinting the whole program. So a `print_type_decl` type
2824 function will be needed.
2826 ###### type union fields
2835 } *fields; // This is created when field_list is analysed.
2837 struct fieldlist *prev;
2840 } *field_list; // This is created during parsing
2843 ###### type functions
2844 void (*print_type_decl)(struct type *type, FILE *f);
2845 struct type *(*fieldref)(struct type *t, struct parse_context *c,
2846 struct fieldref *f, struct value **vp);
2848 ###### value functions
2850 static void structure_init(struct type *type, struct value *val)
2854 for (i = 0; i < type->structure.nfields; i++) {
2856 v = (void*) val->ptr + type->structure.fields[i].offset;
2857 if (type->structure.fields[i].init)
2858 dup_value(type->structure.fields[i].type,
2859 type->structure.fields[i].init,
2862 val_init(type->structure.fields[i].type, v);
2866 static void structure_free(struct type *type, struct value *val)
2870 for (i = 0; i < type->structure.nfields; i++) {
2872 v = (void*)val->ptr + type->structure.fields[i].offset;
2873 free_value(type->structure.fields[i].type, v);
2877 static void free_fieldlist(struct fieldlist *f)
2881 free_fieldlist(f->prev);
2886 static void structure_free_type(struct type *t)
2889 for (i = 0; i < t->structure.nfields; i++)
2890 if (t->structure.fields[i].init) {
2891 free_value(t->structure.fields[i].type,
2892 t->structure.fields[i].init);
2894 free(t->structure.fields);
2895 free_fieldlist(t->structure.field_list);
2898 static int structure_prepare_type(struct parse_context *c,
2899 struct type *t, int parse_time)
2902 struct fieldlist *f;
2904 if (!parse_time || t->structure.fields)
2907 for (f = t->structure.field_list; f; f=f->prev) {
2911 if (f->f.type->size <= 0)
2913 if (f->f.type->prepare_type)
2914 f->f.type->prepare_type(c, f->f.type, parse_time);
2916 if (f->init == NULL)
2920 propagate_types(f->init, c, &perr, f->f.type, 0);
2921 } while (perr & Eretry);
2923 c->parse_error += 1; // NOTEST
2926 t->structure.nfields = cnt;
2927 t->structure.fields = calloc(cnt, sizeof(struct field));
2928 f = t->structure.field_list;
2930 int a = f->f.type->align;
2932 t->structure.fields[cnt] = f->f;
2933 if (t->size & (a-1))
2934 t->size = (t->size | (a-1)) + 1;
2935 t->structure.fields[cnt].offset = t->size;
2936 t->size += ((f->f.type->size - 1) | (a-1)) + 1;
2940 if (f->init && !c->parse_error) {
2941 struct value vl = interp_exec(c, f->init, NULL);
2942 t->structure.fields[cnt].init =
2943 global_alloc(c, f->f.type, NULL, &vl);
2951 static int find_struct_index(struct type *type, struct text field)
2954 for (i = 0; i < type->structure.nfields; i++)
2955 if (text_cmp(type->structure.fields[i].name, field) == 0)
2957 return IndexInvalid;
2960 static struct type *structure_fieldref(struct type *t, struct parse_context *c,
2961 struct fieldref *f, struct value **vp)
2963 if (f->index == IndexUnknown) {
2964 f->index = find_struct_index(t, f->name);
2966 type_err(c, "error: cannot find requested field in %1",
2967 f->left, t, 0, NULL);
2972 struct value *v = *vp;
2973 v = (void*)v->ptr + t->structure.fields[f->index].offset;
2976 return t->structure.fields[f->index].type;
2979 static struct type structure_prototype = {
2980 .init = structure_init,
2981 .free = structure_free,
2982 .free_type = structure_free_type,
2983 .print_type_decl = structure_print_type,
2984 .prepare_type = structure_prepare_type,
2985 .fieldref = structure_fieldref,
2998 enum { IndexUnknown = -1, IndexInvalid = -2 };
3000 ###### free exec cases
3002 free_exec(cast(fieldref, e)->left);
3006 ###### declare terminals
3011 | Term . IDENTIFIER ${ {
3012 struct fieldref *fr = new_pos(fieldref, $2);
3015 fr->index = IndexUnknown;
3019 ###### print exec cases
3023 struct fieldref *f = cast(fieldref, e);
3024 print_exec(f->left, -1, bracket);
3025 printf(".%.*s", f->name.len, f->name.txt);
3029 ###### propagate exec cases
3033 struct fieldref *f = cast(fieldref, prog);
3034 struct type *st = propagate_types(f->left, c, perr, NULL, 0);
3036 if (!st || !st->fieldref)
3037 type_err(c, "error: field reference on %1 is not supported",
3038 f->left, st, 0, NULL);
3040 t = st->fieldref(st, c, f, NULL);
3041 if (t && !type_compat(type, t, rules))
3042 type_err(c, "error: have %1 but need %2", prog,
3049 ###### interp exec cases
3052 struct fieldref *f = cast(fieldref, e);
3054 struct value *lleft = linterp_exec(c, f->left, <ype);
3056 rvtype = ltype->fieldref(ltype, c, f, &lrv);
3060 ###### top level grammar
3062 StructName -> IDENTIFIER ${ {
3063 struct type *t = find_type(c, $ID.txt);
3065 if (t && t->size >= 0) {
3066 tok_err(c, "error: type already declared", &$ID);
3067 tok_err(c, "info: this is location of declaration",
3072 t = add_type(c, $ID.txt, NULL);
3077 DeclareStruct -> struct StructName FieldBlock Newlines ${ {
3078 struct type *t = $<SN;
3079 struct type tmp = *t;
3081 *t = structure_prototype;
3084 t->first_use = tmp.first_use;
3086 t->structure.field_list = $<FB;
3090 FieldBlock -> { IN OptNL FieldLines OUT OptNL } ${ $0 = $<FL; }$
3091 | { SimpleFieldList } ${ $0 = $<SFL; }$
3092 | IN OptNL FieldLines OUT ${ $0 = $<FL; }$
3093 | SimpleFieldList EOL ${ $0 = $<SFL; }$
3095 FieldLines -> SimpleFieldList Newlines ${ $0 = $<SFL; }$
3096 | FieldLines SimpleFieldList Newlines ${ {
3097 struct fieldlist *f = $<SFL;
3108 SimpleFieldList -> Field ${ $0 = $<F; }$
3109 | SimpleFieldList ; Field ${
3113 | SimpleFieldList ; ${
3116 | ERROR ${ tok_err(c, "Syntax error in struct field", &$1); }$
3118 Field -> IDENTIFIER : Type = Expression ${ {
3119 $0 = calloc(1, sizeof(struct fieldlist));
3120 $0->f.name = $ID.txt;
3121 $0->f.type = $<Type;
3125 | IDENTIFIER : Type ${
3126 $0 = calloc(1, sizeof(struct fieldlist));
3127 $0->f.name = $ID.txt;
3128 $0->f.type = $<Type;
3131 ###### forward decls
3132 static void structure_print_type(struct type *t, FILE *f);
3134 ###### value functions
3135 static void structure_print_type(struct type *t, FILE *f)
3139 fprintf(f, "struct %.*s\n", t->name.len, t->name.txt);
3141 for (i = 0; i < t->structure.nfields; i++) {
3142 struct field *fl = t->structure.fields + i;
3143 fprintf(f, " %.*s : ", fl->name.len, fl->name.txt);
3144 type_print(fl->type, f);
3145 if (fl->type->print && fl->init) {
3147 if (fl->type == Tstr)
3149 print_value(fl->type, fl->init, f);
3150 if (fl->type == Tstr)
3157 ###### print type decls
3162 while (target != 0) {
3164 for (t = context.typelist; t ; t=t->next)
3165 if (!t->anon && t->print_type_decl &&
3175 t->print_type_decl(t, stdout);
3183 References, or pointers, are values that refer to another value. They
3184 can only refer to a type that is named, which excludes arrays or other
3185 references. As these can be included in a struct which is named, it is
3186 still possible to reference an array or reference - though indirectly.
3188 References are potentially dangerous as they might refer to some
3189 variable which no longer exists - either because a stack frame
3190 containing it has been discarded or because the value was allocated on
3191 the heap and has now been free. Ocean does not yet provide any
3192 protection against these problems. It will in due course.
3194 With references comes the opportunity and the need to explicitly
3195 allocate values on the "heap" and to free them. We currently provide
3196 fairly basic support for this.
3198 Reference make use of the `@` symbol in various ways. A type that starts
3199 with `@` is a reference to whatever follows. A reference value
3200 followed by an `@` acts as the referred value, though the `@` is often
3201 not needed. Finally, an expression that starts with `@` is a special
3202 reference related expression. Some examples might help.
3204 ##### Example: Reference examples
3211 bar.number = 23; bar.string = "hello"
3222 Obviously this is very contrived. `ref` is a reference to a `foo` which
3223 is initially set to refer to the value stored in `bar` - no extra syntax
3224 is needed to "Take the address of" `bar` - the fact that `ref` is a
3225 reference means that only the address make sense.
3227 When `ref.a` is accessed, that is whatever value is stored in `bar.a`.
3228 The same syntax is used for accessing fields both in structs and in
3229 references to structs. It would be correct to use `ref@.a`, but not
3232 `@new()` creates an object of whatever type is needed for the program to
3233 by type-correct. In future iterations of Ocean, a constructor will
3234 access arguments, so the the syntax now looks like a function call.
3235 `@free` can be assigned any reference that was returned by `@new()`, and
3236 it will be freed. `@nil` is a value of whatever reference type is
3237 appropriate, and is stable and never the address of anything in the heap
3238 or on the stack. A reference can be assigned `nil` or compared against
3241 ###### declare terminals
3244 ###### type union fields
3247 struct type *referent;
3250 ###### value union fields
3253 ###### value functions
3255 static void reference_print_type(struct type *t, FILE *f)
3258 type_print(t->reference.referent, f);
3261 static int reference_cmp(struct type *tl, struct type *tr,
3262 struct value *left, struct value *right)
3264 return left->ref == right->ref ? 0 : 1;
3267 static void reference_dup(struct type *t,
3268 struct value *vold, struct value *vnew)
3270 vnew->ref = vold->ref;
3273 static void reference_free(struct type *t, struct value *v)
3275 /* Nothing to do here */
3278 static int reference_compat(struct type *require, struct type *have,
3279 enum val_rules rules)
3282 if (require->reference.referent == have)
3284 if (have->compat != require->compat)
3286 if (have->reference.referent != require->reference.referent)
3291 static int reference_test(struct type *type, struct value *val)
3293 return val->ref != NULL;
3296 static struct type *reference_fieldref(
3297 struct type *t, struct parse_context *c, struct fieldref *f,
3300 struct type *rt = t->reference.referent;
3305 return rt->fieldref(rt, c, f, vp);
3307 type_err(c, "error: field reference on %1 is not supported",
3308 f->left, rt, 0, NULL);
3312 static struct type reference_prototype = {
3313 .print_type = reference_print_type,
3314 .cmp_eq = reference_cmp,
3315 .dup = reference_dup,
3316 .test = reference_test,
3317 .free = reference_free,
3318 .compat = reference_compat,
3319 .fieldref = reference_fieldref,
3320 .size = sizeof(void*),
3321 .align = sizeof(void*),
3327 struct type *t = find_type(c, $ID.txt);
3329 t = add_type(c, $ID.txt, NULL);
3332 $0 = find_anon_type(c, &reference_prototype, "@%.*s",
3333 $ID.txt.len, $ID.txt.txt);
3334 $0->reference.referent = t;
3337 ###### core functions
3338 static int text_is(struct text t, char *s)
3340 return (strlen(s) == t.len &&
3341 strncmp(s, t.txt, t.len) == 0);
3350 enum ref_func { RefNew, RefFree, RefNil } action;
3351 struct type *reftype;
3355 ###### SimpleStatement Grammar
3357 | @ IDENTIFIER = Expression ${ {
3358 struct ref *r = new_pos(ref, $ID);
3360 if (!text_is($ID.txt, "free"))
3361 tok_err(c, "error: only \"@free\" makes sense here",
3365 r->action = RefFree;
3369 ###### expression grammar
3370 | @ IDENTIFIER ( ) ${
3371 // Only 'new' valid here
3372 if (!text_is($ID.txt, "new")) {
3373 tok_err(c, "error: Only reference function is \"@new()\"",
3376 struct ref *r = new_pos(ref,$ID);
3382 // Only 'nil' valid here
3383 if (!text_is($ID.txt, "nil")) {
3384 tok_err(c, "error: Only reference value is \"@nil\"",
3387 struct ref *r = new_pos(ref,$ID);
3393 ###### print exec cases
3395 struct ref *r = cast(ref, e);
3396 switch (r->action) {
3398 printf("@new()"); break;
3400 printf("@nil"); break;
3402 do_indent(indent, "@free = ");
3403 print_exec(r->right, indent, bracket);
3409 ###### propagate exec cases
3411 struct ref *r = cast(ref, prog);
3412 switch (r->action) {
3414 if (type && type->free != reference_free) {
3415 type_err(c, "error: @new() can only be used with references, not %1",
3416 prog, type, 0, NULL);
3419 if (type && !r->reftype) {
3426 if (type && type->free != reference_free)
3427 type_err(c, "error: @nil can only be used with reference, not %1",
3428 prog, type, 0, NULL);
3429 if (type && !r->reftype) {
3436 t = propagate_types(r->right, c, perr_local, NULL, 0);
3437 if (t && t->free != reference_free)
3438 type_err(c, "error: @free can only be assigned a reference, not %1",
3446 ###### interp exec cases
3448 struct ref *r = cast(ref, e);
3449 switch (r->action) {
3452 rv.ref = calloc(1, r->reftype->reference.referent->size);
3453 rvtype = r->reftype;
3457 rvtype = r->reftype;
3460 rv = interp_exec(c, r->right, &rvtype);
3461 free_value(rvtype->reference.referent, rv.ref);
3469 ###### free exec cases
3471 struct ref *r = cast(ref, e);
3472 free_exec(r->right);
3477 ###### Expressions: dereference
3485 struct binode *b = new(binode);
3491 ###### print binode cases
3493 print_exec(b->left, -1, bracket);
3497 print_exec(b->left, -1, bracket);
3500 ###### propagate binode cases
3502 /* left must be a reference, and we return what it refers to */
3503 /* FIXME how can I pass the expected type down? */
3504 t = propagate_types(b->left, c, perr, NULL, 0);
3506 if (!t || t->free != reference_free)
3507 type_err(c, "error: Cannot dereference %1", b, t, 0, NULL);
3509 return t->reference.referent;
3513 /* left must be lval, we create reference to it */
3514 if (!type || type->free != reference_free)
3515 t = propagate_types(b->left, c, perr, type, 0); // NOTEST impossible
3517 t = propagate_types(b->left, c, perr,
3518 type->reference.referent, 0);
3520 t = find_anon_type(c, &reference_prototype, "@%.*s",
3521 t->name.len, t->name.txt);
3524 ###### interp binode cases
3526 left = interp_exec(c, b->left, <ype);
3528 rvtype = ltype->reference.referent;
3532 rv.ref = linterp_exec(c, b->left, &rvtype);
3533 rvtype = find_anon_type(c, &reference_prototype, "@%.*s",
3534 rvtype->name.len, rvtype->name.txt);
3539 A function is a chunk of code which can be passed parameters and can
3540 return results. Each function has a type which includes the set of
3541 parameters and the return value. As yet these types cannot be declared
3542 separately from the function itself.
3544 The parameters can be specified either in parentheses as a ';' separated
3547 ##### Example: function 1
3549 func main(av:[]string; env:[]string)
3552 or as an indented list of one parameter per line (though each line can
3553 be a ';' separated list)
3555 ##### Example: function 2
3563 In the first case a return type can follow the parentheses after a colon,
3564 in the second it is given on a line starting with the word `return`.
3566 ##### Example: functions that return
3568 func add(a:number; b:number): number
3578 Rather than returning a type, the function can specify a set of local
3579 variables to return as a struct. The values of these variables when the
3580 function exits will be provided to the caller. For this the return type
3581 is replaced with a block of result declarations, either in parentheses
3582 or bracketed by `return` and `do`.
3584 ##### Example: functions returning multiple variables
3586 func to_cartesian(rho:number; theta:number):(x:number; y:number)
3599 For constructing the lists we use a `List` binode, which will be
3600 further detailed when Expression Lists are introduced.
3602 ###### type union fields
3605 struct binode *params;
3606 struct type *return_type;
3607 struct variable *scope;
3608 int inline_result; // return value is at start of 'local'
3612 ###### value union fields
3613 struct exec *function;
3615 ###### type functions
3616 void (*check_args)(struct parse_context *c, enum prop_err *perr,
3617 struct type *require, struct exec *args);
3619 ###### value functions
3621 static void function_free(struct type *type, struct value *val)
3623 free_exec(val->function);
3624 val->function = NULL;
3627 static int function_compat(struct type *require, struct type *have,
3628 enum val_rules rules)
3630 // FIXME can I do anything here yet?
3634 static struct exec *take_addr(struct exec *e)
3636 struct binode *rv = new(binode);
3642 static void function_check_args(struct parse_context *c, enum prop_err *perr,
3643 struct type *require, struct exec *args)
3645 /* This should be 'compat', but we don't have a 'tuple' type to
3646 * hold the type of 'args'
3648 struct binode *arg = cast(binode, args);
3649 struct binode *param = require->function.params;
3652 struct var *pv = cast(var, param->left);
3653 struct type *t = pv->var->type, *t2;
3655 type_err(c, "error: insufficient arguments to function.",
3656 args, NULL, 0, NULL);
3660 t2 = propagate_types(arg->left, c, perr, t, Rrefok);
3661 if (t->free == reference_free &&
3662 t->reference.referent == t2 &&
3664 arg->left = take_addr(arg->left);
3665 } else if (!(*perr & Efail) && !type_compat(t2, t, 0)) {
3666 type_err(c, "error: cannot pass rval when reference expected",
3667 arg->left, NULL, 0, NULL);
3669 param = cast(binode, param->right);
3670 arg = cast(binode, arg->right);
3673 type_err(c, "error: too many arguments to function.",
3674 args, NULL, 0, NULL);
3677 static void function_print(struct type *type, struct value *val, FILE *f)
3680 print_exec(val->function, 1, 0);
3683 static void function_print_type_decl(struct type *type, FILE *f)
3687 for (b = type->function.params; b; b = cast(binode, b->right)) {
3688 struct variable *v = cast(var, b->left)->var;
3689 fprintf(f, "%.*s%s", v->name->name.len, v->name->name.txt,
3690 v->constant ? "::" : ":");
3691 type_print(v->type, f);
3696 if (type->function.return_type != Tnone) {
3698 if (type->function.inline_result) {
3700 struct type *t = type->function.return_type;
3702 for (i = 0; i < t->structure.nfields; i++) {
3703 struct field *fl = t->structure.fields + i;
3706 fprintf(f, "%.*s:", fl->name.len, fl->name.txt);
3707 type_print(fl->type, f);
3711 type_print(type->function.return_type, f);
3715 static void function_free_type(struct type *t)
3717 free_exec(t->function.params);
3720 static struct type function_prototype = {
3721 .size = sizeof(void*),
3722 .align = sizeof(void*),
3723 .free = function_free,
3724 .compat = function_compat,
3725 .check_args = function_check_args,
3726 .print = function_print,
3727 .print_type_decl = function_print_type_decl,
3728 .free_type = function_free_type,
3731 ###### declare terminals
3738 FuncName -> IDENTIFIER ${ {
3739 struct variable *v = var_decl(c, $1.txt);
3740 struct var *e = new_pos(var, $1);
3747 v = var_ref(c, $1.txt);
3749 type_err(c, "error: function '%v' redeclared",
3751 type_err(c, "info: this is where '%v' was first declared",
3752 v->where_decl, NULL, 0, NULL);
3758 Args -> ArgsLine NEWLINE ${ $0 = $<AL; }$
3759 | Args ArgsLine NEWLINE ${ {
3760 struct binode *b = $<AL;
3761 struct binode **bp = &b;
3763 bp = (struct binode **)&(*bp)->left;
3768 ArgsLine -> ${ $0 = NULL; }$
3769 | Varlist ${ $0 = $<1; }$
3770 | Varlist ; ${ $0 = $<1; }$
3772 Varlist -> Varlist ; ArgDecl ${
3773 $0 = new_pos(binode, $2);
3786 ArgDecl -> IDENTIFIER : FormalType ${ {
3787 struct variable *v = var_decl(c, $ID.txt);
3788 $0 = new_pos(var, $ID);
3795 ##### Function calls
3797 A function call can appear either as an expression or as a statement.
3798 We use a new 'Funcall' binode type to link the function with a list of
3799 arguments, form with the 'List' nodes.
3801 We have already seen the "Term" which is how a function call can appear
3802 in an expression. To parse a function call into a statement we include
3803 it in the "SimpleStatement Grammar" which will be described later.
3809 | Term ( ExpressionList ) ${ {
3810 struct binode *b = new(binode);
3813 b->right = reorder_bilist($<EL);
3817 struct binode *b = new(binode);
3824 ###### SimpleStatement Grammar
3826 | Term ( ExpressionList ) ${ {
3827 struct binode *b = new(binode);
3830 b->right = reorder_bilist($<EL);
3834 ###### print binode cases
3837 do_indent(indent, "");
3838 print_exec(b->left, -1, bracket);
3840 for (b = cast(binode, b->right); b; b = cast(binode, b->right)) {
3843 print_exec(b->left, -1, bracket);
3853 ###### propagate binode cases
3856 /* Every arg must match formal parameter, and result
3857 * is return type of function
3859 struct binode *args = cast(binode, b->right);
3860 struct var *v = cast(var, b->left);
3862 if (!v->var->type || v->var->type->check_args == NULL) {
3863 type_err(c, "error: attempt to call a non-function.",
3864 prog, NULL, 0, NULL);
3868 v->var->type->check_args(c, perr_local, v->var->type, args);
3869 if (v->var->type->function.inline_result)
3872 return v->var->type->function.return_type;
3875 ###### interp binode cases
3878 struct var *v = cast(var, b->left);
3879 struct type *t = v->var->type;
3880 void *oldlocal = c->local;
3881 int old_size = c->local_size;
3882 void *local = calloc(1, t->function.local_size);
3883 struct value *fbody = var_value(c, v->var);
3884 struct binode *arg = cast(binode, b->right);
3885 struct binode *param = t->function.params;
3888 struct var *pv = cast(var, param->left);
3889 struct type *vtype = NULL;
3890 struct value val = interp_exec(c, arg->left, &vtype);
3892 c->local = local; c->local_size = t->function.local_size;
3893 lval = var_value(c, pv->var);
3894 c->local = oldlocal; c->local_size = old_size;
3895 memcpy(lval, &val, vtype->size);
3896 param = cast(binode, param->right);
3897 arg = cast(binode, arg->right);
3899 c->local = local; c->local_size = t->function.local_size;
3900 if (t->function.inline_result && dtype) {
3901 _interp_exec(c, fbody->function, NULL, NULL);
3902 memcpy(dest, local, dtype->size);
3903 rvtype = ret.type = NULL;
3905 rv = interp_exec(c, fbody->function, &rvtype);
3906 c->local = oldlocal; c->local_size = old_size;
3911 ## Complex executables: statements and expressions
3913 Now that we have types, values, variables, and most of the basic
3914 Terms which provide access to these, we can explore the more complex
3915 code that combine all of these to get useful work done. Specifically
3916 statements and expressions.
3918 Expressions are various combinations of Terms. We will use operator
3919 precedence to ensure correct parsing. The simplest Expression is just a
3920 Term - others will follow.
3925 Expression -> Term ${ $0 = $<Term; }$
3926 ## expression grammar
3928 ### Expressions: Conditional
3930 Our first user of the `binode` will be conditional expressions, which
3931 is a bit odd as they actually have three components. That will be
3932 handled by having 2 binodes for each expression. The conditional
3933 expression is the lowest precedence operator which is why we define it
3934 first - to start the precedence list.
3936 Conditional expressions are of the form "value `if` condition `else`
3937 other_value". They associate to the right, so everything to the right
3938 of `else` is part of an else value, while only a higher-precedence to
3939 the left of `if` is the if value. Between `if` and `else` there is no
3940 room for ambiguity, so a full conditional expression is allowed in
3946 ###### declare terminals
3950 ###### expression grammar
3952 | Expression if Expression else Expression $$ifelse ${ {
3953 struct binode *b1 = new(binode);
3954 struct binode *b2 = new(binode);
3964 ###### print binode cases
3967 b2 = cast(binode, b->right);
3968 if (bracket) printf("(");
3969 print_exec(b2->left, -1, bracket);
3971 print_exec(b->left, -1, bracket);
3973 print_exec(b2->right, -1, bracket);
3974 if (bracket) printf(")");
3977 ###### propagate binode cases
3980 /* cond must be Tbool, others must match */
3981 struct binode *b2 = cast(binode, b->right);
3984 propagate_types(b->left, c, perr_local, Tbool, 0);
3985 t = propagate_types(b2->left, c, perr, type, 0);
3986 t2 = propagate_types(b2->right, c, perr, type ?: t, 0);
3990 ###### interp binode cases
3993 struct binode *b2 = cast(binode, b->right);
3994 left = interp_exec(c, b->left, <ype);
3996 rv = interp_exec(c, b2->left, &rvtype);
3998 rv = interp_exec(c, b2->right, &rvtype);
4002 ### Expressions: Boolean
4004 The next class of expressions to use the `binode` will be Boolean
4005 expressions. `and` and `or` are short-circuit operators that don't
4006 evaluate the second expression if not necessary.
4013 ###### declare terminals
4018 ###### expression grammar
4019 | Expression or Expression ${ {
4020 struct binode *b = new(binode);
4026 | Expression and Expression ${ {
4027 struct binode *b = new(binode);
4033 | not Expression ${ {
4034 struct binode *b = new(binode);
4040 ###### print binode cases
4042 if (bracket) printf("(");
4043 print_exec(b->left, -1, bracket);
4045 print_exec(b->right, -1, bracket);
4046 if (bracket) printf(")");
4049 if (bracket) printf("(");
4050 print_exec(b->left, -1, bracket);
4052 print_exec(b->right, -1, bracket);
4053 if (bracket) printf(")");
4056 if (bracket) printf("(");
4058 print_exec(b->right, -1, bracket);
4059 if (bracket) printf(")");
4062 ###### propagate binode cases
4066 /* both must be Tbool, result is Tbool */
4067 propagate_types(b->left, c, perr, Tbool, 0);
4068 propagate_types(b->right, c, perr, Tbool, 0);
4069 if (type && type != Tbool)
4070 type_err(c, "error: %1 operation found where %2 expected", prog,
4075 ###### interp binode cases
4077 rv = interp_exec(c, b->left, &rvtype);
4079 rv = interp_exec(c, b->right, NULL);
4082 rv = interp_exec(c, b->left, &rvtype);
4084 rv = interp_exec(c, b->right, NULL);
4087 rv = interp_exec(c, b->right, &rvtype);
4091 ### Expressions: Comparison
4093 Of slightly higher precedence that Boolean expressions are Comparisons.
4094 A comparison takes arguments of any comparable type, but the two types
4097 To simplify the parsing we introduce an `eop` which can record an
4098 expression operator, and the `CMPop` non-terminal will match one of them.
4105 ###### ast functions
4106 static void free_eop(struct eop *e)
4120 ###### declare terminals
4121 $LEFT < > <= >= == != CMPop
4123 ###### expression grammar
4124 | Expression CMPop Expression ${ {
4125 struct binode *b = new(binode);
4135 CMPop -> < ${ $0.op = Less; }$
4136 | > ${ $0.op = Gtr; }$
4137 | <= ${ $0.op = LessEq; }$
4138 | >= ${ $0.op = GtrEq; }$
4139 | == ${ $0.op = Eql; }$
4140 | != ${ $0.op = NEql; }$
4142 ###### print binode cases
4150 if (bracket) printf("(");
4151 print_exec(b->left, -1, bracket);
4153 case Less: printf(" < "); break;
4154 case LessEq: printf(" <= "); break;
4155 case Gtr: printf(" > "); break;
4156 case GtrEq: printf(" >= "); break;
4157 case Eql: printf(" == "); break;
4158 case NEql: printf(" != "); break;
4159 default: abort(); // NOTEST
4161 print_exec(b->right, -1, bracket);
4162 if (bracket) printf(")");
4165 ###### propagate binode cases
4172 /* Both must match but not be labels, result is Tbool */
4173 t = propagate_types(b->left, c, perr, NULL, 0);
4175 propagate_types(b->right, c, perr, t, 0);
4177 t = propagate_types(b->right, c, perr, NULL, 0); // NOTEST
4179 t = propagate_types(b->left, c, perr, t, 0); // NOTEST
4181 if (!type_compat(type, Tbool, 0))
4182 type_err(c, "error: Comparison returns %1 but %2 expected", prog,
4183 Tbool, rules, type);
4187 ###### interp binode cases
4196 left = interp_exec(c, b->left, <ype);
4197 right = interp_exec(c, b->right, &rtype);
4198 cmp = value_cmp(ltype, rtype, &left, &right);
4201 case Less: rv.bool = cmp < 0; break;
4202 case LessEq: rv.bool = cmp <= 0; break;
4203 case Gtr: rv.bool = cmp > 0; break;
4204 case GtrEq: rv.bool = cmp >= 0; break;
4205 case Eql: rv.bool = cmp == 0; break;
4206 case NEql: rv.bool = cmp != 0; break;
4207 default: rv.bool = 0; break; // NOTEST
4212 ### Expressions: Arithmetic etc.
4214 The remaining expressions with the highest precedence are arithmetic,
4215 string concatenation, string conversion, and testing. String concatenation
4216 (`++`) has the same precedence as multiplication and division, but lower
4219 Testing comes in two forms. A single question mark (`?`) is a unary
4220 operator which converts come types into Boolean. The general meaning is
4221 "is this a valid value" and there will be more uses as the language
4222 develops. A double question-mark (`??`) is a binary operator (Choose),
4223 with the same precedence as multiplication, which returns the LHS if it
4224 tests successfully, else returns the RHS.
4226 String conversion is a temporary feature until I get a better type
4227 system. `$` is a prefix operator which expects a string and returns
4230 `+` and `-` are both infix and prefix operations (where they are
4231 absolute value and negation). These have different operator names.
4233 We also have a 'Bracket' operator which records where parentheses were
4234 found. This makes it easy to reproduce these when printing. Possibly I
4235 should only insert brackets were needed for precedence. Putting
4236 parentheses around an expression converts it into a Term,
4242 Absolute, Negate, Test,
4246 ###### declare terminals
4248 $LEFT * / % ++ ?? Top
4252 ###### expression grammar
4253 | Expression Eop Expression ${ {
4254 struct binode *b = new(binode);
4261 | Expression Top Expression ${ {
4262 struct binode *b = new(binode);
4269 | Uop Expression ${ {
4270 struct binode *b = new(binode);
4278 | ( Expression ) ${ {
4279 struct binode *b = new_pos(binode, $1);
4288 Eop -> + ${ $0.op = Plus; }$
4289 | - ${ $0.op = Minus; }$
4291 Uop -> + ${ $0.op = Absolute; }$
4292 | - ${ $0.op = Negate; }$
4293 | $ ${ $0.op = StringConv; }$
4294 | ? ${ $0.op = Test; }$
4296 Top -> * ${ $0.op = Times; }$
4297 | / ${ $0.op = Divide; }$
4298 | % ${ $0.op = Rem; }$
4299 | ++ ${ $0.op = Concat; }$
4300 | ?? ${ $0.op = Choose; }$
4302 ###### print binode cases
4310 if (bracket) printf("(");
4311 print_exec(b->left, indent, bracket);
4313 case Plus: fputs(" + ", stdout); break;
4314 case Minus: fputs(" - ", stdout); break;
4315 case Times: fputs(" * ", stdout); break;
4316 case Divide: fputs(" / ", stdout); break;
4317 case Rem: fputs(" % ", stdout); break;
4318 case Concat: fputs(" ++ ", stdout); break;
4319 case Choose: fputs(" ?? ", stdout); break;
4320 default: abort(); // NOTEST
4322 print_exec(b->right, indent, bracket);
4323 if (bracket) printf(")");
4329 if (bracket) printf("(");
4331 case Absolute: fputs("+", stdout); break;
4332 case Negate: fputs("-", stdout); break;
4333 case StringConv: fputs("$", stdout); break;
4334 case Test: fputs("?", stdout); break;
4335 default: abort(); // NOTEST
4337 print_exec(b->right, indent, bracket);
4338 if (bracket) printf(")");
4341 /* Avoid double brackets... */
4342 if (!bracket) printf("(");
4343 print_exec(b->right, indent, bracket);
4344 if (!bracket) printf(")");
4347 ###### propagate binode cases
4353 /* both must be numbers, result is Tnum */
4356 /* as propagate_types ignores a NULL,
4357 * unary ops fit here too */
4358 propagate_types(b->left, c, perr, Tnum, 0);
4359 propagate_types(b->right, c, perr, Tnum, 0);
4360 if (!type_compat(type, Tnum, 0))
4361 type_err(c, "error: Arithmetic returns %1 but %2 expected", prog,
4367 /* both must be Tstr, result is Tstr */
4368 propagate_types(b->left, c, perr, Tstr, 0);
4369 propagate_types(b->right, c, perr, Tstr, 0);
4370 if (!type_compat(type, Tstr, 0))
4371 type_err(c, "error: Concat returns %1 but %2 expected", prog,
4377 /* op must be string, result is number */
4378 propagate_types(b->left, c, perr, Tstr, 0);
4379 if (!type_compat(type, Tnum, 0))
4381 "error: Can only convert string to number, not %1",
4382 prog, type, 0, NULL);
4387 /* LHS must support ->test, result is Tbool */
4388 t = propagate_types(b->right, c, perr, NULL, 0);
4390 type_err(c, "error: '?' requires a testable value, not %1",
4396 /* LHS and RHS must match and are returned. Must support
4399 t = propagate_types(b->left, c, perr, type, rules);
4400 t = propagate_types(b->right, c, perr, t, rules);
4401 if (t && t->test == NULL)
4402 type_err(c, "error: \"??\" requires a testable value, not %1",
4408 return propagate_types(b->right, c, perr, type, rules);
4410 ###### interp binode cases
4413 rv = interp_exec(c, b->left, &rvtype);
4414 right = interp_exec(c, b->right, &rtype);
4415 mpq_add(rv.num, rv.num, right.num);
4418 rv = interp_exec(c, b->left, &rvtype);
4419 right = interp_exec(c, b->right, &rtype);
4420 mpq_sub(rv.num, rv.num, right.num);
4423 rv = interp_exec(c, b->left, &rvtype);
4424 right = interp_exec(c, b->right, &rtype);
4425 mpq_mul(rv.num, rv.num, right.num);
4428 rv = interp_exec(c, b->left, &rvtype);
4429 right = interp_exec(c, b->right, &rtype);
4430 mpq_div(rv.num, rv.num, right.num);
4435 left = interp_exec(c, b->left, <ype);
4436 right = interp_exec(c, b->right, &rtype);
4437 mpz_init(l); mpz_init(r); mpz_init(rem);
4438 mpz_tdiv_q(l, mpq_numref(left.num), mpq_denref(left.num));
4439 mpz_tdiv_q(r, mpq_numref(right.num), mpq_denref(right.num));
4440 mpz_tdiv_r(rem, l, r);
4441 val_init(Tnum, &rv);
4442 mpq_set_z(rv.num, rem);
4443 mpz_clear(r); mpz_clear(l); mpz_clear(rem);
4448 rv = interp_exec(c, b->right, &rvtype);
4449 mpq_neg(rv.num, rv.num);
4452 rv = interp_exec(c, b->right, &rvtype);
4453 mpq_abs(rv.num, rv.num);
4456 rv = interp_exec(c, b->right, &rvtype);
4459 left = interp_exec(c, b->left, <ype);
4460 right = interp_exec(c, b->right, &rtype);
4462 rv.str = text_join(left.str, right.str);
4465 right = interp_exec(c, b->right, &rvtype);
4469 struct text tx = right.str;
4472 if (tx.txt[0] == '-') {
4477 if (number_parse(rv.num, tail, tx) == 0)
4480 mpq_neg(rv.num, rv.num);
4482 printf("Unsupported suffix: %.*s\n", tx.len, tx.txt);
4486 right = interp_exec(c, b->right, &rtype);
4488 rv.bool = !!rtype->test(rtype, &right);
4491 left = interp_exec(c, b->left, <ype);
4492 if (ltype->test(ltype, &left)) {
4497 rv = interp_exec(c, b->right, &rvtype);
4500 ###### value functions
4502 static struct text text_join(struct text a, struct text b)
4505 rv.len = a.len + b.len;
4506 rv.txt = malloc(rv.len);
4507 memcpy(rv.txt, a.txt, a.len);
4508 memcpy(rv.txt+a.len, b.txt, b.len);
4512 ### Blocks, Statements, and Statement lists.
4514 Now that we have expressions out of the way we need to turn to
4515 statements. There are simple statements and more complex statements.
4516 Simple statements do not contain (syntactic) newlines, complex statements do.
4518 Statements often come in sequences and we have corresponding simple
4519 statement lists and complex statement lists.
4520 The former comprise only simple statements separated by semicolons.
4521 The later comprise complex statements and simple statement lists. They are
4522 separated by newlines. Thus the semicolon is only used to separate
4523 simple statements on the one line. This may be overly restrictive,
4524 but I'm not sure I ever want a complex statement to share a line with
4527 Note that a simple statement list can still use multiple lines if
4528 subsequent lines are indented, so
4530 ###### Example: wrapped simple statement list
4535 is a single simple statement list. This might allow room for
4536 confusion, so I'm not set on it yet.
4538 A simple statement list needs no extra syntax. A complex statement
4539 list has two syntactic forms. It can be enclosed in braces (much like
4540 C blocks), or it can be introduced by an indent and continue until an
4541 unindented newline (much like Python blocks). With this extra syntax
4542 it is referred to as a block.
4544 Note that a block does not have to include any newlines if it only
4545 contains simple statements. So both of:
4547 if condition: a=b; d=f
4549 if condition { a=b; print f }
4551 are valid. In either case the list is constructed from a `binode` list
4552 with `Block` as the operator.
4554 The only stand-alone statement we introduce at this stage is `pass`
4555 which does nothing and is represented as a `NULL` pointer in a `Block`
4556 list. Other stand-alone statements will follow once the infrastructure
4559 As many statements will use binodes, we declare a binode pointer 'b' in
4560 the common header for all reductions to use.
4562 ###### Parser: reduce
4573 Block -> { IN OptNL Statementlist OUT OptNL } ${ $0 = $<Sl; }$
4574 | { SimpleStatements } ${ $0 = reorder_bilist($<SS); }$
4575 | SimpleStatements ; ${ $0 = reorder_bilist($<SS); }$
4576 | SimpleStatements EOL ${ $0 = reorder_bilist($<SS); }$
4577 | IN OptNL Statementlist OUT ${ $0 = $<Sl; }$
4579 OpenBlock -> OpenScope { IN OptNL Statementlist OUT OptNL } ${ $0 = $<Sl; }$
4580 | OpenScope { SimpleStatements } ${ $0 = reorder_bilist($<SS); }$
4581 | OpenScope SimpleStatements ; ${ $0 = reorder_bilist($<SS); }$
4582 | OpenScope SimpleStatements EOL ${ $0 = reorder_bilist($<SS); }$
4583 | IN OpenScope OptNL Statementlist OUT ${ $0 = $<Sl; }$
4585 UseBlock -> { IN OpenScope OptNL Statementlist OUT OptNL } ${ $0 = $<Sl; }$
4586 | { OpenScope SimpleStatements } ${ $0 = reorder_bilist($<SS); }$
4587 | IN OpenScope OptNL Statementlist OUT ${ $0 = $<Sl; }$
4589 ColonBlock -> { IN OptNL Statementlist OUT OptNL } ${ $0 = $<Sl; }$
4590 | { SimpleStatements } ${ $0 = reorder_bilist($<SS); }$
4591 | : SimpleStatements ; ${ $0 = reorder_bilist($<SS); }$
4592 | : SimpleStatements EOL ${ $0 = reorder_bilist($<SS); }$
4593 | : IN OptNL Statementlist OUT ${ $0 = $<Sl; }$
4595 Statementlist -> ComplexStatements ${ $0 = reorder_bilist($<CS); }$
4597 ComplexStatements -> ComplexStatements ComplexStatement ${
4599 $0 = $<1; // NOTEST - impossible
4607 | ComplexStatement ${
4609 $0 = NULL; // NOTEST - impossible
4619 ComplexStatement -> SimpleStatements Newlines ${
4620 $0 = reorder_bilist($<SS);
4622 | SimpleStatements ; Newlines ${
4623 $0 = reorder_bilist($<SS);
4625 ## ComplexStatement Grammar
4628 SimpleStatements -> SimpleStatements ; SimpleStatement ${
4634 | SimpleStatement ${
4643 SimpleStatement -> pass ${ $0 = NULL; }$
4644 | ERROR ${ tok_err(c, "Syntax error in statement", &$1); }$
4645 ## SimpleStatement Grammar
4647 ###### print binode cases
4649 // block, one per line
4650 if (b->left == NULL)
4651 do_indent(indent, "pass\n");
4653 print_exec(b->left, indent, bracket);
4655 print_exec(b->right, indent, bracket);
4658 ###### propagate binode cases
4661 /* If any statement returns something other than Tnone
4662 * or Tbool then all such must return same type.
4663 * As each statement may be Tnone or something else,
4664 * we must always pass NULL (unknown) down, otherwise an incorrect
4665 * error might occur. We never return Tnone unless it is
4670 for (e = b; e; e = cast(binode, e->right)) {
4671 *perr |= *perr_local;
4673 t = propagate_types(e->left, c, perr_local, NULL, rules);
4674 if ((rules & Rboolok) && (t == Tbool || t == Tnone))
4676 if (t == Tnone && e->right)
4677 /* Only the final statement *must* return a value
4685 type_err(c, "error: expected %1, found %2",
4686 e->left, type, rules, t);
4692 ###### interp binode cases
4694 while (rvtype == Tnone &&
4697 rv = interp_exec(c, b->left, &rvtype);
4698 b = cast(binode, b->right);
4702 ### The Print statement
4704 `print` is a simple statement that takes a comma-separated list of
4705 expressions and prints the values separated by spaces and terminated
4706 by a newline. No control of formatting is possible.
4708 `print` uses `ExpressionList` to collect the expressions and stores them
4709 on the left side of a `Print` binode unless there is a trailing comma
4710 when the list is stored on the `right` side and no trailing newline is
4716 ##### declare terminals
4719 ###### SimpleStatement Grammar
4721 | print ExpressionList ${
4722 $0 = b = new_pos(binode, $1);
4725 b->left = reorder_bilist($<EL);
4727 | print ExpressionList , ${ {
4728 $0 = b = new_pos(binode, $1);
4730 b->right = reorder_bilist($<EL);
4734 $0 = b = new_pos(binode, $1);
4740 ###### print binode cases
4743 do_indent(indent, "print");
4744 b2 = cast(binode, b->left ?: b->right);
4747 print_exec(b2->left, -1, bracket);
4750 b2 = cast(binode, b2->right);
4758 ###### propagate binode cases
4761 /* don't care but all must be consistent */
4763 b = cast(binode, b->left);
4765 b = cast(binode, b->right);
4767 propagate_types(b->left, c, perr_local, NULL, 0);
4768 b = cast(binode, b->right);
4772 ###### interp binode cases
4776 struct binode *b2 = cast(binode, b->left);
4778 b2 = cast(binode, b->right);
4779 for (; b2; b2 = cast(binode, b2->right)) {
4780 left = interp_exec(c, b2->left, <ype);
4781 print_value(ltype, &left, stdout);
4782 free_value(ltype, &left);
4786 if (b->right == NULL)
4792 ###### Assignment statement
4794 An assignment will assign a value to a variable, providing it hasn't
4795 been declared as a constant. The analysis phase ensures that the type
4796 will be correct so the interpreter just needs to perform the
4797 calculation. There is a form of assignment which declares a new
4798 variable as well as assigning a value. If a name is used before
4799 it is declared, it is assumed to be a global constant which are allowed to
4800 be declared at any time.
4806 ###### declare terminals
4809 ###### SimpleStatement Grammar
4810 | Term = Expression ${
4811 $0 = b= new(binode);
4816 | VariableDecl = Expression ${
4817 $0 = b= new(binode);
4824 if ($1->var->where_set == NULL) {
4826 "Variable declared with no type or value: %v",
4830 $0 = b = new(binode);
4837 ###### print binode cases
4840 do_indent(indent, "");
4841 print_exec(b->left, -1, bracket);
4843 print_exec(b->right, -1, bracket);
4850 struct variable *v = cast(var, b->left)->var;
4851 do_indent(indent, "");
4852 print_exec(b->left, -1, bracket);
4853 if (cast(var, b->left)->var->constant) {
4855 if (v->explicit_type) {
4856 type_print(v->type, stdout);
4861 if (v->explicit_type) {
4862 type_print(v->type, stdout);
4868 print_exec(b->right, -1, bracket);
4875 ###### propagate binode cases
4879 /* Both must match, or left may be ref and right an lval
4880 * Type must support 'dup',
4881 * For Assign, left must not be constant.
4884 *perr &= ~(Erval | Econst);
4885 t = propagate_types(b->left, c, perr, NULL, 0);
4890 struct type *t2 = propagate_types(b->right, c, perr_local,
4892 if (!t2 || t2 == t || (*perr_local & Efail))
4893 ; // No more effort needed
4894 else if (t->free == reference_free &&
4895 t->reference.referent == t2 &&
4896 !(*perr_local & Erval))
4897 b->right = take_addr(b->right);
4898 else if (t->free == reference_free &&
4899 t->reference.referent == t2 &&
4900 (*perr_local & Erval))
4901 type_err(c, "error: Cannot assign an rval to a reference.",
4904 t = propagate_types(b->right, c, perr_local, NULL, 0);
4906 propagate_types(b->left, c, perr, t, 0);
4909 type_err(c, "error: cannot assign to an rval", b,
4911 else if (b->op == Assign && (*perr & Econst)) {
4912 type_err(c, "error: Cannot assign to a constant: %v",
4913 b->left, NULL, 0, NULL);
4914 if (b->left->type == Xvar) {
4915 struct var *var = cast(var, b->left);
4916 struct variable *v = var->var;
4917 type_err(c, "info: name was defined as a constant here",
4918 v->where_decl, NULL, 0, NULL);
4921 if (t && t->dup == NULL && !(*perr_local & Emaycopy))
4922 type_err(c, "error: cannot assign value of type %1", b, t, 0, NULL);
4923 if (b->left->type == Xvar && (*perr_local & Efail))
4924 type_err(c, "info: variable '%v' was set as %1 here.",
4925 cast(var, b->left)->var->where_set, t, rules, NULL);
4930 ###### interp binode cases
4933 lleft = linterp_exec(c, b->left, <ype);
4935 dinterp_exec(c, b->right, lleft, ltype, 1);
4941 struct variable *v = cast(var, b->left)->var;
4944 val = var_value(c, v);
4945 if (v->type->prepare_type)
4946 v->type->prepare_type(c, v->type, 0);
4948 val_init(v->type, val);
4950 dinterp_exec(c, b->right, val, v->type, 0);
4954 ### The `use` statement
4956 The `use` statement is the last "simple" statement. It is needed when a
4957 statement block can return a value. This includes the body of a
4958 function which has a return type, and the "condition" code blocks in
4959 `if`, `while`, and `switch` statements.
4964 ###### declare terminals
4967 ###### SimpleStatement Grammar
4969 $0 = b = new_pos(binode, $1);
4974 ###### print binode cases
4977 do_indent(indent, "use ");
4978 print_exec(b->right, -1, bracket);
4983 ###### propagate binode cases
4986 /* result matches value */
4987 return propagate_types(b->right, c, perr, type, 0);
4989 ###### interp binode cases
4992 rv = interp_exec(c, b->right, &rvtype);
4995 ### The Conditional Statement
4997 This is the biggy and currently the only complex statement. This
4998 subsumes `if`, `while`, `do/while`, `switch`, and some parts of `for`.
4999 It is comprised of a number of parts, all of which are optional though
5000 set combinations apply. Each part is (usually) a key word (`then` is
5001 sometimes optional) followed by either an expression or a code block,
5002 except the `casepart` which is a "key word and an expression" followed
5003 by a code block. The code-block option is valid for all parts and,
5004 where an expression is also allowed, the code block can use the `use`
5005 statement to report a value. If the code block does not report a value
5006 the effect is similar to reporting `True`.
5008 The `else` and `case` parts, as well as `then` when combined with
5009 `if`, can contain a `use` statement which will apply to some
5010 containing conditional statement. `for` parts, `do` parts and `then`
5011 parts used with `for` can never contain a `use`, except in some
5012 subordinate conditional statement.
5014 If there is a `forpart`, it is executed first, only once.
5015 If there is a `dopart`, then it is executed repeatedly providing
5016 always that the `condpart` or `cond`, if present, does not return a non-True
5017 value. `condpart` can fail to return any value if it simply executes
5018 to completion. This is treated the same as returning `True`.
5020 If there is a `thenpart` it will be executed whenever the `condpart`
5021 or `cond` returns True (or does not return any value), but this will happen
5022 *after* `dopart` (when present).
5024 If `elsepart` is present it will be executed at most once when the
5025 condition returns `False` or some value that isn't `True` and isn't
5026 matched by any `casepart`. If there are any `casepart`s, they will be
5027 executed when the condition returns a matching value.
5029 The particular sorts of values allowed in case parts has not yet been
5030 determined in the language design, so nothing is prohibited.
5032 The various blocks in this complex statement potentially provide scope
5033 for variables as described earlier. Each such block must include the
5034 "OpenScope" nonterminal before parsing the block, and must call
5035 `var_block_close()` when closing the block.
5037 The code following "`if`", "`switch`" and "`for`" does not get its own
5038 scope, but is in a scope covering the whole statement, so names
5039 declared there cannot be redeclared elsewhere. Similarly the
5040 condition following "`while`" is in a scope the covers the body
5041 ("`do`" part) of the loop, and which does not allow conditional scope
5042 extension. Code following "`then`" (both looping and non-looping),
5043 "`else`" and "`case`" each get their own local scope.
5045 The type requirements on the code block in a `whilepart` are quite
5046 unusal. It is allowed to return a value of some identifiable type, in
5047 which case the loop aborts and an appropriate `casepart` is run, or it
5048 can return a Boolean, in which case the loop either continues to the
5049 `dopart` (on `True`) or aborts and runs the `elsepart` (on `False`).
5050 This is different both from the `ifpart` code block which is expected to
5051 return a Boolean, or the `switchpart` code block which is expected to
5052 return the same type as the casepart values. The correct analysis of
5053 the type of the `whilepart` code block is the reason for the
5054 `Rboolok` flag which is passed to `propagate_types()`.
5056 The `cond_statement` cannot fit into a `binode` so a new `exec` is
5057 defined. As there are two scopes which cover multiple parts - one for
5058 the whole statement and one for "while" and "do" - and as we will use
5059 the 'struct exec' to track scopes, we actually need two new types of
5060 exec. One is a `binode` for the looping part, the rest is the
5061 `cond_statement`. The `cond_statement` will use an auxilliary `struct
5062 casepart` to track a list of case parts.
5073 struct exec *action;
5074 struct casepart *next;
5076 struct cond_statement {
5078 struct exec *forpart, *condpart, *thenpart, *elsepart;
5079 struct binode *looppart;
5080 struct casepart *casepart;
5083 ###### ast functions
5085 static void free_casepart(struct casepart *cp)
5089 free_exec(cp->value);
5090 free_exec(cp->action);
5097 static void free_cond_statement(struct cond_statement *s)
5101 free_exec(s->forpart);
5102 free_exec(s->condpart);
5103 free_exec(s->looppart);
5104 free_exec(s->thenpart);
5105 free_exec(s->elsepart);
5106 free_casepart(s->casepart);
5110 ###### free exec cases
5111 case Xcond_statement: free_cond_statement(cast(cond_statement, e)); break;
5113 ###### ComplexStatement Grammar
5114 | CondStatement ${ $0 = $<1; }$
5116 ###### declare terminals
5117 $TERM for then while do
5124 // A CondStatement must end with EOL, as does CondSuffix and
5126 // ForPart, ThenPart, SwitchPart, CasePart are non-empty and
5127 // may or may not end with EOL
5128 // WhilePart and IfPart include an appropriate Suffix
5130 // ForPart, SwitchPart, and IfPart open scopes, o we have to close
5131 // them. WhilePart opens and closes its own scope.
5132 CondStatement -> ForPart OptNL ThenPart OptNL WhilePart CondSuffix ${
5135 $0->thenpart = $<TP;
5136 $0->looppart = $<WP;
5137 var_block_close(c, CloseSequential, $0);
5139 | ForPart OptNL WhilePart CondSuffix ${
5142 $0->looppart = $<WP;
5143 var_block_close(c, CloseSequential, $0);
5145 | WhilePart CondSuffix ${
5147 $0->looppart = $<WP;
5149 | SwitchPart OptNL CasePart CondSuffix ${
5151 $0->condpart = $<SP;
5152 $CP->next = $0->casepart;
5153 $0->casepart = $<CP;
5154 var_block_close(c, CloseSequential, $0);
5156 | SwitchPart : IN OptNL CasePart CondSuffix OUT Newlines ${
5158 $0->condpart = $<SP;
5159 $CP->next = $0->casepart;
5160 $0->casepart = $<CP;
5161 var_block_close(c, CloseSequential, $0);
5163 | IfPart IfSuffix ${
5165 $0->condpart = $IP.condpart; $IP.condpart = NULL;
5166 $0->thenpart = $IP.thenpart; $IP.thenpart = NULL;
5167 // This is where we close an "if" statement
5168 var_block_close(c, CloseSequential, $0);
5171 CondSuffix -> IfSuffix ${
5174 | Newlines CasePart CondSuffix ${
5176 $CP->next = $0->casepart;
5177 $0->casepart = $<CP;
5179 | CasePart CondSuffix ${
5181 $CP->next = $0->casepart;
5182 $0->casepart = $<CP;
5185 IfSuffix -> Newlines ${ $0 = new(cond_statement); }$
5186 | Newlines ElsePart ${ $0 = $<EP; }$
5187 | ElsePart ${$0 = $<EP; }$
5189 ElsePart -> else OpenBlock Newlines ${
5190 $0 = new(cond_statement);
5191 $0->elsepart = $<OB;
5192 var_block_close(c, CloseElse, $0->elsepart);
5194 | else OpenScope CondStatement ${
5195 $0 = new(cond_statement);
5196 $0->elsepart = $<CS;
5197 var_block_close(c, CloseElse, $0->elsepart);
5201 CasePart -> case Expression OpenScope ColonBlock ${
5202 $0 = calloc(1,sizeof(struct casepart));
5205 var_block_close(c, CloseParallel, $0->action);
5209 // These scopes are closed in CondStatement
5210 ForPart -> for OpenBlock ${
5214 ThenPart -> then OpenBlock ${
5216 var_block_close(c, CloseSequential, $0);
5220 // This scope is closed in CondStatement
5221 WhilePart -> while UseBlock OptNL do OpenBlock ${
5226 var_block_close(c, CloseSequential, $0->right);
5227 var_block_close(c, CloseSequential, $0);
5229 | while OpenScope Expression OpenScope ColonBlock ${
5234 var_block_close(c, CloseSequential, $0->right);
5235 var_block_close(c, CloseSequential, $0);
5239 IfPart -> if UseBlock OptNL then OpenBlock ${
5242 var_block_close(c, CloseParallel, $0.thenpart);
5244 | if OpenScope Expression OpenScope ColonBlock ${
5247 var_block_close(c, CloseParallel, $0.thenpart);
5249 | if OpenScope Expression OpenScope OptNL then Block ${
5252 var_block_close(c, CloseParallel, $0.thenpart);
5256 // This scope is closed in CondStatement
5257 SwitchPart -> switch OpenScope Expression ${
5260 | switch UseBlock ${
5264 ###### print binode cases
5266 if (b->left && b->left->type == Xbinode &&
5267 cast(binode, b->left)->op == Block) {
5269 do_indent(indent, "while {\n");
5271 do_indent(indent, "while\n");
5272 print_exec(b->left, indent+1, bracket);
5274 do_indent(indent, "} do {\n");
5276 do_indent(indent, "do\n");
5277 print_exec(b->right, indent+1, bracket);
5279 do_indent(indent, "}\n");
5281 do_indent(indent, "while ");
5282 print_exec(b->left, 0, bracket);
5287 print_exec(b->right, indent+1, bracket);
5289 do_indent(indent, "}\n");
5293 ###### print exec cases
5295 case Xcond_statement:
5297 struct cond_statement *cs = cast(cond_statement, e);
5298 struct casepart *cp;
5300 do_indent(indent, "for");
5301 if (bracket) printf(" {\n"); else printf("\n");
5302 print_exec(cs->forpart, indent+1, bracket);
5305 do_indent(indent, "} then {\n");
5307 do_indent(indent, "then\n");
5308 print_exec(cs->thenpart, indent+1, bracket);
5310 if (bracket) do_indent(indent, "}\n");
5313 print_exec(cs->looppart, indent, bracket);
5317 do_indent(indent, "switch");
5319 do_indent(indent, "if");
5320 if (cs->condpart && cs->condpart->type == Xbinode &&
5321 cast(binode, cs->condpart)->op == Block) {
5326 print_exec(cs->condpart, indent+1, bracket);
5328 do_indent(indent, "}\n");
5330 do_indent(indent, "then\n");
5331 print_exec(cs->thenpart, indent+1, bracket);
5335 print_exec(cs->condpart, 0, bracket);
5341 print_exec(cs->thenpart, indent+1, bracket);
5343 do_indent(indent, "}\n");
5348 for (cp = cs->casepart; cp; cp = cp->next) {
5349 do_indent(indent, "case ");
5350 print_exec(cp->value, -1, 0);
5355 print_exec(cp->action, indent+1, bracket);
5357 do_indent(indent, "}\n");
5360 do_indent(indent, "else");
5365 print_exec(cs->elsepart, indent+1, bracket);
5367 do_indent(indent, "}\n");
5372 ###### propagate binode cases
5374 propagate_types(b->right, c, perr_local, Tnone, 0);
5375 return propagate_types(b->left, c, perr, type, rules);
5377 ###### propagate exec cases
5378 case Xcond_statement:
5380 // forpart and looppart->right must return Tnone
5381 // thenpart must return Tnone if there is a loopart,
5382 // otherwise it is like elsepart.
5384 // be bool if there is no casepart
5385 // match casepart->values if there is a switchpart
5386 // either be bool or match casepart->value if there
5388 // elsepart and casepart->action must match the return type
5389 // expected of this statement.
5390 struct cond_statement *cs = cast(cond_statement, prog);
5391 struct casepart *cp;
5393 t = propagate_types(cs->forpart, c, perr, Tnone, 0);
5396 t = propagate_types(cs->thenpart, c, perr, Tnone, 0);
5398 if (cs->casepart == NULL) {
5399 propagate_types(cs->condpart, c, perr, Tbool, 0);
5400 propagate_types(cs->looppart, c, perr, Tbool, 0);
5402 /* Condpart must match case values, with bool permitted */
5404 for (cp = cs->casepart;
5405 cp && !t; cp = cp->next)
5406 t = propagate_types(cp->value, c, perr, NULL, 0);
5407 if (!t && cs->condpart)
5408 t = propagate_types(cs->condpart, c, perr, // NOTEST
5410 if (!t && cs->looppart)
5412 t = propagate_types(cs->looppart, c, perr, NULL, // NOTEST
5414 // Now we have a type (I hope) push it down
5416 for (cp = cs->casepart; cp; cp = cp->next)
5417 propagate_types(cp->value, c, perr, t, 0);
5418 propagate_types(cs->condpart, c, perr, t, Rboolok);
5419 propagate_types(cs->looppart, c, perr, t, Rboolok);
5422 // (if)then, else, and case parts must return expected type.
5423 if (!cs->looppart && !type)
5424 type = propagate_types(cs->thenpart, c, perr, NULL, rules);
5426 type = propagate_types(cs->elsepart, c, perr, NULL, rules);
5427 for (cp = cs->casepart;
5429 cp = cp->next) // NOTEST
5430 type = propagate_types(cp->action, c, perr, NULL, rules); // NOTEST
5433 propagate_types(cs->thenpart, c, perr, type, rules);
5434 propagate_types(cs->elsepart, c, perr, type, rules);
5435 for (cp = cs->casepart; cp ; cp = cp->next)
5436 propagate_types(cp->action, c, perr, type, rules);
5442 ###### interp binode cases
5444 // This just performs one iterration of the loop
5445 rv = interp_exec(c, b->left, &rvtype);
5446 if (rvtype == Tnone ||
5447 (rvtype == Tbool && rv.bool != 0))
5448 // rvtype is Tnone or Tbool, doesn't need to be freed
5449 interp_exec(c, b->right, NULL);
5452 ###### interp exec cases
5453 case Xcond_statement:
5455 struct value v, cnd;
5456 struct type *vtype, *cndtype;
5457 struct casepart *cp;
5458 struct cond_statement *cs = cast(cond_statement, e);
5461 interp_exec(c, cs->forpart, NULL);
5463 while ((cnd = interp_exec(c, cs->looppart, &cndtype)),
5464 cndtype == Tnone || (cndtype == Tbool && cnd.bool != 0))
5465 interp_exec(c, cs->thenpart, NULL);
5467 cnd = interp_exec(c, cs->condpart, &cndtype);
5468 if ((cndtype == Tnone ||
5469 (cndtype == Tbool && cnd.bool != 0))) {
5470 // cnd is Tnone or Tbool, doesn't need to be freed
5471 rv = interp_exec(c, cs->thenpart, &rvtype);
5472 // skip else (and cases)
5476 for (cp = cs->casepart; cp; cp = cp->next) {
5477 v = interp_exec(c, cp->value, &vtype);
5478 if (value_cmp(cndtype, vtype, &v, &cnd) == 0) {
5479 free_value(vtype, &v);
5480 free_value(cndtype, &cnd);
5481 rv = interp_exec(c, cp->action, &rvtype);
5484 free_value(vtype, &v);
5486 free_value(cndtype, &cnd);
5488 rv = interp_exec(c, cs->elsepart, &rvtype);
5495 ### Top level structure
5497 All the language elements so far can be used in various places. Now
5498 it is time to clarify what those places are.
5500 At the top level of a file there will be a number of declarations.
5501 These can be for predefined constants, `struct` types, and functions -
5502 particularly the `main` function.
5504 The various declarations do not return anything. They store the
5505 various declarations in the parse context.
5507 ###### Parser: grammar
5510 Ocean -> OptNL DeclarationList
5512 ## declare terminals
5520 DeclarationList -> Declaration
5521 | DeclarationList Declaration
5523 Declaration -> ERROR Newlines ${
5524 tok_err(c, // NOTEST
5525 "error: unhandled parse error", &$1);
5531 ## top level grammar
5535 ### The `const` section
5537 As well as being defined in with the code that uses them, constants can
5538 be declared at the top level. These have full-file scope, so they are
5539 always `InScope`, even before(!) they have been declared. The value of
5540 a top level constant can be given as an expression, and this is
5541 evaluated after parsing and before execution.
5543 A function call can syntactically be used to evaluate a constant, but as
5544 yet we don't detect which functions are safe to use that way, so this
5545 does not actually work.
5547 Constants are defined in a section that starts with the reserved word
5548 `const` and then has a block with a list of assignment statements.
5549 For syntactic consistency, these must use the double-colon syntax to
5550 make it clear that they are constants. Type can also be given: if
5551 not, the type will be determined during analysis, as with other
5554 ###### parse context
5555 struct binode *constlist;
5557 ###### top level grammar
5561 DeclareConstant -> const { IN OptNL ConstList OUT OptNL } Newlines
5562 | const { SimpleConstList } Newlines
5563 | const IN OptNL ConstList OUT Newlines
5564 | const SimpleConstList Newlines
5566 ConstList -> ConstList SimpleConstLine
5569 SimpleConstList -> SimpleConstList ; Const
5573 SimpleConstLine -> SimpleConstList Newlines
5574 | ERROR Newlines ${ tok_err(c, "Syntax error in constant", &$1); }$
5577 CType -> Type ${ $0 = $<1; }$
5581 Const -> IDENTIFIER :: CType = Expression ${ {
5583 struct binode *bl, *bv;
5584 struct var *var = new_pos(var, $ID);
5586 v = var_decl(c, $ID.txt);
5588 v->where_decl = var;
5594 v = var_ref(c, $1.txt);
5595 if (v->type == Tnone) {
5596 v->where_decl = var;
5602 tok_err(c, "error: name already declared", &$1);
5603 type_err(c, "info: this is where '%v' was first declared",
5604 v->where_decl, NULL, 0, NULL);
5616 bl->left = c->constlist;
5621 ###### core functions
5622 static void resolve_consts(struct parse_context *c)
5626 enum { none, some, cannot } progress = none;
5628 c->constlist = reorder_bilist(c->constlist);
5631 for (b = cast(binode, c->constlist); b;
5632 b = cast(binode, b->right)) {
5634 struct binode *vb = cast(binode, b->left);
5635 struct var *v = cast(var, vb->left);
5636 if (v->var->frame_pos >= 0)
5640 propagate_types(vb->right, c, &perr,
5642 } while (perr & Eretry);
5644 c->parse_error += 1;
5645 else if (!(perr & Eruntime)) {
5647 struct value res = interp_exec(
5648 c, vb->right, &v->var->type);
5649 global_alloc(c, v->var->type, v->var, &res);
5651 if (progress == cannot)
5652 type_err(c, "error: const %v cannot be resolved.",
5662 progress = cannot; break;
5664 progress = none; break;
5669 ###### print const decls
5674 for (b = cast(binode, context.constlist); b;
5675 b = cast(binode, b->right)) {
5676 struct binode *vb = cast(binode, b->left);
5677 struct var *vr = cast(var, vb->left);
5678 struct variable *v = vr->var;
5684 printf(" %.*s :: ", v->name->name.len, v->name->name.txt);
5685 type_print(v->type, stdout);
5687 print_exec(vb->right, -1, 0);
5692 ###### free const decls
5693 free_binode(context.constlist);
5695 ### Function declarations
5697 The code in an Ocean program is all stored in function declarations.
5698 One of the functions must be named `main` and it must accept an array of
5699 strings as a parameter - the command line arguments.
5701 As this is the top level, several things are handled a bit differently.
5702 The function is not interpreted by `interp_exec` as that isn't passed
5703 the argument list which the program requires. Similarly type analysis
5704 is a bit more interesting at this level.
5706 ###### ast functions
5708 static struct type *handle_results(struct parse_context *c,
5709 struct binode *results)
5711 /* Create a 'struct' type from the results list, which
5712 * is a list for 'struct var'
5714 struct type *t = add_anon_type(c, &structure_prototype,
5719 for (b = results; b; b = cast(binode, b->right))
5721 t->structure.nfields = cnt;
5722 t->structure.fields = calloc(cnt, sizeof(struct field));
5724 for (b = results; b; b = cast(binode, b->right)) {
5725 struct var *v = cast(var, b->left);
5726 struct field *f = &t->structure.fields[cnt++];
5727 int a = v->var->type->align;
5728 f->name = v->var->name->name;
5729 f->type = v->var->type;
5731 f->offset = t->size;
5732 v->var->frame_pos = f->offset;
5733 t->size += ((f->type->size - 1) | (a-1)) + 1;
5736 variable_unlink_exec(v->var);
5738 free_binode(results);
5742 static struct variable *declare_function(struct parse_context *c,
5743 struct variable *name,
5744 struct binode *args,
5746 struct binode *results,
5750 struct value fn = {.function = code};
5752 var_block_close(c, CloseFunction, code);
5753 t = add_anon_type(c, &function_prototype,
5754 "func %.*s", name->name->name.len,
5755 name->name->name.txt);
5757 t->function.params = reorder_bilist(args);
5759 ret = handle_results(c, reorder_bilist(results));
5760 t->function.inline_result = 1;
5761 t->function.local_size = ret->size;
5763 t->function.return_type = ret;
5764 global_alloc(c, t, name, &fn);
5765 name->type->function.scope = c->out_scope;
5770 var_block_close(c, CloseFunction, NULL);
5772 c->out_scope = NULL;
5776 ###### declare terminals
5779 ###### top level grammar
5782 DeclareFunction -> func FuncName ( OpenScope ArgsLine ) Block Newlines ${
5783 $0 = declare_function(c, $<FN, $<Ar, Tnone, NULL, $<Bl);
5785 | func FuncName IN OpenScope Args OUT OptNL do Block Newlines ${
5786 $0 = declare_function(c, $<FN, $<Ar, Tnone, NULL, $<Bl);
5788 | func FuncName NEWLINE OpenScope OptNL do Block Newlines ${
5789 $0 = declare_function(c, $<FN, NULL, Tnone, NULL, $<Bl);
5791 | func FuncName ( OpenScope ArgsLine ) : Type Block Newlines ${
5792 $0 = declare_function(c, $<FN, $<Ar, $<Ty, NULL, $<Bl);
5794 | func FuncName ( OpenScope ArgsLine ) : ( ArgsLine ) Block Newlines ${
5795 $0 = declare_function(c, $<FN, $<AL, NULL, $<AL2, $<Bl);
5797 | func FuncName IN OpenScope Args OUT OptNL return Type Newlines do Block Newlines ${
5798 $0 = declare_function(c, $<FN, $<Ar, $<Ty, NULL, $<Bl);
5800 | func FuncName NEWLINE OpenScope return Type Newlines do Block Newlines ${
5801 $0 = declare_function(c, $<FN, NULL, $<Ty, NULL, $<Bl);
5803 | func FuncName IN OpenScope Args OUT OptNL return IN Args OUT OptNL do Block Newlines ${
5804 $0 = declare_function(c, $<FN, $<Ar, NULL, $<Ar2, $<Bl);
5806 | func FuncName NEWLINE OpenScope return IN Args OUT OptNL do Block Newlines ${
5807 $0 = declare_function(c, $<FN, NULL, NULL, $<Ar, $<Bl);
5810 ###### print func decls
5815 while (target != 0) {
5817 for (v = context.in_scope; v; v=v->in_scope)
5818 if (v->depth == 0 && v->type && v->type->check_args) {
5827 struct value *val = var_value(&context, v);
5828 printf("func %.*s", v->name->name.len, v->name->name.txt);
5829 v->type->print_type_decl(v->type, stdout);
5832 print_exec(val->function, 1, brackets);
5835 print_value(v->type, val, stdout);
5837 printf("/* frame size %d */\n", v->type->function.local_size);
5843 ###### core functions
5845 static int analyse_funcs(struct parse_context *c)
5849 for (v = c->in_scope; v; v = v->in_scope) {
5853 if (v->depth != 0 || !v->type || !v->type->check_args)
5855 ret = v->type->function.inline_result ?
5856 Tnone : v->type->function.return_type;
5857 val = var_value(c, v);
5860 propagate_types(val->function, c, &perr, ret, 0);
5861 } while (!(perr & Efail) && (perr & Eretry));
5862 if (!(perr & Efail))
5863 /* Make sure everything is still consistent */
5864 propagate_types(val->function, c, &perr, ret, 0);
5867 if (!v->type->function.inline_result &&
5868 !v->type->function.return_type->dup) {
5869 type_err(c, "error: function cannot return value of type %1",
5870 v->where_decl, v->type->function.return_type, 0, NULL);
5873 scope_finalize(c, v->type);
5878 static int analyse_main(struct type *type, struct parse_context *c)
5880 struct binode *bp = type->function.params;
5884 struct type *argv_type;
5886 argv_type = add_anon_type(c, &array_prototype, "argv");
5887 argv_type->array.member = Tstr;
5888 argv_type->array.unspec = 1;
5890 for (b = bp; b; b = cast(binode, b->right)) {
5894 propagate_types(b->left, c, &perr, argv_type, 0);
5896 default: /* invalid */ // NOTEST
5897 propagate_types(b->left, c, &perr, Tnone, 0); // NOTEST
5900 c->parse_error += 1;
5903 return !c->parse_error;
5906 static void interp_main(struct parse_context *c, int argc, char **argv)
5908 struct value *progp = NULL;
5909 struct text main_name = { "main", 4 };
5910 struct variable *mainv;
5916 mainv = var_ref(c, main_name);
5918 progp = var_value(c, mainv);
5919 if (!progp || !progp->function) {
5920 fprintf(stderr, "oceani: no main function found.\n");
5921 c->parse_error += 1;
5924 if (!analyse_main(mainv->type, c)) {
5925 fprintf(stderr, "oceani: main has wrong type.\n");
5926 c->parse_error += 1;
5929 al = mainv->type->function.params;
5931 c->local_size = mainv->type->function.local_size;
5932 c->local = calloc(1, c->local_size);
5934 struct var *v = cast(var, al->left);
5935 struct value *vl = var_value(c, v->var);
5943 t->array.size = argc;
5944 t->prepare_type(c, t, 0);
5945 array_init(v->var->type, vl);
5946 for (i = 0; i < argc; i++) {
5947 struct value *vl2 = vl->array + i * v->var->type->array.member->size;
5949 arg.str.txt = argv[i];
5950 arg.str.len = strlen(argv[i]);
5951 free_value(Tstr, vl2);
5952 dup_value(Tstr, &arg, vl2);
5956 al = cast(binode, al->right);
5958 v = interp_exec(c, progp->function, &vtype);
5959 free_value(vtype, &v);
5964 ## And now to test it out.
5966 Having a language requires having a "hello world" program. I'll
5967 provide a little more than that: a program that prints "Hello world"
5968 finds the GCD of two numbers, prints the first few elements of
5969 Fibonacci, performs a binary search for a number, and a few other
5970 things which will likely grow as the languages grows.
5972 ###### File: oceani.mk
5975 @echo "===== DEMO ====="
5976 ./oceani --section "demo: hello" oceani.mdc 55 33
5982 four ::= 2 + 2 ; five ::= 10/2
5983 const pie ::= "I like Pie";
5984 cake ::= "The cake is"
5992 func fibonacci(n:number):number
5995 else use fibonacci(n-1) + fibonacci(n-2)
5997 func main(argv:[]string)
5998 print "Hello World, what lovely oceans you have!"
5999 print "Are there", five, "?"
6000 print pi, pie, "but", cake
6002 A := $argv[1]; B := $argv[2]
6004 /* When a variable is defined in both branches of an 'if',
6005 * and used afterwards, the variables are merged.
6011 print "Is", A, "bigger than", B,"? ", bigger
6012 /* If a variable is not used after the 'if', no
6013 * merge happens, so types can be different
6016 double:string = "yes"
6017 print A, "is more than twice", B, "?", double
6020 print "double", B, "is", double
6031 print "GCD of", A, "and", B,"is", a
6033 print a, "is not positive, cannot calculate GCD"
6035 print b, "is not positive, cannot calculate GCD"
6040 print "Fibonacci:", f1,f2,
6041 then togo = togo - 1
6053 print "", fibonacci(f),
6056 /* Binary search... */
6061 mid := (lo + hi) / 2
6074 print "Yay, I found", target
6076 print "Closest I found was", lo
6081 // "middle square" PRNG. Not particularly good, but one my
6082 // Dad taught me - the first one I ever heard of.
6083 for i:=1; then i = i + 1; while i < size:
6084 n := list[i-1] * list[i-1]
6085 list[i] = (n / 100) % 10 000
6087 print "Before sort:",
6088 for i:=0; then i = i + 1; while i < size:
6092 for i := 1; then i=i+1; while i < size:
6093 for j:=i-1; then j=j-1; while j >= 0:
6094 if list[j] > list[j+1]:
6098 print " After sort:",
6099 for i:=0; then i = i + 1; while i < size:
6103 if 1 == 2 then print "yes"; else print "no"
6107 bob.alive = (bob.name == "Hello")
6108 print "bob", "is" if bob.alive else "isn't", "alive"