[find the source at oceani.mdc]

Ocean Interpreter - Jamison Creek version

Ocean is intended to be a compiled language, so this interpreter is not targeted at being the final product. It is, rather, an intermediate stage and fills that role in two distinct ways.

Firstly, it exists as a platform to experiment with the early language design. An interpreter is easy to write and easy to get working, so the barrier for entry is lower if I aim to start with an interpreter.

Secondly, the plan for the Ocean compiler is to write it in the Ocean language. To achieve this we naturally need some sort of boot-strap process and this interpreter - written in portable C - will fill that role. It will be used to bootstrap the Ocean compiler.

Two features that are not needed to fill either of these roles are performance and completeness. The interpreter only needs to be fast enough to run small test programs and occasionally to run the compiler on itself. It only needs to be complete enough to test aspects of the design which are developed before the compiler is working, and to run the compiler on itself. Any features not used by the compiler when compiling itself are superfluous. They may be included anyway, but they may not.

Nonetheless, the interpreter should end up being reasonably complete, and any performance bottlenecks which appear and are easily fixed, will be.

Current version

This third version of the interpreter exists to test out some initial ideas relating to types. Particularly it adds arrays (indexed from zero) and simple structures. Basic control flow and variable scoping are already fairly well established, as are basic numerical and boolean operators.

Some operators that have only recently been added, and so have not generated all that much experience yet are "and then" and "or else" as short-circuit Boolean operators, and the "if ... else" trinary operator which can select between two expressions based on a third (which appears syntactically in the middle).

Elements that are present purely to make a usable language, and without any expectation that they will remain, are the "program' clause, which provides a list of variables to received command-line arguments, and the "print" statement which performs simple output.

The current scalar types are "number", "Boolean", and "string". Boolean will likely stay in its current form, the other two might, but could just as easily be changed.

Naming

Versions of the interpreter which obviously do not support a complete language will be named after creeks and streams. This one is Jamison Creek.

Once we have something reasonably resembling a complete language, the names of rivers will be used. Early versions of the compiler will be named after seas. Major releases of the compiler will be named after oceans. Hopefully I will be finished once I get to the Pacific Ocean release.

Outline

As well as parsing and executing a program, the interpreter can print out the program from the parsed internal structure. This is useful for validating the parsing. So the main requirements of the interpreter are:

This is all performed by a single C program extracted with parsergen.

There will be two formats for printing the program: a default and one that uses bracketing. So a --bracket command line option is needed for that. Normally the first code section found is used, however an alternate section can be requested so that a file (such as this one) can contain multiple programs This is effected with the --section option.

This code must be compiled with -fplan9-extensions so that anonymous structures can be used.

File: oceani.mk
myCFLAGS := -Wall -g -fplan9-extensions
CFLAGS := $(filter-out $(myCFLAGS),$(CFLAGS)) $(myCFLAGS)
myLDLIBS:= libparser.o libscanner.o libmdcode.o -licuuc
LDLIBS := $(filter-out $(myLDLIBS),$(LDLIBS)) $(myLDLIBS)
## libs
all :: $(LDLIBS) oceani
oceani.c oceani.h : oceani.mdc parsergen
    ./parsergen -o oceani --LALR --tag Parser oceani.mdc
oceani.mk: oceani.mdc md2c
    ./md2c oceani.mdc

oceani: oceani.o $(LDLIBS)
    $(CC) $(CFLAGS) -o oceani oceani.o $(LDLIBS)
Parser: header
## macros
## ast
struct parse_context {
    struct token_config config;
    char *file_name;
    int parse_error;
    struct exec *prog;
    ## parse context
};
macros
#define container_of(ptr, type, member) ({                      \
    const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
    (type *)( (char *)__mptr - offsetof(type,member) );})

#define config2context(_conf) container_of(_conf, struct parse_context, \
    config)
Parser: code
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>
#include <locale.h>
#include <malloc.h>
#include "mdcode.h"
#include "scanner.h"
#include "parser.h"

## includes

#include "oceani.h"

## forward decls
## value functions
## ast functions
## core functions

#include <getopt.h>
static char Usage[] = "Usage: oceani --trace --print --noexec --brackets"
                      "--section=SectionName prog.ocn\n";
static const struct option long_options[] = {
    {"trace",     0, NULL, 't'},
    {"print",     0, NULL, 'p'},
    {"noexec",    0, NULL, 'n'},
    {"brackets",  0, NULL, 'b'},
    {"section",   1, NULL, 's'},
    {NULL,        0, NULL, 0},
};
const char *options = "tpnbs";
int main(int argc, char *argv[])
{
    int fd;
    int len;
    char *file;
    struct section *s, *ss;
    char *section = NULL;
    struct parse_context context = {
        .config = {
            .ignored = (1 << TK_line_comment)
                     | (1 << TK_block_comment),
            .number_chars = ".,_+-",
            .word_start = "_",
            .word_cont = "_",
        },
    };
    int doprint=0, dotrace=0, doexec=1, brackets=0;
    int opt;
    while ((opt = getopt_long(argc, argv, options, long_options, NULL))
           != -1) {
        switch(opt) {
        case 't': dotrace=1; break;
        case 'p': doprint=1; break;
        case 'n': doexec=0; break;
        case 'b': brackets=1; break;
        case 's': section = optarg; break;
        default: fprintf(stderr, Usage);
            exit(1);
        }
    }
    if (optind >= argc) {
        fprintf(stderr, "oceani: no input file given\n");
        exit(1);
    }
    fd = open(argv[optind], O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "oceani: cannot open %s\n", argv[optind]);
        exit(1);
    }
    context.file_name = argv[optind];
    len = lseek(fd, 0, 2);
    file = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    s = code_extract(file, file+len, NULL);
    if (!s) {
        fprintf(stderr, "oceani: could not find any code in %s\n",
                argv[optind]);
        exit(1);
    }

    ## context initialization

    if (section) {
        for (ss = s; ss; ss = ss->next) {
            struct text sec = ss->section;
            if (sec.len == strlen(section) &&
                strncmp(sec.txt, section, sec.len) == 0)
                break;
        }
        if (!ss) {
            fprintf(stderr, "oceani: cannot find section %s\n",
                    section);
            exit(1);
        }
    } else
        ss = s;
    parse_oceani(ss->code, &context.config, dotrace ? stderr : NULL);

    if (!context.prog) {
        fprintf(stderr, "oceani: no program found.\n");
        context.parse_error = 1;
    }
    if (context.prog && doprint) {
        ## print const decls
        ## print type decls
        print_exec(context.prog, 0, brackets);
    }
    if (context.prog && doexec && !context.parse_error) {
        if (!analyse_prog(context.prog, &context)) {
            fprintf(stderr, "oceani: type error in program - not running.\n");
            exit(1);
        }
        interp_prog(context.prog, argv+optind+1);
    }
    free_exec(context.prog);

    while (s) {
        struct section *t = s->next;
        code_free(s->code);
        free(s);
        s = t;
    }
    ## free context vars
    ## free context types
    exit(context.parse_error ? 1 : 0);
}

Analysis

The four requirements of parse, analyse, print, interpret apply to each language element individually so that is how most of the code will be structured.

Three of the four are fairly self explanatory. The one that requires a little explanation is the analysis step.

The current language design does not require the types of variables to be declared, but they must still have a single type. Different operations impose different requirements on the variables, for example addition requires both arguments to be numeric, and assignment requires the variable on the left to have the same type as the expression on the right.

Analysis involves propagating these type requirements around and consequently setting the type of each variable. If any requirements are violated (e.g. a string is compared with a number) or if a variable needs to have two different types, then an error is raised and the program will not run.

If the same variable is declared in both branchs of an 'if/else', or in all cases of a 'switch' then the multiple instances may be merged into just one variable if the variable is references after the conditional statement. When this happens, the types must naturally be consistent across all the branches. When the variable is not used outside the if, the variables in the different branches are distinct and can be of different types.

Determining the types of all variables early is important for processing command line arguments. These can be assigned to any of several types of variable, but we must first know the correct type so any required conversion can happen. If a variable is associated with a command line argument but no type can be interpreted (e.g. the variable is only ever used in a print statement), then the type is set to 'string'.

Undeclared names may only appear in "use" statements and "case" expressions. These names are given a type of "label" and a unique value. This allows them to fill the role of a name in an enumerated type, which is useful for testing the switch statement.

As we will see, the condition part of a while statement can return either a Boolean or some other type. This requires that the expected type that gets passed around comprises a type and a flag to indicate that Tbool is also permitted.

As there are, as yet, no distinct types that are compatible, there isn't much subtlety in the analysis. When we have distinct number types, this will become more interesting.

Error reporting

When analysis discovers an inconsistency it needs to report an error; just refusing to run the code ensures that the error doesn't cascade, but by itself it isn't very useful. A clear understanding of the sort of error message that are useful will help guide the process of analysis.

At a simplistic level, the only sort of error that type analysis can report is that the type of some construct doesn't match a contextual requirement. For example, in 4 + "hello" the addition provides a contextual requirement for numbers, but "hello" is not a number. In this particular example no further information is needed as the types are obvious from local information. When a variable is involved that isn't the case. It may be helpful to explain why the variable has a particular type, by indicating the location where the type was set, whether by declaration or usage.

Using a recursive-descent analysis we can easily detect a problem at multiple locations. In "hello:= "there"; 4 + hello" the addition will detect that one argument is not a number and the usage of hello will detect that a number was wanted, but not provided. In this (early) version of the language, we will generate error reports at multiple locations, so the use of hello will report an error and explain were the value was set, and the addition will report an error and say why numbers are needed. To be able to report locations for errors, each language element will need to record a file location (line and column) and each variable will need to record the language element where its type was set. For now we will assume that each line of an error message indicates one location in the file, and up to 2 types. So we provide a printf-like function which takes a format, a language (a struct exec which has not yet been introduced), and 2 types. "%1" reports the first type, "%2" reports the second. We will need a function to print the location, once we know how that is stored. As will be explained later, there are sometimes extra rules for type matching and they might affect error messages, we need to pass those in too.

As well as type errors, we sometimes need to report problems with tokens, which might be unexpected or might name a type that has not been defined. For these we have tok_err() which reports an error with a given token. Each of the error functions sets the flag in the context so indicate that parsing failed.

forward decls
static void fput_loc(struct exec *loc, FILE *f);
core functions
static void type_err(struct parse_context *c,
                     char *fmt, struct exec *loc,
                     struct type *t1, int rules, struct type *t2)
{
    fprintf(stderr, "%s:", c->file_name);
    fput_loc(loc, stderr);
    for (; *fmt ; fmt++) {
        if (*fmt != '%') {
            fputc(*fmt, stderr);
            continue;
        }
        fmt++;
        switch (*fmt) {
        case '%': fputc(*fmt, stderr); break;   // NOTEST
        default: fputc('?', stderr); break; // NOTEST
        case '1':
            type_print(t1, stderr);
            break;
        case '2':
            type_print(t2, stderr);
            break;
        ## format cases
        }
    }
    fputs("\n", stderr);
    c->parse_error = 1;
}

static void tok_err(struct parse_context *c, char *fmt, struct token *t)
{
    fprintf(stderr, "%s:%d:%d: %s: %.*s\n", c->file_name, t->line, t->col, fmt,
        t->txt.len, t->txt.txt);
    c->parse_error = 1;
}

Entities: declared and predeclared.

There are various "things" that the language and/or the interpreter needs to know about to parse and execute a program. These include types, variables, values, and executable code. These are all lumped together under the term "entities" (calling them "objects" would be confusing) and introduced here. These will introduced and described here. The following section will present the different specific code elements which comprise or manipulate these various entities.

Types

Values come in a wide range of types, with more likely to be added. Each type needs to be able to parse and print its own values (for convenience at least) as well as to compare two values, at least for equality and possibly for order. For now, values might need to be duplicated and freed, though eventually such manipulations will be better integrated into the language.

Rather than requiring every numeric type to support all numeric operations (add, multiple, etc), we allow types to be able to present as one of a few standard types: integer, float, and fraction. The existence of these conversion functions eventaully enable types to determine if they are compatible with other types, though such types have not yet been implemented.

Named type are stored in a simple linked list. Objects of each type are "values" which are often passed around by value.

ast
struct value {
    struct type *type;
    union {
        ## value union fields
    };
};

struct type {
    struct text name;
    struct type *next;
    struct value (*init)(struct type *type);
    struct value (*prepare)(struct type *type);
    struct value (*parse)(struct type *type, char *str);
    void (*print)(struct value val);
    void (*print_type)(struct type *type, FILE *f);
    int (*cmp_order)(struct value v1, struct value v2);
    int (*cmp_eq)(struct value v1, struct value v2);
    struct value (*dup)(struct value val);
    void (*free)(struct value val);
    void (*free_type)(struct type *t);
    int (*compat)(struct type *this, struct type *other);
    long long (*to_int)(struct value *v);
    double (*to_float)(struct value *v);
    int (*to_mpq)(mpq_t *q, struct value *v);
    ## type functions
    union {
        ## type union fields
    };
};
parse context
struct type *typelist;
ast functions
static struct type *find_type(struct parse_context *c, struct text s)
{
    struct type *l = c->typelist;

    while (l &&
           text_cmp(l->name, s) != 0)
            l = l->next;
    return l;
}

static struct type *add_type(struct parse_context *c, struct text s,
                 struct type *proto)
{
    struct type *n;

    n = calloc(1, sizeof(*n));
    *n = *proto;
    n->name = s;
    n->next = c->typelist;
    c->typelist = n;
    return n;
}

static void free_type(struct type *t)
{
    /* The type is always a reference to something in the
     * context, so we don't need to free anything.
     */
}

static void free_value(struct value v)
{
    if (v.type)
        v.type->free(v);
}

static int type_compat(struct type *require, struct type *have, int rules)
{
    if ((rules & Rboolok) && have == Tbool)
        return 1;
    if ((rules & Rnolabel) && have == Tlabel)
        return 0;
    if (!require || !have)
        return 1;

    if (require->compat)
        return require->compat(require, have);

    return require == have;
}

static void type_print(struct type *type, FILE *f)
{
    if (!type)
        fputs("*unknown*type*", f);
    else if (type->name.len)
        fprintf(f, "%.*s", type->name.len, type->name.txt);
    else if (type->print_type)
        type->print_type(type, f);
    else
        fputs("*invalid*type*", f); // NOTEST
}

static struct value val_prepare(struct type *type)
{
    struct value rv;

    if (type)
        return type->prepare(type);
    rv.type = type;
    return rv;
}

static struct value val_init(struct type *type)
{
    struct value rv;

    if (type)
        return type->init(type);
    rv.type = type;
    return rv;
}

static struct value dup_value(struct value v)
{
    if (v.type)
        return v.type->dup(v);
    return v;
}

static int value_cmp(struct value left, struct value right)
{
    if (left.type && left.type->cmp_order)
        return left.type->cmp_order(left, right);
    if (left.type && left.type->cmp_eq)
        return left.type->cmp_eq(left, right);
    return -1;
}

static void print_value(struct value v)
{
    if (v.type && v.type->print)
        v.type->print(v);
    else
        printf("*Unknown*");        // NOTEST
}

static struct value parse_value(struct type *type, char *arg)
{
    struct value rv;

    if (type && type->parse)
        return type->parse(type, arg);
    rv.type = NULL;             // NOTEST
    return rv;              // NOTEST
}
forward decls
static void free_value(struct value v);
static int type_compat(struct type *require, struct type *have, int rules);
static void type_print(struct type *type, FILE *f);
static struct value val_init(struct type *type);
static struct value dup_value(struct value v);
static int value_cmp(struct value left, struct value right);
static void print_value(struct value v);
static struct value parse_value(struct type *type, char *arg);
free context types
while (context.typelist) {
    struct type *t = context.typelist;

    context.typelist = t->next;
    if (t->free_type)
        t->free_type(t);
    free(t);
}

Base Types

Values of the base types can be numbers, which we represent as multi-precision fractions, strings, Booleans and labels. When analysing the program we also need to allow for places where no value is meaningful (type Tnone) and where we don't know what type to expect yet (type is NULL).

Values are never shared, they are always copied when used, and freed when no longer needed.

When propagating type information around the program, we need to determine if two types are compatible, where type NULL is compatible with anything. There are two special cases with type compatibility, both related to the Conditional Statement which will be described later. In some cases a Boolean can be accepted as well as some other primary type, and in others any type is acceptable except a label (Vlabel). A separate function encoding these cases will simplify some code later.

When assigning command line arguments to variables, we need to be able to parse each type from a string.

The distinction beteen "prepare" and "init" needs to be explained. "init" sets up an initial value, such as "zero" or the empty string. "prepare" simply prepares the data structure so that if "free" gets called on it, it won't do something silly. Normally a value will be stored after "prepare" but before "free", but this might not happen if there are errors.

includes
#include <gmp.h>
#include "string.h"
#include "number.h"
libs
myLDLIBS := libnumber.o libstring.o -lgmp
LDLIBS := $(filter-out $(myLDLIBS),$(LDLIBS)) $(myLDLIBS)
type union fields
enum vtype {Vnone, Vstr, Vnum, Vbool, Vlabel} vtype;
value union fields
struct text str;
mpq_t num;
int bool;
void *label;
ast functions
static void _free_value(struct value v)
{
    switch (v.type->vtype) {
    case Vnone: break;
    case Vstr: free(v.str.txt); break;
    case Vnum: mpq_clear(v.num); break;
    case Vlabel:
    case Vbool: break;
    }
}
value functions
static struct value _val_prepare(struct type *type)
{
    struct value rv;

    rv.type = type;
    switch(type->vtype) {
    case Vnone:
        break;
    case Vnum:
        memset(&rv.num, 0, sizeof(rv.num));
        break;
    case Vstr:
        rv.str.txt = NULL;
        rv.str.len = 0;
        break;
    case Vbool:
        rv.bool = 0;
        break;
    case Vlabel:
        rv.label = NULL;
        break;
    }
    return rv;
}

static struct value _val_init(struct type *type)
{
    struct value rv;

    rv.type = type;
    switch(type->vtype) {
    case Vnone:     // NOTEST
        break;      // NOTEST
    case Vnum:
        mpq_init(rv.num); break;
    case Vstr:
        rv.str.txt = malloc(1);
        rv.str.len = 0;
        break;
    case Vbool:
        rv.bool = 0;
        break;
    case Vlabel:            // NOTEST
        rv.label = NULL;    // NOTEST
        break;          // NOTEST
    }
    return rv;
}

static struct value _dup_value(struct value v)
{
    struct value rv;
    rv.type = v.type;
    switch (rv.type->vtype) {
    case Vnone:     // NOTEST
        break;      // NOTEST
    case Vlabel:
        rv.label = v.label;
        break;
    case Vbool:
        rv.bool = v.bool;
        break;
    case Vnum:
        mpq_init(rv.num);
        mpq_set(rv.num, v.num);
        break;
    case Vstr:
        rv.str.len = v.str.len;
        rv.str.txt = malloc(rv.str.len);
        memcpy(rv.str.txt, v.str.txt, v.str.len);
        break;
    }
    return rv;
}

static int _value_cmp(struct value left, struct value right)
{
    int cmp;
    if (left.type != right.type)
        return left.type - right.type;  // NOTEST
    switch (left.type->vtype) {
    case Vlabel: cmp = left.label == right.label ? 0 : 1; break;
    case Vnum: cmp = mpq_cmp(left.num, right.num); break;
    case Vstr: cmp = text_cmp(left.str, right.str); break;
    case Vbool: cmp = left.bool - right.bool; break;
    case Vnone: cmp = 0;            // NOTEST
    }
    return cmp;
}

static void _print_value(struct value v)
{
    switch (v.type->vtype) {
    case Vnone:             // NOTEST
        printf("*no-value*"); break;    // NOTEST
    case Vlabel:                // NOTEST
        printf("*label-%p*", v.label); break; // NOTEST
    case Vstr:
        printf("%.*s", v.str.len, v.str.txt); break;
    case Vbool:
        printf("%s", v.bool ? "True":"False"); break;
    case Vnum:
        {
        mpf_t fl;
        mpf_init2(fl, 20);
        mpf_set_q(fl, v.num);
        gmp_printf("%Fg", fl);
        mpf_clear(fl);
        break;
        }
    }
}

static struct value _parse_value(struct type *type, char *arg)
{
    struct value val;
    struct text tx;
    int neg = 0;
    char tail[3] = "";

    val.type = type;
    switch(type->vtype) {
    case Vlabel:                // NOTEST
    case Vnone:             // NOTEST
        val.type = NULL;        // NOTEST
        break;              // NOTEST
    case Vstr:
        val.str.len = strlen(arg);
        val.str.txt = malloc(val.str.len);
        memcpy(val.str.txt, arg, val.str.len);
        break;
    case Vnum:
        if (*arg == '-') {
            neg = 1;
            arg++;
        }
        tx.txt = arg; tx.len = strlen(tx.txt);
        if (number_parse(val.num, tail, tx) == 0)
            mpq_init(val.num);
        else if (neg)
            mpq_neg(val.num, val.num);
        if (tail[0]) {
            printf("Unsupported suffix: %s\n", arg);
            val.type = NULL;
        }
        break;
    case Vbool:
        if (strcasecmp(arg, "true") == 0 ||
            strcmp(arg, "1") == 0)
            val.bool = 1;
        else if (strcasecmp(arg, "false") == 0 ||
                 strcmp(arg, "0") == 0)
            val.bool = 0;
        else {
            printf("Bad bool: %s\n", arg);
            val.type = NULL;
        }
        break;
    }
    return val;
}

static void _free_value(struct value v);

static struct type base_prototype = {
    .init = _val_init,
    .prepare = _val_prepare,
    .parse = _parse_value,
    .print = _print_value,
    .cmp_order = _value_cmp,
    .cmp_eq = _value_cmp,
    .dup = _dup_value,
    .free = _free_value,
};

static struct type *Tbool, *Tstr, *Tnum, *Tnone, *Tlabel;
ast functions
static struct type *add_base_type(struct parse_context *c, char *n, enum vtype vt)
{
    struct text txt = { n, strlen(n) };
    struct type *t;

    t = add_type(c, txt, &base_prototype);
    t->vtype = vt;
    return t;
}
context initialization
Tbool  = add_base_type(&context, "Boolean", Vbool);
Tstr   = add_base_type(&context, "string", Vstr);
Tnum   = add_base_type(&context, "number", Vnum);
Tnone  = add_base_type(&context, "none", Vnone);
Tlabel = add_base_type(&context, "label", Vlabel);

Variables

Variables are scoped named values. We store the names in a linked list of "bindings" sorted lexically, and use sequential search and insertion sort.

ast
struct binding {
    struct text name;
    struct binding *next;   // in lexical order
    ## binding fields
};

This linked list is stored in the parse context so that "reduce" functions can find or add variables, and so the analysis phase can ensure that every variable gets a type.

parse context
struct binding *varlist;  // In lexical order
ast functions
static struct binding *find_binding(struct parse_context *c, struct text s)
{
    struct binding **l = &c->varlist;
    struct binding *n;
    int cmp = 1;

    while (*l &&
        (cmp = text_cmp((*l)->name, s)) < 0)
            l = & (*l)->next;
    if (cmp == 0)
        return *l;
    n = calloc(1, sizeof(*n));
    n->name = s;
    n->next = *l;
    *l = n;
    return n;
}

Each name can be linked to multiple variables defined in different scopes. Each scope starts where the name is declared and continues until the end of the containing code block. Scopes of a given name cannot nest, so a declaration while a name is in-scope is an error.

binding fields
struct variable *var;
ast
struct variable {
    struct variable *previous;
    struct value val;
    struct binding *name;
    struct exec *where_decl;// where name was declared
    struct exec *where_set; // where type was set
    ## variable fields
};

While the naming seems strange, we include local constants in the definition of variables. A name declared var := value can subsequently be changed, but a name declared var ::= value cannot - it is constant

variable fields
int constant;

Scopes in parallel branches can be partially merged. More specifically, if a given name is declared in both branches of an if/else then its scope is a candidate for merging. Similarly if every branch of an exhaustive switch (e.g. has an "else" clause) declares a given name, then the scopes from the branches are candidates for merging.

Note that names declared inside a loop (which is only parallel to itself) are never visible after the loop. Similarly names defined in scopes which are not parallel, such as those started by for and switch, are never visible after the scope. Only variables defined in both then and else (including the implicit then after an if, and excluding then used with for) and in all cases and else of a switch or while can be visible beyond the if/switch/while.

Labels, which are a bit like variables, follow different rules. Labels are not explicitly declared, but if an undeclared name appears in a context where a label is legal, that effectively declares the name as a label. The declaration remains in force (or in scope) at least to the end of the immediately containing block and conditionally in any larger containing block which does not declare the name in some other way. Importantly, the conditional scope extension happens even if the label is only used in one parallel branch of a conditional -- when used in one branch it is treated as having been declared in all branches.

Merge candidates are tentatively visible beyond the end of the branching statement which creates them. If the name is used, the merge is affirmed and they become a single variable visible at the outer layer. If not - if it is redeclared first - the merge lapses.

To track scopes we have an extra stack, implemented as a linked list, which roughly parallels the parse stack and which is used exclusively for scoping. When a new scope is opened, a new frame is pushed and the child-count of the parent frame is incremented. This child-count is used to distinguish between the first of a set of parallel scopes, in which declared variables must not be in scope, and subsequent branches, whether they must already be conditionally scoped.

To push a new frame before any code in the frame is parsed, we need a grammar reduction. This is most easily achieved with a grammar element which derives the empty string, and creates the new scope when it is recognized. This can be placed, for example, between a keyword like "if" and the code following it.

ast
struct scope {
    struct scope *parent;
    int child_count;
};
parse context
int scope_depth;
struct scope *scope_stack;
ast functions
static void scope_pop(struct parse_context *c)
{
    struct scope *s = c->scope_stack;

    c->scope_stack = s->parent;
    free(s);
    c->scope_depth -= 1;
}

static void scope_push(struct parse_context *c)
{
    struct scope *s = calloc(1, sizeof(*s));
    if (c->scope_stack)
        c->scope_stack->child_count += 1;
    s->parent = c->scope_stack;
    c->scope_stack = s;
    c->scope_depth += 1;
}
Grammar
$void
OpenScope -> ${ scope_push(config2context(config)); }$

Each variable records a scope depth and is in one of four states:

The scope depth is not greater than the current parse context scope nest depth. When the block of that depth closes, the state will change. To achieve this, all "in scope" variables are linked together as a stack in nesting order.

The scope depth is the depth of the last parallel block that enclosed the declaration, and that has closed.

variable fields
int depth, min_depth;
enum { OutScope, PendingScope, CondScope, InScope } scope;
struct variable *in_scope;
parse context
struct variable *in_scope;

All variables with the same name are linked together using the 'previous' link. Those variable that have been affirmatively merged all have a 'merged' pointer that points to one primary variable - the most recently declared instance. When merging variables, we need to also adjust the 'merged' pointer on any other variables that had previously been merged with the one that will no longer be primary.

A variable that is no longer the most recent instance of a name may still have "pending" scope, if it might still be merged with most recent instance. These variables don't really belong in the "in_scope" list, but are not immediately removed when a new instance is found. Instead, they are detected and ignored when considering the list of in_scope names.

variable fields
struct variable *merged;
ast functions
static void variable_merge(struct variable *primary, struct variable *secondary)
{
    struct variable *v;

    if (primary->merged)
        // shouldn't happen
        primary = primary->merged;

    for (v = primary->previous; v; v=v->previous)
        if (v == secondary || v == secondary->merged ||
            v->merged == secondary ||
            (v->merged && v->merged == secondary->merged)) {
            v->scope = OutScope;
            v->merged = primary;
        }
}
free context vars
while (context.varlist) {
    struct binding *b = context.varlist;
    struct variable *v = b->var;
    context.varlist = b->next;
    free(b);
    while (v) {
        struct variable *t = v;

        v = t->previous;
        free_value(t->val);
        if (t->min_depth == 0)
            // This is a global constant
            free_exec(t->where_decl);
        free(t);
    }
}

Manipulating Bindings

When a name is conditionally visible, a new declaration discards the old binding - the condition lapses. Conversely a usage of the name affirms the visibility and extends it to the end of the containing block - i.e. the block that contains both the original declaration and the latest usage. This is determined from min_depth. When a conditionally visible variable gets affirmed like this, it is also merged with other conditionally visible variables with the same name.

When we parse a variable declaration we either report an error if the name is currently bound, or create a new variable at the current nest depth if the name is unbound or bound to a conditionally scoped or pending-scope variable. If the previous variable was conditionally scoped, it and its homonyms becomes out-of-scope.

When we parse a variable reference (including non-declarative assignment) we report an error if the name is not bound or is bound to a pending-scope variable; update the scope if the name is bound to a conditionally scoped variable; or just proceed normally if the named variable is in scope.

When we exit a scope, any variables bound at this level are either marked out of scope or pending-scoped, depending on whether the scope was sequential or parallel. Here a "parallel" scope means the "then" or "else" part of a conditional, or any "case" or "else" branch of a switch. Other scopes are "sequential".

When exiting a parallel scope we check if there are any variables that were previously pending and are still visible. If there are, then there weren't redeclared in the most recent scope, so they cannot be merged and must become out-of-scope. If it is not the first of parallel scopes (based on child_count), we check that there was a previous binding that is still pending-scope. If there isn't, the new variable must now be out-of-scope.

When exiting a sequential scope that immediately enclosed parallel scopes, we need to resolve any pending-scope variables. If there was no else clause, and we cannot determine that the switch was exhaustive, we need to mark all pending-scope variable as out-of-scope. Otherwise all pending-scope variables become conditionally scoped.

ast
enum closetype { CloseSequential, CloseParallel, CloseElse };
ast functions
static struct variable *var_decl(struct parse_context *c, struct text s)
{
    struct binding *b = find_binding(c, s);
    struct variable *v = b->var;

    switch (v ? v->scope : OutScope) {
    case InScope:
        /* Caller will report the error */
        return NULL;
    case CondScope:
        for (;
             v && v->scope == CondScope;
             v = v->previous)
            v->scope = OutScope;
        break;
    default: break;
    }
    v = calloc(1, sizeof(*v));
    v->previous = b->var;
    b->var = v;
    v->name = b;
    v->min_depth = v->depth = c->scope_depth;
    v->scope = InScope;
    v->in_scope = c->in_scope;
    c->in_scope = v;
    v->val = val_prepare(NULL);
    return v;
}

static struct variable *var_ref(struct parse_context *c, struct text s)
{
    struct binding *b = find_binding(c, s);
    struct variable *v = b->var;
    struct variable *v2;

    switch (v ? v->scope : OutScope) {
    case OutScope:
    case PendingScope:
        /* Caller will report the error */
        return NULL;
    case CondScope:
        /* All CondScope variables of this name need to be merged
         * and become InScope
         */
        v->depth = v->min_depth;
        v->scope = InScope;
        for (v2 = v->previous;
             v2 && v2->scope == CondScope;
             v2 = v2->previous)
            variable_merge(v, v2);
        break;
    case InScope:
        break;
    }
    return v;
}

static void var_block_close(struct parse_context *c, enum closetype ct)
{
    /* Close off all variables that are in_scope */
    struct variable *v, **vp, *v2;

    scope_pop(c);
    for (vp = &c->in_scope;
         v = *vp, v && v->depth > c->scope_depth && v->min_depth > c->scope_depth;
         ) {
        if (v->name->var == v) switch (ct) {
        case CloseElse:
        case CloseParallel: /* handle PendingScope */
            switch(v->scope) {
            case InScope:
            case CondScope:
                if (c->scope_stack->child_count == 1)
                    v->scope = PendingScope;
                else if (v->previous &&
                         v->previous->scope == PendingScope)
                    v->scope = PendingScope;
                else if (v->val.type == Tlabel)
                    v->scope = PendingScope;
                else if (v->name->var == v)
                    v->scope = OutScope;
                if (ct == CloseElse) {
                    /* All Pending variables with this name
                     * are now Conditional */
                    for (v2 = v;
                         v2 && v2->scope == PendingScope;
                         v2 = v2->previous)
                        v2->scope = CondScope;
                }
                break;
            case PendingScope:
                for (v2 = v;
                     v2 && v2->scope == PendingScope;
                     v2 = v2->previous)
                    if (v2->val.type != Tlabel)
                        v2->scope = OutScope;
                break;
            case OutScope: break;
            }
            break;
        case CloseSequential:
            if (v->val.type == Tlabel)
                v->scope = PendingScope;
            switch (v->scope) {
            case InScope:
                v->scope = OutScope;
                break;
            case PendingScope:
                /* There was no 'else', so we can only become
                 * conditional if we know the cases were exhaustive,
                 * and that doesn't mean anything yet.
                 * So only labels become conditional..
                 */
                for (v2 = v;
                     v2 && v2->scope == PendingScope;
                     v2 = v2->previous)
                    if (v2->val.type == Tlabel) {
                        v2->scope = CondScope;
                        v2->min_depth = c->scope_depth;
                    } else
                        v2->scope = OutScope;
                break;
            case CondScope:
            case OutScope: break;
            }
            break;
        }
        if (v->scope == OutScope || v->name->var != v)
            *vp = v->in_scope;
        else
            vp = &v->in_scope;
    }
}

Executables

Executables can be lots of different things. In many cases an executable is just an operation combined with one or two other executables. This allows for expressions and lists etc. Other times an executable is something quite specific like a constant or variable name. So we define a struct exec to be a general executable with a type, and a struct binode which is a subclass of exec, forms a node in a binary tree, and holds an operation. There will be other subclasses, and to access these we need to be able to cast the exec into the various other types.

macros
#define cast(structname, pointer) ({        \
    const typeof( ((struct structname *)0)->type) *__mptr = &(pointer)->type; \
    if (__mptr && *__mptr != X##structname) abort();        \
    (struct structname *)( (char *)__mptr);})

#define new(structname) ({                      \
    struct structname *__ptr = ((struct structname *)calloc(1,sizeof(struct structname))); \
    __ptr->type = X##structname;                        \
    __ptr->line = -1; __ptr->column = -1;                   \
    __ptr;})

#define new_pos(structname, token) ({                       \
    struct structname *__ptr = ((struct structname *)calloc(1,sizeof(struct structname))); \
    __ptr->type = X##structname;                        \
    __ptr->line = token.line; __ptr->column = token.col;            \
    __ptr;})
ast
enum exec_types {
    Xbinode,
    ## exec type
};
struct exec {
    enum exec_types type;
    int line, column;
};
struct binode {
    struct exec;
    enum Btype {
        ## Binode types
    } op;
    struct exec *left, *right;
};
ast functions
static int __fput_loc(struct exec *loc, FILE *f)
{
    if (!loc)
        return 0;       // NOTEST
    if (loc->line >= 0) {
        fprintf(f, "%d:%d: ", loc->line, loc->column);
        return 1;
    }
    if (loc->type == Xbinode)
        return __fput_loc(cast(binode,loc)->left, f) ||
               __fput_loc(cast(binode,loc)->right, f);
    return 0;
}
static void fput_loc(struct exec *loc, FILE *f)
{
    if (!__fput_loc(loc, f))
        fprintf(f, "??:??: ");  // NOTEST
}

Each different type of exec node needs a number of functions defined, a bit like methods. We must be able to be able to free it, print it, analyse it and execute it. Once we have specific exec types we will need to parse them too. Let's take this a bit more slowly.

Freeing

The parser generator requires a free_foo function for each struct that stores attributes and they will often be execs and subtypes there-of. So we need free_exec which can handle all the subtypes, and we need free_binode.

ast functions
static void free_binode(struct binode *b)
{
    if (!b)
        return;
    free_exec(b->left);
    free_exec(b->right);
    free(b);
}
core functions
static void free_exec(struct exec *e)
{
    if (!e)
        return;
    switch(e->type) {
        ## free exec cases
    }
}
forward decls
static void free_exec(struct exec *e);
free exec cases
case Xbinode: free_binode(cast(binode, e)); break;

Printing

Printing an exec requires that we know the current indent level for printing line-oriented components. As will become clear later, we also want to know what sort of bracketing to use.

ast functions
static void do_indent(int i, char *str)
{
    while (i--)
        printf("    ");
    printf("%s", str);
}
core functions
static void print_binode(struct binode *b, int indent, int bracket)
{
    struct binode *b2;
    switch(b->op) {
    ## print binode cases
    }
}

static void print_exec(struct exec *e, int indent, int bracket)
{
    if (!e)
        return;     // NOTEST
    switch (e->type) {
    case Xbinode:
        print_binode(cast(binode, e), indent, bracket); break;
    ## print exec cases
    }
}
forward decls
static void print_exec(struct exec *e, int indent, int bracket);

Analysing

As discussed, analysis involves propagating type requirements around the program and looking for errors.

So propagate_types is passed an expected type (being a struct type pointer together with some val_rules flags) that the exec is expected to return, and returns the type that it does return, either of which can be NULL signifying "unknown". An ok flag is passed by reference. It is set to 0 when an error is found, and 2 when any change is made. If it remains unchanged at 1, then no more propagation is needed.

ast
enum val_rules {Rnolabel = 1<<0, Rboolok = 1<<1, Rnoconstant = 2<<1};
format cases
case 'r':
    if (rules & Rnolabel)
        fputs(" (labels not permitted)", stderr);
    break;
core functions
static struct type *propagate_types(struct exec *prog, struct parse_context *c, int *ok,
                                    struct type *type, int rules)
{
    struct type *t;

    if (!prog)
        return Tnone;

    switch (prog->type) {
    case Xbinode:
    {
        struct binode *b = cast(binode, prog);
        switch (b->op) {
        ## propagate binode cases
        }
        break;
    }
    ## propagate exec cases
    }
    return Tnone;
}

Interpreting

Interpreting an exec doesn't require anything but the exec. State is stored in variables and each variable will be directly linked from within the exec tree. The exception to this is the whole program which needs to look at command line arguments. The program will be interpreted separately.

Each exec can return a value, which may be Tnone but must be non-NULL; Some execs will return the location of a value, which can be updates. To support this, each exec case must store either a value in val or the pointer to a value in lval. If lval is set, but a simple value is required, inter_exec() will dereference lval to get the value.

core functions
struct lrval {
    struct value val, *lval;
};

static struct lrval _interp_exec(struct exec *e);

static struct value interp_exec(struct exec *e)
{
    struct lrval ret = _interp_exec(e);

    if (ret.lval)
        return dup_value(*ret.lval);
    else
        return ret.val;
}

static struct value *linterp_exec(struct exec *e)
{
    struct lrval ret = _interp_exec(e);

    return ret.lval;
}

static struct lrval _interp_exec(struct exec *e)
{
    struct lrval ret;
    struct value rv, *lrv = NULL;
    rv.type = Tnone;
    if (!e) {
        ret.lval = lrv;
        ret.val = rv;
        return ret;
    }

    switch(e->type) {
    case Xbinode:
    {
        struct binode *b = cast(binode, e);
        struct value left, right, *lleft;
        left.type = right.type = Tnone;
        switch (b->op) {
        ## interp binode cases
        }
        free_value(left); free_value(right);
        break;
    }
    ## interp exec cases
    }
    ret.lval = lrv;
    ret.val = rv;
    return ret;
}

Complex types

Now that we have the shape of the interpreter in place we can add some complex types and connected them in to the data structures and the different phases of parse, analyse, print, interpret.

Thus far we have arrays and structs.

Arrays

Arrays can be declared by giving a size and a type, as [size]type' sofreq:[26]numberdeclaresfreq` to be an array of 26 numbers. The size can be an arbitrary expression which is evaluated when the name comes into scope.

Arrays cannot be assigned. When pointers are introduced we will also introduce array slices which can refer to part or all of an array - the assignment syntax will create a slice. For now, an array can only ever be referenced by the name it is declared with. It is likely that a "copy" primitive will eventually be define which can be used to make a copy of an array with controllable depth.

type union fields
struct {
    int size;
    struct variable *vsize;
    struct type *member;
} array;
value union fields
struct {
    struct value *elmnts;
} array;
value functions
static struct value array_prepare(struct type *type)
{
    struct value ret;

    ret.type = type;
    ret.array.elmnts = NULL;
    return ret;
}

static struct value array_init(struct type *type)
{
    struct value ret;
    int i;

    ret.type = type;
    if (type->array.vsize) {
        mpz_t q;
        mpz_init(q);
        mpz_tdiv_q(q, mpq_numref(type->array.vsize->val.num),
                   mpq_denref(type->array.vsize->val.num));
        type->array.size = mpz_get_si(q);
        mpz_clear(q);
    }
    ret.array.elmnts = calloc(type->array.size,
                              sizeof(ret.array.elmnts[0]));
    for (i = 0; ret.array.elmnts && i < type->array.size; i++)
        ret.array.elmnts[i] = val_init(type->array.member);
    return ret;
}

static void array_free(struct value val)
{
    int i;

    if (val.array.elmnts)
        for (i = 0; i < val.type->array.size; i++)
            free_value(val.array.elmnts[i]);
    free(val.array.elmnts);
}

static int array_compat(struct type *require, struct type *have)
{
    if (have->compat != require->compat)
        return 0;
    /* Both are arrays, so we can look at details */
    if (!type_compat(require->array.member, have->array.member, 0))
        return 0;
    if (require->array.vsize == NULL && have->array.vsize == NULL)
        return require->array.size == have->array.size;

    return require->array.vsize == have->array.vsize;
}

static void array_print_type(struct type *type, FILE *f)
{
    fputs("[", f);
    if (type->array.vsize) {
        struct binding *b = type->array.vsize->name;
        fprintf(f, "%.*s]", b->name.len, b->name.txt);
    } else
        fprintf(f, "%d]", type->array.size);
    type_print(type->array.member, f);
}

static struct type array_prototype = {
    .prepare = array_prepare,
    .init = array_init,
    .print_type = array_print_type,
    .compat = array_compat,
    .free = array_free,
};
type grammar
| [ NUMBER ] Type ${
    $0 = calloc(1, sizeof(struct type));
    *($0) = array_prototype;
    $0->array.member = $<4;
    $0->array.vsize = NULL;
    {
    struct parse_context *c = config2context(config);
    char tail[3];
    mpq_t num;
    if (number_parse(num, tail, $2.txt) == 0)
        tok_err(c, "error: unrecognised number", &$2);
    else if (tail[0])
        tok_err(c, "error: unsupported number suffix", &$2);
    else {
        $0->array.size = mpz_get_ui(mpq_numref(num));
        if (mpz_cmp_ui(mpq_denref(num), 1) != 0) {
            tok_err(c, "error: array size must be an integer",
                    &$2);
        } else if (mpz_cmp_ui(mpq_numref(num), 1UL << 30) >= 0)
            tok_err(c, "error: array size is too large",
                    &$2);
        mpq_clear(num);
    }
    $0->next= c->anon_typelist;
    c->anon_typelist = $0;
    }
}$

| [ IDENTIFIER ] Type ${ {
    struct parse_context *c = config2context(config);
    struct variable *v = var_ref(c, $2.txt);

    if (!v)
        tok_err(config2context(config), "error: name undeclared", &$2);
    else if (!v->constant)
        tok_err(config2context(config), "error: array size must be a constant", &$2);

    $0 = calloc(1, sizeof(struct type));
    *($0) = array_prototype;
    $0->array.member = $<4;
    $0->array.size = 0;
    $0->array.vsize = v;
    $0->next= c->anon_typelist;
    c->anon_typelist = $0;
} }$
parse context
struct type *anon_typelist;
free context types
while (context.anon_typelist) {
    struct type *t = context.anon_typelist;

    context.anon_typelist = t->next;
    free(t);
}
Binode types
Index,
variable grammar
| Variable [ Expression ] ${ {
    struct binode *b = new(binode);
    b->op = Index;
    b->left = $<1;
    b->right = $<3;
    $0 = b;
} }$
print binode cases
case Index:
    print_exec(b->left, -1, 0);
    printf("[");
    print_exec(b->right, -1, 0);
    printf("]");
    break;
propagate binode cases
case Index:
    /* left must be an array, right must be a number,
     * result is the member type of the array
     */
    propagate_types(b->right, c, ok, Tnum, 0);
    t = propagate_types(b->left, c, ok, NULL, rules & Rnoconstant);
    if (!t || t->compat != array_compat) {
        type_err(c, "error: %1 cannot be indexed", prog, t, 0, NULL);
        *ok = 0;
        return NULL;
    } else {
        if (!type_compat(type, t->array.member, rules)) {
            type_err(c, "error: have %1 but need %2", prog,
                     t->array.member, rules, type);
            *ok = 0;
        }
        return t->array.member;
    }
    break;
interp binode cases
case Index: {
    mpz_t q;
    long i;

    lleft = linterp_exec(b->left);
    right = interp_exec(b->right);
    mpz_init(q);
    mpz_tdiv_q(q, mpq_numref(right.num), mpq_denref(right.num));
    i = mpz_get_si(q);
    mpz_clear(q);

    if (i >= 0 && i < lleft->type->array.size)
        lrv = &lleft->array.elmnts[i];
    else
        rv = val_init(lleft->type->array.member);
    break;
}

Structs

A struct is a data-type that contains one or more other data-types. It differs from an array in that each member can be of a different type, and they are accessed by name rather than by number. Thus you cannot choose an element by calculation, you need to know what you want up-front.

The language makes no promises about how a given structure will be stored in memory - it is free to rearrange fields to suit whatever criteria seems important.

Structs are declared separately from program code - they cannot be declared in-line in a variable declaration like arrays can. A struct is given a name and this name is used to identify the type - the name is not prefixed by the word struct as it would be in C.

Structs are only treated as the same if they have the same name. Simply having the same fields in the same order is not enough. This might change once we can create structure initializes from a list of values.

Each component datum is identified much like a variable is declared, with a name, one or two colons, and a type. The type cannot be omitted as there is no opportunity to deduce the type from usage. An initial value can be given following an equals sign, so

Example: a struct type
struct complex:
    x:number = 0
    y:number = 0

would declare a type called "complex" which has two number fields, each initialised to zero.

Struct will need to be declared separately from the code that uses them, so we will need to be able to print out the declaration of a struct when reprinting the whole program. So a print_type_decl type function will be needed.

type union fields
struct {
    int nfields;
    struct field {
        struct text name;
        struct type *type;
        struct value init;
    } *fields;
} structure;
value union fields
struct {
    struct value *fields;
} structure;
type functions
void (*print_type_decl)(struct type *type, FILE *f);
value functions
static struct value structure_prepare(struct type *type)
{
    struct value ret;

    ret.type = type;
    ret.structure.fields = NULL;
    return ret;
}

static struct value structure_init(struct type *type)
{
    struct value ret;
    int i;

    ret.type = type;
    ret.structure.fields = calloc(type->structure.nfields,
                                  sizeof(ret.structure.fields[0]));
    for (i = 0; ret.structure.fields && i < type->structure.nfields; i++)
        ret.structure.fields[i] = val_init(type->structure.fields[i].type);
    return ret;
}

static void structure_free(struct value val)
{
    int i;

    if (val.structure.fields)
        for (i = 0; i < val.type->structure.nfields; i++)
            free_value(val.structure.fields[i]);
    free(val.structure.fields);
}

static void structure_free_type(struct type *t)
{
    int i;
    for (i = 0; i < t->structure.nfields; i++)
        free_value(t->structure.fields[i].init);
    free(t->structure.fields);
}

static struct type structure_prototype = {
    .prepare = structure_prepare,
    .init = structure_init,
    .free = structure_free,
    .free_type = structure_free_type,
    .print_type_decl = structure_print_type,
};
exec type
Xfieldref,
ast
struct fieldref {
    struct exec;
    struct exec *left;
    int index;
    struct text name;
};
free exec cases
case Xfieldref:
    free_exec(cast(fieldref, e)->left);
    free(e);
    break;
variable grammar
| Variable . IDENTIFIER ${ {
    struct fieldref *fr = new_pos(fieldref, $2);
    fr->left = $<1;
    fr->name = $3.txt;
    fr->index = -2;
    $0 = fr;
} }$
print exec cases
case Xfieldref:
{
    struct fieldref *f = cast(fieldref, e);
    print_exec(f->left, -1, 0);
    printf(".%.*s", f->name.len, f->name.txt);
    break;
}
ast functions
static int find_struct_index(struct type *type, struct text field)
{
    int i;
    for (i = 0; i < type->structure.nfields; i++)
        if (text_cmp(type->structure.fields[i].name, field) == 0)
            return i;
    return -1;
}
propagate exec cases
case Xfieldref:
{
    struct fieldref *f = cast(fieldref, prog);
    struct type *st = propagate_types(f->left, c, ok, NULL, 0);

    if (!st)
        type_err(c, "error: unknown type for field access", f->left,
                 NULL, 0, NULL);
    else if (st->prepare != structure_prepare)
        type_err(c, "error: field reference attempted on %1, not a struct",
                 f->left, st, 0, NULL);
    else if (f->index == -2) {
        f->index = find_struct_index(st, f->name);
        if (f->index < 0) {
            type_err(c, "error: cannot find requested field in %1",
                     f->left, st, 0, NULL);
            *ok = 0;
        }
    }
    if (f->index >= 0) {
        struct type *ft = st->structure.fields[f->index].type;
        if (!type_compat(type, ft, rules)) {
            type_err(c, "error: have %1 but need %2", prog,
                     ft, rules, type);
            *ok = 0;
        }
        return ft;
    }
    break;
}
interp exec cases
case Xfieldref:
{
    struct fieldref *f = cast(fieldref, e);
    struct value *lleft = linterp_exec(f->left);
    lrv = &lleft->structure.fields[f->index];
    break;
}
ast
struct fieldlist {
    struct fieldlist *prev;
    struct field f;
};
ast functions
static void free_fieldlist(struct fieldlist *f)
{
    if (!f)
        return;
    free_fieldlist(f->prev);
    free_value(f->f.init);
    free(f);
}
top level grammar
DeclareStruct -> struct IDENTIFIER FieldBlock ${ {
    struct type *t =
        add_type(config2context(config), $2.txt, &structure_prototype);
    int cnt = 0;
    struct fieldlist *f;

    for (f = $3; f; f=f->prev)
        cnt += 1;

    t->structure.nfields = cnt;
    t->structure.fields = calloc(cnt, sizeof(struct field));
    f = $3;
    while (cnt > 0) {
        cnt -= 1;
        t->structure.fields[cnt] = f->f;
        f->f.init = val_prepare(Tnone);
        f = f->prev;
    }
} }$

$*fieldlist
FieldBlock -> Open SimpleFieldList Close ${ $0 = $<2; }$
    | Open Newlines SimpleFieldList Close ${ $0 = $<3; }$
    | : FieldList  ${ $0 = $<2; }$

FieldList -> Field NEWLINE ${ $0 = $<1; }$
    | FieldList NEWLINE ${ $0 = $<1; }$
    | FieldList Field NEWLINE ${
        $2->prev = $<1;
        $0 = $<2;
    }$

SimpleFieldList -> Field ; ${ $0 = $<1; }$
    | SimpleFieldList Field ; ${
        $2->prev = $<1;
        $0 = $<2;
    }$

Field -> IDENTIFIER : Type = Expression ${ {
        int ok;

        $0 = calloc(1, sizeof(struct fieldlist));
        $0->f.name = $1.txt;
        $0->f.type = $<3;
        $0->f.init = val_prepare($0->f.type);
        do {
            ok = 1;
            propagate_types($<5, config2context(config), &ok, $3, 0);
        } while (ok == 2);
        if (!ok)
            config2context(config)->parse_error = 1;
        else
            $0->f.init = interp_exec($5);
    } }$
    | IDENTIFIER : Type ${
        $0 = calloc(1, sizeof(struct fieldlist));
        $0->f.name = $1.txt;
        $0->f.type = $<3;
        $0->f.init = val_init($3);
    }$
forward decls
static void structure_print_type(struct type *t, FILE *f);
value functions
static void structure_print_type(struct type *t, FILE *f)
{
    int i;

    fprintf(f, "struct %.*s:\n", t->name.len, t->name.txt);

    for (i = 0; i < t->structure.nfields; i++) {
        struct field *fl = t->structure.fields + i;
        fprintf(f, "    %.*s : ", fl->name.len, fl->name.txt);
        type_print(fl->type, f);
        if (fl->init.type->print) {
            fprintf(f, " = ");
            if (fl->init.type == Tstr)
                fprintf(f, "\"");
            print_value(fl->init);
            if (fl->init.type == Tstr)
                fprintf(f, "\"");
        }
        printf("\n");
    }
}
print type decls
{
    struct type *t;
    int target = -1;

    while (target != 0) {
        int i = 0;
        for (t = context.typelist; t ; t=t->next)
            if (t->print_type_decl) {
                i += 1;
                if (i == target)
                    break;
            }

        if (target == -1) {
            target = i;
        } else {
            t->print_type_decl(t, stdout);
            target -= 1;
        }
    }
}

Executables: the elements of code

Each code element needs to be parsed, printed, analysed, interpreted, and freed. There are several, so let's just start with the easy ones and work our way up.

Values

We have already met values as separate objects. When manifest constants appear in the program text, that must result in an executable which has a constant value. So the val structure embeds a value in an executable.

exec type
Xval,
ast
struct val {
    struct exec;
    struct value val;
};
Grammar
$*val
Value ->  True ${
        $0 = new_pos(val, $1);
        $0->val.type = Tbool;
        $0->val.bool = 1;
        }$
    | False ${
        $0 = new_pos(val, $1);
        $0->val.type = Tbool;
        $0->val.bool = 0;
        }$
    | NUMBER ${
        $0 = new_pos(val, $1);
        $0->val.type = Tnum;
        {
        char tail[3];
        if (number_parse($0->val.num, tail, $1.txt) == 0)
            mpq_init($0->val.num);
            if (tail[0])
                tok_err(config2context(config), "error: unsupported number suffix",
                        &$1);
        }
        }$
    | STRING ${
        $0 = new_pos(val, $1);
        $0->val.type = Tstr;
        {
        char tail[3];
        string_parse(&$1, '\\', &$0->val.str, tail);
        if (tail[0])
            tok_err(config2context(config), "error: unsupported string suffix",
                    &$1);
        }
        }$
    | MULTI_STRING ${
        $0 = new_pos(val, $1);
        $0->val.type = Tstr;
        {
        char tail[3];
        string_parse(&$1, '\\', &$0->val.str, tail);
        if (tail[0])
            tok_err(config2context(config), "error: unsupported string suffix",
                    &$1);
        }
        }$
print exec cases
case Xval:
{
    struct val *v = cast(val, e);
    if (v->val.type == Tstr)
        printf("\"");
    print_value(v->val);
    if (v->val.type == Tstr)
        printf("\"");
    break;
}
propagate exec cases
case Xval:
{
    struct val *val = cast(val, prog);
    if (!type_compat(type, val->val.type, rules)) {
        type_err(c, "error: expected %1%r found %2",
                   prog, type, rules, val->val.type);
        *ok = 0;
    }
    return val->val.type;
}
interp exec cases
case Xval:
    rv = dup_value(cast(val, e)->val);
    break;
ast functions
static void free_val(struct val *v)
{
    if (!v)
        return;
    free_value(v->val);
    free(v);
}
free exec cases
case Xval: free_val(cast(val, e)); break;
ast functions
// Move all nodes from 'b' to 'rv', reversing the order.
// In 'b' 'left' is a list, and 'right' is the last node.
// In 'rv', left' is the first node and 'right' is a list.
static struct binode *reorder_bilist(struct binode *b)
{
    struct binode *rv = NULL;

    while (b) {
        struct exec *t = b->right;
        b->right = rv;
        rv = b;
        if (b->left)
            b = cast(binode, b->left);
        else
            b = NULL;
        rv->left = t;
    }
    return rv;
}

Variables

Just as we used a val to wrap a value into an exec, we similarly need a var to wrap a variable into an exec. While each val contained a copy of the value, each var hold a link to the variable because it really is the same variable no matter where it appears. When a variable is used, we need to remember to follow the ->merged link to find the primary instance.

exec type
Xvar,
ast
struct var {
    struct exec;
    struct variable *var;
};
Grammar
$*var
VariableDecl -> IDENTIFIER : ${ {
    struct variable *v = var_decl(config2context(config), $1.txt);
    $0 = new_pos(var, $1);
    $0->var = v;
    if (v)
        v->where_decl = $0;
    else {
        v = var_ref(config2context(config), $1.txt);
        $0->var = v;
        type_err(config2context(config), "error: variable '%v' redeclared",
                 $0, NULL, 0, NULL);
        type_err(config2context(config), "info: this is where '%v' was first declared",
                 v->where_decl, NULL, 0, NULL);
    }
} }$
    | IDENTIFIER :: ${ {
    struct variable *v = var_decl(config2context(config), $1.txt);
    $0 = new_pos(var, $1);
    $0->var = v;
    if (v) {
        v->where_decl = $0;
        v->constant = 1;
    } else {
        v = var_ref(config2context(config), $1.txt);
        $0->var = v;
        type_err(config2context(config), "error: variable '%v' redeclared",
                 $0, NULL, 0, NULL);
        type_err(config2context(config), "info: this is where '%v' was first declared",
                 v->where_decl, NULL, 0, NULL);
    }
} }$
    | IDENTIFIER : Type ${ {
    struct variable *v = var_decl(config2context(config), $1.txt);
    $0 = new_pos(var, $1);
    $0->var = v;
    if (v) {
        v->where_decl = $0;
        v->where_set = $0;
        v->val = val_prepare($<3);
    } else {
        v = var_ref(config2context(config), $1.txt);
        $0->var = v;
        type_err(config2context(config), "error: variable '%v' redeclared",
                 $0, NULL, 0, NULL);
        type_err(config2context(config), "info: this is where '%v' was first declared",
                 v->where_decl, NULL, 0, NULL);
    }
} }$
    | IDENTIFIER :: Type ${ {
    struct variable *v = var_decl(config2context(config), $1.txt);
    $0 = new_pos(var, $1);
    $0->var = v;
    if (v) {
        v->where_decl = $0;
        v->where_set = $0;
        v->val = val_prepare($<3);
        v->constant = 1;
    } else {
        v = var_ref(config2context(config), $1.txt);
        $0->var = v;
        type_err(config2context(config), "error: variable '%v' redeclared",
                 $0, NULL, 0, NULL);
        type_err(config2context(config), "info: this is where '%v' was first declared",
                 v->where_decl, NULL, 0, NULL);
    }
} }$

$*exec
Variable -> IDENTIFIER ${ {
    struct variable *v = var_ref(config2context(config), $1.txt);
    $0 = new_pos(var, $1);
    if (v == NULL) {
        /* This might be a label - allocate a var just in case */
        v = var_decl(config2context(config), $1.txt);
        if (v) {
            v->val = val_prepare(Tlabel);
            v->val.label = &v->val;
            v->where_set = $0;
        }
    }
    cast(var, $0)->var = v;
} }$
## variable grammar

$*type
Type -> IDENTIFIER ${
    $0 = find_type(config2context(config), $1.txt);
    if (!$0) {
        tok_err(config2context(config),
            "error: undefined type", &$1);

        $0 = Tnone;
    }
}$
## type grammar
print exec cases
case Xvar:
{
    struct var *v = cast(var, e);
    if (v->var) {
        struct binding *b = v->var->name;
        printf("%.*s", b->name.len, b->name.txt);
    }
    break;
}
format cases
case 'v':
    if (loc->type == Xvar) {
        struct var *v = cast(var, loc);
        if (v->var) {
            struct binding *b = v->var->name;
            fprintf(stderr, "%.*s", b->name.len, b->name.txt);
        } else
            fputs("???", stderr);   // NOTEST
    } else
        fputs("NOTVAR", stderr);    // NOTEST
    break;
propagate exec cases
case Xvar:
{
    struct var *var = cast(var, prog);
    struct variable *v = var->var;
    if (!v) {
        type_err(c, "%d:BUG: no variable!!", prog, NULL, 0, NULL); // NOTEST
        *ok = 0;                    // NOTEST
        return Tnone;                   // NOTEST
    }
    if (v->merged)
        v = v->merged;
    if (v->constant && (rules & Rnoconstant)) {
        type_err(c, "error: Cannot assign to a constant: %v",
                 prog, NULL, 0, NULL);
        type_err(c, "info: name was defined as a constant here",
                 v->where_decl, NULL, 0, NULL);
        *ok = 0;
        return v->val.type;
    }
    if (v->val.type == NULL) {
        if (type && *ok != 0) {
            v->val = val_prepare(type);
            v->where_set = prog;
            *ok = 2;
        }
        return type;
    }
    if (!type_compat(type, v->val.type, rules)) {
        type_err(c, "error: expected %1%r but variable '%v' is %2", prog,
                 type, rules, v->val.type);
        type_err(c, "info: this is where '%v' was set to %1", v->where_set,
                 v->val.type, rules, NULL);
        *ok = 0;
    }
    if (!type)
        return v->val.type;
    return type;
}
interp exec cases
case Xvar:
{
    struct var *var = cast(var, e);
    struct variable *v = var->var;

    if (v->merged)
        v = v->merged;
    lrv = &v->val;
    break;
}
ast functions
static void free_var(struct var *v)
{
    free(v);
}
free exec cases
case Xvar: free_var(cast(var, e)); break;

Expressions: Conditional

Our first user of the binode will be conditional expressions, which is a bit odd as they actually have three components. That will be handled by having 2 binodes for each expression. The conditional expression is the lowest precedence operatior, so it gets to define what an "Expression" is. The next level up is "BoolExpr", which comes next.

Conditional expressions are of the form "value if condition else other_value". They associate to the right, so everything to the right of else is part of an else value, while only the BoolExpr to the left of if is the if values. Between if and else there is no room for ambiguity, so a full conditional expression is allowed in there.

Binode types
CondExpr,
Grammar
$*exec
Expression -> BoolExpr if Expression else Expression ${ {
        struct binode *b1 = new(binode);
        struct binode *b2 = new(binode);
        b1->op = CondExpr;
        b1->left = $<3;
        b1->right = b2;
        b2->op = CondExpr;
        b2->left = $<1;
        b2->right = $<5;
        $0 = b1;
    } }$
    | BoolExpr ${ $0 = $<1; }$
print binode cases
case CondExpr:
    b2 = cast(binode, b->right);
    print_exec(b2->left, -1, 0);
    printf(" if ");
    print_exec(b->left, -1, 0);
    printf(" else ");
    print_exec(b2->right, -1, 0);
    break;
propagate binode cases
case CondExpr: {
    /* cond must be Tbool, others must match */
    struct binode *b2 = cast(binode, b->right);
    struct type *t2;

    propagate_types(b->left, c, ok, Tbool, 0);
    t = propagate_types(b2->left, c, ok, type, Rnolabel);
    t2 = propagate_types(b2->right, c, ok, type ?: t, Rnolabel);
    return t ?: t2;
}
interp binode cases
case CondExpr: {
    struct binode *b2 = cast(binode, b->right);
    left = interp_exec(b->left);
    if (left.bool)
        rv = interp_exec(b2->left);
    else
        rv = interp_exec(b2->right);
    }
    break;

Expressions: Boolean

The next class of expressions to use the binode will be Boolean expressions. As I haven't implemented precedence in the parser generator yet, we need different names for each precedence level used by expressions. The outer most or lowest level precedence after conditional expressions are Boolean operators which form an BoolExpr out of BTerms and BFacts. As well as or and, and not we have and then and or else which only evaluate the second operand if the result would make a difference.

Binode types
And,
AndThen,
Or,
OrElse,
Not,
Grammar
$*exec
BoolExpr -> BoolExpr or BTerm ${ {
        struct binode *b = new(binode);
        b->op = Or;
        b->left = $<1;
        b->right = $<3;
        $0 = b;
    } }$
    | BoolExpr or else BTerm ${ {
        struct binode *b = new(binode);
        b->op = OrElse;
        b->left = $<1;
        b->right = $<4;
        $0 = b;
    } }$
    | BTerm ${ $0 = $<1; }$

BTerm -> BTerm and BFact ${ {
        struct binode *b = new(binode);
        b->op = And;
        b->left = $<1;
        b->right = $<3;
        $0 = b;
    } }$
    | BTerm and then BFact ${ {
        struct binode *b = new(binode);
        b->op = AndThen;
        b->left = $<1;
        b->right = $<4;
        $0 = b;
    } }$
    | BFact ${ $0 = $<1; }$

BFact -> not BFact ${ {
        struct binode *b = new(binode);
        b->op = Not;
        b->right = $<2;
        $0 = b;
    } }$
    ## other BFact
print binode cases
case And:
    print_exec(b->left, -1, 0);
    printf(" and ");
    print_exec(b->right, -1, 0);
    break;
case AndThen:
    print_exec(b->left, -1, 0);
    printf(" and then ");
    print_exec(b->right, -1, 0);
    break;
case Or:
    print_exec(b->left, -1, 0);
    printf(" or ");
    print_exec(b->right, -1, 0);
    break;
case OrElse:
    print_exec(b->left, -1, 0);
    printf(" or else ");
    print_exec(b->right, -1, 0);
    break;
case Not:
    printf("not ");
    print_exec(b->right, -1, 0);
    break;
propagate binode cases
case And:
case AndThen:
case Or:
case OrElse:
case Not:
    /* both must be Tbool, result is Tbool */
    propagate_types(b->left, c, ok, Tbool, 0);
    propagate_types(b->right, c, ok, Tbool, 0);
    if (type && type != Tbool) {
        type_err(c, "error: %1 operation found where %2 expected", prog,
               Tbool, 0, type);
        *ok = 0;
    }
    return Tbool;
interp binode cases
case And:
    rv = interp_exec(b->left);
    right = interp_exec(b->right);
    rv.bool = rv.bool && right.bool;
    break;
case AndThen:
    rv = interp_exec(b->left);
    if (rv.bool)
        rv = interp_exec(b->right);
    break;
case Or:
    rv = interp_exec(b->left);
    right = interp_exec(b->right);
    rv.bool = rv.bool || right.bool;
    break;
case OrElse:
    rv = interp_exec(b->left);
    if (!rv.bool)
        rv = interp_exec(b->right);
    break;
case Not:
    rv = interp_exec(b->right);
    rv.bool = !rv.bool;
    break;

Expressions: Comparison

Of slightly higher precedence that Boolean expressions are Comparisons. A comparison takes arguments of any comparable type, but the two types must be the same.

To simplify the parsing we introduce an eop which can record an expression operator.

ast
struct eop {
    enum Btype op;
};
ast functions
static void free_eop(struct eop *e)
{
    if (e)
        free(e);
}
Binode types
Less,
Gtr,
LessEq,
GtrEq,
Eql,
NEql,
other BFact
| Expr CMPop Expr ${ {
    struct binode *b = new(binode);
    b->op = $2.op;
    b->left = $<1;
    b->right = $<3;
    $0 = b;
} }$
| Expr ${ $0 = $<1; }$
Grammar
$eop
CMPop ->   < ${ $0.op = Less; }$
    |  > ${ $0.op = Gtr; }$
    |  <= ${ $0.op = LessEq; }$
    |  >= ${ $0.op = GtrEq; }$
    |  == ${ $0.op = Eql; }$
    |  != ${ $0.op = NEql; }$
print binode cases
case Less:
case LessEq:
case Gtr:
case GtrEq:
case Eql:
case NEql:
    print_exec(b->left, -1, 0);
    switch(b->op) {
    case Less:   printf(" < "); break;
    case LessEq: printf(" <= "); break;
    case Gtr:    printf(" > "); break;
    case GtrEq:  printf(" >= "); break;
    case Eql:    printf(" == "); break;
    case NEql:   printf(" != "); break;
    default: abort();       // NOTEST
    }
    print_exec(b->right, -1, 0);
    break;
propagate binode cases
case Less:
case LessEq:
case Gtr:
case GtrEq:
case Eql:
case NEql:
    /* Both must match but not be labels, result is Tbool */
    t = propagate_types(b->left, c, ok, NULL, Rnolabel);
    if (t)
        propagate_types(b->right, c, ok, t, 0);
    else {
        t = propagate_types(b->right, c, ok, NULL, Rnolabel);
        if (t)
            t = propagate_types(b->left, c, ok, t, 0);
    }
    if (!type_compat(type, Tbool, 0)) {
        type_err(c, "error: Comparison returns %1 but %2 expected", prog,
                    Tbool, rules, type);
        *ok = 0;
    }
    return Tbool;
interp binode cases
case Less:
case LessEq:
case Gtr:
case GtrEq:
case Eql:
case NEql:
{
    int cmp;
    left = interp_exec(b->left);
    right = interp_exec(b->right);
    cmp = value_cmp(left, right);
    rv.type = Tbool;
    switch (b->op) {
    case Less:  rv.bool = cmp <  0; break;
    case LessEq:    rv.bool = cmp <= 0; break;
    case Gtr:   rv.bool = cmp >  0; break;
    case GtrEq: rv.bool = cmp >= 0; break;
    case Eql:   rv.bool = cmp == 0; break;
    case NEql:  rv.bool = cmp != 0; break;
    default: rv.bool = 0; break;    // NOTEST
    }
    break;
}

Expressions: The rest

The remaining expressions with the highest precedence are arithmetic and string concatenation. They are Expr, Term, and Factor. The Factor is where the Value and Variable that we already have are included.

+ and - are both infix and prefix operations (where they are absolute value and negation). These have different operator names.

We also have a 'Bracket' operator which records where parentheses were found. This makes it easy to reproduce these when printing. Once precedence is handled better I might be able to discard this.

Binode types
Plus, Minus,
Times, Divide, Rem,
Concat,
Absolute, Negate,
Bracket,
Grammar
$*exec
Expr -> Expr Eop Term ${ {
        struct binode *b = new(binode);
        b->op = $2.op;
        b->left = $<1;
        b->right = $<3;
        $0 = b;
    } }$
    | Term ${ $0 = $<1; }$

Term -> Term Top Factor ${ {
        struct binode *b = new(binode);
        b->op = $2.op;
        b->left = $<1;
        b->right = $<3;
        $0 = b;
    } }$
    | Factor ${ $0 = $<1; }$

Factor -> ( Expression ) ${ {
        struct binode *b = new_pos(binode, $1);
        b->op = Bracket;
        b->right = $<2;
        $0 = b;
    } }$
    | Uop Factor ${ {
        struct binode *b = new(binode);
        b->op = $1.op;
        b->right = $<2;
        $0 = b;
    } }$
    | Value ${ $0 = $<1; }$
    | Variable ${ $0 = $<1; }$

$eop
Eop ->    + ${ $0.op = Plus; }$
    | - ${ $0.op = Minus; }$

Uop ->    + ${ $0.op = Absolute; }$
    | - ${ $0.op = Negate; }$

Top ->    * ${ $0.op = Times; }$
    | / ${ $0.op = Divide; }$
    | % ${ $0.op = Rem; }$
    | ++ ${ $0.op = Concat; }$
print binode cases
case Plus:
case Minus:
case Times:
case Divide:
case Concat:
case Rem:
    print_exec(b->left, indent, 0);
    switch(b->op) {
    case Plus:   fputs(" + ", stdout); break;
    case Minus:  fputs(" - ", stdout); break;
    case Times:  fputs(" * ", stdout); break;
    case Divide: fputs(" / ", stdout); break;
    case Rem:    fputs(" % ", stdout); break;
    case Concat: fputs(" ++ ", stdout); break;
    default: abort();   // NOTEST
    }           // NOTEST
    print_exec(b->right, indent, 0);
    break;
case Absolute:
    printf("+");
    print_exec(b->right, indent, 0);
    break;
case Negate:
    printf("-");
    print_exec(b->right, indent, 0);
    break;
case Bracket:
    printf("(");
    print_exec(b->right, indent, 0);
    printf(")");
    break;
propagate binode cases
case Plus:
case Minus:
case Times:
case Rem:
case Divide:
    /* both must be numbers, result is Tnum */
case Absolute:
case Negate:
    /* as propagate_types ignores a NULL,
     * unary ops fit here too */
    propagate_types(b->left, c, ok, Tnum, 0);
    propagate_types(b->right, c, ok, Tnum, 0);
    if (!type_compat(type, Tnum, 0)) {
        type_err(c, "error: Arithmetic returns %1 but %2 expected", prog,
                   Tnum, rules, type);
        *ok = 0;
    }
    return Tnum;

case Concat:
    /* both must be Tstr, result is Tstr */
    propagate_types(b->left, c, ok, Tstr, 0);
    propagate_types(b->right, c, ok, Tstr, 0);
    if (!type_compat(type, Tstr, 0)) {
        type_err(c, "error: Concat returns %1 but %2 expected", prog,
                   Tstr, rules, type);
        *ok = 0;
    }
    return Tstr;

case Bracket:
    return propagate_types(b->right, c, ok, type, 0);
interp binode cases
case Plus:
    rv = interp_exec(b->left);
    right = interp_exec(b->right);
    mpq_add(rv.num, rv.num, right.num);
    break;
case Minus:
    rv = interp_exec(b->left);
    right = interp_exec(b->right);
    mpq_sub(rv.num, rv.num, right.num);
    break;
case Times:
    rv = interp_exec(b->left);
    right = interp_exec(b->right);
    mpq_mul(rv.num, rv.num, right.num);
    break;
case Divide:
    rv = interp_exec(b->left);
    right = interp_exec(b->right);
    mpq_div(rv.num, rv.num, right.num);
    break;
case Rem: {
    mpz_t l, r, rem;

    left = interp_exec(b->left);
    right = interp_exec(b->right);
    mpz_init(l); mpz_init(r); mpz_init(rem);
    mpz_tdiv_q(l, mpq_numref(left.num), mpq_denref(left.num));
    mpz_tdiv_q(r, mpq_numref(right.num), mpq_denref(right.num));
    mpz_tdiv_r(rem, l, r);
    rv = val_init(Tnum);
    mpq_set_z(rv.num, rem);
    mpz_clear(r); mpz_clear(l); mpz_clear(rem);
    break;
}
case Negate:
    rv = interp_exec(b->right);
    mpq_neg(rv.num, rv.num);
    break;
case Absolute:
    rv = interp_exec(b->right);
    mpq_abs(rv.num, rv.num);
    break;
case Bracket:
    rv = interp_exec(b->right);
    break;
case Concat:
    left = interp_exec(b->left);
    right = interp_exec(b->right);
    rv.type = Tstr;
    rv.str = text_join(left.str, right.str);
    break;
value functions
static struct text text_join(struct text a, struct text b)
{
    struct text rv;
    rv.len = a.len + b.len;
    rv.txt = malloc(rv.len);
    memcpy(rv.txt, a.txt, a.len);
    memcpy(rv.txt+a.len, b.txt, b.len);
    return rv;
}

Blocks, Statements, and Statement lists.

Now that we have expressions out of the way we need to turn to statements. There are simple statements and more complex statements. Simple statements do not contain (syntactic) newlines, complex statements do.

Statements often come in sequences and we have corresponding simple statement lists and complex statement lists. The former comprise only simple statements separated by semicolons. The later comprise complex statements and simple statement lists. They are separated by newlines. Thus the semicolon is only used to separate simple statements on the one line. This may be overly restrictive, but I'm not sure I ever want a complex statement to share a line with anything else.

Note that a simple statement list can still use multiple lines if subsequent lines are indented, so

Example: wrapped simple statement list
a = b; c = d;
   e = f; print g

is a single simple statement list. This might allow room for confusion, so I'm not set on it yet.

A simple statement list needs no extra syntax. A complex statement list has two syntactic forms. It can be enclosed in braces (much like C blocks), or it can be introduced by a colon and continue until an unindented newline (much like Python blocks). With this extra syntax it is referred to as a block.

Note that a block does not have to include any newlines if it only contains simple statements. So both of:

if condition: a=b; d=f

if condition { a=b; print f }

are valid.

In either case the list is constructed from a binode list with Block as the operator. When parsing the list it is most convenient to append to the end, so a list is a list and a statement. When using the list it is more convenient to consider a list to be a statement and a list. So we need a function to re-order a list. reorder_bilist serves this purpose.

The only stand-alone statement we introduce at this stage is pass which does nothing and is represented as a NULL pointer in a Block list. Other stand-alone statements will follow once the infrastructure is in-place.

Binode types
Block,
Grammar
$void
OptNL -> Newlines
    |

Newlines -> NEWLINE
    | Newlines NEWLINE

$*binode
Open -> {
    | NEWLINE {
Close -> }
    | NEWLINE }
Block -> Open Statementlist Close ${ $0 = $<2; }$
    | Open Newlines Statementlist Close ${ $0 = $<3; }$
    | Open SimpleStatements } ${ $0 = reorder_bilist($<2); }$
    | Open Newlines SimpleStatements } ${ $0 = reorder_bilist($<3); }$
    | : Statementlist ${ $0 = $<2; }$
    | : SimpleStatements ${ $0 = reorder_bilist($<2); }$

Statementlist -> ComplexStatements ${ $0 = reorder_bilist($<1); }$

ComplexStatements -> ComplexStatements ComplexStatement ${
    $0 = new(binode);
    $0->op = Block;
    $0->left = $<1;
    $0->right = $<2;
    }$
    | ComplexStatements NEWLINE ${ $0 = $<1; }$
    | ComplexStatement ${
    $0 = new(binode);
    $0->op = Block;
    $0->left = NULL;
    $0->right = $<1;
    }$

$*exec
ComplexStatement -> SimpleStatements NEWLINE ${
        $0 = reorder_bilist($<1);
        }$
    ## ComplexStatement Grammar

$*binode
SimpleStatements -> SimpleStatements ; SimpleStatement ${
        $0 = new(binode);
        $0->op = Block;
        $0->left = $<1;
        $0->right = $<3;
        }$
    | SimpleStatement ${
        $0 = new(binode);
        $0->op = Block;
        $0->left = NULL;
        $0->right = $<1;
        }$
    | SimpleStatements ; ${ $0 = $<1; }$

SimpleStatement -> pass ${ $0 = NULL; }$
    ## SimpleStatement Grammar
print binode cases
case Block:
    if (indent < 0) {
        // simple statement
        if (b->left == NULL)
            printf("pass");
        else
            print_exec(b->left, indent, 0);
        if (b->right) {
            printf("; ");
            print_exec(b->right, indent, 0);
        }
    } else {
        // block, one per line
        if (b->left == NULL)
            do_indent(indent, "pass\n");
        else
            print_exec(b->left, indent, bracket);
        if (b->right)
            print_exec(b->right, indent, bracket);
    }
    break;
propagate binode cases
case Block:
{
    /* If any statement returns something other than Tnone
     * or Tbool then all such must return same type.
     * As each statement may be Tnone or something else,
     * we must always pass NULL (unknown) down, otherwise an incorrect
     * error might occur.  We never return Tnone unless it is
     * passed in.
     */
    struct binode *e;

    for (e = b; e; e = cast(binode, e->right)) {
        t = propagate_types(e->left, c, ok, NULL, rules);
        if ((rules & Rboolok) && t == Tbool)
            t = NULL;
        if (t && t != Tnone && t != Tbool) {
            if (!type)
                type = t;
            else if (t != type) {
                type_err(c, "error: expected %1%r, found %2",
                         e->left, type, rules, t);
                *ok = 0;
            }
        }
    }
    return type;
}
interp binode cases
case Block:
    while (rv.type == Tnone &&
           b) {
        if (b->left)
            rv = interp_exec(b->left);
        b = cast(binode, b->right);
    }
    break;

The Print statement

print is a simple statement that takes a comma-separated list of expressions and prints the values separated by spaces and terminated by a newline. No control of formatting is possible.

print faces the same list-ordering issue as blocks, and uses the same solution.

Binode types
Print,
SimpleStatement Grammar
| print ExpressionList ${
    $0 = reorder_bilist($<2);
}$
| print ExpressionList , ${
    $0 = new(binode);
    $0->op = Print;
    $0->right = NULL;
    $0->left = $<2;
    $0 = reorder_bilist($0);
}$
| print ${
    $0 = new(binode);
    $0->op = Print;
    $0->right = NULL;
}$
Grammar
$*binode
ExpressionList -> ExpressionList , Expression ${
    $0 = new(binode);
    $0->op = Print;
    $0->left = $<1;
    $0->right = $<3;
    }$
    | Expression ${
        $0 = new(binode);
        $0->op = Print;
        $0->left = NULL;
        $0->right = $<1;
    }$
print binode cases
case Print:
    do_indent(indent, "print");
    while (b) {
        if (b->left) {
            printf(" ");
            print_exec(b->left, -1, 0);
            if (b->right)
                printf(",");
        }
        b = cast(binode, b->right);
    }
    if (indent >= 0)
        printf("\n");
    break;
propagate binode cases
case Print:
    /* don't care but all must be consistent */
    propagate_types(b->left, c, ok, NULL, Rnolabel);
    propagate_types(b->right, c, ok, NULL, Rnolabel);
    break;
interp binode cases
case Print:
{
    char sep = 0;
    int eol = 1;
    for ( ; b; b = cast(binode, b->right))
        if (b->left) {
            if (sep)
                putchar(sep);
            left = interp_exec(b->left);
            print_value(left);
            free_value(left);
            if (b->right)
                sep = ' ';
        } else if (sep)
            eol = 0;
    left.type = Tnone;
    if (eol)
        printf("\n");
    break;
}
Assignment statement

An assignment will assign a value to a variable, providing it hasn't be declared as a constant. The analysis phase ensures that the type will be correct so the interpreter just needs to perform the calculation. There is a form of assignment which declares a new variable as well as assigning a value. If a name is assigned before it is declared, and error will be raised as the name is created as Tlabel and it is illegal to assign to such names.

Binode types
Assign,
Declare,
SimpleStatement Grammar
| Variable = Expression ${
        $0 = new(binode);
        $0->op = Assign;
        $0->left = $<1;
        $0->right = $<3;
    }$
| VariableDecl = Expression ${
        $0 = new(binode);
        $0->op = Declare;
        $0->left = $<1;
        $0->right =$<3;
    }$

| VariableDecl ${
        if ($1->var->where_set == NULL) {
            type_err(config2context(config),
                     "Variable declared with no type or value: %v",
                     $1, NULL, 0, NULL);
        } else {
            $0 = new(binode);
            $0->op = Declare;
            $0->left = $<1;
            $0->right = NULL;
        }
    }$
print binode cases
case Assign:
    do_indent(indent, "");
    print_exec(b->left, indent, 0);
    printf(" = ");
    print_exec(b->right, indent, 0);
    if (indent >= 0)
        printf("\n");
    break;

case Declare:
    {
    struct variable *v = cast(var, b->left)->var;
    do_indent(indent, "");
    print_exec(b->left, indent, 0);
    if (cast(var, b->left)->var->constant) {
        if (v->where_decl == v->where_set) {
            printf("::");
            type_print(v->val.type, stdout);
            printf(" ");
        } else
            printf(" ::");
    } else {
        if (v->where_decl == v->where_set) {
            printf(":");
            type_print(v->val.type, stdout);
            printf(" ");
        } else
            printf(" :");
    }
    if (b->right) {
        printf("= ");
        print_exec(b->right, indent, 0);
    }
    if (indent >= 0)
        printf("\n");
    }
    break;
propagate binode cases
case Assign:
case Declare:
    /* Both must match and not be labels,
     * Type must support 'dup',
     * For Assign, left must not be constant.
     * result is Tnone
     */
    t = propagate_types(b->left, c, ok, NULL,
                        Rnolabel | (b->op == Assign ? Rnoconstant : 0));
    if (!b->right)
        return Tnone;

    if (t) {
        if (propagate_types(b->right, c, ok, t, 0) != t)
            if (b->left->type == Xvar)
                type_err(c, "info: variable '%v' was set as %1 here.",
                         cast(var, b->left)->var->where_set, t, rules, NULL);
    } else {
        t = propagate_types(b->right, c, ok, NULL, Rnolabel);
        if (t)
            propagate_types(b->left, c, ok, t,
                            (b->op == Assign ? Rnoconstant : 0));
    }
    if (t && t->dup == NULL) {
        type_err(c, "error: cannot assign value of type %1", b, t, 0, NULL);
        *ok = 0;
    }
    return Tnone;

    break;
interp binode cases
case Assign:
    lleft = linterp_exec(b->left);
    right = interp_exec(b->right);
    if (lleft) {
        free_value(*lleft);
        *lleft = right;
    } else
        free_value(right);  // NOTEST
    right.type = NULL;
    break;

case Declare:
{
    struct variable *v = cast(var, b->left)->var;
    if (v->merged)
        v = v->merged;
    if (b->right)
        right = interp_exec(b->right);
    else
        right = val_init(v->val.type);
    free_value(v->val);
    v->val = right;
    right.type = NULL;
    break;
}

The use statement

The use statement is the last "simple" statement. It is needed when the condition in a conditional statement is a block. use works much like return in C, but only completes the condition, not the whole function.

Binode types
Use,
SimpleStatement Grammar
| use Expression ${
    $0 = new_pos(binode, $1);
    $0->op = Use;
    $0->right = $<2;
}$
print binode cases
case Use:
    do_indent(indent, "use ");
    print_exec(b->right, -1, 0);
    if (indent >= 0)
        printf("\n");
    break;
propagate binode cases
case Use:
    /* result matches value */
    return propagate_types(b->right, c, ok, type, 0);
interp binode cases
case Use:
    rv = interp_exec(b->right);
    break;

The Conditional Statement

This is the biggy and currently the only complex statement. This subsumes if, while, do/while, switch, and some parts of for. It is comprised of a number of parts, all of which are optional though set combinations apply. Each part is (usually) a key word (then is sometimes optional) followed by either an expression or a code block, except the casepart which is a "key word and an expression" followed by a code block. The code-block option is valid for all parts and, where an expression is also allowed, the code block can use the use statement to report a value. If the code block does not report a value the effect is similar to reporting True.

The else and case parts, as well as then when combined with if, can contain a use statement which will apply to some containing conditional statement. for parts, do parts and then parts used with for can never contain a use, except in some subordinate conditional statement.

If there is a forpart, it is executed first, only once. If there is a dopart, then it is executed repeatedly providing always that the condpart or cond, if present, does not return a non-True value. condpart can fail to return any value if it simply executes to completion. This is treated the same as returning True.

If there is a thenpart it will be executed whenever the condpart or cond returns True (or does not return any value), but this will happen after dopart (when present).

If elsepart is present it will be executed at most once when the condition returns False or some value that isn't True and isn't matched by any casepart. If there are any caseparts, they will be executed when the condition returns a matching value.

The particular sorts of values allowed in case parts has not yet been determined in the language design, so nothing is prohibited.

The various blocks in this complex statement potentially provide scope for variables as described earlier. Each such block must include the "OpenScope" nonterminal before parsing the block, and must call var_block_close() when closing the block.

The code following "if", "switch" and "for" does not get its own scope, but is in a scope covering the whole statement, so names declared there cannot be redeclared elsewhere. Similarly the condition following "while" is in a scope the covers the body ("do" part) of the loop, and which does not allow conditional scope extension. Code following "then" (both looping and non-looping), "else" and "case" each get their own local scope.

The type requirements on the code block in a whilepart are quite unusal. It is allowed to return a value of some identifiable type, in which case the loop aborts and an appropriate casepart is run, or it can return a Boolean, in which case the loop either continues to the dopart (on True) or aborts and runs the elsepart (on False). This is different both from the ifpart code block which is expected to return a Boolean, or the switchpart code block which is expected to return the same type as the casepart values. The correct analysis of the type of the whilepart code block is the reason for the Rboolok flag which is passed to propagate_types().

The cond_statement cannot fit into a binode so a new exec is defined.

exec type
Xcond_statement,
ast
struct casepart {
    struct exec *value;
    struct exec *action;
    struct casepart *next;
};
struct cond_statement {
    struct exec;
    struct exec *forpart, *condpart, *dopart, *thenpart, *elsepart;
    struct casepart *casepart;
};
ast functions
static void free_casepart(struct casepart *cp)
{
    while (cp) {
        struct casepart *t;
        free_exec(cp->value);
        free_exec(cp->action);
        t = cp->next;
        free(cp);
        cp = t;
    }
}

static void free_cond_statement(struct cond_statement *s)
{
    if (!s)
        return;
    free_exec(s->forpart);
    free_exec(s->condpart);
    free_exec(s->dopart);
    free_exec(s->thenpart);
    free_exec(s->elsepart);
    free_casepart(s->casepart);
    free(s);
}
free exec cases
case Xcond_statement: free_cond_statement(cast(cond_statement, e)); break;
ComplexStatement Grammar
| CondStatement ${ $0 = $<1; }$
Grammar
$*cond_statement
// both ForThen and Whilepart open scopes, and CondSuffix only
// closes one - so in the first branch here we have another to close.
CondStatement -> ForThen WhilePart CondSuffix ${
        $0 = $<3;
        $0->forpart = $1.forpart; $1.forpart = NULL;
        $0->thenpart = $1.thenpart; $1.thenpart = NULL;
        $0->condpart = $2.condpart; $2.condpart = NULL;
        $0->dopart = $2.dopart; $2.dopart = NULL;
        var_block_close(config2context(config), CloseSequential);
        }$
    | WhilePart CondSuffix ${
        $0 = $<2;
        $0->condpart = $1.condpart; $1.condpart = NULL;
        $0->dopart = $1.dopart; $1.dopart = NULL;
        }$
    | SwitchPart CondSuffix ${
        $0 = $<2;
        $0->condpart = $<1;
        }$
    | IfPart IfSuffix ${
        $0 = $<2;
        $0->condpart = $1.condpart; $1.condpart = NULL;
        $0->thenpart = $1.thenpart; $1.thenpart = NULL;
        // This is where we close an "if" statement
        var_block_close(config2context(config), CloseSequential);
        }$

CondSuffix -> IfSuffix ${
        $0 = $<1;
        // This is where we close scope of the whole
        // "for" or "while" statement
        var_block_close(config2context(config), CloseSequential);
    }$
    | CasePart CondSuffix ${
        $0 = $<2;
        $1->next = $0->casepart;
        $0->casepart = $<1;
    }$

$*casepart
CasePart -> Newlines case Expression OpenScope Block ${
        $0 = calloc(1,sizeof(struct casepart));
        $0->value = $<3;
        $0->action = $<5;
        var_block_close(config2context(config), CloseParallel);
    }$
    | case Expression OpenScope Block ${
        $0 = calloc(1,sizeof(struct casepart));
        $0->value = $<2;
        $0->action = $<4;
        var_block_close(config2context(config), CloseParallel);
    }$

$*cond_statement
IfSuffix -> Newlines ${ $0 = new(cond_statement); }$
    | Newlines else OpenScope Block ${
        $0 = new(cond_statement);
        $0->elsepart = $<4;
        var_block_close(config2context(config), CloseElse);
    }$
    | else OpenScope Block ${
        $0 = new(cond_statement);
        $0->elsepart = $<3;
        var_block_close(config2context(config), CloseElse);
    }$
    | Newlines else OpenScope CondStatement ${
        $0 = new(cond_statement);
        $0->elsepart = $<4;
        var_block_close(config2context(config), CloseElse);
    }$
    | else OpenScope CondStatement ${
        $0 = new(cond_statement);
        $0->elsepart = $<3;
        var_block_close(config2context(config), CloseElse);
    }$


$*exec
// These scopes are closed in CondSuffix
ForPart -> for OpenScope SimpleStatements ${
        $0 = reorder_bilist($<3);
    }$
    |  for OpenScope Block ${
        $0 = $<3;
    }$

ThenPart -> then OpenScope SimpleStatements ${
        $0 = reorder_bilist($<3);
        var_block_close(config2context(config), CloseSequential);
    }$
    |  then OpenScope Block ${
        $0 = $<3;
        var_block_close(config2context(config), CloseSequential);
    }$

ThenPartNL -> ThenPart OptNL ${
        $0 = $<1;
    }$

// This scope is closed in CondSuffix
WhileHead -> while OpenScope Block ${
    $0 = $<3;
    }$

$cond_statement
ForThen -> ForPart OptNL ThenPartNL ${
        $0.forpart = $<1;
        $0.thenpart = $<3;
    }$
    | ForPart OptNL ${
        $0.forpart = $<1;
    }$

// This scope is closed in CondSuffix
WhilePart -> while OpenScope Expression Block ${
        $0.type = Xcond_statement;
        $0.condpart = $<3;
        $0.dopart = $<4;
    }$
    | WhileHead OptNL do Block ${
        $0.type = Xcond_statement;
        $0.condpart = $<1;
        $0.dopart = $<4;
    }$

IfPart -> if OpenScope Expression OpenScope Block ${
        $0.type = Xcond_statement;
        $0.condpart = $<3;
        $0.thenpart = $<5;
        var_block_close(config2context(config), CloseParallel);
    }$
    | if OpenScope Block OptNL then OpenScope Block ${
        $0.type = Xcond_statement;
        $0.condpart = $<3;
        $0.thenpart = $<7;
        var_block_close(config2context(config), CloseParallel);
    }$

$*exec
// This scope is closed in CondSuffix
SwitchPart -> switch OpenScope Expression ${
        $0 = $<3;
    }$
    | switch OpenScope Block ${
        $0 = $<3;
    }$
print exec cases
case Xcond_statement:
{
    struct cond_statement *cs = cast(cond_statement, e);
    struct casepart *cp;
    if (cs->forpart) {
        do_indent(indent, "for");
        if (bracket) printf(" {\n"); else printf(":\n");
        print_exec(cs->forpart, indent+1, bracket);
        if (cs->thenpart) {
            if (bracket)
                do_indent(indent, "} then {\n");
            else
                do_indent(indent, "then:\n");
            print_exec(cs->thenpart, indent+1, bracket);
        }
        if (bracket) do_indent(indent, "}\n");
    }
    if (cs->dopart) {
        // a loop
        if (cs->condpart && cs->condpart->type == Xbinode &&
            cast(binode, cs->condpart)->op == Block) {
            if (bracket)
                do_indent(indent, "while {\n");
            else
                do_indent(indent, "while:\n");
            print_exec(cs->condpart, indent+1, bracket);
            if (bracket)
                do_indent(indent, "} do {\n");
            else
                do_indent(indent, "do:\n");
            print_exec(cs->dopart, indent+1, bracket);
            if (bracket)
                do_indent(indent, "}\n");
        } else {
            do_indent(indent, "while ");
            print_exec(cs->condpart, 0, bracket);
            if (bracket)
                printf(" {\n");
            else
                printf(":\n");
            print_exec(cs->dopart, indent+1, bracket);
            if (bracket)
                do_indent(indent, "}\n");
        }
    } else {
        // a condition
        if (cs->casepart)
            do_indent(indent, "switch");
        else
            do_indent(indent, "if");
        if (cs->condpart && cs->condpart->type == Xbinode &&
            cast(binode, cs->condpart)->op == Block) {
            if (bracket)
                printf(" {\n");
            else
                printf(":\n");
            print_exec(cs->condpart, indent+1, bracket);
            if (bracket)
                do_indent(indent, "}\n");
            if (cs->thenpart) {
                do_indent(indent, "then:\n");
                print_exec(cs->thenpart, indent+1, bracket);
            }
        } else {
            printf(" ");
            print_exec(cs->condpart, 0, bracket);
            if (cs->thenpart) {
                if (bracket)
                    printf(" {\n");
                else
                    printf(":\n");
                print_exec(cs->thenpart, indent+1, bracket);
                if (bracket)
                    do_indent(indent, "}\n");
            } else
                printf("\n");
        }
    }
    for (cp = cs->casepart; cp; cp = cp->next) {
        do_indent(indent, "case ");
        print_exec(cp->value, -1, 0);
        if (bracket)
            printf(" {\n");
        else
            printf(":\n");
        print_exec(cp->action, indent+1, bracket);
        if (bracket)
            do_indent(indent, "}\n");
    }
    if (cs->elsepart) {
        do_indent(indent, "else");
        if (bracket)
            printf(" {\n");
        else
            printf(":\n");
        print_exec(cs->elsepart, indent+1, bracket);
        if (bracket)
            do_indent(indent, "}\n");
    }
    break;
}
propagate exec cases
case Xcond_statement:
{
    // forpart and dopart must return Tnone
    // thenpart must return Tnone if there is a dopart,
    // otherwise it is like elsepart.
    // condpart must:
    //    be bool if there is no casepart
    //    match casepart->values if there is a switchpart
    //    either be bool or match casepart->value if there
    //             is a whilepart
    // elsepart and casepart->action must match the return type
    //   expected of this statement.
    struct cond_statement *cs = cast(cond_statement, prog);
    struct casepart *cp;

    t = propagate_types(cs->forpart, c, ok, Tnone, 0);
    if (!type_compat(Tnone, t, 0))
        *ok = 0;
    t = propagate_types(cs->dopart, c, ok, Tnone, 0);
    if (!type_compat(Tnone, t, 0))
        *ok = 0;
    if (cs->dopart) {
        t = propagate_types(cs->thenpart, c, ok, Tnone, 0);
        if (!type_compat(Tnone, t, 0))
            *ok = 0;
    }
    if (cs->casepart == NULL)
        propagate_types(cs->condpart, c, ok, Tbool, 0);
    else {
        /* Condpart must match case values, with bool permitted */
        t = NULL;
        for (cp = cs->casepart;
             cp && !t; cp = cp->next)
            t = propagate_types(cp->value, c, ok, NULL, 0);
        if (!t && cs->condpart)
            t = propagate_types(cs->condpart, c, ok, NULL, Rboolok);
        // Now we have a type (I hope) push it down
        if (t) {
            for (cp = cs->casepart; cp; cp = cp->next)
                propagate_types(cp->value, c, ok, t, 0);
            propagate_types(cs->condpart, c, ok, t, Rboolok);
        }
    }
    // (if)then, else, and case parts must return expected type.
    if (!cs->dopart && !type)
        type = propagate_types(cs->thenpart, c, ok, NULL, rules);
    if (!type)
        type = propagate_types(cs->elsepart, c, ok, NULL, rules);
    for (cp = cs->casepart;
         cp && !type;
         cp = cp->next)
        type = propagate_types(cp->action, c, ok, NULL, rules);
    if (type) {
        if (!cs->dopart)
            propagate_types(cs->thenpart, c, ok, type, rules);
        propagate_types(cs->elsepart, c, ok, type, rules);
        for (cp = cs->casepart; cp ; cp = cp->next)
            propagate_types(cp->action, c, ok, type, rules);
        return type;
    } else
        return NULL;
}
interp exec cases
case Xcond_statement:
{
    struct value v, cnd;
    struct casepart *cp;
    struct cond_statement *c = cast(cond_statement, e);

    if (c->forpart)
        interp_exec(c->forpart);
    do {
        if (c->condpart)
            cnd = interp_exec(c->condpart);
        else
            cnd.type = Tnone;
        if (!(cnd.type == Tnone ||
              (cnd.type == Tbool && cnd.bool != 0)))
            break;
        // cnd is Tnone or Tbool, doesn't need to be freed
        if (c->dopart)
            interp_exec(c->dopart);

        if (c->thenpart) {
            rv = interp_exec(c->thenpart);
            if (rv.type != Tnone || !c->dopart)
                goto Xcond_done;
            free_value(rv);
        }
    } while (c->dopart);

    for (cp = c->casepart; cp; cp = cp->next) {
        v = interp_exec(cp->value);
        if (value_cmp(v, cnd) == 0) {
            free_value(v);
            free_value(cnd);
            rv = interp_exec(cp->action);
            goto Xcond_done;
        }
        free_value(v);
    }
    free_value(cnd);
    if (c->elsepart)
        rv = interp_exec(c->elsepart);
    else
        rv.type = Tnone;
Xcond_done:
    break;
}

Top level structure

All the language elements so far can be used in various places. Now it is time to clarify what those places are.

At the top level of a file there will be a number of declarations. Many of the things that can be declared haven't been described yet, such as functions, procedures, imports, and probably more. For now there are two sorts of things that can appear at the top level. They are predefined constants, struct types, and the main program. While the syntax will allow the main program to appear multiple times, that will trigger an error if it is actually attempted.

The various declarations do not return anything. They store the various declarations in the parse context.

Parser: grammar
$void
Ocean -> DeclarationList

DeclarationList -> Declaration
    | DeclarationList Declaration

Declaration -> DeclareConstant
    | DeclareProgram
    | DeclareStruct
    | NEWLINE

## top level grammar

The const section

As well as being defined in with the code that uses them, constants can be declared at the top level. These have full-file scope, so they are always InScope. The value of a top level constant can be given as an expression, and this is evaluated immediately rather than in the later interpretation stage. Once we add functions to the language, we will need rules concern which, if any, can be used to define a top level constant.

Constants are defined in a section that starts with the reserved word const and then has a block with a list of assignment statements. For syntactic consistency, these must use the double-colon syntax to make it clear that they are constants. Type can also be given: if not, the type will be determined during analysis, as with other constants.

As the types constants are inserted at the head of a list, printing them in the same order that they were read is not straight forward. We take a quadratic approach here and count the number of constants (variables of depth 0), then count down from there, each time searching through for the Nth constant for decreasing N.

top level grammar
DeclareConstant -> const Open ConstList Close
    | const Open Newlines ConstList Close
    | const Open SimpleConstList }
    | const Open Newlines SimpleConstList }
    | const : ConstList
    | const SimpleConstList

ConstList -> ComplexConsts
ComplexConsts -> ComplexConst ComplexConsts
    | ComplexConst
ComplexConst -> SimpleConstList NEWLINE
SimpleConstList -> Const ; SimpleConstList
    | Const
    | Const ; SimpleConstList ;

$*type
CType -> Type   ${ $0 = $<1; }$
    |   ${ $0 = NULL; }$
$void
Const -> IDENTIFIER :: CType = Expression ${ {
    int ok;
    struct variable *v;

    v = var_decl(config2context(config), $1.txt);
    if (v) {
        struct var *var = new_pos(var, $1);
        v->where_decl = var;
        v->where_set = var;
        var->var = v;
        v->constant = 1;
    } else {
        v = var_ref(config2context(config), $1.txt);
        tok_err(config2context(config), "error: name already declared", &$1);
        type_err(config2context(config), "info: this is where '%v' was first declared",
                 v->where_decl, NULL, 0, NULL);
    }
    do {
        ok = 1;
        propagate_types($5, config2context(config), &ok, $3, 0);
    } while (ok == 2);
    if (!ok)
        config2context(config)->parse_error = 1;
    else if (v) {
        v->val = interp_exec($5);
    }
} }$
print const decls
{
    struct variable *v;
    int target = -1;

    while (target != 0) {
        int i = 0;
        for (v = context.in_scope; v; v=v->in_scope)
            if (v->depth == 0) {
                i += 1;
                if (i == target)
                    break;
            }

        if (target == -1) {
            if (i)
                printf("const:\n");
            target = i;
        } else {
            printf("    %.*s :: ", v->name->name.len, v->name->name.txt);
            type_print(v->val.type, stdout);
            printf(" = ");
            if (v->val.type == Tstr)
                printf("\"");
            print_value(v->val);
            if (v->val.type == Tstr)
                printf("\"");
            printf("\n");
            target -= 1;
        }
    }
}

Finally the whole program.

Somewhat reminiscent of Pascal a (current) Ocean program starts with the keyword "program" and a list of variable names which are assigned values from command line arguments. Following this is a block which is the code to execute. Unlike Pascal, constants and other declarations come before the program.

As this is the top level, several things are handled a bit differently. The whole program is not interpreted by interp_exec as that isn't passed the argument list which the program requires. Similarly type analysis is a bit more interesting at this level.

Binode types
Program,
top level grammar
DeclareProgram -> Program ${ {
    struct parse_context *c = config2context(config);
    if (c->prog)
        type_err(c, "Program defined a second time",
                 $1, NULL, 0, NULL);
    else
        c->prog = $<1;
} }$


$*binode
Program -> program OpenScope Varlist Block OptNL ${
    $0 = new(binode);
    $0->op = Program;
    $0->left = reorder_bilist($<3);
    $0->right = $<4;
    var_block_close(config2context(config), CloseSequential);
    if (config2context(config)->scope_stack) abort();
    }$
    | ERROR ${
        tok_err(config2context(config),
                "error: unhandled parse error", &$1);
    }$

Varlist -> Varlist ArgDecl ${
        $0 = new(binode);
        $0->op = Program;
        $0->left = $<1;
        $0->right = $<2;
    }$
    | ${ $0 = NULL; }$

$*var
ArgDecl -> IDENTIFIER ${ {
    struct variable *v = var_decl(config2context(config), $1.txt);
    $0 = new(var);
    $0->var = v;
} }$

## Grammar
print binode cases
case Program:
    do_indent(indent, "program");
    for (b2 = cast(binode, b->left); b2; b2 = cast(binode, b2->right)) {
        printf(" ");
        print_exec(b2->left, 0, 0);
    }
    if (bracket)
        printf(" {\n");
    else
        printf(":\n");
    print_exec(b->right, indent+1, bracket);
    if (bracket)
        do_indent(indent, "}\n");
    break;
propagate binode cases
case Program: abort();      // NOTEST
core functions
static int analyse_prog(struct exec *prog, struct parse_context *c)
{
    struct binode *b = cast(binode, prog);
    int ok = 1;

    if (!b)
        return 0;   // NOTEST
    do {
        ok = 1;
        propagate_types(b->right, c, &ok, Tnone, 0);
    } while (ok == 2);
    if (!ok)
        return 0;

    for (b = cast(binode, b->left); b; b = cast(binode, b->right)) {
        struct var *v = cast(var, b->left);
        if (!v->var->val.type) {
            v->var->where_set = b;
            v->var->val = val_prepare(Tstr);
        }
    }
    b = cast(binode, prog);
    do {
        ok = 1;
        propagate_types(b->right, c, &ok, Tnone, 0);
    } while (ok == 2);
    if (!ok)
        return 0;

    /* Make sure everything is still consistent */
    propagate_types(b->right, c, &ok, Tnone, 0);
    return !!ok;
}

static void interp_prog(struct exec *prog, char **argv)
{
    struct binode *p = cast(binode, prog);
    struct binode *al;
    struct value v;

    if (!prog)
        return;     // NOTEST
    al = cast(binode, p->left);
    while (al) {
        struct var *v = cast(var, al->left);
        struct value *vl = &v->var->val;

        if (argv[0] == NULL) {
            printf("Not enough args\n");
            exit(1);
        }
        al = cast(binode, al->right);
        free_value(*vl);
        *vl = parse_value(vl->type, argv[0]);
        if (vl->type == NULL)
            exit(1);
        argv++;
    }
    v = interp_exec(p->right);
    free_value(v);
}
interp binode cases
case Program: abort();  // NOTEST

And now to test it out.

Having a language requires having a "hello world" program. I'll provide a little more than that: a program that prints "Hello world" finds the GCD of two numbers, prints the first few elements of Fibonacci, performs a binary search for a number, and a few other things which will likely grow as the languages grows.

File: oceani.mk
tests :: sayhello
sayhello : oceani
    @echo "===== TEST ====="
    ./oceani --section "test: hello" oceani.mdc 55 33
test: hello
const:
    pi ::= 3.1415926
    four ::= 2 + 2 ; five ::= 10/2
const pie ::= "I like Pie";
    cake ::= "The cake is"
      ++ " a lie"

struct fred:
    size:[four]number
    name:string
    alive:Boolean

program A B:
    print "Hello World, what lovely oceans you have!"
    print "Are there", five, "?"
    print pi, pie, "but", cake

    /* When a variable is defined in both branches of an 'if',
     * and used afterwards, the variables are merged.
     */
    if A > B:
        bigger := "yes"
    else:
        bigger := "no"
    print "Is", A, "bigger than", B,"? ", bigger
    /* If a variable is not used after the 'if', no
     * merge happens, so types can be different
     */
    if A > B * 2:
        double:string = "yes"
        print A, "is more than twice", B, "?", double
    else:
        double := B*2
        print "double", B, "is", double

    a : number
    a = A;
    b:number = B
    if a > 0 and then b > 0:
        while a != b:
            if a < b:
                b = b - a
            else:
                a = a - b
        print "GCD of", A, "and", B,"is", a
    else if a <= 0:
        print a, "is not positive, cannot calculate GCD"
    else:
        print b, "is not positive, cannot calculate GCD"

    for:
        togo := 10
        f1 := 1; f2 := 1
        print "Fibonacci:", f1,f2,
    then togo = togo - 1
    while togo > 0:
        f3 := f1 + f2
        print "", f3,
        f1 = f2
        f2 = f3
    print ""

    /* Binary search... */
    for:
        lo:= 0; hi := 100
        target := 77
    while:
        mid := (lo + hi) / 2
        if mid == target:
            use Found
        if mid < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < 1:
            use GiveUp
        use True
    do: pass
    case Found:
        print "Yay, I found", target
    case GiveUp:
        print "Closest I found was", mid

    size::= 10
    list:[size]number
    list[0] = 1234
    // "middle square" PRNG.  Not particularly good, but one my
    // Dad taught me - the first one I ever heard of.
    for i:=1; then i = i + 1; while i < size:
        n := list[i-1] * list[i-1]
        list[i] = (n / 100) % 10000

    print "Before sort:",
    for i:=0; then i = i + 1; while i < size:
        print "", list[i],
    print

    for i := 1; then i=i+1; while i < size:
        for j:=i-1; then j=j-1; while j >= 0:
            if list[j] > list[j+1]:
                t:= list[j]
                list[j] = list[j+1]
                list[j+1] = t
    print " After sort:",
    for i:=0; then i = i + 1; while i < size:
        print "", list[i],
    print

    bob:fred
    bob.name = "Hello"
    bob.alive = (bob.name == "Hello")
    print "bob", "is" if  bob.alive else "isn't", "alive"