From: NeilBrown
Date: Sat, 10 Oct 2020 23:34:06 +0000 (+1100)
Subject: parsergen: allow terminals to be declared.
X-Git-Url: https://ocean-lang.org/code/?p=ocean;a=commitdiff_plain;h=229d6941cd1da3ba78d38e093dc51246c081a847

parsergen: allow terminals to be declared.

By default, any non-virtual symbol that does not appear in the head of a
production is assumed to be a Terminal.
For larger grammars, this misses out on an opportunity to detect errors.

So allow a "$TERM" line to list terminals (that do not appear in
precedence lines).
If any $TERM line is given, then generate an error if any symbol appears
in a production but is not declared, either as terminal or non-terminal.

Signed-off-by: NeilBrown
---

diff --git a/csrc/indent_test.mdc b/csrc/indent_test.mdc
index 3df78d0..087df32 100644
--- a/csrc/indent_test.mdc
+++ b/csrc/indent_test.mdc
@@ -132,6 +132,8 @@ with complete bracketing and indenting.
 
 	~~~~~~
 
+	$TERM if { } : * + - / ; =
+
 	Program -> Statementlist ${ print_statement($1, 0); }$
 
 	$*statement
diff --git a/csrc/parsergen.mdc b/csrc/parsergen.mdc
index cdb5f82..66a41b5 100644
--- a/csrc/parsergen.mdc
+++ b/csrc/parsergen.mdc
@@ -151,11 +151,17 @@ those which don't.
 There are also "virtual" symbols used for precedence marking
 discussed later, and sometimes we won't know what type a symbol is yet.
 
+To help with code safety it is possible to declare the terminal symbols.
+If this is done, then any symbol used in a production that does not
+appear in a head and is not declared is treated as an error.
+
 ###### forward declarations
 	enum symtype { Unknown, Virtual, Terminal, Nonterminal };
 	char *symtypes = "UVTN";
 ###### symbol fields
 	enum symtype type;
+###### grammar fields
+	int terminals_declared;
 
 Symbols can be either `TK_ident` or `TK_mark`. They are saved in a
 table of known symbols and the resulting parser will report them as
@@ -241,9 +247,10 @@ symbol, but its type might be `Unknown`.
 
 ### Data types and precedence.
 
-Data type specification and precedence specification are both
-introduced by a dollar sign at the start of the line. If the next
-word is `LEFT`, `RIGHT` or `NON`, then the line specifies a
+Data type specification, precedence specification, and declaration of
+terminals are all introduced by a dollar sign at the start of the line.
+If the next word is `LEFT`, `RIGHT` or `NON`, then the line specifies a
+precedence, if it is `TERM` then the line declares terminals without
 precedence, otherwise it specifies a data type.
 
 The data type name is simply stored and applied to the head of all
@@ -296,6 +303,7 @@ Subsequent lines introduce symbols with higher precedence.
 		struct token t = token_next(ts);
 		char *err;
 		enum assoc assoc;
+		int term = 0;
 		int found;
 
 		if (t.num != TK_ident) {
@@ -308,7 +316,10 @@ Subsequent lines introduce symbols with higher precedence.
 			assoc = Right;
 		else if (text_is(t.txt, "NON"))
 			assoc = Non;
-		else {
+		else if (text_is(t.txt, "TERM")) {
+			term = 1;
+			g->terminals_declared = 1;
+		} else {
 			g->current_type = t.txt;
 			g->type_isref = isref;
 			if (text_is(t.txt, "void"))
@@ -326,7 +337,7 @@ Subsequent lines introduce symbols with higher precedence.
 			goto abort;
 		}
 
-		// This is a precedence line, need some symbols.
+		// This is a precedence or TERM line, need some symbols.
 		found = 0;
 		g->prec_levels += 1;
 		t = token_next(ts);
@@ -340,6 +351,10 @@ Subsequent lines introduce symbols with higher precedence.
 					err = "$$ must be followed by a word";
 					goto abort;
 				}
+				if (term) {
+					err = "Virtual symbols not permitted on $TERM line";
+					goto abort;
+				}
 			} else if (t.num != TK_ident &&
 			           t.num != TK_mark) {
 				err = "Illegal token in precedence line";
@@ -347,17 +362,19 @@ Subsequent lines introduce symbols with higher precedence.
 			}
 			s = sym_find(g, t.txt);
 			if (s->type != Unknown) {
-				err = "Symbols in precedence line must not already be known.";
+				err = "Symbols in precedence/TERM line must not already be known.";
 				goto abort;
 			}
 			s->type = type;
-			s->precedence = g->prec_levels;
-			s->assoc = assoc;
+			if (!term) {
+				s->precedence = g->prec_levels;
+				s->assoc = assoc;
+			}
 			found += 1;
 			t = token_next(ts);
 		}
 		if (found == 0)
-			err = "No symbols given on precedence line";
+			err = "No symbols given on precedence/TERM line";
 			goto abort;
 		return NULL;
 	abort:
@@ -492,8 +509,10 @@ Now we have all the bits we need to parse a full production.
 		tk = token_next(state);
 		while (tk.num == TK_ident || tk.num == TK_mark) {
 			struct symbol *bs = sym_find(g, tk.txt);
-			if (bs->type == Unknown)
-				bs->type = Terminal;
+			if (bs->type == Unknown) {
+				if (!g->terminals_declared)
+					bs->type = Terminal;
+			}
 			if (bs->type == Virtual) {
 				err = "Virtual symbol not permitted in production";
 				goto abort;
@@ -669,6 +688,21 @@ to produce errors that the parser is better positioned to handle.
 			goto abort;
 		}
 		token_close(state);
+		if (g->terminals_declared) {
+			struct symbol *s;
+			int errs = 0;
+			for (s = g->syms; s; s = s->next) {
+				if (s->type != Unknown)
+					continue;
+				errs += 1;
+				fprintf(stderr, "Token %.*s not declared\n",
+					s->name.len, s->name.txt);
+			}
+			if (errs) {
+				free(g);
+				g = NULL;
+			}
+		}
 		return g;
 	abort:
 		fprintf(stderr, "Error at line %d: %s\n",
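
The effect of the patch can be seen in isolation with a minimal standalone sketch. It is not parsergen code: `struct sym`, `struct table`, `sym_lookup`, `use_in_body` and `count_undeclared` are hypothetical stand-ins, and a fixed-size array replaces the grammar's linked list of symbols. Only `enum symtype` and the `terminals_declared` flag mirror names from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same four symbol kinds as parsergen's `enum symtype`. */
enum symtype { Unknown, Virtual, Terminal, Nonterminal };

/* Hypothetical, simplified symbol record (the real struct has more fields). */
struct sym {
	const char *name;
	enum symtype type;
};

#define MAX_SYMS 32

/* Hypothetical table standing in for the grammar's symbol list. */
struct table {
	struct sym syms[MAX_SYMS];
	int count;
	int terminals_declared;	/* set once any $TERM line has been seen */
};

/* Find a symbol by name, adding it with type Unknown if it is new. */
struct sym *sym_lookup(struct table *t, const char *name)
{
	int i;

	for (i = 0; i < t->count; i++)
		if (strcmp(t->syms[i].name, name) == 0)
			return &t->syms[i];
	assert(t->count < MAX_SYMS);
	t->syms[t->count].name = name;
	t->syms[t->count].type = Unknown;
	return &t->syms[t->count++];
}

/* A symbol appears in a production body: it defaults to Terminal
 * only when no $TERM line was given, as in the patched parser. */
void use_in_body(struct table *t, const char *name)
{
	struct sym *s = sym_lookup(t, name);

	if (s->type == Unknown && !t->terminals_declared)
		s->type = Terminal;
}

/* After reading the whole grammar: report each symbol that is still
 * Unknown and return how many there were. Without a $TERM line the
 * check is skipped and 0 is returned. */
int count_undeclared(const struct table *t)
{
	int i, errs = 0;

	if (!t->terminals_declared)
		return 0;
	for (i = 0; i < t->count; i++) {
		if (t->syms[i].type != Unknown)
			continue;
		errs += 1;
		fprintf(stderr, "Token %s not declared\n", t->syms[i].name);
	}
	return errs;
}
```

With a `$TERM` declaration in effect, a misspelt token such as `whlie` stays `Unknown` and is counted as an error. Without one, the same token silently becomes a terminal, which is the behaviour the commit message describes as a missed opportunity to detect errors.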