From a9b23e6495e444c818d6f61614f01b7c4068621b Mon Sep 17 00:00:00 2001
From: NeilBrown
Date: Fri, 5 Mar 2021 21:24:14 +1100
Subject: [PATCH] parsergen: add support for "special" terminals.

We will want a new terminal "EOL", which is like "NEWLINE", but
different. There is currently no room in the numbering for something
like that, so make some room.

Signed-off-by: NeilBrown
---
 csrc/parsergen.mdc | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/csrc/parsergen.mdc b/csrc/parsergen.mdc
index c80bc9d..ff2ed7b 100644
--- a/csrc/parsergen.mdc
+++ b/csrc/parsergen.mdc
@@ -171,7 +171,11 @@ is treated as an error.
 Symbols can be either `TK_ident` or `TK_mark`. They are saved in a
 table of known symbols and the resulting parser will report them as
 `TK_reserved + N`. A small set of identifiers are reserved for the
-different token types that `scanner` can report.
+different token types that `scanner` can report, and an even smaller set
+are reserved for a special token that the parser can generate (`EOL`) as
+will be described later. This latter set cannot use predefined numbers,
+so they are marked as `isspecial` for now and will get assigned a number
+with the non-terminals later.
 
 ###### declarations
 
@@ -186,9 +190,12 @@ different token types that `scanner` can report.
 		{ TK_out, "OUT" },
 		{ TK_newline, "NEWLINE" },
 		{ TK_eof, "$eof" },
+		{ -1, "EOL" },
 	};
+
 ###### symbol fields
 	short num;
+	unsigned int isspecial:1;
 
 Note that `TK_eof` and the two `TK_*_comment` tokens cannot be
 recognised. The former is automatically expected at the end of the text
@@ -246,6 +253,7 @@ symbol, but its type might be `Unknown`.
 			s = sym_find(g, t);
 			s->type = Terminal;
 			s->num = reserved_words[i].num;
+			s->isspecial = 1;
 		}
 	}
 
@@ -1481,8 +1489,10 @@ a report.
 
 Once we have built everything we allocate arrays for the two lists:
 symbols and itemsets. This allows more efficient access during
-reporting. The symbols are grouped as terminals and then non-terminals,
-and we record the changeover point in `first_nonterm`.
+reporting. The symbols are grouped as terminals, then non-terminals,
+then virtual, with the start of non-terminals recorded as `first_nonterm`.
+Special terminals -- meaning just EOL -- are included with the
+non-terminals so that they are not expected by the scanner.
 
 ###### grammar fields
 	struct symbol **symtab;
@@ -1497,7 +1507,7 @@ and we record the changeover point in `first_nonterm`.
 		struct itemset *is;
 		int snum = TK_reserved;
 		for (s = g->syms; s; s = s->next)
-			if (s->num < 0 && s->type == Terminal) {
+			if (s->num < 0 && s->type == Terminal && !s->isspecial) {
 				s->num = snum;
 				snum++;
 			}
@@ -1922,7 +1932,8 @@ The table of nonterminals used for tracing is a similar array.
 		for (i = TK_reserved;
 		     i < g->num_syms;
 		     i++)
-			if (g->symtab[i]->type == Nonterminal)
+			if (g->symtab[i]->type == Nonterminal ||
+			    g->symtab[i]->isspecial)
 				fprintf(f, "\t\"%.*s\",\n", g->symtab[i]->name.len,
 					g->symtab[i]->name.txt);
 		fprintf(f, "};\n\n");
@@ -2268,7 +2279,15 @@ appropriate for tokens) on any terminal symbol.
 		fprintf(f, "static void do_free(short sym, void *asn)\n");
 		fprintf(f, "{\n");
 		fprintf(f, "\tif (!asn) return;\n");
-		fprintf(f, "\tif (sym < %d) {\n", g->first_nonterm);
+		fprintf(f, "\tif (sym < %d", g->first_nonterm);
+		/* Need to handle special terminals too */
+		for (i = 0; i < g->num_syms; i++) {
+			struct symbol *s = g->symtab[i];
+			if (i >= g->first_nonterm && s->type == Terminal &&
+			    s->isspecial)
+				fprintf(f, " || sym == %d", s->num);
+		}
+		fprintf(f, ") {\n");
 		fprintf(f, "\t\tfree(asn);\n\t\treturn;\n\t}\n");
 		fprintf(f, "\tswitch(sym) {\n");
 
-- 
2.43.0
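
Note on the numbering scheme: the following is a minimal standalone sketch of the idea in this patch, not code from `parsergen.mdc`. It assumes a simplified `struct demo_symbol`, a made-up `DEMO_TK_RESERVED` value, and two hypothetical helpers, `assign_numbers()` and `frees_like_token()`. The patch only shows special terminals being skipped in the first numbering pass; the second pass here, which numbers everything still unnumbered, is an assumption about what "assigned a number with the non-terminals" could look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TK_reserved; the real value comes from the scanner. */
#define DEMO_TK_RESERVED 8

enum demo_type { Unknown, Terminal, Nonterminal };

/* Simplified symbol: only the fields this patch touches. */
struct demo_symbol {
	const char *name;
	enum demo_type type;
	short num;                /* -1 means "not numbered yet" */
	unsigned int isspecial:1; /* parser-generated terminal such as EOL */
};

/*
 * Number ordinary terminals first, starting at DEMO_TK_RESERVED, and skip
 * special ones (mirrors the patched condition).  Then number whatever is
 * still unnumbered, so special terminals land at or above first_nonterm.
 * Returns first_nonterm.
 */
static int assign_numbers(struct demo_symbol *syms, size_t n)
{
	int snum = DEMO_TK_RESERVED;
	int first_nonterm;
	size_t i;

	assert(syms != NULL);
	for (i = 0; i < n; i++)
		if (syms[i].num < 0 && syms[i].type == Terminal &&
		    !syms[i].isspecial)
			syms[i].num = snum++;
	first_nonterm = snum;

	/* Assumed second pass: non-terminals and special terminals. */
	for (i = 0; i < n; i++)
		if (syms[i].num < 0)
			syms[i].num = snum++;
	return first_nonterm;
}

/*
 * The test that the generated do_free() ends up performing: a symbol is
 * freed like a token if it is below first_nonterm, or if it is a special
 * terminal that was numbered among the non-terminals.
 */
static int frees_like_token(const struct demo_symbol *syms, size_t n,
			    int first_nonterm, short sym)
{
	size_t i;

	if (sym < first_nonterm)
		return 1;
	for (i = 0; i < n; i++)
		if (syms[i].type == Terminal && syms[i].isspecial &&
		    syms[i].num == sym)
			return 1;
	return 0;
}
```

With a table holding a predefined `NEWLINE`, a special `EOL`, one grammar keyword and one non-terminal, only the keyword is numbered below `first_nonterm`. `EOL` sits among the non-terminals, so the scanner never expects it, yet it is still freed as a plain token.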