From b3fdb1d9f081d8c034653b077e7aa337585356cd Mon Sep 17 00:00:00 2001
From: NeilBrown
Date: Sun, 4 May 2014 07:55:03 +1000
Subject: [PATCH] parsergen: compute "can_eol" for each symbol.

A symbol is "can_eol" if it can derive a phrase which ends with a
newline token. This will allow us to recognise line-like sections of
code and thus know when to ignore newlines and when not to.

Signed-off-by: NeilBrown
---
 csrc/parsergen.mdc | 63 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/csrc/parsergen.mdc b/csrc/parsergen.mdc
index d2d70d5..fc14b91 100644
--- a/csrc/parsergen.mdc
+++ b/csrc/parsergen.mdc
@@ -837,6 +837,57 @@ changes happen.
 		}
 	}
 
+### Setting `can_eol`
+
+In order to be able to ignore newline tokens when not relevant, but
+still include them in the parse when needed, we will need to know
+which states can start a "line-like" section of code. We ignore
+newlines when there is an indent since the most recent start of a
+line-like section.
+
+To know what is line-like, we first need to know which symbols can end
+a line-like section, which is precisely those which can end with a
+newline token. These symbols don't necessarily always end with a
+newline, but they can. Hence they are not described as "lines" but
+only "line-like".
+
+Clearly the `TK_newline` token can end with a newline. Any symbol
+which is the head of a production that contains a line-ending symbol
+followed only by nullable symbols is also a line-ending symbol. We
+use a new field `can_eol` to record this attribute of symbols, and
+compute it in a repetitive manner similar to `set_nullable`.
+
+###### symbol fields
+	int can_eol;
+
+###### functions
+	static void set_can_eol(struct grammar *g)
+	{
+		int check_again = 1;
+		g->symtab[TK_newline]->can_eol = 1;
+		while (check_again) {
+			int p;
+			check_again = 0;
+			for (p = 0; p < g->production_count; p++) {
+				struct production *pr = g->productions[p];
+				int s;
+
+				if (pr->head->can_eol)
+					continue;
+
+				for (s = pr->body_size - 1; s >= 0; s--) {
+					if (pr->body[s]->can_eol) {
+						pr->head->can_eol = 1;
+						check_again = 1;
+						break;
+					}
+					if (!pr->body[s]->nullable)
+						break;
+				}
+			}
+		}
+	}
+
 ### Building the `first` sets
 
 When calculating what can follow a particular non-terminal, we will need to
@@ -1357,10 +1408,11 @@ changeover point in `first_nonterm`.
 	for (s = g->syms; s; s = s->next)
 		g->symtab[s->num] = s;
 
-	if (type >= SLR) {
-		set_nullable(g);
+	set_nullable(g);
+	set_can_eol(g);
+	if (type >= SLR)
 		build_first(g);
-	}
+
 	if (type == SLR)
 		build_follow(g);
 
@@ -1405,8 +1457,9 @@ set if that was generated.
 		if (!s)
 			continue;
 
-		printf(" %c%3d%c: ",
-			s->nullable ? '*':' ',
+		printf(" %c%c%3d%c: ",
+			s->nullable ? '.':' ',
+			s->can_eol ? '>':' ',
 			s->num, symtypes[s->type]);
 		prtxt(s->name);
 		if (s->precedence)
-- 
2.43.0
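
To see the fixed-point computation from this patch in isolation, the sketch below runs the same right-to-left scan over a tiny hand-built grammar. It is only an illustration under stated assumptions: `struct symbol`, `struct production`, `demo_set_can_eol` and `demo_grammar` are simplified, hypothetical stand-ins and not parsergen's real types, and the `nullable` flags are set by hand instead of by `set_nullable`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for parsergen's structures. */
struct symbol {
	int nullable;	/* assumed to be already computed */
	int can_eol;	/* computed below */
};

#define MAX_BODY 4

struct production {
	struct symbol *head;
	struct symbol *body[MAX_BODY];
	int body_size;
};

enum { NEWLINE, IDENT, OPT, LINE, EXPR, BLOCK, NSYMS };

/* Same loop shape as set_can_eol() in the patch: repeat until no
 * production head gains the can_eol flag during a full pass.
 */
static void demo_set_can_eol(struct symbol *newline,
			     struct production *prods, int count)
{
	int check_again = 1;

	newline->can_eol = 1;
	while (check_again) {
		int p;

		check_again = 0;
		for (p = 0; p < count; p++) {
			struct production *pr = &prods[p];
			int s;

			assert(pr->body_size <= MAX_BODY);
			if (pr->head->can_eol)
				continue;
			/* Scan from the right, skipping nullable symbols,
			 * until a can_eol symbol or a non-nullable one.
			 */
			for (s = pr->body_size - 1; s >= 0; s--) {
				if (pr->body[s]->can_eol) {
					pr->head->can_eol = 1;
					check_again = 1;
					break;
				}
				if (!pr->body[s]->nullable)
					break;
			}
		}
	}
}

/* Illustrative grammar (an assumption, not taken from parsergen):
 *   Block -> Line
 *   Line  -> IDENT NEWLINE Opt
 *   Expr  -> IDENT Opt
 *   Opt   -> (empty)
 */
static void demo_grammar(struct symbol syms[NSYMS])
{
	int i;

	for (i = 0; i < NSYMS; i++) {
		syms[i].nullable = 0;
		syms[i].can_eol = 0;
	}
	syms[OPT].nullable = 1;

	struct production prods[] = {
		{ &syms[BLOCK], { &syms[LINE] }, 1 },
		{ &syms[LINE], { &syms[IDENT], &syms[NEWLINE], &syms[OPT] }, 3 },
		{ &syms[EXPR], { &syms[IDENT], &syms[OPT] }, 2 },
		{ &syms[OPT], { NULL }, 0 },
	};

	demo_set_can_eol(&syms[NEWLINE], prods, 4);
}
```

Calling `demo_grammar` on an array of `NSYMS` symbols marks `NEWLINE`, `Line` and `Block` as `can_eol` and leaves `IDENT`, `Opt` and `Expr` clear. `Block -> Line` is listed before `Line` on purpose: the first pass cannot mark `Block`, so the second pass shows why the outer `check_again` loop is needed.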