}
}
-### Setting `can_eol` and `line_like`
+### Setting `line_like`
In order to be able to ignore newline tokens when not relevant, but
still include them in the parse when needed, we will need to know
newlines when there is an indent since the most recent start of a
line-like symbol.
-To know which symbols are line-like, we first need to know which
-symbols start with a NEWLINE token. Any symbol which is followed by a
-NEWLINE, or anything that starts with a NEWLINE, is deemed to be a line-like symbol.
-Certainly when trying to parse one of these we must take note of NEWLINEs.
+A `line_like` symbol is simply any symbol that can derive a NEWLINE.
+If a symbol cannot derive a NEWLINE, then it can only ever be part of a
+line, so it is word-like. If it can derive a NEWLINE, then we consider
+it to be like a line.
-Clearly the `TK_newline` token can start with a NEWLINE. Any symbol
-which is the head of a production that contains a starts-with-NEWLINE
-symbol preceeded only by nullable symbols is also a
-starts-with-NEWLINE symbol. We use a new field `can_eol` to record
-this attribute of symbols, and compute it in a repetitive manner
-similar to `set_nullable`.
-Once we have that, we can determine which symbols are `line_like` by
-seeing which are followed by a `can_eol` symbol in any production.
+Clearly the `TK_newline` token can derive a NEWLINE. Any symbol which
+is the head of a production whose body contains a `line_like` symbol,
+at any position, is also `line_like`. We use a new field `line_like`
+to record this attribute of symbols, and compute it by repeated passes
+over the productions until nothing changes, similar to `set_nullable`.
###### symbol fields
- int can_eol;
int line_like;
###### functions
- static void set_can_eol(struct grammar *g)
+ static void set_line_like(struct grammar *g)
{
int check_again = 1;
- g->symtab[TK_newline]->can_eol = 1;
+ g->symtab[TK_newline]->line_like = 1;
while (check_again) {
int p;
check_again = 0;
for (p = 0; p < g->production_count; p++) {
struct production *pr = g->productions[p];
int s;
- if (pr->head->can_eol)
+ if (pr->head->line_like)
continue;
for (s = 0 ; s < pr->body_size; s++) {
- if (pr->body[s]->can_eol) {
- pr->head->can_eol = 1;
+ if (pr->body[s]->line_like) {
+ pr->head->line_like = 1;
check_again = 1;
break;
}
- if (!pr->body[s]->nullable)
- break;
}
}
}
}
- static void set_line_like(struct grammar *g)
- {
- int p;
- for (p = 0; p < g->production_count; p++) {
- struct production *pr = g->productions[p];
- int s;
-
- for (s = 1; s < pr->body_size; s++)
- if (pr->body[s]->can_eol)
- pr->body[s-1]->line_like = 1;
- }
- }
-
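As a rough illustration of the fixed-point computation that the new `set_line_like` performs, here is a minimal, self-contained sketch on a toy grammar. The `toy_symbol` and `toy_production` structures, the `SYM_*` indices, and the grammar itself are hypothetical stand-ins invented for this example; they are not the real `struct grammar` or `struct production`. The productions are deliberately listed so that `Block` only becomes `line_like` on the second pass, which is why the loop must repeat until nothing changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the parsergen structures. */
enum { SYM_NEWLINE, SYM_IDENT, SYM_EXPR, SYM_STMT, SYM_BLOCK, SYM_COUNT };

struct toy_symbol {
	const char *name;
	int line_like;
};

struct toy_production {
	int head;		/* index into toy_syms */
	int body[4];		/* indices into toy_syms */
	int body_size;
};

static struct toy_symbol toy_syms[SYM_COUNT] = {
	{ "NEWLINE", 0 }, { "IDENT", 0 }, { "Expr", 0 },
	{ "Stmt", 0 }, { "Block", 0 },
};

/* Block -> Stmt Block | Stmt ;  Stmt -> Expr NEWLINE ;  Expr -> IDENT */
static const struct toy_production toy_prods[] = {
	{ SYM_BLOCK, { SYM_STMT, SYM_BLOCK }, 2 },
	{ SYM_BLOCK, { SYM_STMT }, 1 },
	{ SYM_STMT, { SYM_EXPR, SYM_NEWLINE }, 2 },
	{ SYM_EXPR, { SYM_IDENT }, 1 },
};

static void toy_set_line_like(void)
{
	size_t nprods = sizeof(toy_prods) / sizeof(toy_prods[0]);
	int check_again = 1;

	/* The NEWLINE token trivially derives a NEWLINE. */
	toy_syms[SYM_NEWLINE].line_like = 1;
	while (check_again) {
		size_t p;

		check_again = 0;
		for (p = 0; p < nprods; p++) {
			const struct toy_production *pr = &toy_prods[p];
			int s;

			assert(pr->body_size <= 4);
			if (toy_syms[pr->head].line_like)
				continue;
			/* Any line_like body symbol, at any position,
			 * makes the head line_like. */
			for (s = 0; s < pr->body_size; s++) {
				if (toy_syms[pr->body[s]].line_like) {
					toy_syms[pr->head].line_like = 1;
					check_again = 1;
					break;
				}
			}
		}
	}
}
```

After calling `toy_set_line_like()`, `NEWLINE`, `Stmt` and `Block` are flagged while `IDENT` and `Expr` are not, since neither of the latter can derive a NEWLINE in this toy grammar.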
### Building the `first` sets
When calculating what can follow a particular non-terminal, we will need to
For correct handling of `TK_newline` when parsing, we will need to
know which states (itemsets) can occur at the start of a line, so we
-will record a `starts_line` flag too.
+will record a `starts_line` flag too whenever DOT is at the start of a
+`line_like` symbol.
-Finally, for handling `TK_out` we need to know where production in the
+Finally, for handling `TK_out` we need to know whether productions in the
current state started *before* the most recent indent. A state
doesn't usually keep details of individual productions, so we need to
add one extra detail. `min_prefix` is the smallest non-zero number of
We also collect a set of all symbols which follow "DOT" (in `done`) as this
is used in the next stage.
-If any of these symbols are flagged as starting a line, then this
+If any of these symbols are flagged as `line_like`, then this
state must be a `starts_line` state so now is a good time to record that.
When itemsets are created we assign a precedence to the itemset from
g->symtab[s->num] = s;
set_nullable(g);
- set_can_eol(g);
set_line_like(g);
if (type >= SLR)
build_first(g);
if (!s)
continue;
- printf(" %c%c%c%3d%c: ",
+ printf(" %c%c%3d%c: ",
s->nullable ? '.':' ',
- s->can_eol ? '>':' ',
s->line_like ? '<':' ',
s->num, symtypes[s->type]);
prtxt(s->name);