From: NeilBrown <neilb@suse.de>
Date: Sat, 3 May 2014 21:55:03 +0000 (+1000)
Subject: parsergen: compute "can_eol" for each symbol.
X-Git-Tag: linebreakparser~4
X-Git-Url: https://ocean-lang.org/code/?a=commitdiff_plain;h=b3fdb1d9f081d8c034653b077e7aa337585356cd;p=ocean

parsergen: compute "can_eol" for each symbol.

A symbol is "can_eol" if it can derive a phrase which ends with a
newline token.
This will allow us to recognise line-like sections of code and
thus know when to ignore newlines and when not to.

Signed-off-by: NeilBrown <neilb@suse.de>
---
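Note (not part of the commit): a minimal, standalone sketch of the
fixed-point propagation that set_can_eol() performs, run on a toy
grammar. The names `struct sym`, `struct prod`, `toy_set_can_eol` and
`toy_demo` are hypothetical stand-ins rather than parsergen's real
structures, and `nullable` is set by hand here instead of being
computed by set_nullable().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for parsergen's symbol/production. */
struct sym {
	int nullable;	/* can derive the empty string (assumed precomputed) */
	int can_eol;	/* can derive a phrase ending in a newline token */
};

struct prod {
	struct sym *head;
	struct sym **body;
	int body_size;
};

/* Repeat over all productions until no head changes (a fixed point). */
void toy_set_can_eol(struct prod *prods, int count)
{
	int check_again = 1;

	assert(prods != NULL || count == 0);
	while (check_again) {
		check_again = 0;
		for (int p = 0; p < count; p++) {
			struct prod *pr = &prods[p];

			if (pr->head->can_eol)
				continue;
			/* Scan from the right, skipping trailing nullable symbols. */
			for (int s = pr->body_size - 1; s >= 0; s--) {
				if (pr->body[s]->can_eol) {
					pr->head->can_eol = 1;
					check_again = 1;
					break;
				}
				if (!pr->body[s]->nullable)
					break;
			}
		}
	}
}

/* Toy grammar; returns a bitmask: bit 0 = stmt, bit 1 = block, bit 2 = expr. */
int toy_demo(void)
{
	struct sym newline = { .nullable = 0, .can_eol = 1 };	/* the seed */
	struct sym semi = { 0, 0 }, opt = { .nullable = 1, .can_eol = 0 };
	struct sym expr = { 0, 0 }, stmt = { 0, 0 }, block = { 0, 0 };

	struct sym *block_body[] = { &stmt };			/* block -> stmt */
	struct sym *stmt_body[] = { &expr, &newline, &opt };	/* stmt -> expr NEWLINE opt */
	struct sym *expr_body[] = { &newline, &semi };		/* expr -> NEWLINE ';' */
	struct prod prods[] = {
		{ &block, block_body, 1 },
		{ &stmt, stmt_body, 3 },
		{ &expr, expr_body, 2 },
	};

	toy_set_can_eol(prods, 3);
	return stmt.can_eol | (block.can_eol << 1) | (expr.can_eol << 2);
}
```

In this toy grammar `block` is listed before `stmt`, so it is only
marked on the second pass, which is why the outer loop has to repeat.
`expr` stays unmarked because its last symbol is a non-nullable token
that cannot end in a newline.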

diff --git a/csrc/parsergen.mdc b/csrc/parsergen.mdc
index d2d70d5..fc14b91 100644
--- a/csrc/parsergen.mdc
+++ b/csrc/parsergen.mdc
@@ -837,6 +837,57 @@ changes happen.
 		}
 	}
 
+### Setting `can_eol`
+
+In order to be able to ignore newline tokens when not relevant, but
+still include them in the parse when needed, we will need to know
+which states can start a "line-like" section of code.  We ignore
+newlines when there is an indent since the most recent start of a
+line-like section.
+
+To know what is line-like, we first need to know which symbols can end
+a line-like section, which is precisely those which can end with a
+newline token.  These symbols don't necessarily always end with a
+newline, but they can.  Hence they are not described as "lines" but
+only "line-like".
+
+Clearly the `TK_newline` token can end with a newline.  Any symbol
+which is the head of a production that contains a line-ending symbol
+followed only by nullable symbols is also a line-ending symbol.  We
+use a new field `can_eol` to record this attribute of symbols, and
+compute it by iterating to a fixed point, much as `set_nullable` does.
+
+###### symbol fields
+	int can_eol;
+
+###### functions
+	static void set_can_eol(struct grammar *g)
+	{
+		int check_again = 1;
+		g->symtab[TK_newline]->can_eol = 1;
+		while (check_again) {
+			int p;
+			check_again = 0;
+			for (p = 0; p < g->production_count; p++) {
+				struct production *pr = g->productions[p];
+				int s;
+
+				if (pr->head->can_eol)
+					continue;
+
+				for (s = pr->body_size - 1; s >= 0; s--) {
+					if (pr->body[s]->can_eol) {
+						pr->head->can_eol = 1;
+						check_again = 1;
+						break;
+					}
+					if (!pr->body[s]->nullable)
+						break;
+				}
+			}
+		}
+	}
+
 ### Building the `first` sets
 
 When calculating what can follow a particular non-terminal, we will need to
@@ -1357,10 +1408,11 @@ changeover point in `first_nonterm`.
 		for (s = g->syms; s; s = s->next)
 			g->symtab[s->num] = s;
 
-		if (type >= SLR) {
-			set_nullable(g);
+		set_nullable(g);
+		set_can_eol(g);
+		if (type >= SLR)
 			build_first(g);
-		}
+
 		if (type == SLR)
 			build_follow(g);
 
@@ -1405,8 +1457,9 @@ set if that was generated.
 			if (!s)
 				continue;
 
-			printf(" %c%3d%c: ",
-			       s->nullable ? '*':' ',
+			printf(" %c%c%3d%c: ",
+			       s->nullable ? '.':' ',
+			       s->can_eol ? '>':' ',
 			       s->num, symtypes[s->type]);
 			prtxt(s->name);
 			if (s->precedence)