parsergen: avoid infinite loop on error.

author NeilBrown <neil@brown.name>

Sat, 10 Oct 2020 22:50:12 +0000 (09:50 +1100)

committer NeilBrown <neil@brown.name>

Sat, 10 Oct 2020 22:50:12 +0000 (09:50 +1100)
author NeilBrown <neil@brown.name>
Sat, 10 Oct 2020 22:50:12 +0000 (09:50 +1100)
committer NeilBrown <neil@brown.name>
Sat, 10 Oct 2020 22:50:12 +0000 (09:50 +1100)
diff --git a/csrc/parsergen.mdc b/csrc/parsergen.mdc

index 13b946fb0e992580c8c7b2d410a003415bb3eea3..cdb5f82204cb670d87d570e96177d52eb1aa5e41 100644 (file)
--- a/csrc/parsergen.mdc
+++ b/csrc/parsergen.mdc
@@ -2779,8 +2779,12 @@ accepted the input and can finish.
  We return whatever `asn` was returned by reducing production zero.
  
  If we can neither shift nor reduce we have an error to handle.  We pop
-single entries off the stack until we can shift the `TK_error` symbol, then
-drop input tokens until we find one we can shift into the new error state.
+single entries off the stack until we can shift the `TK_error` symbol,
+then drop input tokens until we find one we can shift into the new error
+state.  We need to ensure that something is dropped or shifted after an
+error, or we could get into an infinite loop, only shifting in
+`TK_error`, then reducing.  So we track if there has been a shift since
+the last error, and if not the next error always discards one token.
  
  When we find `TK_in` and `TK_out` tokens which report indents we need
  to handle them directly as the grammar cannot express what we want to
@@ -2824,11 +2828,11 @@ in the thing that preceed:
  Here the NEWLINE will be shifted because nothing can be reduced until
  the `if` is seen.
  
-When, during error handling, we discard token read in, we want to keep
+When during error handling we discard tokens read in, we want to keep
  discarding until we see one that is recognised.  If we had a full set
-of LR(1) grammar states, this will mean looking in the look-ahead set,
+of LR(1) grammar states, this would mean looking in the look-ahead set,
  but we don't keep a full look-ahead set.  We only record the subset
-that leads to SHIFT.  We can, however, deduce the look-ahead set but
+that leads to SHIFT.  We can, however, deduce the look-ahead set by
  looking at the SHIFT subsets for all states that we can get to by
  reducing zero or more times.  So we need a little function which
  checks if a given token is in any of these look-ahead sets.
@@ -2860,6 +2864,7 @@ checks if a given token is in any of these look-ahead sets.
                 struct parser p = { 0 };
                 struct token *tk = NULL;
                 int accepted = 0;
+               int shift_since_err = 1;
                 void *ret = NULL;
  
                 shift(&p, TK_eof, 0, 1, NULL, states);
@@ -2931,6 +2936,7 @@ checks if a given token is in any of these look-ahead sets.
                                         goto force_reduce;
                         }
                         if (shift(&p, tk->num, 0, tk->num == TK_newline, tk, states)) {
+                               shift_since_err = 1;
                                 tk = NULL;
                                 parser_trace_action(trace, "Shift");
                                 continue;
@@ -2995,11 +3001,22 @@ checks if a given token is in any of these look-ahead sets.
                                 // no state accepted TK_error
                                 break;
                         }
+                       if (!shift_since_err) {
+                               /* must discard at least one token to avoid
+                                * infinite loop.
+                                */
+                               if (tk->num == TK_eof)
+                                       break;
+                               free(tk);
+                               tk = tok_copy(token_next(tokens));
+                       }
+                       shift_since_err = 0;
                         tos = &p.stack[p.tos-1];
                         while (!in_lookahead(tk, states, tos->state) &&
                                tk->num != TK_eof) {
                                 free(tk);
                                 tk = tok_copy(token_next(tokens));
+                               shift_since_err = 1;
                                 if (tk->num == TK_in)
                                         indents += 1;
                                 if (tk->num == TK_out) {
author	NeilBrown <neil@brown.name>
	Sat, 10 Oct 2020 22:50:12 +0000 (09:50 +1100)
committer	NeilBrown <neil@brown.name>
	Sat, 10 Oct 2020 22:50:12 +0000 (09:50 +1100)