X-Git-Url: https://ocean-lang.org/code/?p=ocean;a=blobdiff_plain;f=csrc%2Fparsergen.mdc;h=cdb5f82204cb670d87d570e96177d52eb1aa5e41;hp=13b946fb0e992580c8c7b2d410a003415bb3eea3;hb=1e17521c96f4116042cdaf3cf2aac40284f6dec8;hpb=2594d150e123c29949493c37c8b9c14aa6abe5a4 diff --git a/csrc/parsergen.mdc b/csrc/parsergen.mdc index 13b946f..cdb5f82 100644 --- a/csrc/parsergen.mdc +++ b/csrc/parsergen.mdc @@ -2779,8 +2779,12 @@ accepted the input and can finish. We return whatever `asn` was returned by reducing production zero. If we can neither shift nor reduce we have an error to handle. We pop -single entries off the stack until we can shift the `TK_error` symbol, then -drop input tokens until we find one we can shift into the new error state. +single entries off the stack until we can shift the `TK_error` symbol, +then drop input tokens until we find one we can shift into the new error +state. We need to ensure that something is dropped or shifted after an +error, or we could get into an infinite loop, only shifting in +`TK_error`, then reducing. So we track if there has been a shift since +the last error, and if not the next error always discards one token. When we find `TK_in` and `TK_out` tokens which report indents we need to handle them directly as the grammar cannot express what we want to @@ -2824,11 +2828,11 @@ in the thing that preceed: Here the NEWLINE will be shifted because nothing can be reduced until the `if` is seen. -When, during error handling, we discard token read in, we want to keep +When during error handling we discard tokens read in, we want to keep discarding until we see one that is recognised. If we had a full set -of LR(1) grammar states, this will mean looking in the look-ahead set, +of LR(1) grammar states, this would mean looking in the look-ahead set, but we don't keep a full look-ahead set. We only record the subset -that leads to SHIFT. We can, however, deduce the look-ahead set but +that leads to SHIFT. We can, however, deduce the look-ahead set by looking at the SHIFT subsets for all states that we can get to by reducing zero or more times. So we need a little function which checks if a given token is in any of these look-ahead sets. @@ -2860,6 +2864,7 @@ checks if a given token is in any of these look-ahead sets. struct parser p = { 0 }; struct token *tk = NULL; int accepted = 0; + int shift_since_err = 1; void *ret = NULL; shift(&p, TK_eof, 0, 1, NULL, states); @@ -2931,6 +2936,7 @@ checks if a given token is in any of these look-ahead sets. goto force_reduce; } if (shift(&p, tk->num, 0, tk->num == TK_newline, tk, states)) { + shift_since_err = 1; tk = NULL; parser_trace_action(trace, "Shift"); continue; @@ -2995,11 +3001,22 @@ checks if a given token is in any of these look-ahead sets. // no state accepted TK_error break; } + if (!shift_since_err) { + /* must discard at least one token to avoid + * infinite loop. + */ + if (tk->num == TK_eof) + break; + free(tk); + tk = tok_copy(token_next(tokens)); + } + shift_since_err = 0; tos = &p.stack[p.tos-1]; while (!in_lookahead(tk, states, tos->state) && tk->num != TK_eof) { free(tk); tk = tok_copy(token_next(tokens)); + shift_since_err = 1; if (tk->num == TK_in) indents += 1; if (tk->num == TK_out) {