review/summary.
We track indents and newlines. The goal is to resolve ambiguities and detect errors.
Ambiguities are resolved by forcing a REDUCE in some circumstances when an OUT or NEWLINE is seen.
- Errors happen when an there are too many OUTs.
+ Errors happen when there are too many OUTs.
NEWLINEs are a normal part of the grammar, except that they are sometimes ignored when they are not relevant.
and are protected.
seen so far is empty. So much like my "non_empty"
OK - much easier to get it right once I've thought it through :-)
+
+13feb2021
+
+ This isn't quite working how I had hoped :-(
+ The "EOL SOL" pair, or more precisely the "SOL else" pair, suggests I need a look-ahead
+ of 2 to recognise whether I have an IfSuffix or not.
+ But I know an LR(2) grammar can be rewritten as LR(1). (Did I learn that in uni?)
+ How can I do that?
+
+ Statementlist -> SOL SimpleStatements EOL Statementlist
+ | SOL IfHead EOL Statementlist
+ | SOL IfHead IfSuffix Statementlist
+ | SOL IfHead EOL SOL IfSuffix Statementlist
+ |
+ So if we see EOL SOL we can wait for else, which leads to IfSuffix, or
+ something else for StatementList.
+ But I don't want to allow StatementList to be empty. I can achieve this
+ by duplicating the above for a StatementList_nonempty. A bit ugly.
+
+ Also, this is right-recursive which uses a lot of stack.
+ I can compress it a bit by making an IfStat include the following statement:
+ SL -> Stat | SL Stat
+
+ Stat -> SOL SimpList EOL
+ | IfX Stat
+ | IfX SOL IfSuffix
+ | SOL IfHead IfSuffix
+
+ IfX -> SOL IfHead EOL
+ IfHead -> if Expr Block
+ IfSuffix -> else Block
+ | else IfHead
+ | else IfHead IfSuffix
+ | else IfHead EOL SOL IfSuffix
+ | else IfHead EOL Stat
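The stack cost of right recursion can be illustrated with a toy simulation. This is not a real LR parser: each Stat is treated as one already-reduced symbol, and the right-recursive form SL -> Stat SL | Stat is assumed for comparison. The function counts the deepest parse stack reached for n statements.

```python
def max_stack_depth(n_statements: int, left_recursive: bool) -> int:
    """Toy shift-reduce run over n 'Stat' symbols; returns the deepest stack.

    Left-recursive:  SL -> Stat | SL Stat   (reduce after every Stat)
    Right-recursive: SL -> Stat SL | Stat   (nothing reduces until the end)
    """
    stack: list[str] = []
    deepest = 0
    for _ in range(n_statements):
        stack.append("Stat")                      # shift
        deepest = max(deepest, len(stack))
        if left_recursive:
            if stack[-2:] == ["SL", "Stat"]:
                stack[-2:] = ["SL"]               # reduce SL Stat -> SL
            else:
                stack[-1:] = ["SL"]               # reduce Stat -> SL
    if not left_recursive and stack:
        stack[-1:] = ["SL"]                       # reduce the last Stat -> SL
        while len(stack) > 1:
            stack[-2:] = ["SL"]                   # reduce Stat SL -> SL
    return deepest


print(max_stack_depth(1000, left_recursive=True))   # 2
print(max_stack_depth(1000, left_recursive=False))  # 1000
```

With left recursion the stack stays at two symbols however long the list is. With right recursion it grows with the number of statements.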
+
+
+ Getting there... (again).
+ Problem:
+ if cond1:
+     if cond2:
+         stat1
+     else:
+
+ The 'else' pairs with cond2.
+ There is an EOL after "if cond2: stat1" and then "SOL else"
+ which looks just the same as
+ if cond1:
+     if cond2:
+         stat1
+   else:
+
+ The only difference is an extra OUT IN which we currently ignore.
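The OUT IN can be reproduced with a small model of an indent-tracking scanner. This is an assumption about the scanner, not its actual code. It emits SOL and EOL around each line. It emits an IN when a line is indented further than the current level. When a line is dedented, it emits one OUT per level closed, followed by an IN if the new column lies between two open levels. The problem case is assumed to put the else at such an in-between column.

```python
def indent_tokens(lines: list[tuple[int, str]]) -> list[str]:
    """Tokenise (indent, text) pairs into SOL/EOL/IN/OUT plus the words.

    Words must be space-separated so that str.split() finds them.
    """
    levels = [0]                # stack of open indentation columns
    tokens: list[str] = []
    for indent, text in lines:
        while indent < levels[-1]:
            levels.pop()        # close each level we have dedented past
            tokens.append("OUT")
        if indent > levels[-1]:
            levels.append(indent)   # deeper, or an odd in-between column
            tokens.append("IN")
        tokens.append("SOL")
        tokens.extend(text.split())
        tokens.append("EOL")
    while len(levels) > 1:      # close anything still open at end of input
        levels.pop()
        tokens.append("OUT")
    return tokens


ok = indent_tokens([(0, "if cond1 :"), (4, "if cond2 :"), (8, "stat1"), (4, "else :")])
odd = indent_tokens([(0, "if cond1 :"), (4, "if cond2 :"), (8, "stat1"), (2, "else :")])
print(ok[13:])   # ['stat1', 'EOL', 'OUT', 'SOL', 'else', ':', 'EOL', 'OUT']
print(odd[13:])  # ['stat1', 'EOL', 'OUT', 'OUT', 'IN', 'SOL', 'else', ':', 'EOL', 'OUT']
```

Apart from the extra OUT IN, both streams present the same "EOL ... SOL else" to the parser.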
+
+ How can I use the OUT?
+ I have
+ SOL IfHead EOL .... OUT IN SOL
+ and I need the OUT to tell me to Reduce, or to block the Shift of SOL.
+ But if I simply block Shift when I have an OUT, the SOL IfHead EOL
+ becomes a Statement which is merged into the StatementList and then
+ the SOL is Shifted. I need to go all the way to make that Statementlist
+ a Block and IfHead.
+ If I hold on to the OUT longer, until reduce_size != 1,
+ I get further, but
+ IfHead else IfHead .... EOL
+ cannot shift the EOL
+
+ Maybe I need to use min_prefix, but I really don't like that.
+ Need to think this through.
+
+ Well, I have it working.
+
+ I suppress shift if there are OUTs, EXCEPT for TK_eol. Why?
+ Also I use the Bstatementlist indirection
+ and don't cancel the out if reduce_size==1
+
+ It's a bit clunky. Can I justify it?
+
+ I'd like the tokens to be different. With
+ if cond:
+     st
+ else:
+
+ The SOL before the else is ignored because we don't expect SOL there.
+ Trouble is in the problem case, SOL doesn't get ignored until later.
+
+ Can I *only* prevent a shift of SOL when it is unbalanced?
+
+ So: prevent shift of SOL if there is an uncancelled out, otherwise it will
+ be assumed to be at the wrong level.
+ Better, but not completely happy...
+
+14feb2021 Valentine's Day
+
+ What if the rule for cancelling indents was that the cancel couldn't cross
+ a starts-line state. How would that work out?
+
+15feb2021
+ I didn't have time to pursue that, and now I'm a lot less convinced.
+
+ New idea: Allow IN and OUT in the grammar, and selectively ignore them
+ like we do with SOL EOL.
+ That way, OUT could force a reduce which could not then be extended, so the
+ whole issue of recursive productions becomes moot.
+
+ When are indents relevant? Maybe we have starts-block states which
+ expect IN, and we ignore IN if there has been an indent since the last
+ starts-block state.
+ So
+ block -> : IN statementlist OUT
+ | : simplestatements
+ would ignore IN until we hit the :, then IN becomes relevant.
+ If we don't see an IN it must be simplestatements. Do we allow IN
+ therein? Probably not. It would look confusing.
+ But if we get an IN, then we start ignoring INs again.
+
+ The OUT absolutely must balance the IN, so we ignore OUT whenever the matching
+ IN was ignored.
+
+ We still refuse to skip OUT if the matching IN is too far away. Must be in top
+ frame.
+
+ Clarify handling of OUT when the IN was ignored...
+ A linelike production that started before the IN must not reduce until
+ after the OUT???
+
+ Any production that started after the IN must reduce before the OUT.
+ We don't force it to reduce, we flag an error.
+ So if we reduce some symbols which contain more OUT than IN, that is
+ an error
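The reduce-time check can be sketched as follows. The frame layout is an assumption made for illustration, and the names (Frame, reduce_frames) are invented here. Each frame records the nett indent (IN minus OUT) inside its symbol. It also records the nett count of ignored IN/OUT seen after the symbol, before the next one. A reduction folds the popped symbols, and the gaps between them, into the new symbol. It flags an error rather than forcing a reduce.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    symbol: str
    inside: int = 0    # nett IN minus OUT contained within the symbol
    pending: int = 0   # nett IN minus OUT seen after it, before the next symbol


def reduce_frames(stack: list[Frame], n: int, head: str) -> Frame:
    """Replace the top n frames (n >= 1) with one frame for `head`."""
    popped = stack[-n:]
    # Indents inside the symbols plus those in the gaps *between* them;
    # the gap after the last symbol stays outside the new symbol.
    inside = sum(f.inside for f in popped) + sum(f.pending for f in popped[:-1])
    if inside < 0:
        raise SyntaxError(f"{head} would contain more OUT than IN")
    del stack[-n:]
    frame = Frame(head, inside, popped[-1].pending)
    stack.append(frame)
    return frame


# "a = b +" then an indented continuation "c": ignored IN after '+',
# ignored OUT after 'c'.
s = [Frame("a"), Frame("="), Frame("b"), Frame("+", pending=1), Frame("c", pending=-1)]
print(reduce_frames(s, 3, "Expr"))   # Frame(symbol='Expr', inside=1, pending=-1)
s.append(Frame("EOL"))
print(reduce_frames(s, 4, "Stat"))   # Frame(symbol='Stat', inside=0, pending=0)
```

A production that starts after an ignored IN but swallows its OUT ends up with a negative count, and is reported.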
+
+17feb2021
+ I need to track in/out carefully so they match properly and I ignore the right
+ OUTs.
+ IN is ignored whenever SOL/EOL would be. OUT is ignored precisely when the matching
+ IN was ignored.
+ I also want to track all ins and outs until they cancel in a reduction.
+ It is only at the reduction step that we can determine if an error occurred.
+ An error is when a symbol contains nett negative indent.
+ So we can just count indents in each symbol.
+ Some in/out are within symbols, possibly IN and OUT. Others which are ignored
+ exist between symbols. A frame holds (symbol+internal indents),(state+pending indents).
+ To track which OUT to ignore we need a depth count and a bit-set.
+ If a bit is set, then the IN was ignored so the OUT must be too.
+ If clear, the IN was shifted, so the OUT must be too.
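The depth-plus-bit-set bookkeeping is small enough to sketch. The Python below is illustrative only, and its names are invented here. Bit d records whether the IN opened at depth d was ignored, so the matching OUT can look up its fate. Python integers are unbounded; a C version would need a fixed maximum depth.

```python
class IndentTracker:
    """Depth count plus bit-set: bit d is set if the IN at depth d was ignored."""

    def __init__(self) -> None:
        self.depth = 0
        self.ignored = 0

    def push_in(self, was_ignored: bool) -> None:
        bit = 1 << self.depth
        if was_ignored:
            self.ignored |= bit
        else:
            self.ignored &= ~bit    # clear any stale bit left by an earlier IN
        self.depth += 1

    def pop_out(self) -> bool:
        """Match an OUT to its IN; True means the OUT must be ignored too."""
        if self.depth == 0:
            raise SyntaxError("OUT with no matching IN")
        self.depth -= 1
        return bool(self.ignored & (1 << self.depth))


t = IndentTracker()
t.push_in(was_ignored=True)    # e.g. an IN on a continuation line
t.push_in(was_ignored=False)   # an IN that opens a real block
print(t.pop_out())  # False: the inner OUT is shifted
print(t.pop_out())  # True: the outer OUT is ignored
```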
+
+ I need to get indents_on_line right.
+ Previously I tracked them before this frame. I don't know why...
+ I want 0 when starts_line
+
+19feb2021
+ OK, new approach is looking really good. Need to make sure it isn't too hard
+ to use.
+ Tricky area is multi-line statements that don't *have* to be multi-line.
+
+ We cannot reduce "SOL IfHead EOL" to a statement as we cannot tell if it
+ is complete until we shift the SOL and look for an "else".
+ One option is "statement -> SOL IfHead EOL statement | SOL IfHead EOL IfTail"
+ So "statement" is really a sublist of statements.
+ Easy in indent_test, what about in ocean?
+
+ There are lots of parts that can be on a line:
+ if, else, for, then, while, do, switch, case
+
+ if and while can be "expr block", or "block" followed by the thenpart/dopart
+ else can be "block" or "statement"
+ then is optional in for, required in some if forms
+
+ ifpart -> if expr block | if block then block | if block EOL SOL then block
+
+ OR??
+
+ ifpart -> if expr block EOL SOL | if block then block EOL SOL...
+
+ What if I support backtracking over terminals? So if I cannot shift
+ and cannot reduce, I back up until I can reduce, then do so?
+
+ Then I can shift the SOL and if there is an else, I'm good. If not I back up
+ and reduce the statement
+ So
+ statement -> SOL simple EOL
+ | SOL ifhead EOL
+ | SOL ifhead EOL SOL elsepart EOL
+ | SOL ifhead elsepart EOL
+ would work.
+ But do I need it?
+
+ statement -> simple EOL
+ | ifhead EOL
+ | ifhead EOL SOL statement
+ | ifhead EOL SOL iftail
+ | whilepart
+ | forhead whilepart
+ | switchead casepart
+
+
+ ifhead -> if block then block | if expr block | if block EOL SOL then block
+ iftail -> else block | else statement
+
+ whilehead -> while expr block | while block EOL SOL do block | while block do block
+ whilepart -> whilehead EOL
+ | whilehead EOL SOL statement
+ | whilehead casepart
+ | whilehead EOL SOL casepart
+
+ casepart -> casehead casepart
+ | casehead EOL SOL casepart
+ | casehead EOL SOL statement
+ | iftail
+ casehead -> case expr block
+
+22feb2021
+ I've had a new idea - let's drop SOL! Now that I have IN, it isn't really needed.
+ We can assume SOL follows EOL or IN .... maybe.
+ Problem is if we want to require IN/OUT around something that is not line-oriented.
+ Might that ever matter?
+ No, I don't think so.
+
+23feb2021
+ Maybe this makes it really really easy.
+ We don't mark different sorts of states, and we only track which indents were
+ 'ignored'.
+
+ Then:
+ IN never causes a reduction, it is either shifted or ignored.
+ An EOL is ignored if the most recent IN was ignored, otherwise it is a normal
+ token.
+ An OUT is similarly ignored if the matching indent was ignored. It also
+ cancels that indent.
+
+ Is that too easy?
+
+ .... no, it seems to work.
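The three rules can be written as a token filter. This is a sketch, not the parser itself, and it rests on two assumptions. First, `expects_in` is a hypothetical stand-in for asking the current parser state whether it can shift an IN. Second, the example streams assume the scanner defers the EOL that would end the line before an IN until after the matching OUT.

```python
from typing import Callable


def filter_indents(tokens: list[str], expects_in: Callable[[str], bool]) -> list[str]:
    """Drop the IN/OUT/EOL tokens that the 23feb2021 rules say to ignore."""
    open_ins: list[bool] = []   # one entry per unmatched IN: True if ignored
    kept: list[str] = []
    prev = ""                   # last token that was not IN, OUT or EOL
    for tok in tokens:
        if tok == "IN":
            # IN never forces a reduction: it is either shifted or ignored.
            ignored = not expects_in(prev)
            open_ins.append(ignored)
            if not ignored:
                kept.append(tok)
        elif tok == "OUT":
            if not open_ins:
                raise SyntaxError("OUT with no matching IN")
            # Cancels its IN, and is ignored exactly when that IN was.
            if not open_ins.pop():
                kept.append(tok)
        elif tok == "EOL":
            # Ignored while the most recent open IN is an ignored one.
            if not (open_ins and open_ins[-1]):
                kept.append(tok)
        else:
            kept.append(tok)
            prev = tok
    return kept


def after_colon(prev: str) -> bool:
    return prev == ":"          # hypothetical: only ':' starts a block


block = ["if", "c", ":", "IN", "s1", "EOL", "s2", "EOL", "OUT", "EOL"]
cont = ["a", "=", "b", "+", "IN", "c", "EOL", "OUT", "EOL"]
print(filter_indents(block, after_colon) == block)  # True: nothing is ignored
print(filter_indents(cont, after_colon))            # ['a', '=', 'b', '+', 'c', 'EOL']
```

A block keeps its IN, EOL and OUT tokens. A continuation line collapses back into a single line ending in one EOL.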
+
+ So: back to the ocean grammar
+
+ statement -> simple EOL
+ | ifhead EOL
+ | ifhead EOL iftail
+ | whilepart
+ | forhead whilepart
+ | switchead casepart
+
+
+ ifhead -> if block then block | if expr block | if block EOL then block
+ iftail -> else block EOL | else statement
+
+ whilehead -> while expr block | while block EOL do block | while block do block
+ whilepart -> whilehead EOL
+ | whilehead casepart
+ | whilehead EOL casepart
+
+ casepart -> casehead casepart
+ | casehead EOL casepart
+ | casehead EOL
+ | iftail
+ casehead -> case expr block
+
+
+24feb2021
+ Hmmm. awkwardness.
+ An ifpart can be "if expr then simple ;"... no it cannot...
+ But the problem was that some forms of a head with an optional tail
+ must end EOL, other forms need not.
+ But the whole must end EOL.
+
+ So: do we put EOL at the end of 'statement' or at the end of IfSuffix?
+
+ Let's try assuming it is at the end of 'statement'
+ So IfSuffix can assume an EOL follows
+ So CondStatement can too
+ So an ifhead either 'may' or 'must' be followed by an EOL.
+ If may, it is followed by IfSuffix which is empty, or starts OptEOL
+ If must, it is followed by empty or
+ No.. this isn't working for me.
+
+ Let's try assuming that a CondStatement ends with an EOL.
+ So an IfSuffix must too, and it cannot be just EOL.
+ If an ifhead must be followed by EOL, what follows is either EOL or EOL IfSuffix.
+ If it may be, then EOL or IfSuffix.
+
+
+ ForPart ThenPart SwitchPart are ALWAYS followed by something, so can end
+ EOL or not, as suits
+ WhilePart IfPart CasePart might be the last thing so each option must
+ end with a SuffixEOL which ends with EOL or SuffixOpt which might not
+
+ What do I want to do about
+ : SimpleStatements
+
+ It is useful for
+ case value : statement
+ and maybe even
+ if cond : statement
+ though for the latter I can use 'then'.
+ For 'else' I don't need the ':', but it wouldn't hurt.
+
+ Problem is: do I insist on a trailing newline or ';'?
+ If I don't then
+ case foo: bar case bar: baz
+ would be legal, but hard to read, as would
+ if cond : stat1 else stat2
+ which is probably error-prone.
+
+ But do I want
+ switch expr
+     case val1: st1
+     case val2: st2
+     else: st3
+
+ That looks like an indented block, but is really indented lines.
+ So it is probably a mistake.
+ So allow switch expr : or ';' at the end
+
+ Whatever happens after "switch expr" must work after "while expr block"
+
+ So....
+ If first case is not indented, none of them may be
+ If the first is, the cases are inside an IN/OUT block, so again all the same.
+
+ Can I implement that? Can I have IN after a non-terminal somehow?
+ When I see an IN, I could reduce as long as go_to_cnt == 0.
+ That might help after an OUT, but not after EXPR.
+
+ Or: look at next symbol. If it can be shifted, we ignore the IN.
+ If not, we reduce and try to shift the IN again.
+
+ Also: need to mark IN as ignored when popped off during error recovery,
+ and maintain stack when discarding during error recovery
+
+26feb2021
+ Syntax for blocks?
+ { IN statements OUT }
+ { simplestatements }
+ : IN Statements OUT
+
+ but what about
+ : simplestatements NL .... or ';'
+
+ In other contexts I have
+ for simple; statements; then simple ; statements ; while expr:
+
+ I currently require a ';' or newline before "then" or "while"
+
+ Interesting other cases are:
+
+ case expr : simplestatements
+ while expr : simplestatements
+
+ For 'if' I currently have "if expr then simplestatements"
+
+ Because of 'for' and 'then' I don't want to require ':' before simplestatements.
+ I could have
+ while expr do simplestatements
+ But what do I do for 'case' ??? I really want the ':' there.
+ So I should use it for 'if' and 'while'
+ 'for' could be followed immediately by IN, as could then and even if/while
+ So the ':' comes after an expression.
+
+27feb2021
+ Problems with the idea of only using : to come after an expression.
+ 1/ "else" looks wrong compared to Python, but maybe I can get used to that
+ 2/ with "for" it would be simple statements, with "while" it would be expr
+ if there was no indent. Do I need different things to look different?
+ If statements always follow ':', then "for" and "then" always need a ':'
+ for: a=1; then: a = a+1; while a < 10:
+
+ In C there is no difference, but I want a difference.
+
+03mar2021
+ Argh... I'm not struggling with parsing concepts this time; I'm struggling with code.
+ I want to add an "EOL" symbol to the grammar as a special terminal.
+ It is like "NEWLINE", but handled a bit differently.
+
+ In parsergen it is just another terminal symbol, but it mustn't get added
+ to the "known" list. Currently all terminals from TK_reserved are added
+ to "known". Maybe if I give it a number that is after the virtual symbols