X-Git-Url: https://ocean-lang.org/code/?p=ocean-D;a=blobdiff_plain;f=twod;h=251f794b95d73de42239d48cfb34903d28e1efdc;hp=e6db5cd230a0f119ba4b9fdaac620acb901d6b5c;hb=509e0c8cd5c64e608032aa95a51dd5096185c338;hpb=18d02a36152f7ab7325e4b82a03d885a6a3bc56c

diff --git a/twod b/twod
index e6db5cd..251f794 100644
--- a/twod
+++ b/twod
@@ -2896,7 +2896,7 @@ ifstatement -> ifhead iftail
 review/summary.
    We track indents and newlines.  The goal is resolve ambiguities and detect errors.
    Ambiguities are resolved by forcing a REDUCE in some circumstances when an OUT or NEWLINE is seen.
-   Errors happen when an there are too many OUTs.
+   Errors happen when there are too many OUTs.
    NEWLINEs are a normal part of a grammar, except that they get ignored
    sometimes when they are not relevant.
    and are protected.
@@ -2991,3 +2991,708 @@ ifstatement -> ifhead iftail
    Maybe a state should only be startline if the core item has dot followed
    by a single symbol (which can derive a newline) ??
+
+27jan2021
+   I need a new idea concerning starts-line states.  I need some refinement somehow
+   The state
+      block -> { statementlist . }
+   should ignore newlines - providing statementlist isn't recursive - but doesn't
+   because
+      block -> { . statementlist }
+   is further up the stack, and that is a startsline state
+
+   Maybe the thing is that the latter is startsline only because of statementlist, and
+   now that statementlist is gone, the startsline-ness lapses.
+
+   So in the former state, it is not startsline, and it is not terminal, so it
+   suppresses a startsline state 2 levels up.
+
+   But does that help?  We would suppress the startsline-ness but there are
+   no remaining indents to ignore the newline.
+   Why can I ignore a newline in "if cond { st }" but not in "a = ( x )"
+   ??
+   Ahhh.  This helps because the new top startsline would be at the start
+   of a line, so newlines can be shifted.  The grammar can explicitly
+   allow a newline there... only then the state becomes a startsline
+   state?? or does it?
+   But it is the top state, so it doesn't matter.
+
+   Rule: a NEWLINE cannot be SHIFTed if the topmost active startsline state
+   is not at the start of a line non-indented.  This is because newline
+   must be meant to end a line started earlier - where starts-line was at
+   the beginning of a line.
+   The top state is never "active" as the line it would start hasn't
+   actually started.  If the shifted newline reduces immediately, the
+   grammar is probably broken.
+   Also a state is inactive if a subsequent state declares it to be.  This
+   happens when a state is non-terminal (not reducible), and is not startsline.
+   The smallest prefix length of all core items indicates how many
+   preceding states are deactivated.  If min-prefix is N, then N-1 starts
+   are deactivated.
+
+
+   So what do I need to code:
+   - I need to record with each state how far back it suppresses
+     start-line states.
+   - enhance test for shifting newline
+
+30jan2021
+   OK, the parsing code seems to do what I want, now I need to fix the grammar.
+   The context is structure statements which contain lines. e.g.
+      if cond:
+         statements
+      else:
+         statements
+
+   The "if cond: statements" is a whole line so it looks like a statement.
+   But then we see "else" which isn't the start of a statement.
+   I've considered two avenues.
+   1/ decide that "else: statements" is a valid statement and generate errors
+      in the semantic analysis if the preceding statement doesn't like the else.
+   2/ enumerate all the possibilities to the grammar as 1 or more lines.
+      ifstatement -> ifline | ifheadline elseline ...
+      But that seems problematic with cascaded "else if"
+
+   So let's try avenue 1.  "else block" and "else ifstatement" are statements.
+
+03feb2021
+   indent_test seems to work, now trying to convert ocean.
+   My plan is that the various parts of a condstatement can either be
+   all on one "line", or some of them on their own lines.
+   The parts are:
+
+      for then while do case* else
+      switch case* else
+      if then else
+
+   a for,while,switch,if,do can start a statement
+   and this determines what other parts are allowed.
+   So we need to allow continuations of
+
+      after for
+         then? while case* else?
+      after while
+         do* case* else?
+      after switch
+         case* else?
+      after if
+         then? else?
+      after do
+         -nothing
+
+
+   But wait... what happens with "else"?
+   I want to allow "else" to be followed by a CondStatement so
+      if cond:
+         stuff
+      else if cond:
+         stuff
+
+   works.  I guess there is not much of an issue there: the 'else' becomes an
+   optional prefix to a condstatement.
+   Calling var_block_close at the right time might be awkward as we don't
+   know when we are parsing the end of a CondStatement.
+
+   Pause and reflect: what is the problem we are trying to solve, and does
+   it still apply?
+
+   The problem is newlines.  When we see one we don't know whether to
+   reduce to a Statement or just to an (e.g.) IfPart.
+   We would need to allow several Newlines while staying at IfPart.
+   Then if we see 'else' we shift that, otherwise reduce to Statement
+
+      ifstatement -> ifhead elsepart
+                   | ifheadnl elsepart
+                   | ifheadnl
+
+
+   But wait... indent_test is broken!!
+   If I indent the 'else' one space, it looks like an ElseStatement after
+   the Statementlist that should be closed - but is recursive.
+   I can change it to a BStatementlist, but there is nothing to force that
+   to reduce.  We prevent shifting until the outdent is cleared, but that
+   happens with the Statementlist.  Maybe don't clear the outdent if the
+   top symbol state had a reduce-length of 1.??
+
+   OK.. that's fixed.  Let's get back to the bigger problem.
+
+   A statement can be:
+      ->
+       | simplestatements NEWLINEs
+       | IfHeadNL
+       | IfHead IfSuffixNL
+       | IfHeadNL IfSuffixBL
+       | SwitchPart CondSuffixNL
+       | SwitchPartNL CondSuffixNL
+       | WhilePart CondSuffixNL
+       | WhilePartNL CondSuffixNL
+       | ForPart WhilePart CondSuffixNL
+       | ForPart WhilePartNL CondSuffixNL
+       | ForPartNL WhilePart CondSuffixNL
+       | ForPartNL WhilePartNL CondSuffixNL
+
+   ... and some for ThenPart and ThenPartNL
+
+   ForPart -> for simplestatements
+            | for Block
+   ForPartNL -> ForPart NEWLINE
+              | ForPartNL NEWLINE
+   IfHeadNL -> IfHead NEWLINE
+             | IfHeadNL NEWLINE
+   IfSuffixNL -> IfSuffix NEWLINE
+               | else Block NEWLINE
+               | else statement
+   SwitchPart -> switch Expr
+               | switch Block
+   SwitchPartNL -> SwitchPart NEWLINE
+                 | SwitchPartNL NEWLINE
+   CondSuffixNL -> IfSuffixNL
+                 | CasePart CondSuffixNL
+                 | CasePartNL CondSuffixNL
+
+   CasePart -> case Expr Block
+   CasePartNL -> CasePart NEWLINE
+               | CasePartNL NEWLINE
+
+05feb2021
+
+   Above looks promising but doesn't quite work.
+   The "statement" after an "else" must be "statementNONL" because no
+   further newline is expected, but even then it isn't quite right
+
+      if expr1:
+           stat1
+        else if cond2:
+           stat2
+
+   scans as: if expr1 : IN stat1 NL OUT IN else if cond2 : IN stat2 NL OUT NL OUT NL
+
+   whereas
+      if expr1 :
+           stat1
+        else if cond2: stat2
+
+   scans as: if expr1 : IN stat1 NL OUT IN else if cond2 : stat2 NL OUT NL
+
+   In both cases there are more NLs than things that need to be ended.
+   We always want a NL for the starting 'if', and in the first case we need a NL
+   for 'stat2'.  I wonder what that means.
+
+   Separately
+
+      if cond block else block NL
+
+   because the state before 'else' is startsline the NEWLINE cannot be shifted.
+   That seems to mean the NEWLINE must be in the production that starts the line,
+   so "CasePartNL" etc cannot be used.....
+
+   Bingo(??)
+   I change each statement type to be a FooNL, or list thereof, with
+      FooNL -> stuff and nonsense NEWLINE
+             | FooNL NEWLINE
+
+   But what about that extra NL .... which now seems not to be a problem
+
+   Ah-ha.  The second (of 3) is ignored because it is indented.  All good (for now).
+
+06feb2021
+   The longest multi-line thing is
+      For Then While Do Case... Else
+
+   Each can be on a new line, or on the previous line.
+   How can Case be handled?  I guess they all need to be the same.
+
+   What about
+      if cond1:
+         stat1
+      else if cond2:
+         stat2
+      else if cond3....
+
+   ??? That looks awkward.
+
+   Can I have
+      For -> ForPart
+           | For NEWLINE
+   ??
+   I should test and see. ... I don't think so.  At least not without more
+   smarts for newline handling.
+
+   So back to
+      For Then While Do Case... Else NEWLINE
+
+   Other forms are
+
+      ForNL Then While Do Case... Else
+      ForNL ThenNL While Do Case... Else
+      ForNL ThenNL WhileNL Do Case... Else
+      For Then While Do Case... Else
+      For Then While Do Case... Else
+      For Then While Do Case... Else
+      For Then While Do Case... Else
+
+   more than 64 combinations....
+
+   First line is one of:
+
+      For
+      For Then
+      For Then While
+      For Then While Do
+      For Then While Do Case
+      For Then While Do Case Else
+
+      Then
+      Then.. 5 options
+      then 3, 2, 1
+   Maybe only 21 parts
+
+   Cases should be easy.  A list of caselines, each a list of case parts.
+   Followed by an elseline which has zero or more caseparts and an elsepart.
+
+   I think I need to change how NEWLINE is handled, do minprefix differently.
+   It is used to ignore stuff when deciding which startsline starts can prevent a
+   newline from shifting.  Review exactly what is wanted there.
+
+   What exactly do I do with newlines?
+   - If a production contains a literal NEWLINE, the head is marked line-like
+   - forbid shifting NEWLINE when recent starts_line state is not at actual
+     start of line...
+     but ignore intermediate states based on min_prefix
+   - record where lines actually start
+   - ignore if indent since starts-line state
+   and that is all.
+
+   Note that any state where an item starts with a line-like symbol is a
+   starts-line state.
+   Any state that can reduce to a line-like symbol requires indents to be
+   balanced.
+   starts_line states only affect ignoring newlines and choosing when to
+   allow shift, as described above.
+
+   Thoughts:
+   I could extend 'line-like' to any production containing a symbol that
+   starts with NEWLINE.  The Newlines would work.
+   Rather than 'min_prefix' I could store 'since-newline-or-start' so
+   that multiple newlines in a production would make sense.
+
+10feb2021
+   New thoughts.  I wonder if they will work.
+
+   Change the scanner to produce paired SOL and EOL tokens, where EOL is
+   much like NEWLINE currently and is delayed by paired IN/OUT.
+   Also skip blank lines, so only get a SOL if there is text on the line.
+
+   Now a production needs to be explicit about being at the start of a
+   line.
+   Maybe we can even do
+      OptNL ->
+             | EOL SOL
+
+   So:
+      statement -> SOL SimpleStatements EOL
+                 | SOL CondStatement EOL
+
+   If the grammar requires an EOL followed by an EOL, there must be an
+   implied OUT.
+
+   in "if cond block else"
+   how do we know when the "block" is finished so that the "else" can be
+   shifted?
+   The expansion of 'block' will (possibly) end with an EOL.  For "else" to
+   follow EOL without a SOL, there must be an OUT.
+
+12feb2021
+   I need to clarify how the scanner must work for SOL/EOL so that I can
+   write code that works.
+
+   SOL needs to be generated when we see a non-space character on a new line.
+   This is the same time that we need to possibly generate IN, which is in
+   check_indent.
+   So at start of line we scan for non-space, then unget and set check_indent.
+   In check_indent we assume start-of-line and generate SOL after any IN.
+
+   EOL needs to be generated after we see a NEWLINE (or maybe EOF) on a
+   non-empty line.  It may be delayed until after indents, so we need to store
+   it.  We delay it until after multiple blank lines, so we always need to
+   store it.  So ->indent_eol[->indent_level] is a delayed EOL, if ->num
+   is not TK_error.
+
+   I think we need a flag for 'at start of line' which means the line
+   seen so far is empty.  So much like my "non_empty"
+
+   OK - much easier to get it right once I've thought it through :-)
+
+13feb2021
+
+   This isn't quite working how I had hoped :-(
+   The "EOL SOL" pair, or more the "SOL else" pair suggests I need a look-ahead
+   of 2 to recognise if I have an IfSuffix or not.
+   But I know an LR(2) can be re-written as LR(1) (Did I learn that in uni?)
+   How can I do that?
+
+      Statementlist -> SOL SimpleStatements EOL Statementlist
+                     | SOL IfHead EOL Statementlist
+                     | SOL IfHead IfSuffix Statementlist
+                     | SOL IfHead EOL SOL IfSuffix Statementlist
+                     |
+   So if we see EOL SOL we can wait for else, which leads to IfSuffix, or
+   something else for StatementList.
+   But I don't want to allow StatementList to be empty.  I can achieve this
+   by duplicating the above for a StatementList_nonempty.  A bit ugly.
+
+   Also, this is right-recursive which uses a lot of stack.
+   I can compress it a bit by making an IfStat include the following statement.
+      SL -> Stat | SL Stat
+
+      Stat -> SOL SimpList EOL
+            | IfX Stat
+            | IfX SOL IfSuffix
+            | SOL IfHead IfSuffix
+
+      IfX -> SOL IfHead EOL
+      IfHead -> if Expr Block
+      IfSuffix -> else Block
+                | else IfHead
+                | else IfHead IfSuffix
+                | else IfHead EOL SOL IfSuffix
+                | else IfHead EOL Stat
+
+
+   Getting there... (again).
+   Problem:
+      if cond1:
+          if cond2:
+              stat1
+        else:
+
+   The 'else' pairs with cond2.
+   There is an EOL after "if cond2: stat1" and then "SOL else"
+   which looks just the same as
+      if cond1:
+          if cond2:
+              stat1
+          else:
+
+   The only difference is an extra OUT IN which we currently ignore.
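
A rough sketch, in Python rather than the real scanner, of the SOL/EOL/IN/OUT stream described in the 10feb and 12feb entries, to make the "extra OUT IN" visible. Everything named here (`scan`, `indents`, `owed`) is made up for illustration and is not taken from the ocean sources. It assumes that an EOL is stored and only emitted once a later line is no more indented than the line it ends, and that the problem case has the 'else' indented more than "if cond1" but less than "if cond2" (an inference from the extra OUT IN, not something the notes state).

```python
def scan(text):
    """Return the token list for `text` under the SOL/EOL/IN/OUT scheme."""
    tokens = []
    indents = [0]      # widths of the currently open indent levels
    owed = [False]     # owed[i]: a stored (delayed) EOL is pending at level i

    def flush():
        # Emit the EOL owed at the innermost open level, if there is one.
        if owed[-1]:
            tokens.append("EOL")
            owed[-1] = False

    def close_to(width):
        # End the previous line and every level deeper than `width`.
        if width <= indents[-1]:
            flush()
        while width < indents[-1]:
            indents.pop()
            owed.pop()
            tokens.append("OUT")
            if width <= indents[-1]:
                flush()

    for line in text.splitlines():
        words = line.split()
        if not words:
            continue                    # blank lines give no SOL and no EOL
        width = len(line) - len(line.lstrip(" "))
        close_to(width)
        if width > indents[-1]:         # deeper than any open level
            indents.append(width)
            owed.append(False)
            tokens.append("IN")
        tokens.append("SOL")            # SOL comes after any IN
        tokens.extend(words)
        owed[-1] = True                 # the EOL is stored, not emitted yet
    close_to(0)                         # end of input closes everything
    return tokens


aligned = "if cond1:\n    if cond2:\n        stat1\n    else:\n"
problem = "if cond1:\n    if cond2:\n        stat1\n  else:\n"

if __name__ == "__main__":
    print(" ".join(scan(aligned)))
    print(" ".join(scan(problem)))
```

Under these assumptions the two streams are identical except that the problem case has OUT IN immediately before "SOL else:", which is the pair the parser was ignoring at this point in the notes.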
+
+   How can I use the OUT?
+   I have
+      SOL IfHead EOL .... OUT IN SOL
+   and I need the OUT to tell me to Reduce, or to block the Shift of SOL.
+   But if I simply block Shift when I have an OUT, the SOL IfHead EOL
+   becomes a Statement which is merged into the StatementList and then
+   the SOL is Shifted.  I need to go all the way to make that Statementlist
+   a Block and IfHead.
+   If I hold out with the OUT longer until reduce_size!=1
+   I get further but
+      IfHead else IfHead .... EOL
+   cannot shift the EOL
+
+   Maybe I need to use min_prefix, but I really don't like that.
+   Need to think this through.
+
+   Well, I have it working.
+
+   I suppress shift if there are outs EXCEPT for TK_eol.  Why?
+   Also I use the Bstatementlist indirection
+   and don't cancel the out if reduce_size==1
+
+   It's a bit clunky.  Can I justify it?
+
+   I'd like the tokens to be different.  With
+      if cond:
+         st
+      else:
+
+   The SOL before the else is ignored because we don't expect SOL there.
+   Trouble is in the problem case, SOL doesn't get ignored until later.
+
+   Can I *only* prevent a shift of SOL when it is unbalanced?
+
+   So: prevent shift of SOL if there is an uncancelled out, otherwise it will
+   be assumed to be at the wrong level.
+   Better, but not completely happy...
+
+14feb2021 valentines day
+
+   What if the rule for cancelling indents was that the cancel couldn't cross
+   a starts-line state.  How would that work out?
+
+15feb2021
+   I didn't have time to pursue that, and now I'm a lot less convinced.
+
+   New idea:  Allow IN and OUT in the grammar, and selectively ignore them
+   like we do with SOL EOL.
+   That way, OUT could force a reduce which could not then be extended, so that
+   whole issue of recursive productions becomes moot.
+
+   When are indents relevant?  Maybe we have starts-block states which
+   expect IN, and which ignore IN if there is an indent since the last
+   starts-block state.
+   So
+      block -> : IN statementlist OUT
+             | : simplestatements
+   would ignore IN until we hit the :, then IN becomes relevant.
+   If we don't see an IN it must be simplestatements.  Do we allow IN
+   there-in?  Probably not.  It would look confusing.
+   But if we get an IN, then we start ignoring INs again.
+
+   The OUT absolutely must balance the IN, so we ignore OUT whenever the matching
+   IN was ignored.
+
+   We still refuse to skip OUT if the matching IN is too far away.  Must be in top
+   frame.
+
+   Clarify handling of OUT when the IN was ignored...
+   A linelike production that started before the IN must not reduce until
+   after the OUT???
+
+   Any production that started after the IN must reduce before the OUT.
+   We don't force it to reduce, we flag an error.
+   So if we reduce some symbols which contain more OUT than IN, that is
+   an error
+
+17feb2021
+   I need to track in/out carefully so they match properly and I ignore the right
+   OUTs.
+   IN is ignored whenever SOL/EOL would be.  OUT is ignored precisely when the matching
+   IN was ignored.
+   I also want to track all ins and outs until they cancel in a reduction.
+   It is only at the reduction step that we can determine if an error occurred.
+   An error is when a symbol contains nett negative indent.
+   So we can just count indents in each symbol.
+   Some in/out are within symbols, possibly IN and OUT.  Others which are ignored
+   exist between symbols.  A frame holds (symbol+internal indents),(state+pending indents).
+   To track which OUT to ignore we need a depth count and a bit-set.
+   If a bit is set, then the IN was ignored so the OUT must be too.
+   If clear, the IN was shifted, so the OUT must be too.
+
+   I need to get indents_on_line right.
+   Previously I tracked them before this frame.  I don't know why...
+   I want 0 when starts_line
+
+19feb2021
+   OK, new approach is looking really good.  Need to make sure it isn't too hard
+   to use.
+   Tricky area is multi-line statements that don't *have* to be multi-line.
+
+   We cannot reduce "SOL IfHead EOL" to a statement as we cannot tell if it
+   is complete until we shift the SOL and look for an "else".
+   One option is "statement -> SOL IfHead EOL statement | SOL IfHead EOL IfTail"
+   So "statement" is really a sublist of statements.
+   Easy in indent_test, what about in ocean?
+
+   There are lots of parts that can be on a line:
+      if, else, for, then, while, do, switch, case
+
+   if and while can be "expr block" or "block" and the thenpart/dopart
+   else can be "block" or "statement"
+   then is optional in for, required in some if
+
+      ifpart -> if expr block | if block then block | if block EOL SOL then block
+
+   OR??
+
+      ifpart -> if expr block EOL SOL | if block then block EOL SOL...
+
+   What if I support backtracking over terminals?  So if I cannot shift
+   and cannot reduce, I back up until I can reduce, then do so?
+
+   Then I can shift the SOL and if there is an else, I'm good.  If not I back up
+   and reduce the statement
+   So
+      statement -> SOL simple EOL
+                 | SOL ifhead EOL
+                 | SOL ifhead EOL SOL elsepart EOL
+                 | SOL ifhead elsepart EOL
+   would work.
+   But do I need it?
+
+      statement -> simple EOL
+                 | ifhead EOL
+                 | ifhead EOL SOL statement
+                 | ifhead EOL SOL iftail
+                 | whilepart
+                 | forhead whilepart
+                 | switchead casepart
+
+
+      ifhead -> if block then block | if expr block | if block EOL SOL then block
+      iftail -> else block | else statement
+
+      whilehead -> while expr block | while block EOL SOL do block | while block do block
+      whilepart -> whilehead EOL
+                 | whilehead EOL SOL statement
+                 | whilehead casepart
+                 | whilehead EOL SOL casepart
+
+      casepart -> casehead casepart
+                | casehead EOL SOL casepart
+                | casehead EOL SOL statement
+                | iftail
+      casehead -> case expr block
+
+22feb2021
+   I've had a new idea - let's drop SOL!  Now that I have IN, it isn't really needed.
+   We can assume SOL follows EOL or IN .... maybe.
+   Problem is if we want to require IN/OUT around something that is not line-oriented.
+   Might that ever matter?
+   No, I don't think so.
+
+23feb2021
+   Maybe this makes it really really easy.
+   We don't mark different sorts of states, and we only track which indents were
+   'ignored'.
+
+   Then:
+     IN never causes a reduction, it is either shifted or ignored.
+     An EOL is ignored if the most recent IN was ignored, otherwise it is a normal
+     token.
+     An OUT is similarly ignored if the matching indent was ignored.  It also
+     cancels that indent.
+
+   Is that too easy?
+
+   .... no, it seems to work.
+
+   So: back to the ocean grammar
+
+      statement -> simple EOL
+                 | ifhead EOL
+                 | ifhead EOL iftail
+                 | whilepart
+                 | forhead whilepart
+                 | switchead casepart
+
+
+      ifhead -> if block then block | if expr block | if block EOL then block
+      iftail -> else block EOL | else statement
+
+      whilehead -> while expr block | while block EOL do block | while block do block
+      whilepart -> whilehead EOL
+                 | whilehead casepart
+                 | whilehead EOL casepart
+
+      casepart -> casehead casepart
+                | casehead EOL casepart
+                | casehead EOL
+                | iftail
+      casehead -> case expr block
+
+
+24feb
+   Hmmm.  awkwardness.
+   An ifpart can be "if expr then simple ;"... no it cannot...
+   But the problem was that some forms for a head with an optional tail
+   must end EOL, other forms need not.
+   But the whole must end EOL.
+
+   So: do we put EOL at end of 'statement' or end of IfSuffix
+
+   Let's try assuming it is at the end of 'statement'
+   So IfSuffix can assume an EOL follows
+   So CondStatement can too
+   So an ifhead either 'may' or 'must' be followed by an EOL.
+   If may, it is followed by IfSuffix which is empty, or starts OptEOL
+   If must, it is followed by empty or
+   No.. this isn't working for me.
+
+   Let's try assuming that a CondStatement ends with an EOL.
+   So an IfSuffix must too.
+   and it cannot be just EOL
+   If an ifhead must be followed by EOL, it is either EOL or EOL IfSuffix
+   If it may be, then EOL or IfSuffix
+
+
+   ForPart ThenPart SwitchPart are ALWAYS followed by something, so can end
+   EOL or not, as suits
+   WhilePart IfPart CasePart might be the last thing so each option must
+   end with a SuffixEOL which ends with EOL or SuffixOpt which might not
+
+   What do I want to do about
+      : SimpleStatements
+
+   It is useful for
+      case value : statement
+   and maybe even
+      if cond : statement
+   though for the latter I can use 'then'.
+   For 'else' I don't need the ':', but it wouldn't hurt.
+
+   Problem is: do I insist on a trailing newline or ';'
+   If I don't then
+      case foo: bar case bar: baz
+   would be legal, but hard to read, as would
+      if cond : stat1 else stat2
+   which is probably error prone.
+
+   But do I want
+      switch expr
+         case val1: st1
+         case val2: st2
+         else: st3
+
+   That looks like an indented block, but is really indented lines.
+   So it is probably a mistake.
+   So allow switch expr : or ';' at the end
+
+   Whatever happens after "switch expr" must work after "while expr block"
+
+   So....
+   If first case is not indented, none of them may be
+   If first is: it happens in an IN/OUT block, so again all the same
+
+   Can I implement that?  Can I have IN after a non-terminal somehow?
+   When I see an IN, I could reduce as long as go_to_cnt == 0.
+   That might help after an OUT, but not after EXPR,,
+
+   Or: look at next symbol.  If it can be shifted, we ignore the IN.
+   If not, we reduce and try to shift the IN again.
+
+   Also: need to mark IN as ignored when popped off during error recovery,
+   and maintain stack when discarding during error recovery
+
+26feb2021
+   Syntax for blocks?
+      { IN statements OUT }
+      { simplestatements }
+      : IN Statements OUT
+
+   but what about
+      : simplestatements NL ....
+         or ';'
+
+   In other contexts I have
+      for simple; statements; then simple ; statements ; while expr:
+
+   I currently require a ';' or newline before "then" or "while"
+
+   Interesting other cases are:
+
+      case expr : simplestatements
+      while expr : simplestatements
+
+   For 'if' I currently have "if expr then simplestatements"
+
+   Because of 'for' and 'then' I don't want to require ':' before simplestatements.
+   I could have
+      while expr do simplestatements
+   But what do I do for 'case' ???  I really want the ':' there.
+   So I should use it for 'if' and 'while'
+   'for' could be followed immediately by IN, as could then and even if/while
+   So the ':' comes after an expression.
+
+27feb2021
+   Problems with the idea of only using : to come after an expression.
+   1/ "else" looks wrong compared to Python, but maybe I can get used to that
+   2/ with "for" it would be simple statements, with "while" it would be expr
+      if there was no indent.  Do I need different things to look different?
+   If statements always follow ':', then "for" and "then" always need a ':'
+      for: a=1; then: a = a+1; while a < 10:
+
+   In C there is no difference, but I want a difference..
+
+03mar2021
+   Arg... I'm not struggling with parsing concepts this time, I'm struggling with code.
+   I want to add an "EOL" symbol to the grammar as a special terminal.
+   It is like "NEWLINE", but handled a bit differently.
+
+   In parsergen it is just another terminal symbol, but it mustn't get added
+   to the "known" list.  Currently all terminals from TK_reserved are added
+   to "known".  Maybe if I give it a number that is after the virtual symbols