From: NeilBrown Date: Sat, 19 Sep 2020 05:59:02 +0000 (+1000) Subject: updates X-Git-Url: https://ocean-lang.org/code/?p=ocean-D;a=commitdiff_plain;h=6830d7faa83def1431a9f5a981c008eeb272a5f6 updates --- diff --git a/00-TODO b/00-TODO index 68717a4..8c0efc7 100644 --- a/00-TODO +++ b/00-TODO @@ -1,7 +1,13 @@ This is a living document - delete things when done. Avoid discussion. Current version (Cataract Creek) -- use precedence levels for expressions +- Warn when left-recursive symbols appear elsewhere, other than at the end + of a production. Might have to special-case Newlines. +- parser not to get into ERROR infinite loop +- sort 'virtual' symbols to end +- allow $xy instead of $3. Chooses shortest bodysym with xy in that order + $xy_2 gives the second one +- allow $TERM terminals to be listed. If so, extras are errors - structs - const fields - anonymous field - array or struct (or pointer to these) @@ -10,21 +16,27 @@ Current version (Cataract Creek) - anon struct field gets fields interpolated - manifest values for arrays and structs [a,b,c] + or [.foo=a, .bar=b] or [ [1]=a, [2]=b] + That last doesn't parse easily, unless we require tags... not a good idea. + [ .[1] = a, .[2] = b ] ?? Maybe. - yet more operators << >> # bit-ops & | ~ &~ op= - split values so I can have an array of just the value (1 byte for u8) -- integers, unsigned, bitfield, float +- integers, unsigned, bitfield, float, double? - pointers - owned or borrowed - pure, loaded, overloaded, augmented - owned: once, counted, collected + - shared or thread-local - array slice - array buffer - can be added to and grows. - char, string search, regexp search - allow "do stuff" as a stand-alone statement (scope) +- 'use' labels *must* appear in case statements. +- re-read parsergen lit-doc and make sure it is still coherent. 
Next version (Govetts Creek): - functions and procedures diff --git a/Ocean-functions b/Ocean-functions new file mode 100644 index 0000000..9014d1a --- /dev/null +++ b/Ocean-functions @@ -0,0 +1,37 @@ +I want to add functions and procedures soon. I should decide on syntax at least. + +The args to a function are effective a struct, so I want it to look the same. +C doesn't allow "int a, b, c" in the parameters, which I think is clumsy. +struct can be + + struct name: + a,b,c:number + d:string + +So function might be + + func name: + arg1, arg2: type + arg3: type2 + returns type + do: + stuff + +A procedure is different as it doesn't have just a return type, +it has a return structure. So many C functions have 'ret' or 'result' +variable that it might be nice to follow the Pascal approach of +assigning to the function name?? or having + + func name: + args:types + returns: + results:types + do: + statements + +A shorter version would be + + func name(args:types;args:types):type { } +or + proc name(args:types;args:types):(result:type;...) {} + diff --git a/Ocean-types b/Ocean-types index 1671480..ed57f26 100644 --- a/Ocean-types +++ b/Ocean-types @@ -304,3 +304,96 @@ I'm in the middle of stage-1 on structures. I need a type to parse the declaration into. It needs to be a linked list of fields, each of which is a type, a name, and an initial value. i.e. a 'struct field'. + +----------------- + +Numbers... +I want signed/unsigned/bitset integers (and probably floats). +These are different sizes, and I want to move 'type' out of 'value' +so I can have arrays of numbers that are *just* the densely packets numbers. + +So there are two questions here: how will I handle values in oceani, and +what are the semantics of numbers in ocean. + +I think I want bitops to requires bitsets and arith ops to require signed/unsigned. +But there is some overlap. +e.g. 
we use bitops to test if a number is a power of two +We sometimes use bitops to multiply, but that is probably best avoided. +use * to multiply. + +Converting between the two can be done with simple assignment. + +So + - * / % require/assume signed or unsigned + | & ~ << >> require/assume bitset + + # accepts either and produces a bitset + +Other issue is overflow/underflow checking. +Do we need another unsigned type - cyclic + + i32 - signed integer in 32 bits + u32 - unsigned integer + c32 - unsigned with overflow permitted and ignored + b32 - bitset + + int uint cint bset - whatever size. + +i32 and u32 detect overflow/underflow and set to NaN - all 1's +If I want to allow overloading (such a NaN), I need a type that +declare no overloading. s32 and c32? Or annotations. !s32 !u32 + +So what about values in oceani? I want to separate out the type and not +use a union. +Where are they used? + - return of init, prepare, parse, dup + - passed to print, cmp, dup, free, to_int, to_float, to_mpq + - field in 'struct variable' + - field in 'struct lrval' + - result of 'interp' + - intermediate left/right in interp + - field in array and struct field + - field in 'struct val' for manifest constants + +So: + variable gets a 'type' pointer and a union which can be a pointer + to the value, or the value itself (depending on size) + lrval get a type pointer as well, plus the union + interp returns ... + + +----------------- +Struct/array initialisers. +I like [a,b,c] rather than {a,b,c} because the latter can look like code. +But [] is also array indexing. +So an array initializer could look: + [ [1] = "hello", [5] = "there" ] +and that is confusingly similar to nested initialization + [ [1,2] , [3,4] ] +Options: + 1/ use different outer. {} () <> << >> + < is possibly as it is not a prefix operator. + But nesting results in <<1,2>,<3,4>> which looks like << instead of < < + {} I already don't like + () is bad enough with function calls - it is best if it is grouping only. 
+ though with function calls it is a list ... + << [1]="hello", [2]="there" >>... I don't really like that + + array[ ] + struct[ ] + No, too noisy. + + 2/ use different inner syntax. + [ .[1] = "hello", .[5] = "hello" ] + + What about a newline-based syntax: + a: [4]int : + [0] = 2 + [1] = 3 + [3] = 1 + + Nice, but doesn't actually help. Still need .[] because I want to allow + a one-line syntax too. + Maybe I just use {} after all. + + a:[4]int = { [0]=2, [1]=3, [3]=1 } + Yes, I guess that is best. diff --git a/twod b/twod index de0e302..bde8aae 100644 --- a/twod +++ b/twod @@ -1448,3 +1448,899 @@ The NEWLINE after b is not Ignored in the expression, Maybe we want: Open -> { | NEWLINE Newlines { + + + I have a problem. + I want + + else: a := b + + to parse the same as + + else: + a := b + + and for the last newline to close the elsepart. + But the latter has 2 newlines while the former only has one + and I don't have any obvious justification for ignoring either. + I think it is in the Newline before the OUT that is extra. + + I could drop the newline before the OUT, assuming the newline + separate things, and the OUT will force any reductions needed. + But then we have fewer newlines reported than actual. + (Same imbalance happens with multiline comments and strings, so maybe + that is OK). + Another way to look at it is that the newline following an IN is discarded + (or always ignored) and not moved to after the OUT. + So (maybe) the newline at an IN or OUT is reported *after* the IN or OUT. + so + A + B + C + D + + Would be A IN NL B NL C OUT IN NL D OUT NL + + The parser always ignores the NL after an IN but uses other + NL to reduce to a single symbol (if possible) + OR maybe it doesn't ignore (unless not line-like context) and + lines are preceeded by NL, not followed by them... + No, followed is usually good.. though separated is better... so preceed!! + + Ok, this isn't working. 
+   A construct
+      if cond:
+         pass
+
+   cannot be reduced to a Statement until we know what comes next, and it
+   might be separated by several newlines.
+   So the newlines need to be part of the Statement.
+   But that means we cannot have newlines at the front of a statement.
+   But that was the point...
+
+   Maybe a Statementlist is a series of StatementNL followed by a Statement
+
+   We allow
+      StatementNL -> Statement Newlines
+   as a general catch-all, but when we have something like if, or anything
+   with an optional tail "else:" or "case:"
+   We say:
+      StatementNL -> if Expression Block Newlines
+   But that would produce a conflict with
+      Statement -> if Expression Block
+   As a newline could either trigger a reduce to Statement, or a shift.
+   Obviously we shift, but maybe we use precedence to force the point.
+
+   Can we handle 'else if' ...
+      IfStatementNL -> if Expression Block Newlines
+                     | if Expression Block else IfStatementNL
+
+   ... I'm contemplating having the parser duplicate NL as necessary, so
+   that
+      if test: action
+   can appear to be followed by 2 NL, one to terminate the 'action' statement
+   and one to terminate the whole 'if'.
+   This might mean I need to extend when NL are discarded - to ensure they
+   don't get duplicated too much.
+   1/ if state does not permit newlines, discard
+   2/ else if I can reduce all symbols since start of line, do that.
+   3/ else if can shift, do that
+   4/ else if only one symbol since newline, discard.
+   5/ else ERROR
+
+   This means that we cannot recognise multiple newlines
+   or does it?
+   If we shift a Newline, that is since_newline=0;
+   If we reduce that to Newlines, that is still since_newline=0
+   If 4/discard only applies when since_newline==1 -- we win.
+
+   Currently since_newline essentially means the symbol contains a newline.
+   So 'statements' usually does, but 'statement' doesn't.
+   When we shift the newline and reduce, it all becomes since_newline=0.
+   That is when we want to ignore newlines.
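Editorial sketch (not part of the commit): the five-step newline rule above can be written as a small decision function. This is only an illustration in Python, not the parsergen C code. The names `Action`, `newline_action`, `can_reduce_line` and `can_shift` are hypothetical, and step 4 is read as "since_newline == 1", following the note just above.

```python
from enum import Enum


class Action(Enum):
    DISCARD = "discard"
    REDUCE = "reduce"
    SHIFT = "shift"
    ERROR = "error"


def newline_action(newline_permitted, can_reduce_line, can_shift, since_newline):
    """Decide what to do with a NEWLINE token (hypothetical model of steps 1-5).

    newline_permitted: the current state permits newlines (step 1).
    can_reduce_line:   a reduction is available that covers no more than the
                       symbols since the start of the line (step 2).
    can_shift:         the goto table allows shifting NEWLINE here (step 3).
    since_newline:     number of symbols since the last newline (step 4).
    """
    if not newline_permitted:
        return Action.DISCARD      # 1/ state does not permit newlines
    if can_reduce_line:
        return Action.REDUCE       # 2/ reduce everything since start of line
    if can_shift:
        return Action.SHIFT        # 3/ shift the newline
    if since_newline == 1:
        return Action.DISCARD      # 4/ only one symbol since newline
    return Action.ERROR            # 5/ nothing else applies
```

The steps are tried strictly in order, so a state that forbids newlines discards them even when a reduction or shift would otherwise be possible.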
+ + 15jun2019 - still working this through.. + + Normally the parser does + shift or else reduce or else error + exceptions are TK_in which is simply recorded and + TK_out: reduce until there is a TK_in in scope, then cancel, else error + TK_newline: + if not newline_permitted (indent since last starts_line state) + Discard + if can Reduce to at most start-of-line, reduce + if can Shift, duplicate and Shift + if can Reduce, do so + if 0 since newline, Discard + + since_newline needs to be changed a bit. + A TK_newline token *isn't* zero, it is N+1. The token *after* + the NEWLINE is zero - so that + + Arg. I'm struggle with that fact that having shifted a newline, + we are both at the end of a line, and at the start of the next. + When I see a newline, I want to reduce until the end of line + is in the same state as the start of that line. + + Maybe I do want newline to be a separator. + What if I don't actually include the newline in the grammar, just like in/out. + Instead we mark select productions as lines. This is like marking + for precedence. + A marked production is reduced when a newline is seen providing it won't + contain any indents. + So: if the reducable item in a state is marked, the start gets marked. + When we see a newline, if the state is marked and the reduce size does not + exceed since_indent, we reduce. Otherwise we discard. + No... I need an error condition too. + So I need the state to have a starts_line marking, when a new item is marked. + + So: + productions can be marked $$NEWlINE which flags the production as line-like + a state with an item with DOT at start of a line-like production is starts_line + a state with an item with DOT at the end of a line-line product is ends_line + We track indents as before. + When we process an indent or newline, we set since_newline to 0 + When we see a newline we do one of: + if not newline_permitted, we discard + if top state starts line, we discard + else reduce or else error + +No..... 
+ A production -> { statements } + needs to ignore newlines either side of statements. + It is a multi-line production - newlines don't matter. + Maybe there are several sorts of symbols: + - in-line: must be broken across lines unless indented + - line-like: is terminated (reduced) by a newline + - multi-line: newlines are ignored + + We tag symbols which are line-like. + Any symbol which can derive a line-like symbol is multi-line + Any other symbol is in-line. + + So SimpleStatements, ifhead, elsepart, casepart etc are linelike + +$line SimpleStatements IfHead .... + + A state that is at the start of a linelike symbol starts_line + Any state in a multi-line production starts_line + + if tos starts_line, newlines are ignored. + else if there is an indent since the starts_line, newlines are ignored. + But if there are symbols since starts_line, we have to reduce until are + are in a starts_line start (or can see an indent). + +No.... + Block -> : Statementlist +is none of thse. It must be reduced by a newline, but isn't entirely line-like. + Block -> { Statementlist } +is multi-line + +but maybe Block here is neither. It only becomes linelike when it +terminates a Statement, which is linelike. Or terminates an ifHead + +Should this be legal? + a:=b;pass if something +probably not. I want to require at least ';' or NEWLINE. +That means I need to include NEWLINE in the grammar. + Statements -> SimpleStatements ; Statement + | Statements Statement + | Statement + + if cond: if cond: a:=b +NEWLINE reduces this down to IfHead +IfHead -> IfHead NEWLINE +Statement -> IfHead + | IfHead ElsePart + +ElsePart -> else BLOCK + | else IfHead + | else IfHead ElsePart + | ElsePart NEWLINE + + +Statements -> Statements Statement + | Statement + + +if cond { statments } else { statements} +but not +if cond: statements else: statements + +so :statements must expect a NEWLINE but then + if cond: if cond: statement +expects 2 NEWLINEs. 
+Maybe + Block -> : IN statments NEWLINE +if there is no indent, we synth one which triggers an OUT NEWLINE pair. +This could be automatic. + If a linelike is followed by a newline, we synthesis an IN before it. + +That requires a hack to the scanner: Synth Indent + +What if + + Block -> { Statements } + | : Statements NEWLINE + | : SimpleStatements + +Then + if Expr Block + +might not end in a NEWLINE so else could come immediately. Is that OK? + if expr: a:=0 if expr +must be forbidden. That requires a newline. + + Block -> : IN statements OUT NEWLINE + +In marks state as 'needs_indent' If an indent arrives, fine. +If something else, we record that next newline (After balanced in/out) +must synth extra out/newline. + + Block -> { Statements } + | : statementblock + + statementblock -> Statements $$line + +$$line means it must be reduced by a newline. If something else tries, +it is an error and we skip to newline. +It also strips everything but NEWLINE from the (effective) lookahead +to avoid reporting conficts, as those things will never be shifted. + + IfHead -> if Expr Block + | IFHead NEWLINE + Block -> { statements } + | : statements $$line + + IfStatement -> IfHead + | IfHead IfTail + | IfHead else IfStatement + IfTail -> else Block + | IfTail NEWLINE + + +------ +20jun2019 Happy 87th birthday Dad. + +I'm not convinced about $$NEWLINE + + else: simplestatementlist + +should be able to parse simplestatementlist without a newline, and +use the newline to close the if/else. +where as + else: + statementlist + +has a newline to close the statementlist and another to close the if/else. +But can the LR parser tell the difference? +It only sees that newlines don't forcibly reduce the else: +So when it sees the newline at the end of simplestatementlist, +it cannto shift because there is a sub-line thing that can be reduced. +So this becomes elsepart before the newline is absorbed. +Whereas in statementlist, the newline can be shifted creating simplestatementline. 
+ +What about + if cond: if cond: statement + +Again, the newline cannot be shifted while we can reduce + +But.... how does conflict analysis know that an 'if', for example, is not +permitted after simplestatementlist? + +Ahh.. This is exactly what $$NEWLINE is for. Maybe it should be $$OUT. +Either way, the grammar is ambiguous and relies on newlines or indentation to +close the production, and this fact needs to be explicit. +Requiring OUT is probably best as it means + if cond: + statements + else: + statements + +works even though there is no newline after the first statements. +Here I want the 'statements' to be closed by OUT, but the whole to +be closed by NEWLINE. +So maybe I need both $$NEWLINE and $$OUT ?? + + +$$OUT makes lots of sense. It is exactly how we expect :statements to +be closed - where we allow NEWLINE to have the same effect. + +$$NEWLINE is good for closing an complex if or for etc. It means that +nothing else can be on the same line - allowing for indents. +How do we implement that? +Any production in the grammar that represents a full line but doesn't +end with a newline should be marked $$NEWLINE +This head of that production should recursively absorb NEWLINEs. + +I'm not yet clear on exactly the difference bwetween $$OUT and $$NEWLINE. +I would put $$OUT after + block -> : Statements $$OUT +and $$NEWLINE at the end of a statement that must end a line + condstatement -> Ifpart IfSuffix $$NEWLINE + | constatement NEWLINE + +Maybe I need a worked example. + while conda: + if condb: if condc: action + pass + +So after action there is NL OUT NL pass +The NL sees that it can reduce, and the if allows the NL to reduce it so + while COND : IN if COND : statement [ NL OUT NL ] +again the NL can reduce. 
Note that we *don't* absorb the NL in the statement
+   while COND : IN statement [ NL OUT NL ]
+Now we can shift the NL
+   while COND : IN statement [ OUT NL ]
+Now the OUT forces a reduction
+   while COND BLOCK(in) [ OUT NL ]
+Now the OUT is cancelled
+   while COND BLOCK [NL]
+and the while is reduced.
+
+So the $$NEWLINE must always see a newline (or $$EOF)
+An $$OUT must see an OUT or a NEWLINE (if there was no IN)
+
+$$OUT causes the LA set for items with the production to be empty.
+It is never credible that anything will be shifted so any apparent LA
+contents can be ignored.
+The state when a $$OUT is reducible has a precedence higher than any terminal, so
+nothing can be shifted and no completion should be possible.
+The state when a $$NEWLINE is reducible is much the same.
+
+Maybe I don't want NEWLINE in the grammar, only $$NEWLINE??
+How would we recognize a blank line?
+   command -> $$NEWLINE
+??
+We would need a new rule for discarding newlines.
+e.g. when the top-but-one state is start-of-line we discard and mark the top
+state s-o-l.  That stops us discarding a newline until it reduces something that is at the start of a line....
+
+1/ if there is an indent since the last start-of-line state, discard NEWLINEs
+2/ if ....
+
+Q: When is a NEWLINE an error?
+A: when it isn't ignored and we cannot reduce and
+   top or top-but-one state isn't starts_line.??
+
+So we need extra state info and extra frame info.
+
+State has:
+  - starts-line - is at start of a $$NEWLINE production
+  - ends-line - is at end of unreduced $$NEWLINE production
+  - ends-indent - is at the end of a $$OUT production
+  - min-prefix - how far back an 'in' can be and still cancel
+
+Frame has:
+  - indents - count in or after that sym
+  - line_start - whether there was a line start (IN or NEWLINE) immediately after
+      the symbol
+  - newline_permitted: no indent since start-line
+  - since_indent: number of frames where indents==0
+  - since_newline: number of frames where line_start==0
+
+If we see a NEWLINE then:
+   if ! newline_permitted, discard
+   elseif can reduce and reduce_count <= since_newline - reduce
+   elseif since_newline <= 1, and state.starts-line, discard and record line_start
+   else error
+
+If we see an IN
+   increment indents, set line_start
+
+If we see an OUT
+   if reduce_size <= since_indent, reduce
+   if min_prefix >= since_indent, cancel
+   else error
+
+How does error handling work?
+Normally we pop states until we can shift ERROR
+Then we discard tokens until we can shift one.
+
+However we need to do something different for IN OUT NEWLINE.
+For IN, we simply increment a counter
+For OUT we decrement if it is positive.
+   If it is zero and the state ends-indent, then we are synced.
+   If it doesn't, we need to pop more states until we have an indent to cancel.
+For NEWLINE if the state ends-line or ends-indent and ...something... we are synced.
+   else we skip it??
+
+... no, that doesn't work because I cannot see a way to describe an optional newline.
+
+Let's try with just $$OUT which requires OUT or NEWLINE...
+We put $$OUT on productions that must be closed in a 2-d obvious way.
+So they can be at the end of a line or at the end of an indented block.
+So
+   : statements $$OUT
+means the next line after the : cannot be indented.
+However + Block -> : statementline | : statementblock | { statements } + statementblock -> statements $NEWLINE +means I can have else indented, or on same line as single statement + + if cond: a = b; else: b = a + if cond: + a = b + else: + b = a + +The whole 'if' needs a $NEWLINE marking to ensure a following statement isn't +indented. +So implementation is almost exactly what I have: + - if anything else is lookahead when reducing that production, it is an error. + - remove non-newlines from lookahead in items + +But I don't think $$OUT is quite what I want to call it. +That doesn't quote cover end-of-line possibilities. +Maybe allow $$NEWLINE or $$OUT but with same behaviour. + +.... still not there. +Another way to satisfy a $$OUT reduction is for it to already look right. +So: No indents and at start-of-line + +But that upsets the modification to look-ahead as we can no longer assume +the next token. +I think this might be more like a precedence thing?? +Without look-ahead modification, the first token of a statement can be shifted + before a newline forces a reduction.... + +Maybe I do need two sorts productions. + $$OUT requires an out/newline to reduce it. + $$NEWLINE follows either $$OUT or NEWLINE and requires start-of-line and no indents. + or $$OUT or $$NEWLINE + +Or does it matter. Over-modification of the look-ahead suppresses warnings, but +doesn't affect the parse. +Will we get warnings anyway? + + +-------- +Are left-recursive symbols in a non-final position always bad? + +Left-recursive symbols cannot be closed by forcing a reduction. +So if one starts in an indented region (in which newlines are ignored) +it could continue afterwards - unless we make that an explicit error somehow. +If they appear at the end of some other production, that one will (maybe) +be reduced as well so (maybe) no problem... + +if cond: + a() + b() + c() + +is weird and I want to forbid it. Al that is between b() and c() is +NL OUT IN. 
NL closed b(), so it is just OUT IN +So I do want the statementlist to close. + + a := + 1 + 2 + * 3 + 4 + +is very wrong. How much can I help? +The OUT will reduce "1 + 2" which will then become + ((1 + 2) * 3) + 4 +which would be highly confusing. +So something about this must be disallowed. +Maybe when newlines are ignored, OUT doesn't force a reduce?? +I can make it an error by having Expression reduce to something else. + +Do I want an error even for + 1 + 2 + * 3 + 4 +?? + +I could achieve that by adding extra checks when we SHIFT at +start of line. +If we could reduce tokens since previous SOL, then we have 2D ambiguity. + + 1 + 2 * + 3 + 4 + +That is just as ambiguous, but we cannot reduce anything. +When we see the second '+', the reduction crosses a line-start but doesn't +result in a line-start. + +So: a reduce that doesn't contain an indent, but does contain a start-of-line +must reduce to that start of a line. + +This means we need to keep the start-of-line when we "IGNORE" a newline. + +Can I use this sort of logic to avoid the need for the extra reduction, +or for the $$OUT markings?? + +1/ The point of extra reduction is to avoid consuming more after an OUT or + ignored NL. + if cond: + a() + b() + c() + + must be an error. The OUT reduced a()b(). The stack is then + if cond : statements(n1) . IN Ident + The first indent is gone. . There is no error until we see all of c() so + if cond : statements(n1) simplestatement(i) . NL OUT NL + + Is it problen that the simplestatement is indented? + if cond : statements(n1) statement(i) . OUT NL + + Q: is it an error to reduce a sequence containing an (uncancelled) indent? + +2/ The $$OUT markings guard against exactly a reduction containing an uncancelled IN. + +So maybe I have two new rules. + + 1/ a reduction must not include any uncancelled indent. pop() must return 0. + 2/ a reduction the contains an unindented start-of-line must begin with start-of-line. 
+      So when we cancel an indent, we also cancel line starts since there.
+
+One other value of $$OUT is that it avoided conflicts - most symbols could not
+be shifted.  That should have only applied to $$NEWLINE(!) and doesn't apply
+at all if I drop the marking and use internal rules instead.
+So how do I avoid reporting conflicts?
+
+Really, there shouldn't be any conflict as NEWLINE should be expected.
+Let's go back to that idea.
+
+1/ A linelike thing MAY start with Newlines and MUST end with a NEWLINE
+2/ A SimpleStatement is not linelike and doesn't include any Newlines
+3/ if condition : SimpleStatement
+   is a SimpleStatement.
+
+4/ When we see the NEWLINE after "if condition : SimpleStatement" we have a shift/reduce
+   conflict as we could SHIFT to make a complex statement, or reduce the whole thing
+   to a SimpleStatement.
+   Default action is SHIFT but in this case we want REDUCE - due to precedence?
+
+   However when we see the NEWLINE after "if condition :IN Simplestatement"
+   we cannot REDUCE as there is an uncancelled indent, so we have to shift.
+
+   But when we reduce, we only want to Reduce to IfHead so that an 'else' can appear
+   on the next line.
+
+   If we see IN .. just continue.
+
+What do I need to do:
+
+   1/ Change grammar to expect blank-lines before and to have a NEWLINE at the end
+      of any line-like thing.
+      This requires IfHeadNL and IfHead.  ditto for switch, while, then ...
+      This gets complex with
+         for a:=0; then a += 1; while a < 10:
+      which could have several newlines
+         for a:=0
+         then a+=1 ; while a < 10:
+
+      ForPart -> for simplestatements ;
+               | for simplestatements NEWLINE
+               | for Block
+               | Newlines ForPart
+
+      ThenPart -> then SimpleStatements ;
+                | then SimpleStatements NEWLINE
+                | then BLOCK
+                | Newlines ThenPart
+
+   2/ disallow Reduce when embedded indents - report ERROR
+   3/ disallow Reduce when embedded start-of-line.
+   4/ TK_newline uses these rules to decide when to force a reduce.
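Editorial sketch (not part of the commit): steps 2/ and 3/ above, together with the two "new rules" stated earlier (no uncancelled indent inside a reduction; a reduction that crosses a start-of-line must itself begin at a start-of-line), might be checked like this. It is a simplified Python model, not the parsergen implementation. `Frame` and `may_reduce` are hypothetical names, and the sketch assumes that `indents` and `line_start` are recorded on the frame of the symbol they follow.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    symbol: str
    indents: int = 0          # uncancelled INs recorded after this symbol
    line_start: bool = False  # a line start occurred right after this symbol


def may_reduce(stack, count):
    """Return True if popping the top `count` frames is an acceptable reduction."""
    split = len(stack) - count
    below, popped = stack[:split], stack[split:]

    # Rule 1: a reduction must not include any uncancelled indent.
    if sum(f.indents for f in popped) != 0:
        return False

    # Rule 2: if a line start falls strictly inside the popped symbols, the
    # reduction must begin at a line start, i.e. the frame just below it
    # (or the bottom of the stack) must be followed by a line start.
    crosses_line = any(f.line_start for f in popped[:-1])
    if crosses_line:
        return (not below) or below[-1].line_start
    return True
```

With this model, reducing "2 * 3" in the layout "1 + 2 NEWLINE * 3 + 4" is rejected, because the reduction crosses a line start but begins in the middle of a line.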
+
+A/ A parser symbol that starts after an IN must end before the OUT
+B/ A parser symbol that starts before an IN must end at-or-after the OUT
+    only if the symbol is not line-like ???
+
+C/ A parser symbol that starts after a line-start and before an indent must end
+   by the end of line
+D/ A parser symbol that starts at a line-start must end before the end-of-line,
+   or at a subsequent end-of-line.
+
+A is satisfied by forcing a reduce on OUT and reporting error if IN cannot be cancelled
+B is satisfied if we report an error if we try to reduce an uncancelled IN
+C is satisfied by forcing a reduce *after* shifting NL and reporting ERROR if
+  min_prefix exceeds the line
+D is satisfied if we report an error when reducing at eol crosses a NL and doesn't start
+  at start-of-line.
+
+C is interesting - do we reduce *after* shifting NL??  I think we do, yes.
+
+So: when can I suppress conflicts, and how do I handle reduce/reduce conflicts?
+
+I need to be sure that a line-like ends with an unindented newline.
+I can trigger an error when that doesn't happen, but I want more.
+I want to encourage it to happen.  So if the grammar allows a NEWLINE it
+will be shifted in, but if we have already seen an OUT, we ignore the NEWLINE
+rather than trigger an error.
+Also, I need to not report shift/reduce conflicts on whatever comes next.
+i.e. if
+   a -> b c
+and c can end with a, then both c and a can be followed by the same things.
+This is a conflict.  If c (and a) end with NEWLINE we declare the conflict
+resolved.
+
+An IfHead might not end with a NEWLINE.  So to make a statement we need
+to follow it with an optional NEWLINE.  Let's see if we can make that work.
+
+What is a SimpleCondStatement?  It has no blank lines or unindented breaks..
+
+   ForPart ThenPart WhilePart CondSuffix OptNL
+
+We don't need a distinct Simple class!!  If it didn't start at SOL,
+then unindented NEWLINEs must be terminal.  Woohoo!!
+ +This requires that "Statements" doesn't insist on following NEWLINE + +A SimpleStatement can be followed by a ';', a Statement cannot. That +different is still needed. +So + SimpleStatements -> SimpleStatement | SimpleStatements ; SimpleStatement + +SimpleStatements can end with a ComplexStatement and no NEWLINE. +ComplexStatements must end with a NEWLINE after each statement except the last + +Each Part (including SSlist) end with arbitrary NEWLINEs. These will +only ever be at the same indent level. +A ComplexStatement must be separated from next by a NEWLINE. +So if the final non-empty Part does not end with NEWLINEs, how do we require one? +Maybe not.. + +What if a Part doesn't end with NEWLINE ever, but can start with them + +CondStatement -> IfPart Newlines + | IfPart IfTail... + +I think I need a CondStatement which doesn't end with a newline and a +CondStatementNL which does. Then anything that can end a cond statement +must come in two versions. + IfPart ElsePart CasePart WhilePart CondSuffix + +If we expect the non-NL, we accept the NL but not vice-versa. + +------ +problem. +in + if cond: + cmd1 + cmd2 + +the 'if' that started before the indent must finish at/after the indent. +But in + if a = b or + c = d : + do something +The Expr that started before the first indent may finish well before the indent finishes. +I think this is because Expr is not linelike but 'if' is. + +So I don't want an error when reducing if there is an indent, unless the new top start +starts_line +... + +OK, I'm up to the part where I need to hide conflicts that I can automatically resolve. +I have: + + State 7 has 2 (or more) reducible items + IfHead -> IfHeadNL . [25] + IfStatementNL -> IfHeadNL . [27] (2Right) + State 35 has 2 (or more) reducible items + IfTailNL -> else IfStatementNL . [32] (2Right) + IfStatement -> IfStatementNL . [30] + +I need to clarify the rules that I'm working with. + +1/ Statements might not end with a NL but as it is linelike... 
+ +2/ IfHeadNL IfStatementNL IfTailNL all end with a NEWLINE + IfHead IfStatement IfTail might not (but they may) + +Why did I want IfHead -> IfHeadNL?? +Because I might have + if cond: + action + + + else: bar + +No, that is still an IfStatementNL. Once there is any NL, we cannot fit it +on a line. + +Hmm... the FooNL pattern is getting out of control. +Why do I need this again? + +Because when "if cond : statements" is followed by a NEWLINE I need to +hang on to the parser state - not reducing to statement - until I see an 'else' or don't. + +If I do see the else, the difference doesn't matter. If I don't then I need +to know if I have a NEWLINE. + +So I could have + IfHead -> if Expression : Statements + IfHeadNL -> IfHead Newlines + + IfElse -> IfHead else + | IfHeadNL else + + Statement -> IfHeadNL + + But I want "if Expr then SimpleStatements" + where SimpleStatements can end with "IfHead" +So when I see: + + if foo : if bar : baz NEWLINE + +That NEWLINE mustn't turn "if bar: baz" into an IfHeadNL + We need to first turn "if bar: baz" into a SimpleStatement, then + "if foo : SimpleStatement" into an "IfHead", but NOT into a SimpleStatement. + +Arg. I might not know to reduce something until I've seen an IN. It is the +'else' + +What if an Statement *always* ends with a newline. +So + if Expr : Statement +also ends with a newline and can be a statement +But if there is an IN after ':' the newline is hidden. +So that doesn't work. + +What if a NEWLINE absolutely has to be at the top level. +If a symbol contains a NEWLINE, then it must be at the start of a line, +possibly indented. +So if it isn't indented, it mustn't contain a NEWLINE - no NEWLINE will get shifted in. + + Statement -> if Expr : Statements + | SimpleStatements + +What does Statements look like? 
It must end with a NEWLINE + Statements -> Statements Statement NEWLINE + | Statement NEWLINE + | Statements NEWLINE + | SimpleStatements + + SimpleStatements -> SimplePrefix ; Statement + SimplePrefix -> SimpleStatement + | SimplePrefix ; SimpleStatement + + +01July2019 + I think I have a very different approach - it incorporates a lot of + the ideas so far and is maybe better. + + From the top: + + We have a simplified SLR(1) grammar where each state has at most one + reducible production. We don't have an action table, but use the goto table + to decide if a terminal can be shifted. If it can, we do. If it cannot, + we reduce or trigger an error. + + Onto this we add handling for IN/OUT and NL. NL can appear in the grammar, + IN/OUT cannot. + + Any non-terminal which can derive a NL is deemed "line-like". + Such non-terminals will normally appear at the start of a line - possibly indented. + These non-terminals can have some productions that have a NL (usually at the end) + and some that contain no NL. + If a non-terminal appears other than at the start of a line then no NL will ever + be shifted into it, so a production without NL will be used. + If it does appear at the start of a line, then any production can be used, though + it must end up ending in a NL. + + The above paints an incorrect picture of how LR parsing works. At any given + time you don't know what non-terminal is being matched, so we cannot exclude + NL based on the non-terminal. We only know what parser state(s) we are in. + So: any parser state which is at the start of a line-like non-terminal is + flagged as "starts_line". + Also, in each state we store a "min prefix" which is the minimum non-zero number of + symbols before "dot" in any item in the set. This gives a sense of where we are + in the parse. + + If min_prefix of the top state is less than the number of symbols since start-of-line, + then we will not SHIFT a newline. + + Indents (IN) are recorded after the symbol they follow.
If there is an IN + since the most recent starts_line state, then any NL is ignored. + An OUT will cancel the most recent IN, providing it is in the top min_prefix symbols. + If not, we need to reduce something first. + + So when we see an OUT, we reduce until we can cancel. + When we see a NL, we reduce until the min_prefix reaches at least to the + start of the line. Then we can shift the NL. + After shifting the NL, the whole line should be reduced. + + When a line-like non-terminal produces a sequence that *doesn't* + end with an explicit NEWLINE, the grammar analysis ensures that + nothing can be shifted in after the end of the production. This + forces it to be reduced into the non-terminal. + + For example + simplestatement -> var = expr | print expr + simplestatements -> simplestatement | simplestatements simplestatement + SSline -> simplestatements NEWLINE | simplestatements ; condstatement NEWLINE + + statement ->.. + | simplestatements NEWLINE + | simplestatements ; statement + | if expr then statements NEWLINE + | if expr then statements Newlines else statements NEWLINE + + When a non-terminal is explicitly followed by a NEWLINE, it is line-like + also if it contains a NEWLINE or linelike, it is linelike. + + + ifhead -> if expr : statements + | ifhead NEWLINE + + iftail -> else : statements + | else ifstatement + + ifstatement -> ifhead + | ifhead iftail + + statement -> simplelist | ifstatement | Newlines simplelist | Newlines ifstatement + + statements -> statement | statements statement + + A line-like that contains newlines must be reduced by OUT or NEWLINE. + + How can I know that a statements can be followed by else in + if cond : statements else: statements + or + IfHead IfTail + but not by 'if' in + statements statement + + Maybe I could have a $sol token. + If at $sol, and cannot shift then try shifting $sol.. + then + statements -> statements $sol statement + or maybe $eol is better, then we can have NEWLINEs at start of statement.
+ OR maybe either... $linebreak is shifted if previous or next is NEWLINE + An IN doesn't allow a LINEBREAK. + +Just to repeat myself: + Arg. I might not know to reduce something until I've seen an IN. It is the + 'else' + +After above, new approach: IfHead and Statement have NL versions, nothing else does. +Maybe fixed the above... + +So + + StatementNL -> Statement NEWLINE | IfHeadNL + Statements -> StatementNL | Statements StatementNL + StatementList -> Statement | Statements + + Block -> : Statements | Open Statements Close + + IfStatement -> IfHead | IfHead IfTail | IfHeadNL IfTail + IfHead -> if Expr Block + IfHeadNL -> IfHead NEWLINE | IfHeadNL NEWLINE + IfTail -> else Block | else IfStatement + +Close, but the IfHeadNL in "else IfStatement" cannot accept newlines. +What if + IfHead -> if Expr Block | IfHeadNL else IfHead | IfHead else IfHead + +No, I think I have to take a totally different approach. + +IfPart elsepart switchpart whilepart etc are all syntactically valid +as stand-alone statements in the base grammar. +We use the code to fail a stand-alone elsepart that isn't preceded by an ifpart, whilepart or casepart. + +So statements never contain newlines, only the statement-list does. +It puts a NEWLINE at the end of each statement + statementlist -> statement | statementlist NEWLINE statement +statement can be empty string, thus allowing blank lines and a NEWLINE at the end, +which the parser will require. + + +[[ thought experiment - interesting, but gets unwieldy with + more complex statements +statements -> simpleline NL statements + | ifhead NL iftail NL statements + | ifhead NL statements + | ifstatement NL statements + +iftail -> else block + | else ifhead iftail + | NL iftail + +ifhead -> Newlines if expr block +ifstatement -> ifhead iftail +]]
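The "totally different approach" described just before the bracketed thought experiment (every part is a valid stand-alone statement in the grammar, and code rejects a misplaced elsepart) might look roughly like the sketch below. It is only an illustration under assumptions: the part names, the `check_statement_list` function and the idea of representing a parsed statementlist as a list of strings are all hypothetical, not taken from oceani. It also assumes that empty statements (blank lines) between an ifpart and its elsepart are allowed, as in the earlier "if cond: action ... else: bar" example.

```python
# Hypothetical part kinds; the names are illustrative, not taken from oceani.
# An elsepart is only acceptable directly after one of these.
CONTINUES_AFTER = {"ifpart", "whilepart", "casepart"}


def check_statement_list(parts):
    """Return the indexes of 'elsepart' entries with no valid predecessor.

    `parts` is a list of part kinds in the order the base grammar accepted
    them, with "" standing for an empty statement (a blank line).
    """
    errors = []
    previous = None  # most recent non-empty part seen so far
    for index, kind in enumerate(parts):
        if kind == "":
            # Assumption: blank lines do not separate a part from its elsepart.
            continue
        if kind == "elsepart" and previous not in CONTINUES_AFTER:
            errors.append(index)
        previous = kind
    return errors
```

With this check the grammar itself stays free of the FooNL duplication; the cost is that a stray "else" is reported by the semantic pass rather than as a syntax error.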