updates

author NeilBrown <neil@brown.name>

Sat, 19 Sep 2020 05:59:02 +0000 (15:59 +1000)

committer NeilBrown <neil@brown.name>

Sat, 19 Sep 2020 05:59:02 +0000 (15:59 +1000)
diff --git a/00-TODO b/00-TODO

index 68717a49d4043be38af90df115c63a17539735c6..8c0efc77bafe710ae709f0a582f47232ed876ca7 100644
--- a/00-TODO
+++ b/00-TODO
@@ -1,7 +1,13 @@
  This is a living document - delete things when done.  Avoid discussion.
  
  Current version (Cataract Creek)
-- use precedence levels for expressions
+- Warn when left-recursive symbols appear elsewhere, other than at the end
+  of a production.  Might have to special-case Newlines.
+- parser not to get into ERROR infinite loop
+- sort 'virtual' symbols to end
+- allow $xy instead of $3.  Chooses shortest bodysym with xy in that order
+    $xy_2 gives the second one
+- allow $TERM terminals to be listed.  If so, extras are errors
  - structs
     - const fields
     - anonymous field - array or struct (or pointer to these)
@@ -10,21 +16,27 @@ Current version (Cataract Creek)
     - anon struct field gets fields interpolated
  
  - manifest values for arrays and structs [a,b,c]
+    or [.foo=a, .bar=b] or [ [1]=a, [2]=b]
+   That last doesn't parse easily, unless we require tags... not a good idea.
+   [ .[1] = a, .[2] = b ] ?? Maybe.
  - yet more operators
       << >> #
       bit-ops & | ~ &~
       op=
  - split values so I can have an array of just the value (1 byte for u8)
-- integers, unsigned, bitfield, float
+- integers, unsigned, bitfield, float, double?
  - pointers
     - owned or borrowed
     - pure, loaded, overloaded, augmented
     - owned: once, counted, collected
+   - shared or thread-local
  - array slice
  - array buffer - can be added to and grows.
  - char, string search, regexp search
  
  - allow "do stuff" as a stand-alone statement (scope)
+- 'use' labels *must* appear in case statements.
+- re-read parsergen lit-doc and make sure it is still coherent.
  
  Next version (Govetts Creek):
  - functions and procedures
diff --git a/Ocean-functions b/Ocean-functions

new file mode 100644

index 0000000..9014d1a
--- /dev/null
+++ b/Ocean-functions
@@ -0,0 +1,37 @@
+I want to add functions and procedures soon.  I should decide on syntax at least.
+
+The args to a function are effectively a struct, so I want it to look the same.
+C doesn't allow "int a, b, c" in the parameters, which I think is clumsy.
+struct can be
+
+  struct name:
+     a,b,c:number
+     d:string
+
+So function might be
+
+   func name:
+       arg1, arg2: type
+       arg3: type2
+   returns type
+   do:
+       stuff
+
+A procedure is different as it doesn't have just a return type,
+it has a return structure.  So many C functions have a 'ret' or 'result'
+variable that it might be nice to follow the Pascal approach of
+assigning to the function name??  or having
+
+    func name:
+       args:types
+    returns:
+       results:types
+    do:
+       statements
+
+A shorter version would be
+
+    func name(args:types;args:types):type { }
+or
+    proc name(args:types;args:types):(result:type;...) {}
+
diff --git a/Ocean-types b/Ocean-types

index 16714803a186af0502d3c45205a47aec6e4501ab..ed57f26d1407bee3baa002089d14bec3b85688fb 100644
--- a/Ocean-types
+++ b/Ocean-types
@@ -304,3 +304,96 @@ I'm in the middle of stage-1 on structures.
  
  I need a type to parse the declaration into.  It needs to be a linked list
  of fields, each of which is a type, a name, and an initial value.  i.e. a 'struct field'.
+
+-----------------
+
+Numbers...
+I want signed/unsigned/bitset integers (and probably floats).
+These are different sizes, and I want to move 'type' out of 'value'
+so I can have arrays of numbers that are *just* the densely packed numbers.
+
+So there are two questions here: how will I handle values in oceani, and
+what are the semantics of numbers in ocean.
+
+I think I want bitops to require bitsets and arith ops to require signed/unsigned.
+But there is some overlap.
+e.g. we use bitops to test if a number is a power of two
+We sometimes use bitops to multiply, but that is probably best avoided.
+use * to multiply.
+
+Converting between the two can be done with simple assignment.
+
+So + - * / %     require/assume signed or unsigned
+   | & ~ << >>   require/assume bitset
+
+  #  accepts either and produces a bitset
+
+Other issue is overflow/underflow checking.
+Do we need another unsigned type - cyclic
+
+    i32 - signed integer in 32 bits
+    u32 - unsigned integer
+    c32 - unsigned with overflow permitted and ignored
+    b32 - bitset
+
+    int uint cint bset - whatever size.
+
+i32 and u32 detect overflow/underflow and set to NaN - all 1's
+If I want to allow overloading (such as NaN), I need a type that
+declares no overloading. s32 and c32?  Or annotations.  !s32 !u32
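A minimal sketch of the difference between the checked and cyclic unsigned
types described in these notes, written in Python rather than Ocean.  The
function names are hypothetical.  Two points are assumptions, not something
the notes decide: that all-ones is reserved so the largest ordinary u32 is
one less than it, and that NaN propagates through arithmetic.  Signed i32
is left out because the notes do not say how all-ones would be told apart
from -1.

```python
U32_NAN = 0xFFFFFFFF      # all 1's: the NaN marker mentioned in the notes
U32_MAX = U32_NAN - 1     # assumption: largest ordinary u32 value
MASK32 = 0xFFFFFFFF


def add_u32(a, b):
    """Checked add: overflow yields NaN instead of wrapping."""
    if a == U32_NAN or b == U32_NAN:
        return U32_NAN    # assumption: NaN propagates
    result = a + b
    return result if result <= U32_MAX else U32_NAN


def sub_u32(a, b):
    """Checked subtract: underflow yields NaN."""
    if a == U32_NAN or b == U32_NAN:
        return U32_NAN
    return a - b if a >= b else U32_NAN


def add_c32(a, b):
    """Cyclic add: overflow is permitted and ignored (wraps modulo 2**32)."""
    return (a + b) & MASK32


def sub_c32(a, b):
    """Cyclic subtract: underflow wraps modulo 2**32."""
    return (a - b) & MASK32
```

With the same operands, add_u32(0xFFFFFFF0, 0x20) gives the NaN pattern
while add_c32(0xFFFFFFF0, 0x20) wraps around to 0x10.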
+
+So what about values in oceani?  I want to separate out the type and not
+use a union.
+Where are they used?
+ - return of init, prepare, parse, dup
+ - passed to  print, cmp, dup, free, to_int, to_float, to_mpq
+ - field in 'struct variable'
+ - field in 'struct lrval'
+ - result of 'interp'
+ - intermediate left/right in interp
+ - field in array and struct field
+ - field in 'struct val' for manifest constants
+
+So:
+  variable gets a 'type' pointer and a union which can be a pointer
+  to the value, or the value itself (depending on size)
+  lrval get a type pointer as well, plus the union
+  interp returns ...
+
+
+-----------------
+Struct/array initialisers.
+I like [a,b,c] rather than {a,b,c} because the latter can look like code.
+But [] is also array indexing.
+So an array initializer could look:
+  [ [1] = "hello", [5] = "there" ]
+and that is confusingly similar to nested initialization
+  [ [1,2] , [3,4] ]
+Options:
+ 1/ use different outer.  {}  () <> << >>
+   < is possible as it is not a prefix operator.
+     But nesting results in <<1,2>,<3,4>> which looks like << instead of < <
+   {} I already don't like
+   () is bad enough with function calls - it is best if it is grouping only.
+     though with function calls it is a list ...
+   << [1]="hello", [2]="there" >>...  I don't really like that
+
+   array[ ]
+   struct[ ]
+     No, too noisy.
+
+ 2/ use different inner syntax.
+     [ .[1] = "hello", .[5] = "hello" ]
+
+ What about a newline-based syntax:
+  a: [4]int :
+       [0] = 2
+       [1] = 3
+       [3] = 1
+
+ Nice, but doesn't actually help.  Still need .[] because I want to allow
+ a one-line syntax too.
+ Maybe I just use {} after all.
+
+  a:[4]int = { [0]=2, [1]=3, [3]=1 }
+ Yes, I guess that is best.
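To make the chosen form concrete, here is a small Python sketch that reads
an initialiser such as { [0]=2, [1]=3, [3]=1 } into a fixed-size array.
The function name is hypothetical, only integer elements are handled, and
filling unlisted slots with 0 is an assumption; the notes do not say what
value an unmentioned element gets.

```python
import re

# One "[index]=value" item, integers only (a simplification for this sketch).
ITEM = re.compile(r"\s*\[(\d+)\]\s*=\s*(-?\d+)\s*")


def parse_array_init(text, size, default=0):
    """Parse '{ [i]=v, ... }' into a list of length size."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("initialiser must be wrapped in { }")
    values = [default] * size          # assumption: unlisted slots get default
    body = body[1:-1].strip()
    if not body:
        return values
    for item in body.split(","):
        match = ITEM.fullmatch(item)
        if match is None:
            raise ValueError("bad initialiser item: " + item.strip())
        index = int(match.group(1))
        if index >= size:
            raise ValueError("index %d out of range" % index)
        values[index] = int(match.group(2))
    return values
```

For a:[4]int this turns { [0]=2, [1]=3, [3]=1 } into four slots with the
third left at the default.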
diff --git a/twod b/twod

index de0e302f196ea0b582dc81c82916524261b5fa53..bde8aae6cb3e886c7dc71c90c6a51b4aa55ae945 100644
--- a/twod
+++ b/twod
@@ -1448,3 +1448,899 @@ The NEWLINE after b is not Ignored in the expression,
      Maybe we want:
          Open -> {
              | NEWLINE Newlines {
+
+
+ I have a problem.
+ I want
+
+    else: a := b
+
+ to parse the same as
+
+    else:
+          a := b
+
+ and for the last newline to close the elsepart.
+ But the latter has 2 newlines while the former only has one
+ and I don't have any obvious justification for ignoring either.
+ I think it is the Newline before the OUT that is extra.
+
+ I could drop the newline before the OUT, assuming the newline
+ separates things, and the OUT will force any reductions needed.
+ But then we have fewer newlines reported than actual.
+ (Same imbalance happens with multiline comments and strings, so maybe
+  that is OK).
+ Another way to look at it is that the newline following an IN is discarded
+ (or always ignored) and not moved to after the OUT.
+ So (maybe) the newline at an IN or OUT is reported *after* the IN or OUT.
+ so
+  A
+    B
+    C
+   D
+
+ Would be A IN NL B NL C OUT IN NL D OUT NL
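The token order in that example can be reproduced with a short Python
sketch.  This is only an illustration of the "newline is reported after
the IN or OUT" idea, not the real scanner: the function name is
hypothetical, indentation is counted in spaces only, and blank lines are
simply skipped.

```python
def tokenize_2d(lines):
    """Flatten indented lines into tokens with IN, OUT and NL markers.

    The NL for each line break is emitted *after* any OUT/IN that the
    change of indentation causes.
    """
    tokens = []
    indents = []                      # stack of open indentation levels
    for line in lines:
        if not line.strip():
            continue                  # blank lines skipped in this sketch
        indent = len(line) - len(line.lstrip(" "))
        if not indents:
            indents.append(indent)    # first line sets the base level
        else:
            # Close every level deeper than this line (never the base).
            while len(indents) > 1 and indent < indents[-1]:
                indents.pop()
                tokens.append("OUT")
            # Open a new level if this line is deeper than what remains.
            if indent > indents[-1]:
                indents.append(indent)
                tokens.append("IN")
            tokens.append("NL")
        tokens.extend(line.split())
    # End of input closes all open indents, then the final line.
    tokens.extend(["OUT"] * (len(indents) - 1))
    if indents:
        tokens.append("NL")
    return tokens
```

Feeding it the four lines A, B, C, D with indents 0, 2, 2, 1 gives
A IN NL B NL C OUT IN NL D OUT NL, matching the sequence written above.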
+
+ The parser always ignores the NL after an IN but uses other
+ NL to reduce to a single symbol (if possible)
+ OR maybe it doesn't ignore (unless not line-like context) and
+ lines are preceded by NL, not followed by them...
+ No, followed is usually good.. though separated is better... so precede!!
+
+ Ok, this isn't working.
+ A construct
+   if cond:
+      pass
+
+ cannot be reduced to a Statement until we know what comes next, and it
+ might be separated by several newlines.
+ So the newlines need to be part of the Statement.
+ But that means we cannot have newlines at the front of a statement.
+ But that was the point...
+
+ Maybe a Statementlist is a series of StatementNL followed by a Statement
+
+ We allow
+   StatementNL -> Statement Newlines
+ as a general catch-all, but when we have something like if, or anything
+ with an optional tail "else:" or "case:"
+ We say:
+   StatementNL -> if Expression Block Newlines
+ But that would produce a conflict with
+   Statement -> if Expression Block
+ As a newline could either trigger a reduce to Statement, or a shift.
+ Obviously we shift, but maybe we use precedence to force the point.
+
+ Can we handle 'else if' ...
+ IfStatementNL -> if Expression Block Newlines
+             | if Expression Block else IfStatementNL
+
+ ... I'm contemplating having the parser duplicate NL as necessary, so
+ that
+    if test: action
+ can appear to be followed by 2 NL, one to terminate the 'action' statement
+ and one to terminate the whole 'if'.
+ This might mean I need to extend when NL are discarded - to ensure they
+ don't get duplicated too much.
+ 1/ if state does not permit newlines, discard
+ 2/ else if I can reduce all symbols since start of line, do that.
+ 3/ else if can shift, do that
+ 4/ else if only one symbol since newline, discard.
+ 5/ else ERROR
+
+ This means that we cannot recognise multiple newlines
+ or does it?
+ If we shift a Newline, that is since_newline=0;
+ If we reduce that to Newlines, that is still since_newline=0
+ If 4/discard only applies when since_newline==1 -- we win.
+
+ Currently since_newline essentially means the symbol contains a newline.
+ So 'statements' usually does, but 'statement' doesn't.
+ When we shift the newline and reduce, it all becomes since_newline=0.
+ That is when we want to ignore newlines.
+
+ 15jun2019 - still working this through..
+
+ Normally the parser does
+   shift or else reduce or else error
+ exceptions are TK_in which is simply recorded and
+   TK_out: reduce until there is a TK_in in scope, then cancel, else error
+   TK_newline:
+       if not newline_permitted (indent since last starts_line state)
+                 Discard
+       if can Reduce to at most start-of-line, reduce
+       if can Shift, duplicate and Shift
+       if can Reduce, do so
+       if 0 since newline, Discard
+
+  since_newline needs to be changed a bit.
+   A TK_newline token *isn't* zero, it is N+1.  The token *after*
+   the NEWLINE is zero - so that
+
+ Arg.  I'm struggling with the fact that having shifted a newline,
+ we are both at the end of a line, and at the start of the next.
+ When I see a newline, I want to reduce until the end of line
+ is in the same state as the start of that line.
+
+ Maybe I do want newline to be a separator.
+ What if I don't actually include the newline in the grammar, just like in/out.
+ Instead we mark select productions as lines.  This is like marking
+  for precedence.
+ A marked production is reduced when a newline is seen providing it won't
+ contain any indents.
+ So: if the reducible item in a state is marked, the start gets marked.
+ When we see a newline, if the state is marked and the reduce size does not
+ exceed since_indent, we reduce.  Otherwise we discard.
+ No... I need an error condition too.
+ So I need the state to have a starts_line marking, when a new item is marked.
+
+ So:
+   productions can be marked $$NEWLINE which flags the production as line-like
+   a state with an item with DOT at start of a line-like production is starts_line
+   a state with an item with DOT at the end of a line-like production is ends_line
+   We track indents as before.
+   When we process an indent or newline, we set since_newline to 0
+   When we see a newline we do one of:
+     if not newline_permitted, we discard
+     if top state starts line, we discard
+     else reduce or else error
+
+No.....
+ A production -> { statements }
+ needs to ignore newlines either side of statements.
+ It is a multi-line production - newlines don't matter.
+ Maybe there are several sorts of symbols:
+  - in-line:  must not be broken across lines unless indented
+  - line-like: is terminated (reduced) by a newline
+  - multi-line: newlines are ignored
+
+ We tag symbols which are line-like.
+   Any symbol which can derive a line-like symbol is multi-line
+   Any other symbol is in-line.
+
+ So SimpleStatements, ifhead, elsepart, casepart etc are linelike
+
+$line SimpleStatements IfHead ....
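A rough Python sketch of that three-way classification.  The grammar
representation (a dict from head symbol to a list of bodies) and the
function name are assumptions made for illustration; parsergen's own data
structures are not shown in these notes.  Symbols tagged line-like keep
that tag even when they can derive another line-like symbol.

```python
def classify(grammar, linelike):
    """Label every head symbol as 'line-like', 'multi-line' or 'in-line'.

    grammar:  dict mapping head -> list of bodies (each a list of symbols)
    linelike: set of symbols tagged with $line
    """
    multiline = set()
    changed = True
    while changed:                    # iterate to a fixed point
        changed = False
        for head, bodies in grammar.items():
            if head in linelike or head in multiline:
                continue
            # A symbol is multi-line if any body mentions a line-like
            # symbol, or a symbol already known to derive one.
            if any(sym in linelike or sym in multiline
                   for body in bodies for sym in body):
                multiline.add(head)
                changed = True
    kinds = {}
    for head in grammar:
        if head in linelike:
            kinds[head] = "line-like"
        elif head in multiline:
            kinds[head] = "multi-line"
        else:
            kinds[head] = "in-line"
    return kinds


# A cut-down grammar in the spirit of the notes (hypothetical).
EXAMPLE = {
    "Block": [["{", "Statements", "}"]],
    "Statements": [["Statements", "Statement"], ["Statement"]],
    "Statement": [["SimpleStatements"], ["IfHead"]],
    "SimpleStatements": [["SimpleStatement"]],
    "IfHead": [["if", "Expr", "Block"]],
    "Expr": [["Term"]],
}
EXAMPLE_KINDS = classify(EXAMPLE, {"SimpleStatements", "IfHead"})
```

Here Statement, Statements and Block come out multi-line, Expr is in-line,
and the two tagged symbols stay line-like.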
+
+ A state that is at the start of a linelike symbol starts_line
+ Any state in a multi-line production starts_line
+
+ if tos starts_line, newlines are ignored.
+ else if there is an indent since the starts_line, newlines are ignored.
+ But if there are symbols since starts_line, we have to reduce until we
+ are in a starts_line state (or can see an indent).
+
+No....
+    Block -> : Statementlist
+is none of these.  It must be reduced by a newline, but isn't entirely line-like.
+    Block -> { Statementlist }
+is multi-line
+
+but maybe Block here is neither.  It only becomes linelike when it
+terminates a Statement, which is linelike.  Or terminates an ifHead
+
+Should this be legal?
+  a:=b;pass if something
+probably not. I want to require at least ';' or NEWLINE.
+That means I need to include NEWLINE in the grammar.
+ Statements -> SimpleStatements ; Statement
+      | Statements  Statement
+      | Statement
+
+ if cond: if cond: a:=b
+NEWLINE reduces this down to IfHead
+IfHead -> IfHead NEWLINE
+Statement -> IfHead
+   | IfHead ElsePart
+
+ElsePart -> else BLOCK
+  | else IfHead
+  | else IfHead ElsePart
+  | ElsePart NEWLINE
+
+
+Statements -> Statements Statement
+       | Statement
+
+
+if cond { statements } else { statements}
+but not
+if cond: statements else: statements
+
+so :statements must expect a NEWLINE but then
+  if cond: if cond: statement
+expects 2 NEWLINEs.
+Maybe
+   Block -> : IN statements NEWLINE
+if there is no indent, we synth one which triggers an OUT NEWLINE pair.
+This could be automatic.
+ If a linelike is followed by a newline, we synthesise an IN before it.
+
+That requires a hack to the scanner: Synth Indent
+
+What if
+
+ Block -> { Statements }
+   | : Statements NEWLINE
+   | : SimpleStatements
+
+Then
+   if Expr Block
+
+might not end in a NEWLINE so else could come immediately. Is that OK?
+   if expr: a:=0 if expr
+must be forbidden.  That requires a newline.
+
+ Block -> : IN statements OUT NEWLINE
+
+In marks state as 'needs_indent'  If an indent arrives, fine.
+If something else, we record that next newline (After balanced in/out)
+must synth extra out/newline.
+
+ Block -> { Statements }
+     | : statementblock
+
+ statementblock -> Statements $$line
+
+$$line means it must be reduced by a newline.  If something else tries,
+it is an error and we skip to newline.
+It also strips everything but NEWLINE from the (effective) lookahead
+to avoid reporting conflicts, as those things will never be shifted.
+
+ IfHead -> if Expr Block
+    | IfHead NEWLINE
+ Block -> { statements }
+     | : statements $$line
+
+ IfStatement -> IfHead
+      | IfHead IfTail
+      | IfHead else IfStatement
+ IfTail -> else Block
+      | IfTail NEWLINE
+
+
+------
+20jun2019  Happy 87th birthday Dad.
+
+I'm not convinced about $$NEWLINE
+
+   else: simplestatementlist
+
+should be able to parse simplestatementlist without a newline, and
+use the newline to close the if/else.
+whereas
+   else:
+      statementlist
+
+has a newline to close the statementlist and another to close the if/else.
+But can the LR parser tell the difference?
+It only sees that newlines don't forcibly reduce the else:
+So when it sees the newline at the end of simplestatementlist,
+it cannot shift because there is a sub-line thing that can be reduced.
+So this becomes elsepart before the newline is absorbed.
+Whereas in statementlist, the newline can be shifted creating simplestatementline.
+
+What about
+   if cond: if cond: statement
+
+Again, the newline cannot be shifted while we can reduce
+
+But.... how does conflict analysis know that an 'if', for example, is not
+permitted after simplestatementlist?
+
+Ahh.. This is exactly what $$NEWLINE is for. Maybe it should be $$OUT.
+Either way, the grammar is ambiguous and relies on newlines or indentation to
+close the production, and this fact needs to be explicit.
+Requiring OUT is probably best as it means
+  if cond:
+       statements
+    else:
+       statements
+
+works even though there is no newline after the first statements.
+Here I want the 'statements' to be closed by OUT, but the whole to
+be closed by NEWLINE.
+So maybe I need both $$NEWLINE and $$OUT ??
+
+
+$$OUT makes lots of sense.  It is exactly how we expect :statements to
+be closed - where we allow NEWLINE to have the same effect.
+
+$$NEWLINE is good for closing a complex if or for etc.  It means that
+nothing else can be on the same line - allowing for indents.
+How do we implement that?
+Any production in the grammar that represents a full line but doesn't
+end with a newline should be marked $$NEWLINE
+The head of that production should recursively absorb NEWLINEs.
+
+I'm not yet clear on exactly the difference between $$OUT and $$NEWLINE.
+I would put $$OUT after
+   block -> : Statements $$OUT
+and $$NEWLINE at the end of a statement that must end a line
+   condstatement -> Ifpart IfSuffix $$NEWLINE
+         | condstatement NEWLINE
+
+Maybe I need a worked example.
+   while conda:
+       if condb: if condc: action
+   pass
+
+So after action there is NL OUT NL pass
+The NL sees that it can reduce, and the if allows the NL to reduce it so
+      while COND : IN if COND : statement   [ NL OUT NL ]
+again the NL can reduce.  Note that we *don't* absorb the NL in the statement
+      while COND : IN statement [ NL OUT NL ]
+Now we can shift the NL
+      while COND : IN statement [ OUT NL ]
+Now the OUT forces a reduction
+      while COND BLOCK(in)  [ OUT NL ]
+Now the out is cancelled
+      while COND BLOCK [NL]
+and the while is reduced.
+
+So the $$NEWLINE must always see a newline (or $$EOF)
+An $$OUT must see an OUT or a NEWLINE (if there was no IN)
+
+$$OUT causes the LA set for items with the production to be empty.
+It is never credible that anything will be shifted so any apparent LA
+contents can be ignored.
+The state when a $$OUT is reducible has a precedence higher than any terminal, so
+nothing can be shifted and no completion should be possible.
+The state when a $$NEWLINE is reducible is much the same.
+
+Maybe I don't want NEWLINE in the grammar, only $$NEWLINE??
+How would we recognize a blank line?
+  command -> $$NEWLINE
+??
+We would need a new rule for discarding newlines.
+e.g. when the top-but-one state is start-of-line we discard and mark the top
+state s-o-l.  That stops us discarding a newline until it reduces something that is at the start of a line....
+
+1/ if there is an indent since the last start-of-line state, discard NEWLINEs
+2/ if ....
+
+Q: When is a NEWLINE an error?
+A: when it isn't ignored and we cannot reduce and
+   top or  top-but-one state isn't starts_line.??
+
+So we need extra state info and extra frame info.
+
+State has:
+  - starts-line - is at start of a $$NEWLINE production
+  - ends-line - is at end of unreduced $$NEWLINE production
+  - ends-indent - is at the end of a $$OUT production
+  - min-prefix - how far back a 'in' can be and still cancel
+
+Frame has:
+  - indents - count in or after that sym
+  - line_start - there was a line start (IN or NEWLINE) immediately after
+      the symbol
+  - newline_permitted: no indent since start-line
+  - since_indent: number of frames where indents==0
+  - since_newline: number of frames where line_start==0
+
+If we see a NEWLINE then:
+  if ! newline_permitted, discard
+  elseif can reduce and reduce_count <= since_newline - reduce
+  elseif since_newline <= 1, and state.starts-line, discard and record line_start
+  else error
+
+If we see an IN
+  increment indents, set line_start
+
+If we see an OUT
+  if reduce_size <= since_indent, reduce
+  if min_prefix >= since_indent, cancel
+  else error
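The NEWLINE and OUT rules above can be written down as two small decision
functions.  This is a Python sketch under assumptions: the State and Frame
classes carry only the fields these rules need, reduce_size (None when the
state has nothing to reduce) stands in for both "reduce_count" and
"reduce_size", and the returned strings are just labels for the action the
parser would take.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    starts_line: bool = False
    min_prefix: int = 0
    reduce_size: Optional[int] = None   # None: no reducible production


@dataclass
class Frame:
    newline_permitted: bool = True
    since_newline: int = 0
    since_indent: int = 0


def on_newline(state, frame):
    """Choose the action for a NEWLINE token, in the order listed above."""
    if not frame.newline_permitted:
        return "discard"
    if state.reduce_size is not None and state.reduce_size <= frame.since_newline:
        return "reduce"
    if frame.since_newline <= 1 and state.starts_line:
        return "discard and record line_start"
    return "error"


def on_out(state, frame):
    """Choose the action for an OUT token, in the order listed above."""
    if state.reduce_size is not None and state.reduce_size <= frame.since_indent:
        return "reduce"
    if state.min_prefix >= frame.since_indent:
        return "cancel"
    return "error"
```

The IN case needs no decision: it only increments indents and sets
line_start on the top frame.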
+
+How does error handling work?
+Normally we pop states until we can shift ERROR
+Then we discard tokens until we can shift one.
+
+However we need to do something different for IN OUT NEWLINE.
+For IN, we simply increment a counter
+For OUT we decrement if it is positive.
+   If it is zero and the state ends-indent, then we are synced.
+   If it doesn't, we need to pop more states until we have an indent to cancel.
+For NEWLINE if the state ends-line or ends-indent and ...something... we are synced.
+   else we skip it??
+
+... no, that doesn't work because I cannot see a way to describe an optional newline.
+
+Let's try with just $$OUT which requires OUT or NEWLINE...
+We put $$OUT on productions that must be closed in a 2-d obvious way.
+So they can be at the end of a line or at the end of an indented block.
+So
+   : statements $OUT
+means the next line after the : cannot be indented.
+However
+  Block -> : statementline | : statementblock | { statements }
+  statementblock -> statements $NEWLINE
+means I can have else indented, or on same line as single statement
+
+  if cond: a = b; else: b = a
+  if cond:
+        a = b
+     else:
+        b = a
+
+The whole 'if' needs a $NEWLINE marking to ensure a following statement isn't
+indented.
+So implementation is almost exactly what I have:
+ - if anything else is lookahead when reducing that production, it is an error.
+ - remove non-newlines from lookahead in items
+
+But I don't think $$OUT is quite what I want to call it.
+That doesn't quite cover end-of-line possibilities.
+Maybe allow $$NEWLINE or $$OUT but with same behaviour.
+
+.... still not there.
+Another way to satisfy a $$OUT reduction is for it to already look right.
+So: No indents and at start-of-line
+
+But that upsets the modification to look-ahead as we can no longer assume
+the next token.
+I think this might be more like a precedence thing??
+Without look-ahead modification, the first token of a statement can be shifted
+ before a newline forces a reduction....
+
+Maybe I do need two sorts of productions.
+ $$OUT requires an out/newline to reduce it.
+ $$NEWLINE follows either $$OUT or NEWLINE and requires start-of-line and no indents.
+        or $$OUT or $$NEWLINE
+
+Or does it matter. Over-modification of the look-ahead suppresses warnings, but
+doesn't affect the parse.
+Will we get warnings anyway?
+
+
+--------
+Are left-recursive symbols in a non-final position always bad?
+
+Left-recursive symbols cannot be closed by forcing a reduction.
+So if one starts in an indented region (in which newlines are ignored)
+it could continue afterwards - unless we make that an explicit error somehow.
+If they appear at the end of some other production, that one will (maybe)
+be reduced as well so (maybe) no problem...
+
+if cond:
+   a()
+   b()
+ c()
+
+is weird and I want to forbid it.  All that is between b() and c() is
+NL OUT IN.  NL closed b(), so it is just OUT IN
+So I do want the statementlist to close.
+
+ a :=
+     1 + 2
+   * 3 + 4
+
+is very wrong.  How much can I help?
+The OUT will reduce "1 + 2" which will then become
+    ((1 + 2) * 3) + 4
+which would be highly confusing.
+So something about this must be disallowed.
+Maybe when newlines are ignored, OUT doesn't force a reduce??
+I can make it an error by having Expression reduce to something else.
+
+Do I want an error even for
+    1 + 2
+    * 3 + 4
+??
+
+I could achieve that by adding extra checks when we SHIFT at
+start of line.
+If we could reduce tokens since previous SOL, then we have 2D ambiguity.
+
+   1 + 2 *
+   3 + 4
+
+That is just as ambiguous, but we cannot reduce anything.
+When we see the second '+', the reduction crosses a line-start but doesn't
+result in a line-start.
+
+So: a reduce that doesn't contain an indent, but does contain a start-of-line
+must reduce to that start of a line.
+
+This means we need to keep the start-of-line when we "IGNORE" a newline.
+
+Can I use this sort of logic to avoid the need for the extra reduction,
+or for the $$OUT markings??
+
+1/ The point of extra reduction is to avoid consuming more after an OUT or
+   ignored NL.
+    if cond:
+         a()
+         b()
+       c()
+
+    must be an error.  The OUT reduced a()b().  The stack is then
+        if cond : statements(n1) . IN Ident
+    The first indent is gone.  There is no error until we see all of c() so
+     if cond : statements(n1) simplestatement(i) . NL OUT NL
+
+    Is it a problem that the simplestatement is indented?
+     if cond : statements(n1) statement(i) . OUT NL
+
+   Q: is it an error to reduce a sequence containing an (uncancelled) indent?
+
+2/ The $$OUT markings guard against exactly a reduction containing an uncancelled IN.
+
+So maybe I have two new rules.
+
+ 1/ a reduction must not include any uncancelled indent.  pop() must return 0.
+ 2/ a reduction that contains an unindented start-of-line must begin with start-of-line.
+     So when we cancel an indent, we also cancel line starts since there.
+
+One other value of $$OUT is that it avoided conflicts - most symbols could not
+be shifted.  That should have only applied to $$NEWLINE(!) and doesn't apply
+at all if I drop the marking and use internal rules instead.
+So how do I avoid reporting conflicts?
+
+Really, there shouldn't be any conflict as NEWLINE should be expected.
+Let's go back to that idea.
+
+1/ A linelike thing MAY start with Newlines and MUST end with a NEWLINE
+2/ A SimpleStatement is not linelike and doesn't include any Newlines
+3/    if condition : SimpleStatement
+   is a SimpleStatement.
+
+4/ When we see the NEWLINE after "if condition : SimpleStatement" we have a shift/reduce
+   conflict as we could SHIFT to make a complex statement, or reduce the whole thing
+   to a SimpleStatement.
+   Default action is SHIFT but in this case we want REDUCE - due to precedence?
+
+   However when we see the NEWLINE after "if condition :IN Simplestatement"
+   we cannot REDUCE as there is an uncancelled indent, so we have to shift.
+
+   But when we reduce, we only want to Reduce to IfHead so that an 'else' can appear
+   on the next line.
+
+   If we see IN .. just continue.
+
+What do I need to do:
+
+ 1/ Change grammar to expect blank-lines before and to have a NEWLINE at the end
+    of any line-like thing.
+    This requires IfHeadNL and IfHead.  ditto for switch, while, then ...
+    This gets complex with
+      for a:=0; then a += 1; while a < 10:
+    which could have several newlines
+      for a:=0
+      then a+=1 ; while a < 10:
+
+    ForPart -> for simplestatements ;
+            | for simplestatements NEWLINE
+            | for Block
+            | Newlines ForPart
+
+    ThenPart -> then SimpleStatements ;
+            | then SimpleStatements NEWLINE
+            | then BLOCK
+            | Newlines ThenPart
+
+ 2/ disallow Reduce when embedded indents - report ERROR
+ 3/ disallow Reduce when embedded start-of-line.
+ 4/ TK_newline uses these rules to decide when to force a reduce.
+
+A/ A parser symbol that starts after an IN must end before the OUT
+B/ A parser symbol that starts before an IN must end at-or-after the OUT
+    only if the symbol is not line-like ???
+
+C/ A parser symbol that starts after a line-start and before an indent must end
+    by the end of line
+D/ A parser symbol that starts at a line-start must end before the end-of-line,
+    or at a subsequent end-of-line.
+
+A is satisfied by forcing a reduce on OUT and reporting error if IN cannot be cancelled
+B is satisfied if we report an error if we try to reduce an uncancelled IN
+C is satisfied by forcing a reduce *after* shifting NL and reporting ERROR if
+   min_prefix exceeds the line
+D is satisfied if we report an error when reducing at eol crosses a NL and doesn't start
+   at start-of-line.
+
+C is interesting - do we reduce *after* shifting NL??  I think we do, yes.
+
+So: when can I suppress conflicts, and how do I handle reduce/reduce conflicts?
+
+I need to be sure that a line-like ends with an unindented newline.
+I can trigger an error when that doesn't happen, but I want more.
+I want to encourage it to happen.  So if the grammar allows a NEWLINE it
+will be shifted in, but if we have already seen an OUT, we ignore the NEWLINE
+rather than trigger an error.
+Also, I need to not report shift/reduce conflicts on whatever comes next.
+i.e. if
+   a -> b c
+and c can end with a, then both c and a can be followed by the same things.
+This is a conflict.  If c (and a) end with NEWLINE we declare the conflict
+resolved.
+
+An IfHead might not end with a NEWLINE.  So to make a statement we need
+to follow it with an optional NEWLINE.  Let's see if we can make that work.
+
+What is a SimpleCondStatement?  It has no blank lines or unindented breaks..
+
+  ForPart ThenPart WhilePart CondSuffix OptNL
+
+We don't need a distinct Simple class!! If it didn't start at SOL,
+then unindented NEWLINEs must be terminal.  Woohoo!!
+
+This requires that "Statements" doesn't insist on following NEWLINE
+
+A SimpleStatement can be followed by a ';', a Statement cannot.  That
+difference is still needed.
+So
+   SimpleStatements -> SimpleStatement | SimpleStatements ; SimpleStatement
+
+SimpleStatements can end with a ComplexStatement and no NEWLINE.
+ComplexStatements must end with a NEWLINE after each statement except the last
+
+Each Part (including SSlist) ends with arbitrary NEWLINEs.  These will
+only ever be at the same indent level.
+A ComplexStatement must be separated from next by a NEWLINE.
+So if the final non-empty Part does not end with NEWLINEs, how do we require one?
+Maybe not..
+
+What if a Part doesn't end with NEWLINE ever, but can start with them
+
+CondStatement -> IfPart Newlines
+  | IfPart IfTail...
+
+I think I need a CondStatement which doesn't end with a newline and a
+CondStatementNL which does.  Then anything that can end a cond statement
+must come in two versions.
+ IfPart ElsePart CasePart WhilePart  CondSuffix
+
+If we expect the non-NL, we accept the NL but not vice-versa.
+
+------
+problem.
+in
+   if cond:
+       cmd1
+       cmd2
+
+the 'if' that started before the indent must finish at/after the indent.
+But in
+   if a = b or
+      c = d :
+          do something
+The Expr that started before the first indent may finish well before the indent finishes.
+I think this is because Expr is not linelike but 'if' is.
+
+So I don't want an error when reducing if there is an indent, unless the new top state
+starts_line
+...
+
+OK, I'm up to the part where I need to hide conflicts that I can automatically resolve.
+I have:
+
+  State 7 has 2 (or more) reducible items
+    IfHead -> IfHeadNL . [25]
+    IfStatementNL -> IfHeadNL . [27] (2Right)
+  State 35 has 2 (or more) reducible items
+    IfTailNL -> else IfStatementNL . [32] (2Right)
+    IfStatement -> IfStatementNL . [30]
+
+I need to clarify the rules that I'm working with.
+
+1/ Statements might not end with a NL but as it is linelike...
+
+2/ IfHeadNL IfStatementNL IfTailNL all end with a NEWLINE
+   IfHead IfStatement IfTail might not (but they may)
+
+Why did I want IfHead -> IfHeadNL??
+Because I might have
+   if cond:
+        action
+
+
+     else: bar
+
+No, that is still an IfStatementNL.  Once there is any NL, we cannot fit it
+on a line.
+
+Hmm... the FooNL pattern is getting out of control.
+Why do I need this again?
+
+Because when "if cond : statements" is followed by a NEWLINE I need to
+hang on to the parser state - not reducing to statement - until I see an 'else' or don't.
+
+If I do see the else, the difference doesn't matter.  If I don't then I need
+to know if I have a NEWLINE.
+
+So I could have
+  IfHead -> if Expression : Statements
+  IfHeadNL -> IfHead Newlines
+
+  IfElse -> IfHead else
+          | IfHeadNL else
+
+  Statement -> IfHeadNL
+
+ But I want "if Expr then SimpleStatements"
+ where SimpleStatements can end with "IfHead"
+So when I see:
+
+ if foo : if bar : baz NEWLINE
+
+That NEWLINE mustn't turn "if bar: baz" into an IfHeadNL
+  We need to first turn "if bar: baz" into a SimpleStatement, then
+  "if foo : SimpleStatement" into an "IfHead", but NOT into a SimpleStatement.
+
+Arg.  I might not know to reduce something until I've seen an IN.  It is the
+'else'
+
+What if a Statement *always* ends with a newline.
+So
+   if Expr : Statement
+also ends with a newline and can be a statement
+But if there is an IN after ':' the newline is hidden.
+So that doesn't work.
+
+What if a NEWLINE absolutely has to be at the top level.
+If a symbol contains a NEWLINE, then it must be at the start of a line,
+possibly indented.
+So if it isn't indented, it mustn't contain a NEWLINE - no NEWLINE will get shifted in.
+
+ Statement -> if Expr : Statements
+            | SimpleStatements
+
+What does Statements look like?  It must end with a NEWLINE
+  Statements -> Statements Statement NEWLINE
+       | Statement NEWLINE
+       | Statements NEWLINE
+       | SimpleStatements
+
+  SimpleStatements -> SimplePrefix ; Statement
+  SimplePrefix -> SimpleStatement
+       | SimplePrefix ; SimpleStatement
+
+
+01July2019
+ I think I have a very different approach - it incorporates a lot of
+ the ideas so far and is maybe better.
+
+ From the top:
+
+   We have a simplified SLR(1) grammar where each state has at most one
+   reducible production.  We don't have an action table, but use the goto table
+   to decide if a terminal can be shifted.  If it can, we do.  If it cannot,
+   we reduce or trigger an error.
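A rough sketch of that driver loop in Python, to make the "no action table" idea concrete. The toy grammar (`L -> x | L x`), the state numbers, the `GOTO` and `REDUCE` tables and the `$accept`/`$eof` names are all made up for illustration; they are not taken from the real parser generator.

```python
# Toy grammar (hypothetical):  L -> x | L x
# GOTO holds both terminal shifts and non-terminal gotos.
GOTO = {
    0: {"x": 1, "L": 2},
    2: {"x": 3},
}
# At most one reducible production per state: state -> (head, body length).
REDUCE = {
    1: ("L", 1),        # L -> x .
    3: ("L", 2),        # L -> L x .
    2: ("$accept", 1),  # $accept -> L .
}

def parse(tokens):
    """Return True if tokens form a sentence of the toy grammar."""
    toks = list(tokens) + ["$eof"]
    stack = [0]          # stack of state numbers
    pos = 0
    while True:
        state = stack[-1]
        tok = toks[pos]
        nxt = GOTO.get(state, {}).get(tok)
        if nxt is not None:
            # The goto table says the terminal can be shifted, so shift it.
            stack.append(nxt)
            pos += 1
            continue
        if state in REDUCE:
            # Cannot shift: fall back to the single reducible production.
            head, size = REDUCE[state]
            if head == "$accept":
                return tok == "$eof"
            del stack[-size:]
            stack.append(GOTO[stack[-1]][head])
            continue
        # Cannot shift and nothing to reduce: error.
        return False
```

Note there is no look-ahead check before reducing; a wrong reduction only shows up later as a token that can neither be shifted nor reduced past.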
+
+   Onto this we add handling for IN/OUT and NL.  NL can appear in the grammar,
+   IN/OUT cannot.
+
+   Any non-terminal which can derive a NL is deemed "line-like".
+   Such non-terminals will normally appear at the start of a line - possibly indented.
+   These non-terminals can have some productions that have a NL (usually at the end)
+   and some that contain no NL.
+   If a non-terminal appears other than at the start of a line then no NL will ever
+   be shifted into it, so a production without NL will be used.
+   If it does appear at the start of a line, then any production can be used, though
+   it must end up ending in a NL.
+
+   The above paints an incorrect picture of how LR parsing works.  At any given
+   time you don't know what non-terminal is being matched, so we cannot exclude
+   NL based on the non-terminal. We only know what parser state(s) we are in.
+   So: any parser state which is at the start of a line-like non-terminal is
+   flagged as "starts_line".
+   Also, in each state we store a "min prefix" which is the minimum non-zero number of
+   symbols before "dot" in any item in the set.  This gives a sense of where we are
+   in the parse.
+
+   If min_prefix of the top state is less than the number of symbols since start-of-line,
+   then we will not SHIFT a newline.
+
+   Indents (IN) are recorded after the symbol they follow.  If there is an IN
+   since the most recent starts_line state, then any NL is ignored.
+   An OUT will cancel the most recent IN, providing it is in the top min_prefix symbols.
+   If not, we need to reduce something first.
+
+   So when we see an OUT, we reduce until we can cancel.
+   When we see a NL, we reduce until the min_prefix reaches at least to the
+   start of the line.  Then we can shift the NL.
+   After shifting the NL, the whole line should be reduced.
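The NL rules above can be sketched as a single decision function. This is only one reading of the notes: the `Frame` record, its field names and the `newline_action` helper are hypothetical, and it assumes an IN recorded on the starts_line frame itself does not count as "since" that state.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    starts_line: bool   # state is at the start of a line-like non-terminal
    min_prefix: int     # minimum non-zero number of symbols before "dot"
    indents: int = 0    # INs recorded after the symbol that led to this state

def newline_action(stack):
    """Decide what to do with a NL; stack is listed bottom to top."""
    since_sol = 0       # symbols since the most recent starts_line state
    for frame in reversed(stack):
        if frame.starts_line:
            break
        if frame.indents > 0:
            # An IN since the most recent starts_line state: ignore the NL.
            return "ignore"
        since_sol += 1
    if stack[-1].min_prefix < since_sol:
        # The top state does not reach back to the start of the line,
        # so the NL cannot be shifted yet: reduce something first.
        return "reduce"
    return "shift"
```

For example, with a starts_line frame followed by two plain frames, the NL is shifted only once the top state has a min_prefix of at least 2; until then the caller keeps reducing and asking again.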
+
+   When a line-like non-terminal produces a sequence that *doesn't*
+   end with an explicit NEWLINE, the grammar analysis ensures that
+   nothing can be shifted in after the end of the production.  This
+   forces it to be reduced into the non-terminal.
+
+   For example
+      simplestatement -> var = expr | print expr
+      simplestatements -> simplestatement | simplestatements simplestatement
+      SSline -> simplestatements NEWLINE | simplestatements ; condstatement NEWLINE
+
+      statement ->..
+             | simplestatements NEWLINE
+             | simplestatements ; statement
+             | if expr then statements NEWLINE
+             | if expr then statements Newlines else statements NEWLINE
+
+   When a non-terminal is explicitly followed by a NEWLINE, it is line-like.
+   Also, if it contains a NEWLINE or a line-like non-terminal, it is line-like.
+
+
+ ifhead -> if expr : statements
+   | ifhead NEWLINE
+
+ iftail -> else : statements
+   | else ifstatement
+
+ ifstatement -> ifhead
+   | ifhead iftail
+
+ statement -> simplelist | ifstatement | Newlines simplelist | Newlines ifstatement
+
+ statements -> statement | statements statement
+
+ A line-like that contains newlines must be reduced by OUT or NEWLINE.
+
+ How can I know that a statements can be followed by else in
+      if cond : statements else: statements
+   or
+      IfHead IfTail
+ but not by 'if' in
+      statements statement
+
+ Maybe I could have a $sol token.
+  If at $sol, and cannot shift then try shifting $sol..
+  then
+    statements -> statements $sol statement
+  or maybe $eol is better, then we can have NEWLINEs at the start of a statement.
+  OR maybe either... $linebreak is shifted if previous or next is NEWLINE
+  An IN doesn't allow a LINEBREAK.
+
+Just to repeat myself:
+  Arg.  I might not know to reduce something until I've seen an IN.  It is the
+ 'else'
+
+After above, new approach:  IfHead and Statement have NL versions, nothing else does.
+Maybe fixed the above...
+
+So
+
+  StatementNL -> Statement NEWLINE | IfHeadNL
+  Statements -> StatementNL | Statements StatementNL
+  StatementList -> Statement | Statements
+
+  Block -> : Statements | Open Statements Close
+
+  IfStatement -> IfHead | IfHead IfTail | IfHeadNL IfTail
+  IfHead -> if Expr Block
+  IfHeadNL -> IfHead NEWLINE | IfHeadNL NEWLINE
+  IfTail -> else Block | else IfStatement
+
+Close, but the IfHeadNL in "else IfStatement" cannot accept newlines.
+What if
+   IfHead -> if Expr Block | IfHeadNL else IfHead | IfHead else IfHead
+
+No, I think I have to take a totally different approach.
+
+IfPart elsepart switchpart whilepart etc are all syntactically valid
+as stand-alone statements in the base grammar.
+We use the code to fail a stand-alone elsepart that isn't preceded by an ifpart, whilepart or casepart.
+
+So statements never contain newlines, only the statement-list does.
+It puts a NEWLINE at the end of each statement
+ statementlist -> statement | statementlist NEWLINE statement
+statement can be empty string, thus allowing blank lines and a NEWLINE at the end,
+which the parser will require.
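A small sketch of the "fail it in code" check, assuming the grammar has already produced a flat list of statement kinds. The kind names, the `MAY_PRECEDE_ELSE` set and `check_statement_list` are hypothetical, and it assumes empty statements (blank lines) between a part and its elsepart are allowed.

```python
# Kinds of statement that an elsepart is allowed to follow.
MAY_PRECEDE_ELSE = {"ifpart", "whilepart", "casepart"}

def check_statement_list(kinds):
    """Return the indexes of elseparts that have nothing valid before them."""
    errors = []
    prev = None                      # most recent non-empty statement kind
    for index, kind in enumerate(kinds):
        if kind == "empty":
            continue                 # blank line: keep the previous kind
        if kind == "elsepart" and prev not in MAY_PRECEDE_ELSE:
            errors.append(index)
        prev = kind
    return errors
```

So `["ifpart", "empty", "elsepart"]` passes, while a leading `"elsepart"` or one following a simple statement is reported.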
+
+
+[[ thought experiment - interesting, but gets unwieldy with
+   more complex statements
+statements -> simpleline NL statements
+ | ifhead NL iftail NL statements
+ | ifhead NL statements
+ | ifstatement NL statements
+
+iftail -> else block
+ | else ifhead iftail
+ | NL iftail
+
+ifhead -> Newlines if expr block
+ifstatement -> ifhead iftail
+]]