--- /dev/null
+
+Pointers is one of the particular areas where I wanted to innovate in
+the design of Ocean.
+
+Traditionally (in lanuages like C) pointers are both powerful and
+dangerous. My goal is to control that power without extinguishing it,
+so they become powerful with more clarity, and without the danger.
+
+Pointers have always been overloaded - the can store either a pointer
+to an actual object, or a 'nil' (or 'NULL') pointer, which isn't an
+object and cannot be dereferenced safely. This provides power (it
+enables simple finite linked lists) and danger (it is too easy to try
+to dereference the `nil` pointer). In Ocean, pointers that can be
+`nil` are a different (though related) type to those which cannot. A
+pointer which can be `nil` may only be dereferenced after a test
+confirms that it isn't, in fact, `nil`. This requires more careful
+notation of the types of all pointers, but is easily checked by the
+compiler so notation errors are easily caught and corrected.
+
+As well as allowing `nil`, it is sometimes useful to allow other
+non-pointer values to be stored in a pointer object. The Linux kernel
+does this a lot, and the are two specific cases that are used.
+In the first case, small integers (normally negative) can be stored in
+a pointer variable. These are used as error codes. As the first page
+of virtual memory and also the last page are not mapped, any pointer
+with an absolute numeric value less than the page size cannot be a
+valid pointers and can be treated as something else. Zero is used for
+the `nil` pointer, negative numbers are used as errors, positive
+numbers are largely unused. Ocean will make this pattern explicit in
+the types: a pure pointer will not have one of these values and can be
+dereferenced. A loaded pointer might have a reserved value and can
+only be dereferenced after a test. If the test fails, the numeric
+value can be extracted.
+
+The second case relies on the fact that on all supported
+architectures, any pointer to a 16-byte (or larger) value must be (at
+least) 2-byte aligned, so the numeric value of the pointer will be
+even. This means that odd values cannot be valid pointers, so those
+values can be used for something else. This provides a secord, or
+third, level of "loading" that can be declared for a pointer. The
+Linux kernel uses these in hashtables that support lockless lookups
+and movement of entries between chains. The loaded value is treated
+like a `nil` pointer, but records which hash chain has come to an
+end. If a search find the wrong hash value at the end of the chain,
+it know that it might have been diverted between chains and needs to
+re-check.
+
+The odd pointer values can also be used in a different way. In each
+of indicating a different interpretation of the whole pointer, the
+least significant bit can be treated as a separate value that is
+stored in the pointer - to optimize space usage. A particular use for
+this is as a lock bit, again in a hash table though this time in the
+head. Before a thread may change a hash chain, it must atomically
+change the least significant bit from zero to one. After the change,
+it can clear the bit, possibly as a sige effect of updating that first
+pointer.
+
+The Ocean language shouldn't insist that the various non-pointer
+values are error codes, hash values, or lock bits, but it will make
+them available for the programmer to use in a controlled way, and will
+ensure that a pointer that can contain non-pointer values is always
+properly checked.
+
+Apart from `nil` pointer dereferences, the main danger with pointers
+involves the interaction with memory allocation and freeing. If
+memory is released (and then reused) while a pointer to the memory is
+held, future dereferences of that pointer are likely to cause
+problems. In order to avoid these errors, the language must ensure
+that no such references remain when memory is freed. Once it does
+that, it can help the programmer by actively freeing the memory once
+the number of valid references reaches zero.
+
+This tracking of references is sometimes done using a process referred
+to as "garbage collection". All pointer that are (or could be)
+active, and all locations they point to are marked. Anything not-marked
+can be freed. This has a degree of conceptual simplicity and a degree
+of practical inelegance. The implementation can supposedly be made
+quite efficient, but it isn't the approach that I want to take.
+
+The alternate approach is to enhance the type system so that the
+language "knows" when a pointer is active or not, and how many active
+pointers there are to any given location. When there is only one
+pointer and it become inactive, the memory can be free. The simplest
+implementation of this pattern is to store a reference-counter in each
+allocated object, and to use it to keep count of all references. This
+is a useful pattern and in some circumstances it is the best possible
+pattern, but it is not the only valid pattern. Ocean will support
+this natively by allowing a field on a structure to be identified as
+reference counter, but it will allow other options as well.
+
+Points can be classified as either "owned" or "borrowed" references.
+This will probably be determined dynamically from code analysis,
+though in some cases such as function parameters and returns, it must
+be declared. An owned reference is one that holds a reference count
+or by some other mechanism is an independent reference which will
+prevent the object from being freed, until some action happens on the
+owned references to releasse it. A "borrowed" reference is a
+temporary reference that exists under the umbrella of some
+identifiable owned reference. The language definition will require that
+the borrowed reference have limited scope, and that the related owned
+reference will not be released while that scope is still active.
+
+A simple example of a borrowed reference happens when an owned
+reference is passed as a parameter to a function which declares the
+formal parameter to be a borrowed reference. Inside the function, the
+borrowed reference cannot be used to release the reference, and the
+owned reference much remain intact for the duration of the function
+call.
+
+It may be that the owned reference cannot be uniquely identified. As an
+example, there might be two owned references, and an conditional
+statement assigns one or the other to a borrowed pointer. In this case
+it isn't possible to know at time of analysis which owned pointer is the
+primary, so both (or all) possible primaries must be preserved.
+
+The "Scope" of the borrowed reference that the owned reference must be
+preserved for will often a a lexical scope, but may not always be. I
+imaging that the language may at some stage be able describe other
+temporal relationships between different objects. In that case, the
+borrowed reference must have a life time contained within the
+guaranteed lifetime of the matching borrowed pointer.
+
+I should say that "Garbage collection" will be an option, just not the
+only or the default. Language can potentially identify exactly the
+references that need to be checked. Maybe collectable pointers
+reserve the lsb for mark, prior to sweep.
+compiler would extract a description of places that gc refs can live,
+and encourage them in specific domains.
+
+We might also make an allocation domain a well defined concept.
+
+## Other Ownership
+
+There are a variety of sorts of owned references. They include:
+
+- counted references. These are created by incrementing a
+ reference-count field, which is then decremented when the reference
+ is destroyed.
+
+- single references. Some objects are declared as only ever having a
+ single owned reference. This can be moved from pointer to pointer
+ but never duplicated. When the reference is destroyed, so is the
+ object.
+
+- inherited references. An ....
+
+## Rules for borrowing references.
+
+Borrowed references can live in automatic variables and in structures
+(including arrays). They can only live here as long as the owned
+reference is stable.
+For automatic variables, we can (hopefully) deduce the relevant owned
+references - it is what was copied to get a borrowed ref.
+For refs in structures, we need to be explicit for borrowed refs
+That means we need a language and a precise understanding.
+We would need a parent or sibling or child object to have the ref..
+Maybe we just declare the name of the object, where it has to be
+passed in as an arg to be part of a parent.
+
+
+----------------------
+Later...
+
+How much of this could/should be implemented as Smart Pointers?
+Rust uses smart pointers to implement
+ Nullable (Option<>), RefCount (rc<>) Atomic refcount (arc<>) and
+ others.
+It even has Box<> to create on heap instead of stack.
+
+Why doesn't I just do that? Partly because I don't have classes yet!!
+
+I like a simple syntax to test if a pointer is over-loaded.
+ if pointer?
+and to get the overload value
+ pointer!
+
+Rust would use pattern matching.
+ if let Some(p) = pointer {
+ use p
+ }
+
+in place of
+ if pointer? :
+ use pointer!
+
+Rc() uses Rc::clone() to take a reference. I would rather that were
+transparent.
+
+Rc and ARc need to be different. I don't want the difference to be
+that obvious.
+
+Box<> is interesting. It is the gateway to the heap. Maybe it is
+just syntax.
+
+I want to be able to store an owned pointer in an ADT without telling
+the ADT the precise type. Either it returns an object to be freed by
+caller, or a 'free' function is passed in.
+... NAH this can be done with normal type parameterization.
+
+So the ownership-type needs to be transparent to a degree.
+i.e. I can have:
+ - a borrowed pointer
+ - an owned pointer of unknown disposition
+ - an owned pointer of specific discipline.
+
+Rust uses & to specify a borrowed reference.
+
+Maybe
+ ^type is a borrowed reference
+ @type is an owned reference with discipline defined by type.
+ discipline@type is an owned reference of a given discipline.
+
+
+How do I want to handle mutability?
+Function can usefully have 'const' markers.
+What are the risks when single-threaded? We could extract a value,
+accidentally change something, then depend on the value.
+We could probably do that anyway...
+
+Pointers are normally (default) overloaded with a small integer. +/2047
+Pointers can have status attributes when stored in structures or
+passed to or from functions.
+ - "pure" means there is no small integer
+ - "loaded" means there is large-integer overloading - if lsb set.
+ - "init" means the target is being initialised
+ - "exit" means it can be torn down
+ - other words can reflect stages in life cycle
+ - still other words might reflect locking status.
+There can be several locking statuses, and several life-time statuses,
+as these can affect different fields differently.
+
+In a struct each field can have a mapping from internal states to
+external state. (init=foo, locked=foobar, rlocked=baz)
+And internal state not listed in inheritet from external states (a=a).
+
+For a function, returned value attribute can be copied from args, as
+can type I suspect
+ fun foo(a:thing, b:thing) : <a|b>
+
+or better
+
+ fun foo<x:thing>(a:x, b:x) : x
+
ifhead -> Newlines if expr block
ifstatement -> ifhead iftail
]]
+
+
+13oct2020 - much later.
+
+ Maybe a different approach to newlines. Never shift them, only use
+ them like indent to guide parsing.
+
+ IN is just recorded
+ OUT must reduce until top symbol contains an IN, which is cancelled.
+
+ Some productions are flagged as $$NEWLINE (or similar) meaning that they can
+ only be reduced as a whole number of lines.
+ When we see a newline, we record that until a non-newline is seen.
+ If we cannot shift and so choose to reduce:
+ if the production doesn't require a newline - we just reduce.
+ If it does, and we have recently seen a newline, again we reduce.
+ Otherwise, we signal an error and (maybe) shift until we see a newline.
+ NOTE the newline only counts if there are no indents.
+
+ This means that
+ a = b
+ + c
+
+ would be parsed as a statement, which I don't want.
+ But I do want
+ if foo: bar
+ else: baz
+
+ to be a single statement.
+ So a new line in some context must force a reduction, like OUT.
+ When exactly does a newline force?
+ An OUT forces a reduction until the top symbol has the matching indent.
+ A NEWLINE ?? Must depend on the grammar.
+
+
+
+ Q: Can this be only about resolving ambiguity and reporting errors?
+ If the default is to shift if we can, reduce otherwise, then
+ 2D causing an early reduce can resolve ambiguity.
+
+ If certain productions must end at EOL, and error is thrown if they don't.
+
+ That latter really must be a gramatically need to shift a newline.
+
+ Maybe I'm wrong about reporting errors. Maybe they should always come from
+ a failure to shift or reduce.
+
+ One of my problems was
+ if foo: if bar: dostuff
+ where the newline at the end needs to reduce everything, so I can
+ require at most one newline in the grammar.
+ It probably should be required for the first 'if'
+
+ Stmt -> if Cond : StLstNL
+ StLstNL -> StLst NL
+
+ StLst -> Stmt
+ | StLst ; Stmt
+ | SlLst NLs Stmt
+ | NLs StLst
+
+ IfSt -> IfHead
+
+
+ NONO. I don't want some statements that end with NL and some that don't.
+
+ - Record indent with preceding token and line-start with following
+ - when reducing all indents are accumulated into the new symbol,
+ and the line-start of first symbol is preserved. Other line-starts are dropped
+ - When OUT in lookahead, there is an indent within 'min_prefix' of top state,
+ decrement closest indent count and discard the OUT
+ - When OUT in lookahead and cannot be discarded and top state can be reduced,
+ reduce it
+ - When OUT canot be discarded or reduce, and error is triggered.
+ - No production can be reduced if it contains unmatched indents. If cannot shift,
+ then error.
+ - Some productions are marked as $LINE
+ - any state that contains an item at the start of a $LINE is marked 'starts line'
+ - if there is an indent more recent than a starts-line state, or the top
+ state starts_Line, newlines are ignored.
+ - if a newline is seen but not ignored, and the top state can be reduced,
+ it is reduced.
+ - A $LINE production can only be reduced by a newline, or, if it didn't start
+ at start-of-line, by an OUT
+
+ so
+ Block -> : statementlist $LINE
+
+ I want to be sure that
+ if cond {
+ statements
+ } else {
+ more
+ }
+
+ is OK. Here the "IfHead" doesn't end on a newline, but the state that is
+ expecting 'Block' 'starts-line'. I guess as "Block -> { statementslist }"
+ isn't a $LINE, it can be reduced by 'else'.
+
+24oct2020
+
+ Maybe I don't want TK_out to force a reduce (mostly). Maybe the grammar
+ should be in control. So when we see an OUT we record it off-stack.
+ If we reduce for some other REASON and the top state contains an indent,
+ we cancel.
+ When we decide to SHIFT, if the top of stack has -ve indent, that is an
+ error.
+ So with C,
+ if (cond) {
+ s1;
+ s2;
+ s2;
+
+ would show an error when s2 was shifted.
+ if (cond)
+ s1;
+ s2;
+ needs to show an error too. The "if(Cond)s1;" statement would have an
+ internal indent.
+
+ So maybe the requirement is that no symbol has unbalanced internal
+ indents.
+ When we shift we require that top balances first.
+ So we count up outs and ins. There can be several OUTs and at most on
+ IN before a terminal.
+ If the terminal can be shifted, the top symbol must have no internal
+ indents after cancelling with out, and there must be no outstanding
+ TK_outs
+ If the terminal cannot be shifted we reduce and we must have enough
+ outs to balance anything within the reduction.
+
+ So indents on the stack are always between symbols. Each frame has
+ a symbol, the indent after it etc.
+
+ If we don't have enough outs to complete a reduction we raise an error
+ and need to discard (or shift/reduce) until we find an 'out'.
+ If there are too many outs to shift we again need an error which will
+ hopefully let us reduce and cancel some outs - just discarding states
+ will help with that.
+
+ So: outs must cancel, following token must envourage reductions
+
+ Some productions end with newline, their start state is 'starts-line',
+ and if there is an indent since a starts-line, we ignore newlines.
+
+ If we don't ignore newlines, then there is a starts-line state that we
+ should reduce to. If we don't have enough symbols yet, we error.
+ So if we see a newline, we reduce until the newline would be ignored.
+
+ If that state expects a newline and we don't have one, we error.
+ Except.. I sometimes want OUT to work like a newline.
+ If I need a newline and reduction isn't to start of line
+
+ If state expects newline and we don't have one but do have OUTs
+ we reduce as long as there are enough outs and raise an error if
+ we end up at start of line.
+
+
+ Do I want to require that a newline be shifted? We can synthesize
+ one when minprefix is less than since newline and we have an out.
+
+
+ Summary of stuff.
+ A shift requires there be no pending OUT ??
+ A reduce requires there be no pending indent, i.e that
+ indents within the reduction don't exceed outs.
+ NO this doesn't work.
+ A IN increments the indent on the top frame - never causes error
+ An OUT decrements indent in a frame within min_prefix, or error
+ A NEWLINE is either ignored or shifted, but is duplicated when
+ shifted.
+ After an OUT is handled, if a NEWLINE is expected but will only
+ reduce a partial line, it is provided.
+
+ if foo: if bar:
+ stuff
+ bazz
+
+ This effectively pretends there was a IN at the start of the "line",
+ and we are not doing two OUTs with a NEWLINE between.
+ if foo:
+ if bar:
+ stuff
+ baz
+
+
+ Do I really want to shift newlines:
+ yes: simpler grammer
+ no: don't need to record min-prefix.
+
+ I think 'yes'.
+
+ So discard $NEWLINE
+ Place NEWLINE in the grammar
+ When completing a state, if any production contains NEWLINE, record as
+ StartsLine
+
+
+ ---
+ My idea of ignoring newline when the top state is a starts_line state isn't
+ working.
+ The idea was that reducing
+ statement -> assignment NEWLINE
+ would leave at a starts_line start, so NEWLINEs would be ignored.
+ But this leaves us at a ' ... statement . ' state
+ e.g. " Statementlist -> Statement . "
+ which reduces to "Statementlist -> Statementlist . Statement"
+
+
+
+30oct2020
+
+ Review: because it isn't working (as usual).
+
+ Indenting have 2 effects:
+ - it triggers errors when something looks wrong
+ - it causes newlines to be sometimes ignored.
+
+ Specifically:
+ - A symbol that starts indented must finish while still indented.
+ So an OUT cannot cancel an IN except in the top phrase, and an attempt to
+ shift while there are pending OUTs is an error.
+ - A symbol that begins at the start of line and contains an indent may only
+ be the top symbol - shifting before an OUT cancels the indent is an error.
+
+ - Newlines are ignored when there is an indent since a starts-line state
+
+ Procedurally:
+ - There can be one IN and several OUTs pending - we update the counters
+ as they are seen.
+ - If there are pending OUTs and their is an indent in the top phrase,
+ we can cancel an indent.
+ - a SHIFT becomes an error if the top symbol has an indent and started
+ at beginning of line
+ - a SHIFT becomes an error if there are pending OUTs
+ - a SHIFT that is not an error first imposes the IN on the top of stack
+ - REDUCE may happen at any time.
+
+ Newlines, when not ignored, are repeating terminals. An indefinite series
+ is generated until a state is reached where they are ignored.
+ The expectation is that they will be shifted (possibly after some reductions)
+ and then a reduction will happen which changes the state appropriately.
+
+ It is vital that no recursive production keeps consuming newlines
+ foo -> foo NEWLINE
+ as that will be fed indefinitely, or if newlines are ignored, will never reduce.
+
+ NEWLINEs are recognized if there is a starts_line state since the most recent
+ indent, but not when the current state is starts_line. So once we reach
+ starts_line, we ignore newlines until something else is shifted.
+ Or maybe only a starts_line state at the start of a line?
+
+ A starts_line state has an item "head -> . stuff NEWLINE"
+ So if were just reduce 'foo -> foo NEWLINE' then we are at the end of foo,
+ so an earlier state with have ".foo" and so "foo -> . foo NEWLINE" and so will
+ be starts_line. So we will always accept a NEWLINE. So these must be disallowed.
+
+ How does "if cond : if cond : print NEWLINE" parse?
+ "block -> : slist NEWLINE"
+ "slist -> statment | slist statement"
+ "statement -> print NEWLINE | if cond block NEWLINE"
+
+ so starts_line are before statement or ':' - I don't want that as it would allow a newline
+ before ':' to be ignored.
+ Maybe I want starts AFTER a NEWLINE?? Did I do that before?
+ So after block, after statement, after slist.. but we expect NEWLINE there.
+
+ Maybe the important states are "foo -> baz . stuff NEWLINE".
+ Once we've entered a path to a NEWLINE, we need to be ready for it.
+ So: after :, after print after if and cond and block
+
+ The repeating newlines are hurting now. The rule to start ignoring them
+ isn't working.
+ ifhead else if expr : statementlist [NEWLINE]
+ the NEWLINE should be ignored.
+ There is an indent after ':' but final state is startsline because this could
+ be the end of a block -> : statementlist NEWLINE
+ But I want that newline to be ignored because of the indent.
+
+ if (A)
+ b;
+ c;
+ why is that bad? How can we know? Because we reduce to start-of-line
+ and still have an indent.
+
+ a =
+ b + c
+ + d
+ *
+
+ I need some good examples to remind myself.
+ Why do I want min_prefix? I allow indents to be cancels within min_prefix
+ cmd -> { statements }
+
+
+01nov2020
+ ARRRGGHHH I keep going down rabbit holes. I need to take a more mathematical
+ approach and present clearly what I have and what I need.
+ The big issue, for the moment at least, is when to ignore newlines.
+ I REALLY like the idea of newlines being repeating until they are ignored.
+ I definitely like that they are ignored after an indent unless they are explicitly
+ expected. I need to ensure they are not expected at awkward places, like the
+ start of a line.
+
+ I want to be careful of making LL assumptions about the LR grammar. i.e. I cannot
+ assume the end from the beginning - that is was LL does. I want the appearance
+ of a newline to only make a difference once it appears. But at the point that
+ it appears we need to know if it is to be ignored. In
+ a = b + NEWLINE
+ it isn't ignored event though it isn't expected exactly here. In
+ a = b + (IN)
+ c + NEWLINE
+ it is ignored and the difference is that no state since the IN expects a
+ newline.
+
+ To acknowledge LR approach we need to take the latest state. The start of
+ line, before the 'a' is tempting but we hardly know anything there. After the 'a'
+ is tempting, but that would mean after a[b+c].thing(foo) as well which seems to
+ miss the point.
+ Maybe something earlier sets the expectation?
+ if cond :
+ The ':' says newline-separated things are coming. So it is the state
+ after the ':' which says 'newlines matter'. This will be after the indent.
+ So if we saw
+ if cond {
+ then maybe newlines don't matter (depending on grammar)
+
+ But a newline is ignored *at* that state, so that blank lines can be skipped.
+ I think there is a 'statementlist' symbol and at the start or end of that symbol
+ newlines are ignored. Within that symbol unindented newlines are significant.
+ statementlist -> statement NEWLINE
+ | statementlist statement NEWLINE
+
+
+02nov2020
+ Can I drop the 'min-prefix' concept?
+ The means that OUT can only cancel against an indent 'in' or immediately before
+ the top symbol. This means we go back to keeping a 'starts_indented' flag and
+ storing each indent with the following symbol.
+
+ Then the 'ignore newline' test is since_starts_line > since_indent
+ but since_indent is 0 fir there are internal indents and 1 if there is only a
+ starts_indented indent.
+
+ So a TK_in simply sets starts_indented for next
+ TK_out is recorded
+ Whenever top symbol has indents and outs are recorded, we cancel.
+ If we want to shift with pending outs, that causes an error
+
+ I need a reason to signal an error for
+ if (a)
+ b=1;
+ c=1;
+ based on indents. equally for
+ if (c)
+ b=1;c=1;
+ If indent is a continuation, then 'c=1;' but be part of 'if...' before
+ it is part of what comes before?
+ This means the OUT will not be able to see the IN ... unless the whole
+ statement list ends here....
+ In "a = IN b + c ; " the ';' causes a reduction to Assignment with indent
+ so I must be happy for the 'c=' to reduce to 'ifstatement' with an indent,
+ but reducing to the recursive 'statementlist' seems wrong. Maybe these
+ left-recursive productions really are special.
+ But what about 'expr -> expr + expr' ??
+
+ The rule I'm depending on is that I cannot shift with a pending OUT.
+ So everything since the IN needs to compress to a single symbol.
+ If the code is
+ {
+ if (a)
+ b=1;c=1;
+ }
+ then the '}' will reduce to "{ statementlist" before shifting, so the OUT will be
+ cancelled. The question is: how can we prevent that in the grammar?
+ There are 2 OUTs (in this case) does that help?
+ A comparable case is
+ {
+ foo;
+ if (a)
+ b=1;c=1;
+ }
+ In that case the two OUTs will be cancelled at clearly different times, but not real
+ difference.
+
+ {
+ foo;
+ if (a)
+ b=1;c=1;
+ bar;
+ }
+
+ Now the 'bar' will be blocked from shifting.... no it won't.
+
+ I need a new rule - this *must* be wrong, even when we ignore newlines.
+ But why? Because "b=1;c=1;" looks like a statementlist, but isn't.
+ It isn't because a statementlist doesn't belong here, ony a block or a statement.
+ How do I know that indent is continuing the 'if', not the statementlist.
+ I need some marking on statementlist to say "Cannot be continued by indent"
+ I guess that is what NEWLINE is for.
+
+ So the rules would be "a suitably marked production cannot be reduced with
+ outstanding indents. If nothing can be shifted, an error must be raised.
+ These will typically be productions that end in a NEWLINE, but maybe any
+ production from a symbol that can produce a NEWLINE, or that is marked $$something
+
+ Back to TK_newline. These are duplicated on shift, but discarded when not wanted.
+ They are not wanted unless there is a starts-line start below the current state,
+ and more recent than the most recent indent.
+ A starts-line state is any state which contains an item with a NEWLINE and dot
+ at the start.
+
+ statementlist -> statement NEWLINE
+ | statementlist statement NEWLINE
+
+
+ Q: how do nested 'if's resolve?
+ if cond:
+ if cond2:
+ print
+ else:
+ die
+
+ statement -> if cond block
+ | if cond block else block
+
+ The tokenization would be
+ if cond : IN if cond2 : IN print NL OUT NL OUT IN else : IN die NL OUT NL OUT NL
+ 1 2 A 2 B 1 3 4 C 4 D 3 E
+ so NLs:
+ A makes 'print' a statment,
+ B makes 'if cond2' a statment,
+ C makes 'die' a statement
+ D is protected by IN/3 and is ignored.
+ E makes if cond .. else.. a statement
+
+ What if I also had
+
+ | if cond block NEWLINE else block
+
+ Then on seeing the NEWLINE I wouldn't know whether to shift it,
+ or reduce to a statement.
+ To fix that I would need:
+ statementlist -> statement | statementlist statement
+ statement -> simplestatements NEWLINE
+ | if cond block NEWLINE
+ | if cond block else block NEWLINE
+ | if cond block else statement
+ | if cond block NEWLINE else block
+ | if cond block NEWLINE else statement
+
+ But wait.... Allowing NEWLINE before 'else' confuses the parse of 'if cond:...' above.
+ NL-B makes "if cond block NEWLINE" and then the else can be shifted, but that is illegal
+ because of the negative indent. That means the negative indent must prevent SHIFT.
+
+
+ Do I still need to know which symbols are 'line-like' ??
+ Yes, to make it easier to detect line-line productions which must contain unmatched indents.
+ Do I need recursive line-like detection?
+ For no-indent productions, it probably makes sense.
+ For starts-line detection? I don't think so. 'block' isn't the start of a line.
+ And not really needed for no-indent
+
+
+ Do I need to worry about left-recursive symbols?
+ I don't think so. There should always be some terminal - often NEWLINE -
+ which will ensure the symbol isn't extended??.. or if it did we would get
+ a parse error due to uncancelled OUT
+
+ Do I need to split indents?
+ A stack frame holds "symbol state". Indents are within or before the symbol.
+ For cancelling with OUT, indents within and before the top state are equivalent.
+ For hiding newlines the indent before the symbol is too distant. f->next_indented ? "/" : "",
+
+ So: yes.
+ Should I store them in the stack?
+ "symbol+indents indent state"
+ So when we see an indent, we mini-push that
+ Any indent since state protects newline
+ Out can be cancelled if top state has indents, or previous has trailing indent.
+
+ Rather than an ignore_newline flag we need a outs_needed flag. This is how
+ many outs are needed before we process newlines.
+ If state is startsline, then outs_needed == infinity.
+ Else it is a count of indents since a starts-line state.
+ So maybe it is an indents_since_starts_line and we test state and indents.
+
+ So:
+ 1/ rearrange frame to include 'indented' in the top frame
+ 2/ replace ignore_newline with indents_on_line
+
+14nov2020
+ I'm getting close...
+ Problem is correctly ignoring of newlines.
+ I ignore them after an indent - that bit is easy.
+ But I also ignore them at a start-line symbol - sometimes.
+
+ INDENT statementlist NEWLINE should ignore the newline
+ but
+ INDENT statementlist if expr : stl if expr : stl NEWLINE should not
+
+ (/0s) Statementlist (11s) IfHead (5) else (22) if (3) Expression (19) : (29s) Statementlist (40s) [NEWLINE:22:0] - Ignore
+
+ This shouldn't Ignore. Is that because there is a non-'s' since the '/' ??
+
+ i.e Ignore newlines if all states since indent are startsline, or if none are.
+ Don't count current state if it is startsline
+ indents_on_line > outs || all_startsline(...)
+
+ No - better, but no banana. if expr { statement; newline
+ need to ignore even without the indent.
+ So I need a new type of state - one that is at the start of Statementlist but
+ not at the end. State 29 in the above, but not 40
+
+(/0s) Statementlist (11s) if (3) Expression (19) { (30s) Statementlist (39s) [NEWLINE:34:0] - ERROR
+
+18nov2020
+ So:
+ I introduce "startsline" and "endsline" states.
+ A state before a linelike symbol is startsline, a state after such a
+ symbol is endsline. A state can be both, in which case we consider it 'endsline'.
+ So 11 endsline, 30 startsline 39 endsline
+
+ If there is an indent since the last startsline or endsline state, we ignore NEWLINE.
+ If there are only start/end line states since indent, ignore newline
+ if there are only endsline states since startsline, ignore newline - NO.
+
+
+ I wonder if maybe I should have IN and OUT in the grammar.
+ The tokens can still appear anywhere, but the production can only be reduced of there is an IN there.
+ Block -> : IN Statementlist OUT | : SimpleStatements
+ might get messy. But it already is messy.
+
+20jan2021
+ Where am I?
+ The big questions seems to be: when can/must I ignore NEWLINEs?
+ Conversely when do they REDUCE or get SHIFTed?
+ The answer must lie in what appears between the most recent INDENT and the NEWLINE.
+ - if there are no startsline states, we can ignore.
+
+ It seems that I need to be able to see the start of a block, either an indent, or a {..
+ But if there is a statementlist, surely a block started? No, that misses a point.
+ INDENT ignores newlines because there will be a matching OUT
+ Maybe { causes NEWLINEs to be ignored because there is a matching } expected?
+
+ If a startesline state is at the end of a productions, it plays no role in NEWLINEs,
+ but if it is followed by something
+
+23jan2021
+ review/summary.
+ We track indents and newlines. The goal is resolve ambiguities and detect errors.
+ Ambiguities are resolved by forcing a REDUCE in some circumstances when an OUT or NEWLINE is seen.
+ Errors happen when an there are too many OUTs.
+ NEWLINEs are a normal part of a grammar, except that they get ignored sometimes when they are not relevant.
+ and are protected.
+
+ 1/ a production from a lineline symbol cannot be reduced while it contains an unbalanced indent,
+ and so must be reduced before an excess OUT token is processed.
+ 2/ NEWLINES are ignored if there is an IN since a start-of-line state
+
+ Otherwise NEWLINEs must be in the grammar. This can be awkward where they are optional
+ such as before an 'else'. To handle the fact that a structured statement can be multiple lines
+ or few we need to build it up from the possible line-like parts.
+ Maybe this means that ifpart, elsepart, whilepart, casepart etc need to be statements which
+ are combined above the grammar level??
+ The reason has something to do with reducing to Statement when the newline is shifted.
+ I wonder if that is really needed.
+ if we have "ifpart -> if cond block" then that can be followed by elsepart, but we
+ need a separate "ifpartNL -> ifpart NEWLINE | ifpartNL NEWLINE" which can also be followed
+ by an elsepart, or can reduce to a statement
+ ..but no. That isn't sufficient if NEWLINEs are indefinitely duplicated.
+ So if we don't ignore newlines where they aren't needed, we cannot duplicate them.
+
+ The need for duplicating was constructs like
+ if cond : if cond2 : action
+ which needed 2 NEWLINEs, one for each statement.
+ Maybe the "if cond2 : action" needs to be a simplestatement.
+ When we see the NEWLINE we can (must?) reduce anything that started since
+ a start-of-line, which creates the simplestatement.
+
+ So let's try dropping the infinite repeat and require newlines in the grammar
+
+ Extra rules:
+ - a production from lineline symbol cannot be reduced while it contains unbalanced
+ indent
+ - a NEWLINE is ignored if no startsline state since indent
+ - a NEWLINE cannot be shifted unless top startsline is at start of line.
+
+ So a NEWLINE is:
+ - ignored if not startsline since indent
+ - forces REDUCE if top startsline is not at start of line !!!!. i.e. don't shift
+ - else can be SHIFTed.
+
+26jan2021 - happy australia/invasion day
+ I want ": simplestatements" to be a valid block, but not when there is an
+ indent in the middle
+ Previously this was resolved by having an SSLine -> simplestatements NEWLINE
+ and the SHIFT of the NEWLINE outweights the reduction to block.
+ And I have that now with Statement -> SimpleStatements NEWLINE...
+ Ahh.. it is because I put Newlines before Statementlist. I guess a NEWLINE
+ needs to be a Statement
+
+ I currently suppress shift unless "outs == 0". That means
+ if cond:
+ statement
+ else:
+ doesn't shift the 'else' because the "out" isn't resolved.
+ Without the leading space, the NEWLINE reduces to ifstatement, and I sort-of know
+ that is an issue.
+ So: why suppress SHIFT when there is an outstanding OUT... or better Question,
+ why is this OUT still outstanding?
+ Ahhh.. I misunderstood. I DO want to reduce that 'if' because it is nested.
+ I have an IfStatement what needs to reduce to a Statement and merge with a
+ Statementlist, so the OUT can be cancelled and then the 'else' shifted.
+ BUT as 'else' is the lookahead.... I need NEWLINE to reduce IfStatement to
+ Statement. But even with a NEWLINE we don't SHIFT that because we still have
+ that OUT to clear. So how do I clear an OUT ?? It can only be cancelled when
+ the matching IN reaches TOS, which requires the NEWLINE to be shifted.
+ We cannot shift the newline early, partly because that would be wrong, and
+ partly because it gets reduced to a statement and disappears.
+
+ Hmmm. After blocking SHIFT(NL) when we don't have a line I see
+ progress but not there yet.
+ if false {
+ print = OK
+ } else {
+
+ After the OK is NL OUT NL
+ The first NL is shifted and reduces down to Statementlist and the OUT cancels.
+ But now the NL doesn't belong. Can I explicitly allow NL before } ??
+ No, because it doesn't even SHIFT. It cannot because the Statementlist doesn't
+ appear to be sol - the OUT hide that and it now looks like
+ if expr { statementlist
+
+ Arggh. I think I need to hide the starts-line state in this context.
+ More thinking needed.
+
+ I introduced the shift-block to make sure the NEWLINE paired with the right line.
+ Here the NEWLINE should probably be ignored. But why? It isn't indented, but it
+ is inside a multiline thingy.
+ Might be better to explicitly allow it and find some way to let it be SHIFTed.
+ Problem is that the Statementlist is linelike, but I don't want that here.
+ Hard to avoid when the statements are and must be linelike.
+ Maybe I need a way to hide linelike-ness.
+
+ Maybe a state should only be startline if the core item has dot followed by
+ a single symbol (which can derive a newline) ??