parsergen: update description to match current reality.

author NeilBrown <neil@brown.name>

Fri, 3 Oct 2014 04:52:16 +0000 (14:52 +1000)

committer NeilBrown <neil@brown.name>

Fri, 3 Oct 2014 04:52:16 +0000 (14:52 +1000)
diff --git a/csrc/parsergen.mdc b/csrc/parsergen.mdc

index 7ac1e0bae1a920dc3489379bd345fb02a6db4c58..ae7087ef7ccd677958cdb80256ca8db158a61d4d 100644 (file)
--- a/csrc/parsergen.mdc
+++ b/csrc/parsergen.mdc
@@ -2437,20 +2437,23 @@ The `state` is the most important one and guides the parsing process.  The
  freeing function.  The symbol leads us to the right free function through
  `do_free`.
  
-The `indents` count tracks the line indents in the symbol.  These are
-used to allow indent information to guide parsing and error recovery.
+The `indents` count tracks the line indents within the symbol or
+immediately following it.  These are used to allow indent information to
+guide parsing and error recovery.
  
  `since_newline` tracks how many stack frames since the last
  start-of-line (whether indented or not).  So if `since_newline` is
-zero, then this symbol is at the start of a line.
+zero, then this symbol is at the start of a line.  Similarly
+`since_indent` counts the number of states since an indent; it is zero
+precisely when `indents` is not zero.
  
  `newline_permitted` keeps track of whether newlines should be ignored
-or not, and `starts_line` records if this state stated on a newline.
+or not.
  
  The stack is most properly seen as alternating states and symbols -
  states, like the 'DOT' in items, are between symbols.  Each frame in
  our stack holds a state and the symbol that was before it.  The
-bottom of stack holds the start state, but no symbol, as nothing came
+bottom of stack holds the start state but no symbol, as nothing came
  before the beginning.
  
  ###### parser functions
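
The per-frame counters described in this hunk can be pictured with a small sketch. The struct layout, the helper name `set_counters`, and the treatment of the bottom frame are assumptions made for illustration; only the field names `indents`, `since_newline`, `since_indent` and `newline_permitted` come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame layout: field names follow the prose above,
 * the types and ordering are assumptions, not the real source. */
struct frame {
	short state;            /* state reached after the symbol */
	short sym;              /* symbol that came before that state */
	short indents;          /* indents within or immediately after the symbol */
	short since_newline;    /* frames since the last start-of-line, 0 = at line start */
	short since_indent;     /* frames since the last indent, 0 iff indents != 0 */
	bool newline_permitted;
};

/* Derive both counters for a new frame from the frame below it.
 * `prev` may be NULL for the bottom of the stack; counting that case
 * as 1 is an assumption of this sketch. */
static void set_counters(struct frame *next, const struct frame *prev,
                         bool starts_line)
{
	assert(next != NULL);
	if (starts_line)
		next->since_newline = 0;
	else
		next->since_newline = prev ? prev->since_newline + 1 : 1;

	if (next->indents)
		next->since_indent = 0;
	else
		next->since_indent = prev ? prev->since_indent + 1 : 1;
}
```

Because a frame without indents always gets a count of at least one, `since_indent` is zero exactly when `indents` is non-zero, which is the invariant the new text states.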
@@ -2474,12 +2477,15 @@ before the beginning.
  
  Two operations are needed on the stack - shift (which is like push) and pop.
  
-Shift applies not only to terminals but also to non-terminals.  When we
-reduce a production we will pop off entries corresponding to the body
-symbols, then push on an item for the head of the production.  This last is
-exactly the same process as shifting in a terminal so we use the same
-function for both.  In both cases we provide a stack frame which
-contains the symbol to shift and related indent information.
+Shift applies not only to terminals but also to non-terminals.  When
+we reduce a production we will pop off entries corresponding to the
+body symbols, then push on an item for the head of the production.
+This last is exactly the same process as shifting in a terminal so we
+use the same function for both.  In both cases we provide the symbol,
+the number of indents the symbol contains (which will be zero for a
+terminal symbol) and a flag indicating that the symbol was at (or was
+reduced from a symbol which was at) the start of a line.  The state is
+deduced from the current top-of-stack state and the new symbol.
  
  To simplify other code we arrange for `shift` to fail if there is no `goto`
  state for the symbol.  This is useful in basic parsing due to our design
@@ -2489,17 +2495,13 @@ function reports if it could.
  `shift` is also used to push state zero onto the stack, so if the
  stack is empty, it always chooses zero as the next state.
  
-So `shift` finds the next state.  If that succeed it extends the allocations
-if needed and pushes all the information onto the stacks.
+So `shift` finds the next state.  If that succeeds it extends the
+allocations if needed and pushes all the information onto the stacks.
  
  
-Newlines are permitted after a starts_line state until an internal
-indent.  So we need to find the topmost state which `starts_line` and
-see if there are any indents other than immediately after it.
-
-So we walk down:
-
--  if state starts_line, then newlines_permitted.
--  if any non-initial indents, newlines not permitted
+Newlines are permitted after a `starts_line` state until an internal
+indent.  If the new frame has neither a `starts_line` state nor an
+indent, newlines are permitted if the previous stack frame permitted
+them.
  
  ###### parser functions
  
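
As a rough illustration of the `shift` behaviour described in the two hunks above, here is a minimal sketch under stated assumptions. The types, the linear `search` helper and the fixed-size stack are all hypothetical simplifications. The start-of-line flag and the `asn` value are left out, and where the text says the allocations are extended, this sketch only asserts that there is room.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 16

/* Hypothetical, simplified types: not the real parsergen structures. */
struct goto_entry { short sym, state; };
struct state_info {
	const struct goto_entry *go_to; /* transitions out of this state */
	int go_to_cnt;
	bool starts_line;
};
struct sframe {
	short state, sym, indents;
	bool newline_permitted;
};
struct pstack {
	struct sframe f[MAX_DEPTH];
	int tos;                        /* number of frames in use */
};

/* Find the goto state for `sym` from state `from`, or -1 if none. */
static int search(const struct state_info *states, int from, short sym)
{
	for (int i = 0; i < states[from].go_to_cnt; i++)
		if (states[from].go_to[i].sym == sym)
			return states[from].go_to[i].state;
	return -1;
}

/* Push `sym`; return false, leaving the stack untouched, if the
 * current top-of-stack state has no goto state for it. */
static bool shift(struct pstack *p, const struct state_info *states,
                  short sym, short indents)
{
	int newstate = 0;               /* an empty stack always gets state zero */
	struct sframe *next;

	if (p->tos > 0) {
		newstate = search(states, p->f[p->tos - 1].state, sym);
		if (newstate < 0)
			return false;
	}
	assert(p->tos < MAX_DEPTH);     /* no growth in this sketch */
	next = &p->f[p->tos];
	next->state = newstate;
	next->sym = sym;
	next->indents = indents;
	if (states[newstate].starts_line)
		next->newline_permitted = true;   /* line start: permitted */
	else if (indents)
		next->newline_permitted = false;  /* internal indent: not permitted */
	else                                  /* otherwise inherit from below */
		next->newline_permitted =
			p->tos > 0 && p->f[p->tos - 1].newline_permitted;
	p->tos++;
	return true;
}
```

Computing `newline_permitted` from the frame directly below is what lets the rewritten text drop the old "walk down the stack" description: each frame caches the answer for everything beneath it.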
@@ -2557,9 +2559,9 @@ So we walk down:
  
  `pop` primarily moves the top of stack (`tos`) back down the required
  amount and frees any `asn` entries that need to be freed.  It also
-collects a summary of the indents in the symbols that are being
-removed. It is called _after_ we reduce a production, just before we
-`shift` the nonterminal in.
+collects a summary of the indents and line starts in the symbols that
+are being removed. It is called _after_ we reduce a production, just
+before we `shift` the nonterminal in.
  
  ###### parser functions
  
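
The summary that `pop` collects might look like the following sketch. The struct names, the fixed-size arrays and the exact signature are assumptions for illustration; only `tos`, `asn`, `do_free`, `indents` and `since_newline` are names taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified frame and parser types. */
struct pframe {
	short sym;
	short indents;
	short since_newline;    /* 0 means the symbol started a line */
};
struct pparser {
	struct pframe stack[16];
	void *asn_stack[16];
	int tos;                /* number of frames in use */
};

/* Drop the top `num` frames.  Returns the total indents they held and
 * reports through `start_of_line` whether any of them started a line.
 * `do_free`, if given, releases each frame's `asn` value. */
static int pop(struct pparser *p, int num, bool *start_of_line,
               void (*do_free)(short sym, void *asn))
{
	int indents = 0;
	bool sol = false;

	assert(num >= 0 && num <= p->tos);
	p->tos -= num;
	for (int i = 0; i < num; i++) {
		const struct pframe *f = &p->stack[p->tos + i];

		sol = sol || f->since_newline == 0;
		indents += f->indents;
		if (do_free)
			do_free(f->sym, p->asn_stack[p->tos + i]);
	}
	if (start_of_line)
		*start_of_line = sol;
	return indents;
}
```

The two results are what a following `shift` of the nonterminal would need: the indent total becomes the new frame's `indents`, and the flag says whether the reduced symbol counts as being at the start of a line.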
@@ -2614,9 +2616,9 @@ copying, hence `memdup` and `tokcopy`.
  
  ### The heart of the parser.
  
-Now we have the parser.  If we can shift, we do, though newlines and
-reducing indenting may block that.  If not and we can reduce we do.
-If the production we reduced was production zero, then we have
+Now we have the parser.  If we can shift we do, though newlines and
+reducing indenting may block that.  If not and we can reduce we do
+that.  If the production we reduced was production zero, then we have
  accepted the input and can finish.
  
  We return whatever `asn` was returned by reducing production zero.
@@ -2629,16 +2631,23 @@ When we find `TK_in` and `TK_out` tokens which report indents we need
  to handle them directly as the grammar cannot express what we want to
  do with them.
  
-`TK_in` tokens are easy: we simply update the `next` stack frame to
-record how many indents there are and that the next token started with
-an indent.
-
-`TK_out` tokens must either be counted off against any pending indent,
-or must force reductions until there is a pending indent which isn't
-at the start of a production.
-
-`TK_newline` tokens are ignored precisely if there has been an indent
-since the last state which could have been at the start of a line.
+`TK_in` tokens are easy: we simply update the indent count in the top stack frame to
+record how many indents there are following the previous token.
+
+`TK_out` tokens must be cancelled against an indent count
+within the stack.  If we can reduce some symbols that are all since
+the most recent indent, then we do that first.  If the minimum prefix
+of the current state then extends back before the most recent indent,
+that indent can be cancelled.  If the minimum prefix is shorter then
+the indent is premature and we must start error handling, which
+currently doesn't work at all.
+
+`TK_newline` tokens are ignored unless the top stack frame records
+that they are permitted.  In that case they will not be considered for
+shifting if it is possible to reduce some symbols that are all since
+the most recent start of line.  This is how a newline forcibly
+terminates any line-like structure - we try to reduce down to at most
+one symbol for each line where newlines are allowed.
  
  ###### parser includes
         #include "parser.h"