parsergen: review and update text.

author NeilBrown <neilb@suse.de>

Sun, 11 May 2014 06:04:53 +0000 (16:04 +1000)

committer NeilBrown <neilb@suse.de>

Sun, 15 Jun 2014 07:18:24 +0000 (17:18 +1000)
author NeilBrown <neilb@suse.de>
Sun, 11 May 2014 06:04:53 +0000 (16:04 +1000)
committer NeilBrown <neilb@suse.de>
Sun, 15 Jun 2014 07:18:24 +0000 (17:18 +1000)
diff --git a/csrc/parsergen.mdc b/csrc/parsergen.mdc

index 19d1b5a4e94f6c894853202b8108419668663664..ef01bf31c0e825a4201fe60149252906a18cf615 100644 (file)
--- a/csrc/parsergen.mdc
+++ b/csrc/parsergen.mdc
@@ -1194,7 +1194,7 @@ should happen, so we don't need to search a second time.
         }
  
  Adding an itemset may require merging the LA sets if LALR analysis is
         }
  
  Adding an itemset may require merging the LA sets if LALR analysis is
-happening. If any new LA set add symbol that weren't in the old LA set, we
+happening. If any new LA set adds any symbols that weren't in the old LA set, we
  clear the `completed` flag so that the dependants of this itemset will be
  recalculated and their LA sets updated.
  
  clear the `completed` flag so that the dependants of this itemset will be
  recalculated and their LA sets updated.
  
@@ -1242,9 +1242,9 @@ them to a data structure, of freeing them.
  
  #### The build
  
  
  #### The build
  
-To build all the itemsets, we first insert the initial itemset made from the
-start symbol, complete each itemset, and then generate new itemsets from old
-until no new ones can be made.
+To build all the itemsets, we first insert the initial itemset made
+from production zero, complete each itemset, and then generate new
+itemsets from old until no new ones can be made.
  
  Completing an itemset means finding all the items where "DOT" is followed by
  a nonterminal and adding "DOT=0" items for every production from that
  
  Completing an itemset means finding all the items where "DOT" is followed by
  a nonterminal and adding "DOT=0" items for every production from that
@@ -1471,7 +1471,7 @@ changeover point in `first_nonterm`.
  
  The purpose of the report is to give the grammar developer insight into
  how the grammar parser will work.  It is basically a structured dump of
  
  The purpose of the report is to give the grammar developer insight into
  how the grammar parser will work.  It is basically a structured dump of
-all the tables that have been generated, plus an description of any conflicts.
+all the tables that have been generated, plus a description of any conflicts.
  
  ###### grammar_report
         static int grammar_report(struct grammar *g, enum grammar_type type)
  
  ###### grammar_report
         static int grammar_report(struct grammar *g, enum grammar_type type)
@@ -1483,8 +1483,9 @@ all the tables that have been generated, plus an description of any conflicts.
                 return report_conflicts(g, type);
         }
  
                 return report_conflicts(g, type);
         }
  
-Firstly we have the complete list of symbols, together with the "FIRST"
-set if that was generated.
+Firstly we have the complete list of symbols, together with the
+"FIRST" set if that was generated.  We add a mark to each symbol to
+show if it can end in a newline (`>`), or if it is nullable (`.`).
  
  ###### functions
  
  
  ###### functions
  
@@ -1523,7 +1524,7 @@ set if that was generated.
                 }
         }
  
                 }
         }
  
-Then we have to follow sets if they were computed.
+Then we have the follow sets if they were computed.
  
         static void report_follow(struct grammar *g)
         {
  
         static void report_follow(struct grammar *g)
         {
@@ -1646,7 +1647,7 @@ LR1 are also similar as they have FOLLOW or LA sets.
         }
  
  LR0 conflicts are any state which have both a reducible item and
         }
  
  LR0 conflicts are any state which have both a reducible item and
-a shiftable item.
+a shiftable item, or two reducible items.
  
  LR05 conflicts only occurs if two possibly reductions exist,
  as shifts always over-ride reductions.
  
  LR05 conflicts only occurs if two possibly reductions exist,
  as shifts always over-ride reductions.
@@ -1695,12 +1696,13 @@ as shifts always over-ride reductions.
         }
  
  SLR, LALR, and LR1 conflicts happen if two reducible items have over-lapping
         }
  
  SLR, LALR, and LR1 conflicts happen if two reducible items have over-lapping
-look ahead, or if a symbol in a look-ahead can be shifted.  The differ only
+look ahead, or if a symbol in a look-ahead can be shifted.  They differ only
  in the source of the look ahead set.
  
  in the source of the look ahead set.
  
-We build a dataset mapping terminal to item for possible SHIFTs and then
-another for possible REDUCE operations. We report when we get conflicts
-between the two.
+We build two datasets to reflect the "action" table: one which maps
+terminals to items where that terminal could be shifted and another
+which maps terminals to items that could be reduced when the terminal
+is in look-ahead.  We report when we get conflicts between the two.
  
         static int conflicts_slr(struct grammar *g, enum grammar_type type)
         {
  
         static int conflicts_slr(struct grammar *g, enum grammar_type type)
         {
@@ -1777,7 +1779,7 @@ between the two.
  
  ## Generating the parser
  
  
  ## Generating the parser
  
-The export part of the parser is the `parse_XX` function, where the name
+The exported part of the parser is the `parse_XX` function, where the name
  `XX` is based on the name of the parser files.
  
  This takes a `code_node`, a partially initialized `token_config`, and an
  `XX` is based on the name of the parser files.
  
  This takes a `code_node`, a partially initialized `token_config`, and an
@@ -1785,7 +1787,7 @@ optional `FILE` to send tracing to.  The `token_config` gets the list of
  known words added and then is used with the `code_node` to initialize the
  scanner.
  
  known words added and then is used with the `code_node` to initialize the
  scanner.
  
-`parse_XX` then call the library function `parser_run` to actually complete
+`parse_XX` then calls the library function `parser_run` to actually complete
  the parse.  This needs the `states` table and function to call the various
  pieces of code provided in the grammar file, so they are generated first.
  
  the parse.  This needs the `states` table and function to call the various
  pieces of code provided in the grammar file, so they are generated first.
  
@@ -1815,9 +1817,9 @@ pieces of code provided in the grammar file, so they are generated first.
                 fprintf(f, "}\n\n");
         }
  
                 fprintf(f, "}\n\n");
         }
  
-### Table words table
+### Known words table
  
  
-The know words is simply an array of terminal symbols.
+The known words table is simply an array of terminal symbols.
  The table of nonterminals used for tracing is a similar array.
  
  ###### functions
  The table of nonterminals used for tracing is a similar array.
  
  ###### functions
@@ -1959,7 +1961,7 @@ This code needs to be able to store data somewhere.  Rather than requiring
  
  The code fragment requires translation when written out.  Any `$N` needs to
  be converted to a reference either to that buffer (if `$0`) or to the
  
  The code fragment requires translation when written out.  Any `$N` needs to
  be converted to a reference either to that buffer (if `$0`) or to the
-structure returned by a previous reduction.  These pointer need to be cast
+structure returned by a previous reduction.  These pointers need to be cast
  to the appropriate type for each access.  All this is handling in
  `gen_code`.
  
  to the appropriate type for each access.  All this is handling in
  `gen_code`.
  
@@ -2047,10 +2049,10 @@ done.
  It is particularly important to have fine control over freeing during error
  recovery where individual stack frames might need to be freed.
  
  It is particularly important to have fine control over freeing during error
  recovery where individual stack frames might need to be freed.
  
-For this, the grammar author required to defined a `free_XX` function for
-each structure that is used by a non-terminal.  `do_free` all call whichever
+For this, the grammar author is required to defined a `free_XX` function for
+each structure that is used by a non-terminal.  `do_free` will call whichever
  is appropriate given a symbol number, and will call `free` (as is
  is appropriate given a symbol number, and will call `free` (as is
-appropriate for tokens` on any terminal symbol.
+appropriate for tokens on any terminal symbol.
  
  ###### functions
  
  
  ###### functions
  
@@ -2174,8 +2176,9 @@ grammar file).
  To be able to run `mdcode` and `scanner` on the grammar we need to memory
  map it.
  
  To be able to run `mdcode` and `scanner` on the grammar we need to memory
  map it.
  
-One we have extracted the code (with `mdcode`) we expect to file three
-sections: header, code, and grammar.  Anything else is an error.
+One we have extracted the code (with `mdcode`) we expect to find three
+sections: header, code, and grammar.  Anything else that is not
+excluded by the `--tag` option is an error.
  
  "header" and "code" are optional, though it is hard to build a working
  parser with neither. "grammar" must be provided.
  
  "header" and "code" are optional, though it is hard to build a working
  parser with neither. "grammar" must be provided.
@@ -2302,7 +2305,7 @@ file with the code section (if any) and the parser tables and function.
         }
  
  And that about wraps it up.  We need to set the locale so that UTF-8 is
         }
  
  And that about wraps it up.  We need to set the locale so that UTF-8 is
-recognised properly, and link with `libicuuc` is `libmdcode` requires that.
+recognised properly, and link with `libicuuc` as `libmdcode` requires that.
  
  ###### File: parsergen.mk
         parsergen : parsergen.o libscanner.o libmdcode.o
  
  ###### File: parsergen.mk
         parsergen : parsergen.o libscanner.o libmdcode.o
@@ -2329,7 +2332,7 @@ recognised properly, and link with `libicuuc` is `libmdcode` requires that.
  
  ## The SHIFT/REDUCE parser
  
  
  ## The SHIFT/REDUCE parser
  
-Having analysed the grammar and generated all the table, we only need the
+Having analysed the grammar and generated all the tables, we only need the
  shift/reduce engine to bring it all together.
  
  ### Goto table lookup
  shift/reduce engine to bring it all together.
  
  ### Goto table lookup
@@ -2375,7 +2378,7 @@ We keep the stack as two separate allocations.  One, `asn_stack` stores the
  production, and by keeping a separate `asn` stack, we can just pass a
  pointer into this stack.
  
  production, and by keeping a separate `asn` stack, we can just pass a
  pointer into this stack.
  
-The other allocation stores all other stack fields of which there are four.
+The other allocation stores all other stack fields of which there are six.
  The `state` is the most important one and guides the parsing process.  The
  `sym` is nearly unnecessary.  However when we want to free entries from the
  `asn_stack`, it helps to know what type they are so we can call the right
  The `state` is the most important one and guides the parsing process.  The
  `sym` is nearly unnecessary.  However when we want to free entries from the
  `asn_stack`, it helps to know what type they are so we can call the right
@@ -2386,6 +2389,9 @@ The `indents` count and the `starts_indented` flag track the line
  indents in the symbol.  These are used to allow indent information to
  guide parsing and error recovery.
  
  indents in the symbol.  These are used to allow indent information to
  guide parsing and error recovery.
  
+`newline_permitted` keeps track of whether newlines should be ignored
+or not, and `starts_line` records if this state stated on a newline.
+
  As well as the stack of frames we have a `next` frame which is
  assembled from the incoming token and other information prior to
  pushing it onto the stack.
  As well as the stack of frames we have a `next` frame which is
  assembled from the incoming token and other information prior to
  pushing it onto the stack.
@@ -2531,6 +2537,9 @@ an indent.
  or must force reductions until there is a pending indent which isn't
  at the start of a production.
  
  or must force reductions until there is a pending indent which isn't
  at the start of a production.
  
+`TK_newline` tokens are ignored precisely if there has been an indent
+since the last state which could have been at the start of a line.
+
  ###### parser includes
         #include "parser.h"
  ###### parser_run
  ###### parser includes
         #include "parser.h"
  ###### parser_run
@@ -2746,13 +2755,9 @@ As `scanner` provides number parsing function using `libgmp` is it not much
  work to perform arbitrary rational number calculations.
  
  This calculator takes one expression, or an equality test per line.  The
  work to perform arbitrary rational number calculations.
  
  This calculator takes one expression, or an equality test per line.  The
-results are printed and in any equality test fails, the program exits with
+results are printed and if any equality test fails, the program exits with
  an error.
  
  an error.
  
-Embedding mdcode inside mdcode is rather horrible.  I'd like to find a
-better approach, but as the grammar file must have 3 components I need
-something like this.
-
  ###### File: parsergen.mk
         calc.c calc.h : parsergen parsergen.mdc
                 ./parsergen --tag calc -o calc parsergen.mdc
  ###### File: parsergen.mk
         calc.c calc.h : parsergen parsergen.mdc
                 ./parsergen --tag calc -o calc parsergen.mdc
author	NeilBrown <neilb@suse.de>
	Sun, 11 May 2014 06:04:53 +0000 (16:04 +1000)
committer	NeilBrown <neilb@suse.de>
	Sun, 15 Jun 2014 07:18:24 +0000 (17:18 +1000)