Fix lots of typos, improve poor descriptions, and update some text to
match recent changes.
Signed-off-by: NeilBrown <neilb@suse.de>
}
Adding an itemset may require merging the LA sets if LALR analysis is
}
Adding an itemset may require merging the LA sets if LALR analysis is
-happening. If any new LA set add symbol that weren't in the old LA set, we
+happening. If any new LA set adds any symbols that weren't in the old LA set, we
clear the `completed` flag so that the dependants of this itemset will be
recalculated and their LA sets updated.
clear the `completed` flag so that the dependants of this itemset will be
recalculated and their LA sets updated.
-To build all the itemsets, we first insert the initial itemset made from the
-start symbol, complete each itemset, and then generate new itemsets from old
-until no new ones can be made.
+To build all the itemsets, we first insert the initial itemset made
+from production zero, complete each itemset, and then generate new
+itemsets from old until no new ones can be made.
Completing an itemset means finding all the items where "DOT" is followed by
a nonterminal and adding "DOT=0" items for every production from that
Completing an itemset means finding all the items where "DOT" is followed by
a nonterminal and adding "DOT=0" items for every production from that
The purpose of the report is to give the grammar developer insight into
how the grammar parser will work. It is basically a structured dump of
The purpose of the report is to give the grammar developer insight into
how the grammar parser will work. It is basically a structured dump of
-all the tables that have been generated, plus an description of any conflicts.
+all the tables that have been generated, plus a description of any conflicts.
###### grammar_report
static int grammar_report(struct grammar *g, enum grammar_type type)
###### grammar_report
static int grammar_report(struct grammar *g, enum grammar_type type)
return report_conflicts(g, type);
}
return report_conflicts(g, type);
}
-Firstly we have the complete list of symbols, together with the "FIRST"
-set if that was generated.
+Firstly we have the complete list of symbols, together with the
+"FIRST" set if that was generated. We add a mark to each symbol to
+show if it can end in a newline (`>`), or if it is nullable (`.`).
-Then we have to follow sets if they were computed.
+Then we have the follow sets if they were computed.
static void report_follow(struct grammar *g)
{
static void report_follow(struct grammar *g)
{
}
LR0 conflicts are any state which have both a reducible item and
}
LR0 conflicts are any state which have both a reducible item and
+a shiftable item, or two reducible items.
LR05 conflicts only occurs if two possibly reductions exist,
as shifts always over-ride reductions.
LR05 conflicts only occurs if two possibly reductions exist,
as shifts always over-ride reductions.
}
SLR, LALR, and LR1 conflicts happen if two reducible items have over-lapping
}
SLR, LALR, and LR1 conflicts happen if two reducible items have over-lapping
-look ahead, or if a symbol in a look-ahead can be shifted. The differ only
+look ahead, or if a symbol in a look-ahead can be shifted. They differ only
in the source of the look ahead set.
in the source of the look ahead set.
-We build a dataset mapping terminal to item for possible SHIFTs and then
-another for possible REDUCE operations. We report when we get conflicts
-between the two.
+We build two datasets to reflect the "action" table: one which maps
+terminals to items where that terminal could be shifted and another
+which maps terminals to items that could be reduced when the terminal
+is in look-ahead. We report when we get conflicts between the two.
static int conflicts_slr(struct grammar *g, enum grammar_type type)
{
static int conflicts_slr(struct grammar *g, enum grammar_type type)
{
-The export part of the parser is the `parse_XX` function, where the name
+The exported part of the parser is the `parse_XX` function, where the name
`XX` is based on the name of the parser files.
This takes a `code_node`, a partially initialized `token_config`, and an
`XX` is based on the name of the parser files.
This takes a `code_node`, a partially initialized `token_config`, and an
known words added and then is used with the `code_node` to initialize the
scanner.
known words added and then is used with the `code_node` to initialize the
scanner.
-`parse_XX` then call the library function `parser_run` to actually complete
+`parse_XX` then calls the library function `parser_run` to actually complete
the parse. This needs the `states` table and function to call the various
pieces of code provided in the grammar file, so they are generated first.
the parse. This needs the `states` table and function to call the various
pieces of code provided in the grammar file, so they are generated first.
-The know words is simply an array of terminal symbols.
+The known words table is simply an array of terminal symbols.
The table of nonterminals used for tracing is a similar array.
###### functions
The table of nonterminals used for tracing is a similar array.
###### functions
The code fragment requires translation when written out. Any `$N` needs to
be converted to a reference either to that buffer (if `$0`) or to the
The code fragment requires translation when written out. Any `$N` needs to
be converted to a reference either to that buffer (if `$0`) or to the
-structure returned by a previous reduction. These pointer need to be cast
+structure returned by a previous reduction. These pointers need to be cast
to the appropriate type for each access. All this is handling in
`gen_code`.
to the appropriate type for each access. All this is handling in
`gen_code`.
It is particularly important to have fine control over freeing during error
recovery where individual stack frames might need to be freed.
It is particularly important to have fine control over freeing during error
recovery where individual stack frames might need to be freed.
-For this, the grammar author required to defined a `free_XX` function for
-each structure that is used by a non-terminal. `do_free` all call whichever
+For this, the grammar author is required to defined a `free_XX` function for
+each structure that is used by a non-terminal. `do_free` will call whichever
is appropriate given a symbol number, and will call `free` (as is
is appropriate given a symbol number, and will call `free` (as is
-appropriate for tokens` on any terminal symbol.
+appropriate for tokens on any terminal symbol.
To be able to run `mdcode` and `scanner` on the grammar we need to memory
map it.
To be able to run `mdcode` and `scanner` on the grammar we need to memory
map it.
-One we have extracted the code (with `mdcode`) we expect to file three
-sections: header, code, and grammar. Anything else is an error.
+One we have extracted the code (with `mdcode`) we expect to find three
+sections: header, code, and grammar. Anything else that is not
+excluded by the `--tag` option is an error.
"header" and "code" are optional, though it is hard to build a working
parser with neither. "grammar" must be provided.
"header" and "code" are optional, though it is hard to build a working
parser with neither. "grammar" must be provided.
}
And that about wraps it up. We need to set the locale so that UTF-8 is
}
And that about wraps it up. We need to set the locale so that UTF-8 is
-recognised properly, and link with `libicuuc` is `libmdcode` requires that.
+recognised properly, and link with `libicuuc` as `libmdcode` requires that.
###### File: parsergen.mk
parsergen : parsergen.o libscanner.o libmdcode.o
###### File: parsergen.mk
parsergen : parsergen.o libscanner.o libmdcode.o
## The SHIFT/REDUCE parser
## The SHIFT/REDUCE parser
-Having analysed the grammar and generated all the table, we only need the
+Having analysed the grammar and generated all the tables, we only need the
shift/reduce engine to bring it all together.
### Goto table lookup
shift/reduce engine to bring it all together.
### Goto table lookup
production, and by keeping a separate `asn` stack, we can just pass a
pointer into this stack.
production, and by keeping a separate `asn` stack, we can just pass a
pointer into this stack.
-The other allocation stores all other stack fields of which there are four.
+The other allocation stores all other stack fields of which there are six.
The `state` is the most important one and guides the parsing process. The
`sym` is nearly unnecessary. However when we want to free entries from the
`asn_stack`, it helps to know what type they are so we can call the right
The `state` is the most important one and guides the parsing process. The
`sym` is nearly unnecessary. However when we want to free entries from the
`asn_stack`, it helps to know what type they are so we can call the right
indents in the symbol. These are used to allow indent information to
guide parsing and error recovery.
indents in the symbol. These are used to allow indent information to
guide parsing and error recovery.
+`newline_permitted` keeps track of whether newlines should be ignored
+or not, and `starts_line` records if this state stated on a newline.
+
As well as the stack of frames we have a `next` frame which is
assembled from the incoming token and other information prior to
pushing it onto the stack.
As well as the stack of frames we have a `next` frame which is
assembled from the incoming token and other information prior to
pushing it onto the stack.
or must force reductions until there is a pending indent which isn't
at the start of a production.
or must force reductions until there is a pending indent which isn't
at the start of a production.
+`TK_newline` tokens are ignored precisely if there has been an indent
+since the last state which could have been at the start of a line.
+
###### parser includes
#include "parser.h"
###### parser_run
###### parser includes
#include "parser.h"
###### parser_run
work to perform arbitrary rational number calculations.
This calculator takes one expression, or an equality test per line. The
work to perform arbitrary rational number calculations.
This calculator takes one expression, or an equality test per line. The
-results are printed and in any equality test fails, the program exits with
+results are printed and if any equality test fails, the program exits with
-Embedding mdcode inside mdcode is rather horrible. I'd like to find a
-better approach, but as the grammar file must have 3 components I need
-something like this.
-
###### File: parsergen.mk
calc.c calc.h : parsergen parsergen.mdc
./parsergen --tag calc -o calc parsergen.mdc
###### File: parsergen.mk
calc.c calc.h : parsergen parsergen.mdc
./parsergen --tag calc -o calc parsergen.mdc