From 0d6b47aa0f660a19e9fb36a33ab1bef60cd26289 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Wed, 10 Mar 2021 16:14:20 +1100 Subject: [PATCH] parsergen: doco updates Minor revision to descriptive text following all the recent changes. Signed-off-by: NeilBrown --- csrc/parsergen.mdc | 38 ++++++++++++++++---------------------- 1 file changed, 16 insertions(+), 22 deletions(-) diff --git a/csrc/parsergen.mdc b/csrc/parsergen.mdc index 356edef..59da0ec 100644 --- a/csrc/parsergen.mdc +++ b/csrc/parsergen.mdc @@ -5,9 +5,9 @@ fragments, analyses it, and can report details about the analysis and write out C code files which can be compiled to make a parser. "2D support" means that indentation and line breaks can be significant. -Indent tokens (IN and OUT) and end-of-line (EOL) tokens can be used to -describe the grammar and the parser can selectively ignore these where -they aren't relevant. +Indent tokens (IN and OUT) and end-of-line tokens (EOL and NEWLINE) can +be used to describe the grammar and the parser can selectively ignore +these where they aren't relevant. There are several distinct sections. @@ -152,11 +152,11 @@ because that is what we need to detect tags. Productions are comprised primarily of symbols - terminal and non-terminal. We do not make a syntactic distinction between the two, -though non-terminals must be identifiers. Non-terminal symbols are -those which appear in the head of a production, terminal symbols are -those which don't. There are also "virtual" symbols used for precedence -marking discussed later, and sometimes we won't know what type a symbol -is yet. +though non-terminals must be identifiers while terminals can also be +marks. Non-terminal symbols are those which appear in the head of a +production, terminal symbols are those which don't. There are also +"virtual" symbols used for precedence marking discussed later, and +sometimes we won't know what type a symbol is yet. To help with code safety it is possible to declare the terminal symbols. If this is done, then any symbol used in a production that does not @@ -291,7 +291,7 @@ carry precedence information but are not included in the grammar. A production can declare that it inherits the precedence of a given virtual symbol. -This use for `$$` precludes it from being used as a symbol in the +This use of `$$` precludes it from being used as a symbol in the described language. Two other symbols: `${` and `}$` are also unavailable. @@ -728,8 +728,8 @@ and to simplify some comparisons of sets, these sets will be stored in a list of unique sets, each assigned a number. Once we have the data structures in hand to manage these sets and lists, -we can start setting the 'nullable' flag, build the 'FIRST' sets, and -then create the item sets which define the various states. +we can start setting the 'nullable' flag, build the 'FIRST' and 'FOLLOW' +sets, and then create the item sets which define the various states. ### Symbol sets. @@ -2042,9 +2042,9 @@ When the parser engine decides to reduce a production, it calls `do_reduce` which runs the code that was included with the production, if any. -This code needs to be able to store data somewhere. Rather than -requiring `do_reduce` to `malloc` that "somewhere", we pass in a large -buffer and have `do_reduce` return the size to be saved. +This code needs to be able to store data somewhere so we record the size +of the data expected with each state so it can be allocated before +`do_reduce` is called. In order for the code to access "global" context, we pass in the "config" pointer that was passed to the parser function. If the `struct @@ -2620,14 +2620,14 @@ The stack usually won't grow very large - maybe a few tens of entries. So we dynamically grow an array as required but never bother to shrink it down again. -We keep the stack as two separate allocations. One, `asn_stack` stores +We keep the stack as two separate allocations. One, `asn_stack`, stores the "abstract syntax nodes" which are created by each reduction. When we call `do_reduce` we need to pass an array of the `asn`s of the body of the production, and by keeping a separate `asn` stack, we can just pass a pointer into this stack. The other allocation stores all other stack fields of which there are -several. The `state` is the most important one and guides the parsing +two. The `state` is the most important one and guides the parsing process. The `sym` is nearly unnecessary as it is implicit in the state. However when we want to free entries from the `asn_stack`, it helps to know what type they are so we can call the right freeing @@ -2641,12 +2641,6 @@ bottom of stack holds the start state but no symbol, as nothing came before the beginning. As we need to store some value, `TK_eof` is used to mark the beginning of the file as well as the end. -Indents (IN) are sometimes shifted and sometimes only accounted. -Whatever decision is made must apply equally to the matching OUT. To -ensure this we keep a stack of bits in `ignored_indents`. When we -process an IN, we record if it was ignored. When we see an out, we pop -of the relavant bit and use it to decide how to handle the OUT. - ###### parser functions struct parser { -- 2.43.0