Minor revision to descriptive text following all the recent changes.
Signed-off-by: NeilBrown <neil@brown.name>
write out C code files which can be compiled to make a parser.
"2D support" means that indentation and line breaks can be significant.
write out C code files which can be compiled to make a parser.
"2D support" means that indentation and line breaks can be significant.
-Indent tokens (IN and OUT) and end-of-line (EOL) tokens can be used to
-describe the grammar and the parser can selectively ignore these where
-they aren't relevant.
+Indent tokens (IN and OUT) and end-of-line tokens (EOL and NEWLINE) can
+be used to describe the grammar and the parser can selectively ignore
+these where they aren't relevant.
There are several distinct sections.
There are several distinct sections.
Productions are comprised primarily of symbols - terminal and
non-terminal. We do not make a syntactic distinction between the two,
Productions are comprised primarily of symbols - terminal and
non-terminal. We do not make a syntactic distinction between the two,
-though non-terminals must be identifiers. Non-terminal symbols are
-those which appear in the head of a production, terminal symbols are
-those which don't. There are also "virtual" symbols used for precedence
-marking discussed later, and sometimes we won't know what type a symbol
-is yet.
+though non-terminals must be identifiers while terminals can also be
+marks. Non-terminal symbols are those which appear in the head of a
+production, terminal symbols are those which don't. There are also
+"virtual" symbols used for precedence marking discussed later, and
+sometimes we won't know what type a symbol is yet.
To help with code safety it is possible to declare the terminal symbols.
If this is done, then any symbol used in a production that does not
To help with code safety it is possible to declare the terminal symbols.
If this is done, then any symbol used in a production that does not
production can declare that it inherits the precedence of a given
virtual symbol.
production can declare that it inherits the precedence of a given
virtual symbol.
-This use for `$$` precludes it from being used as a symbol in the
+This use of `$$` precludes it from being used as a symbol in the
described language. Two other symbols: `${` and `}$` are also
unavailable.
described language. Two other symbols: `${` and `}$` are also
unavailable.
list of unique sets, each assigned a number.
Once we have the data structures in hand to manage these sets and lists,
list of unique sets, each assigned a number.
Once we have the data structures in hand to manage these sets and lists,
-we can start setting the 'nullable' flag, build the 'FIRST' sets, and
-then create the item sets which define the various states.
+we can start setting the 'nullable' flag, build the 'FIRST' and 'FOLLOW'
+sets, and then create the item sets which define the various states.
`do_reduce` which runs the code that was included with the production,
if any.
`do_reduce` which runs the code that was included with the production,
if any.
-This code needs to be able to store data somewhere. Rather than
-requiring `do_reduce` to `malloc` that "somewhere", we pass in a large
-buffer and have `do_reduce` return the size to be saved.
+This code needs to be able to store data somewhere so we record the size
+of the data expected with each state so it can be allocated before
+`do_reduce` is called.
In order for the code to access "global" context, we pass in the
"config" pointer that was passed to the parser function. If the `struct
In order for the code to access "global" context, we pass in the
"config" pointer that was passed to the parser function. If the `struct
So we dynamically grow an array as required but never bother to
shrink it down again.
So we dynamically grow an array as required but never bother to
shrink it down again.
-We keep the stack as two separate allocations. One, `asn_stack` stores
+We keep the stack as two separate allocations. One, `asn_stack`, stores
the "abstract syntax nodes" which are created by each reduction. When
we call `do_reduce` we need to pass an array of the `asn`s of the body
of the production, and by keeping a separate `asn` stack, we can just
pass a pointer into this stack.
The other allocation stores all other stack fields of which there are
the "abstract syntax nodes" which are created by each reduction. When
we call `do_reduce` we need to pass an array of the `asn`s of the body
of the production, and by keeping a separate `asn` stack, we can just
pass a pointer into this stack.
The other allocation stores all other stack fields of which there are
-several. The `state` is the most important one and guides the parsing
+two. The `state` is the most important one and guides the parsing
process. The `sym` is nearly unnecessary as it is implicit in the
state. However when we want to free entries from the `asn_stack`, it
helps to know what type they are so we can call the right freeing
process. The `sym` is nearly unnecessary as it is implicit in the
state. However when we want to free entries from the `asn_stack`, it
helps to know what type they are so we can call the right freeing
before the beginning. As we need to store some value, `TK_eof` is used
to mark the beginning of the file as well as the end.
before the beginning. As we need to store some value, `TK_eof` is used
to mark the beginning of the file as well as the end.
-Indents (IN) are sometimes shifted and sometimes only accounted.
-Whatever decision is made must apply equally to the matching OUT. To
-ensure this we keep a stack of bits in `ignored_indents`. When we
-process an IN, we record if it was ignored. When we see an out, we pop
-of the relavant bit and use it to decide how to handle the OUT.
-
###### parser functions
struct parser {
###### parser functions
struct parser {