NeilBrown [Sun, 22 Jun 2014 05:12:41 +0000 (15:12 +1000)]
parsergen: fix up stack management
The stack has alternating states and symbols. I had groups a state
with the following symbol as the first thing pushed is a state and the
next is a symbol.
It works much better to group the other ways. First we push just state zero.
Then we push some symbol and the state which 'goto' leads to.
In particularly this keeps the 'shift' that happens after "reduce"
quite separate from the 'shift' that happens when the look-ahead is
shifted in. Previous the post-reduce shift was stealing the indent
information that should have stayed in the look-ahead buffer.
NeilBrown [Sun, 15 Jun 2014 08:44:14 +0000 (18:44 +1000)]
parsergen: work-around for indent parsing problem.
These was a problem with my reasoning about parsing indents.
Resolving it properly will take a bit of work, but this little 'fix'
handles an easy case for now.
NeilBrown [Sat, 31 May 2014 05:56:20 +0000 (15:56 +1000)]
parsergen: pass 'config' in to 'reduce' function.
As we only support synthesise attributes and no inherited attributes,
we have no way for the reduce functions to access any context (such
as building a table of variables) except via global variables (yuck).
So pass the 'context' pointer through. The main program can embed
this in a larger structure which contains relevant context, and
the reduce functions can find that using pointer manipulation.
NeilBrown [Sun, 11 May 2014 04:23:10 +0000 (14:23 +1000)]
pargergen: make use of --tag for calc grammar
Rather than placing the 'calc' grammar in a separate file,
add 'calc:' tags to the sections and use --tag option to
extract the grammar directly from the pargergen.mdc file.
NeilBrown [Sun, 11 May 2014 04:21:26 +0000 (14:21 +1000)]
parsergen: add --tag option.
Normally parsergen extracts three secctions: header, code, and grammar.
With "--tag foo", it will ignore anything that doesn't start "foo:",
will extract "foo: header", "foo: code", and "foo: grammar", and only
complain if there are other "foo:" headers.
NeilBrown [Sat, 10 May 2014 10:58:02 +0000 (20:58 +1000)]
mdcode.mdc: Allow more sections than just Example: to be ignored.
"Example:" is no longer a special case. Any section name with
starts "Word:" for some "Word", does not need to be included in
other sections.
"File:" is still a special case of that and will be stored in the
named file.
NeilBrown [Sat, 10 May 2014 23:42:02 +0000 (09:42 +1000)]
parsergen: initialise parser.next properly.
Most fields in the 'next' frame should be initialised to zero,
but some need to be properly initialised. Otherwise we might not handle
early newlines correctly.
NeilBrown [Sun, 4 May 2014 10:31:05 +0000 (20:31 +1000)]
parsergen: track when newline is permitted, and discard if not.
A newline is only permitted (as a recognised symbol) if we are
parsing a non-indented line-like segment.
If we have seen an internal indent since the last line-like start,
newline tokens should be ignored.
NeilBrown [Sat, 3 May 2014 21:55:03 +0000 (07:55 +1000)]
parsergen: compute "can_eol" for each symbol.
A symbol is "can_eol" if it can derive a phrase which ends with a
newlike token.
This will allow us to recognise line-like sections of code and
thus know when to ignore newlines and when not to.
NeilBrown [Sun, 24 Nov 2013 07:14:11 +0000 (18:14 +1100)]
parsergen: centralise (some of) the collecting of next token.
A future patch will introduce next sites where we want to
discard the current token.
Rather than calling "token_next" at each site, make it possible
to just set "tk = NULL", and the next token will automatically
be collected when needed.
NeilBrown [Sun, 24 Nov 2013 06:50:09 +0000 (17:50 +1100)]
parsergen: recorded a prefered shift-symbol for error recovery.
When we find there is no valid parse step, one option that we don't
currently try is to synthesize a symbol and shift it. i.e. insert a
missing symbol.
A future patch will provide a circumstance where this is the ideal
response, so here we choose a symbol which could usefully be shifted.
We choose the one that will get us closest to the end of a production.
NeilBrown [Sun, 24 Nov 2013 06:36:41 +0000 (17:36 +1100)]
parsegen: unify the "next" frame to go onto stack.
We current have a current 'state' in the parser and a 'sym'
which is a local variable passed around in different ways.
Both of these get pushed onto the stack at the next 'shift'.
We will shortly add some more fields to the stack frame, so unify
'state' and 'sym' in to a 'next' struct in the parser struct which can
easily be extended.
These names work better for me.
There is an indent on every line, so the place where the
indent increases shouldn't be called "indent".
And "undent" isn't a word.
Change the tracing to print the full stack after every step.
This can be cross-referenced with the report that parsergen
can generate to get a full picture of how the parse is progressing.
parsergen: make sure we continue making states until all done.
Whenever we add a state, we need to check again, as it might have been
added early in the list.
There is probably a much more efficient way to do this....
I want items with larger indexes to always preceed smaller
indexes. This makes the report easier to follow.
So negate the index number (and add an offset) when sorting.