NeilBrown [Fri, 5 Mar 2021 09:31:32 +0000 (20:31 +1100)]
parsergen: remove line_like information.
I'm going to change the 2D nature of the parser over several patches.
First I remove what I don't want, then I add what I do.
During this series, tests won't work!
NeilBrown [Fri, 26 Feb 2021 06:33:43 +0000 (17:33 +1100)]
parsergen: don't use static buffer for result value.
Add the size of the result value to the per-state information, so it can
be allocated before calling do_reduce(), thus removing the need for an
overly large static buffer.
NeilBrown [Fri, 5 Mar 2021 08:20:22 +0000 (19:20 +1100)]
parsergen: change how reserved_words are stored
Rather than a simple array with holes, have a dense array mapping number
to name. This will enable a future change which adds names that don't
have numbers assigned.
NeilBrown [Sun, 11 Oct 2020 03:49:07 +0000 (14:49 +1100)]
parsergen: add more power to symbol references in generated code
As well as symbol references like "$2", you can now use references
with letters like "$Ss". This will find the shortest symbol in the
production that contains all the given letters in the given order.
There must be a unique shortest symbol.
If that same symbol occurs multiple times, later instances can be given
with a numeric suffix such as "$Ss2".
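The lookup rule above can be sketched in C as a subsequence match over the production body. This is only a sketch: it assumes the body is available as an array of symbol-name strings, that matching is case-sensitive, and that positions are 1-based like "$2". The names `resolve_symref()` and `has_letters_in_order()` are hypothetical, not parsergen's own.

```c
#include <string.h>

/* Return 1 if every character of `letters` appears in `name`, in order. */
static int has_letters_in_order(const char *name, const char *letters)
{
	for (; *letters; letters++) {
		name = strchr(name, *letters);
		if (!name)
			return 0;
		name++;	/* continue searching after this match */
	}
	return 1;
}

/* Resolve e.g. "$Ss" (instance 1) or "$Ss2" (instance 2) against one
 * production body.  Returns the 1-based position of the chosen symbol,
 * or -1 if nothing matches, the shortest match is not unique, or the
 * requested instance does not exist. */
int resolve_symref(const char *const body[], int nbody,
		   const char *letters, int instance)
{
	const char *best = NULL;
	size_t best_len = 0;
	int ambiguous = 0;

	for (int i = 0; i < nbody; i++) {
		size_t len = strlen(body[i]);

		if (!has_letters_in_order(body[i], letters))
			continue;
		if (!best || len < best_len) {
			best = body[i];		/* new strictly-shorter candidate */
			best_len = len;
			ambiguous = 0;
		} else if (len == best_len && strcmp(body[i], best) != 0) {
			ambiguous = 1;		/* a different symbol ties for shortest */
		}
	}
	if (!best || ambiguous)
		return -1;
	/* Repeated occurrences of the same symbol are told apart by suffix. */
	for (int i = 0; i < nbody; i++)
		if (strcmp(body[i], best) == 0 && --instance == 0)
			return i + 1;
	return -1;
}
```

For a body `Statement SimpleStatements Expr Statement`, "$Ss" names position 2, "$St" names position 1 and "$St2" names position 4.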
NeilBrown [Sat, 10 Oct 2020 23:34:06 +0000 (10:34 +1100)]
parsergen: allow terminals to be declared.
By default, any non-virtual symbol that does not appear in the head of a
production is assumed to be a Terminal.
For larger grammars, this misses an opportunity to detect errors.
So allow a "$TERM" line to list terminals (that do not appear in
precedence lines). If any $TERM line is given, then generate an error
if any symbol appears in a production but is not declared, either
as a terminal or as a non-terminal.
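A minimal sketch of the check, assuming the declared terminals (from $TERM and precedence lines), the production heads and the body symbols have already been collected into string arrays. The function names are hypothetical. The caller would only run it when at least one $TERM line was seen.

```c
#include <stdio.h>
#include <string.h>

static int in_list(const char *name, const char *const list[], int n)
{
	for (int i = 0; i < n; i++)
		if (strcmp(name, list[i]) == 0)
			return 1;
	return 0;
}

/* Report each body symbol that is neither a declared terminal nor the
 * head of some production (i.e. a non-terminal).  Returns the number
 * of errors reported. */
int check_declared(const char *const terms[], int nterms,
		   const char *const heads[], int nheads,
		   const char *const body_syms[], int nbody)
{
	int errors = 0;

	for (int i = 0; i < nbody; i++) {
		if (in_list(body_syms[i], terms, nterms) ||
		    in_list(body_syms[i], heads, nheads))
			continue;
		fprintf(stderr, "error: symbol %s is not declared\n",
			body_syms[i]);
		errors++;
	}
	return errors;
}
```

The payoff is in catching typos: a misspelt terminal such as `nmuber` would otherwise silently become a new Terminal.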
NeilBrown [Sat, 10 Oct 2020 22:50:12 +0000 (09:50 +1100)]
parsergen: avoid infinite loop on error.
If the grammar allows "ERROR" in a recursive location, error handling
can loop forever.
e.g.
foo -> foo bar
foo -> ERROR
Rather than detect and reject such grammars, detect the infinite loop
as it starts, and discard an extra token.
i.e. if error handling doesn't discard any tokens from the input
stream, and another error is triggered before anything is shifted, then
we force the next error handling phase to discard at least one token,
or to abort if that token is EOF.
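The rule in the last paragraph needs only two flags. The following is a sketch under stated assumptions: the parser is assumed to call `guard_note_shift()` whenever a real input token is shifted, and `guard_note_recovery()` after each error-recovery phase, once the ERROR symbol itself has been shifted. The struct and all function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for the error-recovery loop guard. */
struct err_guard {
	bool recovered_without_discard;	/* last recovery discarded no tokens */
	bool shifted_since;		/* an input token was shifted since then */
};

/* Called whenever an input token is shifted. */
void guard_note_shift(struct err_guard *g)
{
	g->shifted_since = true;
}

/* Called at the start of error recovery: the minimum number of tokens
 * this recovery phase must discard before it may resume parsing. */
int guard_min_discard(const struct err_guard *g)
{
	return (g->recovered_without_discard && !g->shifted_since) ? 1 : 0;
}

/* If a discard is forced but the next token is EOF, give up instead. */
bool guard_must_abort(const struct err_guard *g, bool next_is_eof)
{
	return guard_min_discard(g) > 0 && next_is_eof;
}

/* Called at the end of error recovery with the number of tokens dropped. */
void guard_note_recovery(struct err_guard *g, int discarded)
{
	g->recovered_without_discard = (discarded == 0);
	g->shifted_since = false;
}
```

With the `foo -> foo bar` / `foo -> ERROR` grammar, a second error that arrives before anything is shifted makes `guard_min_discard()` return 1, so each round consumes at least one token and the loop must terminate.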
NeilBrown [Tue, 6 Oct 2020 06:02:22 +0000 (17:02 +1100)]
parsergen: detect left-recursive symbols in non-final position.
A left-recursive symbol that appears other than at the end of a
production causes problems for indent-based parsing, as described in the
documentation. So teach parsergen to report them.
Ocean currently has several of these, which I'll need to look into at a
later date.
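One way to find such symbols is to compute the "left-corner" relation (A can derive a string starting with B) with Warshall's transitive closure, then flag every left-recursive symbol used before the last position of a body. This sketch rests on several assumptions: symbols are small integers, nullable symbols are ignored, and a symbol is not reported inside its own recursive production (`foo -> foo bar`). The struct layout and function names are hypothetical and not parsergen's data structures.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAXSYM 16

struct production {
	int head;
	int len;
	int body[8];
};

/* lc[a][b]: a can derive a string starting with b (nullables ignored). */
static void left_corners(const struct production *p, int np, int nsym,
			 bool lc[MAXSYM][MAXSYM])
{
	for (int a = 0; a < nsym; a++)
		for (int b = 0; b < nsym; b++)
			lc[a][b] = false;
	for (int i = 0; i < np; i++)
		if (p[i].len > 0)
			lc[p[i].head][p[i].body[0]] = true;
	/* Warshall's algorithm for the transitive closure. */
	for (int k = 0; k < nsym; k++)
		for (int a = 0; a < nsym; a++)
			if (lc[a][k])
				for (int b = 0; b < nsym; b++)
					if (lc[k][b])
						lc[a][b] = true;
}

/* Print and count uses of left-recursive symbols that are not the last
 * symbol of a body, skipping the symbol's own recursive productions. */
int report_nonfinal_left_recursion(const struct production *p, int np,
				   int nsym)
{
	bool lc[MAXSYM][MAXSYM];
	int found = 0;

	left_corners(p, np, nsym, lc);
	for (int i = 0; i < np; i++)
		for (int j = 0; j < p[i].len - 1; j++) {
			int s = p[i].body[j];

			if (lc[s][s] && s != p[i].head) {
				printf("production %d: left-recursive symbol %d at position %d\n",
				       i, s, j + 1);
				found++;
			}
		}
	return found;
}
```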
NeilBrown [Tue, 6 Oct 2020 04:44:46 +0000 (15:44 +1100)]
scanner: change the meaning of ignoring comment tokens.
Previously ignoring comment tokens meant they were still parsed, but not
returned. The only way to stop them being parsed was to declare
known marks for the start symbols.
This made it impossible for parsergen to define a language that had
a known mark that would otherwise start a comment.
So change the ignoring of comment tokens to mean they aren't parsed. If
you want to parse comments but not return them, leave the new
"return_comments" field as zero. In the unusual case that you want to
return comments, set return_comments to 1.
Confirm that this has the desired effect by adding "//" as an
integer-division operator to the sample calculator.
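The new meaning can be summarised as a small decision function. This is a hypothetical sketch, not the scanner's real configuration: `struct scan_config`, its boolean `ignore_comments` field (standing in for the scanner's ignore setting) and both functions are illustrative. Only `return_comments` is named in the message above.

```c
#include <stdbool.h>
#include <string.h>

enum tok_kind { TK_LINE_COMMENT, TK_MARK, TK_OTHER };

/* Hypothetical scanner configuration mirroring the commit message. */
struct scan_config {
	bool ignore_comments;	/* comments are not recognised at all */
	int return_comments;	/* 0: parse but drop; 1: return them */
};

/* Classify text at s.  With comments ignored, "//" is left for the
 * known-mark table, so it can serve as integer division. */
enum tok_kind classify_slashes(const char *s, const struct scan_config *c)
{
	if (strncmp(s, "//", 2) != 0)
		return TK_OTHER;
	return c->ignore_comments ? TK_MARK : TK_LINE_COMMENT;
}

/* Should a recognised comment be passed on to the parser? */
bool deliver_comment(const struct scan_config *c)
{
	return c->return_comments == 1;
}
```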
NeilBrown [Mon, 5 Oct 2020 23:00:31 +0000 (10:00 +1100)]
indent_test: fix makefile
Maybe 'make' has changed a little to be less forgiving, but 'make itest'
isn't working now. All of LDLIBS are included in the 'cc' line, but
there are no dependencies to make sure they have been built.
The problem is that I'm using LDLIBS for different programs which need
different libs. This isn't such a good idea.
So change indent_test to use itestLDLIBS and itestCFLAGS.
NeilBrown [Fri, 28 Jun 2019 09:36:49 +0000 (19:36 +1000)]
parsergen: only non-terminals should make a state "starts_line"
If a state is followed by NEWLINE, then it isn't starts_line - more like
ends_line.
It is only non-terminals containing NEWLINEs that cause a state
to be starts_line.
So move the test to after we stop looking at terminals.
NeilBrown [Sun, 23 Jun 2019 05:37:50 +0000 (15:37 +1000)]
oceani: allow 'then' in simple if statements.
Allow 'then' after "if expression", and don't require a ':' if
it is followed by simple statements.
Similarly "else" doesn't need a colon for simple statements.
NeilBrown [Sun, 23 Jun 2019 04:41:47 +0000 (14:41 +1000)]
oceani: change parsing for ; at end
When we have 'for' and 'then' on the same line, I want to
require a ';' for the 'for' (and 'while').
So change SimpleStatements to never end with ';', and require
a ; or Newline after each instance of SimpleStatements.
NeilBrown [Sun, 23 Jun 2019 04:29:13 +0000 (14:29 +1000)]
oceani: modify grammar to not waste stack on newlines
Current grammar uses one stack frame per newline for leading
newlines as these productions are right-recursive. This is
unnecessary and inelegant. Change to use a left-recursive Newlines
production.
NeilBrown [Sun, 23 Jun 2019 03:51:46 +0000 (13:51 +1000)]
indent_test: reduce stack usage for preceding NEWLINEs
In the cases where we allow preceding newlines (Statementlist Open Close)
we currently use one parse-stack frame for each newline. While there are
unlikely to be many, this is inelegant.
Change the right-recursive form to use a left-recursive Newlines rule
that absorbs one or more NEWLINEs using at most 2 stack frames.
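The stack cost of the two forms can be seen by replaying the shift/reduce sequence an LR parser performs on n consecutive NEWLINEs. This sketch assumes the two rules `Newlines -> NEWLINE | NEWLINE Newlines` (right-recursive) and `Newlines -> NEWLINE | Newlines NEWLINE` (left-recursive). The function names are hypothetical.

```c
/* Right-recursive: nothing can be reduced while the look-ahead is
 * still NEWLINE, so every NEWLINE is shifted first; the reductions
 * only happen afterwards.  Peak depth is therefore n. */
int peak_depth_right(int n)
{
	int depth = 0, peak = 0;

	for (int i = 0; i < n; i++) {
		depth++;		/* shift NEWLINE */
		if (depth > peak)
			peak = depth;
	}
	return peak;
}

/* Left-recursive: after the first NEWLINE is reduced to Newlines,
 * each later NEWLINE is shifted (depth 2) and then immediately
 * reduced with it (pop 2, push 1), back to depth 1. */
int peak_depth_left(int n)
{
	int depth = 0, peak = 0;

	for (int i = 0; i < n; i++) {
		depth++;		/* shift NEWLINE */
		if (depth > peak)
			peak = depth;
		if (i > 0)
			depth--;	/* Newlines NEWLINE -> Newlines */
		/* for i == 0: NEWLINE -> Newlines, pop 1 push 1 */
	}
	return peak;
}
```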
NeilBrown [Sun, 23 Jun 2019 00:21:14 +0000 (10:21 +1000)]
parsergen: allow $$OUT to be satisfied at start-of-line.
If a $$OUT (or $$NEWLINE) production is being reduced at
start-of-line (with no indents), then that is satisfactory,
we don't need NEWLINE etc as look-ahead.
This means that in cases where this is relevant, the computed
lookahead is wrong - we shouldn't have stripped it.
I don't think this matters as it only affects conflict warnings,
and I think these will be reported at a higher level if relevant.
In essence, the $$OUT marking is like a precedence marking which
suppresses shift/reduce warnings, as it says that the decision is being made
on some basis other than look-ahead.
NeilBrown [Sun, 16 Jun 2019 01:31:54 +0000 (11:31 +1000)]
parsergen: fix up look-ahead for $$NEWLINE items.
I was discarding all non-newlines from the lookahead
in the wrong place.
I need to do it based on the productions added, not
the item they are generated by.
NeilBrown [Sun, 16 Jun 2019 00:16:07 +0000 (10:16 +1000)]
oceani: change NEWLINE parsing in statements to new model.
The new model is:
A list of line-like things must accept a newline first:
Statementlist -> Statements
| NEWLINE Statementlist
Any line-like thing must reduce to a single symbol:
SimpleStatementLine IfPart WhilePart CasePart etc
An individual line-like thing must allow following newlines
e.g.
IfHead -> if Expression Block
| IfHead NEWLINE
A block that can be multi-line or single-line should be marked with
$$NEWLINE
This will require a NEWLINE to reduce it, but won't swallow the newline.
NeilBrown [Sat, 15 Jun 2019 23:47:58 +0000 (09:47 +1000)]
indent_test: adjust grammar to handle blank lines better.
This uses the new $$NEWLINE and other techniques to ensure
blank lines are handled well.
We also test that adding blank lines everywhere doesn't break
anything.
NeilBrown [Sat, 15 Jun 2019 23:20:30 +0000 (09:20 +1000)]
indent_test: declare precedence for 'else'
By declaring precedence for 'else', we suppress conflict warnings.
Normally newlines and indents will resolve any conflict, but
if not, else associates to the right - it should be shifted, not cause
a reduce (which is the default anyway).
NeilBrown [Sat, 15 Jun 2019 22:29:16 +0000 (08:29 +1000)]
parsergen: introduce $$NEWLINE pseudo-precedence.
Sometimes we need a production to be terminated by a newline, but we
don't want to consume the newline with a "shift".
Case in point is:
Block -> : StatementList
Which can be used with
Statement -> if Expression Block
StatementList -> Statement
I want this to parse:
if something: if otherthing: action
which might seem a little odd, but is syntactically sensible.
The NEWLINE at the end is required, and must close both nested Statements.
The NEWLINE will already cause a REDUCE, but if we don't have
Block -> : Statementlist NEWLINE
then something else could force a reduce, and we don't want that.
So introduce a marking "$$NEWLINE" which is similar to imposing a precedence
on a production. Now
Block -> : StatementList $$NEWLINE
means that a NEWLINE is required to end a Block, but it isn't
shifted. If anything else is found here, it is an error.
We also allow $eof and OUT to reduce this production.
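The effect of the marking on the end of such a production can be sketched as follows. This is a simplification: it models only the nested Blocks, not the enclosing Statements, and the enums and functions are hypothetical. The key point is that the look-ahead is never consumed, so one NEWLINE can close every nested level.

```c
enum la { LA_NEWLINE, LA_EOF, LA_OUT, LA_OTHER };
enum act { ACT_REDUCE, ACT_ERROR };

/* At the end of a production marked $$NEWLINE, e.g.
 *     Block -> : StatementList $$NEWLINE
 * NEWLINE, $eof and OUT reduce without being shifted; anything else
 * is an error. */
enum act newline_marked_action(enum la lookahead)
{
	switch (lookahead) {
	case LA_NEWLINE:
	case LA_EOF:
	case LA_OUT:
		return ACT_REDUCE;
	default:
		return ACT_ERROR;
	}
}

/* Close `open` nested $$NEWLINE-marked Blocks with a single look-ahead
 * token.  Returns the number of Blocks reduced, or -1 on error. */
int close_blocks(int open, enum la lookahead)
{
	int reduced = 0;

	while (open > 0) {
		if (newline_marked_action(lookahead) != ACT_REDUCE)
			return -1;
		open--;		/* the look-ahead stays in place */
		reduced++;
	}
	return reduced;
}
```

For "if something: if otherthing: action" there are two open Blocks when the final NEWLINE arrives, and `close_blocks(2, LA_NEWLINE)` reduces both.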
NeilBrown [Sun, 9 Jun 2019 23:00:43 +0000 (09:00 +1000)]
parsergen: flip ordering of precedence declarations.
Change so first precedence declaration is the lowest precedence.
This is consistent with bison, and will make converting
'oceani' easier.
When not using precedence, it is easier to do the lowest
precedence first - so keep that approach.
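With the first declaration as the lowest, levels can simply be numbered in declaration order. A shift/reduce conflict is then settled by bison's rule: compare the production's level with the look-ahead token's level, and fall back to associativity on a tie. This sketch shows that rule only, and does not claim parsergen resolves conflicts identically. The names are hypothetical.

```c
enum assoc { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum sr { DO_SHIFT, DO_REDUCE, DO_ERROR };

/* Levels are numbered in declaration order: line 1 is level 1, the
 * lowest.  A higher level binds more tightly. */
enum sr resolve_sr(int prod_level, int tok_level, enum assoc tok_assoc)
{
	if (tok_level > prod_level)
		return DO_SHIFT;	/* token binds tighter */
	if (tok_level < prod_level)
		return DO_REDUCE;	/* production binds tighter */
	switch (tok_assoc) {
	case ASSOC_LEFT:
		return DO_REDUCE;
	case ASSOC_RIGHT:
		return DO_SHIFT;
	default:
		return DO_ERROR;
	}
}
```

Declaring `+ -` (left) on the first line and `* /` (left) on the second makes `a + b * c` shift at `*`, `a * b + c` reduce at `+`, and `a - b - c` reduce at the second `-`.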
NeilBrown [Sun, 9 Jun 2019 22:49:08 +0000 (08:49 +1000)]
oceani: use 'bracket' printing for expressions.
Adding brackets to expression printing removes any ambiguity.
As I'm about to change expression parsing, I want to be able to
see that the result is correct.
NeilBrown [Sun, 9 Jun 2019 22:35:36 +0000 (08:35 +1000)]
parsergen: include virtual symbols in table of non-terminals
Symbol numbers assigned in grammar_analyse are in three groups:
- predefined (NUMBER, STRING, etc)
- Terminals
- everything else: non-terminals and virtual.
When creating the non_term[] list of names, we need to include
virtual symbols in there, otherwise lookup by symbol-number
might find the wrong value - or might reach beyond end of array.
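The indexing this relies on can be sketched as follows. The `struct symtab` layout and `non_term_name()` are hypothetical. The point is that `non_term[]` is indexed by `num - first_nonterm` for every number in the last group. If virtual symbols were left out of the array, later entries would shift down and the highest numbers would fall off the end.

```c
#include <stddef.h>

/* Hypothetical numbering: [0, first_term) predefined, [first_term,
 * first_nonterm) terminals, [first_nonterm, nsyms) non-terminals and
 * virtual symbols.  non_term[] has one entry per number in that last
 * range, including the virtual symbols. */
struct symtab {
	int first_nonterm;
	int nsyms;
	const char *const *non_term;
};

/* Name for a non-terminal or virtual symbol, or NULL if out of range. */
const char *non_term_name(const struct symtab *t, int num)
{
	if (num < t->first_nonterm || num >= t->nsyms)
		return NULL;
	return t->non_term[num - t->first_nonterm];
}
```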
NeilBrown [Sat, 8 Jun 2019 23:42:05 +0000 (09:42 +1000)]
scanner: improve transition from node to node.
When we are at the end of a node, it is wrong to use do_strip() as
that looks beyond the end of the node.
It is better, once we have determined to accept the newline at the
end of a node (i.e. once no unget is possible), to move to the
start of the next node, and assess column position and indents from
that perspective.
Doing this removes some tests on at_son/at_eon, and makes some code a
bit more transparent - for example the flag that says whether an "out"
is next now depends on whether a newline was recently seen, which makes
more sense than whether we were at the start of a node (out and newline
alternate in some contexts).
Also: add the test which found this problem. This requires a
new set of tests - tests which can scan tokens from multiple nodes.
Now that we are testing node transitions, the coverage has jumped
to over 92%.
NeilBrown [Sat, 8 Jun 2019 10:18:35 +0000 (20:18 +1000)]
scanner: fix bug with indent at start of node.
If we find an indent, we assume there are delayed newlines
to consume.
This is often true, but not at the start of a node.
So don't decrement delayed_lines if it is already zero.
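The fix amounts to a guarded decrement. This is a minimal sketch with a hypothetical one-field state struct. Only the `delayed_lines` name comes from the message above.

```c
/* Hypothetical fragment of scanner state. */
struct scan_state {
	int delayed_lines;	/* newlines seen but not yet reported */
};

/* An indent usually follows a newline whose report was delayed, but at
 * the start of a node there may be none, so never go below zero. */
void note_indent(struct scan_state *s)
{
	if (s->delayed_lines > 0)
		s->delayed_lines--;
}
```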
NeilBrown [Sat, 8 Jun 2019 04:35:34 +0000 (14:35 +1000)]
scanner: fix handling of indents in sub-nodes
I seem to have confused ->indent_sizes[] and ->col.
->col is used for the reported location of a token so
must be the actual column in the file, with no adjustment.
->indent_sizes[] holds indents, which must include any inherited from
parent nodes. So this is a completely different value.
So change mdcode to store the local node indent in ->needs_strip -
this is the number of text columns that are stripped off.
This, subtracted from ->indent, is the text offset of the physical
start-of-line. Adding the measured ->col then gives us
the indent in the composed file, the indent that must be used
for detecting TK_in and TK_out.
Introduce a new function state_indent() which determines that indent,
and use it instead of ->col.
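Following the arithmetic in the message, state_indent() reduces to `indent - needs_strip + col`. In this sketch `indent` is assumed to be the column at which the node's text is placed in the composed file (0 for a top-level node), and `needs_strip` is the number of columns stripped from each of its lines. The struct is hypothetical, and the real state_indent() may take different arguments.

```c
/* Hypothetical per-node information (field names from the message). */
struct code_node_info {
	int indent;		/* assumed: composed-file column of the node */
	int needs_strip;	/* columns stripped off each line of the node */
};

/* Indent in the composed file, used for detecting TK_in and TK_out.
 * `col` is the physical column in the source file, as reported for
 * token locations. */
int state_indent(const struct code_node_info *n, int col)
{
	/* indent - needs_strip: composed offset of the physical line start */
	return n->indent - n->needs_strip + col;
}
```

For a top-level code block indented by 4 in the markdown, a line whose text starts at physical column 6 gets indent 2 in the composed file.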
NeilBrown [Sat, 8 Jun 2019 04:26:15 +0000 (14:26 +1000)]
scanner: fix at_son()
Current test for "at start of node" is broken for 2 reasons.
1/ it doesn't account for the node-indent chars that are stripped
off by do_strip()
2/ it checks the ->offset *after* a character has been extracted,
it needs to check the offset from before, which is in ->prev_offset
NeilBrown [Wed, 5 Jun 2019 08:21:18 +0000 (18:21 +1000)]
oceani: redo parsing of blank lines.
I've been puzzling how best to write a grammar to
handle blank lines and optional line-breaks well.
The "OptNL" approach didn't work, and "Newlines"
only sometimes works.
I won't try to explain all the logic here, but I do plan to write a
blog post about it soon.
Shift/Reduce conflicts that are likely to be resolved by
a line break are currently hidden. This is probably a good idea,
but sometimes it can be useful to see them anyway.
So report them as "non-critical" conflicts, and don't
count them - so if no other conflicts are found, the
"no conflicts" message is still generated.
NeilBrown [Sun, 2 Jun 2019 06:45:56 +0000 (16:45 +1000)]
parsergen: Add brief explanation about optional newlines.
Optional newlines need care with a parsergen parser, and the special
rules around them mean you cannot have a symbol that just absorbs
newlines. Rather, you need to put the newline absorption in front of
whatever is allowed to follow newlines.
NeilBrown [Wed, 29 May 2019 11:51:22 +0000 (21:51 +1000)]
scanner: improve number parsing.
In particular, space must be preceded and followed by a digit
(not a letter).
Also '_' must be preceded and followed by a hex digit, but this
wasn't enforced.
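The two separator rules can be checked over a candidate literal as below. This is a sketch under assumptions: it validates an already-extracted string, whereas the real scanner decides character by character, and it applies the "digit" rule for spaces literally as a decimal digit. `separators_ok()` is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* A space must sit between two decimal digits; an underscore must sit
 * between two hex digits.  Separators at either end are rejected. */
bool separators_ok(const char *s)
{
	size_t n = strlen(s);

	for (size_t i = 0; i < n; i++) {
		bool space = s[i] == ' ';
		bool under = s[i] == '_';

		if (!space && !under)
			continue;
		if (i == 0 || i + 1 == n)
			return false;
		unsigned char before = s[i - 1], after = s[i + 1];
		if (space && !(isdigit(before) && isdigit(after)))
			return false;
		if (under && !(isxdigit(before) && isxdigit(after)))
			return false;
	}
	return true;
}
```

So `1 000 000` and `0xff_ff` are accepted, while `12 a`, `0x_ff` and `1__2` are rejected.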
NeilBrown [Wed, 29 May 2019 00:38:39 +0000 (10:38 +1000)]
Separate demos from tests.
'tests' check that the code is working, and fail if the results
aren't what was expected.
'demos' simply run the code and show what it can do. They don't provide
any immediate assurance that it is doing the right thing.
NeilBrown [Sun, 26 May 2019 05:04:43 +0000 (15:04 +1000)]
parsergen - fix newline parsing (again)
Add a test-case to oceani-tests.mdc which fails, but shouldn't.
It fails because expressions are treated as line-like, so newlines
aren't ignored.
I realize that having linelike symbols be those that are followed by
a newline really doesn't work.
So go back to the original idea that "linelike symbols are those which
contain a newline".
Then a state starts a line if it is at the start of a linelike symbol.
This simplifies the code, seems to work correctly for existing tests,
and allows the new test to pass.
NeilBrown [Sat, 18 May 2019 20:46:13 +0000 (06:46 +1000)]
Oceani - Jamison Creek Version
Clean up text and provide new version name.
Additions for this version include:
- type identifiers
- arrays and structs
- global const
- "and then" "or else" "if .. else"
- test suite
- valgrind testing, coverage testing
NeilBrown [Sat, 18 May 2019 13:51:24 +0000 (23:51 +1000)]
oceani: fix merging of conditionally-scoped variables.
The problem here was that the list seen in ->in_scope
includes more than just what is currently in-scope.
It also contains things that have been replaced by new instances
of the name.
These can be detected as they aren't the first variable listed
under their name any more.
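The detection rule can be sketched in C. The structures are hypothetical stand-ins for oceani's: each name is assumed to point at its most recent variable, and every variable is assumed to be linked on the ->in_scope list. An entry is genuinely in scope only if it is still the first variable under its name.

```c
#include <stdbool.h>
#include <stddef.h>

struct variable;

struct binding {		/* hypothetical: one name in the table */
	struct variable *var;	/* first (most recent) variable for it */
};

struct variable {
	struct binding *name;
	struct variable *in_scope_next;	/* the ->in_scope list */
};

/* Replaced variables stay on the list but are no longer first. */
bool really_in_scope(const struct variable *v)
{
	return v->name->var == v;
}

/* Count the entries on the list that are genuinely in scope. */
int count_in_scope(const struct variable *list)
{
	int n = 0;

	for (; list; list = list->in_scope_next)
		if (really_in_scope(list))
			n++;
	return n;
}
```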
NeilBrown [Sat, 18 May 2019 00:11:43 +0000 (10:11 +1000)]
oceani-tests: test code that has been printed
Test that the printed code actually works, as well as being re-printable.
Also simplify the messages so they don't use as much space
and fix a typo "exit1" -> "exit 1"
NeilBrown [Fri, 17 May 2019 13:31:48 +0000 (23:31 +1000)]
scanner: handle missing newline at EOF
If there is no newline at EOF, we can see EOF immediately after
a valid symbol. This can lead to calling close_token() when
state->node is NULL, which crashes.
The code in close_token() only makes sense if state->node is still the
same as token->node. If it isn't, the token must be at the very end of
its code-node, so a different calculation is needed.
NeilBrown [Sat, 11 May 2019 01:41:59 +0000 (11:41 +1000)]
oceani: mark code that doesn't need testing.
Some code is included to check and report impossible
conditions. Failure to exercise this code shouldn't
be seen as a failure of test coverage.
So mark such code as //NOTEST and exclude it
from statistics.