I don't know how to create a grammar for condstatements.
The newlines are confusing.

This ends with NL OUT NL.
The first NL is included in "statements", which is allowed to end with an NL.
The "OUT" reduces us down to "if Expression Block" and then cancels.
The second NL is the problem. It could end the whole statement, or
it could be followed by an 'else', and I don't know which.

I currently have complex statements separated by newlines. That means the
second NL must close the complex statement. So we cannot shift it until
we have reduced a ComplexStatement. So we cannot see if an 'else' is coming.

If we require ComplexStatement to end with a newline, then we have:

    IfPart OptNL ElsePart NEWLINE
    IfPart -> if Expression Block
    ElsePart -> else Block

Then the NEWLINE can be shifted. If we see an 'else' we reduce it to OptNL.
If we don't, we reduce to ComplexStatement.
But if we see another newline .... Need both to allow a list

    ComplexStatement -> WhilePart CaseList ElsePart
    WhilePart -> while Expression Block

            | OptNL else Block Newlines

    WhilePart -> while Expression Block
    CaseList -> OptNL CasePart
    ElsePart -> OptNL else Block

    ComplexStatement -> WhilePart WhileSuffix
    WhileSuffix -> Newlines
            | OptNL CasePart WhileSuffix
            | OptNL ElsePart Newlines

    ComplexStatement -> ForPart WhilePart WhileSuffix
            | WhilePart WhileSuffix
            | SwitchPart WhileSuffix

    statementlist -> statementlist NEWLINE statement
then the newline will completely close the statementlist, which I don't want.

at 'HERE' there is a newline before and after the OUT which
need to be shifted into two different things... it isn't working for

The NEWLINE at the end could be shifted to turn the "simplestatements"
into a "statements", or it could trigger a reduce and then be shifted for

So in neither case is it ignored, and that is all the previous logic involved.

Both the state at 'if' and at 'a=b' starts_line and there are no

So the NEWLINE should reduce anything that started with starts-line
that doesn't contain an indent or a newline.

Actually, 'a=b' doesn't start_line.

So when we see a NEWLINE when it is allowed, we could reduce anything that
lies completely after the last start. Maybe....

If newlines are ignored, obviously we ignore any we find.
If not, there must be a starts_line since the last indent.
We really want to reduce everything since there to a single non-recursive

But maybe we need to SHIFT before we can REDUCE that far.
So we just reduce as far as we can.

Hmmm... I've made a mess of this. How embarrassing.
My top-of-stack and 'next' handling gets confused. The indent
on the 'next' token gets stolen when I reduce.

The stack alternates tokens, which can hold indent, and states,
which can allow newlines. The top and bottom are states.
Each frame contains a state (with newline flag) and the following
token (with indent information).
The final (topmost) state (with newline flag) is stored in 'next',
as is the look-ahead token (together with indent info).

When we reduce() we remove several (0 or more) frames and replace
them with a single frame. The information we remove is actually
a token and its following state, N times, including the state in 'next'
but not the token in 'next'.
The new frame is the new reduced-to token with the old state, either
from a frame or from 'next'. 'next' gets a new state.

This suggests I did the frame the wrong way. A frame should have
a token (with ast and indent) and then a state (with newline flag).
The bottom-of-stack can have a null token.
'next' just has the look-ahead token with indent state.

Reduce discards N frames (never the bottom frame) and pushes
the resulting token (with ast) and 'goto' state from previous frame.
'shift' pushes the 'next' token and 'goto' state.

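The revised frame layout (token first, then state) can be sketched roughly as below. This is only an illustration in Python, not the parser's actual code: the names Frame, Parser, shift and reduce, the dictionary-based goto table, and the choice to sum the indents of the discarded frames into the new frame are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Frame:
    sym: Optional[str]   # token or non-terminal; None only in the bottom frame
    state: int           # state entered after this symbol (the 'goto' state)
    ast: Any = None      # parse result attached to the symbol
    indents: int = 0     # net indents (INs minus OUTs) carried by the symbol


class Parser:
    def __init__(self, goto):
        # goto maps (state, symbol) -> state; a hypothetical stand-in
        # for the generated tables.
        self.goto = goto
        # The bottom-of-stack frame has a null token and the initial state.
        self.stack = [Frame(sym=None, state=0)]

    def shift(self, sym, ast=None, indents=0):
        # Push the look-ahead token with the goto state from the previous frame.
        prev = self.stack[-1].state
        self.stack.append(Frame(sym, self.goto[(prev, sym)], ast, indents))

    def reduce(self, head, length, build):
        # Discard `length` frames (possibly 0), never the bottom frame.
        assert length < len(self.stack)
        cut = len(self.stack) - length
        popped = self.stack[cut:]
        del self.stack[cut:]
        # The new frame takes its goto state from the frame now on top.
        prev = self.stack[-1].state
        ast = build([f.ast for f in popped])
        indents = sum(f.indents for f in popped)   # assumed bookkeeping
        self.stack.append(Frame(head, self.goto[(prev, head)], ast, indents))
```

With this layout the look-ahead token lives outside the stack, so a reduce never touches its indent information.
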
    if a==b: print a; print b

and when I see the NEWLINE I want to reduce "print a; print b" to Statementlist
without shifting the NEWLINE. Then the NEWLINE is shifted to make a ComplexStatement.

The state at the start of 'print' doesn't expect a newline, but at start of 'if'
does (I hope)... only it doesn't if it is at the start of a block. But in that

So we look backwards for an indent or a starts-line state.
If we can reduce without discarding the state or absorbing the indent, we do.

Only .... now that 'print' isn't in a starts_line state, it also
isn't after an indent, so we are ignoring newlines.
I think I want my cake, and am eating it.

This should parse like the above. So "Block" isn't a nicely
reducible element. Only Statementlist is.

The state after the ':' is important for reducing back to.
Is that because it is a recursion point? No.
It is because the next thing can be preceded by a newline.
i.e. ComplexStatement can follow a newline.
So we find symbols that end with a newline, and thus symbols
which follow a newline in a production, but only immediately. So

Then any state where a newline-following symbol follows DOT
is declared a line-starting state.
These make newlines visible until the next indent and ....

and what? If a newline appears (before an indent) it reduces
everything since that state while it can or until there is just
one thing which is not recursive.

After a line-starting state, an *internal* indent disables newlines.
An initial indent reduces the reductions that a newline can cause?

I think I need to track if a symbol is at start-of-line.
When we get a newline, we reduce anything we can since that
start-of-line until we get one symbol?
i.e. if we get NEWLINE and top symbol didn't start line,
we reduce if that reduction won't swallow start of line.

- IN, OUT, NEWLINE tokens. NEWLINE that would be before IN is moved
- INs are recorded against the 'next' symbol. Each symbol records indents
  (INs minus OUTs) it contains, and whether it started with an IN
- in parsergen, each symbol is tagged if it can end with
- Each symbol following a can-eol symbol is 'starts-line'
- Each state where a starts-line symbol follows DOT is a starts-line
- When parsing we record which symbols followed IN or NEWLINE.
- If there are net indents after a starts-line state, other than
  immediately after, then NEWLINE tokens are ignored.
- If we see a NEWLINE which is not ignored then we must reduce any
  production which started after the most recent start-of-line.
  So if we can reduce and length is less than length-to-start,
- If we see an OUT we must reduce any production which started

We end up with lots of starts-line states that aren't interesting, as
they are very close to a newline anyway or only a terminal away from
the next starts-line state

Optional NEWLINES are awkward. When we see a newline, we are prone
to reduce early so we end up with a newline to be shifted when it
isn't wanted any more.

Optional newlines are even more awkward. An optional newline
in "block -> OptNL { statementlist }" messes up because the NEWLINE
forces the reduction of OptNL from 'empty' before NEWLINE is shifted.
So we can never achieve "OptNL -> NEWLINE" except at the start of a

The purpose of reducing early is to ensure a symbol never includes a
newline unless it started at start-of-line, or explicitly allows

(where newline is permitted between 'st1;st2' and 'st3') different

(where newline is permitted between 'cond' and 'then').
In first instance I want to reduce further. In second instance I

In first case, new thing started midline. In second it didn't.

ARG. Still not sure I have this right. Though maybe my indent
grammar is broken....

We definitely need to know if a state "starts lines", i.e. it is a
state where we are expecting a 'line like' thing.
A 'line like' thing should be a thing, i.e. a non-terminal.
A non-terminal which ends with a newline is a perfect candidate for
'linelike'. So any state which is followed by can_eol is linelike?

The grammar needs to be carefully constructed. Anywhere a NEWLINE
appears, we definitely don't ignore newlines.
So they should only appear after things that we want to be line-like;

    variable = expression NEWLINE
is no good, because we don't want 'expression' to be linelike.

    for SimpleStatements NEWLINE
ok? Not really, because SimpleStatements is recursive.

1/ outdents and newlines must "close" any productions which started
at-or-after the matching indent, or after the matching start-line

The key idea is that the total set of tokens for any given symbol
- not include an OUT without the matching IN. If the IN was at the
  start, the OUT must be at the end.
- not include a NEWLINE unless it started at or before start-of-line,
  unless NEWLINEs are being ignored
  (unless the symbol includes only the newline).

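One way to read the two constraints above is as a check on the flat token list that a symbol covers. The sketch below is a hedged illustration only: the function name span_is_well_formed and the flags started_line and ignoring_newlines are invented for the example, and tokens are plain strings with "IN", "OUT" and "NEWLINE" standing for the special ones.

```python
def span_is_well_formed(tokens, started_line=False, ignoring_newlines=False):
    """Check a symbol's token span against the two constraints.

    tokens            -- flat list of token names covered by the symbol
    started_line      -- assumed flag: the symbol began at start-of-line
    ignoring_newlines -- assumed flag: NEWLINEs are currently being ignored
    """
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == "IN":
            depth += 1
        elif tok == "OUT":
            depth -= 1
            if depth < 0:
                return False      # an OUT without its matching IN
            if depth == 0 and tokens[0] == "IN" and i != len(tokens) - 1:
                return False      # the leading IN was closed before the end
        elif tok == "NEWLINE":
            only_newline = tokens == ["NEWLINE"]
            if not (only_newline or started_line or ignoring_newlines):
                return False      # NEWLINE inside a mid-line symbol
    return True
```

For example, `["IN", "a", "OUT"]` passes, while `["IN", "a", "OUT", "b"]` and `["a", "OUT"]` are rejected; `["a", "NEWLINE", "b"]` only passes when started_line or ignoring_newlines is set.
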
So when we see an 'OUT' we reduce anything we can until we can
cancel against an IN. If the IN we would cancel against is at
the start, reduce again if length==1.

When we see a NEWLINE, reduce if we can as long as length doesn't
go beyond start-of-line.

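The NEWLINE rule can be sketched as a length test against the stack. This is a minimal sketch under one assumed reading: "doesn't go beyond start-of-line" is taken to mean the reduction may not swallow the symbol that started the line. The names Sym, symbols_since_line_start and newline_allows_reduce are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sym:
    name: str
    starts_line: bool = False   # assumed flag: symbol began at start-of-line


def symbols_since_line_start(stack):
    """Count symbols on top of the stack lying after the most recent
    start-of-line symbol (that symbol itself is not counted)."""
    count = 0
    for sym in reversed(stack):
        if sym.starts_line:
            return count
        count += 1
    return count                # no start-of-line symbol found on the stack


def newline_allows_reduce(stack, length):
    """On NEWLINE, a reduction of `length` symbols is taken only if it
    does not swallow the start-of-line symbol (assumed interpretation)."""
    return length <= symbols_since_line_start(stack)


# The stack for "if a==b: print a; print b" just before the NEWLINE.
stack = [Sym("if", starts_line=True), Sym("Expression"), Sym(":"),
         Sym("Statement"), Sym(";"), Sym("Statement")]
```

Here reducing the three symbols of "print a; print b" is allowed, but reducing all six symbols of the 'if' is not, so the NEWLINE gets shifted instead. Note that under this reading an empty reduction (length 0) is always allowed.
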
2/ NEWLINES are ignored after an indent if they are not "expected"

A symbol is 'linelike' if it is ever followed by a NEWLINE.
i.e. the symbol after it in some production begins with a NEWLINE.
(if "a -> b c" and "x -> a NEWLINE", then a is linelike, but c

If a state is followed by a linelike symbol, then it is a

Newlines are expected in starts_line states

SimpleStatements Block ComplexStatements

- track which symbols *start* with a newline
- deduce which symbols are linelike - they are followed by newline
- deduce which states are starts_line - a linelike symbol follows DOT
- Make sure grammar handles newlines properly.

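The first three steps of this list can be sketched over a grammar held as a dictionary. This is an illustrative sketch, not parsergen's code: the grammar representation and the function names are assumptions, nullable leading symbols are ignored for simplicity, and since no LR automaton is built here the last function reports item positions (head, production index, DOT position) instead of states. A state would be starts_line if it contained any such item.

```python
NEWLINE = "NEWLINE"


def starts_with_newline(grammar):
    """Symbols whose expansion can begin with NEWLINE (simple fixed point)."""
    starts = {NEWLINE}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in starts:
                continue
            if any(body and body[0] in starts for body in bodies):
                starts.add(head)
                changed = True
    return starts


def linelike_symbols(grammar):
    """Symbols immediately followed, in some production, by a symbol
    that begins with NEWLINE."""
    starts = starts_with_newline(grammar)
    linelike = set()
    for bodies in grammar.values():
        for body in bodies:
            for sym, following in zip(body, body[1:]):
                if following in starts:
                    linelike.add(sym)
    return linelike


def starts_line_items(grammar):
    """Item positions (head, production index, dot) where a linelike
    symbol follows DOT."""
    linelike = linelike_symbols(grammar)
    return {(head, i, dot)
            for head, bodies in grammar.items()
            for i, body in enumerate(bodies)
            for dot, sym in enumerate(body)
            if sym in linelike}


# The small example from the text: a -> b c ; x -> a NEWLINE
grammar = {"a": [["b", "c"]], "x": [["a", NEWLINE]]}
```

On this grammar the sketch marks only `a` as linelike, and the only starts-line item is the one with DOT before `a` in "x -> a NEWLINE".
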
The "shift if you can, reduce if you cannot" rule means that an
unexpected symbol effectively terminates everything. But we don't
want to terminate an indent before we see the outdent. So that needs