oceani/parsergen: assorted text improvements.

author NeilBrown <neil@brown.name>

Mon, 29 Jan 2018 07:29:03 +0000 (18:29 +1100)

committer NeilBrown <neil@brown.name>

Tue, 13 Feb 2018 01:56:57 +0000 (12:56 +1100)
author NeilBrown <neil@brown.name>
Mon, 29 Jan 2018 07:29:03 +0000 (18:29 +1100)
committer NeilBrown <neil@brown.name>
Tue, 13 Feb 2018 01:56:57 +0000 (12:56 +1100)
diff --git a/csrc/oceani.mdc b/csrc/oceani.mdc

index c3e5904f72296bd86be79346616b09c6428a5d4f..a9046750e5b658a0f26a3423bb2b396bd773d60a 100644 (file)
--- a/csrc/oceani.mdc
+++ b/csrc/oceani.mdc
@@ -531,12 +531,12 @@ subclasses, and to access these we need to be able to `cast` the
  Each different type of `exec` node needs a number of functions
  defined, a bit like methods.  We must be able to be able to free it,
  print it, analyse it and execute it.  Once we have specific `exec`
-types we will need to parse them to.  Let's take this a bit more
+types we will need to parse them too.  Let's take this a bit more
  slowly.
  
  #### Freeing
  
-The parser generator requires as `free_foo` function for each struct
+The parser generator requires a `free_foo` function for each struct
  that stores attributes and they will be `exec`s of subtypes there-of.
  So we need `free_exec` which can handle all the subtypes, and we need
  `free_binode`.
@@ -608,10 +608,10 @@ also want to know what sort of bracketing to use.
  
  #### Analysing
  
-As discusses, analysis involves propagating type requirements around
+As discussed, analysis involves propagating type requirements around
  the program and looking for errors.
  
-So propagate_types is passed a type that the `exec` is expected to return,
+So `propagate_types` is passed a type that the `exec` is expected to return,
  and returns the type that it does return, either of which can be `Vunknown`.
  An `ok` flag is passed by reference. It is set to `0` when an error is
  found, and `2` when any change is made.  If it remains unchanged at
@@ -858,7 +858,7 @@ Our first user of the `binode` will be expressions, and particularly
  Boolean expressions.  As I haven't implemented precedence in the
  parser generator yet, we need different names from each precedence
  level used by expressions.  The outer most or lowest level precedence
-are Boolean `or` `and`, and `not` which form and `Expression` our of `BTerm`s
+are Boolean `or` `and`, and `not` which form an `Expression` out of `BTerm`s
  and `BFact`s.
  
  ###### Binode types
@@ -972,7 +972,7 @@ expression operator.
                         $0->left = $<1;
                         $0->right = $<3;
                 }$
-               | Expr ${ $0 = $<1; }$
+       | Expr ${ $0 = $<1; }$
  
  ###### Grammar
  
@@ -1573,7 +1573,7 @@ function.
  ### The Conditional Statement
  
  This is the biggy and currently the only complex statement.
-This subsumes `if`, `while`, `do/while`, `switch`, and some part of
+This subsumes `if`, `while`, `do/while`, `switch`, and some parts of
  `for`.  It is comprised of a number of parts, all of which are
  optional though set combinations apply.
  
@@ -1584,7 +1584,7 @@ value.  `condpart` can fail to return any value if it simply executes
  to completion.  This is treated the same as returning True.
  
  If there is a `thenpart` it will be executed whenever the `condpart`
-or `cond` returns True (or does not return), but this will happen
+or `cond` returns True (or does not return any value), but this will happen
  *after* `dopart` (when present).
  
  If `elsepart` is present it will be executed at most once when the
@@ -1594,7 +1594,7 @@ executed when the condition returns a matching value.
  The particular sorts of values allowed in case parts has not yet been
  determined in the language design.
  
-The cond_statement cannot fit into a `binode` so a new `exec` is
+The `cond_statement` cannot fit into a `binode` so a new `exec` is
  defined.
  
  ###### exec type
@@ -1949,7 +1949,7 @@ defined.
  ### Finally the whole program.
  
  Somewhat reminiscent of Pascal a (current) Ocean program starts with
-the keyword "program" and list of variable names which are assigned
+the keyword "program" and a list of variable names which are assigned
  values from command line arguments.  Following this is a `block` which
  is the code to execute.
  
diff --git a/csrc/parsergen.mdc b/csrc/parsergen.mdc

index 6eb58bda6994d3547bf6b014e1774bab5e5c913c..7499e07c9e5996688cf0642ab1cea7c441cf0a30 100644 (file)
--- a/csrc/parsergen.mdc
+++ b/csrc/parsergen.mdc
@@ -63,7 +63,8 @@ If the `--tag` option is given, then any top level header that doesn't
  start with the tag is ignored, and the tag is striped from the rest.  So
  `--tag Foo`
  means that the three needed sections must be `Foo: header`, `Foo: code`,
-and `Foo: grammar`.
+and `Foo: grammar`.  The tag `calc` is used to extract the same calculator
+from this file.
  
  [mdcode]: mdcode.html
  [scanner]: scanner.html
@@ -437,7 +438,7 @@ the end.
                 return code;
         }
  
-Now we have all the bit we need to parse a full production.
+Now we have all the bits we need to parse a full production.
  
  ###### includes
         #include <memory.h>
@@ -552,7 +553,11 @@ where `START` is the first non-terminal given.
         p->head->first_production = g->production_count;
         array_add(&g->productions, &g->production_count, p);
  
-Now we are ready to read in the grammar.
+Now we are ready to read in the grammar.  We ignore comments
+and strings so that the marks which introduce them can be
+used as terminals in the grammar.  We don't ignore numbers
+even though we don't allow them as that causes the scanner
+to produce errors that the parser is better positioned to handle.
  
  ###### grammar_read
         static struct grammar *grammar_read(struct code_node *code)
@@ -725,10 +730,10 @@ or `-1` to indicate failure.
         }
  
  We will often want to form the union of two symsets.  When we do, we
-will often want to know if anything changed (as they might mean there
+will often want to know if anything changed (as that might mean there
  is more work to do).  So `symset_union` reports whether anything was
  added to the first set.  We use a slow quadratic approach as these
-sets don't really get very big.  If profiles shows this to be a problem is
+sets don't really get very big.  If profiles shows this to be a problem it
  can be optimised later.
  
         static int symset_union(struct symset *a, struct symset *b)
@@ -880,7 +885,7 @@ line-like symbol.
  To know which symbols are line-like, we first need to know which
  symbols start with a NEWLINE token.  Any symbol which is followed by a
  NEWLINE, or anything that starts with a NEWLINE, is deemed to be a line-like symbol.
-Certainly when trying to parse one of these we must take not of NEWLINEs.
+Certainly when trying to parse one of these we must take note of NEWLINEs.
  
  Clearly the `TK_newline` token can start with a NEWLINE.  Any symbol
  which is the head of a production that contains a starts-with-NEWLINE
@@ -889,7 +894,7 @@ starts-with-NEWLINE symbol.  We use a new field `can_eol` to record
  this attribute of symbols, and compute it in a repetitive manner
  similar to `set_nullable`.
  
-Once we have that, we can determine which symbols are `line_like` be
+Once we have that, we can determine which symbols are `line_like` by
  seeing which are followed by a `can_eol` symbol in any production.
  
  ###### symbol fields
@@ -950,7 +955,7 @@ we repeat the calculations until no changes happen, just like with
  `set_nullable`.  This makes use of the fact that `symset_union`
  reports if any change happens.
  
-The core of this which finds the "first" of part of a production body
+The core of this, which finds the "first" of part of a production body,
  will be reused for computing the "follow" sets, so we split it out
  into a separate function.
  
@@ -1170,6 +1175,8 @@ And now we can build the list of itemsets.  The lookup routine returns
  both a success flag and a pointer to where in the list an insert
  should happen, so we don't need to search a second time.
  
+FIXME: document min_prefix
+
  ###### declarations
         struct itemset {
                 struct itemset *next;
@@ -1670,7 +1677,7 @@ LR1 are also similar as they have FOLLOW or LA sets.
  LR0 conflicts are any state which have both a reducible item and
  a shiftable item, or two reducible items.
  
-LR05 conflicts only occurs if two possibly reductions exist,
+LR05 conflicts only occur if two possible reductions exist,
  as shifts always over-ride reductions.
  
  ###### conflict functions
@@ -1989,7 +1996,7 @@ can access the larger structure using pointer manipulation.
  The code fragment requires translation when written out.  Any `$N` needs to
  be converted to a reference either to that buffer (if `$0`) or to the
  structure returned by a previous reduction.  These pointers need to be cast
-to the appropriate type for each access.  All this is handling in
+to the appropriate type for each access.  All this is handled in
  `gen_code`.
  
  `gen_code` also allows symbol references to contain a '`<`' as in '`$<2`'.
@@ -2103,7 +2110,7 @@ recovery where individual stack frames might need to be freed.
  For this, the grammar author is required to defined a `free_XX` function for
  each structure that is used by a non-terminal.  `do_free` will call whichever
  is appropriate given a symbol number, and will call `free` (as is
-appropriate for tokens on any terminal symbol.
+appropriate for tokens) on any terminal symbol.
  
  ###### functions
  
@@ -2228,7 +2235,7 @@ grammar file).
  To be able to run `mdcode` and `scanner` on the grammar we need to memory
  map it.
  
-One we have extracted the code (with `mdcode`) we expect to find three
+Once we have extracted the code (with `mdcode`) we expect to find three
  sections: header, code, and grammar.  Anything else that is not
  excluded by the `--tag` option is an error.
  
@@ -2322,7 +2329,7 @@ the report finds conflicts we will exit with an error status.
                                 rv |= 1;
         }
  
-If a headers section is defined, we write it out and include a declaration
+If a "headers" section is defined, we write it out and include a declaration
  for the `parse_XX` function so it can be used from separate code.
  
         if (rv == 0 && hdr && outfile) {
@@ -2437,7 +2444,7 @@ The `state` is the most important one and guides the parsing process.  The
  freeing function.  The symbol leads us to the right free function through
  `do_free`.
  
-The `indents` count tracks the line indents with in the symbol or
+The `indents` count tracks the line indents with-in the symbol or
  immediately follow it.  These are used to allow indent information to
  guide parsing and error recovery.
  
@@ -2634,10 +2641,10 @@ do with them.
  `TK_in` tokens are easy: we simply update indent count in the top stack frame to
  record how many indents there are following the previous token.
  
-`TK_out` tokens must either be canceled against an indent count
+`TK_out` tokens must be canceled against an indent count
  within the stack.  If we can reduce some symbols that are all since
  the most recent indent, then we do that first.  If the minimum prefix
-of the current state then extents back before the most recent indent,
+of the current state then extends back before the most recent indent,
  that indent can be cancelled.  If the minimum prefix is shorter then
  the indent is premature and we must start error handling, which
  currently doesn't work at all.
@@ -2918,7 +2925,7 @@ The obvious example for a parser is a calculator.
  As `scanner` provides number parsing function using `libgmp` is it not much
  work to perform arbitrary rational number calculations.
  
-This calculator takes one expression, or an equality test per line.  The
+This calculator takes one expression, or an equality test, per line.  The
  results are printed and if any equality test fails, the program exits with
  an error.
author	NeilBrown <neil@brown.name>
	Mon, 29 Jan 2018 07:29:03 +0000 (18:29 +1100)
committer	NeilBrown <neil@brown.name>
	Tue, 13 Feb 2018 01:56:57 +0000 (12:56 +1100)
csrc/oceani.mdc		patch \| blob \| history
csrc/parsergen.mdc		patch \| blob \| history