   1 # mdcode: extract C code from a _markdown_ file.
   2
   3 _markdown_ is a popular format for simple text markup which can easily
   4 be converted to HTML.  As it allows easy indication of sections of
   5 code, it is quite suitable for use in literate programming.  This file
   6 is an example of that usage.
   7
   8 The code included below provides two related functionalities.
   9 Firstly it provides a library routine for extracting code out of a
  10 _markdown_ file, so that other routines might make use of it.
  11
  12 Secondly it provides a simple client of this routine which extracts
  13 1 or more C-language files from a markdown document so they can be
  14 passed to a C compiler.  These two combine to make a tool that is needed
  15 to compile this tool.  Yes, this is circular.  A prototype tool was
  16 used for the first extraction.
  17
  18 The tool provided is described as specific to the C language as it
  19 generates
  20
  21 ##### Example: a _line_ command
  22
  23         #line __line-number__ __file-name__
  24
  25 lines so that the C compiler will report where in the markdown file
  26 any error is found.  This tool is suitable for any other language
  27 which allows the same directive, or will treat it as a comment.
  28
  29 ## Literate Details
  30
  31 Literate programming is more than just including comments with the
  32 code, even nicely formatted comments.  It also involves presenting the
  33 code in an order that makes sense to a human, rather than an order
  34 that makes sense to a compiler.  For this reason a core part of any
  35 literate programming tool is the ability to re-arrange the code found
  36 in the document into a different order in the final code file - or
  37 files.  This requires some form of linkage to be encoded.
  38
  39 The approach taken here is focused around section headings - of any
  40 depth.
  41
  42 All the code in any section is treated as a single sequential
  43 collection of code, and is named by the section that it is in.  If
  44 multiple sections have the same name, then the code blocks in all of
  45 them are joined together in the order they appear in the document.
  46
  47 A code section can contain a special marker which starts with 2
  48 hashes: __##__.
  49 The text after the marker must be the name of some section which
  50 contains code.  Code from that section will be interpolated in place
  51 of the marker, and will be indented to match the indent of the marker.
  52
  53 It is not permitted for the same code to be interpolated multiple
  54 times.  Allowing this might make some sense, but it is probably a
  55 mistake, and prohibiting it makes some of the code a bit cleaner.
  56
  57 Equally, every section of code should be interpolated at least once -
  58 with two exceptions.  These exceptions are imposed by the tool, not
  59 the library.  A different client could impose different rules on the
  60 names of top-level code sections.
  61
  62 The first exception we have already seen.  A section name starting
  63 __Example:__ indicates code that is not to be included in the final product.
  64
  65 The second exception is for the top level code sections which will be
  66 written to files.  Again these are identified by their section name.
  67 This must start with __File:__; the following text (after optional
  68 spaces) will be used as a file name.
  69
  70 Any section containing code that does not start __Example:__ or
  71 __File:__ must be included in some other section exactly once.
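##### Example: classifying section names

The naming rules above can be sketched as a small classifier.  This is not the client's actual code; `classify_section`, `enum sect_kind` and the `SECT_*` names are hypothetical and only illustrate the two prefixes and the "optional spaces" rule.  As it lives in an __Example:__ section it is not extracted into any file.

```c
#include <assert.h>
#include <string.h>

enum sect_kind { SECT_FILE, SECT_EXAMPLE, SECT_OTHER };

/* Classify a section name of `len` bytes (not NUL terminated).
 * For SECT_FILE, *fname and *fname_len describe the text after
 * "File:" and any spaces; otherwise they are left untouched.
 */
static enum sect_kind classify_section(const char *name, int len,
                                       const char **fname, int *fname_len)
{
        assert(name != NULL && len >= 0);
        if (len >= 8 && strncmp(name, "Example:", 8) == 0)
                return SECT_EXAMPLE;
        if (len >= 5 && strncmp(name, "File:", 5) == 0) {
                const char *p = name + 5;
                const char *e = name + len;

                while (p < e && *p == ' ')
                        p++;
                *fname = p;
                *fname_len = (int)(e - p);
                return SECT_FILE;
        }
        return SECT_OTHER;
}
```

A section classified as `SECT_OTHER` is one that must be referenced exactly once from some other section.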
  72
  73 ### Multiple files
  74
  75 Allowing multiple top level code sections which name different files
  76 means that one _markdown_ document can describe several files.  This
  77 is very useful with the C language where a program file and a header
  78 file might be related.  For the present document we will have a header
  79 file and two code files, one with the library content and one for the
  80 tool.
  81
  82 It will also be very convenient to create a `makefile` fragment to
  83 ensure the code is compiled correctly.  A simple `make -f mdcode.mk`
  84 will "do the right thing".
  85
  86 ### File: mdcode.mk
  87
  88         CFLAGS += -Wall -g
  89         all::
  90         mdcode.h libmdcode.c md2c.c mdcode.mk :  mdcode.mdc
  91                 ./md2c mdcode.mdc
  92
  93
  94 ### File: mdcode.h
  95
  96         ## exported types
  97         ## exported functions
  98
  99 ### File: libmdcode.c
 100         #define _GNU_SOURCE
 101         #include <unistd.h>
 102         #include <stdlib.h>
 103         #include <stdio.h>
 104
 105         #include "mdcode.h"
 106         ## internal includes
 107         ## private types
 108         ## internal functions
 109
 110 ### File: mdcode.mk
 111
 112         all :: libmdcode.o
 113         libmdcode.o : libmdcode.c mdcode.h
 114                 $(CC) $(CFLAGS) -c libmdcode.c
 115
 116
 117 ### File: md2c.c
 118
 119         #include <unistd.h>
 120         #include <stdlib.h>
 121
 122         #include "mdcode.h"
 123
 124         ## client includes
 125         ## client functions
 126
 127 ### File: mdcode.mk
 128
 129         all :: md2c
 130         md2c : md2c.o libmdcode.o
 131                 $(CC) $(CFLAGS) -o md2c md2c.o libmdcode.o
 132         md2c.o : md2c.c mdcode.h
 133                 $(CC) $(CFLAGS) -c md2c.c
 134
 135 ## Data Structures
 136
 137 As the core purpose of _mdcode_ is to discover and re-arrange blocks
 138 of text, it makes sense to map the whole document file into memory and
 139 produce a data structure which lists various parts of the file in the
 140 appropriate order.  Each node in this structure will have some text
 141 from the document, a child pointer, and a next pointer, any of which
 142 might not be present.  The text is most easily stored as a pointer and a
 143 length.  We'll call this a `text`.
 144
 145 A list of these `code_nodes` will belong to each section and it will
 146 be useful to have a separate `section` data structure to store the
 147 list of `code_nodes`, the section name, and some other information.
 148
 149 This other information will include a reference counter so we can
 150 ensure proper referencing, and an `indent` depth.  As referenced
 151 content can have an extra indent added, we need to know what that is.
 152 The `code_node` will also have an `indent` depth which eventually gets
 153 set to the sum for the indents from all references on the path from
 154 the root.
 155
 156 Finally we need to know if the `code_node` was recognised by being
 157 indented or not.  If it was, the client of this data will want to
 158 strip off the leading tab or 4 spaces.  Hence a `needs_strip` flag is
 159 needed.
 160
 161 ##### exported types
 162
 163         struct text {
 164                 char *txt;
 165                 int len;
 166         };
 167
 168         struct section {
 169                 struct text section;
 170                 struct code_node *code;
 171                 struct section *next;
 172         };
 173
 174         struct code_node {
 175                 struct text code;
 176                 int indent;
 177                 int line_no;
 178                 int needs_strip;
 179                 struct code_node *next;
 180                 struct section *child;
 181         };
 182
 183 ##### private types
 184
 185         struct psection {
 186                 struct section;
 187                 struct code_node *last;
 188                 int refcnt;
 189                 int indent;
 190         };
 191
 192 You will note that the `struct psection` contains an anonymous `struct
 193 section` embedded at the start.  To make this work right, GCC
 194 requires the `-fplan9-extensions` flag.
 195
 196 ##### File: mdcode.mk
 197
 198         CFLAGS += -fplan9-extensions
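##### Example: embedding without the extension

If `-fplan9-extensions` is not available, the same layout can be had in standard C by giving the embedded structure a name and converting pointers explicitly.  The sketch below uses stand-in types (`struct base_sect`, `struct priv_sect`) and hypothetical helpers (`to_base`, `to_priv`) rather than the real ones, and is not what this document compiles.

```c
#include <assert.h>
#include <stddef.h>

struct base_sect {              /* stand-in for `struct section` */
        const char *name;
        struct base_sect *next;
};

struct priv_sect {              /* stand-in for `struct psection` */
        struct base_sect base;  /* named first member: standard C */
        int refcnt;
};

/* A pointer to a struct may be converted to a pointer to its first
 * member and back again, so both directions are well defined.
 */
static struct base_sect *to_base(struct priv_sect *p)
{
        return &p->base;
}

static struct priv_sect *to_priv(struct base_sect *b)
{
        assert(offsetof(struct priv_sect, base) == 0);
        return (struct priv_sect *)b;
}
```

The cost is that fields must be written as `p->base.name` rather than `p->name`, which is the noise the anonymous embedding avoids.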
 199
 200 ### Manipulating the node
 201
 202 Though a tree with `next` and `child` links is the easiest way to
 203 assemble the various code sections, it is not the easiest form for
 204 using them.  For that a simple list would be best.
 205
 206 So once we have a fully linked File section we will want to linearize
 207 it, so that the `child` links become `NULL` and the `next` links will
 208 find everything required.  It is at this stage that the requirement
 209 that each section is linked only once becomes important.
 210
 211 `code_linearize` will merge the `code_node`s from any child into the
 212 given `code_node`.  As it does this it sets the `indent` field for
 213 each `code_node`.
 214
 215 Note that we don't clear the section's `last` pointer, even though
 216 it no longer owns any code.  This allows subsequent code to see if a
 217 section ever had any code, and to report an error if a section is
 218 referenced but not defined.
 219
 220 ##### internal functions
 221
 222         static void code_linearize(struct code_node *code)
 223         {
 224                 struct code_node *t;
 225                 for (t = code; t; t = t->next)
 226                         t->indent = 0;
 227                 for (; code; code = code->next)
 228                         if (code->child) {
 229                                 struct code_node *next = code->next;
 230                                 struct psection *pchild =
 231                                         (struct psection *)code->child;
 232                                 int indent = pchild->indent;
 233                                 code->next = code->child->code;
 234                                 code->child->code = NULL;
 235                                 code->child = NULL;
 236                                 for (t = code; t->next; t = t->next)
 237                                         t->next->indent = code->indent + indent;
 238                                 t->next = next;
 239                         }
 240         }
 241
 242 Once a client has made use of a linearized code set, it will probably
 243 want to free it.
 244
 245         void code_free(struct code_node *code)
 246         {
 247                 while (code) {
 248                         struct code_node *this;
 249                         if (code->child)
 250                                 code_linearize(code);
 251                         this = code;
 252                         code = code->next;
 253                         free(this);
 254                 }
 255         }
 256
 257 ##### exported functions
 258
 259         void code_free(struct code_node *code);
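##### Example: linearizing a small tree

To see the splice in action it can help to run the same loop on a cut-down structure.  The types and names here (`demo_node`, `demo_sect`, `demo_linearize`, `demo_length`) are simplified stand-ins, not the library's own, and the section's indent is stored directly in `demo_sect`.

```c
#include <assert.h>
#include <stddef.h>

struct demo_node;
struct demo_sect {
        struct demo_node *code;
        int indent;
};
struct demo_node {
        const char *text;
        int indent;
        struct demo_node *next;
        struct demo_sect *child;
};

/* Splice each child's node list in after the node that refers to it. */
static void demo_linearize(struct demo_node *code)
{
        struct demo_node *t;

        for (t = code; t; t = t->next)
                t->indent = 0;
        for (; code; code = code->next)
                if (code->child) {
                        struct demo_node *next = code->next;
                        int indent = code->child->indent;

                        code->next = code->child->code;
                        code->child->code = NULL;
                        code->child = NULL;
                        for (t = code; t->next; t = t->next)
                                t->next->indent = code->indent + indent;
                        t->next = next;
                }
}

/* Count the nodes reachable through `next` links only. */
static int demo_length(const struct demo_node *n)
{
        int len = 0;

        for (; n; n = n->next)
                len++;
        return len;
}
```

With nodes `a -> b`, where `a` refers to a section holding `c` (indent 4) and `c` refers to a section holding `d` (indent 8), the list becomes `a, c, d, b`.  Because the outer loop keeps walking into the spliced nodes, nested references are flattened in the same pass and their indents accumulate.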
 260
 261 ### Building the tree
 262
 263 As we parse the document there are two things we will want to do to
 264 node trees: add some text or add a reference.  We'll assume for now
 265 that the relevant section structures have been found, and will just
 266 deal with the `code_node`.
 267
 268 Adding text simply means adding another node.  We will never have
 269 empty nodes.  Even if the last node only has a child, new text must go
 270 in a new node.
 271
 272 ##### internal functions
 273
 274         static void code_add_text(struct psection *where, struct text txt,
 275                                   int line_no, int needs_strip)
 276         {
 277                 struct code_node *n;
 278                 if (txt.len == 0)
 279                         return;
 280                 n = malloc(sizeof(*n));
 281                 n->code = txt;
 282                 n->indent = 0;
 283                 n->line_no = line_no;
 284                 n->needs_strip = needs_strip;
 285                 n->next = NULL;
 286                 n->child = NULL;
 287                 if (where->last)
 288                         where->last->next = n;
 289                 else
 290                         where->code = n;
 291                 where->last = n;
 292         }
 293
 294 However when adding a link, we might be able to include it in the last
 295 `code_node` if it currently only has text.
 296
 297         void code_add_link(struct psection *where, struct psection *to,
 298                            int indent)
 299         {
 300                 struct code_node *n;
 301
 302                 to->indent = indent;
 303                 to->refcnt++;   // this will be checked elsewhere
 304                 if (where->last && where->last->child == NULL) {
 305                         where->last->child = to;
 306                         return;
 307                 }
 308                 n = malloc(sizeof(*n));
 309                 n->code.len = 0;
 310                 n->indent = 0;
 311                 n->line_no = 0;
 312                 n->next = NULL;
 313                 n->child = to;
 314                 if (where->last)
 315                         where->last->next = n;
 316                 else
 317                         where->code = n;
 318                 where->last = n;
 319         }
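##### Example: when a link reuses a node

The difference between the two "add" operations is easiest to see by counting nodes.  This sketch drops the text and section pointers and keeps only two flags per node; `bnode`, `blist` and the `blist_*` helpers are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

struct bnode {
        int has_text;           /* 1 if this node carries text */
        int has_child;          /* 1 if this node carries a reference */
        struct bnode *next;
};
struct blist {
        struct bnode *first, *last;
        int count;
};

/* Append a fresh, zeroed node at the tail of the list. */
static struct bnode *blist_append(struct blist *l)
{
        struct bnode *n = calloc(1, sizeof(*n));

        assert(n != NULL);
        if (l->last)
                l->last->next = n;
        else
                l->first = n;
        l->last = n;
        l->count++;
        return n;
}

/* Text always gets a node of its own. */
static void blist_add_text(struct blist *l)
{
        blist_append(l)->has_text = 1;
}

/* A reference reuses the last node when that node has no child yet. */
static void blist_add_link(struct blist *l)
{
        if (l->last && !l->last->has_child)
                l->last->has_child = 1;
        else
                blist_append(l)->has_child = 1;
}

static void blist_free(struct blist *l)
{
        while (l->first) {
                struct bnode *n = l->first;

                l->first = n->next;
                free(n);
        }
        l->last = NULL;
        l->count = 0;
}
```

Adding text, a link, a second link and then more text gives three nodes: the first holds text and a link, the second only a link, and the third only text.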
 320
 321 ### Finding sections
 322
 323 Now we need a lookup table to be able to find sections by name.
 324 Something that provides a `log(N)` search time is probably
 325 justified, but for now I want a minimal stand-alone program so a
 326 linked list managed by insertion-sort will do.  As a comparison
 327 function it is easiest to sort based on length before content.  So
 328 sections won't be in standard lexical order, but that isn't important.
 329
 330 If we cannot find a section, we simply want to create it.  This allows
 331 sections and references to be created in any order.  Sections with
 332 no references or no content will cause a warning eventually.
 333
 334 #### internal functions
 335
 336         static int text_cmp(struct text a, struct text b)
 337         {
 338                 if (a.len != b.len)
 339                         return a.len - b.len;
 340                 return strncmp(a.txt, b.txt, a.len);
 341         }
 342
 343         static struct psection *section_find(struct psection **list, struct text name)
 344         {
 345                 struct psection *new;
 346                 while (*list) {
 347                         int cmp = text_cmp((*list)->section, name);
 348                         if (cmp == 0)
 349                                 return *list;
 350                         if (cmp > 0)
 351                                 break;
 352                         list = (struct psection **)&((*list)->next);
 353                 }
 354                 /* Add this section */
 355                 new = malloc(sizeof(*new));
 356                 new->next = *list;
 357                 *list = new;
 358                 new->section = name;
 359                 new->code = NULL;
 360                 new->last = NULL;
 361                 new->refcnt = 0;
 362                 new->indent = 0;
 363                 return new;
 364         }
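##### Example: ordering by length first

The effect of comparing lengths before content is that a short name always sorts before a longer one, whatever the letters are.  A minimal sketch, using a stand-in `struct name` and hypothetical helpers `name_cmp` and `mkname`:

```c
#include <assert.h>
#include <string.h>

struct name {
        const char *txt;
        int len;
};

/* Shorter names sort first; equal lengths fall back to byte order. */
static int name_cmp(struct name a, struct name b)
{
        if (a.len != b.len)
                return a.len - b.len;
        return strncmp(a.txt, b.txt, a.len);
}

/* Wrap a NUL-terminated string for the comparison above. */
static struct name mkname(const char *s)
{
        struct name n;

        assert(s != NULL);
        n.txt = s;
        n.len = (int)strlen(s);
        return n;
}
```

So "zz" sorts before "aaa" here, the reverse of what `strcmp` would say.  Checking the length first also means `strncmp` is only ever called on two texts of the same length, so neither is read past its end.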
 365
 366 ## Parsing the _markdown_
 367
 368 Parsing markdown is fairly easy, though there are complications.
 369
 370 The document is divided into "paragraphs" which are mostly separated by blank
 371 lines (which may contain white space).  The first few characters of
 372 the first line of a paragraph determine the type of paragraph.  For
 373 our purposes we are only interested in list paragraphs, code
 374 paragraphs, section headings, and everything else.  Section headings
 375 are single-line paragraphs and so do not require a preceding or
 376 following blank line.
 377
 378 Section headings start with 1 or more hash characters (__#__).  List
 379 paragraphs start with hyphen, asterisk, plus, or digits followed by a
 380 period.  Code paragraphs aren't quite so easy.
 381
 382 The "standard" code paragraph starts with 4 or more spaces, or a tab.
 383 However if the previous paragraph was a list paragraph, then those
 384 spaces indicate another  paragraph in the same list item, and 8 or
 385 more spaces are required.  Unless a nested list is in effect, in
 386 which case 12 or more are needed.  Unfortunately not all _markdown_
 387 parsers agree on nested lists.
 388
 389 Two alternate styles for marking code are in active use.  "Github" uses
 390 three backticks (_`` ``` ``_), while "pandoc" uses three or more tildes
 391 (_~~~_).  In these cases the code should not be indented.
 392
 393 Trying to please everyone as much as possible, this parser will handle
 394 everything except for code inside lists.
 395
 396 So an indented (4+) paragraph after a list paragraph is always a list
 397 paragraph; otherwise it is a code paragraph.  A paragraph that starts
 398 with three backticks or three tildes is code which continues until a
 399 matching string of backticks or tildes.
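##### Example: classifying a paragraph by its first line

The rules above amount to a handful of tests on the first few characters.  The sketch below works on a NUL-terminated line, which the real parser does not, and `para_classify`, `starts_with_run` and the `PARA_*` names are assumptions made for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum para_kind {
        PARA_HEADING, PARA_LIST, PARA_FENCE, PARA_INDENTED, PARA_OTHER
};

/* True if `line` begins with `n` copies of `ch`.  Safe on a
 * NUL-terminated string as it stops at the first mismatch.
 */
static int starts_with_run(const char *line, char ch, int n)
{
        int i;

        for (i = 0; i < n; i++)
                if (line[i] != ch)
                        return 0;
        return 1;
}

static enum para_kind para_classify(const char *line)
{
        size_t i = 0;

        assert(line != NULL);
        if (line[0] == '#')
                return PARA_HEADING;
        if (line[0] == '-' || line[0] == '*' || line[0] == '+')
                return PARA_LIST;
        while (isdigit((unsigned char)line[i]))
                i++;
        if (i > 0 && line[i] == '.')
                return PARA_LIST;
        if (starts_with_run(line, '`', 3) || starts_with_run(line, '~', 3))
                return PARA_FENCE;
        if (line[0] == '\t' || starts_with_run(line, ' ', 4))
                return PARA_INDENTED;
        return PARA_OTHER;
}
```

`PARA_INDENTED` on its own does not mean "code": the caller still has to remember whether the previous paragraph was a list.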
 400
 401 ### Skipping bits
 402
 403 While walking the document looking for various markers we will *not*
 404 use the `struct text` introduced earlier as advancing that requires
 405 updating both start and length which feels clumsy.  Instead we will
 406 carry `pos` and `end` pointers, only the first of which needs to
 407 change.
 408
 409 So to start, we need to skip various parts of the document.  `lws`
 410 stands for "Linear White Space" and is a term that comes from the
 411 Email RFCs (e.g. RFC822).  `line` and `para` are self explanatory.
 412 Note that `skip_para` needs to update the current line number.
 413 `skip_line` doesn't but every caller should.
 414
 415 #### internal functions
 416
 417         static char *skip_lws(char *pos, char *end)
 418         {
 419                 while (pos < end && (*pos == ' ' || *pos == '\t'))
 420                         pos++;
 421                 return pos;
 422         }
 423
 424         static char *skip_line(char *pos, char *end)
 425         {
 426                 while (pos < end && *pos != '\n')
 427                         pos++;
 428                 if (pos < end)
 429                         pos++;
 430                 return pos;
 431         }
 432
 433         static char *skip_para(char *pos, char *end, int *line_no)
 434         {
 435                 /* Might return a pointer to a blank line, as only
 436                  * one trailing blank line is skipped
 437                  */
 438                 if (*pos == '#') {
 439                         pos = skip_line(pos, end);
 440                         (*line_no) += 1;
 441                         return pos;
 442                 }
 443                 while (pos < end &&
 444                        *pos != '#' &&
 445                        *(pos = skip_lws(pos, end)) != '\n') {
 446                         pos = skip_line(pos, end);
 447                         (*line_no) += 1;
 448                 }
 449                 if (pos < end && *pos == '\n') {
 450                         pos++;
 451                         (*line_no) += 1;
 452                 }
 453                 return pos;
 454         }
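##### Example: walking with `pos` and `end`

As a small illustration of the `pos`/`end` style, here is a line counter built on the same stepping pattern as `skip_line`.  `next_line` and `count_lines` are hypothetical names, and `const` pointers are used here although the library itself uses plain `char *`.

```c
#include <assert.h>

/* Step past the current line, including its newline if there is one. */
static const char *next_line(const char *pos, const char *end)
{
        while (pos < end && *pos != '\n')
                pos++;
        if (pos < end)
                pos++;
        return pos;
}

/* Count lines in [pos, end); a final unterminated line counts too. */
static int count_lines(const char *pos, const char *end)
{
        int lines = 0;

        assert(pos <= end);
        while (pos < end) {
                pos = next_line(pos, end);
                lines++;
        }
        return lines;
}
```

Only `pos` moves; `end` is passed along unchanged, which is exactly what makes this form less clumsy than updating a pointer and a length together.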
 455
 456 ### Recognising things
 457
 458 Recognising a section header is trivial and doesn't require a
 459 function.  However we need to extract the content of a section header
 460 as a `struct text` for passing to `section_find`.
 461 Recognising the start of a new list is fairly easy.  Recognising the
 462 start (and end) of code is a little messy so we provide a function for
 463 matching the first few characters, which has a special case for "4
 464 spaces or tab".
 465
 466 #### internal includes
 467
 468         #include  <ctype.h>
 469         #include  <string.h>
 470
 471 #### internal functions
 472
 473         static struct text take_header(char *pos, char *end)
 474         {
 475                 struct text section;
 476
 477                 while (pos < end && *pos == '#')
 478                         pos++;
 479                 while (pos < end && *pos == ' ')
 480                         pos++;
 481                 section.txt = pos;
 482                 while (pos < end && *pos != '\n')
 483                         pos++;
 484                 while (pos > section.txt &&
 485                        (pos[-1] == '#' || pos[-1] == ' '))
 486                         pos--;
 487                 section.len = pos - section.txt;
 488                 return section;
 489         }
 490
 491         static int is_list(char *pos, char *end)
 492         {
 493                 if (strchr("-*+", *pos))
 494                         return 1;
 495                 if (isdigit((unsigned char)*pos)) {
 496                         while (pos < end && isdigit((unsigned char)*pos))
 497                                 pos += 1;
 498                         if  (pos < end && *pos == '.')
 499                                 return 1;
 500                 }
 501                 return 0;
 502         }
 503
 504         static int matches(char *start, char *pos, char *end)
 505         {
 506                 if (start == NULL)
 507                         return matches("\t", pos, end) ||
 508                                matches("    ", pos, end);
 509                 return (pos + strlen(start) < end &&
 510                         strncmp(pos, start, strlen(start)) == 0);
 511         }
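##### Example: trimming a header

Header extraction trims hashes and spaces from both ends, so a "closed" heading gives the same name as an open one.  The following mirrors that logic on stand-in types; `struct span` and `header_name` are names invented for the sketch.

```c
#include <assert.h>
#include <string.h>

struct span {
        const char *txt;
        int len;
};

/* Return the name on the heading line at `pos`, without the leading
 * hashes and spaces or any trailing hashes and spaces.
 */
static struct span header_name(const char *pos, const char *end)
{
        struct span name;

        assert(pos <= end);
        while (pos < end && *pos == '#')
                pos++;
        while (pos < end && *pos == ' ')
                pos++;
        name.txt = pos;
        while (pos < end && *pos != '\n')
                pos++;
        while (pos > name.txt && (pos[-1] == '#' || pos[-1] == ' '))
                pos--;
        name.len = (int)(pos - name.txt);
        return name;
}
```

The same routine serves for reference markers inside code, since those have the same "hashes then name" shape.  A name that itself ends in a hash would lose that hash, which is a limitation of the approach.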
 512
 513 ### Extracting the code
 514
 515 Now that we can skip paragraphs and recognise what type each paragraph
 516 is, it is time to parse the file and extract the code.  We'll do this
 517 in two parts: first we look at what to do with some code once we
 518 find it, and then how to actually find it.
 519
 520 When we have some code, we know where it is, what the end marker
 521 should look like, and which section it is in.
 522
 523 There are two sorts of end markers: the presence of a particular
 524 string, or the absence of an indent.  We will use a string to
 525 represent a presence, and a `NULL` to represent the absence.
 526
 527 While looking at code we don't think about paragraphs at all - just
 528 look for a line that starts with the right thing.
 529 Every line that is still code then needs to be examined to see if it
 530 is a section reference.
 531
 532 When a section reference is found, all preceding code (if any) must be
 533 added to the current section, then the reference is added.
 534
 535 When we do find the end of the code, all text that we have found but
 536 not processed needs to be saved too.
 537
 538 When adding a reference we need to set the `indent`.  This is the
 539 number of spaces (counting 8 for tabs) after the natural indent of the
 540 code (which is a tab or 4 spaces).  We use a separate function `count_space`
 541 for that.
 542
 543 #### internal functions
 544
 545         static int count_space(char *sol, char *p)
 546         {
 547                 int c = 0;
 548                 while (sol < p) {
 549                         if (sol[0] == ' ')
 550                                 c++;
 551                         if (sol[0] == '\t')
 552                                 c+= 8;
 553                         sol++;
 554                 }
 555                 return c;
 556         }
 557
 558
 559         static char *take_code(char *pos, char *end, char *marker,
 560                                struct psection **table, struct text section,
 561                                int *line_nop)
 562         {
 563                 char *start = pos;
 564                 int line_no = *line_nop;
 565                 int start_line = line_no;
 566                 struct psection *sect;
 567
 568                 sect = section_find(table, section);
 569
 570                 while (pos < end) {
 571                         char *sol, *t;
 572                         struct text ref;
 573
 574                         if (marker && matches(marker, pos, end))
 575                                 break;
 576                         if (!marker &&
 577                             (skip_lws(pos, end))[0] != '\n' &&
 578                             !matches(NULL, pos, end))
 579                                 /* Paragraph not indented */
 580                                 break;
 581
 582                         /* Still in code - check for reference */
 583                         sol = pos;
 584                         if (!marker) {
 585                                 if (*sol == '\t')
 586                                         sol++;
 587                                 else if (strncmp(sol, "    ", 4) == 0)
 588                                         sol += 4;
 589                         }
 590                         t = skip_lws(sol, end);
 591                         if (t[0] != '#' || t[1] != '#') {
 592                                 /* Just regular code here */
 593                                 pos = skip_line(sol, end);
 594                                 line_no++;
 595                                 continue;
 596                         }
 597
 598                         if (pos > start) {
 599                                 struct text txt;
 600                                 txt.txt = start;
 601                                 txt.len = pos - start;
 602                                 code_add_text(sect, txt, start_line,
 603                                               marker == NULL);
 604                         }
 605                         ref = take_header(t, end);
 606                         if (ref.len) {
 607                                 struct psection *refsec = section_find(table, ref);
 608                                 code_add_link(sect, refsec, count_space(sol, t));
 609                         }
 610                         pos = skip_line(t, end);
 611                         line_no++;
 612                         start = pos;
 613                         start_line = line_no;
 614                 }
 615                 if (pos > start) {
 616                         struct text txt;
 617                         txt.txt = start;
 618                         txt.len = pos - start;
 619                         code_add_text(sect, txt, start_line,
 620                                       marker == NULL);
 621                 }
 622                 if (marker) {
 623                         pos = skip_line(pos, end);
 624                         line_no++;
 625                 }
 626                 *line_nop = line_no;
 627                 return pos;
 628         }
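##### Example: two ways to count an indent

Counting every tab as 8 is simple, but it differs from what an editor shows when a tab follows some spaces.  The sketch below puts the flat count next to a tab-stop-aware variant for comparison.  Both function names are hypothetical, and the tab-stop version is an alternative I am assuming might be wanted, not something the library does.

```c
#include <assert.h>

/* Flat count: each space is 1, each tab is 8, wherever it appears. */
static int indent_flat(const char *sol, const char *p)
{
        int c = 0;

        assert(sol <= p);
        for (; sol < p; sol++) {
                if (*sol == ' ')
                        c++;
                else if (*sol == '\t')
                        c += 8;
        }
        return c;
}

/* Tab-stop count: a tab advances to the next multiple of 8. */
static int indent_tabstop(const char *sol, const char *p)
{
        int c = 0;

        assert(sol <= p);
        for (; sol < p; sol++) {
                if (*sol == ' ')
                        c++;
                else if (*sol == '\t')
                        c = (c / 8 + 1) * 8;
        }
        return c;
}
```

For indents made only of tabs, or only of spaces, the two agree.  They differ only when spaces come before a tab.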
 629
 630 ### Finding the code
 631
 632 It is when looking for the code that we actually use the paragraph
 633 structure.  We need to recognise section headings so we can record the
 634 name, list paragraphs so we can ignore indented follow-on paragraphs,
 635 and the three different markings for code.
 636
 637 #### internal functions
 638
 639         static struct psection *code_find(char *pos, char *end)
 640         {
 641                 struct psection *table = NULL;
 642                 int in_list = 0;
 643                 int line_no = 1;
 644                 struct text section = {0};
 645
 646                 while (pos < end) {
 647                         if (pos[0] == '#') {
 648                                 section = take_header(pos, end);
 649                                 in_list = 0;
 650                                 pos = skip_line(pos, end);
 651                                 line_no++;
 652                         } else if (is_list(pos, end)) {
 653                                 in_list = 1;
 654                                 pos = skip_para(pos, end, &line_no);
 655                         } else if (!in_list && matches(NULL, pos, end)) {
 656                                 pos = take_code(pos, end, NULL, &table,
 657                                                 section, &line_no);
 658                         } else if (matches("```", pos, end)) {
 659                                 in_list = 0;
 660                                 pos = skip_line(pos, end);
 661                                 line_no++;
 662                                 pos = take_code(pos, end, "```", &table,
 663                                                 section, &line_no);
 664                         } else if (matches("~~~", pos, end)) {
 665                                 in_list = 0;
 666                                 pos = skip_line(pos, end);
 667                                 line_no++;
 668                                 pos = take_code(pos, end, "~~~", &table,
 669                                                 section, &line_no);
 670                         } else {
 671                                 if (!isspace((unsigned char)*pos))
 672                                         in_list = 0;
 673                                 pos = skip_para(pos, end, &line_no);
 674                         }
 675                 }
 676                 return table;
 677         }
 678
 679 ### Returning the code
 680
 681 Having found all the code blocks and gathered them into a list of
 682 sections, we are now ready to return them to the caller.  This is where
 683 to perform consistency checks, like at most one reference and at least
 684 one definition for each section.
 685
 686 All the sections with no references are returned in a list for the
 687 caller to consider.  They are linearized first so that the substructure
 688 is completely hidden -- except for the small amount of structure
 689 displayed in the line numbers.
 690
 691 To return errors, we have the caller pass a function which takes an
 692 error message - a `code_err_fn`.
 693
 694 #### exported types
 695
 696         typedef void (*code_err_fn)(char *msg);
 697
 698 #### internal functions
 699         struct section *code_extract(char *pos, char *end, code_err_fn error)
 700         {
 701                 struct psection *table;
 702                 struct section *result = NULL;
 703                 struct section *tofree = NULL;
 704
 705                 table = code_find(pos, end);
 706
 707                 while (table) {
 708                         struct psection *t = (struct psection*)table->next;
 709                         if (table->last == NULL) {
 710                                 char *msg;
 711                                 asprintf(&msg,
 712                                         "Section \"%.*s\" is referenced but not declared",
 713                                          table->section.len, table->section.txt);
 714                                 error(msg);
 715                                 free(msg);
 716                         }
 717                         if (table->refcnt == 0) {
 718                                 /* Root-section,  return it */
 719                                 table->next = result;
 720                                 result = table;
 721                                 code_linearize(result->code);
 722                         } else {
 723                                 table->next = tofree;
 724                                 tofree = table;
 725                                 if (table->refcnt > 1) {
 726                                         char *msg;
 727                                         asprintf(&msg,
 728                                                  "Section \"%.*s\" referenced multiple times (%d).",
 729                                                  table->section.len, table->section.txt,
 730                                                  table->refcnt);
 731                                         error(msg);
 732                                         free(msg);
 733                                 }
 734                         }
 735                         table = t;
 736                 }
 737                 while (tofree) {
 738                         struct section *t = tofree->next;
 739                         free(tofree);
 740                         tofree = t;
 741                 }
 742                 return result;
 743         }
 744
 745 ##### exported functions
 746
 747         struct section *code_extract(char *pos, char *end, code_err_fn error);
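##### Example: an error callback

A caller would typically pass a function that prints each message and remembers that something went wrong.  As the callback takes only the message, any state has to live in a variable the callback can see.  In this sketch `demo_report`, `demo_errs`, `demo_err_fn` and `demo_check` are all made-up names; `demo_check` merely stands in for a library routine that reports two problems.

```c
#include <assert.h>
#include <stdio.h>

static int demo_errs;

/* Same shape as `code_err_fn`: one message in, nothing returned. */
static void demo_report(char *msg)
{
        assert(msg != NULL);
        fprintf(stderr, "mdcode: %s\n", msg);
        demo_errs++;
}

typedef void (*demo_err_fn)(char *msg);

/* Stand-in for a library call that finds two problems. */
static void demo_check(demo_err_fn error)
{
        error("Section \"foo\" is referenced but not declared");
        error("Section \"bar\" referenced multiple times (2).");
}
```

After the call, a non-zero `demo_errs` tells the client not to trust the output files.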
 748
 749
## Using the library

Now that we can extract code from a document and link it all together
it is time to do something with that code.  Firstly we need to print
it out.

### Printing the Code

Printing is mostly straightforward - we just walk the list and print
the code sections, adding whatever indent is required for each line.
However there is a complication (isn't there always?).

For code that was recognised because the paragraph was indented, we
need to strip that indent first.  For other code, we don't.

The approach taken here is simple, though it could arguably be wrong
in some unlikely cases.  So it might need to be fixed later.

If the first line of a code block is indented, then either one tab or
4 spaces are stripped from every non-blank line.

This could go wrong if the first line of a code block marked by
_`` ``` ``_ is indented.  To overcome this we would need to
record some extra state in each `code_node`.  For now we won't bother.

The indents we insert will all be spaces.  This might not work well
for `Makefiles`.
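
The stripping rule described above can be sketched in isolation.  This
is an illustration only, placed under an `Example:` heading so that it
is not written to any file.  The names `indent_width` and
`strip_indent` are hypothetical and do not appear in mdcode.  Unlike
`code_print` the sketch writes to a buffer rather than a `FILE`, and it
treats the very end of the buffer slightly differently.

##### Example: stripping one level of indent

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: how many leading bytes of this line form one
 * level of markdown code indent (one tab or four spaces), or 0. */
static size_t indent_width(const char *line, size_t len)
{
	if (len >= 1 && line[0] == '\t')
		return 1;
	if (len >= 4 && memcmp(line, "    ", 4) == 0)
		return 4;
	return 0;
}

/* Copy len bytes of src to dst, dropping one indent level from the
 * start of every line.  dst must have room for len + 1 bytes.
 * Returns the number of bytes written, not counting the final nul. */
static size_t strip_indent(char *dst, const char *src, size_t len)
{
	size_t out = 0;

	while (len) {
		size_t skip = indent_width(src, len);

		src += skip;
		len -= skip;
		/* copy the rest of the line, including its newline */
		while (len) {
			char ch = *src++;

			len--;
			dst[out++] = ch;
			if (ch == '\n')
				break;
		}
	}
	dst[out] = 0;
	return out;
}
```

Only one level is removed, so any deeper indent inside the code
survives, and blank lines pass through unchanged.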

##### client functions

        static void code_print(FILE *out, struct code_node *node,
                               char *fname)
        {
                for (; node; node = node->next) {
                        char *c = node->code.txt;
                        int len = node->code.len;

                        if (!len)
                                continue;

                        fprintf(out, "#line %d \"%s\"\n",
                                node->line_no, fname);
                        while (len && *c) {
                                fprintf(out, "%*s", node->indent, "");
                                if (node->needs_strip) {
                                        /* drop one tab or 4 spaces of markdown indent */
                                        if (*c == '\t' && len > 1) {
                                                c++;
                                                len--;
                                        } else if (len > 4 && strncmp(c, "    ", 4) == 0) {
                                                c += 4;
                                                len -= 4;
                                        }
                                }
                                /* copy the rest of the line, including the newline */
                                do {
                                        fputc(*c, out);
                                        c++;
                                        len--;
                                } while (len && c[-1] != '\n');
                        }
                }
        }

### Bringing it all together

We are just about ready for the `main` function of the tool which will
extract all this lovely code so that it can be compiled.  Just one
helper is still needed.

#### Handling filenames

Section names are stored in `struct text` which is not `nul`
terminated.  Filenames passed to `fopen` need to be `nul` terminated.
So we need to convert one to the other, and strip the leading `File:`
off while we are at it.

##### client functions

        static void copy_fname(char *name, int space, struct text t)
        {
                char *sec = t.txt;
                int len = t.len;
                name[0] = 0;
                if (len < 5 || strncmp(sec, "File:", 5) != 0)
                        return;
                sec += 5;
                len -= 5;
                while (len && sec[0] == ' ') {
                        sec++;
                        len--;
                }
                if (len >= space)
                        len = space - 1;
                strncpy(name, sec, len);
                name[len] = 0;
        }
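
To see what `copy_fname` produces for a few headings, the sketch below
repeats it so that it compiles on its own.  It sits under an `Example:`
heading so it is not extracted.  The `struct text` here is a stand-in
that assumes only the two fields used above, and `fname_of` is a
hypothetical wrapper that is not part of mdcode.

##### Example: turning a heading into a file name

```c
#include <string.h>

/* Stand-in for mdcode's counted string: only the two fields
 * that copy_fname touches are assumed here. */
struct text {
	char *txt;
	int len;
};

/* Same logic as copy_fname above, repeated so this compiles alone. */
static void copy_fname(char *name, int space, struct text t)
{
	char *sec = t.txt;
	int len = t.len;

	name[0] = 0;
	if (len < 5 || strncmp(sec, "File:", 5) != 0)
		return;		/* not a "File:" heading: empty name */
	sec += 5;
	len -= 5;
	while (len && sec[0] == ' ') {
		sec++;
		len--;
	}
	if (len >= space)
		len = space - 1;	/* truncate to fit the buffer */
	strncpy(name, sec, len);
	name[len] = 0;
}

/* Hypothetical wrapper: wrap a C string as a struct text and convert. */
static void fname_of(char *name, int space, char *heading)
{
	struct text t = { heading, (int)strlen(heading) };

	copy_fname(name, space, t);
}
```

With a 16 byte buffer, `fname_of(name, 16, "File: mdcode.c")` leaves
`mdcode.c` in `name`, while a heading such as `client functions`
leaves it empty.  A name that is too long is silently truncated to fit.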

#### Main

And now we take a single file name, extract the code, and if there are
no errors we write out a file for each appropriate code section.  And
we are done.
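
The tool reads its input by mapping the whole file into memory.  A more
defensive way to do that is sketched below, under an `Example:` heading
so it is not extracted.  `map_file` is a hypothetical helper, not
something mdcode defines, and it assumes a POSIX system.  It uses
`fstat` for the size and `MAP_PRIVATE` for the mapping, which is a
choice made for this sketch only.

##### Example: mapping a file read-only

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map all of 'path' read-only.
 * On success returns the mapping and stores its size in *len.
 * Returns NULL if the file cannot be opened, is empty, or
 * cannot be mapped. */
static char *map_file(const char *path, size_t *len)
{
	struct stat st;
	char *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping remains valid after close */
	if (p == MAP_FAILED)
		return NULL;
	*len = st.st_size;
	return p;
}
```

The caller would pass the result and `result + len` on to
`code_extract`, and can release the mapping with `munmap` once the
sections have been written out.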


##### client includes

        #include <fcntl.h>
        #include <errno.h>
        #include <sys/mman.h>
        #include <string.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

##### client functions

        static int errs;
        static void pr_err(char *msg)
        {
                errs++;
                fprintf(stderr, "%s\n", msg);
        }

        int main(int argc, char *argv[])
        {
                int fd;
                size_t len;
                char *file;
                struct section *table, *s, *prev;

                errs = 0;
                if (argc != 2) {
                        fprintf(stderr, "Usage: mdcode file.mdc\n");
                        exit(2);
                }
                fd = open(argv[1], O_RDONLY);
                if (fd < 0) {
                        fprintf(stderr, "mdcode: cannot open %s: %s\n",
                                argv[1], strerror(errno));
                        exit(1);
                }
                len = lseek(fd, 0, SEEK_END);
                file = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
                if (file == MAP_FAILED) {
                        fprintf(stderr, "mdcode: cannot map %s: %s\n",
                                argv[1], strerror(errno));
                        exit(1);
                }
                table = code_extract(file, file+len, pr_err);

                for (s = table; s;
                        (code_free(s->code), prev = s, s = s->next, free(prev))) {
                        FILE *fl;
                        char fname[1024];
                        if (strncmp(s->section.txt, "Example:", 8) == 0)
                                continue;
                        if (strncmp(s->section.txt, "File:", 5) != 0) {
                                fprintf(stderr, "Unreferenced section is not a file name: %.*s\n",
                                        s->section.len, s->section.txt);
                                errs++;
                                continue;
                        }
                        copy_fname(fname, sizeof(fname), s->section);
                        if (fname[0] == 0) {
                                fprintf(stderr, "Missing file name at:%.*s\n",
                                        s->section.len, s->section.txt);
                                errs++;
                                continue;
                        }
                        fl = fopen(fname, "w");
                        if (!fl) {
                                fprintf(stderr, "Cannot create %s: %s\n",
                                        fname, strerror(errno));
                                errs++;
                                continue;
                        }
                        code_print(fl, s->code, argv[1]);
                        fclose(fl);
                }
                exit(!!errs);
        }
