Thursday, December 19, 2013

Reentrant parsers with Flex and Bison

By default, Flex and Bison generate old-school code that keeps its state in global variables. Trawling the manuals to find the options that generate reentrant code is tedious, so I’m recording a small example that works on my system (Bison 2.7.12 and Flex 2.5.35).

Flex preamble

With the options below, yylval becomes a pointer to the semantic value rather than the value itself. When converting existing Flex source, we mostly replace yylval with *yylval.

%option outfile="flex.c" header-file="flex.h"
%option reentrant bison-bridge
%option noyywrap nounput noinput

%{
#include "bison.h"
%}

Bison preamble

%output  "bison.c"
%defines "bison.h"
%define api.pure full
%lex-param { yyscan_t scanner }
%parse-param { yyscan_t scanner }
%parse-param { val_callback_t callback }

%code requires {
#include "val.h"
#define YYSTYPE val_ptr
#ifndef YY_TYPEDEF_YY_SCANNER_T
#define YY_TYPEDEF_YY_SCANNER_T
typedef void *yyscan_t;
#endif
}

%code {
#include "flex.h"
int yyerror(yyscan_t scanner, val_callback_t callback, const char *msg) {
  return 0;
}
}

val.h: semantic values

Rather than use the %union Bison declaration or similar, I prefer to define the type that holds the semantic values in a plain C header. In general, I like to minimize the amount of C in the Bison and Flex source.

enum {
  T_INT,
  T_STRING,
};

struct val_s {
  int type;
  struct {
    char *s;
    struct val_s **kid;
    int nkid;
  };
};
typedef struct val_s *val_ptr;
typedef int (*val_callback_t)(val_ptr);

Calling the parser

Because the scanner state no longer lives in global variables, we must create a yyscan_t variable, initialize it, and pass it to both Bison and Flex.

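// The callback receives the root of the parsed tree. It is defined at file
// scope because nested functions are a GNU C extension.
// val_print_pre, val_print_tree and val_free are defined elsewhere in the program.
static int f(struct val_s *v) {
  val_print_pre(v);
  putchar('\n');
  val_print_tree("", v);
  val_free(v);
  return 0;
}

// Inside main():
yyscan_t scanner;
if (yylex_init(&scanner)) exit(1);
YY_BUFFER_STATE buf = NULL;
// Uncomment to parse from a string instead of standard input.
// buf = yy_scan_string("input string", scanner);
if (yyparse(scanner, f)) exit(1);
yy_delete_buffer(buf, scanner);  // a NULL buffer is ignored
yylex_destroy(scanner);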

Complete example

See https://github.com/blynn/symple/, which reads an expression and pretty-prints it:

$ ./main 'sin(x)*cos(y) + e^x'
+(*(sin(x), cos(y)), ^(e, x))
+─┬─*─┬─sin───x
  │   └─cos───y
  └─^─┬─e
      └─x