Thursday, December 19, 2013

Reentrant parsers with Flex and Bison

By default, Flex and Bison generate old-school code with global variables. Trawling the manuals to find the options that generate re-entrant code is tedious, so I’m recording a small example that works on my system (which has Bison 2.7.12 and Flex 2.5.35).

Flex preamble

With these options, yylval is now a pointer. When converting existing Flex source, we mostly replace with yylval with *yylval.

%option outfile="flex.c" header-file="flex.h"
%option reentrant bison-bridge
%option noyywrap nounput noinput

%{
#include "bison.h"
%}

Bison preamble

%output  "bison.c"
%defines "bison.h"
%define api.pure full
%lex-param { yyscan_t scanner }
%parse-param { yyscan_t scanner }
%parse-param { val_callback_t callback }

%code requires {
#include "val.h"
#define YYSTYPE val_ptr
#ifndef YY_TYPEDEF_YY_SCANNER_T
#define YY_TYPEDEF_YY_SCANNER_T
typedef void *yyscan_t;
#endif
}

%code {
#include "flex.h"
int yyerror(yyscan_t scanner, val_callback_t callback, const char *msg) {
return 0;
}
}

val.h: semantic values

Rather than use the %union Bison declaration or similar, I prefer to define the type that holds the semantic values in a C source file. In general, I like to minimize the amount of C in the Bison and Flex source.

enum {
T_INT,
T_STRING,
};

struct val_s {
int type;
struct {
char *s;
struct val_s **kid;
int nkid;
};
};
typedef struct val_s *val_ptr;
typedef int (*val_callback_t)(val_ptr);

Calling the parser

Because the parser is no longer global, we must initialize and pass a yyscan_t variable to Bison and Flex.

  yyscan_t scanner;
if (yylex_init(&scanner)) exit(1);
YY_BUFFER_STATE buf = NULL;
// Uncomment to parse from a string instead of standard input.
// buf = yy_scan_string("input string", scanner);
int f(struct val_s *v) {
val_print_pre(v);
putchar('\n');
val_print_tree("", v);
val_free(v);
return 0;
}
if (yyparse(scanner, f)) exit(1);
yy_delete_buffer(buf, scanner);
yylex_destroy(scanner);

Complete example

See https://github.com/blynn/symple/, which reads an expression and pretty-prints it:

$ ./main 'sin(x)*cos(y) + e^x'
+(*(sin(x), cos(y)), ^(e, x))
+─┬─*─┬─sin───x
│ └─cos───y
└─^─┬─e
└─x

2 comments:

Franklin Chen said...

Blast from the past! I haven't used Flex or Bison since almost 20 years ago when I used them for in-house tools and products involving parsing. I remember being happy about the Flex "reentrant" option, and then also getting into C++ and using C++ for both Flex and Bison to create nicely encapsulated parsers. I should still have all that code around somewhere backed up. In 1995 I did switch away from C/C++ when possible though, and ended up creating my in-house parsing apps in ML (mlyacc, mllex) and then Haskell (Happy) instead... And now I'm still parsing, but using Scala. It never ends.

Ben Lynn said...

My experiences have been similar, e.g. I wrote a goyacc-compatible lexer for Go (http://cs.stanford.edu/~blynn/nex). But I still use C when I want speed.