home mail me! RSS (2.0) feed

BNFC: language-agnostic parser generator generator

Parsing code is hard, and strictly limited to The Chosen Few. Right? No, wrong!

I know you are aware of some compiler compilers or parser generators out there – i.e., frameworks that let you specify annotated grammar descriptions, often using a BNF (Backus Naur Form) kind of syntax. The most famous parser generator is Yacc, which together with its more literal brother, Lex, constitute a framework for generating a C application that parses a described grammar, along with custom actions (such as actually interpreting the code…) For GNU aficionados out there, look at their alternatives Bison and Flex.

A description for a simple arithmetic language could look like something like this, when using Yacc (or Bison):

BISON:
%left ‘+’ ‘-‘
%left ‘*’ ‘/’

program:
       program stat
     |
     ;
stat:
       PRINT exp ‘;’ { printf(“%d”, $2); }
     ;
exp:
       NUM     { $$ = $1; }
     | exp ‘+’ exp { $$ = $1 + $2; }
     | exp ‘-‘ exp { $$ = $1 - $2; }
     | exp ‘*’ exp { $$ = $1 * $2; }
     | exp ‘/’ exp { $$ = $1 / $2; }
     ;

Clearly much easier than hand-coding the proper automaton for parsing and acting on this arithmetic language. But, note the “something like” above, i.e., there is still a lot of details to work out, such as specifying the token generator (often called a lexical analyzer) in a separate file and tool and declare certain global variables. And, you need to be totally comfortable with C.

Ok, there are similar parser generators for other languages, such as JavaCC for Java, and why not the eminent ANTLR tool, which transcends language barriers?

The intention of this post is to open your eyes to a quite different bread of parser generator, which is actually a parser generator generator – no, I am not studdering. That tool is BNFC, which takes a language-agnostic BNF description and generates the proper skeleton code for various parser generators, such as JavaCC and Yacc. Thus, the same input file can produce parsers implemented in various languages, using existing parser generator frameworks.

Input for the arithmetic toy language could look like:

BNFC:
Prog. PROG ::= [STAT] ;

separator STAT ";" ;
Print. STAT ::= "print" EXP ;
Add.   EXP  ::= EXP "+" EXP2 ;
Sub.   EXP  ::= EXP "-" EXP2 ;
Mult.  EXP2 ::= EXP2 "*" EXP3 ;
Div.   EXP2 ::= EXP2 "/" EXP3 ;
Num.   EXP3 ::= Integer ;
coercions EXP 3 ;

BNFC itself is implemented in Haskell and initially aimed at Haskell parser generation, using Happy, but can be used to create Yacc input.

Run the following command for the above input:

bnfc –c arith.cf

and you get almost the Yacc input above plus all that stuff you do not want to deal with.

I am currently using BNFC for a Haskell-based interpreter for its cousin language Erlang. Will give more feedback about BNFC in that project.

Enjoy parsing!

Leave a Comment

You must be logged in to post a comment.