When diving down among the various parser generators and compiler tools around, there is another one – beside BNFC – that sticks out: Spirit. A template-oriented library that is part of the terrific Boost libraries (http://www.boost.org.)
Whereas most tools rely on code generation, and even code generation generation (BNFC…), Spirit is a metaprogramming library that describes runtime code by the means available to a C++ compiler; i.e., the “code generation” is intrinsic and never leaves the target language. If this sounded complicated, just think of it as “doing cool stuff within the language itself”
One drawback of doing it in the language itself is that the language has a specific evaluation order and so does the grammar. This implication is at least true for systems built without an explicit continuation style. What this means for the language scientist is that the parser is recursive descent parser, so the grammar cannot be left recursive, since that would lead to an infinite recursion, which most machines fear more than anything. This in comparison with most code generating parser frameworks, which yield freer parsers, so called LALR parsers.
Let us leave computer science and just see what that this means for the Pragmatic Language Implementer.
The simple arithmetic language in the former post would look something like
-
exp_p = term_p >>
-
*(‘+’>> term_p | ‘-‘>> term_p);
-
term_p = factor_p >>
-
*(‘*’>> factor_p | ‘/’>> factor_p);
-
factor_p = int_p | ‘(‘>> exp_p>> ‘)’;
Yes, this is C++. What Spirit has done is to make extensive (ab-?)use of the operator overloading capabilities of C++, so the “repeat” or “closure” operator (‘*’) is placed before the expression to repeat instead of after as in EBNF (extended BNF) and regular expressions.
In order to actually parse something, one does
-
parse(“5*(7*8)”, exp_p);
which returns some matching information, such as whether the parsing succeeded.
What is a parser without action? Not much. So Spirit allows actions to be added to the rules in the form of the C++ index operator, such as in
-
exp_p = term_p[store_val]>> *(‘+’>> term_p[add_val] | …
where “store_val” and “add_val” are user-defined functions taking a sub string as parameter.
If you do not want to do everything in place, you can let Spirit create an Abstract Syntax Tree (AST) for you, which you can later manipulate. This is a more realistic way to deal with real languages than calculating on the fly.
This post ends with a complete sample using ASTs for the aforementioned arithmetic language. You need to install Boost and add it to the include path for the compiler.
Happy inline metaprogramming!
/* File: parse_arith.cpp Author: David Bergman This file describes a simple arithmetic language using Boost.Spirit. It does so by creating an AST and then implementing an evaluating visitor for such nodes. Not really a visitor, since it handles the traversal itself, but you get the point... */ #include <string> #include <iostream> #include <map> #include <boost/lexical_cast.hpp> #include <boost/function.hpp> #include <boost/spirit/core.hpp> #include <boost/spirit/tree/ast.hpp> using namespace std; using namespace boost::spirit; using namespace boost; typedef tree_match<const char*> TreeMatch; typedef TreeMatch::tree_iterator TreeIter; // We use Abstract Syntax Trees struct expression : public grammar<expression> { // Need explicit identifiers to switch properly // when evaluating the AST static const int factorID = 1; static const int termID = 2; static const int expID = 3; // Meta function from scanner type to a proper // rule type template<typename ScannerT> struct definition { rule<ScannerT, parser_context<>, parser_tag<expID> > exp_p; rule<ScannerT, parser_context<>, parser_tag<factorID> > factor_p; rule<ScannerT, parser_context<>, parser_tag<termID> > term_p; definition(const expression& self) { factor_p = leaf_node_d[int_p] | inner_node_d['(' >> exp_p >> ')']; term_p = factor_p >> *(root_node_d[ch_p('*')] >> factor_p | root_node_d[ch_p('/')] >> factor_p); exp_p = term_p >> *(root_node_d[ch_p('+')] >> term_p | root_node_d[ch_p('-')] >> term_p); BOOST_SPIRIT_DEBUG_RULE(factor_p); BOOST_SPIRIT_DEBUG_RULE(term_p); BOOST_SPIRIT_DEBUG_RULE(exp_p); } const rule<ScannerT, parser_context<>, parser_tag<expID> >& start() const { return exp_p; } }; }; // Map operators to operations static map<char, function<int (int, int)> > op; // The evaluation function for the ASTs static int eval_expression(const TreeIter& i); int evaluate(tree_parse_info<> match) { return eval_expression(match.trees.begin()); } int eval_expression(const TreeIter& i) { return (i->value.id() == expression::factorID) ? // Simple numeric literal lexical_cast<int>(string(i->value.begin(), i->value.end())) : // A dyadic operation op[*i->value.begin()] (eval_expression(i->children.begin()), eval_expression(i->children.begin() + 1)); } int main(int argc, char* argv[]) { // Init the operations op['*'] = multiplies<int>(); op['/'] = divides<int>(); op['+'] = plus<int>(); op['-'] = minus<int>(); const string input(argv[1]); expression my_exp; tree_parse_info<const char*> tree = ast_parse(input.c_str(), my_exp); if (tree.full) { cout << "result is " << evaluate(tree) << endl; } }
Download this code: parse_arith.cpp
bnf boost C++ compiler metaprogramming parsing spirit templates Tools Reviews