home mail me! RSS (2.0) feed

Spirit: parsing without leaving C++

When diving down among the various parser generators and compiler tools around, there is another one – beside BNFC – that sticks out: Spirit. A template-oriented library that is part of the terrific Boost libraries (http://www.boost.org.)

Whereas most tools rely on code generation, and even code generation generation (BNFC…), Spirit is a metaprogramming library that describes runtime code by the means available to a C++ compiler; i.e., the “code generation” is intrinsic and never leaves the target language. If this sounded complicated, just think of it as “doing cool stuff within the language itself” :-)

One drawback of doing it in the language itself is that the language has a specific evaluation order and so does the grammar. This implication is at least true for systems built without an explicit continuation style. What this means for the language scientist is that the parser is recursive descent parser, so the grammar cannot be left recursive, since that would lead to an infinite recursion, which most machines fear more than anything. This in comparison with most code generating parser frameworks, which yield freer parsers, so called LALR parsers.

Let us leave computer science and just see what that this means for the Pragmatic Language Implementer.

The simple arithmetic language in the former post would look something like

C++:
  1. exp_p = term_p >>
  2.   *(‘+’>> term_p | ‘-‘>> term_p);
  3. term_p = factor_p >>
  4.   *(‘*’>> factor_p | ‘/’>> factor_p);
  5. factor_p = int_p | ‘(‘>> exp_p>> ‘)’;

Yes, this is C++. What Spirit has done is to make extensive (ab-?)use of the operator overloading capabilities of C++, so the “repeat” or “closure” operator (‘*’) is placed before the expression to repeat instead of after as in EBNF (extended BNF) and regular expressions.

In order to actually parse something, one does

C++:
  1. parse(5*(7*8)”, exp_p);

which returns some matching information, such as whether the parsing succeeded.

What is a parser without action? Not much. So Spirit allows actions to be added to the rules in the form of the C++ index operator, such as in

C++:
  1. exp_p = term_p[store_val]>> *(‘+’>> term_p[add_val] | …

where “store_val” and “add_val” are user-defined functions taking a sub string as parameter.

If you do not want to do everything in place, you can let Spirit create an Abstract Syntax Tree (AST) for you, which you can later manipulate. This is a more realistic way to deal with real languages than calculating on the fly.

This post ends with a complete sample using ASTs for the aforementioned arithmetic language. You need to install Boost and add it to the include path for the compiler.

Happy inline metaprogramming!

/*
 
File: parse_arith.cpp
Author: David Bergman
 
This file describes a simple arithmetic language
using Boost.Spirit.
 
It does so by creating an AST and then implementing
an evaluating visitor for such nodes. Not really a
visitor, since it handles the traversal itself, but
you get the point...
 
*/
 
#include <string>
#include <iostream>
#include <map>
#include <boost/lexical_cast.hpp>
#include <boost/function.hpp>
#include <boost/spirit/core.hpp>
#include <boost/spirit/tree/ast.hpp>
 
using namespace std;
using namespace boost::spirit;
using namespace boost;
 
typedef tree_match<const char*> TreeMatch;
typedef TreeMatch::tree_iterator TreeIter;
 
// We use Abstract Syntax Trees
 
struct expression : public grammar<expression>
{
  // Need explicit identifiers to switch properly
  // when evaluating the AST
  static const int factorID = 1;
  static const int termID = 2;
  static const int expID = 3;
 
  // Meta function from scanner type to a proper
  // rule type
  template<typename ScannerT>
  struct definition
  {
    rule<ScannerT, parser_context<>,
	 parser_tag<expID> > exp_p;
    rule<ScannerT, parser_context<>,
	 parser_tag<factorID> > factor_p;
    rule<ScannerT, parser_context<>,
	 parser_tag<termID> > term_p;
    definition(const expression& self) {
      factor_p =
	leaf_node_d[int_p] |
	inner_node_d['(' >> exp_p >> ')'];
      term_p =
	factor_p >> *(root_node_d[ch_p('*')] >>
		      factor_p
		      |
		      root_node_d[ch_p('/')] >>
		      factor_p);
      exp_p = term_p >>
	*(root_node_d[ch_p('+')] >> term_p |
	  root_node_d[ch_p('-')] >> term_p);
      BOOST_SPIRIT_DEBUG_RULE(factor_p);
      BOOST_SPIRIT_DEBUG_RULE(term_p);
      BOOST_SPIRIT_DEBUG_RULE(exp_p);
    }

    const rule<ScannerT, parser_context<>,
	       parser_tag<expID> >&
    start()
      const {
      return exp_p;
    }
  };
};
 
// Map operators to operations
 
static map<char, function<int (int, int)> > op;
 
// The evaluation function for the ASTs
 
static int eval_expression(const TreeIter& i);
 
int evaluate(tree_parse_info<> match)
{
  return eval_expression(match.trees.begin());
}
 
int eval_expression(const TreeIter& i)
{
  return
    (i->value.id() == expression::factorID) ?
    // Simple numeric literal
    lexical_cast<int>(string(i->value.begin(),
			     i->value.end())) :
    // A dyadic operation
    op[*i->value.begin()]
    (eval_expression(i->children.begin()),
     eval_expression(i->children.begin() + 1));
}
 
int main(int argc, char* argv[])
{
  // Init the operations
 
  op['*'] = multiplies<int>();
  op['/'] = divides<int>();
  op['+'] = plus<int>();
  op['-'] = minus<int>();
 
  const string input(argv[1]);
  expression my_exp;
  tree_parse_info<const char*> tree =
    ast_parse(input.c_str(), my_exp);
  if (tree.full) {
    cout << "result is " << evaluate(tree)
	 << endl;
  }
}

Download this code: parse_arith.cpp

jdy said,

February 26, 2008 @ 12:59 pm

David,

I adjusted the grammar to support exponentiation and negation which are both right-to-left. I also added expect clauses to get better information on why a parse fails. Here is the new grammar:

factor_p = // numbers or parentheticals:
leaf_node_d[ureal_p]
| (inner_node_d[’(’ >> expectexpression(start_p) >> expectclose(ch_p(’)'))]);
power_p = // exponentials (right-to-left)
(factor_p >> root_node_d[ch_p(’^')]) >> expectexpression(unary_p) | factor_p;
unary_p = // unary operators (right-to-left)
(root_node_d[ch_p(’-')]) >> expectexpression(unary_p) | power_p;
term_p = // multiplicatives (left-to-right)
unary_p >> ( *(root_node_d[ch_p(’x')|’/'] >> expectexpression(unary_p)) ) ;
exp_p = // additives (left-to-right)
term_p >> *(root_node_d[ch_p(’+')|’-'] >> expectexpression(term_p));
start_p = expectexpression(exp_p);

Would you be willing to release this code into a boost, GPL, or MIT style license? I’d like to use it in a class and put it on a course website.

ps. I posted a similar comment, but it didn’t post so resubmitting.

Thanks,

Joel

davber said,

February 26, 2008 @ 1:45 pm

Joel,

I should change the snippets to include a reference to the license type used. Till then, I grant an MIT license to the above snippet (is this enough?) For completeness (or at least self-containment…), I include the full snippet herein, with the license reference added.

Thanks,

David

———————————————

/*

File: parse_arith.cpp
Author: David Bergman

This file describes a simple arithmetic language
using Boost.Spirit.

It does so by creating an AST and then implementing
an evaluating visitor for such nodes. Not really a
visitor, since it handles the traversal itself, but
you get the point…

License Information (MIT License):

Copyright (c) 2006-2008 David Bergman

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the “Software”), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

*/

#include
#include
#include
#include
#include

#include

#include

using namespace std;
using namespace boost::spirit;
using namespace boost;

typedef tree_match TreeMatch;
typedef TreeMatch::tree_iterator TreeIter;

// We use Abstract Syntax Trees

struct expression : public grammar
{
// Need explicit identifiers to switch properly
// when evaluating the AST
static const int factorID = 1;
static const int termID = 2;
static const int expID = 3;

// Meta function from scanner type to a proper
// rule type
template
struct definition
{
rule,
parser_tag > exp_p;
rule,
parser_tag > factor_p;
rule,
parser_tag > term_p;
definition(const expression& self) {
factor_p =
leaf_node_d[int_p] |
inner_node_d[’(’ >> exp_p >> ‘)’];
term_p =
factor_p >> *(root_node_d[ch_p(’*')] >>
factor_p
|
root_node_d[ch_p(’/')] >>
factor_p);
exp_p = term_p >>
*(root_node_d[ch_p(’+')] >> term_p |
root_node_d[ch_p(’-')] >> term_p);
BOOST_SPIRIT_DEBUG_RULE(factor_p);
BOOST_SPIRIT_DEBUG_RULE(term_p);
BOOST_SPIRIT_DEBUG_RULE(exp_p);
}

const rule,
parser_tag >&
start()
const {
return exp_p;
}
};
};

// Map operators to operations

static map > op;

// The evaluation function for the ASTs

static int eval_expression(const TreeIter& i);

int evaluate(tree_parse_info<> match)
{
return eval_expression(match.trees.begin());
}

int eval_expression(const TreeIter& i)
{
return
(i->value.id() == expression::factorID) ?
// Simple numeric literal
lexical_cast(string(i->value.begin(),
i->value.end())) :
// A dyadic operation
op[*i->value.begin()]
(eval_expression(i->children.begin()),
eval_expression(i->children.begin() + 1));
}

int main(int argc, char* argv[])
{
// Init the operations

op[’*'] = multiplies();
op[’/'] = divides
();
op[’+'] = plus
();
op[’-'] = minus
();

const string input(argv[1]);
expression my_exp;
tree_parse_info tree =
ast_parse(input.c_str(), my_exp);
if (tree.full) {
cout << “result is ” << evaluate(tree)
<< endl;
}
}

RSS feed for comments on this post · TrackBack URI

Leave a Comment

You must be logged in to post a comment.