Sophie

Sophie

distrib > Mandriva > 2010.0 > i586 > media > contrib-release > by-pkgid > 8de1f55ea6a1a64d0f3f3ea116288458 > files > 117

happy-1.17-3mdv2009.0.i586.rpm

<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter 2. Using Happy</title><link rel="stylesheet" href="fptools.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="Happy User Guide"><link rel="up" href="index.html" title="Happy User Guide"><link rel="prev" href="sec-obtaining.html" title="1.4. Obtaining Happy"><link rel="next" href="sec-sequences.html" title="2.2. Parsing sequences"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 2. Using <span class="application">Happy</span></th></tr><tr><td width="20%" align="left"><a accesskey="p" href="sec-obtaining.html">Prev</a> </td><th width="60%" align="center"> </th><td width="20%" align="right"> <a accesskey="n" href="sec-sequences.html">Next</a></td></tr></table><hr></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="sec-using"></a>Chapter 2. Using <span class="application">Happy</span></h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="sec-using.html#sec-other-datatypes">2.1. Returning other datatypes</a></span></dt><dt><span class="sect1"><a href="sec-sequences.html">2.2. Parsing sequences</a></span></dt><dd><dl><dt><span class="sect2"><a href="sec-sequences.html#sec-separators">2.2.1. Sequences with separators</a></span></dt></dl></dd><dt><span class="sect1"><a href="sec-Precedences.html">2.3. Using Precedences</a></span></dt><dd><dl><dt><span class="sect2"><a href="sec-Precedences.html#how-precedence-works">2.3.1. How precedence works</a></span></dt><dt><span class="sect2"><a href="sec-Precedences.html#context-precedence">2.3.2. Context-dependent Precedence</a></span></dt></dl></dd><dt><span class="sect1"><a href="sec-type-signatures.html">2.4. Type Signatures</a></span></dt><dt><span class="sect1"><a href="sec-monads.html">2.5. Monadic Parsers</a></span></dt><dd><dl><dt><span class="sect2"><a href="sec-monads.html#sec-exception">2.5.1. Handling Parse Errors</a></span></dt><dt><span class="sect2"><a href="sec-monads.html#sec-lexers">2.5.2. Threaded Lexers</a></span></dt><dd><dl><dt><span class="sect3"><a href="sec-monads.html#id2546160">2.5.2.1. Monadic productions with %lexer</a></span></dt></dl></dd><dt><span class="sect2"><a href="sec-monads.html#sec-line-numbers">2.5.3. Line Numbers</a></span></dt><dt><span class="sect2"><a href="sec-monads.html#sec-monad-summary">2.5.4. Summary</a></span></dt></dl></dd><dt><span class="sect1"><a href="sec-error.html">2.6. The Error Token</a></span></dt><dt><span class="sect1"><a href="sec-multiple-parsers.html">2.7. Generating Multiple Parsers From a Single Grammar</a></span></dt></dl></div><p> Users of <span class="application">Yacc</span> will find
  <span class="application">Happy</span> quite familiar.  The basic idea is
  as follows: </p><div class="itemizedlist"><ul type="disc"><li><p>Define the grammar you want to parse in a
      <span class="application">Happy</span> grammar file. </p></li><li><p> Run the grammar through <span class="application">Happy</span>, to generate
      a compilable Haskell module.</p></li><li><p> Use this module as part of your Haskell program, usually
      in conjunction with a lexical analyser (a function that splits
      the input into ``tokens'', the basic unit of parsing).</p></li></ul></div><p> Let's run through an example.  We'll implement a parser for a
  simple expression syntax, consisting of integers, variables, the
  operators <code class="literal">+</code>, <code class="literal">-</code>, <code class="literal">*</code>,
  <code class="literal">/</code>, and the form <code class="literal">let var = exp in exp</code>.
  The grammar file starts off like this:</p><pre class="programlisting">
{
module Main where
}
</pre><p>At the top of the file is an optional <em class="firstterm">module
    header</em>,
      <a class="indexterm" name="id2495340"></a>
    which is just a Haskell module header enclosed in braces.  This
    code is emitted verbatim into the generated module, so you can put
    any Haskell code here at all.  In a grammar file, Haskell code is
    always contained between curly braces to distinguish it from the
    grammar.</p><p>In this case, the parser will be a standalone program so
    we'll call the module <code class="literal">Main</code>.</p><p>Next comes a couple of declarations:</p><pre class="programlisting">
%name calc
%tokentype { Token }
%error { parseError }
</pre><a class="indexterm" name="id2495380"></a><a class="indexterm" name="id2495392"></a><a class="indexterm" name="id2495404"></a><p>The first line declares the name of the parsing function
    that <span class="application">Happy</span> will generate, in this case
    <code class="literal">calc</code>.  In many cases, this is the only symbol you need
    to export from the module.</p><p>The second line declares the type of tokens that the parser
    will accept.  The parser (i.e. the function
    <code class="function">calc</code>) will be of type <code class="literal">[Token] -&gt;
    T</code>, where <code class="literal">T</code> is the return type of the
    parser, determined by the production rules below.</p><p>The <code class="literal">%error</code> directive tells Happy the name
    of a function it should call in the event of a parse error.  More
    about this later.</p><p>Now we declare all the possible tokens:</p><pre class="programlisting">
%token 
      let             { TokenLet }
      in              { TokenIn }
      int             { TokenInt $$ }
      var             { TokenVar $$ }
      '='             { TokenEq }
      '+'             { TokenPlus }
      '-'             { TokenMinus }
      '*'             { TokenTimes }
      '/'             { TokenDiv }
      '('             { TokenOB }
      ')'             { TokenCB }
</pre><a class="indexterm" name="id2495486"></a><p>The symbols on the left are the tokens as they will be
    referred to in the rest of the grammar, and to the right of each
    token enclosed in braces is a Haskell pattern that matches the
    token.  The parser will expect to receive a stream of tokens, each
    of which will match one of the given patterns (the definition of
    the <code class="literal">Token</code> datatype is given later).</p><p>The <code class="literal">$$</code> symbol is a placeholder that
    represents the <span class="emphasis"><em>value</em></span> of this token.  Normally the value
    of a token is the token itself, but by using the
    <code class="literal">$$</code> symbol you can specify some component
    of the token object to be the value. </p><a class="indexterm" name="id2495536"></a><p>Like yacc, we include <code class="literal">%%</code> here, for no real
    reason.</p><pre class="programlisting">
%%
</pre><p>Now we have the production rules for the grammar.</p><pre class="programlisting">
Exp   : let var '=' Exp in Exp  { Let $2 $4 $6 }
      | Exp1                    { Exp1 $1 }

Exp1  : Exp1 '+' Term           { Plus $1 $3 }
      | Exp1 '-' Term           { Minus $1 $3 }
      | Term                    { Term $1 }

Term  : Term '*' Factor         { Times $1 $3 }
      | Term '/' Factor         { Div $1 $3 }
      | Factor                  { Factor $1 }

Factor			  
      : int                     { Int $1 }
      | var                     { Var $1 }
      | '(' Exp ')'             { Brack $2 }
</pre><a class="indexterm" name="id2495564"></a><p>Each production consists of a <em class="firstterm">non-terminal</em>
    symbol on the left, followed by a colon, followed by one or more
    expansions on the right, separated by <code class="literal">|</code>.  Each expansion
    has some Haskell code associated with it, enclosed in braces as
    usual.</p><p>The way to think about a parser is with each symbol having a
    `value': we defined the values of the tokens above, and the
    grammar defines the values of non-terminal symbols in terms of
    sequences of other symbols (either tokens or non-terminals).  In a
    production like this:</p><pre class="programlisting">
n   : t_1 ... t_n   { E }
</pre><p>whenever the parser finds the symbols <code class="literal">t_1..t_n</code> in
    the token stream, it constructs the symbol <code class="literal">n</code> and gives
    it the value <code class="literal">E</code>, which may refer to the values of
    <code class="literal">t_1...t_n</code> using the symbols
    <code class="literal">$1...$n</code>.</p><p>The parser reduces the input using the rules in the grammar
    until just one symbol remains: the first symbol defined in the
    grammar (namely <code class="literal">Exp</code> in our example).  The value of this
    symbol is the return value from the parser.</p><p>To complete the program, we need some extra code.  The
    grammar file may optionally contain a final code section, enclosed
    in curly braces.</p><pre class="programlisting">{</pre><p>All parsers must include a function to be called in the
    event of a parse error.  In the <code class="literal">%error</code>
    directive earlier, we specified that the function to be called on
    a parse error is <code class="literal">parseError</code>:</p><pre class="programlisting">
parseError :: [Token] -&gt; a
parseError _ = error "Parse error"
</pre><p>Note that <code class="literal">parseError</code> must be polymorphic
    in its return type <code class="literal">a</code>, which usually means it
    must be a call to <code class="literal">error</code>.  We'll see in <a class="xref" href="sec-monads.html" title="2.5. Monadic Parsers">Section 2.5, &#8220;Monadic Parsers&#8221;</a> how to wrap the parser in a monad so that we
    can do something more sensible with errors.  It's also possible to
    keep track of line numbers in the parser for use in error
    messages, this is described in <a class="xref" href="sec-monads.html#sec-line-numbers" title="2.5.3. Line Numbers">Section 2.5.3, &#8220;Line Numbers&#8221;</a>.</p><p>Next we can declare the data type that represents the parsed
    expression:</p><pre class="programlisting">
data Exp  
      = Let String Exp Exp
      | Exp1 Exp1
      deriving Show

data Exp1 
      = Plus Exp1 Term 
      | Minus Exp1 Term 
      | Term Term
      deriving Show

data Term 
      = Times Term Factor 
      | Div Term Factor 
      | Factor Factor
      deriving Show

data Factor 
      = Int Int 
      | Var String 
      | Brack Exp
      deriving Show
</pre><p>And the data structure for the tokens...</p><pre class="programlisting">
data Token
      = TokenLet
      | TokenIn
      | TokenInt Int
      | TokenVar String
      | TokenEq
      | TokenPlus
      | TokenMinus
      | TokenTimes
      | TokenDiv
      | TokenOB
      | TokenCB
 deriving Show
</pre><p>... and a simple lexer that returns this data
    structure.</p><pre class="programlisting">
lexer :: String -&gt; [Token]
lexer [] = []
lexer (c:cs) 
      | isSpace c = lexer cs
      | isAlpha c = lexVar (c:cs)
      | isDigit c = lexNum (c:cs)
lexer ('=':cs) = TokenEq : lexer cs
lexer ('+':cs) = TokenPlus : lexer cs
lexer ('-':cs) = TokenMinus : lexer cs
lexer ('*':cs) = TokenTimes : lexer cs
lexer ('/':cs) = TokenDiv : lexer cs
lexer ('(':cs) = TokenOB : lexer cs
lexer (')':cs) = TokenCB : lexer cs

lexNum cs = TokenInt (read num) : lexer rest
      where (num,rest) = span isDigit cs

lexVar cs =
   case span isAlpha cs of
      ("let",rest) -&gt; TokenLet : lexer rest
      ("in",rest)  -&gt; TokenIn : lexer rest
      (var,rest)   -&gt; TokenVar var : lexer rest
</pre><p>And finally a top-level function to take some input, parse
    it, and print out the result.</p><pre class="programlisting">
main = getContents &gt;&gt;= print . calc . lexer
}
</pre><p>And that's it! A whole lexer, parser and grammar in a few
    dozen lines.  Another good example is <span class="application">Happy</span>'s own
    parser. Several features in <span class="application">Happy</span> were developed
    using this as an example.</p><a class="indexterm" name="id2495825"></a><p>To generate the Haskell module for this parser, type the
    command <span class="command"><strong>happy example.y</strong></span> (where
    <code class="filename">example.y</code> is the name of the grammar file).
    The Haskell module will be placed in a file named
    <code class="filename">example.hs</code>.  Additionally, invoking the
    command <span class="command"><strong>happy example.y -i</strong></span> will produce the
    file <code class="filename">example.info</code> which contains detailed information
    about the parser, including states and reduction rules (see <a class="xref" href="sec-info-files.html" title="Chapter 7. Info Files">Chapter 7, <i>Info Files</i></a>).  This can be invaluable for debugging
    parsers, but requires some knowledge of the operation of a
    shift-reduce parser. </p><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sec-other-datatypes"></a>2.1. Returning other datatypes</h2></div></div></div><p>In the above example, we used a data type to represent the
      syntax being parsed.  However, there's no reason why it has to
      be this way: you could calculate the value of the expression on
      the fly, using productions like this:</p><pre class="programlisting">
Term  : Term '*' Factor         { $1 * $3 }
      | Term '/' Factor         { $1 / $3 }
      | Factor                  { $1 }
</pre><p>The value of a <code class="literal">Term</code> would be the value of the
      expression itself, and the parser could return an integer.  </p><p>This works for simple expression types, but our grammar
      includes variables and the <code class="literal">let</code> syntax.  How do we know
      the value of a variable while we're parsing it?  We don't, but
      since the Haskell code for a production can be anything at all,
      we could make it a function that takes an environment of
      variable values, and returns the computed value of the
      expression:</p><pre class="programlisting">
Exp   : let var '=' Exp in Exp  { \p -&gt; $6 (($2,$4 p):p) }
      | Exp1                    { $1 }

Exp1  : Exp1 '+' Term           { \p -&gt; $1 p + $3 p }
      | Exp1 '-' Term           { \p -&gt; $1 p - $3 p }
      | Term                    { $1 }

Term  : Term '*' Factor         { \p -&gt; $1 p * $3 p }
      | Term '/' Factor         { \p -&gt; $1 p `div` $3 p }
      | Factor                  { $1 }

Factor			  
      : int                     { \p -&gt; $1 }
      | var                     { \p -&gt; case lookup $1 p of
	                                    Nothing -&gt; error "no var"
					    Just i  -&gt; i }
      | '(' Exp ')'             { $2 }
</pre><p>The value of each production is a function from an
      environment <span class="emphasis"><em>p</em></span> to a value.  When parsing a
      <code class="literal">let</code> construct, we extend the environment with the new
      binding to find the value of the body, and the rule for
      <code class="literal">var</code> looks up its value in the environment.  There's
      something you can't do in <code class="literal">yacc</code> :-)</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="sec-obtaining.html">Prev</a> </td><td width="20%" align="center"> </td><td width="40%" align="right"> <a accesskey="n" href="sec-sequences.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.4. Obtaining <span class="application">Happy</span> </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 2.2. Parsing sequences</td></tr></table></div></body></html>