<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter 2. Using Happy</title><link rel="stylesheet" href="fptools.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="Happy User Guide"><link rel="up" href="index.html" title="Happy User Guide"><link rel="prev" href="sec-obtaining.html" title="1.4. Obtaining Happy"><link rel="next" href="sec-sequences.html" title="2.2. Parsing sequences"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 2. Using <span class="application">Happy</span></th></tr><tr><td width="20%" align="left"><a accesskey="p" href="sec-obtaining.html">Prev</a> </td><th width="60%" align="center"> </th><td width="20%" align="right"> <a accesskey="n" href="sec-sequences.html">Next</a></td></tr></table><hr></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="sec-using"></a>Chapter 2. Using <span class="application">Happy</span></h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="sec-using.html#sec-other-datatypes">2.1. Returning other datatypes</a></span></dt><dt><span class="sect1"><a href="sec-sequences.html">2.2. Parsing sequences</a></span></dt><dd><dl><dt><span class="sect2"><a href="sec-sequences.html#sec-separators">2.2.1. Sequences with separators</a></span></dt></dl></dd><dt><span class="sect1"><a href="sec-Precedences.html">2.3. Using Precedences</a></span></dt><dd><dl><dt><span class="sect2"><a href="sec-Precedences.html#how-precedence-works">2.3.1. How precedence works</a></span></dt><dt><span class="sect2"><a href="sec-Precedences.html#context-precedence">2.3.2. Context-dependent Precedence</a></span></dt></dl></dd><dt><span class="sect1"><a href="sec-type-signatures.html">2.4. Type Signatures</a></span></dt><dt><span class="sect1"><a href="sec-monads.html">2.5. Monadic Parsers</a></span></dt><dd><dl><dt><span class="sect2"><a href="sec-monads.html#sec-exception">2.5.1. Handling Parse Errors</a></span></dt><dt><span class="sect2"><a href="sec-monads.html#sec-lexers">2.5.2. Threaded Lexers</a></span></dt><dd><dl><dt><span class="sect3"><a href="sec-monads.html#id2546160">2.5.2.1. Monadic productions with %lexer</a></span></dt></dl></dd><dt><span class="sect2"><a href="sec-monads.html#sec-line-numbers">2.5.3. Line Numbers</a></span></dt><dt><span class="sect2"><a href="sec-monads.html#sec-monad-summary">2.5.4. Summary</a></span></dt></dl></dd><dt><span class="sect1"><a href="sec-error.html">2.6. The Error Token</a></span></dt><dt><span class="sect1"><a href="sec-multiple-parsers.html">2.7. Generating Multiple Parsers From a Single Grammar</a></span></dt></dl></div><p> Users of <span class="application">Yacc</span> will find <span class="application">Happy</span> quite familiar. The basic idea is as follows: </p><div class="itemizedlist"><ul type="disc"><li><p>Define the grammar you want to parse in a <span class="application">Happy</span> grammar file. </p></li><li><p> Run the grammar through <span class="application">Happy</span>, to generate a compilable Haskell module.</p></li><li><p> Use this module as part of your Haskell program, usually in conjunction with a lexical analyser (a function that splits the input into ``tokens'', the basic unit of parsing).</p></li></ul></div><p> Let's run through an example. We'll implement a parser for a simple expression syntax, consisting of integers, variables, the operators <code class="literal">+</code>, <code class="literal">-</code>, <code class="literal">*</code>, <code class="literal">/</code>, and the form <code class="literal">let var = exp in exp</code>. The grammar file starts off like this:</p><pre class="programlisting"> { module Main where } </pre><p>At the top of the file is an optional <em class="firstterm">module header</em>, <a class="indexterm" name="id2495340"></a> which is just a Haskell module header enclosed in braces. This code is emitted verbatim into the generated module, so you can put any Haskell code here at all. In a grammar file, Haskell code is always contained between curly braces to distinguish it from the grammar.</p><p>In this case, the parser will be a standalone program so we'll call the module <code class="literal">Main</code>.</p><p>Next comes a couple of declarations:</p><pre class="programlisting"> %name calc %tokentype { Token } %error { parseError } </pre><a class="indexterm" name="id2495380"></a><a class="indexterm" name="id2495392"></a><a class="indexterm" name="id2495404"></a><p>The first line declares the name of the parsing function that <span class="application">Happy</span> will generate, in this case <code class="literal">calc</code>. In many cases, this is the only symbol you need to export from the module.</p><p>The second line declares the type of tokens that the parser will accept. The parser (i.e. the function <code class="function">calc</code>) will be of type <code class="literal">[Token] -> T</code>, where <code class="literal">T</code> is the return type of the parser, determined by the production rules below.</p><p>The <code class="literal">%error</code> directive tells Happy the name of a function it should call in the event of a parse error. More about this later.</p><p>Now we declare all the possible tokens:</p><pre class="programlisting"> %token let { TokenLet } in { TokenIn } int { TokenInt $$ } var { TokenVar $$ } '=' { TokenEq } '+' { TokenPlus } '-' { TokenMinus } '*' { TokenTimes } '/' { TokenDiv } '(' { TokenOB } ')' { TokenCB } </pre><a class="indexterm" name="id2495486"></a><p>The symbols on the left are the tokens as they will be referred to in the rest of the grammar, and to the right of each token enclosed in braces is a Haskell pattern that matches the token. The parser will expect to receive a stream of tokens, each of which will match one of the given patterns (the definition of the <code class="literal">Token</code> datatype is given later).</p><p>The <code class="literal">$$</code> symbol is a placeholder that represents the <span class="emphasis"><em>value</em></span> of this token. Normally the value of a token is the token itself, but by using the <code class="literal">$$</code> symbol you can specify some component of the token object to be the value. </p><a class="indexterm" name="id2495536"></a><p>Like yacc, we include <code class="literal">%%</code> here, for no real reason.</p><pre class="programlisting"> %% </pre><p>Now we have the production rules for the grammar.</p><pre class="programlisting"> Exp : let var '=' Exp in Exp { Let $2 $4 $6 } | Exp1 { Exp1 $1 } Exp1 : Exp1 '+' Term { Plus $1 $3 } | Exp1 '-' Term { Minus $1 $3 } | Term { Term $1 } Term : Term '*' Factor { Times $1 $3 } | Term '/' Factor { Div $1 $3 } | Factor { Factor $1 } Factor : int { Int $1 } | var { Var $1 } | '(' Exp ')' { Brack $2 } </pre><a class="indexterm" name="id2495564"></a><p>Each production consists of a <em class="firstterm">non-terminal</em> symbol on the left, followed by a colon, followed by one or more expansions on the right, separated by <code class="literal">|</code>. Each expansion has some Haskell code associated with it, enclosed in braces as usual.</p><p>The way to think about a parser is with each symbol having a `value': we defined the values of the tokens above, and the grammar defines the values of non-terminal symbols in terms of sequences of other symbols (either tokens or non-terminals). In a production like this:</p><pre class="programlisting"> n : t_1 ... t_n { E } </pre><p>whenever the parser finds the symbols <code class="literal">t_1..t_n</code> in the token stream, it constructs the symbol <code class="literal">n</code> and gives it the value <code class="literal">E</code>, which may refer to the values of <code class="literal">t_1...t_n</code> using the symbols <code class="literal">$1...$n</code>.</p><p>The parser reduces the input using the rules in the grammar until just one symbol remains: the first symbol defined in the grammar (namely <code class="literal">Exp</code> in our example). The value of this symbol is the return value from the parser.</p><p>To complete the program, we need some extra code. The grammar file may optionally contain a final code section, enclosed in curly braces.</p><pre class="programlisting">{</pre><p>All parsers must include a function to be called in the event of a parse error. In the <code class="literal">%error</code> directive earlier, we specified that the function to be called on a parse error is <code class="literal">parseError</code>:</p><pre class="programlisting"> parseError :: [Token] -> a parseError _ = error "Parse error" </pre><p>Note that <code class="literal">parseError</code> must be polymorphic in its return type <code class="literal">a</code>, which usually means it must be a call to <code class="literal">error</code>. We'll see in <a class="xref" href="sec-monads.html" title="2.5. Monadic Parsers">Section 2.5, “Monadic Parsers”</a> how to wrap the parser in a monad so that we can do something more sensible with errors. It's also possible to keep track of line numbers in the parser for use in error messages, this is described in <a class="xref" href="sec-monads.html#sec-line-numbers" title="2.5.3. Line Numbers">Section 2.5.3, “Line Numbers”</a>.</p><p>Next we can declare the data type that represents the parsed expression:</p><pre class="programlisting"> data Exp = Let String Exp Exp | Exp1 Exp1 deriving Show data Exp1 = Plus Exp1 Term | Minus Exp1 Term | Term Term deriving Show data Term = Times Term Factor | Div Term Factor | Factor Factor deriving Show data Factor = Int Int | Var String | Brack Exp deriving Show </pre><p>And the data structure for the tokens...</p><pre class="programlisting"> data Token = TokenLet | TokenIn | TokenInt Int | TokenVar String | TokenEq | TokenPlus | TokenMinus | TokenTimes | TokenDiv | TokenOB | TokenCB deriving Show </pre><p>... and a simple lexer that returns this data structure.</p><pre class="programlisting"> lexer :: String -> [Token] lexer [] = [] lexer (c:cs) | isSpace c = lexer cs | isAlpha c = lexVar (c:cs) | isDigit c = lexNum (c:cs) lexer ('=':cs) = TokenEq : lexer cs lexer ('+':cs) = TokenPlus : lexer cs lexer ('-':cs) = TokenMinus : lexer cs lexer ('*':cs) = TokenTimes : lexer cs lexer ('/':cs) = TokenDiv : lexer cs lexer ('(':cs) = TokenOB : lexer cs lexer (')':cs) = TokenCB : lexer cs lexNum cs = TokenInt (read num) : lexer rest where (num,rest) = span isDigit cs lexVar cs = case span isAlpha cs of ("let",rest) -> TokenLet : lexer rest ("in",rest) -> TokenIn : lexer rest (var,rest) -> TokenVar var : lexer rest </pre><p>And finally a top-level function to take some input, parse it, and print out the result.</p><pre class="programlisting"> main = getContents >>= print . calc . lexer } </pre><p>And that's it! A whole lexer, parser and grammar in a few dozen lines. Another good example is <span class="application">Happy</span>'s own parser. Several features in <span class="application">Happy</span> were developed using this as an example.</p><a class="indexterm" name="id2495825"></a><p>To generate the Haskell module for this parser, type the command <span class="command"><strong>happy example.y</strong></span> (where <code class="filename">example.y</code> is the name of the grammar file). The Haskell module will be placed in a file named <code class="filename">example.hs</code>. Additionally, invoking the command <span class="command"><strong>happy example.y -i</strong></span> will produce the file <code class="filename">example.info</code> which contains detailed information about the parser, including states and reduction rules (see <a class="xref" href="sec-info-files.html" title="Chapter 7. Info Files">Chapter 7, <i>Info Files</i></a>). This can be invaluable for debugging parsers, but requires some knowledge of the operation of a shift-reduce parser. </p><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sec-other-datatypes"></a>2.1. Returning other datatypes</h2></div></div></div><p>In the above example, we used a data type to represent the syntax being parsed. However, there's no reason why it has to be this way: you could calculate the value of the expression on the fly, using productions like this:</p><pre class="programlisting"> Term : Term '*' Factor { $1 * $3 } | Term '/' Factor { $1 / $3 } | Factor { $1 } </pre><p>The value of a <code class="literal">Term</code> would be the value of the expression itself, and the parser could return an integer. </p><p>This works for simple expression types, but our grammar includes variables and the <code class="literal">let</code> syntax. How do we know the value of a variable while we're parsing it? We don't, but since the Haskell code for a production can be anything at all, we could make it a function that takes an environment of variable values, and returns the computed value of the expression:</p><pre class="programlisting"> Exp : let var '=' Exp in Exp { \p -> $6 (($2,$4 p):p) } | Exp1 { $1 } Exp1 : Exp1 '+' Term { \p -> $1 p + $3 p } | Exp1 '-' Term { \p -> $1 p - $3 p } | Term { $1 } Term : Term '*' Factor { \p -> $1 p * $3 p } | Term '/' Factor { \p -> $1 p `div` $3 p } | Factor { $1 } Factor : int { \p -> $1 } | var { \p -> case lookup $1 p of Nothing -> error "no var" Just i -> i } | '(' Exp ')' { $2 } </pre><p>The value of each production is a function from an environment <span class="emphasis"><em>p</em></span> to a value. When parsing a <code class="literal">let</code> construct, we extend the environment with the new binding to find the value of the body, and the rule for <code class="literal">var</code> looks up its value in the environment. There's something you can't do in <code class="literal">yacc</code> :-)</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="sec-obtaining.html">Prev</a> </td><td width="20%" align="center"> </td><td width="40%" align="right"> <a accesskey="n" href="sec-sequences.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.4. Obtaining <span class="application">Happy</span> </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 2.2. Parsing sequences</td></tr></table></div></body></html>