Sophie

Sophie

distrib > Mandriva > 2010.0 > i586 > media > contrib-release > by-pkgid > 1142aab15abbf19b571c813fc39a9643 > files > 12

python-dparser-1.15-1mdv2009.1.i586.rpm

<html>
<head>
<title>DParser for Python Documentation</title>
</head>
<body bgcolor="#ffffff">
<h1>DParser for Python Documentation</h1>
<p>
This page describes <a href="index.html">the Python interface to DParser</a>.  Please see <a href="http://dparser.sourceforge.net/d/manual.html">the DParser manual</a>
for more detailed information on DParser.
<ul>
<li><a href="py_dparser_manual.html#BI">basic ideas</a>
<li><a href="py_dparser_manual.html#AA">arguments to actions</a>
<li><a href="py_dparser_manual.html#ADP">arguments to dparser.Parser()</a>
<li><a href="py_dparser_manual.html#AP">arguments to dparser.Parser.parse()</a>
<li><a href="py_dparser_manual.html#PF">pitfalls and tips</a>
</ul>
<h2><A NAME="BI">Basic Ideas</a></h2>
<p>
Grammar rules are input to DParser using Python function documentation strings.  (A string placed as the first line of a Python function is the function's documentation string.)
In order to let DParser know that you want it to use a specific function's documentation string as part of your grammar, begin that function's name with "d_".  The function then becomes an action that is executed 
whenever the production defined in the documentation string reduces.  For example,
<pre>
def d_action1(t):
    " sentence : noun 'runs' "
    print 'found a sentence'
#...
</pre>
This function specifies an action, <code>d_action1</code>, and a production, <code>sentence</code>, to DParser.  <code>d_action1</code> will be called when DParser recognizes a <code>sentence</code>.  The argument, <code>t</code>, to <code>d_action1</code> is an array.  The array consists of the return values of the elements
making up the production, or, for terminal elements, the string the terminal matched.  In the above example, the array <code>t</code>
array will contain the return value of <code>noun</code>'s action as the first element and the Python string <code>'runs'</code> as 
the second.  
<P>
Regular expression are specified by enclosing the regular expression in double quotes:
<pre>
def d_number(t):
    ' number : "[0-9]+" '            # match a positive integer
    return int(t[0])             # turn the matched string into an integer
#...
</pre>
Make sure your documentation string is a Python raw string (precede it with the letter <code>r</code>) if it contains any <a href="http://www.python.org/doc/current/ref/strings.html">Python escape sequences</a>.
<P>
For more advanced features of productions, such as priorities and associativites, see <a href="http://dparser.sourceforge.net/d/manual.html">the DParser manual</a>.
<P>
For a simple, complete example to add integers, go back to the <a href="index.html">home page</a>.
<h2><A NAME="AA">Arguments to actions</a></h2>
All actions take at least one argument, an array, as described above.  
Other arguments are optional.  The interface recognizes which arguments you want based on the name you give the argument.  Possible
names are:
<dl><dt><code>spec</code>, <code>spec_only</code><dd>If an action takes <code>spec</code>, that action will be called for both speculative and final parses 
(otherwise, the action is only called for final parses).
The value of <code>spec</code> indicates whether the parse is final or speculative (1 is speculative, 0 is final).  
To reject a speculative parse, return <code>dparser.Reject</code>.
If an action takes <code>spec_only</code>, the action will be called only for speculative parses.  The return value of the 
action for the final parse will be the same Python object that was returned for the speculative parse.
<a href="test5.py">Complete example</a>.
<dt><code>g</code><dd>DParser's global state.  <code>g</code> is actually an array, the first element of which is the
global state.  (Using a one-element array in this manner allows the action to change the global state.)
<dt><code>s</code><dd>contains an array (a tree, really) of the strings that make up this reduction.  <code>s</code> is useful if the purpose of your parser is to alter some text, leaving it mostly intact.  See <a href="test6.py">here</a> for a complete example.  
<dt><code>nodes</code><dd> 
an array of Python wrappers around the reduction's <code>D_ParseNode</code>s.  They contain information on line numbers and such.  See <a href="test7.py">here</a> for useful fields.
<dt><code>this</code><dd> the <code>D_ParseNode</code> for the current production.  (<code>$$</code> in DParser.)  Again, see <a href="test7.py">this example</a>.
<dt><code>parser</code><dd>
your parser (sometimes useful if you're dealing with multiple files).
</dl>


<h2><A NAME="ADP">Arguments to dparser.Parser()</a></h2>
All arguments are optional.
<dl>
<dt><code>modules</code>:
	<dd>an array of modules containing the actions you want in your parser.  If this argument is not specified, the calling module will be used.
<dt><code>file_prefix</code>:<dd>prefix for the filename of the parse table cache and other such files.  Defaults to "d_parser_mach_gen"
</dl>

<h2><A NAME="AP">Arguments to dparser.Parser.parse()</a></h2>
The first argument to dparser.Parser.parse is always the string that is to be parsed.  All other arguments are optional.
<dl>
<dt><code>start_symbol</code>:
<dd>the start symbol.  Defaults to the topmost symbol defined. 
<dt><code>print_debug_info</code>:
        <dd>prints a list of actions that are called if non-zero.  Question marks indicate the action is speculative.
<dt><code>dont_fixup_internal_productions, dont_merge_epsilon_trees, commit_actions_interval, error_recovery</code>:
<dd>correspond to the members of <code>D_Parser</code> (see the DParser manual)
<dt><code>initial_skip_space_fn</code>:<dd>allows user-defined whitespace (as does the <code>whitespace</code> production,
and instead of the built-in, c-like whitespace parser).
Its argument is a <code>d_loc_t</code> structure.  This structure's member, <code>s</code>, is an index into the string that is being parsed.  Modify this index
to skip whitespace:
<pre>
def whitespace(loc):    # no d_ prefix
   while loc.s < len(loc.buf) and loc.buf[loc.s:loc.s+2] == ':)':    # make smiley face the whitespace
      loc.s = loc.s + 2
#...
Parser().parse('int:)var:)=:)2', initial_skip_space_fn = whitespace)
</pre>
<dt><code>syntax_error_fn</code>:<dd> called on a syntax error.  By default an exception is raised.  It is passed a <code>d_loc_t</code> structure (see <code>initial_skip_space_fn</code>)
indicating the location of the error.  The function below will put '<--error' and a line break at the location of the error:
<pre>
def syntax_error(loc):
    mn = max(loc.s - 10, 0)
    mx = min(loc.s + 10, len(loc.buf))
    begin = loc.buf[mn:loc.s]
    end = loc.buf[loc.s:mx]
    space = ' '*len(begin)
    print begin + '\n' + space + '<--error' + '\n' + space + end
#...
Parser().parse('python is bad.', syntax_error_fn = syntax_error)
</pre>
<dt><code>ambiguity_fn</code>:<dd>resolves ambiguities.  It takes an array of <code>D_ParseNode</code>s and expects one of them to be returned.  By default a <code>dparser.AmbiguityException</code> is raised.
</dl>

<h2><A NAME="PF">Pitfalls and Tips</a></h2>
Let me know if you run into a pitfall or have a tip, and I will put it here.  

<dl>
<dt>Debugging a Grammar<dd>Pass <code>print_debug_info=1</code> to <code>Parser.parse()</code> to see a list of the actions that are being called (pass it 2 to see only final actions).  
<p>Also, try looking at the grammar file that is created, d_parser_mach_gen.g.<p>
<dt>Regular expressions:<dd>
DParser does not understand all of the regular expressions understood by the Python regular expression module.  Make sure you are using regular expressions DParser can understand.  
<p>Also, make sure your documentation string is a Python raw string (precede it with the letter <code>r</code>) if it contains any <a href="http://www.python.org/doc/current/ref/strings.html">Python escape sequences</a>.
<dt>Whitespace:<dt><dd>
By default, DParser treats tabs, spaces, newlines and #line commands as whitespace.  If you want to deal with any of these yourself (especially be careful of the # character), you have to either create an <code>initial_skip_space_fn</code>, as shown above, or define the special <code>whitespace</code> production:
<pre>
def d_whitespace(t):
   'whitespace : "[ \t\n]*" '     # treat space, tab and newline as whitespace, but treat the # character normally
   print 'found whitespace:' + t[0]
</pre>
<dt>DParser specifiers/declarations:<dd>DParser can be passed declarations in documentation strings.  For example,
<pre>
from dparser import Parser
def d_somefunc(t) : '${declare longest_match}'
#...
</pre>
(see the DParser manual for an explanation of specifiers and declarations.)
<p>
<dt>Multiple productions per action:<dd>You can put multiple productions (even your entire grammar) into one documentation string.  Just make sure to add semicolons after each production:
<pre>
from dparser import Parser

def d_grammar(t):
   '''sentence : noun verb;
      noun : 'dog' | 'cat';
      verb : 'run'
   '''
   print 'this function gets called for every reduction'

Parser().parse("dog run")
</pre>