<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<link rel="stylesheet" href="style.css" type="text/css">
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type">
<link rel="Start" href="index.html">
<link rel="previous" href="Intro_resolution.html">
<link rel="next" href="Intro_advanced.html">
<link rel="Up" href="index.html">
<link title="Index of types" rel=Appendix href="index_types.html">
<link title="Index of exceptions" rel=Appendix href="index_exceptions.html">
<link title="Index of values" rel=Appendix href="index_values.html">
<link title="Index of class methods" rel=Appendix href="index_methods.html">
<link title="Index of classes" rel=Appendix href="index_classes.html">
<link title="Index of class types" rel=Appendix href="index_class_types.html">
<link title="Index of modules" rel=Appendix href="index_modules.html">
<link title="Index of module types" rel=Appendix href="index_module_types.html">
<link title="Pxp_types" rel="Chapter" href="Pxp_types.html">
<link title="Pxp_document" rel="Chapter" href="Pxp_document.html">
<link title="Pxp_dtd" rel="Chapter" href="Pxp_dtd.html">
<link title="Pxp_tree_parser" rel="Chapter" href="Pxp_tree_parser.html">
<link title="Pxp_core_types" rel="Chapter" href="Pxp_core_types.html">
<link title="Pxp_ev_parser" rel="Chapter" href="Pxp_ev_parser.html">
<link title="Pxp_event" rel="Chapter" href="Pxp_event.html">
<link title="Pxp_dtd_parser" rel="Chapter" href="Pxp_dtd_parser.html">
<link title="Pxp_codewriter" rel="Chapter" href="Pxp_codewriter.html">
<link title="Pxp_marshal" rel="Chapter" href="Pxp_marshal.html">
<link title="Pxp_yacc" rel="Chapter" href="Pxp_yacc.html">
<link title="Pxp_reader" rel="Chapter" href="Pxp_reader.html">
<link title="Intro_trees" rel="Chapter" href="Intro_trees.html">
<link title="Intro_extensions" rel="Chapter" href="Intro_extensions.html">
<link title="Intro_namespaces" rel="Chapter" href="Intro_namespaces.html">
<link title="Intro_events" rel="Chapter" href="Intro_events.html">
<link title="Intro_resolution" rel="Chapter" href="Intro_resolution.html">
<link title="Intro_getting_started" rel="Chapter" href="Intro_getting_started.html">
<link title="Intro_advanced" rel="Chapter" href="Intro_advanced.html">
<link title="Intro_preprocessor" rel="Chapter" href="Intro_preprocessor.html">
<link title="Example_readme" rel="Chapter" href="Example_readme.html"><link title="Parse a file and represent it as tree" rel="Section" href="#2_Parseafileandrepresentitastree">
<link title="Compiling and linking" rel="Section" href="#complink">
<link title="Variations" rel="Section" href="#2_Variations">
<link title="What PXP cannot do for you" rel="Section" href="#2_WhatPXPcannotdoforyou">
<link title="Catching and printing exceptions" rel="Subsection" href="#exn">
<link title="Printing trees in the O'Caml toploop" rel="Subsection" href="#toploop">
<link title="Parsing in well-formedness mode" rel="Subsection" href="#wfmode">
<link title="Validating well-formed trees" rel="Subsection" href="#lateval">
<link title="Encodings" rel="Subsection" href="#encodings">
<link title="Event parser (push/pull parsing)" rel="Subsection" href="#evparser">
<link title="Low-profile trees" rel="Subsection" href="#lowprofile">
<link title="Choosing the node types to represent" rel="Subsection" href="#nodetypes">
<link title="Controlling whitespace" rel="Subsection" href="#whitespace">
<link title="Checking the ID consistency and looking up nodes by ID" rel="Subsection" href="#idcheck">
<link title="Finding nodes by element names" rel="Subsection" href="#findelements">
<link title="Specifying sources" rel="Subsection" href="#sources">
<link title="Embedding large constant XML in source code" rel="Subsection" href="#codewriter">
<link title="Using the preprocessor to create XML trees" rel="Subsection" href="#prepro">
<link title="Namespaces" rel="Subsection" href="#namespaces">
<link title="Specifying which classes implement nodes - the mysterious spec parameter" rel="Subsection" href="#spec">
<title>PXP Reference : Intro_getting_started</title>
</head>
<body>
<div class="navbar"><a href="Intro_resolution.html">Previous</a>
&nbsp;<a href="index.html">Up</a>
&nbsp;<a href="Intro_advanced.html">Next</a>
</div>
<center><h1>Intro_getting_started</h1></center>
<br>
<br>
In the following sections we'll explain how to solve a basic
task in PXP, namely to parse a file and to represent it in 
memory, followed by paragraphs on variations of this task,
because not everybody will be happy with the basic solution.
<p>

<a name="2_Parseafileandrepresentitastree"></a>
<h2>Parse a file and represent it as tree</h2>
<p>

The basic piece of code to parse "filename.xml" is:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;config&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.default_config<br>
<span class="keyword">let</span>&nbsp;spec&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.default_spec<br>
<span class="keyword">let</span>&nbsp;source&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.from_file&nbsp;<span class="string">"filename.xml"</span><br>
<span class="keyword">let</span>&nbsp;doc&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.parse_document_entity&nbsp;config&nbsp;source&nbsp;spec<br>
</code><pre></pre>
<p>

As you can see, some defaults are loaded (<a href="Pxp_types.html#VALdefault_config"><code class="code"><span class="constructor">Pxp_types</span>.default_config</code></a>
and <a href="Pxp_tree_parser.html#VALdefault_spec"><code class="code"><span class="constructor">Pxp_tree_parser</span>.default_spec</code></a>). These defaults have the following effects
(as far as they matter for an introduction):
<p>
<ul>
<li>The parsed document is represented in ISO-8859-1. The file can
  be encoded differently, however, and if so, it is automatically
  recoded to ISO-8859-1.</li>
<li>The generated tree only has nodes for elements and character data
  sections, but not for comments or processing instructions.</li>
<li>The top-most node of the tree, <code class="code">doc<span class="keywordsign">#</span>root</code>, is the top-most element.</li>
<li>No namespace processing is performed.</li>
</ul>

XML does not know the concept of file names. All files (or other
resources) are named by so-called ID's. Although we can pass a
file name to <code class="code">from_file</code> here, it is immediately converted into a <code class="code"><span class="constructor">SYSTEM</span></code>
ID, which is essentially a URL of the form
<code class="code">file:///dir1/.../dirN/filename.xml</code>. This ID can be processed further; in
particular, it is now clear how to treat relative <code class="code"><span class="constructor">SYSTEM</span></code> ID's that
occur in the parsed document. For instance, if another file is
included by "filename.xml", and the <code class="code"><span class="constructor">SYSTEM</span></code> ID is "parts/part1.xml",
the usual rules for resolving relative URL's say that the effective
file to read is <code class="code">file:///dir1/.../dirN/parts/part1.xml</code>. Relative
<code class="code"><span class="constructor">SYSTEM</span></code> ID's are resolved relative to the URL of the file where the
entity reference occurs that leads to the inclusion of the other file
(this is comparable to how hyperlinks in HTML are treated).
<p>
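The resolution rule described above can be illustrated with a small,
deliberately simplified sketch in plain O'Caml. This is not the code PXP
uses internally; the names <code class="code">has_scheme</code> and <code class="code">resolve_system_id</code> are
hypothetical, and the sketch assumes plain relative paths only (no
<code class="code">..</code> segments, no paths starting with <code class="code">/</code>).

```ocaml
(* Simplified illustration of relative SYSTEM ID resolution.
   Assumption: this is NOT PXP's implementation, only a sketch of the rule.
   Limitations: ".." segments and paths starting with "/" are not handled. *)

(* Crude test whether an ID already carries a URL scheme such as "file:"
   or "http:": there must be a ':' and no '/' before it. *)
let has_scheme id =
  match String.index_opt id ':' with
  | Some i -> i > 0 && not (String.contains (String.sub id 0 i) '/')
  | None -> false

(* Resolve [rel] against the URL [base] of the including file by replacing
   everything after the last '/' of [base] with [rel]. *)
let resolve_system_id ~base rel =
  if has_scheme rel then rel
  else
    match String.rindex_opt base '/' with
    | Some i -> String.sub base 0 (i + 1) ^ rel
    | None -> rel

let () =
  print_endline
    (resolve_system_id
       ~base:"file:///dir1/dirN/filename.xml"
       "parts/part1.xml")
```

IDs that already have a scheme are returned unchanged, which mirrors
the way absolute hyperlinks are treated in HTML.
<p>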

Note that we make some assumptions here about the file system of the
computer. <a href="Pxp_reader.html#VALmake_file_url"><code class="code"><span class="constructor">Pxp_reader</span>.make_file_url</code></a> has to deal with character
encodings of file names. It assumes UTF-8 by default. By passing
arguments to this function, other assumptions about the encoding of
file names can be made. Unfortunately, there is no portable way of
determining the character encoding the system uses for file names
(see the hyperlinks at the end of this section).
<p>

The returned <code class="code">doc</code> object is of type <a href="Pxp_document.document.html"><code class="code"><span class="constructor">Pxp_document</span>.document</code></a>. This type
is used for all regular documents that exist independently. The root
of the node tree is returned by <code class="code">doc<span class="keywordsign">#</span>root</code>, which is a
<code class="code"><span class="constructor">Pxp_document</span>.node</code> object. See <a href="Intro_trees.html"><code class="code"><span class="constructor">Intro_trees</span></code></a> for more about the tree
representation.
<p>
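As a first impression of what can be done with <code class="code">doc<span class="keywordsign">#</span>root</code>, here is
a minimal sketch that walks the node tree and collects the element
names. The helper names <code class="code">element_names</code> and <code class="code">names_of_string</code> are
made up for this illustration. The sketch assumes the findlib package
<code class="code">pxp</code> is linked, and it parses an in-memory string in well-formedness
mode so that no DTD is needed.

```ocaml
(* Requires the findlib package "pxp". *)

(* Collect the names of all elements in document order (depth-first). *)
let rec element_names node =
  match node#node_type with
  | Pxp_document.T_element name ->
      name :: List.concat (List.map element_names node#sub_nodes)
  | _ -> []  (* data nodes and other node types carry no element name *)

(* Hypothetical convenience wrapper: parse a string without validation
   and return the element names found in it. *)
let names_of_string text =
  let config = Pxp_types.default_config in
  let spec = Pxp_tree_parser.default_spec in
  let source = Pxp_types.from_string text in
  let doc = Pxp_tree_parser.parse_wfdocument_entity config source spec in
  element_names doc#root

let () =
  List.iter print_endline (names_of_string "<x><y>foo</y></x>")
```

The same <code class="code">element_names</code> function works for a document returned by
<code class="code">parse_document_entity</code>, because both calls return the same kind of
document object.
<p>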

The call <a href="Pxp_tree_parser.html#VALparse_document_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_document_entity</code></a> not only parses
the document but also validates it. Validation succeeds only if there is a DTD
and the document conforms to it. There is a weaker criterion for
formal correctness called well-formedness. See below for how to check
only for well-formedness while parsing, without doing the whole
validation.
<p>

Links about the file name encoding problem:<ul>
<li><a href="http://library.gnome.org/devel/glib/stable/glib-Character-Set-Conversion.html#g-get-filename-charsets">How GLib treats the file name encoding problem</a></li>
<li><a href="http://developer.apple.com/technotes/tn/tn1150.html"> OS X stores filenames on HFS+ volumes in a Unicode encoding</a>; the POSIX
   functions like <code class="code"><span class="keyword">open</span></code> expect file names in UTF-8 encoding.</li>
<li>Current Windows versions store filenames in Unicode. The Win32 functions
   are available in a Unicode and in a so-called ANSI version
   (see <a href="http://msdn.microsoft.com/en-us/library/dd317752(VS.85).aspx">
   Code Pages</a>), and the O'Caml runtime calls the latter. This means file
   names available to PXP are encoded in the active code page.</li>
</ul>

<a name="complink"></a>
<h2>Compiling and linking</h2>
<p>

It is strongly recommended to compile and link with the help of
<code class="code">ocamlfind</code>. For (byte) compiling use one of
<p>
<ul>
<li><code class="code">ocamlfind ocamlc -package pxp-engine -c file.ml</code></li>
<li><code class="code">ocamlfind ocamlc -package pxp -c file.ml</code></li>
</ul>

The package <code class="code">pxp-engine</code> refers to the core library while <code class="code">pxp</code> refers
to an extended version including the various lexers. For compiling, there
is no big difference between the two because the lexers are usually not
directly invoked. However, at link time you need these lexers. You can
choose between using the pre-defined package <code class="code">pxp</code> and a manually selected
combination of <code class="code">pxp-engine</code> with some lexer packages. So for linking
e.g. use one of:
<p>
<ul>
<li><code class="code">ocamlfind ocamlc -package pxp -linkpkg -o executable ... </code>
  to get the standard selection of lexers</li>
<li><code class="code">ocamlfind ocamlc -package pxp-engine,pxp-lex-iso88591,pxp-ulex-utf8 -linkpkg -o executable ... </code>
  to get lexers for ISO-8859-1 and UTF-8</li>
</ul>

There is a special lexer for every choice of encoding for the internal
representation of XML. If you e.g. choose to represent the document as
UTF-8 there must be a lexer capable of handling UTF-8. The package <code class="code">pxp</code>
includes a standard set of lexers, including UTF-8 and many encodings of
the ISO-8859 series. For more about encodings, see below
<a href="Intro_getting_started.html#encodings"><i>Encodings</i></a>.
<p>

<a name="2_Variations"></a>
<h2>Variations</h2>
<p>

<a name="exn"></a>
<h3>Catching and printing exceptions</h3>
<p>

The relevant exceptions are defined in <a href="Pxp_types.html"><code class="code"><span class="constructor">Pxp_types</span></code></a>. You can catch
these exceptions (as thrown by the parser) as in:
<p>

<pre></pre><code class="code"><span class="keyword">try</span>&nbsp;...<br>
<span class="keyword">with</span><br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Pxp_types</span>.<span class="constructor">Validation_error</span>&nbsp;_<br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Pxp_types</span>.<span class="constructor">WF_error</span>&nbsp;_<br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Pxp_types</span>.<span class="constructor">Namespace_error</span>&nbsp;_<br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Pxp_types</span>.<span class="constructor">Error</span>&nbsp;_<br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Pxp_types</span>.<span class="constructor">At</span>(_,_)&nbsp;<span class="keyword">as</span>&nbsp;error&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print_endline&nbsp;(<span class="string">"PXP&nbsp;error&nbsp;"</span>&nbsp;^&nbsp;<span class="constructor">Pxp_types</span>.string_of_exn&nbsp;error)<br>
</code><pre></pre> 
<p>

There are more exceptions, but these are usually caught within PXP
and converted to one of the mentioned exceptions.
<p>
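The pattern above can be wrapped into a reusable helper. The following
is only a sketch: <code class="code">parse_result</code> is a hypothetical name, not part of
PXP, and the example assumes the findlib package <code class="code">pxp</code> is linked. It
turns the listed PXP exceptions into an <code class="code">Error</code> value carrying the
message produced by <code class="code">Pxp_types.string_of_exn</code>.

```ocaml
(* Requires the findlib package "pxp". *)

(* Run a parser call and convert the PXP exceptions listed above into a
   result value. Other exceptions are deliberately not caught. *)
let parse_result (parse : unit -> 'a) : ('a, string) result =
  try Ok (parse ())
  with
  | Pxp_types.Validation_error _
  | Pxp_types.WF_error _
  | Pxp_types.Namespace_error _
  | Pxp_types.Error _
  | Pxp_types.At (_, _) as error ->
      Error (Pxp_types.string_of_exn error)

let () =
  let config = Pxp_types.default_config in
  let spec = Pxp_tree_parser.default_spec in
  (* The end tag of "y" is missing, so this text is not well-formed. *)
  let source = Pxp_types.from_string "<x><y>foo</x>" in
  match
    parse_result
      (fun () -> Pxp_tree_parser.parse_wfdocument_entity config source spec)
  with
  | Ok _ -> print_endline "parsed"
  | Error msg -> prerr_endline ("PXP error " ^ msg)
```

Returning a result value instead of printing directly leaves it to the
caller to decide how errors are reported.
<p>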

<a name="toploop"></a>
<h3>Printing trees in the O'Caml toploop</h3>
<p>

There are toploop printers for nodes and documents. They are automatically
activated when the findlib directive <code class="code"><span class="keywordsign">#</span>require <span class="string">"pxp"</span></code> is used to load
PXP into the toploop. Alternatively, one can also do
<p>

<pre></pre><code class="code"><span class="keywordsign">#</span>install_printer&nbsp;<span class="constructor">Pxp_document</span>.print_node;;<br>
<span class="keywordsign">#</span>install_printer&nbsp;<span class="constructor">Pxp_document</span>.print_doc;;<br>
</code><pre></pre>
<p>

For example, the tree <code class="code">&lt;x&gt;&lt;y&gt;foo&lt;/y&gt;&lt;/x&gt;</code> would be shown as:
<p>

<pre></pre><code class="code">&nbsp;&nbsp;<span class="keywordsign">#</span>&nbsp;tree;;<br>
&nbsp;&nbsp;_&nbsp;:&nbsp;(<span class="keywordsign">'</span>a&nbsp;<span class="constructor">Pxp_document</span>.node&nbsp;<span class="constructor">Pxp_document</span>.extension&nbsp;<span class="keyword">as</span>&nbsp;<span class="keywordsign">'</span>a)&nbsp;<span class="constructor">Pxp_document</span>.node&nbsp;=<br>
&nbsp;&nbsp;*&nbsp;<span class="constructor">T_element</span>&nbsp;<span class="string">"x"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;<span class="constructor">T_element</span>&nbsp;<span class="string">"y"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;<span class="constructor">T_data</span>&nbsp;<span class="string">"foo"</span><br>
</code><pre></pre>
<p>

<a name="wfmode"></a>
<h3>Parsing in well-formedness mode</h3>
<p>

In well-formedness mode many checks are not performed regarding the
formal integrity of the document. Note that the terms "valid" and
"well-formed" are rigidly defined in the XML standard, and that PXP
strictly tries to conform to the standard. Especially note that the
<code class="code"><span class="constructor">DOCTYPE</span></code> clause is not rejected in well-formedness mode and that the
declarations are parsed although interpreted differently.
<p>

In order to call the parser in well-formedness mode, call one of the
"wf" functions, e.g.
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;doc&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.parse_wfdocument_entity&nbsp;config&nbsp;source&nbsp;spec<br>
</code><pre></pre>
<p>

<b>Details.</b>
Even in well-formedness mode there is a DTD object. The DTD object is,
however, treated differently:<ul>
<li>All declarations are parsed. However, the declarations of elements,
  attributes, and notations are not added to the DTD object. The
  declarations of entities are fully processed. Processing instructions
  are handled in the same way as when validation is enabled. Note that
  all this means that you can get syntax errors about ill-formed
  declarations in well-formedness mode, although the declarations are
  not processed further.</li>
<li>When the parser checks the integrity of elements, attributes or
  notations it finds in the XML text to parse, it accepts that there
  is no declaration in the DTD object. This is controlled by a 
  special DTD mode called <code class="code">arbitrary_allowed</code> 
  (see <a href="Pxp_dtd.dtd.html#METHODallow_arbitrary"><code class="code"><span class="constructor">Pxp_dtd</span>.dtd.allow_arbitrary</code></a>). If enabled as done in
  well-formedness mode, the DTD reacts specially when a declaration
  is missing so that the parser knows it has to accept that. 
  Note that, if one added a declaration programmatically
  to the DTD object, the DTD would find it, and would actually
  validate against it. Effectively, validation is not disabled in
  well-formedness mode, only the constraints imposed by the DTD
  object on the document are weaker. There is in fact a way
  to add declarations in well-formedness mode to get partly the
  effects of validation: This is called <a href="Intro_advanced.html#mixedmode"><i>The mixed mode</i></a>.</li>
<li>It is not checked whether the top-most element is the one declared
  in the <code class="code"><span class="constructor">DOCTYPE</span></code> clause (if that clause exists).</li>
</ul>

When processing well-formed documents one should be more careful
because the parser has not done any checks on the structure of the
node tree.
<p>

<a name="lateval"></a>
<h3>Validating well-formed trees</h3>
<p>

A tree that was originally parsed only in well-formedness mode can
still be validated later.
<p>

Of course, there is one obvious difficulty. As mentioned in the
previous section, the DTD object is incompletely built (declarations
of elements, attributes, and notations are ignored), so the DTD object
is not suitable for validating the document against it. For
validation, however, a complete DTD object is required.  The solution
is to replace the DTD object by a different one. As the DTD object is
referenced from all nodes of the tree, and thus intricately connected
with it, the only way to do so is to copy the entire tree. The
function <a href="Pxp_marshal.html#VALrelocate_document"><code class="code"><span class="constructor">Pxp_marshal</span>.relocate_document</code></a> can be used for this type of
copy operation.
<p>

We assume here that we can get the replacement DTD from an external
file, "file.dtd", and that another constraint is that the root
element must be <code class="code">start</code> (as if we had <code class="code">&lt;!<span class="constructor">DOCTYPE</span> start <span class="constructor">SYSTEM</span> <span class="string">"file.dtd"</span>&gt;</code>).
Also <code class="code">doc</code> is the parsed "filename.xml" file as retrieved by
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;config&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.default_config<br>
<span class="keyword">let</span>&nbsp;spec&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.default_spec<br>
<span class="keyword">let</span>&nbsp;source&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.from_file&nbsp;<span class="string">"filename.xml"</span><br>
<span class="keyword">let</span>&nbsp;doc&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.parse_wfdocument_entity&nbsp;config&nbsp;source&nbsp;spec<br>
</code><pre></pre>
<p>

Now the validation against a different DTD is done by:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;rdtd_source&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.from_file&nbsp;<span class="string">"file.dtd"</span><br>
<span class="keyword">let</span>&nbsp;rdtd&nbsp;=&nbsp;<span class="constructor">Pxp_dtd_parser</span>.parse_dtd_entity&nbsp;config&nbsp;rdtd_source<br>
<span class="keyword">let</span>&nbsp;()&nbsp;=&nbsp;rdtd&nbsp;<span class="keywordsign">#</span>&nbsp;set_root&nbsp;<span class="string">"start"</span><br>
<span class="keyword">let</span>&nbsp;vroot&nbsp;=&nbsp;<span class="constructor">Pxp_marshal</span>.relocate_document&nbsp;doc<span class="keywordsign">#</span>root&nbsp;rdtd&nbsp;spec<br>
<span class="keyword">let</span>&nbsp;()&nbsp;=&nbsp;<span class="constructor">Pxp_document</span>.validate&nbsp;vroot<br>
<span class="keyword">let</span>&nbsp;vdoc&nbsp;=&nbsp;<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_document</span>.document&nbsp;config.warner&nbsp;config.encoding<br>
<span class="keyword">let</span>&nbsp;()&nbsp;=&nbsp;vdoc<span class="keywordsign">#</span>init_root&nbsp;vroot&nbsp;doc<span class="keywordsign">#</span>raw_root_name<br>
</code><pre></pre>
<p>

The <code class="code">vdoc</code> document now has the same contents as <code class="code">doc</code> but points to
a different DTD, namely <code class="code">rdtd</code>. Also, the validation checks have been
performed. A few more comments:
<p>
<ul>
<li>We use here the same <code class="code">config</code> for parsing the original document <code class="code">doc</code>
  and the replacement DTD <code class="code">rdtd</code>. This is not strictly required. However,
  the encoding of the in-memory representation must be identical
  (i.e. <code class="code">config.encoding</code>).</li>
<li>When you omit <code class="code">rdtd<span class="keywordsign">#</span>set_root</code>, any root element is allowed.</li>
<li>The entity definitions of the old DTD object are lost.</li>
<li>It is of course possible to modify <code class="code">doc</code> before doing the validation,
  or to validate a <code class="code">doc</code> that is not the result of a parser call but
  programmatically created.</li>
</ul>

<a name="encodings"></a>
<h3>Encodings</h3>
<p>

In PXP, the encoding of the parsed text (the external encoding), and the
encoding of the in-memory representation can be distinct. For processing
external encodings PXP relies on Ocamlnet. The external encoding is
usually indicated in the XML declaration at the beginning of the text,
e.g.
<p>

<pre></pre><code class="code">&lt;?xml&nbsp;version=<span class="string">"1.0"</span>&nbsp;encoding=<span class="string">"ISO-8859-2"</span><span class="keywordsign">?&gt;</span><br>
...<br>
</code><pre></pre>
<p>

There is also an autorecognition of the external encoding that works for
UTF-8 and UTF-16.
<p>

It is generally possible to override the external encoding
(e.g. because the file has already been converted but the XML
declaration was not changed at the same time). Some of the <code class="code">from_*</code>
sources allow overriding the encoding directly, e.g. by setting
the <code class="code">fixenc</code> argument when calling <a href="Pxp_types.html#VALfrom_channel"><code class="code"><span class="constructor">Pxp_types</span>.from_channel</code></a>. Note
that <a href="Pxp_types.html#VALfrom_file"><code class="code"><span class="constructor">Pxp_types</span>.from_file</code></a> does not have this option because this source
can read any file, whereas overriding encodings is only
interesting for certain files. A workaround is to combine <code class="code">from_file</code>
with a catalog of ID's, and to override the encodings for certain
files there. (Catalogs also allow overriding external encodings.
See below, <a href="Intro_getting_started.html#sources"><i>Specifying sources</i></a> for examples using catalogs.)
<p>

As mentioned, the encoding of the in-memory representation can be
distinct from the external encoding. It is required that every character
in the document can be represented in the representation encoding.
Because of this, the chosen encoding should be a superset of all external
encodings that may occur. If you choose UTF-8 for the representation
every character can be represented anyway.
<p>

You set the representation encoding in the <code class="code">config</code> record, e.g.
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;config&nbsp;=<br>
&nbsp;&nbsp;{&nbsp;<span class="constructor">Pxp_types</span>.default_config<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">with</span>&nbsp;encoding&nbsp;=&nbsp;<span class="keywordsign">`</span><span class="constructor">Enc_utf8</span><br>
&nbsp;&nbsp;}<br>
</code><pre></pre>
<p>

It is strictly required that only a single encoding is used in a
document (and PXP also checks that).
<p>

The available encodings for the in-memory representation are a subset
of the encodings supported by Ocamlnet. Effectively, UTF-8 is
supported, as are a number of 8-bit encodings, provided they are
ASCII-compatible (i.e. extensions of 7-bit ASCII).
<p>

For every representation encoding PXP needs a different lexer. PXP
already comes with a set of lexers for the supported encodings. However,
at link time the user program must ensure that the lexer is linked into
the executable. The lexers are available as separate findlib packages:<ul>
<li><code class="code">pxp-ulex-utf8</code>: This is the standard lexer for UTF-8</li>
<li><code class="code">pxp-wlex-utf8</code>: This is the old, wlex-based lexer for UTF-8. It is not
  built when ulex is available.</li>
<li><code class="code">pxp-lex-utf8</code>: This is the old, ocamllex-based lexer for UTF-8.
  It is slightly faster than <code class="code">pxp-ulex-utf8</code>, but consumes a lot more
  memory.</li>
<li><code class="code">pxp-lex-*</code>: These are lexers for various 8 bit character sets</li>
</ul>

For the link command, see above: <a href="Intro_getting_started.html#complink"><i>Compiling and linking</i></a>.
<p>

<a name="evparser"></a>
<h3>Event parser (push/pull parsing)</h3>
<p>

It is sometimes not desirable to represent the parsed XML data as
a tree. An important reason is that the amount of data would exceed the
available memory resources. Another reason may be to combine XML
parsing with a custom grammar. In order to support this, PXP can be
called as an event parser. Basically, PXP emits events (tokens) while parsing
certain syntax elements, and the caller of PXP processes these events.
This mode can only be used together with well-formedness mode, because
validation requires the tree representation.
<p>

Here we show how to parse "filename.xml" with a pull parser:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;config&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.default_config<br>
<span class="keyword">let</span>&nbsp;source&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.from_file&nbsp;<span class="string">"filename.xml"</span><br>
<span class="keyword">let</span>&nbsp;entmng&nbsp;=&nbsp;<span class="constructor">Pxp_ev_parser</span>.create_entity_manager&nbsp;config&nbsp;source<br>
<span class="keyword">let</span>&nbsp;entry&nbsp;=&nbsp;<span class="keywordsign">`</span><span class="constructor">Entry_document</span>&nbsp;[]<br>
<span class="keyword">let</span>&nbsp;next&nbsp;=&nbsp;<span class="constructor">Pxp_ev_parser</span>.create_pull_parser&nbsp;config&nbsp;entry&nbsp;entmng<br>
</code><pre></pre>
<p>

Now, one can call <code class="code">next()</code> repeatedly to get one event after the other.
The events have type <a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a> <code class="code">option</code>.
<p>
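A typical consumer is a loop that calls the pull function until it
returns <code class="code"><span class="constructor">None</span></code>. The following sketch counts the start tags of a
document. The helper names <code class="code">count_start_tags</code> and <code class="code">count_in_source</code>
are made up for this illustration, and the example assumes the findlib
package <code class="code">pxp</code> is linked.

```ocaml
(* Requires the findlib package "pxp". *)

(* Pull events until the stream is exhausted and count the start tags.
   Errors are delivered as E_error events and re-raised here. *)
let count_start_tags (next : unit -> Pxp_types.event option) =
  let rec loop n =
    match next () with
    | Some (Pxp_types.E_start_tag _) -> loop (n + 1)
    | Some (Pxp_types.E_error e) -> raise e
    | Some _ -> loop n      (* all other events are ignored *)
    | None -> n             (* no more events *)
  in
  loop 0

(* Hypothetical wrapper that sets up the pull parser as shown above. *)
let count_in_source source =
  let config = Pxp_types.default_config in
  let entmng = Pxp_ev_parser.create_entity_manager config source in
  let entry = `Entry_document [] in
  let next = Pxp_ev_parser.create_pull_parser config entry entmng in
  count_start_tags next

let () =
  let n = count_in_source (Pxp_types.from_string "<x><y>foo</y></x>") in
  Printf.printf "start tags: %d\n" n
```

Because only one event is held at a time, a loop like this does not
need memory proportional to the size of the document.
<p>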

More about event parsing can be found in <a href="Intro_events.html"><code class="code"><span class="constructor">Intro_events</span></code></a>.
<p>

<a name="lowprofile"></a>
<h3>Low-profile trees</h3>
<p>

When the tree classes in <a href="Pxp_document.html"><code class="code"><span class="constructor">Pxp_document</span></code></a> are too much overhead,
it is easily possible to define a specially crafted tree data type, and
to transform the event-parsed document into such trees. For example,
consider this cute definition:
<p>

<pre></pre><code class="code"><span class="keyword">type</span>&nbsp;tree&nbsp;=<br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Element</span>&nbsp;<span class="keyword">of</span>&nbsp;string&nbsp;*&nbsp;(string&nbsp;*&nbsp;string)&nbsp;list&nbsp;*&nbsp;tree&nbsp;list<br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Data</span>&nbsp;<span class="keyword">of</span>&nbsp;string<br>
</code><pre></pre>
<p>
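Values of such a type are processed with ordinary pattern matching.
As a small sketch in plain O'Caml (no PXP functions involved), the
following block repeats the type so that it compiles on its own, and
adds two illustrative helpers, <code class="code">text_of_tree</code> and <code class="code">count_elements</code>,
whose names are made up for this example.

```ocaml
(* The low-profile tree type, repeated so this block is self-contained. *)
type tree =
  | Element of string * (string * string) list * tree list
  | Data of string

(* Concatenate all character data below a node, in document order. *)
let rec text_of_tree = function
  | Data s -> s
  | Element (_, _, children) ->
      String.concat "" (List.map text_of_tree children)

(* Count the element nodes in a tree, including the node itself. *)
let rec count_elements = function
  | Data _ -> 0
  | Element (_, _, children) ->
      List.fold_left (fun n c -> n + count_elements c) 1 children

(* Hand-built sample corresponding to <x><y id="a1">foo</y>bar</x>. *)
let sample =
  Element ("x", [], [ Element ("y", [ ("id", "a1") ], [ Data "foo" ]); Data "bar" ])

let () =
  Printf.printf "%s (%d elements)\n" (text_of_tree sample) (count_elements sample)
```

<p>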

A tree node is either an <code class="code"><span class="constructor">Element</span>(name,atts,children)</code> or a 
<code class="code"><span class="constructor">Data</span>(text)</code> node. Now we event-parse the XML file:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;config&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.default_config<br>
<span class="keyword">let</span>&nbsp;source&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.from_file&nbsp;<span class="string">"filename.xml"</span><br>
<span class="keyword">let</span>&nbsp;entmng&nbsp;=&nbsp;<span class="constructor">Pxp_ev_parser</span>.create_entity_manager&nbsp;config&nbsp;source<br>
<span class="keyword">let</span>&nbsp;entry&nbsp;=&nbsp;<span class="keywordsign">`</span><span class="constructor">Entry_document</span>&nbsp;[]<br>
<span class="keyword">let</span>&nbsp;next&nbsp;=&nbsp;<span class="constructor">Pxp_ev_parser</span>.create_pull_parser&nbsp;config&nbsp;entry&nbsp;entmng<br>
</code><pre></pre>
<p>

Finally, here is a function <code class="code">build_tree</code> that calls the <code class="code">next</code> function to
build our low-profile tree:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;<span class="keyword">rec</span>&nbsp;build_tree()&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">match</span>&nbsp;next()&nbsp;<span class="keyword">with</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;(<span class="constructor">E_start_tag</span>(name,atts,_,_))&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;children&nbsp;=&nbsp;build_children&nbsp;[]&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;tree&nbsp;=&nbsp;<span class="constructor">Element</span>(name,atts,children)&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;skip_rest();<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;tree<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;(<span class="constructor">E_error</span>&nbsp;e)&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;raise&nbsp;e<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;_&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;build_tree()<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">None</span>&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">assert</span>&nbsp;<span class="keyword">false</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>
<br>
<span class="keyword">and</span>&nbsp;build_node()&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">match</span>&nbsp;next()&nbsp;<span class="keyword">with</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;(<span class="constructor">E_char_data</span>&nbsp;data)&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">Some</span>(<span class="constructor">Data</span>&nbsp;data)<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;(<span class="constructor">E_start_tag</span>(name,atts,_,_))&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;children&nbsp;=&nbsp;build_children&nbsp;[]&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">Some</span>(<span class="constructor">Element</span>(name,atts,children))<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;(<span class="constructor">E_end_tag</span>(_,_))&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">None</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;(<span class="constructor">E_error</span>&nbsp;e)&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;raise&nbsp;e<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;_&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;build_node()<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">None</span>&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">assert</span>&nbsp;<span class="keyword">false</span><br>
<br>
<span class="keyword">and</span>&nbsp;build_children&nbsp;l&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">match</span>&nbsp;build_node()&nbsp;<span class="keyword">with</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;n&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;build_children&nbsp;(n&nbsp;::&nbsp;l)<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">None</span>&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;<span class="constructor">List</span>.rev&nbsp;l<br>
&nbsp;&nbsp;&nbsp;&nbsp;<br>
<span class="keyword">and</span>&nbsp;skip_rest()&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">match</span>&nbsp;next()&nbsp;<span class="keyword">with</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;<span class="constructor">E_end_of_stream</span>&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;()<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;(<span class="constructor">E_error</span>&nbsp;e)&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;raise&nbsp;e<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;_&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;skip_rest()<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">None</span>&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">assert</span>&nbsp;<span class="keyword">false</span><br>
</code><pre></pre>
<p>

Of course, all of this is only reasonable in well-formedness mode,
as PXP's validation routines depend on the built-in tree representation
of <a href="Pxp_document.html"><code class="code"><span class="constructor">Pxp_document</span></code></a>.
<p>

<a name="nodetypes"></a>
<h3>Choosing the node types to represent</h3>
<p>

By default, PXP only represents element and data nodes (both in the
normal tree representation and in the event stream). It is possible
to enable more node types:
<p>
<ul>
<li><b>Comment</b> nodes are created for XML comments. In the tree 
  representation, the node type <code class="code"><span class="constructor">T_comment</span></code> is used for them.
  In the event stream, the event type <code class="code"><span class="constructor">E_comment</span></code> is used.</li>
<li><b>Processing instruction</b> nodes are created for processing
  instructions (PIs) occurring in the normal XML flow (i.e. outside
  of DTDs). In the tree representation, the <code class="code"><span class="constructor">T_pinstr</span></code> node type
  is used, and in the event stream, the event type <code class="code"><span class="constructor">E_pinstr</span></code> is
  used.</li>
<li>The <b>super root node</b> can be put at the top of the tree, so that
  the top-most element is a child of this node. This can be reasonable
  especially when comment nodes and PI nodes are also enabled, because
  when these nodes surround the top-most element they also become children
  of the super root node. In the tree representation, the <code class="code"><span class="constructor">T_super_root</span></code>
  node type is used, and in the event stream, the event type <code class="code"><span class="constructor">E_start_super</span></code>
  marks the beginning of this node, and <code class="code"><span class="constructor">E_end_super</span></code> marks the end of 
  this node.</li>
</ul>

These node types are enabled in the <code class="code">config</code> record, e.g.
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;config&nbsp;=<br>
&nbsp;&nbsp;{&nbsp;<span class="constructor">Pxp_types</span>.default_config<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">with</span>&nbsp;enable_comment_nodes&nbsp;=&nbsp;<span class="keyword">true</span>;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;enable_pinstr_nodes&nbsp;=&nbsp;<span class="keyword">true</span>;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;enable_super_root_node&nbsp;=&nbsp;<span class="keyword">true</span>&nbsp;<br>
&nbsp;&nbsp;}<br>
</code><pre></pre>
<p>

Note that the "super root node" is sometimes called "root node" in
various XML standards that define a semantic model of XML.  For PXP the
name "super root node" is preferred because this node type is not
obligatory, and the top-most element node can also be considered as
root of the tree.
<p>

<a name="whitespace"></a>
<h3>Controlling whitespace</h3>
<p>

Depending on the mode, PXP applies some automatic whitespace rules. The
user can call functions to reduce whitespace even more.
<p>

In <b>validating mode</b>, there are whitespace rules for data nodes and
for attributes (the latter are described below). In this mode an
element <code class="code">x</code> can be declared such that a regular expression describes the
permitted children.  For instance,
<p>

<pre></pre><code class="code">&nbsp;&lt;!<span class="constructor">ELEMENT</span>&nbsp;x&nbsp;(y,z)&gt;&nbsp;</code><pre></pre>
<p>

is such a declaration, meaning that <code class="code">x</code> may only have <code class="code">y</code> and <code class="code">z</code>
as children, exactly in this order, as in
<p>

<pre></pre><code class="code">&nbsp;&lt;x&gt;&lt;y&gt;why&lt;/y&gt;&lt;z&gt;zet&lt;/z&gt;&lt;/x&gt;&nbsp;</code><pre></pre>
<p>

XML, however, allows whitespace to be added to make such terms more
readable, as in
<p>

<pre></pre><code class="code">&nbsp;<br>
&lt;x&gt;<br>
&nbsp;&nbsp;&lt;y&gt;why&lt;/y&gt;<br>
&nbsp;&nbsp;&lt;z&gt;zet&lt;/z&gt;<br>
&lt;/x&gt;&nbsp;<br>
</code><pre></pre>
<p>

The additional whitespace should not, however, appear as children of
node <code class="code">x</code>, because it is considered a purely notational improvement
without impact on semantics. By default, PXP does not create data nodes
for such notational whitespace. It is possible to disable the
suppression of this type of whitespace by setting
<code class="code">drop_ignorable_whitespace</code> to <code class="code"><span class="keyword">false</span></code>:
<p>

<pre></pre><code class="code">&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;config&nbsp;=<br>
&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;<span class="constructor">Pxp_types</span>.default_config&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">with</span>&nbsp;drop_ignorable_whitespace&nbsp;=&nbsp;<span class="keyword">false</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;}<br>
</code><pre></pre>
<p>

In <b>well-formedness mode</b>, there is no such feature because element
declarations are ignored.
<p>

Note that although in <b>event mode</b> the parser is restricted to
well-formedness parsing, it is still possible to get the effect of
<code class="code">drop_ignorable_whitespace</code>. See
<a href="Pxp_event.html#VALdrop_ignorable_whitespace_filter"><code class="code"><span class="constructor">Pxp_event</span>.drop_ignorable_whitespace_filter</code></a> for how to selectively
enable this validation feature.
<p>

The other whitespace rules apply to attributes. In <b>all modes</b> line
breaks in attribute values are converted to spaces. That means <code class="code">a1</code>
and <code class="code">a2</code> have identical values:
<p>

<pre></pre><code class="code">&lt;x&nbsp;a1=<span class="string">"1&nbsp;2"</span>&nbsp;a2=<span class="string">"1\n2"</span>&nbsp;a3=<span class="string">"1&amp;#10;2"</span>/&gt;<br>
</code><pre></pre>
<p>

It is possible to suppress this conversion by using <code class="code"><span class="keywordsign">&amp;</span><span class="keywordsign">#</span>10;</code> as line
separator, as in <code class="code">a3</code>, which truly includes a line-feed character.
<p>
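The rule for literal line breaks can be modelled in a few lines. This is only a toy sketch, not PXP's code: the function name <code class="code">normalize_literal_line_breaks</code> is made up for illustration, and the sketch assumes that line ends have already been normalized to a single line-feed character. Character references are deliberately left out, since they are expanded separately and are not subject to this conversion.
<p>

```ocaml
(* Toy model of the rule described above, not PXP's implementation.
   Assumption: line ends were already normalized to '\n'. *)
let normalize_literal_line_breaks (raw : string) : string =
  String.map (function '\n' -> ' ' | c -> c) raw

(* The literal values of a1 and a2 from the example above. *)
let a1 = normalize_literal_line_breaks "1 2"
let a2 = normalize_literal_line_breaks "1\n2"

(* After the conversion both attributes carry the same value. *)
let same_value = (a1 = a2)
```

<p>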

In <b>validating mode</b> only, there are further rules because attributes
are declared. If the attribute is declared with a list value
(<code class="code"><span class="constructor">IDREFS</span></code>, <code class="code"><span class="constructor">ENTITIES</span></code>, or <code class="code"><span class="constructor">NMTOKENS</span></code>), any amount of whitespace can
be used to separate the list elements. PXP returns the value as
<code class="code"><span class="constructor">Valuelist</span> l</code> where <code class="code">l</code> is an O'Caml list of strings.
<p>
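As a rough illustration of what a list value means, the following sketch splits a raw attribute string at runs of whitespace. The type <code class="code">att_value</code> is redeclared locally so that the sketch compiles without PXP; it is meant to mirror the attribute value type of PXP as we understand it. The helpers <code class="code">tokens</code> and <code class="code">strings_of_att_value</code> are hypothetical names and not part of PXP.
<p>

```ocaml
(* Local mirror of PXP's attribute value type, declared here only so that
   the sketch is self-contained. *)
type att_value =
  | Value of string
  | Valuelist of string list
  | Implied_value

(* Hypothetical helper: split a raw string at runs of whitespace. *)
let tokens (raw : string) : string list =
  String.map (function '\t' | '\n' | '\r' -> ' ' | c -> c) raw
  |> String.split_on_char ' '
  |> List.filter (fun s -> String.length s > 0)

(* How code consuming attribute values might treat the three cases. *)
let strings_of_att_value = function
  | Value s -> [ s ]
  | Valuelist l -> l
  | Implied_value -> []
```

With these definitions, <code class="code">Valuelist (tokens raw)</code> corresponds to what one would expect for an attribute declared as a list type, whatever amount of whitespace separates the tokens.
<p>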

If the <b>tree representation</b> is chosen, the function
<a href="Pxp_document.html#VALstrip_whitespace"><code class="code"><span class="constructor">Pxp_document</span>.strip_whitespace</code></a> can be called to reduce the amount
of whitespace in data nodes.
<p>

<a name="idcheck"></a>
<h3>Checking the <code class="code"><span class="constructor">ID</span></code> consistency and looking up nodes by <code class="code"><span class="constructor">ID</span></code></h3>
<p>

In XML it is possible to identify elements by giving them an <code class="code"><span class="constructor">ID</span></code>
attribute. This requires a DTD, and could be done with declarations
like
<p>

<pre></pre><code class="code">&nbsp;&nbsp;&lt;!<span class="constructor">ATTLIST</span>&nbsp;x&nbsp;id&nbsp;<span class="constructor">ID</span>&nbsp;<span class="keywordsign">#</span><span class="constructor">REQUIRED</span>&gt;<br>
</code><pre></pre>
<p>

meaning that element <code class="code">x</code> has a mandatory attribute <code class="code">id</code> with the special
<code class="code"><span class="constructor">ID</span></code> property: Every node must have a unique <code class="code">id</code> value.
<p>

In the same context, it is possible to declare attributes as references
to other nodes, expressed by denoting the <code class="code">id</code> of the other node:
<p>

<pre></pre><code class="code">&nbsp;&nbsp;&lt;!<span class="constructor">ATTLIST</span>&nbsp;y&nbsp;r&nbsp;<span class="constructor">IDREF</span>&nbsp;<span class="keywordsign">#</span><span class="constructor">IMPLIED</span>&gt;<br>
</code><pre></pre>
<p>

Here, the (optional) attribute <code class="code">r</code> of <code class="code">y</code> is a reference to another node.
It is only allowed to put identifiers into such attributes that also
occur in the <code class="code"><span class="constructor">ID</span></code> of another node.
<p>

<b>By default, PXP checks neither the uniqueness of <code class="code"><span class="constructor">ID</span></code>-declared 
attributes nor the existence of the nodes referenced by <code class="code"><span class="constructor">IDREF</span></code>-declared
attributes.</b> In tree mode, however, it is possible to enable these checks.
<p>

For that purpose, one has to create a <a href="Pxp_tree_parser.index.html"><code class="code"><span class="constructor">Pxp_tree_parser</span>.index</code></a>. If it is
passed to the parser function, the parser adds the <code class="code"><span class="constructor">ID</span></code>-values of all
nodes to the index, and checks whether every <code class="code"><span class="constructor">ID</span></code> value is unique.
Additionally, when one enables <code class="code">idref_pass</code>, the parser also checks
whether <code class="code"><span class="constructor">IDREF</span></code> attributes only point to existing nodes. The code:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;config&nbsp;=&nbsp;{&nbsp;<span class="constructor">Pxp_types</span>.default_config&nbsp;<span class="keyword">with</span>&nbsp;idref_pass&nbsp;=&nbsp;<span class="keyword">true</span>&nbsp;}<br>
<span class="keyword">let</span>&nbsp;spec&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.default_spec<br>
<span class="keyword">let</span>&nbsp;source&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.from_file&nbsp;<span class="string">"filename.xml"</span><br>
<span class="keyword">let</span>&nbsp;hash_index&nbsp;=&nbsp;<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_tree_parser</span>.hash_index<br>
<span class="keyword">let</span>&nbsp;id_index&nbsp;=&nbsp;(hash_index&nbsp;:&gt;&nbsp;_&nbsp;<span class="constructor">Pxp_tree_parser</span>.hash_index)<br>
<span class="keyword">let</span>&nbsp;doc&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.parse_document_entity&nbsp;~id_index&nbsp;config&nbsp;source&nbsp;spec<br>
</code><pre></pre>
<p>

The difference between <code class="code">hash_index</code> and <code class="code">id_index</code> is that the former
object has one additional method <code class="code">index</code> returning the whole index.
<p>

The <code class="code">id_index</code> may also be useful after the document has been parsed.
The code processing the parsed document can take advantage of it by
looking up nodes in it. For example, to find the node identified
by "foo", one can call
<p>

<pre></pre><code class="code">&nbsp;id_index&nbsp;<span class="keywordsign">#</span>&nbsp;find&nbsp;<span class="string">"foo"</span>&nbsp;</code><pre></pre>
<p>

which either returns this node, or raises <code class="code"><span class="constructor">Not_found</span></code>.
<p>
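If an exception is inconvenient, the lookup can be wrapped so that it returns an option instead. The sketch below is an assumption-laden illustration: <code class="code">find_opt</code> is a hypothetical helper that works with any object having a <code class="code">find</code> method that raises <code class="code">Not_found</code>, and <code class="code">demo_index</code> is only a stand-in so that the code runs without PXP.
<p>

```ocaml
(* Hypothetical helper: turn a Not_found-raising lookup into an option.
   It accepts any object with a [find] method. *)
let find_opt index id =
  try Some (index # find id) with Not_found -> None

(* Stand-in for an ID index, used only to exercise the helper without PXP.
   Strings take the place of nodes. *)
let demo_index =
  object
    val tbl : (string, string) Hashtbl.t = Hashtbl.create 10
    method add (id : string) (node : string) = Hashtbl.replace tbl id node
    method find (id : string) : string = Hashtbl.find tbl id
  end

let () = demo_index # add "foo" "node with id foo"

let found = find_opt demo_index "foo"
let missing = find_opt demo_index "bar"
```

<p>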

Note that the <code class="code">id_index</code> is not automatically updated when the parsed
tree is modified.
<p>

<a name="findelements"></a>
<h3>Finding nodes by element names</h3>
<p>

While we are at it: PXP does not maintain indexes of any kind. Unlike in
other tree representations, there is no index of elements that would
help one to quickly find elements by their names. The reason for this
omission is that such indexes need to be updated when the tree is
modified, and these updates can be quite expensive operations.
<p>

The <code class="code"><span class="constructor">ID</span></code> index explained in the last section is not automatically
updated, and it has only been added to comply fully with the XML
standard (which demands <code class="code"><span class="constructor">ID</span></code> checking).
<p>

Nevertheless, one can easily define indexes of one's own (and for
the advanced programmer it might be an interesting task to develop
an extension module to PXP that generically solves this problem).
For instance, here is an index of elements:
<p>

<pre></pre><code class="code">&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;index&nbsp;=&nbsp;<span class="constructor">Hashtbl</span>.create&nbsp;50<br>
<br>
&nbsp;&nbsp;<span class="constructor">Pxp_document</span>.iter_tree<br>
&nbsp;&nbsp;&nbsp;&nbsp;~pre:(<span class="keyword">fun</span>&nbsp;node&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">match</span>&nbsp;node&nbsp;<span class="keyword">with</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">T_element</span>&nbsp;name&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;<span class="constructor">Hashtbl</span>.add&nbsp;index&nbsp;name&nbsp;node<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;_&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;()<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)<br>
&nbsp;&nbsp;&nbsp;&nbsp;doc<span class="keywordsign">#</span>root<br>
</code><pre></pre>
<p>

Now, <code class="code"><span class="constructor">Hashtbl</span>.find</code> can be used to get the last occurrence, and
<code class="code"><span class="constructor">Hashtbl</span>.find_all</code> to get all occurrences.
<p>
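The lookup behaviour relies on the way <code class="code"><span class="constructor">Hashtbl</span>.add</code> treats repeated keys: an earlier binding is kept and merely shadowed. The following sketch uses only the standard library; plain strings stand in for PXP nodes, and the element names are invented for illustration.
<p>

```ocaml
(* Standard-library demonstration; the strings stand in for PXP nodes. *)
let index : (string, string) Hashtbl.t = Hashtbl.create 50

let () =
  (* Hashtbl.add keeps earlier bindings of the same key and shadows them. *)
  Hashtbl.add index "para" "first para";
  Hashtbl.add index "para" "second para";
  Hashtbl.add index "title" "the title"

(* The most recently added binding is returned. *)
let last_para = Hashtbl.find index "para"

(* All bindings are returned, most recently added first. *)
let all_paras = Hashtbl.find_all index "para"
```

Because the tree is traversed in document order, the most recently added binding corresponds to the last occurrence of the element in the document.
<p>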

If it is not worthwhile to build an index, one can also call
the functions <a href="Pxp_document.html#VALfind_element"><code class="code"><span class="constructor">Pxp_document</span>.find_element</code></a> and 
<a href="Pxp_document.html#VALfind_all_elements"><code class="code"><span class="constructor">Pxp_document</span>.find_all_elements</code></a>, but these functions rely on
linear searching.
<p>

<a name="sources"></a>
<h3>Specifying sources</h3>
<p>

The <a href="Pxp_types.html#TYPEsource"><code class="code"><span class="constructor">Pxp_types</span>.source</code></a> says where the data to parse comes from.  The
task of the <code class="code">source</code> is more complex than it looks at first glance,
as it not only says where the initially parsed entity comes from, but
also from where further entities can be loaded that are referenced and
included by the first one.
<p>

The function <a href="Pxp_types.html#VALfrom_file"><code class="code"><span class="constructor">Pxp_types</span>.from_file</code></a> mentioned above allows all files
to be opened as entities, and maps the <code class="code"><span class="constructor">SYSTEM</span></code> identifiers to file
names. It is very powerful.
<p>

There are three more <code class="code">from_*</code> functions:<ul>
<li><a href="Pxp_types.html#VALfrom_string"><code class="code"><span class="constructor">Pxp_types</span>.from_string</code></a> gets the data from a string</li>
<li><a href="Pxp_types.html#VALfrom_channel"><code class="code"><span class="constructor">Pxp_types</span>.from_channel</code></a> gets the data from an <code class="code">in_channel</code></li>
<li><a href="Pxp_types.html#VALfrom_obj_channel"><code class="code"><span class="constructor">Pxp_types</span>.from_obj_channel</code></a> gets the data from an <code class="code">in_obj_channel</code>
  (an Ocamlnet definition)</li>
</ul>

These three variants differ from <code class="code">from_file</code> in that <b>only
one</b> entity can be parsed at all (unless one passes alternate resolvers
to them). This means that the initially parsed
entity cannot include data from another entity. Example code:
<p>

<pre></pre><code class="code">&nbsp;<span class="keyword">let</span>&nbsp;source&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.from_string&nbsp;<span class="string">"&lt;?xml&nbsp;version='1.0'?&gt;&lt;foo/&gt;"</span>&nbsp;</code><pre></pre>
<p>

So the <code class="code">source</code> mechanism has these limitations:<ul>
<li>The <a href="Pxp_types.html#VALfrom_file"><code class="code"><span class="constructor">Pxp_types</span>.from_file</code></a> function allows one to read from all
  files by using <code class="code"><span class="constructor">SYSTEM</span></code> URLs of the form <code class="code">file:///path</code>. It is
  not possible to restrict the file access in any way. There is no support
  for <code class="code"><span class="constructor">PUBLIC</span></code> identifiers.</li>
<li>The other functions like <a href="Pxp_types.html#VALfrom_string"><code class="code"><span class="constructor">Pxp_types</span>.from_string</code></a> allow one to
  parse data coming from anywhere, but it is not possible to access
  any files (as it is not possible to open any further external entity).</li>
</ul>

There is the <a href="Pxp_reader.html"><code class="code"><span class="constructor">Pxp_reader</span></code></a> module with a very powerful abstraction
called <a href="Pxp_reader.resolver.html"><code class="code"><span class="constructor">Pxp_reader</span>.resolver</code></a>. There are resolvers for files, for
alternate resources like data channels, and there is the possibility
of building more complex resolvers by composing simpler ones.
<p>

Please see <a href="Pxp_reader.html"><code class="code"><span class="constructor">Pxp_reader</span></code></a> and <a href="Intro_resolution.html"><code class="code"><span class="constructor">Intro_resolution</span></code></a> for deeper explanations.
Here are the most important recipes to use this advanced mechanism:
<p>

<b>Read from files, and define a catalog of exceptions:</b>
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;catalog&nbsp;=<br>
&nbsp;<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_reader</span>.lookup_id_as_file<br>
&nbsp;&nbsp;[&nbsp;<span class="constructor">System</span>(<span class="string">"http://foo.org/our.dtd"</span>),&nbsp;<span class="string">"/usr/share/foo.org/out.dtd"</span>;<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">Public</span>(<span class="string">"-//W3C//DTD&nbsp;XHTML&nbsp;1.0&nbsp;Strict//EN"</span>,<span class="string">""</span>),&nbsp;<span class="string">"/home/stuff/xhtml_strict.dtd"</span><br>
&nbsp;&nbsp;]<br>
<span class="keyword">let</span>&nbsp;source&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.from_file&nbsp;~alt:[catalog]&nbsp;<span class="string">"filename.xml"</span><br>
</code><pre></pre>
<p>

This allows one to open all local files using the <code class="code">file:///path</code> 
URLs, but also maps the <code class="code"><span class="constructor">SYSTEM</span></code> ID "http://foo.org/our.dtd" and
the <code class="code"><span class="constructor">PUBLIC</span></code> ID "-//W3C//DTD XHTML 1.0 Strict//EN" to local files.
<p>

There is also <a href="Pxp_reader.lookup_id_as_string.html"><code class="code"><span class="constructor">Pxp_reader</span>.lookup_id_as_string</code></a> mapping to strings.
<p>

<b>Read from files, but restrict access, and map URLs</b>
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;resolver&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_reader</span>.rewrite_system_id<br>
&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;<span class="string">"http://foo.org/"</span>,&nbsp;<span class="string">"file:///usr/share/foo.org"</span>;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="string">"file:///"</span>,&nbsp;<span class="string">"file:///home/stuff/localxml"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;(<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_reader</span>.resolve_as_file())<br>
<span class="keyword">let</span>&nbsp;file_url&nbsp;=&nbsp;<span class="constructor">Pxp_reader</span>.make_file_url&nbsp;<span class="string">"filename.xml"</span><br>
<span class="keyword">let</span>&nbsp;source&nbsp;=&nbsp;<span class="constructor">ExtID</span>(<span class="constructor">System</span>((<span class="constructor">Neturl</span>.string_of_url&nbsp;file_url),&nbsp;resolver)<br>
</code><pre></pre>
<p>

This allows one to open entities from the whole <code class="code">http://foo.org/</code>
hierarchy. The data is not downloaded by HTTP, however, but is instead
assumed to reside in the local directory hierarchy 
<code class="code">/usr/share/foo.org</code>. Also, the whole <code class="code">file:///</code> hierarchy is
re-rooted to <code class="code">/home/stuff/localxml</code>. As the URLs are normalized
before any access is tried, this scheme protects the other parts of
the file system from access (i.e. one cannot escape from the
new root by "..").
<p>

In order to combine with a <code class="code">catalog</code> as defined above, use
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;resolver&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_reader</span>.combine<br>
&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;catalog;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_reader</span>.rewrite_system_id&nbsp;...<br>
&nbsp;&nbsp;&nbsp;&nbsp;]<br>
</code><pre></pre>
<p>

<b>Virtual entity hierarchy</b>
<p>

Given we have the three identifiers <ul>
<li><code class="code">http://<span class="keyword">virtual</span>.com/f1.xml</code> </li>
<li><code class="code">http://<span class="keyword">virtual</span>.com/f2.xml</code> </li>
<li><code class="code">http://<span class="keyword">virtual</span>.com/f3.xml</code> </li>
</ul>

and the documents behind these identifiers include each other by using relative <code class="code"><span class="constructor">SYSTEM</span></code> IDs,
and we have O'Caml strings <code class="code">f1_xml</code>, <code class="code">f2_xml</code>, and <code class="code">f3_xml</code> with the
contents, we want to make the <code class="code"><span class="keyword">virtual</span>.com</code> hierarchy available
while parsing from a string <code class="code">s</code>.
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;resolver&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_reader</span>.norm_system_id<br>
&nbsp;&nbsp;&nbsp;&nbsp;(<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_reader</span>.lookup_id_as_string<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;<span class="string">"http://virtual.com/f1.xml"</span>;&nbsp;f1_xml;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="string">"http://virtual.com/f2.xml"</span>;&nbsp;f2_xml;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="string">"http://virtual.com/f3.xml"</span>;&nbsp;f3_xml<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;)<br>
<span class="keyword">let</span>&nbsp;source&nbsp;=&nbsp;<span class="constructor">Pxp_types</span>.from_string&nbsp;~alt:[resolver]&nbsp;s<br>
</code><pre></pre>
<p>

The trick is <a href="Pxp_reader.norm_system_id.html"><code class="code"><span class="constructor">Pxp_reader</span>.norm_system_id</code></a>. This class makes it possible
for the three listed documents to refer to each other by relative
URL. Without the <code class="code"><span class="constructor">SYSTEM</span></code> ID normalization, these documents can only be
opened when the referenced URL is exactly the one mentioned in the
catalog.
<p>

<a name="codewriter"></a>
<h3>Embedding large constant XML in source code</h3>
<p>

Sometimes one needs to embed XML files into source code. For small files
this is no problem at all: just define them as string literals
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;s&nbsp;=&nbsp;<span class="string">"&lt;?xml?&gt;&nbsp;..."</span><br>
</code><pre></pre>
<p>

and parse the strings on demand, using the <a href="Pxp_types.html#VALfrom_string"><code class="code"><span class="constructor">Pxp_types</span>.from_string</code></a>
source. For larger files, the disadvantage of this approach is that
the whole document has to be parsed again for every run of the
program. There is an efficient way of avoiding that.
<p>

The <a href="Pxp_codewriter.html"><code class="code"><span class="constructor">Pxp_codewriter</span></code></a> module provides a function 
<a href="Pxp_codewriter.html#VALwrite_document"><code class="code"><span class="constructor">Pxp_codewriter</span>.write_document</code></a> that takes an already parsed XML tree
and writes O'Caml code as output that will create the tree again when
executed. This can be used as follows:<ul>
<li>Write a helper application <code class="code">generate</code> that parses the XML file with
  the required configuration options and that outputs the O'Caml code
  for this file using <a href="Pxp_codewriter.html"><code class="code"><span class="constructor">Pxp_codewriter</span></code></a></li>
<li>In the real program that needs to operate on the XML document
  reconstruct the document by running the generated code. Use the same
  configuration options as in <code class="code">generate</code></li>
</ul>

There is also <a href="Pxp_marshal.html"><code class="code"><span class="constructor">Pxp_marshal</span></code></a> for marshalling XML trees. The codewriter
module uses it.
<p>

<a name="prepro"></a>
<h3>Using the preprocessor to create XML trees</h3>
<p>

One way of creating XML trees programmatically is to call the <code class="code">create_*</code>
functions in <a href="Pxp_document.html"><code class="code"><span class="constructor">Pxp_document</span></code></a>, e.g. <a href="Pxp_document.html#VALcreate_element_node"><code class="code"><span class="constructor">Pxp_document</span>.create_element_node</code></a>.
However, this looks ugly, e.g. for creating <code class="code">&lt;x&gt;&lt;y&gt;foo&lt;/y&gt;&lt;/x&gt;</code> one ends
up with
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;tree&nbsp;=<br>
&nbsp;&nbsp;<span class="constructor">Pxp_document</span>.create_element_node&nbsp;spec&nbsp;dtd&nbsp;<span class="string">"x"</span>&nbsp;[]<br>
<span class="keyword">let</span>&nbsp;y&nbsp;=<br>
&nbsp;&nbsp;<span class="constructor">Pxp_document</span>.create_element_node&nbsp;spec&nbsp;dtd&nbsp;<span class="string">"y"</span>&nbsp;[]<br>
<span class="keyword">let</span>&nbsp;data&nbsp;=<br>
&nbsp;&nbsp;<span class="constructor">Pxp_document</span>.create_data_node&nbsp;spec&nbsp;dtd&nbsp;<span class="string">"foo"</span><br>
y&nbsp;<span class="keywordsign">#</span>&nbsp;append_node&nbsp;data;<br>
tree&nbsp;<span class="keywordsign">#</span>&nbsp;append_node&nbsp;y<br>
</code><pre></pre>
<p>

It is easier to use the PXP preprocessor, a camlp4 extension of the
O'Caml syntax. It simplifies the above code to (line breaks are
optional):
<p>

<pre></pre><code class="code">&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;tree&nbsp;=<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;x&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;y&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="string">"foo"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

For more about the preprocessor, see <a href="Intro_preprocessor.html"><code class="code"><span class="constructor">Intro_preprocessor</span></code></a>.
<p>

<a name="namespaces"></a>
<h3>Namespaces</h3>
<p>

PXP supports namespaces, but<ul>
<li>this has to be enabled explicitly, and</li>
<li>the way of processing namespaces is different from what parsers
  do that output DOM trees</li>
</ul>

<b>How to enable namespace processing.</b> Depending on the mode different
things have to be done. In any case a namespace manager is required, and it 
has to be made available to PXP in the <code class="code">config</code> record:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;m&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_namespace_manager()<br>
<br>
<span class="keyword">let</span>&nbsp;config&nbsp;=<br>
&nbsp;&nbsp;{&nbsp;<span class="constructor">Pxp_types</span>.default_config<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">with</span>&nbsp;enable_namespace_processing&nbsp;=&nbsp;<span class="constructor">Some</span>&nbsp;m<br>
&nbsp;&nbsp;}<br>
</code><pre></pre>
<p>

In event mode, this is already enough. In tree mode, you also need to
direct PXP to use the special namespace-enabled node classes:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;spec&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.default_namespace_spec<br>
</code><pre></pre>
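<p>

The following minimal sketch puts these pieces together in tree mode. It
assumes that the PXP library is linked into the program. The sample document,
the prefix <code class="code">p</code> and the URI <code class="code">urn:example</code> are made up for
illustration. The program prints the namespace URI and the local name of the
root element:
<p>

<pre></pre><code class="code"><span class="comment">(*&nbsp;Sketch&nbsp;only:&nbsp;parse&nbsp;a&nbsp;made-up&nbsp;sample&nbsp;document&nbsp;with&nbsp;namespaces&nbsp;enabled.&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;m&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_namespace_manager()<br>
<br>
<span class="keyword">let</span>&nbsp;config&nbsp;=<br>
&nbsp;&nbsp;{&nbsp;<span class="constructor">Pxp_types</span>.default_config<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">with</span>&nbsp;<span class="constructor">Pxp_types</span>.enable_namespace_processing&nbsp;=&nbsp;<span class="constructor">Some</span>&nbsp;m<br>
&nbsp;&nbsp;}<br>
<br>
<span class="keyword">let</span>&nbsp;spec&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.default_namespace_spec<br>
<br>
<span class="keyword">let</span>&nbsp;doc&nbsp;=<br>
&nbsp;&nbsp;<span class="constructor">Pxp_tree_parser</span>.parse_wfdocument_entity<br>
&nbsp;&nbsp;&nbsp;&nbsp;config<br>
&nbsp;&nbsp;&nbsp;&nbsp;(<span class="constructor">Pxp_types</span>.from_string&nbsp;<span class="string">"&lt;p:x&nbsp;xmlns:p='urn:example'&gt;&lt;p:y&gt;foo&lt;/p:y&gt;&lt;/p:x&gt;"</span>)<br>
&nbsp;&nbsp;&nbsp;&nbsp;spec<br>
<br>
<span class="keyword">let</span>&nbsp;()&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;root&nbsp;=&nbsp;doc&nbsp;<span class="keywordsign">#</span>&nbsp;root&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;<span class="comment">(*&nbsp;These&nbsp;methods&nbsp;are&nbsp;only&nbsp;meaningful&nbsp;on&nbsp;namespace-enabled&nbsp;nodes.&nbsp;*)</span><br>
&nbsp;&nbsp;print_endline&nbsp;(root&nbsp;<span class="keywordsign">#</span>&nbsp;namespace_uri);<br>
&nbsp;&nbsp;print_endline&nbsp;(root&nbsp;<span class="keywordsign">#</span>&nbsp;localname)<br>
</code><pre></pre>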
<p>

Of course, PXP can also parse namespace directives when namespace
processing is off. In that case, however, none of the namespace-specific
node methods, such as <a href="Pxp_document.node.html#METHODnamespace_uri"><code class="code"><span class="constructor">Pxp_document</span>.node.namespace_uri</code></a>, work.
<p>

<b>Prefix normalization.</b> PXP implements a technique called prefix
normalization when processing namespaces. The namespace prefix is
the part before the colon in element and attribute names like
<code class="code">prefix:localname</code>. PXP rewrites the prefixes in the document so that
every namespace is identified by exactly one prefix. Note that this means
that elements and attributes may be renamed by the parser.
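<p>

The idea can be illustrated with a toy model that uses only the O'Caml
standard library. This is a sketch and not the PXP implementation: the names
<code class="code">table</code>, <code class="code">normprefix</code> and <code class="code">rename</code> are hypothetical, and the rules that
the first prefix seen for a URI wins and that collisions are resolved by
appending a number are assumptions made for this example only:
<p>

<pre></pre><code class="code"><span class="comment">(*&nbsp;Toy&nbsp;model&nbsp;of&nbsp;prefix&nbsp;normalization.&nbsp;This&nbsp;is&nbsp;NOT&nbsp;the&nbsp;PXP&nbsp;implementation;<br>
&nbsp;&nbsp;&nbsp;all&nbsp;names&nbsp;below&nbsp;are&nbsp;hypothetical.&nbsp;*)</span><br>
<br>
<span class="comment">(*&nbsp;Maps&nbsp;each&nbsp;namespace&nbsp;URI&nbsp;to&nbsp;its&nbsp;single&nbsp;normalized&nbsp;prefix.&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;table&nbsp;:&nbsp;(string,&nbsp;string)&nbsp;<span class="constructor">Hashtbl</span>.t&nbsp;=&nbsp;<span class="constructor">Hashtbl</span>.create&nbsp;7<br>
<br>
<span class="keyword">let</span>&nbsp;prefix_taken&nbsp;p&nbsp;=<br>
&nbsp;&nbsp;<span class="constructor">Hashtbl</span>.fold&nbsp;(<span class="keyword">fun</span>&nbsp;_&nbsp;np&nbsp;acc&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;acc&nbsp;||&nbsp;np&nbsp;=&nbsp;p)&nbsp;table&nbsp;<span class="keyword">false</span><br>
<br>
<span class="comment">(*&nbsp;The&nbsp;first&nbsp;prefix&nbsp;seen&nbsp;for&nbsp;a&nbsp;URI&nbsp;becomes&nbsp;its&nbsp;normalized&nbsp;prefix.&nbsp;If&nbsp;that<br>
&nbsp;&nbsp;&nbsp;prefix&nbsp;already&nbsp;stands&nbsp;for&nbsp;another&nbsp;URI,&nbsp;a&nbsp;number&nbsp;is&nbsp;appended.&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;normprefix&nbsp;~prefix&nbsp;~uri&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">match</span>&nbsp;<span class="constructor">Hashtbl</span>.find_opt&nbsp;table&nbsp;uri&nbsp;<span class="keyword">with</span><br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;np&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;np<br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">None</span>&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;<span class="keyword">rec</span>&nbsp;fresh&nbsp;n&nbsp;=<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;cand&nbsp;=&nbsp;<span class="keyword">if</span>&nbsp;n&nbsp;=&nbsp;0&nbsp;<span class="keyword">then</span>&nbsp;prefix&nbsp;<span class="keyword">else</span>&nbsp;prefix&nbsp;^&nbsp;string_of_int&nbsp;n&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">if</span>&nbsp;prefix_taken&nbsp;cand&nbsp;<span class="keyword">then</span>&nbsp;fresh&nbsp;(n&nbsp;+&nbsp;1)&nbsp;<span class="keyword">else</span>&nbsp;cand&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;np&nbsp;=&nbsp;fresh&nbsp;0&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">Hashtbl</span>.add&nbsp;table&nbsp;uri&nbsp;np;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;np<br>
<br>
<span class="comment">(*&nbsp;Rename&nbsp;prefix:localname,&nbsp;given&nbsp;the&nbsp;URI&nbsp;the&nbsp;prefix&nbsp;is&nbsp;bound&nbsp;to.&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;rename&nbsp;~uri&nbsp;name&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">match</span>&nbsp;<span class="constructor">String</span>.index_opt&nbsp;name&nbsp;<span class="string">':'</span>&nbsp;<span class="keyword">with</span><br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">None</span>&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;name<br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;i&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;prefix&nbsp;=&nbsp;<span class="constructor">String</span>.sub&nbsp;name&nbsp;0&nbsp;i&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;local&nbsp;=&nbsp;<span class="constructor">String</span>.sub&nbsp;name&nbsp;(i&nbsp;+&nbsp;1)&nbsp;(<span class="constructor">String</span>.length&nbsp;name&nbsp;-&nbsp;i&nbsp;-&nbsp;1)&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;normprefix&nbsp;~prefix&nbsp;~uri&nbsp;^&nbsp;<span class="string">":"</span>&nbsp;^&nbsp;local<br>
<br>
<span class="keyword">let</span>&nbsp;()&nbsp;=<br>
&nbsp;&nbsp;print_endline&nbsp;(rename&nbsp;~uri:<span class="string">"urn:example"</span>&nbsp;<span class="string">"a:x"</span>);&nbsp;&nbsp;&nbsp;<span class="comment">(*&nbsp;prints&nbsp;a:x&nbsp;*)</span><br>
&nbsp;&nbsp;print_endline&nbsp;(rename&nbsp;~uri:<span class="string">"urn:example"</span>&nbsp;<span class="string">"b:y"</span>);&nbsp;&nbsp;&nbsp;<span class="comment">(*&nbsp;prints&nbsp;a:y&nbsp;*)</span><br>
&nbsp;&nbsp;print_endline&nbsp;(rename&nbsp;~uri:<span class="string">"urn:other"</span>&nbsp;<span class="string">"a:z"</span>)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="comment">(*&nbsp;prints&nbsp;a1:z&nbsp;*)</span><br>
</code><pre></pre>
<p>

In this model, the element written as <code class="code">b:y</code> ends up as <code class="code">a:y</code>, which is the kind
of renaming the paragraph above warns about.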
<p>

For details on how prefix normalization works, see <a href="Intro_namespaces.html"><code class="code"><span class="constructor">Intro_namespaces</span></code></a>.
Namespace processing can also be combined with event-oriented
parsing, see <a href="Intro_events.html#namespaces"><i>Events and namespaces</i></a>.
<p>

<a name="spec"></a>
<h3>Specifying which classes implement nodes - the mysterious <code class="code">spec</code> parameter</h3>
<p>

For the tree representation PXP defines a set of classes implementing
the various node types. These classes, such as <code class="code">element_impl</code>, are
all defined in <a href="Pxp_document.html"><code class="code"><span class="constructor">Pxp_document</span></code></a>.
<p>

It is possible to instruct PXP to use different classes. The previous
section already showed an example of this, because for
namespace-enabled parsing a different set of node classes is used:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;spec&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.default_namespace_spec<br>
</code><pre></pre>
<p>

The mysterious <code class="code">spec</code> parameter controls which class PXP uses for
which node type. In the source code of <a href="Pxp_tree_parser.html"><code class="code"><span class="constructor">Pxp_tree_parser</span></code></a>, we find
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;default_spec&nbsp;=<br>
&nbsp;&nbsp;make_spec_from_mapping<br>
&nbsp;&nbsp;&nbsp;&nbsp;~super_root_exemplar:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span class="keyword">new</span>&nbsp;super_root_impl&nbsp;default_extension)<br>
&nbsp;&nbsp;&nbsp;&nbsp;~comment_exemplar:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span class="keyword">new</span>&nbsp;comment_impl&nbsp;default_extension)<br>
&nbsp;&nbsp;&nbsp;&nbsp;~default_pinstr_exemplar:&nbsp;&nbsp;(<span class="keyword">new</span>&nbsp;pinstr_impl&nbsp;default_extension)<br>
&nbsp;&nbsp;&nbsp;&nbsp;~data_exemplar:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span class="keyword">new</span>&nbsp;data_impl&nbsp;default_extension)<br>
&nbsp;&nbsp;&nbsp;&nbsp;~default_element_exemplar:&nbsp;(<span class="keyword">new</span>&nbsp;element_impl&nbsp;default_extension)<br>
&nbsp;&nbsp;&nbsp;&nbsp;~element_mapping:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span class="constructor">Hashtbl</span>.create&nbsp;1)<br>
&nbsp;&nbsp;&nbsp;&nbsp;()<br>
<br>
<br>
<span class="keyword">let</span>&nbsp;default_namespace_spec&nbsp;=<br>
&nbsp;&nbsp;make_spec_from_mapping<br>
&nbsp;&nbsp;&nbsp;&nbsp;~super_root_exemplar:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span class="keyword">new</span>&nbsp;super_root_impl&nbsp;default_extension)<br>
&nbsp;&nbsp;&nbsp;&nbsp;~comment_exemplar:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span class="keyword">new</span>&nbsp;comment_impl&nbsp;default_extension)<br>
&nbsp;&nbsp;&nbsp;&nbsp;~default_pinstr_exemplar:&nbsp;&nbsp;(<span class="keyword">new</span>&nbsp;pinstr_impl&nbsp;default_extension)<br>
&nbsp;&nbsp;&nbsp;&nbsp;~data_exemplar:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span class="keyword">new</span>&nbsp;data_impl&nbsp;default_extension)<br>
&nbsp;&nbsp;&nbsp;&nbsp;~default_element_exemplar:&nbsp;(<span class="keyword">new</span>&nbsp;namespace_element_impl&nbsp;default_extension)<br>
&nbsp;&nbsp;&nbsp;&nbsp;~element_mapping:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span class="constructor">Hashtbl</span>.create&nbsp;1)<br>
&nbsp;&nbsp;&nbsp;&nbsp;()<br>
</code><pre></pre>
<p>

The function <a href="Pxp_document.html#VALmake_spec_from_mapping"><code class="code"><span class="constructor">Pxp_document</span>.make_spec_from_mapping</code></a> creates a <code class="code">spec</code>
from a set of exemplar objects, which serve as constructors for the
respective node types. The namespace version of <code class="code">spec</code> differs only in
that element nodes use the special implementation
<code class="code">namespace_element_impl</code> instead of <code class="code">element_impl</code>.
<p>

One can also use this mechanism to let the parser create trees made of
customized classes. Note, however, that it is not possible to simply
create new classes by inheriting from the predefined classes and then
adding new methods. The problem is that the typing constraints of PXP
do not allow users to add methods directly to node classes. However,
there is a special built-in extension mechanism, and one can use it to
add new methods indirectly to nodes. This means these methods do not
appear directly in the class type of nodes, but in the class type of
the node extension. See <a href="Intro_extensions.html"><code class="code"><span class="constructor">Intro_extensions</span></code></a> for more about this.
<p>

<a name="2_WhatPXPcannotdoforyou"></a>
<h2>What PXP cannot do for you</h2>
<p>

Although PXP has a long list of features, there are some kinds of XML
parsing it is not designed for:
<p>
<ul>
<li>It is not possible to leave entities unresolved in the text. Whenever
  there is an <code class="code"><span class="keywordsign">&amp;</span>entity;</code> or <code class="code">%entity;</code> PXP replaces it with the definition
  of that entity. It is an error if the entity turns out to be undefined,
  and parsing is stopped with an exception.</li>
<li>It is not possible to figure out notational details of the XML text,
  such as where CDATA sections are used.</li>
<li>It is not possible to parse as much as possible of a syntactically
  wrong document and to return the parseable parts. PXP either parses the
  document completely, or it fails completely.</li>
</ul>

In effect, these restrictions make it hard to use PXP for XML editing,
but they do not otherwise limit its uses.
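<p>

In practice, a program has to treat parsing as an all-or-nothing
operation. The following minimal sketch assumes that the PXP library is
linked into the program. The helper <code class="code">parse</code> and the sample input are made up
for illustration. It converts any exception raised by the parser into an
error message:
<p>

<pre></pre><code class="code"><span class="comment">(*&nbsp;Sketch&nbsp;only:&nbsp;parse&nbsp;is&nbsp;a&nbsp;hypothetical&nbsp;helper,&nbsp;not&nbsp;part&nbsp;of&nbsp;PXP.&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;parse&nbsp;s&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">try</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;doc&nbsp;=<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">Pxp_tree_parser</span>.parse_wfdocument_entity<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">Pxp_types</span>.default_config<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span class="constructor">Pxp_types</span>.from_string&nbsp;s)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">Pxp_tree_parser</span>.default_spec<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">Ok</span>&nbsp;doc<br>
&nbsp;&nbsp;<span class="keyword">with</span>&nbsp;e&nbsp;<span class="keywordsign">-&gt;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="comment">(*&nbsp;There&nbsp;is&nbsp;no&nbsp;partial&nbsp;tree&nbsp;to&nbsp;recover;&nbsp;only&nbsp;the&nbsp;message&nbsp;is&nbsp;kept.&nbsp;*)</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">Error</span>&nbsp;(<span class="constructor">Pxp_types</span>.string_of_exn&nbsp;e)<br>
<br>
<span class="keyword">let</span>&nbsp;()&nbsp;=<br>
&nbsp;&nbsp;<span class="comment">(*&nbsp;The&nbsp;entity&nbsp;used&nbsp;here&nbsp;is&nbsp;deliberately&nbsp;left&nbsp;undefined.&nbsp;*)</span><br>
&nbsp;&nbsp;<span class="keyword">match</span>&nbsp;parse&nbsp;<span class="string">"&lt;a&gt;&amp;undefined;&lt;/a&gt;"</span>&nbsp;<span class="keyword">with</span><br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Ok</span>&nbsp;_&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;print_endline&nbsp;<span class="string">"parsed"</span><br>
&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Error</span>&nbsp;msg&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;print_endline&nbsp;(<span class="string">"PXP&nbsp;error:&nbsp;"</span>&nbsp;^&nbsp;msg)<br>
</code><pre></pre>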
<p>

<br>
</body></html>