<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<link rel="stylesheet" href="style.css" type="text/css">
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type">
<link rel="Start" href="index.html">
<link rel="previous" href="Intro_advanced.html">
<link rel="next" href="Example_readme.html">
<link rel="Up" href="index.html">
<link title="Index of types" rel=Appendix href="index_types.html">
<link title="Index of exceptions" rel=Appendix href="index_exceptions.html">
<link title="Index of values" rel=Appendix href="index_values.html">
<link title="Index of class methods" rel=Appendix href="index_methods.html">
<link title="Index of classes" rel=Appendix href="index_classes.html">
<link title="Index of class types" rel=Appendix href="index_class_types.html">
<link title="Index of modules" rel=Appendix href="index_modules.html">
<link title="Index of module types" rel=Appendix href="index_module_types.html">
<link title="Pxp_types" rel="Chapter" href="Pxp_types.html">
<link title="Pxp_document" rel="Chapter" href="Pxp_document.html">
<link title="Pxp_dtd" rel="Chapter" href="Pxp_dtd.html">
<link title="Pxp_tree_parser" rel="Chapter" href="Pxp_tree_parser.html">
<link title="Pxp_core_types" rel="Chapter" href="Pxp_core_types.html">
<link title="Pxp_ev_parser" rel="Chapter" href="Pxp_ev_parser.html">
<link title="Pxp_event" rel="Chapter" href="Pxp_event.html">
<link title="Pxp_dtd_parser" rel="Chapter" href="Pxp_dtd_parser.html">
<link title="Pxp_codewriter" rel="Chapter" href="Pxp_codewriter.html">
<link title="Pxp_marshal" rel="Chapter" href="Pxp_marshal.html">
<link title="Pxp_yacc" rel="Chapter" href="Pxp_yacc.html">
<link title="Pxp_reader" rel="Chapter" href="Pxp_reader.html">
<link title="Intro_trees" rel="Chapter" href="Intro_trees.html">
<link title="Intro_extensions" rel="Chapter" href="Intro_extensions.html">
<link title="Intro_namespaces" rel="Chapter" href="Intro_namespaces.html">
<link title="Intro_events" rel="Chapter" href="Intro_events.html">
<link title="Intro_resolution" rel="Chapter" href="Intro_resolution.html">
<link title="Intro_getting_started" rel="Chapter" href="Intro_getting_started.html">
<link title="Intro_advanced" rel="Chapter" href="Intro_advanced.html">
<link title="Intro_preprocessor" rel="Chapter" href="Intro_preprocessor.html">
<link title="Example_readme" rel="Chapter" href="Example_readme.html"><link title="The PXP Preprocessor" rel="Section" href="#1_ThePXPPreprocessor">
<link title="Creating constant XML with pxp_tree - basic syntax" rel="Subsection" href="#pxp_tree">
<link title="Dynamic XML" rel="Subsection" href="#dynamic">
<link title="Character encodings: pxp_charset" rel="Subsection" href="#pxp_charset">
<link title="Validated trees: pxp_text, calling validate, and pxp_vtree" rel="Subsection" href="#validate">
<link title="Generating events: pxp_evlist and pxp_evpull" rel="Subsection" href="#events">
<link title="Documents" rel="Subsection" href="#documents">
<link title="Namespaces" rel="Subsection" href="#namespaces">
<title>PXP Reference : Intro_preprocessor</title>
</head>
<body>
<div class="navbar"><a href="Intro_advanced.html">Previous</a>
&nbsp;<a href="index.html">Up</a>
&nbsp;<a href="Example_readme.html">Next</a>
</div>
<center><h1>Intro_preprocessor</h1></center>
<br>
<br>
<a name="1_ThePXPPreprocessor"></a>
<h1>The PXP Preprocessor</h1>
<p>

Since PXP-1.1.95, the PXP distribution includes a preprocessor.
It allows you to compose XML trees and event lists
dynamically, which is very handy for writing XML transformations.
<p>

To enable the preprocessor, compile your source files as in:
<p>

<pre></pre><code class="code">&nbsp;ocamlfind&nbsp;ocamlc&nbsp;-syntax&nbsp;camlp4o&nbsp;-package&nbsp;pxp-pp,...&nbsp;...&nbsp;</code><pre></pre>
<p>

The package <code class="code">pxp-pp</code> contains the preprocessor. The <code class="code">-syntax</code> option
enables camlp4, on which the preprocessor is based. The preprocessor can
also be used together with the revised syntax; pass <code class="code">-syntax
camlp4r</code> in this case.
<p>

In the toploop, type 
<p>

<pre></pre><code class="code">ocaml<br>
<span class="keywordsign">#</span>&nbsp;<span class="keywordsign">#</span>use&nbsp;<span class="string">"topfind"</span>;;<br>
<span class="keywordsign">#</span>&nbsp;<span class="keywordsign">#</span>camlp4o;;<br>
<span class="keywordsign">#</span>&nbsp;<span class="keywordsign">#</span>require&nbsp;<span class="string">"pxp-pp"</span>;;<br>
<span class="keywordsign">#</span>&nbsp;<span class="keywordsign">#</span>require&nbsp;<span class="string">"pxp"</span>;;<br>
</code><pre></pre>
<p>

The preprocessor defines the following new syntax notations, explained
below in detail:
<p>
<ul>
<li><code class="code">&lt;:pxp_charset&lt; <span class="constructor">CHARSET_DECL</span> &gt;&gt;</code></li>
<li><code class="code">&lt;:pxp_tree&lt; <span class="constructor">EXPR</span> &gt;&gt;</code></li>
<li><code class="code">&lt;:pxp_vtree&lt; <span class="constructor">EXPR</span> &gt;&gt;</code></li>
<li><code class="code">&lt;:pxp_evlist&lt; <span class="constructor">EXPR</span> &gt;&gt;</code></li>
<li><code class="code">&lt;:pxp_evpull&lt; <span class="constructor">EXPR</span> &gt;&gt;</code></li>
<li><code class="code">&lt;:pxp_text&lt; <span class="constructor">TEXT</span> &gt;&gt;</code></li>
</ul>

The basic notation is <code class="code">pxp_tree</code> which creates a tree of PXP document
nodes as described in EXPR. <code class="code">pxp_vtree</code> is the variant where the tree
is immediately validated - with <code class="code">pxp_tree</code> the tree is not validated,
but one can validate it later (e.g. when the whole output tree of the
program is built up).  <code class="code">pxp_evlist</code> creates a list of PXP events
instead of nodes, useful together with the event parser. <code class="code">pxp_evpull</code>
is a variation of the latter: instead of an event list, an event
generator is created that works like a pull parser.
<p>

The <code class="code">pxp_charset</code> notation only configures the character sets to
assume.  Finally, <code class="code">pxp_text</code> is a notation for string literals.
<p>

<a name="pxp_tree"></a>
<h2>Creating constant XML with <code class="code">pxp_tree</code> - basic syntax</h2>
<p>

The following examples are all written for <code class="code">pxp_tree</code>. You can also
use one of the other XML composers instead, but see the notes below
that explain a few differences.
<p>

In order to use <code class="code">pxp_tree</code>, you must define two variables in the
environment: <code class="code">spec</code> and <code class="code">dtd</code>:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;spec&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.default_spec<br>
<span class="keyword">let</span>&nbsp;dtd&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_dtd&nbsp;<span class="keywordsign">`</span><span class="constructor">Enc_iso88591</span><br>
</code><pre></pre>
<p>

These variables are assumed to exist by the generated code. The <code class="code">dtd</code>
variable is the DTD object. Note that you need it even in
well-formedness mode (validation turned off) - see the explanations in
<a href="Intro_getting_started.html#wfmode"><i>Parsing in well-formedness mode</i></a> to understand why. The <code class="code">spec</code> variable
controls which classes are instantiated as node representation. See
<a href="Intro_getting_started.html#spec"><i>Specifying which classes implement nodes - the mysterious spec parameter</i></a> for the meaning of <code class="code">spec</code>.
<p>

<a name="3_Elementsattributesanddatanodes"></a>
<h3>Elements, attributes, and data nodes</h3>
<p>

Now you can create XML trees as in:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;book&nbsp;=&nbsp;<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;book&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;title&gt;[&nbsp;<span class="string">"The&nbsp;Lord&nbsp;of&nbsp;The&nbsp;Rings"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;author&gt;[&nbsp;<span class="string">"J.R.R.&nbsp;Tolkien"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

As you can see, the syntax is somewhat XML-like but not really
XML. (Many ideas are borrowed from CDuce, by the way.) In particular,
there are start tags like <code class="code">&lt;title&gt;</code> but no end tags. Instead,
square brackets denote where the list of children of the
XML element starts and where it ends. Furthermore, character data must
be put into double quotes.
<p>

You may ask why the well-known XML syntax has been modified for this
preprocessor. There are many reasons, and they will become clearer in
the following explanations. For now, note one advantage:
the syntax is less verbose, as you need not repeat the element
names in end tags (I know programmers like brevity).  Furthermore, you
can exactly control which characters are part of the data nodes
without having to make compromises with indentation.
<p>

Attributes are written as in XML: 
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;book&nbsp;=&nbsp;<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;book&nbsp;id=<span class="string">"BOOK_001"</span>&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;title&nbsp;lang=<span class="string">"en"</span>&gt;[&nbsp;<span class="string">"The&nbsp;Lord&nbsp;of&nbsp;The&nbsp;Rings"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;author&gt;[&nbsp;<span class="string">"J.R.R.&nbsp;Tolkien"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

An element without children can be written 
<p>

<pre></pre><code class="code">&nbsp;&lt;element&gt;[]&nbsp;</code><pre></pre>
<p>

or slightly shorter: 
<p>

<pre></pre><code class="code">&nbsp;&lt;element/&gt;&nbsp;</code><pre></pre>
<p>

<a name="3_Processinginstructionsandcomments"></a>
<h3>Processing instructions and comments</h3>
<p>

You can also create processing instructions and comment nodes: 
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;list&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;list&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;!&gt;<span class="string">"Now&nbsp;the&nbsp;list&nbsp;of&nbsp;books&nbsp;follows!"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;?&gt;<span class="string">"formatter_directive"</span>&nbsp;<span class="string">"one&nbsp;book&nbsp;per&nbsp;page"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;book<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

The notation <code class="code">&lt;!&gt;</code> creates a comment node with the following string as
contents. The notation <code class="code">&lt;?&gt;</code> for constructing processing instructions
needs two strings, first the target, then the value (here, this
results in <code class="code">&lt;?formatter_directive one book per page<span class="keywordsign">?&gt;</span></code>).
<p>

Look again at the last example: The O'Caml variable <code class="code">book</code> occurs, and
it inserts its tree into the list of books. Identifiers without
"decoration" just refer to O'Caml variables. We will see more examples
below.
<p>

<a name="3_Elementswithonechild"></a>
<h3>Elements with one child</h3>
<p>

The preprocessor syntax knows a number of shortcuts and
variations. First, you can omit the square brackets when an element
has exactly one child:
<p>

<pre></pre><code class="code">&lt;element&gt;&lt;child&gt;<span class="string">"Data&nbsp;inside&nbsp;child"</span><br>
</code><pre></pre>
<p>

This is the same as 
<p>

<pre></pre><code class="code">&lt;element&gt;[&nbsp;&lt;child&gt;[&nbsp;<span class="string">"Data&nbsp;inside&nbsp;child"</span>&nbsp;]&nbsp;]<br>
</code><pre></pre>
<p>

<a name="3_Detailsofdatanodes"></a>
<h3>Details of data nodes</h3>
<p>

Second, we have already used a common abbreviation: Strings are
automatically converted to data nodes. The "expanded" syntax is
<p>

<pre></pre><code class="code">&lt;*&gt;<span class="string">"Data&nbsp;string"</span><br>
</code><pre></pre>
<p>

where <code class="code">&lt;*&gt;</code> constructs a data node with the following string
as its contents.  Usually, you can omit <code class="code">&lt;*&gt;</code>, so this is the same
as
<p>

<pre></pre><code class="code"><span class="string">"Data&nbsp;string"</span><br>
</code><pre></pre>
<p>

However, there are a few occasions where the <code class="code">&lt;*&gt;</code> notation is still
useful, see below (essentially, it also works like a type annotation:
the following subexpression must be a string).
<p>

Inside strings, the usual entity references can be used: <code class="code"><span class="keywordsign">&amp;</span>lt;</code>,
<code class="code"><span class="keywordsign">&amp;</span>gt;</code>, <code class="code"><span class="keywordsign">&amp;</span>amp;</code>, <code class="code"><span class="keywordsign">&amp;</span>quot;</code>, <code class="code"><span class="keywordsign">&amp;</span>apos;</code>, and also numeric references work:
<code class="code"><span class="keywordsign">&amp;</span><span class="keywordsign">#</span></code><i>n</i><code class="code">;</code> where <i>n</i> is a number. Note that <code class="code"><span class="keywordsign">&amp;</span>lt;</code>, <code class="code"><span class="keywordsign">&amp;</span>gt;</code>, and
<code class="code"><span class="keywordsign">&amp;</span>apos;</code> are not obligatory, as <code class="code">&lt;</code>, <code class="code">&gt;</code>, and <code class="code"><span class="keywordsign">'</span></code> can be included
directly.
<p>

Example: <code class="code"><span class="string">"Double quotes: &amp;quot;"</span></code>. For a newline character, write
<code class="code"><span class="keywordsign">&amp;</span><span class="keywordsign">#</span>10;</code>.
<p>
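
As a small illustration of these rules, the following sketch builds a
data node whose text contains double quotes, an ampersand, and a
trailing newline. The element name <code class="code">note</code> is made up for this example,
and <code class="code">spec</code> and <code class="code">dtd</code> are assumed to be defined as shown earlier.
<p>

<pre></pre><code class="code"><span class="comment">(*&nbsp;Intended&nbsp;data:&nbsp;She&nbsp;said&nbsp;"yes"&nbsp;&amp;&nbsp;left.&nbsp;followed&nbsp;by&nbsp;a&nbsp;newline&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;note&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;note&gt;<span class="string">"She&nbsp;said&nbsp;&amp;quot;yes&amp;quot;&nbsp;&amp;amp;&nbsp;left.&amp;#10;"</span><br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>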

<a name="3_Operators"></a>
<h3>Operators</h3>
<p>

The preprocessor knows two operators: <code class="code">^</code> concatenates strings, and
<code class="code">@</code> concatenates lists. Examples:
<p>

<pre></pre><code class="code">&lt;element&gt;[&nbsp;<span class="string">"Word1"</span>&nbsp;^&nbsp;<span class="string">"Word2"</span>&nbsp;]<br>
&lt;element&gt;([&nbsp;&lt;a/&gt;&nbsp;]&nbsp;@&nbsp;[&nbsp;&lt;b/&gt;&nbsp;])<br>
</code><pre></pre>
<p>

Parentheses can be used to clarify precedence. For example: 
<p>

<pre></pre><code class="code">&lt;element&gt;(l1&nbsp;@&nbsp;l2)<br>
</code><pre></pre>
<p>

Without parentheses, the concatenation operator <code class="code">@</code> would be parsed as
<p>

<pre></pre><code class="code">(&lt;element&gt;&nbsp;l1)&nbsp;@&nbsp;l2<br>
</code><pre></pre>
<p>

Parentheses may be used in any expression.
<p>

<a name="3_Superroot"></a>
<h3>Super root</h3>
<p>

There is also a rarely used notation for "super root" nodes.
For uses of this node type, see <a href="Intro_getting_started.html#nodetypes"><i>Choosing the node types to represent</i></a>.
<p>

<pre></pre><code class="code">&lt;^&gt;[&nbsp;&lt;element&gt;&nbsp;...&nbsp;]<br>
</code><pre></pre>
<p>

<a name="dynamic"></a>
<h2>Dynamic XML</h2>
<p>

This section describes how to insert dynamically created content into
XML trees.
<p>

Let us begin with an example. The task is to convert O'Caml values of
type
<p>

<pre></pre><code class="code"><span class="keyword">type</span>&nbsp;book&nbsp;=&nbsp;<br>
&nbsp;&nbsp;{&nbsp;title&nbsp;:&nbsp;string;<br>
&nbsp;&nbsp;&nbsp;&nbsp;author&nbsp;:&nbsp;string;<br>
&nbsp;&nbsp;&nbsp;&nbsp;isbn&nbsp;:&nbsp;string;<br>
&nbsp;&nbsp;}<br>
</code><pre></pre>
<p>

to XML trees like 
<p>

<pre></pre><code class="code">&nbsp;<br>
&lt;book&nbsp;id=<span class="string">"BOOK_{isbn}"</span>&gt;<br>
&nbsp;&nbsp;&lt;title&gt;{title}&lt;/title&gt;<br>
&nbsp;&nbsp;&lt;author&gt;{author}&lt;/author&gt;<br>
&lt;/book&gt;<br>
</code><pre></pre>
<p>

(conventional syntax, with placeholders in {braces}). When <code class="code">b</code> is the
book variable, the solution is
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;book&nbsp;=&nbsp;<br>
&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;title&nbsp;=&nbsp;b.title<br>
&nbsp;&nbsp;<span class="keyword">and</span>&nbsp;author&nbsp;=&nbsp;b.author<br>
&nbsp;&nbsp;<span class="keyword">and</span>&nbsp;isbn&nbsp;=&nbsp;b.isbn&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;book&nbsp;id=(<span class="string">"BOOK_"</span>&nbsp;^&nbsp;isbn)&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;title&gt;&lt;*&gt;title<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;author&gt;&lt;*&gt;author<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

First, we bind the simple O'Caml variables <code class="code">title</code>, <code class="code">author</code>, and
<code class="code">isbn</code>. The reason is that the preprocessor syntax does not allow
expressions like <code class="code">b.title</code> directly in the XML tree (but see below for
another, often better workaround).
<p>

The XML tree contains the O'Caml variables. The <code class="code">id</code> attribute is a
concatenation of the fixed prefix <code class="code"><span class="constructor">BOOK_</span></code> and the contents of
<code class="code">isbn</code>. The <code class="code">title</code> and <code class="code">author</code> elements contain a data node whose
contents are the O'Caml strings <code class="code">title</code>, and <code class="code">author</code>, respectively.
<p>

Why <code class="code">&lt;*&gt;</code>? If we just wrote <code class="code">&lt;title&gt;title</code>, the generated code would
assume that the <code class="code">title</code> variable is an XML node (of type
<a href="Pxp_document.node.html"><code class="code"><span class="constructor">Pxp_document</span>.node</code></a>), and not a string. From this point of view,
<code class="code">&lt;*&gt;</code> works like a type annotation, as it specialises the type of the
following expression.
<p>

<a name="3_Thenotation"></a>
<h3>The <code class="code">(: ... :)</code> notation</h3>
<p>

Here is an alternative solution: 
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;book&nbsp;=&nbsp;<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;book&nbsp;id=(<span class="string">"BOOK_"</span>&nbsp;^&nbsp;(:&nbsp;b.isbn&nbsp;:))&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;title&gt;&lt;*&gt;(:&nbsp;b.title&nbsp;:)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;author&gt;&lt;*&gt;(:&nbsp;b.author&nbsp;:)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

The notation <code class="code">(: ... :)</code> allows you to include arbitrary O'Caml
expressions in the tree. In this solution it is no longer necessary
to artificially create O'Caml variables for the sole purpose of
injecting values into trees.
<p>

<a name="3_Dynamicnames"></a>
<h3>Dynamic names</h3>
<p>

It is possible to create XML elements with dynamic names: Just put
parentheses around the expression. Example:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;name&nbsp;=&nbsp;<span class="string">"book"</span>&nbsp;<span class="keyword">in</span><br>
&lt;:pxp_tree&lt;&nbsp;&lt;(name)&gt;&nbsp;...&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

With the same notation, one can also set attribute names dynamically:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;att_name&nbsp;=&nbsp;<span class="string">"id"</span>&nbsp;<span class="keyword">in</span><br>
&lt;:pxp_tree&lt;&nbsp;&lt;book&nbsp;(att_name)=...&gt;&nbsp;...&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

<a name="3_Dynamicattributelists"></a>
<h3>Dynamic attribute lists</h3>
<p>

Finally, it is also possible to include complete attribute lists
dynamically:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;att_list&nbsp;=&nbsp;[&nbsp;<span class="string">"id"</span>,&nbsp;(<span class="string">"BOOK_"</span>&nbsp;^&nbsp;b.isbn)&nbsp;]&nbsp;<span class="keyword">in</span><br>
&lt;:pxp_tree&lt;&nbsp;&lt;book&nbsp;(:&nbsp;att_list&nbsp;:)&nbsp;&gt;&nbsp;...&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

Here, <code class="code">att_list</code> must be a <code class="code">(string*string) list</code> with the attributes
to include.
<p>
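
Because the attribute list is an ordinary O'Caml value, it can be
computed before the tree is built. The following is only a sketch:
the helper <code class="code">title_node</code> and its parameters <code class="code">lang_opt</code> and <code class="code">title_text</code> are
hypothetical names, and <code class="code">spec</code> and <code class="code">dtd</code> are assumed to be in scope as
described above.
<p>

<pre></pre><code class="code"><span class="comment">(*&nbsp;Hypothetical&nbsp;helper:&nbsp;the&nbsp;lang&nbsp;attribute&nbsp;is&nbsp;only&nbsp;added&nbsp;when&nbsp;given&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;title_node&nbsp;lang_opt&nbsp;title_text&nbsp;=<br>
&nbsp;&nbsp;<span class="keyword">let</span>&nbsp;att_list&nbsp;=<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">match</span>&nbsp;lang_opt&nbsp;<span class="keyword">with</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">Some</span>&nbsp;lang&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;[&nbsp;<span class="string">"lang"</span>,&nbsp;lang&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="keywordsign">|</span>&nbsp;<span class="constructor">None</span>&nbsp;<span class="keywordsign">-&gt;</span>&nbsp;[]<br>
&nbsp;&nbsp;<span class="keyword">in</span><br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;&nbsp;&lt;title&nbsp;(:&nbsp;att_list&nbsp;:)&nbsp;&gt;&lt;*&gt;title_text&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>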

<a name="3_Typing"></a>
<h3>Typing</h3>
<p>

Depending on where a variable or O'Caml expression occurs, different
types are assumed. Compare the following examples:
<p>

<pre></pre><code class="code">&lt;:pxp_tree&lt;&nbsp;&lt;element&gt;x1&nbsp;&gt;&gt;<br>
&lt;:pxp_tree&lt;&nbsp;&lt;element&gt;[x2]&nbsp;&gt;&gt;<br>
&lt;:pxp_tree&lt;&nbsp;&lt;element&gt;&lt;*&gt;x3&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

As a rule of thumb, the most general type is assumed that would make
sense at a certain location. As <code class="code">x1</code> could be replaced by a list of
children, its type is assumed to be a node list. As <code class="code">x2</code> could be
replaced by a single node, its type is assumed to be a node. And <code class="code">x3</code>
is a string; we have seen this case already.
<p>
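
The node list case is what makes it convenient to turn an O'Caml list
into the children of an element. The following sketch assumes the
<code class="code">book</code> record type from the dynamic XML section; the function names
<code class="code">xml_of_book</code> and <code class="code">xml_of_books</code> are made up for illustration, and <code class="code">spec</code>
and <code class="code">dtd</code> must be in scope.
<p>

<pre></pre><code class="code"><span class="keyword">type</span>&nbsp;book&nbsp;=&nbsp;{&nbsp;title&nbsp;:&nbsp;string;&nbsp;author&nbsp;:&nbsp;string;&nbsp;isbn&nbsp;:&nbsp;string&nbsp;}<br>
<br>
<span class="comment">(*&nbsp;One&nbsp;book&nbsp;record&nbsp;becomes&nbsp;one&nbsp;book&nbsp;element&nbsp;node&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;xml_of_book&nbsp;b&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;book&nbsp;id=(<span class="string">"BOOK_"</span>&nbsp;^&nbsp;(:&nbsp;b.isbn&nbsp;:))&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;title&gt;&lt;*&gt;(:&nbsp;b.title&nbsp;:)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;author&gt;&lt;*&gt;(:&nbsp;b.author&nbsp;:)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
<br>
<span class="comment">(*&nbsp;The&nbsp;expression&nbsp;directly&nbsp;after&nbsp;the&nbsp;start&nbsp;tag&nbsp;is&nbsp;typed&nbsp;as&nbsp;a&nbsp;node&nbsp;list&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;xml_of_books&nbsp;books&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;&nbsp;&lt;list&gt;(:&nbsp;<span class="constructor">List</span>.map&nbsp;xml_of_book&nbsp;books&nbsp;:)&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

Writing <code class="code">&lt;list&gt;[ (: ... :) ]</code> instead would make the expression a
single node, so the result of <code class="code"><span class="constructor">List</span>.map</code> would no longer type-check.
<p>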

<a name="pxp_charset"></a>
<h2>Character encodings: <code class="code">pxp_charset</code></h2>
<p>

As the preprocessor generates code that builds XML trees, it must know
two character encodings:
<p>
<ul>
<li> Which encoding is used in the source code (in the .ml file) </li>
<li> Which encoding is used in the XML representation, i.e. in the O'Caml 
   values representing the XML trees</li>
</ul>

Both encodings can be set independently. The syntax is: 
<p>

<pre></pre><code class="code">&lt;:pxp_charset&lt;&nbsp;source=<span class="string">"ENC"</span>&nbsp;representation=<span class="string">"ENC"</span>&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

where <code class="code"><span class="constructor">ENC</span></code> is the name of the selected encoding.  The default is
ISO-8859-1 for both encodings. For example, to set the representation
encoding to UTF-8, use:
<p>

<pre></pre><code class="code">&lt;:pxp_charset&lt;&nbsp;representation=<span class="string">"UTF-8"</span>&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

The <code class="code">pxp_charset</code> notation is a constant expression that always
evaluates to <code class="code">()</code>. (This is a requirement of camlp4 that may look artificial.)
<p>

When you set the representation encoding, it is required that the
encoding stored in the DTD object is the same. Remember that we need a
DTD object like
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;dtd&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_dtd&nbsp;<span class="keywordsign">`</span><span class="constructor">Enc_iso88591</span><br>
</code><pre></pre>
<p>

Of course, we must change this to the representation encoding, too. In
our example:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;dtd&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_dtd&nbsp;<span class="keywordsign">`</span><span class="constructor">Enc_utf8</span><br>
</code><pre></pre>
<p>

The preprocessor cannot check this at compile time, and for
performance reasons, a runtime check is not generated. So it is up to
the programmer to ensure that the character encodings are used in a
consistent way.
<p>
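
Put together, a source file that uses UTF-8 as representation encoding
might start as follows. This is a sketch only: it assumes that the
notation may appear as the right-hand side of a <code class="code"><span class="keyword">let</span> ()</code> binding (which
is consistent with it being an expression of value <code class="code">()</code>), and the
element name <code class="code">greeting</code> is made up.
<p>

<pre></pre><code class="code"><span class="comment">(*&nbsp;Select&nbsp;the&nbsp;representation&nbsp;encoding&nbsp;before&nbsp;any&nbsp;tree&nbsp;is&nbsp;built&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;()&nbsp;=&nbsp;&lt;:pxp_charset&lt;&nbsp;representation=<span class="string">"UTF-8"</span>&nbsp;&gt;&gt;<br>
<br>
<span class="keyword">let</span>&nbsp;spec&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.default_spec<br>
<span class="comment">(*&nbsp;The&nbsp;DTD&nbsp;encoding&nbsp;must&nbsp;match&nbsp;the&nbsp;representation&nbsp;encoding&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;dtd&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_dtd&nbsp;<span class="keywordsign">`</span><span class="constructor">Enc_utf8</span><br>
<br>
<span class="keyword">let</span>&nbsp;greeting&nbsp;=&nbsp;&lt;:pxp_tree&lt;&nbsp;&lt;greeting&gt;<span class="string">"Hello"</span>&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>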

<a name="validate"></a>
<h2>Validated trees: <code class="code">pxp_text</code>, calling <code class="code">validate</code>, and <code class="code">pxp_vtree</code></h2>
<p>

In order to validate trees, you need a filled DTD object. In
principle, you can create this object by a number of methods. For
example, you can parse an external file:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;dtd&nbsp;=&nbsp;<span class="constructor">Pxp_dtd_parser</span>.parse_dtd_entity&nbsp;config&nbsp;(from_file&nbsp;<span class="string">"sample.dtd"</span>)<br>
</code><pre></pre>
<p>

It is, however, often more convenient to include the DTD literally
in the program. This works as follows:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;dtd&nbsp;=&nbsp;<span class="constructor">Pxp_dtd_parser</span>.parse_dtd_entity&nbsp;config&nbsp;(from_string&nbsp;<span class="string">"..."</span>)<br>
</code><pre></pre>
<p>

As the double quotes are often used inside DTDs, O'Caml string
literals are a bit impractical, as they are also delimited by double
quotes, and one needs to add backslashes as escape characters. The
<code class="code">pxp_text</code> notation is often more readable here: 
<p>

<pre></pre><code class="code">&nbsp;&lt;:pxp_text&lt;<span class="constructor">STRING</span>&gt;&gt;&nbsp;</code><pre></pre>
<p>

is just another way of writing <code class="code"><span class="string">"STRING"</span></code>. For our DTD, we write
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;dtd_text&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_text&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;!<span class="constructor">ELEMENT</span>&nbsp;book&nbsp;(title,author)&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;!<span class="constructor">ATTLIST</span>&nbsp;book&nbsp;id&nbsp;<span class="constructor">CDATA</span>&nbsp;<span class="keywordsign">#</span><span class="constructor">REQUIRED</span>&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;!<span class="constructor">ELEMENT</span>&nbsp;title&nbsp;(<span class="keywordsign">#</span><span class="constructor">PCDATA</span>)&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;!<span class="constructor">ATTLIST</span>&nbsp;title&nbsp;lang&nbsp;<span class="constructor">CDATA</span>&nbsp;<span class="string">"en"</span>&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;!<span class="constructor">ELEMENT</span>&nbsp;author&nbsp;(<span class="keywordsign">#</span><span class="constructor">PCDATA</span>)&gt;<br>
&nbsp;&nbsp;&gt;&gt;<br>
<span class="keyword">let</span>&nbsp;config&nbsp;=&nbsp;default_config<br>
<span class="keyword">let</span>&nbsp;dtd&nbsp;=&nbsp;<span class="constructor">Pxp_dtd_parser</span>.parse_dtd_entity&nbsp;config&nbsp;(from_string&nbsp;dtd_text)<br>
</code><pre></pre>
<p>

Note that <code class="code">pxp_text</code> is not restricted to DTDs, as it can be used for
any kind of string.
<p>

After we have the DTD, we can validate the trees. One option is to
call the <a href="Pxp_document.html#VALvalidate"><code class="code"><span class="constructor">Pxp_document</span>.validate</code></a> function:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;book&nbsp;=&nbsp;<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;book&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;title&gt;[&nbsp;<span class="string">"The&nbsp;Lord&nbsp;of&nbsp;The&nbsp;Rings"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;author&gt;[&nbsp;<span class="string">"J.R.R.&nbsp;Tolkien"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
<span class="keyword">let</span>&nbsp;()&nbsp;=<br>
&nbsp;&nbsp;<span class="constructor">Pxp_document</span>.validate&nbsp;book<br>
</code><pre></pre>
<p>

(This example is invalid, and <code class="code">validate</code> will throw an exception, as
the <code class="code">id</code> attribute is missing.)
<p>
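
For comparison, here is a sketch of a variant that is expected to pass
validation, because the required <code class="code">id</code> attribute is present. It assumes
that <code class="code">dtd</code> is the DTD object parsed from <code class="code">dtd_text</code> above and that <code class="code">spec</code> is
defined as before; the name <code class="code">valid_book</code> is made up.
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;valid_book&nbsp;=&nbsp;<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;book&nbsp;id=<span class="string">"BOOK_001"</span>&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;title&gt;[&nbsp;<span class="string">"The&nbsp;Lord&nbsp;of&nbsp;The&nbsp;Rings"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;author&gt;[&nbsp;<span class="string">"J.R.R.&nbsp;Tolkien"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
<span class="comment">(*&nbsp;Checks&nbsp;the&nbsp;whole&nbsp;tree&nbsp;against&nbsp;the&nbsp;DTD;&nbsp;no&nbsp;exception&nbsp;is&nbsp;expected&nbsp;here&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;()&nbsp;=<br>
&nbsp;&nbsp;<span class="constructor">Pxp_document</span>.validate&nbsp;valid_book<br>
</code><pre></pre>
<p>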

Note that it is a misunderstanding that <code class="code">pxp_tree</code> builds XML trees in
well-formedness mode. You can create any tree with it, and the fact is
that <code class="code">pxp_tree</code> just does not invoke the validator. So if the DTD
enforces validation, the tree is validated when the <code class="code">validate</code>
function is called. If the DTD is in well-formedness mode, the tree is
effectively not validated, even when the <code class="code">validate</code> function is
invoked. By the way, the following statements would create a DTD in
well-formedness mode:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;dtd&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_dtd&nbsp;<span class="keywordsign">`</span><span class="constructor">Enc_iso88591</span><br>
<span class="keyword">let</span>&nbsp;()&nbsp;=&nbsp;dtd&nbsp;<span class="keywordsign">#</span>&nbsp;allow_arbitrary<br>
</code><pre></pre>
<p>

<a name="3_Validatingwithpxpvtree"></a>
<h3>Validating with <code class="code">pxp_vtree</code></h3>
<p>

As an alternative to calling the <code class="code">validate</code> function, one can also use
<code class="code">pxp_vtree</code> instead. It immediately validates every XML element it
creates.  However, "injected" subtrees are not validated,
i.e. validation does not proceed recursively to subnodes as the
<code class="code">validate</code> function does.
<p>

<code class="code">pxp_vtree</code> has the same syntax as <code class="code">pxp_tree</code>.
<p>

<a name="events"></a>
<h2>Generating events: <code class="code">pxp_evlist</code> and <code class="code">pxp_evpull</code></h2>
<p>

As PXP also has an event model to represent XML, the preprocessor can
also produce such events. In particular, there are two modes: The
<code class="code">pxp_evlist</code> notation outputs lists of events (of type
<a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> list</code>) representing the XML expression. The
<code class="code">pxp_evpull</code> notation creates an automaton from which one can "pull"
events (like from a pull parser). The automaton has type
<code class="code">unit <span class="keywordsign">-&gt;</span> </code><a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a>.
<p>

<a name="3_pxpevlist"></a>
<h3><code class="code">pxp_evlist</code></h3>
<p>

Syntactically, these two notations work very much like <code class="code">pxp_tree</code>. For
example,
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;book&nbsp;=&nbsp;<br>
&nbsp;&nbsp;&lt;:pxp_evlist&lt;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;book&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;title&gt;[&nbsp;<span class="string">"The&nbsp;Lord&nbsp;of&nbsp;The&nbsp;Rings"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;author&gt;[&nbsp;<span class="string">"J.R.R.&nbsp;Tolkien"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

returns this list of events:
<p>

<pre></pre><code class="code">[&nbsp;<span class="constructor">E_start_tag</span>&nbsp;(<span class="string">"book"</span>,&nbsp;[],&nbsp;<span class="constructor">None</span>,&nbsp;&lt;obj&gt;);<br>
&nbsp;&nbsp;<span class="constructor">E_start_tag</span>&nbsp;(<span class="string">"title"</span>,&nbsp;[],&nbsp;<span class="constructor">None</span>,&nbsp;&lt;obj&gt;);<br>
&nbsp;&nbsp;<span class="constructor">E_char_data</span>&nbsp;<span class="string">"The&nbsp;Lord&nbsp;of&nbsp;The&nbsp;Rings"</span>;&nbsp;<br>
&nbsp;&nbsp;<span class="constructor">E_end_tag</span>&nbsp;(<span class="string">"title"</span>,&nbsp;&lt;obj&gt;);<br>
&nbsp;&nbsp;<span class="constructor">E_start_tag</span>&nbsp;(<span class="string">"author"</span>,&nbsp;[],&nbsp;<span class="constructor">None</span>,&nbsp;&lt;obj&gt;);&nbsp;<br>
&nbsp;&nbsp;<span class="constructor">E_char_data</span>&nbsp;<span class="string">"J.R.R.&nbsp;Tolkien"</span>;<br>
&nbsp;&nbsp;<span class="constructor">E_end_tag</span>&nbsp;(<span class="string">"author"</span>,&nbsp;&lt;obj&gt;);&nbsp;<br>
&nbsp;&nbsp;<span class="constructor">E_end_tag</span>&nbsp;(<span class="string">"book"</span>,&nbsp;&lt;obj&gt;)<br>
]<br>
</code><pre></pre>
<p>

(Here, <code class="code">&lt;obj&gt;</code> denotes the <code class="code">entity_id</code> object for identifying the 
containing entity.)
<p>

Note that you need neither a <code class="code">dtd</code> variable nor a <code class="code">spec</code> variable in
event mode. 
<p>

There is one important caveat: both single nodes and lists of nodes
are represented by the same type, <a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> list</code>. That has
the consequence that in the following example <code class="code">x1</code> and <code class="code">x2</code> have the
same type <a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> list</code>:
<p>

<pre></pre><code class="code">&lt;:pxp_evlist&lt;&nbsp;&lt;element&gt;x1&nbsp;&gt;&gt;<br>
&lt;:pxp_evlist&lt;&nbsp;&lt;element&gt;[x2]&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

In principle, it could be checked at runtime whether <code class="code">x1</code> and <code class="code">x2</code>
have the right structure. However, this is not done, both for
performance reasons and because the generated XML is still
well-formed. The typing simply differs from <code class="code">pxp_tree</code>, which
distinguishes between a single <code class="code">node</code> and a <code class="code">node list</code>.
<p>
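
The shared type can be pictured with a small model that does not use PXP
at all. In this sketch, <code class="code">event</code> is a simplified stand-in for
<code class="code">Pxp_types.event</code>, and <code class="code">element</code> is a hypothetical helper that only
mimics what the notation does with the children it is given.
<p>

```ocaml
(* Simplified stand-in for Pxp_types.event: the real constructors also
   carry attributes, an optional scope and an entity_id. *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string

(* Hypothetical helper: place the given events between a start tag and
   an end tag. It cannot tell whether they describe one node or many. *)
let element name (children : event list) : event list =
  (E_start_tag name :: children) @ [ E_end_tag name ]

(* A single node and a sequence of two nodes have the same OCaml type. *)
let single : event list = element "title" [ E_char_data "A" ]

let several : event list =
  element "title" [ E_char_data "A" ] @ element "author" [ E_char_data "B" ]

(* Both are accepted as children; the type checker sees no difference. *)
let wrapped_single = element "book" single
let wrapped_several = element "book" several
```

Because both values are plain lists, only the contents of the list, not
its type, tell whether it holds one node or several.
<p>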

<a name="3_pxpevpull"></a>
<h3><code class="code">pxp_evpull</code></h3>
<p>

As mentioned, <code class="code">pxp_evpull</code> works like a pull parser. After defining
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;book&nbsp;=&nbsp;<br>
&nbsp;&nbsp;&lt;:pxp_evpull&lt;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;book&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;title&gt;[&nbsp;<span class="string">"The&nbsp;Lord&nbsp;of&nbsp;The&nbsp;Rings"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;author&gt;[&nbsp;<span class="string">"J.R.R.&nbsp;Tolkien"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

<code class="code">book</code> is a function <code class="code">unit <span class="keywordsign">-&gt;</span> </code><a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> option</code>. One can call it to
pull the events out of it one after the other:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;e1&nbsp;=&nbsp;book();;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="comment">(*&nbsp;=&nbsp;Some(E_start_tag&nbsp;("book",&nbsp;[],&nbsp;None,&nbsp;&lt;obj&gt;))&nbsp;*)</span><br>
<span class="keyword">let</span>&nbsp;e2&nbsp;=&nbsp;book();;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="comment">(*&nbsp;=&nbsp;Some(E_start_tag&nbsp;("title",&nbsp;[],&nbsp;None,&nbsp;&lt;obj&gt;))&nbsp;*)</span><br>
...<br>
</code><pre></pre>
<p>

After the last event, <code class="code">book</code> returns <code class="code"><span class="constructor">None</span></code> to indicate the end of the
event stream.
<p>
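
The pull protocol itself can be modelled without PXP. In the following
sketch, <code class="code">pull_of_list</code> and <code class="code">drain</code> are hypothetical helpers written only
for illustration, and plain strings stand in for events.
<p>

```ocaml
(* Hypothetical helper: turn a list into a pull function. It models the
   behaviour only and is not PXP's implementation. *)
let pull_of_list (items : 'a list) : unit -> 'a option =
  let rest = ref items in
  fun () ->
    match !rest with
    | [] -> None
    | x :: tl ->
        rest := tl;
        Some x

(* Call the pull function until it returns None and collect the
   results in the order in which they were delivered. *)
let drain (pull : unit -> 'a option) : 'a list =
  let rec loop acc =
    match pull () with
    | None -> List.rev acc
    | Some x -> loop (x :: acc)
  in
  loop []

let book =
  pull_of_list [ "start book"; "start title"; "data"; "end title"; "end book" ]

let first = book ()         (* the first event *)
let remaining = drain book  (* everything that is left *)
let after_end = book ()     (* the stream stays exhausted *)
```
<p>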

As for <code class="code">pxp_evlist</code>, it is not possible to distinguish between single
nodes and node lists by type. In this example, both <code class="code">x1</code> and <code class="code">x2</code> are
assumed to have type <code class="code">unit <span class="keywordsign">-&gt;</span> </code><a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> option</code>:
<p>

<pre></pre><code class="code">&lt;:pxp_evpull&lt;&nbsp;&lt;element&gt;x1&nbsp;&gt;&gt;<br>
&lt;:pxp_evpull&lt;&nbsp;&lt;element&gt;[x2]&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

Note that <code class="code">&lt;element&gt;x1</code> actually means to build a new pull automaton
around the existing pull automaton <code class="code">x1</code>: The children of <code class="code">element</code> are
retrieved by pulling events from <code class="code">x1</code> until <code class="code"><span class="constructor">None</span></code> is returned.
<p>
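
One way to picture such a wrapping automaton is the following sketch.
It is a model under simplifying assumptions, not the code the
preprocessor generates: <code class="code">event</code> is a reduced stand-in for
<code class="code">Pxp_types.event</code>, and <code class="code">wrap</code> and <code class="code">pull_of_list</code> are hypothetical helpers.
<p>

```ocaml
(* Reduced stand-in for Pxp_types.event. *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string

(* Phases of the wrapping automaton. *)
type state = Before | Inside | After

(* Hypothetical helper: emit the start tag, then forward the events of
   the inner automaton until it returns None, then emit the end tag. *)
let wrap name (inner : unit -> event option) : unit -> event option =
  let state = ref Before in
  fun () ->
    match !state with
    | Before ->
        state := Inside;
        Some (E_start_tag name)
    | Inside ->
        (match inner () with
         | Some ev -> Some ev
         | None ->
             state := After;
             Some (E_end_tag name))
    | After -> None

(* Hypothetical helper: a pull function over a fixed list. *)
let pull_of_list items =
  let rest = ref items in
  fun () ->
    match !rest with
    | [] -> None
    | x :: tl ->
        rest := tl;
        Some x

let x1 = pull_of_list [ E_char_data "a"; E_char_data "b" ]
let elem = wrap "element" x1

(* Collect all events of the outer automaton. *)
let rec to_list pull =
  match pull () with
  | None -> []
  | Some ev -> ev :: to_list pull

let events = to_list elem
```

Note that draining <code class="code">elem</code> also drains <code class="code">x1</code>, because the outer automaton
pulls from the inner one.
<p>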

A consequence of the pull semantics is that once an event is obtained
from an automaton, the state of the automaton is modified such that it
is not possible to get the same event again. If you need an automaton
that can be restarted from the beginning, just wrap the <code class="code">pxp_evpull</code>
notation into a functional abstraction:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;book_maker()&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_evpull&lt;&nbsp;&lt;book&nbsp;...&gt;&nbsp;...&nbsp;&gt;&gt;<br>
<span class="keyword">let</span>&nbsp;book1&nbsp;=&nbsp;book_maker()<br>
<span class="keyword">let</span>&nbsp;book2&nbsp;=&nbsp;book_maker()<br>
</code><pre></pre>
<p>

This way, <code class="code">book1</code> and <code class="code">book2</code> generate independent event streams.
<p>
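
The effect of the functional abstraction can be checked with a model in
which strings stand in for events. The helper <code class="code">pull_of_list</code> is
hypothetical; the point is only that each call of <code class="code">book_maker</code> creates
fresh state.
<p>

```ocaml
(* Hypothetical helper: a pull function with its own private position. *)
let pull_of_list (items : 'a list) : unit -> 'a option =
  let rest = ref items in
  fun () ->
    match !rest with
    | [] -> None
    | x :: tl ->
        rest := tl;
        Some x

(* Every call builds a new automaton, so nothing is shared. *)
let book_maker () = pull_of_list [ "book"; "title"; "author" ]

let book1 = book_maker ()
let book2 = book_maker ()

let a = book1 ()  (* advances book1 only *)
let b = book1 ()
let c = book2 ()  (* book2 still starts at the beginning *)
```
<p>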

There is another implication of the nature of the automata:
Subexpressions are lazily evaluated. For example, in
<p>

<pre></pre><code class="code">&lt;:pxp_evpull&lt;&nbsp;&lt;element&gt;[&nbsp;&lt;*&gt;&nbsp;(:&nbsp;get_data_contents()&nbsp;:)&nbsp;]&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

the call of <code class="code">get_data_contents</code> is performed just before the event for
the data node is constructed instead of being done at automaton 
construction time.
<p>
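
The timing can be made visible with a counter. The following sketch is
a model of the behaviour, not the generated code: events are strings,
each one is produced by a thunk, and <code class="code">make_pull</code> and the counter <code class="code">calls</code> are
names invented for this illustration.
<p>

```ocaml
(* Counts how often get_data_contents has been called. *)
let calls = ref 0

let get_data_contents () =
  incr calls;
  "payload"

(* Model: every event is a thunk that is run only when it is pulled. *)
let make_pull () : unit -> string option =
  let pending =
    ref [ (fun () -> "start element");
          (fun () -> get_data_contents ());
          (fun () -> "end element") ]
  in
  fun () ->
    match !pending with
    | [] -> None
    | thunk :: tl ->
        pending := tl;
        Some (thunk ())

let pull = make_pull ()
let calls_after_construction = !calls  (* nothing has been called yet *)
let start = pull ()
let calls_before_data = !calls         (* still nothing *)
let data = pull ()
let calls_after_data = !calls          (* the call happened just now *)
```
<p>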

<a name="documents"></a>
<h2>Documents</h2>
<p>

Note that none of the notations <code class="code">pxp_tree</code>, <code class="code">pxp_vtree</code>,
<code class="code">pxp_evlist</code>, or <code class="code">pxp_evpull</code> is able to create documents. They just
create what is equivalent to the node tree inside a document, but not
the document wrapping.
<p>

In the tree case, just put the node tree into a 
<a href="Pxp_document.document.html"><code class="code"><span class="constructor">Pxp_document</span>.document</code></a>:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;book&nbsp;=&nbsp;&lt;:pxp_tree&lt;&nbsp;...&nbsp;&gt;&gt;<br>
<span class="keyword">let</span>&nbsp;doc&nbsp;=&nbsp;<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_document</span>.document&nbsp;warner&nbsp;dtd<span class="keywordsign">#</span>encoding<br>
<span class="keyword">let</span>&nbsp;()&nbsp;=&nbsp;doc&nbsp;<span class="keywordsign">#</span>&nbsp;init_root&nbsp;book&nbsp;<span class="string">"book"</span><br>
</code><pre></pre>
<p>

In the event case, the generated events do not include
<code class="code"><span class="constructor">E_start_doc</span></code>, <code class="code"><span class="constructor">E_end_doc</span></code>, or <code class="code"><span class="constructor">E_end_of_stream</span></code>. If required, one
has to add these events manually, which is quite simple.
For <code class="code">pxp_evlist</code>, do something like
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;doc&nbsp;=<br>
&nbsp;&nbsp;<span class="constructor">E_start_doc</span>(<span class="string">"1.0"</span>,&nbsp;dtd)&nbsp;::<br>
&nbsp;&nbsp;(&nbsp;&lt;:pxp_evlist&lt;&nbsp;&lt;book&gt;...&nbsp;&gt;&gt;&nbsp;@<br>
&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;<span class="constructor">E_end_doc</span>(<span class="string">"book"</span>);<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">E_end_of_stream</span>&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;)<br>
</code><pre></pre>
<p>

For <code class="code">pxp_evpull</code>, do something like
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;doc&nbsp;=<br>
&nbsp;&nbsp;<span class="constructor">Pxp_event</span>.concat<br>
&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;<span class="constructor">Pxp_event</span>.of_list&nbsp;[&nbsp;<span class="constructor">E_start_doc</span>(<span class="string">"1.0"</span>,&nbsp;dtd)&nbsp;];<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;:pxp_evpull&lt;&nbsp;&lt;book&gt;...&nbsp;&gt;&gt;;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="constructor">Pxp_event</span>.of_list&nbsp;[<span class="constructor">E_end_doc</span>(<span class="string">"book"</span>);&nbsp;<span class="constructor">E_end_of_stream</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;]<br>
</code><pre></pre>
<p>

(See <a href="Pxp_event.html#VALconcat"><code class="code"><span class="constructor">Pxp_event</span>.concat</code></a> and <a href="Pxp_event.html#VALof_list"><code class="code"><span class="constructor">Pxp_event</span>.of_list</code></a>.)
<p>
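
To see what concatenating pull functions amounts to, here is a model
that is independent of PXP. The functions <code class="code">concat</code>, <code class="code">pull_of_list</code> and
<code class="code">to_list</code> below are local, hypothetical re-implementations for
illustration only, and strings stand in for events.
<p>

```ocaml
(* Hypothetical helper: a pull function over a fixed list. *)
let pull_of_list (items : 'a list) : unit -> 'a option =
  let rest = ref items in
  fun () ->
    match !rest with
    | [] -> None
    | x :: tl ->
        rest := tl;
        Some x

(* Model of concatenation: pull from the first function until it is
   exhausted, then continue with the next one. *)
let concat (pulls : (unit -> 'a option) list) : unit -> 'a option =
  let rest = ref pulls in
  let rec next () =
    match !rest with
    | [] -> None
    | p :: tl ->
        (match p () with
         | Some x -> Some x
         | None ->
             rest := tl;
             next ())
  in
  next

(* Collect everything a pull function delivers. *)
let rec to_list pull =
  match pull () with
  | None -> []
  | Some x -> x :: to_list pull

let doc =
  concat
    [ pull_of_list [ "start_doc" ];
      pull_of_list [ "start book"; "end book" ];
      pull_of_list [ "end_doc"; "end_of_stream" ] ]

let all = to_list doc
```
<p>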

<a name="namespaces"></a>
<h2>Namespaces</h2>
<p>

By default, the preprocessor does not generate nodes or events that
support namespaces. It can, however, be configured to create
namespace-aware XML aggregations.
<p>

In any case, you need a namespace manager. This is an object that
tracks the usage of namespace prefixes in XML nodes. For example, we
can create a namespace manager that knows the <code class="code">html</code> prefix:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;mng&nbsp;=&nbsp;<span class="keyword">new</span>&nbsp;<span class="constructor">Pxp_dtd</span>.namespace_manager<br>
<span class="keyword">let</span>&nbsp;()&nbsp;=&nbsp;mng&nbsp;<span class="keywordsign">#</span>&nbsp;add_namespace&nbsp;<span class="string">"html"</span>&nbsp;<span class="string">"http://www.w3.org/1999/xhtml"</span><br>
</code><pre></pre>
<p>

(Also see <a href="Pxp_dtd.namespace_manager.html"><code class="code"><span class="constructor">Pxp_dtd</span>.namespace_manager</code></a>.)
Here, we declare that we want to use the <code class="code">html</code> prefix for the
internal representation of the XML nodes. This kind of prefix is
called normalized prefix, or normprefix for short. It is possible to
configure different prefixes for the external representation,
i.e. when the XML tree is printed to a file.  This other kind of
prefix is called display prefix. We will have a look at them later.
(For a more detailed discussion of namespaces, see
<a href="Intro_namespaces.html"><code class="code"><span class="constructor">Intro_namespaces</span></code></a>.)
<p>

Next, we must tell the DTD object that we have a namespace manager:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;dtd&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_dtd&nbsp;<span class="keywordsign">`</span><span class="constructor">Enc_iso88591</span><br>
<span class="keyword">let</span>&nbsp;()&nbsp;=&nbsp;dtd&nbsp;<span class="keywordsign">#</span>&nbsp;set_namespace_manager&nbsp;mng<br>
</code><pre></pre>
<p>

For <code class="code">pxp_evlist</code> and <code class="code">pxp_evpull</code> we are now prepared (note that we
now need a <code class="code">dtd</code> variable, as only the DTD object knows the namespace
manager). For <code class="code">pxp_tree</code> and <code class="code">pxp_vtree</code>, it is required to use a
namespace-aware specification:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;spec&nbsp;=&nbsp;<span class="constructor">Pxp_tree_parser</span>.default_namespace_spec&nbsp;<br>
</code><pre></pre>
<p>

(Normal specifications do not work; you would get "Namespace method not 
applicable" errors if you tried to use them.)
<p>

<a name="3_Usingautoscope"></a>
<h3>Using <code class="code">&lt;:autoscope&gt;</code></h3>
<p>

The special notation <code class="code">&lt;:autoscope&gt;</code> enables namespace mode in this
example:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;list&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;:autoscope&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;html:ul&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;html:li&gt;<span class="string">"Item1"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;html:li&gt;<span class="string">"Item2"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

In particular, <code class="code">&lt;:autoscope&gt;</code> defines a new O'Caml variable for its
subexpression: <code class="code">scope</code>. This variable contains the namespace scope
object, which contains the namespace declarations for the
subexpression. <code class="code">&lt;:autoscope&gt;</code> initialises this variable from the
namespace manager so that it now contains a declaration for the
<code class="code">html</code> prefix. <code class="code">scope</code> has type <a href="Pxp_dtd.namespace_scope.html"><code class="code"><span class="constructor">Pxp_dtd</span>.namespace_scope</code></a>.
<p>

In general, the namespace scope object contains the prefixes to use
for the external representation (as opposed to the namespace
manager which defines the prefixes for the internal representation).
If the external prefixes can be the same as the internal ones,
<code class="code">&lt;:autoscope&gt;</code> is the right directive, as it initialises the <code class="code">scope</code>
object with the prefixes from the namespace manager, so that both
views are the same.
<p>

Print the tree with
<p>

<pre></pre><code class="code">list&nbsp;<span class="keywordsign">#</span>&nbsp;display&nbsp;(<span class="keywordsign">`</span><span class="constructor">Out_channel</span>&nbsp;stdout)&nbsp;<span class="keywordsign">`</span><span class="constructor">Enc_iso88591</span><br>
</code><pre></pre>
<p>

Note that there is a <code class="code">display</code> and a <code class="code">write</code> method. The difference
is that <code class="code">display</code> prints the external prefixes (from <code class="code">scope</code>), and
that <code class="code">write</code> prints the internal prefixes (from the namespace
manager). In this introduction we prefer <code class="code">display</code>.
<p>

<a name="3_Usingscopeinitsbasicform"></a>
<h3>Using <code class="code">&lt;:scope&gt;</code> in its basic form</h3>
<p>

Alternatively, we can also create the <code class="code">scope</code> variable manually:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;scope&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_namespace_scope<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;~decl:[&nbsp;<span class="string">""</span>,&nbsp;<span class="string">"http://www.w3.org/1999/xhtml"</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mng<br>
<span class="keyword">let</span>&nbsp;list&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;:scope&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;html:ul&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;html:li&gt;<span class="string">"Item1"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;html:li&gt;<span class="string">"Item2"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

Note that we now use <code class="code">&lt;:scope&gt;</code>. In this simple form, this construct
just enables namespace mode, and takes the <code class="code">scope</code> variable from the
environment.
<p>

Furthermore, the namespace scope now contains a different namespace
declaration: the display prefix <code class="code"><span class="string">""</span></code> is used for HTML. The empty
prefix declares a default namespace (by <code class="code">xmlns=<span class="string">"URI"</span></code>). The
effect can be seen when the XML tree is printed by calling the
<code class="code">display</code> method.
<p>

If we had called <code class="code">create_namespace_scope</code> with the <code class="code">decl</code> argument
<p>

<pre></pre><code class="code">&nbsp;&nbsp;~decl:[&nbsp;<span class="string">"foo"</span>,&nbsp;<span class="string">"http://www.w3.org/1999/xhtml"</span>&nbsp;]<br>
</code><pre></pre>
<p>

the displayed tree would use the <code class="code">foo</code> prefix, and declare it as
<code class="code">xmlns:foo=<span class="string">"http://www.w3.org/1999/xhtml"</span></code>.
<p>
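
The relationship between normalized prefixes and display prefixes can
be modelled with two association lists. This is a sketch under stated
assumptions and not how PXP implements it: <code class="code">manager</code> stands in for the
namespace manager, a <code class="code">scope</code> list stands in for the namespace scope, and
<code class="code">display_name</code> is a hypothetical function that maps an internal element
name to the name that would be printed.
<p>

```ocaml
let xhtml = "http://www.w3.org/1999/xhtml"

(* Stand-in for the namespace manager: normprefix -> URI. *)
let manager = [ "html", xhtml ]

(* Hypothetical function: translate a name with a normprefix into the
   name used for display. The scope maps display prefixes to URIs. *)
let display_name scope name =
  match String.index_opt name ':' with
  | None -> name
  | Some i ->
      let normprefix = String.sub name 0 i in
      let local = String.sub name (i + 1) (String.length name - i - 1) in
      let uri = List.assoc normprefix manager in
      (* Find the display prefix that the scope declares for this URI. *)
      let prefix, _ = List.find (fun (_, u) -> u = uri) scope in
      if prefix = "" then local else prefix ^ ":" ^ local

(* Default namespace, as with ~decl:[ "", URI ]. *)
let as_default = display_name [ "", xhtml ] "html:ul"

(* Display prefix foo, as with ~decl:[ "foo", URI ]. *)
let as_foo = display_name [ "foo", xhtml ] "html:ul"

(* Same prefixes as the manager, which is what autoscope arranges. *)
let as_autoscope = display_name manager "html:ul"
```

In all three cases the internal name stays <code class="code">html:ul</code>; only the printed
form changes with the scope.
<p>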

<a name="3_Usingscopetosetdisplayprefixes"></a>
<h3>Using <code class="code">&lt;:scope&gt;</code> to set display prefixes</h3>
<p>

Here is a third variant of the same example: 
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;scope&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_namespace_scope&nbsp;mng<br>
<span class="keyword">let</span>&nbsp;list&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;:scope&nbsp;(<span class="string">""</span>)=<span class="string">"http://www.w3.org/1999/xhtml"</span>&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;html:ul&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;html:li&gt;<span class="string">"Item1"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;html:li&gt;<span class="string">"Item2"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

The <code class="code">scope</code> is now initially empty. The <code class="code">&lt;:scope&gt;</code> notation is used to
extend the scope for the time the subexpression is evaluated.
<p>

There is also a notation <code class="code">&lt;:emptyscope&gt;</code> that creates an empty scope
object, so one could even write
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;list&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;:emptyscope&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;:scope&nbsp;(<span class="string">""</span>)=<span class="string">"http://www.w3.org/1999/xhtml"</span>&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;html:ul&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;html:li&gt;<span class="string">"Item1"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;html:li&gt;<span class="string">"Item2"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

The <code class="code">&lt;:scope&gt;</code> notation can be used in any subexpression, and it
modifies the display prefix to use in that subexpression. For example,
here a different prefix <code class="code">foo</code> is used for the second item:
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;list&nbsp;=<br>
&nbsp;&nbsp;&lt;:pxp_tree&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;:emptyscope&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;:scope&nbsp;(<span class="string">""</span>)=<span class="string">"http://www.w3.org/1999/xhtml"</span>&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;html:ul&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&nbsp;&lt;html:li&gt;<span class="string">"Item1"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;:scope&nbsp;foo=<span class="string">"http://www.w3.org/1999/xhtml"</span>&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;html:li&gt;<span class="string">"Item2"</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>
&nbsp;&nbsp;&gt;&gt;<br>
</code><pre></pre>
<p>

It is recommended to create the <code class="code">scope</code> variable manually with a
reasonable initial declaration, and to use <code class="code">&lt;:scope&gt;</code> to enable
namespace processing, and to extend the scope where necessary. The
advantage of this approach is that the same scope object can be shared
by many XML nodes, so you need less memory.
<p>

One tip: To get a namespace scope that is initialised with all
prefixes of the namespace manager (as <code class="code">&lt;:autoscope&gt;</code> does it), define
<p>

<pre></pre><code class="code"><span class="keyword">let</span>&nbsp;scope&nbsp;=&nbsp;<span class="constructor">Pxp_dtd</span>.create_namespace_scope&nbsp;~decl:mng<span class="keywordsign">#</span>as_declaration&nbsp;mng<br>
</code><pre></pre>
<p>

For event-based processing of XML, the namespace mode works in exactly
the same way as described here.
<br>
</body></html>