<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <link rel="stylesheet" href="style.css" type="text/css"> <meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"> <link rel="Start" href="index.html"> <link rel="previous" href="Intro_advanced.html"> <link rel="next" href="Example_readme.html"> <link rel="Up" href="index.html"> <link title="Index of types" rel=Appendix href="index_types.html"> <link title="Index of exceptions" rel=Appendix href="index_exceptions.html"> <link title="Index of values" rel=Appendix href="index_values.html"> <link title="Index of class methods" rel=Appendix href="index_methods.html"> <link title="Index of classes" rel=Appendix href="index_classes.html"> <link title="Index of class types" rel=Appendix href="index_class_types.html"> <link title="Index of modules" rel=Appendix href="index_modules.html"> <link title="Index of module types" rel=Appendix href="index_module_types.html"> <link title="Pxp_types" rel="Chapter" href="Pxp_types.html"> <link title="Pxp_document" rel="Chapter" href="Pxp_document.html"> <link title="Pxp_dtd" rel="Chapter" href="Pxp_dtd.html"> <link title="Pxp_tree_parser" rel="Chapter" href="Pxp_tree_parser.html"> <link title="Pxp_core_types" rel="Chapter" href="Pxp_core_types.html"> <link title="Pxp_ev_parser" rel="Chapter" href="Pxp_ev_parser.html"> <link title="Pxp_event" rel="Chapter" href="Pxp_event.html"> <link title="Pxp_dtd_parser" rel="Chapter" href="Pxp_dtd_parser.html"> <link title="Pxp_codewriter" rel="Chapter" href="Pxp_codewriter.html"> <link title="Pxp_marshal" rel="Chapter" href="Pxp_marshal.html"> <link title="Pxp_yacc" rel="Chapter" href="Pxp_yacc.html"> <link title="Pxp_reader" rel="Chapter" href="Pxp_reader.html"> <link title="Intro_trees" rel="Chapter" href="Intro_trees.html"> <link title="Intro_extensions" rel="Chapter" href="Intro_extensions.html"> <link title="Intro_namespaces" rel="Chapter" href="Intro_namespaces.html"> <link title="Intro_events" 
rel="Chapter" href="Intro_events.html"> <link title="Intro_resolution" rel="Chapter" href="Intro_resolution.html"> <link title="Intro_getting_started" rel="Chapter" href="Intro_getting_started.html"> <link title="Intro_advanced" rel="Chapter" href="Intro_advanced.html"> <link title="Intro_preprocessor" rel="Chapter" href="Intro_preprocessor.html"> <link title="Example_readme" rel="Chapter" href="Example_readme.html"><link title="The PXP Preprocessor" rel="Section" href="#1_ThePXPPreprocessor"> <link title="Creating constant XML with pxp_tree - basic syntax" rel="Subsection" href="#pxp_tree"> <link title="Dynamic XML" rel="Subsection" href="#dynamic"> <link title="Character encodings: pxp_charset" rel="Subsection" href="#pxp_charset"> <link title="Validated trees: pxp_text, calling validate, and pxp_vtree" rel="Subsection" href="#validate"> <link title="Generating events: pxp_evlist and pxp_evpull" rel="Subsection" href="#events"> <link title="Documents" rel="Subsection" href="#documents"> <link title="Namespaces" rel="Subsection" href="#namespaces"> <title>PXP Reference : Intro_preprocessor</title> </head> <body> <div class="navbar"><a href="Intro_advanced.html">Previous</a> <a href="index.html">Up</a> <a href="Example_readme.html">Next</a> </div> <center><h1>Intro_preprocessor</h1></center> <br> <br> <a name="1_ThePXPPreprocessor"></a> <h1>The PXP Preprocessor</h1> <p> Since PXP-1.1.95, there is a preprocessor as part of the PXP distribution. It allows you to compose XML trees and event lists dynamically, which is very handy to write XML transformations. <p> To enable the preprocessor, compile your source files as in: <p> <pre></pre><code class="code"> ocamlfind ocamlc -syntax camlp4o -package pxp-pp,... ... </code><pre></pre> <p> The package <code class="code">pxp-pp</code> contains the preprocessor. The <code class="code">-syntax</code> option enables camlp4, on which the preprocessor is based. 
It is also possible to use it together with the revised syntax, use <code class="code">-syntax camlp4r</code> in this case. <p> In the toploop, type <p> <pre></pre><code class="code">ocaml<br> <span class="keywordsign">#</span> <span class="keywordsign">#</span>use <span class="string">"topfind"</span>;;<br> <span class="keywordsign">#</span> <span class="keywordsign">#</span>camlp4o;;<br> <span class="keywordsign">#</span> <span class="keywordsign">#</span>require <span class="string">"pxp-pp"</span>;;<br> <span class="keywordsign">#</span> <span class="keywordsign">#</span>require <span class="string">"pxp"</span>;;<br> </code><pre></pre> <p> The preprocessor defines the following new syntax notations, explained below in detail: <p> <ul> <li><code class="code"><:pxp_charset< <span class="constructor">CHARSET_DECL</span> >></code></li> <li><code class="code"><:pxp_tree< <span class="constructor">EXPR</span> >></code></li> <li><code class="code"><:pxp_vtree< <span class="constructor">EXPR</span> >></code></li> <li><code class="code"><:pxp_evlist< <span class="constructor">EXPR</span> >></code></li> <li><code class="code"><:pxp_evpull< <span class="constructor">EXPR</span> >></code></li> <li><code class="code"><:pxp_text< <span class="constructor">TEXT</span> >></code></li> </ul> The basic notation is <code class="code">pxp_tree</code> which creates a tree of PXP document nodes as described in EXPR. <code class="code">pxp_vtree</code> is the variant where the tree is immediately validated - with <code class="code">pxp_tree</code> the tree is not validated, but one can validate it later (e.g. when the whole output tree of the program is built up). <code class="code">pxp_evlist</code> creates a list of PXP events instead of nodes, useful together with the event parser. <code class="code">pxp_evpull</code> is a variation of the latter: Instead of an event list an event generator is created that works like a pull parser. 
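<p> The difference between an event list and a pull-style generator can be illustrated with a minimal sketch in plain O'Caml, without the preprocessor. The helper <code class="code">pull_of_list</code> is a hypothetical name introduced only for this illustration, and plain strings stand in for real PXP events; this is not the code the preprocessor generates. <p> <pre></pre><code class="code">(* Hypothetical helper: turn a list into a function that hands out one item per call *)<br>
let pull_of_list items =<br>
  let rest = ref items in<br>
  fun () -><br>
    match !rest with<br>
    | [] -> None<br>
    | x :: tl -> rest := tl; Some x<br>
<br>
(* Strings are placeholders for events (an assumption of this sketch) *)<br>
let next = pull_of_list [ "start book"; "data"; "end book" ]<br>
let first = next ()   (* Some "start book" *)<br>
let second = next ()  (* Some "data" *)<br>
</code><pre></pre> <p> Each call consumes one item, and once the list is exhausted the function returns <code class="code"><span class="constructor">None</span></code>. A list, in contrast, can be traversed any number of times.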
<p> The <code class="code">pxp_charset</code> notation only configures the character sets to assume. Finally, <code class="code">pxp_text</code> is a notation for string literals. <p> <a name="pxp_tree"></a> <h2>Creating constant XML with <code class="code">pxp_tree</code> - basic syntax</h2> <p> The following examples are all written for <code class="code">pxp_tree</code>. You can also use one of the other XML composers instead, but see the notes below that explain a few differences. <p> In order to use <code class="code">pxp_tree</code>, you must define two variables in the environment: <code class="code">spec</code> and <code class="code">dtd</code>: <p> <pre></pre><code class="code"><span class="keyword">let</span> spec = <span class="constructor">Pxp_tree_parser</span>.default_spec<br> <span class="keyword">let</span> dtd = <span class="constructor">Pxp_dtd</span>.create_dtd <span class="keywordsign">`</span><span class="constructor">Enc_iso88591</span><br> </code><pre></pre> <p> These variables are assumed to exist by the generated code. The <code class="code">dtd</code> variable is the DTD object. Note that you need it even in well-formedness mode (validation turned off) - see the explanations in <a href="Intro_getting_started.html#wfmode"><i>Parsing in well-formedness mode</i></a> to understand why. The <code class="code">spec</code> variable controls which classes are instantiated as node representation. See <a href="Intro_getting_started.html#spec"><i>Specifying which classes implement nodes - the mysterious spec parameter</i></a> for the meaning of <code class="code">spec</code>. <p> <a name="3_Elementsattributesanddatanodes"></a> <h3>Elements, attributes, and data nodes</h3> <p> Now you can create XML trees like in <p> <pre></pre><code class="code"><span class="keyword">let</span> book = <br> <:pxp_tree< <br> <book><br> [ <title>[ <span class="string">"The Lord of The Rings"</span> ]<br> <author>[ <span class="string">"J.R.R. 
Tolkien"</span> ]<br> ]<br> >><br> </code><pre></pre> <p> As you can see, the syntax is somewhat XML-related but not really XML. (Many ideas are borrowed from CDUCE, by the way.) In particular, there are start tags like <code class="code"><title></code> but no end tags. Instead, square brackets denote where the list of the children of the XML element starts and where it ends. Furthermore, character data must be put into double quotes. <p> You may ask why the well-known XML syntax has been modified for this preprocessor. There are many reasons, and they will become clearer in the following explanations. For now, you can see the advantage that the syntax is less verbose, as you need not repeat the element names in end tags (I know programmers like brevity). Furthermore, you can control exactly which characters are part of the data nodes without having to make compromises with indentation. <p> Attributes are written as in XML: <p> <pre></pre><code class="code"><span class="keyword">let</span> book = <br> <:pxp_tree< <br> <book id=<span class="string">"BOOK_001"</span>><br> [ <title lang=<span class="string">"en"</span>>[ <span class="string">"The Lord of The Rings"</span> ]<br> <author>[ <span class="string">"J.R.R. 
Tolkien"</span> ]<br> ]<br> >><br> </code><pre></pre> <p> An element without children can be written <p> <pre></pre><code class="code"> <element>[] </code><pre></pre> <p> or slightly shorter: <p> <pre></pre><code class="code"> <element/> </code><pre></pre> <p> <a name="3_Processinginstructionsandcomments"></a> <h3>Processing instructions and comments</h3> <p> You can also create processing instructions and comment nodes: <p> <pre></pre><code class="code"><span class="keyword">let</span> list =<br> <:pxp_tree<<br> <list><br> [ <!><span class="string">"Now the list of books follows!"</span><br> <?><span class="string">"formatter_directive"</span> <span class="string">"one book per page"</span><br> book<br> ]<br> >><br> </code><pre></pre> <p> The notation <code class="code"><!></code> creates a comment node with the following string as contents. The notation <code class="code"><?></code> for constructing processing instructions needs two strings, first the target, then the value (here, this results in <code class="code"><?formatter_directive one book per page<span class="keywordsign">?></span></code>). <p> Look again at the last example: The O'Caml variable <code class="code">book</code> occurs, and it inserts its tree into the list of books. Identifiers without "decoration" just refer to O'Caml variables. We will see more examples below. <p> <a name="3_Elementswithonechild"></a> <h3>Elements with one child</h3> <p> The preprocessor syntax knows a number of shortcuts and variations. 
First, you can omit the square brackets when an element has exactly one child: <p> <pre></pre><code class="code"><element><child><span class="string">"Data inside child"</span><br> </code><pre></pre> <p> This is the same as <p> <pre></pre><code class="code"><element>[ <child>[ <span class="string">"Data inside child"</span> ] ]<br> </code><pre></pre> <p> <a name="3_Detailsofdatanodes"></a> <h3>Details of data nodes</h3> <p> Second, we already have used a common abbreviation: Strings are automatically converted to data nodes. The "expanded" syntax is <p> <pre></pre><code class="code"><*><span class="string">"Data string"</span><br> </code><pre></pre> <p> where <code class="code"><*></code> denotes to construct a data node, and the following string is used as contents. Usually, you can omit <code class="code"><*></code>, so this is the same as <p> <pre></pre><code class="code"><span class="string">"Data string"</span><br> </code><pre></pre> <p> However, there are a few occasions where the <code class="code"><*></code> notation is still useful, see below (essentially, it also works like a type annotation: the following subexpression must be a string). <p> Inside strings, the usual entity references can be used: <code class="code"><span class="keywordsign">&</span>lt;</code>, <code class="code"><span class="keywordsign">&</span>gt;</code>, <code class="code"><span class="keywordsign">&</span>amp;</code>, <code class="code"><span class="keywordsign">&</span>quot;</code>, <code class="code"><span class="keywordsign">&</span>apos;</code>, and also numeric references work: <code class="code"><span class="keywordsign">&</span><span class="keywordsign">#</span></code><i>n</i><code class="code">;</code> where <i>n</i> is a number. 
Note that <code class="code"><span class="keywordsign">&</span>lt;</code>, <code class="code"><span class="keywordsign">&</span>gt;</code>, and <code class="code"><span class="keywordsign">&</span>apos;</code> are not obligatory, as <code class="code"><</code>, <code class="code">></code>, and <code class="code"><span class="keywordsign">'</span></code> can be included directly. <p> Example: <code class="code"><span class="string">"Double quotes: &quot;"</span></code>. For a newline character, write <code class="code"><span class="keywordsign">&</span><span class="keywordsign">#</span>10;</code>. <p> <a name="3_Operators"></a> <h3>Operators</h3> <p> The preprocessor knows two operators: <code class="code">^</code> concatenates strings, and <code class="code">@</code> concatenates lists. Examples: <p> <pre></pre><code class="code"><element>[ <span class="string">"Word1"</span> ^ <span class="string">"Word2"</span> ]<br> <element>([ <a/> ] @ [ <b/> ])<br> </code><pre></pre> <p> Parentheses can be used to clarify precedence. For example: <p> <pre></pre><code class="code"><element>(l1 @ l2)<br> </code><pre></pre> <p> Without parentheses, the expression would be parsed as <p> <pre></pre><code class="code">(<element> l1) @ l2<br> </code><pre></pre> <p> Parentheses may be used in any expression. <p> <a name="3_Superroot"></a> <h3>Super root</h3> <p> There is also a rarely used notation for "super root" nodes. For uses of this node type, see <a href="Intro_getting_started.html#nodetypes"><i>Choosing the node types to represent</i></a>. <p> <pre></pre><code class="code"><^>[ <element> ... ]<br> </code><pre></pre> <p> <a name="dynamic"></a> <h2>Dynamic XML</h2> <p> This section describes how to insert dynamically created content into XML trees. <p> Let us begin with an example. 
The task is to convert O'Caml values of type <p> <pre></pre><code class="code"><span class="keyword">type</span> book = <br> { title : string;<br> author : string;<br> isbn : string;<br> }<br> </code><pre></pre> <p> to XML trees like <p> <pre></pre><code class="code"> <br> <book id=<span class="string">"BOOK_{isbn}"</span>><br> <title>{title}</title><br> <author>{author}</author><br> </book><br> </code><pre></pre> <p> (conventional syntax, with placeholders in {braces}). When <code class="code">b</code> is the book variable, the solution is <p> <pre></pre><code class="code"><span class="keyword">let</span> book = <br> <span class="keyword">let</span> title = b.title<br> <span class="keyword">and</span> author = b.author<br> <span class="keyword">and</span> isbn = b.isbn <span class="keyword">in</span><br> <:pxp_tree<<br> <book id=(<span class="string">"BOOK_"</span> ^ isbn)><br> [ <title><*>title<br> <author><*>author<br> ]<br> >><br> </code><pre></pre> <p> First, we bind the simple O'Caml variables <code class="code">title</code>, <code class="code">author</code>, and <code class="code">isbn</code>. The reason is that the preprocessor syntax does not allow expressions like <code class="code">b.title</code> directly in the XML tree (but see below for another, often better workaround). <p> The XML tree contains the O'Caml variables. The <code class="code">id</code> attribute is a concatenation of the fixed prefix <code class="code"><span class="constructor">BOOK_</span></code> and the contents of <code class="code">isbn</code>. The <code class="code">title</code> and <code class="code">author</code> elements each contain a data node whose contents are the O'Caml strings <code class="code">title</code> and <code class="code">author</code>, respectively. <p> Why <code class="code"><*></code>? 
If we just wrote <code class="code"><title>title</code>, the generated code would assume that the <code class="code">title</code> variable is an XML node (of type <a href="Pxp_document.node.html"><code class="code"><span class="constructor">Pxp_document</span>.node</code></a>), and not a string. From this point of view, <code class="code"><*></code> works like a type annotation, as it specialises the type of the following expression. <p> <a name="3_Thenotation"></a> <h3>The <code class="code">(: ... :)</code> notation</h3> <p> Here is an alternate solution: <p> <pre></pre><code class="code"><span class="keyword">let</span> book = <br> <:pxp_tree<<br> <book id=(<span class="string">"BOOK_"</span> ^ (: b.isbn :))><br> [ <title><*>(: b.title :)<br> <author><*>(: b.author :)<br> ]<br> >><br> </code><pre></pre> <p> The notation <code class="code">(: ... :)</code> allows you to include arbitrary O'Caml expressions into the tree. In this solution it is no longer necessary to artificially create O'Caml variables for the only purpose of injecting values into trees. <p> <a name="3_Dynamicnames"></a> <h3>Dynamic names</h3> <p> It is possible to create XML elements with dynamic names: Just put parentheses around the expression. Example: <p> <pre></pre><code class="code"><span class="keyword">let</span> name = <span class="string">"book"</span> <span class="keyword">in</span><br> <:pxp_tree< <(name)> ... >><br> </code><pre></pre> <p> With the same notation, one can also set attribute names dynamically: <p> <pre></pre><code class="code"><span class="keyword">let</span> att_name = <span class="string">"id"</span> <span class="keyword">in</span><br> <:pxp_tree< <book (att_name)=...> ... 
>><br> </code><pre></pre> <p> <a name="3_Dynamicattributelists"></a> <h3>Dynamic attribute lists</h3> <p> Finally, it is also possible to include complete attribute lists dynamically: <p> <pre></pre><code class="code"><span class="keyword">let</span> att_list = [ <span class="string">"id"</span>, (<span class="string">"BOOK_"</span> ^ b.isbn) ] <span class="keyword">in</span><br> <:pxp_tree< <book (: att_list :) > ... >><br> </code><pre></pre> <p> Here, <code class="code">att_list</code> must be a <code class="code">(string*string) list</code> with the attributes to include. <p> <a name="3_Typing"></a> <h3>Typing</h3> <p> Depending on where a variable or O'Caml expression occurs, different types are assumed. Compare the following examples: <p> <pre></pre><code class="code"><:pxp_tree< <element>x1 >><br> <:pxp_tree< <element>[x2] >><br> <:pxp_tree< <element><*>x3 >><br> </code><pre></pre> <p> As a rule of thumb, the most general type is assumed that would make sense at a certain location. As <code class="code">x1</code> could be replaced by a list of children, its type is assumed to be a node list. As <code class="code">x2</code> could be replaced by a single node, its type is assumed to be a node. And <code class="code">x3</code> is a string, we had this case already. <p> <a name="pxp_charset"></a> <h2>Character encodings: <code class="code">pxp_charset</code></h2> <p> As the preprocessor generates code that builds XML trees, it must know two character encodings: <p> <ul> <li> Which encoding is used in the source code (in the .ml file) </li> <li> Which encoding is used in the XML representation, i.e. in the O'Caml values representing the XML trees</li> </ul> Both encodings can be set independently. 
The syntax is: <p> <pre></pre><code class="code"><:pxp_charset< source=<span class="string">"ENC"</span> representation=<span class="string">"ENC"</span> >><br> </code><pre></pre> <p> where <code class="code"><span class="constructor">ENC</span></code> is the name of the selected encoding. The default is ISO-8859-1 for both encodings. For example, to set the representation encoding to UTF-8, use: <p> <pre></pre><code class="code"><:pxp_charset< representation=<span class="string">"UTF-8"</span> >><br> </code><pre></pre> <p> The <code class="code">pxp_charset</code> notation is a constant expression that always evaluates to <code class="code">()</code>. (A requirement by camlp4 that looks artificial.) <p> When you set the representation encoding, it is required that the encoding stored in the DTD object is the same. Remember that we need a DTD object like <p> <pre></pre><code class="code"><span class="keyword">let</span> dtd = <span class="constructor">Pxp_dtd</span>.create_dtd <span class="keywordsign">`</span><span class="constructor">Enc_iso88591</span><br> </code><pre></pre> <p> Of course, we must change this to the representation encoding, too. In our example: <p> <pre></pre><code class="code"><span class="keyword">let</span> dtd = <span class="constructor">Pxp_dtd</span>.create_dtd <span class="keywordsign">`</span><span class="constructor">Enc_utf8</span><br> </code><pre></pre> <p> The preprocessor cannot check this at compile time, and for performance reasons, a runtime check is not generated. So it is up to the programmer that the character encodings are used in a consistent way. <p> <a name="validate"></a> <h2>Validated trees: <code class="code">pxp_text</code>, calling <code class="code">validate</code>, and <code class="code">pxp_vtree</code></h2> <p> In order to validate trees, you need a filled DTD object. In principle, you can create this object by a number of methods. 
For example, you can parse an external file: <p> <pre></pre><code class="code"><span class="keyword">let</span> dtd = <span class="constructor">Pxp_dtd_parser</span>.parse_dtd_entity config (from_file <span class="string">"sample.dtd"</span>)<br> </code><pre></pre> <p> It is, however, often more convenient to include the DTD literally into the program. This works by <p> <pre></pre><code class="code"><span class="keyword">let</span> dtd = <span class="constructor">Pxp_dtd_parser</span>.parse_dtd_entity config (from_string <span class="string">"..."</span>)<br> </code><pre></pre> <p> As the double quotes are often used inside DTDs, O'Caml string literals are a bit impractical, as they are also delimited by double quotes, and one needs to add backslashes as escape characters. The <code class="code">pxp_text</code> notation is often more readable here: <p> <pre></pre><code class="code"> <:pxp_text<<span class="constructor">STRING</span>>> </code><pre></pre> <p> is just another way of writing <code class="code"><span class="string">"STRING"</span></code>. 
In our DTD, we have <p> <pre></pre><code class="code"><span class="keyword">let</span> dtd_text =<br> <:pxp_text<<br> <!<span class="constructor">ELEMENT</span> book (title,author)><br> <!<span class="constructor">ATTLIST</span> book id <span class="constructor">CDATA</span> <span class="keywordsign">#</span><span class="constructor">REQUIRED</span>><br> <!<span class="constructor">ELEMENT</span> title (<span class="keywordsign">#</span><span class="constructor">PCDATA</span>)><br> <!<span class="constructor">ATTLIST</span> title lang <span class="constructor">CDATA</span> <span class="string">"en"</span>><br> <!<span class="constructor">ELEMENT</span> author (<span class="keywordsign">#</span><span class="constructor">PCDATA</span>)><br> >><br> <span class="keyword">let</span> config = default_config<br> <span class="keyword">let</span> dtd = <span class="constructor">Pxp_dtd_parser</span>.parse_dtd_entity config (from_string dtd_text)<br> </code><pre></pre> <p> Note that <code class="code">pxp_text</code> is not restricted to DTDs, as it can be used for any kind of string. <p> After we have the DTD, we can validate the trees. One option is to call the <a href="Pxp_document.html#VALvalidate"><code class="code"><span class="constructor">Pxp_document</span>.validate</code></a> function: <p> <pre></pre><code class="code"><span class="keyword">let</span> book = <br> <:pxp_tree< <br> <book><br> [ <title>[ <span class="string">"The Lord of The Rings"</span> ]<br> <author>[ <span class="string">"J.R.R. Tolkien"</span> ]<br> ]<br> >><br> <span class="keyword">let</span> () =<br> <span class="constructor">Pxp_document</span>.validate book<br> </code><pre></pre> <p> (This example is invalid, and <code class="code">validate</code> will throw an exception, as the <code class="code">id</code> attribute is missing.) <p> Note that it is a misunderstanding that <code class="code">pxp_tree</code> builds XML trees in well-formedness mode. 
You can create any tree with it; <code class="code">pxp_tree</code> simply does not invoke the validator. So if the DTD enforces validation, the tree is validated when the <code class="code">validate</code> function is called. If the DTD is in well-formedness mode, the tree is effectively not validated, even when the <code class="code">validate</code> function is invoked. By the way, the following statements would create a DTD in well-formedness mode: <p> <pre></pre><code class="code"><span class="keyword">let</span> dtd = <span class="constructor">Pxp_dtd</span>.create_dtd <span class="keywordsign">`</span><span class="constructor">Enc_iso88591</span><br> <span class="keyword">let</span> () = dtd <span class="keywordsign">#</span> allow_arbitrary<br> </code><pre></pre> <p> <a name="3_Validatingwithpxpvtree"></a> <h3>Validating with <code class="code">pxp_vtree</code></h3> <p> As an alternative to calling the <code class="code">validate</code> function, one can also use <code class="code">pxp_vtree</code>. It immediately validates every XML element it creates. However, "injected" subtrees are not validated, i.e. validation does not proceed recursively to subnodes as the <code class="code">validate</code> function does. <p> <code class="code">pxp_vtree</code> has the same syntax as <code class="code">pxp_tree</code>. <p> <a name="events"></a> <h2>Generating events: <code class="code">pxp_evlist</code> and <code class="code">pxp_evpull</code></h2> <p> As PXP also has an event model to represent XML, the preprocessor can also produce such events. In particular, there are two modes: The <code class="code">pxp_evlist</code> notation outputs lists of events (of type <a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> list</code>) representing the XML expression. 
The <code class="code">pxp_evpull</code> notation creates an automaton from which one can "pull" events (as from a pull parser). The automaton has type <code class="code">unit <span class="keywordsign">-></span> </code><a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> option</code>. <p> <a name="3_pxpevlist"></a> <h3><code class="code">pxp_evlist</code></h3> <p> Syntactically, these two notations work very much like <code class="code">pxp_tree</code>. For example, <p> <pre></pre><code class="code"><span class="keyword">let</span> book = <br> <:pxp_evlist< <br> <book><br> [ <title>[ <span class="string">"The Lord of The Rings"</span> ]<br> <author>[ <span class="string">"J.R.R. Tolkien"</span> ]<br> ]<br> >><br> </code><pre></pre> <p> evaluates to this list of events: <p> <pre></pre><code class="code">[ <span class="constructor">E_start_tag</span> (<span class="string">"book"</span>, [], <span class="constructor">None</span>, <obj>);<br> <span class="constructor">E_start_tag</span> (<span class="string">"title"</span>, [], <span class="constructor">None</span>, <obj>);<br> <span class="constructor">E_char_data</span> <span class="string">"The Lord of The Rings"</span>; <br> <span class="constructor">E_end_tag</span> (<span class="string">"title"</span>, <obj>);<br> <span class="constructor">E_start_tag</span> (<span class="string">"author"</span>, [], <span class="constructor">None</span>, <obj>); <br> <span class="constructor">E_char_data</span> <span class="string">"J.R.R. Tolkien"</span>;<br> <span class="constructor">E_end_tag</span> (<span class="string">"author"</span>, <obj>); <br> <span class="constructor">E_end_tag</span> (<span class="string">"book"</span>, <obj>)<br> ]<br> </code><pre></pre> <p> (Here, <code class="code"><obj></code> denotes the <code class="code">entity_id</code> object for identifying the containing entity.) 
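<p> The correspondence between a nested XML expression and its flat event list can be sketched in plain O'Caml. The types <code class="code">tree</code> and <code class="code">event</code> below are simplified stand-ins invented for this illustration (the real <code class="code"><span class="constructor">Pxp_types</span>.event</code> constructors carry more fields, such as the attribute list and the <code class="code">entity_id</code> object), and <code class="code">events_of_tree</code> is a hypothetical helper, not part of PXP. <p> <pre></pre><code class="code">(* Simplified stand-ins, not the PXP types *)<br>
type tree =<br>
  | Element of string * tree list<br>
  | Data of string<br>
<br>
type event =<br>
  | Start_tag of string<br>
  | Char_data of string<br>
  | End_tag of string<br>
<br>
(* Hypothetical helper: emit the start tag, then the events of all children, then the end tag *)<br>
let rec events_of_tree t =<br>
  match t with<br>
  | Data s -> [ Char_data s ]<br>
  | Element (name, children) -><br>
      (Start_tag name :: List.concat (List.map events_of_tree children))<br>
      @ [ End_tag name ]<br>
<br>
let book =<br>
  Element ("book",<br>
    [ Element ("title", [ Data "The Lord of The Rings" ]);<br>
      Element ("author", [ Data "J.R.R. Tolkien" ]) ])<br>
<br>
let evs = events_of_tree book<br>
</code><pre></pre> <p> For the book example, <code class="code">evs</code> contains eight events in document order, mirroring the shape of the list shown above.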
<p> Note that you need neither a <code class="code">dtd</code> variable nor a <code class="code">spec</code> variable in event mode. <p> There is one important caveat: Both single nodes and lists of nodes are represented by the same type, <a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> list</code>. As a consequence, in the following example <code class="code">x1</code> and <code class="code">x2</code> have the same type <a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> list</code>: <p> <pre></pre><code class="code"><:pxp_evlist< <element>x1 >><br> <:pxp_evlist< <element>[x2] >><br> </code><pre></pre> <p> In principle, it could be checked at runtime whether <code class="code">x1</code> and <code class="code">x2</code> have the right structure. However, this is not done for performance reasons, and because the generated XML is still well-formed. The typing is just different from <code class="code">pxp_tree</code>, which distinguishes between a single <code class="code">node</code> and a <code class="code">node list</code>. <p> <a name="3_pxpevpull"></a> <h3><code class="code">pxp_evpull</code></h3> <p> As mentioned, <code class="code">pxp_evpull</code> works like a pull parser. After defining <p> <pre></pre><code class="code"><span class="keyword">let</span> book = <br> <:pxp_evpull< <br> <book><br> [ <title>[ <span class="string">"The Lord of The Rings"</span> ]<br> <author>[ <span class="string">"J.R.R. Tolkien"</span> ]<br> ]<br> >><br> </code><pre></pre> <p> <code class="code">book</code> is a function <code class="code">unit <span class="keywordsign">-></span> </code><a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> option</code>. 
One can call it repeatedly to pull the events one after the other: <p> <pre></pre><code class="code"><span class="keyword">let</span> e1 = book();; <span class="comment">(* = Some(E_start_tag ("book", [], None, <obj>)) *)</span><br> <span class="keyword">let</span> e2 = book();; <span class="comment">(* = Some(E_start_tag ("title", [], None, <obj>)) *)</span><br> ...<br> </code><pre></pre> <p> After the last event, <code class="code">book</code> returns <code class="code"><span class="constructor">None</span></code> to indicate the end of the event stream. <p> As with <code class="code">pxp_evlist</code>, it is not possible to distinguish between single nodes and node lists by type. In this example, both <code class="code">x1</code> and <code class="code">x2</code> are assumed to have type <code class="code">unit <span class="keywordsign">-></span> </code><a href="Pxp_types.html#TYPEevent"><code class="code"><span class="constructor">Pxp_types</span>.event</code></a><code class="code"> option</code>: <p> <pre></pre><code class="code"><:pxp_evpull< <element>x1 >><br> <:pxp_evpull< <element>[x2] >><br> </code><pre></pre> <p> Note that <code class="code"><element>x1</code> actually builds a new pull automaton around the existing pull automaton <code class="code">x1</code>: The children of <code class="code">element</code> are retrieved by pulling events from <code class="code">x1</code> until <code class="code"><span class="constructor">None</span></code> is returned. <p> A consequence of the pull semantics is that once an event is obtained from an automaton, the state of the automaton is modified such that it is not possible to get the same event again. If you need an automaton that can be reset to the beginning, just wrap the <code class="code">pxp_evpull</code> notation in a function: <p> <pre></pre><code class="code"><span class="keyword">let</span> book_maker() =<br> <:pxp_evpull< <book ...> ... 
>><br> <span class="keyword">let</span> book1 = book_maker()<br> <span class="keyword">let</span> book2 = book_maker()<br> </code><pre></pre> <p> This way, <code class="code">book1</code> and <code class="code">book2</code> generate independent event streams. <p> There is another implication of the nature of the automatons: Subexpressions are lazily evaluated. For example, in <p> <pre></pre><code class="code"><:pxp_evpull< <element>[ <*> (: get_data_contents() :) ] >><br> </code><pre></pre> <p> the call of <code class="code">get_data_contents</code> is performed just before the event for the data node is constructed instead of being done at automaton construction time. <p> <a name="documents"></a> <h2>Documents</h2> <p> Note that none of the notations <code class="code">pxp_tree</code>, <code class="code">pxp_vtree</code>, <code class="code">pxp_evlist</code>, or <code class="code">pxp_evpull</code> is able to create documents. They just create what is equivalent to the node tree inside a document, but not the document wrapping. <p> In the tree case, just put the node tree into a <a href="Pxp_document.document.html"><code class="code"><span class="constructor">Pxp_document</span>.document</code></a>: <p> <pre></pre><code class="code"><span class="keyword">let</span> book = <:pxp_tree< ... >><br> <span class="keyword">let</span> doc = <span class="keyword">new</span> <span class="constructor">Pxp_document</span>.document warner dtd<span class="keywordsign">#</span>encoding<br> doc <span class="keywordsign">#</span> init_root book <span class="string">"book"</span><br> </code><pre></pre> <p> In the event case, the generated events do not include <code class="code"><span class="constructor">E_start_doc</span></code>, <code class="code"><span class="constructor">E_end_doc</span></code>, or <code class="code"><span class="constructor">E_end_of_stream</span></code>. If required, one has to add these events manually which is quite simple. 
For <code class="code">pxp_evlist</code>, do something like <p> <pre></pre><code class="code"><span class="keyword">let</span> doc =<br> <span class="constructor">E_start_doc</span>(<span class="string">"1.0"</span>, dtd) ::<br> ( <:pxp_evlist< <book>... >> @<br> [ <span class="constructor">E_end_doc</span>(<span class="string">"book"</span>);<br> <span class="constructor">E_end_of_stream</span> <br> ]<br> )<br> </code><pre></pre> <p> For <code class="code">pxp_evpull</code>, do something like <p> <pre></pre><code class="code"><span class="keyword">let</span> doc =<br> <span class="constructor">Pxp_event</span>.concat<br> [ <span class="constructor">Pxp_event</span>.of_list [ <span class="constructor">E_start_doc</span>(<span class="string">"1.0"</span>, dtd) ];<br> <:pxp_evpull< <book>... >>;<br> <span class="constructor">Pxp_event</span>.of_list [<span class="constructor">E_end_doc</span>(<span class="string">"book"</span>); <span class="constructor">E_end_of_stream</span> ]<br> ]<br> </code><pre></pre> <p> (See <a href="Pxp_event.html#VALconcat"><code class="code"><span class="constructor">Pxp_event</span>.concat</code></a> and <a href="Pxp_event.html#VALof_list"><code class="code"><span class="constructor">Pxp_event</span>.of_list</code></a>.) <p> <a name="namespaces"></a> <h2>Namespaces</h2> <p> By default, the preprocessor does not generate nodes or events that support namespaces. It can, however, be configured to create namespace-aware XML aggregations. <p> In any case, you need a namespace manager. This is an object that tracks the usage of namespace prefixes in XML nodes. 
For example, we can create a namespace manager that knows the <code class="code">html</code> prefix: <p> <pre></pre><code class="code"><span class="keyword">let</span> mng = <span class="keyword">new</span> <span class="constructor">Pxp_dtd</span>.namespace_manager <span class="keyword">in</span><br> mng <span class="keywordsign">#</span> add_namespace <span class="string">"html"</span> <span class="string">"http://www.w3.org/1999/xhtml"</span><br> </code><pre></pre> <p> (Also see <a href="Pxp_dtd.namespace_manager.html"><code class="code"><span class="constructor">Pxp_dtd</span>.namespace_manager</code></a>.) Here, we declare that we want to use the <code class="code">html</code> prefix for the internal representation of the XML nodes. This kind of prefix is called a normalized prefix, or normprefix for short. It is possible to configure different prefixes for the external representation, i.e. when the XML tree is printed to a file. This other kind of prefix is called a display prefix. We will have a look at display prefixes later. (For a more detailed discussion of namespaces, see <a href="Intro_namespaces.html"><code class="code"><span class="constructor">Intro_namespaces</span></code></a>.) <p> Next, we must tell the DTD object that we have a namespace manager: <p> <pre></pre><code class="code"><span class="keyword">let</span> dtd = <span class="constructor">Pxp_dtd</span>.create_dtd <span class="keywordsign">`</span><span class="constructor">Enc_iso88591</span><br> dtd <span class="keywordsign">#</span> set_namespace_manager mng<br> </code><pre></pre> <p> For <code class="code">pxp_evlist</code> and <code class="code">pxp_evpull</code> we are now prepared (note that we now need a <code class="code">dtd</code> variable, as only the DTD object knows the namespace manager). 
For <code class="code">pxp_tree</code> and <code class="code">pxp_vtree</code>, it is required to use a namespace-aware specification: <p> <pre></pre><code class="code"><span class="keyword">let</span> spec = <span class="constructor">Pxp_tree_parser</span>.default_namespace_spec <br> </code><pre></pre> <p> (Normal specifications do not work; you would get "Namespace method not applicable" errors if you tried to use them.) <p> <a name="3_Usingautoscope"></a> <h3>Using <code class="code"><:autoscope></code></h3> <p> The special notation <code class="code"><:autoscope></code> enables namespace mode in this example: <p> <pre></pre><code class="code"><span class="keyword">let</span> list =<br> <:pxp_tree<<br> <:autoscope><br> <html:ul><br> [ <html:li><span class="string">"Item1"</span><br> <html:li><span class="string">"Item2"</span><br> ]<br> >><br> </code><pre></pre> <p> In particular, <code class="code"><:autoscope></code> defines a new O'Caml variable for its subexpression: <code class="code">scope</code>. This variable contains the namespace scope object, which contains the namespace declarations for the subexpression. <code class="code"><:autoscope></code> initialises this variable from the namespace manager so that it now contains a declaration for the <code class="code">html</code> prefix. <code class="code">scope</code> has type <a href="Pxp_dtd.namespace_scope.html"><code class="code"><span class="constructor">Pxp_dtd</span>.namespace_scope</code></a>. <p> In general, the namespace scope object contains the prefixes to use for the external representation (as opposed to the namespace manager, which defines the prefixes for the internal representation). If the external prefixes can be the same as the internal ones, <code class="code"><:autoscope></code> is the right directive, as it initialises the <code class="code">scope</code> object with the prefixes from the namespace manager, so that both views are the same. 
<p> Print the tree with <p> <pre></pre><code class="code">list <span class="keywordsign">#</span> display (<span class="keywordsign">`</span><span class="constructor">Out_channel</span> stdout) <span class="keywordsign">`</span><span class="constructor">Enc_iso88591</span><br> </code><pre></pre> <p> Note that there are both a <code class="code">display</code> and a <code class="code">write</code> method. The difference is that <code class="code">display</code> prints the external prefixes (from <code class="code">scope</code>), whereas <code class="code">write</code> prints the internal prefixes (from the namespace manager). In this introduction we prefer <code class="code">display</code>. <p> <a name="3_Usingscopeinitsbasicform"></a> <h3>Using <code class="code"><:scope></code> in its basic form</h3> <p> Alternatively, we can also create the <code class="code">scope</code> variable manually: <p> <pre></pre><code class="code"><span class="keyword">let</span> scope = <span class="constructor">Pxp_dtd</span>.create_namespace_scope<br> ~decl:[ <span class="string">""</span>, <span class="string">"http://www.w3.org/1999/xhtml"</span> ]<br> mng<br> <span class="keyword">let</span> list =<br> <:pxp_tree<<br> <:scope><br> <html:ul><br> [ <html:li><span class="string">"Item1"</span><br> <html:li><span class="string">"Item2"</span><br> ]<br> >><br> </code><pre></pre> <p> Note that we now use <code class="code"><:scope></code>. In this simple form, this construct just enables namespace mode, and takes the <code class="code">scope</code> variable from the environment. <p> Furthermore, the namespace scope now contains a different namespace declaration: the display prefix <code class="code"><span class="string">""</span></code> is used for HTML. The empty prefix just means to declare a default prefix (by <code class="code">xmlns=<span class="string">"URI"</span></code>). The effect can be seen when the XML tree is printed by calling the <code class="code">display</code> method. 
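<p> The relationship between normalized prefixes and display prefixes can be illustrated with a small toy model in plain O'Caml. This is only a sketch and not PXP code: the association lists stand in for the namespace manager and the namespace scope, and the names <code class="code">qname</code>, <code class="code">write_name</code>, and <code class="code">display_name</code> are made up for this illustration. The assumption is that both sides are matched through the namespace URI they share.
<p>
```ocaml
(* Toy model only; none of these names are PXP functions. *)
let xhtml = "http://www.w3.org/1999/xhtml"

(* Internal view: normalized prefix -> URI (the manager's job). *)
let manager = [ "html", xhtml ]

(* Render "prefix:local", or just "local" for the empty prefix. *)
let qname prefix local =
  if prefix = "" then local else prefix ^ ":" ^ local

(* write-style name: keep the normalized prefix. *)
let write_name normprefix local = qname normprefix local

(* display-style name: replace the normalized prefix by the display
   prefix that the scope declares for the same URI. *)
let display_name scope normprefix local =
  let uri = List.assoc normprefix manager in
  let display_prefix, _ = List.find (fun (_, u) -> u = uri) scope in
  qname display_prefix local

let () =
  print_endline (write_name "html" "ul");                    (* html:ul *)
  print_endline (display_name [ "", xhtml ] "html" "ul");    (* ul *)
  print_endline (display_name [ "foo", xhtml ] "html" "ul")  (* foo:ul *)
```
<p> In this model, the element that is written as <code class="code">html:ul</code> in the source keeps that name in the internal view, while the name shown in the external view depends only on which display prefix the scope declares for the XHTML URI.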
<p> If we had called <code class="code">create_namespace_scope</code> with the <code class="code">decl</code> argument <p> <pre></pre><code class="code"> ~decl:[ <span class="string">"foo"</span>, <span class="string">"http://www.w3.org/1999/xhtml"</span> ]<br> </code><pre></pre> <p> the displayed tree would use the <code class="code">foo</code> prefix, and declare it as <code class="code">xmlns:foo=<span class="string">"http://www.w3.org/1999/xhtml"</span></code>. <p> <a name="3_Usingscopetosetdisplayprefixes"></a> <h3>Using <code class="code"><:scope></code> to set display prefixes</h3> <p> Here is a third variant of the same example: <p> <pre></pre><code class="code"><span class="keyword">let</span> scope = <span class="constructor">Pxp_dtd</span>.create_namespace_scope mng<br> <span class="keyword">let</span> list =<br> <:pxp_tree<<br> <:scope (<span class="string">""</span>)=<span class="string">"http://www.w3.org/1999/xhtml"</span>><br> <html:ul><br> [ <html:li><span class="string">"Item1"</span><br> <html:li><span class="string">"Item2"</span><br> ]<br> >><br> </code><pre></pre> <p> The <code class="code">scope</code> is now initially empty. The <code class="code"><:scope></code> notation is used to extend the scope for the time the subexpression is evaluated. <p> There is also a notation <code class="code"><:emptyscope></code> that creates an empty scope object, so one could even write <p> <pre></pre><code class="code"><span class="keyword">let</span> list =<br> <:pxp_tree<<br> <:emptyscope><br> <:scope (<span class="string">""</span>)=<span class="string">"http://www.w3.org/1999/xhtml"</span>><br> <html:ul><br> [ <html:li><span class="string">"Item1"</span><br> <html:li><span class="string">"Item2"</span><br> ]<br> >><br> </code><pre></pre> <p> The <code class="code"><:scope></code> notation can be used in any subexpression, and it modifies the display prefix to use in that subexpression. 
For example, here a different prefix <code class="code">foo</code> is used for the second item: <p> <pre></pre><code class="code"><span class="keyword">let</span> list =<br> <:pxp_tree<<br> <:emptyscope><br> <:scope (<span class="string">""</span>)=<span class="string">"http://www.w3.org/1999/xhtml"</span>><br> <html:ul><br> [ <html:li><span class="string">"Item1"</span><br> <:scope foo=<span class="string">"http://www.w3.org/1999/xhtml"</span>><br> <html:li><span class="string">"Item2"</span><br> ]<br> >><br> </code><pre></pre> <p> It is recommended to create the <code class="code">scope</code> variable manually with a reasonable initial declaration, to use <code class="code"><:scope></code> to enable namespace processing, and to extend the scope where necessary. The advantage of this approach is that the same scope object can be shared by many XML nodes, so you need less memory. <p> One tip: To get a namespace scope that is initialised with all prefixes of the namespace manager (as <code class="code"><:autoscope></code> does), define <p> <pre></pre><code class="code"><span class="keyword">let</span> scope = <span class="constructor">Pxp_dtd</span>.create_namespace_scope ~decl:mng<span class="keywordsign">#</span>as_declaration mng<br> </code><pre></pre> <p> For event-based processing of XML, the namespace mode works in the same way as described here; there is no difference. <br> </body></html>