<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>CDuce: XML Schema</title><meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"/><link type="text/css" rel="stylesheet" href="cduce.css"/></head><body style="margin: 0; padding : 0;"><table border="0" width="100%" cellspacing="10" cellpadding="0"><tr><td valign="top" align="left" style="width:20%;"><div class="leftbar" id="leftbar"><div class="smallbox"><ul><li><a href="#overview">Overview</a></li><li><a href="#primer">XML Schema components (micro) introduction</a></li><li><a href="#import">XML Schema components import</a></li><li><a href="#directives">Toplevel directives</a></li><li><a href="#mapping">XML Schema &#8594; CDuce mapping</a></li><li><a href="#validation">XML Schema validation</a></li><li><a href="#print_xml">XML Schema instances output</a></li><li><a href="#nonsupp">Unsupported XML Schema features</a></li></ul></div></div></td><td><h1>XML Schema</h1><div class="mainpanel"><div class="smallbox"><p><a href="index.html">CDuce: documentation</a>: <a href="manual.html">User's manual</a>: XML Schema</p><p><a href="namespaces.html"><img class="icon" width="16" alt="Previous page:" height="16" src="img/left.gif"/> XML Namespaces</a> <a href="manual_schema_samples.html"><img class="icon" width="16" alt="Next page:" height="16" src="img/right.gif"/> XML Schema sample documents</a></p></div><div><h2><a name="overview">Overview</a></h2><p> CDuce partially supports the <a href="http://www.w3.org/XML/Schema">XML Schema</a> Recommendations (<a href="http://www.w3.org/TR/xmlschema-0/">Primer</a>, <a href="http://www.w3.org/TR/xmlschema-1/">Structures</a>, <a href="http://www.w3.org/TR/xmlschema-2/">Datatypes</a>). With this feature, CDuce programs can manipulate XML documents whose leaves are typed values such as integers, dates, binary data, and so on. 
</p><p> CDuce supports XML Schema through the following features: </p><ul><li><a href="#import">XML Schema components import</a></li><li><a href="#validation">XML Schema validation</a></li><li><a href="#print_xml">XML Schema instances output</a></li></ul><p> This manual page describes how to use these features in CDuce. All the documents used in the examples are available in the manual section <a href="manual_schema_samples.html">XML Schema sample documents</a>. </p><div class="note"><b>Note: </b> The support for XML Schema does not currently interact well with separate compilation. When a CDuce unit <b><tt><strong class="highlight">script</strong>.cd</tt></b> that uses an XML Schema is compiled, the resulting <b><tt><strong class="highlight">script</strong>.cdo</tt></b> object refers to the XML Schema by name. Consequently, when such a unit is run, the XML Schema must still be available from the current directory and must not have changed since compilation. </div></div><div><h2><a name="primer">XML Schema components (micro) introduction</a></h2><p> An XML Schema document can define four kinds of components, each of which can be imported into CDuce and used as a CDuce type: </p><ul><li><b>Type definitions</b><br/> A type definition defines either a simple type or a complex type. A simple type constrains the string content of an element more precisely; you can think of it as a refinement of #PCDATA. XML Schema provides a set of <a href="http://www.w3.org/TR/xmlschema-2/#built-in-datatypes">predefined simple types</a> and a way to define new ones. A complex type constrains the content model and the attributes of an XML element; XML Schema complex types are strictly more expressive than DTD element declarations. </li><li><b>Element declarations</b><br/> An element declaration links an element name to a type, either simple or complex. 
Optionally, if the type is a simple type, the declaration can constrain the set of possible values for the element by mandating a fixed value or providing a default value. </li><li><b>Attribute group definitions</b><br/> An attribute group definition links a set of attribute declarations to a name that can be referenced from other XML Schema components. </li><li><b>Model group definitions</b><br/> A model group definition links a name to a constraint over the complex content of an XML element. The linked name can be referenced from other XML Schema components. </li></ul><p> Attribute declarations currently do not produce any CDuce type and cannot be used for validation by themselves. </p></div><div><h2><a name="import">XML Schema components import</a></h2><p> To import XML Schema components into CDuce, you first need to tell CDuce to import an XML Schema document. You can do this with the <b><tt>schema</tt></b> keyword, which binds an uppercase identifier to a local schema document: </p><div class="code"><pre>
# <strong class="highlight">schema Mails = "tests/schema/mails.xsd"</strong>;;
Registering schema type: attachmentType
Registering schema type: mimeTopLevelType
Registering schema type: mailsType
Registering schema type: mailType
Registering schema type: bodyType
Registering schema type: envelopeType
Registering schema element: header
Registering schema element: Date
Registering schema element: mails
Registering schema attribute group: mimeTypeAttributes
Registering schema model group: attachmentContent
</pre></div><p> The declaration above attempts to import all the schema components defined in the schema document <a href="manual_schema_samples.html">mails.xsd</a> as CDuce types. You can reference them using the dot operator, e.g. <b><tt>Mails.mails</tt></b>. </p><p> XML Schema allows components of different kinds to share the same name. CDuce resolves a reference to a schema component by looking it up in this order: elements, types, model groups, attribute groups. 
</p><p> The result of a schema component reference is an ordinary CDuce type, which you can use as usual in function definitions, pattern matching, and so on. </p><div class="code"><pre>
let is_valid_mail (Any -> Bool)
  | <strong class="highlight">Mails.mailType</strong> -> `true
  | _ -> `false
</pre></div></div><div><p><em><b>Correctness remark:</b> while parsing XML Schema documents, CDuce assumes that they are correct with respect to the XML Schema Recommendations. At a minimum, they are required to be valid with respect to the <a href="http://www.w3.org/TR/xmlschema-1/#normative-schemaSchema">XML Schema for Schemas</a>. We recommend checking your schemas for validity before importing them into CDuce; otherwise, strange behaviour is likely. </em></p></div><div><h2><a name="directives">Toplevel directives</a></h2><p> The toplevel directive <b><tt>#env</tt></b> supports schemas: it lists the currently defined schemas. </p><p> The toplevel directive <b><tt>#print_type</tt></b> also supports schemas: it can be used to print the types corresponding to schema components. </p><div class="code"><pre>
# #print_type <strong class="highlight">Mails.bodyType</strong>;;
[ Char* ]
</pre></div><p> For more information, see the manual section on <a href="manual_interpreter.html">toplevel directives</a>. </p></div><div><h2><a name="mapping">XML Schema &#8594; CDuce mapping</a></h2><ul><li><p> XML Schema <b>predefined simple types</b> are mapped to CDuce types directly by the CDuce implementation, preserving XML Schema constraints as much as possible. The table below lists the most significant mappings. </p><table width="100%"><tr><td align="center"><table border="1"><col/><col/><tr><th><b>XML Schema predefined simple type</b></th><th><b>CDuce type</b></th></tr><tr><td><code>duration</code>, <code>dateTime</code>, <code>time</code>, <code>date</code>, <code>gYear</code>, <code>gMonth</code>, ... 
</td><td> closed record types with some of the following fields (depending on the Schema type): <code>year</code>, <code>month</code>, <code>day</code>, <code>hour</code>, <code>minute</code>, <code>second</code>, <code>timezone</code></td></tr><tr><td><code>boolean</code></td><td><code>Bool</code></td></tr><tr><td><code>anySimpleType</code>, <code>string</code>, <code>base64Binary</code>, <code>hexBinary</code>, <code>anyURI</code></td><td><code>String</code></td></tr><tr><td><code>integer</code></td><td><code>Int</code></td></tr><tr><td><code>nonPositiveInteger</code>, <code>negativeInteger</code>, <code>nonNegativeInteger</code>, <code>positiveInteger</code>, <code>long</code>, <code>int</code>, <code>short</code>, <code>byte</code></td><td>integer intervals with the appropriate limits</td></tr><tr><td><code>string</code>, <code>normalizedString</code>, and the other types derived (directly or indirectly) by restriction from <code>string</code></td><td><code>String</code></td></tr><tr><td><code>NMTOKENS</code>, <code>IDREFS</code>, <code>ENTITIES</code></td><td><code>[String*]</code></td></tr><tr><td><code>decimal</code>, <code>float</code>, <code>double</code></td><td><code>Float</code></td></tr><tr><td> (<b>Not properly supported</b>)<br/><code>decimal</code>, <code>float</code>, <code>double</code>, <code>NOTATION</code>, <code>QName</code></td><td><code>String</code></td></tr></table></td></tr></table><p><b>Simple type definitions</b> are built from the above types following the XML Schema derivation rules. </p></li><li><p> XML Schema <b>complex type definitions</b> are mapped to CDuce types representing XML elements that can have any tag, but whose attributes and content are constrained to be valid with respect to the original complex type. 
</p><p> As an example, the following XML Schema complex type (a simplified version of the homonymous <b><tt>envelopeType</tt></b> defined in <a href="manual_schema_samples.html">mails.xsd</a>): </p><div class="code"><pre>
&lt;xsd:complexType name="envelopeType"&gt;
  &lt;xsd:sequence&gt;
    &lt;xsd:element name="From" type="xsd:string"/&gt;
    &lt;xsd:element name="To" type="xsd:string"/&gt;
    &lt;xsd:element name="Date" type="xsd:dateTime"/&gt;
    &lt;xsd:element name="Subject" type="xsd:string"/&gt;
  &lt;/xsd:sequence&gt;
&lt;/xsd:complexType&gt;
</pre></div><p> is mapped to a CDuce XML type with an arbitrary tag and four children: <tt>From</tt>, <tt>To</tt>, <tt>Date</tt> and <tt>Subject</tt>. Among them, the <tt>Date</tt> child must be an XML element containing a record that represents a value of the <tt>dateTime</tt> Schema type. </p><div class="code"><pre>
# #print_type Mails.envelopeType;;
&lt;(Any)&gt;[
  &lt;From&gt;String
  &lt;To&gt;String
  &lt;Date&gt;{ positive = Bool; year = Int; month = Int; day = Int;
          hour = Int; minute = Int; second = Int;
          timezone =? { positive = Bool; hour = Int; minute = Int } }
  &lt;Subject&gt;String
]
</pre></div></li><li><p> XML Schema <b>element declarations</b> can bind an XML element either to a complex type or to a simple type. In the former case, the conversion is almost identical to the complex type conversion described above. The only difference is that the element's tag must now correspond to the name of the XML element in the schema element declaration, whereas for a complex type it was <b><tt>Any</tt></b>. </p><p> In the latter case (an element with simple type content), the corresponding CDuce type is an element type. Its tag must correspond to the name of the XML element in the schema element declaration; its content type is the CDuce translation of the simple type provided in the element declaration. 
</p><p> For example, the following XML Schema element (corresponding to the homonymous element defined in <a href="manual_schema_samples.html">mails.xsd</a>): </p><div class="code"><pre>
&lt;xsd:element name="header"&gt;
  &lt;xsd:complexType&gt;
    &lt;xsd:simpleContent&gt;
      &lt;xsd:extension base="xsd:string"&gt;
        &lt;xsd:attribute ref="name" use="required" /&gt;
      &lt;/xsd:extension&gt;
    &lt;/xsd:simpleContent&gt;
  &lt;/xsd:complexType&gt;
&lt;/xsd:element&gt;
</pre></div><p> is translated to the following CDuce type: </p><div class="code"><pre>
# #print_type Mails.header;;
&lt;header name = String&gt;String
</pre></div><p> Note that the type of the element content <em>is not necessarily a sequence</em>: it is one only if the translation of the XML Schema type is itself a sequence, as is the case for <tt>String</tt> in the example above. Compare it with the following type, where the element content is not a sequence but a single record: </p><div class="code"><pre>
# #print_type Mails.Date;;
&lt;Date&gt;{ positive = Bool; year = Int; month = Int; day = Int;
        hour = Int; minute = Int; second = Int;
        timezone =? { positive = Bool; hour = Int; minute = Int } }
</pre></div><p>XML Schema wildcards (<tt>xsd:any</tt>) and nullable elements (<tt>xsi:nil</tt>) are supported.</p></li><li><p> XML Schema <b>attribute group definitions</b> are mapped to record types containing one field for each attribute declaration contained in the group. <tt>use</tt> constraints are respected: optional attributes are mapped to optional fields, required attributes to required fields. XML Schema attribute wildcards are only partly supported: they simply produce open record types instead of closed ones, but the actual constraints of the wildcards are discarded. 
</p><p> The following XML Schema attribute group definition: </p><div class="code"><pre>
&lt;xsd:attributeGroup name="mimeTypeAttributes"&gt;
  &lt;xsd:attribute name="type" type="mimeTopLevelType" use="required" /&gt;
  &lt;xsd:attribute name="subtype" type="xsd:string" use="required" /&gt;
&lt;/xsd:attributeGroup&gt;
</pre></div><p> is thus mapped to the following CDuce type: </p><div class="code"><pre>
# #print_type Mails.mimeTypeAttributes;;
{ type = [ 'image' | 'text' | 'application' | 'audio' | 'message'
         | 'multipart' | 'video' ];
  subtype = String }
</pre></div></li><li><p> XML Schema <b>model group definitions</b> are mapped to CDuce sequence types. <tt>minOccurs</tt> and <tt>maxOccurs</tt> constraints are respected, using CDuce recursive types to represent <tt>unbounded</tt> repetition (i.e. Kleene star). </p><p><tt>all</tt> constraints, also known as <em>interleaving constraints</em>, cannot be expressed in the CDuce type system without an explosion in type size. Therefore such content models are normalized and treated by the type system as sequence types (the validator reorders the children of the actual XML documents accordingly). </p><p><b>Mixed content models</b> are supported. 
</p><p> As an example, the following XML Schema model group definition: </p><div class="code"><pre>
&lt;xsd:group name="attachmentContent"&gt;
  &lt;xsd:sequence&gt;
    &lt;xsd:element name="mimetype"&gt;
      &lt;xsd:complexType&gt;
        &lt;xsd:attributeGroup ref="mimeTypeAttributes" /&gt;
      &lt;/xsd:complexType&gt;
    &lt;/xsd:element&gt;
    &lt;xsd:element name="content" type="xsd:string" minOccurs="0" /&gt;
  &lt;/xsd:sequence&gt;
&lt;/xsd:group&gt;
</pre></div><p> is mapped to the following CDuce type: </p><div class="code"><pre>
# #print_type Mails.attachmentContent;;
[ X1 &lt;content&gt;String | X1 ] where
X1 = &lt;mimetype Mails.mimeTypeAttributes&gt;[ ]
</pre></div></li></ul></div><div><h2><a name="validation">XML Schema validation</a></h2><p> The processes of XML Schema validation and assessment check that an XML Schema instance document is valid with respect to an XML Schema document, and add missing information such as default values. CDuce's notion of Schema validation is slightly different. </p><p> CDuce allows XML values whose leaves have arbitrary types; for example, you can have XML elements with integer attributes. Still, this feature is rarely used, because the function that loads XML documents (<b><tt>load_xml</tt></b>) returns XML values whose leaves are of type PCDATA. </p><p> Once you have imported an XML Schema into CDuce, you can use it to validate an XML value returned by <b><tt>load_xml</tt></b> against an XML Schema component defined in that schema. Validation builds a CDuce value whose type is the CDuce translation of the XML Schema type of the component used for validation; the translation is the one described in the previous section. Note that the input XML value need not come from <b><tt>load_xml</tt></b>: it is enough that its leaves are PCDATA values. 
</p><p> During validation, PCDATA strings are parsed to build CDuce values corresponding to XML Schema simple types, and whitespace is handled as specified by the XML Schema <b><tt>whiteSpace</tt></b> facet. For example, validating the <em>PCDATA string</em> <b><tt>1234567890 </tt></b> against the <b><tt>xsd:integer</tt></b> simple type returns the CDuce value <b><tt>1234567890</tt></b> of type <b><tt>Int</tt></b>.<br/> Default values for missing attributes or elements are also added where specified. </p><p> You can use the <b><tt>validate</tt></b> keyword to perform validation in a CDuce program. The syntax is as follows:<br/><b><tt>validate &lt;expr&gt; with &lt;schema_ref&gt;</tt></b><br/> where <tt>schema_ref</tt> is a reference to a schema component, written as described in <a href="#import">XML Schema components import</a>. The same ambiguity-resolution rules apply here. </p><p> More precisely, validation can be applied to different kinds of CDuce values, depending on the kind of Schema component used for validation. </p><ul><li><p> The typical use of validation is to validate against an <b>element declaration</b>. In this case, <tt>validate</tt> should be applied to an XML CDuce value, as in the following example: </p><div class="code"><pre>
# let xml = &lt;Date&gt;"2003-10-15T15:44:01Z" in
  validate xml with Mails.Date;;
- : Mails.Date =
&lt;Date&gt;{ time_kind=`dateTime; positive=`true;
        year=2003; month=10; day=15; hour=15; minute=44; second=1;
        timezone={ positive=`true; hour=0; minute=0 } }
</pre></div><p> The tag of the given element is checked for consistency with the element declaration; attributes and content are checked against the Schema type declared for the element. </p></li><li><p> Sometimes you may want to validate an element against an XML Schema <b>complex type</b> without going through an element declaration. This case is very similar to the previous one, except that the Schema component used is a complex type definition; such a validation can be applied to any XML value. 
The other important difference is that the tag name of the given value is completely ignored. </p><p> As an example: </p><div class="code"><pre>
# let xml = load_xml "envelope.xml" ;;
val xml : Any =
&lt;ignored_tag From="fake@microsoft.com"&gt;[
  &lt;From&gt;[ 'user@unknown.domain.org' ]
  &lt;To&gt;[ 'user@cduce.org' ]
  &lt;Date&gt;[ '2003-10-15T15:44:01Z' ]
  &lt;Subject&gt;[ 'I desperately need XML Schema support in CDuce' ]
  &lt;header name="Reply-To"&gt;[ 'bill@microsoft.com' ]
]
# validate xml with Mails.envelopeType;;
- : Mails.envelopeType =
&lt;ignored_tag From="fake@microsoft.com"&gt;[
  &lt;From&gt;[ 'user@unknown.domain.org' ]
  &lt;To&gt;[ 'user@cduce.org' ]
  &lt;Date&gt;{ time_kind=`dateTime; positive=`true;
          year=2003; month=10; day=15; hour=15; minute=44; second=1;
          timezone={ positive=`true; hour=0; minute=0 } }
  &lt;Subject&gt;[ 'I desperately need XML Schema support in CDuce' ]
  &lt;header name="Reply-To"&gt;[ 'bill@microsoft.com' ]
]
</pre></div></li><li><p> Similarly, you may want to validate against a <b>model group</b>. In this case, you validate a CDuce sequence against the model group; the given sequence is treated as the content of an XML element. </p><p> As an example: </p><div class="code"><pre>
# let xml = load_xml "attachment.xml";;
val xml : Any =
&lt;ignored_tag ignored_attribute="foo"&gt;[
  &lt;mimetype type="application"; subtype="msword"&gt;[ ]
  &lt;content&gt;[ '\n ### removed by spamoracle ###\n ' ]
]
# let content = match xml with &lt;_&gt;cont -> cont | _ -> raise "failure";;
val content : Any =
[ &lt;mimetype type="application"; subtype="msword"&gt;[ ]
  &lt;content&gt;[ '\n ### removed by spamoracle ###\n ' ]
]
# validate content with Mails.attachmentContent;;
- : Mails.attachmentContent =
[ &lt;mimetype type="application"; subtype="msword"&gt;[ ]
  &lt;content&gt;[ '\n ### removed by spamoracle ###\n ' ]
]
</pre></div></li><li><p> Finally, it is possible to validate records against <b>attribute groups</b>. Every required attribute declared in the attribute group must have a corresponding field in the given record. 
The content of each field is validated against the simple type defined for the corresponding attribute in the attribute group. Missing non-required fields are added using the corresponding default value, if any. </p><p> As an example: </p><div class="code"><pre>
# let record = { type = "image"; subtype = "png" };;
val record : { type = [ 'image' ] subtype = [ 'png' ] } =
{ type="image" subtype="png" }
# validate record with Mails.mimeTypeAttributes ;;
- : { type = [ 'image' | 'text' | ... ] subtype = String } =
{ type="image" subtype="png" }
</pre></div></li></ul></div><div><h2><a name="print_xml">XML Schema instances output</a></h2><p> The standard <tt>print_xml</tt> and <tt>print_xml_utf8</tt> built-in functions can be used to print values resulting from XML Schema validation. </p></div><div><h2><a name="nonsupp">Unsupported XML Schema features</a></h2><p> The support for XML Schema embedded in CDuce does not attempt to cover the full XML Schema specification. In particular, imported schemas are not checked for validity. You can, for instance, use this <a href="http://apps.gotdotnet.com/xmltools/xsdvalidator/"> on-line validator</a> to check the validity of a schema. </p><p> Also, some features of the XML Schema specification are not supported, or only partially supported. Here is a non-exhaustive list of limitations: </p><ul><li> Substitution groups. </li><li> Some facets (<tt>pattern</tt>, <tt>totalDigits</tt>, <tt>fractionDigits</tt>). </li><li><tt>&lt;redefine&gt;</tt> (inclusion of an XML Schema with modifications). </li><li><tt>xsi:type</tt>. 
</li></ul></div><div class="meta"><p><a href="sitemap.html">Site map</a></p></div></div></td></tr></table></body></html>