<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>CDuce: XML Schema</title><meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"/><link type="text/css" rel="stylesheet" href="cduce.css"/></head><body style="margin: 0; padding : 0;"><table border="0" width="100&#37;" cellspacing="10" cellpadding="0"><tr><td valign="top" align="left" style="width:20&#37;;"><div class="leftbar" id="leftbar"><div class="smallbox"><ul><li><a href="#overview">Overview</a></li><li><a href="#primer">XML Schema components (micro) introduction</a></li><li><a href="#import">XML Schema components import</a></li><li><a href="#directives">Toplevel directives</a></li><li><a href="#mapping">XML Schema &#8594; CDuce mapping</a></li><li><a href="#validation">XML Schema validation</a></li><li><a href="#print_xml">XML Schema instances output</a></li><li><a href="#nonsupp">Unsupported XML Schema features</a></li></ul></div></div></td><td><h1>XML Schema</h1><div class="mainpanel"><div class="smallbox"><p><a href="index.html">CDuce: documentation</a>: <a href="manual.html">User's manual</a>: XML Schema</p><p><a href="namespaces.html"><img class="icon" width="16" alt="Previous page:" height="16" src="img/left.gif"/> XML Namespaces</a> <a href="manual_schema_samples.html"><img class="icon" width="16" alt="Next page:" height="16" src="img/right.gif"/> XML Schema sample documents</a></p></div><div><h2><a name="overview">Overview</a></h2><p>
    CDuce partially supports <a href="http://www.w3.org/XML/Schema">XML
      Schema</a> Recommendations (<a href="http://www.w3.org/TR/xmlschema-0/">Primer</a>, <a href="http://www.w3.org/TR/xmlschema-1/">Structures</a>, <a href="http://www.w3.org/TR/xmlschema-2/">Datatypes</a>). Using this CDuce
    feature it is possible to manipulate XML documents whose leaves are typed
    values like integers, dates, binary data, and so on.
  </p><p>
    CDuce supports XML Schema by implementing the following features:
  </p><ul><li><a href="#import">XML Schema components import</a></li><li><a href="#validation">XML Schema validation</a></li><li><a href="#print_xml">XML Schema instances output</a></li></ul><p>
    This manual page describes how to use these features in CDuce. All the
    documents used in the examples are available in the manual section <a href="manual_schema_samples.html">XML Schema sample documents</a>.
  </p><div class="note"><b>Note:  </b>
    The support for XML Schema does not currently interact well with
    separate compilation. When a CDuce unit <b><tt><strong class="highlight">script</strong>.cd</tt></b>
    which uses an XML Schema
    is compiled, the resulting <b><tt><strong class="highlight">script</strong>.cdo</tt></b> object
    refers to the XML Schema by name. That is, when these units
    are run, the XML Schema must still be available from the current
    directory and must not have been changed since compilation.
  </div></div><div><h2><a name="primer">XML Schema components (micro) introduction</a></h2><p>
    An XML Schema document can define four different kinds of components, each
    of which can be imported into CDuce and used as a CDuce type:
  </p><ul><li><b>Type definitions</b><br/>
      A type definition defines either a simple type or a complex type. The
      former can be used to type the string content of an element more
      precisely. You can think of it as a refinement of #PCDATA. XML Schema
      provides a set of <a href="http://www.w3.org/TR/xmlschema-2/#built-in-datatypes">predefined
	simple types</a> and a way to define new simple types. The latter can
      be used to constrain the content model and the attributes of an XML
      element. An XML Schema complex type is strictly more expressive than a DTD
      element declaration.
    </li><li><b>Element declarations</b>
      An element declaration links an element name to a simple or complex type.
      Optionally, if the type is a simple type, it can constrain the set of
      possible values for the element by mandating a fixed value or providing a
      default value.
    </li><li><b>Attribute group definitions</b>
      An attribute group definition links a set of attribute declarations to a
      name which can be referenced from other XML Schema components.
    </li><li><b>Model group definitions</b>
      A model group definition links a name to a constraint over the complex
      content of an XML element. The linked name can be referenced from other
      XML Schema components.
    </li></ul><p>
   Attribute declarations currently don't produce any CDuce type
   and can't be used for validation by themselves.
  </p></div><div><h2><a name="import">XML Schema components import</a></h2><p>
    In order to import XML Schema components in CDuce, you first need to tell
    CDuce to import an XML Schema document. You can do this using the
    <b><tt>schema</tt></b> keyword to bind an uppercase identifier to a local
    schema document:
  </p><div class="code"><pre>
# <strong class="highlight">schema Mails = &quot;tests/schema/mails.xsd&quot;</strong>;;
Registering schema type: attachmentType
Registering schema type: mimeTopLevelType
Registering schema type: mailsType
Registering schema type: mailType
Registering schema type: bodyType
Registering schema type: envelopeType
Registering schema element: header
Registering schema element: Date
Registering schema element: mails
Registering schema attribute group: mimeTypeAttributes
Registering schema model group: attachmentContent
  </pre></div><p>
    The above declaration will (try to) import all schema components included in
    the schema document <a href="manual_schema_samples.html">mails.xsd</a>
    as CDuce types. You can reference them using the
    dot operator, e.g. <b><tt>Mails.mails</tt></b>.
  </p><p>
    XML Schema permits ambiguity in component names. CDuce
    resolves references to Schema components in this order:
    elements, types, model groups, attribute groups.
  </p><p>
    The result of a schema component reference is an ordinary CDuce type which
    you can use as usual in function definitions, pattern matching and so on.
  </p><div class="code"><pre>
let is_valid_mail (Any -&gt; Bool)
  | <strong class="highlight">Mails.mailType</strong> -&gt; `true
  | _ -&gt; `false
  </pre></div></div><div><p><em><b>Correctness remark:</b> while parsing XML Schema documents, CDuce
      assumes that they're correct with respect to XML Schema recommendations.
      At minimum they're required to be valid with respect to <a href="http://www.w3.org/TR/xmlschema-1/#normative-schemaSchema">XML
	Schema for Schemas</a>. You should check your schemas for validity
      before importing them into CDuce; otherwise, strange behaviour is
      to be expected.
    </em></p></div><div><h2><a name="directives">Toplevel directives</a></h2><p>
    The toplevel directive <b><tt>#env</tt></b> supports schemas: it lists the
    currently defined schemas.
  </p><p>
    The toplevel directive <b><tt>#print_type</tt></b> supports schemas too: it can
    be used to print the types corresponding to schema components.
  </p><div class="code"><pre>
# #print_type <strong class="highlight">Mails.bodyType</strong>;;
[ Char* ]
  </pre></div><p>
    For more information have a look at the manual section about <a href="manual_interpreter.html">toplevel directives</a>.
  </p></div><div><h2><a name="mapping">XML Schema &#8594; CDuce mapping</a></h2><ul><li><p>
	XML Schema <b>predefined simple types</b> are mapped to CDuce types
	directly in the CDuce implementation, preserving XML Schema constraints
	as much as possible. The table below lists the most significant mappings.
      </p><table width="100&#37;"><tr><td align="center"><table border="1"><col/><col/><tr><th><b>XML Schema predefined simple type</b></th><th><b>CDuce type</b></th></tr><tr><td><code>duration</code>, <code>dateTime</code>, <code>time</code>,
	    <code>date</code>, <code>gYear</code>, <code>gMonth</code>, ...
	  </td><td>
	    closed record types with some of the following fields (depending on
	    the Schema type): <code>year</code>, <code>month</code>,
	    <code>day</code>, <code>hour</code>, <code>minute</code>,
	    <code>second</code>, <code>timezone</code></td></tr><tr><td><code>boolean</code></td><td><code>Bool</code></td></tr><tr><td><code>anySimpleType</code>, <code>string</code>,
	    <code>base64Binary</code>, <code>hexBinary</code>,
	    <code>anyURI</code></td><td><code>String</code></td></tr><tr><td><code>integer</code></td><td><code>Int</code></td></tr><tr><td><code>nonPositiveInteger</code>, <code>negativeInteger</code>,
	    <code>nonNegativeInteger</code>, <code>positiveInteger</code>,
	    <code>long</code>, <code>int</code>, <code>short</code>,
	    <code>byte</code></td><td>integer intervals with the appropriate limits</td></tr><tr><td><code>string</code>, <code>normalizedString</code>, and the other
	    types derived (directly or indirectly) by restriction from string
	  </td><td><code>String</code></td></tr><tr><td><code>NMTOKENS</code>, <code>IDREFS</code>, <code>ENTITIES</code></td><td><code>[String*]</code></td></tr><tr><td><code>decimal</code>, <code>float</code>, <code>double</code></td><td><code>Float</code></td></tr><tr><td>
	    (<b>Not properly supported</b>)<br/><code>decimal</code>,
	    <code>float</code>, <code>double</code>, <code>NOTATION</code>,
	    <code>QName</code></td><td><code>String</code></td></tr></table></td></tr></table><p><b>Simple type definitions</b> are built from the above types following
	the XML Schema derivation rules.
      </p></li><li><p>
	XML Schema <b>complex type definitions</b> are mapped to CDuce types
	representing XML elements which can have any tag, but whose attributes
	and content are constrained to be valid with respect to the original
	complex type.
      </p><p>
	As an example, the following XML Schema complex type (a simplified
	version of the homonymous <b><tt>envelopeType</tt></b> defined in <a href="manual_schema_samples.html">mails.xsd</a>):
      </p><div class="code"><pre>
 &lt;xsd:complexType name=&quot;envelopeType&quot;&gt;
  &lt;xsd:sequence&gt;
   &lt;xsd:element name=&quot;From&quot; type=&quot;xsd:string&quot;/&gt;
   &lt;xsd:element name=&quot;To&quot; type=&quot;xsd:string&quot;/&gt;
   &lt;xsd:element name=&quot;Date&quot; type=&quot;xsd:dateTime&quot;/&gt;
   &lt;xsd:element name=&quot;Subject&quot; type=&quot;xsd:string&quot;/&gt;
  &lt;/xsd:sequence&gt;
 &lt;/xsd:complexType&gt;
</pre></div><p>
	will be mapped to a CDuce XML type which can have any tag and must have
	four children. Among them, the <tt>Date</tt> child must be an XML
	element containing a record which represents a <tt>dateTime</tt>
	Schema type.
      </p><div class="code"><pre>
# #print_type Mails.envelopeType;;
&lt;(Any)&gt;[
  &lt;From&gt;String
  &lt;To&gt;String
  &lt;Date&gt;{
    positive = Bool;
    year = Int; month = Int; day = Int;
    hour = Int; minute = Int; second = Int;
    timezone =? { positive = Bool; hour = Int; minute = Int }
  }
  &lt;Subject&gt;String
]
</pre></div></li><li><p>
	XML Schema <b>element declarations</b> can bind an XML element either
	to a complex type or to a simple type. In the former case the conversion
	is almost identical to the complex type conversion described above.
	The only difference is that this time the element's tag must correspond to
	the name of the XML element in the schema element declaration, whereas
	previously its type was <b><tt>Any</tt></b>.
      </p><p>
	In the latter case (element with simple type content), the corresponding
	CDuce type is an element type. Its tag must correspond to the name of
	the XML element in the schema element declaration; its content type is
	the CDuce translation of the simple type provided in the element
	declaration.
      </p><p>
	For example, the following XML Schema element (corresponding to the
	homonymous element defined in <a href="manual_schema_samples.html">mails.xsd</a>):
      </p><div class="code"><pre>
&lt;xsd:element name=&quot;header&quot;&gt;
 &lt;xsd:complexType&gt;
  &lt;xsd:simpleContent&gt;
   &lt;xsd:extension base=&quot;xsd:string&quot;&gt;
    &lt;xsd:attribute ref=&quot;name&quot; use=&quot;required&quot; /&gt;
   &lt;/xsd:extension&gt;
  &lt;/xsd:simpleContent&gt;
 &lt;/xsd:complexType&gt;
&lt;/xsd:element&gt;
</pre></div><p>
	will be translated to the following CDuce type:
      </p><div class="code"><pre>
# #print_type Mails.header;;
&lt;header name = String&gt;String
</pre></div><p>
	Note that the type of the element content <em>is not a sequence</em>
	unless the translation of the XML Schema type is a sequence itself (as
	you can see in the example above). Compare it with the following,
	where the element content is not a sequence but a single record:
      </p><div class="code"><pre>
# #print_type Mails.Date;;
&lt;Date&gt;{
  positive = Bool;
  year = Int; month = Int; day = Int; hour = Int;
  minute = Int; second = Int;
  timezone =? { positive = Bool; hour = Int; minute = Int }
}
</pre></div><p>XML Schema wildcards (<tt>xsd:any</tt>) 
	and nullable elements (<tt>xsi:nil</tt>) are supported.</p></li><li><p>
	XML Schema <b>attribute group definitions</b> are mapped to record types
	containing one field for each attribute declaration contained in the
	group. <tt>use</tt> constraints are respected: optional attributes are
	mapped to optional fields, required attributes to required
	fields. XML Schema attribute wildcards are partly supported:
	they simply produce open record types instead of closed ones,
	and the actual constraints of the wildcards are discarded.
      </p><p>
	The following XML Schema attribute group declaration:
      </p><div class="code"><pre>
&lt;xsd:attributeGroup name=&quot;mimeTypeAttributes&quot;&gt;
 &lt;xsd:attribute name=&quot;type&quot; type=&quot;mimeTopLevelType&quot; use=&quot;required&quot; /&gt;
 &lt;xsd:attribute name=&quot;subtype&quot; type=&quot;xsd:string&quot; use=&quot;required&quot; /&gt;
&lt;/xsd:attributeGroup&gt;
</pre></div><p>
	will thus be mapped to the following CDuce type:
      </p><div class="code"><pre>
# #print_type Mails.mimeTypeAttributes;;
{  type = [
      'image' | 'text' | 'application' | 'audio' | 'message' | 'multipart' | 'video'
    ];
   subtype = String }
      </pre></div></li><li><p>
	XML Schema <b>model group definitions</b> are mapped to CDuce sequence
	types. <tt>minOccurs</tt> and <tt>maxOccurs</tt> constraints are
	respected, using CDuce recursive types to represent <tt>unbounded</tt>
	repetition (i.e. Kleene star).
      </p><p><tt>all</tt> constraints, also known as <em>interleaving
	  constraints</em>, can't be expressed in the CDuce type system without
	an explosion in type size. Thus, content models of this kind are
	normalized and treated, in the type system, as sequence types (the
	validator will reorder the actual XML documents).
      </p><p><b>Mixed content models</b> are supported.
      </p><p>
	As an example, the following XML Schema model group definition:
      </p><div class="code"><pre>
&lt;xsd:group name=&quot;attachmentContent&quot;&gt;
 &lt;xsd:sequence&gt;
  &lt;xsd:element name=&quot;mimetype&quot;&gt;
   &lt;xsd:complexType&gt;
    &lt;xsd:attributeGroup ref=&quot;mimeTypeAttributes&quot; /&gt;
   &lt;/xsd:complexType&gt;
  &lt;/xsd:element&gt;
  &lt;xsd:element name=&quot;content&quot; type=&quot;xsd:string&quot; minOccurs=&quot;0&quot; /&gt;
 &lt;/xsd:sequence&gt;
&lt;/xsd:group&gt;
</pre></div><p>
	will be mapped to the following CDuce type:
      </p><div class="code"><pre>
# #print_type Mails.attachmentContent;;
[ X1 &lt;content&gt;String | X1 ] where
X1 = &lt;mimetype Mails.mimeTypeAttributes&gt;[  ]
</pre></div></li></ul></div><div><h2><a name="validation">XML Schema validation</a></h2><p>
    The processes of XML Schema validation and assessment check that an XML
    Schema instance document is valid with respect to an XML Schema document and
    add missing information such as default values. CDuce's notion of Schema
    validation is a bit different.
  </p><p>
    CDuce allows XML values to be built from arbitrary types; for example, you
    can have XML elements which have integer attributes. Still, this feature is
    rarely used because the function used to load XML documents
    (<b><tt>load_xml</tt></b>) returns XML values whose leaves are values of
    type PCDATA.
  </p><p>
    Once you have imported an XML Schema in CDuce, you can use it to validate an
    XML value returned by <b><tt>load_xml</tt></b> against an XML Schema component
    defined in it. The validation process basically builds a CDuce value
    whose type is the CDuce translation of the XML Schema component used for
    validation. The translation is the one
    described in the previous section. Note that it is not strictly necessary that
    the input XML value comes from <b><tt>load_xml</tt></b>; it is enough that it has
    PCDATA values as leaves.
  </p><p>
    During validation, PCDATA strings are parsed to build CDuce values
    corresponding to XML Schema simple types, and whitespace is handled as
    specified by the XML Schema <b><tt>whiteSpace</tt></b> facet. For example,
    validating the <b><tt>1234567890 </tt></b><em>PCDATA string</em> against an
    <b><tt>xsd:integer</tt></b> simple type will return the CDuce value
    <b><tt>1234567890</tt></b> of type <b><tt>Int</tt></b>.<br/>
    Default values for missing attributes or elements are also added where
    specified.
  </p><p>
    You can use the <b><tt>validate</tt></b> keyword to perform validation in a CDuce
    program. The syntax is as follows:<br/><b><tt>validate &lt;expr&gt; with
      &lt;schema_ref&gt;</tt></b><br/> where <tt>schema_ref</tt> is defined as described
    in <a href="#import">XML Schema components import</a>. The same ambiguity rules
    apply here.
  </p><p>
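    As a minimal sketch of how the pieces described above fit together, the
    following toplevel session imports the schema, loads a document and
    validates it against an element declaration. The file name
    <tt>mails.xml</tt> is an assumption made for illustration (any instance of
    <a href="manual_schema_samples.html">mails.xsd</a> would do), and the
    interpreter output is omitted:
  </p><div class="code"><pre>
# schema Mails = &quot;tests/schema/mails.xsd&quot;;;
# let doc = load_xml &quot;mails.xml&quot;;;   (* hypothetical instance document *)
# let mails = validate doc with Mails.mails;;
  </pre></div><p>
    If validation succeeds, the resulting value has the CDuce type
    corresponding to <tt>Mails.mails</tt>, so it can be passed to functions
    whose interface mentions that type.
  </p><p>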
    In more detail, validation can be applied to different kinds of CDuce values
    depending on the kind of Schema component used for validation.
  </p><ul><li><p>
	The typical use of validation is to validate against an <b>element
	  declaration</b>. In this case <tt>validate</tt> should be invoked on an XML
	CDuce value, as in the following example.
      </p><div class="code"><pre>
# let xml = &lt;Date&gt;&quot;2003-10-15T15:44:01Z&quot; in
  validate xml with Mails.Date;;
  - : Mails.Date =
  &lt;Date&gt; {
    time_kind=`dateTime;
    positive=`true;
    year=2003; month=10; day=15;
    hour=15; minute=44; second=1;
    timezone={ positive=`true; hour=0; minute=0 }
  }
</pre></div><p>
	The tag of the given element is checked for consistency with the
	element declaration; attributes and content are checked against the
	Schema type declared for the element.
      </p></li><li><p>
	Sometimes you may want to validate an element against an XML Schema
	<b>complex type</b> without having to use element declarations. This
	case is very similar to the previous one, except that the
	Schema component you use is a complex type definition. You can
	apply such a validation to any XML value. The other important difference
	is that the tag name of the given value is completely ignored.
      </p><p>
	As an example:
      </p><div class="code"><pre>
# let xml = load_xml &quot;envelope.xml&quot; ;;       
val xml : Any = &lt;ignored_tag From=&quot;fake@microsoft.com&quot;&gt;[
                  &lt;From&gt;[ 'user@unknown.domain.org' ]
                  &lt;To&gt;[ 'user@cduce.org' ]
                  &lt;Date&gt;[ '2003-10-15T15:44:01Z' ]
                  &lt;Subject&gt;[ 'I desperately need XML Schema support in CDuce' ]
                  &lt;header name=&quot;Reply-To&quot;&gt;[ 'bill@microsoft.com' ]
                  ]
# validate xml with Mails.envelopeType;;
- : Mails.envelopeType =
    &lt;ignored_tag From=&quot;fake@microsoft.com&quot;&gt;[
      &lt;From&gt;[ 'user@unknown.domain.org' ]
      &lt;To&gt;[ 'user@cduce.org' ]
      &lt;Date&gt; {
        time_kind=`dateTime;
        positive=`true;
        year=2003; month=10; day=15;
        hour=15; minute=44; second=1;
        timezone={ positive=`true; hour=0; minute=0 }
      }
      &lt;Subject&gt;[ 'I desperately need XML Schema support in CDuce' ]
      &lt;header name=&quot;Reply-To&quot;&gt;[ 'bill@microsoft.com' ]
    ]
</pre></div></li><li><p>
	Similarly, you may want to validate against a <b>model group</b>. In this
	case you can validate CDuce sequences against model groups. The given
	sequences are treated as the content of XML elements.
      </p><p>
	As an example:
      </p><div class="code"><pre>
# let xml = load_xml &quot;attachment.xml&quot;;;
  val xml : Any =
    &lt;ignored_tag ignored_attribute=&quot;foo&quot;&gt;[
      &lt;mimetype type=&quot;application&quot;; subtype=&quot;msword&quot;&gt;[ ]
      &lt;content&gt;[ '\n    ### removed by spamoracle ###\n  ' ]
    ]
# let content = match xml with &lt;_&gt;cont -&gt; cont | _ -&gt; raise &quot;failure&quot;;;
  val content : Any = [
    &lt;mimetype type=&quot;application&quot;; subtype=&quot;msword&quot;&gt;[ ]
    &lt;content&gt;[ '\n    ### removed by spamoracle ###\n  ' ]
  ]
# validate content with Mails.attachmentContent;;
- : Mails.attachmentContent =
    [ &lt;mimetype type=&quot;application&quot;; subtype=&quot;msword&quot;&gt;[ ]
      &lt;content&gt;[ '\n    ### removed by spamoracle ###\n  ' ]
    ]
</pre></div></li><li><p>
	Finally, it is possible to validate records against <b>attribute groups</b>.
	All required attributes declared in the attribute group must have
	corresponding fields in the given record. The content of each of them is
	validated against the simple type defined for the corresponding attribute
	in the attribute group. Non-required fields are added if missing, using
	the corresponding default value (if any).
      </p><p>
	As an example:
      </p><div class="code"><pre>
# let record = { type = &quot;image&quot;; subtype = &quot;png&quot; };;
  val record :
    { type = [ 'image' ] subtype = [ 'png' ] } =
      { type=&quot;image&quot; subtype=&quot;png&quot; }
# validate record with Mails.mimeTypeAttributes ;;
- : { type = [ 'image' | 'text' | ... ] subtype = String } =
      { type=&quot;image&quot; subtype=&quot;png&quot; }
</pre></div></li></ul></div><div><h2><a name="print_xml">XML Schema instances output</a></h2><p>
  It is possible to use the normal <tt>print_xml</tt>
  and <tt>print_xml_utf8</tt> built-in functions to print
  values resulting from XML Schema validation.
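  </p><p>
  As a hedged sketch, a validated value could be serialized and written to
  standard output as shown here. The sketch assumes that <tt>print_xml</tt>
  returns the serialized document as a string and that the built-in
  <tt>print</tt> function writes such a string to standard output; check both
  assumptions against the manual of your CDuce version. The interpreter
  output is omitted:
  </p><div class="code"><pre>
# let xml = &lt;Date&gt;&quot;2003-10-15T15:44:01Z&quot;;;
# let date = validate xml with Mails.Date;;
# print (print_xml date);;   (* assumes print accepts the string built by print_xml *)
  </pre></div><p>
  <tt>print_xml_utf8</tt> takes the same kind of value when UTF-8 encoded
  output is needed.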
  </p></div><div><h2><a name="nonsupp">Unsupported XML Schema features</a></h2><p>
The support for XML Schema embedded in CDuce does not attempt
to cover the full XML Schema specification. In particular,
imported schemas are not checked for validity. You can use,
for instance, this <a href="http://apps.gotdotnet.com/xmltools/xsdvalidator/">
on-line validator</a> to check the validity of a schema.
</p><p>
Also, some features of the XML Schema specification are either unsupported or
only partially supported. Here is a non-exhaustive list of limitations:
</p><ul><li>
  Substitution groups.
</li><li>
  Some facets (pattern, totalDigits, fractionDigits).
</li><li><tt>&lt;redefine&gt;</tt> (inclusion of an XML Schema with modifications).
</li><li><tt>xsi:type</tt>.
</li></ul></div><div class="meta"><p><a href="sitemap.html">Site map</a></p></div></div></td></tr></table></body></html>