<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>CDuce: First functions</title><meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"/><link type="text/css" rel="stylesheet" href="cduce.css"/></head><body style="margin: 0; padding : 0;"><table border="0" width="100%" cellspacing="10" cellpadding="0"><tr><td valign="top" align="left" style="width:20%;"><div class="leftbar" id="leftbar"><div class="smallbox"><ul><li><a href="#t2">First functions</a></li><li><a href="#re">Regular Expressions</a></li></ul><p> You can cut and paste the code on this page and test it on the <a href="http://reglisse.ens.fr/cgi-bin/cduce">online interpreter</a>. </p></div></div></td><td><h1>First functions</h1><div class="mainpanel"><div class="smallbox"><p><a href="index.html">CDuce: documentation</a>: <a href="tutorial.html">Tutorial</a>: First functions</p><p><a href="tutorial_getting_started.html"><img class="icon" width="16" alt="Previous page:" height="16" src="img/left.gif"/> Getting started</a> <a href="tutorial_overloading.html"><img class="icon" width="16" alt="Next page:" height="16" src="img/right.gif"/> Overloading</a></p></div><div><h2><a name="t2">First functions</a></h2><p> A first example of transformation is <b><tt>names</tt></b>, which extracts the sequences of all names of parents in a <b><tt>ParentBook</tt></b> element: </p><div class="code"><pre> let names (ParentBook -> [Name*]) <parentbook>x -> (map x with <person ..>[ n _*] -> n) </pre></div><p> The name of the transformation is followed by an <i>interface</i> that states that <b><tt>names</tt></b> is a function from <b><tt>ParentBook</tt></b> elements to (possibly empty) sequences of <b><tt>Name</tt></b> elements. This is obtained by matching the argument of the function against the pattern </p><div class="code"><pre> <parentbook>x </pre></div><p>which binds <b><tt>x</tt></b> to the sequence of person elements forming the parentbook. The operator <b><tt>map</tt></b> applies to each element of a sequence (in this case <b><tt>x</tt></b>) the transformation defined by the subsequent pattern matching. Here <b><tt>map</tt></b> returns the sequence obtained by replacing each person in <b><tt>x</tt></b> by its <b><tt>Name</tt></b> element. Note that we use the pattern </p><div class="code"><pre><person ..>[ n _*], </pre></div><p>to match the person elements: <b><tt>n</tt></b> matches (and captures) the <b><tt>Name</tt></b> element-that is, the first element of the sequence-, <b><tt>_*</tt></b> matches (and discards) the sequence of elements that follow, and <b><tt>person</tt></b> matches the tag of the person. Since elements of type <b><tt>Person</tt></b> contain attributes (actually, just the attribute gender) then we use <b><tt>..</tt></b> to match (and discard) them. This is not necessary for the parentbook elements, but we could have specified it as well as <b><tt><parentbook ..>x</tt></b> since <b><tt>..</tt></b> matches any sequence of attibutes, the empty one as well. </p><p> The interface and the type definitions ensure that the tags will be the expected ones, so we could optimize the code by defining a body that skips the check of the tags: </p><div class="code"><pre> <_> x -> (map x with <_ ..>[ n _*] -> n) </pre></div><p> However this optimization would be useless since it is already done by the implementation (for technical details see <a href="http://www.cduce.org/papers/reg.pdf">this paper</a>) and, of course, it would make the code less readable. If instead of extracting the list of <i>all</i> parents we wanted to extract the sublist containing only parents with exactly two children, then we had to replace <b><tt>transform</tt></b> for <b><tt>map</tt></b>: </p><div class="code"><pre> let names2 (ParentBook -> [Name*]) <parentbook> x -> transform x with <person ..>[ n <children>[Person Person] _*] -> [n] </pre></div><p> While <b><tt>map</tt></b> must be applicable to all the elements of a sequence, <b><tt>transform</tt></b> filters only those that make its pattern succeed. The right-hand sides return sequences which are concatenated in the final result. In this case <b><tt>transform</tt></b> returns the names only of those persons that match the pattern <b><tt><person ..>[ n <children>[Person Person] _*]</tt></b>. Here again, the implementation compiles this pattern exactly as <b><tt><_ ..>[ n <_>[_ _] _*]</tt></b>, and in particular avoids checking that sub-elements of <b><tt><children></tt></b> are of type <b><tt>Person</tt></b> when static-typing enforces this property. </p><p> These first examples already show the essence of CDuce's patterns: all a pattern can do is to decompose values into subcomponents that are either captured by a variable or checked against a type. </p><p> The previous functions return only the names of the outer persons of a <b><tt>ParentBook</tt></b> element. If we want to capture all the <b><tt>name</tt></b> elements in it we have to recursively apply <b><tt>names</tt></b> to the sequence of children: </p><div class="code"><pre> let names (ParentBook -> [Name*]) <parentbook> x -> transform x with <person ..> [ n <children>c _*] -> [n]@(names <parentbook>c) </pre></div><p> where <b><tt>@</tt></b> denotes the concatenation of sequences. Note that in order to recursively call the function on the sequence of children we have to include it in a <b><tt>ParentBook</tt></b> element. A more elegant way to obtain the same behavior is to specify that names can be applied both to <b><tt>ParentBook</tt></b> elements and to <b><tt>Children</tt></b> elements, that is, to the union of the two types denoted by <b><tt>(ParentBook|Children)</tt></b>: </p><div class="code"><pre> let names ( ParentBook|Children -> [Name*] ) <_>x -> transform x with <person ..>[ n c _*] -> [n]@(names c) </pre></div><p> Note here the use of the pattern <b><tt><_></tt></b> at the beginning of the body which makes it possible for the function to work both on <b><tt>ParentBook</tt></b> and on <b><tt>Children</tt></b> elements. </p></div><div><h2><a name="re">Regular Expressions</a></h2><p> In all these functions we have used the pattern <b><tt>_*</tt></b> to match, and thus discard, the rest of a sequence. This is nothing but a particular regular expression over types. Type regexps can be used in patterns to match subsequences of a value. For instance the pattern <b><tt><person ..>[ _ _ Tel+]</tt></b> matches all person elements that specify no <b><tt>Email</tt></b> element and at least one <b><tt>Tel</tt></b> element. It may be useful to bind the sequence captured by a (pattern) regular expression to a variable. But since a regexp is not a type, we cannot write, say, <b><tt>x&Tel+</tt></b>. So we introduce a special notation <b><tt>x::<i>R</i></tt></b> to bind <b><tt>x</tt></b> to the sequence matched by the type regular expression <b><tt><i>R</i></tt></b>. For instance: </p><div class="code"><pre> let domain (Email ->String) <_>[ _*? d::(Echar+ '.' Echar+) ] -> d </pre></div><p> returns the last two parts of the domain of an e-mail (the <b><tt>*?</tt></b> is an ungreedy version of <b><tt>*</tt></b>, see <a href="tutorial_patterns.html#pre">regular expressions patterns</a>). If these ::-captures are used <i>inside</i> the scope of the regular expression operators <b><tt>*</tt></b> or <b><tt>+</tt></b>, or if the same variable appears several times in a regular expression, then the variable is bound to the concatenation of all the corresponding matches. This is one of the distinctive and powerful characteristics of CDuce, since it allows to define patterns that in a single match capture subsequences of non-consecutive elements. For instance: </p><div class="code"><pre> type PhoneItem = {name = String; phones = [String*] } let agendaitem (Person -> PhoneItem) <person ..>[<name>n _ (t::Tel | _)*] -> { name = n ; phones = map t with <tel ..> s ->s } </pre></div><p> transforms a <b><tt>person</tt></b> element into a record value with two fields containing the element's name and the list of all the phone numbers. This is obtained thanks to the pattern <b><tt>(t::Tel | _)*</tt></b> that binds to <b><tt>t</tt></b> the sequence of all <b><tt>Tel</tt></b> elements appearing in the person. By the same rationale the pattern </p><div class="code"><pre> ( w::<tel kind="work">_ | t::<tel kind=?"home">_ | e::<email>_ )* </pre></div><p> partitions the <b><tt>(Tel | Email)*</tt></b> sequence into three subsequences, binding the list of work phone numbers to <b><tt>w</tt></b>, the list of other numbers to <b><tt>t</tt></b>, and the list of e-mails to <b><tt>e</tt></b>. Alternative patterns <b><tt>|</tt></b> follow a first match policy (the second pattern is matched only if the first fails). Thus we can write a shorter pattern that (applied to <b><tt>(Tel|Email)*</tt></b> sequences) is equivalent: </p><div class="code"><pre> ( w::<tel kind="work">_ | t::Tel | e::_ )* </pre></div><p> Both patterns are compiled into </p><div class="code"><pre> ( w::<tel kind="work">_ | t::<tel ..>_ | e::_)* </pre></div><p> since checking the tag suffices to determine if the element is of type <b><tt>Tel</tt></b>. </p><p> Storing phone numbers in integers rather than in strings requires minimal modifications. It suffices to use a pattern regular expression to strip off the possible occurrence of a dash: </p><div class="code"><pre> let agendaitem2 (Person -> {name=String; phones=[Int*]}) <person ..>[ <name>n _ (t::Tel|_)* ] -> { name = n; phones = map t with <tel ..>[(s::'0'--'9'|_)*] -> int_of s } </pre></div><p> In this case <b><tt>s</tt></b> extracts the subsequence formed only by numerical characters, therefore <b><tt>int_of s</tt></b> cannot fail because <b><tt>s</tt></b> has type <b><tt>[ '0'--'9'+ ]</tt></b> (otherwise, the system would have issued a warning) (Actually the type system deduces for <b><tt>s</tt></b> the following type <b><tt>[ '0'--'9'+ '0'--'9'+]</tt></b> (subtype of the former) since there always are at least two digits). </p><h3>First use of overloading</h3><p> Consider the type declaration </p><div class="code"><pre> type PhoneBook = <phonebook>[PhoneItem*] </pre></div><p>If we add a new pattern matching branch in the definition of the function <b><tt>names</tt></b>, we make it work both with <b><tt>ParentBook</tt></b> and <b><tt> PhoneBook</tt></b> elements. This yields the following <i>overloaded</i> function: </p><a name="names3"/><div class="code"><pre> let names3 (ParentBook -> [Name*] ; PhoneBook -> [String*]) | <parentbook> x -> (map x with <person ..>[ n _* ] -> n) | <phonebook> x -> (map x with { name=n } -> n) </pre></div><p> The overloaded nature of <b><tt>names3</tt></b> is expressed by its interface, which states that when the function is applied to a <b><tt>ParentBook</tt></b> element it returns a list of names, while if applied to a <b><tt>PhoneBook</tt></b> element it returns a list of strings. We can factorize the two branches in a unique alternative pattern: </p><div class="code"><pre> let names4 (ParentBook -> [Name*] ; PhoneBook -> [String*]) <_> x -> map x with ( <person ..>[ n _* ] | { name=n } ) -> n </pre></div><p>The interface ensures that the two representations will never mix.</p></div><div class="meta"><p><a href="sitemap.html">Site map</a></p></div><div class="smallbox"><p><a href="index.html">CDuce: documentation</a>: <a href="tutorial.html">Tutorial</a>: First functions</p><p><a href="tutorial_getting_started.html"><img class="icon" width="16" alt="Previous page:" height="16" src="img/left.gif"/> Getting started</a> <a href="tutorial_overloading.html"><img class="icon" width="16" alt="Next page:" height="16" src="img/right.gif"/> Overloading</a></p></div></div></td></tr></table></body></html>