1 Introduction and Example

The main purpose of the GAPDoc package is to define a file format for documentation of GAP programs and packages (see [GAP]). The problem is that such documentation should be readable in several output formats. For example, it should be possible to read the documentation inside the terminal in which GAP is running (a text mode), and there should be a printable version in high typesetting quality (produced by some version of TeX). It is also popular to view GAP's online help with a Web browser via an HTML version of the documentation. Nowadays one can use LaTeX and standard viewer programs to produce, and view on the screen, dvi- or pdf-files with full support of internal and external hyperlinks. Certainly there will be other interesting document formats and tools in this direction in the future.

Our aim is to find a format for writing the documentation which allows a relatively easy translation into the output formats just mentioned and which, hopefully, also makes it easy to translate into future output formats.

To make documentation written in the GAPDoc format directly usable, we also provide a set of programs, called converters, which produce text, hyperlinked LaTeX, and HTML output versions of a GAPDoc document. These programs are developed by the first named author. They run completely inside GAP, i.e., no external programs are needed. You only need latex and pdflatex to process the LaTeX output. These programs are described in Chapter 5.


1.1 XML

The definition of the GAPDoc format uses XML, the "eXtensible Markup Language". This is a standard (defined by the World Wide Web Consortium, W3C, see http://www.w3c.org) which lays down a syntax for adding markup to a document or to some data. It allows one to define document structures by introducing markup elements and certain relations between them.
This is done in a document type definition. The file gapdoc.dtd contains such a document type definition and is the central part of the GAPDoc package.

The easiest way to get a good idea of this is probably to look at an example. Appendix A contains a short but complete GAPDoc document for a fictitious share package. In the next section we go through this document, explain basic facts about XML and the GAPDoc document type, and give pointers to more details in later parts of this documentation.

In Section 1.3, the last section of this introductory chapter, we try to answer some general questions about the decisions which led to the GAPDoc package.


1.2 A complete example

In this section we recall the lines from the example document in Appendix A and give some explanations.

  ------------------------ from 3k+1.xml -------------------------
  <?xml version="1.0" encoding="UTF-8"?>
  ------------------------------------------------------------------

This line just tells a human reader and computer programs that the file is a document with XML markup and that the text is encoded in UTF-8 (other common encodings are ASCII or the ISO-8859-X encodings).

  ------------------------ from 3k+1.xml -------------------------
  <!-- A complete "fake package" documentation
    $Id: intro.xml,v 1.13 2007/10/04 22:02:12 gap Exp $
  -->
  ------------------------------------------------------------------

Everything in an XML file between "<!--" and "-->" is a comment and not part of the document content.
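To see what the encoding declaration and the comment syntax mean in practice, one can feed a small fragment in the style of 3k+1.xml to any generic XML parser. The following sketch uses Python's standard xml.etree.ElementTree module purely for illustration (it is not part of GAPDoc, whose converters run inside GAP); the helper parse_with_encoding and the sample text are hypothetical. The same characters are stored once in UTF-8 and once in ISO-8859-1, each time with a matching declaration, and both versions parse to the same text with the comments dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of 3k+1.xml: an XML declaration,
# a comment before the root element and one inside it.
TEMPLATE = ('<?xml version="1.0" encoding="{enc}"?>\n'
            '<!-- A comment: not part of the document content -->\n'
            '<Author>Dummy Authör<!-- also ignored --></Author>\n')


def parse_with_encoding(enc):
    """Store the sample in encoding `enc`, declare `enc`, parse the bytes."""
    data = TEMPLATE.format(enc=enc).encode(enc)
    return ET.fromstring(data)


utf8_root = parse_with_encoding("UTF-8")
latin1_root = parse_with_encoding("ISO-8859-1")

# The umlaut is stored as different bytes in the two versions ...
print("ö".encode("UTF-8"), "ö".encode("ISO-8859-1"))   # b'\xc3\xb6' b'\xf6'
# ... but the declared encoding lets the parser recover the same text.
print(utf8_root.text == latin1_root.text)              # True
# Comments are not part of the parsed content: Author has no children.
print(utf8_root.tag, repr(utf8_root.text), len(utf8_root))  # Author 'Dummy Authör' 0
```

If the declaration did not match the actual bytes, the parser would decode the umlaut wrongly or reject the file, which is why the first line of a GAPDoc file matters.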

  ------------------------ from 3k+1.xml -------------------------
  <!DOCTYPE Book SYSTEM "gapdoc.dtd">
  ------------------------------------------------------------------

This line says that the document contains markup which is defined in the system file gapdoc.dtd and that the markup obeys certain rules defined in that file (the ending dtd means "document type definition"). It further says that the actual content of the document consists of an element with name "Book". And indeed, the remaining part of the file is enclosed as follows:

  ------------------------ from 3k+1.xml -------------------------
  <Book Name="3k+1">
    [...] (content omitted)
  </Book>
  ------------------------------------------------------------------

This demonstrates the basics of the markup in XML. This part of the document is an "element". It consists of the "start tag" <Book Name="3k+1">, the "element content" and the "end tag" </Book> (end tags always start with </). This element also has an "attribute" Name whose "value" is 3k+1.

If you know HTML, this will look familiar to you. But there are some important differences: The element name Book and the attribute name Name are case sensitive. The value of an attribute must always be enclosed in quotes. In XML every element has a start and an end tag (which can be combined for elements defined as "empty", see for example <TableOfContents/> below).

If you know LaTeX, you are familiar with quite different types of markup, for example: The equivalent of the Book element in LaTeX is \begin{document} ... \end{document}. The sectioning in LaTeX is not done by explicit start and end markup, but implicitly via heading commands like \section. Other markup is done by using braces {} and putting some commands inside.
And for mathematical formulae one can use the $ both for the start and for the end of the markup. In XML all markup looks similar to that of the Book element.

The content of the book starts with a title page.

  ------------------------ from 3k+1.xml -------------------------
  <TitlePage>
    <Title>The <Package>ThreeKPlusOne</Package> Package</Title>
    <Version>Version 42</Version>
    <Author>Dummy Authör
      <Email>3kplusone@dev.null</Email>
    </Author>

    <Copyright>&copyright; 2000 The Author. <P/>
      You can do with this package what you want.<P/> Really.
    </Copyright>
  </TitlePage>
  ------------------------------------------------------------------

The content of the TitlePage element consists again of elements. In Chapter 3 we describe which elements are allowed within a TitlePage and that their ordering is prescribed in this case. In the (stupid) name of the author you see that a German umlaut is used directly (in the UTF-8 encoding declared in the first line of the file).

Contrary to LaTeX or HTML files, this markup does not say anything about the actual layout of the title page in any output version of the document. It just adds information about the meaning of pieces of text.

Within the Copyright element there are two more things to learn about XML markup. The <P/> is a complete element: a combined start and end tag. This shortcut is allowed for elements which are defined to be always "empty", i.e., to have no content. You may have already guessed that <P/> is used as a paragraph separator. Note that empty lines do not separate paragraphs (contrary to LaTeX).

The other construct we see here is &copyright;. This is an example of an "entity" in XML, a macro for some substitution text.
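The rules described so far in this section (case-sensitive names, quoted attribute values, explicit or combined end tags, and entities as macros for substitution text) are checked by every conforming XML parser. The sketch below again uses Python's xml.etree.ElementTree only as a stand-in for a generic parser; the helper is_well_formed is hypothetical. A generic parser does not read gapdoc.dtd, so the sketch declares the entity copyright itself in an internal DTD subset, with the replacement text (C) assumed here to mimic terminal output; in a GAPDoc document this entity is predefined.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source):
    """Return True if a generic XML parser accepts `source`."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


# Each sample is paired with the verdict a conforming parser must give.
CHECKS = {
    '<Book Name="3k+1"></Book>': True,   # start tag, content, end tag
    '<Book Name="3k+1"></book>': False,  # names are case sensitive
    '<Book Name=3k+1></Book>': False,    # attribute values need quotes
    '<Book><P></Book>': False,           # every element must be closed
    '<Book><P/></Book>': True,           # combined start and end tag
}
for source, expected in CHECKS.items():
    assert is_well_formed(source) == expected, source

# An entity is a macro for substitution text. A generic parser does not
# load gapdoc.dtd, so the entity is declared in an internal DTD subset;
# the replacement text "(C)" is an assumption for this sketch.
DOC = ('<!DOCTYPE Copyright [<!ENTITY copyright "(C)">]>'
       '<Copyright>&copyright; 2000 The Author.</Copyright>')
print(ET.fromstring(DOC).text)   # (C) 2000 The Author.

# Without any declaration the same reference is an error.
print(is_well_formed('<Copyright>&copyright; 2000</Copyright>'))  # False
```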
In the title page this entity is a shortcut for a more complicated expression, which makes it possible to print the term copyright as some text like (C) in text terminal output and as a copyright character in the other output formats. In GAPDoc we predefine some entities. Certain "special characters" must be typed via entities, for example "<", ">" and "&" (as &lt;, &gt; and &amp;), to avoid a misinterpretation as XML markup. It is possible to define additional entities for your document inside the <!DOCTYPE ...> declaration, see 2.2-3.

Note that elements in XML must always be properly nested, as in this example. A construct like <a><b>...</a></b> is not allowed.

  ------------------------ from 3k+1.xml -------------------------
  <TableOfContents/>
  ------------------------------------------------------------------

This is another example of an "empty element". It just means that a table of contents for the whole document should be included in every output version of the document.

After this the main text of the document follows inside certain sectioning elements:

  ------------------------ from 3k+1.xml -------------------------
  <Body>
    <Chapter> <Heading>The <M>3k+1</M> Problem</Heading>
      <Section Label="sec:theory"> <Heading>Theory</Heading>
        [...] (content omitted)
      </Section>
      <Section> <Heading>Program</Heading>
        [...] (content omitted)
      </Section>
    </Chapter>
  </Body>
  ------------------------------------------------------------------

These elements are used similarly to \chapter and \section in LaTeX. But note that the explicit end tags are necessary here. The sectioning elements allow one to assign an optional attribute "Label", which can be used for referring to a section inside the document.

The text of the first section starts as follows. The whitespace in the text is unimportant and the indenting is not necessary.

  ------------------------ from 3k+1.xml -------------------------

    Let <M>k \in &NN;</M> be a natural number. We consider the
    sequence <M>n(i, k), i \in &NN;,</M> with <M>n(1, k) = k</M> and
    else
  ------------------------------------------------------------------

Here we come to the interesting question of how to type mathematical formulae in a GAPDoc document. We did not find any alternative to writing formulae in TeX syntax. (There is MathML, but even simple formulae contain a lot of markup, become quite unreadable and are cumbersome to type. Furthermore, there seem to be no tools available which translate such formulae nicely into TeX and text.) So formulae are essentially typed as in LaTeX. (Actually, it is also possible to type the Unicode characters of some mathematical symbols directly, or via an entity like the &NN; above.)

There are three types of elements containing formulae: "M", "Math" and "Display". The first two are for in-text formulae and the third is for displayed formulae. "M" and "Math" are equivalent when translating a GAPDoc document into LaTeX, but they are handled differently for terminal text (and HTML) output. For the content of an "M" element there are defined rules for a translation into well-readable terminal text. More complicated formulae go into "Math" or "Display" elements, and in text output they are printed just as they are typed. So, to make a section well readable inside a terminal window, you should try to put as many formulae as possible into "M" elements. In our example text we used the notation n(i, k) instead of n_i(k) because it is easier to read in text mode. See Sections 2.2-2 and 3.9 for more details.

A few lines further on we find two non-internal references.

  ------------------------ from 3k+1.xml -------------------------
    problem, see <Cite Key="Wi98"/> or
    <URL>http://mathsrv.ku-eichstaett.de/MGF/homes/wirsching/</URL>
  ------------------------------------------------------------------

The first one, in the "Cite" element, is the citation of a book. In GAPDoc we use the widely used BibTeX database format for reference lists. It does not use XML but has a well-documented structure which is easy to parse, and many people have collections of references readily available in this format. The reference list in an output version of the document is produced with the empty element

  ------------------------ from 3k+1.xml -------------------------
  <Bibliography Databases="3k+1" />
  ------------------------------------------------------------------

close to the end of our example file. The attribute "Databases" gives the name(s) of the database (.bib) files which contain the references.

Putting a Web address into a "URL" element allows one to create a hyperlink in output formats which support this.

The second section of our example contains a special kind of subsection defined in GAPDoc.

  ------------------------ from 3k+1.xml -------------------------
    <ManSection>
      <Func Name="ThreeKPlusOneSequence" Arg="k[, max]"/>
      <Description>
        This function computes for a natural number <A>k</A> the
        beginning of the sequence <M>n(i, k)</M> defined in section
        <Ref Sect="sec:theory"/>. The sequence stops at the first
        <M>1</M> or at <M>n(<A>max</A>, k)</M>, if <A>max</A> is
        given.
  <Example>
  gap> ThreeKPlusOneSequence(101);
  "Sorry, not yet implemented. Wait for Version 84 of the package"
  </Example>
      </Description>
    </ManSection>
  ------------------------------------------------------------------

A "ManSection" contains the description of some function, operation, method, filter and so on.
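Cross references like those in this example can be resolved by a simple lookup: a Ref with a Sect attribute is matched against the Label attribute of a section, and a Ref with a Func attribute against the Name attribute of a Func element, so that functions need no explicit label. The following Python sketch only illustrates this idea on a hypothetical, heavily shortened copy of the example document; the functions collect_targets and resolve_refs are invented for this illustration and say nothing about how the GAPDoc converters (which run inside GAP) are actually implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened document in the style of 3k+1.xml
# (GAPDoc-specific entities and most content are left out).
SOURCE = """<Book Name="3k+1"><Body>
  <Chapter><Heading>The 3k+1 Problem</Heading>
    <Section Label="sec:theory"><Heading>Theory</Heading>
      See <Ref Func="ThreeKPlusOneSequence"/>.
    </Section>
    <Section><Heading>Program</Heading>
      <ManSection>
        <Func Name="ThreeKPlusOneSequence" Arg="k[, max]"/>
        <Description>... defined in section <Ref Sect="sec:theory"/>.
        </Description>
      </ManSection>
    </Section>
  </Chapter>
</Body></Book>"""


def collect_targets(root):
    """Map (kind, key) pairs to a short description of the target.

    Sections are keyed by their explicit Label attribute; functions are
    keyed implicitly by the Name attribute of their Func element.
    """
    targets = {}
    for section in root.iter("Section"):
        label = section.get("Label")
        if label is not None:
            targets[("Sect", label)] = section.findtext("Heading")
    for func in root.iter("Func"):
        name = func.get("Name")
        targets[("Func", name)] = f"{name}({func.get('Arg', '')})"
    return targets


def resolve_refs(root):
    """List every Ref in document order with its target (None if unknown)."""
    targets = collect_targets(root)
    resolved = []
    for ref in root.iter("Ref"):
        for kind in ("Sect", "Func"):
            key = ref.get(kind)
            if key is not None:
                resolved.append(((kind, key), targets.get((kind, key))))
    return resolved


for key, target in resolve_refs(ET.fromstring(SOURCE)):
    print(key, "->", target)
# ('Func', 'ThreeKPlusOneSequence') -> ThreeKPlusOneSequence(k[, max])
# ('Sect', 'sec:theory') -> Theory
```

An unresolved reference shows up as None; that is the point where a converter could warn about a missing label or an undocumented function.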
The "Func"-element describes the name of a [13Xfunction[0m (there are also similar elements "Oper", "Meth", "Filt" and so on) and names for its arguments, optional arguments enclosed in square brackets. See Section [14X3.4[0m for more details. In the "Description" we write the argument names as "A"-elements. A good description of a function should usually contain an example of its use. For this there are some verbatim-like elements in [5XGAPDoc[0m, like "Example" above (here, clearly, whitespace matters which causes a slightly strange indenting). The text contains an internal reference to the first section via the explicitly defined label [10Xsec:theory[0m. The first section also contains a "Ref"-element which refers to the function described here. Note that there is no explicit label for such a reference. The pair [10X<Func Name="ThreeKPlusOneSequence" Arg="k[, max]"/>[0m and [10X<Ref Func="ThreeKPlusOneSequence"/>[0m does the cross referencing (and hyperlinking if possible) implicitly via the name of the function. Here is one further element from our example document which we want to explain. [4X------------------------ from 3k+1.xml -------------------------[0X [4X<TheIndex/>[0X [4X------------------------------------------------------------------[0X This is again an empty element which just says that an output version of the document should contain an index. Many entries for the index are generated automatically because the "Func" and similar elements implicitly produce such entries. It is also possible to include explicit additional entries in the index. [1X1.3 Some questions[0X [8XAre those XML files too ugly to read and edit?[0m Just have a look and decide yourself. The markup needs more characters than most TeX or LaTeX markup. But the structure of the document is easier to see. If you configure your favorite editor well, you do not need more key strokes for typing the markup than in LaTeX. 

Why do we not use LaTeX alone?
  LaTeX is good for writing books. But LaTeX files are generally difficult to parse and to convert into other output formats like text for browsing in a terminal window or HTML (or new formats which may become popular in the future). GAPDoc markup is one step more abstract than LaTeX insofar as it describes the meaning instead of the appearance of text. The inner workings of LaTeX are too complicated to learn without pain, which makes it difficult to overcome problems that occur occasionally.

Why XML and not a newly defined markup language?
  XML is a well-defined standard that is more and more widely used. Lots of people have thought about it, and years of experience with SGML went into its design. It is easy to explain and easy to parse, lots of tools are available, and there will be more in the future.