<?xml version="1.0"?> <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><title>rfc2045</title><link rel="stylesheet" href="style.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"/><link rel="start" href="#rfc2045" title="rfc2045"/><link xmlns="" rel="stylesheet" type="text/css" href="manpage.css"/><meta xmlns="" name="MSSmartTagsPreventParsing" content="TRUE"/><link xmlns="" rel="icon" href="icon.gif" type="image/gif"/><!-- Copyright 1998 - 2007 Double Precision, Inc. See COPYING for distribution information. --></head><body><div class="refentry" lang="en" xml:lang="en"><a id="rfc2045" shape="rect"> </a><div class="titlepage"/><div class="refnamediv"><h2>Name</h2><p>rfc2045 — RFC 2045 (MIME) parsing library</p></div><div class="refsynopsisdiv"><h2>Synopsis</h2><div class="informalexample"><pre class="programlisting" xml:space="preserve">
#include &lt;rfc822.h&gt;
#include &lt;rfc2045.h&gt;

cc ... -lrfc2045 -lrfc822
</pre></div></div><div class="refsect1" lang="en" xml:lang="en"><a id="id369436" shape="rect"> </a><h2>DESCRIPTION</h2><p> The rfc2045 library parses MIME-formatted messages.
It is used to:</p><p> 1) Parse the structure of a MIME-formatted message.</p><p> 2) Examine the contents of each MIME section.</p><p> 3) Optionally rewrite and reformat the message.</p><div class="refsect2" lang="en" xml:lang="en"><a id="id370453" shape="rect"> </a><h3>Creating an rfc2045 structure</h3><div class="informalexample"><pre class="programlisting" xml:space="preserve">
#include &lt;rfc2045.h&gt;

struct rfc2045 *ptr=rfc2045_alloc();
void rfc2045_parse(struct rfc2045 *ptr, const char *txt, size_t cnt);

struct rfc2045 *ptr=rfc2045_fromfd(int fd);
struct rfc2045 *ptr=rfc2045_fromfp(FILE *fp);

void rfc2045_free(struct rfc2045 *ptr);

/* Not part of the library: the application must define this function. */
void rfc2045_error(const char *errmsg)
{
        perror(errmsg);
        exit(0);
}
</pre></div><p> The <span class="structname">rfc2045</span> structure is created from an existing message. The function <code class="function">rfc2045_alloc</code>() allocates the structure, then <code class="function">rfc2045_parse</code>() is called to initialize the structure based on the contents of a message. <em class="parameter"><code>txt</code></em> points to the contents of the message, and <em class="parameter"><code>cnt</code></em> contains the number of bytes in the message.</p><p> Large messages are parsed by calling <code class="function">rfc2045_parse</code>() multiple times, each time passing a portion of the overall message. There is no need to call a separate function after the entire message is parsed; the <span class="structname">rfc2045</span> structure is built dynamically as each portion is parsed.</p><p> <code class="function">rfc2045_alloc</code>() returns NULL if there was insufficient memory to allocate the structure. <code class="function">rfc2045_parse</code>() also allocates memory internally; however, no error indication is returned in the event of a memory allocation failure.
Instead, the function <code class="function">rfc2045_error</code>() is called, with <em class="parameter"><code>errmsg</code></em> set to <code class="literal">"Out of memory"</code>. <code class="function">rfc2045_alloc</code>() also calls <code class="function">rfc2045_error</code>() before returning a NULL pointer.</p><p> The <code class="function">rfc2045_error</code>() function is not included in the rfc2045 library; it must be defined by the application to report the error in some appropriate way. All functions below use <code class="function">rfc2045_error</code>() to report an error condition (currently only insufficient memory is reported), in addition to returning an error indicator, where they have one. Some functions do not return an error indicator, so <code class="function">rfc2045_error</code>() is the only reliable way to detect a failure.</p><p> The <code class="function">rfc2045_fromfd</code>() function initializes an <span class="structname">rfc2045</span> structure from a file descriptor. It is equivalent to calling <code class="function">rfc2045_alloc</code>(), then reading the contents of the given file descriptor, and calling <code class="function">rfc2045_parse</code>(). The <code class="function">rfc2045_fromfp</code>() function initializes an <span class="structname">rfc2045</span> structure from a FILE pointer.</p><p> After the <span class="structname">rfc2045</span> structure is initialized, the functions described below may be used to access and work with the contents of the structure.
When the <span class="structname">rfc2045</span> structure is no longer needed, the function <code class="function">rfc2045_free</code>() deallocates and destroys the structure.</p></div><div class="refsect2" lang="en" xml:lang="en"><a id="id345091" shape="rect"> </a><h3>Structure of a MIME message</h3><div class="informalexample"><pre class="programlisting" xml:space="preserve">
struct rfc2045 {

        struct rfc2045 *parent;

        struct rfc2045 *firstpart;
        struct rfc2045 *next;
        int             isdummy;
        int             rfcviolation;
} ;
</pre></div><p>The <span class="structname">rfc2045</span> structure has many fields; only some are publicly documented. A MIME message is represented by a recursive tree of linked <span class="structname">rfc2045</span> structures. Each instance of the <span class="structname">rfc2045</span> structure represents a single MIME section of a MIME-formatted message.</p><p> The top-level structure that represents the entire message is created by the <code class="function">rfc2045_alloc</code>() function. The remaining structures are created dynamically by <code class="function">rfc2045_parse</code>(). Any <span class="structname">rfc2045</span> structure, except ones whose <em class="structfield"><code>isdummy</code></em> flag is set, may be used as an argument to any function described in the following sections.</p><p> The <em class="structfield"><code>rfcviolation</code></em> field in the top-level <span class="structname">rfc2045</span> indicates any errors found while parsing the MIME message.
<span class="structname">rfcviolation</span> is a bitmask of the following flags:</p><div class="variablelist"><dl><dt><span class="term"><span class="errorcode">RFC2045_ERR8BITHEADER</span></span></dt><dd><p> Illegal 8-bit characters in MIME headers.</p></dd><dt><span class="term"><span class="errorcode">RFC2045_ERR8BITCONTENT</span></span></dt><dd><p> Illegal 8-bit contents of a MIME section that declared a 7bit transfer encoding.</p></dd><dt><span class="term"><span class="errorcode">RFC2045_ERR2COMPLEX</span></span></dt><dd><p> The message has too many MIME sections; this is a potential denial-of-service attack.</p></dd><dt><span class="term"><span class="errorcode">RFC2045_ERRBADBOUNDARY</span></span></dt><dd><p> Ambiguous nested multipart MIME boundary strings (nested MIME boundary strings where one string is a prefix of another string).</p></dd></dl></div><p> In each <span class="structname">rfc2045</span> structure that represents a multipart MIME section (or one that contains <code class="literal">message/rfc822</code> content) the <em class="structfield"><code>firstpart</code></em> pointer points to the first MIME section in the multipart MIME section (or the included "message/rfc822" MIME section). If there is more than one MIME section in a multipart MIME section, <em class="structfield"><code>firstpart->next</code></em> points to the second MIME section, <em class="structfield"><code>firstpart->next->next</code></em> points to the third MIME section, and so on. <em class="structfield"><code>parent</code></em> points to the parent MIME section, and is NULL for the top-level MIME section.</p><p> Not all MIME sections are created equal. In a multipart MIME section, there is an initial, unused, "filler" section before the first MIME delimiter (see <a class="ulink" href="http://www.rfc-editor.org/rfc/rfc2045.txt" target="_top" shape="rect">RFC 2045</a> for more information).
This filler section typically contains a terse message saying that this is a MIME-formatted message. It is not considered to be a "real" MIME section, and all MIME-aware software must ignore it. These filler sections are designated by setting the <em class="structfield"><code>isdummy</code></em> field to a non-zero value. All <span class="structname">rfc2045</span> structures that have <em class="structfield"><code>isdummy</code></em> set should be ignored, and skipped over, when traversing the <span class="structname">rfc2045</span> tree.</p></div><div class="refsect2" lang="en" xml:lang="en"><a id="id345305" shape="rect"> </a><h3>Basic MIME information</h3><div class="informalexample"><pre class="programlisting" xml:space="preserve">
const char *content_type, *content_transfer_encoding,
           *content_character_set;

void rfc2045_mimeinfo(const struct rfc2045 *ptr,
        &amp;content_type, &amp;content_transfer_encoding,
        &amp;content_character_set);

off_t start_pos, end_pos, start_body, nlines, nbodylines;

void rfc2045_mimepos(const struct rfc2045 *ptr,
        &amp;start_pos, &amp;end_pos, &amp;start_body, &amp;nlines,
        &amp;nbodylines);
</pre></div><p> The <code class="function">rfc2045_mimeinfo</code>() function returns the MIME content type, encoding method, and the character set of the given MIME section. Where the MIME section does not specify a property, <code class="function">rfc2045_mimeinfo</code>() automatically supplies a default value. The character set is only meaningful for MIME sections with a text content type; however, a default is still supplied for other sections. It is not permissible to supply a NULL pointer for any argument to <code class="function">rfc2045_mimeinfo</code>().</p><p> The <code class="function">rfc2045_mimepos</code>() function locates the position of the given MIME section in the original message. It is not permissible to supply a NULL pointer for any argument to <code class="function">rfc2045_mimepos</code>().
All arguments must be used.</p><p> <em class="structfield"><code>start_pos</code></em> and <em class="structfield"><code>end_pos</code></em> are set to the starting and ending offsets, from the beginning of the message, of this MIME section. <em class="structfield"><code>nlines</code></em> is initialized to the number of lines of text in this MIME section. <em class="structfield"><code>start_pos</code></em> is the start of MIME headers for this MIME section. <em class="structfield"><code>start_body</code></em> is the start of the actual content of this MIME section (after all the MIME headers, and the delimiting blank line), and <em class="structfield"><code>nbodylines</code></em> is the number of lines of actual content in this MIME section.</p><div class="informalexample"><pre class="programlisting" xml:space="preserve">
const char *id=rfc2045_content_id(const struct rfc2045 *ptr);

const char *desc=rfc2045_content_description(const struct rfc2045 *ptr);

const char *lang=rfc2045_content_language(const struct rfc2045 *ptr);

const char *md5=rfc2045_content_md5(const struct rfc2045 *ptr);
</pre></div><p> These functions return the contents of the corresponding MIME headers. If a header does not exist, the corresponding function returns an empty string, "", not a null pointer.</p><div class="informalexample"><pre class="programlisting" xml:space="preserve">
char *id=rfc2045_related_start(const struct rfc2045 *ptr);
</pre></div><p> This function returns the <em class="structfield"><code>start</code></em> attribute of the <code class="literal">Content-Type:</code> header, which is used by <code class="literal">multipart/related</code> MIME content.
This function returns a dynamically-allocated buffer, which must be <code class="function">free</code>(3)-ed after use (a null pointer is returned if there was insufficient memory for the buffer, and <code class="function">rfc2045_error</code>() is called).</p><div class="informalexample"><pre class="programlisting" xml:space="preserve">
const struct rfc2045 *ptr;

const char *disposition=ptr->content_disposition;

char *charset;
char *language;
char *value;

int error;

error=rfc2231_decodeType(rfc, "name", &amp;charset,
                         &amp;language, &amp;value);
error=rfc2231_decodeDisposition(rfc, "name", &amp;charset,
                                &amp;language, &amp;value);
</pre></div><p> These functions and structures provide a mechanism for reading the MIME attributes in the <code class="literal">Content-Type:</code> and <code class="literal">Content-Disposition:</code> headers. The MIME content type is returned by <code class="function">rfc2045_mimeinfo</code>(). The MIME content disposition can be read directly from the <em class="structfield"><code>content_disposition</code></em> field (which may be <code class="literal">NULL</code> if the <code class="literal">Content-Disposition:</code> header was not specified).</p><p> <code class="function">rfc2231_decodeType</code>() reads MIME attributes from the <code class="literal">Content-Type:</code> header, and <code class="function">rfc2231_decodeDisposition</code>() reads MIME attributes from the <code class="literal">Content-Disposition:</code> header. These functions understand MIME attributes that are encoded according to <a class="ulink" href="http://www.rfc-editor.org/rfc/rfc2231.txt" target="_top" shape="rect">RFC 2231</a>.</p><p> These functions initialize the <em class="parameter"><code>charset</code></em>, <em class="parameter"><code>language</code></em>, and <em class="parameter"><code>value</code></em> parameters, allocating memory automatically. It is the caller's responsibility to use <code class="function">free</code>() to release the allocated memory.
<code class="literal">NULL</code> may be provided in place of a parameter, indicating that the caller does not require the corresponding information.</p><p> <em class="parameter"><code>charset</code></em> and <em class="parameter"><code>language</code></em> will be set to an empty string (<span class="emphasis"><em>not</em></span> <code class="literal">NULL</code>) if the MIME parameter does not exist, or is not encoded according to <a class="ulink" href="http://www.rfc-editor.org/rfc/rfc2231.txt" target="_top" shape="rect">RFC 2231</a>, or does not specify its character set and/or language. <em class="parameter"><code>value</code></em> will be set to an empty string if the MIME parameter does not exist.</p><div class="informalexample"><pre class="programlisting" xml:space="preserve">
char *url=rfc2045_content_base(struct rfc2045 *ptr);

char *url=rfc2045_append_url(const char *base, const char *url);
</pre></div><p> These functions are used to work with <code class="literal">multipart/related</code> MIME content. <code class="function">rfc2045_content_base</code>() returns the contents of either the <code class="literal">Content-Base:</code> or the <code class="literal">Content-Location:</code> header. If both are present, they are logically combined. <code class="function">rfc2045_append_url</code>() combines two URLs, <em class="parameter"><code>base</code></em> and <em class="parameter"><code>url</code></em>, and returns the absolute URL that results from the combination.</p><p> Both functions return a pointer to a dynamically-allocated buffer that must be <code class="function">free</code>(3)-ed when it is no longer needed. Both functions return NULL if there was insufficient memory to allocate the buffer. <code class="function">rfc2045_content_base</code>() returns an empty string if there are no <code class="literal">Content-Base:</code> or <code class="literal">Content-Location:</code> headers.
Either argument to <code class="function">rfc2045_append_url</code>() may be NULL, or an empty string.</p></div><div class="refsect2" lang="en" xml:lang="en"><a id="id389875" shape="rect"> </a><h3>Decoding a MIME section</h3><div class="informalexample"><pre class="programlisting" xml:space="preserve">
void rfc2045_cdecode_start(struct rfc2045 *ptr,
        int (*callback_func)(const char *, size_t, void *),
        void *callback_arg);

int rfc2045_cdecode(struct rfc2045 *ptr, const char *stuff,
        size_t nstuff);

int rfc2045_cdecode_end(struct rfc2045 *ptr);
</pre></div><p> These functions are used to return the raw contents of the given MIME section, transparently decoding quoted-printable or base64-encoded content. Because the rfc2045 library does not require the message to be read from a file (it can be stored in a memory buffer), the application is responsible for reading the contents of the message and calling <code class="function">rfc2045_cdecode</code>().</p><p> The <code class="function">rfc2045_cdecode_start</code>() function begins the process of decoding the given MIME section. After calling <code class="function">rfc2045_cdecode_start</code>(), the application must then call <code class="function">rfc2045_cdecode</code>() with the contents of the MIME message between the offsets given by the <em class="structfield"><code>start_body</code></em> and <em class="structfield"><code>end_pos</code></em> return values from <code class="function">rfc2045_mimepos</code>(). The <code class="function">rfc2045_cdecode</code>() function can be called repeatedly, if necessary, for successive portions of the MIME section.
After the last call to <code class="function">rfc2045_cdecode</code>(), call <code class="function">rfc2045_cdecode_end</code>() to finish up (<code class="function">rfc2045_cdecode</code>() may have saved some undecoded content internally, and <code class="function">rfc2045_cdecode_end</code>() flushes it out).</p><p> <code class="function">rfc2045_cdecode</code>() and <code class="function">rfc2045_cdecode_end</code>() repeatedly call <code class="function">callback_func</code>(), passing it the decoded contents of the MIME section. The first argument to <code class="function">callback_func</code>() is a pointer to a portion of the decoded content, and the second argument is the number of bytes in this portion. The third argument is <em class="parameter"><code>callback_arg</code></em>.</p><p> <code class="function">callback_func</code>() must return zero to continue decoding. If <code class="function">callback_func</code>() returns non-zero, the decoding immediately stops and <code class="function">rfc2045_cdecode</code>() or <code class="function">rfc2045_cdecode_end</code>() terminates with <code class="function">callback_func</code>'s return code.</p></div><div class="refsect2" lang="en" xml:lang="en"><a id="id390025" shape="rect"> </a><h3>Rewriting MIME messages</h3><p> This library contains functions that can be used to rewrite a MIME message in order to convert 8-bit-encoded data to 7-bit encoding, or to convert 7-bit encoded data to full 8-bit data, if possible.</p><div class="informalexample"><pre class="programlisting" xml:space="preserve">
struct rfc2045 *ptr=rfc2045_alloc_ac();
int necessary=rfc2045_ac_check(struct rfc2045 *ptr, int mode);

int error=rfc2045_rewrite(struct rfc2045 *ptr,
        int fdin,
        int fdout,
        const char *appname);

int rfc2045_rewrite_func(struct rfc2045 *p, int fdin,
        int (*funcout)(const char *, int, void *),
        void *funcout_arg,
        const char *appname);
</pre></div><p> When the message is to be rewritten, the <code
class="function">rfc2045_alloc_ac</code>() function must be used to create the initial <span class="structname">rfc2045</span> structure. This function allocates some additional structures that are used in rewriting. Use <code class="function">rfc2045_parse</code>() to parse the message, as usual. Use <code class="function">rfc2045_free</code>() in the normal way to destroy the <span class="structname">rfc2045</span> structure when it is no longer needed.</p><p> The <code class="function">rfc2045_ac_check</code>() function must be called to determine whether rewriting is necessary. <em class="parameter"><code>mode</code></em> must be set to one of the following values:</p><div class="variablelist"><dl><dt><span class="term">RFC2045_RW_7BIT</span></dt><dd><p> Generate 7-bit content. If the original message contains any 8-bit content, it will be converted to 7-bit content using quoted-printable encoding.</p></dd><dt><span class="term">RFC2045_RW_8BIT</span></dt><dd><p> Generate 8-bit content. If the original message contains any 7-bit quoted-printable content, it should be rewritten as 8-bit content.</p></dd></dl></div><p> The <code class="function">rfc2045_ac_check</code>() function returns non-zero if there is any content in the MIME message that should be converted, or if there are any missing MIME headers. <code class="function">rfc2045_ac_check</code>() returns zero if there is no need to rewrite the message. However, it might still be worthwhile to rewrite the message anyway. There are some instances where it is desirable to provide defaults for some missing MIME headers, but they are too trivial to require the message to be rewritten. One such case would be a missing Content-Transfer-Encoding: header for a multipart section.</p><p> Either the <code class="function">rfc2045_rewrite</code>() or the <code class="function">rfc2045_rewrite_func</code>() function is used to rewrite the message.
The only difference is that <code class="function">rfc2045_rewrite</code>() writes the new message to a given file descriptor, <em class="parameter"><code>fdout</code></em>, while <code class="function">rfc2045_rewrite_func</code>() repeatedly calls the <em class="parameter"><code>funcout</code></em> function. Both functions read the original message from <em class="parameter"><code>fdin</code></em>. <em class="parameter"><code>funcout</code></em> receives a pointer to a portion of the MIME message, the number of bytes in that portion, and <em class="parameter"><code>funcout_arg</code></em>. When either function rewrites a MIME section, an informational header is appended, noting that the message was converted by <em class="parameter"><code>appname</code></em>.</p></div></div><div class="refsect1" lang="en" xml:lang="en"><a id="id390199" shape="rect"> </a><h2>SEE ALSO</h2><p> <a class="ulink" href="rfc822.html" target="_top" shape="rect"><span class="citerefentry"><span class="refentrytitle">rfc822</span>(3)</span></a>, <a class="ulink" href="reformail.html" target="_top" shape="rect"><span class="citerefentry"><span class="refentrytitle">reformail</span>(1)</span></a>, <a class="ulink" href="reformime.html" target="_top" shape="rect"><span class="citerefentry"><span class="refentrytitle">reformime</span>(1)</span></a>.</p></div></div></body></html>