Sophie

Sophie

distrib > Mandriva > 2010.0 > i586 > media > contrib-release > by-pkgid > a2d29ba77c8fe4d655c72d0b897f51ad > files > 350

mnogosearch-3.3.8-3mdv2010.0.i586.rpm

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML
><HEAD
><TITLE
>External parsers</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
REL="HOME"
TITLE="mnoGoSearch 3.3.8 reference manual"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Comments"
HREF="msearch-htmlparser-comments.html"><LINK
REL="NEXT"
TITLE="Storing mnoGoSearch data "
HREF="msearch-howstore.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="mnogo.css"><META
NAME="Description"
CONTENT="mnoGoSearch - Full Featured Web site Open Source Search Engine Software over the Internet and Intranet Web Sites Based on SQL Database. It is a Free search software covered by GNU license."><META
NAME="Keywords"
CONTENT="shareware, freeware, download, internet, unix, utilities, search engine, text retrieval, knowledge retrieval, text search, information retrieval, database search, mining, intranet, webserver, index, spider, filesearch, meta, free, open source, full-text, udmsearch, website, find, opensource, search, searching, software, udmsearch, engine, indexing, system, web, ftp, http, cgi, php, SQL, MySQL, database, php3, FreeBSD, Linux, Unix, mnoGoSearch, MacOS X, Mac OS X, Windows, 2000, NT, 95, 98, GNU, GPL, url, grabbing"></HEAD
><BODY
CLASS="chapter"
BGCOLOR="#EEEEEE"
TEXT="#000000"
LINK="#000080"
VLINK="#800080"
ALINK="#FF0000"
><!--#include virtual="body-before.html"--><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
><SPAN
CLASS="application"
>mnoGoSearch</SPAN
> 3.3.8 reference manual: Full-featured search engine software</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="msearch-htmlparser-comments.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="msearch-howstore.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="chapter"
><H1
><A
NAME="parsers"
></A
>Chapter 6. External parsers<A
NAME="AEN2784"
></A
></H1
><DIV
CLASS="TOC"
><DL
><DT
><B
>Table of Contents</B
></DT
><DT
><A
HREF="msearch-parsers.html#pars-sup"
>Supported parser types</A
></DT
><DT
><A
HREF="msearch-parsers.html#pars-setup"
>Setting up parsers</A
></DT
><DT
><A
HREF="msearch-parsers.html#ParserTimeOut"
>Preventing <SPAN
CLASS="application"
>indexer</SPAN
> from getting stuck on a parser execution</A
></DT
><DT
><A
HREF="msearch-parsers.html#pars-pipes"
>Pipes in a parser command line</A
></DT
><DT
><A
HREF="msearch-parsers.html#pars-char"
>Parsers and character sets
       <A
NAME="AEN2896"
></A
></A
></DT
><DT
><A
HREF="msearch-parsers.html#pars-udmurl"
>The <CODE
CLASS="varname"
>UDM_URL</CODE
> environment variable</A
></DT
><DT
><A
HREF="msearch-parsers.html#pars-links"
>External parsers for the most common file types
    <A
NAME="AEN2924"
></A
></A
></DT
></DL
></DIV
><P
><SPAN
CLASS="application"
>mnoGoSearch</SPAN
> understands
  <TT
CLASS="literal"
>text/plain</TT
>, <TT
CLASS="literal"
>text/html</TT
>
  and <TT
CLASS="literal"
>text/xml</TT
> Mime types out of the box,
  and is able to index files of these types using the build-in parsers.
  For the other file types it can use <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>external parsers</I
></SPAN
>.
  </P
><P
>An <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>external parser</I
></SPAN
> is an executable program which can convert
    a file of some Mime type to the one of the types 
    supported by <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>.
    For example, if you want <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>
    to index <SPAN
CLASS="application"
>PostScript</SPAN
> files , you can
    do it with help of the <SPAN
CLASS="application"
>ps2ascii</SPAN
> parser,
    which reads a <SPAN
CLASS="application"
>PostScript</SPAN
> file from 
    <TT
CLASS="filename"
>STDIN</TT
> and produces plain text output to <TT
CLASS="filename"
>STDOUT</TT
>.
  </P
><DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>
  <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>External parsers</I
></SPAN
> are also often referenced 
   in this manual as <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>filters</I
></SPAN
>, or <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>converters</I
></SPAN
>.
  </P
></BLOCKQUOTE
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-sup"
>Supported parser types</A
></H2
><P
><SPAN
CLASS="application"
>mnoGoSearch</SPAN
> supports
    four types of external parsers:
    </P
><P
></P
><UL
><LI
><P
>read data from <TT
CLASS="filename"
>STDIN</TT
>,
        send the result to <TT
CLASS="filename"
>STDOUT</TT
></P
></LI
><LI
><P
>read data from a file, send the result to <TT
CLASS="filename"
>STDOUT</TT
></P
></LI
><LI
><P
>read data from a file, send the result to another file</P
></LI
><LI
><P
>read data from <TT
CLASS="filename"
>STDIN</TT
>, send the result to a file</P
></LI
></UL
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-setup"
>Setting up parsers</A
></H2
><P
></P
><OL
TYPE="1"
><LI
><P
>Configure Mime types</P
><P
>Make sure your HTTP server sends correct
        <TT
CLASS="literal"
>Content-Type</TT
> headers.
        For <SPAN
CLASS="application"
>Apache</SPAN
>, have a look at the file
        <TT
CLASS="filename"
>mime.types</TT
>, the most common Mime types are already defined there.
        </P
><P
>&#13;        <A
NAME="AEN2833"
></A
>
        If you want to index local files or an FTP server,
        use the <B
CLASS="command"
><A
HREF="msearch-cmdref-addtype.html"
>AddType</A
></B
>
        command in <TT
CLASS="filename"
>indexer.conf</TT
> to associate
        file name extensions with their Mime types. For example:
<PRE
CLASS="programlisting"
>&#13;AddType text/html *.html
</PRE
>
        </P
></LI
><LI
><P
>&#13;          <A
NAME="AEN2842"
></A
>
          Add parsers
        </P
><P
>Add parser definition commands. These commands have the following format with three arguments:
<PRE
CLASS="programlisting"
>&#13;Mime &#60;from_mime&#62; &#60;to_mime&#62; &#60;command line&#62;
</PRE
>
        </P
><P
>For example, the following command defines a parser for <SPAN
CLASS="application"
>man</SPAN
> pages:
<PRE
CLASS="programlisting"
>&#13;# Use deroff for parsing man pages ( *.man )
Mime  application/x-troff-man   text/plain   deroff
</PRE
>
        </P
><P
>This parser will take data from <TT
CLASS="filename"
>STDIN</TT
> and output
        results to <TT
CLASS="filename"
>STDOUT</TT
>.
        </P
><P
>Some parsers can not operate on <TT
CLASS="filename"
>STDIN</TT
> and require a file to read from.
        In this case <SPAN
CLASS="application"
>indexer</SPAN
> can create a
        temporary file in <TT
CLASS="filename"
>/tmp</TT
> and remove the file when the parser is done.
        Use the <TT
CLASS="literal"
>$1</TT
>
        macro in the parser command line to substitute the temporary file name. For example,
        the <B
CLASS="command"
><A
HREF="msearch-cmdref-mime.html"
>Mime</A
></B
> command for the <SPAN
CLASS="application"
>catdoc</SPAN
>
        <SPAN
CLASS="application"
>MS-Word</SPAN
>-to-text converter can look like this:
<PRE
CLASS="programlisting"
>&#13;Mime application/msword text/plain "/usr/bin/catdoc -a $1"
</PRE
>
        </P
><P
>If your parser writes the result
        into an output file, use the <TT
CLASS="literal"
>$2</TT
> macro.
        <SPAN
CLASS="application"
>indexer</SPAN
> will replace <TT
CLASS="literal"
>$2</TT
>
        with the output temporary file name, then start the parser,
        read the result from this temporary file and delete the file. For example:
<PRE
CLASS="programlisting"
>&#13;Mime application/msword text/plain "/usr/bin/catdoc -a $1 &#62;$2"
</PRE
>
        </P
><P
>The parser above will read data
        from the first temporary file and write results to the second file. Both
        temporary files will be deleted after reading parser results. Note that
        this command is effectively the same with the previous example. They
        only differ in the execution method used by <SPAN
CLASS="application"
>indexer</SPAN
>:
        <TT
CLASS="literal"
>file-to-STDOUT</TT
> versus <TT
CLASS="literal"
>file-to-file</TT
>.
        </P
></LI
></OL
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="ParserTimeOut"
>Preventing <SPAN
CLASS="application"
>indexer</SPAN
> from getting stuck on a parser execution</A
></H2
><P
>&#13;  <A
NAME="AEN2876"
></A
>
  To prevent <SPAN
CLASS="application"
>indexer</SPAN
> from getting stuck on a parser execution
  you can specify the amount of time (in seconds) <SPAN
CLASS="application"
>indexer</SPAN
>
  waits for an external parser to return results.
  Use the <B
CLASS="command"
><A
HREF="msearch-cmdref-parsertimeout.html"
>ParserTimeOut</A
></B
> 
  <TT
CLASS="filename"
>indexer.conf</TT
> for this purpose. For example:
<PRE
CLASS="programlisting"
>&#13;ParserTimeOut 600
</PRE
>
  </P
><P
>&#13;  The default value is <TT
CLASS="literal"
>300</TT
> seconds (<TT
CLASS="literal"
>5</TT
> minutes).
  If an external parser does not return results within this period of time,
  <SPAN
CLASS="application"
>indexer</SPAN
> will kill the parser process, remove the
  associated temporary files and continue with the next document in
  the queue.
  </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-pipes"
>Pipes in a parser command line</A
></H2
><P
>You can use pipes in a parser command line. For
    example, these lines will be useful to index gzipped
    <SPAN
CLASS="application"
>man</SPAN
> pages from the local disk:
<PRE
CLASS="programlisting"
>&#13;AddType  application/x-gzipped-man  *.1.gz *.2.gz *.3.gz *.4.gz
Mime     application/x-gzipped-man  text/plain  "zcat | deroff"
</PRE
>
    </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-char"
>Parsers and character sets
       <A
NAME="AEN2896"
></A
></A
></H2
><P
>Some parsers can produce output in a character sets
    different from the one given in the
    <B
CLASS="command"
><A
HREF="msearch-cmdref-localcharset.html"
>LocalCharset</A
></B
> command.
    You can specify the output character set in a parser command line
    to make <SPAN
CLASS="application"
>indexer</SPAN
> convert the parser output to
    <B
CLASS="command"
><A
HREF="msearch-cmdref-localcharset.html"
>LocalCharset</A
></B
>.
    For example, if <SPAN
CLASS="application"
>catdoc</SPAN
> is configured
    to produce output in <TT
CLASS="literal"
>windows-1251</TT
> character
    set but LocalCharset is set to <TT
CLASS="literal"
>koi8-r</TT
>,
    you can use this command for parsing <SPAN
CLASS="application"
>MS Word</SPAN
> documents:
<PRE
CLASS="programlisting"
>&#13;Mime  application/msword  "text/plain; charset=windows-1251" "catdoc -a $1"
</PRE
>
    </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-udmurl"
>The <CODE
CLASS="varname"
>UDM_URL</CODE
> environment variable</A
></H2
><P
>When executing a parser, <SPAN
CLASS="application"
>indexer</SPAN
> creates
    the <TT
CLASS="literal"
>UDM_URL</TT
> environment variable with the
    document URL as a value. You can use this variable in the parser scripts.
    </P
><DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>
      When running <SPAN
CLASS="application"
>indexer</SPAN
> with multiple threads
      it's not recommended to use the <CODE
CLASS="varname"
>UDM_URL</CODE
> environment variable,
      use the <CODE
CLASS="varname"
>${URL}</CODE
> variable in the parser
      command line instead. See <A
HREF="msearch-cmdref-mime.html"
>Mime</A
> for more details.
      </P
></BLOCKQUOTE
></DIV
></DIV
><DIV
CLASS="sect1"
><H1
CLASS="sect1"
><A
NAME="pars-links"
>External parsers for the most common file types
    <A
NAME="AEN2924"
></A
></A
></H1
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-msword"
><SPAN
CLASS="application"
>MS Word</SPAN
> (<TT
CLASS="filename"
>*.doc</TT
>)</A
></H2
><P
></P
><UL
><LI
><P
><SPAN
CLASS="application"
>catdoc</SPAN
> <SPAN
CLASS="application"
>MS-Word</SPAN
>-to-text converter</P
><P
>&#13;          <A
HREF="http://freshmeat.net/redir/catdoc/1055/url_homepage/"
TARGET="_top"
>Home page</A
>, also listed on <A
HREF="http://freshmeat.net/"
TARGET="_top"
>Freshmeat</A
>.
        </P
><P
>&#13;          <TT
CLASS="filename"
>indexer.conf</TT
>:
<PRE
CLASS="programlisting"
>&#13;Mime application/msword "text/plain; charset=utf-8"  "catdoc -d utf-8 $1"
</PRE
>
        </P
></LI
><LI
><P
><SPAN
CLASS="application"
>wvWare</SPAN
> <SPAN
CLASS="application"
>MS-Word</SPAN
>-to-<ACRONYM
CLASS="acronym"
>HTML</ACRONYM
> converter</P
><P
>&#13;           <A
HREF="http://wvware.sourceforge.net/"
TARGET="_top"
>Home page</A
>,
           also listed on <A
HREF="http://freshmeat.net/"
TARGET="_top"
>Freshmeat</A
>.
        </P
><P
>&#13;          <TT
CLASS="filename"
>indexer.conf</TT
>:
<PRE
CLASS="programlisting"
>&#13;Mime application/msword    "text/html; charset=utf-8"    "wvHtml --charset=utf-8 $1 -"
</PRE
>
         </P
></LI
></UL
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-excel"
><SPAN
CLASS="application"
>MS Excel</SPAN
> (<TT
CLASS="filename"
>*.xls</TT
>)</A
></H2
><P
></P
><UL
><LI
><P
><SPAN
CLASS="application"
>xls2csv</SPAN
> <SPAN
CLASS="application"
>MS-Excel</SPAN
>-to-text converter</P
><P
>A part of <SPAN
CLASS="application"
>catdoc</SPAN
> distribution.
        </P
><P
>&#13;        <TT
CLASS="filename"
>indexer.conf</TT
>:
<PRE
CLASS="programlisting"
>&#13;Mime application/vnd.ms-excel   text/plain      "xls2csv $1"
</PRE
>
</P
></LI
><LI
><P
><SPAN
CLASS="application"
>xlhtml</SPAN
> <SPAN
CLASS="application"
>Excel-XLS</SPAN
>-to-<ACRONYM
CLASS="acronym"
>HTML</ACRONYM
> converter</P
><P
>Available from the project
        <A
HREF="http://www.xlhtml.org/"
TARGET="_top"
>homepage</A
>,
        also listed on <A
HREF="http://freshmeat.net/projects/xlhtml/"
TARGET="_top"
>Freshmeat</A
>.
        and  <A
HREF="http://chicago.sourceforge.net/xlhtml/"
TARGET="_top"
>SourceForge</A
>.
        Download page includes <A
HREF="http://chicago.sourceforge.net/xlhtml/binarys.html"
TARGET="_top"
>binaries for Windows</A
>.
        </P
><P
>&#13;          <TT
CLASS="filename"
>indexer.conf</TT
>:
<PRE
CLASS="programlisting"
>&#13;Mime application/vnd.ms-excel  text/html  "xltohtml $1"
</PRE
>
        </P
></LI
></UL
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-ppt"
><SPAN
CLASS="application"
>MS PowerPoint</SPAN
> (<TT
CLASS="filename"
>*.ppt</TT
>)</A
></H2
><P
></P
><UL
><LI
><P
>&#13;          <SPAN
CLASS="application"
>pptohtml</SPAN
>
          <SPAN
CLASS="application"
>PowerPoint-PPT</SPAN
>-to-<ACRONYM
CLASS="acronym"
>HTML</ACRONYM
> converter
        </P
><P
>A part of the <SPAN
CLASS="application"
>xlhtml</SPAN
> distribution. Available from the project
        <A
HREF="http://www.xlhtml.org/"
TARGET="_top"
>homepage</A
>,
        also listed on <A
HREF="http://freshmeat.net/projects/xlhtml/"
TARGET="_top"
>Freshmeat</A
>.
        and  <A
HREF="http://chicago.sourceforge.net/xlhtml/"
TARGET="_top"
>SourceForge</A
>.
        Download page includes <A
HREF="http://chicago.sourceforge.net/xlhtml/binarys.html"
TARGET="_top"
>binaries for Windows</A
>.
        </P
><P
>&#13;          <TT
CLASS="filename"
>indexer.conf</TT
>:
<PRE
CLASS="programlisting"
>&#13;Mime application/vnd.ms-powerpoint   text/html   "pptohtml $1"
</PRE
>
        </P
></LI
></UL
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-rtf"
><SPAN
CLASS="application"
>Rich Text</SPAN
> (<TT
CLASS="filename"
>*.rtf</TT
>)</A
></H2
><P
></P
><UL
><LI
><P
><SPAN
CLASS="application"
>unrtf</SPAN
> <SPAN
CLASS="application"
>RTF</SPAN
>-to-<ACRONYM
CLASS="acronym"
>HTML</ACRONYM
> converter</P
><P
>&#13;          <A
HREF="ftp://ftp.gnu.org/pub/gnu/unrtf/"
TARGET="_top"
>Homepage</A
>
        </P
><P
>&#13;          <TT
CLASS="filename"
>indexer.conf</TT
>:
        <PRE
CLASS="programlisting"
>&#13;Mime text/rtf*        text/html  "/usr/local/mnogosearch/sbin/unrtf --html $1
Mime application/rtf  text/html  "/usr/local/mnogosearch/sbin/unrtf --html $1
</PRE
>
        </P
></LI
><LI
><P
><SPAN
CLASS="application"
>rtfx</SPAN
> RTF-to-XML converter</P
><P
>&#13;          <A
HREF="http://memberwebs.com/stef/software/rtfx/"
TARGET="_top"
>Homepage</A
>,
          also listed on <A
HREF="http://freshmeat.net/"
TARGET="_top"
>Freshmeat</A
>.
        </P
><TT
CLASS="filename"
>indexer.conf</TT
>:
        <P
>&#13;<PRE
CLASS="programlisting"
>&#13;Mime text/rtf*       text/xml "rtfx -w $1 2&#62;/dev/null"
Mime application/rtf text/xml "rtfx -w $1 2&#62;/dev/null"
</PRE
>
        </P
></LI
><LI
><P
><SPAN
CLASS="application"
>rthc</SPAN
> <SPAN
CLASS="application"
>RTF</SPAN
>-to-<ACRONYM
CLASS="acronym"
>HTML</ACRONYM
> converter</P
><P
>&#13;          
          <TT
CLASS="filename"
>indexer.conf</TT
>:
<PRE
CLASS="programlisting"
>&#13;Mime text/rtf*       text/html "rthc --use-stdout $1 2&#62;/dev/null"
Mime application/rtf text/html "rthc --use-stdout $1 2&#62;/dev/null"
</PRE
>
        </P
></LI
></UL
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-pdf"
><SPAN
CLASS="application"
>Adobe Acrobat</SPAN
> (<TT
CLASS="filename"
>*.pdf</TT
>)</A
></H2
><P
></P
><UL
><LI
><P
><SPAN
CLASS="application"
>pdftotext</SPAN
> <SPAN
CLASS="application"
>Adobe PDF</SPAN
> converter</P
><P
>A part of the <SPAN
CLASS="application"
>xpdf</SPAN
> project.</P
><P
>&#13;          <A
HREF="http://freshmeat.net/redir/xpdf/12080/url_homepage/"
TARGET="_top"
>Homepage</A
>,
          also listed on <A
HREF="http://freshmeat.net/"
TARGET="_top"
>Freshmeat</A
>.
        </P
><P
>&#13;          <TT
CLASS="filename"
>indexer.conf</TT
>:
<PRE
CLASS="programlisting"
>&#13;Mime application/pdf            text/plain      "pdftotext $1 -"
</PRE
>
        </P
></LI
></UL
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="pars-ps"
><SPAN
CLASS="application"
>PostScript</SPAN
> (<TT
CLASS="filename"
>*.ps</TT
>)</A
></H2
><P
></P
><UL
><LI
><P
><SPAN
CLASS="application"
>ps2ascii</SPAN
> <SPAN
CLASS="application"
>PostScript</SPAN
> converter</P
><P
>A part of the <SPAN
CLASS="application"
>GhostScript</SPAN
> project.</P
><P
>&#13;          <A
HREF="http://pages.cs.wisc.edu/~ghost/"
TARGET="_top"
>Homepage</A
>,
          also listed on <A
HREF="http://freshmeat.net/"
TARGET="_top"
>Freshmeat</A
>.
        </P
><P
>&#13;          <TT
CLASS="filename"
>indexer.conf</TT
>:
<PRE
CLASS="programlisting"
>&#13;Mime application/postscript    text/plain  "ps2ascii $1"
</PRE
>
        </P
></LI
></UL
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="parse-rpm"
><SPAN
CLASS="application"
>RPM</SPAN
></A
></H2
><P
></P
><UL
><LI
><P
>&#13;          <SPAN
CLASS="application"
>RPM</SPAN
> converter by Mario Lang <CODE
CLASS="email"
>&#60;<A
HREF="mailto:lang[at]zid[dot]tu-graz[dot]ac[dot]at"
>lang[at]zid[dot]tu-graz[dot]ac[dot]at</A
>&#62;</CODE
>
        </P
><P
><TT
CLASS="filename"
>/usr/local/bin/rpminfo</TT
>:
<PRE
CLASS="programlisting"
>&#13;#!/bin/bash
/usr/bin/rpm -q --queryformat="&#60;html&#62;&#60;head&#62;&#60;title&#62;RPM: %{NAME} %{VERSION}-%{RELEASE}
(%{GROUP})&#60;/title&#62;&#60;meta name=\"description\" content=\"%{SUMMARY}\"&#62;&#60;/head&#62;&#60;body&#62;
%{DESCRIPTION}\n&#60;/body&#62;&#60;/html&#62;" -p $1
</PRE
>
        </P
><P
>&#13;          <TT
CLASS="filename"
>indexer.conf</TT
>:
<PRE
CLASS="programlisting"
>&#13;Mime application/x-rpm text/html "/usr/local/bin/rpminfo $1"
</PRE
>
        </P
><P
>It renders to such nice <SPAN
CLASS="application"
>RPM</SPAN
> information:
<PRE
CLASS="programlisting"
>&#13;3. RPM: mysql 3.20.32a-3 (Applications/Databases) [4]
       Mysql is a SQL (Structured Query Language) database server.
       Mysql was written by Michael (Monty) Widenius. See the CREDITS
       file in the distribution for more credits for mysql and related
       things....
       (application/x-rpm) 2088855 bytes
</PRE
>
        </P
></LI
></UL
></DIV
><P
>&#13;      If you're using an external parser not listed here,
      please contribute your parser configuration
      to <CODE
CLASS="email"
>&#60;<A
HREF="mailto:general@mnogosearch.org"
>general@mnogosearch.org</A
>&#62;</CODE
>.
    </P
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="msearch-htmlparser-comments.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="msearch-howstore.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Comments</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Storing <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> data</TD
></TR
></TABLE
></DIV
><!--#include virtual="body-after.html"--></BODY
></HTML
>