Sophie

Sophie

distrib > Mandriva > 2010.0 > i586 > media > contrib-release > by-pkgid > a2d29ba77c8fe4d655c72d0b897f51ad > files > 327

mnogosearch-3.3.8-3mdv2010.0.i586.rpm

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML
><HEAD
><TITLE
>Storing mnoGoSearch data </TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
REL="HOME"
TITLE="mnoGoSearch 3.3.8 reference manual"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="External parsers"
HREF="msearch-parsers.html"><LINK
REL="NEXT"
TITLE="Cache mode storage"
HREF="msearch-cachemode.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="mnogo.css"><META
NAME="Description"
CONTENT="mnoGoSearch - Full Featured Web site Open Source Search Engine Software over the Internet and Intranet Web Sites Based on SQL Database. It is a Free search software covered by GNU license."><META
NAME="Keywords"
CONTENT="shareware, freeware, download, internet, unix, utilities, search engine, text retrieval, knowledge retrieval, text search, information retrieval, database search, mining, intranet, webserver, index, spider, filesearch, meta, free, open source, full-text, udmsearch, website, find, opensource, search, searching, software, udmsearch, engine, indexing, system, web, ftp, http, cgi, php, SQL, MySQL, database, php3, FreeBSD, Linux, Unix, mnoGoSearch, MacOS X, Mac OS X, Windows, 2000, NT, 95, 98, GNU, GPL, url, grabbing"></HEAD
><BODY
CLASS="chapter"
BGCOLOR="#EEEEEE"
TEXT="#000000"
LINK="#000080"
VLINK="#800080"
ALINK="#FF0000"
><!--#include virtual="body-before.html"--><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
><SPAN
CLASS="application"
>mnoGoSearch</SPAN
> 3.3.8 reference manual: Full-featured search engine software</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="msearch-parsers.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="msearch-cachemode.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="chapter"
><H1
><A
NAME="howstore"
></A
>Chapter 7. Storing <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> data </H1
><DIV
CLASS="TOC"
><DL
><DT
><B
>Table of Contents</B
></DT
><DT
><A
HREF="msearch-howstore.html#sql-stor"
>Word modes with an <ACRONYM
CLASS="acronym"
>SQL</ACRONYM
> database</A
></DT
><DT
><A
HREF="msearch-cachemode.html"
>Cache mode storage</A
></DT
><DT
><A
HREF="msearch-perf.html"
>mnoGoSearch performance issues
<A
NAME="AEN3369"
></A
></A
></DT
><DT
><A
HREF="msearch-oracle.html"
>Oracle notes
<A
NAME="AEN3398"
></A
></A
></DT
><DT
><A
HREF="msearch-db2.html"
>IBM DB2 notes
    <A
NAME="AEN3479"
></A
></A
></DT
></DL
></DIV
><DIV
CLASS="sect1"
><H1
CLASS="sect1"
><A
NAME="sql-stor"
>Word modes with an <ACRONYM
CLASS="acronym"
>SQL</ACRONYM
> database</A
></H1
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-modes"
>Various modes used to store words</A
></H2
><P
>&#13;    <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> can use
    a number of different formats (modes) to store
    word information in the database, suitable for
    different purposes. The available modes
    are: <TT
CLASS="literal"
>single</TT
>,
    <TT
CLASS="literal"
>multi</TT
> and <TT
CLASS="literal"
>blob</TT
>.
    The default mode is <TT
CLASS="literal"
>blob</TT
>.
    The mode can be selected using the
    <CODE
CLASS="option"
>DBMode</CODE
> part of 
    the <B
CLASS="command"
><A
HREF="msearch-cmdref-dbaddr.html"
>DBAddr</A
></B
> command
    in <TT
CLASS="filename"
>indexer.conf</TT
> and
    <TT
CLASS="filename"
>search.htm</TT
>.
    </P
><P
>&#13;Examples:
<PRE
CLASS="programlisting"
>&#13;DBAddr mysql://localhost/test/?DBMode=single
DBAddr mysql://localhost/test/?DBMode=multi
DBAddr mysql://localhost/test/?DBMode=blob
</PRE
>
    </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-single"
>Storage mode - single
      <A
NAME="AEN3107"
></A
></A
></H2
><P
>&#13;    The <TT
CLASS="literal"
>single</TT
> mode is suitable for a small site
    with the total number of documents up to <TT
CLASS="literal"
>5000</TT
>.
    </P
><P
>&#13;    When the <TT
CLASS="literal"
>single</TT
> mode is specified,
    all words are stored in a single table <TT
CLASS="literal"
>dict</TT
>
    with three columns <TT
CLASS="literal"
>(url_id,word,coord)</TT
>,
    where <CODE
CLASS="varname"
>url_id</CODE
> is the <ACRONYM
CLASS="acronym"
>ID</ACRONYM
>
    of the document which is referenced by <CODE
CLASS="varname"
>rec_id</CODE
>
    field in the table <TT
CLASS="literal"
>url</TT
>, and <CODE
CLASS="varname"
>coord</CODE
>
    is a combination of the <TT
CLASS="literal"
>section ID</TT
> and
    position of the words in the section.
    Word has the <TT
CLASS="literal"
>variable char(32)</TT
> <ACRONYM
CLASS="acronym"
>SQL</ACRONYM
> type.
    Every appearance of the same word in a document produces a separate record in the table.
    </P
><P
>&#13;    The advantage of the <TT
CLASS="literal"
>single</TT
> mode is <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>live updates</I
></SPAN
>
    support - a document updated by <SPAN
CLASS="application"
>indexer</SPAN
> 
    becomes immediately visible for searches with its new content.
    In other words <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>crawling</I
></SPAN
> and 
    <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>indexing</I
></SPAN
> is done at the same time,
    for every document individually.
    </P
><P
>&#13;    Another advantage of the <TT
CLASS="literal"
>single</TT
> mode is its
    simplicity and straightforward data format. You can use
    <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> as a
    fulltext solution for your database-driven Web application.
    For example, you may find useful to create a simple search page
    which will query the data collected by <SPAN
CLASS="application"
>indexer</SPAN
>
    this way:
<PRE
CLASS="programlisting"
>&#13;SELECT
  url.url, count(*) AS rank
FROM
  dict, url
WHERE
  url.rec_id=dict.url_id
AND
  dict.word IN ('some','words')
GROUP BY
  url.url
ORDER BY
  rank DESC;
</PRE
>
    and display the results of this search query.
    </P
><DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>
      The above query implements very simple ranking based
      on the count of the word hits. You can also integrate
      <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> with your own 
      application using the <A
HREF="msearch-cmdref-usercachequery.html"
>UserCacheQuery</A
>
      command, which supports full-featured ranking taking into
      account all factors described in 
      <A
HREF="msearch-rel.html#score-commands"
>the Section called <I
>Commands affecting document score
  <A
NAME="AEN5684"
></A
></I
> in Chapter 10</A
>.
      </P
></BLOCKQUOTE
></DIV
><DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>
      When you use <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>
      to index data stored in your <ACRONYM
CLASS="acronym"
>SQL</ACRONYM
> tables
      (see <A
HREF="msearch-extended-indexing.html#htdb"
>the Section called <I
>Indexing <ACRONYM
CLASS="acronym"
>SQL</ACRONYM
> tables
    (<TT
CLASS="literal"
>htdb:/</TT
> virtual <ACRONYM
CLASS="acronym"
>URL</ACRONYM
> scheme)
    <A
NAME="AEN2154"
></A
></I
> in Chapter 4</A
> for details),
      you may find useful to run queries joining
      the table <TT
CLASS="literal"
>dict</TT
> with your own tables.
      </P
></BLOCKQUOTE
></DIV
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-multi"
>Storage mode - multi
      <A
NAME="AEN3149"
></A
></A
></H2
><P
>&#13;    The <TT
CLASS="literal"
>multi</TT
> mode is suitable for a medium size Web space
    with up to about <TT
CLASS="literal"
>50000</TT
> documents. It can be
    useful if your documents are updated very often.
    </P
><P
>&#13;    If the <TT
CLASS="literal"
>multi</TT
> mode is selected, word information
    is distributed into <TT
CLASS="literal"
>256</TT
> separate tables
    <CODE
CLASS="varname"
>dict00</CODE
>..<CODE
CLASS="varname"
>dictFF</CODE
> using a hash
    function for distribution. The structure of these
    tables is close to the table <CODE
CLASS="varname"
>dict</CODE
> used
    in the <TT
CLASS="literal"
>single</TT
> mode: <TT
CLASS="literal"
>(url_id,secno,word,coords)</TT
>.
    The difference is that all positions of the same word (<TT
CLASS="literal"
>hits</TT
>)
    in a section of a document are grouped into a single binary array
    <CODE
CLASS="varname"
>coords</CODE
>, instead of producing multiple records.
    Word information for different sections is stored in separate records.
    </P
><P
>&#13;    Similar to the <TT
CLASS="literal"
>single</TT
> mode,
    the <TT
CLASS="literal"
>multi</TT
> mode supports <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>live updates</I
></SPAN
>.
    That is, crawling and indexing are done at the same time.
    A new document (or an updated document) becomes available
    for search very soon after <SPAN
CLASS="application"
>indexer</SPAN
> 
    has crawled it.
    </P
><P
>&#13;    When working in the <TT
CLASS="literal"
>multi</TT
> mode, 
    <SPAN
CLASS="application"
>indexer</SPAN
> performs caching
    of the word information in memory for better 
    crawling performance. The word cache is 
    flushed to the database as soon as it grows up
    to the value given in <A
HREF="msearch-cmdref-wordcachesize.html"
>WordCacheSize</A
>,
    with <TT
CLASS="literal"
>8Mb</TT
> by default.
    You can change <A
HREF="msearch-cmdref-wordcachesize.html"
>WordCacheSize</A
> to
    a bigger value for better crawling performance.
    </P
><DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>
      The disadvantage of having a too big <A
HREF="msearch-cmdref-wordcachesize.html"
>WordCacheSize</A
>
      value is that in case when <SPAN
CLASS="application"
>indexer</SPAN
>
      crashes or dies for any other reasons, all cached information gets lost.
      </P
></BLOCKQUOTE
></DIV
><P
>&#13;    Grouping word hits into the same record and distribution
    between multiple tables make the <TT
CLASS="literal"
>multi</TT
>
    mode much faster both for search and indexing comparing
    to the <TT
CLASS="literal"
>single</TT
> mode.
    </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-blob"
>Storage mode - blob
      <A
NAME="AEN3185"
></A
></A
></H2
><P
>The <TT
CLASS="literal"
>blob</TT
> mode is the fastest mode currently
    available in <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> for both purposes: indexing and searching.
    This mode can handle up to <TT
CLASS="literal"
>1,000,000</TT
> - <TT
CLASS="literal"
>2,000,000</TT
>
    million documents on a single machine.
    </P
><P
>&#13;    <TT
CLASS="literal"
>DBMode=blob</TT
> is know to work fine with
    <SPAN
CLASS="application"
>DB2</SPAN
>,
    <SPAN
CLASS="application"
>Mimer</SPAN
>,
    <SPAN
CLASS="application"
>MS SQL</SPAN
>,
    <SPAN
CLASS="application"
>MySQL</SPAN
>,
    <SPAN
CLASS="application"
>PostgreSQL</SPAN
>,
    <SPAN
CLASS="application"
>Oracle</SPAN
>,
    <SPAN
CLASS="application"
>Sybase</SPAN
>,
    <SPAN
CLASS="application"
>Firebird</SPAN
>/<SPAN
CLASS="application"
>Interbase</SPAN
>,
    <SPAN
CLASS="application"
>SQLite3</SPAN
>.
    </P
><P
>&#13;    In the <TT
CLASS="literal"
>blob</TT
> mode crawling and indexing are done separately.
    Crawling is done by starting <SPAN
CLASS="application"
>indexer</SPAN
> without
    any command line arguments. At crawling time <SPAN
CLASS="application"
>indexer</SPAN
>
    collects word information into the table <CODE
CLASS="varname"
>bdicti</CODE
>
    with a structure optimized for crawling purposes, but not suitable for
    search purposes.
    </P
><P
>&#13;    After crawling is done, an extra step is required to 
    create the <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>search index</I
></SPAN
> by launching
    <KBD
CLASS="userinput"
>indexer -Eblob</KBD
>. When creating
    the search index, <SPAN
CLASS="application"
>indexer</SPAN
>
    loads information from the table <CODE
CLASS="varname"
>bdicti</CODE
>,
    groups all hits of the same word in different documents together
    and writes the grouped data into the table <CODE
CLASS="varname"
>bdict</CODE
>
    with a structure optimized for search purposes.
    The table <CODE
CLASS="varname"
>bdict</CODE
> consists of three columns
    (<CODE
CLASS="varname"
>word</CODE
>, <CODE
CLASS="varname"
>secno</CODE
>, <CODE
CLASS="varname"
>intag</CODE
>),
    where <CODE
CLASS="varname"
>intag</CODE
> is
    a binary array which includes information about all documents this
    word appears in (using 32-bit <TT
CLASS="literal"
>ID</TT
>s of the documents),
    as well as positions of the word in every document (for phrase search).
    The table <CODE
CLASS="varname"
>bdict</CODE
> has an index on the column
    <CODE
CLASS="varname"
>word</CODE
> for fast look-up at search time.
    Words from different sections (e.g. <TT
CLASS="literal"
>title</TT
>
    and <TT
CLASS="literal"
>body</TT
>) are written in separate records.
    <DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>
      Separate records for different sections are needed to optimize
      searches with section limits, for example "<SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>find only in title</I
></SPAN
>".
      </P
></BLOCKQUOTE
></DIV
>
    </P
><P
>&#13;    Also, additional arrays of data are written into the table
    <CODE
CLASS="varname"
>bdict</CODE
>:

    <P
></P
><UL
><LI
><P
>&#13;    <TT
CLASS="literal"
>#rec_id</TT
> - a list of 32-bit document <TT
CLASS="literal"
>ID</TT
>s
    </P
></LI
><LI
><P
>&#13;    <TT
CLASS="literal"
>#last_mod_time</TT
> - an array of 32-bit
    <TT
CLASS="literal"
>Last-Modified</TT
> values
    (in Unix <TT
CLASS="literal"
>timestamp</TT
> format) -
    for fast limiting searches by date.
    </P
></LI
><LI
><P
>&#13;    <TT
CLASS="literal"
>#pop_rank</TT
> - an array of popularity rank values,
     each in the <SPAN
CLASS="type"
>32-bit float</SPAN
> format.
    </P
></LI
><LI
><P
>&#13;    <TT
CLASS="literal"
>#site_id</TT
> - an array of 32-bit
     site <TT
CLASS="literal"
>IDs</TT
>, for <TT
CLASS="literal"
>GroupBySite</TT
>.
    </P
></LI
><LI
><P
>&#13;    <TT
CLASS="literal"
>#limit#name</TT
> - a list of document <TT
CLASS="literal"
>IDs</TT
>
    covered by a user defined limit with name "<TT
CLASS="literal"
>name</TT
>".
    A separate <TT
CLASS="literal"
>#limit#xxx</TT
> record is created
    for every user defined <A
HREF="msearch-cmdref-limit.html"
>Limit</A
> configured
    in <TT
CLASS="filename"
>indexer.conf</TT
>.
    </P
></LI
><LI
><P
>&#13;    <TT
CLASS="literal"
>#ts</TT
> - the timestamp indicating when
    <KBD
CLASS="userinput"
>indexer -Eblob</KBD
> was executed last time,
    in textual representation, using the Unix timestamp format.
    This value is used for invalidating old queries stored
    in the search result cache, as well as for searches
    with <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>live updates</I
></SPAN
>, described in
    <A
HREF="msearch-howstore.html#sql-stor-blob-live"
>the Section called <I
>Live updates emulator with <TT
CLASS="literal"
>DBMode=blob</TT
></I
></A
>.
    </P
></LI
><LI
><P
>&#13;    <TT
CLASS="literal"
>#version</TT
> - a string representing
    the version <TT
CLASS="literal"
>ID</TT
> of
    <SPAN
CLASS="application"
>indexer</SPAN
> which created
    the search index. For example, <SPAN
CLASS="application"
>indexer</SPAN
>
    from <SPAN
CLASS="application"
>mnoGoSearch 3.3.0</SPAN
> writes
    the string "<TT
CLASS="literal"
>30300</TT
>". This record
    is required for easier upgrade purposes, to make
    a newer version of <SPAN
CLASS="application"
>search.cgi</SPAN
>
    recognize an older format.
    </P
></LI
></UL
>
    </P
><P
>&#13;    Note, creating fast search index is also possible for 
    the databases using <TT
CLASS="literal"
>DBMode=single</TT
> and
    <TT
CLASS="literal"
>DBMode=multi</TT
>.
    This is useful when you need to quickly switch to
    <TT
CLASS="literal"
>DBMode=blob</TT
> when
    search performance with the other modes became bad -
    without even having to re-index
    your Web space. Later you can completely
    switch to <TT
CLASS="literal"
>DBMode=blob</TT
> in both
    <TT
CLASS="filename"
>indexer.conf</TT
> and
    <TT
CLASS="filename"
>search.htm</TT
>, and run indexing
    from the very beginning.
    </P
><P
>&#13;    The disadvantage of <TT
CLASS="literal"
>DBMode=blob</TT
> is that
    it does not support live updates directly. New or updated documents,
    crawled by <SPAN
CLASS="application"
>indexer</SPAN
>
    are not visible for search until <KBD
CLASS="userinput"
>indexer -Eblob</KBD
>
    is run again. Creating search index takes about <TT
CLASS="literal"
>6</TT
>
    minutes on a collection with <TT
CLASS="literal"
>200000</TT
>
    <ACRONYM
CLASS="acronym"
>HTML</ACRONYM
> documents, with <TT
CLASS="literal"
>10Gb</TT
>
    total size (on a Intel Core Duo 2.13GHz <ACRONYM
CLASS="acronym"
>CPU</ACRONYM
>),
    which can be unacceptably long for some applications
    (for example, on a news site,
     or when using <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>
     as an external full-text engine for <ACRONYM
CLASS="acronym"
>SQL</ACRONYM
>
     tables with help of <A
HREF="msearch-extended-indexing.html#htdb"
>HTDB</A
>).
    </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-blob-live"
>Live updates emulator with <TT
CLASS="literal"
>DBMode=blob</TT
></A
></H2
><P
>&#13;    Starting from version <TT
CLASS="literal"
>3.3.1</TT
>,
    <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> emulates
    <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>live updates</I
></SPAN
> by reading
    word information for the new or updated documents
    directly from the crawler table <CODE
CLASS="varname"
>bdicti</CODE
>.
    It allows to add or update up to about <TT
CLASS="literal"
>10,000</TT
>
    documents without having to run <KBD
CLASS="userinput"
>indexer -Eblob</KBD
>.
    To activate using live updates,
    please add <CODE
CLASS="option"
>LiveUpdates=yes</CODE
> parameter
    to the <A
HREF="msearch-cmdref-dbaddr.html"
>DBAddr</A
> command.
    </P
><P
>&#13;    <P
><B
>Example:</B
></P
>
<PRE
CLASS="programlisting"
>&#13;DBAddr mysql://root@localhost/test/?DBMode=blob&#38;LiveUpdates=yes
</PRE
>
    </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-blob-ext"
>Extended features with <CODE
CLASS="option"
>DBMode=blob</CODE
></A
></H2
><P
>&#13;    Starting with the version <TT
CLASS="literal"
>3.3.0</TT
>,
    <KBD
CLASS="userinput"
>indexer -Eblob</KBD
> can be used in combination
    with <TT
CLASS="literal"
>URL</TT
> and <A
HREF="msearch-cmdref-tag.html"
>Tag</A
> limits,
    other limits described in <A
HREF="msearch-indexing.html#general-subsect"
>the Section called <I
>Subsection control</I
> in Chapter 3</A
>,
    as well as in combination with a user defined limit described
    by a <A
HREF="msearch-cmdref-limit.html"
>Limit</A
> command.
    The limits allow to generate a search index over a subset of
    the documents collected by <SPAN
CLASS="application"
>indexer</SPAN
>
    at crawling time.
    </P
><P
><P
><B
>Examples:</B
></P
>
<PRE
CLASS="programlisting"
>&#13;indexer -Eblob -u %/subdir/%
indexer -Eblob -t tag
indexer -Eblob --fl=limitname
</PRE
>
    </P
><P
>Starting with the version <TT
CLASS="literal"
>3.2.36</TT
> 
    an additional command is available: <KBD
CLASS="userinput"
>indexer -Erewriteurl</KBD
>.
    When <SPAN
CLASS="application"
>indexer</SPAN
> is launched with this parameter
    it rewrites <ACRONYM
CLASS="acronym"
>URL</ACRONYM
> data for <CODE
CLASS="option"
>DBMode=blob</CODE
>.
    It can be useful to rewrite <ACRONYM
CLASS="acronym"
>URL</ACRONYM
> data quickly
    without having to rebuild the entire search index, for example
    if you added the <CODE
CLASS="option"
>Deflate=yes</CODE
> parameter to 
    <A
HREF="msearch-cmdref-dbaddr.html"
>DBAddr</A
>, or after running
    <KBD
CLASS="userinput"
>indexer -n0 -R</KBD
> to update the
    <A
HREF="msearch-rel.html#poprank"
>Popularity Rank</A
>.
    
    </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-store-maxwords"
>Maximum amount of words collected from a  document</A
></H2
><P
>&#13;    Starting from the version <TT
CLASS="literal"
>3.3.0</TT
>,
    <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> enumerates
    words positions for every section separately and
    allows to store information about up to 2 million
    words per section.
    </P
><P
>&#13;    In the versions prior to <TT
CLASS="literal"
>3.3.0</TT
>
    it was possible to store up to <TT
CLASS="literal"
>64K</TT
> words
    from a single document.
    </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-noncrc"
>Substring search notes</A
></H2
><P
>&#13;      The
      <TT
CLASS="literal"
>single</TT
>,
      <TT
CLASS="literal"
>multi</TT
> and
      <TT
CLASS="literal"
>blob</TT
> modes support substring search.
      An <ACRONYM
CLASS="acronym"
>SQL</ACRONYM
> query containing
      a <B
CLASS="command"
>LIKE</B
> predicate is executed
      internally in order to do substring search. Substring
      search is usually slower than searching for a full word,
      especially in case of very short substring.
      You can use the <A
HREF="msearch-cmdref-substringmatchminwordlength.html"
>SubstringMatchMinWordLength</A
>
      command to limit the minimal word length allowed for substring search.
    </P
><DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>
      When performing substring search in the <TT
CLASS="literal"
>multi</TT
>
      mode, <SPAN
CLASS="application"
>search.cgi</SPAN
> has to iterate
      search queries through all <TT
CLASS="literal"
>256</TT
> tables
      <CODE
CLASS="varname"
>dict00</CODE
>..<CODE
CLASS="varname"
>dictFF</CODE
>,
      which makes substring search especially slow. Using
      substring search is not recommended with <CODE
CLASS="option"
>DBMode=multi</CODE
>.
      </P
></BLOCKQUOTE
></DIV
></DIV
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="msearch-parsers.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="msearch-cachemode.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>External parsers<A
NAME="AEN2784"
></A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Cache mode storage</TD
></TR
></TABLE
></DIV
><!--#include virtual="body-after.html"--></BODY
></HTML
>