<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML
><HEAD
><TITLE
>Multiple languages support</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
REL="HOME"
TITLE="mnoGoSearch 3.3.8 reference manual"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Tags"
HREF="msearch-tags.html"><LINK
REL="NEXT"
TITLE="Search pages with multi-lingual interface
    
  "
HREF="msearch-multilang.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="mnogo.css"><META
NAME="Description"
CONTENT="mnoGoSearch - Full Featured Web site Open Source Search Engine Software over the Internet and Intranet Web Sites Based on SQL Database. It is a Free search software covered by GNU license."><META
NAME="Keywords"
CONTENT="shareware, freeware, download, internet, unix, utilities, search engine, text retrieval, knowledge retrieval, text search, information retrieval, database search, mining, intranet, webserver, index, spider, filesearch, meta, free, open source, full-text, udmsearch, website, find, opensource, search, searching, software, udmsearch, engine, indexing, system, web, ftp, http, cgi, php, SQL, MySQL, database, php3, FreeBSD, Linux, Unix, mnoGoSearch, MacOS X, Mac OS X, Windows, 2000, NT, 95, 98, GNU, GPL, url, grabbing"></HEAD
><BODY
CLASS="chapter"
BGCOLOR="#EEEEEE"
TEXT="#000000"
LINK="#000080"
VLINK="#800080"
ALINK="#FF0000"
><!--#include virtual="body-before.html"--><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
><SPAN
CLASS="application"
>mnoGoSearch</SPAN
> 3.3.8 reference manual: Full-featured search engine software</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="msearch-tags.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="msearch-multilang.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="chapter"
><H1
><A
NAME="international"
></A
>Chapter 9. Multiple languages support</H1
><DIV
CLASS="TOC"
><DL
><DT
><B
>Table of Contents</B
></DT
><DT
><A
HREF="msearch-international.html#charset"
>Character sets
    <A
NAME="AEN3599"
></A
></A
></DT
><DT
><A
HREF="msearch-multilang.html"
>Search pages with multi-lingual interface
    <A
NAME="AEN3955"
></A
></A
></DT
><DT
><A
HREF="msearch-cjk.html"
>Segmenters for Chinese, Thai and Japanese languages</A
></DT
><DT
><A
HREF="msearch-vary.html"
>Indexing multilingual servers</A
></DT
></DL
></DIV
><DIV
CLASS="sect1"
><H1
CLASS="sect1"
><A
NAME="charset"
>Character sets
    <A
NAME="AEN3599"
></A
></A
></H1
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="supcharsets"
>Supported character sets</A
></H2
><P
>&#13;    <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> supports almost all known 8-bit
    character sets as well as the most widely used multi-byte character sets,
    including Korean EUC-KR, Chinese Big5 and GB2312, Japanese Shift-JIS,
    EUC-JP and ISO-2022-JP, and UTF-8. Some multi-byte character
    sets are not supported by default because their conversion tables
    are large and would increase the size of the executable programs.
    See <TT
CLASS="filename"
>configure</TT
> parameters to enable support
    for extra character sets.
    </P
><P
><SPAN
CLASS="application"
>mnoGoSearch</SPAN
> also supports the following
    Macintosh character sets: MacCE, MacCroatian, MacGreek, MacRoman,
    MacTurkish, MacIceland, MacRomania, MacThai, MacArabic, MacHebrew,
    MacCyrillic, MacGujarati.
    </P
><DIV
CLASS="table"
><A
NAME="AEN3608"
></A
><P
><B
>Table 9-1. Supported character sets</B
></P
><TABLE
BORDER="1"
CLASS="CALSTABLE"
><COL><COL><TBODY
><TR
><TD
>Languages</TD
><TD
>Character sets</TD
></TR
><TR
><TD
>&#13;            Western Europe:
            Albanian, Catalan, Danish, Dutch, English, Faeroese, Finnish, French,
            Galician, German, Icelandic, Italian, Norwegian, Portuguese, Spanish,
            Swedish
            </TD
><TD
>&#13;            ASCII 8, CP437,
            CP850, CP860, CP1252, ISO 8859-1, ISO 8859-15, MacRoman,
            MacIceland
            </TD
></TR
><TR
><TD
>Eastern Europe:
            Croatian, Czech, Hungarian, Polish, Romanian, Slovak, Slovene
            </TD
><TD
>&#13;            CP852, CP1250, ISO 8859-2, MacCentralEurope, MacRomania,
            MacCroatian
            </TD
></TR
><TR
><TD
>Baltic: Latvian, Lithuanian, Estonian</TD
><TD
>CP1257, ISO-8859-4, ISO-8859-13</TD
></TR
><TR
><TD
>Cyrillic: Bulgarian, Belorussian, Macedonian, Russian, Serbian, Ukrainian</TD
><TD
>CP855, CP866, CP1251, ISO 8859-5, Koi8-r, Koi8-u, MacCyrillic</TD
></TR
><TR
><TD
>Arabic</TD
><TD
>CP864, CP1256, ISO 8859-6, MacArabic</TD
></TR
><TR
><TD
>Greek</TD
><TD
>CP869, CP1253, ISO 8859-7, MacGreek</TD
></TR
><TR
><TD
>Hebrew</TD
><TD
>CP1255, ISO 8859-8, MacHebrew</TD
></TR
><TR
><TD
>Turkish</TD
><TD
>CP857, CP1254, ISO 8859-9, MacTurkish</TD
></TR
><TR
><TD
>Japanese</TD
><TD
>Shift-JIS, EUC-JP, ISO-2022-JP</TD
></TR
><TR
><TD
>Simplified Chinese</TD
><TD
>GB2312</TD
></TR
><TR
><TD
>Traditional Chinese</TD
><TD
>Big5</TD
></TR
><TR
><TD
>Korean</TD
><TD
>EUC-KR</TD
></TR
><TR
><TD
>Thai</TD
><TD
>CP874, TIS 620, MacThai</TD
></TR
><TR
><TD
>Vietnamese</TD
><TD
>CP1258</TD
></TR
><TR
><TD
>Indian</TD
><TD
>MacGujarati, TSCII</TD
></TR
><TR
><TD
>Georgian</TD
><TD
>geostd8</TD
></TR
><TR
><TD
>Unicode: over 650 languages</TD
><TD
>UTF-8</TD
></TR
></TBODY
></TABLE
></DIV
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="charset-onedb"
>Multiple languages in the same database</A
></H2
><P
><SPAN
CLASS="application"
>mnoGoSearch</SPAN
> can index
    documents in different languages into the same database. The disk space
    required to store search data depends on the choice of the
    character set that <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> uses to store
    data. The character set is specified using the
    <B
CLASS="command"
><A
HREF="msearch-cmdref-localcharset.html"
>LocalCharset</A
></B
>
    command.
  </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="recoding"
>Character set conversion</A
></H2
><P
>&#13;      <TT
CLASS="literal"
>indexer</TT
> converts all
      documents to the character set specified in the
      <B
CLASS="command"
><A
HREF="msearch-cmdref-localcharset.html"
>LocalCharset</A
></B
>
      command in <TT
CLASS="filename"
>indexer.conf</TT
>.
      Internally, conversion is implemented using Unicode.
    </P
><P
>&#13;    <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> performs character conversion
    in a lossless manner. In general, conversion between different character
    sets can lose some data. For example, converting a text file from
    Greek <TT
CLASS="literal"
>cp1253</TT
> to Russian <TT
CLASS="literal"
>cp1251</TT
>
    will lose all Greek characters.
    To avoid data loss, <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> stores all
    characters that cannot be converted directly to <A
HREF="msearch-cmdref-localcharset.html"
>LocalCharset</A
>
    using <TT
CLASS="literal"
>&#38;#nnn;</TT
> notation, where
    <TT
CLASS="literal"
>nnn</TT
> is the decimal code point of a character,
    according to Unicode.
    </P
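><P
>As a worked illustration (the character is chosen here as an example and
    is not taken from the manual): with <TT CLASS="literal">LocalCharset cp1251</TT>,
    the Greek small letter alpha has no <TT CLASS="literal">cp1251</TT> code, so
    under the rule above it would be stored as a numeric reference built from
    its Unicode code point U+03B1:
    </P
><PRE
CLASS="programlisting"
>U+03B1 = 3*256 + 11*16 + 1 = 945 (decimal)
stored form: &#38;#945;    (6 bytes)
UTF-8 form:  0xCE 0xB1  (2 bytes)
</PRE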
><P
>&#13;    To avoid the excessive disk usage that can be caused
    by a large number of <TT
CLASS="literal"
>&#38;#nnn;</TT
> sequences
    (each takes 5 to 7 bytes), it is important to choose
    a suitable value for <A
HREF="msearch-cmdref-localcharset.html"
>LocalCharset</A
>. 
    If your document collection contains documents in several scripts,
    such as Greek, Russian and German, <TT
CLASS="literal"
>UTF-8</TT
> is
    usually the best choice for <A
HREF="msearch-cmdref-localcharset.html"
>LocalCharset</A
>.
    </P
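><P
>A minimal <TT CLASS="filename">indexer.conf</TT> sketch for such a mixed
    collection might look as follows. The comment lines are illustrative only:
    </P
><PRE
CLASS="programlisting"
># Store index data in UTF-8, so that Greek, Russian and German text
# can be kept without &#38;#nnn; sequences.
LocalCharset UTF-8
</PRE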
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="charset-searchdec"
>Character set conversion at search time</A
></H2
><P
>You can specify the <A
HREF="msearch-cmdref-browsercharset.html"
>BrowserCharset</A
>
    command to choose the character set which will be used to display
    search results.
    If <A
HREF="msearch-cmdref-browsercharset.html"
>BrowserCharset</A
> and <A
HREF="msearch-cmdref-localcharset.html"
>LocalCharset</A
>
    have different values, <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>
    will apply character set conversion. As at indexing time,
    if some characters cannot be converted to
    <A
HREF="msearch-cmdref-browsercharset.html"
>BrowserCharset</A
>, they will be displayed using
    <TT
CLASS="literal"
>&#38;#nnn;</TT
> notation.
    </P
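><P
>For example, a site that stores its index in UTF-8 but serves result
    pages in KOI8-R might use the pair of commands below. This is only a
    sketch: the character set values are arbitrary, and it assumes that both
    commands are placed in the configuration read by the search front-end
    and that <TT CLASS="literal">LocalCharset</TT> there matches the value
    used at indexing time.
    </P
><PRE
CLASS="programlisting"
># Character set in which the index data was stored
LocalCharset UTF-8
# Character set used to display search results
BrowserCharset koi8-r
</PRE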
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="charsetsalias"
>Character set aliases</A
></H2
><P
>Every character set is recognized by a number of aliases.
    Different web servers can report the same character set using different
    names. For example, ISO-8859-2, ISO8859-2 and latin2 are all names
    of the same character set. <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>
    understands the following character set name aliases:
    </P
><DIV
CLASS="table"
><A
NAME="AEN3706"
></A
><P
><B
>Table 9-2. Character set aliases</B
></P
><TABLE
BORDER="1"
CLASS="CALSTABLE"
><COL><COL><TBODY
><TR
><TD
>ISO-2022-JP:</TD
><TD
>ISO-2022-JP</TD
></TR
><TR
><TD
>ISO-8859-1:</TD
><TD
>&#13;              CP819, CSISOLATIN, IBM819, ISO-8859-1, ISO-IR-100, ISO_8859-1, ISO_8859-1:1987, L1, LATIN1
            </TD
></TR
><TR
><TD
>ISO-8859-10:</TD
><TD
>&#13;              CSISOLATIN6, ISO-8859-10, ISO-IR-157, ISO_8859-10, ISO_8859-10:1992, L6, LATIN6
            </TD
></TR
><TR
><TD
>ISO-8859-11:</TD
><TD
>&#13;              ISO-8859-11, TIS-620, TIS620, TACTIS
            </TD
></TR
><TR
><TD
>ISO-8859-13:</TD
><TD
>&#13;              ISO-8859-13, ISO-IR-179, ISO_8859-13, L7, LATIN7
            </TD
></TR
><TR
><TD
>ISO-8859-14:</TD
><TD
>&#13;              ISO-8859-14, ISO-IR-199, ISO_8859-14, ISO_8859-14:1998, L8, LATIN8
            </TD
></TR
><TR
><TD
>ISO-8859-15:</TD
><TD
>&#13;              ISO-8859-15, ISO-IR-203, ISO_8859-15, ISO_8859-15:1998
            </TD
></TR
><TR
><TD
>ISO-8859-16:</TD
><TD
>&#13;              ISO-8859-16, ISO-IR-226, ISO_8859-16, ISO_8859-16:2000
            </TD
></TR
><TR
><TD
>ISO-8859-2:</TD
><TD
>&#13;              CSISOLATIN2, ISO-8859-2, ISO-IR-101, ISO_8859-2, ISO_8859-2:1987, L2, LATIN2
            </TD
></TR
><TR
><TD
>ISO-8859-3:</TD
><TD
>&#13;              CSISOLATIN3, ISO-8859-3, ISO-IR-109, ISO_8859-3, ISO_8859-3:1988, L3, LATIN3
            </TD
></TR
><TR
><TD
>ISO-8859-4:</TD
><TD
>&#13;              CSISOLATIN4, ISO-8859-4, ISO-IR-110, ISO_8859-4, ISO_8859-4:1988, L4, LATIN4
            </TD
></TR
><TR
><TD
>ISO-8859-5:</TD
><TD
>CSISOLATINCYRILLIC, CYRILLIC, ISO-8859-5, ISO-IR-144, ISO_8859-5, ISO_8859-5:1988</TD
></TR
><TR
><TD
>ISO-8859-6:</TD
><TD
>&#13;              ARABIC, ASMO-708, CSISOLATINARABIC, ECMA-114, ISO-8859-6, ISO-IR-127, ISO_8859-6, ISO_8859-6:1987
            </TD
></TR
><TR
><TD
>ISO-8859-7:</TD
><TD
>&#13;              CSISOLATINGREEK, ECMA-118, ELOT_928, GREEK, GREEK8, ISO-8859-7, ISO-IR-126, ISO_8859-7, ISO_8859-7:1987
            </TD
></TR
><TR
><TD
>ISO-8859-8:</TD
><TD
>&#13;              CSISOLATINHEBREW, HEBREW, ISO-8859-8, ISO-IR-138, ISO_8859-8, ISO_8859-8:1988
            </TD
></TR
><TR
><TD
>ISO-8859-9:</TD
><TD
>&#13;              CSISOLATIN5, ISO-8859-9, ISO-IR-148, ISO_8859-9, ISO_8859-9:1989, L5, LATIN5
            </TD
></TR
><TR
><TD
>armscii-8:</TD
><TD
>ARMSCII-8, ARMSCII8</TD
></TR
><TR
><TD
>big5:</TD
><TD
>&#13;              BIG-5, BIG-FIVE, BIG5, BIGFIVE, CN-BIG5, CSBIG5
            </TD
></TR
><TR
><TD
>cp1250:</TD
><TD
>&#13;              CP1250, MS-EE, WINDOWS-1250
            </TD
></TR
><TR
><TD
>cp1251:</TD
><TD
>&#13;              CP1251, MS-CYRL, WINDOWS-1251
            </TD
></TR
><TR
><TD
>cp1252:</TD
><TD
>&#13;              CP1252, MS-ANSI, WINDOWS-1252
            </TD
></TR
><TR
><TD
>cp1253:</TD
><TD
>&#13;              CP1253, MS-GREEK, WINDOWS-1253
            </TD
></TR
><TR
><TD
>cp1254:</TD
><TD
>&#13;              CP1254, MS-TURK, WINDOWS-1254
            </TD
></TR
><TR
><TD
>cp1255:</TD
><TD
>&#13;              CP1255, MS-HEBR, WINDOWS-1255
            </TD
></TR
><TR
><TD
>cp1256:</TD
><TD
>&#13;              CP1256, MS-ARAB, WINDOWS-1256
            </TD
></TR
><TR
><TD
>cp1257:</TD
><TD
>&#13;              CP1257, WINBALTRIM, WINDOWS-1257
            </TD
></TR
><TR
><TD
>cp1258:</TD
><TD
>&#13;              CP1258, WINDOWS-1258
            </TD
></TR
><TR
><TD
>cp437:</TD
><TD
>&#13;              437, CP437, IBM437
            </TD
></TR
><TR
><TD
>cp850:</TD
><TD
>&#13;              850, CP850, CSPC850MULTILINGUAL, IBM850
            </TD
></TR
><TR
><TD
>cp852:</TD
><TD
>&#13;              852, CP852, IBM852
            </TD
></TR
><TR
><TD
>cp855:</TD
><TD
>&#13;              855, CP855, IBM855
            </TD
></TR
><TR
><TD
>cp857:</TD
><TD
>&#13;              857, CP857, IBM857
            </TD
></TR
><TR
><TD
>cp860:</TD
><TD
>&#13;              860, CP860, IBM860
            </TD
></TR
><TR
><TD
>cp861:</TD
><TD
>&#13;              861, CP861, IBM861
            </TD
></TR
><TR
><TD
>cp862:</TD
><TD
>&#13;              862, CP862, IBM862
            </TD
></TR
><TR
><TD
>cp863:</TD
><TD
>&#13;              863, CP863, IBM863
            </TD
></TR
><TR
><TD
>cp864:</TD
><TD
>&#13;              864, CP864, IBM864
            </TD
></TR
><TR
><TD
>cp865:</TD
><TD
>&#13;              865, CP865, IBM865
            </TD
></TR
><TR
><TD
>cp866:</TD
><TD
>&#13;              866, CP866, CSIBM866, IBM866
            </TD
></TR
><TR
><TD
>cp869:</TD
><TD
>&#13;              869, CP869, IBM869
            </TD
></TR
><TR
><TD
>cp874:</TD
><TD
>&#13;              CP874, WINDOWS-874
            </TD
></TR
><TR
><TD
>EUC-JP:</TD
><TD
>&#13;              CSEUCJP, EUC-JP, EUCJP, UJIS, X-EUC-JP
            </TD
></TR
><TR
><TD
>EUC-KR:</TD
><TD
>&#13;              CSEUCKR, EUC-KR, EUCKR
            </TD
></TR
><TR
><TD
>GB2312:</TD
><TD
>&#13;              CHINESE, CSGB2312, CSISO58GB231280, GB2312, GB_2312-80, ISO-IR-58
            </TD
></TR
><TR
><TD
>koi8-r:</TD
><TD
>&#13;              CSKOI8R, KOI8-R, KOI8R
            </TD
></TR
><TR
><TD
>koi8-u:</TD
><TD
>&#13;              KOI8-U, KOI8U
            </TD
></TR
><TR
><TD
>shift-JIS:</TD
><TD
>&#13;              CSSHIFTJIS, MS_KANJI, S-JIS, SHIFT-JIS, SHIFT_JIS, SJIS
            </TD
></TR
><TR
><TD
>cp367:</TD
><TD
>&#13;              ANSI_X3.4-1968, ASCII, CP367, CSASCII, IBM367, ISO-IR-6, ISO646-US, ISO_646.IRV:1991, US, US-ASCII
            </TD
></TR
><TR
><TD
>UTF8:</TD
><TD
>&#13;              UTF-8, UTF8
            </TD
></TR
><TR
><TD
>viscii:</TD
><TD
>&#13;              CSVISCII, VISCII, VISCII1.1-1
            </TD
></TR
><TR
><TD
>MacCyrillic:</TD
><TD
>&#13;              MACCYRILLIC, X-MAC-CYRILLIC
            </TD
></TR
><TR
><TD
>MacRoman:</TD
><TD
>&#13;             MACROMAN, MACINTOSH, CSMACINTOSH,  MAC
            </TD
></TR
><TR
><TD
>MacCentralEurope:</TD
><TD
>&#13;             MACCENTRALEUROPE, MACCE 
            </TD
></TR
></TBODY
></TABLE
></DIV
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="charsetdetect"
>Document character set detection</A
></H2
><P
><SPAN
CLASS="application"
>indexer</SPAN
> detects document
    character set in this order:</P
><P
></P
><OL
TYPE="1"
><LI
><P
>&#13;          <TT
CLASS="literal"
>Content-type: text/html; charset=xxx</TT
> - HTTP
        response header.
        </P
></LI
><LI
><P
>&#13;        <TT
CLASS="literal"
>&#60;META NAME="Content-Type" CONTENT="text/html; charset=xxx"&#62;</TT
>
        (for HTML documents) or
        </P
><P
>&#13;        <TT
CLASS="literal"
>&#60;?xml version="1.0" encoding="xxx"?&#62;</TT
>
        (for XML documents)
        </P
><P
>&#13;          <A
NAME="AEN3880"
></A
>
          <DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>
          Processing of the meta tags can be switched off by adding
          <B
CLASS="command"
>GuesserUseMeta no</B
>
          into <TT
CLASS="filename"
>indexer.conf</TT
>.
          </P
></BLOCKQUOTE
></DIV
>
        </P
></LI
><LI
><P
>The default value, according to the command
        <B
CLASS="command"
><A
HREF="msearch-cmdref-remotecharset.html"
>RemoteCharset</A
></B
>
        of the corresponding
          <B
CLASS="command"
><A
HREF="msearch-cmdref-server.html"
>Server</A
></B
> or
          <B
CLASS="command"
><A
HREF="msearch-cmdref-realm.html"
>Realm</A
></B
> command.
        </P
></LI
></OL
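><P
>The third step can be sketched in <TT CLASS="filename">indexer.conf</TT>
    as below. The host name and the character set are placeholders, and the
    sketch assumes that <TT CLASS="literal">RemoteCharset</TT> applies to the
    <TT CLASS="literal">Server</TT> commands that follow it:
    </P
><PRE
CLASS="programlisting"
># Used only when neither the HTTP headers nor the document name a charset
RemoteCharset koi8-r
Server http://www.example.com/

# Optional: ignore character set declarations in META tags
GuesserUseMeta no
</PRE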
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="charset-guesser"
>Automatic character set guesser</A
></H2
><P
>Starting with version 3.2.0, <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>
    has an automatic character set and language guesser. It currently
    recognizes more than 100 character set and language combinations.
    Charset and language detection is implemented using the
    <TT
CLASS="literal"
>"N-Gram-Based Text Categorization"</TT
> technique.
    There are a number of so-called <SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>language map</I
></SPAN
> files,
    one for every language-charset pair. They are installed under
    <TT
CLASS="filename"
>/usr/local/mnogosearch/etc/langmap/</TT
> directory by
    default. Look in this directory for the list of
    currently provided character set and language pairs.
    </P
><DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>
    The character set and language guesser works well for texts longer
    than 500 characters. Shorter texts may be guessed less reliably.
    </P
></BLOCKQUOTE
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="mguesser"
>Building your own language maps</A
></H3
><P
>&#13;      To build your own language map, use
      the <A
NAME="AEN3907"
></A
>
      <SPAN
CLASS="application"
>mguesser</SPAN
> utility. In addition,
      you'll need a set of text files with the sample texts
      (the models) for the desired language and character set.
      To create a new language map, run the following command:
<PRE
CLASS="programlisting"
>&#13;mguesser -p -c charset -l language &#60; FILENAME &#62; language.charset.lm
</PRE
>
      </P
><P
>&#13;      You can also use <TT
CLASS="literal"
>mguesser</TT
> to
      guess language and character set for a document
      using the existing language maps. Try the following command:
<PRE
CLASS="programlisting"
>&#13;mguesser [-n maxhits] &#60; FILENAME
</PRE
>
      </P
><P
>&#13;      You may want to create map files for different character sets
      for the same language. To convert a model file between character
      sets supported by  <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>,
      use the <A
NAME="AEN3916"
></A
><TT
CLASS="literal"
>mconv</TT
>
      utility, which is part of the <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>
      distribution.
<PRE
CLASS="programlisting"
>&#13;mconv [OPTIONS] -f charset_from -t charset_to [configfile] &#60; infile &#62; outfile
</PRE
>
      </P
><P
>&#13;      By default, both <TT
CLASS="literal"
>mguesser</TT
> and <TT
CLASS="literal"
>mconv</TT
>
      utilities are installed into the
      <TT
CLASS="filename"
>/usr/local/mnogosearch/sbin/</TT
> directory.
      </P
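><P
>Putting these commands together, the following sketch builds Russian
    language maps for two character sets from one sample text. The file names,
    the language code <TT CLASS="literal">ru</TT> and the choice of
    <TT CLASS="literal">koi8-r</TT> and <TT CLASS="literal">cp1251</TT> are
    illustrative assumptions; only the options shown in the synopses above
    are used:
    </P
><PRE
CLASS="programlisting"
># Build a map from a KOI8-R sample text
mguesser -p -c koi8-r -l ru &#60; sample.koi8-r.txt &#62; ru.koi8-r.lm

# Convert the same sample to cp1251 and build a second map
mconv -f koi8-r -t cp1251 &#60; sample.koi8-r.txt &#62; sample.cp1251.txt
mguesser -p -c cp1251 -l ru &#60; sample.cp1251.txt &#62; ru.cp1251.lm

# Check which language and character set the guesser reports for a file
mguesser &#60; sample.cp1251.txt
</PRE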
></DIV
><P
>&#13;    <A
NAME="AEN3926"
></A
>
    Starting with version 3.2.14, <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>
    can update the existing language and character set maps
    automatically during indexing, if the remote server supplies
    pages with a correctly specified language and character set.
    To enable this feature, specify the command
<PRE
CLASS="programlisting"
>&#13;LangMapUpdate yes
</PRE
>
in your <TT
CLASS="filename"
>indexer.conf</TT
>.
    </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="defcharset"
>The default character set
      <A
NAME="AEN3934"
></A
></A
></H2
><P
>&#13;      Use the <A
HREF="msearch-cmdref-remotecharset.html"
>RemoteCharset</A
>
      <TT
CLASS="filename"
>indexer.conf</TT
> command
      to choose the default character set of the sites you index.
    </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="deflang"
>The default language
      <A
NAME="AEN3942"
></A
></A
></H2
><P
>You can also set the default language for
      the sites you index using the
      <A
HREF="msearch-cmdref-defaultlang.html"
>DefaultLang</A
>
      <TT
CLASS="filename"
>indexer.conf</TT
> command.
      <DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>
      You can restrict search results to a specific
      language by using the <TT
CLASS="literal"
>g</TT
> query string variable.
      See <B
CLASS="command"
><A
HREF="msearch-doingsearch.html#search-params"
>the Section called <I
>Search parameters
    <A
NAME="AEN4282"
></A
></I
> in Chapter 10</A
></B
>
      for details.
      </P
></BLOCKQUOTE
></DIV
>
    </P
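><P
>A short sketch of both settings follows. The language code, the host name
    and the <TT CLASS="filename">search.cgi</TT> path are placeholders;
    <TT CLASS="literal">q</TT> is assumed to be the query-words variable, and
    <TT CLASS="literal">DefaultLang</TT> is assumed to apply to the
    <TT CLASS="literal">Server</TT> commands that follow it:
    </P
><PRE
CLASS="programlisting"
># indexer.conf: default language for the sites indexed below
DefaultLang en
Server http://www.example.com/
</PRE
><PRE
CLASS="programlisting"
>http://www.example.com/cgi-bin/search.cgi?q=charset&#38;g=en
</PRE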
></DIV
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="msearch-tags.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="msearch-multilang.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Tags</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Search pages with multi-lingual interface
    <A
NAME="AEN3955"
></A
></TD
></TR
></TABLE
></DIV
><!--#include virtual="body-after.html"--></BODY
></HTML
>