<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <HTML ><HEAD ><TITLE >Cached copies </TITLE ><META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK REL="HOME" TITLE="mnoGoSearch 3.3.8 reference manual" HREF="index.html"><LINK REL="UP" TITLE="Indexing" HREF="msearch-indexing.html"><LINK REL="PREVIOUS" TITLE="Disabling Apache logging" HREF="msearch-itips.html"><LINK REL="NEXT" TITLE="Extended indexing features" HREF="msearch-extended-indexing.html"><LINK REL="STYLESHEET" TYPE="text/css" HREF="mnogo.css"><META NAME="Description" CONTENT="mnoGoSearch - Full Featured Web site Open Source Search Engine Software over the Internet and Intranet Web Sites Based on SQL Database. It is a Free search software covered by GNU license."><META NAME="Keywords" CONTENT="shareware, freeware, download, internet, unix, utilities, search engine, text retrieval, knowledge retrieval, text search, information retrieval, database search, mining, intranet, webserver, index, spider, filesearch, meta, free, open source, full-text, udmsearch, website, find, opensource, search, searching, software, udmsearch, engine, indexing, system, web, ftp, http, cgi, php, SQL, MySQL, database, php3, FreeBSD, Linux, Unix, mnoGoSearch, MacOS X, Mac OS X, Windows, 2000, NT, 95, 98, GNU, GPL, url, grabbing"></HEAD ><BODY CLASS="sect1" BGCOLOR="#EEEEEE" TEXT="#000000" LINK="#000080" VLINK="#800080" ALINK="#FF0000" ><!--#include virtual="body-before.html"--><DIV CLASS="NAVHEADER" ><TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TH COLSPAN="3" ALIGN="center" ><SPAN CLASS="application" >mnoGoSearch</SPAN > 3.3.8 reference manual: Full-featured search engine software</TH ></TR ><TR ><TD WIDTH="10%" ALIGN="left" VALIGN="bottom" ><A HREF="msearch-itips.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="80%" ALIGN="center" VALIGN="bottom" >Chapter 3. 
Indexing</TD ><TD WIDTH="10%" ALIGN="right" VALIGN="bottom" ><A HREF="msearch-extended-indexing.html" ACCESSKEY="N" >Next</A ></TD ></TR ></TABLE ><HR ALIGN="LEFT" WIDTH="100%"></DIV ><DIV CLASS="sect1" ><H1 CLASS="sect1" ><A NAME="stored" >Cached copies <A NAME="AEN2001" ></A ></A ></H1 ><P > Starting with version 3.2.2, <SPAN CLASS="application" >mnoGoSearch</SPAN > can store compressed copies of the indexed documents, so-called <SPAN CLASS="emphasis" ><I CLASS="emphasis" >cached copies</I ></SPAN >. Cached copies are stored in the same <ACRONYM CLASS="acronym" >SQL</ACRONYM > database. </P ><P > <SPAN CLASS="application" >search.cgi</SPAN > uses cached copies for two purposes: <P ></P ><OL TYPE="1" ><LI ><P > To display smart excerpts from every found document, showing the search query words in their context. </P ></LI ><LI ><P > To display the entire original copy of the document, with the search words highlighted. <DIV CLASS="note" ><BLOCKQUOTE CLASS="note" ><P ><B >Note: </B > A cached copy is opened in the browser when the user clicks the <TT CLASS="literal" >Display cached copy</TT > link shown next to each document in the search results. </P ></BLOCKQUOTE ></DIV > Viewing a cached copy can be especially useful when the original site is temporarily down or the document no longer exists. </P ></LI ></OL > </P ><P > Cached copies are displayed by <SPAN CLASS="application" >search.cgi</SPAN > when it is executed with a special <ACRONYM CLASS="acronym" >HTTP</ACRONYM > query string parameter. <SPAN CLASS="application" >search.cgi</SPAN > fetches a cached copy of the document from the <ACRONYM CLASS="acronym" >SQL</ACRONYM > database, decompresses it, and sends it to your web browser with the search keywords highlighted. 
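The fetch, decompress and highlight sequence described above can be pictured with a short Python sketch. It is an illustration only, not the code of <SPAN CLASS="application" >search.cgi</SPAN >: the function names, the bracket markers and the use of a plain <TT CLASS="literal" >zlib</TT > stream are assumptions made for this example, and the <ACRONYM CLASS="acronym" >SQL</ACRONYM > round trip is left out. <PRE CLASS="programlisting" >
import re
import zlib


def store_cached_copy(document):
    # Hypothetical helper: compress the document text with zlib,
    # standing in for the copy that would be saved in the database.
    return zlib.compress(document.encode("utf-8"))


def show_cached_copy(blob, query_words, hl_beg="[", hl_end="]"):
    # Hypothetical helper: decompress the stored copy and wrap every
    # whole-word, case-insensitive match of a query word in the markers.
    text = zlib.decompress(blob).decode("utf-8")
    if not query_words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in query_words) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: hl_beg + m.group(0) + hl_end, text)


blob = store_cached_copy("Cached copies are stored in the SQL database.")
print(show_cached_copy(blob, ["cached", "sql"]))
</PRE > The markers are passed as parameters so that the same routine could be used with any pair of highlighting strings. 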
</P ><P > To enable cached copy support, compile <SPAN CLASS="application" >mnoGoSearch</SPAN > with <TT CLASS="literal" >zlib</TT > support: <PRE CLASS="programlisting" > ./configure --with-zlib &lt;other arguments&gt; </PRE > </P ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="stored-start" >Configuring cached copies</A ></H2 ><P > Collecting cached copies is enabled in the default version of <TT CLASS="filename" >indexer.conf</TT > by this line: <PRE CLASS="programlisting" > Section CachedCopy 0 64000 </PRE > </P ><P > The number <TT CLASS="literal" >64000</TT > is the maximum allowed cached copy size. When crawling, <SPAN CLASS="application" >indexer</SPAN > stores a cached copy only if its compressed size is smaller than this maximum. You can change this number according to your needs and the capabilities of your <ACRONYM CLASS="acronym" >SQL</ACRONYM > database. <DIV CLASS="note" ><BLOCKQUOTE CLASS="note" ><P ><B >Note: </B > Storing very large cached copies can degrade search performance. </P ></BLOCKQUOTE ></DIV > </P ><P > To disable collecting cached copies, open <TT CLASS="filename" >indexer.conf</TT > in your favorite text editor and delete the <TT CLASS="literal" >Section CachedCopy</TT > line. Disabling cached copies saves disk space, but search results are then presented less well than with cached copies enabled. </P ></DIV ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="stored-search" >Using cached copies at search time</A ></H2 ><P > Displaying cached copies is enabled in the default search result template <TT CLASS="filename" >search.htm-dist</TT >. 
To check whether your template enables displaying cached copies, open the template in a text editor and make sure that the <TT CLASS="literal" >&lt;!--res--&gt;</TT > section contains this <ACRONYM CLASS="acronym" >HTML</ACRONYM > code: <PRE CLASS="programlisting" > &lt;A HREF="$(stored_href)"&gt;Display cached copy&lt;/A&gt; </PRE > </P ><P > When the default search template is used, <SPAN CLASS="application" >search.cgi</SPAN > refers to itself: when you follow the <TT CLASS="literal" >Display cached copy</TT > link in your browser, you open <SPAN CLASS="application" >search.cgi</SPAN > again, this time with special query string parameters that tell it to display a cached copy rather than search results. </P ><P >Once cached copies have been configured, the following steps take place at search time:</P ><P ></P ><OL TYPE="1" ><LI ><P > For each document, a link to its cached copy is displayed. </P ></LI ><LI ><P >When the user clicks the link, <SPAN CLASS="application" >search.cgi</SPAN > is executed. It sends a query to the <ACRONYM CLASS="acronym" >SQL</ACRONYM > database and fetches the cached copy content. </P ></LI ><LI ><P > <SPAN CLASS="application" >search.cgi</SPAN > decompresses the requested cached copy and sends it to the web browser, highlighting the search keywords with the method given by the <A HREF="msearch-cmdref-hlbeg.html" >HlBeg</A > and <A HREF="msearch-cmdref-hlend.html" >HlEnd</A > commands. </P ></LI ></OL ></DIV ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="stored-distributed" >Moving cached copies to another machine</A ></H2 ><P > You can optionally specify an alternative <ACRONYM CLASS="acronym" >URL</ACRONYM > for the <TT CLASS="literal" >Display cached copy</TT > links, so that cached copies are served from another location on the same server, or even from another physical server. 
For example: <PRE CLASS="programlisting" > &lt;A HREF="http://site2/cgi-bin/search.cgi?$(stored_href)"&gt;Display cached copy&lt;/A&gt; </PRE > Moving cached copies to another server can help distribute <ACRONYM CLASS="acronym" >CPU</ACRONYM > load between machines. <DIV CLASS="note" ><BLOCKQUOTE CLASS="note" ><P ><B >Note: </B > <SPAN CLASS="application" >mnoGoSearch</SPAN > must be installed on the machine <TT CLASS="literal" >site2</TT >. </P ></BLOCKQUOTE ></DIV > </P ></DIV ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="stored-remote" >Using the original document as a cached copy source</A ></H2 ><P > Starting with version <TT CLASS="literal" >3.3.8</TT >, <SPAN CLASS="application" >mnoGoSearch</SPAN > understands the <A HREF="msearch-cmdref-uselocalcachedcopy.html" >UseLocalCachedCopy</A > command in <TT CLASS="filename" >search.htm</TT >, which forces documents to be downloaded from their original locations both when generating <SPAN CLASS="emphasis" ><I CLASS="emphasis" >smart excerpts</I ></SPAN > for search results and when generating the "<SPAN CLASS="emphasis" ><I CLASS="emphasis" >Cached Copy</I ></SPAN >" documents. This command is useful when you index documents residing on your local file system: it avoids storing cached copies in the database, which makes the database smaller. 
</P ></DIV ></DIV ><DIV CLASS="NAVFOOTER" ><HR ALIGN="LEFT" WIDTH="100%"><TABLE SUMMARY="Footer navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" ><A HREF="msearch-itips.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="index.html" ACCESSKEY="H" >Home</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" ><A HREF="msearch-extended-indexing.html" ACCESSKEY="N" >Next</A ></TD ></TR ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" >Disabling Apache logging</TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="msearch-indexing.html" ACCESSKEY="U" >Up</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" >Extended indexing features</TD ></TR ></TABLE ></DIV ><!--#include virtual="body-after.html"--></BODY ></HTML >