# ----- Example 9 - Filtering PDF with "prog" -------
#
#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at http://swish-e.org
#
#
#  This example demonstrates how to use swish's
#  "prog" document source feature to filter documents.
#
#  The "prog" document source feature allows
#  an external program to feed documents to
#  swish, one after another.  This allows you
#  to index documents from any source (e.g. web, DBMS)
#  and to filter and adjust the content before swish
#  indexes it.
#
#  Using the "prog" method to filter documents requires more
#  work to set up than using the "filters" described in
#  example8.config because you must write a program to retrieve
#  the documents and feed them to swish.
#
#  On the other hand, the "prog" method should be faster than the
#  filter method in example8.config because swish doesn't need to fork
#  itself and run an external program for each document to filter.
#  This can be significant if you are using a Perl script as a filter, since
#  the Perl script must be compiled each time it is run.  The "prog" method
#  avoids that overhead.
#
#  This example uses the example9.pl program.  This program
#  is very similar to the included DirTree.pl program found in
#  the prog-bin directory.  This program simply reads files from the
#  file system and passes their content on to swish if they are the correct
#  type.  PDF files are converted by the prog-bin/pdf2xml.pm module.
#
#  The PDF info fields (e.g. author) are placed in xml tags,
#  which allows indexing the PDF info as MetaNames.
#  By specifying metanames you can limit searches by this PDF info.
#
#  For this example, you will need the xpdf package.
#  Type "perldoc pdf2xml" from the prog-bin directory for
#  more information.
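#
#  As a rough sketch of what "feeding documents to swish" means
#  (this is not copied from example9.pl; check the swish-e
#  documentation for the authoritative description), a "prog"
#  program writes each document to standard output as a few
#  header lines, a blank line, and then the content.  The path
#  and the bracketed values below are placeholders only:
#
#     Path-Name: ./some/dir/document.pdf
#     Content-Length: [number of bytes of content that follow]
#     Last-Mtime: [file modification time, seconds since the epoch]
#
#     [xml generated from the PDF by pdf2xml]
#
#  The next document's headers follow immediately after the
#  content of the previous one.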
#
#  Run this example as:
#
#     swish-e -S prog -c example9.config
#
#---------------------------------------------------

# Include our site-wide configuration settings:

IncludeConfigFile example4.config

# Define the program to run

IndexDir ./example9.pl

# Pass in the top-level directory to index
# (here we specify the current directory)

SwishProgParameters .

# Swish can index a number of different types of documents.
# .config files are text, and .pdf files are converted (filtered) to xml:

IndexContents TXT .config
IndexContents XML .pdf

# Since the pdf2xml module generates xml for the PDF info fields and
# for the PDF content, let's use MetaNames.
# Instead of specifying each metaname, let's let swish do it automatically.

UndefinedMetaTags auto

# Show what's happening

IndexReport 3

# end of example