README.htdig
- WWWOFFLE - World Wide Web Offline Explorer - Version 2.4c
- =========================================================
- The program ht://Dig is a free (GPL) internet indexing and search program. The
- ht://Dig documentation describes the program as follows:
- The ht://Dig system is a complete world wide web indexing and
- searching system for a small domain or intranet. This system
- is *not* meant to replace the need for powerful internet-wide
- search systems like Lycos, Infoseek, Webcrawler and AltaVista.
- Instead it is meant to cover the search needs for a single
- company, campus, or even a particular sub section of a web site.
- As opposed to some WAIS-based or web-server based search
- engines, ht://Dig can span several web servers at a site. The
- type of these different web servers doesn't matter as long as
- they understand the HTTP 1.0 protocol.
- ht://Dig was developed at San Diego State University as a way
- to search the various web servers on the campus network.
- I have configured ht://Dig to work with WWWOFFLE so that the entire cache of
- pages can be indexed. This document describes the three stages of using the
- program: installation, digging and searching.
- Installing ht://Dig
- -------------------
- Note: If you already have version 3.1.0b3 or later of htdig installed and
- working then you can skip this section.
- To be able to use this program it must be installed. The instructions below
- give a step-by-step guide to this process.
- 1) Get the ht://Dig source code
- Download the source for version 3.1.0b4 of the program
- http://www.htdig.org/files/htdig-3.1.0b4.tar.gz
- 2) Unpack the source code
- Use
- tar -xvzf htdig-3.1.0b4.tar.gz
- to create the directory htdig-3.1.0b4 containing the program source files.
- 3) Configure the ht://Dig program
- Move to the htdig-3.1.0b4 directory and run the configuration program
- cd htdig-3.1.0b4
- ./configure
- 4) Compile ht://Dig
- Run make to compile htdig
- make
- make install
- This will compile and install it. If there are any problems at this stage,
- consult the ht://Dig documentation to solve them.
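Steps 1 to 4 can be collected into one small shell sketch. The variable
HTDIG_VERSION and the function names are illustrative and are not part of
ht://Dig or WWWOFFLE. The sketch assumes that the tarball has already been
downloaded into the current directory and that you have permission to run
'make install'.

```shell
#!/bin/sh
# Illustrative sketch of steps 1-4; the names below are hypothetical helpers.
HTDIG_VERSION="3.1.0b4"

# Print the tarball name for a given version.
htdig_tarball() {
    printf 'htdig-%s.tar.gz\n' "$1"
}

# Print the download URL for a given version (same location as in step 1).
htdig_url() {
    printf 'http://www.htdig.org/files/%s\n' "$(htdig_tarball "$1")"
}

# Unpack, configure, compile and install; stops at the first failing step.
# Runs in a subshell so the caller's working directory is left unchanged.
build_htdig() {
    (
        tar -xzf "$(htdig_tarball "$1")" &&
        cd "htdig-$1" &&
        ./configure &&
        make &&
        make install
    )
}
```

After fetching the file named by htdig_url "$HTDIG_VERSION", running
build_htdig "$HTDIG_VERSION" performs steps 2 to 4 in order.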
- Configure WWWOFFLE to run with ht://Dig
- ---------------------------------------
- When WWWOFFLE was installed, the configuration files for the htdig programs as
- used with WWWOFFLE will have been installed in
- /var/spool/wwwoffle/html/htdig/conf and the scripts used to run the htdig
- programs will have been installed in /var/spool/wwwoffle/html/htdig/scripts.
- These files should be correct if the information in the WWWOFFLE Makefile
- (LOCALHOST and SPOOLDIR) was set correctly. Check that they have the spool
- directory and the proxy hostname and port set correctly.
- Also check that the ht://Dig programs are on the path (you can edit the PATH
- variable in these files if they are not in /usr/local/bin). The merging
- process can use a lot of disk space when the sort program is run; you can
- change the location of the temporary directory used for this with the TMPDIR
- variable.
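One way to check the path before running anything is sketched below. The
function missing_programs is a hypothetical helper, not something shipped with
WWWOFFLE, and the commented-out TMPDIR value is only an example location.

```shell
#!/bin/sh
# Illustrative check; missing_programs is a hypothetical helper.

# Make sure the default ht://Dig install location is searched.
PATH="/usr/local/bin:$PATH"
export PATH

# Example only: point TMPDIR at a partition with plenty of free space.
# TMPDIR=/var/tmp; export TMPDIR

# Print the name of each given program that cannot be found on the PATH.
missing_programs() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || printf '%s\n' "$prog"
    done
    return 0
}
```

For example, missing_programs htdig htmerge htfuzzy prints nothing when all
three programs can be found, and otherwise lists the ones that are missing.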
- The Fuzzy Database
- ------------------
- The ht://Dig programs use a database of fuzzy word endings and synonyms. This
- needs to be created just once, using a script that is provided with WWWOFFLE:
- /var/spool/wwwoffle/html/htdig/scripts/wwwoffle-htfuzzy
- If you have an existing ht://Dig installation then this step will probably have
- already been performed and is not required again.
- Note: This will take a *long* time to run since it produces two databases
- that htsearch uses to help in matching words.
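Since the databases only need to be built once, a guard like the following can
avoid repeating the long run by accident. This is a sketch only: run_once and
the stamp file are hypothetical and are not part of WWWOFFLE or ht://Dig.

```shell
#!/bin/sh
# Illustrative sketch; run_once and the stamp file are hypothetical.
# Usage: run_once STAMP_FILE COMMAND [ARGS...]
# Runs COMMAND only if STAMP_FILE does not exist, and creates the stamp
# file when the command succeeds.
run_once() {
    stamp=$1
    shift
    if [ -e "$stamp" ]; then
        return 0
    fi
    "$@" && : > "$stamp"
}
```

It could be called as run_once STAMP_FILE
/var/spool/wwwoffle/html/htdig/scripts/wwwoffle-htfuzzy, where STAMP_FILE is
any writable path of your choosing.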
- Digging and Merging
- -------------------
- Digging is the name that is given to the process of searching through the
- web-pages to make the list of words. Merging is the process of converting the
- raw list of words into a database that can be searched.
- The ht://Dig installation will include a script called 'rundig' that
- demonstrates how digging and merging are supposed to work. To work with
- WWWOFFLE I have produced my own scripts that should be used instead.
- /var/spool/wwwoffle/html/htdig/scripts/wwwoffle-htdig-full
- /var/spool/wwwoffle/html/htdig/scripts/wwwoffle-htdig-incr
- /var/spool/wwwoffle/html/htdig/scripts/wwwoffle-htdig-lasttime
- The first of these scripts will do a full search and index all of the URLs in
- the cache. The second one will do an incremental search and will only index
- those that have changed since the last full search was done. The third will
- add the files in the lasttime index to the database.
- Note: The lasttime index requires the use of htdig version 3.1.0 or later.
- Unfortunately, because of the way that the htmerge program works, it will take
- almost as long to do an incremental search or a lasttime search as to do a
- full search. The only difference is that for the incremental search and
- lasttime search the WWWOFFLE cache is only accessed for the files that have
- changed.
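The choice between the three scripts can be wrapped in a small helper, sketched
below. SCRIPT_DIR, dig_script and run_dig are hypothetical names; only the
script paths themselves come from the WWWOFFLE installation described above.

```shell
#!/bin/sh
# Illustrative wrapper; SCRIPT_DIR, dig_script and run_dig are hypothetical.
SCRIPT_DIR="/var/spool/wwwoffle/html/htdig/scripts"

# Print the path of the script for a mode: full, incr or lasttime.
dig_script() {
    case "$1" in
        full|incr|lasttime)
            printf '%s/wwwoffle-htdig-%s\n' "$SCRIPT_DIR" "$1"
            ;;
        *)
            echo "unknown mode: $1 (expected full, incr or lasttime)" >&2
            return 1
            ;;
    esac
}

# Run the script for the given mode.
run_dig() {
    script=$(dig_script "$1") || return 1
    "$script"
}
```

For example, run_dig full starts a full dig and merge, while run_dig incr only
fetches the changed files from the cache before merging.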
- Searching
- ---------
- The search page for ht://Dig is located at http://localhost:8080/htdig/ and is
- linked to from the "Welcome Page". The word or words that you want to search
- for should be entered here.
- This form actually calls the script
- /var/spool/wwwoffle/html/htdig/scripts/wwwoffle-htsearch
- to do the searching, so you can edit this script if the search needs to be
- changed.
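To confirm from the command line that the search page is being served, a check
along these lines may help. The function names are hypothetical, the defaults
are the host and port given above, and the sketch assumes that curl is
installed.

```shell
#!/bin/sh
# Illustrative sketch; the function names are hypothetical.

# Print the URL of the search page for a proxy host and port.
# Defaults to localhost and 8080 when no arguments are given.
search_page_url() {
    printf 'http://%s:%s/htdig/\n' "${1:-localhost}" "${2:-8080}"
}

# Succeed if the search page can be fetched (assumes curl is installed).
search_page_ok() {
    curl --silent --fail --output /dev/null "$(search_page_url "$@")"
}
```

With WWWOFFLE running, search_page_ok returns success when the page at
http://localhost:8080/htdig/ can be fetched.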
- Thanks to
- ---------
- I would like to thank the htdig maintainer (Geoffrey.R.Hutchison@williams.edu)
- for the help that he has provided to get me started with htdig and the patches
- and comments that he has accepted from me into the htdig program.
- Andrew Bishop
- 10th Jan 1999