wwwoffle.conf.man
Uploaded by: seven77cht
Upload date: 2007-01-04
Resource size: 486k
File size: 22k
- .\" $Header: /home/amb/wwwoffle/RCS/wwwoffle.conf.man 2.50 1999/09/08 18:35:06 amb Exp $
- .\"
- .\" WWWOFFLE - World Wide Web Offline Explorer - Version 2.5.
- .\"
- .\" Manual page for wwwoffle.conf
- .\"
- .\" Written by Andrew M. Bishop
- .\"
- .\" This file Copyright 1997,98,99 Andrew M. Bishop
- .\" It may be distributed under the GNU Public License, version 2, or
- .\" any higher version. See section COPYING of the GNU Public license
- .\" for conditions under which this file may be redistributed.
- .\"
- .TH wwwoffle.conf 5 "August 23rd, 1999"
- .SH NAME
- wwwoffle.conf - The configuration file for the proxy server for the World Wide Web Offline Explorer.
- .SH DESCRIPTION
- The
- .I wwwoffle.conf
- file contains the configuration for the wwwoffled proxy HTTP server part of the
- .I
- World Wide Web Offline Explorer
- program.
- .LP
- The file is split into sections, each of which can be empty or contain one or
- more lines of configuration information. The sections are named and the order
- that they appear in the file is not important.
- .LP
- The general format of each of the sections is the same. The name of the section
- is on a line by itself to mark the start. The contents of the section are
- enclosed between a pair of lines containing the '{' and '}' characters or '['
- and ']' characters. When the '{' and '}' characters are used, the lines between
- them contain the configuration information. When the '[' and ']' characters are
- used, there must be only a single non-empty line between them, containing the
- name of a file (in the same directory) that holds the configuration information.
- .LP
- Comments are lines that start with a '#' character; they are ignored, as are
- blank lines.
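- .LP
- The fragment below illustrates the two forms. One section is given inline and
- another is read from a separate file. It is only a sketch: the file name
- wwwoffle.DontGet.conf is an assumed example, and that file is presumed to hold
- just the entries of the DontGet section, one per line.
- .LP
- .nf
- # Inline form: the entries appear between '{' and '}'.
- LocalHost
- {
- localhost
- 127.0.0.1
- }
- # Include form: the single line names a file in the same directory.
- DontGet
- [
- wwwoffle.DontGet.conf
- ]
- .fi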
- .LP
- The
- .B StartUp
- section can contain the following:
- .TP
- .B http-port = <port>
- The port number to use on the local host as the HTTP proxy (default=8080).
- .TP
- .B wwwoffle-port = <port>
- The port number to use on the local host as the WWWOFFLE control port
- (default=8081).
- .TP
- .B spool-dir = <dirname>
- The name of the spool directory to use for the cache. A subdirectory is created
- in this for each new web server that is contacted (default=/var/spool/wwwoffle).
- .TP
- .B run-uid = <username> | <uid> | none |
- The username or numeric uid to run the WWWOFFLE server as. To use this
- option, the program must be started by root.
- .TP
- .B run-gid = <groupname> | <gid> | none |
- The group name or numeric gid to run the WWWOFFLE server as. To use this
- option, the program must be started by root.
- .TP
- .B use-syslog = yes | no
- Whether to use the syslog facility to log the important error messages
- (default=yes).
- .TP
- .B password = <word> | none |
- The authorisation password that is required to use the wwwoffle program to
- configure the server or use the interactive control page (default=none). If
- this is not present, or set to an empty string or 'none' then there is no
- password required. If there is a password set, then the -c option to wwwoffle
- must be used, and the file wwwoffle.conf should be made readable only by
- authorised users.
- .TP
- .B max-servers = <integer>
- .B max-fetch-servers = <integer>
- .I max-servers
- is the maximum number of server processes that are started (default=8), and
- .I max-fetch-servers
- is the maximum number of server processes that are forked to fetch pages that
- were requested in offline mode (default=4). The
- .I max-fetch-servers
- value must be less than
- .I max-servers
- or you will not be able to use WWWOFFLE interactively online while fetching.
- .TP
- .B dir-perm = <octal integer>
- The permissions to use when creating spool directories (default=0755). This
- overrides the umask value and must be given in octal, starting with a '0' character.
- .TP
- .B file-perm = <octal integer>
- The permissions to use when creating spool files (default=0644). This overrides
- the umask value and must be given in octal, starting with a '0' character.
- .TP
- .B run-online = <filename>
- The name of a program to run when switched to online mode (default=none).
- .TP
- .B run-offline = <filename>
- The name of a program to run when switched to offline mode (default=none).
- .TP
- .B run-autodial = <filename>
- The name of a program to run when switched to autodial mode (default=none). The
- programs run using the
- .I run-online,
- .I run-offline
- and
- .I run-autodial
- options are started with a single parameter set to the current mode.
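- .LP
- The fragment below is a sketch of a StartUp section that uses some of these
- options. The user and group name wwwoffle is an assumed, hypothetical account,
- and it only takes effect if the server is started by root. The value of
- max-fetch-servers is kept below max-servers, as described above.
- .LP
- .nf
- StartUp
- {
- http-port = 8080
- wwwoffle-port = 8081
- spool-dir = /var/spool/wwwoffle
- # Hypothetical unprivileged account; requires starting the server as root.
- run-uid = wwwoffle
- run-gid = wwwoffle
- max-servers = 8
- max-fetch-servers = 4
- # Octal values with a leading '0'.
- dir-perm = 0755
- file-perm = 0644
- }
- .fi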
- .LP
- The
- .B Options
- section contains other options that configure the server.
- .TP
- .B log-level = debug | info | important | warning | fatal
- The error messages that have a priority the same as that specified or greater
- are recorded on the output, either syslog or stderr (see wwwoffled(8)).
- .TP
- .B index-latest-days = <age>
- The maximum age in days of pages to show in the index of the latest pages
- (default=7).
- .TP
- .B request-changed = <time>
- While online, pages will only be fetched if the cached version is older than
- the specified time in seconds (default=600). A negative value will force the
- cached version to always be used in preference.
- .TP
- .B request-changed-once = yes | no
- While online, pages will only be fetched if the cached version has not already
- been fetched once during this session (default=yes). This option takes
- precedence over the request-changed option.
- .TP
- .B request-expired = yes | no
- While online, pages that have expired will always be requested again
- (default=no). This option takes precedence over the request-changed and
- request-changed-once options.
- .TP
- .B request-no-cache = yes | no
- While online, pages that ask not to be cached will always be requested again
- (default=no). This option takes precedence over the request-changed and
- request-changed-once options.
- .TP
- .B pragma-no-cache = yes | no
- Whether to request a new copy of a page if the request from the browser has
- "Pragma: no-cache" (default=yes). This option should be set to 'no' if, when
- browsing offline, all pages are re-requested by a 'broken' browser.
- .TP
- .B confirm-requests = yes | no
- Whether to return a page requiring user confirmation instead of automatically
- recording requests made while offline (default=no).
- .TP
- .B connect-timeout = <time>
- The time in seconds that WWWOFFLE will wait for a socket connection to be
- established (default=30).
- .TP
- .B socket-timeout = <time>
- The time in seconds that WWWOFFLE will wait for data to arrive on a socket
- connection before timing out and giving an error (default=120 seconds).
- .TP
- .B connect-retry = yes | no
- If a connection cannot be made to a remote server then try again after a short
- delay (default=no).
- .TP
- .B ssl-allow-port = <integer>
- A port number that can be used for Secure Socket Layer (SSL) connections,
- e.g. https (default=none, for https use 443). There can be more than one of
- these entries to allow other ports.
- .TP
- .B no-lasttime-index = yes | no
- Disables creation of the lasttime/prevtime indexes (default=no).
- .TP
- .B intr-download-keep = yes | no
- Whether to keep the part of the page that has already been downloaded if the
- browser closes the connection while online (default=no).
- .TP
- .B intr-download-size = <integer>
- If the browser closes the connection while online, the page continues to
- download if it is smaller than this size in kB (default=1).
- .TP
- .B intr-download-percent = <integer>
- If the browser closes the connection while online, the page continues to
- download if more than this percentage of it has already been downloaded
- (default=80).
- .TP
- .B timeout-download-keep = yes | no
- Whether to keep the partially downloaded page if the server connection times
- out while reading (default=no).
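- .LP
- For example, the following Options section re-fetches a cached page while
- online only if it is more than one hour old. It logs warnings and more important
- messages, and it allows SSL connections on port 443 and on an assumed second
- port 8443. request-changed-once is set to no so that, as described above, it
- does not take precedence over request-changed. The values are illustrative, not
- recommendations.
- .LP
- .nf
- Options
- {
- log-level = warning
- # Refetch pages older than 3600 seconds (one hour) while online.
- request-changed = 3600
- request-changed-once = no
- connect-timeout = 30
- socket-timeout = 120
- # One ssl-allow-port entry per permitted port (8443 is hypothetical).
- ssl-allow-port = 443
- ssl-allow-port = 8443
- }
- .fi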
- .LP
- The
- .B FetchOptions
- section contains options that configure the automated downloading of pages.
- When pages are requested offline and downloaded later, there is a choice of
- whether to fetch stylesheets, images, frames, scripts or other objects
- referenced in the HTML.
- .TP
- .B stylesheets = yes | no
- Fetch the style sheets from these pages as well (default=no).
- .TP
- .B images = yes | no
- Fetch the images from these pages as well (default=no).
- .TP
- .B frames = yes | no
- Fetch the frames from these pages as well (default=no).
- .TP
- .B scripts = yes | no
- Fetch the scripts from these pages as well (default=no).
- .TP
- .B objects = yes | no
- Fetch the objects (e.g. Java class files) from these pages as well (default=no).
- .LP
- The
- .B ModifyHTML
- section contains options that control how the HTML that is provided from the
- cache is modified. They all rely on the HTML being syntactically correct; if it
- is not, the result is undefined.
- .TP
- .B enable-modify-html = yes | no
- Enable the HTML modifications in this section (has a speed penalty)
- (default=no).
- .TP
- .B add-cache-info = yes | no
- Whether to add the date that the page was cached and some buttons at the bottom
- of all of the spooled pages (default=no).
- .TP
- .B anchor-cached-begin = <HTML code>
- Anchors (links) that are cached are to have the specified HTML inserted at the
- beginning (default="").
- .TP
- .B anchor-cached-end = <HTML code>
- Anchors (links) that are cached are to have the specified HTML inserted at the
- end (default="").
- .TP
- .B anchor-requested-begin = <HTML code>
- Anchors (links) that have been requested are to have the specified HTML inserted
- at the beginning (default="").
- .TP
- .B anchor-requested-end = <HTML code>
- Anchors (links) that have been requested are to have the specified HTML inserted
- at the end (default="").
- .TP
- .B anchor-not-cached-begin = <HTML code>
- Anchors (links) that are not cached or requested are to have the specified HTML
- inserted at the beginning (default="").
- .TP
- .B anchor-not-cached-end = <HTML code>
- Anchors (links) that are not cached or requested are to have the specified HTML
- inserted at the end (default="").
- .TP
- .B disable-script = yes | no
- Removes all scripts and scripted events (default=no).
- .TP
- .B disable-blink = yes | no
- Removes the <blink> tag (default=no).
- .TP
- .B disable-animated-gif = yes | no
- Disables the animation of GIF files (default=no).
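- .LP
- A sketch of a ModifyHTML section follows. The HTML used to mark links to pages
- that are not cached is an arbitrary choice for illustration, and it assumes the
- HTML is written unquoted after the '='. The other options in this section are
- presumed to have no effect unless enable-modify-html is set.
- .LP
- .nf
- ModifyHTML
- {
- enable-modify-html = yes
- add-cache-info = yes
- # Mark links to uncached pages with bold text (illustrative HTML).
- anchor-not-cached-begin = <b>
- anchor-not-cached-end = </b>
- disable-blink = yes
- disable-animated-gif = yes
- }
- .fi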
- .LP
- The
- .B LocalHost
- section contains a list of possible names or IP addresses that the host running
- wwwoffled may be known as.
- .TP
- .I hostname
- The server may be known as
- .I hostname
- so does not need to contact itself to get pages. The entries must match
- exactly. All of the entries here are also used as if they were in the LocalNet
- and AllowedConnectHosts sections. None of the entries here are fetched via a proxy.
- .LP
- The
- .B LocalNet
- section contains a list of host names or IP addresses that are not to be cached
- because they are on the local network.
- .TP
- .I hostname
- A server that matches
- .I hostname
- is on the local network and not to be cached. The matching uses wildcards as
- described in the WILDCARD section. A host can be excluded by prefixing its name
- with a '!' character; all possible aliases and IP addresses of the host must
- also be excluded in the same way. All entries here are assumed to be reachable
- even when offline. All of the entries in the LocalHost section are also not
- cached, as if they were listed here. None of the entries here are fetched via a
- proxy.
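- .LP
- For example, the LocalNet section below treats every host matching *.foo.com as
- local, except www.foo.com. Because an excluded host must be listed under all of
- its names, the sketch also excludes an assumed alias and IP address for that
- host. The names web.foo.com and 192.168.1.10 are hypothetical. The excluding
- entries are placed before the wildcard on the assumption that the first
- matching entry decides.
- .LP
- .nf
- LocalNet
- {
- # Hypothetical names and address of the excluded host.
- !www.foo.com
- !web.foo.com
- !192.168.1.10
- *.foo.com
- }
- .fi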
- .LP
- The
- .B AllowedConnectHosts
- section contains a list of host names or IP addresses that are allowed to
- connect to the server.
- .TP
- .I hostname
- A server that matches
- .I hostname
- is allowed to connect to the server. The matching uses wildcards as described
- in the WILDCARD section. A host can be excluded by prefixing its name with
- a '!' character; all possible aliases and IP addresses of the host must also
- be excluded in the same way. All of the entries in the LocalHost section are
- also allowed to connect.
- .LP
- The
- .B AllowedConnectUsers
- section contains a list of the users that are allowed to connect to the server.
- .TP
- .I <username>:<password>
- The username and password of the users that are allowed to connect to the
- server. The username and password are both stored in plaintext format. This
- requires the use of browsers that handle the HTTP/1.1 standard.
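- .LP
- The sketch below restricts access to hosts in an assumed local domain and lists
- the users allowed to connect. The user names and passwords are placeholders.
- Because they are stored in plaintext, the configuration file should be readable
- only by authorised users.
- .LP
- .nf
- AllowedConnectHosts
- {
- *.foo.com
- }
- AllowedConnectUsers
- {
- # Placeholder credentials, stored in plaintext.
- alice:secret1
- bob:secret2
- }
- .fi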
- .LP
- The
- .B DontCache
- section contains a way of recognising URLs not to be cached. URLs that match
- will still be cached, however, if they are fetched non-interactively.
- .TP
- .B URL-SPECIFICATION
- Don't cache files that match
- .B URL-SPECIFICATION.
- See the URL-SPECIFICATION section for details of the
- .B URL-SPECIFICATION
- option. The URL-SPECIFICATION can be negated, see the URL-SPECIFICATION
- section.
- .LP
- The
- .B DontGet
- section contains a way of recognising URLs not to be fetched. This can be used
- to reject junk adverts, for example.
- .TP
- .B URL-SPECIFICATION [ = <URL> ]
- Don't get files that match
- .B URL-SPECIFICATION.
- Replace them with the optional replacement URL. See the URL-SPECIFICATION
- section for details of the
- .B URL-SPECIFICATION
- option. The URL-SPECIFICATION can be negated, see the URL-SPECIFICATION
- section.
- .TP
- .B replacement = <URL>
- A default replacement URL that will be used to replace all URLs that match any
- of the URL-SPECIFICATIONs in this section and do not have a replacement
- specified.
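- .LP
- The DontGet section below is a sketch that blocks an assumed advertising host
- and GIF images under an assumed path. The first entry has no replacement of its
- own and uses the default given by replacement. The second entry names its own
- replacement. Both replacement URLs are hypothetical.
- .LP
- .nf
- DontGet
- {
- # Default replacement for entries that do not name their own.
- replacement = http://localhost:8080/local/blank.gif
- *://ads.foo.com
- # This entry has its own replacement URL.
- *://*/adverts/*.gif = http://localhost:8080/local/noad.gif
- }
- .fi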
- .LP
- The
- .B DontGetRecursive
- section contains a way of recognising URLs not to be fetched when fetching
- recursively.
- .LP
- See the DontCache section for a description of the options in this section.
- .LP
- The
- .B DontRequestOffline
- section contains a way of recognising URLs not to be requested by users when
- offline.
- .LP
- See the DontCache section for a description of the options in this section.
- .LP
- The
- .B CensorHeader
- section contains a list of the header lines that are to be modified in the
- request sent from the browser to the server or the reply from the server to the
- browser.
- .TP
- .I header = <string> | none |
- Lines in the request or reply header that start with
- .I header
- followed by a ':' are removed before being passed on if there is no string on
- the right hand side; otherwise that string replaces the value in the header.
- This option does not allow you to add headers that were not present in the
- original header.
- .TP
- .B referer-self = yes | no
- Sets the Referer header to the URL that is being requested (default = no).
- .TP
- .B referer-self-dir = yes | no
- Sets the Referer header to the URL directory name (default = no). This option
- takes precedence over referer-self if both are set.
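- .LP
- For example, the following CensorHeader section removes the From header and
- replaces the User-Agent header with a fixed string. The string is an arbitrary
- illustrative value. The section also sends the URL directory name as the
- Referer.
- .LP
- .nf
- CensorHeader
- {
- # Empty right hand side: the header line is removed.
- From =
- # Non-empty right hand side: the value is replaced.
- User-Agent = WWWOFFLE
- referer-self-dir = yes
- }
- .fi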
- .LP
- The
- .B FTPOptions
- section contains the information that is required to be able to do anonymous ftp.
- .TP
- .B anon-username = <string>
- Specifies the username to use to fetch files using ftp (default is "anonymous",
- "ftp" is another option).
- .TP
- .B anon-password = <string>
- Specifies the password to use to fetch files using ftp (the default is
- determined from the user running wwwoffled and the hostname; this may not work
- reliably, especially if you are behind a firewall).
- .TP
- .B auth-hostname = <host[:port]>
- .B auth-username = <string>
- .B auth-password = <string>
- Specifies a triplet of hostname, username and password that allow non-anonymous
- access to a specific server. (These options must come in groups of three.) The
- auth-hostname must match exactly, no wildcards are used.
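- .LP
- A sketch of an FTPOptions section follows. The anonymous password is a
- placeholder address. The host, user name and password for non-anonymous access
- are also placeholders. The three auth- lines must appear together as a group.
- .LP
- .nf
- FTPOptions
- {
- anon-username = anonymous
- # Placeholder address used as the anonymous ftp password.
- anon-password = user@foo.com
- # Non-anonymous access to one server (exact hostname match, no wildcards).
- auth-hostname = ftp.foo.com
- auth-username = someuser
- auth-password = somepassword
- }
- .fi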
- .LP
- The
- .B MIMETypes
- section is a list of the MIME types to associate with files that are not
- fetched using HTTP. Browsers require this information; most browsers come with
- a list that can be used here.
- .TP
- .B default = <mime-type>/<subtype>
- The default MIME type to use for files that do not match any of the other rules.
- .TP
- .I .<file-ext> = <mime-type>/<subtype>
- The MIME type to use for files that match the file extension.
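- .LP
- For example, a MIMETypes section might map a few common extensions as shown
- below. The list is illustrative, not complete.
- .LP
- .nf
- MIMETypes
- {
- default = text/plain
- \&.html = text/html
- \&.gif = image/gif
- \&.jpg = image/jpeg
- }
- .fi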
- .LP
- The
- .B Proxy
- section contains a list of the hosts that are to be served via specified proxy
- servers. If no proxy is required then use 'none' or leave the proxy name blank.
- .TP
- .B default = <hostname[:port]> | none |
- Specifies the default proxy that all requests are to use.
- .TP
- .I URL-SPECIFICATION = <hostname[:port]> | none |
- For URLs that match
- .I URL-SPECIFICATION
- use the specified proxy.
- See the URL-SPECIFICATION section for details of the
- .B URL-SPECIFICATION
- option.
- .TP
- .B auth-hostname = <host[:port]>
- .B auth-username = <string>
- .B auth-password = <string>
- Specifies a proxy server host that requires proxy authentication by username and
- password to use it. (These options must come in groups of three.) The
- auth-hostname must match exactly, no wildcards are used.
- .TP
- .B ssl = <hostname[:port]> | none |
- A proxy server that should be used for Secure Socket Layer (SSL) connections
- e.g. https (default = none).
- .LP
- None of the entries in the LocalHost or LocalNet section are fetched using a
- proxy.
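- .LP
- A sketch of a Proxy section follows. ftp requests are made directly, and all
- other requests and SSL connections go through an assumed cache,
- www-cache.foo.com, which requires a user name and password. The ftp entry comes
- before default in case the first matching entry decides. All host names, ports
- and credentials are placeholders.
- .LP
- .nf
- Proxy
- {
- # Fetch ftp URLs directly, without a proxy.
- ftp:// = none
- default = www-cache.foo.com:8080
- ssl = www-cache.foo.com:8080
- # Proxy authentication; the three lines form one group.
- auth-hostname = www-cache.foo.com:8080
- auth-username = proxyuser
- auth-password = proxypass
- }
- .fi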
- .LP
- The
- .B DontIndex
- section contains a way of recognising URLs not to be indexed.
- .TP
- .B outgoing = URL-SPECIFICATION
- Do not index any URLs that match
- .I URL-SPECIFICATION
- in the outgoing index.
- .TP
- .B latest = URL-SPECIFICATION
- Do not index any URLs that match
- .I URL-SPECIFICATION
- in the lasttime/prevtime/latest indexes.
- .TP
- .B monitor = URL-SPECIFICATION
- Do not index any URLs that match
- .I URL-SPECIFICATION
- in the monitor index.
- .TP
- .B host = URL-SPECIFICATION
- Do not index any URLs that match
- .I URL-SPECIFICATION
- in the host indexes.
- .TP
- .B URL-SPECIFICATION
- Do not index any URLs that match
- .I URL-SPECIFICATION
- in any of the indexes.
- .LP
- See the URL-SPECIFICATION section for details of the
- .B URL-SPECIFICATION
- option.
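- .LP
- For example, the DontIndex section below does three things. It keeps GIF files
- out of the outgoing index. It keeps an assumed advertising host out of the
- lasttime/prevtime/latest indexes. It keeps URLs under an assumed /cgi-bin/ path
- out of every index. The patterns are illustrative.
- .LP
- .nf
- DontIndex
- {
- outgoing = *://*/*.gif
- latest = *://ads.foo.com
- # No prefix: excluded from all of the indexes.
- *://*/cgi-bin/*
- }
- .fi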
- .LP
- The
- .B Alias
- section contains a list of aliases that are used to replace a server name and
- path with another server name and path, for example for a server that is known
- by two names.
- .TP
- .I URL-SPECIFICATION1 = URL-SPECIFICATION2
- When a request matching
- .I URL-SPECIFICATION1
- is used the request is modified into a request for
- .I URL-SPECIFICATION2
- , the two are also considered identical for the purposes of indexing, purging
- and recursive fetching.
- The
- .I URL-SPECIFICATIONs
- must not contain wildcards, and the URL arguments are ignored.
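- .LP
- As an illustration, the Alias section below does two things. It treats an
- assumed server name foo.com as another name for www.foo.com. It also maps an
- assumed old directory on that server to a new location. Neither side uses
- wildcards, as required above. How trailing '/' characters in the paths are
- treated is not specified here, so check the second form before relying on it.
- .LP
- .nf
- Alias
- {
- http://foo.com = http://www.foo.com
- http://www.foo.com/old/ = http://www.foo.com/new/
- }
- .fi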
- .LP
- The
- .B Purge
- section controls how the cache is purged. It specifies the method used to
- decide which pages to purge, the default age, the host-specific maximum age of
- the pages in days, and a maximum allowed cache size. An age of zero means pages
- are always deleted when a purge is done; a negative age means they are never
- purged. The maximum cache size and minimum free space include the files from
- hosts that are marked never to be purged, but those files will not be purged.
- .TP
- .B use-mtime = yes | no
- Whether to decide which pages to purge using the last modification time
- (mtime) instead of the last access time (atime) (default=no).
- .TP
- .B max-size = <size>
- The maximum size of the cache in MB after purging, excluding the hosts that are
- never to be purged, if this is zero then it does not apply (default=0).
- .TP
- .B min-free = <size>
- The minimum amount of free disk space in MB after purging, excluding the hosts
- that are never to be purged, if this is zero then it does not apply (default=0).
- .TP
- .B use-url = yes | no
- If true then use the URL to decide on the purge age, otherwise use the protocol
- and host only (default=no).
- .TP
- .B del-dontget = yes | no
- If true then delete the files from hosts that are in the DontGet section
- (default=no).
- .TP
- .B del-dontcache = yes | no
- If true then delete the files from hosts that are in the DontCache section
- (default=no).
- .TP
- .B default = <age>
- The age to purge hosts that are not otherwise specified here (default=28).
- .TP
- .I URL-SPECIFICATION = ...
- The age to purge hosts with URLs that match
- .I URL-SPECIFICATION;
- the path and extension parts of the specification are ignored.
- See the URL-SPECIFICATION section for details of the
- .B URL-SPECIFICATION
- option.
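- .LP
- The Purge section below is a sketch of the rules described above. Pages are
- normally kept for 28 days. Pages from one assumed host are deleted at every
- purge (age 0), and pages from another are never purged (negative age). The
- cache is limited to an illustrative 100 MB.
- .LP
- .nf
- Purge
- {
- use-mtime = no
- max-size = 100
- min-free = 0
- default = 28
- # Age 0: always deleted when a purge is done.
- http://ads.foo.com = 0
- # Negative age: never purged.
- http://docs.foo.com = -1
- }
- .fi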
- .LP
- .SH WILDCARD
- A wildcard match is one that uses the '*' character to represent any group of
- characters.
- .LP
- This is basically the same as the command line file matching expressions in DOS
- or the UNIX shell, except that the '*' can match the '/' character. A maximum
- of 2 '*' characters can be used in any wildcard.
- .LP
- For example:
- .LP
- .nf
- *.gif       matches foo.gif and bar.gif
- *.foo.com   matches www.foo.com and ftp.foo.com
- /foo/*      matches /foo/bar.html and /foo/bar/foobar.html
- .fi
- .SH URL-SPECIFICATION
- When specifying a host, protocol and pathname in many of the sections, a
- .B URL-SPECIFICATION
- can be used; this is a way of recognising a URL.
- .LP
- For the purposes of this explanation a URL is considered to be made up of five
- parts.
- .TP
- .B proto
- The protocol that is used (e.g. 'http', 'ftp')
- .TP
- .B host
- The server hostname (e.g. 'www.gedanken.demon.co.uk').
- .TP
- .B port
- The port number on the host (e.g. default of 80 for HTTP).
- .TP
- .B path
- The pathname on the host (e.g. '/bar.html') or a directory name (e.g. '/foo/').
- .TP
- .B args
- Optional arguments with the URL used for CGI scripts etc. (e.g. 'search=foo').
- .LP
- For example, for the WWWOFFLE homepage http://www.gedanken.demon.co.uk/wwwoffle/
- the protocol is 'http', the host is 'www.gedanken.demon.co.uk', the port is
- the default (in this case 80), and the pathname is '/wwwoffle/'.
- .LP
- In general this is written as <proto>://<host>[:<port>]/<path>[?<args>]
- .LP
- Here [] indicates an optional part and <> indicates a user-supplied name or
- number.
- .LP
- Some example URL-SPECIFICATION options are the following:
- .TP
- .B *://*
- Any protocol, Any host, Any port, Any path, Any args (this is the same as saying 'default').
- .TP
- .B *://*/<path>
- Any protocol, Any host, Any port, Named path, Any args
- .TP
- .B *://*/*.<ext>
- Any protocol, Any host, Any port, Named path extension, Any args
- .TP
- .B *://*/*?
- Any protocol, Any host, Any port, Any path, No args
- .TP
- .B *://*/<path>?*
- Any protocol, Any host, Any port, Named path, Any args
- .TP
- .B *://<host>
- Any protocol, Named host, Any port, Any path, Any args
- .TP
- .B <proto>://
- Named protocol, Any host, Any port, Any path, Any args
- .TP
- .B <proto>://<host>
- Named protocol, Named host, Any port, Any path, Any args
- .TP
- .B <proto>://<host>:
- Named protocol, Named host, Default port, Any path, Any args
- .TP
- .B *://<host>:<port>
- Any protocol, Named host, Named port, Any path, Any args
- .LP
- The matching of the host, the path and the args use the wildcard matching that
- is described above.
- .LP
- In some sections that accept URL-SPECIFICATIONs, they can be negated by
- prefixing them with the '!' character. The comparison of a URL with the
- URL-SPECIFICATION then returns the logically opposite value to what would be
- returned without the '!'. If all of the URL-SPECIFICATIONs in a section are
- negated and '*://*/*' is added to the end, then the sense of the whole section
- is negated.
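- .LP
- For example, following the rule above, the DontCache section below is a sketch
- that negates the sense of the whole section. Instead of listing URLs that are
- not cached, it is intended to cause only URLs on hosts matching *.foo.com (an
- assumed domain) to be cached.
- .LP
- .nf
- DontCache
- {
- !*://*.foo.com
- *://*/*
- }
- .fi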
- .SH EXAMPLE
- .nf
- StartUp
- {
- http-port = 8080
- wwwoffle-port = 8081
- spool-dir = /var/spool/wwwoffle
- use-syslog = yes
- password =
- }
- Options
- {
- index-latest-days = 14
- add-info-refresh = no
- request-changed = 3600
- }
- FetchOptions
- {
- images = yes
- frames = yes
- }
- LocalHost
- {
- wwwoffle.foo.com
- localhost
- 127.0.0.1
- }
- DontGet
- [
- wwwoffle.DontGet.conf
- ]
- LocalNet
- {
- *.foo.com
- }
- AllowedConnectHosts
- {
- *.foo.com
- }
- Proxy
- {
- http://foo.com/* = www-cache.foo.com:8080
- }
- Purge
- {
- default = 28
- max-size = 10
- http://*.bar.com/* = 7
- }
- .fi
- .SH FILES
- CONFDIR/wwwoffle.conf The wwwoffled(8) configuration file.
- .LP
- SPOOLDIR The WWWOFFLE spool directory.
- .SH SEE ALSO
- wwwoffle(1), wwwoffled(8).
- .SH AUTHOR
- Andrew M. Bishop 1996,1997,1998,1999 (amb@gedanken.demon.co.uk)