formats.txt
上传用户:ycwykj01
上传日期:2007-01-04
资源大小:1819k
文件大小:8k
- Mailbox Format Characteristics
- Mark Crispin
- 5 June 1999
- When a mailbox storage technology uses local files and
- directories directly, the file(s) and directories are layed out in a
- mailbox format.
- I. Flat-File Formats
- In these formats, a mailbox and all the messages inside are a
- single file on the filesystem. The mailbox name is the name of the
- file in the filesystem, relative to the user's "mail home directory."
- A flat-file format mailbox is always a file, never a directory.
- This means that it is impossible to have a flat-file format mailbox
- that has inferior mailbox names under it (so-called "dual-usage"
- mailboxes). For some inexplicable reason, some people want this.
- The mail home directory is usually the same as the user login
- home directory if that concept is meaningful; otherwise, it is some
- other default directory (e.g. "C:My Documents" on Windows 98). This
- can be redefined by modifying the c-client source code or in an
- application via the SET_HOMEDIR mail_parameters() call.
- For example, a mailbox named "project" is likely to be found in
- the file "project" in the user's home directory. Similarly, a mailbox
- named "test/trial1" (assuming a UNIX system) is likely to be found in
- the file "trial1" in the subdirectory "test" in the user's home
- directory.
- Note that the name "INBOX" has special semantics and rules, as
- described in the file naming.txt.
- The following flat-file formats are supported by c-client as of
- the time of this writing:
- . unix This is the traditional UNIX mailbox format, in use for nearly
- 30 years. It uses a line starting with "From " to indicate
- start of message, and stores the message status inside the
- RFC822 message header.
- unix is not particularly efficient; the entire mailbox file
- must be read when the mailbox is open, and when reading message
- texts it is necessary to convert the newline convention to
- Internet standard CR LF form. unix preserves UIDs, and allows
- the creation of keywords.
- Only one process may have a unix-format mailbox open
- read/write at a time.
- . mmdf This is the format used by the MMDF mailer. It uses a line
- consisting of 4 <CTRL/A> (0x01) characters to indicate start
- and end of message. Optionally, there may also be a unix
- format "From " line. It otherwise has the same
- characteristics as unix format.
- . mbx This is the current preferred mailbox format. It can be
- handled quite efficiently by c-client, without the problems
- that exist with unix and mmdf formats. Messages are stored
- in Internet standard CR LF format.
- mbx permits shared access, including shared expunge. It
- preserves UIDs, and allows the creation of keywords.
- . mtx This is supported for compatibility with the past. This is
- the old Tenex/TOPS-20 mail.txt format. It can be handled
- quite efficiently by c-client, and has most of the
- characteristics of mbx format.
- mtx is deficient in that it does not support shared expunge;
- it has no means to store UIDs; and it has no way to define
- keywords except through an external configuration file.
- . tenex This is supported for compatibility with the past. This is
- the old Columbia MM format. This is similar to mtx format,
- only it uses UNIX-style bare-LF newlines instead of CR LF
- newlines, thus incurring a performance penalty for newline
- conversion.
- . phile This is not strictly a format. Any file which is not in a
- recognized format is in phile format, which treats the entire
- contents of the file as a single message.
- II. File/Message Formats
- In these formats, a mailbox is a directory, and each the messages
- inside are separate files inside the directory. The file names of
- these files are generally the text form of a number, which also
- matches the UID of the message.
- In the case of mx, the mailbox name is the name of the directory
- in the filesystem, relative to the user's "mail home directory." In
- the case of news and mh, the mailbox name is in a separate namespace
- as described in the file naming.txt.
- A file/message format mailbox is always a directory. This means
- that it is possible to have a file/message format mailbox that has
- inferior mailbox names under it (so-called "dual-usage" mailboxes).
- For some inexplicable reason, some people want this.
- Note that the name "INBOX" has special semantics and rules, as
- described in the file naming.txt.
- The following file/message formats are supported by c-client as of
- the time of this writing:
- . mx This is an experimental format, and may be removed in a future
- release. An mx format mailbox has a .mxindex file which holds
- the message status and unique identifiers. Messages are
- stored in Internet standard CF LF form, so the file size of
- the message file equals the size of the message.
- mx is somewhat inefficient; the entire directory must be read
- and each file stat()'d. We found it intolerable for a
- moderate sized mailbox (2000 messages) and have more or less
- abandoned it.
- . mh This is supported for compatibility with the past. This is
- the format used by the old mh program.
- mh is very inefficient; the entire directory must be read
- and each file stat()'d, and in order to determine the size
- of a message, the entire file must be read and newline
- conversion performed.
- mh is deficient in that it does not support any permanent
- flags or keywords; and has no means to store UIDs (because
- the mh "compress" command renames all the files, that's
- why).
- . news This is an export of the local filesystem's news spool, e.g.
- /var/spool/news. Access to mailboxes in news format is read
- only; however, message "deleted" status is preserved in a
- .newsrc file in the user's home directory. There is no other
- status or keywords.
- news is very inefficient; the entire directory must be
- read and each file stat()'d, and in order to determine the
- size of a message, the entire file must be read and newline
- conversion performed.
- news is deficient in that it does not support permanent flags
- other than deleted; does not support keywords; and has no
- expunge.
- Soapbox on File/Message Formats
- If it sounds from the above descriptions that we're not putting
- too much effort into file/message formats, you are correct.
- There's a general reason why file/message formats are a bad idea.
- Just about every filesystem in existance serializes file creation and
- deletions because these manipulate the free space map. This turns out
- to be an enormous problem when you start creating/deleting more than a
- few messages per second; you spend all your time thrashing in the
- filesystem.
- It is also extremely slow to do a text search through a
- file/message format mailbox. All of those open()s and close()s really
- add up to major filesystem thrashing.
- What about Cyrus and Maildir?
- Both formats are vulnerable to the filesystem trashing outlined
- above.
- The Cyrus format used by CMU's Cyrus server (and Esys' server)
- has a special associated flat file in each directory that contains
- extensive data (including pre-parsed ENVELOPEs and BODYSTRUCTUREs)
- about the messages. Put another way, it's a (considerably) more
- featureful form of mx. It also uses certain operating system
- facilities (e.g. file/memory mapping) which are not available on older
- systems, at a cost of much more limited portability than c-client.
- These considerably ameliorate the fundamental problems with
- file/message formats; in fact, Cyrus is halfway to being a database.
- Rather than support Cyrus format in c-client, you should run Cyrus or
- Esys if you want that format.
- The Maildir format used by qmail has all of the performance
- disadvantages of mh noted above, with the additional problem that the
- files are renamed in order to change their status so you end up having
- to rescan the directory frequently the current names (particularly in
- a shared mailbox scenario). It doesn't scale, and it represents a
- support nightmare; it will therefore never be supported in the
- official distribution. Maildir support code for c-client is available
- from third parties; but, if you use it, it is entirely at your own
- risk (read: don't complain about how poorly it performs or bugs).
- So what does this all mean?
- A database (such as used by Exchange) is really a much better
- approach if you want to move away from flat files. mx and especially
- Cyrus take a tenative step in that direction; mx failed mostly because
- it didn't go anywhere near far enough. Cyrus goes much further, and
- scores remarkable benefits from doing so.
- However, a well-designed pure database without the overhead of
- separate files would do even better.