README
上传用户:ykyjsl
上传日期:2022-01-30
资源大小:145k
文件大小:8k
- Word, Character, Integer, and Bit Based Compression Using Arithmetic Coding.
- -------------------------------------------------------------------
- Authors:
- John Carpinelli
- Alistair Moffat (alistair@cs.mu.oz.au)
- Radford Neal (radford@ai.toronto.edu)
- Wayne Salamonsen
- Lang Stuiver (langs@cs.mu.oz.au)
- Andrew Turpin (aht@cs.mu.oz.au)
- Ian Witten (ihw@waikato.ac.nz)
- Contact:
- Alistair Moffat
- Department of Computer Science
- University of Melbourne
- Victoria, Australia 3052.
- Fax: +61 3 93481184
- alistair@cs.mu.oz.au
- http://www.cs.mu.oz.au/~alistair/
- Based on:
- A. Moffat, R. Neal, I.H. Witten, "Arithmetic Coding Revisted", ACM
- Transactions on Information Systems, 16(3):256-294, 1998.
- http://www.cs.mu.oz.au/~alistair/abstracts/mnw98:acmtois.html
- A. Moffat, "An improved data structure for cummulative probability
- tables", Software-Practice and Experience, Software-Practice and
- Experience, 29(7):647-659, 1999.
- http://www.cs.mu.oz.au/~alistair/abstracts/mof99:spe.html
- See also:
- I.H. Witten, R. Neal, J.G. Cleary, "Arithmetic Coding for Data
- Compression", Comm ACM., 30(6):520-541, June 1987.
- ftp://ftp.cpsc.ucalgary.ca/pub/projects/ar.cod
- Radford M. Neal, Low-Precision Arithmetic Coding Implementation.
- ftp://ftp.cs.toronto.edu/pub/radford/lowp_ac.shar
- Thanks:
- To Mahesh Naik, Julian Seward, Michael Schindler, Sascha Kratky, Tim
- Wentford, and the many other people who have provided feedback on
- earlier versions of this software.
- Overview:
- This package contains C source code for an illustrative implementation
- of arithmetic coding. Four different models are included with the
- package: an adaptive zero-order word based model, an adaptive
- zero-order character based model, and adaptive zero-order (unsigned)
- integer based model, and an adaptive fixed-context bit-based model.
- The option of using shift/add operations to effect the necessary
- integer multiplication and division operations is also supported. Use
- of the shift/add option may result in speed improvement on some
- architectures.
- Each of the four programs can be thought of as consisting of three
- sections: model, statistics, and coder. Four different models have
- been provided, one statistics module, and one coder module (but with a
- compile time option to use hardware or software arithmetic).
- Models:
- The word model follows a strictly alternating sequence of reading a
- word then a non-word. After reading each word/non-word it uses a
- hashtable to find the corresponding ordinal word number which it then
- passes to the statistics module together with the corresponding
- word/non word context. A context stores the corresponding frequency for
- each ordinal symbol. This model illustrates the advantages of the new
- coder on large alphabets, and the manipulation of a number of
- ``contexts'' within a single program.
- The adaptive zero-order character-based model is broadly equivalent to
- the one presented in the CACM paper of Witten, Neal and Cleary.
- Compression is relatively poor.
- The adaptive zero-order integer-based model is a simple derivative of
- the character-based one, and reads uints from the input rather than
- bytes. Note that each integer must be in the range 1..r+1, where r is
- the largest integer seen to date. The first integer in the file must be
- 1. Think of these integers as being symbol identifiers assigned on the
- fly in order of first appearance.
- The bit-based model takes one bit at a time from the input stream and
- passes it and the corresponding context into the statistics module.
- The correct context is chosen based on the previous n bits, where 0 <=
- n <= 20 is specified as a commandline parameter. Relatively good
- compression results when n is large, but it runs slowly. The intention
- is to illustrate the use of the arithmetic coding routines when
- restricted to a binary alphabet.
- The zero-order unsigned-integer based model was used in experiments
- when testing the cummulative statistics data structure used in stats.c,
- and many other coding exeriments (see the WWW site for details).
- Statistics:
- The statistics module interfaces the model and the coder. It is
- responsible for probability estimation and generation of the parameters
- required by the coder. It does so by adaptively counting symbol
- occurrences, and uses a tree-based structure to maintain cumulative
- frequencies. Accessing the structure takes O(log s) time per symbol
- number s. Just as important, it is space-economical, and requires just
- n words of storage compared to the 3n words required by the CACM
- structure.
- Coder:
- The default coder uses shift/add integer arithmetic in its
- calculations, as described in "Arithmetic Coding Revisited". The
- compile time option "-DMULT_DIV" forces the use of multiplications and
- divisions. The low precision that is required allows, on some
- architectures, faster execution with shift/add.
- One notable feature about the coder is that by using a limited
- precision implementation, the total frequency can be up to 30 bits
- (assuming 32 bit integers, or more generally n-2 for n-bit integers).
- That is, the total frequency may reach a value as large as
- 1,073,741,824 before frequencies need be halved. The number of bits
- used for frequency counts can be set as a command line argument
- allowing one to use fewer bits for more precise coding. The default
- number of bits used for frequency counts is 27.
- The use of large values for the F_bits parameter also allows faster
- execution, since much of the arithmetic can be done to low precision.
- Very large values for F_bits do, however, introduce rounding effects in
- the coder that may negate the gain of using more accurate model
- probabilities. See the paper for analysis of these rounding effects.
- The package:
- The package consists of 22 files: 9 source files (.c), 5 header files (.h), 1
- include file (.i), a makefile, 4 man pages (.1), a WHATSNEW file, and this
- README file.
- arith_coder.c: Front end to the program.
- word.c: Driving code for the word based compression model.
- char.c: Driving code for the character based compression model.
- bits.c: Driving code for the bit based compression model.
- uint.c: Driving code for the unsigned integer based compression model.
- bitio.c: Input output functions (bit and byte level)
- stats.c: Functions for maintaining frequency counts.
- arith.c: Functions associated with arithmetic coding.
- hashtable.c: Functions associated with maintaining the hash table
- used to convert words to ordinal word numbers
- arith.h: Function prototypes exported from arith.c, #defines used in
- arith.c
- stats.h: Function prototypes exported from stats.c, #defines used in
- stats.c
- bitio.h: Input / output macros (bit and byte level)
- arith_coder.h: Function prototypes #defines used in arith_coder.c, char.c,
- bits.c, word.c.
- hashtable.h: Function prototypes exported from hashtable.c, #defines used
- in hashtable.c
- unroll.i: Generic include file to unroll loops / repeat code fragments.
- arith_coder.1: Manual page
- char.1, word.1, bits.1, uint.1: Pointers to arith_coder.1 man page.
- Usage:
- The programs share a common interface (and are in fact the same program).
- Output is written to stdout. If no input file is specified, input is taken
- from stdin. One of -e or -d must be specified, to encode or decode.
- For help on usage, type "arith_coder -h". For more detail refer to the man
- page.
- Here is a sample output of arith_coder -h:
- Usage: arith_coder [-e [-t s] [-f n] [-b n] [-m n] [-c n] | -d] [-v] [file]
- -e: Encode
- -t s Encoding method (char, word, bits, default = char)
- -b n Code bits to use (27, fixed at compile time)
- -f n Frequency bits to use (32, fixed at compile time)
- -m n Memory size to use (Mbytes) (1..255, default = 1) bits & word
- -c n Bits of context to use (0..20, default = 16) bits only
- -d: Decode
- -v: Verbose Give timing, compression, and memory usage information.
- Conditions:
- These programs are supplied free of charge for research purposes only, and
- may not sold or incorporated into any commercial product. There is
- ABSOLUTELY NO WARRANTY of any sort, nor any undertaking that they are fit for
- ANY PURPOSE WHATSOEVER. Use them at your own risk. If you do happen to find
- a bug, or have modifications to suggest, please report the same to Alistair
- Moffat at the address above.
- Have fun!
- Alistair Moffat,
- February 1999