ARCHITECTURE.rtf
资源名称:mpeg2.rar [点击查看]
上传用户:ma_junhua
上传日期:2008-04-11
资源大小:2752k
文件大小:7k
源码类别:
其他嵌入式/单片机内容
开发平台:
C/C++
- This file contains a couple of (currently unstructured and incomplete) encoder
- implementation notes.
- Basic Assumptions
- - data structures
- Primary data structures are:
- - picture data arrays, containing either source or reconstructed pels
- - mbinfo array, containing all macroblock level side information
- - blocks array, containing DCT coefficients
- mbinfo and blocks together are completely equivalent to the VLC encoded
- picture_data() part of the MPEG stream, although in uncompressed and
- therefore rather voluminous internal format.
- a) picture data arrays
- Frames are represented internally as in the following declaration:
- unsigned char *frame[3];
- i.e., an array of three pointers to the picture data proper. frame[0]
- points to the luminance (Y) data, frame[1] to the first chrominance (Cb, U)
- component, frame[2] to the second chrominance component (Cr, V).
- Width and height of the luminance data array is 16*mb_width and 16*mb_height,
- even if horizontal_size and vertical_size are not divisible by 16. The
- actual pels are top left aligned and padded to the right and bottom (by
- pixel replication) while being read into the encoder.
- Width and height of the two chrominance arrays depends on the chroma_format.
- For 4:2:0 data, they are half size in both directions, for 4:2:2 data only
- width is half that of the luminance array, for 4:4:4 data, chrominance and
- luminance are of identical size.
- The dimensions are stored in the following global variables:
- | horizontal | vertical
- -------------+--------------+------------
- luminance | width | height
- chrominance | chrom_width | chrom_height
- Picture data (array of pels) is always stored as frames, even when a
- field picture sequence is to be generated. The two fields are stored
- in interleaved from, that is alternating lines from both fields. In
- most cases, however, it's easier to view the two fields as being stored
- side to side of each other as an array of double width and half height.
- The following picture demonstrates this:
- <----- width --------->
- T1 T1 T1 T1 T1 T1 T1 T1 <+
- B1 B1 B1 B1 B1 B1 B1 B1 |
- T2 T2 T2 T2 T2 T2 T2 T2 | height
- B2 B2 B2 B2 B2 B2 B2 B2 |
- T3 T3 T3 T3 T3 T3 T3 T3 |
- B3 B3 B3 B3 B3 B3 B3 B3 <+
- <------------- 2*width (width2) --------------->
- T1 T1 T1 T1 T1 T1 T1 T1 B1 B1 B1 B1 B1 B1 B1 B1 <+
- T2 T2 T2 T2 T2 T2 T2 T2 B2 B2 B2 B2 B2 B2 B2 B2 | height/2 (height2)
- T3 T3 T3 T3 T3 T3 T3 T3 B3 B3 B3 B3 B3 B3 B3 B3 <+
- Tn: top field pels
- Bn: bottom field pels
- These are of cause only two different two-dimensional interpretations of
- the same one-dimensional memory layout. Either the top or the bottom field
- can be the earlier field in time.
- The following table shows the relation between width, height, width2 and
- height2 for frame and field pictures:
- | frame | field
- --------+--------+---------
- width2 | width | 2*width
- height2 | height | height/2
- Using the convenience variables width2 and height2, many loops are
- independent of frame / field picture coding.
- b) mbinfo array
- This array contains the complete encoded picture (field or frame),
- except the DCT coefficients. This includes macroblock type, motion vectors,
- motion vector type, dct type, quantization parameter etc. The number
- of entries (size of the array) is identical to the number of macroblocks
- in the picture.
- c) blocks
- This array contains all DCT coefficients of the picture. It is declared as
- EXTERN short (*blocks)[64];
- i.e. a pointer to an array whose elements are blocks of 64 short integers
- each. The number of blocks is block_count times the number of macroblocks
- in the picture. block_count depends on the chroma format (6, 8 or 12 blocks
- per macroblock).
- The actual content depends on the encoding stage. Either prediction error
- (or intra block data), DCT transformed data, quantized DCT coefficients or
- inverse transformed data is stored in blocks. This is done to keep
- memory requirements reasonable.
- Encoding procedure
- The high-level structure of the bitstream is determined in putseq().
- This includes writing the appropriate headers and extensions and the
- decision which picture coding type to choose for each picture.
- Encoding of a picture is divided into the following steps:
- - motion estimation (motion_estimation())
- - calculate prediction (predict())
- - DCT type estimation (dct_type_estimation())
- - subtract prediction from picture and perform DCT (transform())
- - quantize DCT coefficients and generate VLC data (putpict())
- - inverse quantize DCT coefficients (iquant())
- - perform IDCT and add prediction (itransform())
- Each of these steps is performed for the complete picture before
- proceding to the next one. The intention is to keep these
- steps as independent from each other as possible. They communicate
- only via the above mentioned basic data structures. This should simplify
- experimenting with different coding models.
- Quantization and VLC generation could not be separated from each other,
- as quantization parameters and output buffer content usually form a
- closed loop on macroblock level.
- - Motion Estimation
- The procedures for motion estimation are in the file motion.c. The main
- function motion_estimation() loops through all macroblocks of the current
- picture calling frame_ME (for frame pictures) or field_ME (for field
- pictures) to calculate motion vectors for each macroblock.
- Motion estimation is currently done separately. More efficient schemes
- like telescopic motion estimation, which span several pictures (e.g.
- two I/P pictures and all intervening B pictures), are not yet implemented.
- Motion estimation splits into three steps:
- - calculation of optimum motion vectors for each of the possible motion
- compensation types (frame, field, 16x8, dual prime)
- - selection of the best motion compensation type by calculating and comparing
- a prediction error based cost function
- - selection of either motion compensation or any other possible encoding
- type (intra coding, No MC coding)
- Motion vectors are estimated by integer pel full search in a search window
- of user defined size. This full search is based on the original source
- reference pictures. Subsequently the cost functions for the 9 motion vectors
- with an offset of -0.5, 0, and 0.5 relative to the best integer pel vectors
- are evaluated (using the reconstructed reference picture) and the vector with
- smallest cost function is used as estimation.
- To speed up full search, the cost function calculation is aborted if the
- intermediate values exceed the cost function value of an earlier motion
- vector in the same full search. This is most efficient if vectors of
- potentially low cost are evaluated fist. Therefore full search is organized in
- an outward spiral.