mpeg2enc.doc
- mpeg2encode
- ===========
- MPEG-2 Video Encoder, Version 1.1, June 1994
- MPEG Software Simulation Group
- (MPEG-L@netcom.com)
- Contents:
- =========
- 1. Overview
- 2. Features and Limitations
- 3. Usage
- 4. Interpreting the status information
- 5. Description of the Encoder Model
- 6. References
- 1. Overview
- ===========
- This is the second public release of our MPEG-2 encoder. It converts an
- ordered set of uncompressed input pictures into a compressed bitstream
- compliant with ISO/IEC 13818-2 DIS [1] (MPEG-2).
- This program will evolve to become: ISO/IEC 13818-5 Software Simulation
- of MPEG-2 Systems, Video, and Audio.
- 2. Features and Limitations
- ===========================
- 2.1 Features
- - generates constant bit rate streams
- - encoder model based on MPEG-2 test model 5 (TM5) rev. 2 [2]
- - progressive and interlaced video
- - also generates ISO/IEC 11172-2 (MPEG-1) sequences
- - input formats: separate YUV, combined YUV, PPM
- - trace and statistics output
- - verifies user parameter settings are legal within Profile and Level
- 2.2 Current limitations
- The encoder currently does not support
- - scalable extensions
- - MPEG-1 integer pel vectors (half-pel is typically 1 dB better anyway)
- and D frame sequences.
- - checking for maximum number of generated bits per macroblock
- - automatic 3:2 pulldown detection or irregular 3:2 pulldown signaling.
- - intra refresh slices (e.g. low delay)
- - concealment motion vectors
- - special low delay mode rate control
- - editing of encoded video
- - variable bit rate encoding
- - scene change rate control
- 3. Usage
- ========
- The execution template for the encoder is:
- mpeg2encode parameter_file output.m2v
- Coding parameters can be modified by editing the parameter_file. Since the
- parser expects the operating parameters to be on certain line numbers,
- kindly do not insert or delete lines from the file.
- We have provided a couple of sample parameter files in the par directory.
- It is recommended to use the one closest to your application as a starting
- point for customization.
- The first line of the parameter file is a comment which is inserted near
- the beginning of the MPEG bitstream as a user_data field, and can be
- used for arbitrary purposes.
- The remaining lines are described below:
- /* name of source frame files */
- A printf format string defining the name of the input files. It has to
- contain exactly one numerical descriptor (%d, %x etc.):
- Example: frame%02d
- Then the encoder looks for files: frame00, frame01, frame02
- The encoder adds an extension (.yuv, .ppm, etc.) which depends on the
- input file format. Input files have to be in frame format, containing
- two interleaved fields (for interlaced video).
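- As a rough illustration of the naming rule above, the following Python
- sketch expands a printf-style pattern into the file names the encoder
- would look for. The helper name source_file_names and its extension
- argument are assumptions made for this example and are not part of
- mpeg2encode.

```python
def source_file_names(pattern, first_frame, frame_count, extension):
    """Expand a printf-style pattern (one numerical descriptor) into file names."""
    return [(pattern % n) + extension
            for n in range(first_frame, first_frame + frame_count)]


# Three frames starting at frame 0, combined YUV input (extension .yuv).
names = source_file_names("frame%02d", 0, 3, ".yuv")
print(names)
```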
- /* name of reconstructed frame files */
- This user parameter tells the encoder what name to give the reconstructed
- frames. These frames are identical to frame reconstructions of decoders
- following normative guidelines (except of course for differences caused by
- different IDCT implementation). Specifying a name starting with - (or just
- - by itself) disables output of reconstructed frames.
- The reconstructed frames are always stored in Y,U,V format (see below),
- independent of the input file format.
- /* name of intra quant matrix file ("-": default matrix) */
- Setting this to a value other than - specifies a file containing
- a custom intra quantization matrix to be used instead of the default
- matrix specified in ISO/IEC 13818-2 and 11172-2. This file has to contain
- 64 integer values (range 1...255) separated by white space (blank, tab,
- or newline), one corresponding to each of the 64 DCT coefficients. They
- are ordered line by line, i.e. v-u frequency matrix order (not by the
- zig-zag pattern used for transmission). The file intra.mat contains the
- default matrix as a starting point for customization. It is neither
- necessary nor recommended to specify the default matrix explicitly.
- Large values correspond to coarse quantization and consequently more
- noise at that particular spatial frequency.
- For the intra quantization matrix, the first value in the file (DC value)
- is ignored. Use the parameter intra_dc_precision (see below) to define
- the quantization of the DC value.
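- The file format described above can be checked with a few lines of
- Python. This is only a sketch of the stated rules (64 values, range
- 1...255, white space separated, stored row by row); the function name
- parse_quant_matrix is hypothetical and the encoder's own parser may
- behave differently on malformed input.

```python
def parse_quant_matrix(text):
    """Parse 64 whitespace-separated integers into an 8x8 matrix (row by row)."""
    values = [int(token) for token in text.split()]
    if len(values) != 64:
        raise ValueError("expected 64 values, got %d" % len(values))
    for value in values:
        if not 1 <= value <= 255:
            raise ValueError("value %d outside the range 1...255" % value)
    # matrix[v][u]: v is the row (vertical frequency), u the column.
    return [values[row * 8:(row + 1) * 8] for row in range(8)]


# A flat matrix of 16s, written as one value per white space separated token.
flat = parse_quant_matrix(" ".join(["16"] * 64))
print(flat[0])
```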
- /* name of non intra quant matrix file ("-": default matrix) */
- This parameter field follows the same rules as described for the above
- intra quant matrix parameter, but specifies the file for the NON-INTRA
- coded (predicted / interpolated) blocks. In this case the first
- coefficient of the matrix is NOT ignored.
- The default matrix uses a constant value of 16 for all 64 coefficients
- (a flat matrix is thought to statistically minimize mean square error).
- The file inter.mat contains an alternate matrix, used in the MPEG-2 test
- model.
- /* name of statistics file */
- Statistics output is stored into the specified file. - directs statistics
- output to stdout.
- /* input picture file format */
- A number defining the format of the source input frames.
- Code  Format description
- ----  ------------------------------------------------------------------
- 0     separate files for luminance (.Y extension) and chrominance (.U, .V);
-       all files are in headerless 8 bit per pixel format. .U and .V must
-       correspond to the selected chroma_format (4:2:0, 4:2:2, 4:4:4, see
-       below). Note that in this document, Cb = U, and Cr = V. This format
-       is also used in the Stanford PVRG encoder.
- 1     similar to 0, but concatenated into one file (extension .yuv).
-       This is the format used by the Berkeley MPEG-1 encoder.
- 2     PPM, Portable PixMap; only the raw format (P6) is supported.
- /* number of frames */
- This defines the length of the sequence in integer units of frames.
- /* number of first frame */
- Usually 0 or 1, but any other (positive) value is valid.
- /* timecode of first frame */
- This line is used to set the timecode encoded into the first 'Group of
- Pictures' header. The format is based on the SMPTE style:
- hh:mm:ss:ff (hh=hour, mm=minute, ss=second, ff=frame (0..picture_rate-1))
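- To make the timecode layout concrete, here is a small Python sketch that
- converts hh:mm:ss:ff into a frame count. It assumes an integer
- picture_rate and plain (non-drop-frame) counting; both are assumptions
- of this example, and timecode_to_frames is not a function of the encoder.

```python
def timecode_to_frames(timecode, picture_rate):
    """Convert 'hh:mm:ss:ff' into a frame count, assuming an integer rate."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    if not (0 <= mm < 60 and 0 <= ss < 60 and 0 <= ff < picture_rate):
        raise ValueError("field out of range in timecode %r" % timecode)
    return ((hh * 60 + mm) * 60 + ss) * picture_rate + ff


print(timecode_to_frames("00:00:01:00", 25))
```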
- /* N (# of frames in GOP) */
- This defines the distance between I frames (and 'Group of Pictures'
- headers). Common values are 15 for 30 Hz video and 12 for 25 Hz video.
- /* M (I/P frame distance) */
- Distance between consecutive I or P frames. Usually set to 3.
- N has to be a multiple of M. M = 1 means no B frames in the sequence.
- (in a future edition of this program, M=0 will mean only I frames).
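- The relationship between N and M can be pictured with the following
- Python sketch, which prints an idealised picture type pattern in the
- "I B1 B2 P" style used later in this document. It is a simplification:
- the function gop_pattern is hypothetical, and the way the real encoder
- orders pictures or treats the first GOP is not modelled here.

```python
def gop_pattern(n, m):
    """Return an idealised picture type pattern for one GOP of N frames."""
    if m < 1 or n < 1 or n % m != 0:
        raise ValueError("N has to be a positive multiple of M")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")      # one I frame per GOP
        elif i % m == 0:
            types.append("P")      # every M-th frame is a P frame
        else:
            types.append("B")      # frames in between are B frames
    return "".join(types)


print(gop_pattern(15, 3))
print(gop_pattern(12, 1))
```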
- /* ISO/IEC 11172-2 stream */
- Set to 1 if you want to generate an MPEG-1 sequence. In this case some
- of the subsequent MPEG-2 specific values are ignored.
- /* picture format */
- 0 selects frame picture coding, in which both fields of a frame are coded
- simultaneously; 1 selects field picture coding, where fields are coded
- separately. The latter is permitted for interlaced video only.
- /* horizontal_size */
- Pixel width of the frames. It does not need to be a multiple of 16.
- You have to provide a correct value even for PPM files (the PPM
- file header is currently ignored).
- /* vertical_size */
- Pixel height of the frames. It does not need to be a multiple of 16.
- You have to provide a correct value even for PPM files (the PPM file
- header is currently ignored).
- /* aspect_ratio_information */
- Defines the display aspect ratio. Legal values are:
- Code Meaning
- ---- --------------
- 1 square pels
- 2 4:3 display
- 3 16:9 display
- 4 2.21:1 display
- MPEG-1 uses a different coding of aspect ratios. In this case, codes
- 1 to 14 are valid.
- /* frame_rate_code */
- Defines the frame rate (for interlaced sequences: field rate is twice
- the frame rate). Legal values are:
- Code  Frames/sec  Meaning
- ----  ----------  -----------------------------------------------
- 1     24000/1001  23.976 fps -- NTSC encapsulated film rate
- 2     24          Standard international cinema film rate
- 3     25          PAL (625/50) video frame rate
- 4     30000/1001  29.97 -- NTSC video frame rate
- 5     30          NTSC drop-frame (525/60) video frame rate
- 6     50          double frame rate/progressive PAL
- 7     60000/1001  double frame rate NTSC
- 8     60          double frame rate drop-frame NTSC
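- The table above maps directly onto a lookup of exact rational rates.
- The Python sketch below uses the standard fractions module; the names
- FRAME_RATES, frame_rate and field_rate are chosen for this example only.

```python
from fractions import Fraction

# frame_rate_code -> frames per second, copied from the table above
FRAME_RATES = {
    1: Fraction(24000, 1001),
    2: Fraction(24),
    3: Fraction(25),
    4: Fraction(30000, 1001),
    5: Fraction(30),
    6: Fraction(50),
    7: Fraction(60000, 1001),
    8: Fraction(60),
}


def frame_rate(code):
    """Return the exact frame rate for a frame_rate_code."""
    if code not in FRAME_RATES:
        raise ValueError("illegal frame_rate_code %r" % code)
    return FRAME_RATES[code]


def field_rate(code):
    """For interlaced sequences the field rate is twice the frame rate."""
    return 2 * frame_rate(code)


print(float(frame_rate(4)), float(field_rate(3)))
```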
- /* bit_rate */
- A positive floating point value specifying the target bitrate.
- In units of bits/sec.
- /* vbv_buffer_size (in multiples of 16 kbit) */
- Specifies, according to the Video Buffering Verifier decoder model,
- the size of the bitstream input buffer required in downstream
- decoders in order for the sequence to be decoded without underflows or
- overflows. You probably will wish to leave this value at 112 for
- MPEG-2 Main Profile at Main Level, and 20 for Constrained Parameters
- Bitstreams MPEG-1.
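- As a worked example of the unit, the Python sketch below converts the
- parameter into bits, taking one unit as 16 x 1024 = 16384 bits (the
- same factor that appears in the constrained parameters list of this
- document). The helper names are assumptions of this example, and
- buffer_fill_time is only a back-of-the-envelope figure, not a VBV
- simulation.

```python
def vbv_buffer_bits(vbv_buffer_size):
    """Buffer size in bits for a vbv_buffer_size given in units of 16 kbit."""
    return vbv_buffer_size * 16 * 1024


def buffer_fill_time(vbv_buffer_size, bit_rate):
    """Seconds needed to fill the whole buffer at a constant bit_rate."""
    return vbv_buffer_bits(vbv_buffer_size) / bit_rate


print(vbv_buffer_bits(112))   # MPEG-2 Main Profile at Main Level value
print(vbv_buffer_bits(20))    # MPEG-1 constrained parameters value
```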
- /* low_delay */
- When set to 1, this flag specifies that the encoder operates in low delay
- mode. Essentially, no B pictures are coded and a different rate control
- strategy is adopted which allows picture skipping and VBV underflows.
- This feature has not yet been implemented. Please leave at zero for now.
- /* constrained_parameters_flag */
- Always 0 for MPEG-2. You may set this to 1 if you encode an MPEG-1
- sequence which meets the parameter limits defined in ISO/IEC 11172-2
- for constrained parameter bitstreams:
- horizontal_size <= 768
- vertical_size <= 576
- picture_area <= 396 macroblocks
- pixel_rate <= 396x25 macroblocks per second
- vbv_buffer_size <= 20x16384 bit
- bitrate <= 1856000 bits/second
- motion vector range <= -64...63.5
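- The limits listed above can be tested mechanically. The following
- Python sketch is one possible reading of the list: it counts
- macroblocks by rounding each dimension up to a multiple of 16 and
- expresses the motion vector limit as f_code <= 4 (the code whose range
- is -64...+63.5 in the f_code table of this document). The function
- names and argument list are hypothetical.

```python
def macroblocks(horizontal_size, vertical_size):
    """Number of 16x16 macroblocks needed to cover the picture."""
    return ((horizontal_size + 15) // 16) * ((vertical_size + 15) // 16)


def meets_constrained_parameters(horizontal_size, vertical_size, frame_rate,
                                 bit_rate, vbv_buffer_size, max_f_code):
    """Check the MPEG-1 constrained parameter limits listed above."""
    mb = macroblocks(horizontal_size, vertical_size)
    return (horizontal_size <= 768
            and vertical_size <= 576
            and mb <= 396                      # picture_area
            and mb * frame_rate <= 396 * 25    # macroblocks per second
            and vbv_buffer_size <= 20          # units of 16384 bit
            and bit_rate <= 1856000
            and max_f_code <= 4)               # range -64...+63.5


print(meets_constrained_parameters(352, 240, 30, 1150000, 20, 4))
print(meets_constrained_parameters(720, 480, 30, 1150000, 20, 4))
```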
- /* Profile ID */
- Specifies the subset of the MPEG-2 syntax required for decoding the
- sequence. All MPEG-2 sequences generated by the current version of
- the encoder are either Main Profile or Simple Profile sequences.
- Code  Meaning                     Typical use
- ----  --------------------------  ------------------------
- 1     High Profile                production equipment requiring 4:2:2
- 2     Spatially Scalable Profile  Simulcasting
- 3     SNR Scalable Profile        Simulcasting
- 4     Main Profile                95 % of TVs, VCRs, cable applications
- 5     Simple Profile              Low cost memory, e.g. no B pictures
- /* Level ID */
- Specifies coded parameter constraints, such as bitrate, sample rate, and
- maximum allowed motion vector range.
- Code  Meaning          Typical use
- ----  ---------------  -----------------------------------------------
- 4     High Level       HDTV production rates: e.g. 1920 x 1080 x 30 Hz
- 6     High 1440 Level  HDTV consumer rates: e.g. 1440 x 960 x 30 Hz
- 8     Main Level       CCIR 601 rates: e.g. 720 x 480 x 30 Hz
- 10    Low Level        SIF video rate: e.g. 352 x 240 x 30 Hz
- /* progressive_sequence */
- 0 in the case of a sequence containing interlaced video (e.g.
- video camera source), 1 for progressive video (e.g. film source).
- /* chroma_format */
- Specifies the resolution of chrominance data
- Code  Meaning
- ----  -------  ----------------------------------------
- 1     4:2:0    half resolution in both dimensions (most common format)
- 2     4:2:2    half resolution in horizontal direction (High Profile only)
- 3     4:4:4    full resolution (not allowed in any currently defined profile)
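- The effect of chroma_format on the size of the .U and .V data can be
- worked out as below. The Python sketch assumes even picture dimensions
- and 8 bit samples (as in the headerless input formats described
- earlier); the names CHROMA_SUBSAMPLING, chroma_plane_size and
- frame_bytes are made up for this example.

```python
# chroma_format code -> (horizontal divisor, vertical divisor)
CHROMA_SUBSAMPLING = {
    1: (2, 2),   # 4:2:0, half resolution in both dimensions
    2: (2, 1),   # 4:2:2, half resolution horizontally
    3: (1, 1),   # 4:4:4, full resolution
}


def chroma_plane_size(width, height, chroma_format):
    """Width and height of one chrominance plane (U or V)."""
    h_div, v_div = CHROMA_SUBSAMPLING[chroma_format]
    return width // h_div, height // v_div


def frame_bytes(width, height, chroma_format):
    """Bytes in one headerless 8 bit frame: one Y plane plus U and V."""
    chroma_width, chroma_height = chroma_plane_size(width, height, chroma_format)
    return width * height + 2 * chroma_width * chroma_height


print(chroma_plane_size(720, 480, 1), frame_bytes(352, 240, 1))
```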
- /* video_format: 0=comp., 1=PAL, 2=NTSC, 3=SECAM, 4=MAC, 5=unspec. */
- /* color_primaries */
- Specifies the x, y chromaticity coordinates of the source primaries.
- Code Meaning
- ---- -------
- 1 ITU-R Rec. 709 (1990)
- 2 unspecified
- 4 ITU-R Rec. 624-4 System M
- 5 ITU-R Rec. 624-4 System B, G
- 6 SMPTE 170M
- 7 SMPTE 240M (1987)
- /* transfer_characteristics */
- Specifies the opto-electronic transfer characteristic of the source picture.
- Code Meaning
- ---- -------
- 1 ITU-R Rec. 709 (1990)
- 2 unspecified
- 4 ITU-R Rec. 624-4 System M
- 5 ITU-R Rec. 624-4 System B, G
- 6 SMPTE 170M
- 7 SMPTE 240M (1987)
- 8 linear transfer characteristics
- /* matrix_coefficients */
- Specifies the matrix coefficients used in deriving luminance and chrominance
- signals from the green, blue, and red primaries.
- Code Meaning
- ---- -------
- 1 ITU-R Rec. 709 (1990)
- 2 unspecified
- 4 FCC
- 5 ITU-R Rec. 624-4 System B, G
- 6 SMPTE 170M
- 7 SMPTE 240M (1987)
- /* display_horizontal_size */
- /* display_vertical_size */
- Display_horizontal_size and display_vertical_size specify the "intended
- display's" active region (which may be smaller or larger than the
- encoded frame size).
- /* intra_dc_precision */
- Specifies the effective precision of the DC coefficient in MPEG-2
- intra coded macroblocks. 10 bits usually achieves quality saturation.
- Code Meaning
- ---- -----------------
- 0 8 bit
- 1 9 bit
- 2 10 bit
- 3 11 bit
- /* top_field_first */
- Specifies which of the two fields of an interlaced frame comes earlier.
- The top field corresponds to what is often called the "odd field," and
- the bottom field is also sometimes called the "even field."
- Code Meaning
- ---- -----------------
- 0 bottom field first
- 1 top field first
- /* frame_pred_frame_dct (I P B) */
- Setting this parameter to 1 restricts motion compensation to frame
- prediction and DCT to frame DCT. You have to specify this separately for
- I, P and B picture types.
- /* concealment_motion_vectors (I P B) */
- Setting these three flags tells the encoder whether or not to generate
- concealment motion vectors for intra coded macroblocks in the
- three respective coded picture types. This feature is mostly useful
- in Intra-coded pictures, but may also be used in low-delay applications
- (which attempt to use P pictures exclusively for video signal refresh,
- saving the time it takes to download a coded Intra picture across a
- channel). Concealment motion vectors in B pictures are rather pointless
- since there is no error propagation from B pictures. This feature is
- currently not implemented. Please leave values at zero.
- /* q_scale_type (I P B) */
- These flags set linear (0) or non-linear (1) quantization scale type
- for the three respective picture types.
- /* intra_vlc_format (I P B) */
- Selects one of the two variable length coding tables for intra coded blocks.
- Table 1 is considered to be statistically optimized for Intra coded
- pictures coded within the sweet spot range (e.g. 0.3 to 0.6 bit/pixel)
- of MPEG-2.
- Code Meaning
- ---- -----------------
- 0 table 0 (= MPEG-1)
- 1 table 1
- /* alternate_scan (I P B) */
- Selects one of two entropy scanning patterns defining the order in
- which quantized DCT coefficients are run-length coded. The alternate
- scanning pattern is considered to be better suited for interlaced video
- where the encoder does not employ sophisticated forward quantization
- (as is the case in our current encoder).
- Code Meaning
- ---- -----------------
- 0 Zig-Zag scan (= MPEG-1)
- 1 Alternate scan
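- For readers unfamiliar with the zig-zag pattern (code 0), the Python
- sketch below generates the conventional 8x8 zig-zag order as a list of
- raster indices (row * 8 + column). It is an illustration only: the
- function zigzag_order is not taken from the encoder, and the alternate
- scan (code 1) is a fixed table defined in the standard that is not
- reproduced here.

```python
def zigzag_order(size=8):
    """Raster indices of a size x size block in zig-zag scan order."""
    positions = [(row, col) for row in range(size) for col in range(size)]

    def key(position):
        row, col = position
        diagonal = row + col
        # Odd anti-diagonals run top-right to bottom-left (row increasing),
        # even ones run bottom-left to top-right (row decreasing).
        return (diagonal, row if diagonal % 2 else -row)

    return [row * size + col for row, col in sorted(positions, key=key)]


order = zigzag_order()
print(order[:16])
```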
- /* repeat_first_field */
- If set to one, the first field of a frame is repeated after the second by
- the display process. The exact function depends on progressive_sequence
- and top_field_first. repeat_first_field is mainly intended to serve as
- a signal for the Decoder's Display Process to perform 3:2 pulldown.
- /* progressive_frame */
- Specifies whether the frames are interlaced (0) or progressive (1).
- MPEG-2 permits mixing of interlaced and progressive video. The encoder
- currently only supports either interlaced or progressive video.
- progressive_frame is therefore constant for all frames and usually
- set identical to progressive_sequence.
- /* intra_slice refresh picture period (P factor) */
- This value indicates the number of successive P pictures in which
- all slices (macroblock rows in our encoder model) are refreshed
- with intra coded macroblocks. This feature assists low delay mode
- coding. It is currently not implemented.
- /* rate control: r (reaction parameter) */
- /* rate control: avg_act (initial average activity) */
- /* rate control: Xi (initial I frame global complexity measure) */
- /* rate control: Xp (initial P frame global complexity measure) */
- /* rate control: Xb (initial B frame global complexity measure) */
- /* rate control: d0i (initial I frame virtual buffer fullness) */
- /* rate control: d0p (initial P frame virtual buffer fullness) */
- /* rate control: d0b (initial B frame virtual buffer fullness) */
- These parameters modify the behavior of the rate control scheme. Usually
- set them to 0, in which case default values are computed by the encoder.
- /* P: forw_hor_f_code forw_vert_f_code search_width/height */
- /* B1: forw_hor_f_code forw_vert_f_code search_width/height */
- /* B1: back_hor_f_code back_vert_f_code search_width/height */
- /* B2: forw_hor_f_code forw_vert_f_code search_width/height */
- /* B2: back_hor_f_code back_vert_f_code search_width/height */
- This set of parameters specifies the maximum length of the motion
- vectors. If this length is set smaller than the actual movement
- of objects in the picture, motion compensation becomes ineffective
- and picture quality drops. If it is set too large, an excessive
- number of bits is allocated for motion vector transmission, indirectly
- reducing picture quality, too.
- All f_code values have to be in the range 1 to 9 (1 to 7 for MPEG-1),
- which translate into maximum motion vector lengths as follows:
- code   range (inclusive)    max search width/height
- ===================================================
- 1        -8 ...    +7.5        7
- 2       -16 ...   +15.5       15
- 3       -32 ...   +31.5       31
- 4       -64 ...   +63.5       63
- 5      -128 ...  +127.5      127
- 6      -256 ...  +255.5      255
- 7      -512 ...  +511.5      511
- 8     -1024 ... +1023.5     1023
- 9     -2048 ... +2047.5     2047
- f_code is specified individually for each picture type (P,Bn), direction
- (forward prediction, backward prediction) and component (horizontal,
- vertical). Bn is the n'th B frame surrounded by I or P frames
- (e.g.: I B1 B2 B3 P B1 B2 B3 P ...).
- For MPEG-1 sequences, horizontal and vertical f_code have to be
- identical and the range is restricted to 1...7.
- P frame values have to be specified if N (N = # of frames in GOP) is
- greater than 1 (otherwise the sequence contains only I frames).
- M - 1 (M = distance between I/P frames) sets (two lines each) of values
- have to be specified for B frames. The first line of each set defines
- values for forward prediction (i.e. from a past frame), the second
- line those for backward prediction (from a future frame).
- search_width and search_height set the (half) width of the window used
- for motion estimation. The encoder currently employs exhaustive
- integer vector block matching. Execution time for this algorithm depends
- on the product of search_width and search_height and, to a large extent,
- determines the speed of the encoder. Therefore these values have to be
- chosen carefully.
- Here is an example of how to set these values, assuming a maximum
- motion of 10 pels per frame in horizontal and 5 pels per frame in
- vertical direction and M=3 (I B1 B2 P):
- search width / height:
-   forward    hor.  vert.    backward   hor.  vert.
-   I -> B1     10     5      B1 <- P     20    10
-   I -> B2     20    10      B2 <- P     10     5
-   I -> P      30    15
- f_code values are then selected as the smallest ones resulting in a range
- larger than the search widths / heights:
- 3 2 30 15 /* P: forw_hor_f_code forw_vert_f_code search_width/height */
- 2 1 10 5 /* B1: forw_hor_f_code forw_vert_f_code search_width/height */
- 3 2 20 10 /* B1: back_hor_f_code back_vert_f_code search_width/height */
- 3 2 20 10 /* B2: forw_hor_f_code forw_vert_f_code search_width/height */
- 2 1 10 5 /* B2: back_hor_f_code back_vert_f_code search_width/height */
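- The selection rule used in this example can be written down as a short
- Python sketch. It scales the per-frame motion by the temporal distance
- to the reference frame and then picks the smallest f_code whose "max
- search width/height" entry (7, 15, 31, ..., i.e. 2^(code+2) - 1 in the
- table above) covers the search size. The function names are
- hypothetical and the sketch is not part of the encoder.

```python
def smallest_f_code(search, mpeg1=False):
    """Smallest f_code whose maximum search size is at least 'search'."""
    max_code = 7 if mpeg1 else 9
    for code in range(1, max_code + 1):
        if search <= 2 ** (code + 2) - 1:
            return code
    raise ValueError("search size %d exceeds the largest f_code range" % search)


def motion_lines(max_hor, max_vert, m):
    """(label, hor_f_code, vert_f_code, search_width, search_height) per line."""
    def line(label, distance):
        width, height = max_hor * distance, max_vert * distance
        return (label, smallest_f_code(width), smallest_f_code(height),
                width, height)

    lines = [line("P forw", m)]
    for n in range(1, m):
        lines.append(line("B%d forw" % n, n))       # past reference, n frames away
        lines.append(line("B%d back" % n, m - n))   # future reference
    return lines


for entry in motion_lines(10, 5, 3):
    print(entry)
```

- With a maximum motion of 10 and 5 pels per frame and M=3, the sketch
- yields the same five lines of values as the example above.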
- 4. Interpreting the status information
- ======================================
- 4.1 Description of the macroblock_type map
- The status information routine prints a two-character code for
- each macroblock in the picture currently being coded:
- First character:
- S: skipped
- I: intra coded
- 0: forward prediction without motion compensation
- F: forward frame/16x8 prediction (in frame/field pictures resp.)
- f: forward field prediction
- p: dual prime prediction
- B: backward frame/16x8 prediction
- b: backward field prediction
- D: frame/16x8 interpolation
- d: field interpolation
- Second character:
- space: coded, no quantization change
- Q: coded, quantization change
- N: not coded (only predicted)
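- When reading long status listings it can help to expand the
- two-character codes automatically. The Python sketch below simply
- encodes the two lists above as dictionaries; describe_macroblock is a
- hypothetical helper, not an encoder function.

```python
FIRST_CHARACTER = {
    "S": "skipped",
    "I": "intra coded",
    "0": "forward prediction without motion compensation",
    "F": "forward frame/16x8 prediction",
    "f": "forward field prediction",
    "p": "dual prime prediction",
    "B": "backward frame/16x8 prediction",
    "b": "backward field prediction",
    "D": "frame/16x8 interpolation",
    "d": "field interpolation",
}

SECOND_CHARACTER = {
    " ": "coded, no quantization change",
    "Q": "coded, quantization change",
    "N": "not coded (only predicted)",
}


def describe_macroblock(code):
    """Expand a two-character macroblock_type code into plain text."""
    if len(code) != 2:
        raise ValueError("expected a two-character code")
    return FIRST_CHARACTER[code[0]] + "; " + SECOND_CHARACTER[code[1]]


print(describe_macroblock("FQ"))
```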
- 4.2 Description of the mquant map
- The mquant map displays the dynamically changing quantization
- parameter as determined by the rate control algorithm. Larger
- values correspond to coarser quantization. Values that are not displayed
- indicate the same quantization as in the previous macroblock.
- Note that for MPEG-1 sequences the displayed values are twice
- as large as the quant_scale parameter defined in ISO 11172-2.
- 5. Encoder model
- ================
- The encoder model describes the "algorithm" or "intelligence" of the
- encoder.
- 5.1 Motion estimation
- Exhaustive search with an L1 (Minimum Absolute Difference) matching
- criterion on the original (non-coded) reference frame. Performed only
- on luminance samples. Half pel search is done on a neighborhood of 8
- bi-linearly interpolated luminance samples from the reconstructed
- reference frame.
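- The integer-pel stage described above amounts to a full search over a
- window with a sum-of-absolute-differences cost. The Python sketch below
- shows that idea on plain lists of luminance samples. It is a minimal
- illustration with hypothetical function names; it omits the half pel
- refinement and any speed optimizations, and it is not the encoder's
- actual code.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, cx, cy, size, search_width, search_height):
    """Try every integer vector in the window; return (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_height, search_height + 1):
        for dx in range(-search_width, search_width + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic test data: 'cur' is 'ref' displaced by 2 pels horizontally and 1 vertically.
ref = [[(x * 7 + y * 13 + x * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[(y + 1) % 16][(x + 2) % 16] for x in range(16)] for y in range(16)]
result = full_search(cur, ref, 4, 4, 4, 3, 3)
print(result)
```

- The number of candidates grows with the product of the two search
- sizes, which is why the search_width and search_height parameters
- dominate encoding time.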
- 5.2 Rate control and adaptive quantization
- The encoder adheres to a single-pass coding philosophy (no a priori
- measurements guide the allocation of bits at the global layers).
- Target bit allocation for the current picture being encoded is based
- on global bit budget for the Group of Pictures (dependently coded
- sequence of pictures), and a ratio of weighted relative coding
- complexities of the three picture types. Coding complexity (X) is
- estimated as the product of a picture's average macroblock quantization
- step size (Q) and the number of bits (S) generated by coding the picture
- (i.e. X = Q*S).
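- The following Python sketch illustrates how complexities of this kind
- can be turned into a per-picture bit target. The formulas and the
- constants K_P = 1.0 and K_B = 1.4 are written from the usual description
- of Test Model 5 [2] rather than from this document, so treat them as
- assumptions to be checked against [2]; the lower bound that TM5 places
- on the target is left out, and all names are hypothetical.

```python
K_P = 1.0   # assumed TM5 weighting constant for P pictures
K_B = 1.4   # assumed TM5 weighting constant for B pictures


def complexity(average_q, bits):
    """Global complexity measure X = Q * S of a coded picture."""
    return average_q * bits


def target_bits(picture_type, remaining_bits, n_p, n_b, x_i, x_p, x_b):
    """Bit target for the next picture.

    remaining_bits: bits left for the current GOP
    n_p, n_b:       P and B pictures still to be coded in the GOP
    x_i, x_p, x_b:  most recent complexities per picture type
    """
    if picture_type == "I":
        return remaining_bits / (1 + n_p * x_p / (x_i * K_P)
                                 + n_b * x_b / (x_i * K_B))
    if picture_type == "P":
        return remaining_bits / (n_p + n_b * K_P * x_b / (K_B * x_p))
    if picture_type == "B":
        return remaining_bits / (n_b + n_p * K_B * x_p / (K_P * x_b))
    raise ValueError("picture_type must be 'I', 'P' or 'B'")


print(target_bits("I", 4375000, 4, 10, 160, 60, 42))
```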
- Local bit allocation is based on two measurements:
- 1. deviancy from estimated buffer fullness for the Nth macroblock
- 2. normalized spatial activity
- The picture is assumed to have a uniform distribution of bits. If the
- local trend of generated bits begins to stray from this estimate, a
- compensation factor emerges to modulate the macroblock quantization
- scale (mquant). The compensation factor is the difference between where
- the buffer fullness was predicted to be at the Nth macroblock and where
- the buffer fullness truly is.
- Local spatial activity is estimated as the minimum variance of the
- four luminance 8x8 frame blocks and the corresponding four luminance 8x8
- field blocks of the original picture being coded. The variance measure
- is then normalized against the average variance of the most recently coded
- picture. The local activity measure and the buffer compensation factor
- are combined as a scaled product to give the final mquant value, which
- is used in the quantization stage of the subsequent block coding
- processes.
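- A compact way to see how the two measurements interact is the Python
- sketch below. The specific expressions (activity = 1 + minimum
- variance, the (2a + avg)/(a + 2avg) normalization, the factor 31/r and
- the clipping to 1...31) follow the usual description of Test Model 5 [2]
- and are assumptions here, not statements taken from this document. The
- function names are hypothetical, and the real encoder may scale or clip
- mquant differently.

```python
def variance(block):
    """Variance of a flat list of samples."""
    mean = sum(block) / len(block)
    return sum((value - mean) ** 2 for value in block) / len(block)


def spatial_activity(blocks):
    """1 + minimum variance over the frame and field 8x8 luminance blocks."""
    return 1 + min(variance(block) for block in blocks)


def normalized_activity(act, avg_act):
    """Map activity into roughly 0.5...2 relative to the previous picture."""
    return (2 * act + avg_act) / (act + 2 * avg_act)


def mquant(buffer_fullness, reaction, act, avg_act):
    """Combine buffer state and activity into a quantization scale."""
    q = buffer_fullness * 31 / reaction
    return max(1, min(31, round(q * normalized_activity(act, avg_act))))


# A macroblock of average activity leaves the buffer-derived value unchanged.
print(mquant(1000, 3100, 400, 400))
```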
- 5.3 Decision
- The macroblock decision process selects the various prediction
- switches (yes/no coding of prediction error, yes/no motion
- compensation) which provide the best quality for the relative coding
- bit cost. The decision process follows the piecewise path of least
- distortion, assuming that the decision stages (whether to use forward
- compensation, backward compensation, or none at all, etc.) are mostly
- uncorrelated (i.e. can be decided independently).
- 5.4 Macroblock coding
- The forward DCT is based on the fast Chen-Wang 1984 algorithm.
- Quantization is a mirror of the normative inverse quantizer, as
- specified in the MPEG-2 Test Model. No special dithering or
- entropy-constrained methods are currently employed.
- 6. References
- =============
- [1] ISO/IEC 13818 Draft International Standard: Generic Coding of
- Moving Pictures and Associated Audio, Part 2: Video.
- [2] Test Model 5, ISO/IEC JTC1/SC29/WG11/N0400, MPEG93/457, April 1993.