
# H.264 Concepts and Technical Details

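• frame number: signalled in the bitstream, but it does not correspond one-to-one to decoding order. Output (display) order is defined by the POC (Picture Order Count).
• reference pictures: previously coded pictures, organized into two lists, list 0 (used by P and B slices) and list 1 (used by B slices only).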
• macroblock (MB): a block of 16x16 luma samples, the basic processing unit.
• slice: macroblocks are arranged in slices, which are sets of macroblocks in raster scan order.
• I slice: contains only I macroblocks, each predicted using intra prediction from decoded samples in the current slice.
• P slice: may contain P and I macroblock types (and also skipped macroblocks).
• B slice: may contain B and I macroblock types.
• P macroblock: predicted using inter prediction from one reference picture in list 0.
• B macroblock: predicted using inter prediction from one or two reference pictures, one from list 0 and/or one from list 1.
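• macroblock partitions for inter-coded macroblocks: a macroblock can be split into partitions of 16x16, 16x8, 8x16 or 8x8 luma samples. If it is split into four 8x8 partitions, each 8x8 can be split further into 8x8, 8x4, 4x8 or 4x4 (by drawing a horizontal or vertical line through the middle); these are called sub-macroblock partitions. The prediction type is chosen per macroblock. The reference picture is chosen per macroblock partition, so all sub-macroblock partitions within one 8x8 partition share the same reference picture.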
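The tree-structured splitting described in the last bullet can be sketched in a few lines of Python. This is only an illustration: the function names `split` and `macroblock_partitions` and the `(x, y, width, height)` rectangle representation are assumptions made for the example, not anything defined by the standard.

```python
MB_SIZE = 16  # a macroblock covers 16x16 luma samples


def split(x, y, w, h, pw, ph):
    """Tile the w x h block at (x, y) with pw x ph partitions, in raster order."""
    return [(x + dx, y + dy, pw, ph)
            for dy in range(0, h, ph)
            for dx in range(0, w, pw)]


def macroblock_partitions(mb_shape, sub_shapes=None):
    """Return (x, y, width, height) rectangles for one inter-coded macroblock.

    mb_shape   -- (w, h): one of (16, 16), (16, 8), (8, 16), (8, 8)
    sub_shapes -- only for (8, 8): four (w, h) shapes, one per 8x8 partition,
                  each one of (8, 8), (8, 4), (4, 8), (4, 4)
    """
    parts = split(0, 0, MB_SIZE, MB_SIZE, *mb_shape)
    if mb_shape != (8, 8):
        return parts
    sub_shapes = sub_shapes or [(8, 8)] * 4
    result = []
    for (x, y, w, h), shape in zip(parts, sub_shapes):
        # each 8x8 partition may be split again into sub-macroblock partitions
        result.extend(split(x, y, w, h, *shape))
    return result


two_halves = macroblock_partitions((16, 8))
mixed = macroblock_partitions((8, 8), [(8, 8), (8, 4), (4, 8), (4, 4)])
print(two_halves)
print(len(mixed), "partitions, each with its own motion vector")
```

Whatever the choice, the rectangles always tile the full 16x16 area; what changes is how many motion vectors have to be coded.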

Performance limits for codecs are defined by a set of Levels, each placing limits on parameters such as sample processing rate, picture size, coded bitrate and memory requirements.

• VCL: Video Coding Layer
• NAL: Network Abstraction Layer

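A slice contains at least one macroblock and at most all the macroblocks of a picture (1 slice per picture). The slices of a picture do not need to contain the same number of macroblocks. There is minimal inter-dependency between slices, which limits how far an error can propagate.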

DPB: Decoded Picture Buffer

IDR: Instantaneous Decoder Refresh. An IDR picture is made up of I or SI slices and is used to clear the contents of the reference picture buffer. When the decoder receives an IDR picture, it marks all pictures in the reference buffer as unused for reference. The first picture in a coded video sequence is always an IDR picture.
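The marking behaviour on an IDR picture can be sketched as follows. This is a minimal illustration, not a decoder: `DecodedPicture`, `on_picture_decoded` and the rule that every decoded picture is stored as a reference picture are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DecodedPicture:
    poc: int                         # picture order count of the stored picture
    used_for_reference: bool = True  # simplification: every stored picture starts as a reference


def on_picture_decoded(dpb, poc, is_idr):
    """Append a newly decoded picture to the buffer (a plain list)."""
    if is_idr:
        # An IDR picture marks everything already in the buffer as unused for reference.
        for pic in dpb:
            pic.used_for_reference = False
    dpb.append(DecodedPicture(poc))
    return dpb


dpb = []
on_picture_decoded(dpb, poc=0, is_idr=True)   # the first picture is an IDR picture
on_picture_decoded(dpb, poc=2, is_idr=False)
on_picture_decoded(dpb, poc=0, is_idr=True)   # a second IDR picture clears the references
usable = [pic.poc for pic in dpb if pic.used_for_reference]
print(usable)
```

After the second IDR picture, only that picture itself remains available for prediction, so nothing decoded later can depend on anything before it.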

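ASO: Arbitrary Slice Order. Slices in a coded picture may follow any decoding order. ASO is in use whenever the first macroblock of a slice has a smaller address than the first macroblock of a slice decoded before it in the same picture.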
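That condition is easy to express as a check over the first-macroblock addresses of the slices, listed in decoding order. The helper name `uses_aso` and its list input are hypothetical, chosen only for this sketch.

```python
def uses_aso(first_mb_addresses):
    """True if some slice starts at a lower macroblock address than the slice decoded just before it.

    first_mb_addresses -- address of the first macroblock of each slice of one
                          picture, in decoding order
    """
    pairs = zip(first_mb_addresses, first_mb_addresses[1:])
    return any(current < previous for previous, current in pairs)


in_order = uses_aso([0, 40, 80])    # slices arrive in raster order
reordered = uses_aso([40, 0, 80])   # the second slice starts before the first one
print(in_order, reordered)
```

Comparing only neighbouring slices is enough: if any slice starts before some earlier slice, the sequence of addresses must decrease somewhere.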

Slice groups: I have not understood this part yet...

Tree Structured Motion Compensation

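For the chroma components, the partitions are scaled down proportionally (in 4:2:0, each chroma block has half the width and half the height of the luma partition), and the motion vectors are scaled in the same way.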

A six-tap Finite Impulse Response (FIR) filter with weights (1/32, -5/32, 5/8, 5/8, -5/32, 1/32) is used for interpolation. Interpolation is needed for sub-sample motion compensation, and this filter produces the half-pel samples. Quarter-pel samples are then produced by linear interpolation. (Motion vectors can therefore take fractional values, in units of a quarter sample.)
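In integer arithmetic the filter weights become (1, -5, 20, 20, -5, 1) followed by a rounding shift by 5 bits. The sketch below shows the simple one-dimensional case for 8-bit samples; the function names are illustrative, and the half-pel position in the centre of four integer samples, which is filtered from intermediate values, is not covered.

```python
def clip_sample(value, max_value=255):
    """Clamp a filtered value to the valid sample range (8-bit assumed)."""
    return max(0, min(max_value, value))


def half_pel(e, f, g, h, i, j):
    """Half-sample value between g and h, from six neighbouring integer samples."""
    return clip_sample((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-sample value: rounded average of two neighbouring samples."""
    return (p + q + 1) >> 1


row = [10, 10, 10, 40, 40, 40]      # an edge between two flat areas
b = half_pel(*row)                  # half-pel position between 10 and 40
a = quarter_pel(row[2], b)          # quarter-pel position between 10 and b
print(b, a)
```

The negative taps let the filter follow edges more sharply than a plain average would, which is why the result has to be clipped back into the sample range.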

Motion Vector Prediction

Intra Prediction

A 4x4 luma block has 9 prediction modes, a 16x16 luma block has 4 modes, and the chroma components have 4 modes.

Mode 3 Plane: A linear plane function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly-varying luminance.

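Chroma prediction modes (a 16x16 luma block uses the same four predictors, but numbered 0 vertical, 1 horizontal, 2 DC, 3 plane):

• mode 0: DC
• mode 1: horizontal
• mode 2: vertical
• mode 3: plane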
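The DC, horizontal and vertical predictors are simple enough to sketch directly. The code below assumes an N x N block whose upper and left neighbours are both available, and it leaves out the plane mode; the function names are made up for this example.

```python
def predict_vertical(top, left):
    """Every row is a copy of the samples directly above the block."""
    return [list(top) for _ in range(len(top))]


def predict_horizontal(top, left):
    """Every row is filled with the sample directly to its left."""
    n = len(left)
    return [[left[row]] * n for row in range(n)]


def predict_dc(top, left):
    """Every sample is the rounded mean of the upper and left neighbours."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


top = [1, 2, 3, 4]    # reconstructed samples above the block
left = [5, 6, 7, 8]   # reconstructed samples to the left of the block
dc_block = predict_dc(top, left)
v_block = predict_vertical(top, left)
h_block = predict_horizontal(top, left)
print(dc_block[0], v_block[1], h_block[2])
```

An encoder would typically build each candidate prediction, compare it with the actual block, and keep the mode that leaves the cheapest residual to code.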

Deblocking Filter

A filter is applied to each decoded macroblock to reduce blocking distortion. The deblocking filter is applied just after the inverse transform. The filter smooths block edges, improving the appearance of decoded frames.

When QP is larger, blocking distortion is likely to be more significant.

H.264 has three transforms:

• Hadamard transform for the 4x4 array of luma DC coefficients in intra MB predicted in 16x16 mode,
• Hadamard transform for the 2x2 array of chroma DC coefficients,
• a DCT-based transform for all other 4x4 blocks in residual data.
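As a small illustration of the second item, a 2x2 Hadamard transform multiplies the block of chroma DC coefficients on both sides by the matrix [[1, 1], [1, -1]]. The helper names below are assumptions for this sketch, and quantization is left out.

```python
H2 = [[1, 1],
      [1, -1]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def hadamard_2x2(dc):
    """2x2 Hadamard transform of a block of four DC coefficients."""
    return matmul(matmul(H2, dc), H2)


dc_block = [[1, 2],
            [3, 4]]
coeffs = hadamard_2x2(dc_block)
restored = hadamard_2x2(coeffs)   # the same transform again gives 4 * dc_block
print(coeffs, restored)
```

Because the matrix contains only +1 and -1, the transform needs additions and subtractions only, and applying it twice returns the input scaled by 4.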

Transmission order of the transformed data:

The H.264 transform is based on the DCT, but it differs from it in important ways.

The 4x4 DCT is computed as follows:

$$Y=AXA^T= \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix} [X] \begin{bmatrix} a & b & a & c \\ a & c & -a & -b \\ a & -c & -a & b \\ a & -b & a & -c \end{bmatrix}$$

\begin{align} Y &= (CXC^T)*E \\ &= \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & d & -d & -1 \\ 1 & -1 & -1 & 1 \\ d & -1 & 1 & -d \end{bmatrix} [X] \begin{bmatrix} 1 & 1 & 1 & d \\ 1 & d & -1 & -1 \\ 1 & -d & -1 & 1 \\ 1 & -1 & 1 & -d \end{bmatrix} \right) * \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix} \end{align}

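Here $$CXC^T$$ is the core 2D transform, E is a matrix of scaling factors, and * denotes element-by-element multiplication. The constants are those of the DCT: $$a=\frac{1}{2}$$, $$b=\sqrt{\frac{1}{2}}\cos\left(\frac{\pi}{8}\right)$$ and $$c=\sqrt{\frac{1}{2}}\cos\left(\frac{3\pi}{8}\right)$$, so $$d=\frac{c}{b}\approx 0.414$$.

To simplify the computation, d is approximated by 0.5, and b is changed to $$\sqrt{\frac{2}{5}}$$ so that the transform stays orthogonal. Scaling the second and fourth rows of C by 2, and compensating in E, avoids multiplications by 1/2 and gives the final forward transform: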

\begin{align} Y &= (C_fXC_f^T)*E_f \\ &= \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} [X] \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) * \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix} \end{align}

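• The transform is computed entirely in integer arithmetic (16 bits), so no precision is lost when decoding: there is zero mismatch between the encoder's and the decoder's inverse transform.
• The core part of the transform can be implemented using only additions and shifts.
• The final scaling by the matrix E is folded into quantization, which reduces the number of multiplications.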
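The claim that the core transform needs only additions and shifts can be checked with a short sketch. It computes $C_fXC_f^T$ twice, once by matrix multiplication and once with a butterfly that uses only adds, subtracts and one-bit shifts. The function names and the sample block are arbitrary choices for the example, and the scaling by $E_f$ is left out.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def core_transform_matrix(x):
    """Reference version: Cf * X * Cf^T by matrix multiplication."""
    return matmul(matmul(CF, x), transpose(CF))


def butterfly_1d(x0, x1, x2, x3):
    """One-dimensional transform using only additions, subtractions and shifts."""
    s0, s1 = x0 + x3, x1 + x2
    d0, d1 = x0 - x3, x1 - x2
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def core_transform_fast(x):
    rows = [butterfly_1d(*row) for row in x]            # transform every row
    cols = [butterfly_1d(*col) for col in zip(*rows)]   # then every column
    return transpose(cols)


block = [[5, 11, 8, 10],
         [9, 8, 4, 12],
         [1, 10, 11, 4],
         [19, 6, 15, 7]]
same = core_transform_fast(block) == core_transform_matrix(block)
flat = core_transform_fast([[1] * 4 for _ in range(4)])
print(same, flat[0])
```

For a flat block all of the energy ends up in the top-left (DC) coefficient, which is the behaviour expected from a DCT-like transform.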

The inverse transform is computed as follows, where Y is the block of coefficients and X' is the reconstructed block:

\begin{align} X' &= C_i^T(Y*E_i)C_i \\ &= \begin{bmatrix} 1 & 1 & 1 & \frac{1}{2} \\ 1 & \frac{1}{2} & -1 & -1 \\ 1 & -\frac{1}{2} & -1 & 1 \\ 1 & -1 & 1 & -\frac{1}{2} \end{bmatrix} \left([Y]* \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}\right) \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \frac{1}{2} & -\frac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \frac{1}{2} & -1 & 1 & -\frac{1}{2} \end{bmatrix} \end{align}
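As a sanity check, the forward transform with $C_f$ and $E_f$ followed by the inverse transform with $C_i$ and $E_i$ should return the original block when no quantization is applied. The sketch below verifies this in floating point, using a = 1/2 and b = sqrt(2/5); the helper names are assumptions, and a real codec would fold the scaling into integer quantization tables instead.

```python
import math

A = 0.5
B = math.sqrt(2 / 5)

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def elementwise(a, b):
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def scaling_matrix(p, q):
    """Outer product of (p, q, p, q) with itself: the pattern of E_f and E_i."""
    v = [p, q, p, q]
    return [[r * c for c in v] for r in v]


EF = scaling_matrix(A, B / 2)   # entries a^2, ab/2, b^2/4
EI = scaling_matrix(A, B)       # entries a^2, ab, b^2


def forward(x):
    return elementwise(matmul(matmul(CF, x), transpose(CF)), EF)


def inverse(y):
    return matmul(matmul(transpose(CI), elementwise(y, EI)), CI)


block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
restored = inverse(forward(block))
max_error = max(abs(r - o) for rr, ro in zip(restored, block) for r, o in zip(rr, ro))
print(max_error)
```

The remaining error is pure floating-point rounding; in a codec the only real loss comes from the quantization step between the two transforms.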

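In B slices, the two reference pictures may be one from the past and one from the future, both from the past, or both from the future. B slices use list 0 and list 1. Both lists can contain short-term and long-term reference pictures, and both can contain past and future pictures.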

Prediction Options:

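• Direct: no motion vector is transmitted; the decoder derives it. Skipped macroblocks in B slices are reconstructed this way.
• List 0: prediction from a list 0 reference picture
• List 1: prediction from a list 1 reference picture
• Bi-prediction: one reference area is taken from list 0 and one from list 1, and the two are averaged: $$pred(i,j)=(pred0(i,j)+pred1(i,j)+1)>>1$$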
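The bi-prediction average is a rounded mean computed with a shift. A direct Python transcription of the formula, with the hypothetical name `bi_predict` and blocks stored as lists of rows, looks like this:

```python
def bi_predict(pred0, pred1):
    """Rounded average of the list 0 and list 1 prediction blocks."""
    return [[(p0 + p1 + 1) >> 1 for p0, p1 in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]


pred0 = [[10, 20],
         [30, 41]]
pred1 = [[20, 21],
         [30, 40]]
pred = bi_predict(pred0, pred1)
print(pred)
```

The `+ 1` before the shift makes the division by two round to the nearest integer, with halves rounded up, instead of always rounding down.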

weighted prediction

Each prediction sample pred0(i,j) or pred1(i,j) is scaled by a weighting factor w0 or w1 prior to motion-compensated prediction. In the explicit types, the weighting factors are determined by the encoder and transmitted in the slice header. If implicit prediction is used, w0 and w1 are calculated based on the relative temporal positions of the list 0 and list 1 reference pictures: the closer a reference picture is to the current picture, the larger its weight. A typical use case is a fade transition, where one scene fades into another.
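A simplified sketch of explicitly weighted bi-prediction for a single sample is shown below. It is modelled on the general shape of the weighted formula (weights expressed in units of 1/2^log_wd, with rounding and clipping), but the names `log_wd` and `offset` and the single combined offset are simplifying assumptions for the example.

```python
def weighted_bi_predict(p0, p1, w0, w1, log_wd=5, offset=0, max_value=255):
    """Weighted combination of one list 0 sample and one list 1 sample.

    With w0 == w1 == 2**log_wd this reduces to the plain average (p0 + p1 + 1) >> 1.
    """
    value = ((p0 * w0 + p1 * w1 + (1 << log_wd)) >> (log_wd + 1)) + offset
    return max(0, min(max_value, value))


# A fade from a scene with sample value 100 to a scene with sample value 200:
# the weight moves step by step from the list 0 picture to the list 1 picture.
weights = [(64, 0), (48, 16), (32, 32), (16, 48), (0, 64)]
fade = [weighted_bi_predict(100, 200, w0, w1) for w0, w1 in weights]
print(fade)
```

With equal weights the result matches ordinary bi-prediction, so weighted prediction only changes the output when the encoder signals, or the decoder derives, unequal weights.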

The Extended Profile is well suited to video streaming. A characteristic of streaming is switching between streams (different content, or the same content at different bitrates).

SP slices are used to switch between streams that carry the same content at different bitrates:

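The key is the switching slice AB2: it uses A1 as its reference and produces B2. Once B2 is available, decoding can continue with B3 and so on.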

Another use of SP slices is to provide fast-forward functionality.

-- EOF --

### Comments

7 comments on "H.264 Concepts and Technical Details"

• 麦新杰

[B frame] B picture or B frame (bipredictive coded picture) – contains motion-compensated difference information relative to previously decoded pictures. In older designs such as MPEG-1 and H.262/MPEG-2, each B picture can only reference two pictures, the one which precedes the B picture in display order and the one which follows, and all referenced pictures must be I or P pictures. These constraints do not apply in newer standards H.264/MPEG-4 AVC and HEVC.

• 麦新杰

[P frame] P picture or P frame (predictive coded picture) – contains motion-compensated difference information relative to previously decoded pictures. In older designs such as MPEG-1, H.262/MPEG-2 and H.263, each P picture can only reference one picture, and that picture must precede the P picture in display order as well as in decoding order and must be an I or P picture. These constraints do not apply in the newer standards H.264/MPEG-4 AVC and HEVC.

• 麦新杰

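On P and B slices: in a P slice, every inter-predicted block has exactly one set of motion-compensated prediction information, and a P slice uses only one reference picture list. In a B slice, every inter-predicted block can have up to two sets of motion-compensated prediction information, and a B slice can use two reference picture lists. Note that a reference picture list can hold several pictures.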

• 麦新杰

P-frames provide the “differences” between the current frame and one (or more) frames that came before it. P-frames offer much better compression than I-frames, because they take advantage of both temporal and spatial compression and use fewer bits within a video stream.

• 麦新杰

H.264/AVC uses a 6-tap filter for half-pixel interpolation and then simple linear interpolation to achieve quarter-pixel precision from the half-pixel data.

• 麦新杰

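Judging from the explanation in the book, deciding how to partition a macroblock still comes down to analysing the residual data: where the content changes little, the partitions are large; where it changes a lot, the partitions are small.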

• 麦新杰

The terms 'sub-pixel', 'half-pixel' and 'quarter-pixel' are widely used in this context, although in fact the process is usually applied to luma and chroma samples, not pixels.