The main idea behind the scalable extension of H.264/AVC is to take the block-based hybrid video
coding scheme one step further and achieve spatiotemporal and signal-to-noise-ratio (SNR)
scalability. The term scalability in the video coding context means that physically meaningful
video information can be recovered by decoding only a portion of the compressed bit stream.
For example, one should be able to recover from the compressed bit stream a video with lower
resolution than the original by decoding only the lowest spatial layer and discarding other
spatial layers. In scalable video coding (SVC), scalability is achieved through a layered approach.
The structure of the encoder depends on which kind of scalability is needed. For example, Figure 1
depicts the block diagram of an SVC encoder with two spatial layers, which contain additional
SNR enhancement layers.
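The idea of recovering a lower-resolution or lower-quality video by discarding layers can be illustrated with a minimal sketch. It assumes a simplified stream model in which every coded unit is tagged with a spatial, temporal, and quality index; the Packet class, its fields, and the byte sizes are hypothetical and chosen only for illustration, not taken from the standard's syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Simplified stand-in for one coded unit of a scalable bit stream."""
    spatial: int   # spatial layer index, 0 = lowest resolution
    temporal: int  # temporal level, 0 = lowest frame rate
    quality: int   # SNR enhancement level, 0 = base quality
    size: int      # payload size in bytes (illustrative value)


def extract_substream(packets, max_spatial, max_temporal, max_quality):
    """Keep only the packets at or below the requested operating point."""
    return [
        p for p in packets
        if p.spatial <= max_spatial
        and p.temporal <= max_temporal
        and p.quality <= max_quality
    ]


# Hypothetical stream: 2 spatial layers x 2 temporal levels x 2 quality levels.
stream = [
    Packet(s, t, q, size=100 * (s + 1) * (q + 1))
    for s in range(2) for t in range(2) for q in range(2)
]

# Lowest spatial layer at base quality, full frame rate.
base_only = extract_substream(stream, max_spatial=0, max_temporal=1, max_quality=0)
print(len(base_only), "of", len(stream), "packets kept")
print(sum(p.size for p in base_only), "of", sum(p.size for p in stream), "bytes kept")
```

The filter never inspects the payload: in this model, scalability comes purely from dropping whole packets, which is what makes the approach attractive for adaptation inside the network.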
In each spatial layer, hierarchical motion compensation and prediction are performed. The redundancy
between adjacent pictures and between layers is removed using inter- and intra-prediction techniques. After
motion-compensated prediction, transform coding is applied using the same transformation
techniques as in the H.264/AVC standard. SNR (quality) scalability is achieved by coding the
difference between transformed and untransformed slices using progressive coding. These
progressively coded slices can then be truncated at any position within each slice, so the
user-perceived visual quality improves in proportion to the number of bits retained in the truncated
slice. Meanwhile, temporal scalability is achieved using hierarchical B pictures, which provide a
predictive structure already included in H.264/AVC. Motion-compensated temporal filtering can also
be used, but for the time being it is included only as a non-normative option for achieving
temporal scalability. An example of a hierarchical coding structure for a group of pictures (GOP)
with a length of eight pictures is illustrated in Figure 2. All of these scalability modes can be
combined to achieve three-dimensional (spatial, temporal, and SNR) scalability.
Figure 1: Block diagram for the H.264 scalable extension
Figure 2: Hierarchical GOP structure
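To make the temporal hierarchy concrete, the following sketch assigns a temporal level to each picture of a GOP of eight. It assumes a dyadic hierarchy in which pictures are indexed in display order and key pictures fall on multiples of the GOP length; the function names are hypothetical, and the exact structure drawn in Figure 2 may differ from this assumption.

```python
def temporal_level(index, gop_size=8):
    """Temporal level of a picture in a dyadic hierarchy (0 = key picture).

    Assumes gop_size is a power of two and pictures are indexed in
    display order, with key pictures at multiples of gop_size.
    """
    levels = gop_size.bit_length() - 1  # log2(gop_size) for powers of two
    pos = index % gop_size
    if pos == 0:
        return 0
    # Trailing zero bits of pos: more trailing zeros means a coarser level.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


def pictures_up_to(max_level, count=9, gop_size=8):
    """Indices of the pictures that remain when higher levels are dropped."""
    return [i for i in range(count) if temporal_level(i, gop_size) <= max_level]


for level in range(4):
    print(level, pictures_up_to(level))
```

Each additional temporal level doubles the frame rate in this model, and dropping the highest level never removes a picture that a remaining picture would need as a reference.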
The dependency between layers in scalable video coding
Layers in scalable video coding are classified as a base layer and enhancement layer(s). In SVC, the
base layer can be decoded using a standard H.264/AVC decoder. Information from lower layers is used
to remove the redundancy between different layers. This increases coding efficiency, but it also
increases the importance of the lowest layers during the decoding process and reduces error resilience.
If the base layer or one of the most important layers is lost, less important layers are useless
because decoding them requires data from the more important layers. This dependency between
layers makes SVC well suited to prioritizing different layers during transmission. The base layer
also usually needs less transmission bandwidth than the enhancement layers, which matters
when allocating resources to different priority classes. Building on this suitability for layer
prioritization, we propose a mechanism for adapting the video transmission to rapidly
changing wireless channel and network conditions. One of the main requirements for the architecture
is to be general enough to work with different access networks, from IEEE 802.11 (Wi-Fi) and
IEEE 802.16 (WiMAX) to 3GPP systems such as UMTS.
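The effect of inter-layer dependency on decodability and prioritization can be sketched as follows. This is only an illustration of the dependency argument, not the proposed adaptation mechanism: the layer names, the dependency map, and the rule of using dependency depth as the priority class are all assumptions made for the example.

```python
# Hypothetical layer structure: each layer maps to the layer it is
# predicted from, and the base layer depends on nothing.
LAYERS = {
    "base": None,
    "base_snr": "base",
    "spatial": "base",
    "spatial_snr": "spatial",
}


def decodable_layers(depends_on, received):
    """Return the received layers whose whole dependency chain also arrived."""
    received = set(received)
    usable = set()
    for layer in received:
        current = layer
        # Walk down the chain until it ends or hits a missing layer.
        while current is not None and current in received:
            current = depends_on[current]
        if current is None:  # reached the base layer without a gap
            usable.add(layer)
    return usable


def priority(depends_on, layer):
    """Priority class: 0 for the base layer, +1 per dependency step above it."""
    depth = 0
    while depends_on[layer] is not None:
        layer = depends_on[layer]
        depth += 1
    return depth


# Losing the base layer makes every enhancement layer useless.
print(decodable_layers(LAYERS, {"base_snr", "spatial", "spatial_snr"}))
# Losing an intermediate layer only affects the layers above it.
print(decodable_layers(LAYERS, {"base", "spatial_snr"}))
print({name: priority(LAYERS, name) for name in LAYERS})
```

Under these assumptions, a scheduler that protects low-numbered priority classes first keeps the largest possible set of received layers decodable when the channel degrades.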