HTTP Adaptive Streaming (HAS)

Concept

Figure: HAS concept

HAS adapts the video quality to the available network bandwidth in order to prevent the video from freezing. Furthermore, because HTTP-based video streams use ordinary HTTP, they pass through firewalls without difficulty and can reuse existing HTTP servers, proxies, and Content Delivery Network (CDN) nodes.

HAS divides a media file into fragments, conventionally of equal playback duration (e.g., 10 seconds). Each fragment is called a chunk or segment.
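As a minimal sketch, assuming a fixed 10-second target duration, the Python below maps a media timeline onto segments and finds which segment holds a given playback time. The function names are hypothetical. A real packager must also start each segment on a keyframe so that segments stay independently decodable, which this sketch ignores.

```python
from typing import List, Tuple


def segment_boundaries(duration_s: float, segment_s: float = 10.0) -> List[Tuple[float, float]]:
    """Split a media timeline into consecutive fixed-duration segments.

    The final segment is shorter when duration_s is not a multiple of segment_s.
    """
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def segment_index(t_s: float, segment_s: float = 10.0) -> int:
    """0-based index of the segment that contains playback time t_s."""
    return int(t_s // segment_s)


# A 35-second clip cut into 10-second segments.
print(segment_boundaries(35.0))  # [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0), (30.0, 35.0)]
print(segment_index(25.0))       # 2
```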

Every segment is independently encoded (or transcoded from a single high-quality master source) at several different bitrates, and the resulting representations are stored on a server from which clients fetch the segments. HAS is agnostic to the video codec. In MPEG-DASH, the chunks are stored together with a manifest called the Media Presentation Description (MPD), an XML metadata file that describes the available representations and chunks.
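The sketch below uses Python's standard `xml.etree.ElementTree` to list the representations declared in an MPD, which is the information a client needs before it can choose between bitrates. The MPD itself is a hypothetical, heavily trimmed example (the IDs, bitrates and resolutions are made up), and the function name `list_representations` is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# XML namespace defined for MPEG-DASH MPD documents.
DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical, heavily trimmed MPD: one video AdaptationSet with three
# Representations. Real MPDs also carry segment addressing
# (SegmentTemplate, SegmentList or SegmentBase), codecs, timing, etc.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT30S" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4" segmentAlignment="true">
      <Representation id="360p" bandwidth="800000" width="640" height="360"/>
      <Representation id="720p" bandwidth="2500000" width="1280" height="720"/>
      <Representation id="1080p" bandwidth="5000000" width="1920" height="1080"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_representations(mpd_xml: str) -> List[Dict[str, object]]:
    """Return id, bandwidth (bit/s) and resolution of every Representation,
    sorted from lowest to highest bandwidth."""
    root = ET.fromstring(mpd_xml)
    reps = []
    for rep in root.iterfind(".//mpd:Representation", DASH_NS):
        reps.append({
            "id": rep.get("id"),
            "bandwidth": int(rep.get("bandwidth")),
            # width/height may live on the AdaptationSet instead; default to 0.
            "width": int(rep.get("width", "0")),
            "height": int(rep.get("height", "0")),
        })
    return sorted(reps, key=lambda r: r["bandwidth"])


for r in list_representations(SAMPLE_MPD):
    print(r["id"], r["bandwidth"], f'{r["width"]}x{r["height"]}')
```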

Encoding the content at multiple bitrates lets the same video serve devices and network conditions with different requirements. Because each segment can be decoded independently of the others, the client can switch to a different bitrate at any segment boundary.

A HAS client initiates a new session by downloading a manifest file. This manifest file provides a description of the different available quality levels and segments. Based on the network conditions and the current buffer-filling level, the Rate Determination Algorithm (RDA) in the HAS client determines the quality for the next segment download. The objective of the RDA is to optimize the overall Quality of Experience (QoE), which is determined by the occurrence of video freezes, the average quality level, and the frequency of quality changes. The main advantage of HAS over progressive download and traditional real-time streaming is its ability to adapt the video quality to the available bandwidth in order to avoid video freezes.
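To make the two RDA inputs concrete, here is a hypothetical hybrid heuristic, not any particular player's algorithm. It estimates throughput as the harmonic mean of recent segment downloads, applies a safety margin, and shrinks the budget further when the buffer runs low. The names `choose_bitrate` and `harmonic_mean` and every threshold are assumptions for illustration.

```python
from typing import Sequence


def harmonic_mean(samples: Sequence[float]) -> float:
    """Harmonic mean: damps the effect of short throughput spikes."""
    return len(samples) / sum(1.0 / s for s in samples)


def choose_bitrate(
    ladder_bps: Sequence[int],
    throughput_bps: Sequence[float],
    buffer_s: float,
    low_buffer_s: float = 10.0,  # hypothetical "danger zone" threshold
    safety: float = 0.8,         # hypothetical margin below estimated throughput
    window: int = 5,             # number of recent segment downloads to average
) -> int:
    """Pick the bitrate for the next segment (illustrative heuristic only)."""
    ladder = sorted(ladder_bps)
    if not throughput_bps:
        return ladder[0]  # no measurements yet: start at the lowest rung
    budget = safety * harmonic_mean(throughput_bps[-window:])
    if buffer_s < low_buffer_s:
        # Little video buffered: shrink the budget in proportion to the
        # buffer level to reduce the risk of a freeze.
        budget *= max(buffer_s, 0.0) / low_buffer_s
    feasible = [b for b in ladder if b <= budget]
    return feasible[-1] if feasible else ladder[0]


ladder = [800_000, 2_500_000, 5_000_000]
print(choose_bitrate(ladder, [4e6, 4e6, 4e6], buffer_s=20.0))  # 2500000
print(choose_bitrate(ladder, [4e6, 4e6, 4e6], buffer_s=5.0))   # 800000
```

The harmonic mean is used because a single fast download barely moves it, which keeps the estimate conservative; a purely buffer-based or model-predictive RDA would make different trade-offs.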

Figure: HAS concept (2)

The client's Rate Determination Algorithm (RDA) chooses, for each segment, the quality level that best serves its objective function. Dozens of RDAs have been proposed in the literature, and the ones deployed in commercial players are typically proprietary.

Quality of Experience (QoE)

Figure: Quality of Experience (QoE)

When watching a video, a user wants it to play as clearly and smoothly as possible. Waiting time is measured by three metrics: initial playback latency, playback interruption frequency, and playback interruption duration.
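The three waiting-time metrics can be derived by replaying segment download times against a playback buffer. This is a simplified, hypothetical model: downloads run back to back with no buffer cap, playback starts after `startup_segments` segments, and after a stall playback resumes as soon as the next segment arrives. The names are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WaitingMetrics:
    initial_latency_s: float
    stall_count: int
    stall_duration_s: float


def simulate_waiting(
    download_times_s: Sequence[float],
    segment_s: float,
    startup_segments: int = 1,
) -> WaitingMetrics:
    """Replay back-to-back segment downloads against a playback buffer."""
    elapsed = 0.0   # wall-clock time since the session started
    buffer_s = 0.0  # seconds of video waiting to be played
    playing = False
    initial_latency = 0.0
    stalls, stall_time = 0, 0.0
    for i, dl in enumerate(download_times_s):
        if playing:
            if dl > buffer_s:
                # The buffer empties before this segment arrives: a freeze.
                stalls += 1
                stall_time += dl - buffer_s
                buffer_s = 0.0
            else:
                buffer_s -= dl  # video kept playing during the download
        elapsed += dl
        buffer_s += segment_s
        if not playing and i + 1 >= startup_segments:
            playing = True
            initial_latency = elapsed
    return WaitingMetrics(initial_latency, stalls, stall_time)


# Hypothetical download times (s) for four 4-second segments.
print(simulate_waiting([2.0, 3.0, 6.0, 1.0], segment_s=4.0))
# WaitingMetrics(initial_latency_s=2.0, stall_count=1, stall_duration_s=1.0)
```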

A long initial playback latency caused by video buffering degrades QoE. Playback interruption frequency and duration dominate QoE degradation: when interruptions are frequent and long, the user may stop watching the video altogether.

Quality switching frequency, quality switching magnitude, and the duration for which each quality level is played are the metrics of video adaptation. If the quality changes often and by large steps, the user perceives the changes as noise and artifacts. To improve QoE, high-quality video should be played for as long as possible. All of these metrics strongly affect QoE, but the degradation caused by frequent quality switching is not as severe as that caused by playback interruptions. Video quality itself is determined by resolution, frame rate, and image quality.
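A linear QoE score, similar in form to models used in much of the adaptive-bitrate literature, makes these trade-offs concrete: reward the quality of every played segment, then subtract penalties for switching and stalling. In this sketch the bitrate stands in for perceived quality, and the weights are illustrative assumptions chosen so that stalls cost more than switches.

```python
from typing import Dict, Sequence


def adaptation_metrics(levels_mbps: Sequence[float]) -> Dict[str, float]:
    """Average quality plus switching frequency and magnitude.

    levels_mbps: bitrate of each played segment, used as a stand-in for
    its quality level.
    """
    steps = [abs(b - a) for a, b in zip(levels_mbps, levels_mbps[1:]) if b != a]
    return {
        "avg_quality": sum(levels_mbps) / len(levels_mbps),
        "switch_count": len(steps),
        "total_switch_magnitude": sum(steps),
    }


def linear_qoe(
    levels_mbps: Sequence[float],
    stall_duration_s: float,
    switch_weight: float = 1.0,  # hypothetical penalty per Mbps of switching
    stall_weight: float = 4.0,   # hypothetical penalty per second of stall
) -> float:
    """Total quality minus switching and stall penalties."""
    m = adaptation_metrics(levels_mbps)
    return (sum(levels_mbps)
            - switch_weight * m["total_switch_magnitude"]
            - stall_weight * stall_duration_s)


levels = [1.0, 2.5, 2.5, 1.0]
print(adaptation_metrics(levels))
# {'avg_quality': 1.75, 'switch_count': 2, 'total_switch_magnitude': 3.0}
print(linear_qoe(levels, stall_duration_s=1.0))  # 0.0
```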

HAS Implementation

Let's see how YouTube achieves this conceptually.

Figure: YouTube streams multiple video resolutions; the higher the resolution, the higher the encoded bitrate.

YouTube chooses how many bits are used to encode a particular resolution (within the limits that the codecs allow). A higher bitrate generally yields better video quality for a given resolution, but only up to a point: beyond it, a higher bitrate just makes the chunk bigger without making it look better. When choosing the encoding bitrate for a resolution, YouTube therefore selects the sweet spot on the corresponding bitrate-quality curve.
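One simple way to locate such a sweet spot, assuming the curve has been sampled at a few bitrates, is to stop where the marginal quality gain per extra Mbps falls below a chosen threshold. This slope-threshold rule, the function name `sweet_spot`, and the sample curve are illustrative assumptions, not YouTube's actual procedure.

```python
from typing import Sequence, Tuple


def sweet_spot(curve: Sequence[Tuple[float, float]], min_gain_per_mbps: float) -> float:
    """Bitrate beyond which extra bits stop paying off.

    curve: (bitrate_mbps, quality) samples of one resolution's rate-quality
    curve. Returns the start of the first interval whose slope (quality gained
    per extra Mbps) falls below min_gain_per_mbps.
    """
    pts = sorted(curve)
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if (q1 - q0) / (r1 - r0) < min_gain_per_mbps:
            return r0
    return pts[-1][0]  # the curve never flattens within the sampled range


# Hypothetical 1080p curve (quality on a 0-100 scale such as VMAF).
curve_1080p = [(1.0, 70.0), (2.0, 82.0), (3.0, 88.0), (4.0, 90.0), (6.0, 91.0)]
print(sweet_spot(curve_1080p, min_gain_per_mbps=3.0))  # 3.0
```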

Figure: Rate-distortion curves

YouTube goes a step further. The sweet spots above assume that viewers are not bandwidth-limited. In practice, if encoding bitrates are set only from those sweet spots for the best-looking video, the delivered quality is often constrained by viewers' bandwidth limitations. YouTube therefore employs an optimized packing algorithm that fits a higher resolution into a lower available bandwidth.

Figure: Bitrate packing. By analyzing resolution statistics and adjusting the encoded bitrates of the various resolutions accordingly, the algorithm lets a given bandwidth support a higher-resolution chunk than before.

Figure: Streaming bandwidth estimation. Operating points are placed more densely around the bitrates the player is most likely to switch to.
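The idea of packing operating points densely where the player is most likely to land can be illustrated with a simple heuristic: place one rung at each of several percentiles of the observed viewer bandwidth, scaled by a safety margin. This percentile heuristic, its parameters, and the sample data are assumptions made for illustration; it is not the optimization described in reference 1.

```python
from typing import List, Sequence


def nearest_rank(sorted_vals: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile (pct in 1..100) of an ascending list."""
    k = max(1, -(-pct * len(sorted_vals) // 100))  # integer ceil(pct*n/100)
    return sorted_vals[k - 1]


def quantile_ladder(
    bandwidth_mbps: Sequence[float],
    percentiles: Sequence[int],
    safety: float = 0.8,
) -> List[float]:
    """Place one rung at each requested percentile of viewer bandwidth.

    Where many viewers share similar bandwidths the percentiles crowd
    together, so rungs end up denser there. Duplicate rungs are merged.
    """
    vals = sorted(bandwidth_mbps)
    return sorted({round(safety * nearest_rank(vals, p), 3) for p in percentiles})


# Hypothetical per-session bandwidth samples (Mbps), clustered around 2-4.
samples = [1, 2, 2, 3, 3, 3, 4, 4, 8, 20]
print(quantile_ladder(samples, [10, 30, 50, 70, 90], safety=1.0))
# [1.0, 2.0, 3.0, 4.0, 8.0]
```

Four of the five rungs land between 1 and 4 Mbps, where most of the sample sessions sit, while the sparse high-bandwidth tail gets a single rung.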

Reference 1 explains in detail how the operating points are adjusted to minimize the average streaming rate subject to the constraint that average quality stays above a quality threshold.
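To make the constrained problem concrete, here is a brute-force sketch. From a hypothetical set of candidate (bitrate, quality) operating points, it picks the ladder that minimizes the average streaming rate over a sample of viewer bandwidths, subject to average quality staying at or above a threshold. It assumes quality increases with bitrate, that each viewer streams the highest rung that fits, and it ignores stalls. The exhaustive search grows combinatorially and is meant for illustration only; it is not the optimization method of reference 1.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (bitrate_mbps, quality)


def evaluate(ladder: Sequence[Point], bandwidths: Sequence[float]) -> Tuple[float, float]:
    """Average streaming rate and average quality over a viewer population.

    Each viewer streams the highest rung that fits their bandwidth,
    or the lowest rung if none fits (stalls are ignored).
    """
    rates, quals = [], []
    for bw in bandwidths:
        fitting = [p for p in ladder if p[0] <= bw]
        rate, q = max(fitting) if fitting else min(ladder)
        rates.append(rate)
        quals.append(q)
    n = len(bandwidths)
    return sum(rates) / n, sum(quals) / n


def best_ladder(
    candidates: Sequence[Point],
    bandwidths: Sequence[float],
    rungs: int,
    min_avg_quality: float,
) -> Optional[Tuple[Point, ...]]:
    """Exhaustively pick `rungs` operating points that minimize the average
    streaming rate while keeping average quality >= min_avg_quality."""
    best, best_rate = None, float("inf")
    for ladder in combinations(sorted(candidates), rungs):
        avg_rate, avg_q = evaluate(ladder, bandwidths)
        if avg_q >= min_avg_quality and avg_rate < best_rate:
            best, best_rate = ladder, avg_rate
    return best


# Hypothetical candidate operating points and viewer bandwidths (Mbps).
candidates = [(0.5, 60.0), (1.0, 70.0), (2.0, 80.0), (3.0, 85.0), (5.0, 90.0)]
viewers = [0.8, 1.5, 2.5, 2.5, 6.0]
print(best_ladder(candidates, viewers, rungs=3, min_avg_quality=75.0))
# ((1.0, 70.0), (2.0, 80.0), (3.0, 85.0))
```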

References

1. Encoding Bitrate Optimization Using Playback Statistics for HTTP-based Adaptive Video Streaming. Essential reading for understanding what YouTube is doing.
2. Streaming Video over HTTP with Consistent Quality.