Anyone who has spent time examining software knows there's a lot going on under the hood. The world of computer code is full of specific variables and varied methods, and that is especially true of live streaming.
Every decision regarding protocols or formats can have far-reaching effects. Without careful planning and research, there's a strong possibility that some unforeseen roadblock will plop right into the middle of your dev timeline.
Luckily, Red5 Pro has spent the past 14 years examining the various elements of live streaming so you don't have to. Reducing latency and increasing scalability are the principles that guided our decisions as we designed and configured our platform.
One such decision was not to use the Common Media Application Format (CMAF). As part of our ongoing series of technical articles, this post explains that decision and the factors that disqualified CMAF, ultimately leading us to implement WebRTC instead.
CMAF Origins: HTTP Live Streaming and HLS Redundancy
Originally, the HTTP-based streaming protocols HLS and MPEG-DASH were (and continue to be) the most widely used methods of distributing live streaming media. Delivering an HTTP stream involves dividing the video into small chunks that reside on an HTTP server, so that a video player can download the individual chunks over TCP. This allows the video to traverse firewalls easily and to be delivered as it is watched, so only a small amount of data needs to be buffered at any one time.
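To make the chunked-download model concrete, here is a minimal sketch of parsing an HLS media playlist into the list of segment URIs a player would then fetch one by one over HTTP. The playlist text and segment names are hypothetical examples, not output from any particular server.

```python
# Minimal sketch: extract segment URIs from an HLS media playlist.
# A player downloads each URI in order and feeds it to the decoder.
# The playlist below is a hypothetical example.

SAMPLE_PLAYLIST = """\
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXTINF:4.0,
segment0.ts
#EXTINF:4.0,
segment1.ts
#EXTINF:4.0,
segment2.ts
#EXT-X-ENDLIST
"""

def segment_uris(playlist_text):
    """Return the non-comment, non-empty lines, i.e. the segment URIs."""
    return [line for line in playlist_text.splitlines()
            if line and not line.startswith("#")]

print(segment_uris(SAMPLE_PLAYLIST))
# ['segment0.ts', 'segment1.ts', 'segment2.ts']
```

A real player also parses the `#EXTINF` tags for segment durations; this sketch only shows why HTTP delivery reduces to a series of ordinary file downloads.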
However, there is a drawback involving the packaging of audio and video data. HLS traditionally uses MPEG-TS containers holding muxed audio/video data, while DASH prefers the ISO Base Media File Format (fragmented MP4) holding demuxed tracks.
To reach a variety of devices, content owners must therefore package and store two sets of files, each holding exactly the same audio and video data. This creates a problem for Content Delivery Networks (CDNs) delivering streaming content over their HTTP networks: it doubles storage costs and wastes bandwidth and processing on redundant packaging.
Creation of CMAF
CMAF came out of efforts to solve this redundancy problem. Standardized in 2017 following a joint proposal by Microsoft and Apple, CMAF is a container format designed to hold video, audio, or text data for deployment over the HTTP-based streaming protocols HLS and MPEG-DASH.
Bitmovin provides the following definition:
CMAF defines the encoding and packaging of segmented media objects for delivery and decoding on end-user devices in adaptive multimedia presentations. In particular, this is (i) storage, (ii) identification, and (iii) delivery of encoded media objects with various constraints on encoding and packaging. That means CMAF defines not only the segment format but also codecs and most importantly media profiles (i.e., for AVC, HEVC, AAC).
The advantage of CMAF is that the same media segments can be referenced by both HLS playlists and DASH manifests. CDNs therefore need to store only one set of files, which in turn doubles the cache hit rate and makes the whole process more efficient.
How Does It Reduce Latency?
In actuality, CMAF containers cannot reduce latency by themselves. Instead, a low-latency mode (known as Low-Latency CMAF, or Chunked CMAF) must be configured.
If you can't tell by now, this is the really technical part, so feel free to skip ahead if so inclined.
Chunked CMAF divides each container into smaller sections. From biggest to smallest: CMAF container > segment > fragment > chunk. A chunk is the smallest referenceable unit, containing at least a moof (movie fragment metadata) box and an mdat (media data) box. One or more chunks combine to form a fragment, and fragments in turn merge into a segment.
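The moof/mdat structure above follows the ISO-BMFF box layout, where every box begins with a 4-byte big-endian size and a 4-byte type. The sketch below builds a hypothetical chunk from synthetic payloads (the byte contents are placeholders, not valid media data) and walks its top-level boxes:

```python
import struct

# Sketch: a CMAF chunk is a sequence of ISO-BMFF boxes, minimally one
# 'moof' (movie fragment metadata) followed by one 'mdat' (media data).
# Each box header is a 4-byte big-endian size plus a 4-byte type code.

def make_box(box_type, payload):
    """Serialize one box: total size (8-byte header + payload), type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def box_types(data):
    """Walk the top-level boxes in a buffer and return their type codes."""
    types, offset = [], 0
    while offset < len(data):
        size, = struct.unpack_from(">I", data, offset)
        types.append(data[offset + 4:offset + 8].decode("ascii"))
        offset += size
    return types

# A hypothetical chunk: fragment metadata plus placeholder media bytes.
chunk = make_box(b"moof", b"\x00" * 16) + make_box(b"mdat", b"\x00" * 64)
print(box_types(chunk))  # ['moof', 'mdat']
```

A real moof box nests further boxes (mfhd, traf, and so on); the point here is only that a chunk is self-describing, which is what lets it be shipped the moment it is packaged.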
To transmit each CMAF segment, a POST request containing a CMAF header is sent to the ingest origin server. Immediately after each CMAF chunk finishes encoding and packaging, it is sent using HTTP/1.1 chunked transfer encoding. This means each segment can be delivered progressively, chunk by chunk, rather than waiting for the entire segment to be ready before it can be sent out.
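The chunked transfer framing itself is simple: each piece of the body is sent as its length in hexadecimal, a CRLF, the bytes, and another CRLF, with a zero-length chunk terminating the stream. A minimal sketch of that wire format (not a full HTTP client, and the chunk contents are placeholders):

```python
# Sketch of HTTP/1.1 chunked transfer framing. Each chunk is framed as
# <hex length>\r\n<bytes>\r\n, and a zero-length chunk ends the body.
# This lets the sender start transmitting before the full segment exists.

def encode_chunked(chunks):
    """Frame an iterable of byte strings as a chunked HTTP/1.1 body."""
    out = b""
    for chunk in chunks:
        out += f"{len(chunk):x}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # terminating zero-length chunk

# Hypothetical payloads standing in for two packaged CMAF chunks.
body = encode_chunked([b"moof+mdat #1", b"moof+mdat #2"])
print(body)
```

The key property for latency is that each framed chunk is valid on its own, so the origin (and later the CDN edge) can forward it immediately rather than buffering the whole segment.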
As an example, if the encoder produces 4-second segments at 30 frames per second, it makes a POST request to the origin every 4 seconds, and each segment's 120 frames are sent using chunked transfer encoding.
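The numbers in this example fall out of simple arithmetic. Assuming the 250-millisecond chunk duration discussed later in this article, the breakdown per segment looks like this:

```python
# Back-of-the-envelope numbers for the example above: 4-second segments
# at 30 fps, assuming 250 ms chunks (the duration discussed below).

segment_s = 4.0   # one POST request to the origin every 4 seconds
fps = 30
chunk_ms = 250    # assumed low-latency chunk duration

frames_per_segment = int(segment_s * fps)              # 4 s * 30 fps
chunks_per_segment = int(segment_s * 1000 / chunk_ms)  # 4000 ms / 250 ms
frames_per_chunk = frames_per_segment // chunks_per_segment

print(frames_per_segment, chunks_per_segment, frames_per_chunk)
# 120 frames per segment, split into 16 chunks of ~7-8 frames each
```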
The chunks ingested at the origin are then relayed, again via chunked transfer encoding, to a CDN, where edge servers make them available to the players that will eventually display the media. To retrieve a segment, the player uses the manifest or playlist associated with the stream to locate the correct edge server and issues a GET request.
What Are the Results?
Now that we've covered how it works, let's get to the important part: how does it actually perform?
Laboratory tests with CMAF have achieved end-to-end latencies as low as 600 ms. However, results in the real world, away from the lab's controlled environment, are less impressive: current proofs of concept deployed over the open internet have sustained an acceptable Quality of Experience (QoE) only when end-to-end latency sits around 3 seconds.
This gap is due in part to the fact that low-latency CMAF requires many small chunks of around 250 milliseconds each. Smaller chunks increase the number of HTTP calls (at least four per second), produce higher-bandwidth video, and raise server load, all of which add latency.
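The "at least four per second" figure follows directly from the chunk duration, and at scale those per-viewer deliveries multiply across the CDN. A rough illustration, with the audience size as a purely hypothetical input:

```python
# Rough request-rate arithmetic behind the figures above. The viewer
# count is a hypothetical input, not a measured deployment.

chunk_ms = 250
calls_per_second = 1000 // chunk_ms  # 4 chunk deliveries/sec per viewer

viewers = 100_000  # hypothetical audience size
total_per_second = calls_per_second * viewers

print(calls_per_second, total_per_second)
# 4 chunk deliveries per second per viewer -> 400000 per second overall
```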
It should also be noted that CMAF makes consistent stream quality harder to maintain, particularly with adaptive bitrate, because of the way current algorithms estimate throughput. To dig deeper into why that is, take a look at our whitepaper.
Is a 3-Second Latency Good Enough for Real-Time?
Definitely not. While CMAF improves on the latency of vanilla HLS and DASH, it is still not low enough to enable fully interactive live video experiences.
As we've mentioned before, real-time streaming requires latency under 500 milliseconds. In today's world of instant communication and information, every second (and fraction of a second) counts. Drone surveillance, social media chat, live auctions, and live event broadcasts, among many other use cases, all require real-time latency.
That's why Red5 Pro integrated WebRTC. Sub-500-millisecond latency is the only way to achieve true real-time live streaming, and, importantly, Red5 Pro maintains that performance even when scaling to millions of broadcasters and subscribers.
For a more in-depth view of how Red5 Pro works along with all the live streaming protocols in general (WebRTC, WebSockets and MSE, HLS, and more), take a look at our whitepaper.
Looking for something else? Send an email to firstname.lastname@example.org or schedule a call.