Live streaming is complicated. The entire process of broadcasting a stream and transporting it over the internet involves a series of methods with a variety of formats that can be employed. An important component is the codec used for the encoding and decoding of the media file. The codec also defines the types of tools that can be used to conduct the streaming.
To greatly simplify the process: in order for a video to be streamed over the internet, the audio and video must first be captured using a microphone and camera. That raw data must then be compressed (encoded) with a codec, broadcast over an internet connection (using a transport protocol), sent to some kind of server-side solution (typically a CDN or a cloud-based cluster like Red5 Pro), and subsequently decompressed (decoded) so the subscriber can finally watch the video.
There are quite a few codecs in use today, including VP8/VP9, h.264 (AVC), h.265 (HEVC), and AV1. In previous posts we covered AV1 and HEVC, so this post will focus mainly on VP8/VP9. We are lumping VP8 in with VP9 because they are similar in regards to licensing, and VP9 is the evolution of VP8.
Although this post mostly focuses on VP9 versus h.265, the overarching concern is which codec is the best one to use. Ultimately, we will present the case for why h.264 is currently a more effective choice for low latency live streaming.
What is VP9?
The VP9 codec is a royalty-free, open-source video coding standard developed by Google. Designed as the successor to VP8, it was originally used to compress ultra HD content on YouTube, as it improves upon the coding efficiency of its predecessor. The original VPx codecs came from On2 Technologies, which was acquired by Google in 2010. Google subsequently open-sourced the codec.
What is h.265?
The h.265 codec, or High Efficiency Video Coding (HEVC), was developed through a joint effort by the Video Coding Experts Group (VCEG) and the Moving Picture Experts Group (MPEG). It was approved as the official successor to h.264, also known as Advanced Video Coding (AVC), in April 2013. It improves upon the compression efficiency of h.264, reducing the size of the video by around 50%.
What is h.264?
h.264, or AVC as explained above, is currently the most widely adopted video codec. It is used by 91% of video industry developers as of September 2019. Like h.265, h.264 was developed jointly by the Video Coding Experts Group (VCEG) and the Moving Picture Experts Group (MPEG) as an improvement over previous standards, with the aim of delivering efficiently compressed, high-quality video over the internet.
H.264 is protected by many patents and licensed by the MPEG-LA organization. However, a widely used free, open-source encoder and decoder called OpenH264 was made available to the general public by Cisco Systems in 2013. In other words, Cisco paid for the patent licenses for all of us to use. This in turn created wide adoption of the h.264 codec, and implementations of OpenH264 showed up in all the web browsers.
For a very detailed overview of H.264 take a look at this post by VideoProc.
Now that we have introduced the codecs, let’s examine how they compare to each other. We’ve put together a list of 6 key factors to evaluate each codec.
There isn’t much difference between VP9 and h.265 in this category. The video tends to look good with either codec. However, h.265 slightly outperforms VP9 at lower bitrates, while VP9 has a slight edge when the bitrates are high.
In order to judge the image quality, we can use the SSIM (Structural Similarity Index Measure) metric as displayed below. When broadcasting a stream over the internet, the process of compressing and expanding (encoding and decoding) the visual data contained in the stream can result in slight distortions as the decoder extrapolates the data to display it. Thus SSIM essentially measures how closely the transported image matches the original after being encoded and decoded.
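To make the metric a little more concrete, here is a minimal sketch of the SSIM formula in plain Python. It is a simplification: it treats the whole input as a single window, whereas real measurements usually average SSIM over many small local windows of the frame. The function name `global_ssim` and the sample pixel values are ours, purely for illustration.

```python
def global_ssim(reference, decoded, dynamic_range=255.0):
    """Single-window SSIM over two equal-length sequences of luma samples."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(reference)

    # Stabilising constants from the usual SSIM definition (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    mean_x = sum(reference) / n
    mean_y = sum(decoded) / n
    var_x = sum((x - mean_x) * (x - mean_x) for x in reference) / n
    var_y = sum((y - mean_y) * (y - mean_y) for y in decoded) / n
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(reference, decoded)) / n

    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Made-up luma samples: the source pixels and the same pixels after a lossy round trip.
original = [52, 55, 61, 66, 70, 61, 64, 73]
lossy = [54, 54, 60, 68, 69, 60, 66, 71]

print(global_ssim(original, original))  # identical input scores 1.0
print(global_ssim(original, lossy))     # distortion pulls the score below 1.0
```

A score of 1.0 means the decoded image is identical to the source, and the score drops as the encode and decode round trip introduces more distortion.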
Compared to h.264, however, there is a little bit more of a difference.
Part of the way that VP9 and h.265 are able to increase compression is through the use of larger blocks. A macroblock is a processing unit of an image or video that contains the pixels of the image to be displayed. h.264 uses 16x16 macroblocks, while VP9 and h.265 use blocks of up to 64x64. Those blocks go through a series of computations, including “intra-prediction directions”, to rebuild them into the original image, only with slightly less detail in non-crucial areas. This enables VP9 and h.265 to increase efficiency, as less detailed areas such as the sky or a blurred background are not broken up into smaller units. The detail lost in these areas does not substantially decrease the overall quality of the image, as the important sections are rendered in more detail. It should also be noted that as you increase the bitrate, the difference in quality between AVC (h.264) and the two other codecs gets smaller.
H.264 produces a poorer image, particularly at lower bitrates. When comparing images encoded at the same bitrate, both VP9 and h.265 are more detailed and crisp than the images produced with h.264. In other words, in order to produce the same image quality as VP9 or h.265, h.264 would need to run at a higher bitrate. However, the difference in quality, while perceivable, is not necessarily an outright problem. To measure this more objectively, we can take a look at the SSIM numbers, which show that the results for h.264 are pretty close to VP9 and h.265. Thus, while h.264 might not be as good in regards to image quality, the difference isn’t enough to overcome the big tradeoff detailed in the next section.
We should also point out that other factors, such as improved sub-pixel interpolation and motion vector reference selection (motion estimation), improve image quality as well. This is because they help predict what the next frame of the video will look like. Those are pretty complicated concepts deserving their own articles, so we will leave it at that.
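As a rough illustration of the block sizes mentioned above, the following sketch counts how many top-level blocks are needed to tile a single 1080p frame. The helper `blocks_per_frame` is hypothetical, and the calculation ignores the fact that VP9 and h.265 can subdivide their large blocks again in detailed regions, so treat it as the best case for flat areas like sky.

```python
def blocks_per_frame(width, height, block_size):
    """Number of block_size x block_size units needed to tile one frame."""
    columns = -(-width // block_size)  # ceiling division
    rows = -(-height // block_size)
    return columns * rows


macroblocks = blocks_per_frame(1920, 1080, 16)   # h.264-style 16x16 macroblocks
large_blocks = blocks_per_frame(1920, 1080, 64)  # VP9 / h.265-style 64x64 blocks

print(macroblocks, large_blocks)  # 8160 510
```

In a flat region, one 64x64 block covers the same area as sixteen 16x16 macroblocks, which is where a good part of the signalling savings comes from.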
In order to achieve a higher compression rate, VP9 and h.265 need to perform more processing. All that extra processing means that they will take longer to encode the video. This will hurt your latency as all that additional time spent processing will delay your video from being broadcast. Latency is important for ensuring that your live video streams can provide an interactive experience among other reasons.
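To see why encoding time matters so much for a live stream, consider the time budget per frame. The sketch below is a back-of-the-envelope model, not a measurement: the function names and the example encode times (0.02 and 0.2 seconds per frame) are assumptions chosen only to show what happens when an encoder is ten times slower.

```python
def frame_budget_seconds(fps):
    """Time available to encode one frame if the encoder is to keep up with capture."""
    return 1.0 / fps


def keeps_up(encode_seconds_per_frame, fps):
    """True when a single encoder instance can encode frames as fast as they arrive."""
    return encode_seconds_per_frame <= frame_budget_seconds(fps)


def backlog_seconds(stream_seconds, fps, encode_seconds_per_frame):
    """How far behind real time the encoder is after stream_seconds of capture."""
    frames = stream_seconds * fps
    return max(0.0, frames * encode_seconds_per_frame - stream_seconds)


fps = 30
print(frame_budget_seconds(fps))        # roughly 0.033 s per frame at 30 fps
print(keeps_up(0.02, fps))              # a hypothetical fast encoder keeps up
print(keeps_up(0.2, fps))               # one that is 10x slower does not
print(backlog_seconds(60, fps, 0.2))    # delay accumulated after one minute
```

Once the per-frame encode time exceeds the budget, the delay is no longer a fixed cost: it keeps growing for as long as the stream runs, unless frames are dropped or quality settings are lowered.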
Figure 2: encoding time as a factor of bitrate improvement comparing libvpx (VP9), x264(AVC), and x265 (HEVC). Image source: https://blogs.gnome.org/rbultje/2015/09/28/vp9-encodingdecoding-performance-vs-hevch-264/
So what exactly does the above graph mean?
This graph shows the encoding time in seconds per frame on the horizontal axis. The vertical axis shows bitrate improvement which compares a combination of SSIM and bitrate to a reference point set to x264 @ veryslow. The reference point is why x264 doesn’t go far above 0%.
What does the graph tell us?
VP9 and h.265 are (as advertised) 50% better than h.264, but they are also 10 to 20 times slower. If you follow the blue line for x264 (AVC), you will see that it stays below the other two lines for the majority of the bitrate benchmark points. Not only that, but both the green (h.265) and orange (VP9) lines intersect h.264 pretty early in their curves. That means that the seconds per frame will start to increase drastically and really drag down your stream performance. Thus, while VP9 and h.265 show much better compression rates, this comes at a very high cost in encoding time, which will greatly increase latency. A more in-depth analysis of encoding times and codec comparisons can be found in this University of Waterloo study.
As covered in the last section, both VP9 and h.265 have to run through more compression algorithms than h.264 which will subsequently increase their CPU usage. Even when fully optimized, live streaming is a CPU intensive process so increasing the already high usage will be a problem. However, there is something that can alleviate this: hardware support. Dedicated chipsets will reduce CPU consumption.
h.265 currently enjoys more hardware support, including Windows 10 (downloadable or through Intel Kaby Lake or newer processors), Apple (iOS 11), and Android (Android 5.0) devices. While most mobile devices support VP9, most other systems do not. Without direct hardware support, the VP9 encoding process will peg the CPU, consuming a large amount of resources, decreasing battery life, and potentially increasing latency.
As we will cover in the next sections, h.264 enjoys widespread support and doesn’t drain the CPU as much as VP9 or h.265 in the first place.
Winner: h.264 with h.265 close behind
Adoption and Browser Implementation
In order to work with the codec, there need to be supported hardware or software encoders. h.265 suffered from a low adoption rate due in no small part to patent licensing. h.265 has four patent pools related to it: HEVC Advance, MPEG LA, Velos Media, and Technicolor. This makes it more expensive, which has discouraged more widespread adoption and limited it to specific hardware encoders and mobile chipsets. Only Edge, Internet Explorer, and Safari support h.265, and even then the device running the browser will still need to support h.265 hardware encoding. Even when h.265 is supported in browsers with the correct implementation, WebRTC tends to not work correctly. Without WebRTC support, achieving real-time latency is difficult.
VP9 is royalty-free and open-source, which cleared the way for wider adoption. It is available in the major browsers Chrome, Firefox, and Edge, as well as the operating systems Windows 10, Android 5.0, iOS 14, and macOS Big Sur. Since WebRTC supports VP9, it can work directly in the browser as well. There are also rumours that Safari support may be on the near horizon.
Although h.264 has one patent pool associated with it, as we mentioned earlier, in 2013 Cisco open-sourced its h.264 implementation and released it as a free binary download. That was a gigantic boost to the widespread implementation of h.264. As such, h.264 is supported by all of the browsers, on laptops as well as mobile.
Winner: h.264 with VP9 closing the gap
The biggest advantage of increased compression rates and the resulting smaller file sizes is that the video consumes less bandwidth when you broadcast it. This means that users with slower internet speeds can still stream high-quality video.
So which codec produces better compression efficiency to create a smaller video?
According to a test conducted by Netflix, h.265 outperforms VP9 by about 20%. Although other tests have produced different results, they all conclude that h.265 creates smaller file sizes. Depending on the objective metric used, h.265 provides 0.6% to 38.2% bitrate savings over VP9.
However, while consuming less bandwidth is useful, there are other factors that should be taken into consideration. Upload speeds across the globe average at 42.63 Mbps for fixed broadband connections, which means that most places can support 4K streaming even with the higher connection speeds required by h.264. Despite the much lower average of 10.93 Mbps for mobile devices, they can still support 1080p streams.
This diagram from Boxcast shows that the average worldwide connection speeds are definitely able to handle the upload speed requirements at all tiers of resolutions. Note: we couldn’t find a graph comparing all three codecs, but VP9 would be in between h.264 and h.265.
image credit: https://www.boxcast.com/blog/hevc-h.265-vs.-h.264-avc-whats-the-difference
Furthermore, there are ways to configure your streaming application to cater to users in countries with slower internet speeds. You can do this by adding ABR and transcoding support. ABR (adaptive bitrate) will modify the bitrate to deliver the best experience. Transcoding splits broadcasts into multiple qualities so the client can request the best one depending upon the available bandwidth.
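As a sketch of how ABR and transcoding fit together, the snippet below picks a rendition from a transcoded ladder based on the bandwidth a client measures. The ladder, its bitrates, the 80% headroom factor, and the name `pick_rendition` are all illustrative assumptions, not values from any particular player or server.

```python
# Hypothetical ladder produced by a transcoder: (label, video bitrate in kbps).
LADDER = [
    ("1080p", 4500),
    ("720p", 2500),
    ("480p", 1000),
    ("240p", 400),
]


def pick_rendition(available_kbps, ladder=LADDER, headroom=0.8):
    """Return the highest-bitrate rendition that fits within the usable bandwidth.

    headroom leaves a safety margin so small bandwidth dips do not stall playback.
    If nothing fits, fall back to the lowest rendition rather than not playing.
    """
    usable = available_kbps * headroom
    for label, bitrate in sorted(ladder, key=lambda r: r[1], reverse=True):
        if bitrate <= usable:
            return label
    return min(ladder, key=lambda r: r[1])[0]


print(pick_rendition(10930))  # a connection at the mobile average quoted above
print(pick_rendition(3000))   # a slower link steps down the ladder
print(pick_rendition(300))    # a very slow link still gets the lowest rendition
```

A real client would re-run this decision continuously as its bandwidth estimate changes, which is what lets the stream adapt mid-broadcast.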
You may be thinking, "What about mobile devices stuck on 2G or 3G connections?" The fortunate reality is that palm-sized devices don’t need to stream the highest resolutions to look good. 720p or even 480p will still display with good quality.
While bandwidth consumption may not matter as much to a consumer, it must be acknowledged that companies will save money on bandwidth costs if they stream with VP9 or h.265. The savings come from the smaller files, which mean they pay less for the data streamed over CDN or cloud networks. While that is certainly nice, it is only at really high-resolution settings such as 4K that halving the data consumption makes a substantial difference.
Of course, saving money is certainly an important thing no matter what the scale. That brings us to our next point, which presents the best of both worlds: better compression with the same performance.
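A quick calculation shows why the resolution matters for those savings. The bitrates below (20 Mbps for 4K, 1 Mbps for 480p) are round numbers we picked for illustration, and the helper names are hypothetical; plug in your own figures.

```python
def gigabytes_streamed(bitrate_mbps, hours, viewers=1):
    """Data delivered for a stream at a constant bitrate, in gigabytes."""
    megabits = bitrate_mbps * 3600 * hours * viewers
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes


def gigabytes_saved(bitrate_mbps, hours, viewers=1, reduction=0.5):
    """Data saved if a more efficient codec cuts the bitrate by `reduction`."""
    return gigabytes_streamed(bitrate_mbps, hours, viewers) * reduction


# One viewer-hour at an assumed 4K bitrate versus an assumed 480p bitrate.
print(gigabytes_saved(20, 1))  # 4K: several gigabytes saved per viewer-hour
print(gigabytes_saved(1, 1))   # 480p: only a fraction of a gigabyte
```

The percentage saved is the same in both cases, but the absolute amount of data, and therefore the bill, only becomes significant at the high-bitrate end.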
LCEVC Circumvents the Whole Argument
LCEVC (Low Complexity Enhancement Video Coding) increases compression rates by about 40% for all codecs. This is due to the fact that it is an additional processing layer that works with existing and future versions of MPEG or other codecs such as VP9 and AV1. As we covered in a previous article, LCEVC has great potential to have a large impact on video streaming technology. Without having to change the composition of all the current protocols, LCEVC can make them more efficient in and of themselves.
From where things are now, it looks like content providers will be able to use LCEVC-enabled software or hardware-based encoders in combination with the Red5 Pro cross-cloud platform to unlock real-time streaming despite the processing-intensive video formats they are built with. Depending on which core codec is used, this applies not only to 4K and, eventually, 8K UHD, but also to formats devised for 360-degree viewing, virtual reality, and other innovations.
After considering everything outlined here, AVC/h.264 is currently the best available option due to widespread adoption and fast encoding speeds. Although increasing compression and video quality are important considerations, right now the tradeoffs are just too severe. Specifically, high encoding times and voracious CPU consumption are really bad for live streaming video.
That said, considering that VP9 is free and also enjoys widespread support, it will be a viable choice in the near future once faster software or hardware encoders are created. In the distant future, AV1 will eventually replace VP9, but considering the astronomically high encoding times it currently suffers from, there’s a lot of streamlining that needs to be done before it’s ready for expansive use. Of course, LCEVC could possibly circumvent the whole issue of changing codecs for better compression. Perhaps it will just serve as a longstanding bridge between h.264 and AV1.
Nonetheless, AV1 is poised to replace h.264, h.265, and VP8/9. With video consumption on the rise, decreasing bandwidth constraints will make it much easier to send the high quality videos that users are looking for. This is especially true for developing areas away from wired connections that are more dependent on cell phone connections. The consortium behind it has all of the major players involved and it’s royalty-free. All that is holding AV1 back right now is the lack of real-time encoders. Once those become widely available, AV1 (especially when paired with LCEVC) will be the way forward.
Think we missed something in our analysis? Let us know by sending an email to email@example.com or schedule a call with us to find out more about Red5 Pro.