When it comes to deciding between Low Latency HLS and WebRTC, which protocol delivers the best live streaming experience? Since the protocol determines how quickly encoded video data travels across an internet connection, the decision of which one to use is an important one.
Wowza recently published an article that contains misinformation about WebRTC and low latency HLS. While correctly establishing that WebRTC is the only way to provide real-time latency, they also repeated some common misconceptions, especially the often-repeated myth – fully debunked by Red5 Pro and others – that WebRTC does not scale.
After further analysis, the fact checkers at Red5 Pro have identified five major factors you should consider when choosing a protocol, all of which Wowza also happened to mostly get wrong: latency, scalability, multi-device compatibility, performance in poor streaming conditions, and security. Let's dive into those details, starting with arguably the most important aspect of live streaming: latency.
Latency
Latency is crucial for live streaming. Real-time use cases, from straightforward video conversations to more exacting applications such as drone control, can tolerate at most 500ms of latency; anything above that is too high. As Wowza described, "Low latency is critical. [...] WebRTC was built with bidirectional, real-time communication in mind [making it] the fastest protocol on the market." This is where we agree: WebRTC is indeed the most widely supported and fastest protocol available today.
HLS was built on the long-established and deeply entrenched HTTP infrastructure, which is what led to the widespread use it currently enjoys. That same segment-based delivery over HTTP is also why HLS results in anywhere from 10 to 40 seconds of latency.
However, there are ways to modify HLS to decrease the latency. Apple has its own Low-Latency HLS (LL-HLS) implementation, which is similar to the community-developed, open-source LHLS; both reduce latency to around two or three seconds. Though they decrease latency, neither enjoys the widespread compatibility of standard HLS.
Looking to increase the compatibility of LL-HLS, Apple announced in early 2020 that it had dropped the HTTP/2 push requirement. Thus, it looks like the overall HLS spec will eventually support around three seconds of latency. While still not real-time, that's certainly better than 40.
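To give a sense of the mechanics, here is an illustrative LL-HLS media playlist fragment (segment names and durations are invented for illustration). The stream is advertised in sub-second partial segments that the player can fetch before a full segment finishes encoding, which is where the latency savings come from:

```
#EXTM3U
#EXT-X-TARGETDURATION:4
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0
#EXT-X-PART-INF:PART-TARGET=0.333
#EXTINF:4.0,
segment100.mp4
#EXT-X-PART:DURATION=0.333,URI="segment101.part1.mp4"
#EXT-X-PART:DURATION=0.333,URI="segment101.part2.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="segment101.part3.mp4"
```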
As a UDP-based protocol built to fully adapt to the modern internet, WebRTC delivers latency under 500ms, making it currently the only widely supported protocol that can provide a real-time experience.
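For the curious, here is a minimal sketch of what opening a WebRTC publishing connection looks like in the browser. The signaling WebSocket URL and message shapes are placeholders, since WebRTC deliberately leaves signaling to the application:

```typescript
// Minimal publish sketch: capture the camera and negotiate a peer
// connection. Media then flows over SRTP on top of UDP.
async function publish(): Promise<void> {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
  });

  const media = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  media.getTracks().forEach((track) => pc.addTrack(track, media));

  const signaling = new WebSocket("wss://example.com/signal"); // placeholder URL
  pc.onicecandidate = (e) => {
    if (e.candidate) signaling.send(JSON.stringify({ candidate: e.candidate }));
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(JSON.stringify({ sdp: pc.localDescription }));

  // Apply the remote answer and ICE candidates as they arrive.
  signaling.onmessage = async (msg) => {
    const { sdp, candidate } = JSON.parse(msg.data);
    if (sdp) await pc.setRemoteDescription(sdp);
    if (candidate) await pc.addIceCandidate(candidate);
  };
}
```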
Scalability
WebRTC is much harder to scale than HLS. However, that does not mean it can't be done, especially considering that it already has been done.
One such example of successful WebRTC-based scaling comes from Microsoft.
In August of 2016, Microsoft acquired Beam, a WebRTC-based approach to live game streaming intended to solve latency issues and provide a better experience than the Twitch platform. A year later, Microsoft renamed it Mixer. Though Microsoft ultimately shut the Mixer game streaming platform down, that was due to its failure to attract enough people, not an inability to support a large number of users. The popular gamer Ninja had one stream on Mixer that attracted over 85,000 concurrent viewers and 2.2 million total viewers over an 8.5-hour stream.
According to Wowza, "if you need to scale your audience beyond about 50 viewers, you'll need to think twice about how you're going to do it." They then claim that the Wowza Streaming Engine can scale up to 300 WebRTC-based viewers in the best-case scenario. Using their system, anything beyond that requires transmuxing WebRTC to HLS or DASH, resulting in increased latency.
The difficulty Wowza encounters in scaling is due to their implementation of WebRTC, not the protocol itself. Essentially, Wowza's Streaming Engine acts as a single-server SFU (Selective Forwarding Unit) in this case. Because it uses one machine instance to handle the entire load, it cannot go beyond what a single server can handle in terms of connections, RAM, CPU, and so on. The broadcasting or publishing stream goes out to a single SFU server, so once all the resources of that SFU are consumed, no additional instances can be added; it has no ability to relay that broadcast out to other servers.
No matter what protocol is used, scaling out your application increases the amount of CPU and RAM it consumes. When your hosting provider uses fixed data centers – as a CDN does – meeting that increase means physically adding servers or increasing server capacity. This can be a problem if you hit higher-than-anticipated demand, or if you just need a little extra capacity, as you could end up paying for a much larger server than you need. All of this is abstracted away from the developer when using a CDN service, which is often why that kind of setup is so attractive. The problem, though, is that CDNs use HTTP to scale, and that comes with a tremendous amount of latency.
This is why you need a clustering solution that works with WebRTC as a protocol, and better still one that can autoscale with cloud infrastructure. This kind of autoscaling solution involves switching from the static, datacenter-based CDN model to a much more flexible cloud-based model. Server clusters can be set up to dynamically spin up new servers as network traffic increases and spin them back down once they are no longer needed. This alleviates the potential issue of paying for more capacity than you really need.
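As a toy illustration of the idea, the decision loop below scales an edge pool on CPU load; the thresholds and the cloud API are stand-ins invented for the sketch, not any particular vendor's interface:

```typescript
// Illustrative autoscaling loop: thresholds and the `cloud` API are
// invented stand-ins for a real cloud provider's SDK.
declare const cloud: {
  launchInstance(role: string): Promise<void>;
  terminateIdleInstance(role: string): Promise<void>;
};

interface NodeMetrics { cpu: number; connections: number; }

async function autoscale(edges: Map<string, NodeMetrics>): Promise<void> {
  if (edges.size === 0) return;
  const avgCpu =
    [...edges.values()].reduce((sum, m) => sum + m.cpu, 0) / edges.size;

  if (avgCpu > 0.75) {
    await cloud.launchInstance("edge"); // spin up under load
  } else if (avgCpu < 0.25 && edges.size > 1) {
    await cloud.terminateIdleInstance("edge"); // spin down spare capacity
  }
}
```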
Red5 Pro's WebRTC-compatible Autoscaling Solution works by publishing a broadcast stream to an origin server. Subscribers requesting access to the broadcast connect to a separate edge server, which a Stream Manager matches to the correct origin server. This architecture allows multiple edges to connect to the same origin, so multiple servers can handle as many connections as needed while all drawing from the same broadcast stream. If an origin hits capacity in the number of edge servers it can serve, relay nodes allow the same broadcast to be published to multiple origins. The system thus keeps spinning up new origins and edges to handle as many publishers and subscribers as needed.
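In client terms, the pattern looks roughly like the sketch below: ask the Stream Manager which edge to use, then subscribe against that edge. The REST path and response fields here are hypothetical, invented to show the shape of the flow rather than Red5 Pro's actual API:

```typescript
// Hypothetical origin-edge lookup: the endpoint and response shape
// are invented for illustration.
interface EdgeInfo { host: string; port: number; }

async function resolveEdge(streamName: string): Promise<EdgeInfo> {
  const res = await fetch(
    `https://streammanager.example.com/api/streams/${streamName}?action=subscribe`
  );
  if (!res.ok) throw new Error(`no edge available for ${streamName}`);
  return res.json();
}

// The client then opens its WebRTC subscription against the returned
// edge while the Stream Manager keeps spreading new viewers across edges.
```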
In fact, this dynamic scaling model is similar to how Mixer built their solution. However, Mixer used bare-metal servers, which are not as flexible as a strictly cloud-based solution. If you really want to dive deep into the Red5 Pro approach to cloud-based scaling of WebRTC, we recommend reading our white paper on the subject.
Multi-Device Compatibility
Ensuring that your application can run on a variety of devices is certainly important. Whether it's a mobile phone, laptop, or tablet, you need a full complement of browsers and platforms supported.
As a web standard, WebRTC is fully supported by the latest versions of all major browsers: Chrome, Safari, Firefox, Edge, and Opera. Plus, it runs natively in the browser without the use of a plug-in, and that includes mobile browsers for iOS and Android as well. Of course, creating dedicated mobile apps with the use of Mobile SDKs is good too.
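A cross-browser player typically starts with a capability check along these lines (the fallback choice to HLS is our illustration, not a requirement):

```typescript
// Prefer native WebRTC where available; fall back to an HLS player
// on browsers or webviews that lack it.
function supportsWebRTC(): boolean {
  return (
    typeof RTCPeerConnection !== "undefined" &&
    typeof navigator !== "undefined" &&
    !!navigator.mediaDevices?.getUserMedia
  );
}

const transport = supportsWebRTC() ? "webrtc" : "hls";
console.log(`selected transport: ${transport}`);
```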
Like everything else, browser implementations all differ slightly, but nothing acts as a full roadblock. Rather than figuring out how to create cross-browser compatibility, Wowza simply blames Safari for being buggy. As WebRTC providers, it's our responsibility to make cross-browser compatibility work. Full compatibility is definitely possible, as plenty of other groups – including Red5 Pro – fully support Safari with no issues. We agree that keeping compatibility with the full spec and multiple browser implementations is difficult, but that doesn't mean it can't be done.
Performance in Poor Streaming Conditions
In terms of quality and performance, LL-HLS and WebRTC offer similar features, as both can support transcoding and Adaptive Bitrate (ABR).
ABR allows the client to request a lower bitrate more appropriate to the connectivity it is experiencing at that moment, ensuring a smooth stream despite poor connectivity. HLS and its newer cousin LL-HLS both have ABR built right into the spec. This is accomplished with a master manifest file that contains the variants. When the player detects that video isn't being delivered quickly enough, and thus detects insufficient bandwidth, it can simply request one of the lower stream variants in the manifest and start downloading the new video segments at the lower bitrate.
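Here is a minimal illustrative master manifest with three variants (the bitrates, resolutions, and paths are invented). The player picks a variant playlist based on measured bandwidth and simply switches to another entry when conditions change:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
high/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=854x480
medium/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x360
low/playlist.m3u8
```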
With WebRTC, things are quite a bit different. In WebRTC you have a single UDP connection, with video delivered over SRTP. This means you can't make new requests for different segment files, since there are no segment files to begin with. Instead, the approach is to make the multiple bitrate variants available at the edge server, allowing the client to request the right quality of video. The request itself travels over the RTCP channel, a bidirectional control channel for sending live information about the state of each peer in a WebRTC session. The specific message we listen for is REMB (Receiver Estimated Maximum Bitrate), which contains the bandwidth the peer – in this case the subscriber client – is requesting. Based on that information, the edge server node can respond by shifting to the stream that best fits the bandwidth requirement.
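The REMB exchange itself happens inside the WebRTC stack and the server, but browsers expose the matching knob on the sending side. A small sketch, assuming a publisher that wants to cap its outgoing video bitrate when told the link is congested:

```typescript
// Cap the outgoing video bitrate on an existing RTCPeerConnection.
// The stack keeps the encoder under this ceiling, complementing the
// RTCP bandwidth feedback (such as REMB) from the remote end.
async function capVideoBitrate(pc: RTCPeerConnection, maxBps: number): Promise<void> {
  const sender = pc.getSenders().find((s) => s.track?.kind === "video");
  if (!sender) return;

  const params = sender.getParameters();
  if (!params.encodings.length) return; // nothing negotiated yet

  params.encodings[0].maxBitrate = maxBps; // bits per second
  await sender.setParameters(params);
}
```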
As a side note, both HLS and WebRTC can rely on live transcoding of the streams to generate these multiple bitrate variants. Transcoding splits the stream into a ladder of quality levels (for example: high, medium, and low) so that users who can support the highest quality can subscribe to it, while users with poorer connections can still watch.
While HLS is limited to ABR, WebRTC has additional features that further improve quality and performance.
Given that WebRTC is a UDP-based protocol, one of its most critical features is NACK (negative acknowledgement), a method of resending critical packets. A bad network connection will likely cause the client to drop packets. Rather than trying to resend each and every packet, NACK identifies the ones that are most important and resends those, preventing the network from becoming further overloaded with redundant requests. This keeps the stream flowing and looking good even under poor network conditions, without the drawbacks of packet backups in TCP-based systems. Again, like REMB, NACK is a message type sent over the RTCP channel to the edge server, which is then responsible for re-delivering the critical packet. WebRTC also supports many other strategies for keeping stream quality high and ensuring efficient delivery of video, including FEC, FIR, and PLI, which also happen to work over the RTCP channel.
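This repair traffic is observable from the browser through the standard stats API; the sketch below polls a publisher's outbound video stats for the NACK, PLI, and FIR counters:

```typescript
// Watch RTCP repair traffic: nackCount, pliCount, and firCount are
// standard fields on outbound-rtp video stats.
async function logRepairStats(pc: RTCPeerConnection): Promise<void> {
  const report = await pc.getStats();
  report.forEach((stat) => {
    if (stat.type === "outbound-rtp" && stat.kind === "video") {
      console.log(
        `NACKs: ${stat.nackCount}, PLIs: ${stat.pliCount}, FIRs: ${stat.firCount}`
      );
    }
  });
}

// e.g. poll every two seconds: setInterval(() => logRepairStats(pc), 2000);
```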
WebRTC is a complex specification with many moving parts, which is likely why Wowza got so much wrong in their post about how ABR works over WebRTC. Specifically, we refer to this section:
“WebRTC, on the other hand, wasn’t built with quality in mind. WebRTC’s No. 1 priority has always been real-time latency for peer-to-peer browser connections. Traditionally, quality has taken the back seat. You may find it surprising that WebRTC supports ABR — but with a caveat. As the saying goes, “A chain is only as strong as its weakest link.” And this holds true for WebRTC’s quality. WebRTC’s built-in ABR is on the subscriber side only, which creates an issue if you have multiple subscribers. You could run into a situation where one subscriber has a poor network. This would force the publisher to switch to a lower-quality stream, resulting in everyone having to watch it in low quality.”
Wowza seems to be confusing this with peer-to-peer video conferencing scenarios, where the person with the lowest bandwidth dictates the quality for all users. That setup is of course quite different from a scalable origin-edge clustering model, where the edge server node handles a unique peer connection per client. In fact, Wowza's SFU case has this same per-client scenario. From our reading, and from what others have told us, Wowza simply doesn't have an ABR strategy for WebRTC.
Security
Making sure that your data and streams remain protected is important as well. Preventing unauthorized users from creating streams, and encrypting streams so they can't be intercepted, ensures that sensitive information doesn't leak out.
As mentioned earlier, LL-HLS will be wrapped into the HLS spec. That means security features available with LL-HLS, such as DRM, token authentication, and key rotation, will eventually be implemented. However, those extra security features will have to wait until providers can configure them in their systems, and waiting on someone else for your security can be an issue.
HLS does have one prominent security feature: it can be encrypted. WebRTC, on the other hand, is encrypted by default, meaning your streams are protected from hackers gaining unauthorized access. Furthermore, features such as user authentication, file authentication, and round-trip authentication can further secure your streams.
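As a hypothetical sketch of token-based authentication, a signaling server might validate a publisher's token before allowing the offer/answer exchange; the endpoint and token shape below are invented for illustration:

```typescript
// Hypothetical token check run by a signaling server before it lets
// a client publish; the validation endpoint is invented.
async function authorizePublish(streamName: string, token: string): Promise<boolean> {
  const res = await fetch("https://auth.example.com/validate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ streamName, token }),
  });
  return res.ok; // reject the connection attempt when this fails
}
```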
In regards to DRM systems, in many circumstances the basic security provided by WebRTC is more than enough to protect your data. That means content owners and distributors can safely forego the costs and hassles of contracting for DRM support, provided they have the legal latitude to do so.
Conclusion
One of the most widespread myths about WebRTC is that it can't scale. We are here to tell you that Red5 Pro is fully scalable, to audiences well into the hundreds of thousands and even millions. WebRTC also enjoys robust security features, built-in device compatibility, and high-quality performance regardless of network strength. And of course, there is no avoiding the fact that WebRTC is the only way to get real-time latency under 500ms.
The question of Low Latency HLS vs. WebRTC for live video has one clear winner: WebRTC.