When the Internet was born, few people could have imagined the amount of data that it would need to handle decades later. The Internet and its original protocols were designed to send small packets of data (like emails) that could traverse the network and go from computer A to computer B.
Nowadays, Internet traffic has evolved to include the delivery of video. Not only is video a part of Internet traffic, it far exceeds all other content. Cisco estimated that by 2022, IP video will make up 82% of all Internet Protocol (IP) traffic, while the sum of all forms of IP video, which includes Internet video, IP VoD, video files exchanged through file sharing, video-streamed gaming, and video conferencing, will continue to be in the range of 80 to 90 percent of total IP traffic.
The growth of IP video traffic pushed companies to research and develop new protocols that could exploit the existing infrastructure without major changes. For live video streaming this led to using Content Delivery Networks (CDNs) as much as possible to continue to deliver live video using simple HTTP based protocols. While this approach was reasonable in the beginning, it is no longer effective today.
The most modern HTTP based protocols, like CMAF, Low Latency HLS and Apple’s Low Latency HLS, cannot keep latency below 3 seconds. This prevents interaction with live streams and makes them unusable for many scenarios, including e-sports, traditional sports, auctions and live events of all types. The way to solve this latency problem is to use more tailored protocols like WebRTC and SRT, which have been designed from the ground up with the goal of achieving very low latency.
This post shows that HTTP based protocols and CDNs are not future-proof when it comes to live video streaming because of the latency they introduce. It then presents a Cross Cloud Distribution System (CCDS), built on the WebRTC and SRT implementations in Red5 Pro, and shows how one or more cloud platforms, including private networks, can work together to provide highly reliable and available deployments in any world region while supporting custom security rules and delivering live streams with sub-500 ms end-to-end latency to millions of users.
The World Wide Web was invented in 1989 by Tim Berners-Lee, who also invented the first version of HTTP, which could only load a page from a server using the GET method. In subsequent years, HTTP continued to be developed and support for more data types was added. HTTP is based on a request-response model with two entities: the client and the server. The client makes requests to a server, which handles them and provides a response. This model works very well for loading web pages or sending emails, and thus it was suited to the early stages of the Internet. This allowed it to expand and have more and more infrastructure built to support it, quickly making HTTP the foundation of the Internet.
In the last decade, Internet traffic showed exponential growth in its use for IP video solutions, which include live video streaming and VOD, as shown in Figure 1. The initial approach consisted of extending the existing HTTP based protocols and infrastructure to support video content. This was motivated by the existence of Content Delivery Networks (CDNs) that could speed up the delivery of any kind of HTTP traffic.
As IP video traffic increased, so did the usage of CDNs, as shown in Figure 2. A CDN is a large system of distributed servers, called Edges, that are dispersed over the globe and cache content from an Origin server so that each user can be served by the Edge server closest to them. When a client makes a request to an Origin that uses a CDN, the request is redirected to the CDN Edge server closest to the client, which serves the requested content from its cache. This solution is optimal for static content like webpages and even Video On Demand (VOD) services.
However, it is inefficient for highly dynamic content, like live video, because it adds too much latency to the stream. Using a CDN for video requires the system to first cache the video, which adds unwanted latency. The amount of latency also depends on the streaming protocol, which must be suitable for use with a CDN. The most popular HTTP based protocols are HLS and MPEG DASH, which have latencies of around 10 to 30 seconds.
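The way segment-based delivery accumulates delay can be approximated with simple arithmetic. The sketch below is a rough model rather than a measurement: it assumes that a player waits for a few whole segments before starting playback, and the default buffer depth, encode delay and delivery delay are illustrative placeholders, not figures from this post.

```python
def segment_latency_s(segment_s, buffered_segments=3, encode_s=1.0, delivery_s=1.0):
    """Rough end-to-end latency of a segment-based HTTP stream, in seconds.

    All defaults are assumptions for illustration: a player that buffers
    three full segments, plus one second each for encoding and CDN delivery.
    """
    return encode_s + segment_s * buffered_segments + delivery_s


for seg in (2, 6, 10):
    print(f"{seg}s segments -> about {segment_latency_s(seg):.0f}s behind live")
```

Under these assumptions the delay is dominated by the segment duration multiplied by the buffer depth, so shorter segments help but cannot remove the cost of buffering whole segments.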
The latest improvements in streaming protocols brought Low Latency HLS, Apple’s Low Latency HLS and CMAF. Red5 Pro compared those protocols, including WebRTC, to display their advantages and disadvantages. The results showed that even though the protocols mentioned above are a great improvement over HLS and MPEG DASH, their end-to-end latency is still limited to about 3 seconds in optimal conditions.
Figure 1: Share of Internet traffic over the period 2017-2022 showing the growth of IP video solutions that by 2022 are expected to cover 82% of the whole IP traffic.
Figure 2: Content Delivery Network traffic growth over the period 2017-2022.
The maximum latency allowable in a stream depends on the use case. For instance, for real-time communication, it is generally considered that the experience starts to degrade above 200ms of latency (0.2 seconds!). Beyond this limit, conversations start to become more challenging.
Live sports are also very sensitive because the last thing viewers want is a push notification informing them of the result of a match that hasn’t yet finished on their live stream. On a similar note, different providers may broadcast an event with different latencies, and this can lead to neighbors ruining the climax of a sporting event. While most sports broadcasts today have an intentional 3 to 6 seconds of latency built into their streams, live sports betting and fans in stadiums live messaging their friends are making even those few seconds a detriment to the experience.
The gambling (not just sports) and auction industries have embraced live streaming as well. The creation of online casinos with real-life dealers emphasises the importance of reducing the critical time between each action by a dealer or player to minimize interruption and maintain flow. Additionally, auction houses expanded their online presence allowing users to bid virtually while watching a live stream. Logically, the only way to guarantee a correctly synchronized bid is with sub-second latency.
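The latency requirements above can be compared against the protocol latencies quoted earlier in this post. The following is a minimal sketch that takes the lower end of each quoted range as a single representative number; the dictionary and function names are illustrative only.

```python
# Approximate end-to-end latencies quoted in this post, in seconds.
PROTOCOL_LATENCY_S = {
    "HLS / MPEG DASH": 10.0,          # lower end of the 10 to 30 second range
    "LL-HLS / CMAF": 3.0,             # best case in optimal conditions
    "WebRTC / SRT (Red5 Pro)": 0.5,   # the sub-500 ms target
}


def protocols_within(budget_s):
    """Return the protocols whose quoted latency fits the given budget."""
    return [name for name, latency in PROTOCOL_LATENCY_S.items()
            if latency <= budget_s]


print(protocols_within(1.0))  # sub-second budget, e.g. live auctions
print(protocols_within(6.0))  # relaxed budget, e.g. a broadcast with built-in delay
```

With a sub-second budget only the WebRTC and SRT entry remains, which is the point the use cases above are making.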
The reason the video streaming industry hasn’t moved towards lower latency protocols is simply that Content Delivery Networks (CDNs) have built up infrastructure over many years that relies on HTTP. It’s only logical to try to make incremental improvements in the hope that they will be good enough, as opposed to rethinking the entire technology stack.
A second aspect to consider when using CDNs is their coverage. For a CDN to deliver content quickly to a client, it must use an Edge node that is as close as possible to the client, and this is fundamental for live streams to keep latency low. Accordingly, high performance on a global level can be achieved only if the CDN has servers everywhere in the world, but that may not always be the case.
Currently, there is no universal CDN with data centers in every possible (major) city. Moreover, geopolitical areas such as China or South America pose delivery challenges as well. Therefore, a network of multiple CDNs may need to be linked together for effective content delivery. This means working with CDNs that may have very different APIs and pricing for the regions they cover.
Sub-second latency is the critical component of truly interactive video experiences. Achieving scalable real-time latency requires a new approach. This means moving from request-response protocols based on HTTP, to streaming protocols like WebRTC and SRT that were designed to handle large data streams with minimal latency. A further shift is needed from CDNs towards solutions based on Cloud Platforms which will create an infrastructure that is more efficient and does not require caching.
Similarly to CDNs, Cloud platforms support different regions with varying prices and are subject to the same geopolitical limitations in China and South America. The main difference, though, is that using an approach with multiple Cloud platforms is becoming simpler thanks to tools like Terraform and Kubernetes. Terraform abstracts the different APIs of the Cloud platforms into a single API, the Terraform Cloud API, while Kubernetes is a platform for automating deployment, scaling, and operations of application containers across clusters of hosts.
Cloud Platforms are also moving beyond HTTP-based streaming data delivery. For instance, AWS added support for IP multicasting to their AWS Transit Gateway so that a stream of data can be delivered simultaneously to multiple instances, and this could be exploited for video multicast.
This post presents a Cross Cloud Distribution System (CCDS) as an alternative to CDNs. The solution is based on Red5 Pro and exploits multiple Cloud platforms at once to provide the best coverage and performance to support millions of users while delivering live streams with sub 500ms end-to-end latency. Additionally, the system supports private networks, which can be added to work alongside the Cloud Platforms, and custom security rules while abstracting the complexity from the end user.
Red5 Pro is a platform that allows developers to build scalable live streaming applications that support WebRTC and SRT for both ingest and egress as well as other protocols like RTMP, HLS, RTSP and MPEG-TS. It also uses a Cloud based clustering architecture instead of CDNs. This allows it to achieve a sub-500ms end-to-end latency, as demonstrated by a live demo, while being able to support millions of concurrent users.
WebRTC is a protocol built as a free and open-source standard that adds real-time communication capabilities to an application. It uses three channels to deliver video, audio, and generic data between peers and has built-in security that uses DTLS and SRTP for end-to-end data encryption. WebRTC guarantees very low latency by using UDP delivery, which is crucial for real-time video streaming. One of WebRTC’s biggest advantages is that it’s built into virtually all modern web browsers.
SRT is a UDP based open source transport protocol originally developed by Haivision that optimises streaming performance across unpredictable networks by dynamically adapting to the real-time network conditions between transport endpoints. While any data type can be transported, it is particularly optimized for audio and video streaming. SRT helps compensate for jitter and bandwidth fluctuations due to congestion over noisy networks, and its error recovery mechanism minimizes packet loss. Similarly to WebRTC, SRT supports end-to-end encryption using AES. SRT has gained widespread adoption among hardware video encoder manufacturers, thus making it an attractive low latency transport built right into the encoder itself.
Red5 Pro supports millions of concurrent users while providing sub 500ms end-to-end latency using High Availability (HA) deployments that consist of Red5 Pro clusters deployed on Cloud Platforms. A sample cluster diagram is shown in Figure 3.
Figure 3: Diagram of a Red5 Pro cluster that can be deployed on a Cloud platform to support millions of users while guaranteeing sub-500ms end-to-end latency.
Within a Red5 Pro cluster each server can be an Origin, Relay or Edge Node. The Origin nodes are used for ingest while the Edge nodes are used for egress. Each Origin server can communicate with multiple Edge servers to support thousands of users. Relay nodes, which receive streams from an Origin and relay them to multiple Edges, are used for larger deployments to scale the cluster even more and allow it to support millions of users. Each node in the system can be deployed on a relatively low powered cloud-based virtual machine. Therefore, Red5 Pro achieves High Availability by scaling horizontally without using expensive, high-performance machines, thus making it easy to deploy on any Cloud Platform.
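The effect of adding a Relay tier can be illustrated with a back-of-the-envelope calculation. This is a sketch under stated assumptions: the fan-out and per-Edge subscriber numbers below are hypothetical and are not Red5 Pro limits.

```python
def cluster_capacity(subscribers_per_edge, edges_per_upstream, relays=0):
    """Upper bound on concurrent viewers of one stream.

    With no relays the Origin feeds the Edges directly; with relays, each
    Relay feeds its own group of Edges. All inputs are hypothetical.
    """
    if relays == 0:
        edges = edges_per_upstream
    else:
        edges = relays * edges_per_upstream
    return edges * subscribers_per_edge


print(cluster_capacity(2000, 20))             # Origin -> Edges
print(cluster_capacity(2000, 20, relays=25))  # Origin -> Relays -> Edges
```

With these made-up numbers, the Origin-to-Edge layout tops out in the tens of thousands of viewers, while inserting a Relay tier multiplies the number of Edges and brings the total into the millions.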
Red5 Pro clusters are controlled by a set of Stream Managers through a Cloud Controller, with a database used for permanent storage. The Stream Manager provides an API to interact with the cluster and is able to manage the nodes, perform load balancing and replace nodes that fail for any reason. Publishers also use the Stream Manager to obtain an Origin to publish to, and subscribers use it to obtain an Edge close to them for playback. A sample diagram of a Red5 Pro cluster with Stream Managers is shown in Figure 4.
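One way to picture how a subscriber is matched to an Edge is the selection rule sketched below. It is not the Stream Manager's actual algorithm or API: the `Edge` class, its fields and the "least-loaded Edge in the same region" rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    name: str
    region: str
    connections: int
    capacity: int


def pick_edge(edges, subscriber_region):
    """Prefer the least-loaded Edge in the subscriber's region, else any Edge."""
    available = [e for e in edges if e.connections < e.capacity]
    if not available:
        return None  # nothing has room; a real system would scale out here
    local = [e for e in available if e.region == subscriber_region]
    candidates = local or available
    return min(candidates, key=lambda e: e.connections / e.capacity)


edges = [
    Edge("edge-1", "eu", 900, 1000),
    Edge("edge-2", "eu", 300, 1000),
    Edge("edge-3", "us", 100, 1000),
]
print(pick_edge(edges, "eu").name)
```

The sketch keeps viewers in their own region when possible and only falls back to a remote Edge when every local one is full.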
The Autoscaling feature, shown in Figure 5, allows the Stream Manager to monitor the load on the network and spin instances up or down based on the current load without requiring any user intervention. For planned events that may see a spike of connections in a short time frame, the Stream Manager provides a Scheduling API to scale up a cluster in advance so that it is already at scale when the event starts.
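The core of a load-based scaling decision can be sketched as a small calculation. This is an illustrative rule, not the Stream Manager's implementation: the function name, the target utilization and the minimum node count are all assumptions.

```python
def desired_nodes(connections, capacity_per_node, target_percent=70, min_nodes=1):
    """Number of nodes needed to keep each one at or below the target load."""
    usable = capacity_per_node * target_percent // 100  # connections per node
    needed = -(-connections // usable)                   # ceiling division
    return max(min_nodes, needed)


current = 4
wanted = desired_nodes(connections=7001, capacity_per_node=1000)
if wanted > current:
    print(f"scale out by {wanted - current} node(s)")
elif wanted < current:
    print(f"scale in by {current - wanted} node(s)")
```

A scheduled event could use the same calculation ahead of time by passing the expected audience instead of the current connection count.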
Figure 4: Sample diagram of a Red5 Pro cluster using two load balanced Stream Managers.
Figure 5: Autoscaling lifecycle of the Stream Manager.
The interaction between the Stream Manager and the nodes uses a Cloud Controller that translates the abstract commands of the Stream Manager to the actual API calls provided by each platform. Red5 Pro supports AWS, Google Cloud Platform and Microsoft Azure through Custom Cloud Controllers. At the same time it can use a Terraform Cloud Controller to use the platforms already mentioned and any other platform supported by Terraform.
Terraform abstracts the APIs of each Cloud Platform with its own Terraform Cloud API and in its backend uses Terraform providers to communicate with the Cloud Platforms. In this way, the Stream Manager can use a single Cloud Controller to communicate with Terraform and automatically support the over 200 Cloud Platforms that Terraform has a provider for. Figure 6 shows the difference between having the Stream Manager interact directly with a Cloud Platform and using the Terraform Cloud Controller.
Terraform makes it possible to have a cross cloud distribution system where multiple Cloud Platforms can be used at once including private networks. This is explained in greater detail in the following sections.
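The Cloud Controller idea described above is essentially an interface with interchangeable backends. The sketch below illustrates that pattern only; every class and method name is hypothetical, nothing here is the Red5 Pro or Terraform API, and the "platform call" is simulated by recording an identifier.

```python
from abc import ABC, abstractmethod


class CloudController(ABC):
    """Abstract commands a (hypothetical) Stream Manager would issue."""

    @abstractmethod
    def create_node(self, region: str, role: str) -> str:
        """Start a node and return an identifier for it."""


class SinglePlatformController(CloudController):
    """Stands in for a controller written against one platform's API."""

    def __init__(self, platform: str):
        self.platform = platform
        self.nodes = []

    def create_node(self, region, role):
        node_id = f"{self.platform}/{region}/{role}-{len(self.nodes) + 1}"
        self.nodes.append(node_id)  # a real controller would call the platform here
        return node_id


class MultiPlatformController(CloudController):
    """Stands in for a Terraform-style controller: one interface, many platforms."""

    def __init__(self, region_to_platform):
        self.region_to_platform = region_to_platform
        self.backends = {p: SinglePlatformController(p)
                         for p in set(region_to_platform.values())}

    def create_node(self, region, role):
        platform = self.region_to_platform[region]
        return self.backends[platform].create_node(region, role)


controller = MultiPlatformController({"us-east": "aws", "europe-west": "gcp"})
print(controller.create_node("us-east", "origin"))
print(controller.create_node("europe-west", "edge"))
```

The caller issues the same `create_node` command regardless of which platform ends up hosting the node, which is the property that makes a cross cloud deployment manageable.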
Cross Cloud Distribution System
The Stream Manager can use the Terraform Cloud Controller to communicate with Terraform and automatically support all the Cloud Platforms that Terraform supports. This approach can be extended to private networks as well, which may be required for sensitive deployments. Private networks can be supported as long as they use a cloud offering that includes a Terraform provider. Such offerings include, but are not limited to, VMware vSphere, Apache CloudStack, OpenStack and OpenNebula. In this way, the Stream Manager can use multiple clouds at once by deploying clusters in each of them based on custom criteria, thus creating a Cross Cloud Distribution System (CCDS).
A CCDS can be controlled by one or more sets of Stream Managers that can be deployed on one or more platforms. As long as the Stream Managers share the same database, all of them will be aware of the state of the whole network. Moreover, if a Stream Manager goes down, others can temporarily handle its requests while a new one is spun up. To make sure the system is always available, the database deployment uses a replication strategy to avoid any data loss or a single point of failure.
Figure 6: Diagram showing how Cloud Controllers are used by a Stream Manager to interact with multiple platforms while saving the state in a database.
Advantages of Cross Platform Support
A cross platform approach increases the availability, reliability and coverage of the overall deployment while allowing a customer to exploit the different pricing offered by Cloud providers. The availability and reliability of the system are improved because if one Cloud fails, it is possible to use a different one and deploy a new cluster of nodes in the region that failed.
At the same time, different Clouds have different performance and coverage in various regions. Therefore, if price is not a major constraint, this approach allows one to exploit the best Cloud provider for every region and to reach anywhere in the world. Figure 7 shows the coverage created by using AWS, GCP, Azure, DigitalOcean and Linode together.
Conversely, when working on tight budgets this approach allows one to create a custom trade-off between performance and price by selecting the most appropriate platforms for every region to target. A user of the CCDS also benefits from a single API and Stream Manager controller making it so they don’t have to deal with a different API for each cloud network.
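The trade-off between performance and price can be expressed as a simple selection rule per region. The sketch below picks the cheapest provider that still meets a latency ceiling; the provider names, prices and latencies are made up for illustration and do not describe any real Cloud offering.

```python
def cheapest_within(offers, max_latency_ms):
    """Pick the cheapest offer whose latency meets the ceiling, or None."""
    eligible = [o for o in offers if o["latency_ms"] <= max_latency_ms]
    return min(eligible, key=lambda o: o["price_per_hour"], default=None)


# Hypothetical offers for a single region; names and numbers are invented.
offers = [
    {"provider": "cloud-a", "price_per_hour": 0.10, "latency_ms": 40},
    {"provider": "cloud-b", "price_per_hour": 0.06, "latency_ms": 90},
    {"provider": "cloud-c", "price_per_hour": 0.04, "latency_ms": 180},
]

choice = cheapest_within(offers, max_latency_ms=100)
print(choice["provider"] if choice else "no provider meets the ceiling")
```

Tightening the ceiling shifts the choice towards the faster and more expensive provider, while relaxing it favors the cheaper one.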
Figure 7: Diagram showing the coverage created by using AWS, GCP, Azure, DigitalOcean and Linode together.
The filled donuts represent existing data centers while the empty ones data centers that are being built.
Red5 Pro has built-in support for stream authentication using the Round Trip Plugin. The plugin takes a username, password and optional token from a user upon making a broadcasting or subscribing request and validates them against a backend service to verify whether the user should be allowed to publish or subscribe. This solution is great for adding a basic security layer to the streams, but for certain deployments it may not be enough. In that case, Red5 Pro supports the development of custom authentication plugins that can implement any custom feature as needed.
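The kind of check a backend service might perform for such a request can be sketched as follows. This is not the Round Trip Plugin's interface: the user table, field names and the rule that a token is required whenever the backend has one on record are assumptions for illustration.

```python
import hmac

# Hypothetical backend data; a real deployment would query its own service.
USERS = {
    "alice": {"password": "s3cret", "token": "abc123",
              "may": {"publish", "subscribe"}},
    "bob": {"password": "hunter2", "token": None, "may": {"subscribe"}},
}


def validate(username, password, action, token=None):
    """Return True only if the credentials match and the action is permitted."""
    user = USERS.get(username)
    if user is None:
        return False
    # compare_digest avoids leaking information through comparison timing
    if not hmac.compare_digest(user["password"], password):
        return False
    if user["token"] is not None and not hmac.compare_digest(user["token"], token or ""):
        return False
    return action in user["may"]


print(validate("alice", "s3cret", "publish", token="abc123"))
print(validate("bob", "hunter2", "publish"))
```

A custom plugin could replace this rule with whatever policy a deployment needs, for example time-limited tokens or per-stream permissions.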
Adaptive Bit Rate
Red5 Pro supports Adaptive Bit Rate (ABR) streaming with WebRTC to serve clients the best stream quality their networks support. The ABR solution consists of publishing multiple variants of the same stream to an Origin (using different bit rates) that are delivered to every Relay and Edge of the Cluster. The Edges then use an RTCP message called Receiver Estimated Maximum Bitrate (REMB) to determine the bandwidth of each client and deliver the stream variant with the highest quality that client’s network can support, thus preventing packet loss and congestion.
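The variant selection step can be sketched in a few lines. This is a simplified illustration, not Red5 Pro's implementation: it assumes the bandwidth estimate derived from REMB is already available as a number in bits per second, and the bitrate ladder is hypothetical.

```python
def pick_variant(variant_bitrates_bps, estimated_bandwidth_bps):
    """Return the highest variant bitrate that fits the estimated bandwidth.

    If even the lowest variant exceeds the estimate, fall back to the lowest
    one so the client still receives a stream.
    """
    fitting = [b for b in variant_bitrates_bps if b <= estimated_bandwidth_bps]
    return max(fitting) if fitting else min(variant_bitrates_bps)


ladder = [500_000, 1_000_000, 2_500_000]  # hypothetical variants, in bits per second
print(pick_variant(ladder, 1_800_000))
print(pick_variant(ladder, 300_000))
```

A production system would typically also leave some headroom below the estimate and avoid switching variants too often, which this sketch does not attempt.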
The multiple stream variants can be ingested into a Red5 Pro Origin using an Encoder or a Red5 Pro server instance configured as a Transcoder. In the latter case, the Transcoder will receive a single high quality stream and create the lower quality variants to ingest into the Origins. A diagram of the system is shown in Figure 8.
Figure 8: Diagram showing how three stream variants of the same stream propagate from Origin to Edge when using Adaptive Bit Rate and publishing with an Encoder (left) and with a Red5 Pro Transcoder (right).
This post showed that the latency provided by HTTP based protocols, including the most modern variants, fails to meet the demands of live video streams. In fact, more and more use cases require users to be able to interact with live streams and this is possible only if latency is kept at a minimum.
More specifically, the end-to-end latency must be kept below 500ms. Latency this low cannot be achieved with HTTP based protocols. Even LL-HLS, CMAF and other approaches can only achieve around 3 seconds of latency. The solution is to use more advanced protocols like WebRTC and SRT and move from an HTTP centric infrastructure based on CDNs to Cloud Computing Platforms.
This post presented a Cross Cloud Distribution System that leverages multiple Red5 Pro Clusters deployed in different Cloud Platforms and, optionally, private networks to create a high availability system that is resilient to failures or downtime of the Cloud Platforms. Furthermore, the CCDS can exploit the different performance and pricing offered by each platform, support millions of concurrent users on each cluster and apply custom authentication rules, all while keeping the end-to-end latency measured in mere milliseconds.