How Do You Get Ultra-Low Latency with WebRTC?

As we’ve covered before, WebRTC is currently the best way to get ultra-low latency. We’ve also mentioned, quite a few times, that Red5 Pro’s WebRTC implementation delivers latency under 500 ms. That enables fully interactive live streaming and makes real-time communication possible.

The question remains: how does WebRTC actually achieve such ultra-low latency? This post takes a lower-level look at the features that make WebRTC delivery so fast and efficient. For a higher-level overview, please take a look at our post on WebRTC.


UDP vs TCP

TCP and UDP are transport layer protocols used for sending bits of data, known as packets, over the Internet. Both run on top of IP, meaning that a packet sent by either TCP or UDP will be sent to an IP address.

The similarities stop there, however. TCP (Transmission Control Protocol) is a connection-oriented protocol with built-in error recovery and retransmission. All the back-and-forth communication needed to reorder packets and ensure their complete delivery introduces latency.

WebRTC uses UDP (User Datagram Protocol) to deliver a faster stream of information by eliminating error recovery. Packets are sent directly to the recipient without being ordered or retransmitted by the transport. Rather than waiting for confirmation, the sender simply keeps transmitting. Any packets that are dropped are lost and will not be resent (except through NACK, but more on that later). Shedding all this overhead means the devices can communicate more quickly.

UDP does not carry sequencing data the way TCP does. Instead, it depends on the application-level protocol RTP to put the packets in order, or to identify which ones to drop because they no longer fit the current timeframe. More on RTP below.
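To make the fire-and-forget behavior concrete, here is a minimal sketch using Python's standard socket module. The function names and the sample payloads are our own illustrations, not part of WebRTC or Red5 Pro. It sends a few datagrams to a receiver on the local machine and never waits for an acknowledgement.

```python
import socket


def send_datagrams(payloads, address):
    """Send each payload as one UDP datagram without waiting for any reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        for payload in payloads:
            # sendto() returns once the datagram is handed to the OS;
            # there is no handshake and no acknowledgement to wait for.
            sender.sendto(payload, address)
    return len(payloads)


def loopback_demo():
    """Send three datagrams to a local receiver and collect whatever arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(1.0)
        address = receiver.getsockname()
        sent = send_datagrams([b"frame-1", b"frame-2", b"frame-3"], address)
        received = []
        try:
            for _ in range(sent):
                data, _sender_address = receiver.recvfrom(2048)
                received.append(data)
        except socket.timeout:
            pass  # a lost datagram is simply gone; UDP will not resend it
        return sent, received


if __name__ == "__main__":
    sent, received = loopback_demo()
    print(f"sent {sent} datagrams, received {len(received)}")
```

Note that the receiver has to tolerate missing or reordered datagrams on its own. On a real network, nothing in UDP guarantees that all three frames arrive, or that they arrive in order.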

RTP Efficiency

WebRTC uses the streaming protocol RTP to transmit video over the Internet and other IP networks.

RTP sends video and audio data in small chunks. Each chunk of data is preceded by an RTP header, and the header and data together are carried in a UDP packet. The data is organized as a sequence of packets small enough for efficient transmission between servers and clients. RTP streams carry the actual media payload encoded by an audio or video codec. The header is used to adapt to varying network conditions, such as a single participant joining from a low-bandwidth connection. While the application plays one packet, the following packets may already be going through decompression or demultiplexing.

The Internet, like other packet networks, occasionally loses and reorders packets and delays them by variable amounts of time. To cope with these impairments, the RTP header contains timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source. This timing reconstruction is performed separately for each source of RTP packets in the conference to adapt to real-time conditions.
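As a rough illustration of what that header looks like, the sketch below packs and parses the 12-byte fixed RTP header (version, payload type, sequence number, timestamp and source identifier) and uses the sequence number to put packets back in order. The helper names, payload type 96 and the sample values are assumptions chosen for the example. The sketch also ignores header extensions, padding and sequence-number wraparound, so it is nowhere near a complete RTP stack.

```python
import struct
from typing import NamedTuple

# Fixed RTP header: two flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


class RtpPacket(NamedTuple):
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    payload: bytes


def build_packet(payload_type, sequence_number, timestamp, ssrc, payload):
    """Prefix a media chunk with a minimal RTP version 2 header."""
    first_byte = 2 << 6                 # version 2, no padding/extension/CSRC
    second_byte = payload_type & 0x7F   # marker bit left cleared
    header = RTP_HEADER.pack(
        first_byte,
        second_byte,
        sequence_number & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def parse_packet(data):
    """Split raw bytes into header fields and payload."""
    first_byte, second_byte, seq, timestamp, ssrc = RTP_HEADER.unpack_from(data)
    if first_byte >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = first_byte & 0x0F      # each CSRC entry adds 4 bytes
    payload_offset = RTP_HEADER.size + 4 * csrc_count
    return RtpPacket(second_byte & 0x7F, seq, timestamp, ssrc, data[payload_offset:])


def reorder(packets):
    """Restore sending order by sequence number (wraparound ignored here)."""
    return sorted(packets, key=lambda packet: packet.sequence_number)


if __name__ == "__main__":
    # Packets arrive out of order, as they can over UDP.
    arrived = [
        build_packet(96, seq, seq * 3000, 0x1234, f"chunk-{seq}".encode())
        for seq in (3, 1, 2)
    ]
    for packet in reorder(parse_packet(raw) for raw in arrived):
        print(packet.sequence_number, packet.timestamp, packet.payload)
```

Because the ordering information travels in the RTP header, the receiver can sort or discard packets by itself, without a round trip to the sender.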

Adjustments are made through a mixer that resynchronizes incoming audio packets to reconstruct the constant spacing of small chunks generated by the sender. It then mixes those reconstructed audio streams into a single stream, translates the encoding to the corresponding quality setting and forwards the packet stream. Those packets might be unicast to a single recipient or multicast on a different address to multiple recipients. Because the public internet generally doesn’t support multicast, WebRTC leverages a unicast version of RTP.

By consolidating essential information, RTP streamlines the process of media delivery. Like UDP itself, RTP layered on top of UDP carries little overhead, which makes it much faster than other streaming solutions like HLS or DASH. More than just a lightweight protocol, RTP also reduces latency by using a push method to send out the stream. We discuss that in more detail in the next section.

For a very detailed explanation of how RTP works please refer to the Internet Standard and this article.

Push vs Pull

In order to deliver a stream, there must be a connection between the broadcaster and subscriber. Beyond that, the broadcaster must know what it actually needs to send to the subscriber. There are two different methods for doing that: push and pull.

HTTP protocols like HLS and MPEG DASH employ a pull approach where the client continually requests segments of video. This means that there is a constant stream of requests flowing back and forth between the broadcaster and subscribing client. It’s basically like trying to send an email line by line, rather than all at the same time. This constant polling creates a large amount of overhead, making for an inefficient process. It should be noted that there are advantages to this approach, such as the ability to create a stateless system; CDNs use this feature of pull to distribute chunks of video across multiple server nodes. However, doing so increases latency.

Alternatively, RTP pushes a continuous stream of data to the client. Without waiting for confirmation, RTP just sends the video straight through. RTP knows that line 2 comes after line 1 so it doesn’t waste any time asking what data packet needs to be sent next. This is a much more efficient process than the call and response method of pulling.

HLS and MPEG DASH use "PULL", while WebRTC uses "PUSH".
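The difference in chattiness can be seen in a toy model that simply lists the messages each approach exchanges. This is a deliberately simplified sketch with made-up function names. It models neither real HLS/DASH playlists nor real RTP, only the request-per-segment pattern versus an unprompted stream.

```python
def pull_messages(segments):
    """Pull: the client must request each segment before the server sends it."""
    messages = []
    for index, segment in enumerate(segments):
        messages.append(("client -> server", f"GET segment {index}"))
        messages.append(("server -> client", segment))
    return messages


def push_messages(packets):
    """Push: the server sends each packet as soon as it is ready, unprompted."""
    return [("server -> client", packet) for packet in packets]


if __name__ == "__main__":
    media = ["line 1", "line 2", "line 3"]
    print("pull:", len(pull_messages(media)), "messages")
    print("push:", len(push_messages(media)), "messages")
```

In this model every pulled segment costs an extra client-to-server request, while the pushed stream flows in one direction only.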

WebRTC is Natively Supported in the Browser

WebRTC works natively in the browser. All the encoding and decoding is performed directly in native code, as opposed to JavaScript, which makes for an efficient process.

One approach to ultra-low latency streaming is to combine browser technologies such as MSE (Media Source Extensions) and WebSockets. In this setup, the solution relies on JavaScript to pull all the data from the video stream. While MSE and WebSockets are each implemented in native code, making them work together requires the developer to write JavaScript. Because that JavaScript runs in the browser’s script engine rather than as native code, it is generally not as efficient. In the MSE and WebSocket approach, JavaScript is used to extract the information required to correctly transport the stream. Since JavaScript is less efficient, this makes for a slower process and increased latency, not to mention that WebSockets are also TCP based.

NACK Support - Resends Critical Packets

As mentioned above, packets lost over UDP may simply be dropped in order to stay up to date with the stream and provide the lowest latency. However, sometimes important packets need to be received. There needs to be a balance between retransmitting all lost packets, which causes high latency, and completely dropping everything, which hurts video quality.

NACK provides an efficient method for determining whether a packet is critical enough to warrant retransmission. First, the receiving client identifies the missing packets. Then it determines which of them are essential and worth resending. The criteria for deciding whether a packet is critical are its availability in the cache and the likelihood that it will be received successfully on retransmission. The receiving client then sends a negative acknowledgement message (NACK) indicating which packets were lost. Upon receipt of that message, the sender resends the missing packets. Technically, retransmitting critical packets does add latency. However, unlike other technologies that cache everything in the packet queue, WebRTC only does this when the network is experiencing high packet loss. That leads us to our next point.
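The receiver-side logic described above can be sketched roughly as follows. Gap detection on 16-bit RTP sequence numbers is standard, but the selection rule here is only our illustrative reading of the two criteria: a hypothetical cache window stands in for "still available in the cache", and a comparison of round-trip time against the time left before playout stands in for "likely to arrive in time". The function names and parameters are assumptions, not Red5 Pro's or any browser's actual implementation.

```python
def missing_sequence_numbers(last_seq, new_seq):
    """Return the sequence numbers skipped between two received packets.

    RTP sequence numbers are 16 bits wide, so arithmetic wraps at 65536.
    """
    gap = (new_seq - last_seq) & 0xFFFF
    if gap == 0 or gap >= 0x8000:
        return []  # duplicate, or an older packet that arrived late
    return [(last_seq + offset) & 0xFFFF for offset in range(1, gap)]


def packets_to_nack(missing, newest_seq, cache_window, rtt_ms, time_left_ms):
    """Pick the missing packets worth requesting again (illustrative rule).

    cache_window: assumed number of recent packets the sender still holds.
    rtt_ms / time_left_ms: a resend is pointless if it cannot arrive
    before the packet is due to be played.
    """
    if rtt_ms >= time_left_ms:
        return []
    return [
        seq for seq in missing
        if ((newest_seq - seq) & 0xFFFF) <= cache_window
    ]


if __name__ == "__main__":
    lost = missing_sequence_numbers(10, 14)
    print("missing:", lost)
    print("request:", packets_to_nack(lost, 14, cache_window=2,
                                      rtt_ms=30, time_left_ms=100))
```

The point of the second function is the trade-off the text describes: only packets that can still be supplied, and can still arrive in time to be useful, are worth the extra round trip.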

No Buffering

In order to provide the lowest latency possible, Red5 Pro’s implementation of WebRTC does not perform buffering or caching by default. Without a buffer, there is no need to wait for a queue of packets, and they can be sent as soon as they are ready. As described above, quality control methods such as NACK and the RTP mixer ensure that the stream still delivers good quality even in poor network conditions.

This is different from HTTP delivery over CDNs, which intentionally cache the packets in a buffer to ensure smooth playback of the stream.
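A back-of-the-envelope calculation shows why a playback buffer matters so much. The numbers below (6-second segments, three segments buffered, 20 ms of media per packet) are illustrative assumptions rather than measurements, and the functions ignore network, encoding and decoding time.

```python
def buffered_delay_ms(segment_duration_ms, segments_buffered):
    """Delay added by a player that waits for N full segments before playing."""
    return segment_duration_ms * segments_buffered


def unbuffered_delay_ms(packet_duration_ms):
    """Without a playback buffer, the added delay is about one packet of media."""
    return packet_duration_ms


if __name__ == "__main__":
    print("segmented, buffered:", buffered_delay_ms(6000, 3), "ms")
    print("packet by packet:", unbuffered_delay_ms(20), "ms")
```

Under these assumptions the buffer alone accounts for seconds of delay, while packet-by-packet delivery adds only tens of milliseconds.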

The Lowest Latency Possible

Thus, for all the above reasons, WebRTC is able to provide a latency measured in milliseconds. The lightweight UDP and RTP protocols prioritize sending out packets rather than carefully checking that everything is perfectly lined up. Native browser support ensures the greatest code efficiency, and removing a buffer ensures the fastest delivery of packets. Of course, NACK finishes the job by correcting errors so that all the important information is received without needlessly bogging down the system.

While some may call 500 ms (or less) “ultra-low latency,” we here at Red5 Pro prefer the term “real-time latency”. It is that real-time latency that allows our customers to build fully interactive applications.

Come see what we can do for you! Send us an email or schedule a call.
