8 Components of WebRTC Security Architecture



WebRTC was designed for more than just low latency live streaming. Responding to the needs of a modern streaming application, WebRTC also provides stream security. This post examines the WebRTC security architecture, and how that can be set up.


Encryption

First and foremost, WebRTC streams are always encrypted.

Encryption is a way of scrambling data so that only authorized parties can understand the information. In technical terms, it is the process of converting plaintext to ciphertext. In simpler terms, encryption takes readable data and alters it so that it appears random. Doing so relies on cryptographic keys, which are mathematical values that the sender and the recipient use to encrypt and decrypt a message. Public-key schemes use a pair of keys, one public and one private, and are typically used to agree on a shared secret key that then encrypts the bulk of the data. The ciphertext needs to look random to anyone without the key, to prevent unauthorized users from accessing the data, but be exactly reversible for the authorized parties receiving the information so that it can be used correctly.
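To make the plaintext-to-ciphertext idea concrete, here is a minimal sketch using Node's built-in `crypto` module with AES-256-GCM. It is only an illustration of symmetric encryption with a shared key; it is not the code path a browser runs for WebRTC, and the function names are made up for this example.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface SealedMessage {
  iv: Buffer;         // unique per message, not secret
  ciphertext: Buffer; // looks random to anyone without the key
  tag: Buffer;        // lets the receiver detect tampering
}

// Encrypt a UTF-8 string under a shared 32-byte key.
function encryptMessage(key: Buffer, plaintext: string): SealedMessage {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Reverse the process; throws if the key is wrong or the data was altered.
function decryptMessage(key: Buffer, sealed: SealedMessage): string {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}

const sharedKey = randomBytes(32);
const sealedMessage = encryptMessage(sharedKey, "hello viewer");
const roundTrip = decryptMessage(sharedKey, sealedMessage);
```

Anyone holding `sharedKey` recovers the original text, while a party with a different key gets an error instead of readable data.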

Since WebRTC works directly in the browser, the encryption process is also performed in the browser with no additional configuration required. Furthermore, WebRTC does not require any additional plugins. This further increases security, as it eliminates the concern of third-party software and potential side effects such as data tracking or viruses. Plugins are also a potential security risk in their own right, since each one adds attack surface that could be exploited.

WebRTC security enables AES (Advanced Encryption Standard) based protection. Because this protection is built in, there is no need to rely on third parties or a DIY platform to secure the stream in transit. Instead, WebRTC uses the transport protocol SRTP (Secure Real-time Transport Protocol) to send and receive encrypted video and audio, while the data channel is carried inside an encrypted DTLS connection.

The keys used by SRTP to encrypt and decrypt content are exchanged through DTLS (Datagram Transport Layer Security), the IETF’s adaptation of TLS for datagram transports. DTLS is used with UDP (User Datagram Protocol), the ultra-low latency packet transmission protocol employed by WebRTC. While we describe using UDP since that’s the typical WebRTC setup, it should be noted that the same process can be done over TCP. All of this happens automatically when a WebRTC stream is instantiated. This will be covered in more detail later on.

Furthermore, the same WebRTC security architecture will be replicated no matter what hosting provider is used. The ability to support cross-cloud solutions increases flexibility. It also enables the establishment of the same security features in different regions since WebRTC security implementation is standard.

Encryption ensures that the data sent between a broadcaster and subscriber cannot be read. The next sections will cover how the connection is established in the first place.


Signaling and CORS

CORS (cross-origin resource sharing) controls whether a web page may exchange information with a resource on a different origin, such as a server, datacenter or another website. Originally standardized by the W3C and now maintained as part of the WHATWG Fetch standard, it defines a process in which a server and website interact to determine whether or not it is safe to allow the transmission of data through a cross-origin request.
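The server side of that interaction can be sketched as a small allow-list check. This is a simplified illustration, not Red5 Pro's implementation; the origins and the `corsHeadersFor` name are hypothetical, while `Access-Control-Allow-Origin` and `Vary` are the standard response headers involved.

```typescript
// Hypothetical allow-list: the origins that serve your publisher and player pages.
const ALLOWED_ORIGINS: string[] = ["https://app.example.com", "https://admin.example.com"];

// Decide which CORS headers to attach to a response.
// Returning null means no headers are sent, so the browser refuses to
// hand the response to the calling page.
function corsHeadersFor(
  origin: string | undefined,
  allowed: string[] = ALLOWED_ORIGINS
): Record<string, string> | null {
  if (!origin || allowed.indexOf(origin) === -1) return null;
  return {
    "Access-Control-Allow-Origin": origin, // echo the exact origin that was approved
    "Vary": "Origin",                      // caches must not reuse this for other origins
  };
}
```

A request from `https://app.example.com` would receive the headers, whereas a request from an unlisted origin would receive none and be blocked by the browser.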

CORS affects the use of WebRTC for live streaming as well, specifically when establishing a connection between the broadcasting or subscribing client and the server that acts as a sort of relay point between the two. This connection setup is called “signaling” in WebRTC parlance.

For two peers to connect, they need to know where to find each other. If the two sides of the connection are not served from the same web server, cross-origin restrictions can block the connection from being established. In that case, a connection must be negotiated via a signaling protocol. The WebRTC spec doesn’t specify how to send these signaling messages, so they can be sent over HTTP or WebSockets. Either way, connecting to a server for signaling requires dealing with CORS and the configuration options it provides. Red5 Pro implements signaling with WebSockets. In our Red5 Pro autoscaled cluster, the Stream Manager acts as the signaling server, proxying the calls down to the edge and origin nodes to establish connections from the WebRTC clients to those server nodes. The figure below shows this relation and Stream Manager connecting WebRTC publisher clients to origin nodes.


HTTPS and Secure WebSockets (WSS)

In order to create a video from a browser, the browser must be able to access the camera and microphone. In browsers such as Chrome, that is achieved with the getUserMedia method, which is only available to pages served from a secure website. Because the HTML page has to be delivered to the browser via HTTPS, any server you communicate with from that page also needs to be secure.
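As a rough sketch of that rule, the function below approximates which page URLs a browser would treat as secure enough for camera access. It is a deliberate simplification (HTTPS, plus plain HTTP from the local machine for development) and the function name is hypothetical; inside a real page, `window.isSecureContext` is the authoritative check.

```typescript
// Simplified approximation of the browser's "secure context" rule.
function looksLikeSecureOrigin(pageUrl: string): boolean {
  // Capture the scheme and the host (bracketed IPv6 or a regular hostname).
  const match = /^([a-z][a-z0-9+.-]*):\/\/(\[[^\]]*\]|[^/:?#]*)/i.exec(pageUrl);
  if (!match) return false;
  const scheme = match[1].toLowerCase();
  const host = match[2].toLowerCase();
  if (scheme === "https") return true;
  // Browsers also trust pages loaded from the local machine during development.
  const isLoopback = host === "localhost" || host === "127.0.0.1" || host === "[::1]";
  return scheme === "http" && isLoopback;
}
```

Under this approximation `https://example.com/publish` and `http://localhost:5080/` qualify, while `http://example.com/publish` does not, which matches why a publisher page deployed over plain HTTP cannot open the camera.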

Delivering a site’s content over HTTPS has two requirements: 1) a domain name to access the site, and 2) a certificate from a trusted certificate authority installed on your web server. Using the domain name, the browser validates the domain against the cert from the provider that it trusts. Once validated, a key exchange is performed between the browser and the server to set up TLS encryption (still commonly referred to as SSL). Once encrypted, the page is no longer delivered as plain HTML/JavaScript text that could be intercepted by anyone.

How does this all work with WebRTC?

As noted above, the getUserMedia method that gives the page access to the camera and microphone requires the page to be delivered via HTTPS, so any server you communicate with from that page also needs to be secure. When it comes to live streaming, HTTPS is only used to access the website. The actual streaming is done over a UDP-based WebRTC connection.

The WebRTC connection is negotiated over WebSockets, which fall under the same security requirements as the getUserMedia method. TLS is applied to WebSockets through WSS (the wss:// scheme).

That last S stands for Secure. The same kind of cert and domain used for HTTPS traffic can be used in exactly the same way for WebSocket communication.
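One way to keep the two schemes aligned is to derive the WebSocket URL from the page URL, as in the sketch below. The `/signaling` path and the helper name are placeholders chosen for illustration; they are not taken from the Red5 Pro SDK.

```typescript
// Map the page's scheme to the matching WebSocket scheme:
// https pages get wss, http pages (local development only) get ws.
function signalingUrlFor(pageUrl: string, path: string): string {
  const match = /^(https?):\/\/([^/?#]+)/i.exec(pageUrl);
  if (!match) throw new Error("Unsupported page URL: " + pageUrl);
  const scheme = match[1].toLowerCase() === "https" ? "wss" : "ws";
  const cleanPath = path.charAt(0) === "/" ? path : "/" + path;
  // match[2] is the host, including the port when one is present.
  return scheme + "://" + match[2] + cleanPath;
}

// Hypothetical publisher page and signaling path.
const signalingUrl = signalingUrlFor("https://stream.example.com/publisher.html", "signaling");
```

A page served over HTTPS that tried to open a plain `ws://` connection would be stopped by the browser's mixed-content rules, which is why the scheme is computed rather than hard-coded.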

In the Red5 Pro system, WebSockets are used for signaling, meaning that the WebSocket server also needs a cert installed and a domain associated with it. In this case, the WebSocket server is Red5 Pro. The Red5 Pro docs site has details on how to install an SSL cert. Once a cert is installed on the web server and another one on the Red5 Pro server, then the browser allows access to the camera and mic. This further enables the browser to begin the signaling process over secure WebSockets. As explained in the CORS section above, when deploying a Red5 Pro autoscaled cluster, the signaling is proxied through the Stream Manager, and all of the SSL is either configured on a single Stream Manager instance or on a load balancer sitting in front of multiple Stream Managers.

Since the Red5 Pro SDKs are designed to conduct all this signaling automatically, live streaming apps can be set up without having to worry too much about it. For those that want a deeper understanding of what’s going on exactly, keep reading.


Signaling in More Detail

Signaling is used to establish a connection between the browser and the server to enable the sending and receiving of video/audio. By design, WebRTC is a peer-to-peer protocol. When a browser publishes a stream, the Red5 Pro server acts as the other peer in the connection, communicating with the browser as a peer client. It then pulls the video and audio and relays them to the rest of the Red5 Pro streaming pipeline. From the other side, a WebRTC subscriber client wanting to watch a stream also makes a P2P connection to Red5 Pro in the same manner. Once the connection is negotiated, Red5 Pro pushes the video and audio down to the browser for viewing.

When conducting the signaling phase, the server and browser begin to exchange data back and forth in order to set up the connection which will ultimately push and receive the streaming video and audio. The signaling data being exchanged are of two types: SDP and ICE.

SDP – session control messages that cover media capabilities

ICE candidates – messages detailing how to connect through a NAT

More on these below.
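Because the WebRTC spec leaves the signaling format to the application, the two message types could be modeled along these lines. The JSON envelope and field names here are an assumed wire format for illustration, not the one Red5 Pro uses.

```typescript
// Hypothetical signaling envelope: SDP offers/answers and ICE candidates.
type SignalingMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string; sdpMid: string | null; sdpMLineIndex: number | null };

// Serialize a message for sending as a WebSocket text frame.
function encodeMessage(message: SignalingMessage): string {
  return JSON.stringify(message);
}

// Parse and validate an incoming frame; returns null for anything unrecognized.
function decodeMessage(raw: string): SignalingMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const kind = record.kind;
  const sdp = record.sdp;
  const candidate = record.candidate;
  if ((kind === "offer" || kind === "answer") && typeof sdp === "string") {
    return { kind, sdp };
  }
  if (kind === "candidate" && typeof candidate === "string") {
    const sdpMid = record.sdpMid;
    const sdpMLineIndex = record.sdpMLineIndex;
    return {
      kind: "candidate",
      candidate,
      sdpMid: typeof sdpMid === "string" ? sdpMid : null,
      sdpMLineIndex: typeof sdpMLineIndex === "number" ? sdpMLineIndex : null,
    };
  }
  return null;
}
```

Validating every frame before acting on it keeps malformed or unexpected messages from reaching the peer connection logic.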


SDP Exchange

session control messages that cover media capabilities

Session Description Protocol (known as SDP) is a format for describing the capabilities of a media-capable device. In this case, that’s the Red5 Pro server and the browser. Rather than rehash the subject of WebRTC signaling and SDP exchange, this post will focus on security and oversimplify what’s going on here. Essentially, the browser sends the server a list of its capabilities, like the resolutions it can produce, which codecs it supports, and other detailed information to set up the stream. The other peer (an edge or origin server node if using Red5 Pro autoscaling) then responds with what it can handle. In the case of Red5 Pro, it prefers the client to broadcast with H.264 to streamline performance, as it minimizes transcoding across multiple platforms and services. Once the server and browser agree on how they can communicate, the process moves to the ICE candidates phase.
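To show what "a list of capabilities" looks like in practice, the sketch below pulls the advertised codecs out of an SDP blob by reading its `m=` and `a=rtpmap` lines. The sample SDP is heavily trimmed and invented for the example; real offers carry many more attributes.

```typescript
interface MediaSection {
  kind: string;     // "audio", "video", or "application"
  codecs: string[]; // codec names from a=rtpmap lines, e.g. "H264"
}

// Collect the codecs advertised in each m= section of an SDP description.
function listCodecs(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.indexOf("m=") === 0) {
      sections.push({ kind: line.slice(2).split(" ")[0], codecs: [] });
    } else if (line.indexOf("a=rtpmap:") === 0 && sections.length > 0) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const encoding = line.slice("a=rtpmap:".length).split(" ")[1] || "";
      const name = encoding.split("/")[0];
      const current = sections[sections.length - 1];
      if (name && current.codecs.indexOf(name) === -1) current.codecs.push(name);
    }
  }
  return sections;
}

// Trimmed, made-up offer with one audio and one video section.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");

const advertised = listCodecs(sampleSdp);
```

For this sample, the audio section lists `opus` and the video section lists `VP8` and `H264`, so a server that prefers H.264 can see that the client supports it.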


ICE Candidates

network configuration details used for making the P2P connection

Exchanging ICE candidates is another aspect of establishing the P2P connection with the server. ICE (Interactive Connectivity Establishment) is a protocol used to create connections between devices across the internet. The information included in an ICE candidate covers whether to use TCP or UDP for transmission, an IP address and port at which the client can be reached, and other details for making a direct connection to the peer.
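A candidate travels as a single line of text, and the fields described above can be read out of it with a small parser like the one sketched here. The sample line uses documentation-range addresses and is invented for the example; in a browser, `RTCIceCandidate` already exposes these values, so this helper is purely illustrative.

```typescript
interface IceCandidateInfo {
  transport: string; // "udp" or "tcp"
  address: string;
  port: number;
  type: string;      // "host", "srflx", "prflx", or "relay"
}

// Parse "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ..."
function parseCandidate(line: string): IceCandidateInfo | null {
  const fields = line.trim().replace(/^a=/, "").replace(/^candidate:/, "").split(/\s+/);
  if (fields.length < 8 || fields[6] !== "typ") return null;
  const port = Number(fields[5]);
  if (!Number.isInteger(port)) return null;
  return {
    transport: fields[2].toLowerCase(),
    address: fields[4],
    port,
    type: fields[7],
  };
}

const parsedCandidate = parseCandidate(
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.20 rport 46154"
);
```

Here the `srflx` type indicates an address learned from a STUN server, while `host` would be a local interface address and `relay` an address allocated on a TURN server.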

ICE also relies on two helper protocols known as STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT). STUN lets a client discover its public address so that a connection can be punched through firewalls/NATs, and TURN is used if a direct P2P connection cannot be established using STUN. TURN basically routes the traffic through an intermediary server known (appropriately) as a TURN server. The Red5 Pro media server, like most servers on the internet, is normally reachable at a public address rather than sitting behind a NAT or restrictive firewall. Therefore, this normally alleviates the need to route through a TURN server. However, a STUN server will still need to be used, as many of the world’s computers/devices sit behind firewalls. To make this process a little easier, the Red5 Pro HTML5 examples (which use the Red5 Pro streaming SDK) default to public STUN servers hosted by Google and Mozilla. For best results, it is recommended that you set up and host your own STUN/TURN server when you deploy your application to the public.
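Pointing a client at self-hosted servers comes down to the ICE server list handed to the peer connection. The sketch below builds such a list; the hostnames and credentials are placeholders, and the local `IceServer` interface simply mirrors the shape of the `iceServers` entries that `RTCPeerConnection` accepts.

```typescript
// Mirrors the shape of an iceServers entry in the RTCPeerConnection configuration.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Hypothetical settings holder for a TURN deployment.
interface TurnSettings {
  url: string;
  username: string;
  credential: string;
}

// STUN entries need only URLs; TURN entries also carry credentials.
function buildIceServers(stunUrls: string[], turn?: TurnSettings): IceServer[] {
  const servers: IceServer[] = [];
  if (stunUrls.length > 0) servers.push({ urls: stunUrls });
  if (turn) servers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  return servers;
}

// Placeholder hosts; substitute your own STUN/TURN deployment.
const iceServers = buildIceServers(["stun:stun.example.org:3478"], {
  url: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-secret",
});
// In a browser this would be passed as: new RTCPeerConnection({ iceServers })
```

Leaving out the TURN argument yields a STUN-only configuration, which matches the common case described above where the server side is directly reachable.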

Once the Red5 Pro server and the browser client know how to connect to each other, the next step is to establish a secure connection using the info in the ICE candidates. As mentioned above, the WebRTC spec forces all traffic to be encrypted. It performs the encryption through DTLS and SRTP.


DTLS

The video and audio channels being streamed need to be encrypted, and this process starts with DTLS (Datagram Transport Layer Security). To really get into the geeky details, DTLS is an adaptation of TLS modified to be used with datagram connections such as UDP. Both peers on either side of the P2P connection need to have keys which will be used to encrypt and decrypt the data, so there needs to be an exchange of those keys. The DTLS handshake establishes the first keys to be used to encrypt and decrypt the stream at both peers. Then the browser is able to start streaming the video and audio over SRTP.
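The DTLS handshake is tied back to signaling through the SDP: each side advertises a hash of its certificate in an `a=fingerprint` line and its handshake role in an `a=setup` line, and the peer checks the certificate it sees during the handshake against that hash. A minimal sketch of reading those two attributes follows; the SDP fragment and fingerprint value are made up.

```typescript
interface DtlsParameters {
  algorithm: string;    // hash function, e.g. "sha-256"
  fingerprint: string;  // colon-separated hex digest of the peer's certificate
  setup: string | null; // "actpass", "active", or "passive": who starts the handshake
}

// Read the first a=fingerprint and a=setup attributes from an SDP description.
function readDtlsParameters(sdp: string): DtlsParameters | null {
  let algorithm = "";
  let fingerprint = "";
  let setup: string | null = null;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.indexOf("a=fingerprint:") === 0 && fingerprint === "") {
      const parts = line.slice("a=fingerprint:".length).trim().split(/\s+/);
      if (parts.length === 2) {
        algorithm = parts[0].toLowerCase();
        fingerprint = parts[1].toUpperCase();
      }
    } else if (line.indexOf("a=setup:") === 0 && setup === null) {
      setup = line.slice("a=setup:".length).trim();
    }
  }
  return fingerprint === "" ? null : { algorithm, fingerprint, setup };
}

// Made-up fragment; a SHA-256 fingerprint is 32 hex pairs.
const dtlsSdp = [
  "m=video 9 UDP/TLS/RTP/SAVPF 102",
  "a=setup:actpass",
  "a=fingerprint:sha-256 12:34:56:78:9A:BC:DE:F0:12:34:56:78:9A:BC:DE:F0:12:34:56:78:9A:BC:DE:F0:12:34:56:78:9A:BC:DE:F0",
].join("\r\n");

const dtlsParameters = readDtlsParameters(dtlsSdp);
```

This is one reason the signaling channel itself has to be protected with WSS: if an attacker could rewrite the fingerprint in transit, they could substitute their own certificate.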


SRTP

SRTP (Secure Real-time Transport Protocol) is the transport protocol that WebRTC uses to send and receive encrypted video and audio. SRTP does not negotiate keys on its own: the session keys it encrypts with are derived from master keying material, and that material comes out of the DTLS handshake. If the keys need to be refreshed during a session, DTLS is what supplies the new material. The two protocols work closely in tandem to keep the stream secure throughout the session, so they are often referenced together as DTLS/SRTP.
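How DTLS hands keys to SRTP can be sketched as slicing one block of exported keying material into four pieces: a master key and a master salt for each direction. The lengths below assume an AES-128 counter-mode SRTP profile (16-byte key, 14-byte salt), following the layout in RFC 5764; other profiles use different lengths, and real WebRTC stacks do this internally, so the function is only illustrative.

```typescript
interface SrtpMasterKeys {
  clientKey: Uint8Array;
  serverKey: Uint8Array;
  clientSalt: Uint8Array;
  serverSalt: Uint8Array;
}

const KEY_LENGTH = 16;  // AES-128 master key, in bytes (assumed profile)
const SALT_LENGTH = 14; // 112-bit master salt, in bytes (assumed profile)

// Split the material exported from the DTLS handshake, in this order:
// client key, server key, client salt, server salt.
function splitKeyingMaterial(material: Uint8Array): SrtpMasterKeys {
  const expected = 2 * (KEY_LENGTH + SALT_LENGTH);
  if (material.length !== expected) {
    throw new Error("Expected " + expected + " bytes of keying material, got " + material.length);
  }
  let offset = 0;
  const take = (length: number): Uint8Array => {
    const part = material.slice(offset, offset + length);
    offset += length;
    return part;
  };
  const clientKey = take(KEY_LENGTH);
  const serverKey = take(KEY_LENGTH);
  const clientSalt = take(SALT_LENGTH);
  const serverSalt = take(SALT_LENGTH);
  return { clientKey, serverKey, clientSalt, serverSalt };
}
```

Each side encrypts outgoing packets with its own key and salt and decrypts incoming packets with the peer's, so both directions of the stream are protected by material that never crossed the network in the clear.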

One thing to note: most of the focus here was on describing the peer connection from a broadcasting client connected to the server peer. However, everything described above works in reverse as well. The Red5 Pro server relays streams out to WebRTC subscriber clients watching the video in exactly the same manner, so each of those streams is encrypted in the same way.

As detailed in this post, WebRTC is automatically configured to establish a secure connection to stream encrypted data across a P2P connection. The WebRTC security architecture can be implemented in multiple regions across a variety of cloud platforms including a simultaneous cross-cloud solution. These intrinsic properties make WebRTC a good choice for secure streaming without the need for implementing costly third party solutions or time consuming in-house solutions.

Find out more about securing your streams by contacting us through an email to info@red5.net or scheduling a call.