Dynamic Composite Live Streams with the XDN Server-Side Mixer

Red5 Pro’s Experience Delivery Network (XDN) introduces a new capability: creating a mixed live stream that includes the audio and video of other live streams. Mixer nodes, a new XDN node type, create mixed streams based on a visual layout that can be dynamically updated in real time. Mixers enable users to create large compositions with hundreds of live streams, or video conferences with a single return stream. For video conferences, Mixers are coupled with a mix-minus audio engine that gives each participant an audio track of the conference that excludes their own voice.

Read on to understand how the Mixer node works and the use cases it supports.

XDN Mixer

An XDN Mixer combines multiple source streams into a single live stream that includes the video of one or more source streams, as well as a mixed audio track. The Mixer loads a webpage that subscribes to the source streams, creates a display according to a user-defined layout in HTML, CSS, and/or JavaScript, and publishes a live stream that includes the content and audio of the page, much like a screen share stream. That is, everything displayed in the page also appears in the resulting mixed stream, and all unmuted audio in the page is audible in it. Any changes to the layout or audio in the page are reflected in the mixed stream with minimal latency.

XDN supports large compositions with hundreds of streams by chaining together Mixer nodes. In such an architecture, each Mixer in each layer loads a page that subscribes to multiple live streams, combines them into a custom HTML5 layout, and publishes the resulting blended stream to a Mixer in the next layer. The last layer consists of a single Mixer that publishes the composite stream to the XDN Cluster for viewers to subscribe to. A high-level diagram of the architecture with two layers of Mixers is shown below.

High-level diagram of the server-side Mixer solution that uses two layers to create a single stream from six input streams.

The figure above shows a simplified example where the streams of two sets of three viewers are mixed together into a single mixed stream. Each Mixer in the top layer loads a page that subscribes to the viewers’ streams to pre-mix into a single stream, and then generates the streams M(1,2,3) and M(4,5,6). The pre-mixed streams are published to another Mixer that mixes them into the final stream M(M(1,2,3), M(4,5,6)). Once published to an XDN Cluster, the stream can be transcoded to multiple variants for clients to subscribe to.
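The layering described above can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the XDN API: the function names and the `fan_in` parameter (how many streams a single Mixer page subscribes to) are assumptions made for illustration, and the article does not state an actual per-Mixer limit.

```python
from math import ceil


def plan_mixer_layers(num_streams: int, fan_in: int) -> list[int]:
    """Return how many Mixers each layer needs, first layer first.

    fan_in is a hypothetical per-Mixer input count chosen by the operator.
    """
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    layers = []
    inputs = num_streams
    while True:
        mixers = ceil(inputs / fan_in)
        layers.append(mixers)
        if mixers == 1:
            # The last layer always holds the single Mixer that publishes
            # the final composite stream.
            return layers
        # Each Mixer in this layer publishes one stream to the next layer.
        inputs = mixers


def compose_label(streams: list[str], fan_in: int) -> str:
    """Build a label in the figure's M(...) notation for the final stream."""
    current = list(streams)
    while True:
        groups = [current[i:i + fan_in] for i in range(0, len(current), fan_in)]
        current = ["M(" + ",".join(group) + ")" for group in groups]
        if len(current) == 1:
            return current[0]


if __name__ == "__main__":
    print(plan_mixer_layers(6, 3))
    print(compose_label(["1", "2", "3", "4", "5", "6"], 3))
```

With six streams and three inputs per Mixer, the plan matches the figure: two Mixers in the first layer and one in the last.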

The HTML5 page loaded into each Mixer determines the composition created by the Mixer. For dynamic compositions that need to change in real time, the page can interface with a backend service, for instance a WebSocket server, to receive commands to update its layout in real time as driven by a Producer or Editor. For simpler use cases that include a single Mixer, the HTML5 page could include a grid that automatically grows or shrinks as streams are added to or removed from the composition. Specific live streams can also be muted or unmuted. When some streams are muted, their audio is excluded from the final mixed stream.
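To make the command-driven layout concrete, here is a minimal sketch of the state such a page might keep. It is written in Python for brevity rather than as the page's own JavaScript, and everything in it is an assumption for illustration: the `Composition` class, the command shape (`{"action": ..., "stream": ...}`), and the near-square grid rule are hypothetical and not a Red5 Pro message format.

```python
from dataclasses import dataclass, field
from math import ceil, isqrt


@dataclass
class Composition:
    """Layout state driven by Producer commands (hypothetical format)."""

    streams: list[str] = field(default_factory=list)
    muted: set[str] = field(default_factory=set)

    def apply(self, command: dict) -> None:
        """Apply one command such as {"action": "add", "stream": "cam1"}."""
        action = command["action"]
        name = command["stream"]
        if action == "add":
            if name not in self.streams:
                self.streams.append(name)
        elif action == "remove":
            if name in self.streams:
                self.streams.remove(name)
            self.muted.discard(name)
        elif action == "mute":
            if name in self.streams:
                self.muted.add(name)
        elif action == "unmute":
            self.muted.discard(name)
        else:
            raise ValueError(f"unknown action: {action}")

    def grid(self) -> tuple[int, int]:
        """Return (rows, columns) of a near-square grid that fits all streams."""
        count = len(self.streams)
        if count == 0:
            return (0, 0)
        columns = isqrt(count - 1) + 1  # smallest integer >= sqrt(count)
        rows = ceil(count / columns)
        return (rows, columns)

    def audible(self) -> list[str]:
        """Streams whose audio would be heard in the mixed stream."""
        return [name for name in self.streams if name not in self.muted]


if __name__ == "__main__":
    page = Composition()
    for name in ["cam1", "cam2", "cam3", "cam4", "cam5"]:
        page.apply({"action": "add", "stream": name})
    page.apply({"action": "mute", "stream": "cam2"})
    print(page.grid(), page.audible())
```

In a real page, each command would arrive over the backend connection and the grid dimensions would feed the CSS layout; the sketch only shows the bookkeeping.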

Mix-minus Audio Engine

The XDN Mixer enables an efficient video conferencing solution where participants consume a single video stream together with an audio track that includes only the other participants. XDN uses a mix-minus audio engine to generate the audio track for each participant. Doing so prevents the audio feedback that would occur if all audio tracks were mixed into a single track and returned to participants. The mix-minus engine works by determining in real time which participants’ tracks have the highest volume, creating up to four mixed audio tracks, and returning one of them to each conference participant. Of the mixed audio tracks, one includes the audio of all the loudest participants, while each of the others includes the loudest participants minus one. In this way, all non-speaking participants receive the track with the complete mix, while each speaking participant receives the track that does not include their own audio, thus avoiding any feedback. The current implementation is limited to WebRTC clients using Chrome and requires the live streams of a conference to be published to the same XDN Origin node that creates the mix-minus audio tracks for participants.
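The track selection described above can be sketched in a few lines. This is an illustration of the idea only, not the engine's implementation: the function name, the track identifiers, and the `silence_floor` threshold are hypothetical, and the sketch assumes that "up to four tracks" means one complete mix plus one minus-one track for each of up to three loudest participants.

```python
def build_mix_minus(levels: dict[str, float], max_tracks: int = 4,
                    silence_floor: float = 0.0):
    """Plan mix-minus tracks from per-participant volume levels.

    Returns (tracks, assignment): tracks maps a track id to the participants
    mixed into it, and assignment maps each participant to the track id they
    should receive.
    """
    if max_tracks < 2:
        raise ValueError("max_tracks must be at least 2")
    # Assumption: only participants above the silence floor count as speaking.
    speaking = [p for p, level in levels.items() if level > silence_floor]
    # Keep the loudest speakers, leaving one track slot for the complete mix.
    loudest = sorted(speaking, key=lambda p: (-levels[p], p))[: max_tracks - 1]

    tracks = {"full": list(loudest)}
    for speaker in loudest:
        # Each loud speaker gets the mix of the loudest minus their own audio.
        tracks[f"minus:{speaker}"] = [p for p in loudest if p != speaker]

    assignment = {
        p: (f"minus:{p}" if p in loudest else "full") for p in levels
    }
    return tracks, assignment


if __name__ == "__main__":
    levels = {"ana": 0.9, "ben": 0.1, "cy": 0.7, "dee": 0.0, "eve": 0.4}
    tracks, assignment = build_mix_minus(levels)
    for track_id, members in tracks.items():
        print(track_id, members)
    print(assignment)
```

Because only the loudest participants are mixed, the number of tracks stays fixed no matter how many people join the conference.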

Mixer Use Cases

Small Audio and Video Mixes

The most basic use case consists of using a single Mixer to create a mixed stream that includes the audio and video of several streams. A diagram of this use case is shown below. Possible scenarios include control rooms displaying a mix of camera streams, or small conferences where a single live stream is required for view-only participants. In the latter scenario, the conference participants join by publishing their own stream and subscribing to each of the other participants’ discrete streams, so they themselves need neither the Mixer nor the mix-minus audio functionality; the Mixer serves only the view-only audience.

Architecture diagram for small audio and video mixes.

Large Audio and Video Mixes

This use case consists of mixing a large number of live streams into a single composite stream using a set of stacked Mixers, where some Mixers compose the streams published by clients and others compose the pre-composited streams published by other Mixers. A diagram of this use case is shown below. Possible scenarios include control rooms displaying a mix of a large set of camera streams, and event spaces generating large video walls for performers. For video wall solutions, the live streams can be managed remotely by a Producer who can dynamically add, remove, mute, or unmute the source streams as needed.

Architecture diagram for large audio and video mixes.

Small Conference With Audio and Video Mix

XDN Mixers are well suited to creating efficient video conferences that minimize the bandwidth required of each participant. For this use case, a single Mixer creates a composite live stream consumed by both interactive conference participants and view-only customers. This stream includes the video and audio of all participants. However, because its audio is a complete mix, conference participants cannot use it without creating feedback. Instead, they consume a separate dedicated track provided by the mix-minus audio engine. In short, for a video conference, a Mixer creates the composition with everyone’s video and audio, while the mix-minus audio engine provides participants with a dedicated audio track that prevents audio feedback. A diagram of this use case is shown below:

Architecture diagram for a small conference where an audio and video mix and a mix-minus audio implementation are used.

Streaming Branded Content To Third Party Platforms

This use case consists of creating and publishing a mixed and branded live stream to a third-party platform like Facebook Live, YouTube Live, or Twitch. The Mixer loads an HTML5 page that subscribes to the relevant live streams and displays them with the design and branding required using HTML5, JavaScript, and CSS. The Mixer publishes the page as a mixed stream to the Red5 Pro Cluster, and the Red5 Pro Social Media Pusher feature is used to restream it to the third-party destinations. A diagram of the solution is shown below:

Architecture diagram for a solution that can mix several live streams and create a branded composition for viewers watching on a third party platform.

Composite Video On Demand (VOD) Recordings

In this use case, a Mixer composites multiple live streams into a single stream that can be recorded for VOD. If a Mixer is already in use, as it would be in any of the previous use cases, Red5 Pro can be configured to automatically record every live stream, including the composite live streams. If the live pipeline does not already include a Mixer, one can be added to generate the mixed recording for VOD. Either way, once the mixed stream is published to the Cluster, a Red5 Pro server will automatically record it and upload it to a cloud storage location for later VOD consumption.

Conclusion

The XDN Mixer node enriches the offerings of Red5 Pro’s XDN while facilitating use cases that would not be possible otherwise. It enables conferences with minimal bandwidth requirements where each participant subscribes to only one stream, allows for combining hundreds of streams into a single one, and creates opportunities for managing branded live streams and custom recordings of multiple streams at once. Like other XDN nodes, Mixers can be deployed in several regions across cloud platforms and private data centers, and are monitored by Stream Managers that guarantee their health and the correct operation of the XDN.

Learn how our XDN and its Mixer nodes can be employed for your use case by contacting info@red5.net or scheduling a call.