Hardware-Based Mixing & Transcoding with AMD Xilinx

Red5 partnered with AMD to add low-latency, hardware-based transcoding and mixing capabilities to Red5 Pro servers. The solution leverages the AMD Xilinx® Alveo™ U30, a low-profile, PCIe-based media accelerator card that delivers high-density, real-time transcoding for live streaming video service providers, OEMs, and Content Delivery Networks (CDNs).

Red5 integrated the Xilinx Media Accelerator (XMA) API into its Transcoder and Mixer implementations to use the hardware for video encoding. Cloud-based deployments will be supported using Amazon EC2 VT1 instances. These instances are powered by up to eight AMD Xilinx Alveo U30 media accelerator cards, which are designed to improve real-time video transcoding and deliver low-cost transcoding for live video streams.

Transcoding

The Red5 Pro architecture includes a Transcoder node that can transcode an ingest stream into multiple variants with different bitrates and/or resolutions. Prior to the integration with AMD hardware, the transcoding solution was CPU-based and relied on Cauldron, a real-time stream processor written in native code that enables fast encoding and decoding of video and audio streams. The transcoding Brew, or module of native code, initiates and configures native stream scaling and compression resources at runtime via JNI. In particular, the MIRV (Multiple Independent Re-encoded Video) processor accepts H.264 video input and decodes, splits, rescales, and recompresses it in real time to produce multiple bitrate sets.
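To make the idea of multiple variants concrete, here is a minimal sketch of how a set of renditions might be derived from the resolution of an ingest stream. It is not Red5 Pro code: the ladder heights and bitrates, and the names `Variant`, `LADDER`, and `build_ladder`, are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One rendition of the ingest stream (hypothetical structure)."""
    name: str
    width: int
    height: int
    bitrate_kbps: int


# Illustrative ladder of (target height, target bitrate in kbps).
# These values are assumptions, not Red5 Pro defaults.
LADDER = [(1080, 4500), (720, 2500), (480, 1000), (360, 600)]


def build_ladder(src_width, src_height, ladder=LADDER):
    """Return the variants that fit inside the source, keeping its aspect ratio."""
    variants = []
    for height, kbps in ladder:
        if height > src_height:
            continue  # skip rungs that would require upscaling
        # Scale the width proportionally and round to an even number,
        # since many H.264 encoders expect even frame dimensions.
        width = round(src_width * height / src_height / 2) * 2
        variants.append(Variant(f"{height}p", width, height, kbps))
    return variants


if __name__ == "__main__":
    for v in build_ladder(1280, 720):
        print(v)
```

For a 1280x720 ingest, this sketch skips the 1080p rung and yields 720p, 480p, and 360p renditions, each of which would then be rescaled and recompressed independently.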

The new Red5 Pro Transcoder extends this implementation to support hardware-based transcoding powered by AMD Xilinx Alveo U30 devices. The solution initializes an AMD accelerator and configures it to create the required stream variants. The figure below shows a diagram of the system.

Figure 1: Accelerator architecture for a transcoding solution.

Red5 Pro configures the U30 device to provide a Decoder, a Scaler, and a set of Encoders. Red5 Pro passes encoded H.264 video packets to the Decoder, which decodes them and feeds the raw frames to the Scaler, which in turn generates the adaptive bitrate (ABR) variants. The Encoders then encode each variant into H.264 and return the packets to Red5 Pro. Red5 Pro creates a ProStream instance for each variant (ProStream is the generic Red5 Pro representation of a live stream) and feeds it the H.264 packets generated by the corresponding Encoder. Variants are forwarded between Red5 Pro nodes in the cluster until they reach Edge nodes, which use Receiver Estimated Maximum Bitrate (REMB) messages to determine the best variant to provide to each viewer of the stream.
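The article does not describe how an Edge node turns a REMB estimate into a variant choice, so the following is only a sketch of one plausible policy: pick the highest-bitrate variant that fits within a safety margin of the receiver's estimated bandwidth, and fall back to the lowest variant otherwise. The function name `select_variant`, the `headroom` factor, and the variant list are all hypothetical.

```python
def select_variant(variants, remb_bps, headroom=0.85):
    """Choose a variant for one viewer from a REMB bandwidth estimate.

    variants: list of (name, bitrate_kbps) tuples (hypothetical format).
    remb_bps: receiver's estimated maximum bitrate, in bits per second.
    headroom: assumed fraction of the estimate we allow video to use,
              leaving room for audio and bitrate fluctuations.
    """
    if not variants:
        raise ValueError("at least one variant is required")
    budget_bps = remb_bps * headroom
    fitting = [v for v in variants if v[1] * 1000 <= budget_bps]
    if fitting:
        # Best quality that still fits within the budget.
        return max(fitting, key=lambda v: v[1])
    # Nothing fits: degrade gracefully to the cheapest variant.
    return min(variants, key=lambda v: v[1])


if __name__ == "__main__":
    ladder = [("1080p", 4500), ("720p", 2500), ("480p", 1000)]
    for estimate in (10_000_000, 3_000_000, 500_000):
        print(estimate, "->", select_variant(ladder, estimate)[0])
```

Because the decision is made per viewer at the Edge, every subscriber can receive a different variant of the same stream without any additional transcoding work.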

Mixing Live Streams

Red5 Pro supports two mixer implementations that can acquire multiple live streams and compose them into a single output stream. One implementation is based on a native Cauldron Brew written in C++, while the other is based on the Chromium Embedded Framework (CEF).

The Brew-based implementation acquires the raw video frames of the live streams and combines them into a new frame according to a layout. The resulting frames are re-encoded and made available as a single mixed live stream. The CEF-based implementation loads an HTML5 web page that subscribes to the Red5 Pro-generated live streams using WebRTC, organizes them according to a layout defined in HTML, CSS, and JavaScript, and publishes the resulting composition as a new live stream to be pushed through a Red5 Pro cluster.
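As an illustration of what "combining frames according to a layout" can mean, the sketch below computes where each input stream would be placed on the output canvas for a simple, near-square grid. The real Brew is written in C++ and its layout rules are not described here; the function name `grid_layout` and the grid policy are assumptions.

```python
import math


def grid_layout(n_streams, canvas_w, canvas_h):
    """Return one (x, y, width, height) tile per stream on the output canvas.

    Streams are placed left to right, top to bottom, in a grid with as
    few columns as will keep the grid roughly square (assumed policy).
    """
    if n_streams <= 0:
        return []
    cols = math.ceil(math.sqrt(n_streams))
    rows = math.ceil(n_streams / cols)
    tile_w = canvas_w // cols
    tile_h = canvas_h // rows
    return [
        ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i in range(n_streams)
    ]


if __name__ == "__main__":
    # Four participants on a 1280x720 canvas -> a 2x2 grid of 640x360 tiles.
    for tile in grid_layout(4, 1280, 720):
        print(tile)
```

A mixer would scale each decoded frame to its tile size and copy it into the canvas at the tile's offset before handing the finished canvas to the encoder.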

Red5 Pro integrated the Brew-based mixer with AMD’s U30 cards so that the final composite stream is encoded with hardware acceleration. This is the stream that requires the most resources under the existing software-based transcoding approach. The figure below shows a diagram of the solution.

Figure 2: Accelerator architecture for a Brew-based Mixer solution.

A Red5 Pro Brew acquires the encoded video frames and draws them on a canvas representing the output stream. The raw frames from the canvas are passed to the AMD hardware to be encoded into H.264. The Brew instantiates a ProStream instance for the mixed stream and feeds it the encoded packets generated by the accelerator. In this way, the mixed stream is made available in the Red5 Pro pipeline for clients to subscribe to.

The solution works in a similar way for the CEF-based Mixer. In this case, the frames captured from CEF are encoded using AMD accelerators to feed the ProStream instance for the mixed stream. Red5 Pro continues to evaluate ways to further improve the efficiency of the mixing process, which may eventually include using the decoders in the AMD accelerators to decode the live streams that make up a composite stream.


Reach out to Red5 to discover more about hardware support for Transcoder and Mixer nodes and how you could benefit from them. Contact us at info@red5.net or schedule a call.