The global shift to real-time video streaming will soon be drawing new support from advances in hardware acceleration that enable radical reductions in the time consumed processing content for transmission and reception over next-generation pipelines.
As evidenced by performance metrics attained by distributors operating over Red5 Pro’s Experience Delivery Network (XDN) infrastructure worldwide, it’s now possible to implement interactive live video streaming that reduces end-to-end latencies at any distance to no more than 400ms and often much less. But, in an environment where every few milliseconds matter, distributors have much to gain wherever hardware acceleration can be used to contribute incremental reductions in latency.
Embedded and standalone accelerators can shave precious milliseconds from real-time video streams by:
- Streamlining output processes executed by professional, smartphone, and drone cameras.
- Achieving greater efficiency in live content production processes.
- Offloading compute-intensive transcoding, feature aggregation, and other processing requirements to the cloud.
- Expediting decoding in end devices.
In some cases, the benefits to real-time streaming will occur as a natural consequence of generational changes in the chipsets powering the digital economy. But much can also be done to expedite and maximize those benefits through proactive collaboration between microprocessor suppliers and the providers of real-time streaming technology.
Notably, Red5 Pro is working with leaders in hardware acceleration to ensure these advances are applied with maximum benefit to real-time interactive video streaming over XDN infrastructures. We’ll soon have more to say about these efforts, but, for now, we’ll explore some of the more significant developments pertaining to how hardware acceleration can contribute to minimizing latency with live-streamed content.
The Market Shift to Hardware Acceleration
One indicator of how important hardware acceleration has become to performance across all computerized applications on the internet and beyond can be seen in the latest market reports from researchers. For example, Market Research Future predicts that the market for microprocessors and line cards devoted to hardware acceleration will surge at a 49% compound annual growth rate (CAGR) over the next four years, reaching $50 billion in 2025 compared to just $3.12 billion in 2018.
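As a quick sanity check on those figures, the short sketch below computes the growth rate implied by the two market sizes. It assumes the rate is measured across the full 2018 to 2025 span (seven compounding periods), which is our reading rather than something the report summary states. Under that assumption the result lands close to the quoted 49%.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.49 means 49%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted above, in billions of US dollars.
# Assumption: growth is compounded over the seven years from 2018 to 2025.
rate = cagr(start_value=3.12, end_value=50.0, years=2025 - 2018)
print(f"Implied CAGR: {rate:.1%}")
```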
But hardware acceleration doesn’t just apply to the standalone components used in computerized appliances and devices to offload a given set of processes from the primary CPU. Circuits devoted to hardware acceleration related to one or more specific functions are now intrinsic to CPU core processors and integrated systems on chips (SoCs) permeating the digital ecosystem.
For example, hardware acceleration has long been a major focus of Intel’s efforts to expand the market appeal of its Xeon line of CPUs. This began with the incorporation of the functions performed by graphics processing units (GPUs) into Xeon cores several years ago and led more recently to the implementation of circuitry dedicated to the Quick Sync Video transcoding solution. Next up is the allocation of GPU acceleration to AI algorithmic functions in the Sapphire Rapids Xeon release scheduled for the first half of 2022.
But even Intel, with plans to introduce the Xe-HPC GPU standalone accelerator by late 2022, is acknowledging that separate processors devoted to hardware acceleration are a fact of life in the digital age. CPU core giant AMD, too, has acted on that realization with the pending acquisition of Xilinx, the leader and original developer of hardware acceleration based on field-programmable gate array (FPGA) technology.
New product releases from Xilinx, Nvidia, and other producers of microprocessors and SoCs devoted to hardware acceleration can be employed in many ways to reduce latency in processing for use cases that depend on real-time video streaming. One major contribution is cutting live video transcoding latency to between 25ms and 50ms, down from the 100ms to 1 second or more incurred with software codecs running on CPUs.
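To put those transcoding numbers in context, the sketch below expresses each range as a share of the 400ms end-to-end ceiling cited earlier. The function name and the budget framing are illustrative assumptions, and the 1000ms figure is only the low end of the "1 second or more" software case.

```python
# End-to-end ceiling and transcoding ranges quoted in the text, in milliseconds.
END_TO_END_BUDGET_MS = 400
TRANSCODE_MS = {
    "hardware-accelerated": (25, 50),
    # The upper bound is a floor for the worst case ("1 second or more").
    "software codec on CPU": (100, 1000),
}


def budget_share(latency_ms: float, budget_ms: float = END_TO_END_BUDGET_MS) -> float:
    """Fraction of the end-to-end budget consumed by a single processing stage."""
    return latency_ms / budget_ms


for label, (low, high) in TRANSCODE_MS.items():
    print(f"{label}: {budget_share(low):.1%} to {budget_share(high):.1%} of the budget")
```

Any share above 100% means the transcoding stage alone exceeds the end-to-end target, before capture, transport, and decoding are counted.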
The Roles Played by FPGAs and GPUs in Video Processing Acceleration
One approach to cutting transcoding latency is achieved with processing acceleration executed by purpose-built appliances running on Xilinx’s Zynq SoCs. Xilinx touts its Zynq UltraScale+ Multiprocessing Platform (MP) as the industry’s lowest-latency single-chip, broadcast-grade codec system for 4K UHD. The SoC supports both SMPTE ST 2110, the broadcast production standard used with multimedia streaming, and AIMS IPMX, the standard for streaming professional A/V content.
Xilinx has created a real-time video server reference architecture to facilitate the use of these processors in datacenter appliances, including those supporting Kubernetes-managed virtualization. For example, the firm’s 1 RU Ultra-Low Bitrate Optimized Video Appliance operating with a maximum of eight Zynq-powered Alveo U50 accelerators on board can employ High-Efficiency Video Coding (HEVC or H.265) to transcode each of seven live 1080p60 channels in eight adaptive bitrate (ABR) profiles. Along with lower latencies, this can produce significant savings in power usage, footprint, and overall costs.
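The scale of that workload is easier to see once the channels and profiles are multiplied out. The sketch below enumerates every output rendition; the channel names and the ladder heights are hypothetical placeholders chosen only to give eight rungs, not the appliance's actual profiles.

```python
from itertools import product

# Seven live 1080p60 input channels (names are placeholders).
CHANNELS = [f"channel-{n}" for n in range(1, 8)]

# Hypothetical eight-rung ABR ladder, identified by frame height only.
LADDER_HEIGHTS = [1080, 900, 720, 576, 480, 360, 240, 144]

# One rendition per (channel, profile) pair.
renditions = list(product(CHANNELS, LADDER_HEIGHTS))
print(len(renditions))  # 56
```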
Nvidia stepped up its support for faster transcoding in 2012 with the implementation of the NVENC accelerator engine in its GeForce GPUs, which were introduced for high-power rendering in the gaming market and now permeate Nvidia’s line of applications processors, from core encoders to accelerators embedded in smartphones. One test comparing H.264 encoding performance found that transcoding executed by NVENC is more than twice as fast as encoding supported by GPU acceleration available with AMD CPU cores and nearly twice as fast as encoding with Intel Quick Sync.
Hardware Acceleration Support for XR
Nvidia has also plunged into the XR market with GeForce RTX and Quadro RTX GPUs supporting high-speed, advanced rendering with AI assistance across a wide range of virtual reality (VR) and augmented reality (AR) use cases. Notably, the firm’s RTX architecture, widely used in 2D game production and rendering, supports real-time ray tracing, a visual effects technology traditionally used in movie-making and other non-real-time content productions.
Ray tracing estimates the spatial dispersion of light rays cast through pixels of an image plane to simulate realistic light and shadow effects that aren’t captured by cameras or, in the case of animated scenes, aren’t normally factored into the creation process. The RTX platform brings these capabilities into real-time 3D production and rendering for XR applications through Deep Learning Super Sampling (DLSS). DLSS reduces the number of pixels that have to be rendered for each frame, then applies AI-driven up-sampling to deliver undiminished quality at the required level of resolution.
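The two ideas in this passage can be sketched in a few lines. The example below assumes a simple pinhole camera at the origin looking down the negative z axis, with a 60-degree vertical field of view; it computes the direction of the ray cast through one pixel, then counts how many such primary rays are needed at two resolutions. The camera model, the function names, and the 1080p-to-4K resolution pair are illustrative assumptions and do not represent Nvidia's implementation.

```python
import math


def primary_ray_direction(px, py, width, height, fov_deg=60.0):
    """Unit direction of the ray from the camera origin through pixel (px, py).

    Assumes a pinhole camera looking down -z with the image plane at z = -1.
    """
    aspect = width / height
    half_height = math.tan(math.radians(fov_deg) / 2)
    # Map the pixel centre onto the image plane (x right, y up).
    x = (2 * (px + 0.5) / width - 1) * aspect * half_height
    y = (1 - 2 * (py + 0.5) / height) * half_height
    z = -1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


def primary_rays(width, height):
    """One primary ray per pixel of the rendered frame."""
    return width * height


native = primary_rays(3840, 2160)   # render directly at 4K UHD
reduced = primary_rays(1920, 1080)  # render at 1080p, then up-sample
print(native // reduced)  # 4
```

Halving the rendered resolution on each axis cuts the per-frame ray count to a quarter, which is the kind of saving that up-sampling is meant to recover in perceived quality.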
Nvidia has made it possible to offload these compute-intensive GPU processes to the cloud through its CloudXR platform, which can be used to bring these capabilities to untethered VR headsets and AR eyewear via real-time streaming over wired and wireless networks. In late October, the firm teamed up with AT&T, Ericsson, and VR producer Wevr with technology support from Dreamscape, Dell, Qualcomm, and VMware to demonstrate what was billed as the first use of 5G to support cloud delivery of a VR experience involving multiple simultaneous users in the same location.
The content delivered a level of photorealism that would otherwise have been beyond the rendering capabilities of the Qualcomm XR2 VR headsets used in the test. This points to what can be accomplished with real-time networks serving the untethered user base, which comprises the lion’s share of the market. As Wevr EVP and co-founder Anthony Batt noted in a press release about the demonstration, “By moving the RTX graphics processing to the edge, our developers can deliver a new level of VR experience to users.”
Such capabilities can move beyond the demonstration stage to commercial implementation as 5G operators embrace what can be done with XDN infrastructure in the context of using their edge-based datacenters as onramps to the cloud. Red5 Pro has taken a major step in this direction as the first provider of real-time interactive streaming technology to be approved for deployment in AWS Wavelength Zones, which are housed in carrier facilities to aggregate local cell tower traffic for direct access to AWS cloud facilities.
By deploying XDN infrastructure in Wavelength Zones, applications developers, service providers, and the carriers themselves can deliver interactive video streams at end-to-end latencies well below 50ms between any user and source locations served by these AWS on/off ramps. Other cloud operators, including Microsoft Azure and Google Cloud, are following Amazon’s lead with their own 5G edge strategies, which should soon lead to more paths for real-time interaction between users and cloud-based hardware acceleration in XR and other use cases.
Hardware Acceleration in Mobile Devices and Cameras
Hardware acceleration is also having a huge impact on what can be accomplished with real-time streaming in other mobile technology-related scenarios. In this domain, where achieving more functionality with ever greater component miniaturization is the top priority, major advances in hardware acceleration are continually worked into succeeding generations of SoCs, image signal processors (ISPs), and digital signal processors (DSPs).
For example, Qualcomm has applied the latest iteration of its Vision Intelligence design platform to newly released blueprints for purpose-built QCS605 and QCS603 SoCs. These innovations are aimed at fueling the use of AI and accelerated GPU capabilities to support imaging-related applications like faster, more accurate processing for video streaming from smartphone and surveillance cameras.
The chipsets are also slated for use in edge computing, which, in the case of video surveillance, can enhance real-time stitching and AI-assisted analysis of video streams from multiple cameras. As explained in this white paper, real-time streaming over XDN infrastructure has made it possible for first responders and military commanders to use such capabilities for guidance in dealing more effectively with unfolding developments across a wide field of action.
Hardware acceleration in smartphone chipsets is also enhancing performance on 5G devices, as in the case of the Dimensity 5G SoCs developed by MediaTek. Latency-reducing contributions to live streaming are intrinsic to many functionalities incorporated into these SoCs, which are also available for use in PCs, routers, and mobile hotspots. Benefits include hardware-accelerated 4K and HDR video processing, resource management in video production, and advanced networking engines that can be used to optimize output for distribution in real-time use cases.
Hardware Acceleration in Use Cases Supported by XDN Infrastructure
Hardware acceleration has already become a factor in situations that rely on XDN infrastructure. For example, Skreens, which offers a platform-as-a-service (PaaS) supporting a wide range of real-time interactive video streaming applications, relies on cloud-based Xilinx FPGAs that have been pre-integrated to work with XDN technology.
The Skreens PaaS serves as a streaming video engine that anyone can use to combine multiple video, graphics, and text feeds in support of interactive experiences that can be personalized on a per-user basis. Along with providing support for new approaches to watch parties, feature-enrichment, sports betting, and other broadcast-quality interactive entertainment experiences, the PaaS is supporting applications designed for a wide range of non-entertainment segments, including smart cities, enterprise collaboration, telemedicine, and residential security.
Skreens’ operating code defines processes that require adaptability and high-speed execution, requirements that are best accommodated by cloud servers that utilize FPGA technology, notes Skreens founder and CEO Marc Todd. Taking advantage of the multi-cloud versatility of XDN architecture, Skreens sometimes engages with cloud providers who have yet to implement FPGA support, in which case Todd encourages them to do so.
“Our code isn’t written for regular processors,” he says. “We’re giving producers more horsepower to make more sophisticated productions.”
The emergence of a new generation of use cases that depend on real-time streaming infrastructure leaves no room for tolerating the processing latencies common to microprocessors widely used in the early years of live video streaming. Fortunately, there’s no need to, thanks to the types of advances in hardware acceleration discussed here.
Developers and service providers can proceed with plans to bring these new applications to market knowing that Red5 Pro is taking steps to ensure that the best options in hardware acceleration are available to XDN users wherever such needs arise. To learn more about XDN architecture, use cases, and the role of hardware acceleration, contact firstname.lastname@example.org or schedule a call.