Businesses that depend on live-streamed video have an opportunity to employ real-time streaming infrastructure as a much-needed solution to the challenges posed by the surging volume of metadata entering their pipelines.
As we’ve chronicled in numerous blogs and white papers, the limitations of one-way, latency-prone video streaming infrastructure are driving ever more video applications and service providers to activate interactive streams operating at imperceptible latencies below 400ms. At the same time, there’s growing concern over how hard it is to synchronize high-value enhancements with primary payloads when relying on HTTP-based streaming.
Providers of Media & Entertainment (M&E) services to internet-connected devices need to do whatever they can to ensure success in a crowded, churn-intensive marketplace. When it comes to live-streamed content, they want to exploit metadata that triggers both multi-user and per-user applications, such as:
- AI-assisted content recommendations.
- In-stream e-commerce purchase options.
- Responses to linguistic and regulatory nuances impacting globally distributed video.
- On-the-fly matchups of stored clips with live content.
- Multiscreen viewing options.
- Dynamic ad insertion.
- Seamless integrations of real action with animation.
- In-stream access to text and video chat feeds.
More broadly, the surge in metadata innovation stands to revolutionize a vast range of live video applications pervading business, academic, and government operations. And, perhaps most consequentially, the commands and other elements initiated by metadata connected to processes in the cloud will be instrumental in shaping user experience (UX) in live-streamed extended reality (XR) applications with major implications for whatever the Metaverse turns out to be.
But precise on-the-fly synchronization of ancillary metadata streams injected on a per-session basis from multiple sources remains impeded by the complexities of client-server communications intrinsic to adaptive bitrate (ABR) streaming over HTTP. Not only is this a barrier to accomplishing commonly pursued goals; it’s holding back the execution of a new generation of metadata-enabled applications flowing from the cloud.
Fortunately, metadata synchronization ceases to be an issue when live streaming is instantiated on Red5 Pro’s Experience Delivery Network (XDN) platform, including when the platform ingests and transmits content in the multiple bitrates of an ABR ladder. In an environment where complexity is the bane of consistency, XDN’s cross-cloud architecture gives anyone in any market sector who wants metadata-driven enhancements to live-streamed video, audio, or synchronized A/V a much simpler, more reliable way to do anything contemplated with traditional streaming. Moreover, they can do things that are fundamentally beyond the capabilities of one-way HTTP streaming infrastructure.
As shall be seen, the XDN platform makes it easy to synchronize any metadata descriptor or enhancement feature with primary payloads distributed on a market-wide or personalized basis over real-time streams flowing in any direction to any number of endpoints at any distance. The automated metadata timing processes incorporated into Red5 Pro SDKs ensure the metadata elements, delivered as overlays, are precisely paired frame by frame with the primary content, either as part of the streams generated to all endpoints or, in the case of personalized metadata, at the instant of rendering on client devices.
This is a vital consideration even for providers who may not be feeling any immediate pressure to implement real-time streaming. Indeed, the need for a better approach to exploiting the benefits delivered by metadata tied to a new generation of cloud-based functionalities permeates the live-video streaming ecosystem.
Intractable Barriers to Metadata Synchronization in HTTP-Based Live Streaming
The metadata label covers any descriptions or processes applicable to any audio, video, or data track in the composite stream flow. Timed metadata elements that are time-stamped in sync with A/V keyframes during encoding trigger action by devices at playback or, in the case of HTTP-based streaming, by remote servers connected to edge appliances hosting just-in-time edge packagers.
The need to ensure every time-sensitive metadata element is placed frame-accurately in live content flows has been a major focus of development activities supporting live streaming over traditional content delivery networks (CDNs). But the packaging modes used in HTTP-based ABR streaming weren’t designed to accommodate the multiplying types of metadata-triggered applications that producers can now inject into live streams from remote cloud locations.
Producers’ ability to perform these highly granular operations was expanded by the M&E industry’s transition from Serial Digital Interface (SDI) technology to IP-based production with the adoption of the SMPTE 2110 standard five years ago. The standard supports the aggregation of varying combinations of video, audio, and metadata essences into coherent multi-layer flows through uniform synchronization of production device clocks to a GPS-locked reference time domain. This allows metadata-triggered functions at remote locations to be synchronously fed into production workflows over real-time data connections.
But per-user personalization of applications tied to individual identity and actions remained out of reach with contribution flows broadcast from production studios for distribution over legacy TV channels. The industry hoped this would be rectified with the emergence of internet unicast streaming, which would finally allow service providers to introduce personalized UX by leveraging time-stamped metadata elements to support enhancements on a per-session basis.
However, notwithstanding some progress in this direction, these expectations haven’t been met. In comparison to SMPTE 2110 broadcast production, synchronizing time-sensitive metadata elements with core A/V payloads proved to be much harder with the packaging of content for per-session streaming over conventional HTTP-based streaming infrastructure.
In this scenario, the packaging process must accommodate all the complexity that characterizes the preparation of ABR master manifest files. Master manifests and their subsets, known as Master Playlists in HTTP Live Streaming (HLS) and as Media Presentation Descriptions in DASH, provide client players with metadata covering things like the audio and video codecs used with the content, the available bitrate profiles, segment sizes and sequencing, the encryption mode, and details relating to captioning, subtitles, advertising, and other functions embodied in the ancillary tracks that comprise the full presentation set.
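To make the manifest’s role concrete, here is a minimal, illustrative parser (not tied to any particular player implementation) that pulls per-variant metadata — bandwidth, codecs, resolution — out of an HLS Master Playlist. The `#EXT-X-STREAM-INF` attribute names come from the HLS specification; everything else is a sketch.

```javascript
// Minimal sketch of extracting per-variant metadata from an HLS
// Master Playlist. Real manifests carry many more attributes
// (captions, audio groups, encryption details, etc.).
function parseMasterPlaylist(text) {
  const variants = [];
  const lines = text.split('\n').map((l) => l.trim());
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].startsWith('#EXT-X-STREAM-INF:')) {
      const attrs = {};
      const attrText = lines[i].slice('#EXT-X-STREAM-INF:'.length);
      // Split the attribute list, respecting quoted values like CODECS="..."
      for (const m of attrText.matchAll(/([A-Z0-9-]+)=("[^"]*"|[^,]*)/g)) {
        attrs[m[1]] = m[2].replace(/"/g, '');
      }
      variants.push({
        bandwidth: Number(attrs.BANDWIDTH),
        codecs: attrs.CODECS,
        resolution: attrs.RESOLUTION,
        uri: lines[i + 1], // the line after the tag names the variant playlist
      });
    }
  }
  return variants;
}

const manifest = `#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=854x480,CODECS="avc1.4d401e,mp4a.40.2"
480p.m3u8`;

console.log(parseMasterPlaylist(manifest));
```

Every item the parser surfaces is something a client player must reconcile with its own playback state, which is part of why injecting time-sensitive metadata into this pipeline is so fragile.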
The industry has made significant progress toward simplifying operations over disparate streaming modes, starting with DASH, which unified packaging for Microsoft’s Smooth Streaming and Adobe’s HTTP Dynamic Streaming (HDS) formats and paved the way for a new generation of DASH-optimized devices. But Apple has refused to make the changes in HLS that would be required for compatibility with DASH.
Apple did cooperate with another step toward consolidation that applies to unification at the transport container level. By adopting the Common Media Application Format (CMAF) standard, the industry made it possible to encode an A/V payload in multiple bitrate profiles with uniform segmentation utilizing the fragmented MP4 (fMP4) container for streaming over either HLS or DASH.
But the time-stamped metadata triggers for personalized applications are still positioned according to how the disparate manifest formats operate. Whichever streaming format is employed, the metadata elements must be presented and synchronized together for direct ingestion into origin servers to ensure precise, fast playback on client devices.
Whether or not CMAF is in play (and, by the results of one recent survey, the standard has yet to reach 50% adoption), a lot can, and too frequently does, go wrong. While transcoders are designed to ensure a new ABR segment starts with the instantiation of a new keyframe when a metadata marker is detected, this doesn’t always happen as intended.
Accuracy in such adjustments is especially challenging with the use of what’s known as Chunked Transfer Encoding, a process introduced with HTTP/1.1 and supported in CMAF that divides keyframe-marked fragments into shorter “chunks” that can be streamed immediately without waiting for the full fragment to load in the streamer. Aligning keyframes and fragments with metadata markers while also processing chunks puts additional strain on encoders’ CPUs.
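The keyframe-alignment problem can be illustrated with a toy calculation (purely a sketch, not encoder code): when a metadata marker lands mid-segment, the encoder forces a keyframe and an early segment cut, so segment durations become irregular.

```javascript
// Illustrative only: compute segment boundaries when markers force
// early cuts. A marker inside a segment ends that segment at the
// marker (where a keyframe is inserted), producing irregular segments.
function planSegmentBoundaries(totalMs, nominalMs, markersMs) {
  const boundaries = [];
  const markers = [...markersMs].sort((a, b) => a - b);
  let t = 0;
  while (t < totalMs) {
    const nextNominal = t + nominalMs;
    const nextMarker = markers.find((m) => m > t && m < nextNominal);
    t = Math.min(nextMarker ?? nextNominal, totalMs);
    boundaries.push(t);
  }
  return boundaries;
}

// A 12s stream with nominal 4s segments and one marker at 6s
// yields segments of 4s, 2s, 4s, and 2s.
console.log(planSegmentBoundaries(12000, 4000, [6000]));
// → [4000, 6000, 10000, 12000]
```

The resulting irregular segment durations are exactly the kind of variation that downstream packagers, players, and ad-insertion systems must absorb without drifting out of sync.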
Many other issues can disrupt synchronization. For example:
- The delay caused when a dropped segment of the primary content is resent by the Transmission Control Protocol (TCP) used with HTTP can throw things out of sync.
- Spikes in usage can overload transcoders, causing frames to be skipped.
- Variations in frame durations of ancillary content may be incompatible with the keyframe intervals set with the primary content.
- Variations in keyframe intervals resulting from encoder adjustments to metadata markers can throw the flow out of sync with the established frame rate.
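As an illustration of one of the failure modes above (skipped frames when transcoders are overloaded), a monitoring sketch might flag drops by comparing successive presentation timestamps against the nominal frame interval. The function below is hypothetical, not part of any Red5 Pro SDK or encoder.

```javascript
// Hypothetical monitor: count frames skipped in a stream by checking
// gaps between successive presentation timestamps. A gap spanning
// multiple nominal intervals implies dropped frames.
function countSkippedFrames(timestampsMs, frameIntervalMs) {
  let skipped = 0;
  for (let i = 1; i < timestampsMs.length; i++) {
    const gap = timestampsMs[i] - timestampsMs[i - 1];
    // Each whole extra interval beyond the nominal one counts as a skip.
    skipped += Math.max(0, Math.round(gap / frameIntervalMs) - 1);
  }
  return skipped;
}

// 30 fps → ~33.3ms interval; a 100ms gap hides two missing frames.
console.log(countSkippedFrames([0, 33, 67, 167], 33.3)); // → 2
```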
The XDN-Enabled Approach to Synchronizing Metadata
A new world of possibilities unfettered by the imprecision of HTTP streaming technology opens for any video use case supported by real-time streaming over XDN infrastructure. As described in greater detail in this white paper, the XDN platform is built on a cross-cloud architecture that enables all endpoints to simultaneously receive live-streamed content traversing any distance at latencies in the 200ms–400ms range, and even lower in many instances.
XDN infrastructure is built on automatically orchestrated hierarchies of Origin, Relay, and Edge Nodes operating in one or more private or public cloud clusters. The platform makes use of the Real-Time Transport Protocol (RTP) as the foundation for interactive streaming via WebRTC (Web Real-Time Communication) and the Real-Time Streaming Protocol (RTSP). In most cases, WebRTC is the preferred option for streaming on the XDN platform by virtue of its support in all the major browsers, which eliminates the need for device plug-ins.
There are also other options for receiving and transmitting video in real time when devices are not running any of these browsers. RTSP, often the preferred option when mobile devices are targeted, can be activated through Red5 Pro iOS and Android SDKs. And video can be ingested onto the XDN platform in other formats as well, including Real-Time Messaging Protocol (RTMP), Secure Reliable Transport (SRT), and MPEG Transport Stream (MPEG-TS). The XDN retains these encapsulations while relying on RTP as the underlying real-time transport mechanism.
The XDN platform also provides full support for the multi-profile transcodes used with ABR streaming by utilizing intelligent Edge Node interactions with client devices to deliver content in the profiles appropriate to each user. And to ensure ubiquitous connectivity for every XDN use case, the platform supports content delivery in HTTP Live Streaming (HLS) mode as a fallback. In the rare instances where devices can’t be engaged via any of the other XDN-supported protocols, they will still be able to render the streamed content, albeit with the multi-second latencies that typify HTTP-based streaming.
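To illustrate the kind of per-client profile matching described here, a generic ABR selection heuristic (an assumption for illustration, not the actual Edge Node logic) picks the highest-bitrate rendition that fits within a client’s measured throughput, with a safety margin:

```javascript
// Generic ABR selection sketch: choose the highest-bandwidth variant
// that fits the client's measured throughput, scaled by a safety
// margin to absorb variance. Falls back to the lowest profile.
function pickProfile(variants, measuredBps, safety = 0.8) {
  const sorted = [...variants].sort((a, b) => a.bandwidth - b.bandwidth);
  const budget = measuredBps * safety;
  const fitting = sorted.filter((v) => v.bandwidth <= budget);
  return fitting.length ? fitting[fitting.length - 1] : sorted[0];
}

const ladder = [
  { name: '480p', bandwidth: 800000 },
  { name: '720p', bandwidth: 1400000 },
  { name: '1080p', bandwidth: 3000000 },
];
console.log(pickProfile(ladder, 2000000).name); // → '720p'
```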
No matter what type of protocol is used to interface with receiving devices, all metadata elements that aren’t directly incorporated into the primary A/V stream frames are delivered as overlays frame-accurately synchronized with the primary streams. These metadata overlays can be captured in the live production process to be delivered with the content to all recipients over the XDN infrastructure or they can be delivered as individually personalized enhancements that are rendered in sync with the primary content by receiving devices.
The SDKs customers use to implement XDN infrastructure automatically support the use of the Action Message Format (AMF) to associate metadata elements delivered in real time from any source with time-coded metadata markers embedded in the appropriate primary content frames. This invisible code triggers the rendering of the overlaid metadata elements through the processes supported by device browsers in the case of WebRTC, or via WebSocket technology when other protocols are used to connect with end devices.
The HTML5 SDK used to implement WebRTC-based XDN use cases enables the synchronized rendering of all the types of applications supported by HTML5 extensions. With the use of other XDN SDKs, including the iOS and Android versions, WebSockets are employed to establish persistent connections with servers that allow the XDN to stream metadata elements to endpoints in real-time synchronicity with the primary content.
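The general pattern the preceding paragraphs describe can be sketched in a few lines: metadata cues arrive over a side channel (e.g. a WebSocket) carrying presentation timestamps, are buffered, and are released only when the playback clock reaches them. The class and names below are illustrative assumptions, not Red5 Pro SDK APIs.

```javascript
// Sketch of client-side timed-metadata rendering: buffer cues keyed by
// presentation timestamp and release each one when playback reaches it.
class TimedMetadataQueue {
  constructor(onCue) {
    this.cues = [];     // pending cues, kept sorted by timestamp
    this.onCue = onCue; // callback that renders the overlay
  }
  push(cue) {           // cue: { timeMs, payload }
    this.cues.push(cue);
    this.cues.sort((a, b) => a.timeMs - b.timeMs);
  }
  // Call on each playback clock tick (e.g. from a video element's
  // timeupdate event or requestVideoFrameCallback).
  tick(playbackTimeMs) {
    while (this.cues.length && this.cues[0].timeMs <= playbackTimeMs) {
      this.onCue(this.cues.shift());
    }
  }
}

// Usage: cues fire in timestamp order as the playback clock advances,
// even if they arrived over the side channel out of order.
const fired = [];
const q = new TimedMetadataQueue((cue) => fired.push(cue.payload));
q.push({ timeMs: 2000, payload: 'show-product-overlay' });
q.push({ timeMs: 1000, payload: 'score-update' });
q.tick(1500); // releases only the 1000ms cue
q.tick(2500); // releases the 2000ms cue
console.log(fired); // → ['score-update', 'show-product-overlay']
```

The key design point is that the side channel and the media stream share a common timeline, so overlays stay frame-accurate regardless of when the metadata physically arrives.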
XDN-Synchronized Metadata in Commercial Operations
The multi-cloud XDN platform is providing support for multidirectional, highly scalable real-time streaming across a vast range of consumer, business, government, and institutional use cases. Any such implementation can benefit from the simple XDN approach to metadata synchronization to execute time-sensitive applications with far greater precision and consistency than HTTP-based networking allows, including applications that are either impossible or too hard to implement over those legacy connections. Whether the situation calls for one-to-many, many-to-many, or many-to-one connectivity at any moment in time, the technological basics are the same.
Critically, M&E industry customers operating on XDN infrastructure can use these capabilities to accommodate the growing trend toward remote collaborations in sports, esports, concerts, and other live-streamed productions. Using core studio production facilities to process multi-directional transmissions of video, audio, and metadata files from field locations introduces new levels of versatility with faster turnaround as well as significant cost savings.
No matter how the live-production operations are configured, the range of metadata enhancements available to producers over XDN infrastructure is unlimited, allowing them to move the needle on content enhancements beyond previous norms. For example, many services are providing users access to highlights, graphics, and data related to what they’re viewing or whatever else interests them. And some innovators are taking advantage of streaming synchronicity to allow users to curate their own viewing experiences by switching between streams from multiple camera angles.
Other producers are taking advantage of extraordinarily low latency to incorporate betting and polling applications into live sports and similar programming. Their viewers all see the same content and prompts for action simultaneously with no perceptible difference between when the action is viewed in person and when it’s seen online. Similarly, online auction services are allowing remote bidders to participate in the on-site bidding.
The implications for e-commerce are profound. A user who downloads a provider’s live shopping app activates purchase options tied to objects embedded in the primary stream by pointing a smartphone camera at a connected TV. No QR code is required, because the invisible embedded metadata marker triggering the activation of the purchasing option can be individualized by XDN intelligence as a unique mark identifying each user.
Such capabilities apply as well to any other type of interactive app that providers might want to use with this type of embedded metadata marker. And they can also be used as a simple solution to watermarking live content.
All the benefits of metadata synchronization over XDN infrastructure apply to any real-time group activity, such as watch parties, multiplayer game competitions, and content sharing on social media platforms. Beyond the M&E and casual social realms, the real-time streaming platform creates opportunities to employ metadata enhancements in remote engineering, medical, training, and any other work-related collaborations across dispersed locations. The same applies to the interactive experiences that have contributed to the normalization of online and hybrid tradeshows, company meetings, and other virtual gatherings.
The opportunity to exploit the full power of metadata also extends to all the versions of networked XR use cases, including virtual, augmented, and mixed reality (VR, AR, MR) and the new free-standing holographic capabilities developers are incorporating into these use cases. The combination of real-time interactive streaming with tight synchronization of metadata is absolutely essential in any effort to bring visions of a pervasive Metaverse to life.
Finally, it’s important to note that one of the major benefits unleashed by XDN architecture relates to the explosion in AI usage across all these scenarios. AI applications like object recognition, event detection, motion estimation, identification of usage and other behavioral patterns, and image restoration are feeding analytics engines with data that can be put to use in real time to augment the personalization of UX and innovations in asset management, application development, content production, quality assurance, advertising, e-commerce, and much else.
Interactive real-time streaming as enabled by the XDN platform is transforming online engagement across all segments of society. The benefits extend not only to enabling video interactions among people at work and in personal life but to enriching those interactions to new levels of experience through the synchronization of metadata.
To better understand all the ways metadata can add value to use cases on the XDN platform, contact email@example.com or schedule a call.