Although demand for video distribution, such as sports broadcasts, games, news, and TV programming, keeps increasing, many people have trouble choosing the right distribution latency and the optimal services. This series explains the common challenges and solutions to consider when evaluating media services, in four parts. The first theme is "Defining and Measuring Latency".
Part 1: Definition and measurement of latency (this article)
Part 2: Recommended optimizations for encoding, packaging, and CDN delivery
Part 3: Recommended optimizations for the video player
Part 4: Reference architecture and test results
Part 1: Definition and measurement of latency
Why is latency an issue with live video streaming? Whether it's TV content such as sports, games, and news, or pure OTT content such as e-sports and gambling, when content delivery is time-sensitive, it cannot be late. Viewers who are made to wait lose interest; waiting makes you a second-class citizen in the world of entertainment and real-time information. An easy-to-understand example is watching a soccer match. Suppose your neighbor watches the match on traditional TV and yells through the wall whenever your (and often his) favorite team scores a goal. With an over-the-top service, you have to wait 25 or 30 seconds to see the same goal. This is very frustrating, and it is similar to having the results of your favorite singing contest spoiled by the Twitter or Facebook feed you monitor alongside your streaming channel. Because these social feeds are usually generated by users watching the show on TV, the typical delay shrinks to 15-20 seconds, but that is still well behind the live TV broadcast.
Beyond the competition with broadcast latency and social networks, there are other reasons why content providers want to minimize live streaming latency. Older Flash-based applications using RTMP streaming performed well in terms of latency, but Flash usage in web browsers is declining, and on the delivery side CDNs are dropping RTMP support, so content providers need to switch to HTML5-friendly streaming technologies such as HLS and DASH, or more recently CMAF. Other content providers want to develop personal broadcasting services with interactive features, and in this use case a 30-second delay in the video signal is unacceptable. In addition, anyone who wants to build synchronized second-screen, social watching, or gambling applications needs fine-grained control over streaming latency.
With regard to latency, three categories are usually defined, each with an upper and a lower bound. They do not map exactly to broadcast latency, which can range from 3 to 12 seconds depending on the transport (cable / IPTV / DTT / satellite) and the topology of each delivery network. If we take an average broadcast latency of 6 seconds, which is common in the field, the OTT sweet spot sits somewhere in the lower range of the "reduced latency" category or the upper range of the "low latency" category. Around 5 seconds, you are most likely to compete effectively against broadcast and social network feeds. In addition, depending on where the OTT encoder sits in the content preparation workflow, the latency-reduction target should be tightened when the encoder is located downstream in the chain.
Category | High (seconds) | Low (seconds) |
---|---|---|
Reduced latency | 18 | 6 |
Low latency | 6 | 2 |
Ultra-low latency | 2 | 0.2 |
With HTTP-based streaming, latency depends primarily on the length of the media segments. If the media segments are 6 seconds long, the player is already at least 6 seconds behind absolute real time when it requests the first segment. And many players download additional media segments into their buffer before they actually start playing, which automatically increases the time to the first decoded video frame. Of course, other factors contribute to latency, such as the duration of the video encoding pipeline, the capture and packaging operations, network transmission delays, and CDN buffering. In most cases, however, the player accounts for the largest share of the overall latency: most players generally use conservative heuristics and buffer three or more segments.
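The relationship between segment duration and player-induced latency can be sketched with a back-of-the-envelope calculation (a simplification of our own, ignoring encoding, ingest, and network delays; the function name is illustrative):

```python
def min_player_latency(segment_duration_s: float, buffered_segments: int = 3) -> float:
    """Lower bound on player-induced latency when the player buffers
    `buffered_segments` whole segments before starting playback."""
    return segment_duration_s * buffered_segments

# Classic 10-second HLS segments with a 3-segment buffer: at least 30 s
print(min_player_latency(10))  # → 30
# 6-second segments: at least 18 s; 1-second segments: at least 3 s
print(min_player_latency(6))   # → 18
print(min_player_latency(1))   # → 3
```

These floors match the category table above: shortening segments is the single biggest lever on the player side.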
Microsoft Smooth Streaming typically uses 2-second segments, and Silverlight players usually show around 10 seconds of latency. DASH is broadly similar: most players handle 2-second segments, with variable results in terms of latency. The situation is quite different with HLS. Until mid-2016, Apple's recommendation was to use 10-second segments, which produced roughly 30 seconds of latency in most HLS players, including Apple's own. In August 2016, Apple's Technical Note TN2224 stated: "We used to recommend a target duration of 10 seconds. We don't expect everyone to suddenly re-segment all of their content, but we believe that, going forward, 6 seconds makes a better trade-off." Four seconds less per segment, and 12 seconds of latency suddenly disappeared from screens. In most cases, content producers followed Apple's recommendations even when their players could cope with shorter segment lengths, because they wanted to avoid the risk of rejection when validating their iOS applications on the App Store. But Apple recently changed the game in Safari on iOS 11 with three evolutions: live HLS streams can now auto-start, support for short segment durations has improved significantly, and FairPlay DRM is supported. This means that content owners who do not need to ship a compiled iOS application can lower their live latency with short media segments while still delivering studio-approved DRM-protected streams.
Some might argue that short media segments put a heavy load on the CDN and the player, but Microsoft Smooth Streaming has been using 2-second segments for many years, so this has long been a reality. The next step toward closing the latency gap with broadcast is the move to 1-second segments, and it does not actually create a significant bottleneck. It does double the number of requests, with all the associated HTTP overhead for headers and TCP connections, but CDNs (especially when the edge supports HTTP 2.0 alongside HTTP 1.1, like Amazon CloudFront) make this much easier to manage. Modern players also benefit from the wider-bandwidth last-mile connections brought by fiber, 4G/LTE, DOCSIS 3.1 FD, and other recent connectivity advances. Experiments show that many players now support 1- and 2-second segments, opening many new options for reducing latency. Finally, in both HLS and DASH, short segments are usually not a problem for the encoders, packagers, and origin services across the chain.
Setting App Store requirements aside, content creators who still apply 6-second segment durations can experiment with 1- or 2-second media segments on different players across all platforms, and reach latencies equal to or even better than broadcast.
At a high level, moving your streaming solution into the "low latency" category means optimizing each step of the chain, from encoding to playback. Let's see how to do that by combining AWS Elemental video solutions with currently available open-source or commercial players.
How to measure latency: The first step in any latency optimization process is to understand which components in the chain contribute what share of the total latency. This guides your optimization priorities, whether they lie in the encoding, packaging, or playback stage of the workflow. Let's start by measuring end-to-end latency.
The easiest way to measure end-to-end latency is to run a clapperboard application on a tablet, shoot it with a camera connected to the encoder, publish the stream to the origin, and deliver it to the player through the CDN. Then place the player next to the clapperboard tablet, photograph the two screens, and subtract the timecodes shown on each to obtain the number. Repeat this several times to make sure the result accurately represents the workflow latency.
Alternatively, you can use an AWS Elemental Live encoder with a looped file source, burn the encoder time into the video as an overlay (with the encoder synchronized to an NTP reference), and compare the burned-in timecode with a time service such as time.is displayed in a browser window. In that case, you typically need to add a capture latency of about 400 ms.
Capture latency: You can enable timecode burn-in in the preprocessor section of the AWS Elemental Live video encoding parameters. It must be enabled for each bitrate of the encoding ladder.
Make sure the encoder is set to low-latency mode. For AWS Elemental Live, this means selecting the "Low Latency Mode" checkbox in the Additional Global Configuration section of the input parameters.
Then set up a UDP/TS encoding event with a 500 ms buffer in the TS output section, with your laptop's IP address as the destination.
On the laptop, open the network stream in VLC (rtp://192.168.10.62:5011 in this example) with the :network-caching=200 option, which sets a 200 ms network buffer. You can then calculate the capture latency from a snapshot of the VLC window by comparing the burned-in timecode with the clapperboard timecode.
Even if the tablet cannot sync to NTP, applications such as Emerald Time on iOS show how far the tablet's clock drifts from NTP. In this example the drift is +0.023 seconds, so the real clapperboard time is 25:00.86 rather than the displayed 25:00.88. The burned-in timecode is 25:01:06 (the last two digits are the frame number), which converts to 25:01.25 in hundredths of a second (the stream is encoded at 24 fps). The capture latency is therefore 25:01.25 – 25:00.86 = 0.39 seconds. The formula is: capture latency = burned-in timecode (in seconds) – (clapperboard timecode – NTP drift).
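The arithmetic above can be checked with a short script (a sketch; the helper and variable names are ours, not part of any AWS tool):

```python
FPS = 24  # the example stream is encoded at 24 fps

def to_seconds(minutes: int, seconds: float) -> float:
    """Convert an MM:SS.ss timecode fragment to seconds."""
    return minutes * 60 + seconds

burned_in = to_seconds(25, 1) + 6 / FPS   # timecode 25:01:06 → 25:01.25
clapperboard = to_seconds(25, 0.88)       # displayed tablet time 25:00.88
ntp_drift = 0.023                         # tablet runs 0.023 s ahead of NTP

capture_latency = burned_in - (clapperboard - ntp_drift)
print(round(capture_latency, 2))  # → 0.39
```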
Encoding latency: You can also use this UDP/TS encoding event to calculate the latency introduced by the video encoding pipeline. This example uses the following encoding parameters, which produce broadcast-compliant quality on demanding scenes while offering an acceptable trade-off in terms of induced latency.
In this case, the tablet time is 13:27:19.32 and the VLC time is 13:27:16.75.
The encoding pipeline latency is calculated with the formula: (tablet time – VLC time) – (capture latency + VLC buffer + RTP buffer), that is, (19.32 – 16.75) – (0.39 + 0.20 + 0.50) = 1.48 seconds.
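The same calculation, expressed as a script (variable names are illustrative):

```python
tablet_time = 19.32     # seconds portion of the tablet clock, 13:27:19.32
vlc_time = 16.75        # seconds portion of the VLC clock, 13:27:16.75
capture_latency = 0.39  # measured in the previous step
vlc_buffer = 0.20       # :network-caching=200
rtp_buffer = 0.50       # 500 ms TS output buffer on the encoder

encode_latency = (tablet_time - vlc_time) - (capture_latency + vlc_buffer + rtp_buffer)
print(round(encode_latency, 2))  # → 1.48
```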
Ingest latency: Now that we know the capture latency and the encoding pipeline latency, let's look at ingest latency. Here, "ingest latency" covers the time needed to package the ingest format and push it to an origin that does not repackage the ingested stream, such as AWS Elemental Delta with a pass-through filter or AWS Elemental MediaStore. We'll use HLS with 1-second segments pushed to AWS Elemental MediaStore.
Use a shell loop to monitor changes to the HLS child playlist on the origin:
$ while sleep 0.01; do curl -s https://container.mediastore.eu-west-1.amazonaws.com/livehls/index_730.m3u8 && date +"%Y-%m-%d %H:%M:%S,%3N"; done
The following was returned when the segment "index_73020180223T154954_02190.ts" was first referenced in the shell output:
#EXTM3U
[…]
index_73020180223T154954_02190.ts
2018-02-23 15:49:55,515
Then download the segment "index_73020180223T154954_02190.ts" and check which timecode it carries: 16:49:53:37 (UTC+1). The difference between the current date and the segment timecode is 55.51 – 53.37 = 2.14 seconds. Removing the capture latency and the encode latency isolates the time needed to package the HLS segment and push it to the origin. The formula is: ingest latency = (current date – segment timecode) – (capture latency + encode latency). For AWS Elemental MediaStore, this gives 0.27 seconds. For AWS Elemental Delta, the same calculation gives 0.55 seconds.
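For the MediaStore case, the numbers plug in as follows (a sketch; variable names are ours):

```python
playlist_update = 55.51   # seconds portion of 15:49:55,515 from the curl loop
segment_timecode = 53.37  # seconds portion of the segment's burned-in timecode
capture_latency = 0.39
encode_latency = 1.48

ingest_latency = (playlist_update - segment_timecode) - (capture_latency + encode_latency)
print(round(ingest_latency, 2))  # → 0.27
```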
Repackaging latency: You can apply the same approach to AWS Elemental Delta and AWS Elemental MediaPackage, adding the previously calculated ingest latency, to work out the time required to repackage the ingested stream. The formula is: repackaging latency = (current date – segment timecode) – (capture latency + encode latency + ingest latency). For AWS Elemental MediaPackage (assuming its ingest latency is the same as AWS Elemental Delta's, since there is no easy way to measure it), outputting HLS 1-second segments from an HLS 1-second ingest gives (57.34 – 54.58) – (0.39 + 1.48 + 0.55) = 0.34 seconds. For AWS Elemental Delta, it is (26.41 – 23.86) – (0.39 + 1.48 + 0.55) = 0.13 seconds.
Delivery latency: The same approach applies to delivery, that is, the transfer of a segment from the origin to the CDN edge. If the origin repackages the stream: delivery latency = (current date – segment timecode) – (capture latency + encode latency + ingest latency + repackaging latency). If the origin is pass-through: delivery latency = (current date – segment timecode) – (capture latency + encode latency + ingest latency). You can measure it by putting an Amazon CloudFront distribution in front of the origin and using the same kind of command line as for the ingest latency calculation. For AWS Elemental MediaStore, it is (52.71 – 50.40) – (0.39 + 1.48 + 0.27) = 0.17 seconds. This latency is the same for all origin types in the same region.
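For a pass-through origin such as AWS Elemental MediaStore, the delivery measurement works out as follows (sketch; illustrative names):

```python
edge_playlist_update = 52.71   # seconds portion of the date observed at the CloudFront edge
segment_timecode = 50.40       # seconds portion of the segment's burned-in timecode
upstream = 0.39 + 1.48 + 0.27  # capture + encode + ingest latencies

delivery_latency = (edge_playlist_update - segment_timecode) - upstream
print(round(delivery_latency, 2))  # → 0.17
```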
Client latency: Two client-side factors contribute to latency in this category: last-mile latency (related to network bandwidth) and player latency (related to the content buffer). Last-mile latency ranges from a few milliseconds on a fiber connection up to a few seconds on the slowest mobile connections. The content download time directly affects latency, because a media segment carrying timecode T only becomes available for client-side buffering and playback at T + x seconds. If x is too large relative to the segment duration, the player cannot build a sufficient buffer, and it will switch down the encoding ladder until it finds a bitrate that offers a good trade-off between network conditions and its ability to build that content buffer. If even the lowest bitrate cannot be downloaded fast enough to build a buffer, playback constantly starts, stalls, and rebuffers. As soon as a segment's download time exceeds 50% of its duration, the player enters a danger zone in terms of buffering; ideally it should stay below 25%. Player latency is the result of the player's buffering policy, whether it buffers a fixed number of segments or requires a certain minimum amount of content, and of its playhead-positioning strategy.
The formula for client latency is: client latency = end-to-end latency – (capture latency + encode latency + ingest latency + repackaging latency + delivery latency). Player latency can then be isolated by subtracting the average media segment transfer time (the last-mile latency) from the overall client latency. The last-mile latency should be averaged over at least 20 segment requests. It includes the actual data transfer time plus any latency generated on the client, for example when a segment request is queued because all sockets allowed for a given subdomain are already open.
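Using the measurements gathered so far, and an end-to-end latency of 5.14 seconds measured with the clapperboard method, the client-side split works out as (a sketch; variable names are ours):

```python
end_to_end = 5.14                     # clapperboard measurement, camera to player
upstream = 0.39 + 1.48 + 0.27 + 0.17  # capture + encode + ingest + delivery (no repackaging)
last_mile = 0.28                      # average segment transfer time over 20+ requests

client_latency = end_to_end - upstream
player_latency = client_latency - last_mile
print(round(client_latency, 2))  # → 2.83
print(round(player_latency, 2))  # → 2.55
```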
Here is an example breakdown for HLS 1-second segments produced with AWS Elemental Live and AWS Elemental MediaStore, delivered through Amazon CloudFront to a standard hls.js 0.8.9 player:
Latency type | Seconds | Share of total |
---|---|---|
Capture | 0.39 | 7.59% |
Encode | 1.48 | 28.79% |
Ingest | 0.27 | 5.25% |
Repackaging | N/A | N/A |
Delivery | 0.17 | 3.31% |
Last mile | 0.28 | 5.45% |
Player | 2.55 | 49.61% |
End-to-end latency | 5.14 | 100% |
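The shares in this table can be recomputed from the per-step figures as a quick self-check (a sketch; rounding of the last digit may differ slightly from the table):

```python
# Per-step latencies in seconds, as measured in this article
breakdown = {
    "capture": 0.39,
    "encode": 1.48,
    "ingest": 0.27,
    "delivery": 0.17,
    "last mile": 0.28,
    "player": 2.55,
}
end_to_end = sum(breakdown.values())
print(f"end-to-end: {end_to_end:.2f} s")
for step, seconds in breakdown.items():
    print(f"{step}: {seconds / end_to_end:.2%} of total latency")
```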
As you can see, the encoding and playback steps generate most of the latency, so that is where most of the room for improvement lies. The other steps can be optimized too, but the impact is much smaller. With longer media segments, the player and last-mile latencies usually grow, while the other steps remain stable.
Part 2 of the series describes the optimization options that can be applied to each step in the workflow.
Nicolas Weil AWS Elemental Senior Solutions Architect