Adaptive Bitrate HLS Generation

MaxxSports generates two variant streams for each camera with ffmpeg, one high resolution and one low resolution. The player can switch between the two based on available bandwidth.

There are three playlists in this sample setup: master.m3u8, which references two variant playlists:
var_0.m3u8: 426×240, the original resolution of the input mp4 file
var_1.m3u8: 128×86, transcoded with the parameter -s:v:1 128x86

Command used to generate the three playlists:
set FF=2018\bin\ffmpeg.exe
rem For a live camera feed, replace the file name below with rtsp://…
set MEDIA=Gym2017bshort.mp4
rem Inside a .bat file, write var_%%v.m3u8 so that cmd passes a literal %v to ffmpeg.
%FF% -i %MEDIA% -s:v:1 128x86 -map 0:v -map 0:a -map 0:v -map 0:a -f hls -master_pl_name master.m3u8 -var_stream_map "v:0,a:0 v:1,a:1" var_%v.m3u8
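
The same command can also be launched from a script. Below is a minimal Python sketch, not part of the original setup: the function name build_hls_command and its parameters are made up for illustration, and the arguments simply mirror the command above. Passing the arguments as a list sidesteps the shell quoting and %v escaping issues.

```python
import subprocess


def build_hls_command(ffmpeg, media, low_res="128x86", master="master.m3u8"):
    """Return the ffmpeg argument list for a two-variant HLS output.

    Hypothetical helper: it only mirrors the batch command in these notes.
    """
    return [
        ffmpeg, "-i", media,
        "-s:v:1", low_res,              # scale only the second video output
        "-map", "0:v", "-map", "0:a",   # variant 0: original resolution
        "-map", "0:v", "-map", "0:a",   # variant 1: low resolution
        "-f", "hls",
        "-master_pl_name", master,
        "-var_stream_map", "v:0,a:0 v:1,a:1",
        "var_%v.m3u8",                  # ffmpeg expands %v to the variant index
    ]


if __name__ == "__main__":
    cmd = build_hls_command(r"2018\bin\ffmpeg.exe", "Gym2017bshort.mp4")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
```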

Play URL: the URL of master.m3u8 (adaptive), or of var_0.m3u8 or var_1.m3u8 directly (fixed resolution)
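
To illustrate what a player does with master.m3u8, here is a rough Python sketch that lists the variants and picks one for a measured bandwidth. The sample playlist text is hypothetical: the BANDWIDTH values are invented for the example, not copied from real ffmpeg output. The selection rule (highest bandwidth that fits) is one simple policy, not necessarily what any particular player does.

```python
import re

# Hypothetical master playlist; BANDWIDTH values are made up for illustration.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
var_0.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=100000,RESOLUTION=128x86
var_1.m3u8
"""


def parse_master(text):
    """Return a list of dicts with bandwidth, resolution and uri per variant."""
    variants = []
    pending = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = line.split(":", 1)[1]
            bw = re.search(r"(?:^|,)BANDWIDTH=(\d+)", attrs)
            res = re.search(r"(?:^|,)RESOLUTION=(\d+)x(\d+)", attrs)
            pending = {
                "bandwidth": int(bw.group(1)) if bw else None,
                "resolution": (int(res.group(1)), int(res.group(2))) if res else None,
            }
        elif line and not line.startswith("#") and pending is not None:
            # The first non-comment line after the tag is the variant URI.
            pending["uri"] = line
            variants.append(pending)
            pending = None
    return variants


def pick_variant(variants, measured_bps):
    """Pick the highest-bandwidth variant that fits, else the lowest one."""
    fitting = [v for v in variants if v["bandwidth"] <= measured_bps]
    if fitting:
        return max(fitting, key=lambda v: v["bandwidth"])
    return min(variants, key=lambda v: v["bandwidth"])


if __name__ == "__main__":
    for v in parse_master(SAMPLE_MASTER):
        print(v)
```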

This feature, var_stream_map, was added to ffmpeg in November 2017.
ffmpeg version is ffmpeg-20180925-a7429d8-win64-static 4.02, downloaded from


Adaptive Streaming on Player Side

I believe this is how the player switches from one resolution to another in HLS, but I have not found any documentation of this process on the internet to verify it.
First of all, simply switching from, e.g., HighRes0008.ts to LowRes0008.ts doesn’t work:
1. Most often, the segment being switched to doesn’t start with an IDR frame, which makes decoding it from its first frame impossible.
2. Even if it starts with an IDR, the segments of the two variants may not be aligned at the frame level.
How the player switches:
1. In preparation for switching to the lower resolution, the player traces backward through the low-resolution playlist until it reaches an IDR.
2. Based on the current PTS in the high-resolution playlist, it determines the corresponding PTS in the low-resolution playlist, which is the one being switched to. Assume the destination frame is frame #481.
3. It decodes the low-resolution stream from the IDR up to frame #481, and renders that frame when its presentation time comes.
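
The three steps above can be sketched in Python as follows. This is only an illustration of my understanding, not a real player's code: the Frame type and the plan_switch function are hypothetical, PTS values are assumed to be in 90 kHz ticks, and frames are assumed to be listed in presentation order (which ignores B-frame reordering).

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Frame(NamedTuple):
    pts: int       # presentation timestamp, assumed 90 kHz ticks
    is_idr: bool   # True if this frame is an IDR


def plan_switch(low_frames: List[Frame], switch_pts: int) -> Tuple[int, int]:
    """Return (idr_index, target_index) into low_frames (sorted by pts).

    target_index is the last low-resolution frame whose PTS is at or before
    switch_pts; idr_index is the nearest IDR at or before that frame.
    Frames idr_index..target_index must be decoded; only the last is rendered.
    """
    pts_list = [f.pts for f in low_frames]
    target = bisect_right(pts_list, switch_pts) - 1
    if target < 0:
        raise ValueError("switch point precedes the first low-resolution frame")
    idr = target
    while idr >= 0 and not low_frames[idr].is_idr:
        idr -= 1  # trace backward until an IDR is found
    if idr < 0:
        raise ValueError("no IDR frame at or before the switch point")
    return idr, target


if __name__ == "__main__":
    # Assumed example: 25 fps (3600 ticks per frame), one IDR every 50 frames.
    frames = [Frame(i * 3600, i % 50 == 0) for i in range(600)]
    idr, target = plan_switch(frames, 481 * 3600)
    print(f"decode frames {idr}..{target}, render frame {target}")
```

With the assumed 50-frame IDR interval, reaching frame #481 means decoding 32 frames (450 through 481) and discarding all but the last, which is why the switch has to be prepared ahead of time.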
A .ts segment contains PES packets, which in turn contain the ES. The PTS is in the PES header.
The PCR exists only at the TS layer; I don’t think it is needed in this use case, as long as the PTS values for audio and video are based on the same PCR.
The same layer structure applies to audio too: TS – PES – ES (the ES holds, e.g., AAC data).
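
For reference, the PTS in the PES header is a 33-bit value on a 90 kHz clock, spread over 5 bytes with marker bits in between. The Python sketch below decodes and encodes that 5-byte field; the function names are my own, and locating the field inside a real PES packet (checking the PTS flag, skipping the rest of the header) is left out.

```python
def decode_pts(b: bytes) -> int:
    """Decode the 33-bit PTS from the 5-byte field of a PES header.

    Layout: 4-bit prefix, PTS[32..30], marker, PTS[29..15], marker,
    PTS[14..0], marker.
    """
    if len(b) != 5:
        raise ValueError("PTS field must be exactly 5 bytes")
    return (
        (((b[0] >> 1) & 0x07) << 30)
        | (b[1] << 22)
        | ((b[2] >> 1) << 15)
        | (b[3] << 7)
        | (b[4] >> 1)
    )


def encode_pts(pts: int, prefix: int = 0b0010) -> bytes:
    """Encode a PTS into the 5-byte field (prefix 0010 means PTS only)."""
    pts &= (1 << 33) - 1
    return bytes([
        (prefix << 4) | (((pts >> 30) & 0x07) << 1) | 1,
        (pts >> 22) & 0xFF,
        (((pts >> 15) & 0x7F) << 1) | 1,
        (pts >> 7) & 0xFF,
        ((pts & 0x7F) << 1) | 1,
    ])


def pts_to_seconds(pts: int) -> float:
    """Convert 90 kHz ticks to seconds."""
    return pts / 90000


if __name__ == "__main__":
    field = encode_pts(481 * 3600)  # frame #481 at an assumed 25 fps
    print(field.hex(), decode_pts(field), pts_to_seconds(decode_pts(field)))
```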