Sync video from multiple IP cameras

When live streaming the same hockey game with two RTSP H.264 cameras, can the two video streams play in sync?

No wall clock in this setup; both cameras are standalone.

I think yes, in theory.

We expect the clocks on the two cameras to be accurate enough (frequent NTP sync helps).

RTCP on each camera maps the RTP timestamps to UTC time via its Sender Reports (SR).

The PTS can be calculated from the RTP timestamp.

Combining RTP and RTCP, we get the PTS in UTC and can therefore sync the two videos.

Problem: in reality, RTCP packets may be inaccurate or missing.
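The mapping above can be sketched as a small calculation. This is a minimal sketch, assuming a 90 kHz clock (the standard RTP rate for video) and that we keep the most recent SR per stream; the names `SenderReport` and `rtp_to_utc_us` are illustrative, not from any SDK.

```cpp
#include <cassert>
#include <cstdint>

// One RTCP Sender Report gives a (wall clock, RTP timestamp) pair.
struct SenderReport {
    uint64_t ntp_utc_us;  // SR's NTP timestamp, converted to UTC microseconds
    uint32_t rtp_ts;      // the RTP timestamp corresponding to ntp_utc_us
};

// Map a packet's RTP timestamp to UTC microseconds using the last SR.
// The int32_t cast handles 32-bit RTP timestamp wraparound near the SR.
uint64_t rtp_to_utc_us(const SenderReport& sr, uint32_t rtp_ts,
                       uint32_t clock_rate = 90000) {
    int64_t delta_ticks = static_cast<int32_t>(rtp_ts - sr.rtp_ts);
    return sr.ntp_utc_us +
           delta_ticks * 1000000 / static_cast<int64_t>(clock_rate);
}
```

Once every frame from both cameras carries a UTC PTS like this, the player only has to schedule each frame at its UTC time for the two streams to line up.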



[“Non-interleaved” mode, in which you must set the RTP timestamp to the PTS + offset]


[synchronization between two RTP streams using RTCP SR]


Video Codec Approaches

I have worked with two video compression standards for IP cameras: H.264 and MJPEG.

H.265 is not widely supported. We don’t use WMV.

H.264 is by far the most popular.

Why transcode: to overlay scoreboard info and commercials on the streamed sports game.

SDK: MainConcept 9.1.0 (MC), in both pure-software and hardware-accelerated variants.

bufstream_tt is the key class from MC.

The difference: the HW-accelerated variant uses DXVA2 (DirectX Video Acceleration from Microsoft).

Steps SW:

Setup: pass frame_tt to bufstream_tt to receive decoded frame

Retrieval after decoding: bufstream_tt.auxinfo(GET_PIC)

Steps HW:

Setup: IDirect3DDeviceManager9 -> dxva2_config -> bufstream_tt

Retrieval after decoding:

bufstream_tt.auxinfo(HWACC_GET_PIC) -> dxva_surface_t -> LockRect -> pBits

To check if DXVA2 is supported: IDirectXVideoDecoderService
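The last step of the HW path, LockRect -> pBits, needs care: D3DLOCKED_RECT reports a Pitch (bytes per row) that is usually larger than the frame width, so the frame must be copied out row by row. Below is a minimal sketch assuming the decoded surface is NV12 (a Y plane followed by an interleaved UV plane at half height), which is typical for DXVA2; plain buffers stand in for the locked surface here.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Copy an NV12 frame out of a locked surface into a tightly packed buffer.
// `pBits` and `pitch` would come from D3DLOCKED_RECT after LockRect.
void copy_nv12(const uint8_t* pBits, int width, int height, int pitch,
               std::vector<uint8_t>& out) {
    out.resize(width * height * 3 / 2);
    uint8_t* dst = out.data();
    // Y plane: `height` rows, only `width` of each `pitch`-byte row is pixels.
    for (int y = 0; y < height; ++y)
        std::memcpy(dst + y * width, pBits + y * pitch, width);
    // Interleaved UV plane follows the Y plane, at half the height.
    const uint8_t* uv = pBits + height * pitch;
    dst += width * height;
    for (int y = 0; y < height / 2; ++y)
        std::memcpy(dst + y * width, uv + y * pitch, width);
}
```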

MJPEG takes more space, but is easy to decode.

SDK: IJG.lib from Independent JPEG Group


JPEG image buffer -> jpeg_decompress_struct with JCS_YCbCr -> jpeg_read_scanlines -> fill YUV buffer

Other image processing, such as muxing, overlay, and scaling

Why: to convert interlaced video to progressive, and to produce Picture-in-Picture and side-by-side video.
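As an example of this kind of processing, side-by-side composition is just a per-row copy. The sketch below handles only the Y (luma) plane of two equally sized frames; in a real planar YUV420 pipeline the U and V planes would be combined the same way at half the width and height. The function name is illustrative.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Compose two luma planes of size width x height into one plane of
// size (2*width) x height, left frame on the left, right frame on the right.
std::vector<uint8_t> side_by_side_luma(const uint8_t* left,
                                       const uint8_t* right,
                                       int width, int height) {
    std::vector<uint8_t> out(2 * width * height);
    for (int y = 0; y < height; ++y) {
        std::memcpy(&out[y * 2 * width],         left  + y * width, width);
        std::memcpy(&out[y * 2 * width + width], right + y * width, width);
    }
    return out;
}
```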



Video Rendering Approaches

1. DirectShow: an old technology dating from the late 1990s.

It can get the job done by constructing a graph with existing filters.
More flexibility can be achieved by customizing the rendering filters.
I used it in 2010 to play back school bus surveillance video.

2. Direct3D:

This is like building a rendering filter outside of DirectShow’s framework: each video frame becomes a texture on a polygon.
Great flexibility can be achieved, but Direct3D coding may take more time to get up to speed.
Input can be a raw video frame such as YUV420. The YUV gets copied to the texture by the route below.
YUV -> CreateOffscreenPlainSurface.IDirect3DSurface9.D3DLOCKED_RECT.pBits -> IDirect3DTexture9.IDirect3DSurface9
I did this in 2014 in my project to render video from IP cameras.

3. GDI

Used for rendering GUIs in the old days; not suitable for video, as it’s too slow.