Introduce support for delivering BPM (Broadcast
Performance Metrics) over SEI (for AVC/H.264 and
HEVC/H.265) and OBU (for AV1) unregistered messages.
The metrics sent are the session frame counters, the
per-rendition frame counters, and RFC3339-based
timestamping information to support end-to-end
latency measurement.
SEI/OBU messages are generated and sent with each IDR
frame, and the frame counters are diff-based, meaning
the counts reflect the diff between IDRs, not the running
totals.
BPM documentation is available at [1].
BPM relies on the recently introduced encoder packet timing
support and the packet callback mechanism.
BPM injection is enabled for an output by registering
the `bpm_inject()` callback via the
`obs_output_add_packet_callback()` function. The callback must
be unregistered using `obs_output_remove_packet_callback()`,
and `bpm_destroy()` must be used by the caller to release the
BPM structures.
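A minimal sketch of that registration flow (assuming `bpm.h` is the
header exposing `bpm_inject()`/`bpm_destroy()`, and that passing NULL
as the user-data parameter is acceptable):

```c
#include <obs.h>
#include "bpm.h" /* assumed header for bpm_inject()/bpm_destroy() */

static void enable_bpm(obs_output_t *output)
{
	/* Register BPM injection; packets passing through the output
	 * will now receive SEI/OBU metrics messages on IDR frames. */
	obs_output_add_packet_callback(output, bpm_inject, NULL);
}

static void disable_bpm(obs_output_t *output)
{
	/* Unregister the callback, then free the BPM structures. */
	obs_output_remove_packet_callback(output, bpm_inject, NULL);
	bpm_destroy(output);
}
```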
It is important to measure the number of frames successfully
encoded by the `obs_encoder_t` instances, particularly for
renditions where the encoded frame rate differs from the
canvas frame rate. The `encoded_frames` counter and the
`obs_encoder_get_encoded_frames()` API are introduced
to measure and report this in the encoded rendition
metrics message.
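A sketch of how the diff-based rendition count can be derived from
this API (the bookkeeping struct is hypothetical, and the API is
assumed to return a monotonically increasing count):

```c
#include <obs.h>

struct rendition_state {
	uint32_t last_encoded_frames; /* snapshot at the previous IDR */
};

/* When emitting the metrics message on an IDR, report the frames
 * encoded since the previous IDR rather than the running total. */
static uint32_t frames_since_last_idr(obs_encoder_t *encoder,
				      struct rendition_state *state)
{
	uint32_t total = obs_encoder_get_encoded_frames(encoder);
	uint32_t diff = total - state->last_encoded_frames;

	state->last_encoded_frames = total;
	return diff;
}
```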
[1] https://d50yg09cghihd.cloudfront.net/other/20240718-MultitrackVideoIntegrationGuide.pdf
Packet callbacks are invoked in `send_interleaved()` and
are useful for any plugin to extend functionality at the
packet processing level without needing to modify code in
libobs. Closed caption support is one candidate that is
suitable for migration to a packet callback.
The packet callback also supports the new encoder packet
timing feature. This means a registered callback will have
access to both the compressed encoder packet and the associated
encoder packet timing information.
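For illustration, a plugin-side callback might look like the following
sketch (assuming the callback receives the output, the packet, the
packet timing, and the `param` pointer given at registration):

```c
#include <obs.h>

static void my_packet_cb(obs_output_t *output, struct encoder_packet *pkt,
			 struct encoder_packet_time *pkt_time, void *param)
{
	UNUSED_PARAMETER(output);
	UNUSED_PARAMETER(param);

	/* Only video keyframes are relevant for SEI/OBU-style injection. */
	if (pkt->type != OBS_ENCODER_VIDEO || !pkt->keyframe)
		return;

	if (pkt_time) {
		/* pkt_time carries the CTS/FER/FERC/PIR timestamps
		 * described below. */
	}
}
```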
Introduce support for the `encoder_packet_time` struct
to capture timing information for each frame, starting
from the composition of each frame, through the encoder,
to the queueing of the frame data to each output_t.
Timestamps for each of the following events are based on
`os_gettime_ns()`:
CTS: Composition timestamp (in the encoder render threads)
FER: Frame encode request
FERC: Frame encode request complete
PIR: Packet interleave request (`send_interleaved()`)
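Roughly, the struct carries one `os_gettime_ns()`-based timestamp per
event; a sketch with illustrative field names:

```c
#include <stdint.h>

struct encoder_packet_time {
	uint64_t cts;  /* raw frame composited (render thread)  */
	uint64_t fer;  /* frame handed to the encoder           */
	uint64_t ferc; /* encoder finished producing the packet */
	uint64_t pir;  /* packet queued in send_interleaved()   */
};
```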
Frame times are forwarded through encoder callbacks in the
context that runs on the relevant encoder thread, ensuring
no race conditions occur when accessing the per-encoder array.
All per-output processing happens on data that is owned by
the output.
Co-authored-by: Ruwen Hahn <haruwenz@twitch.tv>
The existing caption insertion implementation adds H.264-specific
SEI NALs; because of this, we will skip caption insertion unless the
video stream is H.264. This prevents corruption of AV1/HEVC/etc.
We will now keep a per-track list of caption data. When caption
insertion is triggered, we add the caption to the list for each
actively encoding video track. When doing the interleaved send, we
pull from the caption data list for the corresponding video track.
This ensures that captions get copied to all video tracks.
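A sketch of the per-track bookkeeping (names are hypothetical;
`DARRAY` is the libobs dynamic-array helper):

```c
#include <util/darray.h>
#include <util/threading.h>

struct caption_track_data {
	pthread_mutex_t mutex;
	DARRAY(char *) pending; /* caption text queued for this track */
};

/* On caption insertion: append the text to the pending list of every
 * actively encoding video track. During the interleaved send: drain
 * the list matching the packet's track and, if the stream is H.264,
 * wrap each entry in an SEI NAL. */
```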
Calling `set_higher_ts` before offsets are known pollutes
`highest_video_ts` with timestamps that are out of range of the
actual timestamps. They generally sit well above 0 until an
offset for all streams is found, at which point they are reset to 0
or slightly below 0 in the presence of B-frames.
`highest_video_ts` also needs to start below 0 for the same reason.
Even with timestamps reset to close to 0, B-frames will cause the
initial DTS to drop below 0, thus we need a value that should
"always" be below any "real" timestamps observed.
`highest_video_ts` tracking now only starts once all input streams
are ready, and is computed based on all buffered packets at that
point.
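A sketch of that start-up computation (names and the packet source are
illustrative; the point is the INT64_MIN floor and seeding from the
buffered packets):

```c
#include <obs.h>

static int64_t initial_highest_video_ts(const struct encoder_packet *pkts,
					size_t count)
{
	/* Start far below any "real" timestamp, since B-frames can pull
	 * the initial DTS below 0 even after offsets are applied. */
	int64_t highest = INT64_MIN;

	for (size_t i = 0; i < count; i++) {
		if (pkts[i].type == OBS_ENCODER_VIDEO &&
		    pkts[i].dts_usec > highest)
			highest = pkts[i].dts_usec;
	}

	return highest;
}
```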
The APIs `obs_output_video` and `obs_output_audio` returned valid
pointers until commits fb57eff21 and 645e31fa1.
The API `obs_output_video` is used by some plugins, such as
obs-websocket and obs-midi, to calculate the duration of streaming
and recording. To restore similar behavior, return the media (video
and audio, respectively) from the encoder.
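Conceptually, the accessor falls back to the encoder's media for
encoded outputs, along these lines (internal field access is
illustrative, not the exact libobs code):

```c
video_t *obs_output_video(const obs_output_t *output)
{
	if (flag_encoded(output)) {
		/* Encoded output: derive the media from the encoder. */
		obs_encoder_t *venc = output->video_encoders[0]; /* illustrative */
		return venc ? obs_encoder_video(venc) : NULL;
	}

	return output->video;
}
```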
When `obs_output_t` is an encoded output, `obs_output_set_media` will
ignore the video and audio so that `obs_output_t` does not keep
holding the `video_t` and `audio_t` pointers from when the output was
created. As of this commit, the `video` and `audio` member variables
in `obs_output_t` are never set if it is an encoded output.
In the auto-configuration wizard, `video_t` is released and recreated
with a different video size while `obs_output_t` is not released. This
resulted in accessing the released `video_t` pointer.
Initialize arrays to 0, as otherwise these can get initialized with
garbage data or potentially Visual Studio's default debug marker,
which is a problem if they're being checked against `NULL` later.
Effectively reverting parts of d314d47, this commit removes the new
functions that got added to remove the flags parameter. Instead, it just
marks the parameter as unused and documents this. Having what is
effectively an API break just to remove a parameter is a bit overkill.
The other parts of d314d47 which cleaned up the usage of the flags
parameter are untouched here.
#8873 accidentally used `log_flag_encoded()` instead of
`flag_encoded()` in `obs_output_begin_data_capture()`, which was
causing a warning to be logged when it shouldn't have been.
This deprecates the following functions, replacing them with new
versions:
- `obs_output_can_begin_data_capture()` - now `*capture2()`
- `obs_output_initialize_encoders()` - now `*encoders2()`
- `obs_output_begin_data_capture()` - now `*capture2()`
The flags parameter was initially designed to support audio-only or
video-only operation of an output that has the `OBS_OUTPUT_AV` flag;
however, full support for that was never implemented, and there are
likely fundamental issues with any implementation, chiefly that most
outputs are programmed assuming there will always be at least one
audio and one video track. Supporting this would require new flags
specifying support for optional audio/video, among other things.
An implementation to allow audio/video to be optional is best done
using the flag technique above, with audio/video enablement specified
by whether media (raw, `video_t/audio_t`) or encoder (`obs_encoder_t`)
objects are specified.
Since every implementation I could find always specifies `flags` as 0,
I concluded that immediately removing the parameter's functionality
is safe.
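Migration is mechanical, since the old calls passed `flags` as 0; a
sketch of a typical output start path:

```c
/* Before: */
if (obs_output_can_begin_data_capture(output, 0) &&
    obs_output_initialize_encoders(output, 0))
	obs_output_begin_data_capture(output, 0);

/* After: */
if (obs_output_can_begin_data_capture2(output) &&
    obs_output_initialize_encoders2(output))
	obs_output_begin_data_capture2(output);
```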
Since OBS outputs can have any combination of the encoded, video,
audio, and service flags, there are a number of cases where we should
validate that we're not allowing output changes that risk undefined
behavior given the supported flags.
This also does a mild amount of code cleanup, adding inline functions
for checking the flags mentioned above.
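The helpers are thin wrappers over the output's capability flags,
along these lines (a sketch; `flag_encoded()` is the same helper
referenced elsewhere in these notes):

```c
static inline bool flag_encoded(const struct obs_output *output)
{
	return (output->info.flags & OBS_OUTPUT_ENCODED) != 0;
}

static inline bool flag_video(const struct obs_output *output)
{
	return (output->info.flags & OBS_OUTPUT_VIDEO) != 0;
}

static inline bool flag_audio(const struct obs_output *output)
{
	return (output->info.flags & OBS_OUTPUT_AUDIO) != 0;
}

static inline bool flag_service(const struct obs_output *output)
{
	return (output->info.flags & OBS_OUTPUT_SERVICE) != 0;
}
```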
Since 65eb3c0815929866dde7d398bec7738d21e9b211 we have tried to get as
close a sync between audio and video tracks as we could before
starting to send frames to the output.
But in be717dbb2c19aa0e91a164fa52e3d4467b64be9d, when multi-track
audio was considered for synchronization, it continued to try to line
ALL audio packets up with one video packet. While audio and video
packets are of similar duration this will work out, but once a video
packet's duration is smaller than half an audio packet's duration,
this may fail forever. In practice, with AAC's 20-23 ms frame duration
(1024 samples at 48 or 44.1 kHz), we can often get tracks producing
frames 5-10 ms out of phase. For 60 fps video this is mostly fine, as
the 16 ms frame duration will cover the gap, but for framerates as low
as 100 (10 ms per frame) it's possible to fail to synchronize. In
practice this fails ~50% of the time for 6 audio tracks and 120 fps
video on my system, or 100% at 240 fps.
There was quite a bit of conflated usage of mixes (which refer to
raw audio) and encoder counts. This fully separates the two and makes
a clear distinction when iterating over mixes versus encoders.