Unknown four-byte data inserted in WEBVTT files before the timestamp #194

Open
opened 2026-01-29 16:37:35 +00:00 by claunia · 0 comments

Originally created by @atrottmann on GitHub (Nov 1, 2016).

When generating WEBVTT data from an MPEG-TS stream, I get what appears to be four bytes of binary data immediately before the timestamp on every cue timing line.

out-vtt.zip (https://github.com/CCExtractor/ccextractor/files/564176/out-vtt.zip)

The attached file (zipped, because GitHub didn't let me upload the raw .vtt) shows the problem: after the WEBVTT<0x0d><0x0a> header, the bytes 0x50 0xf9 0xd9 0x01 appear before the text-form timestamp 00:00:15.120.
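To check other generated files for the same symptom, a small helper along these lines could report how many bytes precede the first timestamp on a line. This is only a sketch: the names `looks_like_timestamp` and `bytes_before_timestamp` are hypothetical and not part of CCExtractor, and the helper assumes timestamps always use the HH:MM:SS.mmm form seen in the attached file.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the 12 bytes at p look like "HH:MM:SS.mmm", else 0. */
static int looks_like_timestamp(const unsigned char *p)
{
    static const char pattern[] = "dd:dd:dd.ddd"; /* 'd' = any decimal digit */
    for (size_t i = 0; i < 12; i++) {
        if (pattern[i] == 'd') {
            if (!isdigit(p[i]))
                return 0;
        } else if (p[i] != (unsigned char)pattern[i]) {
            return 0;
        }
    }
    return 1;
}

/*
 * Number of bytes that precede the first timestamp in the line,
 * or -1 if the line contains no timestamp at all.
 * A clean cue timing line yields 0; the line from the report yields 4.
 */
long bytes_before_timestamp(const unsigned char *line, size_t len)
{
    assert(line != NULL);
    for (size_t i = 0; i + 12 <= len; i++) {
        if (looks_like_timestamp(line + i))
            return (long)i;
    }
    return -1;
}
```

The helper searches for the timestamp pattern instead of testing for non-printable bytes, because the first stray byte here (0x50) is the printable letter P.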

I do not understand the purpose of those bytes and suspect a bug.

The source file ./lib_ccx/ccx_encoders_webvtt.c contains the following code at the beginning of write_stringz_as_webvtt:

    used = encode_line(context, context->buffer, (unsigned char *)timeline);
    written = write(context->out->fh, context->buffer, used);
    if (written != used)
        return -1;

This appears to be a duplicate of the code that runs right afterwards, once the timestamp has been formatted with sprintf, and I cannot find a purpose for it. If I understand the code correctly, the first call runs before timeline has been filled in, so it writes whatever uninitialized data the buffer happens to contain. That would explain the four bytes of apparent garbage in the generated WEBVTT file.

If I comment this out, the generated WEBVTT files appear to be correct.
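The ordering problem described above can be illustrated with a self-contained sketch. It is not the CCExtractor code: `struct sink`, `sink_write` and `write_timeline` are hypothetical stand-ins, the cue end time is invented, and the timing-line format with a CRLF ending is an assumption. Because reading truly uninitialized memory is not portable, the stale buffer contents are passed in explicitly, and the sketch assumes they are followed by a zero byte so that exactly four bytes leak.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory sink standing in for the output file handle. */
struct sink {
    unsigned char data[128];
    size_t len;
};

static void sink_write(struct sink *s, const void *buf, size_t n)
{
    assert(s->len + n <= sizeof s->data);
    memcpy(s->data + s->len, buf, n);
    s->len += n;
}

/*
 * Writes one cue timing line to the sink.
 * stale: four bytes that happen to sit in the buffer beforehand
 *        (a stand-in for uninitialized memory).
 * premature_write: when nonzero, the buffer is written once before it
 *        has been formatted, which mimics the reported behavior.
 */
void write_timeline(struct sink *s, const unsigned char stale[4],
                    unsigned start_ms, unsigned end_ms, int premature_write)
{
    char timeline[64];
    memset(timeline, 0, sizeof timeline);
    memcpy(timeline, stale, 4);

    if (premature_write)
        sink_write(s, timeline, strlen(timeline)); /* leaks the stale bytes */

    /* Format the timestamps first, then write exactly once. */
    int n = snprintf(timeline, sizeof timeline,
                     "%02u:%02u:%02u.%03u --> %02u:%02u:%02u.%03u\r\n",
                     start_ms / 3600000, start_ms / 60000 % 60,
                     start_ms / 1000 % 60, start_ms % 1000,
                     end_ms / 3600000, end_ms / 60000 % 60,
                     end_ms / 1000 % 60, end_ms % 1000);
    assert(n > 0 && (size_t)n < sizeof timeline);
    sink_write(s, timeline, (size_t)n);
}
```

With `premature_write` set, the output starts with the four stale bytes followed by the timestamp, matching the attached file. With it cleared, which corresponds to commenting out the first write, the output starts directly with the timestamp.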

Kind regards,

Andreas Trottmann


Reference: starred/ccextractor#194