Weird "Style:" entry in webvtt_header #273

Closed
opened 2026-01-29 16:39:36 +00:00 by claunia · 17 comments
Owner

Originally created by @atrottmann on GitHub (Feb 8, 2017).

In ccextractor 0.85, webvtt_header in src/lib_ccx/ccx_encoders_common.c was changed from "WEBVTT\r\n" to "WEBVTT\r\nStyle:\n".

Additionally, in the function write_subtitle_file_header in src/lib_ccx/ccx_encoders_common.c of ccextractor 0.84 you write a CRLF after webvtt_header, while in 0.85, there is no such extra CRLF.

If I correctly understand the WEBVTT specs, the way it's done in 0.84 results in a valid file starting with WEBVTT\r\n\r\n followed by cue timings and subtitle lines.

In 0.85 I get a file starting with WEBVTT\r\nStyle:\n immediately followed by the first cue, which appears invalid to me:

  1. Between the WEBVTT header and a STYLE block there should be an empty line

  2. The STYLE block should be introduced using STYLE\r\n and not Style:\n

  3. The STYLE block should be ended with an empty line too

For now, I removed the "Style:\n" and added the \r\n back, which appears to make ccextractor create valid WEBVTT files again.
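To make the first point concrete, here is a minimal C check of whether the "WEBVTT" signature line is immediately followed by an empty line. `header_followed_by_blank_line` is a hypothetical helper, not part of ccextractor; it treats only CRLF and LF as line terminators and ignores the optional byte-order mark for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the line terminator at s: 2 for CRLF, 1 for LF, else 0. */
static size_t line_terminator_len(const char *s)
{
    if (s[0] == '\r' && s[1] == '\n')
        return 2;
    return s[0] == '\n' ? 1 : 0;
}

/* Returns true if text starts with the "WEBVTT" signature line and that
 * line is immediately followed by an empty line. The optional BOM is not
 * handled in this sketch. */
bool header_followed_by_blank_line(const char *text)
{
    assert(text != NULL);
    if (strncmp(text, "WEBVTT", 6) != 0)
        return false;

    const char *p = text + 6;
    /* Optional trailing text on the signature line must begin with a
     * space or a tab; skip it up to the end of the line. */
    if (*p == ' ' || *p == '\t')
        while (*p != '\0' && *p != '\r' && *p != '\n')
            p++;

    size_t n = line_terminator_len(p);
    if (n == 0)
        return false;                      /* signature line not terminated */
    return line_terminator_len(p + n) > 0; /* next line must be empty */
}
```

On the 0.85 output (`"WEBVTT\r\nStyle:\n..."`) this returns false, because the second line is "Style:"; on the 0.84 output (`"WEBVTT\r\n\r\n..."`) it returns true.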

@cfsmp3 commented on GitHub (Feb 9, 2017):

I think you're correct on most or all counts. We'll fix it (assigning to @Izaron, since he made that specific style change), but in the meantime, what software is involved on your end? Is it a player or something else?

@cfsmp3 commented on GitHub (Feb 9, 2017):

GSoC qualification: 1 point, since it's a trivial thing.

@barun511 commented on GitHub (Feb 9, 2017):

If I understand correctly,

static const char *webvtt_header = "WEBVTT\r\n"
"Style:\n";

simply needs to be changed to

static const char *webvtt_header = "WEBVTT\r\n\r\n";

to comply with the WEBVTT standard?

If so, I'll prepare a PR
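Why the two-line 0.85 definition produces "WEBVTT\r\nStyle:\n": C joins adjacent string literals into a single literal during translation, so the line break in the source adds nothing to the output. A small self-check, with the constants copied from the comment above (the `_0_85` and `_proposed` suffixes are made-up names for this sketch):

```c
#include <assert.h>
#include <string.h>

/* The 0.85 definition: two adjacent literals that the compiler joins
 * into one string. */
static const char *webvtt_header_0_85 = "WEBVTT\r\n"
                                        "Style:\n";

/* The proposed replacement. */
static const char *webvtt_header_proposed = "WEBVTT\r\n\r\n";

void check_header_constants(void)
{
    /* Adjacent literals are joined: one 15-byte string in which
     * "Style:\n" directly follows the signature line, with no empty line. */
    assert(strcmp(webvtt_header_0_85, "WEBVTT\r\nStyle:\n") == 0);
    assert(strlen(webvtt_header_0_85) == 15);

    /* The proposal ends in two CRLFs: signature line, then an empty line. */
    size_t len = strlen(webvtt_header_proposed);
    assert(len == 10);
    assert(strcmp(webvtt_header_proposed + len - 4, "\r\n\r\n") == 0);
}
```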

@Izaron commented on GitHub (Feb 9, 2017):

I improved the WebVTT encoder, but I was not able to test whether it is fully correct: I simply couldn't get the subtitle file to play in a browser on an HTML page.
Anyone is welcome to take this over (right now I'm a bit busy with my studies), as long as they are able to test correctness, including colored text, bold text and so on.

@barun511 commented on GitHub (Feb 9, 2017):

My previous comment was incorrect. The expected behaviour is for the webvtt header to be

"WEBVTT\r\n\r\nSTYLE\r\n"

with an additional "\n" after the STYLE block.

Can somebody confirm whether this is correct?

@Izaron commented on GitHub (Feb 9, 2017):

@barun511 if you want to earn the 1 point, part of your work is also to find out exactly whether this is correct or not :) Not hard, I think.

@barun511 commented on GitHub (Feb 9, 2017):

Good point :)

@atrottmann commented on GitHub (Feb 9, 2017):

@cfsmp3: I use ccextractor to generate VTT files from TV recordings (MPEG-TS, using teletext subtitles), and play them back in an HTML5-based player.

It works extremely well with the subtitles created by ccextractor using any major browser (Firefox/Chrome/Safari/IE/Edge, also on iOS and Android).

A minimal test/demo can be found here: http://guardian.werft22.net/public/html5/player.html

For it to work on IE, the web server has to be configured to send the MIME type "text/vtt" when serving the VTT files.

@atrottmann commented on GitHub (Feb 9, 2017):

I'm quite sure there are multiple ways of making it work correctly.

I have already submitted a similar issue: https://github.com/CCExtractor/ccextractor/issues/439

Then, the fix was to leave:

static const char *webvtt_header = "WEBVTT\r\n";

and to write another \r\n explicitly after writing the header.
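The #439 approach in minimal form: the constant holds only the signature line, and the second line terminator is written from code. `write_webvtt_header_sketch` is a hypothetical stand-in; the real `write_subtitle_file_header` in ccx_encoders_common.c handles several output formats, and its exact signature isn't shown in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Header constant as in the #439 fix: signature line only. */
static const char *webvtt_header = "WEBVTT\r\n";

/* Hypothetical stand-in for the WebVTT branch of the header writer:
 * write the signature line, then the extra CRLF that produces the empty
 * line separating the header from the first cue.
 * Returns 0 on success, -1 on a write error. */
int write_webvtt_header_sketch(FILE *f)
{
    assert(f != NULL);
    if (fputs(webvtt_header, f) == EOF)
        return -1;
    if (fputs("\r\n", f) == EOF) /* the explicit extra line terminator */
        return -1;
    return 0;
}
```

`fputs` is used here rather than `fprintf(f, str)` so that a stray '%' in a constant can never be read as a conversion specifier.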

@barun511 commented on GitHub (Feb 10, 2017):

So I did some reading up. We need to start the header with WEBVTT, followed by a \r\n to finish the header (according to https://w3c.github.io/webvtt/#file-structure). We aren't putting anything else in the header, but in case we ever do, I imagine we would explicitly write the "\r\n" in `write_subtitle_file_header` in ccx_encoders_common.c, as mentioned in #439.

As for STYLE: since we now have internal CSS, we need the STYLE block. I was thinking we could add it to the inline CSS string (essentially modify `static const char *webvtt_inline_css` to include a STYLE block with the appropriate \r\n before and after it). Alternatively, we could explicitly write "STYLE\r\n" before `fprintf(f, webvtt_inline_css);` and an explicit CRLF after it.

Which design decision should we prefer?
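A sketch of the second option (explicit writes around the stylesheet), assuming the target layout is: signature line, empty line, STYLE line, stylesheet, empty line. The stylesheet text is a placeholder, not ccextractor's actual `webvtt_inline_css`, and `write_header_with_style_block` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

static const char *webvtt_header = "WEBVTT\r\n";

/* Placeholder stylesheet; the real webvtt_inline_css has different rules.
 * It must not contain "-->" or empty lines. */
static const char *webvtt_inline_css = "::cue(.green) { color: lime; }";

/* Option B sketch: the constants carry no block framing; the writer adds
 * the STYLE line before the stylesheet and the terminators after it.
 * Returns 0 on success, -1 on a write error. */
int write_header_with_style_block(FILE *f)
{
    assert(f != NULL);
    if (fputs(webvtt_header, f) == EOF         /* "WEBVTT" line           */
        || fputs("\r\n", f) == EOF             /* empty line after header */
        || fputs("STYLE\r\n", f) == EOF        /* style block opener      */
        || fputs(webvtt_inline_css, f) == EOF  /* stylesheet              */
        || fputs("\r\n", f) == EOF             /* ends the last CSS line  */
        || fputs("\r\n", f) == EOF)            /* empty line before cues  */
        return -1;
    return 0;
}
```

Keeping the framing in the writer means the CSS constant stays plain CSS, at the cost of bare CRLF writes whose purpose a reader has to infer from the comments.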

@Izaron commented on GitHub (Feb 10, 2017):

@barun511 Since we have to write "\r\n" in any case, I think it's better to put it in the "const char*", so that another developer editing the webvtt header function later won't wonder why there is an extra CRLF.
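Following that suggestion, a sketch of the first option, where the constants themselves carry every line terminator and `webvtt_inline_css` includes its own STYLE block, so the writer contains no bare CRLFs. The stylesheet text is a placeholder and `write_header_from_constants` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Option A sketch: every line terminator lives in the constants. */
static const char *webvtt_header = "WEBVTT\r\n"  /* signature line      */
                                   "\r\n";       /* empty line after it */

/* Placeholder stylesheet wrapped in its own STYLE block; the real
 * webvtt_inline_css in ccextractor has different rules. */
static const char *webvtt_inline_css = "STYLE\r\n"
                                       "::cue(.green) { color: lime; }\r\n"
                                       "\r\n";   /* empty line ends block */

/* Hypothetical writer; returns 0 on success, -1 on a write error. */
int write_header_from_constants(FILE *f)
{
    assert(f != NULL);
    if (fputs(webvtt_header, f) == EOF || fputs(webvtt_inline_css, f) == EOF)
        return -1;
    return 0;
}
```

This writes "WEBVTT", an empty line, "STYLE", the stylesheet line and an empty line, which is byte-for-byte what framing the stylesheet with explicit writes would produce; the two options differ only in where a maintainer finds the CRLFs.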

@barun511 commented on GitHub (Feb 16, 2017):

#686 should fix this.

@cfsmp3 commented on GitHub (Mar 6, 2017):

@atrottmann can you confirm? Is this fixed in current master?

@atrottmann commented on GitHub (Mar 7, 2017):

If I correctly understand the WEBVTT file structure (as defined at https://w3c.github.io/webvtt/#file-structure ), it still lacks an empty line after STYLE:

Currently I get:

```
WEBVTT

STYLE
00:00:06.440 --> 00:00:25.240
*
```

so the "STYLE" is immediately followed by the timing of the first cue. It should be separated from that by an empty line.

@barun511 commented on GitHub (Mar 9, 2017):

Also:
A WebVTT style block consists of the following components, in the given order:

  1. The string "STYLE" (U+0053 LATIN CAPITAL LETTER S, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+004C LATIN CAPITAL LETTER L, U+0045 LATIN CAPITAL LETTER E).
  2. Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  3. A WebVTT line terminator.

And:
A WebVTT line terminator consists of one of the following:

A U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair.
A single U+000A LINE FEED (LF) character.

So I interpreted it as follows:

STYLE(the string required) \n (the line terminator)
00:00:07.550 --> 00:00:21.114 (beginning of first cue)

https://www.matroska.org/technical/specs/subtitles/webvtt.html

That seems to agree with my interpretation, or am I doing something wrong?
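The three quoted components can be matched with a minimal C sketch. `style_opener_len` is a hypothetical helper that, like the quoted definition, accepts only CRLF and LF as terminators. On its own it accepts "STYLE\n" directly followed by a cue timing line, which is why this reading looks right as long as only components 1 to 3 are considered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the line terminator at s: 2 for CRLF, 1 for LF, else 0
 * (the two forms in the definition quoted above). */
static size_t line_terminator_len(const char *s)
{
    if (s[0] == '\r' && s[1] == '\n')
        return 2;
    return s[0] == '\n' ? 1 : 0;
}

/* Components 1-3 of a style block: "STYLE", zero or more spaces or tabs,
 * then a line terminator. Returns the number of bytes matched, or 0 if the
 * text does not start with a valid STYLE line. */
size_t style_opener_len(const char *s)
{
    assert(s != NULL);
    if (strncmp(s, "STYLE", 5) != 0)
        return 0;
    size_t i = 5;
    while (s[i] == ' ' || s[i] == '\t')
        i++;
    size_t n = line_terminator_len(s + i);
    return n == 0 ? 0 : i + n;
}
```

The 0.85 "Style:\n" line fails even this check, since the keyword must be uppercase "STYLE" with nothing but spaces or tabs before the terminator.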

@atrottmann commented on GitHub (Mar 9, 2017):

You have quoted points 1. to 3. of the definition of the WebVTT style block, which indeed say "STYLE(the string required) \n (the line terminator)".

However, after those you also need points 5. and 6.:

  5. Any sequence of zero or more characters other than U+000A LINE FEED (LF) characters and U+000D CARRIAGE RETURN (CR) characters, each optionally separated from the next by a WebVTT line terminator, except that the entire resulting string must not contain the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN). The string represents a CSS stylesheet; the requirements given in the relevant CSS specifications apply. [CSS21]
  6. A WebVTT line terminator.

In all of the files ccextractor currently generates, point 5 is actually "zero characters". Point 6 then specifies another line terminator.

So:

STYLE\n
\n
00:00:00.000 --> ... (first cue)

In the matroska.org link you provided, I always see an empty line after the CSS stylesheet and also an empty line before the first cue.
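Taking all of the components into account, a rough check walks the lines after the STYLE line and requires an empty line before anything containing "-->". This is a sketch: `style_block_is_closed` is a hypothetical name, it treats the first empty line as the end of the block (matching the layout of the matroska.org examples described above), and it handles only CRLF and LF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the line terminator at s: 2 for CRLF, 1 for LF, else 0. */
static size_t line_terminator_len(const char *s)
{
    if (s[0] == '\r' && s[1] == '\n')
        return 2;
    return s[0] == '\n' ? 1 : 0;
}

/* s points at a "STYLE" line. Returns true once an empty line closes the
 * block, false if a line containing "-->" (a cue timing) or the end of the
 * text comes first. */
bool style_block_is_closed(const char *s)
{
    assert(s != NULL);
    if (strncmp(s, "STYLE", 5) != 0)
        return false;

    const char *p = s + 5;
    while (*p == ' ' || *p == '\t')
        p++;
    size_t n = line_terminator_len(p);
    if (n == 0)
        return false;                 /* malformed STYLE line */
    p += n;

    for (;;) {
        const char *eol = p;          /* find the end of the current line */
        while (*eol != '\0' && *eol != '\r' && *eol != '\n')
            eol++;
        n = line_terminator_len(eol);
        if (eol == p && n > 0)
            return true;              /* empty line ends the style block */
        for (const char *q = p; eol - q >= 3; q++)
            if (q[0] == '-' && q[1] == '-' && q[2] == '>')
                return false;         /* reached a cue before an empty line */
        if (n == 0)
            return false;             /* text ended inside the block */
        p = eol + n;
    }
}
```

The Mar 7 output ("STYLE\n00:00:06.440 --> ...") fails this check; adding one more line terminator after the STYLE line, so that point 5 is empty and point 6 supplies the blank line, makes it pass.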

@barun511 commented on GitHub (Mar 17, 2017):

Right, gotcha! I'll add the missing line terminator.

Reference: starred/ccextractor#273