mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-04 05:44:53 +00:00
Weird "Style:" entry in webvtt_header #273
Originally created by @atrottmann on GitHub (Feb 8, 2017).
In ccextractor 0.85 in src/lib_ccx/ccx_encoders_common.c webvtt_header was changed from "WEBVTT\r\n" to "WEBVTT\r\nStyle:\n"
Additionally, in the function write_subtitle_file_header in src/lib_ccx/ccx_encoders_common.c of ccextractor 0.84 you write a CRLF after webvtt_header, while in 0.85, there is no such extra CRLF.
If I correctly understand the WEBVTT specs, the way it's done in 0.84 results in a valid file starting with WEBVTT\r\n\r\n followed by cue timings and subtitle lines.
In 0.85 I get a file starting with WEBVTT\r\nStyle:\n immediately followed by the first cue, which appears invalid to me:
Between the WEBVTT header and a STYLE block there should be an empty line
The STYLE block should be introduced using STYLE\r\n and not Style:\n
The STYLE block should be ended with an empty line too
For now, I removed the "Style:\n" and added the \r\n, which appears to make it so ccextractor creates valid WEBVTT files again.
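A minimal sketch of the workaround described above, assuming the header constant stays as "WEBVTT\r\n" and the empty line is written separately. The function name write_header_with_explicit_crlf is hypothetical and is not the actual code in ccx_encoders_common.c.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the constant discussed in this issue (0.84 value). */
static const char *webvtt_header = "WEBVTT\r\n";

/* Hypothetical helper: writes the signature line, then the empty line
 * that separates the header from the first cue.
 * Returns 0 on success, -1 if a write fails. */
static int write_header_with_explicit_crlf(FILE *f)
{
    assert(f != NULL);
    if (fputs(webvtt_header, f) == EOF)
        return -1;
    if (fputs("\r\n", f) == EOF)   /* the extra CRLF that 0.85 dropped */
        return -1;
    return 0;
}
```

With this layout the output starts with WEBVTT\r\n\r\n, which is the form reported above as valid.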
@cfsmp3 commented on GitHub (Feb 9, 2017):
I think you're correct on most or all counts. We'll correct it (assigning to @Izaron since he made that specific style change), but in the meantime, what software is involved on your end? Is it a player or something else?
@cfsmp3 commented on GitHub (Feb 9, 2017):
GSoC qualification: 1 point, since it's a trivial thing.
@barun511 commented on GitHub (Feb 9, 2017):
If I understand correctly,
static const char *webvtt_header = "WEBVTT\r\n"
"Style:\n";
Simply needs to be changed to
static const char *webvtt_header = "WEBVTT\r\n\r\n";
to comply with the correct WEBVTT standards?
If so, I'll prepare a PR
@Izaron commented on GitHub (Feb 9, 2017):
I improved the WebVTT encoder, but I was not able to test whether it is fully correct: I simply can't play a subtitle file in a browser on an HTML page.
Of course, anyone can do it (right now I'm a bit busy with learning) if they are able to test correctness (with colored text, bold text and so on).
@barun511 commented on GitHub (Feb 9, 2017):
My previous comment was incorrect: expected behaviour is
"WEBVTT\r\n\r\nSTYLE\r\n" to be the webvtt header
and also add a "\n" after the STYLE block.
Can somebody confirm if this is correct?
@Izaron commented on GitHub (Feb 9, 2017):
@barun511 if you want to earn 1 point, your work also includes finding out exactly whether this is correct or not :) Not hard, I think.
@barun511 commented on GitHub (Feb 9, 2017):
Good point :)
@atrottmann commented on GitHub (Feb 9, 2017):
@cfsmp3: I use ccextractor to generate VTT files from TV recordings (mpeg-ts, using teletext subtitles), and play them back in an HTML5-based player.
It works extremely well with the subtitles created by ccextractor using any major browser (Firefox/Chrome/Safari/IE/Edge, also on iOS and Android).
A minimal test/demo can be found here: http://guardian.werft22.net/public/html5/player.html
For it to work on IE, the web server has to be configured to send the MIME type "text/vtt" when requesting the VTT files.
@atrottmann commented on GitHub (Feb 9, 2017):
I'm quite sure there are multiple ways of making it work correctly.
I have already submitted a similar issue: https://github.com/CCExtractor/ccextractor/issues/439
Then, the fix was to leave:
static const char *webvtt_header = "WEBVTT\r\n";
and to write another \r\n explicitly after writing the header.
@barun511 commented on GitHub (Feb 10, 2017):
So I did some reading up. We need to initialize the header with WEBVTT, and then a \r\n to finish the header (according to https://w3c.github.io/webvtt/#file-structure). We aren't putting anything in the header, but just in case we do, I imagine we would explicitly write the "\r\n" in write_subtitle_file_header in ccx_encoders_common.c, as mentioned in #439.
As for the STYLE: since we now have internal CSS, we need the STYLE block. So I was thinking we could add that to the inline CSS file (essentially modify static const char *webvtt_inline_css to add a STYLE block with appropriate \r\n before and after the block). Alternatively, we could explicitly write a STYLE\r\n before fprintf(f, webvtt_inline_css); and an explicit CRLF after.
Which design decision should we prefer?
@Izaron commented on GitHub (Feb 10, 2017):
@barun511 Since we have to write "\r\n" in any case, I think it's better to put it in the "const char*", so that another developer editing the webvtt header function won't wonder why we have a CRLF there.
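The option preferred here, with every line terminator kept inside the string constants, might look roughly like the sketch below. The constant values are illustrative only: the CSS rule is a placeholder, and write_header_from_constants is a hypothetical name, not the function in ccx_encoders_common.c.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative values, not the real constants from ccextractor. */
static const char *webvtt_header =
    "WEBVTT\r\n"
    "\r\n";                          /* empty line that ends the header */

static const char *webvtt_inline_css =
    "STYLE\r\n"
    "::cue { color: white; }\r\n"    /* placeholder CSS rule (assumption) */
    "\r\n";                          /* empty line that ends the STYLE block */

/* Hypothetical helper: the terminators live in the constants, so the
 * writer needs no extra CRLF logic. fputs is used rather than
 * fprintf(f, css) so that a '%' in the CSS is never read as a format
 * specifier. Returns 0 on success, -1 if a write fails. */
static int write_header_from_constants(FILE *f)
{
    assert(f != NULL);
    if (fputs(webvtt_header, f) == EOF)
        return -1;
    if (fputs(webvtt_inline_css, f) == EOF)
        return -1;
    return 0;
}
```

The trade-off is the one raised in this comment: the layout is visible in one place, at the cost of the constants carrying formatting as well as content.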
@barun511 commented on GitHub (Feb 16, 2017):
#686 should fix this.
@cfsmp3 commented on GitHub (Mar 6, 2017):
@atrottmann can you confirm? Is this fixed in current master?
@atrottmann commented on GitHub (Mar 7, 2017):
If I correctly understand the WEBVTT file structure (as defined at https://w3c.github.io/webvtt/#file-structure ), it still lacks an empty line after STYLE:
Currently I get:
so the "STYLE" is immediately followed by the timing of the first cue. It should be separated from that by an empty line.
@barun511 commented on GitHub (Mar 9, 2017):
Also:
A WebVTT style block consists of the following components, in the given order:
The string "STYLE" (U+0053 LATIN CAPITAL LETTER S, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+004C LATIN CAPITAL LETTER L, U+0045 LATIN CAPITAL LETTER E).
Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
A WebVTT line terminator.
And
A WebVTT line terminator consists of one of the following:
A U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair.
A single U+000A LINE FEED (LF) character.
So I interpreted it as follows:
STYLE(the string required) \n (the line terminator)
00:00:07.550 --> 00:00:21.114 (beginning of first cue)
https://www.matroska.org/technical/specs/subtitles/webvtt.html
That seems to agree with my interpretation, or am I doing something wrong?
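The line terminator definition quoted above can be expressed as a small matcher. This is only a sketch covering the two forms listed in the quote (CRLF and a single LF); the function name webvtt_line_terminator_len is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the length in bytes of the line
 * terminator at the start of s, or 0 if s does not start with one.
 * Only the two forms quoted in this thread are recognized. */
static size_t webvtt_line_terminator_len(const char *s)
{
    assert(s != NULL);
    if (s[0] == '\r' && s[1] == '\n')
        return 2;                    /* CRLF pair */
    if (s[0] == '\n')
        return 1;                    /* single LF */
    return 0;                        /* anything else, including "" */
}
```

A writer can emit either form, which is why both "STYLE\n" and "STYLE\r\n" satisfy the first three components of the style block.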
@atrottmann commented on GitHub (Mar 9, 2017):
You have quoted points 1. to 3. of the definition of the WebVTT style block, which indeed say "STYLE(the string required) \n (the line terminator)".
However, afterwards, you also need points 4. and 5.:
In the current case of all of the files generated by ccextractor, point 5 is actually "zero characters". Point 6 then specifies another line terminator.
So:
STYLE\n
\n
00:00:00.000 --> ... (first cue)
In the matroska.org link you provided, I always see an empty line after the CSS stylesheet and also an empty line before the first cue.
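The difference between the two layouts discussed here can be checked mechanically. The sketch below is a hypothetical helper, not part of ccextractor: given text that begins at the STYLE line, it reports whether an empty line is reached before any line containing "-->".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical checker. `block` must point at the start of the STYLE line.
 * Returns 1 if an empty line closes the block before a cue timing line
 * appears, 0 otherwise (including when the text ends first). */
static int style_block_is_closed(const char *block)
{
    assert(block != NULL);
    if (strncmp(block, "STYLE", 5) != 0)
        return 0;

    /* Skip optional spaces/tabs and the terminator of the STYLE line. */
    const char *p = block + 5;
    while (*p == ' ' || *p == '\t')
        p++;
    if (p[0] == '\r' && p[1] == '\n')
        p += 2;
    else if (p[0] == '\n')
        p += 1;
    else
        return 0;

    /* Walk the following lines one at a time. */
    while (*p != '\0') {
        size_t len = strcspn(p, "\r\n");     /* length of the current line */
        if (len == 0)
            return 1;                         /* empty line: block is closed */

        const char *arrow = strstr(p, "-->");
        if (arrow != NULL && arrow < p + len)
            return 0;                         /* cue timing inside the block */

        p += len;                             /* step over the line terminator */
        if (p[0] == '\r' && p[1] == '\n')
            p += 2;
        else if (p[0] != '\0')
            p += 1;
    }
    return 0;
}
```

Under these assumptions, "STYLE\n" followed directly by a cue timing is rejected, while "STYLE\n\n" followed by the cue is accepted.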
@barun511 commented on GitHub (Mar 17, 2017):
Right, gotcha! I'll add in the next line terminator