[PROPOSAL] Add WebVTT output from Matroska #298

Closed
opened 2026-01-29 16:40:13 +00:00 by claunia · 2 comments

Originally created by @Izaron on GitHub (Mar 20, 2017).

Please prefix your issue with one of the following: [BUG], [PROPOSAL], [QUESTION].

CCExtractor version (using the --version parameter preferably): 0.85

In raising this issue, I confirm the following (please check boxes, e.g. [X]):

  • [X] I have read and understood the contributors guide (https://github.com/CCExtractor/ccextractor/blob/master/.github/CONTRIBUTING.md).
  • [X] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [X] I have checked that the issue I'm posting isn't already reported.
  • [X] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues.
  • [X] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • [X] I have used the latest available version of CCExtractor to verify this issue exists.

My familiarity with the project is as follows (check one, e.g. [X]):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [ ] I absolutely love CCExtractor, but have not contributed previously.
  • [X] I am an active contributor to CCExtractor.

Necessary information

  • Is this a regression (did it work before)? [X] NO | [ ] YES - please specify the last known working version
  • What platform did you use? [X] Windows - [X] Linux - [ ] Mac
  • What were the arguments used? ccextractor test_t.mkv

Video links
test_t.mkv (https://drive.google.com/file/d/0B2fcX80_rHuOM0N4VlJNRWVVYVU/view?usp=sharing)

Additional information

The sample video (test_t.mkv) contains one WebVTT subtitle track named "Some subs lol" with language "nob", plus many SRT tracks.
The task is to add WebVTT extraction. To simplify debugging, you can also run ccextractor test_t.mkv -mkvlang nob.

Source WebVTT subs: link (https://paste.fedoraproject.org/paste/8k8QTPpcKeT5jtrYx3f6zV5M1UNdIGYhyRLivL9gydE=/raw)

The header is already printed: test_t_nob.vtt currently contains the subtitle file header.
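Besides the header, each cue needs its own timing line. A minimal sketch of that formatting follows. It assumes, as we read the Matroska WebVTT mapping, that a Block's timestamp is the cue start and its BlockDuration is the cue length, and that both have already been converted to milliseconds. The functions vtt_format_timestamp and vtt_format_cue are hypothetical names, not existing CCExtractor functions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a time in milliseconds as a WebVTT timestamp "HH:MM:SS.mmm".
 * Hours get at least two digits (WebVTT allows more). */
static int vtt_format_timestamp(char *buf, size_t size, uint64_t ms)
{
	return snprintf(buf, size, "%02" PRIu64 ":%02u:%02u.%03u",
	                ms / 3600000,
	                (unsigned)(ms / 60000 % 60),
	                (unsigned)(ms / 1000 % 60),
	                (unsigned)(ms % 1000));
}

/* Build one cue block into buf:
 *   [identifier\n]
 *   START --> END[ settings]\n
 *   text\n
 *   \n
 * identifier and settings may be NULL or empty. Returns what snprintf
 * returns (a value >= size means the output was truncated). */
static int vtt_format_cue(char *buf, size_t size, const char *identifier,
                          uint64_t start_ms, uint64_t duration_ms,
                          const char *settings, const char *text)
{
	char start[32], end[32];
	vtt_format_timestamp(start, sizeof start, start_ms);
	vtt_format_timestamp(end, sizeof end, start_ms + duration_ms);

	int has_id = identifier != NULL && identifier[0] != '\0';
	int has_settings = settings != NULL && settings[0] != '\0';

	return snprintf(buf, size, "%s%s%s --> %s%s%s\n%s\n\n",
	                has_id ? identifier : "", has_id ? "\n" : "",
	                start, end,
	                has_settings ? " " : "", has_settings ? settings : "",
	                text);
}
```

Building the cue into a buffer keeps the formatting separate from file output, so the caller can write it with whatever output path the SRT code already uses.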
Step-by-step:

  1. Read this page: https://www.matroska.org/technical/specs/subtitles/webvtt.html
  2. Add printing of the timing and the text of each sentence (currently only SRT and ASS/SSA are supported) here: https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/matroska.c#L782
  3. Extract the BlockAdditions content (see the spec at https://www.matroska.org/technical/specs/subtitles/webvtt.html). Rewrite the code at https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/matroska.c#L309, which skips this content in current master, so that it saves the content in the track's structure. For the element layout, read https://www.matroska.org/technical/specs/index.html (search for BlockAdditions; the part you need is BlockAdditional).
  4. Print the BlockAdditions content so that the output reproduces the source WebVTT file.
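For steps 3 and 4, our reading of the linked WebVTT page is that a BlockAdditional payload holds three parts: the cue settings list on the first line, the cue identifier on the second line, and any comment blocks that preceded the cue in the source file after that. Either line may be empty, and a Block without BlockAdditions means all three are absent. The sketch below splits such a payload, assuming it has already been read into memory. struct vtt_block_addition and its fields are hypothetical names, not CCExtractor's.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the three parts of a WebVTT BlockAdditional
 * payload; the name and fields are illustrative, not CCExtractor's. */
struct vtt_block_addition {
	char *cue_settings;   /* first line, may be empty */
	char *cue_identifier; /* second line, may be empty */
	char *comments;       /* everything after the second line, may be empty */
};

/* Copy n bytes of src into a newly allocated, NUL-terminated string. */
static char *copy_range(const char *src, size_t n)
{
	char *s = malloc(n + 1);
	if (s == NULL)
		return NULL;
	if (n > 0)
		memcpy(s, src, n);
	s[n] = '\0';
	return s;
}

/* Copy the next line (without its '\n') and advance *data / *len past it.
 * If there is no '\n', the rest of the buffer is taken as the line. */
static char *take_line(const char **data, size_t *len)
{
	const char *nl = *len > 0 ? memchr(*data, '\n', *len) : NULL;
	size_t line_len = nl != NULL ? (size_t)(nl - *data) : *len;
	size_t consumed = nl != NULL ? line_len + 1 : line_len;
	char *line = copy_range(*data, line_len);
	*data += consumed;
	*len -= consumed;
	return line;
}

static void vtt_free_block_addition(struct vtt_block_addition *a)
{
	free(a->cue_settings);
	free(a->cue_identifier);
	free(a->comments);
	a->cue_settings = a->cue_identifier = a->comments = NULL;
}

/* Split a BlockAdditional payload (not NUL-terminated) into its parts.
 * Returns 0 on success, -1 if an allocation failed. */
static int vtt_parse_block_additional(const char *data, size_t len,
                                      struct vtt_block_addition *out)
{
	out->cue_settings = take_line(&data, &len);
	out->cue_identifier = take_line(&data, &len);
	out->comments = copy_range(data, len);
	if (out->cue_settings == NULL || out->cue_identifier == NULL ||
	    out->comments == NULL) {
		vtt_free_block_addition(out);
		return -1;
	}
	return 0;
}
```

When printing, the comment blocks go before the cue, the identifier goes on the line above the timing line, and the settings follow the end timestamp on the timing line.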

Tips: use the mkvinfo program to see the byte positions of structures, and call get_current_byte(FILE* file) while debugging to get the current byte position.
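Walking to BlockAdditional by hand means decoding EBML headers. Every Matroska element is an ID followed by a data size. Both are EBML variable-length integers whose byte length is the number of leading zero bits in the first byte plus one; IDs keep that length-marker bit, sizes drop it. The sketch below works on an in-memory buffer rather than the FILE* used in matroska.c. The element IDs come from the Matroska element specification, while ebml_read_vint and ebml_find_child are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Element IDs from the Matroska specification (marker bits included). */
#define EBML_ID_BLOCK_ADDITIONS  0x75A1
#define EBML_ID_BLOCK_MORE       0xA6
#define EBML_ID_BLOCK_ADD_ID     0xEE
#define EBML_ID_BLOCK_ADDITIONAL 0xA5

/* Read one EBML variable-length integer from buf.
 * keep_marker != 0 keeps the length-marker bit (element IDs);
 * keep_marker == 0 strips it (element data sizes).
 * Returns the number of bytes consumed (1..8), or 0 if the data is invalid
 * or truncated. Unknown-size elements (all value bits set) are not
 * treated specially here. */
static size_t ebml_read_vint(const uint8_t *buf, size_t len, int keep_marker,
                             uint64_t *value)
{
	if (len == 0 || buf[0] == 0)
		return 0;
	size_t width = 1;
	unsigned mask = 0x80;
	while ((buf[0] & mask) == 0) { /* count leading zero bits */
		mask >>= 1;
		width++;
	}
	if (width > len)
		return 0;
	uint64_t v = keep_marker ? buf[0] : (buf[0] & (mask - 1));
	for (size_t i = 1; i < width; i++)
		v = (v << 8) | buf[i];
	*value = v;
	return width;
}

/* Scan the children of a master element (buf = its data, without its own
 * ID and size) and return the data of the first child with ID wanted_id.
 * Returns 0 if found, -1 if not found or malformed. */
static int ebml_find_child(const uint8_t *buf, size_t len, uint64_t wanted_id,
                           const uint8_t **child, size_t *child_len)
{
	size_t pos = 0;
	while (pos < len) {
		uint64_t id, size;
		size_t n = ebml_read_vint(buf + pos, len - pos, 1, &id);
		if (n == 0)
			return -1;
		pos += n;
		n = ebml_read_vint(buf + pos, len - pos, 0, &size);
		if (n == 0)
			return -1;
		pos += n;
		if (size > len - pos) /* child would run past its parent */
			return -1;
		if (id == wanted_id) {
			*child = buf + pos;
			*child_len = (size_t)size;
			return 0;
		}
		pos += (size_t)size;
	}
	return -1;
}
```

A caller would first look up EBML_ID_BLOCK_MORE inside the BlockAdditions data, then EBML_ID_BLOCK_ADDITIONAL inside the BlockMore data, optionally checking BlockAddID (EBML_ID_BLOCK_ADD_ID) along the way.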


@Izaron commented on GitHub (Mar 20, 2017):

GSoC: This issue gives 5 qualification points (for a good PR, because it's really hard for newcomers).


@LucasYoung commented on GitHub (Mar 28, 2017):

Hi, I've started working on this proposal. It appears that the link to the source of the WebVTT subs for the test is broken.
