Make some improvements to the Matroska decoder #291

Closed
opened 2026-01-29 16:40:04 +00:00 by claunia · 7 comments
Owner

Originally created by @Izaron on GitHub (Mar 3, 2017).

Some fairly easy tasks for getting familiar with the CCExtractor code.
The Matroska decoder now supports the SRT and ASS/SSA subtitle formats.
The main files are matroska.c and matroska.h.

  1. Support an unlimited number of tracks (subtitle files) and sentences (a sentence is one subtitle block). Currently both are capped by the constants MATROSKA_MAX_TRACKS and MATROSKA_MAX_SENTENCES. Reallocate memory as it is needed and free it afterwards.

  2. Add some foolproofing. Even the official Matroska specification warns that someone may use copy /b or cat a b > c to "merge" videos. If a file contains more than one Segment block (normal videos contain exactly one), you can get an error. Locate it yourself and fix it so that correct subtitles are extracted from all the Segment blocks.

  3. Change the generated names: remove ".mkv" (and other possible extensions) from "[name].mkv_[lang]_[index].[extension]", for example "video.mkv_fre_2.srt" -> "video_fre_2.srt".

  4. Do not allow newline characters at the start of a sentence (only newlines; extra spaces at the start and at the end are OK). For example:

3
00:01:41,720 --> 00:01:43,670



Thank you, brother.

Is bad because this sentence disappears in the VLC player, and

3
00:01:41,720 --> 00:01:43,670
Thank you, brother.

Is good. The same applies to ASS/SSA subtitles, but they have a special character for newline.
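For task 4, a minimal sketch of the stripping step might look like the following. Both function names are hypothetical (they are not existing CCExtractor functions), and the sketch assumes each sentence's text is available as a writable NUL-terminated string. For ASS/SSA it assumes the line break is the literal two-character sequence `\N` (or `\n`), which is how the ASS format writes breaks inside dialogue text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes '\n' and '\r' characters from the very start
 * of a NUL-terminated SRT sentence, in place. Leading spaces are left alone,
 * since the issue says they are OK. Returns the same pointer. */
char *strip_leading_newlines(char *text)
{
    if (text == NULL)
        return NULL;

    size_t skip = 0;
    while (text[skip] == '\n' || text[skip] == '\r')
        skip++;

    if (skip > 0)
        memmove(text, text + skip, strlen(text + skip) + 1); /* +1 copies the NUL */

    return text;
}

/* Hypothetical helper for ASS/SSA: removes leading literal "\N" / "\n"
 * sequences (a backslash followed by the letter N or n), in place. */
char *strip_leading_ass_breaks(char *text)
{
    if (text == NULL)
        return NULL;

    size_t skip = 0;
    /* text[skip + 1] is safe to read: text[skip] is a backslash, not the NUL. */
    while (text[skip] == '\\' && (text[skip + 1] == 'N' || text[skip + 1] == 'n'))
        skip += 2;

    if (skip > 0)
        memmove(text, text + skip, strlen(text + skip) + 1);

    return text;
}
```

Only the start of the sentence is touched, so line breaks inside the text survive.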

Specs: https://www.matroska.org/technical/specs/index.html (although they are not useful for these tasks)
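For task 3, the renaming could be sketched roughly as below. `build_subtitle_filename` and its parameters are assumptions made for illustration, not the actual code in matroska.c; the sketch only strips the last extension of the input name and leaves directory names containing dots untouched.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<input without extension>_<lang>_<index>.<ext>"
 * into out. Returns 0 on success, -1 if the result did not fit in out_size. */
int build_subtitle_filename(char *out, size_t out_size,
                            const char *input, const char *lang,
                            int index, const char *ext)
{
    size_t stem_len = strlen(input);

    /* Find the last path separator ('/' or '\\') to locate the base name. */
    const char *sep = strrchr(input, '/');
    const char *bsep = strrchr(input, '\\');
    if (bsep != NULL && (sep == NULL || bsep > sep))
        sep = bsep;
    const char *base = (sep != NULL) ? sep + 1 : input;

    /* Treat the last dot as an extension only if it lies inside the base
     * name and is not its first character (so ".hidden" keeps its name). */
    const char *dot = strrchr(input, '.');
    if (dot != NULL && dot > base)
        stem_len = (size_t)(dot - input);

    int n = snprintf(out, out_size, "%.*s_%s_%d.%s",
                     (int)stem_len, input, lang, index, ext);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Called with "video.mkv", "fre", 2 and "srt", this produces "video_fre_2.srt", matching the example in the task.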

UPD: Please keep all of this in a single pull request (with multiple commits, of course) to get full points.
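For task 1, one common way to drop a fixed limit such as MATROSKA_MAX_SENTENCES is a growable array that doubles its capacity with realloc. The sketch below shows that pattern only; `struct sentence`, `struct sentence_list` and both functions are hypothetical and do not mirror the real structures in matroska.h. It assumes the list owns each sentence's text.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified sentence type; text is heap-allocated. */
struct sentence {
    char *text;
};

/* Hypothetical growable list replacing a fixed-size array. */
struct sentence_list {
    struct sentence *items;
    size_t count;
    size_t capacity;
};

/* Appends s, growing the array when it is full.
 * Returns 0 on success, -1 on allocation failure (the list stays valid). */
int sentence_list_push(struct sentence_list *list, struct sentence s)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 16;
        if (new_cap > SIZE_MAX / sizeof(struct sentence))
            return -1; /* size computation would overflow */

        /* Keep the old pointer until realloc succeeds, to avoid a leak. */
        struct sentence *grown = realloc(list->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;

        list->items = grown;
        list->capacity = new_cap;
    }
    list->items[list->count++] = s;
    return 0;
}

/* Frees every sentence text and the array itself, then resets the list. */
void sentence_list_free(struct sentence_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i].text);
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

Doubling keeps the number of reallocations logarithmic in the number of sentences; the same pattern would apply to the list of tracks.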

Author
Owner

@Izaron commented on GitHub (Mar 3, 2017):

GSoC: This issue gives 3 qualification points.

There is also 1 additional qualification point for creating a new option "-mkvlang [lang]" (see options.c) that extracts subtitles only in the language the user wants.
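The filtering side of such an option could be sketched as below. `mkv_track_matches_lang` is a hypothetical helper, not something that exists in options.c or matroska.c. It assumes the track language is stored as a short code string such as "fre", and that a track without a Language element counts as "eng", which is the default the Matroska specification gives for that element.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a track should be extracted, 0 otherwise.
 * wanted is the value passed to -mkvlang, or NULL when the option is absent. */
int mkv_track_matches_lang(const char *track_lang, const char *wanted)
{
    if (wanted == NULL)
        return 1;               /* no filter requested: keep every track */
    if (track_lang == NULL)
        track_lang = "eng";     /* assumed default for a missing Language element */

    /* Case-insensitive comparison written by hand to stay within standard C. */
    while (*track_lang != '\0' && *wanted != '\0') {
        if (tolower((unsigned char)*track_lang) != tolower((unsigned char)*wanted))
            return 0;
        track_lang++;
        wanted++;
    }
    return *track_lang == '\0' && *wanted == '\0';
}
```

The decoder would then skip any subtitle track for which this returns 0.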

Author
Owner

@kapilkd13 commented on GitHub (Mar 5, 2017):

Hi @Izaron I would like to work on this issue

Author
Owner

@Izaron commented on GitHub (Mar 5, 2017):

Hi @kapilkd13, you're welcome! That is good. Also, you are not required to announce that you are working on an issue. The first coder to send a good PR will get the points.

Author
Owner

@Izaron commented on GitHub (Mar 5, 2017):

Samples:
https://drive.google.com/file/d/0B2fcX80_rHuOOEM3RjhOTXlqbEk/view?usp=sharing (30mb)
https://drive.google.com/file/d/0B2fcX80_rHuOZnU3ZFU3VmY2dTQ/view?usp=sharing (~1gb)
http://trailers.divx.com/divx_prod/divx_plus_hd_showcase/Sintel_DivXPlus_6500kbps.mkv (819mb)
Author
Owner

@kapilkd13 commented on GitHub (Mar 5, 2017):

Made PR https://github.com/CCExtractor/ccextractor/pull/709 for point 3

Author
Owner

@cfsmp3 commented on GitHub (Mar 6, 2017):

@kapilkd13 don't rush it - try producing a high quality PR instead. Points go to the student whose PR gets merged, so there's no point in half-assing it :-) Everyone will need to take their time and do a good job.

Author
Owner

@Izaron commented on GitHub (Mar 16, 2017):

@Diptanshu8 got 4 points for PR #722


Reference: starred/ccextractor#291