Mirror of https://github.com/CCExtractor/ccextractor.git (synced 2026-02-03 21:23:48 +00:00)
[REQUEST] Add support for 608 tracks inside MKV files #481
Originally created by @cfsmp3 on GitHub (Jan 29, 2019).
CCExtractor supports MKV files, and it obviously supports EIA-608.
However, for MKV it only supports .srt subtitles, while EIA-608 must come from, for example, TS files or other kinds of MPEG files.
So while we support both MKV and EIA-608, we can't extract EIA-608 subtitles inside MKV files. This would be really nice to have now, since there are some files around that contain them.
Two such examples:
https://drive.google.com/drive/folders/1A0qPbE8hZdf-dqrDDyVe_AZRo31NLnEa?usp=sharing
@thelastpolaris commented on GitHub (Feb 25, 2019):
I would like to work on this issue.
@cfsmp3 commented on GitHub (Feb 26, 2019):
@thelastpolaris go for it, it's probably a good issue to get started with CCExtractor's codebase. Not trivial, but not too hard.
Good luck!
@thelastpolaris commented on GitHub (Feb 26, 2019):
@cfsmp3 thank you!
So I dived into the Matroska-related code and found out that it simply doesn't "see" the track with EIA-608 subtitles. It seems that MATROSKA_SEGMENT_TRACK_ENTRY, which is equal to 0xAE, doesn't work for the EIA-608 subtitle track. Meanwhile, I just opened your sample in VLC and it works pretty well: VLC is able to display the captions, and I am now looking at the VLC code.
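For readers new to EBML: every Matroska element starts with a variable-length ID followed by a variable-length data size, and the number of leading zero bits in the first byte gives each field's length. That is how an ID such as 0xAE (TrackEntry) is recognized while walking the file. Below is a minimal sketch in C; only MATROSKA_SEGMENT_TRACK_ENTRY and its value come from the thread, while the helper names are hypothetical and not taken from CCExtractor's matroska.c.

```c
#include <stddef.h>
#include <stdint.h>

/* EBML ID of a Matroska TrackEntry element (value quoted in the thread). */
#define MATROSKA_SEGMENT_TRACK_ENTRY 0xAE

/* Hypothetical helper: length in bytes (1..8) of the EBML variable-length
   integer whose first byte is `first`, or 0 if `first` is 0x00 (invalid). */
static int ebml_vint_length(uint8_t first)
{
    for (int len = 1; len <= 8; len++)
        if (first & (0x80 >> (len - 1)))
            return len;
    return 0;
}

/* Read an element ID. EBML keeps the length-marker bit as part of the ID,
   so 0xAE stays 0xAE and 0x1A45DFA3 stays 0x1A45DFA3. IDs are at most 4 bytes. */
static int ebml_read_id(const uint8_t *buf, size_t avail, uint32_t *id)
{
    if (avail == 0) return 0;
    int len = ebml_vint_length(buf[0]);
    if (len == 0 || len > 4 || (size_t)len > avail) return 0;
    uint32_t v = 0;
    for (int i = 0; i < len; i++)
        v = (v << 8) | buf[i];
    *id = v;
    return len;
}

/* Read an element data size. Here the length-marker bit is cleared. */
static int ebml_read_size(const uint8_t *buf, size_t avail, uint64_t *size)
{
    if (avail == 0) return 0;
    int len = ebml_vint_length(buf[0]);
    if (len == 0 || (size_t)len > avail) return 0;
    uint64_t v = buf[0] & (0xFF >> len);   /* drop the marker bit */
    for (int i = 1; i < len; i++)
        v = (v << 8) | buf[i];
    *size = v;
    return len;
}

/* Parse one element header (ID + size). Returns bytes consumed, or 0 on error. */
static size_t ebml_read_header(const uint8_t *buf, size_t avail,
                               uint32_t *id, uint64_t *size)
{
    int id_len = ebml_read_id(buf, avail, id);
    if (id_len == 0) return 0;
    int size_len = ebml_read_size(buf + id_len, avail - (size_t)id_len, size);
    if (size_len == 0) return 0;
    return (size_t)(id_len + size_len);
}
```

For example, the two bytes `AE 85` decode to ID 0xAE with a 5-byte body, so a parser comparing the ID against MATROSKA_SEGMENT_TRACK_ENTRY would enter a TrackEntry there.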
Also, I found the following:
Does it mean that I should look into the video track for EIA-608 subtitles?
@cfsmp3 commented on GitHub (Feb 26, 2019):
Seems easy enough: if the EIA-608 data is in stream 2, then find out why CCExtractor is skipping that stream. What value is it looking for, and what value is actually there?
@thelastpolaris commented on GitHub (Feb 26, 2019):
That is exactly what I am thinking about. Thank you, working on it.
@thelastpolaris commented on GitHub (Feb 26, 2019):
Just a small notification:
I looked up the EBML specification for Matroska and played with different IDs, but it didn't work out: the EIA-608 subtitles are really part of the video stream. VLC simply presents them as a separate stream, while they are actually muxed into the video track. See the attached FFmpeg screenshot.
So my plan now is to see how I can extract the subtitles from the MKV video track using CCExtractor's utilities.
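Captions muxed into an H.264 video track are commonly carried, following ATSC A/53, in SEI messages of payload type 4 (user_data_registered_itu_t_t35). The payload starts with country code 0xB5, provider code 0x0031, the identifier "GA94" and user_data_type_code 0x03, followed by 3-byte cc_data triplets where cc_type 0 is CEA-608 field 1. Assuming these samples use that layout (the thread does not confirm it), a hypothetical helper that pulls field-1 CEA-608 byte pairs out of one such payload could look like this. It is a sketch, not CCExtractor's own implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* One CEA-608 byte pair with the odd-parity bit stripped. */
typedef struct {
    uint8_t b1, b2;
} cc608_pair;

/* Hypothetical helper: extract valid field-1 CEA-608 pairs from an ITU-T T.35
   SEI payload (payloadType 4) carrying ATSC A/53 "GA94" cc_data.
   Assumes emulation-prevention bytes (00 00 03) were already removed.
   Returns the number of pairs written to `out`, or -1 if the payload
   is not GA94 cc_data. */
static int extract_608_from_t35(const uint8_t *p, size_t n,
                                cc608_pair *out, int max_out)
{
    if (n < 10) return -1;
    if (p[0] != 0xB5) return -1;                   /* country code: USA */
    if (p[1] != 0x00 || p[2] != 0x31) return -1;   /* provider code: ATSC */
    if (p[3] != 'G' || p[4] != 'A' || p[5] != '9' || p[6] != '4')
        return -1;                                 /* user_identifier */
    if (p[7] != 0x03) return -1;                   /* user_data_type_code: cc_data */
    if (!(p[8] & 0x40)) return 0;                  /* process_cc_data_flag not set */

    int cc_count = p[8] & 0x1F;
    size_t pos = 10;                               /* p[9] is em_data, skipped */
    int written = 0;
    for (int i = 0; i < cc_count && pos + 3 <= n; i++, pos += 3) {
        int cc_valid = (p[pos] >> 2) & 1;
        int cc_type = p[pos] & 0x03;               /* 0 = 608 field 1, 1 = field 2 */
        if (cc_valid && cc_type == 0 && written < max_out) {
            out[written].b1 = p[pos + 1] & 0x7F;   /* strip parity bit */
            out[written].b2 = p[pos + 2] & 0x7F;
            written++;
        }
    }
    return written;
}
```

The resulting pairs are what a 608 decoder consumes; field-2 and DTVCC (708) triplets are ignored here only to keep the sketch short.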
@cfsmp3 commented on GitHub (Feb 26, 2019):
You are going to have lots of fun there :-)
In theory it shouldn't be too hard (but probably harder than the case in which the captions were in their own track). But well, we do have an H264 decoder in place, so you might be able to just feed it the data from the video stream. It could just work :-)
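One step that may be needed before such a decoder can consume the data: Matroska tracks with codec ID V_MPEG4/ISO/AVC store each frame as NAL units prefixed by a big-endian length (1, 2 or 4 bytes, as given by the avcC record in the track's CodecPrivate), whereas parsers written for TS input look for Annex B start codes. Assuming the existing decoder expects Annex B input, the conversion could be sketched as follows; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert one frame of length-prefixed H.264 NAL units
   (as stored in Matroska/MP4) into Annex B form with 4-byte start codes.
   len_size is the NAL length field size (1, 2 or 4) from the avcC record.
   Returns bytes written to `out`, or -1 on malformed input or lack of space. */
static long avcc_to_annexb(const uint8_t *in, size_t n, int len_size,
                           uint8_t *out, size_t out_cap)
{
    static const uint8_t start_code[4] = {0, 0, 0, 1};
    size_t ip = 0, op = 0;

    if (len_size != 1 && len_size != 2 && len_size != 4) return -1;
    while (ip < n) {
        if (n - ip < (size_t)len_size) return -1;     /* truncated length field */
        size_t nal_len = 0;
        for (int i = 0; i < len_size; i++)            /* big-endian length */
            nal_len = (nal_len << 8) | in[ip + i];
        ip += (size_t)len_size;
        if (nal_len > n - ip) return -1;              /* NAL runs past the frame */
        if (out_cap - op < 4 + nal_len) return -1;    /* output buffer too small */
        memcpy(out + op, start_code, 4);
        memcpy(out + op + 4, in + ip, nal_len);
        op += 4 + nal_len;
        ip += nal_len;
    }
    return (long)op;
}
```

In the converted output, the NAL unit type is the low five bits of the byte following each start code; type 6 (SEI) is where embedded caption data would be found.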
@thelastpolaris commented on GitHub (Feb 27, 2019):
@cfsmp3 Yeah, this is a puzzle that is interesting to solve :)
I'm just curious, maybe you could help me: in matroska.c, I understand that we use the parse_segment functions to walk through Matroska's EBML structure, but I would really like to understand where in this code I can pick up the video stream and feed it to the H264 decoder. I will try to study EBML, but any suggestions are welcome :)
@cfsmp3 commented on GitHub (Feb 27, 2019):
@thelastpolaris for Matroska's stuff the person you want to talk to is @Izaron , he implemented that part.
If he's too busy ping me in a couple days and I'll work with you.
@thelastpolaris commented on GitHub (Feb 27, 2019):
@cfsmp3 ok, thank you. Seems that I know this guy in person and he also speaks Russian :)
@cfsmp3 commented on GitHub (Feb 27, 2019):
He does and you're in good hands for sure, he's a genius!
@thelastpolaris commented on GitHub (Mar 3, 2019):
@cfsmp3 just a small update on the issue.
@Izaron greatly helped me. He suggested several tools for Matroska debugging (MKVToolNix, mkvextract), and now I understand much better how Matroska is structured. He also gave me some very useful advice, and I asked him a few more questions today.
Current state of the issue: I figured out how Matroska is structured, and the code in matroska.c makes it very easy to work with EBML-structured MKV video. However, I think the problem is that it might not be enough to just pass the video data from the MKV to the H264 decoder, as the video itself contains some EBML IDs. So I'm waiting for @Izaron's reply, and I want to try getting subtitles from each block of H264 video inside matroska.c, where they are currently simply skipped.
@cfsmp3 commented on GitHub (Mar 3, 2019):
@thelastpolaris Sounds good. It should be "easy" to do if @Izaron lends a hand. In any case, I'm around on Slack if you need my help. I didn't do the MKV part, but I did (most of) the 608 code, so I can probably help you get there.
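Regarding the plan described above of reading captions from each H.264 block in matroska.c: each SimpleBlock (ID 0xA3) or Block (ID 0xA1) body begins with the track number as an EBML variable-length integer, a signed 16-bit timecode relative to the Cluster, and a flags byte, so blocks belonging to the video track can be selected by comparing track numbers. A minimal sketch of parsing that header follows; the struct and function names are hypothetical, and laced blocks are rejected rather than handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields at the start of a (Simple)Block. */
typedef struct {
    uint64_t track_number;  /* matches TrackNumber in the TrackEntry */
    int16_t  rel_timecode;  /* relative to the enclosing Cluster's timecode */
    uint8_t  flags;         /* 0x80 = keyframe (SimpleBlock only), 0x06 = lacing */
} mkv_block_header;

/* Parse the header at the start of a SimpleBlock (0xA3) or Block (0xA1) body.
   Returns the offset of the frame data, or 0 on error or if the block
   uses lacing (not handled in this sketch). */
static size_t mkv_parse_block_header(const uint8_t *p, size_t n,
                                     mkv_block_header *h)
{
    if (n == 0) return 0;
    int len = 0;
    for (int i = 0; i < 8; i++)                 /* EBML vint length from marker bit */
        if (p[0] & (0x80 >> i)) { len = i + 1; break; }
    if (len == 0 || n < (size_t)len + 3) return 0;

    uint64_t track = p[0] & (0xFF >> len);      /* drop the marker bit */
    for (int i = 1; i < len; i++)
        track = (track << 8) | p[i];
    h->track_number = track;
    h->rel_timecode = (int16_t)((p[len] << 8) | p[len + 1]);  /* big-endian */
    h->flags = p[len + 2];

    if (h->flags & 0x06) return 0;              /* Xiph, EBML or fixed-size lacing */
    return (size_t)len + 3;
}
```

The bytes from the returned offset to the end of the block are the frame itself, which for an H.264 track is still in Matroska's length-prefixed NAL format.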