[REQUEST] Add support for 608 tracks inside MKV files #481

Closed
opened 2026-01-29 16:45:01 +00:00 by claunia · 13 comments

Originally created by @cfsmp3 on GitHub (Jan 29, 2019).

CCExtractor supports MKV files, and it obviously supports EIA-608.

However, for MKV it only supports .srt, while EIA-608 has to come from, for example, TS files or other kinds of MPEG files.

So while we support both MKV and EIA-608, we can't extract EIA-608 subtitles that are inside MKV files. This would be really nice to have now, since there are some files around that contain them.

Two such examples:
https://drive.google.com/drive/folders/1A0qPbE8hZdf-dqrDDyVe_AZRo31NLnEa?usp=sharing


@thelastpolaris commented on GitHub (Feb 25, 2019):

I would like to work on this issue.


@cfsmp3 commented on GitHub (Feb 26, 2019):

@thelastpolaris go for it, it's probably a good issue to get started with CCExtractor's codebase. Not trivial, but not too hard.

Good luck!


@thelastpolaris commented on GitHub (Feb 26, 2019):

@cfsmp3 thank you!

So I dove into the Matroska-related code and found out that it simply doesn't "see" the track with EIA-608 subtitles. It seems that `MATROSKA_SEGMENT_TRACK_ENTRY`, which is equal to `0xAE`, doesn't work for the EIA-608 subtitle track.

Meanwhile, I opened your sample in VLC and it works pretty well. VLC is able to display the captions, and I am looking at the VLC code now.

![screenshot from 2019-02-26 17-41-44](https://user-images.githubusercontent.com/8721751/53430035-01ff0280-39ee-11e9-9695-583002dbe61f.png)

I also found the following:

> EIA-608 and similar closed captioning standards, the captions are embedded directly in the video bitstream as user data. H.264 bitstreams are stored as a sequence of NAL (network abstraction layer) units. Each unit has a type; user data is stored in a NAL unit of the supplemental enhancement information (SEI) type.

Does that mean I should look in the video track for the EIA-608 subtitles?
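To make the quoted description concrete, here is a minimal Python sketch (not CCExtractor code) of how caption byte pairs could be pulled out of a single H.264 SEI NAL unit. It assumes the captions follow the ATSC A/53 layout (an ITU-T T.35 user-data message tagged `GA94`), which this thread does not confirm for the sample files. All function names are hypothetical.

```python
def strip_emulation_prevention(data: bytes) -> bytes:
    """Remove the 0x03 byte H.264 inserts after each 0x00 0x00 pair."""
    out = bytearray()
    zeros = 0
    for byte in data:
        if zeros >= 2 and byte == 0x03:
            zeros = 0
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def sei_messages(nal: bytes):
    """Yield (payload_type, payload) for each message in one SEI NAL unit."""
    if not nal or nal[0] & 0x1F != 6:  # nal_unit_type 6 = SEI
        return
    rbsp = strip_emulation_prevention(nal[1:])
    pos = 0
    while len(rbsp) - pos > 2:  # stop before the trailing-bits byte
        fields = []
        for _ in range(2):  # payload type first, then payload size
            value = 0
            while pos < len(rbsp) and rbsp[pos] == 0xFF:
                value += 255
                pos += 1
            if pos >= len(rbsp):
                return  # truncated message, give up
            value += rbsp[pos]
            pos += 1
            fields.append(value)
        payload_type, size = fields
        yield payload_type, rbsp[pos:pos + size]
        pos += size


def eia608_pairs(nal: bytes):
    """Return (cc_type, byte1, byte2) for each valid 608 pair in an SEI NAL."""
    pairs = []
    for payload_type, p in sei_messages(nal):
        # Assumed ATSC A/53 header: country 0xB5, provider 0x0031,
        # identifier "GA94", user_data_type_code 0x03 (cc_data).
        if payload_type != 4 or p[:8] != b"\xb5\x00\x31GA94\x03" or len(p) < 10:
            continue
        cc_count = p[8] & 0x1F
        for i in range(cc_count):
            triplet = p[10 + 3 * i:13 + 3 * i]
            if len(triplet) < 3:
                break
            flags, byte1, byte2 = triplet
            cc_valid = flags & 0x04
            cc_type = flags & 0x03  # 0 and 1 are the two EIA-608 fields
            if cc_valid and cc_type in (0, 1):
                pairs.append((cc_type, byte1, byte2))
    return pairs
```

The resulting byte pairs are what a 608 decoder consumes. Locating the SEI NAL units inside the container's video blocks is a separate step.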


@cfsmp3 commented on GitHub (Feb 26, 2019):

Seems easy enough - if the EIA-608 data is in stream 2 then find out why
CCExtractor is skipping that stream. What value is it looking for and what
value is it there?


@thelastpolaris commented on GitHub (Feb 26, 2019):

> Seems easy enough - if the EIA-608 data is in stream 2 then find out why CCExtractor is skipping that stream. What value is it looking for and what value is it there?


That is exactly what I am thinking about. Thank you, working on it.


@thelastpolaris commented on GitHub (Feb 26, 2019):

Just a small update:

I looked up the EBML specification for Matroska and played with different IDs, but it didn't work out, because the EIA-608 subtitles are really part of the video stream. VLC simply presents them as a separate stream, while they are actually muxed into the video track. See the attached FFmpeg screenshot.

![screenshot from 2019-02-26 19-28-08](https://user-images.githubusercontent.com/8721751/53437742-8d808f80-39fe-11e9-9418-3284bf3a765b.png)

So my plan now is to see how I can extract the subtitles from the video track in the MKV using ccextractor's utilities.


@cfsmp3 commented on GitHub (Feb 26, 2019):

You are going to have lots of fun there :-)
In theory it shouldn't be too hard (but probably harder than the case
in which the captions were in their own track). But well, we do have
an H.264 decoder in place, so you might be able to just feed it the data
from the video stream. It _could_ just work :-)


@thelastpolaris commented on GitHub (Feb 27, 2019):

@cfsmp3 Yeah, this is an interesting puzzle to solve :)

I'm just curious, maybe you could help me: in `matroska.c`, I understand that we use the `parse_segment` functions to walk through Matroska's EBML structure, but I would really like to understand where in this code I can pick up the video stream and run the H.264 decoder on it. I will try to study EBML, but any suggestions are welcome :)
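For readers unfamiliar with EBML, the kind of walk that `parse_segment`-style functions perform can be sketched roughly as follows. This is an illustrative Python sketch, not the code in `matroska.c`. The function names are hypothetical, and it ignores unknown-size elements and truncated input.

```python
def vint_length(first_byte: int) -> int:
    """Length in bytes of an EBML variable-length integer, from its first byte."""
    for length in range(1, 9):
        if first_byte & (0x80 >> (length - 1)):
            return length
    raise ValueError("invalid EBML vint: first byte is zero")


def read_id(buf: bytes, pos: int):
    """Read an element ID (marker bit kept). Returns (element_id, next_pos)."""
    length = vint_length(buf[pos])
    return int.from_bytes(buf[pos:pos + length], "big"), pos + length


def read_size(buf: bytes, pos: int):
    """Read an element size (marker bit stripped). Returns (size, next_pos)."""
    length = vint_length(buf[pos])
    raw = int.from_bytes(buf[pos:pos + length], "big")
    return raw & ((1 << (7 * length)) - 1), pos + length


def walk(buf: bytes):
    """Yield (element_id, payload) for each sibling element in buf."""
    pos = 0
    while pos < len(buf):
        element_id, pos = read_id(buf, pos)
        size, pos = read_size(buf, pos)
        yield element_id, buf[pos:pos + size]
        pos += size


TRACK_ENTRY = 0xAE   # the ID mentioned in this thread
TRACK_NUMBER = 0xD7
TRACK_TYPE = 0x83    # payload 1 = video, 0x11 = subtitle
```

Calling `walk` on the payload of a `TRACK_ENTRY` element lists its children, so a track whose `TRACK_TYPE` payload is 1 is a video track. Captions embedded in the video bitstream do not appear as a `TrackEntry` of their own, which is consistent with the earlier finding in this thread.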


@cfsmp3 commented on GitHub (Feb 27, 2019):

@thelastpolaris for the Matroska stuff, the person you want to talk to is @Izaron; he implemented that part.

If he's too busy, ping me in a couple of days and I'll work with you.


@thelastpolaris commented on GitHub (Feb 27, 2019):

@cfsmp3 ok, thank you. Seems that I know this guy in person and he also speaks Russian :)


@cfsmp3 commented on GitHub (Feb 27, 2019):

> @cfsmp3 ok, thank you. Seems that I know this guy in person and he also speaks Russian :)

He does and you're in good hands for sure, he's a genius!


@thelastpolaris commented on GitHub (Mar 3, 2019):

@cfsmp3 just a small update on the issue.
@Izaron helped me greatly. He suggested several tools for Matroska debugging (MKVToolNix, mkvextract), and I now understand much better how Matroska is structured. He also gave me some very useful advice, and I asked him a few more questions today.

Current state of the issue: I figured out how Matroska is structured, and the code in `matroska.c` makes it very easy to work with EBML-structured MKV video. However, I think the problem is that it might not be enough to just pass the video data from the MKV to the H.264 decoder, as the video itself seems to contain some EBML IDs. So I'm waiting for @Izaron's reply, and I want to try getting the subtitles from each block of H.264 video inside `matroska.c`, where those blocks are currently just skipped.
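One thing worth checking here, offered as an assumption rather than something this thread confirms: Matroska normally stores H.264 (`V_MPEG4/ISO/AVC`) frames as length-prefixed NAL units rather than with Annex B start codes, so a decoder that expects start codes cannot take the block payload as-is. A minimal Python sketch of the conversion follows. It assumes a 4-byte length prefix (the real size comes from the track's `CodecPrivate` data), and the function names are hypothetical.

```python
def split_length_prefixed(frame: bytes, length_size: int = 4):
    """Split one block payload into NAL units using big-endian length prefixes."""
    nals = []
    pos = 0
    while pos + length_size <= len(frame):
        nal_size = int.from_bytes(frame[pos:pos + length_size], "big")
        pos += length_size
        if nal_size == 0 or pos + nal_size > len(frame):
            break  # malformed or truncated data, stop rather than guess
        nals.append(frame[pos:pos + nal_size])
        pos += nal_size
    return nals


def to_annex_b(frame: bytes, length_size: int = 4) -> bytes:
    """Rewrite a length-prefixed frame with 4-byte Annex B start codes."""
    start_code = b"\x00\x00\x00\x01"
    return b"".join(start_code + nal
                    for nal in split_length_prefixed(frame, length_size))
```

Whether CCExtractor's H.264 code wants start codes or individual NAL units is not stated in this thread, so either function may be the relevant one.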


@cfsmp3 commented on GitHub (Mar 3, 2019):

@thelastpolaris Sounds good. Should be "easy" to do if @Izaron lends a hand. In any case, I'm around on Slack if you need my help. I didn't do the MKV part, but I did (most of) the 608 code, so I can probably help you get there.

Reference: starred/ccextractor#481