GSOC - Finish CEA-708 support #4

Closed
opened 2026-01-29 16:32:38 +00:00 by claunia · 8 comments

Originally created by @cfsmp3 on GitHub (Apr 10, 2014).

Originally assigned to: @PunitLodha on GitHub.

EIA-708 (now CEA-708) is the "new" standard for closed captioning. While the specification has been around for some years and support for it is mandatory in the US for both TV receivers and stations, until very recently almost all stations just converted their CEA-608 data to 708. This means that none of the 708 features have actually been used, and you still see many captions in all uppercase (to mention just one thing). This is starting to change, though, so it makes sense for CCExtractor to fully implement a 708 decoder. Some work has already been done, and you can actually see 708 output in CCExtractor in debug mode, but it needs to be completed by adding the actual export features.

claunia added the bug, CEA-708, difficulty: hard, GSoC 2021 labels 2026-01-29 16:32:38 +00:00

@cfsmp3 commented on GitHub (Apr 10, 2014):

It is impossible to support CEA-708 100% when exporting to .srt or transcript, because it has a number of features (windows, to name one) that simply have no equivalent in those formats.

Still, for most of the real world data we can do a fairly good job.

The first order of business is to fix the timing. It doesn't work: it's off by around 3 seconds. It's probably not too hard to solve; just read the specs carefully :-)

Except for that, I believe the internal state of the decoder is correct and it supports all the 708 commands (at least all the ones I have ever seen in use). What's missing is correctly exporting that state to files.

@cfsmp3 commented on GitHub (Aug 8, 2016):

Assigned to bigharshrag since he's the 708 man this GSoC :-) kisselef is mentoring.

@cfsmp3 commented on GitHub (Jan 6, 2017):

Additional info (pasted from a separate issue, I'm going to merge 708 stuff here).

Someone just sent a number of useful samples. They're available in /repository/Cristiano708

MPEG-PS containing both CEA608 and CEA708 captions
(On version 0.76, 608 extraction works, but 708 does not)
captions_test.mpg

MPEG-TS containing both CEA608 and CEA708 captions
(On version 0.76, 608 extraction works, but 708 does not)
captions_test_ts.mpg

CEA608 TTML file generated by Adobe Premiere (CEA608 track)
(the great thing about this version is that it contains the correct positions)
captions-test_608.xml

CEA708 TTML file (slightly different from the CEA608 TTML):
(the great thing about this version is that it contains the correct positions)
captions-test_708.xml

SCC File (CEA608)
captions-test_608.scc

MCC File (CEA708)
captions-test_708.mcc

@cfsmp3 commented on GitHub (Jan 6, 2017):

Additional: https://github.com/CCExtractor/ccextractor/issues/178

@NilsIrl commented on GitHub (Dec 30, 2019):

> you still see many captions in all uppercase

Doesn't CEA-608 support case?

@cfsmp3 commented on GitHub (Dec 30, 2019):

> > you still see many captions in all uppercase
>
> Doesn't CEA-608 support case?

In theory, yes (check the character set here: https://en.wikipedia.org/wiki/EIA-608#Characters ).

But if I recall correctly, the very first decoders were uppercase only (I can't find a technical reference to support this claim, though). See:

https://audio-accessibility.com/news/2018/10/all-caps-vs-mixed-case-type-for-captions/

Also, this:

https://www.vitac.com/all-caps-captioning/

@cfsmp3 commented on GitHub (May 19, 2021):

Standard here: https://shop.cta.tech/products/line-21-data-services

I'm assigning this to @PunitLodha since he's going to be working on it during GSoC 2021.

@cfsmp3 commented on GitHub (Nov 20, 2021):

I'm closing this, since 708 support is in reasonably good shape now that @PunitLodha killed the known bugs :-)

Reference: starred/ccextractor#4