Mirror of https://github.com/CCExtractor/ccextractor.git
synced 2026-02-04 05:44:53 +00:00
GSOC - Finish CEA-708 support #4
Originally created by @cfsmp3 on GitHub (Apr 10, 2014).
Originally assigned to: @PunitLodha on GitHub.
EIA-708 is the "new" standard for closed captioning. The specification has been around for some years, and support for it is mandatory in the US for both TV receivers and stations. However, until very recently almost all stations simply converted their CEA-608 data to 708. This means that none of the 708 features have actually been used, and you still see many captions in all uppercase (to mention just one thing). This is starting to change, so it makes sense for CCExtractor to fully implement a 708 decoder. Some work has already been done, and you can see 708 output in CCExtractor in debug mode, but it needs to be completed by adding the actual export features.
@cfsmp3 commented on GitHub (Apr 10, 2014):
CEA-708 cannot be supported 100% when exporting to .srt or a transcript, because it has a number of features (windows, to name one) that simply have no equivalent in those formats.
Still, for most real-world data we can do a fairly good job.
The first order of business is to fix the timing. It doesn't work: it's off by around 3 seconds. Probably not too hard to solve, just read the specs carefully :-)
Except for that, I believe the internal state of the decoder is correct and it supports all the 708 commands (at least all the ones I've ever seen in use). What's missing is correctly exporting that state to files.
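To illustrate why 708 windows don't map onto SRT, and what a "fairly good job" might look like, here is a minimal C sketch under loudly stated assumptions. `Window708`, `flatten_windows`, `srt_time` and `srt_cue` are hypothetical names invented for this example, not CCExtractor's actual API or data structures. The model keeps only a visibility flag, a vertical anchor (used solely to order windows top to bottom) and the text of each row. Everything else a real 708 window carries (anchor point, pen attributes, colors, justification, priority) is discarded, which is exactly the information SRT cannot express.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified CEA-708 window model (NOT CCExtractor's real
 * structures). Sizes are assumptions: 8 windows, up to 15 rows, up to
 * 42 columns per row. */
#define MAX_WINDOWS 8
#define MAX_ROWS 15
#define MAX_COLS 42

typedef struct {
    int visible;
    int anchor_vertical;               /* only used to order windows top to bottom */
    int row_count;
    char rows[MAX_ROWS][MAX_COLS + 1]; /* fixed-width rows, space padded */
} Window708;

/* Format milliseconds as an SRT timestamp "HH:MM:SS,mmm". */
static void srt_time(long ms, char *buf, size_t n)
{
    assert(ms >= 0);
    snprintf(buf, n, "%02ld:%02ld:%02ld,%03ld",
             ms / 3600000, (ms / 60000) % 60, (ms / 1000) % 60, ms % 1000);
}

/* Append one row to out, dropping trailing spaces; blank rows are skipped. */
static void append_row(char *out, size_t outsz, const char *row)
{
    size_t len = strlen(row);
    while (len > 0 && row[len - 1] == ' ')
        len--;
    if (len == 0)
        return;
    size_t used = strlen(out);
    if (used > 0 && used + 1 < outsz) {
        out[used++] = '\n';            /* one SRT line per non-blank row */
        out[used] = '\0';
    }
    if (used + len >= outsz)
        len = outsz - used - 1;        /* truncate rather than overflow */
    memcpy(out + used, row, len);
    out[used + len] = '\0';
}

/* Flatten all visible windows into plain text, ordered by vertical anchor.
 * Position, styling and priority are lost. */
static void flatten_windows(const Window708 *w, int count, char *out, size_t outsz)
{
    int order[MAX_WINDOWS];
    int n = 0;
    assert(count <= MAX_WINDOWS && outsz > 0);
    out[0] = '\0';
    for (int i = 0; i < count; i++) {
        if (!w[i].visible)
            continue;
        /* insertion sort of visible windows by anchor_vertical */
        int j = n++;
        while (j > 0 && w[order[j - 1]].anchor_vertical > w[i].anchor_vertical) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = i;
    }
    for (int k = 0; k < n; k++)
        for (int r = 0; r < w[order[k]].row_count; r++)
            append_row(out, outsz, w[order[k]].rows[r]);
}

/* Write one SRT cue into buf; returns what snprintf returns. */
static int srt_cue(char *buf, size_t n, int index, long start_ms, long end_ms,
                   const char *text)
{
    char t0[16], t1[16];
    srt_time(start_ms, t0, sizeof t0);
    srt_time(end_ms, t1, sizeof t1);
    return snprintf(buf, n, "%d\n%s --> %s\n%s\n\n", index, t0, t1, text);
}
```

Ordering by vertical anchor is one plausible way to approximate reading order when several windows are visible at once, and trailing spaces are trimmed because 708 rows behave like fixed-width grids. A real exporter would also have to decide when a cue starts and ends, which depends on getting the timing right first.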
@cfsmp3 commented on GitHub (Aug 8, 2016):
Assigned to bigharshrag since he's the 708 man this GSoC :-) kisselef is mentoring.
@cfsmp3 commented on GitHub (Jan 6, 2017):
Additional info (pasted from a separate issue, I'm going to merge 708 stuff here).
Someone just sent a number of useful samples. They're available in /repository/Cristiano708
MPEG-PS containing both CEA608 and CEA708 captions
(On version 0.76, 608 extraction works, but 708 does not)
captions_test.mpg
MPEG-TS containing both CEA608 and CEA708 captions
(On version 0.76, 608 extraction works, but 708 does not)
captions_test_ts.mpg
CEA608 TTML file generated by Adobe Premiere (CEA608 track)
(the great thing about this version is that it contains the correct positions)
captions-test_608.xml
CEA708 TTML file (slightly different from the CEA608 TTML):
(the great thing about this version is that it contains the correct positions)
captions-test_708.xml
SCC File (CEA608)
captions-test_608.scc
MCC File (CEA708)
captions-test_708.mcc
@cfsmp3 commented on GitHub (Jan 6, 2017):
Additional: https://github.com/CCExtractor/ccextractor/issues/178
@NilsIrl commented on GitHub (Dec 30, 2019):
Doesn't CEA-608 support case?
@cfsmp3 commented on GitHub (Dec 30, 2019):
In theory, yes (check the character set here: https://en.wikipedia.org/wiki/EIA-608#Characters ).
But if I recall correctly, the very first decoders were uppercase only. I can't find the technical reference to support this claim, but see:
https://audio-accessibility.com/news/2018/10/all-caps-vs-mixed-case-type-for-captions/
Also, this:
https://www.vitac.com/all-caps-captioning/
@cfsmp3 commented on GitHub (May 19, 2021):
Standard here: https://shop.cta.tech/products/line-21-data-services
I'm assigning this to @PunitLodha since he's going to be working on it during GSoC 2021.
@cfsmp3 commented on GitHub (Nov 20, 2021):
I'm closing this since 708 is now in reasonably good shape, now that @PunitLodha has killed the known bugs :-)