mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-15 13:35:30 +00:00
[BUG] French DVB subtitles need deduplication #462
Originally created by @Liontooth on GitHub (Nov 18, 2018).
CCExtractor version: 0.85
In raising this issue, I confirm the following:
My familiarity with the project is as follows:
Necessary information
Command-line arguments: `-datets -ttxt -UCLA -noru -utf8`

**Video links**
http://vrnewsscape.ucla.edu/dropbox/2017-07-14_1100_FR_TF1_Journal.mpg
http://vrnewsscape.ucla.edu/dropbox/2017-07-14_1100_FR_TF1_Journal.txt
Additional information
CCExtractor-0.85 (compiled 2017-07-29 with liblept4) succeeds in extracting DVB captions from the file above, as shown in the accompanying txt file. (Chrome displays that file with the wrong encoding and no longer offers a way to override it; the file is in fact UTF-8.) CCExtractor-0.86 and CCExtractor-0.87 fail to find any subtitles; see issue #1039.
However, each line appears in part several times before it completes, and also at times partially repeats in the following line:
CCExtractor has already solved this duplication problem for teletext; it is clearly also present in some DVB subtitles, notably those of the French network TF1.
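To make the requested behaviour concrete, here is a rough Python sketch of the kind of post-processing that would help. It is not how CCExtractor deduplicates teletext, and it is not tested against the TF1 file. The function names (`collapse_partials`, `strip_carryover`, `dedupe`) and the sample lines are made up for illustration, and timestamps are ignored entirely. It assumes a partial line is always a prefix of the finished line, and that a carry-over is a run of at least two words repeated at the start of the next line.

```python
def collapse_partials(lines):
    """Keep only the longest version of each progressively built line."""
    out = []
    for text in (line.strip() for line in lines):
        if not text:
            continue
        if out and text.startswith(out[-1]):
            out[-1] = text          # longer (or identical) version: replace
        elif out and out[-1].startswith(text):
            continue                # shorter repeat of what we already have
        else:
            out.append(text)
    return out


def strip_carryover(prev, cur, min_words=2):
    """Drop from the start of cur the longest word run that also ends prev."""
    p, c = prev.split(), cur.split()
    for n in range(min(len(p), len(c)), min_words - 1, -1):
        if p[-n:] == c[:n]:
            return " ".join(c[n:])
    return cur


def dedupe(lines):
    """Collapse partial repeats, then trim overlap between adjacent lines."""
    full = collapse_partials(lines)
    result = []
    for i, text in enumerate(full):
        if i > 0:
            text = strip_carryover(full[i - 1], text)
        if text:
            result.append(text)
    return result


# Hypothetical input, NOT taken from the TF1 sample.
sample = [
    "Bonjour",
    "Bonjour à tous,",
    "Bonjour à tous, bienvenue",
    "à tous, bienvenue dans ce journal.",
    "à tous, bienvenue dans ce journal.",
]
cleaned = dedupe(sample)
print(cleaned)
```

The prefix test is only a heuristic: two genuinely separate utterances such as "Non" followed by "Non merci" would be merged, so a real fix would probably also need to look at the timestamps.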
@cfsmp3 commented on GitHub (Dec 27, 2025):
Closing - samples are no longer working, I suppose this corner case is no longer important? @Liontooth let us know if you still would like to see this happen.