Mirror of https://github.com/CCExtractor/ccextractor.git (synced 2026-02-04 05:44:53 +00:00)
Processing CC data from US television recordings creates two .srt files; is this expected behavior? #866
Originally created by @way-lo on GitHub (Dec 27, 2025).
I'm submitting this as a bug because I don't believe this is normal behavior. In all previous versions, the default behavior was to create a single .srt file.
In past versions, processing a recorded US video file created a .srt file with contents such as this:
1
00:00:34,167 --> 00:00:35,467
>>> BREAKING TONIGHT,
THE STATE OF EMERGENCY
HERE IN CALIFORNIA, AS

2
00:00:35,469 --> 00:00:37,102
THE STATE OF EMERGENCY
HERE IN CALIFORNIA, AS
HEAVY RAINS BRING

3
00:00:37,104 --> 00:00:38,203
HERE IN CALIFORNIA, AS
HEAVY RAINS BRING
CATASTROPHIC FLOODING
Now, processing the same file with default settings creates two subtitle files. In addition to the .srt file above, a second file with the extension .p1.svc01.srt is created. For some reason, font color tags are included on every line. Its contents look like this:
1
00:00:32,133 --> 00:00:33,600
<font color="#aaaaaa"> >>> BREAKING TONIGHT,

2
00:00:33,601 --> 00:00:34,167
<font color="#aaaaaa"> >>> BREAKING TONIGHT,
<font color="#aaaaaa"> THE STATE OF EMERGENCY

3
00:00:34,168 --> 00:00:35,468
<font color="#aaaaaa"> >>> BREAKING TONIGHT,
<font color="#aaaaaa"> THE STATE OF EMERGENCY
<font color="#aaaaaa"> HERE IN CALIFORNIA, AS

4
00:00:35,469 --> 00:00:37,103
<font color="#aaaaaa"> THE STATE OF EMERGENCY
<font color="#aaaaaa"> HERE IN CALIFORNIA, AS
<font color="#aaaaaa"> HEAVY RAINS BRING

5
00:00:37,104 --> 00:00:38,204
<font color="#aaaaaa"> HERE IN CALIFORNIA, AS
<font color="#aaaaaa"> HEAVY RAINS BRING
<font color="#aaaaaa"> CATASTROPHIC FLOODING
Adding the option "--output-field 1" removes the second .srt output and fixes this. (However, on further evaluation, I found that the video file does not have a second field; i.e., if I use --output-field 2, only an empty .srt is created.)
If I add the --no-fontcolor option, both files are still created, except now the .p1.svc01.srt file has the font color markup removed.
I'm on the latest Windows portable version, 0.96.2, but the same thing occurs with all 0.96 releases.
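For anyone scripting the workaround, here is a minimal Python sketch that assembles the command line using only the flags mentioned in this report (`--output-field`, `--no-fontcolor`). The function name `build_ccextractor_args` and the file name `recording.ts` are hypothetical, and it assumes the `ccextractor` executable is on PATH and accepts options before the input file.

```python
import shlex


def build_ccextractor_args(input_path, output_field=None, no_fontcolor=False):
    """Build a ccextractor argument list from the options discussed in this issue.

    Hypothetical helper. Assumes the executable is reachable as "ccextractor"
    on PATH and that the flags behave as reported for 0.96.x; check
    `ccextractor --help` on your own build.
    """
    args = ["ccextractor"]
    if output_field is not None:
        # "--output-field 1" reportedly suppresses the extra .p1.svc01.srt output.
        args += ["--output-field", str(output_field)]
    if no_fontcolor:
        # Drops the <font color=...> markup; per the report, both files are still written.
        args.append("--no-fontcolor")
    # Input file goes last, after all options.
    args.append(input_path)
    return args


if __name__ == "__main__":
    # "recording.ts" is a placeholder name, not a file from this issue.
    print(shlex.join(build_ccextractor_args("recording.ts", output_field=1)))
```

The returned list can be passed straight to `subprocess.run(args, check=True)`; using a list rather than a single string avoids shell quoting problems with paths that contain spaces.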
@cfsmp3 commented on GitHub (Dec 28, 2025):
The first one is CEA-608, let's say the legacy captions from the analog era. The second one is CEA-708, the digital version. In some content there's a difference (in favor of the digital version, which doesn't have the same limitations), while in other content the two are identical because they don't bother creating a better version. I suppose news is one of these cases.
Which one looks better to you, both in content and in timing? There seems to be a difference there.
Anyway, I don't think the second file is correct, since it opens the font tag but never closes it.
Can you share that video file? We'll take a look.
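One way to answer the timing question objectively is to record when each caption line first appears in each file and subtract. This is a minimal Python sketch under stated assumptions: the helper names (`parse_srt`, `first_appearance`, `timing_offsets`) are hypothetical, cue headers are detected as a numeric index followed by a `start --> end` line (so it also copes with pasted SRT that lost its blank lines), and `<font>` tags are stripped before comparing text.

```python
import re
from dataclasses import dataclass

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    lines: list


def to_ms(stamp):
    """Convert an SRT timestamp such as 00:00:34,167 to milliseconds."""
    h, m, s, ms = (int(g) for g in TIMESTAMP.search(stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def _is_cue_start(lines, i):
    # A cue begins with a numeric index line followed by a "start --> end" line.
    return lines[i].isdigit() and i + 1 < len(lines) and "-->" in lines[i + 1]


def parse_srt(srt_text):
    lines = [line.strip() for line in srt_text.splitlines()]
    cues, i = [], 0
    while i < len(lines):
        if not _is_cue_start(lines, i):
            i += 1
            continue
        start, end = lines[i + 1].split("-->")
        i += 2
        text = []
        # Caption text runs until a blank line or the next cue header.
        while i < len(lines) and lines[i] and not _is_cue_start(lines, i):
            text.append(FONT_TAG.sub("", lines[i]).strip())
            i += 1
        cues.append(Cue(to_ms(start), to_ms(end), text))
    return cues


def first_appearance(cues):
    """Map each caption line to the earliest start time (ms) at which it is shown."""
    seen = {}
    for cue in cues:
        for line in cue.lines:
            seen.setdefault(line, cue.start_ms)
    return seen


def timing_offsets(srt_a, srt_b):
    """For lines present in both files: first appearance in A minus first appearance in B (ms)."""
    a = first_appearance(parse_srt(srt_a))
    b = first_appearance(parse_srt(srt_b))
    return {line: a[line] - b[line] for line in a if line in b}
```

A positive offset means the line showed up later in the first file. Applied to the excerpts in this issue (608 file first, 708 file second), ">>> BREAKING TONIGHT," comes out at 2034 ms and "THE STATE OF EMERGENCY" at 566 ms, i.e. the 708 track shows each line earlier.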
@way-lo commented on GitHub (Dec 28, 2025):
Thanks for the reply.
Looking at my quoted text, I forgot to add the "\" escape character so the markup would paste correctly into the GitHub comment. I did it for the first "<" on each line but forgot the second one, so the closing font tags are actually there. All of the text should look like this:
1
00:00:32,133 --> 00:00:33,600
<font color="#aaaaaa"> >>> BREAKING TONIGHT, </font>

2
00:00:33,601 --> 00:00:34,167
<font color="#aaaaaa"> >>> BREAKING TONIGHT, </font>
<font color="#aaaaaa"> THE STATE OF EMERGENCY</font>

3
00:00:34,168 --> 00:00:35,468
<font color="#aaaaaa"> >>> BREAKING TONIGHT, </font>
<font color="#aaaaaa"> THE STATE OF EMERGENCY</font>
<font color="#aaaaaa"> HERE IN CALIFORNIA, AS</font>
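If a .p1.svc01.srt file has already been generated with this markup, the tags can also be removed afterwards instead of re-running with `--no-fontcolor`. A minimal Python sketch, assuming the only markup present is `<font ...>`/`</font>` as in the excerpt above; `strip_font_tags` and `strip_font_tags_file` are hypothetical names, and reading with `utf-8-sig` is a precaution in case the file starts with a byte-order mark.

```python
import re
from pathlib import Path

# Matches opening <font ...> tags and closing </font> tags, case-insensitively.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(srt_text):
    """Remove <font> markup from SRT text and trim the spaces it leaves behind."""
    return "\n".join(FONT_TAG.sub("", line).strip() for line in srt_text.splitlines())


def strip_font_tags_file(src, dst):
    """Read an SRT file, strip font tags, and write the result to dst."""
    text = Path(src).read_text(encoding="utf-8-sig")
    Path(dst).write_text(strip_font_tags(text) + "\n", encoding="utf-8")
```

Index and timestamp lines pass through unchanged apart from whitespace trimming, since they contain no `<font` sequence.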
Here's a link to the file.
https://mega.nz/file/q1wUFDgb#gcX3HiOFl59qXRMMAsww-ADqP3p7645Ym94GdJFqmWU
My preliminary testing of other show types (serialized shows, for example) gives the same double output, so it may not be news-specific. These files are recorded through a CableCARD on Comcast in the US.
I'll have to run two playback instances side by side to see which .srt is actually better. The chosen font color is quite poor, but I'll take a look at the timing, etc.
@cfsmp3 commented on GitHub (Dec 28, 2025):
Yes, the double output will happen whenever there are subtitles in both CEA-608 and CEA-708. What I mean is that for news, the two are likely to be pretty much the same. For other content, however, it's possible that they do use the extended capabilities of the digital format and the subtitles are better in general.
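When batch-processing recordings, it can help to flag which ones produced both a CEA-608 and a CEA-708 file. This minimal Python sketch classifies output names by suffix. Reading `.p1.svc01` as "program 1, CEA-708 service 1" is an assumption inferred from the file name in this report, treating any other `.srt` as CEA-608 is a simplification, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed pattern for CEA-708 outputs, based on the ".p1.svc01.srt" name seen
# in this issue: "p" read as program number, "svc" as CEA-708 service number.
SVC_SUFFIX = re.compile(r"\.p(\d+)\.svc(\d+)\.srt$", re.IGNORECASE)


def classify_output(path):
    """Return (format, program, service) for a subtitle file name."""
    name = Path(path).name
    match = SVC_SUFFIX.search(name)
    if match:
        return ("CEA-708", int(match.group(1)), int(match.group(2)))
    if name.lower().endswith(".srt"):
        # Simplification: any other .srt is treated as the CEA-608 output.
        return ("CEA-608", None, None)
    return ("unknown", None, None)


def has_both_formats(paths):
    """True if the given output files include both a 608 and a 708 subtitle file."""
    kinds = {classify_output(p)[0] for p in paths}
    return {"CEA-608", "CEA-708"} <= kinds
```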
@way-lo commented on GitHub (Dec 28, 2025):
Okay, thanks for the explanation! Searching more, I found the help section that is exactly what you described:
However, I don't see which option to specify before -1, -2, or -svc. Using "--output-field svc" results in an error. Can you clarify?
@cfsmp3 commented on GitHub (Dec 29, 2025):
It's -svc 1 (if you want service 1)
@way-lo commented on GitHub (Dec 30, 2025):
Okay, I found my issue after using ccxgui and examining the command it constructs.
"svc" is not accepted as an option on the command line; the full word "service" must be used instead.
(Similarly, I noticed that -sc was removed after 0.94 and replaced with "sentencecap".)
@cfsmp3 commented on GitHub (Dec 30, 2025):
Thanks for this! We've just added svc as an alias for service.
I don't know if sc was used often. Should we add this alias too?
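Conceptually, a short alias just maps a legacy spelling onto the current long option before normal parsing. The Python sketch below illustrates that idea only; it is not CCExtractor's actual argument parser (which this thread does not show), the table and function names are hypothetical, and `sc` is included only because it is the alias being discussed here, not because it is confirmed to exist.

```python
# Hypothetical alias table mapping legacy short spellings to current long flags.
# "svc" -> "service" mirrors the alias added in this thread; "sc" -> "sentencecap"
# is the proposed alias under discussion, not confirmed CCExtractor behavior.
ALIASES = {"svc": "service", "sc": "sentencecap"}


def normalize_flags(argv):
    """Rewrite aliased options (e.g. -svc or --svc) to their long form."""
    out = []
    for arg in argv:
        if arg.startswith("-"):
            name = arg.lstrip("-")
            if name in ALIASES:
                out.append("--" + ALIASES[name])
                continue
        # Values, input files, and non-aliased options pass through unchanged.
        out.append(arg)
    return out
```

Because values such as `1` are left untouched, `-svc 1` becomes `--service 1` while options like `--output-field` are passed through as-is.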
@way-lo commented on GitHub (Dec 30, 2025):
Thanks for the quick fix.
Re: -sc vs -sentencecap, -sc is still referenced on some of the older tutorials online, such as here:
https://ccextractor.org/public/general/command_line_usage/
(I recognize that's using the old 0.88 help screen, and a lot has changed since then).
I might recommend adding it in, if only to keep the old naming alive and consistent.
I agree it's probably not a frequently chosen option. I used to use it because the all caps of news is a bit grating, but somehow having no proper nouns capitalized is also funny to read. I wish there were widely available dictionary files of proper nouns to include. I tried making one just of popular English first names, and there are a LOT out there!
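The idea of combining sentence capitalization with a proper-noun list can be sketched in a few lines of Python. This is an illustration under assumptions, not how CCExtractor's `sentencecap` option is implemented: the function name is hypothetical, words are taken to be runs of letters and apostrophes, and a sentence is assumed to end at `.`, `!` or `?`.

```python
import re

# A "word" here is a run of ASCII letters or apostrophes (simplifying assumption).
WORD = re.compile(r"[A-Za-z']+")


def sentence_case(text, proper_nouns=()):
    """Lower-case all-caps caption text, capitalize sentence starts, and restore
    the spelling of any word found in a user-supplied proper-noun list."""
    lookup = {w.lower(): w for w in proper_nouns}
    out, pos, capitalize_next = [], 0, True
    for m in WORD.finditer(text):
        gap = text[pos:m.start()]
        out.append(gap)
        # Punctuation between words that ends a sentence triggers a capital.
        if any(ch in ".!?" for ch in gap):
            capitalize_next = True
        word = m.group().lower()
        if word in lookup:
            word = lookup[word]
        elif capitalize_next:
            word = word[:1].upper() + word[1:]
        capitalize_next = False
        out.append(word)
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)
```

For example, `sentence_case("BREAKING TONIGHT, THE STATE OF EMERGENCY HERE IN CALIFORNIA. HEAVY RAINS", ["California"])` gives "Breaking tonight, the state of emergency here in California. Heavy rains"; without "California" in the list it would come out lower-case, which is exactly the effect described above.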