Processing CC data off of US television files creates 2 .srt files, is this expected behavior? #866

Closed
opened 2026-01-29 16:55:39 +00:00 by claunia · 8 comments

Originally created by @way-lo on GitHub (Dec 27, 2025).

Submitting this as a bug as I don't believe this is normal behavior. In all previous versions, the default behavior was to create a single .srt file.

In past versions, processing a recorded US video file created a .srt file with contents such as this:
1
00:00:34,167 --> 00:00:35,467
>>> BREAKING TONIGHT,
THE STATE OF EMERGENCY
HERE IN CALIFORNIA, AS

2
00:00:35,469 --> 00:00:37,102
THE STATE OF EMERGENCY
HERE IN CALIFORNIA, AS
HEAVY RAINS BRING

3
00:00:37,104 --> 00:00:38,203
HERE IN CALIFORNIA, AS
HEAVY RAINS BRING
CATASTROPHIC FLOODING

Now, processing the same file with default options creates 2 subtitle files. In addition to the above .srt file, a second file with the extension .p1.svc01.srt is created. Font color tags are included on every line for some reason. Contents look like this:
1
00:00:32,133 --> 00:00:33,600
<font color="#aaaaaa"> >>> BREAKING TONIGHT,

2
00:00:33,601 --> 00:00:34,167
<font color="#aaaaaa"> >>> BREAKING TONIGHT,
<font color="#aaaaaa"> THE STATE OF EMERGENCY

3
00:00:34,168 --> 00:00:35,468
<font color="#aaaaaa"> >>> BREAKING TONIGHT,
<font color="#aaaaaa"> THE STATE OF EMERGENCY
<font color="#aaaaaa"> HERE IN CALIFORNIA, AS

4
00:00:35,469 --> 00:00:37,103
<font color="#aaaaaa"> THE STATE OF EMERGENCY
<font color="#aaaaaa"> HERE IN CALIFORNIA, AS
<font color="#aaaaaa"> HEAVY RAINS BRING

5
00:00:37,104 --> 00:00:38,204
<font color="#aaaaaa"> HERE IN CALIFORNIA, AS
<font color="#aaaaaa"> HEAVY RAINS BRING
<font color="#aaaaaa"> CATASTROPHIC FLOODING

Adding the option "--output-field 1" removes the second .srt output and fixes this. (However, after further evaluation, I find the video file does not have a second field, i.e. if I use --output-field 2, only an empty .srt is created.)

If I add the --no-fontcolor option, both files are still created, except now the .p1.svc01.srt file has the font color markup removed.

I'm on the latest Windows portable version, 0.96.2, but the same thing occurs with all 0.96 releases.


@cfsmp3 commented on GitHub (Dec 28, 2025):

The first one is CEA-608, let's say the legacy captions from the analog era. The second one is CEA-708, the digital version. In some content there's a difference (in favor of the digital version, which doesn't have the same limitations), while in other content the two are identical because they don't bother creating a better version. I suppose news is one of these cases.

Which one looks better to you, both in content but also in timing? There seems to be a difference there.

Anyway I don't think the 2nd file is accurate, since it's opening the font tag but not closing it.

Can you share that video file? We'll take a look.
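Whether the tags are really unclosed can be checked on the output file itself rather than on the pasted text. The following is a minimal Python sketch, not part of CCExtractor; the function name `unbalanced_font_lines` and the sample strings are illustrative assumptions. It reports every line where the number of opening and closing font tags differs.

```python
import re

# Opening tags such as <font color="#aaaaaa"> and closing tags </font>.
FONT_OPEN = re.compile(r"<font\b[^>]*>", re.IGNORECASE)
FONT_CLOSE = re.compile(r"</font\s*>", re.IGNORECASE)


def unbalanced_font_lines(srt_text):
    """Return (line_number, line) pairs where <font> opens and closes differ in count."""
    problems = []
    for number, line in enumerate(srt_text.splitlines(), start=1):
        opens = len(FONT_OPEN.findall(line))
        closes = len(FONT_CLOSE.findall(line))
        if opens != closes:
            problems.append((number, line))
    return problems


# Hypothetical samples modelled on the cues quoted in this thread.
sample_as_pasted = (
    "1\n00:00:32,133 --> 00:00:33,600\n"
    '<font color="#aaaaaa"> >>> BREAKING TONIGHT,\n'
)
sample_closed = (
    "1\n00:00:32,133 --> 00:00:33,600\n"
    '<font color="#aaaaaa"> >>> BREAKING TONIGHT, </font>\n'
)

print(unbalanced_font_lines(sample_as_pasted))
print(unbalanced_font_lines(sample_closed))
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to the function. The check is per line, which matches the output quoted here, where each caption row carries its own tag.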


@way-lo commented on GitHub (Dec 28, 2025):

Thanks for the reply.

Looking at my quoted text, I forgot to properly add the "\" character before every "<" so that the tags would paste correctly into the GitHub comment. I did it for the first "<" in each line, but forgot the second one. So the closing font tags are there. All the text should look like this:
1
00:00:32,133 --> 00:00:33,600
<font color="#aaaaaa"> >>> BREAKING TONIGHT, </font>

2
00:00:33,601 --> 00:00:34,167
<font color="#aaaaaa"> >>> BREAKING TONIGHT, </font>
<font color="#aaaaaa"> THE STATE OF EMERGENCY</font>

3
00:00:34,168 --> 00:00:35,468
<font color="#aaaaaa"> >>> BREAKING TONIGHT, </font>
<font color="#aaaaaa"> THE STATE OF EMERGENCY</font>
<font color="#aaaaaa"> HERE IN CALIFORNIA, AS</font>

Here's a link to the file.
https://mega.nz/file/q1wUFDgb#gcX3HiOFl59qXRMMAsww-ADqP3p7645Ym94GdJFqmWU

My prelim testing of other show types (serialized shows, for example) gives the same double output, so it may not be "news" specific. These files are recorded through a cablecard on US Comcast.

I'll have to run two playback instances side by side to see which .srt is actually superior. The chosen font color is actually quite poor, but I'll take a look at the timing, etc.
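As an alternative to side-by-side playback, the timing of the two files can be compared offline. This is a rough Python sketch under stated assumptions: the helper names (`parse_srt`, `start_offsets`) are made up for illustration, cues are matched by identical text after stripping font tags, and only the first occurrence of a given text in the second file is used, which is crude for roll-up captions that repeat lines.

```python
import re

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def to_ms(hours, minutes, seconds, millis):
    """Convert an SRT timestamp to milliseconds."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(text):
    """Return a list of (start_ms, end_ms, text) cues with font tags removed."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING.search(line)
            if match is None:
                continue
            fields = [int(value) for value in match.groups()]
            raw_body = " ".join(lines[index + 1:])
            # Drop tags, then collapse runs of whitespace.
            body = " ".join(FONT_TAG.sub("", raw_body).split())
            cues.append((to_ms(*fields[:4]), to_ms(*fields[4:]), body))
            break
    return cues


def start_offsets(cues_a, cues_b):
    """For cues in A whose text also appears in B, return (text, B start - A start) in ms."""
    first_start_b = {}
    for start, _end, body in cues_b:
        first_start_b.setdefault(body, start)
    return [
        (body, first_start_b[body] - start)
        for start, _end, body in cues_a
        if body in first_start_b
    ]


# Cues copied from the two outputs quoted earlier in this thread.
srt_608 = (
    "1\n00:00:34,167 --> 00:00:35,467\n"
    ">>> BREAKING TONIGHT,\nTHE STATE OF EMERGENCY\nHERE IN CALIFORNIA, AS\n"
)
srt_708 = (
    "3\n00:00:34,168 --> 00:00:35,468\n"
    '<font color="#aaaaaa"> >>> BREAKING TONIGHT, </font>\n'
    '<font color="#aaaaaa"> THE STATE OF EMERGENCY</font>\n'
    '<font color="#aaaaaa"> HERE IN CALIFORNIA, AS</font>\n'
)

for text, offset in start_offsets(parse_srt(srt_608), parse_srt(srt_708)):
    print(f"{offset:+d} ms  {text}")
```

On the cues quoted above, the matching three-line caption starts 1 ms later in the .p1.svc01.srt file, so the visible difference between the files is mostly the extra one- and two-line roll-up steps at the beginning rather than a timing shift.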


@cfsmp3 commented on GitHub (Dec 28, 2025):

Yes, the double output will happen whenever there are subtitles in both CEA-608 and CEA-708. What I mean is that for news, the two are likely to be pretty much the same. For other content, however, it's possible that they do use the extended capabilities of the digital format and the subtitles are better in general.


@way-lo commented on GitHub (Dec 28, 2025):

Okay, thanks for the explanation! Searching more, I found the help section that is exactly what you described:

Notes on the CEA-708 decoder:
By default, ccextractor now extracts both CEA-608 and CEA-708 subtitles
if they are present in the input. This results in two output files: one
for CEA-608 and one for CEA-708.
To extract only CEA-608 subtitles, use -1, -2, or -12.
To extract only CEA-708 subtitles, use -svc.
To extract both CEA-608 and CEA-708 subtitles, use both -1/-2/-12 and -svc.

I don't see what option to specify before the -1, -2 or -svc however. Using "--output-field svc" results in an error. Can you clarify?


@cfsmp3 commented on GitHub (Dec 29, 2025):

> I don't see what option to specify before the -1, -2 or -svc however. Using "--output-field svc" results in an error. Can you clarify?

It's -svc 1 (if you want service 1)


@way-lo commented on GitHub (Dec 30, 2025):

Okay, I found where my issue is, after utilizing the ccxgui and examining the command construction.

"svc" is not accepted as an option on the command line. The full word "service" must be used instead.

(Similarly, I noticed -sc has been removed after 0.94 and is now replaced with "sentencecap")
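For anyone scripting this, the selection options discussed so far could be wrapped as below. This is only a sketch: `--output-field 1` is the option quoted at the top of this issue, while the double-dash spelling `--service` is an assumption based on "service" being the accepted word on 0.96; the binary name and the helper `build_args` are hypothetical and should be adjusted to the local install.

```python
import shlex


def build_args(input_path, mode, binary="ccextractor"):
    """Build an argument list that requests only one caption standard.

    mode "608": CEA-608 field 1 only (option taken from this thread).
    mode "708": CEA-708 service 1 only (the "--service" spelling is an assumption).
    """
    if mode == "608":
        selection = ["--output-field", "1"]
    elif mode == "708":
        selection = ["--service", "1"]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [binary, input_path, *selection]


# Print the command lines instead of running them; pass the list to
# subprocess.run(args, check=True) to actually invoke the tool.
for mode in ("608", "708"):
    print(shlex.join(build_args("recording.ts", mode)))
```

Building a list rather than a single string avoids quoting problems with file names that contain spaces, which are common for recorded TV files.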


@cfsmp3 commented on GitHub (Dec 30, 2025):

> "svc" is not accepted as an option on the command line. The full word "service" must be used instead.

Thanks for this! We've added svc as an alias for service just now.

> (Similarly, I noticed -sc has been removed after 0.94 and is now replaced with "sentencecap")

I don't know if sc was used often. Should we add this alias too?


@way-lo commented on GitHub (Dec 30, 2025):

Thanks for the quick fix.

Re: -sc vs -sentencecap, -sc is still referenced on some of the older tutorials online, such as here:
https://ccextractor.org/public/general/command_line_usage/

(I recognize that's using the old 0.88 help screen, and a lot has changed since then).

I might recommend adding it in, if only to keep the old naming alive and consistent.

I agree it's probably not a frequently chosen option. I used to use it because the all caps of news is a bit grating, but somehow having no proper nouns capitalized is also funny to read. I wish there were widely available dictionary files of proper nouns to include. I tried making one just of popular English first names, and there are a LOT out there!
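To illustrate the idea of sentence capitalization combined with a proper-noun dictionary, here is a small standalone Python sketch. It is not how CCExtractor implements the option; the function `sentence_case` and the word list are assumptions made for the example. It lowercases all-caps caption text, capitalizes the first letter of each sentence (including after a ">>" or ">>>" speaker marker), and then restores any word found in the supplied list.

```python
import re

# Start of text, end of a sentence, or a ">>" / ">>>" speaker-change marker,
# followed by the lowercase letter that should be capitalized.
SENTENCE_START = re.compile(r"(^|[.!?]\s+|>>>?\s*)([a-z])")
WORD = re.compile(r"[A-Za-z]+")


def sentence_case(text, proper_nouns):
    """Convert all-caps text to sentence case, restoring listed proper nouns."""
    lookup = {word.lower(): word for word in proper_nouns}
    lowered = text.lower()
    capitalized = SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(), lowered
    )
    # Replace any word that appears in the dictionary with its listed spelling.
    return WORD.sub(lambda m: lookup.get(m.group(0).lower(), m.group(0)), capitalized)


caption = ">>> BREAKING TONIGHT, THE STATE OF EMERGENCY HERE IN CALIFORNIA."
print(sentence_case(caption, ["California"]))
```

The dictionary lookup is purely word-based, so entries that are also ordinary words (a first name like "Rose", or "US") would be capitalized everywhere. That is part of why a useful proper-noun list is harder to build than it first appears.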

Reference: starred/ccextractor#866