False + True entries for each subtitle from one TV channel #220

Closed
opened 2026-01-29 16:38:23 +00:00 by claunia · 8 comments

Originally created by @PFDuke on GitHub (Dec 22, 2016).

When I record TV from Australia's SBS channel and extract the subtitles to an SRT file, each subtitle has two entries. The first entry is false and has zero or near zero length while the second entry is correct. The text is identical. Since the false entries are so short, in practice they don't cause any problems, except to double the length of the SRT file and to make editing a bit more complicated. The main problem is that I have a tidy mind... :(

This problem does not occur with the other TV channels that I have tried.

I have fiddled with a few settings in CCExtractor GUI, but they didn't help.

Two examples are stored in my Dropbox account:
https://www.dropbox.com/sh/6mpifvmm6ofw2so/AABcHbTWNlGv0t4Sr28OrkKVa?dl=0


@saurabhshri commented on GitHub (Dec 30, 2016):

@PFDuke Would you please share the exact command you used to extract the subtitles? If you are using the GUI version, the command appears in the box below.


@PFDuke commented on GitHub (Dec 31, 2016):

I am using the GUI.

I was hoping that you could tell me if there is a problem with the
sample video files from this TV channel, and if you could tell me how to
avoid the duplicated subtitle entries.

Have you tried to extract .srt subtitles from my files?

Peter Duke


@saurabhshri commented on GitHub (Jan 2, 2017):

@PFDuke Yes I did, and indeed the two entries exist. I'll spend the day reading about SBS channel's CC specifications to see if it's intentional or there's some bug. Please do not remove the samples in the meantime. :)


@PFDuke commented on GitHub (Jan 2, 2017):

Thanks

I am hoping that there is some straightforward way to prevent the
double entries, but if not there is always the text editor. :)

Peter


@saurabhshri commented on GitHub (Jan 2, 2017):

@PFDuke We can always add an option (enabled through a parameter) to completely remove subtitles whose length is zero. :) But I'll be keeping this as the last resort.


@saurabhshri commented on GitHub (Jan 2, 2017):

@PFDuke I thoroughly searched the web to find SBS's subtitling specification. While they boast of their state-of-the-art subtitles, I couldn't find those specifications anywhere. Also, they haven't replied to my email yet.

I was able to solve the issue though.

@cfsmp3 What do you think the ideal behaviour should be? Do you want me to add an extra parameter to do this job? If yes, what should I name it? (-nozerolength?)

Or should I make this change permanent and always ignore subtitles with the same starting and ending timestamp (zero length)?

I think we should simply add a new parameter for the people who need it. This will help in conserving subtitle information which might be needed by some people.
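
The check discussed here (drop a subtitle whose start and end timestamps coincide) can also be sketched as a post-processing step on the extracted SRT file. This is only an illustration in Python, not the CCExtractor change itself: `drop_short_cues` and `min_ms` are names made up for the sketch, it assumes ordinary `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines, and the sample cue text is invented.

```python
import re

# SRT timing line: "HH:MM:SS,mmm --> HH:MM:SS,mmm"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _ms(h, m, s, ms):
    """Convert the four timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def drop_short_cues(srt_text, min_ms=1):
    """Return srt_text without cues shorter than min_ms, renumbered from 1.

    Hypothetical helper: min_ms=1 removes only true zero-length cues;
    a larger value also removes "near zero" ones.
    """
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    kept = []
    for block in blocks:
        lines = block.split("\n")
        # Locate the timing line (normally the second line of a cue).
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                break
        else:
            continue  # no timing line found: skip the malformed block
        fields = match.groups()
        duration = _ms(*fields[4:]) - _ms(*fields[:4])
        if duration >= min_ms:
            kept.append("\n".join(lines[i:]))  # timing line plus text
    return "".join(f"{n}\n{cue}\n\n" for n, cue in enumerate(kept, 1))


if __name__ == "__main__":
    # Invented sample: a zero-length entry followed by the real one.
    sample = (
        "1\n00:00:01,000 --> 00:00:01,000\nHello\n\n"
        "2\n00:00:01,000 --> 00:00:03,000\nHello\n"
    )
    print(drop_short_cues(sample), end="")
```

Keeping the threshold as a parameter mirrors the idea of an opt-in switch: nothing is removed unless the user asks for it, so zero-length entries remain available to anyone who needs them.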


@PFDuke commented on GitHub (Jan 3, 2017):

I wonder whether this behaviour has something to do with the use of
their live captioning equipment. I have included the start of the Vienna
New Year's Day Concert, which was live captioned. You will see that the
captions are delayed and build up progressively. The two captioning
examples I first gave you did not have to be generated in real time, so
there is no great delay, but the zero length entry may be a hangover
from using the same equipment. Just a guess, however.

I feel awkward about asking for special treatment if no one else is
worried about my problem. In any case I think I should first look at
some more examples to see how predictable it is.

Thanks and Happy New Year yourself.

Peter Duke.


@saurabhshri commented on GitHub (Jan 4, 2017):

@PFDuke It's completely fine, and thanks for filing the bug. The issue appears to be deeper than I thought. If you could provide a few more samples, that would be great.

In the meantime, I have made a quick patch which should remove those "zero length" subtitles for you. If you are comfortable compiling your own version, here's the patch: https://github.com/saurabhshri/ccextractor/tree/BugFix

Simply clone this and build. Use the parameter `-nonzerolength` to remove those zero-length subtitles.

`./ccextractor elephant.ts -nonzerolength`

Reference: starred/ccextractor#220