mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-04 05:44:53 +00:00
False + True entries for each subtitle from one TV channel #220
Originally created by @PFDuke on GitHub (Dec 22, 2016).
When I record TV from Australia's SBS channel and extract the subtitles to an SRT file, each subtitle has two entries. The first entry is false and has zero or near zero length while the second entry is correct. The text is identical. Since the false entries are so short, in practice they don't cause any problems, except to double the length of the SRT file and to make editing a bit more complicated. The main problem is that I have a tidy mind... :(
This problem does not occur with the other TV channels that I have tried.
I have fiddled with a few settings in CCExtractor GUI, but they didn't help.
Two examples are stored in my Dropbox account:
https://www.dropbox.com/sh/6mpifvmm6ofw2so/AABcHbTWNlGv0t4Sr28OrkKVa?dl=0
@saurabhshri commented on GitHub (Dec 30, 2016):
@PFDuke Would you please share the exact command you used to extract the subtitles? If you are using the GUI version, the command appears in the box below.
@PFDuke commented on GitHub (Dec 31, 2016):
I am using the GUI.
I was hoping that you could tell me if there is a problem with the
sample video files from this TV channel, and if you could tell me how to
avoid the duplicated subtitle entries.
Have you tried to extract .srt subtitles from my files?
Peter Duke
------ Original Message ------
From: "Saurabh Shrivastava" notifications@github.com
To: "CCExtractor/ccextractor" ccextractor@noreply.github.com
Cc: "PFDuke" pp.duke@bigpond.com; "Mention"
mention@noreply.github.com
Sent: 30/12/2016 7:43:19 PM
Subject: Re: [CCExtractor/ccextractor] False + True entries for each
subtitle from one TV channel (#562)
@saurabhshri commented on GitHub (Jan 2, 2017):
@PFDuke Yes I did, and indeed the two entries exist. I'll spend the day reading about SBS channel's CC specifications to see if it's intentional or there's some bug. Please do not remove the samples in the meantime. :)
@PFDuke commented on GitHub (Jan 2, 2017):
Thanks
I am hoping that there is some straightforward way to prevent the
double entries, but if not, there is always the text editor. :)
Peter
@saurabhshri commented on GitHub (Jan 2, 2017):
@PFDuke We can always add an option (enabled through a parameter) to completely remove subtitles whose length is zero. :) But I'll be keeping this as the last resort.
@saurabhshri commented on GitHub (Jan 2, 2017):
@PFDuke I thoroughly searched the web to find SBS's subtitling specification. While they boast of their state-of-the-art subtitles, I couldn't find those specifications anywhere. Also, they haven't replied to my email yet.
I was able to solve the issue though.
@cfsmp3 What do you think the ideal behaviour should be? Do you want me to add an extra parameter which will do this job? If yes, what should I name it? (-nozerolength ?)
Or should I make this change permanent and always ignore any subtitle with the same starting and ending timestamp (zero length)?
I think we should simply add a new parameter for the people who need it. This will help in conserving subtitle information which might be needed by some people.
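As a workaround that does not depend on any CCExtractor change, the same idea (drop any entry whose start and end timestamps coincide, or nearly coincide) can be applied to the finished SRT file. The following is a minimal sketch in Python, not CCExtractor's implementation: the function name `drop_short_cues` and the `min_ms` threshold are hypothetical, and it assumes well-formed SRT timing lines of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ms(h, m, s, ms):
    """Convert timestamp fields (as strings) to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def drop_short_cues(srt_text, min_ms=1):
    """Return srt_text without cues shorter than min_ms, renumbered from 1.

    Hypothetical helper: min_ms=1 drops only exact zero-length cues;
    a larger value also drops "near zero" cues.
    """
    # Cues are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    kept = []
    for block in blocks:
        lines = block.split("\n")
        # Locate the timing line (normally the second line of a cue).
        idx = next((i for i, ln in enumerate(lines) if TIMING.search(ln)), None)
        if idx is None:
            continue  # no timing line, so this is not a cue
        fields = TIMING.search(lines[idx]).groups()
        duration = _to_ms(*fields[4:]) - _to_ms(*fields[:4])
        if duration < min_ms:
            continue  # zero or near-zero length: drop it
        kept.append(lines[idx:])  # keep timing line and text, drop old index
    # Renumber the surviving cues so the file stays sequential.
    out = ["\n".join([str(n)] + lines) for n, lines in enumerate(kept, start=1)]
    return "\n\n".join(out) + "\n" if out else ""
```

Reading the extracted file, passing its text through `drop_short_cues`, and writing the result back would remove the duplicate entries. Raising `min_ms` (for example to 50) also catches the "near zero" entries, at the cost of possibly dropping a genuine very short subtitle.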
@PFDuke commented on GitHub (Jan 3, 2017):
I wonder whether this behaviour has something to do with the use of
their live captioning equipment. I have included the start of the Vienna
New Year's Day Concert, which was live captioned. You will see that the
captions are delayed and build up progressively. The two captioning
examples I first gave you did not have to be generated in real time, so
there is no great delay, but the zero length entry may be a hangover
from using the same equipment. Just a guess, however.
I feel awkward about asking for special treatment if no one else is
worried about my problem. In any case I think I should first look at
some more examples to see how predictable it is.
Thanks and Happy New Year yourself.
Peter Duke.
@saurabhshri commented on GitHub (Jan 4, 2017):
@PFDuke It's completely fine, and thanks for filing the bug. The issue appears to be deeper than I thought. If you could provide a few more samples, that would be great.
In the meantime, I have made a quick patch which should remove those "zero length" subtitles for you. If you are comfortable compiling your own version, here's the patch : https://github.com/saurabhshri/ccextractor/tree/BugFix
Simply clone this and build. Use the parameter -nonzerolength to remove those zero-length subtitles:
./ccextractor elephant.ts -nonzerolength