mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-14 21:23:42 +00:00
DVB: Incorrect timing #172
Originally created by @cfsmp3 on GitHub (Jul 5, 2016).
Originally assigned to: @anshul1912, @Abhinav95 on GitHub.
Using the usual "01-BBC1.London.News.ts" and comparing with playback via VLC (which is perfect), I can see that our timing is off and we have a number of issues.
First, in .srt:
1
00:00:00,001 --> 00:00:00,000
Where am I? Where am I?!
Where am I?!
The first problem is obvious: the end time is before the start time for that subtitle.
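A check like the following could flag such entries automatically. This is only a minimal sketch: the helper names (`to_ms`, `find_inverted_cues`) are made up for illustration and are not part of CCExtractor.

```python
import re

# Matches an SRT timing line such as "00:00:00,001 --> 00:00:00,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)

def to_ms(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def find_inverted_cues(srt_text):
    """Return (start_ms, end_ms) for every cue whose end precedes its start."""
    bad = []
    for line in srt_text.splitlines():
        match = TIMING.search(line)
        if match:
            groups = match.groups()
            start, end = to_ms(*groups[:4]), to_ms(*groups[4:])
            if end < start:
                bad.append((start, end))
    return bad

# The first cue from the .srt output quoted above.
sample = """1
00:00:00,001 --> 00:00:00,000
Where am I? Where am I?!
Where am I?!
"""
print(find_inverted_cues(sample))  # [(1, 0)]
```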
In spupng that line looks like this:
The other thing is that the "Where am I..." text appears in VLC (in sync with audio, so they have it right) at around 00:04:xx. It doesn't appear immediately, as it does in both our .srt and .xml.
Moving forward, picking a random line:
"Is this all a joke to you?"
Appears in VLC at 08:48, but we have it at
00:08:40,307 --> 00:08:42,976
Is this all a joke to you?
The final subtitles in VLC are
Jane had squeezed my hand.
Yeah. . .was it just that?
which appear at 16:57. We have this at the end:
334
00:16:49,197 --> 00:16:51,966
Jane had squeezed my hand.
Yeah. . .was it just that?
335
00:16:51,967 --> 00:16:55,346
Or was it to do
with the sentencing as well?
The final frame (335) doesn't appear with VLC at all, and that's correct as the audio isn't there.
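To put numbers on the two comparisons above, here is the arithmetic as a short sketch. The VLC times are only given to the second, so the results are approximate, and two data points are not enough to say the offset is constant across the file.

```python
def to_ms(h, m, s, ms=0):
    """Convert a timestamp to milliseconds."""
    return ((h * 60 + m) * 60 + s) * 1000 + ms

observations = [
    # (line, VLC time read by eye, start time in our .srt)
    ("Is this all a joke to you?", to_ms(0, 8, 48), to_ms(0, 8, 40, 307)),
    ("Jane had squeezed my hand.", to_ms(0, 16, 57), to_ms(0, 16, 49, 197)),
]

# Positive offset means our .srt shows the line earlier than VLC does.
offsets = [vlc - ours for _, vlc, ours in observations]
for (text, _, _), offset in zip(observations, offsets):
    print(f"{text!r}: .srt is {offset / 1000:.3f} s early")
```

Both lines come out a little under 8 seconds early in the .srt.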
@cfsmp3 commented on GitHub (Aug 1, 2016):
That should make it obvious. You can see that when VLC shows the subtitles (in perfect sync with audio) CCExtractor is a few subtitle frames ahead.
@Vindictor commented on GitHub (Oct 29, 2016):
The other issue I'm finding when editing files that have run through OCR is that the outputted .srt file doesn't provide any gaps on screen when there's no talking.
What I mean is this.
I've just been editing a documentary series.
Firstly, no matter when the talking actually first occurs, the first line of dialogue is shown right at the start of the .srt.
Also, when the title music plays and there's no talking, instead of showing no subtitle, it shows whatever the next line of dialogue will be throughout the title sequence and right up until the line is actually spoken.
Another example is that in the last episode I looked at, there was a 1 minute section with no talking, while some wildlife was doing its thing.
Once again, after the last line of dialogue was spoken and the correct subtitle displayed, my .srt file then displays what will be the next line of dialogue for the entire 1 minute +, until the line is actually spoken.
The outputted .srt doesn't seem to support any gaps between speaking, and always shows the following line of text on screen.
I've been manually editing the .srt in Notepad++, setting the first line to start with wherever the actual dialogue starts. Then also changing the start time of the first line of dialogue after the title sequence, so that this line of text isn't on screen during the entire title sequence, and then I try to manually scroll through and spot where there seem to be any unnaturally long cases of a line of text remaining on screen.
EDIT: and if you're interested, here is the file I was referencing above.
https://drive.google.com/file/d/0B0DIrRkpdn12MmZBckw0N3cwNnc/view?usp=sharing
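The "no gaps" symptom described in this comment can be measured by computing the gap between the end of each cue and the start of the next. The sketch below uses cues 334 and 335 from the .srt excerpt in the original report; `gaps_ms` is a hypothetical helper, not a CCExtractor function.

```python
def to_ms(h, m, s, ms):
    """Convert a timestamp to milliseconds."""
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def gaps_ms(cues):
    """Return the gap between each cue's end and the next cue's start."""
    return [nxt[0] - cur[1] for cur, nxt in zip(cues, cues[1:])]

# Cues 334 and 335 from the original report, as (start_ms, end_ms).
cues = [
    (to_ms(0, 16, 49, 197), to_ms(0, 16, 51, 966)),
    (to_ms(0, 16, 51, 967), to_ms(0, 16, 55, 346)),
]
print(gaps_ms(cues))  # [1]
```

A gap of 1 ms means the first cue is held on screen until the instant the next one starts, which is the behaviour reported here.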
@cfsmp3 commented on GitHub (Nov 9, 2016):
Code-in task created.
@Vindictor commented on GitHub (Nov 27, 2016):
I've just tried extracting DVB subs from another file, but am having similar timing issues. This one from a DVB-S source.
Once again, there is no gap in the subtitles on screen, even if minutes go by with no dialogue. On this file the last line of text remains on screen until somebody else speaks. Take note of subtitle 9 from my .srt below:
1
00:00:00,001 --> 00:00:02,230
This programme contains
some strong language.
2
00:00:02,231 --> 00:00:04,480
Good morning. Please come this way.
3
00:00:04,481 --> 00:00:06,719
Thank you.
4
00:00:06,720 --> 00:00:13,289
Good morning, Mr Sodergren.
How are you?
5
00:00:13,290 --> 00:00:17,029
Your bicycle is ready.
6
00:00:17,030 --> 00:00:18,230
How are you doing today, sir?
Good to see you.
7
00:00:18,231 --> 00:00:21,309
Did you go running this morning?
I woke up very early this morning.
8
00:00:21,310 --> 00:00:27,210
Have a good day, sir.
The same to you, my friend.
9
00:00:27,211 --> 00:02:36,649
I'll be with you in just a moment.
10
00:02:36,650 --> 00:02:38,499
It's quite a short interview,
When using OCR to extract DVB subs, it never seems to allow for gaps between dialogue. One line of text will remain on screen until the next begins.
The line "I'll be with you in just a moment." remains on screen for approx 2 minutes, rather than for a few seconds.
Currently if I want to extract DVB-Subs I need to go through each file with Notepad++ and look for moments like these, and manually change the end time of a line of dialogue.
I've tried CCExtractor 0.82 and 0.83a3 (the latter was given to me to work with Humax PVR files), using the 0.82 GUI with the 0.83a3.exe file.
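Until the extraction itself is fixed, the manual end-time edits described above could be approximated with a script that caps how long any cue stays on screen. This is a workaround sketch only: the 7-second cap is an arbitrary assumption, not a CCExtractor setting, and it cannot recover the real end time, which only the stream carries.

```python
MAX_DURATION_MS = 7000  # hypothetical cap chosen for illustration

def to_ms(h, m, s, ms):
    """Convert a timestamp to milliseconds."""
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def fmt(ms):
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def cap_durations(cues, max_ms=MAX_DURATION_MS):
    """Shorten any cue that lasts longer than max_ms; leave the rest alone."""
    return [(start, min(end, start + max_ms)) for start, end in cues]

# Subtitles 8 and 9 from the .srt excerpt above, as (start_ms, end_ms).
cues = [
    (to_ms(0, 0, 21, 310), to_ms(0, 0, 27, 210)),
    (to_ms(0, 0, 27, 211), to_ms(0, 2, 36, 649)),
]
for start, end in cap_durations(cues):
    print(f"{fmt(start)} --> {fmt(end)}")
```

With these inputs, subtitle 8 is unchanged and subtitle 9 (about 129 seconds long in the original output) is cut to end at 00:00:34,211.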
@cfsmp3 commented on GitHub (Nov 29, 2016):
Direct link to 01-BBC1.London.News.ts:
https://drive.google.com/open?id=0B_61ywKPmI0TN3dERnRVazJIZ3c
@Vindictor commented on GitHub (Nov 30, 2016):
I've just been trying to extract subtitles from another programme, from a different source, and a different channel. (TvHeadend, DVB-T, Ch5HD, UK)
https://drive.google.com/file/d/0B03VrTibH96mQkZaZkNWQmR4XzQ/view?usp=sharing
Always the same timing issue regardless of source or channel, but only with DVB-Subs.
If I make a new recording from DVB-S in the UK, which also comes with teletext subtitles, then CCExtractor does a perfect job, even retaining colour information. I love it.
Sadly, when using DVB-T recordings (which only come with DVB-Subs), or DVB-S recordings that have been remuxed and the teletext lost, I always have these timing issues with DVB-Subs.
Really hoping you clever guys can solve it. I'd just love for the DVB-Subs to be as good as the teletext. The OCR part seems really good.
Thanks, and good night.
@JuanPotato commented on GitHub (Dec 7, 2016):
This seems like a fun task to take on. Once my current one is accepted I plan to work on this.
EDIT: anyone going to try this, it is quite fun
@cfsmp3 commented on GitHub (Dec 7, 2016):
Solve this problem and if you come to California I'll personally invite you and your family to lunch.
@JuanPotato commented on GitHub (Dec 11, 2016):
@cfsmp3 Maybe another task then; I don't have enough understanding of the code in the project to know how to really get things together.
@cfsmp3 commented on GitHub (Dec 11, 2016):
That's fine, still plenty of time before code-in is over. Maybe in 3 weeks or so from now you'll feel ready to give it another go.