mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-14 13:35:43 +00:00
Ripping captions from WTV files ignores all time stamps and commercials #210
Originally created by @Silver-Streak on GitHub (Nov 30, 2016).
I have an HTPC that I built, running Windows 8 with Windows Media Center. I use MCEBuddy to clean up my recorded WTV files, ripping captions to SRT for my wife, who is deaf and wants to be able to watch the videos throughout the house, and creating EDL files (which I store so I can try to tweak them later).
I noticed that all of a sudden all of my subtitles are WAY out of sync. Like, minutes. Doing some digging, it has happened in every file I've ripped CC from since I upgraded MCEBuddy. Digging into that, I saw they updated their version of CCExtractor. The previous version of MCEBuddy I had installed (2.3.5) had CCExtractor 0.66, whereas the newer version of MCEBuddy has CCExtractor 0.81.
Looking at the output files, it looks like the newly output caption files act as if the show starts at 0:00:00, which is not true, due to commercials/channel logos. Older caption files (from 0.66, I believe) include the commercials themselves and seem to be fully in sync. The new files don't appear to have captions at all for the commercials that have them, and timestamps are off by minutes in some cases.
I've eliminated MCEBuddy from the equation, and have been able to reproduce this behavior on a separate machine with 0.81 vs 0.66. I've attached what each version spits out. Any insight into what I'm doing wrong, or what could have changed, is greatly appreciated.
Created previously
062116 Arrow_S03E18 - Copysrt.txt
Created and recreated yesterday
112816 The Flash (2014)_S03E05 - Copysrt.txt
For reference, the first line in the video file this is ripped from doesn't start until 1 minute, 41 seconds.
@cfsmp3 commented on GitHub (Nov 30, 2016):
Sounds like you are not using the subtitle file on the same video file it
was extracted from? The subtitles are in sync with the source file, but if
you remove clips from that file and generate a new one with missing parts
of course the subtitles are going to be totally off sync.
@Silver-Streak commented on GitHub (Nov 30, 2016):
To clarify, these subtitle files are being used with the exact same video files they were ripped from. Previously, I had been automatically converting these files to MKV and leaving the subtitles as external files. I've halted all conversions until I can figure out what is happening with the CC timings. To provide additional clarity, these are being recorded from ATSC over-the-air broadcasts.
Unfortunately, I no longer have the source WTV for the Arrow episode, so I cannot show a 1:1 comparison, as I can no longer get any newly extracted captions to extract with correct timings.
@Silver-Streak commented on GitHub (Nov 30, 2016):
Here's a better example. I just reconverted one of the few WTV files I have kept that were good (as I do not reconvert Legends of Tomorrow episodes yet, as I haven't set up the rules within MCEbuddy).
Here is the original caption file.
Last line of text with time stamp:
"1082
01:01:26,951 --> 01:01:34,289
♪ ♪ "
Original LOT_S02E01srt.txt
Here's what 0.81 spat out with clean settings, from the exact same video file:
Last string and time stamp:
"
1082
00:56:05,230 --> 00:56:12,568
¶ ¶ "
Just created LOT_S02E01srt.txt
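To put a number on the difference between the two files quoted above, here is a minimal Python sketch that subtracts the 0.81 timestamps from the original ones for cue 1082. The helper names (`parse_srt_time`, `drift_seconds`) are hypothetical and only illustrate the arithmetic; they are not part of CCExtractor.

```python
import re
from datetime import timedelta

# SRT timestamps look like HH:MM:SS,mmm
SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")

def parse_srt_time(text: str) -> timedelta:
    """Convert an SRT timestamp string into a timedelta."""
    match = SRT_TIME.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {text!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=millis)

def drift_seconds(reference: str, candidate: str) -> float:
    """Seconds by which `candidate` runs ahead of `reference`."""
    return (parse_srt_time(reference) - parse_srt_time(candidate)).total_seconds()

# Cue 1082 as quoted in this comment: original file vs. 0.81 output.
start_drift = drift_seconds("01:01:26,951", "00:56:05,230")
end_drift = drift_seconds("01:01:34,289", "00:56:12,568")
print(start_drift, end_drift)
```

Both values come out to 321.721 seconds, so by the last cue the 0.81 output is roughly 5 minutes 22 seconds early, while the cue duration itself is unchanged.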
@cfsmp3 commented on GitHub (Nov 30, 2016):
We'd need the source video files. Best thing you can do is upload them to
Google Drive or Dropbox or any other place from which we can download
directly (we won't download from a place in which we need to create an
account or which offer bad speeds unless we pay) and we'll try to figure it
out.
@Silver-Streak commented on GitHub (Nov 30, 2016):
Edit for clarity:
I can definitely upload one, but the source video files are 4 gb each due to being 1080p recorded via HD OTA broadcast. I have the bandwidth to do it if it'll still meet what you all need, and as long as it won't kill anyone's bandwidth cap?
If so, I can upload the legends of tomorrow episode listed above, since it is still the original and I have the original SRT.
@cfsmp3 commented on GitHub (Nov 30, 2016):
Yes, we're used to dealing with mammoth files.
@Silver-Streak commented on GitHub (Nov 30, 2016):
You got it. Uploading now, will post link tonight or tomorrow once it's done. Thanks for your time and assistance.
@cfsmp3 commented on GitHub (Dec 15, 2016):
Problem confirmed in 0.83, we'll look into this.
@Silver-Streak commented on GitHub (Dec 15, 2016):
@cfsmp3 Thanks a ton. I'll go ahead and remove the test file for now. Let me know if you need any other clips or if I can assist with testing anything.
@cfsmp3 commented on GitHub (Dec 15, 2016):
Wait wait please don't remove the file, otherwise other developers can't
work on it.
@Silver-Streak commented on GitHub (Dec 15, 2016):
@cfsmp3 Sure thing. Reposting link as I deleted the original comment: https://drive.google.com/open?id=0B-7WslMx37HXV0NyYVF1QndxOWM
@cfsmp3 commented on GitHub (Jan 11, 2017):
Fixed in current master.
@Silver-Streak commented on GitHub (Jan 11, 2017):
Unfortunately I'm terrible at compiling. How long would I need to wait
until a binary with this master is made available?
@cfsmp3 commented on GitHub (Jan 11, 2017):
For Windows?
@Silver-Streak commented on GitHub (Jan 11, 2017):
Correct.
@cfsmp3 commented on GitHub (Jan 11, 2017):
Try this one. Not 100% sure all dependencies are there - let us know if it works OK for you.
ccextractor.0.85-full-RC1.zip
@Silver-Streak commented on GitHub (Jan 11, 2017):
That's fantastic. I can confirm it works (both in the capacity of
dependencies, and in captions). The video that would no longer produce
proper captions now produces proper captions. As soon as the final release
is out I'll bug the MCEBuddy folks to let them know. In the interim I'll
manually replace it in its install path.
Thank you and everyone else so much on this. My wife will be ecstatic.
@Silver-Streak commented on GitHub (Jan 11, 2017):
Hey Carlos,
So...bad news. Sorry to bother you with this.
The test file I provided definitely works now....other files I've thrown at
it are better, but still around 10ish seconds out of sync. I've tested with
3 other video files now and it seems to be the same deal.
Should I upload another test file that is not working?
-Silver
@cfsmp3 commented on GitHub (Jan 11, 2017):
Sure, send us samples and we'll analyze them... goal is 100% reliability.
@Silver-Streak commented on GitHub (Jan 12, 2017):
Thanks Carlos. Here's another test file:
https://drive.google.com/open?id=0B-7WslMx37HXZ3NRY3YtWTJWTEk
A good example issue: In this video, at 40:06 and ending at 40:07, the line
"Barry it doesn't matter what you've done" is said. In the SRT that the
compiled version you gave me spits out, this appears at 00:39:57,229 -->
00:39:58,562
@saurabhshri commented on GitHub (Jan 12, 2017):
@Silver-Streak Looks like Barry messed up with CCExtractor's timeline too.
@brooss commented on GitHub (Jan 12, 2017):
I'm not quite clear on what you're reporting here. Is this a new bug (it worked correctly on an older version but is wrong in the newest version)?
A quick test shows the same output with an old 0.70 build and the latest Git for me with the new test file you uploaded.
@Silver-Streak commented on GitHub (Jan 13, 2017):
@brooss To clarify, all of my exports worked fine on old versions; I believe they work correctly on 0.65 or 0.66. I'm going to confirm that and update now. For more detail, I've added as much as I could below:
Edit: Sadly, I can't test on old versions. I can't register CCExtractordump in Windows 10 it seems, so I can't go back that far. That said, is it possible this is related to this: https://github.com/CCExtractor/ccextractor/issues/641
So, the original test file now works fine with 0.85 RC1. The new test file,
ltest.wtv, does not. As mentioned in the above post, this is the scenario
I'm seeing.
you've done" is heard.
Center: Caption and audio are synced to the above time.
(from 0.85 RC1)
.srt at 00:39:57,229 --> 00:39:58,562
loaded, all of the captions are out of sync with the audio. The subtitle
srt text is ahead of the audio by ~10 seconds.
As a precaution, I also uploaded the output .srt here:
https://drive.google.com/open?id=0B-7WslMx37HXcTAzVk9tbkQ2bU0
Also, here is the output of the command line after running the above
command. One thing to notice, the source .wtv has a length of 62:58,
however the detected length in this output is 62:49:
"
D:\Temp\ccextractor>ccextractorwinfull ltest.wtv -o ltest.srt
CCExtractor 0.85, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
Input: ltest.wtv
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
Opening file: ltest.wtv
File seems to be a WTV, enabling WTV mode
Analyzing data in general mode
Creating ltest.srt
100% | 62:49
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0
Total frames time: 00:00:00:000 (0 frames at 29.97fps)
Min PTS: 00:00:11:067
Max PTS: 01:03:00:633
Length: 01:02:49:566
Done, processing time = 2 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
"
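The length mismatch noted above can be checked directly from the PTS values in the log. The following is a minimal Python sketch; `parse_pts` is a hypothetical helper, and the 62:58 container length is the figure reported in this comment rather than something read from the file. Whether this gap is related to the caption offset is not established here.

```python
from datetime import timedelta

def parse_pts(text: str) -> timedelta:
    """Parse a 'HH:MM:SS:mmm' value as printed in the log above."""
    hours, minutes, seconds, millis = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=millis)

min_pts = parse_pts("00:00:11:067")
max_pts = parse_pts("01:03:00:633")

# The log's "Length" line is simply Max PTS minus Min PTS.
length = max_pts - min_pts
print(length)  # 1:02:49.566000

# Reported length of the source .wtv (62:58), taken from the comment above.
reported = timedelta(minutes=62, seconds=58)
gap_seconds = (reported - length).total_seconds()
print(gap_seconds)
```

The gap works out to about 8.4 seconds, which is in the same range as the offset seen on the "Barry" line (40:06 heard versus 00:39:57,229 in the .srt).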
@brooss commented on GitHub (Jan 13, 2017):
Can you try with the -wtvmpeg2 flag on some more of your recordings?
ccextractorwinfull -wtvmpeg2 ltest.wtv -o ltest.srt
Some WTV recordings contain two sets of captions: one in a separate MSTV captions stream (which ccextractor uses by default) and one in the MPEG-2 video. This flag makes ccextractor use the alternative captions from the video stream.
In the sample you provided, the -wtvmpeg2 flag seems to give accurate timings; possibly the timings in the MSTV stream are simply wrong in the file (for some reason).
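For trying the flag on several recordings at once, a small Python wrapper along these lines may help. It is only a sketch: the executable name `ccextractorwinfull` and the `-wtvmpeg2` / `-o` arguments are taken from the command above, while the function names, the folder layout, and the assumption that a non-zero exit code means failure are illustrative.

```python
import subprocess
from pathlib import Path

def build_command(wtv_path: Path, exe: str = "ccextractorwinfull") -> list[str]:
    """Build the same command line as above, writing the .srt next to the .wtv."""
    srt_path = wtv_path.with_suffix(".srt")
    return [exe, "-wtvmpeg2", str(wtv_path), "-o", str(srt_path)]

def extract_all(folder: Path, exe: str = "ccextractorwinfull") -> list[Path]:
    """Run the extractor on every .wtv file in `folder`; return the files that failed."""
    failed = []
    for wtv_path in sorted(folder.glob("*.wtv")):
        result = subprocess.run(build_command(wtv_path, exe))
        # Assumption: a non-zero exit code signals that extraction failed.
        if result.returncode != 0:
            failed.append(wtv_path)
    return failed

# Show the command that would be run for the sample file.
print(build_command(Path("ltest.wtv")))
```

Calling `extract_all(Path(r"D:\Temp\ccextractor"))` would then process every recording in that folder, assuming the executable is on the PATH or in the working directory.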
@Silver-Streak commented on GitHub (Jan 14, 2017):
@brooss Well that is awesome, this works on the test file. Going to test it on some of the others I knew weren't working.
Any ideas on why this would impact only since version updates last year? Did older versions always use this flag?
Assuming this works on the other files, I can always update MCEBuddy's CCExtractor config to accommodate.
I'll post an update once I've tested with at least 2 other files that I know were bad.
@Silver-Streak commented on GitHub (Jan 14, 2017):
Okay, I ran through one episode each of 3 different shows. Without the -wtvmpeg2 flag, they all have timings off by ~10 seconds. With the flag, everything is pretty much spot on.
I'll definitely update my config files to include this.
So this gives me a few questions:
(Regardless, thank you all for your help)
@brooss commented on GitHub (Jan 19, 2017):
Not sure I'm the best person to answer this, but I'll try.
Best guess, this is probably a case of your WTV files having incorrect times on the MSTV captions (for some unknown reason) in the recordings. But it is possible this is a ccextractor bug still. We are dealing with a proprietary, undocumented format here so it's possible we are parsing something wrong that's causing the offset.
It would be possible to autodetect the presence of MPEG2 captions in WTV files I think. If I remember correctly, when I was implementing the WTV functionality, the MPEG2 captions were often incomplete or incorrect though, so I'm not sure it's desirable. The best case would be for somebody to study carefully how the Microsoft players detect and select which stream to use in which circumstances over a large selection of different WTV recordings from different regions and MCE versions/hardware. That's probably a quite large undertaking though.
Really difficult to say. At this point I'm assuming it's some bug at record time.
@Silver-Streak commented on GitHub (Jan 22, 2017):
@brooss Thank you for the input. I still can't figure out what would have changed on the recordings in that time frame.
I also found out that there doesn't seem to be a way to pass through commands to CCExtractor from MCEbuddy, which makes this harder. I've posted on their forums and will see if any options come up.
Thank you all for your help.
@Silver-Streak commented on GitHub (Jan 28, 2017):
Just going to post this here in case anyone else runs into the same issue I did. Here's how I solved adding this command in MCEBuddy. This will need to be added to your profile's section in profile.conf:
CustomCommandPath="C:\Program Files\MCEBuddy2x\ccextractor\ccextractorwin.exe"
CustomCommandParameters= -wtvmpeg2 "%sourcefile%" -o "%destinationpath%%showname%_S%season%##E%episode%##.srt"
CustomCommandHangPeriod=0
CustomCommandCritical=true
CustomCommandUISession=False
CustomCommandShowWindow=False