Mirror of https://github.com/CCExtractor/ccextractor.git (synced 2026-02-17 05:25:33 +00:00)
Broken text (video sample available) #158
Originally created by @SviMik on GitHub (Jun 6, 2016).
Originally assigned to: @bigharshrag on GitHub.
I'd like to send you a video sample that doesn't work well with CCExtractor, in the hope that you can fix it or tell me what's wrong with the file.
Download link:
https://docs.google.com/uc?id=0B5foVtK-k1OPUEEtSmQyTEZDLUU&export=download
I've tried this command:
ccextractor -noteletext -12 -in=ts S06E11_raw_1080i.ts
And the output is here:
http://svimik.com/S06E11_raw_1080i_1.srt
There are indeed some captions, but they are somehow broken. Is the input file at fault, or is there something in it that CCExtractor can't understand?
I have also tried ffmpeg with the following command:
ffmpeg -f lavfi -i "movie=S06E11_raw_1080i.ts[out0+subcc]" -map s S06E11_ffmpeg.srt
ffmpeg did somewhat better (at least the text is clean, although some words and lines are still missing).
Here is ffmpeg output: http://svimik.com/S06E11_ffmpeg.srt
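To put a number on "some words and lines still missing", the two subtitle files can be compared word by word. The following is only a rough sketch, not part of CCExtractor or ffmpeg: the helper names `srt_words`, `similarity`, and `missing_words` are hypothetical, and it assumes both outputs are ordinary SRT files with `-->` timing lines.

```python
import re
import sys
from difflib import SequenceMatcher

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,000".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")
# Matches simple markup such as <i>...</i> or <font ...>.
TAG = re.compile(r"<[^>]+>")


def srt_words(text):
    """Return the caption words of an SRT document, lowercased, in order."""
    words = []
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if TIMING.search(line):
                # Everything after the timing line is caption text.
                caption = TAG.sub("", " ".join(lines[i + 1:]))
                words.extend(re.findall(r"[a-z0-9']+", caption.lower()))
                break
    return words


def similarity(a_words, b_words):
    """Ratio in [0, 1] describing how closely two word sequences match."""
    return SequenceMatcher(None, a_words, b_words, autojunk=False).ratio()


def missing_words(reference, candidate):
    """Words of `reference` that were dropped or replaced in `candidate`."""
    matcher = SequenceMatcher(None, reference, candidate, autojunk=False)
    missing = []
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            missing.extend(reference[i1:i2])
    return missing


if __name__ == "__main__":
    # Usage: python compare_srt.py reference.srt candidate.srt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        ref = srt_words(f.read())
    with open(sys.argv[2], encoding="utf-8", errors="replace") as f:
        cand = srt_words(f.read())
    print(f"similarity: {similarity(ref, cand):.3f}")
    print(f"words missing from candidate: {len(missing_words(ref, cand))}")
```

Run with the ffmpeg output as the reference and the CCExtractor output as the candidate (or the other way round) to see which tool drops more text. Since neither file here is known to be complete, the result is only a relative comparison, not a measure against the true captions.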
@cfsmp3 commented on GitHub (Jul 5, 2016):
I thought this was related to CEA-708 but I'm not sure any more... ffprobe shows this:
Input #0, mpegts, from 'My.Little.Pony.Friendship.is.Magic.S06E11.HDTV.1080i.DD5.1.MPEG-2.ts':
Duration: 00:22:00.71, start: 0.482178, bitrate: 9605 kb/s
Program 1
Stream #0:0[0x20e]: Video: mpeg2video (Main) ([2][0][0][0] / 0x0002), yuv420p(tv), 1920x1080 [SAR 1:1 DAR 16:9], max. 80000 kb/s, 29.97 fps, 59.94 tbr, 90k tbn, 59.94 tbc
Stream #0:1[0x20f]: Audio: ac3 (AC-3 / 0x332D4341), 48000 Hz, 5.1(side), fltp, 384 kb/s
So no subtitle stream is listed, but VLC shows CC1 through CC4, and there's some text...
@SviMik commented on GitHub (Jul 5, 2016):
Yes, there are some subtitles, but I'm not sure about the format.
The latest version from git extracts much better (the garbage is gone), but some words are still missing (VLC has the same problem).
Is the file broken, or is it just some unknown encoding?
@ghost commented on GitHub (Nov 29, 2016):
As of 0.82 on Windows 10, the output is now in plaintext. However, a lot of the text is quite garbled. No warnings or segfaults.
Console output:
Output file: My.Little.Pony.Friendship.is.Magic.S06E11.HDTV.1080i.DD5.1.MPEG-2_1.zip
@cfsmp3 commented on GitHub (Jan 11, 2017):
@SviMik what was used to produce this file? (hardware, software, source of the input)
@SviMik commented on GitHub (Jan 11, 2017):
Can't say. These files were produced by @SpazzDHN.
I've never actually been in contact with him (I tried once, but got ignored; maybe you'll have more luck).
By the way, he sometimes publishes captions too, without any flaws. It's still a mystery to me how he extracts them.
@cfsmp3 commented on GitHub (Jan 20, 2017):
GSoC qualification: This issue gives 3 points.
If it is not solved or replied to by Feb 15, I'll close it here.
@himanshu-dixit commented on GitHub (Jan 20, 2017):
@bigharshrag are you working on this?
@cactusGit commented on GitHub (Jan 20, 2017):
CEA-608 is really corrupt in files from this ripper (I've checked the binary in #426), but CEA-708 looks OK.
@mahalwal commented on GitHub (Dec 24, 2017):
@cfsmp3 @SviMik The sample is not available.
@SviMik commented on GitHub (Dec 25, 2017):
@mahalwal I've sent the sample to your email.
@cfsmp3 commented on GitHub (Jan 12, 2018):
Closing because the link is no longer available.