Broken text (video sample available) #158

Closed
opened 2026-01-29 16:36:35 +00:00 by claunia · 11 comments
Owner

Originally created by @SviMik on GitHub (Jun 6, 2016).

Originally assigned to: @bigharshrag on GitHub.

I'd like to send you a video sample that doesn't work well with CCExtractor, in the hope that you can fix it or tell me what's wrong with the file.

Download link:
https://docs.google.com/uc?id=0B5foVtK-k1OPUEEtSmQyTEZDLUU&export=download

I've tried this command:
ccextractor -noteletext -12 -in=ts S06E11_raw_1080i.ts

And the output is here:
http://svimik.com/S06E11_raw_1080i_1.srt

There are indeed some captions, but they are broken in places. Is the input file at fault, or is there something that CCExtractor can't understand?

I have also tried ffmpeg with the following command:
ffmpeg -f lavfi -i "movie=S06E11_raw_1080i.ts[out0+subcc]" -map s S06E11_ffmpeg.srt

ffmpeg did somewhat better (at least the text is clean, although some words and lines are still missing).
Here is ffmpeg output: http://svimik.com/S06E11_ffmpeg.srt
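To put a number on "some words and lines still missing", the two .srt outputs can be compared word by word. The following is a minimal sketch, not part of CCExtractor or ffmpeg: the helper names `srt_words` and `missing_words` are made up for illustration, and it assumes plain SubRip files whose cue counters are digit-only lines.

```python
import re
from collections import Counter

# Matches SubRip timing lines such as "00:00:01,000 --> 00:00:02,000".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")
# Matches inline style tags such as <font color="#ffffff"> or <i>.
TAG = re.compile(r"<[^>]+>")


def srt_words(srt_text):
    """Return the lowercase words of all caption lines, in order (hypothetical helper)."""
    words = []
    for line in srt_text.splitlines():
        line = line.strip()
        # Skip blank lines, cue counters, and timing lines.
        # Limitation: a caption consisting only of digits is skipped too.
        if not line or line.isdigit() or TIMING.search(line):
            continue
        line = TAG.sub("", line)
        words.extend(re.findall(r"[a-z0-9']+", line.lower()))
    return words


def missing_words(reference, candidate):
    """Count words that occur more often in reference than in candidate."""
    return dict(Counter(reference) - Counter(candidate))
```

Running `missing_words(srt_words(ffmpeg_text), srt_words(ccextractor_text))` and then swapping the arguments gives a rough idea of which extractor dropped which words. It ignores word order and timing, so it is only a first-pass check.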


@cfsmp3 commented on GitHub (Jul 5, 2016):

I thought this was related to CEA-708 but I'm not sure any more... ffprobe shows this:

Input #0, mpegts, from 'My.Little.Pony.Friendship.is.Magic.S06E11.HDTV.1080i.DD5.1.MPEG-2.ts':
Duration: 00:22:00.71, start: 0.482178, bitrate: 9605 kb/s
Program 1
Stream #0:0[0x20e]: Video: mpeg2video (Main) ([2][0][0][0] / 0x0002), yuv420p(tv), 1920x1080 [SAR 1:1 DAR 16:9], max. 80000 kb/s, 29.97 fps, 59.94 tbr, 90k tbn, 59.94 tbc
Stream #0:1[0x20f](eng): Audio: ac3 (AC-3 / 0x332D4341), 48000 Hz, 5.1(side), fltp, 384 kb/s

So no subtitles, but VLC shows CC1 up to CC4

And there's some text...
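ffprobe listing only a video and an audio stream is consistent with captions being present: CEA-608/708 data normally travels inside the MPEG-2 video stream as picture user data rather than as a separate subtitle stream. The sketch below counts user data blocks in a buffer of video elementary-stream bytes and checks for the ATSC A/53 `GA94` identifier followed by the caption type code. The function name `count_user_data` is hypothetical, and the buffer is assumed to be already demultiplexed: running it over raw .ts bytes would miss start codes that straddle transport packets.

```python
USER_DATA_START = b"\x00\x00\x01\xb2"  # MPEG-2 user_data_start_code
ATSC_ID = b"GA94"                      # ATSC A/53 identifier
CC_DATA_TYPE = b"\x03"                 # user_data_type_code for cc_data()


def count_user_data(es_bytes):
    """Return (total user data blocks, blocks carrying ATSC caption data)."""
    total = 0
    atsc_cc = 0
    pos = es_bytes.find(USER_DATA_START)
    while pos != -1:
        total += 1
        header = es_bytes[pos + 4 : pos + 9]
        # "GA94" followed by type code 0x03 marks an ATSC cc_data() block.
        if header[:4] == ATSC_ID and header[4:5] == CC_DATA_TYPE:
            atsc_cc += 1
        pos = es_bytes.find(USER_DATA_START, pos + 4)
    return total, atsc_cc
```

A non-zero second value suggests the video carries ATSC-style captions even though no subtitle stream is listed. SCTE-20 user data uses a different layout and is not counted separately here.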


@SviMik commented on GitHub (Jul 5, 2016):

Yes, there are some subtitles, but I'm not sure about the format.
The latest version from git extracts much better (the garbage is gone), but some words are still missing (VLC has the same problem).
Is the file broken, or is it just some unknown encoding?


@ghost commented on GitHub (Nov 29, 2016):

As of 0.82 on Windows 10, the output is now in plaintext. However, a lot of the text is quite garbled. No warnings or segfaults.

Console output:

--------------------------------------------------------------------------
Input: My.Little.Pony.Friendship.is.Magic.S06E11.HDTV.1080i.DD5.1.MPEG-2.ts
[Extract: 12] [Stream mode: Transport]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]

-----------------------------------------------------------------
Opening file: My.Little.Pony.Friendship.is.Magic.S06E11.HDTV.1080i.DD5.1.MPEG-2.ts
Analyzing data in general mode
Creating My.Little.Pony.Friendship.is.Magic.S06E11.HDTV.1080i.DD5.1.MPEG-2_1.srt
Creating My.Little.Pony.Friendship.is.Magic.S06E11.HDTV.1080i.DD5.1.MPEG-2_2.srt


New video information found
[1920 * 1080] [AR: 03 - 16:9] [FR: 04 - 29.97] [progressive: no]

100%  |  22:00
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Total frames time:        00:22:00:619  (39579 frames at 29.97fps)
incl. pulldown frames:  00:01:06:266  (1986 frames at 29.97fps)

Min PTS:                                00:00:00:482
Max PTS:                                00:22:01:167
Length:                          00:22:00:685

Initial GOP time:          00:00:00:000
Final GOP time:          00:22:00:233+13F
Diff. GOP length:          00:22:00:233+13F     (00:22:00:666)


Total user data fields: 112596
SCTE-20 type user data fields: 37532
HDTV type user data fields: 37532
Done, processing time = 2 seconds
This is beta software. Report issues to carlos at ccextractor org...

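Output file: My.Little.Pony.Friendship.is.Magic.S06E11.HDTV.1080i.DD5.1.MPEG-2_1.zip (https://github.com/CCExtractor/ccextractor/files/617810/My.Little.Pony.Friendship.is.Magic.S06E11.HDTV.1080i.DD5.1.MPEG-2_1.zip)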
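The frame timings in the console output can be cross-checked by hand, which helps rule out a timing problem as the cause of the garbling. The sketch below assumes that "29.97fps" means the exact NTSC rate 30000/1001 and that milliseconds are truncated. Both are assumptions about how the figures were produced, and `frames_to_timestamp` is a hypothetical helper, not a CCExtractor function.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # assumed exact value behind "29.97fps"


def frames_to_timestamp(frames, fps=NTSC_FPS):
    """Convert a frame count to HH:MM:SS:mmm, truncating milliseconds."""
    total_ms = int(Fraction(frames) / fps * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{ms:03d}"


print(frames_to_timestamp(39579))  # total frames reported in the log
print(frames_to_timestamp(1986))   # pulldown frames reported in the log
```

Under these assumptions the two calls reproduce the logged values 00:22:00:619 and 00:01:06:266, so the frame counts and durations in the log are internally consistent.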


@cfsmp3 commented on GitHub (Jan 11, 2017):

@SviMik what was used to produce this file? (hardware, software, source of the input)


@SviMik commented on GitHub (Jan 11, 2017):

Can't say. These files were produced by @SpazzDHN (https://twitter.com/SpazzDHN).
I've never actually been in contact with him (I tried once but got ignored; maybe you'll have more luck).
By the way, he sometimes publishes captions too, without any flaws. It's still a mystery to me how he extracts them.


@cfsmp3 commented on GitHub (Jan 20, 2017):

GSoC qualification: This issue gives 3 points.

If it is not solved or replied to by Feb 15, I'll close it here.


@himanshu-dixit commented on GitHub (Jan 20, 2017):

@bigharshrag are you working on this?


@cactusGit commented on GitHub (Jan 20, 2017):

CEA-608 is really corrupt in files from this ripper (I've checked the binary in #426), but CEA-708 looks OK.
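One way to quantify this kind of CEA-608 corruption, not necessarily the check used in #426, is to test the parity of the caption bytes: each CEA-608 byte carries 7 data bits plus an odd-parity bit in the most significant position. The sketch below assumes the 608 byte pairs have already been pulled out of the user data; `parity_error_rate` is a hypothetical helper name.

```python
def has_odd_parity(byte):
    """True if the 8-bit value has an odd number of set bits."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def parity_error_rate(pairs):
    """Fraction of bytes failing odd parity in an iterable of (byte1, byte2) pairs."""
    checked = 0
    errors = 0
    for byte1, byte2 in pairs:
        for value in (byte1, byte2):
            checked += 1
            if not has_odd_parity(value):
                errors += 1
    return errors / checked if checked else 0.0
```

A clean stream should give a rate near zero. A rate close to 0.5 would suggest the bytes are effectively random, or that the parity bit was stripped somewhere in the capture chain.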


@mahalwal commented on GitHub (Dec 24, 2017):

@cfsmp3 @SviMik The sample is not available.


@SviMik commented on GitHub (Dec 25, 2017):

@mahalwal I sent the sample to your email.


@cfsmp3 commented on GitHub (Jan 12, 2018):

Closing because the link is no longer available.


Reference: starred/ccextractor#158