Automatic videotext Subtitle extraction fails for some HD TS files #110

Closed
opened 2026-01-29 16:35:30 +00:00 by claunia · 4 comments
Owner

Originally created by @workflowsguy on GitHub (Feb 2, 2016).

When running CCExtractor 0.79 on those files, the output is as follows:

Opening file: /Volumes/Public/bearbeiten/Videos/extrahieren Untertitel/Konzerne klagen - Wir zahlen.eyetv/000000001c3bc241.mpg
Detected MP4 box with name: mfra

File seems to be a MP4

Analyzing data with GPAC (MP4 library)
Creating /Volumes/Public/bearbeiten/Videos/extrahieren Untertitel/Konzerne klagen - Wir zahlen.eyetv/Konzerne klagen - Wir zahlen.srt
opening '/Volumes/Public/bearbeiten/Videos/extrahieren Untertitel/Konzerne klagen - Wir zahlen.eyetv/000000001c3bc241.mpg': [iso file] Incomplete box en.R
failed to open


Total frames time:    00:00:00:000  (0 frames at 29.97fps)

Done, processing time = 57 seconds
This is beta software. Report issues to carlos at ccextractor org...
0

When running telxcc on the file, the subtitles are correctly extracted.

It seems that automatic detection of the input format fails in this case. Extraction works if the parameter "-in=ts" is added to the command line.

The MediaInfo information for this file is:

General
ID                                       : 1039 (0x40F)
Complete name                            : /Volumes/Public/bearbeiten/Videos/extrahieren Untertitel/Konzerne klagen - Wir zahlen.eyetv/000000001c3bc241.mpg
Format                                   : MPEG-TS
File size                                : 4.29 GiB
Duration                                 : 43mn 13s
Start time                               : UTC 2016-01-05 09:25:58
End time                                 : UTC 2016-01-05 10:09:11
Overall bit rate mode                    : Variable
Overall bit rate                         : 14.2 Mbps
Country                                  : DEU
Timezone                                 : +01:00:00

Video
ID                                       : 5411 (0x1523)
Menu ID                                  : 10376 (0x2888)
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 4 frames
Codec ID                                 : 27
Duration                                 : 43mn 13s
Bit rate                                 : 12.7 Mbps
Width                                    : 1 280 pixels
Height                                   : 720 pixels
Display aspect ratio                     : 16:9
Frame rate                               : 50.000 fps
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.275
Stream size                              : 3.82 GiB (89%)

Text #1
ID                                       : 5414 (0x1526)-100
Menu ID                                  : 10376 (0x2888)
Format                                   : Teletext
Language                                 : German

Text #2
ID                                       : 5415 (0x1527)
Menu ID                                  : 10376 (0x2888)
Format                                   : DVB Subtitle
Codec ID                                 : 6
Duration                                 : 43mn 1s
Delay relative to video                  : 2s 860ms
Language                                 : German

Author
Owner

@cfsmp3 commented on GitHub (Feb 2, 2016):

Probably just forcing ts as the input type would work. It seems the file is
being incorrectly detected as an MP4.

Reply to this email directly or view it on GitHub
https://github.com/CCExtractor/ccextractor/issues/273.

Author
Owner

@vinayakathavale commented on GitHub (Mar 9, 2016):

@workflowsguy Can you provide the sample file? A small sample cut from the original should be enough.

Author
Owner

@YorkHe commented on GitHub (Mar 14, 2016):

It might be related to a broken header, or to a file that was converted from MP4 to TS: in the file-detection code, the MP4 detection procedure runs before the TS detection procedure, and it did detect a valid sync code in a box.
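
For reference, a TS check can be sketched as below. This is only an illustration of the general technique, not CCExtractor's actual detection code: it relies on the fact that MPEG-TS packets are 188 bytes long and each starts with the sync byte 0x47. The function name `looks_like_ts` and the `min_packets` parameter are made up for this sketch.

```c
#include <stddef.h>

#define TS_PACKET_SIZE 188  /* fixed MPEG-TS packet length in bytes */
#define TS_SYNC_BYTE   0x47 /* first byte of every TS packet */

/* Hypothetical helper, not a CCExtractor function.
 * Returns 1 if buf contains min_packets consecutive 188-byte packets that
 * each begin with the sync byte, starting at some offset below 188.
 * Returns 0 otherwise. */
static int looks_like_ts(const unsigned char *buf, size_t len, size_t min_packets)
{
    if (min_packets == 0)
        return 0;

    /* The capture may not begin on a packet boundary, so try each offset. */
    for (size_t start = 0; start < TS_PACKET_SIZE; start++) {
        size_t n = 0;
        while (n < min_packets) {
            size_t pos = start + n * TS_PACKET_SIZE;
            if (pos >= len || buf[pos] != TS_SYNC_BYTE)
                break;
            n++;
        }
        if (n == min_packets)
            return 1;
    }
    return 0;
}
```

Running a check of this kind before (or in addition to) the MP4 box scan would let a file with a long run of aligned sync bytes be treated as TS even if a stray MP4 box name also appears in it. Whether that ordering suits CCExtractor is a design question for the maintainers.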

Author
Owner

@at25sep commented on GitHub (Mar 15, 2016):

The file-detection code finds only the "mfra" MP4 box in this video, which is a trailer, and that box is assigned weight 3 in ccx_stream_mp4_boxes. As a result, the program detects the file as MP4. The weight of the "mfra" box should be 1, so that other boxes are also needed to confirm that the file is MP4.
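
To illustrate the effect of the weight, here is a minimal sketch of weight-based scoring. It is not the real CCExtractor code: the struct `box_weight`, the functions `mp4_score` and `is_mp4`, the threshold value, and the weights given to "ftyp" and "moov" are all assumptions made for the example. Only the "mfra" weights 3 and 1 come from the comment above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry; the real ccx_stream_mp4_boxes layout may differ. */
struct box_weight {
    const char *name; /* four-character box type */
    int weight;       /* how strongly this box suggests an MP4 file */
};

/* Assumed threshold, chosen for this sketch only. */
#define MP4_THRESHOLD 3

/* Sum the weights of all detected box names that appear in the table. */
static int mp4_score(const struct box_weight *table, size_t table_len,
                     const char *const *found, size_t found_len)
{
    int score = 0;
    for (size_t i = 0; i < found_len; i++) {
        for (size_t j = 0; j < table_len; j++) {
            if (strncmp(found[i], table[j].name, 4) == 0) {
                score += table[j].weight;
                break;
            }
        }
    }
    return score;
}

/* The file is treated as MP4 once the summed score reaches the threshold. */
static int is_mp4(const struct box_weight *table, size_t table_len,
                  const char *const *found, size_t found_len)
{
    return mp4_score(table, table_len, found, found_len) >= MP4_THRESHOLD;
}
```

Under these assumed numbers, a lone "mfra" match with weight 3 reaches the threshold by itself, while with weight 1 it does not, and at least one further box has to be found before the file is classified as MP4.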

Reference: starred/ccextractor#110