[BUG] MP4 box name as random video data at beginning of TS-recording trips format-detection #693

Closed
opened 2026-01-29 16:51:14 +00:00 by claunia · 4 comments

Originally created by @hurda on GitHub (Mar 8, 2022).

CCExtractor version: 0.94

In raising this issue, I confirm the following:

  • I have read and understood the contributors guide.
  • I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • I have checked that the issue I'm posting isn't already reported.
  • I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues
  • I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • I have used the latest available version of CCExtractor to verify this issue exists.

Necessary information

  • Is this a regression (i.e. did it work before)? NO
  • What platform did you use? WINDOWS
  • What were the used arguments? ccextractorwinfull.exe -autoprogram -out=srt -bom -utf8 file.ts

Video links

  • Is one needed?

Additional information

After running countless DVB recordings through ccextractor to extract the subtitles from the teletext, this file was the first that was not processed at all. Instead I got this console output:

>ccextractorwinfull.exe -out=srt -bom -utf8 file.ts
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: K:\file.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: K:\file.ts
Detected MP4 box with name: moov
File seems to be a MP4
Analyzing data with GPAC (MP4 library)
Opening 'K:\file.ts': [iso file] Incomplete box 0000B00D - start 0 size 479044969
[iso file] Incomplete file while reading for dump - aborting parsing
Failed to open input file (gf_isom_open() returned error)


Total frames time:        00:00:00:000  (0 frames at 29,97fps)
Done, processing time = 0 seconds

Forcing the input-file-format with -in=ts worked and the subtitle was created successfully, but I wanted to get down to the cause of the problem.

After going through the source and checking how the format-detection works, I saw that CCE is checking the video for certain strings to determine the format, at least that's how I understood it.
I opened the TS-file in a hex-editor and searched for moov:
[Screenshot "ccextractor_moov": hex-editor view of the moov match]
Position 727131 0xB185B
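
For anyone who wants to repeat that check without a hex editor, here is a minimal Python sketch. It assumes plain 188-byte TS packets that start at file offset 0 (not 192-byte M2TS), and the helper names `locate_in_ts` and `parse_ts_header` are made up for illustration; none of this is CCExtractor code.

```python
TS_PACKET_SIZE = 188  # assumption: standard TS packets, file starts on a packet boundary
SYNC_BYTE = 0x47


def locate_in_ts(offset, packet_size=TS_PACKET_SIZE):
    """Return (packet index, byte position inside that packet) for a file offset."""
    return divmod(offset, packet_size)


def parse_ts_header(packet):
    """Decode the fields of the 4-byte TS header that matter for this check."""
    if len(packet) < 4 or packet[0] != SYNC_BYTE:
        raise ValueError("not aligned on a TS packet (sync byte 0x47 missing)")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    adaptation_field_control = (packet[3] >> 4) & 0x03
    return {
        "pid": pid,
        "payload_unit_start": bool(packet[1] & 0x40),
        # 0b01 means "payload only, no adaptation field"
        "payload_only": adaptation_field_control == 0b01,
    }


# Under the assumptions above, the reported position maps to:
print(locate_in_ts(727131))  # -> (3867, 135)
```

Reading the 188 bytes of that packet from the file and passing them to `parse_ts_header` shows which PID the packet belongs to and whether it carries payload only.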

Luckily that was a payload-only TS-packet of the video-PID, so I was free to just change the text to something else.
Then I ran this modified file through ccextractor, which worked:

>ccextractorwinfull.exe -autoprogram -out=srt -bom -utf8 file_edit.ts
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: K:\file_edit.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: K:\file_edit.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
VBI/teletext stream ID 2701 (0xa8d) for SID 2004 (0x7d4)
- Programme Identification Data = ProSieben.at
- Universal Time Co-ordinated = Tue Mar  8 15:33:44 2022
Notice: Teletext page with possible subtitles detected: 149
- No teletext page specified, first received suitable page is 149, not guaranteed
100%  |  34:00
Teletext decoder: 51004 packets processed

Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Min PTS:                                25:29:18:443
Max PTS:                                26:03:18:563
Length:                          00:34:00:120
Done, processing time = 4 seconds
claunia added the bug, good-first-task, difficulty: easy labels 2026-01-29 16:51:14 +00:00

@canihavesomecoffee commented on GitHub (Mar 9, 2022):

Well, it's the first time in about 7 years that we've seen this kind of issue (the code for this was added by me in #165), so a sample would definitely be welcome.

Looks like my approach back then wasn't fully bulletproof.
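
To illustrate why a bare four-character match is fragile, here is a rough Python sketch. It is not the code from #165 and not the fix that was eventually merged. It assumes the detector works on a buffer holding the first bytes of the file; the function names, the list of box names and the eight-packet threshold are all hypothetical. The idea is simply to look for 0x47 sync bytes at 188-byte intervals before trusting an MP4 box name.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
MP4_BOX_NAMES = (b"ftyp", b"moov", b"mdat", b"free")  # hypothetical list for illustration


def looks_like_ts(data, min_packets=8):
    """True if sync bytes repeat every 188 bytes for min_packets packets in a row."""
    for start in range(min(TS_PACKET_SIZE, len(data))):
        positions = range(start, start + min_packets * TS_PACKET_SIZE, TS_PACKET_SIZE)
        if positions[-1] < len(data) and all(data[p] == SYNC_BYTE for p in positions):
            return True
    return False


def contains_box_name(data):
    """Naive check: does any MP4 box name occur anywhere in the buffer?"""
    return any(name in data for name in MP4_BOX_NAMES)


def guess_format(data):
    """Test the TS packet structure first, then fall back to the box-name scan."""
    if looks_like_ts(data):
        return "ts"
    if contains_box_name(data):
        return "mp4"
    return "unknown"
```

With this ordering, a transport stream whose video payload happens to contain the bytes `moov` is still classified as TS, while the naive scan on its own would report a box name.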


@hurda commented on GitHub (Mar 9, 2022):

Trimmed the file to the first megabyte, as that's what the format-autodetection is looking at, right?
https://www.mediafire.com/file/xt5s9pd6yj3hc4q/ccextractor_moov_ts.zip/file
Contains the original file and the edited version.
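
For reference, a trim like this can be done with a few lines of Python. This is only a sketch: the function name `trim_to_first_bytes` and the 1 MiB default are assumptions, and it optionally rounds the limit down to a whole number of 188-byte packets so the last TS packet is not cut in half.

```python
TS_PACKET_SIZE = 188


def trim_to_first_bytes(src, dst, limit=1024 * 1024, whole_packets=True):
    """Copy at most `limit` bytes from the start of src to dst; return bytes written."""
    if whole_packets:
        # Round the limit down to a multiple of the TS packet size.
        limit -= limit % TS_PACKET_SIZE
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        data = fin.read(limit)
        fout.write(data)
    return len(data)
```

Calling `trim_to_first_bytes("file.ts", "file_trimmed.ts")` would keep roughly the first megabyte, which still includes the position reported above.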


@canihavesomecoffee commented on GitHub (Mar 9, 2022):

Yes, that should be sufficient indeed. Thanks for the quick share.


@cfsmp3 commented on GitHub (Mar 21, 2023):

Closing since it seems fixed already (at least based on the merge, I haven't validated)


Reference: starred/ccextractor#693