[BUG] Failing to extract DVB subtitles from live stream (Failed to perform OCR) #452

Closed
opened 2026-01-29 16:44:20 +00:00 by claunia · 6 comments
Owner

Originally created by @jakubvojacek on GitHub (Oct 23, 2018).

CCExtractor version (using the --version parameter preferably) : 0.87

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [x] I have checked that the issue I'm posting isn't already reported.
  • [x] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues.
  • [x] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • [x] I have used the latest available version of CCExtractor to verify this issue exists.

My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [x] I absolutely love CCExtractor, but have not contributed previously.
  • [ ] I am an active contributor to CCExtractor.

Necessary information

  • Is this a regression (did it work before)? [x] NO | [ ] YES - please specify the last known working version
  • What platform did you use? [ ] Windows - [x] Linux - [x] Mac
  • What were the used arguments? It fails even with just -udp 239.1.2.3:1234 (unrelated, but originally I was testing with -ocrlang por -quant 0 -datapid 0x451 -out=webvtt -noru -trim -lf -nots -nobom -s -nofc -nogt)

Video links (replace text below with your links)
tnt.ts - https://goo.gl/r4WXto

Additional information
Interestingly, when running ccextractor on the file (ccextractor tnt.ts), it does produce a tnt.srt file with correct subtitles in it. However, it does print a whole bunch of errors.

But when tnt.ts is played out in a loop (for example tsplay tnt.ts 239.1.2.3:1234 -loop), ccextractor eventually fails (usually after anywhere from a few seconds to a minute).

root@jones:~/tnt# ccextractor   -udp 239.1.2.3:1234
CCExtractor 0.87, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: Network, 239.1.2.3:1234
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

----------------------------------------------------------------------
Reading from UDP socket 239.1.2.3:1234
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
TS continuity counter not incremented prev/curr 11/6
dvbsub_decode: incomplete, broken or empty packet, remaining bytes=3249, segment_length=3490
Return from dvbsub_decode: -1
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
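
The "TS continuity counter not incremented prev/curr 11/6" line in the log above refers to the 4-bit counter carried in every MPEG-TS packet header, which is expected to advance by one (modulo 16) per PID for each packet that carries payload. The sketch below is not CCExtractor code; it is a minimal stand-alone scanner, with the hypothetical names `find_continuity_gaps` and `make_packet`, that could be pointed at a capture such as tnt.ts to see where such jumps occur. It assumes 188-byte packets that are already aligned on the sync byte.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF


def find_continuity_gaps(data: bytes):
    """Return (packet_index, pid, prev_cc, curr_cc) for each unexpected counter jump."""
    last_cc = {}  # last counter value seen per PID
    gaps = []
    for index in range(len(data) // TS_PACKET_SIZE):
        pkt = data[index * TS_PACKET_SIZE:(index + 1) * TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            continue  # assumption: input is aligned; a real tool would resync here
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        has_payload = bool(pkt[3] & 0x10)  # adaptation_field_control, payload bit
        cc = pkt[3] & 0x0F
        if pid == NULL_PID or not has_payload:
            continue  # the counter only advances on packets that carry payload
        prev = last_cc.get(pid)
        # cc == prev is tolerated here because a packet may be sent twice
        if prev is not None and cc != (prev + 1) % 16 and cc != prev:
            gaps.append((index, pid, prev, cc))
        last_cc[pid] = cc
    return gaps


def make_packet(pid: int, cc: int) -> bytes:
    """Build a dummy payload-only TS packet (hypothetical helper for the demo)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (cc & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - len(header))


if __name__ == "__main__":
    # Four packets on PID 0x451 whose counters go 9, 10, 11, 6
    stream = b"".join(make_packet(0x451, cc) for cc in (9, 10, 11, 6))
    print(find_continuity_gaps(stream))  # prints [(3, 1105, 11, 6)]
```

A jump like this means at least one packet on that PID is missing or out of order at that point, which would be consistent with the "incomplete, broken or empty packet" message that follows it in the log. Whether the jump comes from the capture itself or from the loop point of the playout is not something this sketch can tell.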

Can you please look into what is wrong?

Thank you
Jakub

claunia added the OCR, DVB, and difficulty: medium labels 2026-01-29 16:44:20 +00:00

@cfsmp3 commented on GitHub (Jan 25, 2020):

@jakubvojacek Is this still a problem in current master?


@jakubvojacek commented on GitHub (Jan 26, 2020):

Hello @cfsmp3

I just tested with the current master (5f61fae0c7) and it's still happening; it is now reproducible on a static file too. If you download https://goo.gl/r4WXto, play it in VLC, and enable the Portuguese DVB subtitles, the subtitles are visible. Running ccextractor on it (plain ccextractor tnt.ts) throws the same errors as described above. I have attached the console output below.

root@ts:/opt/ccextractor# git rev-parse HEAD
5f61fae0c7dacb05e2f42d5647aafc59d3cd2ef6

root@ts:/opt/ccextractor# build/ccextractor /data/tnt.ts
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: /data/tnt.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: /data/tnt.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
TS continuity counter not incremented prev/curr 10/14
dvbsub_decode: incomplete, broken or empty packet, remaining bytes=2917, segment_length=3462
Return from dvbsub_decode: -1
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
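
The repeating pattern in this log ("box outside rectangle", then "box doesn't overlap pix", then a run of "pix not defined") reads as though a crop box falls entirely outside the subtitle bitmap, the crop yields no image, and every later call is handed nothing. The sketch below only models that situation; it is not Leptonica's or CCExtractor's implementation, and `Box` and `clip_box` are hypothetical names. It shows the kind of overlap check that would let a caller skip the bitmap before OCR is attempted.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def clip_box(box: Box, width: int, height: int) -> Optional[Box]:
    """Intersect box with the image rectangle (0, 0, width, height).

    Returns None when there is no overlap, so the caller can skip the
    bitmap instead of passing an undefined image further down.
    """
    x0 = max(box.x, 0)
    y0 = max(box.y, 0)
    x1 = min(box.x + box.w, width)
    y1 = min(box.y + box.h, height)
    if x1 <= x0 or y1 <= y0:
        return None
    return Box(x0, y0, x1 - x0, y1 - y0)


if __name__ == "__main__":
    # Assumed 720x576 bitmap, chosen only for illustration
    for candidate in (Box(10, 10, 50, 20), Box(700, 560, 50, 50), Box(800, 0, 10, 10)):
        clipped = clip_box(candidate, 720, 576)
        if clipped is None:
            print(candidate, "-> no overlap, skip this bitmap")
        else:
            print(candidate, "->", clipped)
```

Returning `None` explicitly makes the "nothing to OCR" case a normal outcome that the caller has to handle, rather than an error that surfaces several calls later.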

@jstrot commented on GitHub (Jul 7, 2024):

I'm having a similar issue:

$ ccextractor --output-field 1 --cc2 --out=srt --utf8 movie.vob -o subtitle.srt
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: movie.vob
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: movie.vob
File seems to be a program stream, enabling PS mode
Analyzing data in general mode


New video information found
[720 * 480] [AR: 02 - 4:3] [FR: 04 - 29.97] [progressive: no]

Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

I'm not familiar with the exact content of the vob file I'm working with. It could be that no CC is actually encoded in it, or that the file is corrupted, but mediainfo seems to think there is a CC3 (hence my using --output-field 1 --cc2):

Text
ID                                       : 224 (0xE0)-CC3
Format                                   : EIA-608
Muxing mode, more info                   : Muxed in Video #1
Duration                                 : 2 min 58 s
Start time (commands)                    : 1 s 248 ms
Start time                               : 2 s 183 ms
Bit rate mode                            : Constant
Stream size                              : 0.00 Byte (0%)
Count of frames before first event       : 58
Type of the first event                  : PopOn
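
For reference, the conventional EIA-608 naming puts CC1 and CC2 in field 1 and CC3 and CC4 in field 2. The lookup below is only a sketch of that convention, with the hypothetical names `CHANNELS` and `field_and_channel`. It assumes mediainfo's "CC3" label follows the same convention, and it says nothing about how ccextractor maps its own flags, which is worth checking against the tool's help output.

```python
# (field, channel within that field) for each conventional EIA-608 service name
CHANNELS = {
    "CC1": (1, 1),
    "CC2": (1, 2),
    "CC3": (2, 1),
    "CC4": (2, 2),
}


def field_and_channel(name: str):
    """Look up the field and channel for a service name such as 'CC3'."""
    return CHANNELS[name.upper()]


if __name__ == "__main__":
    field, channel = field_and_channel("CC3")
    print(f"CC3 is carried in field {field}, channel {channel}")
```

Under that convention a CC3 service would sit in field 2, so it may be worth trying a field 2 extraction as well before concluding the file carries no captions.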

@cfsmp3 commented on GitHub (Dec 21, 2025):

@jakubvojacek can you update the video link?


@cfsmp3 commented on GitHub (Dec 21, 2025):

@jstrot video sample?


@jstrot commented on GitHub (Jan 10, 2026):

@cfsmp3,

> @jstrot video sample?

Very sorry, I don't have a failing video at the moment. Hopefully, never again.

Thanks for the fix.

Reference: starred/ccextractor#452