OCR doesn't work well in DVB when there are two lines at the same time #163

Closed
opened 2026-01-29 16:36:44 +00:00 by claunia · 0 comments

Originally created by @cfsmp3 on GitHub (Jun 21, 2016).

Originally assigned to: @cfsmp3, @anshul1912 on GitHub.

In DVB, a bitmap sometimes contains two lines of text. In this case, CCExtractor writes only the second line when using OCR.

However, the spupng output for the same bitmap is correct.

I enabled the DEBUG_OCR code, and the image passed to tesseract does indeed contain only one line of text, so the problem is not with tesseract. Why are we passing tesseract a different image than the one we use for spupng?
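One way to confirm this without inspecting the dumped images by eye is to count the bands of text rows in each bitmap and compare the count for the image handed to the OCR with the count for the image written by spupng. The following is only a minimal sketch under stated assumptions: `count_text_bands` is a hypothetical helper, not an existing CCExtractor function, and it assumes an 8-bit indexed bitmap stored row by row with a single known background index.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical diagnostic helper (not part of CCExtractor).
 *
 * Counts horizontal bands of "ink" in an 8-bit indexed bitmap. A row has ink
 * if any pixel differs from `background`. A new band starts at an ink row
 * that follows at least `min_gap` consecutive blank rows (the top edge of
 * the image is treated as a wide gap). With a sensible `min_gap`, two lines
 * of subtitle text should yield 2 and a single line should yield 1.
 */
int count_text_bands(const uint8_t *pixels, int width, int height,
                     uint8_t background, int min_gap)
{
    int bands = 0;
    int blank_run;

    if (min_gap < 1)
        min_gap = 1; /* a gap of zero rows would count every ink row */
    blank_run = min_gap;

    for (int y = 0; y < height; y++) {
        int has_ink = 0;

        for (int x = 0; x < width; x++) {
            if (pixels[(size_t)y * (size_t)width + (size_t)x] != background) {
                has_ink = 1;
                break;
            }
        }

        if (has_ink) {
            if (blank_run >= min_gap)
                bands++; /* first ink row after a wide enough gap */
            blank_run = 0;
        } else {
            blank_run++;
        }
    }
    return bands;
}
```

If the spupng image reports two bands and the OCR input reports one, the first line is being lost before tesseract is called. The `min_gap` threshold is a judgment call: too small a value can split a single line at thin blank rows, and too large a value can merge two closely spaced lines.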


Reference: starred/ccextractor#163