[BUG] -oem option has no effect with tesseract v4 #586

Closed
opened 2026-01-29 16:48:30 +00:00 by claunia · 1 comment

Originally created by @hamelg on GitHub (Apr 29, 2020).

CCExtractor version: 0.88
tesseract version: 4.1.1
leptonica version: 1.79.0

  • Is this a regression (i.e. did it work before)? NO
  • What platform did you use? Linux

Video links

https://app.box.com/s/mhu17q37hc4ofprneydfailktp70pi4l

Additional information

I found out that the -oem option has no effect: if tesseract v4 is installed, ccextractor silently forces the oem parameter to 1.

https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ocr.c#L182

With oem=1, I get bad results.

I patched it so that the tesseract oem is taken from the -oem option passed on the CLI (ccx_options.ocr_oem), and now with oem=0 the result is near perfect. On my subs, oem=0 gives the best results; oem=1 and oem=2 are useless.

$ tesseract --help-oem
OCR Engine modes:
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.
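
For illustration, the fix described above could be sketched roughly as follows. This is not the actual patch or CCExtractor's code: the helper name select_ocr_oem is hypothetical, and the sketch assumes a negative value means that -oem was not given on the command line. It keeps the previous behaviour (oem=1 on tesseract 4.x) only as a fallback.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of CCExtractor): choose the tesseract
 * OCR engine mode from the installed tesseract version string and the
 * value requested with -oem.
 * Assumption: requested_oem < 0 means the option was not given.
 */
int select_ocr_oem(const char *tess_version, int requested_oem)
{
	/* Honor an explicit, valid -oem value (0..3, see `tesseract --help-oem`). */
	if (requested_oem >= 0 && requested_oem <= 3)
		return requested_oem;

	/* No valid value given: fall back to the old default,
	 * LSTM only (1) on tesseract 4.x, legacy engine (0) otherwise. */
	if (tess_version != NULL && strncmp(tess_version, "4.", 2) == 0)
		return 1;
	return 0;
}
```

With something like this, select_ocr_oem("4.1.1", ccx_options.ocr_oem) would return 0 when -oem 0 is passed, and the result could be handed to tesseract as the engine-mode argument of the init call instead of a hard-coded 1.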
claunia added the difficulty: easy label 2026-01-29 16:48:30 +00:00

@cfsmp3 commented on GitHub (May 7, 2020):

@hamelg this seems trivial to fix, send a PR? :-)


Reference: starred/ccextractor#586