mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-15 13:35:30 +00:00
[BUG] -oem option has no effect with tesseract v4 #586
Originally created by @hamelg on GitHub (Apr 29, 2020).
CCExtractor version: 0.88
tesseract version: 4.1.1
leptonica version: 1.79.0
Video links
https://app.box.com/s/mhu17q37hc4ofprneydfailktp70pi4l
Additional information
I found that the -oem option has no effect: if tesseract v4 is installed, CCExtractor silently forces the oem parameter to 1.
https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ocr.c#L182
With oem=1, I get bad results.
I fixed that by passing the -oem value from the command line (ccx_options.ocr_oem) to tesseract, and with oem=0 the result is now near perfect. On my subs, oem=0 gives the best results; oem=1 and oem=2 are useless.
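For illustration, the selection logic behind such a fix could look roughly like the sketch below. This is not the actual patch and not code from ocr.c: the helper name choose_ocr_oem, the use of -1 as a "not set" sentinel, and the fallback values are all assumptions. Only ccx_options.ocr_oem and the forced value 1 under tesseract v4 come from the report above.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of CCExtractor): decide which OCR engine
 * mode to hand to tesseract.
 *
 * requested_oem: value of the -oem option (ccx_options.ocr_oem in the
 *                report); -1 is ASSUMED here to mean "not given".
 * tess_version:  version string of the installed tesseract, e.g. "4.1.1".
 */
int choose_ocr_oem(int requested_oem, const char *tess_version)
{
	/* An explicit, valid request always wins (0, 1 or 2). */
	if (requested_oem >= 0 && requested_oem <= 2)
		return requested_oem;

	/* No usable request: keep the current default of 1 for tesseract v4. */
	if (tess_version != NULL && strncmp(tess_version, "4.", 2) == 0)
		return 1;

	/* ASSUMPTION: any other version falls back to the legacy engine (0). */
	return 0;
}
```

The returned value would then be passed to the tesseract init call in place of the hard-coded 1, so that -oem 0 is honoured while runs without -oem behave as before.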
@cfsmp3 commented on GitHub (May 7, 2020):
@hamelg this seems trivial to fix, send a PR? :-)