fix(ocr): Future-proof Tesseract version check for 6+

The Tesseract version check matched only the prefixes "4." and "5.",
so it would fail for future Tesseract versions (6, 7, etc.). The check
now uses TessVersion()[0] >= '4', which handles major versions 4
through 9. Because it compares only the first character, a two-digit
major version such as 10 would still be treated as older than 4.
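
A possible follow-up, sketched here under assumptions rather than taken
from this commit: parse the whole leading number instead of comparing a
single character, so that a two-digit major version is also handled.
The helper names tess_major_version and tess_is_4_or_newer are
hypothetical and do not exist in CCExtractor; the sketch only assumes
that the version string starts with the major number, as in "5.3.0".

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper (not part of CCExtractor): return the leading
 * major number of a version string such as "5.3.0" or "10.0.0".
 * Returns -1 if the string is NULL or does not start with a digit. */
static int tess_major_version(const char *version)
{
	if (version == NULL || !isdigit((unsigned char)version[0]))
		return -1;
	/* strtol stops at the first non-digit, i.e. the '.' separator */
	return (int)strtol(version, NULL, 10);
}

/* Nonzero when the Tesseract 4+ code path should be taken. */
static int tess_is_4_or_newer(const char *version)
{
	return tess_major_version(version) >= 4;
}
```

A call site would then read tess_is_4_or_newer(TessVersion()) in place
of the first-character comparison.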

Tesseract 4+ uses:
- Different tessdata path convention (appends /tessdata to path)
- Different default OEM mode (1 = LSTM vs 0 = Legacy)
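
The two differences above can be sketched as small helpers. This is an
illustration only: build_tessdata_path and resolve_oem are hypothetical
names, and the sketch assumes that a negative OEM value means "not set
by the user", which is what the `ocr_oem < 0` test in the diff suggests.

```c
#include <stdio.h>

/* Hypothetical helper: write "<base>/tessdata" into out.
 * Returns 0 on success, -1 if the result did not fit. */
static int build_tessdata_path(char *out, size_t out_size, const char *base)
{
	int n = snprintf(out, out_size, "%s/tessdata", base);
	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Hypothetical helper: keep an explicit OEM choice, otherwise fall back
 * to 1 (LSTM) on Tesseract 4+ and 0 (Legacy) on older versions. */
static int resolve_oem(int requested_oem, int is_tess4_or_newer)
{
	if (requested_oem >= 0)
		return requested_oem;
	return is_tess4_or_newer ? 1 : 0;
}
```

Checking the snprintf return value lets a caller detect a truncated
path instead of passing it on to Tesseract.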

The previous explicit check would cause future Tesseract versions to
incorrectly use the legacy Tesseract 3.x code path.

Addresses remaining concern from #1412

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Carlos Fernandez
2025-12-20 20:03:42 +01:00
parent 87c898497a
commit 24c0a2c2ee
2 changed files with 6 additions and 2 deletions

@@ -266,7 +266,9 @@ struct lib_hardsubx_ctx *_init_hardsubx(struct ccx_s_options *options)
 int ret = -1;
-if (!strncmp("4.", TessVersion(), 2) || !strncmp("5.", TessVersion(), 2))
+// Tesseract 4+ uses different tessdata path convention and default OEM mode
+// Use >= '4' check to handle future versions (5, 6, 7, etc.)
+if (TessVersion()[0] >= '4')
 {
 char tess_path[1024];
 if (ccx_options.ocr_oem < 0)

@@ -191,7 +191,9 @@ void *init_ocr(int lang_index)
 }
 ctx->api = TessBaseAPICreate();
-if (!strncmp("4.", TessVersion(), 2) || !strncmp("5.", TessVersion(), 2))
+// Tesseract 4+ uses different tessdata path convention and default OEM mode
+// Use >= '4' check to handle future versions (5, 6, 7, etc.)
+if (TessVersion()[0] >= '4')
 {
 char tess_path[1024];
 snprintf(tess_path, 1024, "%s%s%s", tessdata_path, "/", "tessdata");