mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-14 21:23:42 +00:00
[BUG] Incorrect path for loading tesseract traineddata #746
Originally created by @ibrahim-akrab on GitHub (Mar 10, 2023).
CCExtractor version: 0.94
Necessary information
Video links
channel5-2018-02-12.ts from the TV Samples page
Additional information
ccextractor tries to load tesseract traineddata from a wrong location then blames it on the TESSDATA_PREFIX. Here's the output it produces:
I checked the logic in ocr.c and found that probe_tessdata_location works fine by tracing the syscalls it makes to each possible tessdata location by running strace -e trace=openat ./ccextractor ~/Downloads/channel5-2018-02-12.ts and the result is as follows:
It checks the paths correctly and stops when finding it at /usr/share/tessdata/ so I suspect the problem is possibly in the TessBaseAPIInit4 call.
Also for full reference, here's the complete output of ccextractor --version on my setup:
@ibrahim-akrab commented on GitHub (Mar 10, 2023):
I investigated it a bit more, and it turns out that the init_ocr function checks the version of tesseract installed and, for some reason, if it isn't major version 4, it doesn't pass the full path of the traineddata to the initialization.
I think it is this way because of support for old tesseract versions, since the same code appears in hardsubx.c, but there it treats versions 4.x and 5.x the same.
I just want confirmation from a maintainer before making that change, since it's my first contribution to the project; then I'll get to fixing the #929 issue.
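To make the version check discussed here concrete, below is a minimal sketch in C. It is not the actual code from ocr.c or hardsubx.c: both function names are hypothetical, and it assumes the version string has the form returned by Tesseract's TessVersion(), such as "4.1.1" or "5.3.0". The first helper models a check that only recognizes major version 4; the second models treating 4.x and 5.x the same.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not from the CCExtractor sources): models a check
 * that passes the full traineddata path only when the Tesseract version
 * string starts with "4.". A 5.x install falls through to the other branch. */
bool full_path_v4_only(const char *version)
{
    return strncmp(version, "4.", 2) == 0;
}

/* Hypothetical helper: models treating 4.x and 5.x the same, so that both
 * receive the full traineddata path. Older versions are still excluded. */
bool full_path_v4_or_v5(const char *version)
{
    return strncmp(version, "4.", 2) == 0 ||
           strncmp(version, "5.", 2) == 0;
}
```

With a version string such as "5.3.0", the first helper returns false and the second returns true, which matches the difference in behavior described above.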
@cfsmp3 commented on GitHub (Mar 10, 2023):
Go for it. We're updating both tesseract and FFmpeg (and others) to the latest version. We don't really care much about supporting old versions of anything anymore - if someone wants to run an old tesseract they can do it with an old CCExtractor.
@ibrahim-akrab commented on GitHub (Mar 11, 2023):
I think I should mark this issue as closed since its fix is now merged.