mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-16 05:25:09 +00:00
[BUG] TESSDATA_PREFIX requires path separator at its end #534
Originally created by @NilsIrl on GitHub (Dec 29, 2019).
Necessary information
Video link
I hope that this reproduces with any file that uses tesseract (files that store subtitles as images). If it doesn't, it means that the location of the tesseract data is handled differently for different files.
I used the one from #1104 (https://edge1.motv.eu/telecine.ts)
Additional information
TESSDATA_PREFIX is an environment variable that points to the directory/folder containing the tessdata directory/folder. For some reason, ccextractor requires TESSDATA_PREFIX to finish with a /. It should work without one. e.g.
Should work but it doesn't.
@cfsmp3 commented on GitHub (Dec 29, 2019):
Feel free to fix :-)
in ocr.c
@NilsIrl commented on GitHub (Dec 29, 2019):
This environment variable isn't documented, so I found out about it by looking at ocr.c.
It should also be documented.
@cfsmp3 commented on GitHub (Dec 30, 2019):
TESSDATA_PREFIX is a tesseract environment variable, not ours (even though we use it).
@NilsIrl commented on GitHub (Dec 30, 2019):
Yes, but how is a user supposed to know they can use it? In the end, ccextractor implements it, so I believe it should be documented.
@cfsmp3 commented on GitHub (Dec 31, 2019):
Go ahead :-)
@NilsIrl commented on GitHub (Jan 2, 2020):
This is a regression from this line:
5dbbe654f0 (diff-06df1969161cf1684b04764b42380ce6R52)
@cfsmp3 commented on GitHub (Jan 2, 2020):
I'll let @anshul1912 comment and decide since it's his code and he knows what he's doing :-)
@cfsmp3 commented on GitHub (Jan 10, 2020):
@NilsIrl did you test with both tesseract 3 and 4?
@NilsIrl commented on GitHub (Jan 10, 2020):
yes
@anshul1912 commented on GitHub (Jan 15, 2020):
I think you will break version 4 on Ubuntu with it. It may work on NixOS but break Ubuntu.
What is the location of tessdata in your NixOS installation? If you are using only the environment variable TESSDATA_PREFIX, then as you can see in the function, first priority is given to the environment variable.
If there is a default tessdata location on NixOS but the environment variable is not set, then you must add that location to the probe function.
@Rahul-2k4 commented on GitHub (Dec 6, 2025):
Hi! I’ve prepared a fix for this issue.
It adds automatic normalization to TESSDATA_PREFIX so it no longer requires a trailing slash.
The solution is cross-platform, safe (uses a static buffer), and backward compatible.
Tests pass for all cases.
I’ll open a PR shortly.
@cfsmp3 commented on GitHub (Dec 21, 2025):
This issue was fixed in PR #1674, which was merged on 2025-03-13. The search_language_pack() function in ocr.c now automatically normalizes the path by adding a trailing slash if missing before appending tessdata/.
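For readers who want to see the idea, the normalization described above could look roughly like the sketch below. It is not the code from the PR or from search_language_pack(): the helper name build_tessdata_dir and its signature are hypothetical, and treating a backslash as an existing separator is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the actual ccextractor code.
 * Writes "<prefix>/tessdata/" into out, adding a separator after
 * prefix only when prefix does not already end with one.
 * Returns 0 on success, -1 on bad input or if the result does not fit. */
int build_tessdata_dir(const char *prefix, char *out, size_t out_size)
{
    if (prefix == NULL || out == NULL || out_size == 0)
        return -1;

    size_t len = strlen(prefix);
    if (len == 0)
        return -1;

    /* Assumption: both '/' and '\\' count as a trailing separator. */
    char last = prefix[len - 1];
    const char *sep = (last == '/' || last == '\\') ? "" : "/";

    int written = snprintf(out, out_size, "%s%stessdata/", prefix, sep);
    if (written < 0 || (size_t)written >= out_size)
        return -1; /* encoding error or truncated output */
    return 0;
}
```

With this helper, a prefix of /usr/share and a prefix of /usr/share/ both produce /usr/share/tessdata/, so the variable no longer has to end with a separator.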