mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-16 13:35:45 +00:00
Specifying tesseract language (Vietnamese) for OCR extraction - closed, new problem has arisen! #203
Originally created by @Ivan2309 on GitHub (Nov 28, 2016).
Hi again,
As I said, I've found a new problem, and this one is giving me a lot of trouble! I want to use OCR to extract Vietnamese subtitles, but I can't see how to tell CCExtractor/tesseract which language to use. I have the tesseract Vietnamese training file, but CCExtractor insists that I must use English and, quite rightly, complains that I don't have that data file.
I'm not very good at reading code, but I did try to look at some of the source files. It seems that there's a provision for a user-specified language, but I couldn't figure out how to specify it on the command line.
I've tried a few variations on the -svc option, but none of them worked for me, and I can't see anything else in the help output that seems relevant.
Any assistance will be very much appreciated.
Ivan.