Specifying tesseract language (Vietnamese) for OCR extraction - closed, new problem has arisen! #203

Closed
opened 2026-01-29 16:37:53 +00:00 by claunia · 0 comments

Originally created by @Ivan2309 on GitHub (Nov 28, 2016).

Hi again,

As I said, I've found a new problem, and I'm having a lot of trouble with this one! I want to use OCR to extract Vietnamese subtitles, but I can't see how to tell CCExtractor/Tesseract which language to use. I have the Tesseract Vietnamese training data file, but CCExtractor insists on English and, quite rightly, complains that I don't have that data file.
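The complaint above is what happens when Tesseract cannot find the data file for the requested language. The sketch below illustrates that kind of check; it is not CCExtractor's or Tesseract's code. It assumes Tesseract loads one `<code>.traineddata` file per language from a single tessdata directory (`vie` is the code for Vietnamese, `eng` for English), and that several codes can be combined with `+`. The helper names `traineddata_path` and `missing_languages` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def traineddata_path(tessdata_dir: str, lang: str) -> Path:
    # Assumed layout: one "<lang>.traineddata" file per language code
    # directly inside tessdata_dir (hypothetical helper).
    return Path(tessdata_dir) / f"{lang}.traineddata"


def missing_languages(tessdata_dir: str, langs: str) -> List[str]:
    # Tesseract-style language strings may join several codes with "+",
    # e.g. "eng+vie"; report every code whose data file is absent.
    return [
        lang
        for lang in langs.split("+")
        if not traineddata_path(tessdata_dir, lang).is_file()
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tessdata:
        # Simulate the setup described above: only the Vietnamese file exists.
        traineddata_path(tessdata, "vie").touch()
        print(missing_languages(tessdata, "eng"))  # ['eng']
        print(missing_languages(tessdata, "vie"))  # []
```

With only `vie.traineddata` installed, asking for `eng` fails while `vie` succeeds, which is why the fix has to come from a way to pass the language code through to Tesseract rather than from installing more files.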

I'm not very good at reading code, but I did look at some of the source files. There seems to be a provision for a user-specified language, but I couldn't figure out how to specify it on the command line.

I've tried a few variations of the -svc option, but that isn't working for me, and I can't see anything else in the help text that seems relevant.

Any assistance will be very much appreciated.

Ivan.


Reference: starred/ccextractor#203