Specifying tesseract language (Vietnamese) for OCR extraction - closed, new problem has arisen! #203

claunia · 2026-01-29T16:37:53Z

Originally created by @Ivan2309 on GitHub (Nov 28, 2016).

Hi again,

As I said, I've found a new problem and this one I'm having a lot of trouble with! I want to use OCR to extract Vietnamese subtitles, but I can't see how to tell ccextractor/tesseract which language to use. I have the tesseract VN training file, but ccextractor insists that I must use English and, quite rightly, complains that I don't have that data file.

I'm not very good at reading code, but I did try to look at some of the source files. It seems that there's a provision for a user-specified language, but I couldn't figure out how to specify it on the command line.

I've tried a few variations on the -svc option but that isn't working for me and I can't see anything else in the help file that seems relevant.

Any assistance will be very much appreciated.

Ivan.

claunia closed this issue

2026-01-29 16:37:53 +00:00

Reference: starred/ccextractor#203