mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-15 05:26:07 +00:00
OCR issue #54
Originally created by @okisseloff on GitHub (Jun 4, 2015).
Originally assigned to: @canihavesomecoffee, @Abhinav95 on GitHub.
I've found a problem with the OCR feature that affects these two samples: https://github.com/CCExtractor/ccextractor/issues/172 and https://github.com/CCExtractor/ccextractor/issues/151.
I compiled ccextractor with the OCR feature and extracted PNG subs using the -out=spupng option. The PNGs were extracted fine, but the srt file does not contain enough subtitles: some lines are missing. The first thing that stands out is that there are no multi-line subs in it. After that I found that some single-line subs are missing too.
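To put a number on "not enough subtitles", one option is to compare the number of cues in the srt file with the number of extracted PNGs. This is only a rough sketch: the function names are made up for illustration, and it assumes the PNGs sit in one directory and that the srt uses the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines.

```python
import re
from pathlib import Path

# Matches a standard SRT timing line, e.g. "00:00:01,000 --> 00:00:02,500".
TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}", re.MULTILINE
)

def count_srt_cues(srt_text):
    """Count subtitle cues in SRT text by counting timing lines."""
    return len(TIMING_LINE.findall(srt_text))

def count_pngs(png_dir):
    """Count the .png files directly inside png_dir (hypothetical layout)."""
    return sum(1 for _ in Path(png_dir).glob("*.png"))
```

If `count_pngs(...)` is noticeably larger than `count_srt_cues(...)`, the difference is roughly the number of images for which no text made it into the srt file.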
Then I took some of the PNG sources that are missing from the srt file and checked whether the tesseract CLI tool can recognize their text. Some of the multi-line sources were recognized well, and some could not be recognized. On top of that, lots of single-line sources could not be recognized either. Error messages appeared:
These led me to Tesseract's bug tracker https://code.google.com/p/tesseract-ocr/issues/detail?id=605, where they say it is a Leptonica issue.
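The manual check described above (running the tesseract CLI on each extracted PNG) could be scripted roughly like this. It is a sketch, not what I actually ran: it assumes the `tesseract` binary is on the PATH, uses the `tesseract <image> stdout` form to print the recognized text, and treats empty output as "not recognized". The function names and the `ocr` parameter are hypothetical, added so the OCR step can be swapped out.

```python
import subprocess
from pathlib import Path

def run_tesseract(png_path):
    """Run the tesseract CLI on one image and return the text it prints."""
    result = subprocess.run(
        ["tesseract", str(png_path), "stdout"],
        capture_output=True,
        text=True,
        check=False,  # a failing image should be reported, not raise
    )
    return result.stdout

def find_unrecognized(png_dir, ocr=run_tesseract):
    """Return the names of PNG files for which OCR produced no text."""
    failed = []
    for png in sorted(Path(png_dir).glob("*.png")):
        if not ocr(png).strip():
            failed.append(png.name)
    return failed
```

Calling `find_unrecognized("path/to/pngs")` would list the images worth inspecting by hand or attaching to a bug report.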
I am not very familiar with the OCR-related code in ccextractor, but some of you probably are.