mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-12 05:25:06 +00:00
[BUG] Not enough memory to initialize Tesseract #449
Originally created by @Aradmey on GitHub (Oct 22, 2018).
CCExtractor version (using the --version parameter preferably) : 0.85
In raising this issue, I confirm the following (please check boxes, eg [X] - and delete unchecked ones):
My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):
Necessary information
E:\CCExtractor\ccextractorwinfull.exe --gui_mode_reports -in=mp4 -autoprogram -out=srt -bom -unicode -hardsubx -subcolor white -conf_thresh 60 [+input files]
Additional information
Hello, I've tried using the program to extract burned-in subtitles from a .mp4 movie, but it seems to always show me this error: "Not enough memory to initialize Tesseract!"
Is there any solution known for this issue?
@MatejMecka commented on GitHub (Oct 22, 2018):
It seems your computer doesn't have the power to run Tesseract. Therefore there isn't any issue with CCExtractor but with your computer running it.
@saurabhshri commented on GitHub (Oct 22, 2018):
Could you please post the complete logs, along with the procedure you followed to compile CCExtractor?
@Aradmey commented on GitHub (Oct 22, 2018):
I doubt it, as my PC has 16GB.
I did not compile CCExtractor. I downloaded the binaries (GUI and command line programs) and later the installer itself. None of them worked.
All I did was open CCExtractor, select my file, select "With OCR" below, tick "Perform burned-in subtitle extraction", and press start, at which point I received the mentioned error.
@AntonOfTheWoods commented on GitHub (Nov 17, 2018):
This is also happening on Ubuntu 18.04 with ccextractor compiled from master using the tesseract from the normal repos. If I manually extract images using ffmpeg and run tesseract then there is no complaining about memory on my 8GB Dell XPS laptop.
Now my C++ is almost non-existent, but looking at the tesseract code, it looks like the error may have nothing to do with memory at all.
https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/hardsubx.c#L238
Assumes that any non-zero return value means "Not enough memory to initialize Tesseract", but I don't see anything in https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.cpp#L241 or https://github.com/tesseract-ocr/tesseract/blob/master/src/api/baseapi.h#L189 that suggests that non-zero is guaranteed to be memory related. It simply says:
I may well not be looking at the right place but it seems to me that this could well be something other than insufficient memory.
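To make the point concrete, here is a minimal sketch in C of how a caller could report the failure without asserting a cause. It is not CCExtractor's code: `format_tess_init_error` is a hypothetical helper, and it assumes only that `ret` is the integer returned by a Tesseract init call (for example the C API's `TessBaseAPIInit3`), where zero means success. The causes listed in the message are assumptions about what commonly goes wrong, not something the return code itself tells you.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of CCExtractor or Tesseract.
 * `ret`  : value returned by a Tesseract init call (0 is assumed to mean success).
 * `lang` : language code passed to the init call, or NULL if none was given.
 * Writes a human-readable message into `buf` and returns 0 on success,
 * -1 if initialization failed.
 */
int format_tess_init_error(int ret, const char *lang, char *buf, size_t buflen)
{
    if (ret == 0) {
        /* Nothing went wrong: leave an empty message. */
        if (buflen > 0)
            buf[0] = '\0';
        return 0;
    }

    /* Report the raw code and avoid claiming one specific cause. */
    snprintf(buf, buflen,
             "Failed to initialize Tesseract (return code %d, language \"%s\"). "
             "Possible causes include missing language data, an unsupported "
             "Tesseract version, or insufficient memory.",
             ret, lang ? lang : "(default)");
    return -1;
}
```

A caller would pass the result of the init call straight in and print `buf` only when the helper returns -1, so the user sees the actual return code rather than a guess about memory.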
@AntonOfTheWoods commented on GitHub (Nov 26, 2018):
@saurabhshri , do you have any ideas about this? Am I completely wrong in my interpretation of the code?
@saurabhshri commented on GitHub (Nov 26, 2018):
@AntonOfTheWoods No, you're not. It's not your machine. It has been reported previously, but they were able to solve it. Happy debugging :)
@cfsmp3 commented on GitHub (Nov 26, 2018):
OK, let's try to figure this one out... @Aradmey first, does it happen with all files, just some, or a specific one? Can you share one?
Have you tried in 0.87?
@AntonOfTheWoods commented on GitHub (Nov 30, 2018):
@Aradmey , was it you that was able to solve it or did you abandon CCExtractor?
@AntonOfTheWoods commented on GitHub (Nov 30, 2018):
@cfsmp3, I am using master rather than an official release version (like 0.87) so I can get support for tesseract 4 (the version available on Ubuntu 18.04). The git log suggests I need the HEAD of origin/master for that. Could this be simply a matter of tesseract 4 not being fully supported yet? I have also tried with the latest tesseract version from https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr and have the same error. Is it worth trying to get tesseract 3 installed and using 0.87? Thanks.
@cfsmp3 commented on GitHub (Nov 30, 2018):
Give tesseract 3 a try indeed. In any case it's going to be faster; tesseract 4 seems better at handling handwritten stuff, but for our use it doesn't seem like a great upgrade.
@AntonOfTheWoods commented on GitHub (Dec 1, 2018):
@Aradmey and @cfsmp3, I can confirm that manually compiling tesseract 3.05 on Ubuntu 18.04, compiling ccextractor at master, and pointing it to tesseract 3 gets rid of the error. I definitely think the error message could do with some improvement though!
@RobJacobson commented on GitHub (Dec 11, 2018):
I'm completely new to CCExtractor. I'm encountering the same issue when running 0.87 on Windows. My steps to reproduce:
1. Install the Windows installer for CCExtractor on Windows 10.
2. Run the GUI version with the following options:
   C:\Program Files (x86)\CCExtractor\ccextractorwinfull.exe --gui_mode_reports -out=srt -bom -latin1 -hardsubx -subcolor white -conf_thresh 60 [+input files]
3. When I click the "Start" button, I get the message "Not enough memory to initialize Tesseract."
Could this be a problem with Windows support for Tesseract? I noticed that the Windows version seems to be lagging behind the Linux version.
For what it's worth, I'm trying to use CCExtractor to make some HBO shows more accessible. The show "My Brilliant Friend" is spoken in Italian and has burned-in subtitles in English, but those aren't accessible for English-speaking blind users. Details below. If there's a way to OCR these subtitles, that would be completely amazing.
https://www.huffingtonpost.com/entry/hbo-discriminates-against-blind_us_5be073e1e4b04367a87f1cab
@anonynamja commented on GitHub (Jan 15, 2019):
Same issue as above, is there any solution yet? Run with different options? Many thanks.
@Pi7on commented on GitHub (Jan 22, 2019):
Same issue here
@bioluminesceme commented on GitHub (Mar 8, 2019):
Same issue.
Windows 10, Tesseract3 is installed and in my System PATH.
@DaniGTA commented on GitHub (Apr 11, 2019):
Is this already fixed?
@thelastpolaris commented on GitHub (Apr 12, 2019):
@DaniGTA I guess that this problem was already fixed by #1083, which changed the way Tesseract is initialized. Previously, if Tesseract failed to initialize for any reason, you got a memory error. #1083 made the initialization more stable. I kindly ask anybody who had this error to check again with CCExtractor's master.
@drodz11 commented on GitHub (May 9, 2019):
Hello,
I was having the same problem (error message while running 0.87 GUI - "Not enough memory to initialize Tesseract") so I cloned the master and compiled on Windows 10 using Visual Studio 2019 (Community) and the instructions given here. However, when I launch the new GUI I am seeing the following message:
I have tried compiling with both the Debug and Release configurations. Has anyone else had this problem or have an idea why the library can't be found?
@cfsmp3 commented on GitHub (Nov 21, 2021):
Closing - this seems fixed. Feel free to comment if anyone is having this problem in current master.