mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-03 21:23:48 +00:00
CCextractor says "OCR subsystem not present" although compiled with OCR support #201
Originally created by @ghost on GitHub (Nov 24, 2016).
On a Linux system I have compiled version 0.82 of CCextractor with these lines:
cmake -DWITH_OCR=ON ../src/
make
sudo make install
but when I run this command
ccextractor -pn 7176 -codec dvbsub Pointless.ts
(program number taken from 'mediainfo')
CCextractor outputs, among other things, this:
Opening file: Pointless.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
Creating Pointless.srt
The generated Pointless.srt is 3 bytes long and contains this hex string "bbef 00bf".
I have installed leptonica-devel and tesseract-ocr-devel.
Have I missed something during the compilation or/and am I using the wrong parameter in my call of CCextractor?
@cfsmp3 commented on GitHub (Nov 24, 2016):
You need to build with OCR support, not just have the libraries installed.
@ghost commented on GitHub (Nov 25, 2016):
I thought I had, but apparently not. Anyway, I got it working by running these commands in /usr/local/src/ccextractor (a soft link to ccextractor.0.82):
cd build
cmake -DWITH_OCR=ON ../src/
cd ../linux/
make clean
make ENABLE_OCR=yes
make install
('make clean' only to start from a clean slate).
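One way to check a rebuilt binary before running it on a recording is to look at its version listing. In the listings posted later in this thread, the 0.94 binary configured with -DWITH_OCR=ON shows a "Tesseract Version:" line under "Libraries used by CCExtractor", while the two 0.88 listings do not. The sketch below relies on that observation only; the helper name `has_ocr_line` is made up for illustration, and a present line does not guarantee that OCR works at run time.

```shell
#!/bin/sh
# has_ocr_line: hypothetical helper, not part of CCExtractor.
# Reads a version listing on stdin and succeeds (exit status 0) if it
# contains a "Tesseract Version:" line, as seen in the 0.94 listing
# quoted in this thread.
has_ocr_line() {
    grep -q '^[[:space:]]*Tesseract Version:'
}

# Usage (assumes ccextractor is on PATH; add 2>&1 if the listing
# turns out to be written to stderr):
#   ccextractor --version | has_ocr_line && echo "Tesseract line present"
```

If the line is missing, the build was most likely configured without OCR, which matches the first half of this thread.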
Allow me an additional question: When I now run CCextractor I do get a .srt file but CCextractor complains a little:
Opening file: Pointless.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
dan.traineddata not found! Switching to English
swe.traineddata not found! Switching to English
fin.traineddata not found! Switching to English
Creating Pointless.srt
Using English trained data on Scandinavian texts makes for funny results!
The tesseract-ocr trained data is installed in /usr/share/tessdata/. So my additional question is actually two:
@ghost commented on GitHub (Nov 25, 2016):
I may have part of an answer to my question 1 above. When I ran 'strace' on CCextractor, I found that it looks in the current directory for the trained data:
openat(AT_FDCWD, "./tessdata/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
write(1, "dan.traineddata not found! Switc"..., 48dan.traineddata not found! Switching to English
but globally to find the English data:
open("/usr/share/tessdata/eng.traineddata", O_RDONLY) = 4
When I added a link from the current directory to /usr/share/tessdata, CCextractor stopped complaining about missing data.
It is inconvenient to have to add links to every directory where I have videos stored, so is this a fault or a feature?
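The link workaround described above can be scripted so it does not have to be repeated by hand. This is only a sketch of that workaround for the behaviour seen in the strace output (a lookup of ./tessdata/ in the current directory); the helper name `link_tessdata` and the `TESSDATA_DIR` variable are invented here, and the default path is the one mentioned in this thread.

```shell
#!/bin/sh
# link_tessdata: hypothetical helper, not part of CCExtractor.
# Creates DIR/tessdata as a symlink to the system tessdata directory,
# unless something with that name already exists in DIR.
# TESSDATA_DIR (an assumption of this sketch) overrides the default path.
link_tessdata() {
    dir=$1
    src=${TESSDATA_DIR:-/usr/share/tessdata}
    if [ -e "$dir/tessdata" ] || [ -L "$dir/tessdata" ]; then
        return 0    # leave an existing entry untouched
    fi
    ln -s "$src" "$dir/tessdata"
}

# Usage: link_tessdata /path/to/videos
```

Running CCextractor from a directory prepared this way should then find ./tessdata/, as it did after the manual link.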
@ykarim commented on GitHub (Nov 29, 2016):
@BentB could you please close this issue, as the original "OCR subsystem not present" problem is now resolved? You can open another issue for your new problem if it still exists.
@ghost commented on GitHub (Nov 29, 2016):
I have moved the above questions on tesseract data to a new issue, 448.
@anshul1912 commented on GitHub (Nov 30, 2016):
Please help us with a pull request.
It's neither a bug nor a feature; it's an incomplete implementation.
-Anshul
@ghost commented on GitHub (Nov 30, 2016):
@anshul1912 I'm not quite familiar with life here at Github so please expand a bit on what you mean by "Please help us with pull request". I know 'pull' from Git, but not in this context. Sorry about that.
@wojtekw commented on GitHub (Feb 4, 2020):
I get the same error, "OCR subsystem not present", on macOS, although leptonica and tesseract are installed on the system. CCX -v shows:
Version: 0.88
Git commit: Unknown
Compilation date: 2020-02-04
File SHA256: fa4b6f64af9f923a0fca842ae017a189740de63916188b8afa43e6c00acb07b5
Libraries used by CCExtractor
libGPAC Version: 0.7.2-DEV
zlib: 1.2.11
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.6.35
FreeType
libhash
nuklear
libzvbi
Do you know where the problem might be?
@rialg commented on GitHub (Feb 4, 2020):
Hello, I get the same issue on Ubuntu 16.04 using Tesseract 4.1.1, even after following @ghost's compilation guide.
Linux desktop 4.15.0-76-generic #86~16.04.1-Ubuntu SMP Mon Jan 20 11:02:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
CCExtractor detailed version info
Version: 0.88
Git commit: 6697ed3496
Compilation date: 2020-02-04
File SHA256: Could not open file
Libraries used by CCExtractor
libGPAC Version: 0.7.2-DEV
zlib: 1.2.8
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.2.54
FreeType
libhash
nuklear
libzvbi
@wojtekw commented on GitHub (Feb 5, 2020):
@cfsmp3 Can you reopen the issue?
@cfsmp3 commented on GitHub (Nov 21, 2021):
Closing as we've made a lot of changes in build lately so I don't know if this is still an issue or not
@wojtekw @rialg
let me know if it's still a problem in master
@kousthub97 commented on GitHub (Aug 4, 2023):
Hello, I am facing the same issue with tesseract 4.1.1 and leptonica-1.76.0. I tried compiling with the steps below and with @ghost's steps; neither has worked for me. Please let me know if any changes need to be made while compiling.
mkdir build
cd build
cmake -DWITH_OCR=ON -DWITHOUT_RUST=ON ../src/
make
I am using CentOS 8 for compiling. Below is the ccextractor --version output:
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
CCExtractor detailed version info
Version: 0.94
Git commit: 35e73c1c90
Compilation date: 2023-08-04
CEA-708 decoder: C
File SHA256: 08b9e909cc730e591a4331eef6dd45584a20e4a92c8dbf3fc37bf570f48ce79e
Libraries used by CCExtractor
Tesseract Version: 4.1.1
Leptonica Version: leptonica-1.76.0
libGPAC Version: 1.0.1
zlib: 1.2.11
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.6.37
FreeType
libhash
nuklear
libzvbi
ldd output for ccextractor:
linux-vdso.so.1 (0x00007ffcf98db000)
libm.so.6 => /lib64/libm.so.6 (0x00007fe365526000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe365306000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fe365102000)
libtesseract.so.4 => /lib64/libtesseract.so.4 (0x00007fe364b9b000)
liblept.so.5 => /lib64/liblept.so.5 (0x00007fe36471a000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe364358000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe3658a8000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe363fc3000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe363dab000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fe363b73000)
libpng16.so.16 => /lib64/libpng16.so.16 (0x00007fe36393e000)
libz.so.1 => /lib64/libz.so.1 (0x00007fe363727000)
libjpeg.so.62 => /lib64/libjpeg.so.62 (0x00007fe3634be000)
libgif.so.7 => /lib64/libgif.so.7 (0x00007fe3632b4000)
libtiff.so.5 => /lib64/libtiff.so.5 (0x00007fe36303b000)
libwebp.so.7 => /lib64/libwebp.so.7 (0x00007fe362dcd000)
libjbig.so.2.1 => /lib64/libjbig.so.2.1 (0x00007fe362bc1000)
If I use the tesseract commands directly, image-to-text conversion works.
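A listing like the one above can also be checked mechanically for the two OCR libraries. The following is a small sketch, assuming the library file names shown in this listing (libtesseract.so and liblept.so); the helper name `linked_ocr_libs` is invented for illustration.

```shell
#!/bin/sh
# linked_ocr_libs: hypothetical helper, not part of CCExtractor.
# Reads `ldd` output on stdin and reports whether libtesseract and
# liblept (the names used in the listing in this thread) are linked.
linked_ocr_libs() {
    awk '/libtesseract\.so/ { t = 1 }
         /liblept\.so/      { l = 1 }
         END {
             printf "tesseract=%s leptonica=%s\n", (t ? "yes" : "no"), (l ? "yes" : "no")
         }'
}

# Usage (the path to the binary is an assumption):
#   ldd ./ccextractor | linked_ocr_libs
```

In the case reported here both libraries were linked and the message still appeared, so a positive result only rules out a missing link step.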
@Neo2SHYAlien commented on GitHub (Aug 5, 2023):
@kousthub97 try compiling the previous commit 0264e7da2b or the v0.94 tag :) Both should work
@kousthub97 commented on GitHub (Aug 5, 2023):
@Neo2SHYAlien
Thanks for the help; it worked with the v0.94 tag.
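For anyone landing here later, the fix that worked amounts to building from the v0.94 tag instead of the tip of master. The sketch below strings together the repository URL, tag name, and cmake options quoted in this thread; the helper `build_tag` and the `DRY_RUN` switch are made up for illustration, and whether the same options suit other tags is an assumption.

```shell
#!/bin/sh
# build_tag: hypothetical helper, not part of CCExtractor.
# Clones the repository, checks out TAG, and builds it with the cmake
# options quoted in this thread. With DRY_RUN=1 it only prints the
# commands instead of running them.
build_tag() {
    tag=$1
    run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }
    run git clone https://github.com/CCExtractor/ccextractor.git &&
    run cd ccextractor &&
    run git checkout "$tag" &&
    run mkdir build &&
    run cd build &&
    run cmake -DWITH_OCR=ON -DWITHOUT_RUST=ON ../src/ &&
    run make
}

# Usage: DRY_RUN=1 build_tag v0.94    (prints the steps without running them)
```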