CCextractor says "OCR subsystem not present" although compiled with OCR support #201

Closed
opened 2026-01-29 16:37:48 +00:00 by claunia · 14 comments
Owner

Originally created by @ghost on GitHub (Nov 24, 2016): https://github.com/CCExtractor/ccextractor/issues/442

On a Linux system I have compiled version 0.82 of CCextractor with these lines:

cmake -DWITH_OCR=ON ../src/
make
sudo make install

but when I run this command

ccextractor -pn 7176 -codec dvbsub Pointless.ts
(program number taken from 'mediainfo')

CCextractor outputs, among other things, this:

Opening file: Pointless.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
Creating Pointless.srt

The generated Pointless.srt is 3 bytes long and contains this hex string "bbef 00bf".

I have installed leptonica-devel and tesseract-ocr-devel.

Have I missed something during compilation, and/or am I using the wrong parameters in my call to CCextractor?
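
A note on the 3-byte file: "bbef 00bf" is how a default `hexdump` on a little-endian machine prints the bytes EF BB BF, which is the UTF-8 byte order mark. In other words, the .srt holds a BOM and no subtitles. A minimal sketch for spotting such a file follows; the function name `is_bom_only` is made up for illustration.

```shell
# Succeeds if the file consists of nothing but the UTF-8 byte order mark (EF BB BF).
is_bom_only() {
    # od -An -tx1 prints each byte as a hex pair without offsets;
    # tr strips the whitespace so the bytes can be compared as one string.
    bytes=$(od -An -tx1 "$1" | tr -d '[:space:]')
    [ "$bytes" = "efbbbf" ]
}
```

Usage would be along the lines of `is_bom_only Pointless.srt && echo "no subtitles were written"`.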


@cfsmp3 commented on GitHub (Nov 24, 2016):

You need to build with OCR support, not just have the libraries installed.
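
A rough way to tell a binary built with OCR from one where the libraries are merely installed: the `ccextractor --version` listings posted later in this thread show `Tesseract Version:` and `Leptonica Version:` lines for some builds and not for others. The sketch below assumes those lines appear only in OCR-enabled builds, which the thread suggests but does not confirm. The function name `version_lists_ocr` is made up for illustration.

```shell
# Reads `ccextractor --version` output on stdin.
# Succeeds only if the library list names both Tesseract and Leptonica.
version_lists_ocr() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q 'Tesseract Version:' &&
    printf '%s\n' "$out" | grep -q 'Leptonica Version:'
}
```

Usage would be along the lines of `ccextractor --version | version_lists_ocr || echo "OCR libraries not listed"`. A positive result is a hint rather than proof: in the Aug 2023 report further down, both lines are listed and the error still appeared on that commit.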


@ghost commented on GitHub (Nov 25, 2016):

I thought I had, but apparently not. Anyway, I got it working by running these commands in /usr/local/src/ccextractor (a soft link to ccextractor.0.82):

cd build
cmake -DWITH_OCR=ON ../src/
cd ../linux/
make clean
make ENABLE_OCR=yes
make install

('make clean' only to start from a clean slate).

Allow me an additional question: When I now run CCextractor I do get a .srt file but CCextractor complains a little:

Opening file: Pointless.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
dan.traineddata not found! Switching to English
swe.traineddata not found! Switching to English
fin.traineddata not found! Switching to English
Creating Pointless.srt

Using English trained data on Scandinavian texts makes for funny results!

The tesseract-ocr trained data is installed in /usr/share/tessdata/. So my additional question is actually two:

  1. How do I get CCextractor to read the trained data?
  2. How do I specify to CCextractor which language I want?
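
Before looking at where CCextractor searches, it can help to confirm that the language files named in the warnings really exist in the tessdata directory. A minimal sketch, assuming the `<lang>.traineddata` naming seen in the messages above; the function name `missing_traineddata` is made up for illustration.

```shell
# Prints each language code whose <lang>.traineddata file is absent.
# $1: tessdata directory; remaining arguments: language codes (e.g. dan swe fin).
missing_traineddata() {
    dir=$1
    shift
    for lang in "$@"; do
        [ -f "$dir/$lang.traineddata" ] || printf '%s\n' "$lang"
    done
}
```

For the case above this would be called as `missing_traineddata /usr/share/tessdata dan swe fin`. Empty output means all three files are present and the problem lies in where CCextractor looks for them.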

@ghost commented on GitHub (Nov 25, 2016):

I may have part of an answer to my question 1 above. When I ran 'strace' on CCextractor, I found that it looks locally for the trained data:

openat(AT_FDCWD, "./tessdata/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
write(1, "dan.traineddata not found! Switc"..., 48dan.traineddata not found! Switching to English

but globally to find the English data:

open("/usr/share/tessdata/eng.traineddata", O_RDONLY) = 4

When I added a link from the current directory to /usr/share/tessdata, CCextractor stopped complaining about missing data.

It is inconvenient to have to add links to every directory where I have videos stored, so is this a fault or a feature?
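
The link workaround described above can be wrapped in a small helper so it is easy to repeat per video directory. This is only a sketch of that workaround, not a fix for the lookup itself. The function name `link_tessdata` is made up for illustration, and the default path is the one reported in this thread.

```shell
# Creates <video_dir>/tessdata as a symlink to the system tessdata directory.
# $1: directory holding the videos
# $2: tessdata location (optional, defaults to /usr/share/tessdata)
link_tessdata() {
    video_dir=$1
    tessdata_dir=${2:-/usr/share/tessdata}
    # Leave any existing tessdata entry (directory or link) untouched.
    if [ -e "$video_dir/tessdata" ] || [ -L "$video_dir/tessdata" ]; then
        return 0
    fi
    ln -s "$tessdata_dir" "$video_dir/tessdata"
}
```

Because the strace shows the lookup is relative to the current working directory (`AT_FDCWD`, `./tessdata/`), CCextractor would still need to be started from inside the linked directory for this to take effect.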


@ykarim commented on GitHub (Nov 29, 2016):

@BentB could you please close this issue, as the original "OCR subsystem not present" problem is now resolved? You can open another issue for your new problem if it still exists.


@ghost commented on GitHub (Nov 29, 2016):

I have moved the above questions on Tesseract data to a new issue, 448: https://github.com/CCExtractor/ccextractor/issues/448

@anshul1912 commented on GitHub (Nov 30, 2016):

Please help us with a pull request.

It's neither a bug nor a feature; it's an incomplete implementation.

-Anshul


@ghost commented on GitHub (Nov 30, 2016):

@anshul1912 I'm not quite familiar with life here at Github so please expand a bit on what you mean by "Please help us with pull request". I know 'pull' from Git, but not in this context. Sorry about that.


@wojtekw commented on GitHub (Feb 4, 2020):

I get the same error, "OCR subsystem not present", on macOS, although leptonica and tesseract are installed on the system. CCX -v shows:
Version: 0.88
Git commit: Unknown
Compilation date: 2020-02-04
File SHA256: fa4b6f64af9f923a0fca842ae017a189740de63916188b8afa43e6c00acb07b5
Libraries used by CCExtractor
libGPAC Version: 0.7.2-DEV
zlib: 1.2.11
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.6.35
FreeType
libhash
nuklear
libzvbi

Do you know where the problem could be?


@rialg commented on GitHub (Feb 4, 2020):

Hello, I get the same issue on Ubuntu 16.04 using Tesseract 4.1.1, even after following @ghost's compilation guide.

  • Below I describe some system information.

Linux desktop 4.15.0-76-generic #86~16.04.1-Ubuntu SMP Mon Jan 20 11:02:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

CCExtractor detailed version info
Version: 0.88
Git commit: 6697ed34967343830178f8452e276ab0d94f08e0
Compilation date: 2020-02-04
File SHA256: Could not open file
Libraries used by CCExtractor
libGPAC Version: 0.7.2-DEV
zlib: 1.2.8
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.2.54
FreeType
libhash
nuklear
libzvbi

  • Standard output:
Reading from UDP socket 226.51.0.0:1234
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
DVB subtitles detected, OCR subsystem not present. Use -out=spupng for graphic output
TS continuity counter not incremented prev/curr 4/6

Found large gap(1072860) in PTS! Trying to recover ...

Found large gap(1072861) in PTS! Trying to recover ...

Found large gap(1072864) in PTS! Trying to recover ...

Found large gap(1072865) in PTS! Trying to recover ...

Found large gap(1072862) in PTS! Trying to recover ...

Found large gap(1072863) in PTS! Trying to recover ...



@wojtekw commented on GitHub (Feb 5, 2020):

@cfsmp3 Can you reopen the issue?


@cfsmp3 commented on GitHub (Nov 21, 2021):

Closing, as we've made a lot of changes to the build lately, so I don't know whether this is still an issue.

@wojtekw @rialg
let me know if it's still a problem in master


@kousthub97 commented on GitHub (Aug 4, 2023):

Hello, I am facing the same issue with tesseract 4.1.1 and leptonica-1.76.0. I tried compiling with the steps below and with @ghost's; neither worked for me. Please let me know if any changes need to be made when compiling.

mkdir build
cd build
cmake -DWITH_OCR=ON -DWITHOUT_RUST=ON ../src/
make

I am using CentOS 8 for compiling. Below is the ccextractor --version output:

CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc

CCExtractor detailed version info
Version: 0.94
Git commit: 35e73c1c90ce3ca69394d3523836bb1cdec28f11
Compilation date: 2023-08-04
CEA-708 decoder: C
File SHA256: 08b9e909cc730e591a4331eef6dd45584a20e4a92c8dbf3fc37bf570f48ce79e
Libraries used by CCExtractor
Tesseract Version: 4.1.1
Leptonica Version: leptonica-1.76.0
libGPAC Version: 1.0.1
zlib: 1.2.11
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.6.37
FreeType
libhash
nuklear
libzvbi

ldd output for ccextractor

linux-vdso.so.1 (0x00007ffcf98db000)
libm.so.6 => /lib64/libm.so.6 (0x00007fe365526000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe365306000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fe365102000)
libtesseract.so.4 => /lib64/libtesseract.so.4 (0x00007fe364b9b000)
liblept.so.5 => /lib64/liblept.so.5 (0x00007fe36471a000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe364358000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe3658a8000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe363fc3000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe363dab000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fe363b73000)
libpng16.so.16 => /lib64/libpng16.so.16 (0x00007fe36393e000)
libz.so.1 => /lib64/libz.so.1 (0x00007fe363727000)
libjpeg.so.62 => /lib64/libjpeg.so.62 (0x00007fe3634be000)
libgif.so.7 => /lib64/libgif.so.7 (0x00007fe3632b4000)
libtiff.so.5 => /lib64/libtiff.so.5 (0x00007fe36303b000)
libwebp.so.7 => /lib64/libwebp.so.7 (0x00007fe362dcd000)
libjbig.so.2.1 => /lib64/libjbig.so.2.1 (0x00007fe362bc1000)
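
The ldd listing above can be checked mechanically for the two OCR libraries. A minimal sketch, assuming the usual `name => /path (address)` ldd line format; the function name `ldd_links_ocr` is made up for illustration.

```shell
# Reads `ldd` output on stdin.
# Succeeds only if both libtesseract and liblept resolve to a file path
# (an unresolved library would show "=> not found" instead of "=> /...").
ldd_links_ocr() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q 'libtesseract\.so[^ ]* => /' &&
    printf '%s\n' "$out" | grep -q 'liblept[^ ]*\.so[^ ]* => /'
}
```

Usage would be along the lines of `ldd "$(command -v ccextractor)" | ldd_links_ocr && echo "linked against Tesseract and Leptonica"`. As this report shows, a binary can be linked against both libraries and still print the error, so a pass here only rules out a missing link step.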

If I use the tesseract command directly, image-to-text conversion works.


@Neo2SHYAlien commented on GitHub (Aug 5, 2023):

@kousthub97 try compiling the previous commit 0264e7da2be67182deb031228eb07e6ed4943c81 or the v0.94 tag :) Both should work


@kousthub97 commented on GitHub (Aug 5, 2023):

@Neo2SHYAlien
Thanks for the help, it worked with the v0.94 tag.


Reference: starred/ccextractor#201