[BUG] Using -sc option crashes ccextractor #513

Closed
opened 2026-01-29 16:45:59 +00:00 by claunia · 22 comments
Owner

Originally created by @rboy1 on GitHub (Oct 17, 2019).

Please prefix your issue with one of the following: [BUG], [PROPOSAL], [QUESTION].

CCExtractor version (using the --version parameter preferably) : 0.88

In raising this issue, I confirm the following (please check boxes, eg [X] - and delete unchecked ones):

  • [X] I have read and understood the contributors guide.
  • [X] I have checked that the bug I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [X] I have checked that the issue I'm posting isn't already reported.
  • [X] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues.
  • [X] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • [X] I have used the latest available version of CCExtractor to verify this issue exists.

My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [X] I absolutely love CCExtractor, but have not contributed previously.
  • [ ] I am an active contributor to CCExtractor.

Necessary information

  • Is this a regression (did it work before)? [ ] NO | [ ] YES - please specify the last known working version
  • What platform did you use? [X] Windows - [ ] Linux - [ ] Mac
  • What were the used arguments? -sc

Additional information
Link to file which is causing the crash:
https://www.dropbox.com/s/4jiooj787e02kd3/CCExtractor%20crash.ts?dl=0

Output of cmd line:

C:\ccextractor.0.88-windows.binaries>ccextractorwinfull.exe -sc "CCExtractor crash.ts" -o output.srt
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: CCExtractor crash.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: Yes, but only built-in words] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: CCExtractor crash.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
VBI/teletext stream ID 836 (0x344) for SID 1332 (0x534)
Notice: Teletext page with possible subtitles detected: 801
- No teletext page specified, first received suitable page is 801, not guaranteed
  6%  |  02:01
C:\ccextractor.0.88-windows.binaries>
claunia added the good-first-task and GCI19 labels 2026-01-29 16:45:59 +00:00

@sp2703 commented on GitHub (Nov 3, 2019):

I'd like to work on this. Any leads on where to start?


@cfsmp3 commented on GitHub (Nov 3, 2019):

Start by reproducing the issue locally.
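
A minimal sketch of how the reproduction could be scripted, so that the abrupt stop at 6% shows up as an exit status instead of a silently returned prompt. The helper names `run_and_report` and `describe` are hypothetical and not part of CCExtractor. The reading of exit codes (negative values for POSIX signals, values at or above 0xC0000000 for Windows exception statuses) is general OS convention, not something confirmed for this particular crash.

```python
import subprocess


def run_and_report(cmd):
    """Run cmd and return (exit_code, last_non_empty_output_line).

    stderr is merged into stdout so the last line printed before a
    crash is captured regardless of which stream it went to.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    lines = [line for line in proc.stdout.splitlines() if line.strip()]
    last_line = lines[-1] if lines else ""
    return proc.returncode, last_line


def describe(code):
    """Turn an exit code into a short human-readable description."""
    if code == 0:
        return "exited normally"
    if code < 0:
        # POSIX: subprocess reports death by signal N as -N.
        return f"killed by signal {-code}"
    if code >= 0xC0000000:
        # Windows: unhandled exceptions surface as NTSTATUS-style codes.
        return f"Windows exception status 0x{code:08X}"
    return f"exited with error code {code}"
```

With the command from the report, this could be called as `run_and_report(["ccextractorwinfull.exe", "-sc", "CCExtractor crash.ts", "-o", "output.srt"])` and the result passed to `describe`. The binary and the sample file have to be present in the working directory for that call to work.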


@rboy1 commented on GitHub (Nov 11, 2019):

Patched here:
https://github.com/CCExtractor/ccextractor/pull/1122


@rboy1 commented on GitHub (Jul 17, 2020):

@cfsmp3 Carlos, any chance of getting a new build? It's been a while, and many fixes have gone in since the last release.


@cfsmp3 commented on GitHub (Jul 20, 2020):

I'll bundle a new version after GSoC (in one month or so). These days I really don't have a lot of time on my hands, I'm afraid.


@rboy1 commented on GitHub (Dec 8, 2020):

@cfsmp3 do you think it’s ready for a release?


@cfsmp3 commented on GitHub (Dec 8, 2020):

Well, there are lots of bugs, but no one is doing active work these days, so they're not going to go away magically.
I've set aside a bit of time this Sunday to do a release with what we currently have.


@rboy1 commented on GitHub (Dec 15, 2020):

@cfsmp3 were you able to get the new build released, Carlos?


@rboy1 commented on GitHub (Jan 26, 2021):

@cfsmp3 bump on release


@rboy1 commented on GitHub (May 22, 2021):

@cfsmp3 can we expect a new release anytime soon?


@cfsmp3 commented on GitHub (May 22, 2021):

Around mid June


@rboy1 commented on GitHub (Jun 17, 2021):

@cfsmp3 Mid June is here, looking forward to it :)


@canihavesomecoffee commented on GitHub (Jun 17, 2021):

https://github.com/CCExtractor/ccextractor/releases/tag/v0.89 :)

Windows build is still WIP*. You can download the binaries here though (let me know if you can't): https://github.com/CCExtractor/ccextractor/suites/2983776538/artifacts/67339947

* We're working on a new installer and code signing; the latter is what's holding us back right now.


@rboy1 commented on GitHub (Jun 18, 2021):

Awesome thanks. I noticed that it says it's compiled against tessdata 4.0alpha. Does it mean it won't work with tessdata 3?


@canihavesomecoffee commented on GitHub (Jun 18, 2021):

> Awesome thanks. I noticed that it says it's compiled against tessdata 4.0alpha. Does it mean it won't work with tessdata 3?

For Windows the libraries are embedded, so you're indeed stuck to that specific version.

I noticed you had another comment a couple of minutes ago, but it seems to have vanished.

> Thanks. When I try to run the GUI it gives me an error, "code execution cannot proceed because pthreadVSE2.dll was not found". Looks like one of the DLLs is missing.

Was that when trying the standalone binary too, or caused by the GUI exe itself?

Anyway, looks like we should add that dll to the generated artifacts too.


@rboy1 commented on GitHub (Jun 18, 2021):

Hmm, I tried using tessdata 3.04 and it seemed to work fine converting dvbsub to srt


@canihavesomecoffee commented on GitHub (Jun 18, 2021):

IIRC tessdata is not bound to the tesseract version, so that's indeed no problem :)


@rboy1 commented on GitHub (Jun 18, 2021):

On a side note, I have some files with multiple dvbsub tracks, but when I run ccextractor it only extracts the first track. Is there a way to get it to extract all tracks, or maybe to specify the track number?


@cfsmp3 commented on GitHub (Jun 18, 2021):

> > Awesome thanks. I noticed that it says it's compiled against tessdata 4.0alpha. Does it mean it won't work with tessdata 3?

> For Windows the libraries are embedded, so you're indeed stuck to that specific version.

Maybe we should also figure out a way to build those libraries again from source :-) @Izaron did that work a few years ago and we haven't touched that ever since I think?


@cfsmp3 commented on GitHub (Jun 18, 2021):

> IIRC tessdata is not bound to the tesseract version, so that's indeed no problem :)

 Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). 

Since tesseract 3 is no longer maintained at all, I think we should stick to 4 (which, as can be seen, supports the pattern recognition mode from v3, so there's no need to actually use v3).


@rboy1 commented on GitHub (Jun 18, 2021):

@cfsmp3 are you saying that we need to explicitly add --oem 0 to get it to work with Tesseract 3? I ask because 0.89 is working (or am I missing something here?).

For future reference, wouldn't it be better if ccextractor automatically detected whether it's using Tesseract 3 or 4, with an option to override using --oem?


@cfsmp3 commented on GitHub (Jun 18, 2021):

> @cfsmp3 are you saying that we need to explicitly add --oem 0 to get it to work with Tesseract 3? I ask because 0.89 is working (or am I missing something here?).

What I pasted comes from tesseract's website. v4 supports v3's legacy engine, so there's no reason to actually have v3 around at all. If you want to use the old system, just use --oem (if I remember correctly, we do expose that argument in CCExtractor).

> For future reference, wouldn't it be better if ccextractor automatically detected whether it's using Tesseract 3 or 4, with an option to override using --oem?

I don't want to support legacy versions of libraries. If the tesseract maintainers have decided to stop development of v3, what's the reason for us to bother supporting both? Just use v4 and use the legacy mode if it works better.

Reference: starred/ccextractor#513