mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-03 21:23:48 +00:00
[Feature] Unable to properly extract (ocr) dvb_subtitles from .mkv container #444
Originally created by @agrafiodata on GitHub (Aug 27, 2018).
Please prefix your issue with one of the following: [BUG], [PROPOSAL], [QUESTION].
CCExtractor version (using the --version parameter preferably) : X.X
In raising this issue, I confirm the following (please check boxes, eg [X] - and delete unchecked ones):
My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):
Necessary information
Is this a regression (did it work before)? [X] NO | [ ] YES - please specify the last known working version
What platform did you use? [ ] Windows - [X] Linux - [ ] Mac
What were the used arguments?
Tried both without arguments and
-codec dvbsub -dvblang eng -ocrlang eng. Same outputs for both.
Video links
https://www.dropbox.com/sh/7wmowjdyy2sv4n6/AADACv4q16SmC2MgAE_nqz3Da?dl=0
Additional information
Hi,
I am having an issue while trying to extract subtitles from an mkv container. I am using the default build from the ccextractor master branch with OCR enabled (git clone of master, built via the supplied linux/build script). The video has a single dvb_sub subtitle stream. However, ccextractor shows no signs of actually trying to perform OCR, and the produced .srt file appears to contain binary data instead of the extracted text. If I convert this file to .ts (via ffmpeg -i file.mkv -c copy -map 0 file.ts) and extract from that, the subtitles are extracted properly. The same problem occurs in the opposite direction: extraction works from a recorded .ts file but fails from an .mkv converted from it.
PS: I have marked this issue as a [BUG] because I believe it arises from a problem in codec identification for mkv containers. However, I have not read the current implementation, so it could technically be a limitation of the existing implementation, which would make this a feature request.
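For anyone hitting the same problem, the remux workaround described above can be scripted. The following is a minimal sketch, not an official CCExtractor recipe: the ffmpeg arguments and the ccextractor options are the ones quoted in this post, while the function names, the assumption that both binaries are on PATH under the names `ffmpeg` and `ccextractor`, and the choice to write the .ts file next to the input are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def remux_command(mkv_path, ts_path):
    """Build the ffmpeg call from the post: copy all streams into a .ts file."""
    return ["ffmpeg", "-i", str(mkv_path), "-c", "copy", "-map", "0", str(ts_path)]


def extract_command(ts_path):
    """Build the ccextractor call using the options quoted in the post.

    Assumption: the binary is installed as `ccextractor` and is on PATH.
    """
    return [
        "ccextractor", str(ts_path),
        "-codec", "dvbsub",
        "-dvblang", "eng",
        "-ocrlang", "eng",
    ]


def run_workaround(mkv_path):
    """Remux the .mkv to .ts, then run ccextractor on the .ts file.

    Hypothetical helper: the .ts file is written next to the input file.
    """
    ts_path = Path(mkv_path).with_suffix(".ts")
    # check=True raises CalledProcessError if either tool exits non-zero.
    subprocess.run(remux_command(mkv_path, ts_path), check=True)
    subprocess.run(extract_command(ts_path), check=True)
    return ts_path
```

Calling `run_workaround("file.mkv")` would run both steps in order. The command builders are kept separate so the exact arguments can be inspected or adjusted (for example, a different language code) before anything is executed.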
@agrafiodata commented on GitHub (Aug 27, 2018):
Posting detailed version parameters that I forgot to post above
CCExtractor detailed version info
Version: 0.87
Git commit: 45ed8456ee
Compilation date: 2018-08-27
File SHA256: 56628a1805a3f2d0925f1055905b649c3899c08e9051b1805eb13da1df88b695
Libraries used by CCExtractor
Tesseract Version: 3.04.01
Leptonica Version: leptonica-1.74.1
libGPAC Version: 0.7.2-DEV
zlib: 1.2.11
utf8proc Version: 2.1.0
protobuf-c Version: 1.1.1
libpng Version: 1.6.34
FreeType
libhash
nuklear
libzvbi
@anshul1912 commented on GitHub (Aug 29, 2018):
Yes, your guess is correct: it's a feature. Matroska does not support DVB subtitles yet.
@agrafiodata commented on GitHub (Aug 31, 2018):
Is there a plan to add support for it in the foreseeable future?
@cfsmp3 commented on GitHub (Aug 31, 2018):
No. But no plans doesn't mean it won't ever be added. If some
developer comes along and does it we'll be happy to integrate, or if
someone pays us to do it.
But if no one in the core team needs it for himself then it's not
going to happen.
@ashutoshmishraji commented on GitHub (Mar 22, 2019):
Hi @cfsmp3
I am a GSoC 2019 participant. Can you give me some guidance on solving this bug?
@cfsmp3 commented on GitHub (Mar 22, 2019):
Sure, but please don't use a GitHub issue to ask unrelated things. Please check
out our website and join our Slack.
@thelastpolaris commented on GitHub (Mar 22, 2019):
Hey @ashutoshmishraji. I've almost finished adding this feature and will make a pull request soon. Sorry for not stating that earlier.
@ashutoshmishraji commented on GitHub (Mar 22, 2019):
ok @cfsmp3 @thelastpolaris