mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-03 21:23:48 +00:00
No subtitles with ccextractor - but with TS-Doctor #483
Originally created by @poeeast on GitHub (Feb 8, 2019).
[QUESTION].
CCExtractor version : 0.87
Necessary information
-autoprogram
Video links
https://mirus.dk/cce/GP.mts
Additional information
https://mirus.dk/cce/GP-LOG.txt
https://mirus.dk/cce/GP-TSD.srt
When I run the (4 min, 96 MB) video file GP.mts through ccextractor, I get no subtitles (see GP-LOG.txt). I used the standard setup and have also tried other option combinations. When I run the same file through TS-Doctor, I get a file with subtitles (GP-TSD.srt).
@thelastpolaris commented on GitHub (Apr 11, 2019):
First of all - do you want to extract Teletext subtitles or DVB ones? If you want to extract DVB subs then you have the following issues:
According to your logs, you don't have the Danish language data for Tesseract. You can download it from https://github.com/tesseract-ocr/tessdata_best (dan.traineddata). On Ubuntu I put this file in /usr/share/tesseract-ocr/4.00/tessdata/. Tesseract needs this file to recognize Danish characters. Also, don't forget to compile CCExtractor with OCR support (if you haven't already).
Have you tried the -codec dvbsub option? By default, CCExtractor gives Teletext higher priority: if it finds a Teletext track before a DVB one, it returns the Teletext subtitles (if any). Specifying -codec dvbsub explicitly tells CCExtractor to look only for DVB subtitles.
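For anyone who wants to script the two checks above, here is a minimal Python sketch. It is only an illustration, not part of CCExtractor: the helper names (has_language_data, build_dvbsub_command, extract_dvb_subtitles) are made up for this example, the tessdata directory is the Ubuntu path mentioned above and will differ on other systems, and it assumes the binary is installed as ccextractor on your PATH.

```python
import shutil
import subprocess
from pathlib import Path

# Ubuntu directory quoted in the comment above; other systems will differ (assumption).
DEFAULT_TESSDATA_DIR = Path("/usr/share/tesseract-ocr/4.00/tessdata")


def has_language_data(lang, tessdata_dir=DEFAULT_TESSDATA_DIR):
    """Return True if <lang>.traineddata is a regular file in tessdata_dir."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


def build_dvbsub_command(video, binary="ccextractor"):
    """Build the argument list that restricts extraction to DVB subtitles."""
    return [binary, "-codec", "dvbsub", str(video)]


def extract_dvb_subtitles(video, binary="ccextractor"):
    """Run the command if the binary is on PATH; return its exit code, else None."""
    if shutil.which(binary) is None:
        # Hypothetical convention for this sketch: None means "binary not found".
        return None
    return subprocess.run(build_dvbsub_command(video, binary)).returncode
```

Calling has_language_data("dan") should tell you whether dan.traineddata is where Tesseract is expected to look, and extract_dvb_subtitles("GP.mts") runs the equivalent of ccextractor -codec dvbsub GP.mts. Note that this does not check whether your CCExtractor build has OCR support compiled in.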
@poeeast commented on GitHub (Apr 27, 2019):
Thank you for the answer.
The fault is probably mine. I used ccextractor without having installed Tesseract. I have no previous experience with manipulating video files. So thank you for the help.
@MatejMecka commented on GitHub (Jun 2, 2019):
Can this issue be closed as it seems that it's fixed?