mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-03 21:23:48 +00:00
[CEA-708] Is there a way to extract subtitles #271
Originally created by @fsh240led on GitHub (Feb 2, 2017).
Other videos are extracted without problems,
but subtitles cannot be extracted from the video below:
https://docs.google.com/uc?id=0B42jO8cBUe6baEtqLTNkRnExbVk&export=download
PotPlayer displays subtitles
http://i.imgur.com/SX9KApM.png
@cfsmp3 commented on GitHub (Feb 7, 2017):
Are you sure you aren't using an external subtitle file? I just installed
PotPlayer for this and it says there are no subtitles available.
@fsh240led commented on GitHub (Feb 8, 2017):
A setting has to be enabled in Preferences first:
Preferences(F5) -> Filter Control -> Video Decoder -> Built-in codec/DXVA settings -> Enable Closed Captioning (Check)
@cfsmp3 commented on GitHub (Feb 8, 2017):
I'm probably missing some fonts; this is what I see after enabling CC.
@fsh240led commented on GitHub (Feb 8, 2017):
Korean font download link:
http://ko.cooltext.com/Fonts-Unicode-Korean
@cfsmp3 commented on GitHub (Feb 9, 2017):
Assigning to @Izaron since he was the last one looking into Korean.
@cfsmp3 commented on GitHub (Feb 9, 2017):
GSoC qualification: 3 points.
@Izaron commented on GitHub (Feb 9, 2017):
@fsh240led Is the output at 00:09 from the other video player also incorrect compared with PotPlayer? Can you confirm it?
@cfsmp3 and @AlexBratosin2001 (since you're the PTS person):
The problem is here: https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/stream_functions.c#L307
Can you give me the specs for this binary stream when
(nextheader[7]&0xC0) == 0xC0? Maybe there is just a problem with the hardcoded hexadecimal constants.
@fsh240led commented on GitHub (Feb 9, 2017):
Those are not recognizable words.
It should be output as
@mahalwal commented on GitHub (Dec 21, 2017):
@fsh240led Can you tell me which font you are using from this link ( http://ko.cooltext.com/Fonts-Unicode-Korean )? I tried Batang(che) and Gungsuh(che), and they still give random characters.
@MatejMecka commented on GitHub (Jan 3, 2018):
@fsh240led Can you tell me which command you used to get these subtitles?
@navimakarov commented on GitHub (Dec 11, 2018):
The problem was that when we tried to process this file, we got the error "Window has to be defined" because decoder->current_window == -1:
https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708.c#L1375
I found that ccx_decoders_708.c has a condition that should be impossible according to the author's comment, but for this file it evaluated to true. That stopped ccextractor from extracting captions and left ret = 0, which is reported as the "No captions found in Input" error.
Here is the problem:
https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708.c#L1728
And the obvious fix is to change:
_dtvcc_decoders_reset(dtvcc);
return;
to
dtvcc->current_packet_length = len;
After all those changes ccextractor is able to extract captions from this file.
@cfsmp3 commented on GitHub (Dec 11, 2018):
OK, so this is good research... but let's not be happy with that; we need to find out what's actually going on.
if (dtvcc->current_packet_length != len) // Is this possible?
Here, dtvcc->current_packet_length is how much data we have for that packet,
and len is the length declared in the packet header.
So we have 3 possibilities:
a) We have more data than the declared packet length. If yes - what's that data, where did it come from, what is it for? Can we ignore it?
b) We have LESS data than the declared packet length. This is really not good, we can't process the packet at all (we would read out of bounds)
c) They match, which is the expected thing, but we know that's not the case here.
So first - let's check if it's a or b.
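To make the a/b/c check concrete, here is a minimal sketch in C. The enum and function names are hypothetical and are not taken from ccextractor; the sketch only classifies the relationship between the accumulated byte count and the declared length.

```c
#include <assert.h>

/* Hypothetical names for cases (a), (b) and (c); not taken from ccextractor. */
enum length_check {
    LENGTH_EXTRA_DATA, /* (a) more data accumulated than the header declares */
    LENGTH_TRUNCATED,  /* (b) less data accumulated than the header declares */
    LENGTH_MATCH       /* (c) accumulated and declared lengths agree */
};

/* accumulated: bytes collected for the packet; declared: length from the header. */
static enum length_check classify_length(int accumulated, int declared)
{
    assert(accumulated >= 0 && declared >= 0);
    if (accumulated > declared)
        return LENGTH_EXTRA_DATA;
    if (accumulated < declared)
        return LENGTH_TRUNCATED;
    return LENGTH_MATCH;
}
```

Logging the result of such a check for every packet of the problem file would show whether the mismatch is always case (a) or sometimes case (b).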
@navimakarov commented on GitHub (Dec 12, 2018):
So our problem is case (a): we have more data than the declared packet length. After debugging, for this particular file we constantly get dtvcc->current_packet_length = len + 2. Before that check there is the statement len = len * 2, which means that the len we read here: https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708.c#L1712 is wrong and should equal len - 1. Maybe this is caused by a false PTS/DTS or a false PES header that we read earlier, or maybe the problem is the hardcoded value in the line of code I linked above. I'm working on it, but so far I have no idea what this data is for. However, I compared my output to PotPlayer's output and they are absolutely identical, so I think we can skip this data.
@cfsmp3 commented on GitHub (Dec 12, 2018):
That line:
int len = dtvcc->current_packet[0] & 0x3F; // 6 least significants bits
is correct. The packet header has 6 bits for the packet length and that mask gives you those 6 bits.
You may need to go over the actual specs to understand that code.
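As a rough illustration of that header byte, here is a small C sketch. The function names are hypothetical. The doubling of the size code is the len = len * 2 step mentioned earlier in the thread; treating a size code of 0 as 128 bytes and reading the top 2 bits as a sequence number are assumptions that should be checked against the CEA-708 specification.

```c
#include <assert.h>
#include <stddef.h>

/* Top 2 bits of the header byte (assumed to be the packet sequence number). */
static int packet_sequence_number(unsigned char header)
{
    return (header >> 6) & 0x03;
}

/* Declared packet length in bytes, derived from the 6 least significant bits. */
static int declared_packet_length(const unsigned char *packet, size_t size)
{
    assert(packet != NULL && size > 0);
    int size_code = packet[0] & 0x3F; /* 6 least significant bits */
    if (size_code == 0)
        return 128;                   /* assumption: code 0 means the maximum size */
    return size_code * 2;             /* the len = len * 2 step from the thread */
}
```

With a header byte of 0x8B, the size code is 11, so the declared length is 22 bytes.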
@PunitLodha commented on GitHub (Mar 11, 2021):
What I found is that there are 2 extra bytes (both always 0) after the len specified in the header.
Example:
Here in the header, len is specified as 22.
While parsing line 13, current_packet_length also becomes 22, which means that everything after that should be padding (cc_valid = 0).
But on line 14 we have cc_valid = 1, which leads to it being parsed and current_packet_length becoming 24, i.e. current_packet_length = len + 2.
I guess we can safely ignore the two bytes, and change,
to
Should I go ahead with the change?
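For illustration only, the idea of ignoring the two trailing zero bytes could be sketched in C as below. This is not the exact change proposed here; the function name and the DTVCC_TOLERATED_TRAILING constant are hypothetical, and the sketch assumes the extra bytes are always zero, as observed in this file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit: at most 2 extra bytes, as observed in this file. */
#define DTVCC_TOLERATED_TRAILING 2

/*
 * Returns the number of bytes that are safe to process, or 0 if the packet
 * should be rejected. Extra bytes are tolerated only if there are at most
 * DTVCC_TOLERATED_TRAILING of them and all of them are zero.
 */
static size_t usable_packet_length(const unsigned char *packet,
                                   size_t accumulated, size_t declared)
{
    assert(packet != NULL);
    if (accumulated == declared)
        return declared;
    if (accumulated < declared)
        return 0; /* truncated: processing would read out of bounds */
    if (accumulated - declared > DTVCC_TOLERATED_TRAILING)
        return 0; /* too much unexplained data */
    for (size_t i = declared; i < accumulated; i++) {
        if (packet[i] != 0)
            return 0; /* the extra bytes carry data, so do not ignore them */
    }
    return declared;
}
```

Rejecting non-zero extras keeps the check conservative: a packet is accepted with extra data only when it matches the pattern seen in this sample.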
@cfsmp3 commented on GitHub (Mar 11, 2021):
@PunitLodha Good research.
Well, give it a go and see if it makes things better or worse.