mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-04-24 15:09:55 +00:00
[BUG] Segmentation Fault when converting from McPoodle raw to webVTT #611
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @bbgdzxng1 on GitHub (Jan 24, 2021).
CCExtractor version: 0.88
In raising this issue, I confirm the following:
Necessary information
$ brew install ccextractorDownload cc sample from https://archive.org/download/cc_sample
Convert scc file to McPoodle's raw format using McPoodle's SCCTOOLS' scc2raw.pl, available from http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/scc2raw.pl. Carlos uses McPoodle's SCCTOOLS as part of this project.
At this point, we have a McPoodle raw format file, which has been generated by McPoodle's own tool.
Now attempting to convert McPoodle's raw format to VTT, using ccextractor 0.88 (see Additional Information section for error message)
Additional information
Converting from McPoodle Raw format to SRT is successful. Converting from McPoodle Raw format to VTT fails with Segmentation Fault.
Check input using report
Converting to SRT is successful
Converting to webVTT causes Segmentation Fault 11
Full commands for replication and sample file are included in this ticket.
@NilsIrl commented on GitHub (Feb 2, 2021):
The problem is a null pointer de-reference.
timingis a null pointer so trying to getsync_pts2fts_setcrashes.b1c22e5034/src/lib_ccx/ccx_encoders_webvtt.c (L210)@NilsIrl commented on GitHub (Feb 2, 2021):
If that
timingis meant to be null and it isn't a problem with the parser, should we just set the values to 0? That seems to be what is recommended according to https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.html@siv2r commented on GitHub (Feb 7, 2021):
Here, by "values," do you mean the variables
context->timing->sync_pts2fts_pts,h1,m1,s1,ms1(in the code below) should be set to0? This makes sense according to the recommended settings.cf84757e02/src/lib_ccx/ccx_encoders_webvtt.c (L220-L222)If you don't mind, can I try to look into whether
context->timingis supposed to benullor not?@NilsIrl commented on GitHub (Feb 7, 2021):
No, I meant we should write the header with as values 0 (instead of what would have been in
sync_pts2fts_fts).@siv2r commented on GitHub (Feb 9, 2021):
I wanted to say the same. I should not have used said "set" variables. Shouldn't we also need to write 0's (in the header) instead of
h1,m1,s1,ms1? Since they are dependent onsync_pts2fts_fts(due to the function below)cf84757e02/src/lib_ccx/ccx_encoders_webvtt.c (L215)@siv2r commented on GitHub (Feb 9, 2021):
I compared the
webvttencoding forcc_sample_dos2unix.bin(given in this thread) and UK_ITVBE.ts (provided on ccextractor's tv samples page) and found something really weird.cf84757e02/src/lib_ccx/ccx_encoders_common.c (L935-L939)The problem occurs in the initialization of the encoder context. The
mallocfunction acts differently forUK_ITVBE.ts(here,ctx->timingis not null) andcc_sample_dos2unix.bin(here,ctx->timingis set tonull)I added a fix by allocating memory for
ctx->timingbut would love to discuss more why this is happening :)@NilsIrl commented on GitHub (Feb 9, 2021):
ctx->timingis not null because timing information is included in the file. I assume this is not included in McPoodle raw (which just contains closed captions), or not in this specific mcpoodle raw file or ccextractor is not able to parse it correctly.@siv2r commented on GitHub (Feb 10, 2021):
By this, I meant that the value of
ctx->timingjust after the initialization ofctx(usingmalloc) was different in both cases. Shouldn't thectx->timingvalue be the same in both cases since this is a simple initialization? (later on, it may change depending on the input)I might have made a mistake somewhere (while debugging in vs code). I will check this again.
Will look into this.
@bbgdzxng1 commented on GitHub (Feb 18, 2021):
I really appreciate you folks taking the time to look at this. I still plan on using McPoodle > ccextractor as a means of converting SCC > McP Raw > webVTT and I am very grateful that you have even looked, whatever the outcome. Thanks!
@siv2r commented on GitHub (Feb 18, 2021):
Happy to help. Hope to fix this soon!
@bbgdzxng1 commented on GitHub (Apr 7, 2021):
Many thanks for trying to fix. I see that PR https://github.com/CCExtractor/ccextractor/pull/1298 has been closed. Just wanted to say thanks for your efforts so far and although the PR itself was not suitable for prod, I do still hope to use ccextractor to convert SCC > McP Raw > WebVTT. I'm currently going via SCC > McP Raw > SRT > WebVTT, but a conversion to WebVTT is hopefully a legitimate use case (for all the benefits of preserving markup, data integrity, character sets etc).
Are there any roadmap plans to support SCC as an input format? I know that ccextractor is a caption extractor rather than a caption convertor, but given the similarities between 608 and SCC, maybe that is a better way to solve for this use case, rather than receiving tickets about a proprietary and legacy McPoodle input format, which I appreciate is an edge case. If this is indeed a possibility, let me know and I would happily raise a feature request for SCC input, rather than a bug report for McPoodle.
There are very few good SCC convertors out there, and ccextractor is the best I have found for 608.
@SuvigyaJain1 commented on GitHub (Apr 26, 2021):
Is this issue still open? And is someone working on it?
@canihavesomecoffee commented on GitHub (Apr 26, 2021):
Yes, still open, and no one's working on it AFAIK.
@SuvigyaJain1 commented on GitHub (Apr 27, 2021):
So I had a look through the file and I think the source of the problem is that the scc file doesn't have the information required to build the
X-TIMESTAMP-MAP=MPEGTSheader?. Thus adding a check forcontext->timing != NULLbefore accessingcontext->timing->sync_pts2ftsin (lib_ccx/ccx_encoders_webvtt.c; line 210) can avoid the segmentation fault in such cases. As far as I know its not a problem for a .vtt file to not contain the synchronisation offset in the header. The default is assumed to be 0 if none is present.I tried running the provided files and am able to get the .vtt file without problem

I have attached a screenshot of the same
@SuvigyaJain1 commented on GitHub (Apr 27, 2021):
I will admit I'm baffled how my PR could break so many tests. How adding 1 line in webvtt encoder file breaks everything else as well, is confusing me and after going through the tests and downloading the files locally and running them they seem to be working. Am I missing something here? Note: This is my first contribution to this repo
@canihavesomecoffee commented on GitHub (Apr 27, 2021):
The sample platform isn't bug-free either ;)
I've looked at the actual results page, and there there's fewer tests broken :)
Since it's a Windows report, the failing test for the dictionary (using a unix path) is expected. The failing test in Broken category can be ignored. The 3 failing ones in DVB worry me a bit more, but might be present in the main repo already, and the same goes for the Teletext one.
That's about it... So I think it's safe to say you can reopen that PR.
@SuvigyaJain1 commented on GitHub (Apr 27, 2021):
I see! Thanks. makes more sense. I have reopened the PR as suggested. Ill look at the DVB tests in a bit more detail just to be on the safer side.
@bbgdzxng1 commented on GitHub (Jun 21, 2021):
Confirming that this is resolved in 0.89. Thank-you...
Thank-you to all who contributed to the investigation, development, testing and release.
ccextractor can now successfully convert
scc-to-srt,scc-to-vtt(and probably other outputs) with the interim step of going via McPoodle raw format.