[BUG] Segmentation Fault when converting from McPoodle raw to webVTT #611

Closed
opened 2026-01-29 16:49:00 +00:00 by claunia · 18 comments
Owner

Originally created by @bbgdzxng1 on GitHub (Jan 24, 2021).

CCExtractor version: 0.88

In raising this issue, I confirm the following:

  • I have read and understood the contributors guide.
  • I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • I have checked that the issue I'm posting isn't already reported.
  • I have checked that the issue I'm porting isn't already solved and no duplicates exist in closed issues and in opened issues
  • I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • I have used the latest available version of CCExtractor to verify this issue exists.
  • I have ticked all the boxes in this section and to prove it I'm deleting the section completely to remove boilerplate text.

Necessary information

  • Is this a regression (i.e. did it work before) unknown
  • What platform did you use? macOS 10.15.7, CCExtractor version: 0.88 installed via $ brew install ccextractor
  • What were the used arguments? Full demonstration of complete workflow from scc > raw > ccextractor

Download cc sample from https://archive.org/download/cc_sample

$ curl --silent --location --request GET "https://archive.org/download/cc_sample/cc_sample.scc" -o cc_sample.scc
$ dos2unix --newfile cc_sample.scc cc_sample_dos2unix.scc

Convert scc file to McPoodle's raw format using McPoodle's SCCTOOLS' scc2raw.pl, available from http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/scc2raw.pl. Carlos uses McPoodle's SCCTOOLS as part of this project.

$ perl scc2raw.pl cc_sample_dos2unix.scc cc_sample_dos2unix.bin

At this point, we have a McPoodle raw format file, which has been generated by McPoodle's own tool.

Now attempting to convert McPoodle's raw format to VTT, using ccextractor 0.88 (see Additional Information section for error message)

$ ccextractor -debug -in=raw -out=webvtt-full -utf8 cc_sample_dos2unix.bin -o cc_sample_dos2unix.vtt

Additional information

Converting from McPoodle Raw format to SRT is successful. Converting from McPoodle Raw format to VTT fails with Segmentation Fault.

Check input using report

$ ccextractor -debug -in=raw -out=report -utf8 cc_sample_dos2unix.bin
File: cc_sample_dos2unix.bin
Stream Mode: McPoodle's raw
EIA-608: No
CEA-708: No
MPEG-4 Timed Text: No

Converting to SRT is successful

$ ccextractor -debug -in=raw -out=srt -utf8 cc_sample_dos2unix.bin -o cc_sample_dos2unix.srt
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: Files (1): cc_sample_dos2unix.bin
[Extract: 1] [Stream mode: McPoodle's raw]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: Yes] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: cc_sample_dos2unix.bin
Analyzing data in McPoodle raw mode


Total frames time:	  00:00:00:000  (0 frames at 29.97fps)

Min PTS:				00:00:00:001
Max PTS:				00:01:09:070
Length:				 00:01:09:069
Done, processing time = 0 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
$ head -n20 cc_sample_dos2unix.srt 
1
00:00:09,243 --> 00:00:12,378
                  Is that what  
                  the Americans 
                  call doodling?

2
00:00:12,413 --> 00:00:13,278
It is more serious              

3
00:00:13,314 --> 00:00:16,081
than you could                  
possibly realize                
Charlotte                       

4
00:00:20,221 --> 00:00:21,020
Good

Converting to webVTT causes Segmentation Fault 11

$ ccextractor -debug -in=raw -out=webvtt-full -utf8 cc_sample_dos2unix.bin -o cc_sample_dos2unix.vtt
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: Files (1): cc_sample_dos2unix.bin
[Extract: 1] [Stream mode: McPoodle's raw]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: Yes] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .vtt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: cc_sample_dos2unix.bin
Analyzing data in McPoodle raw mode
Segmentation fault: 11

Full commands for replication and sample file are included in this ticket.

Originally created by @bbgdzxng1 on GitHub (Jan 24, 2021). CCExtractor version: 0.88 # In raising this issue, I confirm the following: - [x] I have read and understood the [contributors guide](https://github.com/CCExtractor/ccextractor/blob/master/.github/CONTRIBUTING.md). - [x] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present. - [x] I have checked that the issue I'm posting isn't already reported. - [x] I have checked that the issue I'm porting isn't already solved and no duplicates exist in [closed issues](https://github.com/CCExtractor/ccextractor/issues?q=is%3Aissue+is%3Aclosed) and in [opened issues](https://github.com/CCExtractor/ccextractor/issues) - [x] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion. - [x] I have used the latest available version of CCExtractor to verify this issue exists. - [x] I have ticked all the boxes in this section and to prove it I'm deleting the section completely to remove boilerplate text. # Necessary information - Is this a regression (i.e. did it work before) unknown - What platform did you use? macOS 10.15.7, CCExtractor version: 0.88 installed via `$ brew install ccextractor` - What were the used arguments? Full demonstration of complete workflow from scc > raw > ccextractor Download cc sample from https://archive.org/download/cc_sample ``` $ curl --silent --location --request GET "https://archive.org/download/cc_sample/cc_sample.scc" -o cc_sample.scc $ dos2unix --newfile cc_sample.scc cc_sample_dos2unix.scc ``` Convert scc file to McPoodle's raw format using McPoodle's SCCTOOLS' scc2raw.pl, available from http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/scc2raw.pl. Carlos uses McPoodle's SCCTOOLS as part of this project. ``` $ perl scc2raw.pl cc_sample_dos2unix.scc cc_sample_dos2unix.bin ``` At this point, we have a McPoodle raw format file, which has been generated by McPoodle's own tool. Now attempting to convert McPoodle's raw format to VTT, using ccextractor 0.88 (see Additional Information section for error message) ``` $ ccextractor -debug -in=raw -out=webvtt-full -utf8 cc_sample_dos2unix.bin -o cc_sample_dos2unix.vtt ``` # Additional information Converting from McPoodle Raw format to SRT is successful. Converting from McPoodle Raw format to VTT fails with Segmentation Fault. ### Check input using report ``` $ ccextractor -debug -in=raw -out=report -utf8 cc_sample_dos2unix.bin File: cc_sample_dos2unix.bin Stream Mode: McPoodle's raw EIA-608: No CEA-708: No MPEG-4 Timed Text: No ``` ### Converting to SRT is successful ``` $ ccextractor -debug -in=raw -out=srt -utf8 cc_sample_dos2unix.bin -o cc_sample_dos2unix.srt CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke. Teletext portions taken from Petr Kutalek's telxcc -------------------------------------------------------------------------- Input: Files (1): cc_sample_dos2unix.bin [Extract: 1] [Stream mode: McPoodle's raw] [Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto] [Timing mode: Auto] [Debug: Yes] [Buffer input: No] [Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No] [Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No] [Add font color data: Yes] [Add font typesetting: Yes] [Convert case: No] [Video-edit join: No] [Extraction start time: not set (from start)] [Extraction end time: not set (to end)] [Live stream: No] [Clock frequency: 90000] [Teletext page: Autodetect] [Start credits text: None] [Quantisation-mode: CCExtractor's internal function] ----------------------------------------------------------------- Opening file: cc_sample_dos2unix.bin Analyzing data in McPoodle raw mode Total frames time: 00:00:00:000 (0 frames at 29.97fps) Min PTS: 00:00:00:001 Max PTS: 00:01:09:070 Length: 00:01:09:069 Done, processing time = 0 seconds Issues? Open a ticket here https://github.com/CCExtractor/ccextractor/issues ``` ``` $ head -n20 cc_sample_dos2unix.srt 1 00:00:09,243 --> 00:00:12,378 Is that what the Americans call doodling? 2 00:00:12,413 --> 00:00:13,278 It is more serious 3 00:00:13,314 --> 00:00:16,081 than you could possibly realize Charlotte 4 00:00:20,221 --> 00:00:21,020 Good ``` ### Converting to webVTT causes Segmentation Fault 11 ``` $ ccextractor -debug -in=raw -out=webvtt-full -utf8 cc_sample_dos2unix.bin -o cc_sample_dos2unix.vtt CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke. Teletext portions taken from Petr Kutalek's telxcc -------------------------------------------------------------------------- Input: Files (1): cc_sample_dos2unix.bin [Extract: 1] [Stream mode: McPoodle's raw] [Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto] [Timing mode: Auto] [Debug: Yes] [Buffer input: No] [Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No] [Target format: .vtt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No] [Add font color data: Yes] [Add font typesetting: Yes] [Convert case: No] [Video-edit join: No] [Extraction start time: not set (from start)] [Extraction end time: not set (to end)] [Live stream: No] [Clock frequency: 90000] [Teletext page: Autodetect] [Start credits text: None] [Quantisation-mode: CCExtractor's internal function] ----------------------------------------------------------------- Opening file: cc_sample_dos2unix.bin Analyzing data in McPoodle raw mode Segmentation fault: 11 ``` Full commands for replication and sample file are included in this ticket.
Author
Owner

@NilsIrl commented on GitHub (Feb 2, 2021):

The problem is a null pointer de-reference. timing is a null pointer so trying to get sync_pts2fts_set crashes.

b1c22e5034/src/lib_ccx/ccx_encoders_webvtt.c (L210)

@NilsIrl commented on GitHub (Feb 2, 2021): The problem is a null pointer de-reference. `timing` is a null pointer so trying to get `sync_pts2fts_set` crashes. https://github.com/CCExtractor/ccextractor/blob/b1c22e503412f455d3740d0ce5fe9cabc66fe701/src/lib_ccx/ccx_encoders_webvtt.c#L210
Author
Owner

@NilsIrl commented on GitHub (Feb 2, 2021):

If that timing is meant to be null and it isn't a problem with the parser, should we just set the values to 0? That seems to be what is recommended according to https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.html

@NilsIrl commented on GitHub (Feb 2, 2021): If that `timing` is meant to be null and it isn't a problem with the parser, should we just set the values to 0? That seems to be what is recommended according to https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.html
Author
Owner

@siv2r commented on GitHub (Feb 7, 2021):

If that timing is meant to be null and it isn't a problem with the parser, should we just set the values to 0? That seems to be what is recommended according to https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.html

Here, by "values," do you mean the variables context->timing->sync_pts2fts_pts, h1, m1, s1, ms1 (in the code below) should be set to 0? This makes sense according to the recommended settings.

cf84757e02/src/lib_ccx/ccx_encoders_webvtt.c (L220-L222)

If you don't mind, can I try to look into whether context->timing is supposed to be null or not?

@siv2r commented on GitHub (Feb 7, 2021): > If that `timing` is meant to be null and it isn't a problem with the parser, should we just set the values to 0? That seems to be what is recommended according to https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.html Here, by "values," do you mean the variables `context->timing->sync_pts2fts_pts`, `h1`, `m1`, `s1`, `ms1` (in the code below) should be set to `0`? This makes sense according to the recommended settings. https://github.com/CCExtractor/ccextractor/blob/cf84757e02889ffa0231624fdf941eafded59906/src/lib_ccx/ccx_encoders_webvtt.c#L220-L222 If you don't mind, can I try to look into whether `context->timing` is supposed to be `null` or not?
Author
Owner

@NilsIrl commented on GitHub (Feb 7, 2021):

If that timing is meant to be null and it isn't a problem with the parser, should we just set the values to 0? That seems to be what is recommended according to https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.html

Here, by "values," do you mean the variables context->timing->sync_pts2fts_pts, h1, m1, s1, ms1 (in the code below) should be set to 0? This makes sense according to the recommended settings.

No, I meant we should write the header with as values 0 (instead of what would have been in sync_pts2fts_fts).

@NilsIrl commented on GitHub (Feb 7, 2021): > > If that `timing` is meant to be null and it isn't a problem with the parser, should we just set the values to 0? That seems to be what is recommended according to https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.html > > Here, by "values," do you mean the variables `context->timing->sync_pts2fts_pts`, `h1`, `m1`, `s1`, `ms1` (in the code below) should be set to `0`? This makes sense according to the recommended settings. No, I meant we should write the header with as values 0 (instead of what would have been in `sync_pts2fts_fts`).
Author
Owner

@siv2r commented on GitHub (Feb 9, 2021):

No, I meant we should write the header with as values 0 (instead of what would have been in sync_pts2fts_fts).

I wanted to say the same. I should not have used said "set" variables. Shouldn't we also need to write 0's (in the header) instead of h1, m1, s1, ms1? Since they are dependent on sync_pts2fts_fts (due to the function below)
cf84757e02/src/lib_ccx/ccx_encoders_webvtt.c (L215)

@siv2r commented on GitHub (Feb 9, 2021): > No, I meant we should write the header with as values 0 (instead of what would have been in `sync_pts2fts_fts`). I wanted to say the same. I should not have used said "set" variables. Shouldn't we also need to write 0's (in the header) instead of `h1`, `m1`, `s1`, `ms1`? Since they are dependent on `sync_pts2fts_fts` (due to the function below) https://github.com/CCExtractor/ccextractor/blob/cf84757e02889ffa0231624fdf941eafded59906/src/lib_ccx/ccx_encoders_webvtt.c#L215
Author
Owner

@siv2r commented on GitHub (Feb 9, 2021):

I compared the webvtt encoding for cc_sample_dos2unix.bin (given in this thread) and UK_ITVBE.ts (provided on ccextractor's tv samples page) and found something really weird.

cf84757e02/src/lib_ccx/ccx_encoders_common.c (L935-L939)
The problem occurs in the initialization of the encoder context. The malloc function acts differently for UK_ITVBE.ts (here, ctx->timing is not null) and cc_sample_dos2unix.bin (here, ctx->timing is set to null)

I added a fix by allocating memory for ctx->timing but would love to discuss more why this is happening :)

@siv2r commented on GitHub (Feb 9, 2021): I compared the `webvtt` encoding for `cc_sample_dos2unix.bin` (given in this thread) and [UK_ITVBE.ts](https://drive.google.com/file/d/0B_61ywKPmI0TSGVpOXd4NjhuSU0/view?usp=sharing) (provided on ccextractor's [tv samples page](https://www.ccextractor.org/public:general:tvsamples)) and found something really weird. https://github.com/CCExtractor/ccextractor/blob/cf84757e02889ffa0231624fdf941eafded59906/src/lib_ccx/ccx_encoders_common.c#L935-L939 The problem occurs in the initialization of the **encoder context**. The `malloc` function acts differently for `UK_ITVBE.ts` (here, `ctx->timing` is not null) and `cc_sample_dos2unix.bin` (here, `ctx->timing` is set to `null`) I added a fix by allocating memory for `ctx->timing` but would love to discuss more why this is happening :)
Author
Owner

@NilsIrl commented on GitHub (Feb 9, 2021):

ctx->timing is not null because timing information is included in the file. I assume this is not included in McPoodle raw (which just contains closed captions), or not in this specific mcpoodle raw file or ccextractor is not able to parse it correctly.

@NilsIrl commented on GitHub (Feb 9, 2021): `ctx->timing` is not null because timing information is included in the file. I assume this is not included in McPoodle raw (which just contains closed captions), or not in this specific mcpoodle raw file or ccextractor is not able to parse it correctly.
Author
Owner

@siv2r commented on GitHub (Feb 10, 2021):

The malloc function acts differently for UK_ITVBE.ts (here, ctx->timing is not null) and cc_sample_dos2unix.bin (here, ctx->timing is set to null)

By this, I meant that the value of ctx->timing just after the initialization of ctx (using malloc) was different in both cases. Shouldn't the ctx->timing value be the same in both cases since this is a simple initialization? (later on, it may change depending on the input)

I might have made a mistake somewhere (while debugging in vs code). I will check this again.

I assume this is not included in McPoodle raw (which just contains closed captions), or not in this specific mcpoodle raw file or ccextractor is not able to parse it correctly.

Will look into this.

@siv2r commented on GitHub (Feb 10, 2021): > The `malloc` function acts differently for `UK_ITVBE.ts` (here, `ctx->timing` is not null) and `cc_sample_dos2unix.bin` (here, `ctx->timing` is set to `null`) By this, I meant that the value of `ctx->timing` just after the initialization of `ctx` (using `malloc`) was different in both cases. Shouldn't the `ctx->timing` value be the same in both cases since this is a simple initialization? (later on, it may change depending on the input) I might have made a mistake somewhere (while debugging in vs code). I will check this again. > I assume this is not included in McPoodle raw (which just contains closed captions), or not in this specific mcpoodle raw file or ccextractor is not able to parse it correctly. Will look into this.
Author
Owner

@bbgdzxng1 commented on GitHub (Feb 18, 2021):

I really appreciate you folks taking the time to look at this. I still plan on using McPoodle > ccextractor as a means of converting SCC > McP Raw > webVTT and I am very grateful that you have even looked, whatever the outcome. Thanks!

@bbgdzxng1 commented on GitHub (Feb 18, 2021): I really appreciate you folks taking the time to look at this. I still plan on using McPoodle > ccextractor as a means of converting SCC > McP Raw > webVTT and I am very grateful that you have even looked, whatever the outcome. Thanks!
Author
Owner

@siv2r commented on GitHub (Feb 18, 2021):

I really appreciate you folks taking the time to look at this. I still plan on using McPoodle > ccextractor as a means of converting SCC > McP Raw > webVTT and I am very grateful that you have even looked, whatever the outcome. Thanks!

Happy to help. Hope to fix this soon!

@siv2r commented on GitHub (Feb 18, 2021): > I really appreciate you folks taking the time to look at this. I still plan on using McPoodle > ccextractor as a means of converting SCC > McP Raw > webVTT and I am very grateful that you have even looked, whatever the outcome. Thanks! Happy to help. Hope to fix this soon!
Author
Owner

@bbgdzxng1 commented on GitHub (Apr 7, 2021):

Many thanks for trying to fix. I see that PR https://github.com/CCExtractor/ccextractor/pull/1298 has been closed. Just wanted to say thanks for your efforts so far and although the PR itself was not suitable for prod, I do still hope to use ccextractor to convert SCC > McP Raw > WebVTT. I'm currently going via SCC > McP Raw > SRT > WebVTT, but a conversion to WebVTT is hopefully a legitimate use case (for all the benefits of preserving markup, data integrity, character sets etc).

Are there any roadmap plans to support SCC as an input format? I know that ccextractor is a caption extractor rather than a caption convertor, but given the similarities between 608 and SCC, maybe that is a better way to solve for this use case, rather than receiving tickets about a proprietary and legacy McPoodle input format, which I appreciate is an edge case. If this is indeed a possibility, let me know and I would happily raise a feature request for SCC input, rather than a bug report for McPoodle.

There are very few good SCC convertors out there, and ccextractor is the best I have found for 608.

@bbgdzxng1 commented on GitHub (Apr 7, 2021): Many thanks for trying to fix. I see that PR https://github.com/CCExtractor/ccextractor/pull/1298 has been closed. Just wanted to say thanks for your efforts so far and although the PR itself was not suitable for prod, I do still hope to use ccextractor to convert SCC > McP Raw > WebVTT. I'm currently going via SCC > McP Raw > SRT > WebVTT, but a conversion to WebVTT is hopefully a legitimate use case (for all the benefits of preserving markup, data integrity, character sets etc). Are there any roadmap plans to support SCC as an input format? I know that ccextractor is a caption extractor rather than a caption convertor, but given the similarities between 608 and SCC, maybe that is a better way to solve for this use case, rather than receiving tickets about a proprietary and legacy McPoodle input format, which I appreciate is an edge case. If this is indeed a possibility, let me know and I would happily raise a feature request for SCC input, rather than a bug report for McPoodle. There are very few good SCC convertors out there, and ccextractor is the best I have found for 608.
Author
Owner

@SuvigyaJain1 commented on GitHub (Apr 26, 2021):

Is this issue still open? And is someone working on it?

@SuvigyaJain1 commented on GitHub (Apr 26, 2021): Is this issue still open? And is someone working on it?
Author
Owner

@canihavesomecoffee commented on GitHub (Apr 26, 2021):

Yes, still open, and no one's working on it AFAIK.

@canihavesomecoffee commented on GitHub (Apr 26, 2021): Yes, still open, and no one's working on it AFAIK.
Author
Owner

@SuvigyaJain1 commented on GitHub (Apr 27, 2021):

So I had a look through the file and I think the source of the problem is that the scc file doesn't have the information required to build the X-TIMESTAMP-MAP=MPEGTS header?. Thus adding a check for context->timing != NULL before accessing context->timing->sync_pts2fts in (lib_ccx/ccx_encoders_webvtt.c; line 210) can avoid the segmentation fault in such cases. As far as I know its not a problem for a .vtt file to not contain the synchronisation offset in the header. The default is assumed to be 0 if none is present.

I tried running the provided files and am able to get the .vtt file without problem
I have attached a screenshot of the same
Screenshot_20210427_112654

@SuvigyaJain1 commented on GitHub (Apr 27, 2021): So I had a look through the file and I think the source of the problem is that the scc file doesn't have the information required to build the ```X-TIMESTAMP-MAP=MPEGTS``` header?. Thus adding a check for ```context->timing != NULL``` before accessing ```context->timing->sync_pts2fts``` in (lib_ccx/ccx_encoders_webvtt.c; line 210) can avoid the segmentation fault in such cases. As far as I know its not a problem for a .vtt file to not contain the synchronisation offset in the header. The default is assumed to be 0 if none is present. I tried running the provided files and am able to get the .vtt file without problem I have attached a screenshot of the same ![Screenshot_20210427_112654](https://user-images.githubusercontent.com/33174957/116192614-bcfcc500-a74b-11eb-8513-498945856b15.png)
Author
Owner

@SuvigyaJain1 commented on GitHub (Apr 27, 2021):

I will admit I'm baffled how my PR could break so many tests. How adding 1 line in webvtt encoder file breaks everything else as well, is confusing me and after going through the tests and downloading the files locally and running them they seem to be working. Am I missing something here? Note: This is my first contribution to this repo

@SuvigyaJain1 commented on GitHub (Apr 27, 2021): I will admit I'm baffled how my PR could break so many tests. How adding 1 line in webvtt encoder file breaks everything else as well, is confusing me and after going through the tests and downloading the files locally and running them they seem to be working. Am I missing something here? Note: This is my first contribution to this repo
Author
Owner

@canihavesomecoffee commented on GitHub (Apr 27, 2021):

I will admit I'm baffled how my PR could break so many tests. How adding 1 line in webvtt encoder file breaks everything else as well, is confusing me and after going through the tests and downloading the files locally and running them they seem to be working. Am I missing something here? Note: This is my first contribution to this repo

The sample platform isn't bug-free either ;)

I've looked at the actual results page, and there there's fewer tests broken :)

Since it's a Windows report, the failing test for the dictionary (using a unix path) is expected. The failing test in Broken category can be ignored. The 3 failing ones in DVB worry me a bit more, but might be present in the main repo already, and the same goes for the Teletext one.

That's about it... So I think it's safe to say you can reopen that PR.

@canihavesomecoffee commented on GitHub (Apr 27, 2021): > I will admit I'm baffled how my PR could break so many tests. How adding 1 line in webvtt encoder file breaks everything else as well, is confusing me and after going through the tests and downloading the files locally and running them they seem to be working. Am I missing something here? Note: This is my first contribution to this repo The sample platform isn't bug-free either ;) I've looked at the actual results page, and there there's fewer tests broken :) Since it's a Windows report, the failing test for the dictionary (using a unix path) is expected. The failing test in Broken category can be ignored. The 3 failing ones in DVB worry me a bit more, but might be present in the main repo already, and the same goes for the Teletext one. That's about it... So I think it's safe to say you can reopen that PR.
Author
Owner

@SuvigyaJain1 commented on GitHub (Apr 27, 2021):

I see! Thanks. makes more sense. I have reopened the PR as suggested. Ill look at the DVB tests in a bit more detail just to be on the safer side.

@SuvigyaJain1 commented on GitHub (Apr 27, 2021): I see! Thanks. makes more sense. I have reopened the PR as suggested. Ill look at the DVB tests in a bit more detail just to be on the safer side.
Author
Owner

@bbgdzxng1 commented on GitHub (Jun 21, 2021):

Confirming that this is resolved in 0.89. Thank-you...

$ curl --silent --location --request GET "https://archive.org/download/cc_sample/cc_sample.scc" -o cc_sample.scc
$ dos2unix --newfile cc_sample.scc cc_sample_dos2unix.scc
$ perl /opt/mcpoodle/scc2raw.pl cc_sample_dos2unix.scc cc_sample_dos2unix.bin
$ ccextractor -debug -in=raw -out=webvtt-full -utf8 cc_sample_dos2unix.bin -o cc_sample_dos2unix.vtt
$ head -n10 "cc_sample_dos2unix.vtt"
WEBVTT


STYLE

/* default values */
::cue {
  line-height: 5.33vh;
  font-size: 4.1vh;
  font-family: monospace;
$ tail -n10 "cc_sample_dos2unix.vtt"

00:01:03.531 --> 00:01:04.430 line:84.66%
Yes                             

00:01:04.465 --> 00:01:06.298 line:79.33%
              I pray I never    

00:01:04.465 --> 00:01:06.298 line:84.66%
              have to deliver it

Thank-you to all who contributed to the investigation, development, testing and release.

ccextractor can now successfully convert scc-to-srt, scc-to-vtt (and probably other outputs) with the interim step of going via McPoodle raw format.

@bbgdzxng1 commented on GitHub (Jun 21, 2021): Confirming that this is resolved in 0.89. Thank-you... ``` $ curl --silent --location --request GET "https://archive.org/download/cc_sample/cc_sample.scc" -o cc_sample.scc $ dos2unix --newfile cc_sample.scc cc_sample_dos2unix.scc $ perl /opt/mcpoodle/scc2raw.pl cc_sample_dos2unix.scc cc_sample_dos2unix.bin $ ccextractor -debug -in=raw -out=webvtt-full -utf8 cc_sample_dos2unix.bin -o cc_sample_dos2unix.vtt ``` ``` $ head -n10 "cc_sample_dos2unix.vtt" WEBVTT STYLE /* default values */ ::cue { line-height: 5.33vh; font-size: 4.1vh; font-family: monospace; ``` ``` $ tail -n10 "cc_sample_dos2unix.vtt" 00:01:03.531 --> 00:01:04.430 line:84.66% Yes 00:01:04.465 --> 00:01:06.298 line:79.33% I pray I never 00:01:04.465 --> 00:01:06.298 line:84.66% have to deliver it ``` Thank-you to all who contributed to the investigation, development, testing and release. ccextractor can now successfully convert `scc-to-srt`, `scc-to-vtt` (and probably other outputs) with the interim step of going via McPoodle raw format.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/ccextractor#611