[PROPOSAL] Add SCC support to CEA-708 decoder #697

Closed
opened 2026-01-29 16:51:25 +00:00 by claunia · 18 comments

Originally created by @PunitLodha on GitHub (Mar 23, 2022).

Originally assigned to: @PunitLodha on GitHub.

Add support for the SCC format to the CEA-708 decoder.
Currently, only the SRT, SAMI and Transcript formats are supported: https://github.com/CCExtractor/ccextractor/blob/master/src/rust/src/decoder/tv_screen.rs#L126-L134

SCC format details: http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML

#1423

claunia added the CEA-708, good-first-task, difficulty: easy, GSOC-2023 labels 2026-01-29 16:51:25 +00:00

@voidash commented on GitHub (Mar 26, 2022):

Just to be clear, I looked at a similar function, write_sami(). Basically, it writes to a file, and the contents should look like the image I have embedded.

So if I want to add support for the SCC format, the extracted subtitles should look like this, right?

Scenarist_SCC V1.0

01:02:53:14	94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f

01:02:55:14	942c 942c

01:03:27:29	94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f

I am working on this problem. I will be sure to read the contributor guidelines and contact you if I get stuck.


@shazbot666 commented on GitHub (Mar 26, 2022):

Here's a sample SCC extract from WhackedOutVideos_short.mov, produced with a commercial tool.

Sample video:
https://drive.google.com/file/d/13p6HBxGXlm0BGpaS15JwCJjfnBdm_Qbm/view?usp=sharing

Scenarist_SCC V1.0

00:58:56:14 e96e 2043 616e 6164 61ae

00:58:58:19 9426 94ad 9470 4ff2 20e9 7320 f468 e973 2073 796e e368 f2ef 6ee9 7ae5 6480

00:58:59:23 9426 94ad 9470 73f4 e9e3 6b20 70ef 6be9 6e67 bf80

00:59:02:03 9426 94ad 9470 c1e3 f475 61ec ec79 2c20 f468 ef73 e520 61f2 e520 f468 e520 f2ef 6473

00:59:03:09 9426 94ad 9470 f468 e579 2075 73e5 20f4 ef20 ecef e361 f4e5 20ec ef73 f420 70e5 ef70 ece5

00:59:04:29 9426 94ad 9470 eff2 20ef 62ea e5e3 f473 20e9 6e20 7570 20f4 ef20 3132 20e6 e5e5 f480

00:59:06:17 9426 94ad 9470 efe6 2070 eff7 64e5 f2ae

00:59:08:18 9426 94ad 9470 496e 20f4 68e9 7320 e361 73e5 2c20 f468 e579 20e6 e96e 6420 f468 e973

00:59:09:19 9426 94ad 9470 6475 64e5 a773 2076 e964 e5ef 20e3 616d e5f2 61ae

00:59:12:03 9426 94ad 9470 c16e 6420 f468 e520 73f7 e561 f220 62ec e97a 7a61 f264

00:59:13:14 9426 94ad 9470 73f4 61f2 f473 2061 6761 e96e ae80

00:59:14:27 9426 94ad 9470 a862 ece5 e570 e96e 6729

00:59:18:26 9426 94ad 9470 54f7 efad f468 e9f2 6473 20ef e620 f468 e520 f7ef f2ec 6480

00:59:20:22 9426 94ad 9470 e973 20e3 ef76 e5f2 e564 2062 7920 f761 f4e5 f280

00:59:22:16 9426 94ad 9470 616e 6420 f468 e520 f2e5 73f4 20e9 7320 e3ef 76e5 f2e5 6420 6279 2075 73ae

00:59:24:23 9426 94ad 9470 54e9 6de5 20f4 ef20 6761 f468 e5f2 2075 7020 61ec ec20 f468 e520 67ef efe6 7980

00:59:26:04 9426 94ad 9470 67ef e96e 6773 adef 6e20 e6f2
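The text part of each line above follows the CEA-608 convention of 7-bit characters with an odd-parity bit in bit 7, packed two bytes per hex word and padded with 80 when the character count is odd. A minimal Rust sketch of that encoding follows. The function names are hypothetical (they are not part of ccextractor), and the sketch assumes plain ASCII input, ignoring the few code points where the CEA-608 basic character set differs from ASCII.

```rust
/// Set bit 7 when needed so the byte carries an odd number of 1 bits
/// (the odd-parity convention used by CEA-608 byte pairs).
fn with_odd_parity(byte: u8) -> u8 {
    let low7 = byte & 0x7f;
    if low7.count_ones() % 2 == 0 {
        low7 | 0x80
    } else {
        low7
    }
}

/// Encode text as space-separated SCC hex words, two bytes per word.
/// Assumption: the input is plain ASCII; anything else is simply masked
/// to 7 bits, which is not a real character mapping.
fn encode_text_as_scc_words(text: &str) -> String {
    let mut bytes: Vec<u8> = text.bytes().map(with_odd_parity).collect();
    if bytes.len() % 2 != 0 {
        // Pad to a whole word with a null byte (0x00 plus parity bit = 0x80).
        bytes.push(0x80);
    }
    bytes
        .chunks(2)
        .map(|pair| format!("{:02x}{:02x}", pair[0], pair[1]))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, encoding "Or is this synchronized" gives the words that follow the 9426 94ad 9470 prefix on the 00:58:58:19 line of the sample.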


@voidash commented on GitHub (Apr 7, 2022):

I took a shot at adding SCC support for the 708 decoder. I added a function write_scc to tv_screen.rs; here is the commit on my fork: https://github.com/CCExtractor/ccextractor/compare/master...voidash:master

I ran ccextractor in debug mode on the video https://drive.google.com/file/d/13p6HBxGXlm0BGpaS15JwCJjfnBdm_Qbm/view with these flags:

-in=mp4
-out=scc
-nofc
-dru
/home/cdjk/Downloads/WhackedOutVideos_short.mov
-o
/home/cdjk/Downloads/main.scc
-708

Here is the complete output: https://pastebin.com/58ieUtfY

Without the -708 flag, the output is a little different from #1423: https://pastebin.com/PygNqWRh

My major concern is that the Writer object is only being created for the last three lines.

[CEA-708] 00:00:30,030 00:00:30,029
[CEA-708] First: 0, Last: 29
[CEA-708] 9426 94ad 9470 616e 6420 f468 e520 f2e5 73f4 20e9 7320 e3ef 76e5 f2e5 6420 6279 2075 73ae


[CEA-708] 00:00:30,030 00:00:30,029
[CEA-708] First: 0, Last: 30
[CEA-708] 9426 94ad 9470 54e9 6de5 20f4 ef20 6761 f468 e5f2 2075 7020 61ec ec20 f468 e520 67ef efe6 7980


[CEA-708] 00:00:30,030 00:00:30,029
[CEA-708] First: 0, Last: 30
[CEA-708] 9426 94ad 9470 67ef e96e 6773 adef 6e20 e6f2 ef6d 2061 f2ef 756e 6420 f468 e520 67ec ef62 e580

For those three lines, the start and end times are the same, and the output file main.scc contains only
Scenarist_SCC V1.0

However, the file main.p0.svc01.scc has those last three lines.
Note: I wrote the write_scc function by looking at how write_srt and write_transcript work. If there is something I need to understand, please let me know.


@cfsmp3 commented on GitHub (Apr 8, 2022):

@PunitLodha can you take a look at @voidash 's work?


@PunitLodha commented on GitHub (Apr 9, 2022):

Yes, I will in a while.


@PunitLodha commented on GitHub (Apr 12, 2022):

So, for some reason, mp4 still uses the C decoder, and changing it to Rust is not straightforward. I am working on it.

Meanwhile, @voidash, could you replicate the changes in C here: https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708_output.c#L370-L392


@voidash commented on GitHub (Apr 12, 2022):

OK, I will take a look at it.


@voidash commented on GitHub (Apr 12, 2022):

I tried replicating the changes in C. Here is the diff: https://github.com/voidash/ccextractor/commit/fb5dbe29593fb68146ee7deb71270f53f93f0d18

Here is the output when I passed the following parameters:
-in=mp4 -out=scc -nofc -dru /home/cdjk/Downloads/WhackedOutVideos_short.mov -o /home/cdjk/Downloads/main.scc
https://pastebin.com/VeY4BmbK

The temp file main.p0.svc01.scc is being written, and its contents look like this:
https://pastebin.com/xq6Jwfuv

But main.scc is still not written. Judging by the console output, the caption type appears to be roll-up:

0:00:15:982 --> 00:00:17:350
In this case, they find this
dude's video camera.
And the swear blizzard
00:00:15:29	9426 94ad 9470 496e 20f4 68e9 7320 e361 73e5 2c20 f468 e579 20e6 e96e 6420 f468 e973 

9426 94ad 9470 6475 64e5 a773 2076 e964 e5ef 20e3 616d e5f2 61ae 

9426 94ad 9470 c16e 6420 f468 e520 73f7 e561 f220 62ec e97a 7a61 f264 


00:00:17:351 --> 00:00:18:784
dude's video camera.
And the swear blizzard
starts again.
00:00:17:10	9426 94ad 9470 6475 64e5 a773 2076 e964 e5ef 20e3 616d e5f2 61ae 

9426 94ad 9470 c16e 6420 f468 e520 73f7 e561 f220 62ec e97a 7a61 f264 

9426 94ad 9470 73f4 61f2 f473 2061 6761 e96e ae80

Any suggestions on what I should do next?


@PunitLodha commented on GitHub (Apr 12, 2022):

main.scc will be empty because it is supposed to contain the 608 subs, which are not present here. main.p0.svc01.scc is the file that is supposed to have the 708 subs, so that is correct.
But I can see some issues with the output. One is that there are multiple timestamps on the same line. Other than that, I think the clear caption command is missing; it should be present at the end time of each subtitle.
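The two points above (one timestamped line per caption, followed by a clear command at the caption's end time) can be sketched in Rust roughly as follows. This is an illustration under stated assumptions, not ccextractor's code: the function names are hypothetical, the timecode is non-drop-frame at an assumed 29.97 fps with the frame number truncated, and the clear command is taken to be 942c 942c (the doubled Erase Displayed Memory code that also appears in the SCC samples earlier in this thread).

```rust
/// Doubled "Erase Displayed Memory" control code, as seen in the samples.
const CLEAR_DISPLAY: &str = "942c 942c";

/// Convert milliseconds to an SCC-style `HH:MM:SS:FF` timecode.
/// Assumption: non-drop-frame counting, frame number truncated toward zero.
fn ms_to_scc_timecode(ms: u64, fps: f64) -> String {
    let total_seconds = ms / 1000;
    let frame = ((ms % 1000) as f64 * fps / 1000.0) as u64;
    format!(
        "{:02}:{:02}:{:02}:{:02}",
        total_seconds / 3600,
        (total_seconds / 60) % 60,
        total_seconds % 60,
        frame
    )
}

/// Emit one caption as two SCC entries: all rows joined on a single line
/// at the start time, then the clear command at the end time.
fn scc_caption_lines(start_ms: u64, end_ms: u64, rows: &[&str], fps: f64) -> String {
    let payload = rows.join(" ");
    format!(
        "{}\t{}\n\n{}\t{}\n\n",
        ms_to_scc_timecode(start_ms, fps),
        payload,
        ms_to_scc_timecode(end_ms, fps),
        CLEAR_DISPLAY
    )
}
```

With these assumptions, 15982 ms maps to 00:00:15:29 and 17351 ms to 00:00:17:10, which matches the frame fields in the console output quoted in the previous comment.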


@cfsmp3 commented on GitHub (Apr 12, 2022):

The mp4 code has a different flow. We use libgpac to actually open the mp4 file and the entry point into the decoders is different than the usual general loop.

It should be easy to change though and call the rust code.


@voidash commented on GitHub (Apr 13, 2022):

@PunitLodha, main.p0.svc01.scc now looks like this:

00:00:02:15	94ae 9420 9470 4ff2 20e9 7320 f468 e973 2073 796e e368 f2ef 6ee9 7ae5 6480

00:00:03:18	942c 942c 

00:00:03:19	94ae 9420 9470 4ff2 20e9 7320 f468 e973 2073 796e e368 f2ef 6ee9 7ae5 648094ae 9420 9470 73f4 e9e3 6b20 70ef 6be9 6e67 bf80

00:00:05:28	942c 942c 

You can take a look at my approach here: https://github.com/voidash/ccextractor/commit/e449557c8c6b31b73aa434c81d471818e832f5f8

Here is a pastebin for main.p0.svc01.scc: https://pastebin.com/aMiaEStY
So the 708 decoder found SCC subs, which means the Scenarist_SCC V1.0 header should be added at the top of main.p0.svc01.scc. Also, I guess I should remove the Rust code that just appends the last three captions.


@cfsmp3 commented on GitHub (Apr 13, 2022):

I'd recommend looking into this, @PunitLodha:

https://github.com/CCExtractor/ccextractor/blob/6efa41a7e6a083e240015592189391a0f78caa37/src/lib_ccx/mp4.c#L398

If you can just call Rust from there, you're good to go. After that, everything is the same.


@PunitLodha commented on GitHub (Apr 13, 2022):

I did look at that. But due to how the code is structured, it's not as easy as just calling the Rust function from there. I'll have to change some things on the Rust side first.


@PunitLodha commented on GitHub (Apr 13, 2022):

@voidash

> So 708 decoder found SCC subs which means Scenarist_SCC V1.0 header should be added on top of the main.p0.svc01.scc

Check out how the SAMI header is added, and do it the same way.

> also i guess i should remove the rust code which is just appending last three caption text

The last captions are added by the code that you added in Rust. It is called by the flush function. So you should correct the Rust code too, and send a PR.
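As a rough illustration of the header point, one way to make sure Scenarist_SCC V1.0 appears exactly once at the top of a per-service file is to write it lazily before the first caption line. The Rust sketch below is not how ccextractor adds the SAMI header (that code is not reproduced here); SccOutput and write_caption_line are hypothetical names introduced only for this example.

```rust
use std::io::{self, Write};

const SCC_HEADER: &str = "Scenarist_SCC V1.0";

/// Hypothetical wrapper around any writer that emits the SCC header
/// once, just before the first caption line.
struct SccOutput<W: Write> {
    inner: W,
    header_written: bool,
}

impl<W: Write> SccOutput<W> {
    fn new(inner: W) -> Self {
        Self { inner, header_written: false }
    }

    /// Write one `timecode<TAB>payload` entry followed by a blank line,
    /// preceded by the header if nothing has been written yet.
    fn write_caption_line(&mut self, timecode: &str, payload: &str) -> io::Result<()> {
        if !self.header_written {
            write!(self.inner, "{}\n\n", SCC_HEADER)?;
            self.header_written = true;
        }
        write!(self.inner, "{}\t{}\n\n", timecode, payload)
    }
}
```

Because the header is tied to the first caption write, a file that never receives a caption stays empty instead of containing only the header line.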


@ArchitBhonsle commented on GitHub (Mar 8, 2023):

If this issue has been abandoned, I could start working on this.

> The mp4 code has a different flow. We use libgpac to actually open the mp4 file and the entry point into the decoders is different than the usual general loop.
>
> It should be easy to change though and call the rust code.

Is there a video with 708 captions which is not an MP4? This might help me avoid implementing this in C and/or changing the current MP4 flow.


@cfsmp3 commented on GitHub (Mar 8, 2023):

> If this issue has been abandoned, I could start working on this.

Sure, go for it.

> Is there a video with 708 captions which is not an MP4? This might help me avoid implementing this in C and/or changing the current MP4 flow.

Yes, almost any US Transport Stream.
You can find plenty on our website.


@PunitLodha commented on GitHub (Mar 14, 2023):

#1499 details the issue with the mp4 code flow and how to fix it.


@IshanGrover2004 commented on GitHub (Dec 17, 2023):

Hi,
I would like to work on this issue and continue from where @voidash left off.
I just wanted to know the current progress and what is still needed to complete the feature,
and a little about how I could resolve it.

If there is any other information I should know, please tell me as well.
@PunitLodha @cfsmp3

Reference: starred/ccextractor#697