Buggy ttml support #34

Closed
opened 2026-01-29 16:33:26 +00:00 by claunia · 5 comments
Owner

Originally created by @MikaYuoadas on GitHub (Nov 18, 2014).

The generated TTML files are not valid: they start and end with valid TTML (XML) markup, but the rest of the file is in plain SubRip format.

Here's a short example of the kind of file I get with ccextractor -out=smptett video.ts:

<?xml version="1.0" encoding="UTF-8" ?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
<body>
<div>
1
00:00:48,280 --> 00:00:49,880
Mon pauvre, je suis désolée !

2
00:00:50,080 --> 00:00:50,720
Ca va ?

3
00:00:50,960 --> 00:00:51,920
T'as rien ?

</div></body></tt>

The input files I have are all .ts with embedded dvb_teletext subtitles.


@anshul1912 commented on GitHub (Nov 20, 2014):

Which version are you using? I am getting correct output.

I am using the git version from here.


@MikaYuoadas commented on GitHub (Nov 20, 2014):

Same here, I'm on the latest commit (b95e06c).

I've started looking a bit at the code, and it looks like this should only affect dvb_teletext subtitles.
It seems to be around line 632 in file src/lib_ccx/telxcc.c: the switch only has a CCX_OF_TRANSCRIPT case and a default that falls back to SRT.

I'm getting a more correct output by adding another case for smptett like this:

case CCX_OF_SMPTETT:
    timestamp_to_smptetttime(page->show_timestamp, timecode_show);
    timestamp_to_smptetttime(page->hide_timestamp, timecode_hide);
    if (ctx->wbout1.fh!=-1)
        fdprintf(ctx->wbout1.fh, "      <p region=\"speaker\" begin=\"%s\" end=\"%s\">%s</p>\n", timecode_show, timecode_hide, page_buffer_cur);

But this quick and dirty fix duplicates the existing code that generates smptett and doesn't handle line endings correctly (the -lf param is completely ignored).
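
For illustration, here is a minimal sketch of a cue writer that takes the line ending as a parameter instead of hard-coding "\n". It is not CCExtractor code: ms_to_ttml_time and format_ttml_cue are hypothetical helpers, timestamps are assumed to be in milliseconds, the text is assumed to be already XML-escaped, and the region attribute is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of CCExtractor): format a millisecond
 * timestamp as a TTML clock time "HH:MM:SS.mmm". */
static void ms_to_ttml_time(uint64_t ms, char *buf, size_t len)
{
    /* 12 characters plus the terminating NUL */
    assert(buf != NULL && len >= 13);

    unsigned long long h = ms / 3600000ULL;
    unsigned m = (unsigned)(ms / 60000ULL % 60);
    unsigned s = (unsigned)(ms / 1000ULL % 60);
    unsigned frac = (unsigned)(ms % 1000ULL);

    snprintf(buf, len, "%02llu:%02u:%02u.%03u", h, m, s, frac);
}

/* Hypothetical helper: format one cue as a TTML <p> element.
 * eol is chosen by the caller ("\n" or "\r\n"), so a line-ending option
 * would be respected rather than ignored.
 * Assumes text is already XML-escaped.
 * Returns what snprintf returns (the length the full string needs). */
static int format_ttml_cue(char *out, size_t len,
                           uint64_t show_ms, uint64_t hide_ms,
                           const char *text, const char *eol)
{
    char begin[16], end[16];

    ms_to_ttml_time(show_ms, begin, sizeof begin);
    ms_to_ttml_time(hide_ms, end, sizeof end);

    return snprintf(out, len, "      <p begin=\"%s\" end=\"%s\">%s</p>%s",
                    begin, end, text, eol);
}
```

Calling format_ttml_cue(buf, sizeof buf, 48280, 49880, text, "\r\n") would produce a single <p> line ending in CRLF; passing "\n" gives the LF variant.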


@anshul1912 commented on GitHub (Nov 20, 2014):

Can you share your video file? I will look at it.

It seems the teletext code has been untouched for decades.
Actually, the correct solution would be for the teletext code not to write anything to the output file itself. The decoder should pass the decoded subtitle on, and the output should be written by the encoder. If you look at dvb_subtitle and 608, things already work like that:
the decoded subtitle context is passed to the encoder.

Right now, the encoder writes the header and footer correctly at initialization time, but it never receives the decoded packet, so it cannot handle it.
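
A rough sketch of that split, under stated assumptions: the decoder only fills in a cue structure, and one encoder function decides how to serialize it. Every name here (struct cue, struct encoder, encode_cue, the OUT_* constants) is hypothetical and does not match CCExtractor's real types; timestamps are assumed to be in milliseconds and the text already escaped for the target format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical, for illustration only. */
enum out_format { OUT_SRT, OUT_TTML };

/* What a decoder would hand over: timing and text, no formatting. */
struct cue {
    uint64_t show_ms;
    uint64_t hide_ms;
    const char *text;
};

/* Encoder state: the chosen output format plus the SRT cue counter. */
struct encoder {
    enum out_format format;
    unsigned counter;
};

/* Split a millisecond timestamp into hours, minutes, seconds, millis. */
static void split_ms(uint64_t ms, unsigned *h, unsigned *m,
                     unsigned *s, unsigned *f)
{
    *h = (unsigned)(ms / 3600000ULL);
    *m = (unsigned)(ms / 60000ULL % 60);
    *s = (unsigned)(ms / 1000ULL % 60);
    *f = (unsigned)(ms % 1000ULL);
}

/* The single place where the output format is chosen. A decoder calls
 * this for every cue and never writes to the output itself. */
static int encode_cue(struct encoder *enc, const struct cue *c,
                      char *out, size_t len)
{
    unsigned h1, m1, s1, f1, h2, m2, s2, f2;

    assert(enc != NULL && c != NULL && out != NULL);
    split_ms(c->show_ms, &h1, &m1, &s1, &f1);
    split_ms(c->hide_ms, &h2, &m2, &s2, &f2);

    switch (enc->format) {
    case OUT_TTML:
        return snprintf(out, len,
                        "<p begin=\"%02u:%02u:%02u.%03u\" "
                        "end=\"%02u:%02u:%02u.%03u\">%s</p>\n",
                        h1, m1, s1, f1, h2, m2, s2, f2, c->text);
    case OUT_SRT:
    default:
        return snprintf(out, len,
                        "%u\n%02u:%02u:%02u,%03u --> %02u:%02u:%02u,%03u\n%s\n\n",
                        ++enc->counter,
                        h1, m1, s1, f1, h2, m2, s2, f2, c->text);
    }
}
```

With this shape, supporting a new output format means adding one case to the encoder rather than touching each decoder.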

One last question: are you willing to contribute this fix to ccextractor?


@MikaYuoadas commented on GitHub (Nov 24, 2014):

I'd like to, but unfortunately I don't have the time right now to do it correctly.


@MikaYuoadas commented on GitHub (Dec 17, 2015):

Forgot to close this ticket when PR https://github.com/CCExtractor/ccextractor/pull/123 was merged.

Reference: starred/ccextractor#34