@ in teletext-subs saved as asterisk #89

Closed
opened 2026-01-29 16:34:54 +00:00 by claunia · 31 comments

Originally created by @hurda on GitHub (Nov 18, 2015).

CCExtractor 0.77 and git-677fee4
File: http://www.mediafire.com/download/q6ebvmzwe1prvi3/cce-at-sign.7z (25MB)

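Teletext (screenshot, "at sign"): https://cloud.githubusercontent.com/assets/3539609/11255535/1a350ddc-8e47-11e5-8437-6f600d72249b.jpg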

SRT:

2
00:00:12,840 --> 00:00:16,520
ORF 2015
untertitel*orf.at

@hurda commented on GitHub (Dec 7, 2015):

Addendum: Also affects other output-formats, like SAMI and TTXT.


@anshul1912 commented on GitHub (Dec 7, 2015):

When I try to download the file above, it shows as deleted.


@cfsmp3 commented on GitHub (Jan 7, 2016):

Please reopen when a working link is available.


@hurda commented on GitHub (Jan 7, 2016):

http://www.mediafire.com/download/q6ebvmzwe1prvi3/cce-at-sign.7z


@cfsmp3 commented on GitHub (Jan 7, 2016):

I'm looking into this. That * is written to the buffer here:

ctx->page_buffer.text[y][i] = telx_to_ucs2(packet->data[i]);

packet->data[i] contains 42 (0x2a) which is indeed an asterisk
http://www.columbia.edu/kermit/ucs2.html


@Dhrumil2910 commented on GitHub (Mar 9, 2016):

I'm not able to download the file. Could you please re-upload it?


@cfsmp3 commented on GitHub (Mar 9, 2016):

It seems to work fine for me... is there an error message when trying to
download the file?


@Dhrumil2910 commented on GitHub (Mar 9, 2016):

It's because the college proxy server is denying the download.


@cfsmp3 commented on GitHub (Mar 9, 2016):

OK, I've uploaded it to slack. Maybe it can be downloaded from there?


@Dhrumil2910 commented on GitHub (Mar 10, 2016):

OK, thanks a lot for your support.


@isacdaavid commented on GitHub (Mar 11, 2016):

I'm trying to track that 0x2a byte back through the pipeline to find where things went wrong, if anywhere, but I need to know what the correct value would be.

For this test video, telx_to_ucs2() converts 0x40 to '§' rather than '@' because of the local-language (German) substitutions in the basic character set specified in ETS 300 706. Is that OK?
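
As a rough illustration of the substitution described above, here is a minimal C sketch of a national-option lookup that models position 0x40 only. The enum and function names are hypothetical and are not ccextractor's; the '§' mapping for German comes from this comment, while '@' for the English subset is assumed from the standard's Latin national option table. All other positions are passed through unchanged, which a real decoder would not do.

```c
#include <stdint.h>

/* Hypothetical subset identifiers, for illustration only. */
enum natl_subset { SUBSET_ENGLISH, SUBSET_GERMAN };

/* Map a 7-bit G0 Latin code to a UCS-2 code point.
 * Only position 0x40 is modelled; the other national-option
 * positions are deliberately left out of this sketch. */
uint16_t g0_latin_char(uint8_t code, enum natl_subset subset)
{
    if (code == 0x40) {
        return (subset == SUBSET_GERMAN) ? 0x00A7   /* section sign */
                                         : 0x0040;  /* at sign */
    }
    return code; /* pass-through assumption for every other position */
}
```

With the German subset selected, no 7-bit input to this lookup yields 0x0040, which is why the at sign has to come from somewhere other than the basic character table.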


@abhishek-vinjamoori commented on GitHub (Mar 11, 2016):

@isacdaavid, ideally telx_to_ucs2() should receive an input of 64 (0x40) and return "@". But with the current decoding there is no possible input for which the output is "@".


@isacdaavid commented on GitHub (Mar 12, 2016):

I think this bug is invalid after all. The asterisk is really there at offset 0xDBF164 in the file (the value is 0x54, which is 0x2A with its bit order reversed), and OP's software is responsible for outputting the at sign.

I followed that particular 0x2A back to tlt_process_pes_packet(), where the file's 0x54 is bit-reversed into 0x2A. After failing to find another transformation through several function calls and buffers all the way back to get_cinfo(), I became suspicious that ccextractor had been doing the right thing all along. Sure enough, I searched the binary file for "a7 37 54 f7" after printing the adjacent values according to ccextractor, and after changing that "54" to "02" (reversed 0x40) the '§' appeared in the .srt output as expected from my previous post.
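
The reversal described here swaps the order of the bits within a single byte. A minimal sketch in C, assuming a hypothetical helper name (reverse_bits8 is not taken from ccextractor):

```c
#include <stdint.h>

/* Reverse the bit order of one byte: bit 0 swaps with bit 7,
 * bit 1 with bit 6, and so on. */
uint8_t reverse_bits8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4)); /* swap nibbles */
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2)); /* swap bit pairs */
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1)); /* swap neighbours */
    return b;
}
```

With this helper, reverse_bits8(0x54) gives 0x2A and reverse_bits8(0x02) gives 0x40, matching the values above.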

I can provide the hex-edited video and my debugging patch/pull request. If you like, I could also implement a change to telx_to_ucs2() that would output an at sign when it finds an asterisk, but I guess you don't want to introduce such behaviour. I'm interested in proving that I know some git and can make useful changes to your codebase, but I fear that if this bug gets closed without needing a patch then I will not have earned the points for my GSoC application :(


@cfsmp3 commented on GitHub (Mar 12, 2016):

The issue is with supplementary charsets.

A good starting point to research this is to google "supplementary charsets
teletext". There are other teletext applications around; surely some of
them get this right, and we can learn from them.

This is not about replacing one char with another generically (obviously
that might work for this specific file but would break many others)
but rather about completing the supplementary charset implementation.

A good thing is that the teletext specifications are public and totally free,
so this also serves as an introduction to standards documents :-)

Notes to GSoC applicants

  • Points will be awarded to all valid solutions, so don't worry if your
    specific solution is not used. We'll need to pick just one, and that might
    come down to personal taste. As long as your solution is valid, you'll get
    points.
  • Extra points will be awarded for collaborating, so if, for example, you find
    where in the standard that character appears, mention it here; don't keep it
    to yourself.


@hurda commented on GitHub (Mar 13, 2016):

> A good starting point to research this is to google "supplementary charsets teletext". There are other teletext applications around; surely some of them get this right, and we can learn from them.

Extracting the subtitles using ProjectX 0.91.0.10 portable also outputs "@". Maybe it helps.

EDIT:
http://project-x.cvs.sourceforge.net/viewvc/project-x/Project-X/src/net/sourceforge/dvb/projectx/subtitle/Teletext.java?view=annotate
http://project-x.cvs.sourceforge.net/viewvc/project-x/Project-X/src/net/sourceforge/dvb/projectx/subtitle/CharSet.java?view=markup

http://project-x.cvs.sourceforge.net/viewvc/project-x/Project-X/src/net/sourceforge/dvb/projectx/parser/StreamProcessTeletext.java?view=annotate
http://project-x.cvs.sourceforge.net/viewvc/project-x/Project-X/src/net/sourceforge/dvb/projectx/parser/StreamProcessSubpicture.java?view=annotate

EDIT2:
VLC (2.2.2) shows @ too.
http://git.videolan.org/?p=vlc.git;a=blob;f=modules/codec/telx.c;h=4f8842a95f4a94cb326d3e48234014852f04c235;hb=HEAD

To check this, you'll have to use this file: http://www.mediafire.com/download/7fwbqdw57sxykby/at-sign_teletext_pcr-pts.ts
The other has a difference between the PCR-timestamps of A/V and Teletext of almost three hours, which VLC apparently can't handle.

EDIT3:

> and after changing that "54" to "02" (reversed 0x40) the '§' appeared in the .srt output as expected from my previous post.

While this changes what ccextractor outputs, DVBViewer, ProjectX and VLC still show @.
How's that possible?


@isacdaavid commented on GitHub (Mar 14, 2016):

Quick update:
I couldn't find the @ in any of the supplementary (AKA G2) character sets. I still need to find more information on the second G0 sets (mentioned in section 15.3 in the ETS 300 706) and modified G0 and G2 sets (part of Teletext level 2.5 and 3.5, mentioned in section 15.4 in the standard).

@hurda Thanks. I will definitely see what other projects are doing if I fail to find a satisfactory explanation in those extra character sets.

EDIT:
I found it. This weird behaviour seems to have been introduced in ETS 300 706 version 1.2.1 from 2003 as a marginal note in section 15.6.1 about the basic G0 Latin character set. Quoting from it:

> NOTE 3: The @ symbol replaces the * symbol at position 2/A when the table is accessed via a packet X/26 Column Address triplet with Mode Description = 10 000 and Data = 0101010. See clause 12.2.4.

Time to implement it!
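
A minimal sketch of what the quoted note implies, in C. This is not the fix that was eventually merged; the struct and function names are hypothetical. It also assumes the 18-bit X/26 triplet layout (6 address bits, then 5 mode bits, then 7 data bits, counted from the least significant bit) and that addresses of 40 and above denote a row address rather than a column.

```c
#include <stdint.h>

/* Hypothetical container for a decoded X/26 triplet. */
struct x26_triplet {
    uint8_t address; /* 6 bits */
    uint8_t mode;    /* 5 bits */
    uint8_t data;    /* 7 bits */
};

/* Split an already Hamming-decoded 18-bit triplet into its fields.
 * The bit layout is an assumption stated in the text above. */
struct x26_triplet x26_unpack(uint32_t triplet)
{
    struct x26_triplet t;
    t.address = (uint8_t)(triplet & 0x3Fu);
    t.mode    = (uint8_t)((triplet >> 6) & 0x1Fu);
    t.data    = (uint8_t)((triplet >> 11) & 0x7Fu);
    return t;
}

/* Return 0x0040 ('@') for the special case in NOTE 3, or 0 when this
 * sketch does not handle the triplet. A real decoder would map the
 * remaining data values through the active G0 set instead. */
uint16_t x26_column_char(struct x26_triplet t)
{
    if (t.address >= 40)
        return 0; /* assumed row address group, not a column */
    if (t.mode == 0x10 && t.data == 0x2A) /* mode 10 000, data 0101010 */
        return 0x0040;
    return 0;
}
```

A caller would write the returned code point into the page buffer at the column given by t.address, and leave the cell untouched when the function returns 0.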


@cfsmp3 commented on GitHub (Mar 14, 2016):

It's in table 36 (Latin National Option subset), in the English row.
Page 115 of ETS 300 706: May 1997


@hurda commented on GitHub (Mar 14, 2016):

> This weird behaviour seems to have been introduced in ETS 300 706 version 1.2.1 from 2003 as a marginal note in section 15.6.1 about the basic G0 Latin character set.

Good catch! It doesn't help that the first search engine results for "ets 300 706" are for the 1997 version of the spec.
telxcc.c references the 1997 spec, but not the 2003 one.
Here's the link: http://www.etsi.org/deliver/etsi_en/300700_300799/300706/01.02.01_60/en_300706v010201p.pdf

PS: It's actually clause 12.3.4, at the bottom of table 29.


@cfsmp3 commented on GitHub (Mar 14, 2016):

This seems like the best possible explanation. It's a one liner fix
probably. Points will be awarded to the first GSoC applicant that sends a
proper PR :-)


@abhishek-vinjamoori commented on GitHub (Mar 14, 2016):

According to the standard mentioned above, "*" (42) must be replaced with "@" only when the packet is X/26:

if (y == 26) // But currently the * is addressed at y=22
{
    // And Mode Description = 10000 and Data = 0101010.
    if (data == 64 && mode == 0x10)
    {
        ctx->page_buffer.text[i][j] = 0x0040;
    }
}

According to the current state of the decoding:

if (data == 10 && mode == 2 && ctx->page_buffer.text[y][k] == 42 && default_g0_charset == LATIN)
    ctx->page_buffer.text[y][k] = 0x0040; // Special case only for @

where k is iterated from 0 to 39.


@abhishek-vinjamoori commented on GitHub (Mar 14, 2016):

I need a file in which "*" is actually used. (With that, it can be verified whether the data/mode are different when * actually appears.)


@hurda commented on GitHub (Mar 14, 2016):

> I need a file in which "*" is actually used.

http://www.mediafire.com/download/apc078mz884gbkr/teletext_subtitles_with_asterisk_pcr-pts.ts


@abhishek-vinjamoori commented on GitHub (Mar 14, 2016):

Could this be hosted somewhere else, as mediafire is blocked?


@hurda commented on GitHub (Mar 14, 2016):

http://www111.zippyshare.com/v/Gc65zLfD/file.html


@abhishek-vinjamoori commented on GitHub (Mar 14, 2016):

1
00:00:03,560 --> 00:00:06,160
schon wieder Streit, Doris.

2
00:00:10,360 --> 00:00:12,000

* Der Motor heult auf. *

Is this the desired output?


@hurda commented on GitHub (Mar 14, 2016):

Have you omitted some lines for brevity?

Here are all subtitles of that sample-video:

1
00:00:03,560 --> 00:00:06,160
Mach nicht
schon wieder Streit, Doris.

2
00:00:06,240 --> 00:00:07,320
Grüße an Jesus!

3
00:00:07,400 --> 00:00:08,560
* Diana seufzt. *

4
00:00:08,640 --> 00:00:10,280
* Sie lässt den Motor an. *

5
00:00:10,360 --> 00:00:12,000
* Der Motor heult auf. *

6
00:00:14,200 --> 00:00:15,960
* schelmische Musik *

That's with ccextractor 0.79.


@abhishek-vinjamoori commented on GitHub (Mar 15, 2016):

Yes, that is the output I'm getting. Is there any other file with "@"? It would be really helpful.


@hurda commented on GitHub (Mar 15, 2016):

I only have files with the same "untertitel*orf.at" output.


@abhishek-vinjamoori commented on GitHub (Mar 15, 2016):

But are they different files?


@hurda commented on GitHub (Mar 15, 2016):

Yes. http://www36.zippyshare.com/v/BwLLxb8i/file.html


@cfsmp3 commented on GitHub (Mar 18, 2016):

Solved in current github version.
