[Korean] Subtitles are missing; asking for help #119

Closed
opened 2026-01-29 16:35:47 +00:00 by claunia · 18 comments

Originally created by @HaneolLee on GitHub (Feb 26, 2016).

Originally assigned to: @PunitLodha on GitHub.

Broadcasting in the Republic of Korea is divided into local and national television broadcasts.

The national broadcast ends as soon as the local broadcast starts.

At that point, part of the closed captions is missing.

I would like to know what causes this and how to solve it.

The attachment shows the details.

https://drive.google.com/file/d/0BxFzM3fSXVOiYWcwQ0RVU2Fic00/view?usp=sharing


@YorkHe commented on GitHub (Mar 10, 2016):

@HaneolLee Can you upload a sample file of the video so that we can make a debug run on it?


@HaneolLee commented on GitHub (Mar 15, 2016):

@YorkHe
This is the source file of the recorded broadcast.
Sorry, I do not know how to cut a sample out of it, so I uploaded it as a single large file.
https://drive.google.com/file/d/0BxFzM3fSXVOiV1BkaHh2WHBKejg/view?usp=sharing

TTXT results
https://drive.google.com/file/d/0BxFzM3fSXVOiMXllemt2T3ZwRUU/view?usp=sharing

Reference image
https://drive.google.com/file/d/0BxFzM3fSXVOiMV9yZlhDdE52bEE/view?usp=sharingg

srt results
https://drive.google.com/file/d/0BxFzM3fSXVOiRDVCbHJYVVVTY3c/view?usp=sharing

Please note the section from 00:04:32,206 to 00:04:49,622.
Thank you.


@YorkHe commented on GitHub (Mar 17, 2016):

@HaneolLee
Great, I'll take a look at this issue this weekend.


@YorkHe commented on GitHub (Mar 17, 2016):

@HaneolLee Hmm... It seems you didn't paste the URL for the video file in this issue. You've pasted the link to the TTXT results in its place. 😄


@HaneolLee commented on GitHub (Mar 18, 2016):

@YorkHe
I'm sorry.
I have edited the link.


@Abhinav95 commented on GitHub (May 22, 2016):

A big part of the problem seems to be that the subtitles missing from the end of the national broadcast are displayed at the very end of the extracted subtitles.
.
.
.
120
00:04:46,787 --> 00:04:49,622
주장했습니다.
-교도소에 갈 범죄를 저지르면 자살할
용기가 생기지 않을까 싶어서 자포자기

  • Missing 121 here, and this is where the problem occurs

122
00:05:35,936 --> 00:05:41,407
-(앵커) 시청자 여러분, 안녕하십니까?
.
.
.
662
00:30:16,315 --> 00:30:21,252
-(앵커) 올림픽 페스티벌 개막을 맞아
강릉에서 보내드린 특집 뉴스820 여기서

663
00:30:21,253 --> 00:30:22,754
-(앵커) 올림픽 페스티벌 개막을 맞아
강릉에서 보내드린 특집 뉴스820 여기서
마치겠습니다.

664
00:04:49,623 --> 00:34:57,394
강릉에서 보내드린 특집 뉴스820 여기서
마치겠습니다.
-(앵커) 시청해주신 여러분, 고맙습니다.

  • The subtitle that should have appeared as 121 instead appears as 664. Its start time is correct, but its end time is the end time of the video. Its text also repeats that of the previous subtitle (663), which is incorrect.
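
The ordering problem described in this comment can be checked mechanically. What follows is a minimal Python sketch, not part of CCExtractor: it parses SRT timing lines and reports every cue that starts earlier than the cue before it. The names `parse_timing` and `find_out_of_order` are made up for illustration; the sample timings are the ones quoted in this comment.

```python
import re

# Matches an SRT timing line such as "00:04:46,787 --> 00:04:49,622".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{1,3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{1,3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_timing(line):
    """Return (start_ms, end_ms) for an SRT timing line, or None if it does not match."""
    match = TIMING.search(line)
    if match is None:
        return None
    fields = match.groups()
    return to_ms(*fields[:4]), to_ms(*fields[4:])


def find_out_of_order(cues):
    """cues is a list of (number, timing_line); return numbers that start before the previous cue."""
    out_of_order = []
    previous_start = None
    for number, line in cues:
        timing = parse_timing(line)
        if timing is None:
            continue
        start, _end = timing
        if previous_start is not None and start < previous_start:
            out_of_order.append(number)
        previous_start = start
    return out_of_order


# Timings quoted in this comment (hypothetical hand-copied input, not a parsed file).
cues = [
    (120, "00:04:46,787 --> 00:04:49,622"),
    (122, "00:05:35,936 --> 00:05:41,407"),
    (662, "00:30:16,315 --> 00:30:21,252"),
    (663, "00:30:21,253 --> 00:30:22,754"),
    (664, "00:04:49,623 --> 00:34:57,394"),
]
print(find_out_of_order(cues))  # [664]
```

Only cue 664 is flagged, which matches the description: it carries the start time that cue 121 should have had.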

@Izaron commented on GitHub (Jan 13, 2017):

I reproduced this on the current master.
The situation is better, because we now get the correct characters in the subtitles. We have also fixed the gap in the subtitle numbering.

120
00:04:46,787 --> 00:04:49,622
<font color="ffff00">쇖샥쟟뷀듏듙. </font>                                
-놳떵볒뾡 낥 맼쇋뢦 샺쇶뢣룩 샚믬쟒                    
뿫뇢낡 믽뇢쇶 뻊삻뇮 뷍뻮벭 샚웷샚뇢                    

121
00:05:35,936 --> 00:05:41,407
<font color="ffff00">-(뻞쒿) 뷃쎻샚 뾩랯뫐, 뻈돧쟏뷊듏뇮? </font>                 

The CEA-708 player shows the same characters and colors. We are missing the next subtitle, the one that adds a new last line after 뿫뇢낡 믽뇢쇶 뻊삻뇮 뷍뻮벭 샚웷샚뇢; it would start at around 00:04:49,623.
The last lines in the file:

630
00:30:16,315 --> 00:30:21,252
<font color="ffff00">-(뻞쒿) 뿃뢲쟈 웤붺욼맺 낳뢷삻 룂뻆 </font>                  
<font color="ffff00">낭뢪뾡벭 몸뎻뗥뢰 욯쇽 뒺붺820 뾩뇢벭 </font>                 

631
00:30:21,253 --> 00:30:22,754
<font color="ffff00">-(뻞쒿) 뿃뢲쟈 웤붺욼맺 낳뢷삻 룂뻆 </font>                  
<font color="ffff00">낭뢪뾡벭 몸뎻뗥뢰 욯쇽 뒺붺820 뾩뇢벭 </font>                 
<font color="ffff00">뢶쒡냚뷀듏듙. </font>                                

632
00:04:49,623 --> 00:34:57,394
<font color="ffff00">낭뢪뾡벭 몸뎻뗥뢰 욯쇽 뒺붺820 뾩뇢벭 </font>                 
<font color="ffff00">뢶쒡냚뷀듏듙. </font>                                
-(뻞쒿) 뷃쎻쟘쇖뷅 뾩랯뫐, 냭뢿뷀듏듙.                 

The last subtitle has the wrong timing: it starts at the missed 00:04:49,623 and ends at the length of the video. The expected time is around 00:30:22,72 --> 00:30:33,23.
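
A side observation, offered as an assumption and not something confirmed in this thread: the characters in the SRT output quoted in this comment are consistent with EUC-KR byte pairs being read as 16-bit code points. Under that assumption, the first line of cue 120 maps back to the Korean text quoted in the earlier comment. A minimal Python sketch of the mapping follows; `recover_euc_kr` is a made-up helper, not a CCExtractor function.

```python
def recover_euc_kr(garbled: str) -> str:
    """Reinterpret each non-ASCII character's code point as two EUC-KR bytes.

    Assumption: every non-ASCII character in the input is one EUC-KR byte
    pair that was stored as a single 16-bit code point. ASCII characters
    are passed through as single bytes.
    """
    raw = bytearray()
    for character in garbled:
        code_point = ord(character)
        if code_point < 0x80:
            raw.append(code_point)
        else:
            raw += code_point.to_bytes(2, "big")
    # errors="replace" keeps the sketch from raising on byte pairs that
    # are not valid EUC-KR.
    return raw.decode("euc-kr", errors="replace")


print(recover_euc_kr("쇖샥쟟뷀듏듙."))  # 주장했습니다.
```

If the assumption holds, decoding the service as EUC-KR at extraction time would avoid the need for any such post-processing.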


@Izaron commented on GitHub (Jan 13, 2017):

There are also these bugs:

314
00:14:43,149 --> 00:14:48,087
<font color="ffff00">뿃뢲쟈 뫐삧뇢뢦 좮믪쟏듂 떿뷃뾡 쇶뾪 </font>                   
<font color="ffff00">욯벺삻 믬뢰 릮좭샻 뾪랮삻 얰뿶뎪낥 </font>                    
<font color="ffff00">냨좹샔듏듙. </font>                                 

315
00:14:48,088 --> 00:14:48,087
<font color="ffff00">욯벺삻 믬뢰 릮좭샻 뾪랮삻 얰뿶뎪낥 </font>                    
<font color="ffff00">냨좹샔듏듙. </font>                                 

343
00:16:07,167 --> 00:16:08,667
뿸샌 엵샚낡 뗋듏듙.                             
쟶샧 놹솦붺얰뾬룍 뇢쇘뾡 룂쏟듙 몸듏뇮                   
쏑 믧뻷뫱낡 솻 쇵낡뗉 낡듉벺떵                       

344
00:16:08,668 --> 00:16:08,667
쟶샧 놹솦붺얰뾬룍 뇢쇘뾡 룂쏟듙 몸듏뇮                   
쏑 믧뻷뫱낡 솻 쇵낡뗉 낡듉벺떵                       
샖뷀듏듙.   
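
In both faulty cues above (315 and 344), the end time is one millisecond before the cue's own start time and equals the end time of the preceding cue. A minimal Python sketch for spotting such cues is shown below; it is not CCExtractor code, and `srt_time_to_ms` and `has_negative_duration` are names invented for this example.

```python
def srt_time_to_ms(stamp):
    """Convert an SRT timestamp such as '00:14:48,088' to milliseconds."""
    clock, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def has_negative_duration(timing_line):
    """Return True when the cue on this timing line ends before it starts."""
    start, end = (part.strip() for part in timing_line.split("-->"))
    return srt_time_to_ms(end) < srt_time_to_ms(start)


# Timing lines copied from the cues quoted in this comment.
samples = [
    (314, "00:14:43,149 --> 00:14:48,087"),
    (315, "00:14:48,088 --> 00:14:48,087"),
    (343, "00:16:07,167 --> 00:16:08,667"),
    (344, "00:16:08,668 --> 00:16:08,667"),
]
for number, timing in samples:
    print(number, has_negative_duration(timing))
```

Cues 315 and 344 are reported as ending before they start; 314 and 343 are not.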

@cfsmp3 commented on GitHub (Jan 20, 2017):

GSoC qualification: This issue gives 3 points.


@Izaron commented on GitHub (Jan 22, 2017):

Should we have both "Missing subs" and "Deal with whipped (https://github.com/CCExtractor/ccextractor/issues/286#issuecomment-272539061) roll-up subs" in one issue?


@cfsmp3 commented on GitHub (Jan 22, 2017):

In general it's best to have separate issues for separate issues :-)
Otherwise we can't close anything.


@HaneolLee commented on GitHub (Jan 24, 2017):

hi.

  1. https://drive.google.com/open?id=0BxFzM3fSXVOiTl83VGczdUpEMEE
  2. https://drive.google.com/open?id=0BxFzM3fSXVOiaUpwN1RVeG11ODA
  3. https://drive.google.com/open?id=0BxFzM3fSXVOiako4TTFMclB3ZHc

I have uploaded the recorded broadcast files again. In the files at links 1 and 2,
the subtitles are not displayed. In the file at link 3, the subtitles are
repeated and do not come out completely.

@fcartegnie commented on GitHub (Sep 20, 2017):

while(true)
{
add samples,
remove samples,
}

really annoying


@cfsmp3 commented on GitHub (Nov 27, 2017):

Links not working - closing ticket.
Reopen when posting new links. Also, links should be permanent, otherwise it's unlikely we'll ever work on this. We just have to have access to the files when we have time, otherwise we just move on.


@cfsmp3 commented on GitHub (Nov 21, 2021):

@ValZapod Can you check if this is a problem with the current master?
Since it's a 708 thing, assigned to @PunitLodha so he can take a look


@PunitLodha commented on GitHub (Nov 22, 2021):

> Reopen please https://drive.google.com/drive/mobile/folders/0B_61ywKPmI0TZU00VjRYWENfYjg?usp=sharing

Using -svc all[EUC-KR] I can get all subtitles from this file.

We don't have access to the original video discussed in this issue, so we'll close the issue for now. If the problem persists with the original video, feel free to reopen.


@PunitLodha commented on GitHub (Nov 24, 2021):

> I think auto select should be done still.

We need to know what the language is before we can auto select. Language is present in the caption service descriptor, which is present in either PMT or EIT. But, I didn't find the caption service descriptor in any of the videos.

The Korean standard states,

"DTVCC Default Mode in Korea : Although DTVCC subtitles data exists in DTVCC transmission channels but PMT and EIT do not have any caption service descriptor, it will be treated as Service 1 and EUC-KR."

We cannot default to EUC-KR for all videos, since they come in many languages, not just Korean.
I think the best solution here is to pass the EUC-KR parameter manually.
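
The rule quoted from the Korean standard, together with the objection that it cannot be applied to every video, can be sketched as a small decision function. This is an illustrative Python sketch, not CCExtractor code: `choose_charset`, `svc_argument`, and the `assume_korean` switch are hypothetical names, and the use of the language code "kor" is an assumption. Only the `-svc all[EUC-KR]` form comes from this thread.

```python
from typing import List, Optional


def choose_charset(descriptor_language: Optional[str],
                   assume_korean: bool = False) -> Optional[str]:
    """Pick a charset for DTVCC service text.

    descriptor_language: language code from a caption service descriptor
        (PMT or EIT), or None when no descriptor was found.
    assume_korean: explicit opt-in by the user, standing in for the
        manually passed EUC-KR parameter.
    """
    if descriptor_language is not None:
        # A descriptor exists, so the language is known and no guess is needed.
        return "EUC-KR" if descriptor_language == "kor" else None
    # No descriptor: the Korean default mode (Service 1, EUC-KR) is applied
    # only when the user states that the stream is Korean.
    return "EUC-KR" if assume_korean else None


def svc_argument(charset: Optional[str]) -> List[str]:
    """Build the -svc argument in the form used in this thread."""
    if charset is None:
        # No charset override; service selection is left to the caller.
        return []
    return ["-svc", f"all[{charset}]"]


print(svc_argument(choose_charset(None, assume_korean=True)))
```

The design choice mirrors the conclusion above: without a descriptor the sketch never guesses, and EUC-KR is selected only on an explicit request.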


@cfsmp3 commented on GitHub (Nov 24, 2021):

> think the best solution here is to just manually pass EUC-KR parameter

Agreed. I don't think it's a good use of our time to try to detect heuristically.

But we'll accept PRs 😊
