Cannot extract DVB subtitles from the TVEHD channel #112

Closed
opened 2026-01-29 16:35:35 +00:00 by claunia · 13 comments

Originally created by @ruralman on GitHub (Feb 10, 2016).

Hello everyone,
Although MediaInfo reports:
Video:
PID: 0x12D(H.264)
Audio:
PID: 0x12E(spa)[PD]
PID: 0x12F(qaa)[PD]
PID: 0x130(spa)[PD]
Teletext:
Empty
DVB subtitles:
PID: 0x131(spa_0x14_p1_a2)

I cannot extract them with CCExtractor (with Subtitle Edit I am able to view them).
Sample files attached:
https://drive.google.com/file/d/0B_gQLghxiDPxakJOUkNLSkl2Z1U/view?usp=drive_web
https://drive.google.com/file/d/0B_gQLghxiDPxTDRseU5SMUpKVVk/view?usp=drive_web

@anshul1912 commented on GitHub (Feb 14, 2016):

Are you able to see the subtitles with any other video player?

@ruralman commented on GitHub (Feb 29, 2016):

Yes, the subtitles are visible (in mplayer), but they cannot be extracted.

@vinayakathavale commented on GitHub (Mar 18, 2016):

@ruralman you need to pass -out=spupng as one of the arguments.

Refer to [link](http://ccextractor.com/spupng.html) for details.
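
For reference, the invocation suggested above can be sketched as a small wrapper. Only the `-out=spupng` option comes from the thread; the function names, the default executable name `ccextractor`, and the sample file name are assumptions made for illustration, and the exact option spelling may differ between CCExtractor versions.

```python
import shutil
import subprocess


def build_spupng_command(input_path, executable="ccextractor"):
    """Return the argument list for a spupng extraction run."""
    # -out=spupng is the option named in the thread; the function name and
    # the default executable name are illustrative assumptions.
    return [executable, input_path, "-out=spupng"]


def run_spupng(input_path, executable="ccextractor"):
    """Run the extraction if the executable is on PATH.

    Returns the process exit code, or None when the tool is not installed.
    """
    if shutil.which(executable) is None:
        return None  # nothing was run
    return subprocess.run(build_spupng_command(input_path, executable)).returncode


if __name__ == "__main__":
    # "sample.ts" is a placeholder file name, not one of the attached samples.
    print(build_spupng_command("sample.ts"))
```

Building the argument list separately from running it keeps the command easy to inspect before anything is executed.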

@cfsmp3 commented on GitHub (Jul 5, 2016):

This file, 000prueba brujula.ts, causes a crash that is definitely related to the OCR (without tesseract the spupng files are generated just fine, although of course the XML file then contains no text).
With tesseract there's a quick crash. VS doesn't say where, though, so I assume it's inside one of the libraries.
When exporting to .srt, the file generated before the crash contains garbage:

1
00:00:00,001 --> 00:00:00,000
I-Ian: rnllnhnc Ilniunr-can
I I“, IIIIpI\lII\I§ IpIIIIVGI§\I§
r'7-""*1-~1

Assigning to Anshul (sorry) as it's DVB-OCR related.

@Abhinav95 commented on GitHub (Aug 10, 2016):

The problem with these two files is that the image which we are doing OCR on is not created properly. I'm taking a deeper look into it.

@cfsmp3 commented on GitHub (Nov 7, 2016):

@ruralman do you still have this problem with the current version?

@Izaron commented on GitHub (Jan 2, 2017):

I looked into this issue.
First, the subtitles are there, but their display time starts at around 1590 minutes, so something is already broken at this point.
Second, we get errors because 'start_y' or 'end_y' was uninitialized. I initialized them to 0.

![image](https://cloud.githubusercontent.com/assets/5406399/21588433/a9815164-d0f7-11e6-9475-97472041daa3.png)

There were other bugs as well:

![image](https://cloud.githubusercontent.com/assets/5406399/21588452/e4315458-d0f7-11e6-957e-33739eb6768a.png)

After patching, I got this output:

![image](https://cloud.githubusercontent.com/assets/5406399/21588557/f8ff26a2-d0f8-11e6-8432-af45e112b6ff.png)

And the .srt file, [hercules.srt](https://gist.github.com/Izaron/aeba650fbfab1ef89e20b3765886c712), which has a start time of 26:30:43.

I would venture to suggest that the error is in the recording of the .ts file.
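
A quick arithmetic check on the two figures in this comment: 1590 minutes is 26:30:00, which is consistent with the 26:30:43 start time of the .srt file. As a side observation that nobody in the thread makes, MPEG transport stream presentation timestamps are 33-bit counters on a 90 kHz clock, so they wrap around every 2^33 / 90000 seconds, which also comes out at 26:30:43. That match might point to a timestamp wraparound or a negative offset rather than a bad recording, but this is only a hypothesis; nothing in the thread confirms it. The helper name below is illustrative.

```python
PTS_BITS = 33          # width of the MPEG-TS presentation timestamp
PTS_CLOCK_HZ = 90_000  # PTS ticks per second


def format_hms(total_seconds):
    """Format a non-negative number of seconds as HH:MM:SS (fraction dropped)."""
    whole = int(total_seconds)
    hours, rem = divmod(whole, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Display time reported in the comment: about 1590 minutes.
reported = format_hms(1590 * 60)

# Time after which a 33-bit counter running at 90 kHz wraps around.
wrap_seconds = (2 ** PTS_BITS) / PTS_CLOCK_HZ
wrap = format_hms(wrap_seconds)

print(reported)  # 26:30:00
print(wrap)      # 26:30:43
```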

@cfsmp3 commented on GitHub (Jan 20, 2017):

prueba_hercules.ts crashes with the current version.
prueba_brujula.ts doesn't crash, but the output is garbage.

GSoC qualification: This issue gives 3 points.

@ruralman commented on GitHub (Jan 23, 2017):

Thanks for your answer.

@canihavesomecoffee commented on GitHub (Nov 18, 2017):

@ruralman I modified your title to English so that it's a bit clearer :)

Hope you don't mind.

@harrynull commented on GitHub (Dec 26, 2017):

`000prueba hercules.ts` works perfectly. Subtitles are extracted correctly.
The output for `000prueba brujula.ts` is quite messed up. The cause is that the subtitle is divided into multiple regions, as shown below:
![test 4](https://user-images.githubusercontent.com/7413706/34354301-3f848caa-ea68-11e7-9913-1fbaa8a70985.png)
![test 3](https://user-images.githubusercontent.com/7413706/34354300-3f52c38c-ea68-11e7-9139-ebc0a6388837.png)
![test 2](https://user-images.githubusercontent.com/7413706/34354298-3f1f9458-ea68-11e7-9ad7-cf221a33cf9d.png)
![test](https://user-images.githubusercontent.com/7413706/34354297-3eec1006-ea68-11e7-9396-9f021e4268dc.png)
The reason `-spupng` works fine is that it merges the regions before saving to file.

My suggestion is that we merge the regions before passing them to OCR.
Possible flaw: Positioning the subtitle could be hard.
Possible solution: Only merge subtitles that are close to each other.
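
The "only merge what is nearby" idea can be sketched on bounding boxes alone. This is a minimal illustration under stated assumptions, not CCExtractor's implementation: the `Region` type, the `merge_nearby` function, and the `max_gap` threshold are all hypothetical, only the vertical gap is considered, and compositing the actual pixel data into the merged rectangle is left out.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int


def union(a: Region, b: Region) -> Region:
    """Smallest rectangle that contains both regions."""
    left = min(a.x, b.x)
    top = min(a.y, b.y)
    right = max(a.x + a.width, b.x + b.width)
    bottom = max(a.y + a.height, b.y + b.height)
    return Region(left, top, right - left, bottom - top)


def merge_nearby(regions: List[Region], max_gap: int) -> List[Region]:
    """Merge regions whose vertical gap is at most max_gap pixels.

    Regions are visited top to bottom. Each one is folded into the current
    group when it starts within max_gap of that group's bottom edge;
    otherwise it starts a new group.
    """
    merged: List[Region] = []
    for region in sorted(regions, key=lambda r: r.y):
        if merged:
            last = merged[-1]
            gap = region.y - (last.y + last.height)
            if gap <= max_gap:
                merged[-1] = union(last, region)
                continue
        merged.append(region)
    return merged


if __name__ == "__main__":
    # Made-up coordinates: two stacked lines near the bottom, one far above.
    sample = [
        Region(100, 400, 300, 30),
        Region(120, 432, 260, 30),
        Region(100, 50, 200, 30),
    ]
    for r in merge_nearby(sample, max_gap=5):
        print(r)
```

With these made-up coordinates the two stacked lines collapse into one 300x62 rectangle while the distant region stays separate, which keeps the positioning information for groups that are genuinely apart.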

@cfsmp3 commented on GitHub (Dec 26, 2017):

This is quite interesting.
If spupng works well then it's clearly a solvable problem.
Do you want to give it a go?

@harrynull commented on GitHub (Dec 26, 2017):

@cfsmp3 It doesn't seem to be easy to fix, but I will give it a try.


Reference: starred/ccextractor#112