Cannot extract DVB subtitles from the TVEHD channel #112

claunia · 2026-01-29T16:35:35Z

claunia commented

2026-01-29 16:35:35 +00:00

Originally created by @ruralman on GitHub (Feb 10, 2016).

Hello everyone,
Even though MediaInfo reports:
Video:
PID: 0x12D(H.264)
Audio:
PID: 0x12E(spa)[PD]
PID: 0x12F(qaa)[PD]
PID: 0x130(spa)[PD]
Teletext:
Empty
DVB subtitles:
PID: 0x131(spa_0x14_p1_a2 )

I cannot extract them with CCExtractor (with Subtitle Edit I can view them).
Sample files attached:
https://drive.google.com/file/d/0B_gQLghxiDPxakJOUkNLSkl2Z1U/view?usp=drive_web
https://drive.google.com/file/d/0B_gQLghxiDPxTDRseU5SMUpKVVk/view?usp=drive_web

claunia closed this issue

2026-01-29 16:35:36 +00:00

claunia commented

2026-01-29 16:35:36 +00:00

@anshul1912 commented on GitHub (Feb 14, 2016):

Are you able to see the subtitles with any other video player?

claunia commented

2026-01-29 16:35:37 +00:00

@ruralman commented on GitHub (Feb 29, 2016):

Yes, the subtitles are visible (in MPlayer), but they cannot be extracted.

claunia commented

2026-01-29 16:35:37 +00:00

@vinayakathavale commented on GitHub (Mar 18, 2016):

@ruralman you need to pass -out=spupng as one of the arguments.

See http://ccextractor.com/spupng.html for details.

claunia commented

2026-01-29 16:35:37 +00:00

@cfsmp3 commented on GitHub (Jul 5, 2016):

The file 000prueba brujula.ts causes a crash that is definitely related to the OCR: without Tesseract the spupng files are generated just fine (although, of course, the XML file then lacks the text).
With Tesseract there is a quick crash. VS doesn't say where, though, so I assume it's inside one of the libraries.
When exporting to .srt, the file generated before the crash contains garbage:

1
00:00:00,001 --> 00:00:00,000
I-Ian: rnllnhnc Ilniunr-can
I I“, IIIIpI\lII\I§ IpIIIIVGI§\I§
r'7-""*1-~1

Assigning to Anshul (sorry) as it's DVB-OCR related.

claunia commented

2026-01-29 16:35:38 +00:00

@Abhinav95 commented on GitHub (Aug 10, 2016):

The problem with these two files is that the image which we are doing OCR on is not created properly. I'm taking a deeper look into it.

claunia commented

2026-01-29 16:35:38 +00:00

@cfsmp3 commented on GitHub (Nov 7, 2016):

@ruralman do you still have this problem with the current version?

claunia commented

2026-01-29 16:35:38 +00:00

@Izaron commented on GitHub (Jan 2, 2017):

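I looked into this issue.
First, the subtitles are there, but their display time starts at around 1590 minutes, so something is already broken at that point.
Second, we get errors because 'start_y' or 'end_y' was uninitialized. I initialized them to 0.
[image]
https://cloud.githubusercontent.com/assets/5406399/21588433/a9815164-d0f7-11e6-9475-97472041daa3.png

There were other bugs as well:
[image]
https://cloud.githubusercontent.com/assets/5406399/21588452/e4315458-d0f7-11e6-957e-33739eb6768a.png

After patching, I got this output:
[image]
https://cloud.githubusercontent.com/assets/5406399/21588557/f8ff26a2-d0f8-11e6-8432-af45e112b6ff.png

The resulting .srt file, hercules.srt (https://gist.github.com/Izaron/aeba650fbfab1ef89e20b3765886c712), has a start time of 26:30:43.

I would venture to suggest that the error is in the .ts recording itself.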
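
The two figures in this comment agree with each other: 1590 minutes is 26 hours 30 minutes, which matches the 26:30:43 start time in the .srt file. The sketch below only illustrates that arithmetic and one possible workaround (shifting all cues by a fixed origin). It is not CCExtractor code; the `srt_timestamp` and `rebase` helpers and the sample cue are hypothetical, and it assumes the offset is constant across the recording.

```python
def srt_timestamp(ms: int) -> str:
    """Format a millisecond count as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def rebase(cues, origin_ms):
    """Shift (start_ms, end_ms, text) cues so that origin_ms becomes time zero."""
    return [(max(s - origin_ms, 0), max(e - origin_ms, 0), t) for s, e, t in cues]


# Hypothetical cue that starts 1590 minutes and 43 seconds into the timeline.
first_start = (1590 * 60 + 43) * 1000
cues = [(first_start, first_start + 2000, "example text")]

print(srt_timestamp(first_start))      # 26:30:43,000

# Assumed origin: exactly 1590 minutes, chosen only for illustration.
shifted = rebase(cues, origin_ms=1590 * 60 * 1000)
print(srt_timestamp(shifted[0][0]))    # 00:00:43,000
```

Whether rebasing is appropriate depends on where the offset comes from; if the recording's timestamps are genuinely inconsistent, a fixed shift would not help.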

claunia commented

2026-01-29 16:35:39 +00:00

@cfsmp3 commented on GitHub (Jan 20, 2017):

prueba_hercules.ts crashes with current version.
prueba_brujula.ts doesn't crash but output is garbage.

GSoC qualification: This issue gives 3 points.

claunia commented

2026-01-29 16:35:40 +00:00

@ruralman commented on GitHub (Jan 23, 2017):

Thanks for your answer.

claunia commented

2026-01-29 16:35:40 +00:00

@canihavesomecoffee commented on GitHub (Nov 18, 2017):

@ruralman I modified your title to English so that it's a bit clearer :)

Hope you don't mind.

claunia commented

2026-01-29 16:35:40 +00:00

@harrynull commented on GitHub (Dec 26, 2017):

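000prueba hercules.ts works perfectly; the subtitles are extracted correctly.
The output for 000prueba brujula.ts is quite messed up. The cause is that the subtitle is divided into multiple regions, as shown here:
[image: test 4]
https://user-images.githubusercontent.com/7413706/34354301-3f848caa-ea68-11e7-9913-1fbaa8a70985.png
[image: test 3]
https://user-images.githubusercontent.com/7413706/34354300-3f52c38c-ea68-11e7-9139-ebc0a6388837.png
[image: test 2]
https://user-images.githubusercontent.com/7413706/34354298-3f1f9458-ea68-11e7-9ad7-cf221a33cf9d.png
[image: test]
https://user-images.githubusercontent.com/7413706/34354297-3eec1006-ea68-11e7-9396-9f021e4268dc.png

The reason -spupng works fine is that it merges the regions before saving them to a file.

My suggestion is that we merge the regions before passing them to OCR.
Possible flaw: positioning the merged subtitle could be hard.
Possible solution: only merge regions that are near each other.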
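
To make the "only merge what is nearby" idea concrete, here is a minimal sketch. It is not taken from CCExtractor: the `Rect` layout, the `gap`, `union` and `merge_nearby` helpers, the `max_gap` threshold and the sample coordinates are all assumptions made for illustration. Regions are treated as plain rectangles, and two rectangles are fused into their bounding box whenever the gap between them is at most `max_gap` pixels.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), hypothetical layout


def gap(a: Rect, b: Rect) -> int:
    """Largest axis-wise gap between two rectangles; 0 if they touch or overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    return max(dx, dy)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that contains both a and b."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)


def merge_nearby(regions: List[Rect], max_gap: int) -> List[Rect]:
    """Repeatedly fuse any two regions that are at most max_gap pixels apart."""
    merged = list(regions)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if gap(merged[i], merged[j]) <= max_gap:
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Made-up example: three fragments of one subtitle plus a distant region.
regions = [(100, 400, 200, 20), (300, 400, 150, 20),
           (100, 424, 350, 20), (100, 50, 120, 20)]
print(merge_nearby(regions, max_gap=8))
```

With these sample values the three fragments near y=400 collapse into one rectangle and the region at y=50 stays separate, so each merged rectangle keeps its own position for the OCR step. A real implementation would also have to combine the pixel data and palettes of the fused regions, which this sketch does not attempt.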

claunia commented

2026-01-29 16:35:41 +00:00

@cfsmp3 commented on GitHub (Dec 26, 2017):

This is quite interesting.
If spupng works well then it's clearly a solvable problem.
Do you want to give it a go?

claunia commented

2026-01-29 16:35:41 +00:00

@harrynull commented on GitHub (Dec 26, 2017):

@cfsmp3 It doesn't seem to be easy to fix, but I will give it a try.

claunia referenced this issue

2026-01-29 16:58:14 +00:00

[PR #112] updated changes.txt #1009