[PROPOSAL] Process closed captions and burned-in subtitles in one pass #300

Closed
opened 2026-01-29 16:40:16 +00:00 by claunia · 27 comments

Originally created by @cfsmp3 on GitHub (Mar 22, 2017).

There are some videos that have both closed captions and burned-in subtitles. This typically happens when part of the audio is in English (which is closed-captioned, so viewers can enable or disable the captions) while part of the audio is in a different language, which is subtitled for everyone.

A very good example is The Americans, in which a good portion of the audio is in Russian.

This task gives ENOUGH points to qualify for GSoC. If your proposal is good and your code to get this done gets merged, you're in.

claunia added the OCR, difficulty: medium, GSoC-related, and HardsubX labels 2026-01-29 16:40:16 +00:00

@cfsmp3 commented on GitHub (Apr 12, 2017):

A sample video: https://drive.google.com/file/d/0B_61ywKPmI0TeU5rOTlCMWxKbW8/view?usp=sharing&resourcekey=0-nZ9e7M3yhB6099__xP8ofw

@saurabhshah0410 commented on GitHub (Dec 21, 2017):

Is this issue still open?


@cfsmp3 commented on GitHub (Dec 22, 2017):

Yes.


@thealphadollar commented on GitHub (Feb 8, 2018):

@cfsmp3 I would like to work on this. Can you please give me some tips for the solution?


@cfsmp3 commented on GitHub (Feb 9, 2018):

@thealphadollar First download the sample file we have for it, extract the subtitles in two passes (one for burned-in and one for closed captions), and then figure out how to do everything in one pass :-)


@saurabhshah0410 commented on GitHub (Feb 26, 2018):

@cfsmp3 After running ccextractor once on files like these, and having extracted the closed captions as well as hard subtitles, can you clarify if the result should be a single file with both closed captions and hard subtitles, or two separate files which contain the closed captions and hard subtitles respectively?


@cfsmp3 commented on GitHub (Feb 26, 2018):

The result needs to be a single file that contains both kinds of subtitles.


@saurabhshah0410 commented on GitHub (Feb 26, 2018):

Thanks for clarifying this @cfsmp3 . :-)


@saurabhshah0410 commented on GitHub (Feb 27, 2018):

I think we should add a new option, something like hcc (hard and closed captions) or whatever name you suggest, which when given would process hard subs and closed captions in one pass. What do you think?


@cfsmp3 commented on GitHub (Feb 27, 2018):

Yes, it's a good idea. We need the user to specify that they want to process both things; we cannot be looking for hard subs by default.


@saurabhshah0410 commented on GitHub (Feb 27, 2018):

Thanks, just wanted to be sure about it. Just to play devil's advocate: if a user gives both switches, -hardsubx and -hcc, should we process both hard subs and closed captions? Basically, does -hcc take precedence over everything else (like -hardsubx)?


@cfsmp3 commented on GitHub (Feb 27, 2018):

If -hcc means both then do both :-)
I don't think -hardsubx and -hcc would be mutually exclusive since they're
not incompatible in their meaning.
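
The flag semantics discussed here could be sketched roughly as follows. This is only an illustration: -hcc is the name proposed in this thread rather than an existing switch, the struct and function names are hypothetical, and the idea that -hardsubx on its own turns closed-caption decoding off is an assumption made for the sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option set; field names are illustrative only. */
struct extract_opts {
    int closed_captions; /* decode closed captions (assumed default) */
    int hardsubs;        /* run burned-in subtitle OCR */
};

/* Parse the switches. -hardsubx asks for hard subs only (assumption),
 * -hcc asks for both. The two are not mutually exclusive: if -hcc is
 * present anywhere, both kinds are processed regardless of order. */
struct extract_opts parse_opts(int argc, const char *const *argv)
{
    struct extract_opts o = { 1, 0 };
    int saw_hcc = 0;

    assert(argv != NULL || argc == 0);
    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "-hardsubx") == 0) {
            o.hardsubs = 1;
            o.closed_captions = 0;
        } else if (strcmp(argv[i], "-hcc") == 0) {
            o.hardsubs = 1;
            saw_hcc = 1;
        }
    }
    if (saw_hcc)
        o.closed_captions = 1;
    return o;
}
```

Resolving -hcc after the loop keeps the result independent of the order in which the two switches appear.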


@saurabhshah0410 commented on GitHub (Feb 27, 2018):

Yeah okay. That makes sense, thanks!


@saurabhshah0410 commented on GitHub (Mar 4, 2018):

@cfsmp3 What should be written to the output file if in a particular interval of time from t1 to t2, the video contains both hard subtitles and closed captions?

Such a scenario would be very rare, as closed captions and hard subs wouldn't normally have overlapping time intervals, but if it happens, what should be done about the overlap between the burned-in subtitles and the closed captions?

Should it be like:

1
t1 --> t2
<closed captions>
<hardsubs output>

or

1
t1 --> t2
<hardsubs output>
<closed captions>

Or something else?


@cfsmp3 commented on GitHub (Mar 5, 2018):

Maybe check if they are the same (likely) and then display it just once.
If they're different display in any order, not really important.
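
As a rough sketch of that rule, the text for an overlapping cue could be built like this. The function names are hypothetical, and treating two strings as "the same" when they match after ignoring case and whitespace is an assumption chosen to tolerate small OCR differences; it is not something specified in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if a and b match when case and all whitespace are ignored. */
static int same_text(const char *a, const char *b)
{
    for (;;) {
        while (isspace((unsigned char)*a)) a++;
        while (isspace((unsigned char)*b)) b++;
        if (*a == '\0' || *b == '\0')
            return *a == *b;
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
}

/* Build the text of one cue covering an interval where both a closed
 * caption and a hard subtitle exist. Matching text is written once
 * (the closed-caption copy); different text is written as two lines.
 * Returns the number of characters written, or -1 if out is too small. */
int merge_cue_text(const char *cc, const char *hard, char *out, size_t out_size)
{
    int n;

    assert(cc != NULL && hard != NULL && out != NULL);
    if (same_text(cc, hard))
        n = snprintf(out, out_size, "%s", cc);
    else
        n = snprintf(out, out_size, "%s\n%s", cc, hard);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

Putting the closed caption first when the texts differ is arbitrary, in line with the order not being important.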


@thealphadollar commented on GitHub (Apr 12, 2018):

@cfsmp3 Could you provide another sample? The one provided gives an error; the log report is here: https://justpaste.it/1jiab

I'll look into the error myself and, if needed, file an issue for it.

@cfsmp3 commented on GitHub (Apr 18, 2018):

Those are just harmless warnings.
If you can play the file in VLC then assume the file is correct (which it
is) and that it's us who may have a problem.


@cfsmp3 commented on GitHub (Dec 27, 2018):

Update: This task gives ENOUGH points to qualify for GSoC. If your proposal is good and your code to get this done gets merged, you're in.


@samrat2825 commented on GitHub (Mar 26, 2021):

Would an approach qualify in which two .srt files, one for CC and one for burned-in subtitles, are generated independently in the background and then merged by timestamp into a single output file, all from a single terminal command?


@cfsmp3 commented on GitHub (Mar 28, 2021):

> Would an approach qualify in which two .srt files, one for CC and one for burned-in subtitles, are generated independently in the background and then merged by timestamp into a single output file, all from a single terminal command?

No. That's the lazy approach :-)
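
For contrast with merging two finished files, a single pass would visit each frame once and offer it to both extractors as it goes. The sketch below shows that control flow only: `struct frame`, `cue_sink`, `extract_one_pass` and `remember_last_pts` are hypothetical names, and real caption decoding and OCR are replaced by precomputed strings so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one decoded video frame. */
struct frame {
    long pts_ms;          /* presentation time in milliseconds */
    const char *cc_text;  /* caption text carried with the frame, or NULL */
    const char *ocr_text; /* text found in the picture by OCR, or NULL */
};

/* Hypothetical sink: called once per subtitle found, in stream order. */
typedef void (*cue_sink)(long pts_ms, const char *text, void *user);

/* Single pass: every frame is visited exactly once and checked for both
 * kinds of subtitle. Returns the number of cues handed to the sink. */
size_t extract_one_pass(const struct frame *frames, size_t count,
                        cue_sink sink, void *user)
{
    size_t emitted = 0;

    assert(frames != NULL || count == 0);
    assert(sink != NULL);
    for (size_t i = 0; i < count; i++) {
        if (frames[i].cc_text) {
            sink(frames[i].pts_ms, frames[i].cc_text, user);
            emitted++;
        }
        if (frames[i].ocr_text) {
            sink(frames[i].pts_ms, frames[i].ocr_text, user);
            emitted++;
        }
    }
    return emitted;
}

/* Example sink that records the timestamp of the last cue it received. */
void remember_last_pts(long pts_ms, const char *text, void *user)
{
    (void)text;
    *(long *)user = pts_ms;
}
```

Because cues reach the sink already in stream order, no separate merge by timestamp is needed afterwards.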


@cfsmp3 commented on GitHub (Feb 18, 2022):

Video link again: https://drive.google.com/file/d/0B_61ywKPmI0TeU5rOTlCMWxKbW8/view?usp=sharing&resourcekey=0-nZ9e7M3yhB6099__xP8ofw

@shashwat1002 commented on GitHub (Feb 19, 2022):

Hi, I tried running the compiled source code on the examples with the command (in the build directory)

./ccextractor -hardsubx -subcolor yellow ~/Downloads/the.americans.s05.e06-original.unedited.from.hdhomerun.ts -conf_thresh 50 -ocr_mode word

and I get the following set of failures

CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
HardsubX (Hard Subtitle Extractor) - Burned-in subtitle extraction subsystem
Input : /home/mrcreator/Downloads/the.americans.s05.e06-original.unedited.from.hdhomerun.ts
Subtitle Color : Yellow
OCR Mode : Word-wise
OCR Confidence Threshold : 50.00
OCR Italic Detection : Off
Minimum subtitle duration : 0.5 seconds (Default)
FFMpeg Media Information:-
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpeg2video @ 0x55bdef3fc280] Invalid frame dimensions 0x0.
[mpegts @ 0x55bdee66c140] start time for stream 3 is not set in estimate_timings_from_pts
[mpegts @ 0x55bdee66c140] PES packet size mismatch
[mpegts @ 0x55bdee66c140] Packet corrupt (stream = 1, dts = 2115777097).
[mpegts @ 0x55bdee66c140] PES packet size mismatch
[mpegts @ 0x55bdee66c140] Packet corrupt (stream = 2, dts = 2115777961).
[mpegts @ 0x55bdee66c140] Could not find codec parameters for stream 4 (Unknown: none (ETV1 / 0x31565445)): unknown codec
Consider increasing the value for the 'analyzeduration' (0) and 'probesize' (5000000) options
[mpegts @ 0x55bdee66c140] Could not find codec parameters for stream 5 (Unknown: none (ETV1 / 0x31565445)): unknown codec
Consider increasing the value for the 'analyzeduration' (0) and 'probesize' (5000000) options
[mpegts @ 0x55bdee66c140] Could not find codec parameters for stream 6 (Unknown: none ([192][0][0][0] / 0x00C0)): unknown codec
Consider increasing the value for the 'analyzeduration' (0) and 'probesize' (5000000) options
Input #0, mpegts, from '/home/mrcreator/Downloads/the.americans.s05.e06-original.unedited.from.hdhomerun.ts':
  Duration: 01:10:00.91, start: 19308.731378, bitrate: 2438 kb/s
  Program 2111 
  Stream #0:0[0xf51]: Video: mpeg2video (Main) ([2][0][0][0] / 0x0002), yuv420p(tv, top first), 720x480 [SAR 8:9 DAR 4:3], Closed Captions, 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc
    Side data:
      cpb: bitrate max/min/avg: 18000000/0/0 buffer size: 1835008 vbv_delay: N/A
  Stream #0:1[0xf52](eng): Audio: ac3 ([129][0][0][0] / 0x0081), 48000 Hz, 5.1(side), fltp, 384 kb/s
  Stream #0:2[0xf53](spa): Audio: ac3 ([129][0][0][0] / 0x0081), 48000 Hz, stereo, fltp, 192 kb/s
  Stream #0:3[0xf54]: Data: scte_35
  Stream #0:4[0xf55]: Unknown: none (ETV1 / 0x31565445)
  Stream #0:5[0xf56]: Unknown: none (ETV1 / 0x31565445)
  Stream #0:6[0xfee]: Unknown: none ([192][0][0][0] / 0x00C0)
[1]    46029 segmentation fault (core dumped)  ./ccextractor -hardsubx -subcolor yellow  -conf_thresh 50 -ocr_mode word

It seems to be a segmentation fault.

I haven't yet looked at the source code and was just looking at basic input and output. Before I dive deep, is there something I'm doing very obviously wrong in running it on the test file?

Any advice is appreciated.


@cfsmp3 commented on GitHub (Feb 19, 2022):

> Before I dive deep, is there something I'm doing very obviously wrong in running it on the test file?

No, a segfault is never the user's fault :-)
Go dive! :-)


@shashwat1002 commented on GitHub (Feb 20, 2022):

Hi, I ran a debug build on two different samples and received a segmentation fault in the same place.

Given that I can't work on this issue until the segmentation faults are resolved, would it be more appropriate to start a new issue?

@cfsmp3

Edit: interestingly enough, the fault happens at

ccx_common_logging.debug_ftn(CCX_DMT_708, "[CEA-708] dtvcc_writer_init\n");

and when that line is commented out,

ccx_common_logging.debug_ftn(CCX_DMT_708, "[CEA-708] dtvcc_writer_init: "
                             "[%s][%d][%d]\n",
                             base_filename, program_number, service_number);

causes the segmentation fault instead.

Both of these are debug-logging calls.

is it possible that my build is at fault here?
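
Since both crashing lines are calls through `ccx_common_logging.debug_ftn`, one hypothesis worth checking (it is not confirmed anywhere in this thread) is that the function pointer has not been set on this code path. The sketch below reproduces that situation in miniature with hypothetical names (`logging_table`, `SAFE_DEBUG`, `install_logger`, `counting_logger`); it is not the CCExtractor logging code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a logging table built from function pointers. */
struct logging_table {
    void (*debug_fn)(long category, const char *fmt, ...);
};

unsigned dropped_messages; /* messages skipped because no logger was set */
unsigned logged_messages;  /* messages delivered to counting_logger */

/* Call the logger only if one is installed. Calling through a NULL
 * function pointer is undefined behaviour and typically crashes. */
#define SAFE_DEBUG(tbl, category, ...)                  \
    do {                                                \
        if ((tbl)->debug_fn)                            \
            (tbl)->debug_fn((category), __VA_ARGS__);   \
        else                                            \
            dropped_messages++;                         \
    } while (0)

/* Example logger that only counts how often it was called. */
void counting_logger(long category, const char *fmt, ...)
{
    (void)category;
    (void)fmt;
    logged_messages++;
}

/* Install a logger into the table. */
void install_logger(struct logging_table *tbl,
                    void (*fn)(long, const char *, ...))
{
    assert(tbl != NULL);
    tbl->debug_fn = fn;
}
```

If a debugger or valgrind reports a jump to address 0 at the logging call, that would support this hypothesis; otherwise the cause lies elsewhere.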


@cfsmp3 commented on GitHub (Feb 20, 2022):

> Given that I can't work on this issue until the segmentation faults are resolved, would it be more appropriate to start a new issue?

Well, you can work on fixing that segfault, which seems trivial :-)

I'd recommend building it with debug info and running it under valgrind. It will tell you exactly what's going on.


@shashwat1002 commented on GitHub (Mar 7, 2022):

Hello,

I feel that this proposal will require making a struct that is in some way a general form of struct lib_ccx_ctx and struct lib_hardsubx_ctx with relevant attributes from both.

Does that sound feasible right now?
Would it be wise for me to go ahead with it?


@cfsmp3 commented on GitHub (Mar 8, 2022):

> Hello,
>
> I feel that this proposal will require making a struct that is in some way a general form of struct lib_ccx_ctx and struct lib_hardsubx_ctx with relevant attributes from both.

Maybe you can just pass both as needed instead of creating a new one?

> Does that sound feasible right now? Would it be wise for me to go ahead with it?

Sure. Give it a go - there's several approaches, go with the one you feel more comfortable with, we can always iterate once you have something working.
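
The "pass both as needed" option could look roughly like this. The two structs are small hypothetical stand-ins, not the real `struct lib_ccx_ctx` and `struct lib_hardsubx_ctx`, whose fields are not reproduced here, and `process_frame` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two existing context structs. */
struct cc_ctx_stub { int captions_written; };
struct hardsubx_ctx_stub { int subs_written; };

/* Process one frame with both contexts passed side by side instead of
 * being merged into a new struct. Either pointer may be NULL when that
 * kind of extraction is disabled.
 * Returns how many extractors handled the frame. */
int process_frame(struct cc_ctx_stub *cc, struct hardsubx_ctx_stub *hs,
                  int frame_has_cc, int frame_has_hardsub)
{
    int handled = 0;

    assert(frame_has_cc >= 0 && frame_has_hardsub >= 0);
    if (cc && frame_has_cc) {
        cc->captions_written++;
        handled++;
    }
    if (hs && frame_has_hardsub) {
        hs->subs_written++;
        handled++;
    }
    return handled;
}
```

Passing the two contexts separately leaves their existing definitions untouched, at the cost of a longer parameter list on shared functions.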

Reference: starred/ccextractor#300