[QUESTION] How to get Teletext pages with possible subtitles without actual extraction #460

Closed
opened 2026-01-29 16:44:27 +00:00 by claunia · 19 comments

Originally created by @workflowsguy on GitHub (Nov 1, 2018).

CCExtractor version (using the --version parameter preferably) : 0.87

In raising this issue, I confirm the following:

  • [X] I have read and understood the contributors guide.
  • [X] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [X] I have checked that the issue I'm posting isn't already reported.
  • [X] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues
  • [X] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • [X] I have used the latest available version of CCExtractor to verify this issue exists.

My familiarity with the project is as follows:

  • [X] I am an active contributor to CCExtractor.

Necessary information

  • Is this a regression (did it work before)? [X] NO
  • What platform did you use? [ ] Windows - [ ] Linux - [X] Mac
  • What were the used arguments? -out=report

Additional information

When running ccextractor against a video file, I get output like this:

Notice: Teletext page with possible subtitles detected: 152
- No teletext page specified, first received suitable page is 152, not guaranteed
Notice: Teletext page with possible subtitles detected: 888
Notice: Teletext page with possible subtitles detected: 151
Notice: Teletext page with possible subtitles detected: 150
Notice: Teletext page with possible subtitles detected: 889

I would like to get this information without ccextractor automatically extracting the subtitles, because I need to choose the page to extract from based on service name and desired language.
I thought the option -out=report would achieve this, but for the same video file I get the following output:

//////// Program #10302: ////////
DVB Subtitles: No
Teletext: Yes
Pages With Subtitles: 
ATSC Closed Caption: Yes
EIA-608: No
CEA-708: No

MPEG-4 Timed Text: No

Is this a bug/limitation in the report parameter or is there a different way to achieve this?
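
Until the report lists the pages, one stopgap is to scrape the notice lines from a normal run. The following is only a sketch: it assumes the notice wording is exactly as quoted above and that the program's output has already been captured into a string, and it does not avoid the full extraction pass. The names `NOTICE_RE` and `detected_teletext_pages` are made up for illustration.

```python
import re

# Pattern copied from the notice lines quoted in this issue; the exact wording
# may differ between CCExtractor versions (assumption).
NOTICE_RE = re.compile(r"Teletext page with possible subtitles detected:\s*(\d+)")


def detected_teletext_pages(log_text):
    """Return the announced page numbers in first-seen order, without duplicates."""
    pages = []
    for match in NOTICE_RE.finditer(log_text):
        page = int(match.group(1))
        if page not in pages:
            pages.append(page)
    return pages


# Output quoted in this issue, used here as sample input.
sample = """\
Notice: Teletext page with possible subtitles detected: 152
- No teletext page specified, first received suitable page is 152, not guaranteed
Notice: Teletext page with possible subtitles detected: 888
Notice: Teletext page with possible subtitles detected: 151
Notice: Teletext page with possible subtitles detected: 150
Notice: Teletext page with possible subtitles detected: 889
"""

print(detected_teletext_pages(sample))  # [152, 888, 151, 150, 889]
```

The "first received suitable page" line is deliberately not matched, so only pages announced by a Notice line are returned.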

claunia added the good-first-task and difficulty: easy labels 2026-01-29 16:44:27 +00:00

@navimakarov commented on GitHub (Dec 14, 2018):

@workflowsguy you can extract captions to the console only, using the -stdout parameter.
Note that ccextractor will ignore -o, -o1 and -o2 in this case.
So the simplest way to use this parameter is:
input_file -stdout
You can read more about this option (and the other available ccextractor parameters) here:
https://ccextractor.org/public:general:command_line_usage


@workflowsguy commented on GitHub (Dec 18, 2018):

@MakarovGCI2018 sorry, but I do not understand how your answer relates to my question.
Using -stdout still causes ccextractor to parse the video file completely, which is not what I want.


@navimakarov commented on GitHub (Dec 18, 2018):

@workflowsguy sorry for the misunderstanding. So you just want to get info about teletext pages with possible subtitles using -out=report, without actually processing the file, right?


@workflowsguy commented on GitHub (Feb 19, 2019):

@navimakarov, sorry for the long delay in replying.
Yes, I need some way to find out whether the video file contains teletext pages with possible subtitles without actually starting the extraction process.


@neilmehta31 commented on GitHub (Jan 6, 2021):

Hey guys, I am new to open source. I would like to work on this issue if it is still open. Please guide me on how to get started with it.


@cfsmp3 commented on GitHub (Jan 9, 2021):

@neilmehta31 By all means go ahead.
Usually you want to start by reproducing it.


@neilmehta31 commented on GitHub (Jan 10, 2021):

Hey @cfsmp3, I got the following output while running against one of the sample recording files given on the website:

Stream Mode: Transport Stream
Program Count: 1
Program Numbers: 4287 
PID: 201, Program: 4287, MPEG-2 video
PID: 202, Program: 4287, MPEG-1 audio
PID: 205, Program: 4287, DVB Subtitles
PID: 206, Program: 4287, MPEG-1 audio
PID: 250, Program: 4287, MPEG-2 private table sections
PID: 7201, Program: 4287, ISO/IEC 13818-6 type B
PID: 7219, Program: 4287, ISO/IEC 13818-6 type B
PID: 7270, Program: 4287, MPEG-2 private table sections
//////// Program #4287: ////////
DVB Subtitles: Yes
Teletext: No
ATSC Closed Caption: Yes
EIA-608: No
CEA-708: No

MPEG-4 Timed Text: No

If it's the wrong file, could you please provide a link to the right one so I can reproduce the issue? Thanks.
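
A quick way to tell whether a sample is suitable for reproducing the issue is to look for a program that reports Teletext: Yes but has an empty Pages With Subtitles line. The sketch below parses report text in the format quoted in this thread. The format is assumed from those excerpts only, and `parse_report` and `shows_missing_pages` are hypothetical helper names, not part of CCExtractor.

```python
import re

# Matches header lines such as "//////// Program #10302: ////////".
PROGRAM_RE = re.compile(r"^/+\s*Program #(\d+):\s*/+\s*$")


def parse_report(text):
    """Map each program number to a dict of its 'Key: Value' report fields."""
    programs = {}
    current = None
    for line in text.splitlines():
        header = PROGRAM_RE.match(line.strip())
        if header:
            current = programs.setdefault(int(header.group(1)), {})
            continue
        # Lines before the first program header (e.g. the PID list) are skipped.
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return programs


def shows_missing_pages(fields):
    """True when teletext is reported but no subtitle pages are listed."""
    return fields.get("Teletext") == "Yes" and fields.get("Pages With Subtitles", "") == ""


# Report from the original question, used here as sample input.
OP_REPORT = """\
//////// Program #10302: ////////
DVB Subtitles: No
Teletext: Yes
Pages With Subtitles:
ATSC Closed Caption: Yes
EIA-608: No
CEA-708: No

MPEG-4 Timed Text: No
"""

print(shows_missing_pages(parse_report(OP_REPORT)[10302]))  # True
```

A report that says Teletext: No, like the one above for program 4287, makes `shows_missing_pages` return False, which suggests that sample cannot exercise the problem.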


@siv2r commented on GitHub (Jan 28, 2021):

> When running ccextractor against a video file, I get an output e.g. like this:
>
> Notice: Teletext page with possible subtitles detected: 152
> - No teletext page specified, first received suitable page is 152, not guaranteed
> Notice: Teletext page with possible subtitles detected: 888
> Notice: Teletext page with possible subtitles detected: 151
> Notice: Teletext page with possible subtitles detected: 150
> Notice: Teletext page with possible subtitles detected: 889

@workflowsguy can you provide the video sample used? I am unable to reproduce this.

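I ran ccextractor path_to_file on the video file (https://drive.google.com/drive/folders/0B_61ywKPmI0Tc1lTaWVBeHNLTTA, linked from ccextractor's TV samples page, https://www.ccextractor.org/public:general:tvsamples) containing teletext subtitles, and got the following results: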

Opening file: linux/tests/teletex_test.mpg
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
VBI/teletext stream ID 1044 (0x414) for SID 1040 (0x410)
- Programme Identification Data = FAB-TELETEXT SYSTEM 
- Universal Time Co-ordinated = Mon Jan  9 18:27:47 2017
100%  |  24:51
Teletext decoder: 37286 packets processed 

Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Min PTS:				04:21:47:620
Max PTS:				04:46:39:020
Length:				 00:24:51:400
Done, processing time = 3 seconds

There is no information like Notice: Teletext page with possible subtitles detected: 152


@85ayush commented on GitHub (Feb 23, 2021):

I would love to work on this issue. I am a beginner; can you please guide me?


@vaishnavi192 commented on GitHub (Nov 15, 2023):

Hey @workflowsguy, I want to work on this issue. Please tell me how to get started; I am a beginner.


@workflowsguy commented on GitHub (Nov 17, 2023):

@vaishnavi192, I am the wrong person to ask for guidance. I asked this question here 5 years ago and it has not been answered/addressed to my satisfaction since then. I have long since moved on.


@aakarshgopishetty commented on GitHub (Mar 10, 2025):

Hi, I would like to work on this issue as a beginner-friendly task.
I will try to implement a fix and submit a PR soon.
Please let me know if there are any specific requirements or suggestions.
Thanks!


@Maku38 commented on GitHub (Jun 27, 2025):

Hey, I’m new to CCExtractor and interested in contributing. I came across this issue about the -out=report option not listing detected Teletext subtitle pages. I’d love to try fixing this as my first task. Could someone point me to where I should start looking in the codebase for this feature? Thanks in advance!
I’ve gone through the contributors guide and built the project locally; I just need help navigating the relevant files.


@SinghBappi commented on GitHub (Jul 13, 2025):

Hi! I'm Bappi Singh, a student from India and GSoC 2026 aspirant. I'm new to open source and I’d love to take this up as my first contribution. Can I work on this issue?


@cfsmp3 commented on GitHub (Jul 17, 2025):

> Hi! I'm Bappi Singh, a student from India and GSoC 2026 aspirant. I'm new to open source and I’d love to take this up as my first contribution. Can I work on this issue?

Yes


@aman225 commented on GitHub (Nov 16, 2025):

Hey @cfsmp3, I would like to work on this issue if it is still open. Please guide me on how I can reproduce it.


@cfsmp3 commented on GitHub (Nov 16, 2025):

> Hey @cfsmp3, I would like to work on this issue if it is still open. Please guide me on how I can reproduce it.

No idea, I didn't open this issue... @workflowsguy would be the right person to ask


@Rahul-2k4 commented on GitHub (Dec 5, 2025):

Hi @workflowsguy , I would like to work on this.

I have analyzed the issue and reproduced it. The cause is in general_loop.c: Teletext packet processing is currently skipped entirely when the encoder context (enc_ctx) is null, which is the case when running with -out=report.

I have a fix that relaxes this condition to allow processing when print_file_reports is enabled. This will correctly populate the existing seen_sub_page array and allow the final report to list the detected pages. I will submit a PR shortly.
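
The condition described above can be modelled in a few lines. This is a Python model of the logic as described in this comment, not the code in general_loop.c: `should_process_teletext`, `collect_pages`, and the `relaxed` switch are hypothetical, and a set stands in for the seen_sub_page array.

```python
def should_process_teletext(enc_ctx, print_file_reports, relaxed):
    """Decide whether a teletext packet is handed to the decoder."""
    if enc_ctx is not None:
        return True
    # Hypothetical relaxed rule: also decode when only a report was requested.
    return relaxed and print_file_reports


def collect_pages(packet_pages, enc_ctx, print_file_reports, relaxed):
    """Simulate one pass over the stream and return the pages that were seen."""
    seen_sub_page = set()  # stand-in for the array that feeds the final report
    for page in packet_pages:
        if should_process_teletext(enc_ctx, print_file_reports, relaxed):
            seen_sub_page.add(page)
    return sorted(seen_sub_page)


pages = [152, 888, 151, 150, 889]

# With -out=report there is no encoder context, so the strict rule skips everything.
print(collect_pages(pages, None, True, relaxed=False))  # []
# With the relaxed rule the pages are recorded and can be listed in the report.
print(collect_pages(pages, None, True, relaxed=True))   # [150, 151, 152, 888, 889]
```

In this model the relaxed rule only takes effect when a report was requested, so a run with neither an encoder context nor a report still skips teletext.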


@Rahul-2k4 commented on GitHub (Dec 10, 2025):

Hi @cfsmp3 , all tests are passing on the CI dashboard, but the CCExtractor CI bot hasn’t reflected the updated status on this PR. Is there a step I should take to trigger the bot again, or should I wait for it to sync?
Let me know how I can help move this forward.
[Four images attached]

Reference: starred/ccextractor#460