[PR #1886] [MERGED] feat(teletext): Add multi-page extraction with separate output files (#665) #2674

Closed
opened 2026-01-29 17:23:22 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/CCExtractor/ccextractor/pull/1886
Author: @cfsmp3
Created: 12/23/2025
Status: Merged
Merged: 12/23/2025
Merged by: @cfsmp3

Base: master ← Head: feat/issue-665-teletext-multi-page


📝 Commits (4)

  • fd06393 feat(teletext): Add multi-page extraction with separate output files (#665)
  • cbb5f0b fix(clippy): Use RangeInclusive::contains() instead of manual range check
  • 1d9f322 docs: Add doxygen comments to should_accept_page function
  • be239a5 fix: Restore teletext auto-detect mode for single-page extraction

📊 Changes

11 files changed (+402 additions, -25 deletions)

View changed files

📝 src/lib_ccx/ccx_common_structs.h (+3 -0)
📝 src/lib_ccx/ccx_encoders_common.c (+177 -0)
📝 src/lib_ccx/ccx_encoders_common.h (+16 -0)
📝 src/lib_ccx/ccx_encoders_srt.c (+31 -11)
📝 src/lib_ccx/lib_ccx.h (+8 -2)
📝 src/lib_ccx/teletext.h (+31 -2)
📝 src/lib_ccx/telxcc.c (+73 -3)
📝 src/rust/lib_ccxr/src/teletext.rs (+11 -2)
📝 src/rust/src/args.rs (+11 -2)
📝 src/rust/src/common.rs (+9 -0)
📝 src/rust/src/parser.rs (+32 -3)

📄 Description

Summary

This PR implements multi-page teletext extraction as requested in issue #665. Users can now extract multiple teletext pages simultaneously, with each page output to a separate file.

New Features

  • Multiple --tpage arguments: Specify multiple teletext pages using repeated --tpage flags
    ccextractor input.mpg --tpage 397 --tpage 398 -o output.srt
    
  • Separate output files per page: Each page is extracted to its own file with a _pNNN suffix
    • output_p397.srt - Page 397 subtitles
    • output_p398.srt - Page 398 subtitles
  • Backward compatibility: Single-page extraction (one --tpage argument) works exactly as before, without any suffix
  • --tpages-all support: Auto-detect and extract all available teletext pages (up to 8)
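
The `_pNNN` naming rule above can be sketched in C as follows. This is only an illustration, not the code from the PR: `make_page_filename` is a hypothetical helper, and the 100 to 899 page range check is an assumption based on the usual teletext page numbering.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the PR's function): builds "<stem>_pNNN<ext>"
 * from a base output name such as "output.srt". Returns 0 on success,
 * -1 if the page is outside the assumed 100..899 range or the buffer is
 * too small. */
int make_page_filename(const char *base, int page, char *out, size_t out_size)
{
	if (page < 100 || page > 899)
		return -1;

	/* Only a dot after the last path separator counts as an extension. */
	const char *slash = strrchr(base, '/');
	const char *dot = strrchr(base, '.');
	if (dot != NULL && slash != NULL && dot < slash)
		dot = NULL;
	size_t stem_len = dot ? (size_t)(dot - base) : strlen(base);

	/* Insert the suffix between the stem and the extension (if any). */
	int n = snprintf(out, out_size, "%.*s_p%03d%s",
			 (int)stem_len, base, page, dot ? dot : "");
	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With `-o output.srt` and pages 397 and 398, this yields `output_p397.srt` and `output_p398.srt`. In the single-page case described above, the base name would be used unchanged and no such helper would be called.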

Implementation Details

  • Added user_pages vector to teletext config to store multiple requested pages
  • Created per-page output file management in encoder_ctx with on-demand file creation
  • Each page maintains its own SRT counter for correct subtitle numbering
  • Fixed BCD-to-decimal page number conversion in telxcc.c so that output files are named with the decimal page number
  • Maximum of 8 simultaneous page extractions (configurable via MAX_TLT_PAGES_EXTRACT)
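
To illustrate why the BCD conversion matters for file naming, here is a minimal sketch, assuming the page number is held as three BCD digits (for example `0x398` for page 398). The function names are hypothetical and are not taken from telxcc.c.

```c
#include <stdint.h>

/* Convert a three-digit BCD page number (e.g. 0x398) to decimal (398).
 * Bits above the low 12 are ignored. Returns -1 if any nibble is not a
 * decimal digit. Hypothetical helper, not the function in telxcc.c. */
int bcd_page_to_int(uint16_t bcd)
{
	int hundreds = (bcd >> 8) & 0xF;
	int tens = (bcd >> 4) & 0xF;
	int units = bcd & 0xF;
	if (hundreds > 9 || tens > 9 || units > 9)
		return -1;
	return hundreds * 100 + tens * 10 + units;
}

/* Inverse direction: decimal 0..999 to three BCD digits. */
uint16_t int_to_bcd_page(int page)
{
	return (uint16_t)((((page / 100) % 10) << 8) |
			  (((page / 10) % 10) << 4) |
			  (page % 10));
}
```

Formatting the BCD value `0x398` directly as a decimal integer would give 920, so a file suffix built from the unconverted value would not match the page the user asked for.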

Files Changed

| File | Changes |
|------|---------|
| src/rust/src/args.rs | Changed --tpage to accept multiple values via clap::ArgAction::Append |
| src/rust/src/parser.rs | Handle Vec<u16> for multiple pages, populate user_pages |
| src/rust/lib_ccxr/src/teletext.rs | Added user_pages field to TeletextConfig |
| src/rust/src/common.rs | Added user_pages to FFI teletext config struct |
| src/lib_ccx/teletext.h | Added user_pages array and count to C teletext config |
| src/lib_ccx/ccx_encoders_common.h | Added teletext output arrays and helper function declarations |
| src/lib_ccx/ccx_encoders_common.c | Implemented get_teletext_output(), get_teletext_srt_counter(), dinit_teletext_outputs() |
| src/lib_ccx/ccx_encoders_srt.c | Route teletext subtitles to per-page output files |
| src/lib_ccx/telxcc.c | Store decimal page numbers in subtitle metadata |
| src/lib_ccx/ccx_common_structs.h | Added teletext_page field to cc_subtitle struct |
| src/lib_ccx/lib_ccx.h | Added MAX_TLT_PAGES_EXTRACT constant |
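
The per-page lookup that helpers such as get_teletext_output() and get_teletext_srt_counter() provide might look roughly like the sketch below. The real signatures are not shown in this description, so every type and function name here is hypothetical, and the output file handle that the real encoder context would open on demand is left out. Only the MAX_TLT_PAGES_EXTRACT name and its value of 8 come from the PR text.

```c
#include <stddef.h>

#define MAX_TLT_PAGES_EXTRACT 8 /* name and value from the PR description */

/* Hypothetical per-page state; the real encoder context holds more,
 * including the output file itself. */
struct page_slot {
	int page;                 /* decimal page number */
	unsigned int srt_counter; /* subtitles written so far for this page */
};

struct page_table {
	struct page_slot slots[MAX_TLT_PAGES_EXTRACT];
	int count; /* number of slots in use */
};

/* Return the slot for `page`, creating it on first use. Returns NULL when
 * all slots are taken, so the caller can fall back to the default output. */
struct page_slot *get_page_slot(struct page_table *t, int page)
{
	for (int i = 0; i < t->count; i++)
		if (t->slots[i].page == page)
			return &t->slots[i];
	if (t->count >= MAX_TLT_PAGES_EXTRACT)
		return NULL;
	struct page_slot *s = &t->slots[t->count++];
	s->page = page;
	s->srt_counter = 0;
	return s;
}

/* Next SRT sequence number for `page`, or 0 if the table is full. */
unsigned int next_srt_number(struct page_table *t, int page)
{
	struct page_slot *s = get_page_slot(t, page);
	return s ? ++s->srt_counter : 0;
}
```

Keeping one counter per slot is what lets each output file start its numbering at 1, independently of how subtitles from different pages interleave in the stream.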

Testing

Test Samples Used

All samples are from the CCExtractor Sample Platform teletext section (view samples: https://sampleplatform.ccextractor.org/test/7136).

Multi-Page Samples (Danish TV - DR1)

| Sample | Teletext Pages Available |
|--------|--------------------------|
| 5d5838bde97e2a8706890f19a1722fae97610c754518ac509acb7ff3776a29aa.mpg | 365, 369, 397, 398, 437, 565, 765, 965+ |
| 3b276ad8bf85741a65d8a36add8fbe990f8d11bfb2a908f2093174edced9baa0.mpg | 365, 369, 397, 398, 437, 465, 565, 665+ |
| b236a0590b02f8acffa00f18f3e5d62e68e0f5f461dc5e5e428cca3ebc0be7c5.mpg | 397, 398 |

Single-Page Samples

| Sample | Test Page |
|--------|-----------|
| 44c45593fb32e475fe5295046d97796b32ac00289ec4369bb216d1c705804f1a.mpg | 299 |
| b8c55aa2e9d6882b3d5f1fa57f3bb63fc8bf39a45bad42c0d5e5de1f2fbdf2e7.mpg | 299 |
| e639e5455049e9b94c89d7f2917d91c349b9eb8e4ced3ea5f0d5efa7fb56426e.ts | Auto-detect with --datapid 2310 |

How to Test

Test 1: Single Page Extraction (Backward Compatibility)

# Should create output.srt (no suffix) - existing behavior preserved
ccextractor 5d5838bde97e...mpg --autoprogram --tpage 398 -o output.srt

# Verify: Single file created
ls output.srt

Test 2: Multiple Explicit Pages

# Should create output_p397.srt and output_p398.srt
ccextractor 5d5838bde97e...mpg --autoprogram --tpage 397 --tpage 398 -o output.srt

# Verify: Two separate files with correct content
ls output_p*.srt
head -20 output_p397.srt
head -20 output_p398.srt

Test 3: Auto-Detect All Pages

# Should create separate files for each detected page (up to 8)
ccextractor 5d5838bde97e...mpg --autoprogram --tpages-all -o output.srt

# Verify: Multiple files created
ls output_p*.srt
# Expected: output_p365.srt, output_p369.srt, output_p397.srt, output_p398.srt, etc.

Test 4: Specific Page with --tpage (regression test)

# Test with sample that has page 299
ccextractor 44c45593fb32...mpg --autoprogram --tpage 299 -o output.srt

# Verify: Single file, correct subtitles
ls output.srt

Test Results

All tests passed successfully:

| Test | Status | Notes |
|------|--------|-------|
| Single page extraction | PASS | No suffix added, backward compatible |
| Multiple explicit pages | PASS | Correct _pNNN suffixes, separate content |
| Auto-detect all pages | PASS | Up to 8 pages extracted, overflow handled gracefully |
| Per-page SRT counters | PASS | Each file has correct sequential numbering starting at 1 |
| Page number display | PASS | Decimal page numbers (e.g., 398), not BCD |
| Existing teletext tests | PASS | All 21 sample platform teletext samples work |

Sample Output

Multiple pages extracted from 5d5838bde97e...mpg:

$ ccextractor sample.mpg --autoprogram --tpage 397 --tpage 398 -o /tmp/test.srt
$ ls /tmp/test_p*.srt
/tmp/test_p397.srt
/tmp/test_p398.srt

$ head -15 /tmp/test_p397.srt
1
02:19:03,085 --> 02:19:07,344
Vi kan sætte tre byggerier i gang.
Det betyder masser af arbejdspladser.

2
02:19:13,785 --> 02:19:18,644
- der skal øge dansk produktivitet.
Men det koster mindre handlende livet

$ head -15 /tmp/test_p398.srt
1
02:19:25,385 --> 02:19:29,424
- der skal øge dansk produktivitet.
Men det koster mindre handlende livet.

2
02:19:35,665 --> 02:19:40,444
I Kolding er der store planer
om et udvalgsvareudvalg -

Checklist

  • Code compiles without warnings
  • Backward compatibility maintained (single --tpage works as before)
  • All existing teletext regression tests pass
  • New feature tested with multiple samples
  • Memory properly freed in dinit_teletext_outputs()
  • Graceful handling when more than 8 pages are detected (warning + fallback to default output)

Closes #665

🤖 Generated with Claude Code


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 17:23:22 +00:00
Reference: starred/ccextractor#2674