[PR #1773] [MERGED] [FIX]: Restore XMLTV generation for ATSC EIT/VCT streams and correct EIT bounds checks #2500

Open
opened 2026-01-29 17:22:28 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/CCExtractor/ccextractor/pull/1773
Author: @x15sr71
Created: 11/28/2025
Status: Merged
Merged: 12/15/2025
Merged by: @cfsmp3

Base: master ← Head: fix/atsc-eit-xmltv-generation


📝 Commits (1)

  • e0ac99a fix(atsc): restore XMLTV generation and ATSC EPG parsing

📊 Changes

1 file changed (+302 additions, -79 deletions)


📝 src/lib_ccx/ts_tables_epg.c (+302 -79)

📄 Description

In raising this pull request, I confirm the following (please check boxes):

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that another pull request for this purpose does not exist.
  • [x] I have considered, and confirmed that this submission will be valuable to others.
  • [x] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • [x] I give this submission freely, and claim no ownership to its content.
  • [x] I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [ ] I absolutely love CCExtractor, but have not contributed previously.
  • [x] I am an active contributor to CCExtractor.

Description

Fixes #1759 - This PR restores functional XMLTV generation for ATSC broadcast streams and adds comprehensive EPG parsing capabilities. ATSC streams with EIT/VCT/ETT tables now generate complete XMLTV output with program titles, descriptions, and extended text metadata.

Problem

The -xmltv parameter was completely non-functional for ATSC broadcast streams. When processing ATSC transport streams containing valid EPG data (EIT tables), channel information (VCT/TVCT tables), and extended text (ETT tables), CCExtractor would:

  • Generate SRT caption files (working correctly)
  • NOT generate XMLTV files (the bug)
  • Ignore extended program descriptions from ETT tables
  • Drop events due to buffer boundary check errors

This made it impossible to extract Electronic Program Guide data from ATSC streams, despite the -xmltv parameter being specified.

Root causes identified:

  1. EPG events stored in fallback storage (TS_PMT_MAP_SIZE) were never output to XMLTV
  2. Inverted buffer boundary check logic (CHECK_OFFSET macro) caused parser failures and potential buffer overruns
  3. Limited ATSC table ID support (missing extended EIT tables, Cable VCT, and ETT tables)
  4. ATSC multiple_string parser incorrectly combined title and description into a single field
  5. No support for ETT (Extended Text Table) parsing, losing detailed program information

Solution

Core Fixes

  1. Fixed EPG output logic (EPG_output() function)

    • Modified to always check fallback storage regardless of nb_program value
    • ATSC streams store events in fallback due to VCT source ID mapping, but these were being ignored
    • Now correctly outputs events from both program-mapped storage and fallback storage
    • Ensures ATSC VCT-defined channels generate XMLTV output
  2. Fixed critical buffer boundary check (CHECK_OFFSET macro)

    • Corrected inverted logic from < to > in boundary validation
    • Before: if (offset + val < offset_end) (incorrect - allowed overruns)
    • After: if (offset + (val) > offset_end) (correct - prevents overruns)
    • Applied consistently across EIT, VCT, and ETT parsing functions
    • Prevents crashes and incomplete parsing
  3. Extended ATSC table support (EPG_parse_table() function)

    • Added extended EIT table IDs: 0xCD, 0xCE, 0xCF, 0xD0 (in addition to 0xCB)
    • Added Cable VCT variant: 0xC9 (in addition to Terrestrial VCT 0xC8)
    • New: Added ETT (Extended Text Table) support: 0xCC
    • Ensures comprehensive ATSC EPG data extraction
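
The boundary check in item 2 can be illustrated with a minimal sketch. This is not the `CHECK_OFFSET` macro from `ts_tables_epg.c`: the helper `fits_in_section()` and the toy parser `read_length_prefixed()` are hypothetical names used only for illustration. The helper rejects a read under the same condition as the corrected check (`offset + val > offset_end`), expressed as a length comparison so that no out-of-range pointer is formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the CHECK_OFFSET macro from ts_tables_epg.c.
 * Returns 1 when reading `len` bytes starting at `offset` stays inside
 * [offset, offset_end), and 0 when the read would overrun the section. */
static int fits_in_section(const uint8_t *offset, const uint8_t *offset_end, size_t len)
{
	assert(offset != NULL && offset_end != NULL && offset <= offset_end);
	/* Equivalent to rejecting when offset + len > offset_end, but written
	 * as a length comparison to avoid computing a pointer past the buffer. */
	return len <= (size_t)(offset_end - offset);
}

/* Hypothetical toy parser: one length byte followed by that many payload
 * bytes. Returns the payload length, or -1 if the buffer is truncated. */
static int read_length_prefixed(const uint8_t *buf, size_t size, const uint8_t **payload)
{
	const uint8_t *offset = buf;
	const uint8_t *offset_end = buf + size;

	if (!fits_in_section(offset, offset_end, 1))
		return -1;
	uint8_t len = *offset++;

	if (!fits_in_section(offset, offset_end, len))
		return -1;
	*payload = offset;
	return (int)len;
}
```

With the inverted comparison, the truncated case (a length byte of 5 followed by a single payload byte) would be accepted and well-formed input rejected. With the corrected direction, `read_length_prefixed()` returns -1 for the truncated buffer and the payload length otherwise.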

New Features

  1. Implemented ATSC ETT (Extended Text Table) parsing

    • Added EPG_ATSC_decode_ETT() function to parse ETT table structures
    • Added EPG_ATSC_decode_ETT_text() to extract multiple string format extended descriptions
    • ETT data now populates <desc> tags in XMLTV output with detailed program information
    • Matches ETT extended text to events by source_id (service_id)
    • Supports multi-segment, multi-language text extraction
  2. Enhanced ATSC multiple_string decoder (EPG_ATSC_decode_multiple_string())

    • Fixed to properly separate title (segment 0) and description (segment 1)
    • Before: Both segments written to same field, causing data loss
    • After: First segment → event_name (title), second segment → text (subtitle/description)
    • Added proper memory management and bounds checking
    • Only processes uncompressed ANSI strings (compression_type==0x00, mode==0x00)
  3. Improved XMLTV output formatting

    • Added proper indentation and line breaks for readability
    • ETT extended text now appears in <desc> tags (correct XMLTV placement)
    • Fixed empty subtitle handling (only output when text exists)
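
The segment handling described for the multiple_string decoder can be sketched as follows. This is an illustrative sketch, not the code of `EPG_ATSC_decode_multiple_string()`: the `ms_result` struct and both function names are hypothetical, and only the first string of the structure is read. It assumes the ATSC A/65 layout (number_strings, then per string a 3-byte language code and number_segments, then per segment compression_type, mode, number_bytes and the string bytes) and, as in the PR, only copies segments with compression_type 0x00 and mode 0x00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; the real code stores into its own event struct. */
struct ms_result {
	char *title; /* segment 0, or NULL */
	char *text;  /* segment 1, or NULL */
};

/* Copy n bytes into a fresh NUL-terminated string; NULL on allocation failure. */
static char *dup_bytes(const uint8_t *src, size_t n)
{
	char *s = malloc(n + 1);
	if (s == NULL)
		return NULL;
	memcpy(s, src, n);
	s[n] = '\0';
	return s;
}

/* Parse the first string of a multiple_string_structure.
 * Returns 0 on success, -1 if the buffer is truncated.
 * The caller frees out->title and out->text in both cases. */
static int decode_multiple_string(const uint8_t *buf, size_t size, struct ms_result *out)
{
	size_t pos = 0;

	assert(buf != NULL && out != NULL);
	out->title = NULL;
	out->text = NULL;

	if (size < 1)
		return -1;
	uint8_t number_strings = buf[pos++];
	if (number_strings == 0)
		return 0;

	/* ISO_639_language_code (3 bytes) + number_segments (1 byte) */
	if (size - pos < 4)
		return -1;
	pos += 3;
	uint8_t number_segments = buf[pos++];

	for (uint8_t seg = 0; seg < number_segments; seg++) {
		/* compression_type, mode, number_bytes */
		if (size - pos < 3)
			return -1;
		uint8_t compression_type = buf[pos++];
		uint8_t mode = buf[pos++];
		uint8_t number_bytes = buf[pos++];
		if (size - pos < number_bytes)
			return -1;

		/* Only uncompressed segments in mode 0x00 are copied. */
		if (compression_type == 0x00 && mode == 0x00) {
			if (seg == 0)
				out->title = dup_bytes(buf + pos, number_bytes);
			else if (seg == 1)
				out->text = dup_bytes(buf + pos, number_bytes);
		}
		pos += number_bytes;
	}
	return 0;
}
```

Keeping separate destination fields for segment 0 and segment 1 is what prevents the second segment from overwriting the first, which is the data loss described in the "Before" case.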

Testing

Tested with sample files provided by @TPeterson94070 in issue #1759:

  • channel5FullTS.ts - 5 channels with VCT/TVCT tables
  • ch12FullTS.ts - Additional ATSC test case
  • ch29FullTS.ts - 5 programs with extended EIT data (Nov 26-28, 2025)

Before this PR:

./ccextractor channel5FullTS.ts --xmltv 1

  • Output: Only .srt file generated
  • No XMLTV file created (bug)
  • ETT data completely ignored

After this PR:

./ccextractor channel5FullTS.ts --xmltv 1

  • Output: Both .srt AND .xml files generated successfully
  • XMLTV file contains:
    • Channel listings extracted from VCT with correct IDs
    • Program schedules parsed from EIT-0/1/2/3 (table IDs 0xCB and 0xCD-0xD0)
    • Extended program descriptions from ETT tables (0xCC)
    • UTC timestamps, titles, and subtitles properly captured
    • Unique ts-meta-id values matching EIT event IDs
    • Well-formatted XML with proper indentation
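
For reference, the table IDs named in this description can be summarized as a small classifier. This is a sketch only: the enum and function names are hypothetical and are not taken from `EPG_parse_table()`, and the ID-to-table mapping simply follows the lists in this description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the table IDs listed in this description. */
enum atsc_table_kind { ATSC_OTHER = 0, ATSC_VCT, ATSC_EIT, ATSC_ETT };

static enum atsc_table_kind classify_table_id(uint8_t table_id)
{
	switch (table_id) {
	case 0xC8: /* Terrestrial VCT */
	case 0xC9: /* Cable VCT */
		return ATSC_VCT;
	case 0xCB: /* EIT */
	case 0xCD: /* extended EIT IDs, as listed in this description */
	case 0xCE:
	case 0xCF:
	case 0xD0:
		return ATSC_EIT;
	case 0xCC: /* ETT */
		return ATSC_ETT;
	default:
		return ATSC_OTHER;
	}
}

/* Short printable name for a table kind, for logging or debugging. */
static const char *table_kind_name(enum atsc_table_kind kind)
{
	static const char *const names[] = {"other", "VCT", "EIT", "ETT"};

	assert((unsigned)kind <= (unsigned)ATSC_ETT);
	return names[kind];
}
```

Note that 0xCC falls inside the numeric range 0xCB to 0xD0 but is handled as ETT, not EIT.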

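Sample XMLTV output (after ETT parsing): 20251206ch29FullTS_epg.xml (https://github.com/user-attachments/files/24061035/20251206ch29FullTS_epg.xml), generated from 20251206ch29FullTS.ts (https://drive.google.com/file/d/1wOnAE1_D4Wtt4YIfbTfG1-ujQvicdkSe/view?usp=sharing).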

Known Limitations

  • Date/time conversion: ATSC date/time conversion occasionally produces incorrect years in some streams (pre-existing behavior).

  • Channel naming: XMLTV output uses numeric channel IDs (source_id) instead of human-readable names. VCT short_name and major/minor channel numbers are not currently mapped to XMLTV display-name elements.

  • Orphaned events: Some EIT events may appear under channel="0" when their service_id does not match any VCT-defined program. This occurs with malformed streams or when VCT data is incomplete.

The three issues above (incorrect dates, channel naming, orphaned events) are data-quality problems that already existed in the codebase; they are not caused by the bug fix in this PR.

I believe these should be addressed in follow-up PRs for better separation of concerns. However, if maintainers prefer these issues to be fixed in this PR, I'm happy to include them.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 17:22:28 +00:00

Reference: starred/ccextractor#2500