[PROPOSAL] Validate TTML subtitles in ISO-BMFF #626

Open
opened 2026-01-29 16:49:27 +00:00 by claunia · 3 comments
Owner

Originally created by @donmartin00 on GitHub (Apr 12, 2021).

CCExtractor Version: 0.88

Necessary information

  • Is this a regression (i.e. did it work before)? {NO}
  • What platform did you use? {Windows}
  • What were the used arguments? {No arguments. We did not attempt to run this type of file through the GUI because we did not see an option for ISO-BMFF.}

Additional information

Add a feature to validate TTML subtitles in ISO-BMFF format per specification ISO/IEC 14496-30 (the format used in DVB-DASH and ATSC 3.0).

claunia added the needs-commercial-sponsor label 2026-01-29 16:49:27 +00:00
Author
Owner

@bubbaprog commented on GitHub (Jul 24, 2021):

I'm not sure what method they use, but I just uploaded a raw ATSC 3.0 broadcast to Twitter and the captions appeared in the tweet, so a method for extracting and converting them already exists, presumably in something avconv/FFmpeg-related, and could possibly be used to develop this feature here.

Author
Owner

@bubbaprog commented on GitHub (Jan 16, 2024):

FWIW, VLC can decode captions in ATSC 3.0 video, and the code can likely be ported over. https://code.videolan.org/videolan/vlc/-/tree/master/modules/codec

Author
Owner

@x15sr71 commented on GitHub (Jan 18, 2026):

I've been looking into CCExtractor's MP4/ISO-BMFF handling recently (related to GPAC and packaging work) and wanted to add some context that might be useful here.

TTML subtitle tracks in MP4 use the `stpp` sample entry (defined in ISO/IEC 14496-30). CCExtractor's current MP4 demuxer handles similar subtitle formats like `tx3g` (QuickTime timed text) and `clcp` (CEA-608/708), so the track detection pattern exists. The main complexity is TTML validation. The tools mentioned above (VLC, FFmpeg, avconv) can rely on larger XML parsing libraries, but adding libxml2 or similar to CCExtractor just for TTML validation might not be desirable. A minimal approach (namespace detection, well-formedness checks via string parsing) could work for detection and reporting purposes, but would need careful implementation to avoid false positives. Tools like Bento4 are useful references for understanding `stpp` sample entries at the container level.
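
To make the container-level part concrete, here is a rough sketch of detecting an `stpp` sample entry by walking the box tree down to `stsd`. It is not CCExtractor code and does not use GPAC. The function name `has_stpp_entry` is hypothetical, and the sketch assumes the whole `moov` box is already in memory. It only follows the `moov`/`trak`/`mdia`/`minf`/`stbl` path and ignores everything else.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Read a 32-bit big-endian integer. */
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Boxes on the path moov -> trak -> mdia -> minf -> stbl hold only child boxes. */
static int is_container(const uint8_t *type)
{
    static const char *names[] = { "moov", "trak", "mdia", "minf", "stbl" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (memcmp(type, names[i], 4) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: return 1 if any stsd box found in buf[0..len)
 * lists an 'stpp' sample entry, 0 otherwise (including on malformed input). */
int has_stpp_entry(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    while (len - pos >= 8) {
        uint64_t size = rd32(buf + pos);
        const uint8_t *type = buf + pos + 4;
        size_t hdr = 8;
        if (size == 1) {                 /* 64-bit largesize follows the type */
            if (len - pos < 16)
                return 0;
            size = ((uint64_t)rd32(buf + pos + 8) << 32) | rd32(buf + pos + 12);
            hdr = 16;
        } else if (size == 0) {          /* box extends to the end of the buffer */
            size = len - pos;
        }
        if (size < hdr || size > len - pos)
            return 0;                    /* malformed or truncated box */

        const uint8_t *body = buf + pos + hdr;
        size_t body_len = (size_t)size - hdr;
        if (is_container(type)) {
            if (has_stpp_entry(body, body_len))
                return 1;
        } else if (memcmp(type, "stsd", 4) == 0 && body_len >= 8) {
            /* FullBox: 4 bytes version/flags, then 4 bytes entry_count. */
            uint32_t count = rd32(body + 4);
            size_t off = 8;
            for (uint32_t i = 0; i < count && body_len - off >= 8; i++) {
                uint32_t esize = rd32(body + off);
                if (memcmp(body + off + 4, "stpp", 4) == 0)
                    return 1;
                if (esize < 8 || esize > body_len - off)
                    break;               /* bad entry size, stop scanning */
                off += esize;
            }
        }
        pos += (size_t)size;
    }
    return 0;
}
```

A real implementation would also need to read the strings stored inside the `stpp` entry and map samples to timestamps, which this sketch leaves out.
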
I am currently assessing the scope of this to understand what a lightweight implementation would look like, as this seems to require a structural addition rather than a quick patch.
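
As a rough idea of what the "namespace detection via string parsing" part could look like without an XML library, here is a small sketch. The function name `looks_like_ttml` is hypothetical, and the only external fact it relies on is the TTML namespace URI `http://www.w3.org/ns/ttml`. It checks that the first element is `tt` (with any prefix) and that its start tag carries that URI as a complete quoted value. It is a detection heuristic, not a well-formedness check: it assumes no `>` inside attribute values and no DOCTYPE internal subset, and it does not verify which prefix the namespace is bound to.

```c
#include <stddef.h>
#include <string.h>

#define TTML_NS "http://www.w3.org/ns/ttml"

/* Portable substring search over a length-delimited buffer. */
static const char *find(const char *hay, size_t n, const char *needle)
{
    size_t m = strlen(needle);
    if (m == 0 || n < m)
        return NULL;
    for (size_t i = 0; i + m <= n; i++)
        if (memcmp(hay + i, needle, m) == 0)
            return hay + i;
    return NULL;
}

/* Hypothetical helper: return 1 if xml[0..len) looks like a TTML document. */
int looks_like_ttml(const char *xml, size_t len)
{
    size_t i = 0;
    /* Skip leading text, the XML declaration, comments and DOCTYPE. */
    while (i < len) {
        if (xml[i] != '<') { i++; continue; }
        if (i + 1 >= len)
            return 0;
        if (xml[i + 1] != '?' && xml[i + 1] != '!')
            break;                                 /* first real element */
        int comment = len - i >= 4 && memcmp(xml + i, "<!--", 4) == 0;
        const char *end = find(xml + i, len - i, comment ? "-->" : ">");
        if (!end)
            return 0;
        i = (size_t)(end - xml) + 1;
    }
    if (i >= len)
        return 0;

    const char *tag = xml + i + 1;                 /* first char of the name */
    const char *gt = find(tag, len - i - 1, ">");  /* end of the start tag */
    if (!gt)
        return 0;

    /* The name ends at whitespace or '/'; drop any "prefix:" part. */
    const char *name = tag, *q = tag;
    while (q < gt && *q != ' ' && *q != '\t' && *q != '\r' &&
           *q != '\n' && *q != '/') {
        if (*q == ':')
            name = q + 1;
        q++;
    }
    if (q - name != 2 || memcmp(name, "tt", 2) != 0)
        return 0;

    /* Require the URI as a whole quoted value, so that values such as
     * ".../ns/ttml#styling" alone do not count as a match. */
    const size_t ns_len = strlen(TTML_NS);
    const char *p = q, *hit;
    while ((hit = find(p, (size_t)(gt - p), TTML_NS)) != NULL) {
        const char *after = hit + ns_len;
        if (after < gt && (*after == '"' || *after == '\'') &&
            hit[-1] == *after)
            return 1;
        p = after;
    }
    return 0;
}
```

The quoted-value check is one example of the false-positive risk mentioned above: a plain substring search would accept a document that only declares the styling or parameter namespaces.
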

Leaving this note here in case the context is helpful.

Reference: starred/ccextractor#626