[PR #2020] [FEATURE]: Add machine-readable JSON output for -out=report #2824

opened 2026-01-29 17:24:05 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/CCExtractor/ccextractor/pull/2020
Author: @x15sr71
Created: 1/14/2026
Status: 🔄 Open

Base: master ← Head: feat/json-report


📝 Commits (3)

  • 102f1fc feat(report): add machine-readable JSON output for -out=report
  • cecb2bf docs(changelog): mention JSON output support for -out=report
  • b0d6205 chore: format Rust code and fix trailing newline

📊 Changes

8 files changed (+338 additions, -1 deletions)

📝 docs/CHANGES.TXT (+1 -0)
📝 src/lib_ccx/ccx_common_option.c (+1 -0)
📝 src/lib_ccx/ccx_common_option.h (+1 -0)
📝 src/lib_ccx/params_dump.c (+312 -1)
📝 src/rust/lib_ccxr/src/common/options.rs (+2 -0)
📝 src/rust/src/args.rs (+6 -0)
📝 src/rust/src/common.rs (+10 -0)
📝 src/rust/src/parser.rs (+5 -0)

📄 Description

In raising this pull request, I confirm the following (please check boxes):

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that another pull request for this purpose does not exist.
  • [x] I have considered, and confirmed that this submission will be valuable to others.
  • [x] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • [x] I give this submission freely, and claim no ownership to its content.
  • [x] I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [ ] I absolutely love CCExtractor, but have not contributed previously.
  • [x] I am an active contributor to CCExtractor.

Summary

This PR implements machine-readable JSON output for the -out=report feature, addressing issue #1399. Users can now generate structured reports that can be parsed with tools like jq, enabling seamless integration with automated workflows.

Background

Currently, CCExtractor’s report output is human-readable text that requires custom parsing for automation. While other media analysis tools such as ffprobe and mediainfo provide JSON output, structured closed-caption reporting is not consistently available across tools or versions. This feature enables CCExtractor to expose its existing report data in a structured JSON format.

Use case: Users running CCExtractor in automated environments (e.g., CI/CD pipelines, media processing workflows) need to programmatically determine if streams contain closed captions without writing custom parsers.

Changes

-out=report Option

ccextractor -out=report input.ts

Existing Text Output (-out=report)

File: ../20251206ch29FullTS.ts
Stream Mode: Transport Stream
Program Count: 5
Program Numbers: 1 2 3 4 5
PID: 49, Program: 1, MPEG-2 video
PID: 52, Program: 1, AC3 audio
PID: 53, Program: 1, AC3 audio
PID: 65, Program: 2, MPEG-2 video
PID: 68, Program: 2, AC3 audio
PID: 81, Program: 3, MPEG-2 video
PID: 84, Program: 3, AC3 audio
PID: 97, Program: 4, MPEG-2 video
PID: 100, Program: 4, AC3 audio
PID: 113, Program: 5, MPEG-2 video
PID: 116, Program: 5, AC3 audio
//////// Program #5: ////////
DVB Subtitles: No
Teletext: No
ATSC Closed Caption: Yes
EIA-608: Yes
XDS: No
CC1: Yes
CC2: No
CC3: No
CC4: No
CEA-708: Yes
Services: 1 2 3 4 5 6 9
Primary Language Present: Yes
Secondary Language Present: Yes
Width: 704
Height: 480
Aspect Ratio: 03 - 16:9
Frame Rate: 04 - 29.97


(More programs omitted for brevity)

JSON Output Structure (v1.0)

The output follows a versioned JSON report structure:

JSON output via --report-format json

ccextractor --report-format json -out=report input.ts
{
  "schema": {
    "name": "ccextractor-report",
    "version": "1.0"
  },
  "input": {
    "source": "file",
    "path": "../20251206ch29FullTS.ts"
  },
  "stream": {
    "mode": "Transport Stream",
    "program_count": 5,
    "program_numbers": [
      1,
      2,
      3,
      4,
      5
    ],
    "pids": [
      {
        "pid": 49,
        "program_number": 1,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 52,
        "program_number": 1,
        "codec": "AC3 audio"
      },
      {
        "pid": 53,
        "program_number": 1,
        "codec": "AC3 audio"
      },
      {
        "pid": 65,
        "program_number": 2,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 68,
        "program_number": 2,
        "codec": "AC3 audio"
      },
      {
        "pid": 81,
        "program_number": 3,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 84,
        "program_number": 3,
        "codec": "AC3 audio"
      },
      {
        "pid": 97,
        "program_number": 4,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 100,
        "program_number": 4,
        "codec": "AC3 audio"
      },
      {
        "pid": 113,
        "program_number": 5,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 116,
        "program_number": 5,
        "codec": "AC3 audio"
      }
    ]
  },
  "programs": [
    {
      "program_number": 1,
      "summary": {
        "has_any_captions": true,
        "has_608": true,
        "has_708": true
      },
      "services": {
        "dvb_subtitles": false,
        "teletext": false,
        "atsc_closed_caption": true
      },
      "captions": {
        "present": true,
        "eia_608": {
          "present": true,
          "xds": false,
          "channels": {
            "cc1": true,
            "cc2": false,
            "cc3": false,
            "cc4": false
          }
        },
        "cea_708": {
          "present": true,
          "services": [
            1,
            2,
            3,
            4,
            5,
            6,
            9
          ]
        }
      },
      "video": {
        "width": 1920,
        "height": 1080,
        "aspect_ratio": "03 - 16:9",
        "frame_rate": "04 - 29.97"
      }
    },

(More programs omitted for brevity)

Schema Notes

  • The JSON schema is intentionally descriptive rather than prescriptive.
  • Field presence and values depend on the input container, stream type, and available metadata.
  • Codec strings reflect CCExtractor's internal stream type descriptions and are container-dependent (e.g., "AC3 audio" vs "AC3").
  • The services object under programs[] indicates which captioning systems are present (DVB, Teletext, ATSC), while captions.cea_708.services[] lists active CEA-708 caption service numbers.
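
Because field presence depends on the input, a consumer probably should not index into the report unconditionally. The following is a minimal sketch of a defensive reader, assuming only the field names shown in the example above; the helper name `video_dimensions` and the trimmed sample are illustrative and not part of the PR.

```python
import json

# Trimmed sample in the shape shown above. The second program deliberately
# has no "video" object, mimicking metadata that was not available.
SAMPLE = """
{
  "schema": {"name": "ccextractor-report", "version": "1.0"},
  "programs": [
    {"program_number": 1, "video": {"width": 1920, "height": 1080}},
    {"program_number": 2}
  ]
}
"""


def video_dimensions(report):
    """Map program_number -> (width, height), or None when video is absent."""
    dims = {}
    for program in report.get("programs", []):
        video = program.get("video") or {}
        width, height = video.get("width"), video.get("height")
        known = width is not None and height is not None
        dims[program.get("program_number")] = (width, height) if known else None
    return dims


report = json.loads(SAMPLE)
print(video_dimensions(report))
```

Using `.get()` with defaults keeps the reader working when an optional object such as `video` is omitted, at the cost of silently accepting a malformed report; a stricter consumer might prefer to raise instead.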

Program Ordering:

  • JSON output: Programs are sorted in ascending order by program number (1, 2, 3, 4, 5) for predictable parsing
  • Text output: Programs are displayed in descending order (5, 4, 3, 2, 1) as they're processed

| Text Output Field | JSON Field |
|-------------------|------------|
| File: | input.path |
| Stream Mode | stream.mode |
| Program Count | stream.program_count |
| Program Numbers | stream.program_numbers[] |
| PID: X, Program: Y, Codec | stream.pids[] |
| DVB Subtitles | programs[].services.dvb_subtitles |
| Teletext | programs[].services.teletext |
| ATSC Closed Caption | programs[].services.atsc_closed_caption |
| EIA-608 | programs[].captions.eia_608.present |
| XDS | programs[].captions.eia_608.xds |
| CC1..CC4 | programs[].captions.eia_608.channels.* |
| CEA-708 | programs[].captions.cea_708.present |
| Services: | programs[].captions.cea_708.services[] |
| Primary Language Present | (not in JSON) |
| Secondary Language Present | (not in JSON) |
| Width / Height | programs[].video.width / height |
| Aspect Ratio | programs[].video.aspect_ratio |
| Frame Rate | programs[].video.frame_rate |
| MPEG-4 Timed Text | container.mp4.timed_text_tracks |
| (JSON-only) | schema.* |
| (JSON-only) | programs[].summary.* |
| (JSON-only) | programs[].captions.present |

Key Features:

  • Structured, machine-readable JSON output for -out=report
  • Versioned schema (v1.0) for future extensibility
  • Backward compatible (existing text report remains the default)
  • Caption presence reporting for:
    • ATSC Closed Captions (EIA-608 / CEA-708)
    • DVB subtitles (presence flag)
    • Teletext (presence flag)
    • Note: the has_any_captions summary field reflects EIA-608 / CEA-708 only.
  • Program-level summary fields for fast closed-caption automation checks
  • PID and codec metadata per program (preserving CCExtractor’s existing codec string formats)
  • Guarded video metadata (emitted only when valid)
  • Multi-program stream support with deterministic ordering
  • Container-level metadata when available (e.g., MP4 timed text track count)
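
Since has_any_captions covers EIA-608 / CEA-708 only, a pipeline that also cares about DVB subtitles or Teletext would need to combine it with the services flags. A minimal sketch, assuming the field names from the example report; `has_any_subtitles` and both sample programs are hypothetical:

```python
def has_any_subtitles(program):
    """True if the program reports 608/708 captions, DVB subtitles, or Teletext."""
    summary = program.get("summary", {})
    services = program.get("services", {})
    return bool(
        summary.get("has_any_captions")
        or services.get("dvb_subtitles")
        or services.get("teletext")
    )


# Hypothetical programs: one carrying DVB subtitles only, one carrying nothing.
dvb_only = {
    "summary": {"has_any_captions": False, "has_608": False, "has_708": False},
    "services": {"dvb_subtitles": True, "teletext": False, "atsc_closed_caption": False},
}
empty = {
    "summary": {"has_any_captions": False, "has_608": False, "has_708": False},
    "services": {"dvb_subtitles": False, "teletext": False, "atsc_closed_caption": False},
}

print(has_any_subtitles(dvb_only), has_any_subtitles(empty))
```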

Technical Approach

  • JSON generation is implemented in C on top of CCExtractor’s existing internal data structures; caption extraction and decoding logic are not modified.
  • String values are escaped so that the output remains valid JSON.
  • Format selection uses case-insensitive comparison (strcasecmp / _stricmp).
  • Memory allocation and cleanup follow existing project patterns.
  • Programs are sorted by program number to provide stable and predictable output.
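
The versioning and ordering guarantees listed above can be checked from the consumer side before a report is trusted. The sketch below is one possible check, not part of the PR: the function name `check_report` is hypothetical, and accepting any 1.x version is an assumption about how the schema version would evolve.

```python
def check_report(report):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    schema = report.get("schema", {})
    if schema.get("name") != "ccextractor-report":
        problems.append("unexpected schema name")
    # Assumption: a consumer written against v1.0 accepts any 1.x version.
    if not str(schema.get("version", "")).startswith("1."):
        problems.append("unsupported schema version")
    # Programs are expected in ascending program_number order.
    numbers = [
        p["program_number"]
        for p in report.get("programs", [])
        if "program_number" in p
    ]
    if numbers != sorted(numbers):
        problems.append("programs not in ascending order")
    return problems


good = {
    "schema": {"name": "ccextractor-report", "version": "1.0"},
    "programs": [{"program_number": 1}, {"program_number": 2}],
}
print(check_report(good))
```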

Example Testing Commands

# Test JSON output
ccextractor --report-format json -out=report sample.ts | jq .

# Verify caption presence
ccextractor --report-format json -out=report sample.ts | jq '.programs[0].summary.has_any_captions'

# Extract specific caption channels
ccextractor --report-format json -out=report sample.ts | jq '.programs[].captions.eia_608.channels'

# Check which CC channels are active
ccextractor --report-format json -out=report sample.ts | jq '.programs[].captions.eia_608.channels | to_entries | map(select(.value == true)) | .[].key'

# Get video dimensions
ccextractor --report-format json -out=report sample.ts | jq '.programs[].video | select(. != null) | {width, height}'

# Default text format still works
ccextractor -out=report sample.ts
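
For callers who would rather not shell out to jq, the "active CC channels" filter above can be approximated in Python. This is a sketch under stated assumptions: `load_report` and `active_608_channels` are hypothetical names, and `load_report` assumes the JSON report is written to stdout, as the jq pipelines imply.

```python
import json
import subprocess


def load_report(path, binary="ccextractor"):
    """Run CCExtractor and parse its JSON report.

    Assumption: the report is written to stdout, as the jq pipelines imply.
    """
    result = subprocess.run(
        [binary, "--report-format", "json", "-out=report", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def active_608_channels(report):
    """Map program_number -> names of EIA-608 channels flagged true."""
    active = {}
    for program in report.get("programs", []):
        channels = program.get("captions", {}).get("eia_608", {}).get("channels", {})
        active[program.get("program_number")] = [
            name for name, present in channels.items() if present is True
        ]
    return active


# Inline sample so the filter can be tried without a CCExtractor binary.
sample = {
    "programs": [
        {
            "program_number": 1,
            "captions": {
                "eia_608": {
                    "present": True,
                    "channels": {"cc1": True, "cc2": False, "cc3": False, "cc4": False},
                }
            },
        }
    ]
}
print(active_608_channels(sample))
```

With a real file, `active_608_channels(load_report("sample.ts"))` would apply the same filter to live output.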

Field Value Formats:

  • String values like aspect_ratio and frame_rate preserve CCExtractor's internal enum formatting (e.g., "03 - 16:9", "04 - 29.97")
  • This design choice maintains transparency and aids debugging
  • Users needing normalized values can post-process with simple string operations:
    jq '.programs[].video.aspect_ratio | split(" - ")[1]'
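
The same post-processing can be done in Python. A minimal sketch, assuming the "code - label" layout shown in the examples; `split_enum` and `frame_rate_fps` are hypothetical helper names, and labels that do not follow this layout are passed through unchanged.

```python
def split_enum(value):
    """Split "03 - 16:9" into ("03", "16:9"); return (None, value) if no separator."""
    code, sep, label = value.partition(" - ")
    return (code, label) if sep else (None, value)


def frame_rate_fps(value):
    """Best-effort float from a frame_rate string; None if the label is not numeric."""
    _, label = split_enum(value)
    try:
        return float(label)
    except ValueError:
        return None


print(split_enum("03 - 16:9"))
print(frame_rate_fps("04 - 29.97"))
```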

Benefits

  1. Automation-Friendly: Enables programmatic parsing without regex/custom parsers
  2. Familiar Structure: Uses JSON patterns similar to tools like ffprobe and mediainfo
  3. Extensible: Versioned schema to support future extensions
  4. Backward Compatible: Existing workflows continue to work unchanged
  5. Addresses Real Need: Solves a problem raised by multiple community members (issue #1399 and related discussions)
  6. Quick Caption Detection: Provides has_any_captions summary field for fast EIA-608 / CEA-708 closed-caption checks

Notes

  • Platform compatibility: uses strcasecmp on POSIX systems and maps to _stricmp on Windows via platform-specific preprocessor guards.
  • Video and container metadata are emitted conditionally when applicable
  • Temporary allocations used for program ordering are properly released
  • The implementation follows existing CCExtractor coding conventions

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 17:24:05 +00:00
Reference: starred/ccextractor#2824