[PR #2020] [FEATURE]: Add machine-readable JSON output for -out=report #2824

claunia · 2026-01-29T17:24:05Z

📋 Pull Request Information

Original PR: https://github.com/CCExtractor/ccextractor/pull/2020
Author: @x15sr71
Created: 1/14/2026
Status: 🔄 Open

Base: master ← Head: feat/json-report

📝 Commits (3)

102f1fc feat(report): add machine-readable JSON output for -out=report
cecb2bf docs(changelog): mention JSON output support for -out=report
b0d6205 chore: format Rust code and fix trailing newline

📊 Changes

8 files changed (+338 additions, -1 deletions)

📝 docs/CHANGES.TXT (+1 -0)
📝 src/lib_ccx/ccx_common_option.c (+1 -0)
📝 src/lib_ccx/ccx_common_option.h (+1 -0)
📝 src/lib_ccx/params_dump.c (+312 -1)
📝 src/rust/lib_ccxr/src/common/options.rs (+2 -0)
📝 src/rust/src/args.rs (+6 -0)
📝 src/rust/src/common.rs (+10 -0)
📝 src/rust/src/parser.rs (+5 -0)

📄 Description

In raising this pull request, I confirm the following (please check boxes):

[x] I have read and understood the contributors guide.
[x] I have checked that another pull request for this purpose does not exist.
[x] I have considered, and confirmed that this submission will be valuable to others.
[x] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
[x] I give this submission freely, and claim no ownership to its content.
[x] I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

[ ] I have never used CCExtractor.
[ ] I have used CCExtractor just a couple of times.
[ ] I absolutely love CCExtractor, but have not contributed previously.
[x] I am an active contributor to CCExtractor.

Summary

This PR implements machine-readable JSON output for the -out=report feature, addressing issue #1399. Users can now generate structured reports that can be parsed with tools like jq, enabling seamless integration with automated workflows.

Background

Currently, CCExtractor’s report output is human-readable text that requires custom parsing for automation. While other media analysis tools such as ffprobe and mediainfo provide JSON output, structured closed-caption reporting is not consistently available across tools or versions. This feature enables CCExtractor to expose its existing report data in a structured JSON format.

Use case: Users running CCExtractor in automated environments (e.g., CI/CD pipelines, media processing workflows) need to programmatically determine if streams contain closed captions without writing custom parsers.
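
As a rough illustration of this use case, the sketch below parses a report that has already been captured and lists the programs whose `summary.has_any_captions` flag is true. The testing commands in this description pipe the report to jq, so it is assumed here that the JSON arrives on standard output. The function name `programs_with_captions` is hypothetical, and the field layout is taken from the sample output in this description rather than from the code itself.

```python
import json


def programs_with_captions(report_text):
    """Return the program numbers whose summary reports captions.

    Assumes the v1.0 layout shown in this PR description. Missing fields
    are treated as "no captions" instead of raising an error.
    """
    report = json.loads(report_text)
    found = []
    for program in report.get("programs", []):
        summary = program.get("summary", {})
        if summary.get("has_any_captions") is True:
            found.append(program.get("program_number"))
    return found


if __name__ == "__main__":
    # Trimmed-down sample in the shape of the report shown in this description.
    sample = json.dumps({
        "programs": [
            {"program_number": 1, "summary": {"has_any_captions": True}},
            {"program_number": 2, "summary": {"has_any_captions": False}},
        ]
    })
    print(programs_with_captions(sample))
```

In a pipeline, the text passed to the function would be whatever `ccextractor --report-format json -out=report input.ts` printed.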

Changes

`-out=report` Option

ccextractor -out=report input.ts

Existing Text Output (-out=report)

File: ../20251206ch29FullTS.ts
Stream Mode: Transport Stream
Program Count: 5
Program Numbers: 1 2 3 4 5
PID: 49, Program: 1, MPEG-2 video
PID: 52, Program: 1, AC3 audio
PID: 53, Program: 1, AC3 audio
PID: 65, Program: 2, MPEG-2 video
PID: 68, Program: 2, AC3 audio
PID: 81, Program: 3, MPEG-2 video
PID: 84, Program: 3, AC3 audio
PID: 97, Program: 4, MPEG-2 video
PID: 100, Program: 4, AC3 audio
PID: 113, Program: 5, MPEG-2 video
PID: 116, Program: 5, AC3 audio
//////// Program #5: ////////
DVB Subtitles: No
Teletext: No
ATSC Closed Caption: Yes
EIA-608: Yes
XDS: No
CC1: Yes
CC2: No
CC3: No
CC4: No
CEA-708: Yes
Services: 1 2 3 4 5 6 9
Primary Language Present: Yes
Secondary Language Present: Yes
Width: 704
Height: 480
Aspect Ratio: 03 - 16:9
Frame Rate: 04 - 29.97


(More programs omitted for brevity)

JSON Output Structure (v1.0)

The output follows a versioned JSON report structure:

JSON output via `--report-format json`

ccextractor --report-format json -out=report input.ts

{
  "schema": {
    "name": "ccextractor-report",
    "version": "1.0"
  },
  "input": {
    "source": "file",
    "path": "../20251206ch29FullTS.ts"
  },
  "stream": {
    "mode": "Transport Stream",
    "program_count": 5,
    "program_numbers": [
      1,
      2,
      3,
      4,
      5
    ],
    "pids": [
      {
        "pid": 49,
        "program_number": 1,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 52,
        "program_number": 1,
        "codec": "AC3 audio"
      },
      {
        "pid": 53,
        "program_number": 1,
        "codec": "AC3 audio"
      },
      {
        "pid": 65,
        "program_number": 2,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 68,
        "program_number": 2,
        "codec": "AC3 audio"
      },
      {
        "pid": 81,
        "program_number": 3,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 84,
        "program_number": 3,
        "codec": "AC3 audio"
      },
      {
        "pid": 97,
        "program_number": 4,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 100,
        "program_number": 4,
        "codec": "AC3 audio"
      },
      {
        "pid": 113,
        "program_number": 5,
        "codec": "MPEG-2 video"
      },
      {
        "pid": 116,
        "program_number": 5,
        "codec": "AC3 audio"
      }
    ]
  },
  "programs": [
    {
      "program_number": 1,
      "summary": {
        "has_any_captions": true,
        "has_608": true,
        "has_708": true
      },
      "services": {
        "dvb_subtitles": false,
        "teletext": false,
        "atsc_closed_caption": true
      },
      "captions": {
        "present": true,
        "eia_608": {
          "present": true,
          "xds": false,
          "channels": {
            "cc1": true,
            "cc2": false,
            "cc3": false,
            "cc4": false
          }
        },
        "cea_708": {
          "present": true,
          "services": [
            1,
            2,
            3,
            4,
            5,
            6,
            9
          ]
        }
      },
      "video": {
        "width": 1920,
        "height": 1080,
        "aspect_ratio": "03 - 16:9",
        "frame_rate": "04 - 29.97"
      }
    },

(More programs omitted for brevity)

Schema Notes

The JSON schema is intentionally descriptive rather than prescriptive.
Field presence and values depend on the input container, stream type, and available metadata.
Codec strings reflect CCExtractor's internal stream type descriptions and are container-dependent (e.g., "AC3 audio" vs "AC3").
The services object under programs[] indicates which captioning systems are present (DVB, Teletext, ATSC), while captions.cea_708.services[] lists active CEA-708 caption service numbers.
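
Because field presence depends on the input, a consumer probably should not index into optional objects directly. The following is a minimal sketch of defensive access, assuming only the field names listed in this description (`video.width`, `video.height`, `container.mp4.timed_text_tracks`). The helper names are hypothetical.

```python
def video_dimensions(program):
    """Return (width, height) for a program entry, or None if not reported."""
    video = program.get("video")
    if not isinstance(video, dict):
        return None
    width, height = video.get("width"), video.get("height")
    if width is None or height is None:
        return None
    return (width, height)


def timed_text_track_count(report):
    """Return container.mp4.timed_text_tracks, or None when the report omits it."""
    mp4 = (report.get("container") or {}).get("mp4") or {}
    return mp4.get("timed_text_tracks")


if __name__ == "__main__":
    program = {"program_number": 1, "video": {"width": 1920, "height": 1080}}
    print(video_dimensions(program))                 # video block present
    print(video_dimensions({"program_number": 2}))   # video block absent
    print(timed_text_track_count({}))                # no container metadata
```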

Program Ordering:

JSON output: Programs are sorted in ascending order by program number (1, 2, 3, 4, 5) for predictable parsing
Text output: Programs are displayed in descending order (5, 4, 3, 2, 1) as they're processed
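
Although the JSON ordering is described as deterministic, a script that needs one particular program may be more robust if it looks the entry up by `program_number` instead of by array position. A small sketch, with a hypothetical helper name:

```python
def index_programs(report):
    """Map program_number -> program entry, independent of array order."""
    return {p["program_number"]: p for p in report.get("programs", [])}


if __name__ == "__main__":
    # The order here is deliberately not ascending, to show it does not matter.
    report = {"programs": [{"program_number": 5}, {"program_number": 1}]}
    by_number = index_programs(report)
    print(sorted(by_number))
```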

| Text Output Field | JSON Field |
|-------------------|------------|
| File: | `input.path` |
| Stream Mode | `stream.mode` |
| Program Count | `stream.program_count` |
| Program Numbers | `stream.program_numbers[]` |
| PID: X, Program: Y, Codec | `stream.pids[]` |
| DVB Subtitles | `programs[].services.dvb_subtitles` |
| Teletext | `programs[].services.teletext` |
| ATSC Closed Caption | `programs[].services.atsc_closed_caption` |
| EIA-608 | `programs[].captions.eia_608.present` |
| XDS | `programs[].captions.eia_608.xds` |
| CC1..CC4 | `programs[].captions.eia_608.channels.*` |
| CEA-708 | `programs[].captions.cea_708.present` |
| Services: | `programs[].captions.cea_708.services[]` |
| Primary Language Present | (not in JSON) |
| Secondary Language Present | (not in JSON) |
| Width / Height | `programs[].video.width`, `programs[].video.height` |
| Aspect Ratio | `programs[].video.aspect_ratio` |
| Frame Rate | `programs[].video.frame_rate` |
| MPEG-4 Timed Text | `container.mp4.timed_text_tracks` |
| (JSON-only) | `schema.*` |
| (JSON-only) | `programs[].summary.*` |
| (JSON-only) | `programs[].captions.present` |
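
To make the mapping concrete, the sketch below walks one `programs[]` entry and rebuilds the corresponding caption lines of the text report. It is only an illustration of the field correspondence. The function names are hypothetical, and this is not how CCExtractor produces its text output.

```python
def yes_no(flag):
    """Render a JSON boolean the way the text report prints it."""
    return "Yes" if flag else "No"


def text_lines_for_program(program):
    """Rebuild the caption-related text lines from one programs[] entry."""
    services = program.get("services", {})
    captions = program.get("captions", {})
    eia_608 = captions.get("eia_608", {})
    cea_708 = captions.get("cea_708", {})
    channels = eia_608.get("channels", {})
    lines = [
        "DVB Subtitles: " + yes_no(services.get("dvb_subtitles")),
        "Teletext: " + yes_no(services.get("teletext")),
        "ATSC Closed Caption: " + yes_no(services.get("atsc_closed_caption")),
        "EIA-608: " + yes_no(eia_608.get("present")),
        "XDS: " + yes_no(eia_608.get("xds")),
    ]
    for name in ("cc1", "cc2", "cc3", "cc4"):
        lines.append(name.upper() + ": " + yes_no(channels.get(name)))
    lines.append("CEA-708: " + yes_no(cea_708.get("present")))
    numbers = " ".join(str(n) for n in cea_708.get("services", []))
    lines.append("Services: " + numbers)
    return lines


if __name__ == "__main__":
    program = {
        "services": {"dvb_subtitles": False, "teletext": False,
                     "atsc_closed_caption": True},
        "captions": {
            "eia_608": {"present": True, "xds": False,
                        "channels": {"cc1": True, "cc2": False,
                                     "cc3": False, "cc4": False}},
            "cea_708": {"present": True, "services": [1, 2, 3, 4, 5, 6, 9]},
        },
    }
    print("\n".join(text_lines_for_program(program)))
```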

Key Features:

Structured, machine-readable JSON output for -out=report
Versioned schema (v1.0) for future extensibility
Backward compatible (existing text report remains the default)
Caption presence reporting for:
- ATSC Closed Captions (EIA-608 / CEA-708)
- DVB subtitles (presence flag)
- Teletext (presence flag)
- Note: the has_any_captions summary field reflects EIA-608 / CEA-708 only.
Program-level summary fields for fast closed-caption automation checks
PID and codec metadata per program (preserving CCExtractor’s existing codec string formats)
Guarded video metadata (emitted only when valid)
Multi-program stream support with deterministic ordering
Container-level metadata when available (e.g., MP4 timed text track count)
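
Since `has_any_captions` covers EIA-608 / CEA-708 only, a workflow that also cares about DVB subtitles or Teletext would need to combine it with the `services` flags. One possible way to do that, as a sketch with a hypothetical helper name:

```python
def has_any_subtitle_signal(program):
    """True if the entry reports 608/708 captions, DVB subtitles, or Teletext."""
    summary = program.get("summary", {})
    services = program.get("services", {})
    return bool(
        summary.get("has_any_captions")
        or services.get("dvb_subtitles")
        or services.get("teletext")
    )


if __name__ == "__main__":
    # A Teletext-only program: the summary flag alone would miss it.
    teletext_only = {
        "summary": {"has_any_captions": False},
        "services": {"dvb_subtitles": False, "teletext": True},
    }
    print(has_any_subtitle_signal(teletext_only))
```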

Technical Approach

JSON generation is implemented in C using existing CCExtractor internal data structures.
String values are properly escaped to ensure valid JSON output.
Format selection uses case-insensitive comparison (strcasecmp / _stricmp).
Caption extraction and decoding logic are not modified.
Memory allocation and cleanup follow existing project patterns.
Programs are sorted by program number to provide stable and predictable output.
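
The PR's C code is not reproduced in this description, so the Python sketch below is not that implementation. It only illustrates two of the behaviours listed above: escaping a string so that it is valid inside a JSON string literal (following the general JSON rules for quotes, backslashes and control characters), and ordering program entries by program number. Which characters the PR itself escapes is not stated here, and the helper names are hypothetical.

```python
def escape_json_string(text):
    """Escape text for use between double quotes in a JSON document."""
    short = {'"': '\\"', "\\": "\\\\", "\b": "\\b", "\f": "\\f",
             "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in text:
        if ch in short:
            out.append(short[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters need the \uXXXX form.
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def sort_programs(programs):
    """Return program entries in ascending program_number order."""
    return sorted(programs, key=lambda p: p["program_number"])


if __name__ == "__main__":
    print('"' + escape_json_string('clip "final".ts') + '"')
    print(sort_programs([{"program_number": 5}, {"program_number": 1}]))
```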

Example Testing Commands

# Test JSON output
ccextractor --report-format json -out=report sample.ts | jq .

# Verify caption presence
ccextractor --report-format json -out=report sample.ts | jq '.programs[0].summary.has_any_captions'

# Extract specific caption channels
ccextractor --report-format json -out=report sample.ts | jq '.programs[].captions.eia_608.channels'

# Check which CC channels are active
ccextractor --report-format json -out=report sample.ts | jq '.programs[].captions.eia_608.channels | to_entries | map(select(.value == true)) | .[].key'

# Get video dimensions
ccextractor --report-format json -out=report sample.ts | jq '.programs[].video | select(. != null) | {width, height}'

# Default text format still works
ccextractor -out=report sample.ts

Field Value Formats:

String values like aspect_ratio and frame_rate preserve CCExtractor's internal enum formatting (e.g., "03 - 16:9", "04 - 29.97")
This design choice maintains transparency and aids debugging
Users needing normalized values can post-process with simple string operations:
jq '.programs[].video.aspect_ratio | split(" - ")[1]'
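
The same post-processing can be done in a script. The sketch below splits values such as "03 - 16:9" into the code and the label, and converts a frame-rate label to a number when that is possible. The function names are hypothetical, and the code assumes that the " - " separator seen in the examples is used consistently.

```python
def split_enum_value(value):
    """Split "03 - 16:9" into ("03", "16:9"); code is None without a separator."""
    code, sep, label = value.partition(" - ")
    if not sep:
        return (None, value)
    return (code, label)


def frame_rate_as_float(value):
    """Return the numeric frame rate from a value like "04 - 29.97", else None."""
    _, label = split_enum_value(value)
    try:
        return float(label)
    except ValueError:
        return None


if __name__ == "__main__":
    print(split_enum_value("03 - 16:9"))
    print(frame_rate_as_float("04 - 29.97"))
```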

Benefits

Automation-Friendly: Enables programmatic parsing without regex/custom parsers
Familiar Structure: Uses JSON patterns similar to tools like ffprobe and mediainfo
Extensible: Versioned schema to support future extensions
Backward Compatible: Existing workflows continue to work unchanged
Addresses Real Need: Solves a problem raised by multiple community members (issue #1399 and related discussions)
Quick Caption Detection: Provides has_any_captions summary field for fast EIA-608 / CEA-708 closed-caption checks
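
A consumer can take advantage of the versioned schema by checking `schema.name` and `schema.version` before reading anything else. The sketch below assumes that a change in the major version number signals an incompatible layout. That convention is an assumption of this example and is not stated in the PR. The function name is hypothetical.

```python
def check_schema(report, expected_name="ccextractor-report", supported_major=1):
    """Return True if the report declares the expected schema name and major version."""
    schema = report.get("schema", {})
    if schema.get("name") != expected_name:
        return False
    version = str(schema.get("version", ""))
    major = version.split(".")[0]
    return major.isdigit() and int(major) == supported_major


if __name__ == "__main__":
    report = {"schema": {"name": "ccextractor-report", "version": "1.0"}}
    print(check_schema(report))
```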

Notes

Platform compatibility: uses strcasecmp on POSIX systems and maps to _stricmp on Windows via platform-specific preprocessor guards.
Video and container metadata are emitted conditionally when applicable.
Temporary allocations used for program ordering are properly released.
The implementation follows existing CCExtractor coding conventions.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 17:24:05 +00:00

Reference: starred/ccextractor#2824
