Compare commits

651 Commits

Author SHA1 Message Date
Carlos Fernandez Sanz
dd2931153e Merge pull request #2077 from pranavshar223/fix/rust-clippy-pedantic
Fix/rust clippy pedantic
2026-02-07 13:37:15 -08:00
Carlos Fernandez Sanz
10288243b9 Merge pull request #2100 from x15sr71/fix/dvb-eit-bcd-start-time
[FIX]: Properly decode DVB EIT start time BCD field in XMLTV output
2026-02-07 13:25:03 -08:00
Carlos Fernandez Sanz
9762223105 Merge pull request #2079 from Atul-Chahar/fix/empty-webvtt-hls-compatibility-1743
Fix empty WebVTT files for HLS compatibility (Issue #1743)
2026-02-07 11:35:12 -08:00
Pranav Sharma
cad4d3d62d fix(rust): resolve clippy casts and enforce consistency in track_lister.rs 2026-02-07 19:32:05 +00:00
Carlos Fernandez Sanz
a30b8d7a83 Merge pull request #2092 from AhmedAlian7/fix-output-malloc
Replace static buffer with dynamic allocation in writercwtdata()
2026-02-07 11:30:35 -08:00
Chandragupt Singh
f920c16a53 docs: add changelog entry 2026-02-08 00:59:18 +05:30
THE-Amrit-mahto-05
5c05173c75 Fix Vec::from_raw_parts UB in string_to_c_chars (#2094)
Co-authored-by: Amrit kumar Mahto <amrit.mahto@adypu.edu.in>
2026-02-07 10:46:56 -08:00
Nicolas Dato
2582f628dd Fix sigsegv (#2090)
* Fix SIGSEGV when using --multiprogram

* Update CHANGES.TXT
2026-02-07 10:24:51 -08:00
dependabot[bot]
99f7d1955a chore(deps): bump time from 0.3.44 to 0.3.47 in /src/rust (#2097)
Bumps [time](https://github.com/time-rs/time) from 0.3.44 to 0.3.47.
- [Release notes](https://github.com/time-rs/time/releases)
- [Changelog](https://github.com/time-rs/time/blob/main/CHANGELOG.md)
- [Commits](https://github.com/time-rs/time/compare/v0.3.44...v0.3.47)

---
updated-dependencies:
- dependency-name: time
  dependency-version: 0.3.47
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-07 10:07:52 -08:00
dependabot[bot]
78e94f85c2 chore(deps): bump time from 0.3.41 to 0.3.47 in /src/rust/lib_ccxr (#2096)
Bumps [time](https://github.com/time-rs/time) from 0.3.41 to 0.3.47.
- [Release notes](https://github.com/time-rs/time/releases)
- [Changelog](https://github.com/time-rs/time/blob/main/CHANGELOG.md)
- [Commits](https://github.com/time-rs/time/compare/v0.3.41...v0.3.47)

---
updated-dependencies:
- dependency-name: time
  dependency-version: 0.3.47
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-07 10:07:50 -08:00
Chandragupt Singh
36f13922d2 fix: DVB EIT start time decoding using proper BCD parsing and normalization 2026-02-07 18:35:05 +05:30
Ahmed Alian
3a547fdeab Replace static buffer with dynamic allocation in writercwtdata() 2026-02-04 17:38:07 +02:00
Atul Chahar
528021b577 fix: ensure WebVTT header is written for empty files (HLS compatibility)
This fixes issue #1743 where CCExtractor generates empty .webvtt files
when processing content with no subtitles, breaking HLS compatibility.

Changes:
- Fixed X-TIMESTAMP-MAP field order (LOCAL before MPEGTS) per HLS spec
- Added WebVTT case in write_subtitle_file_footer() to ensure header
  is always written even when no subtitles are found
- Added write_webvtt_header declaration in header file

Fixes: #1743
Signed-off-by: Atul Chahar <chaharatul92@gmail.com>
2026-02-01 20:33:59 +05:30
Carlos Fernandez Sanz
270c89b7f8 [FEATURE]: Add Snap packaging support with Github workflow 2026-01-31 17:52:06 -08:00
Carlos Fernandez Sanz
032cd1c6b1 Merge pull request #2040 from THE-Amrit-mahto-05/fix/avc-sei-payload-size
Fix SEI payload type handling: changes payload_type and payload_size from i32 to u32 for type safety, keeping as usize casts only where needed for indexing.
2026-01-31 17:35:40 -08:00
Carlos Fernandez Sanz
42e4e9a657 Merge pull request #2049 from THE-Amrit-mahto-05/fix-null-len-guard
Adds defensive null pointer and negative length checks to ccxr_verify_crc32 FFI function to prevent undefined behavior.
2026-01-31 17:18:31 -08:00
Carlos Fernandez Sanz
821e307333 Merge pull request #2076 from THE-Amrit-mahto-05/fix-miri-null-deref
Verified with Miri - fixes undefined behavior when calling dealloc() on null pointer in window row deallocation.
2026-01-31 13:58:48 -08:00
Amrit kumar Mahto
ae81f3ba3d Fix Miri-reported UB in window row deallocation and tests 2026-01-31 00:49:50 +05:30
Carlos Fernandez Sanz
b190751b2c [FIX]macOS: Fix hardsub pipeline failing due to arm64/x86_64 build mismatch 2026-01-28 18:30:38 -08:00
GAURAV KARMAKAR
f1bb0f4dce macOS: Fix hardsub pipeline failing due to arm64/x86_64 build mismatch 2026-01-29 00:12:09 +05:30
Amrit kumar Mahto
f147ac27f8 re running for CI to pass checks 2026-01-27 21:03:19 +05:30
Amrit kumar Mahto
2dfb44d7d4 re running CI 2026-01-27 20:42:53 +05:30
Carlos Fernandez Sanz
580e721dfe fix: prevent heap overflow in parse_PAT/parse_PMT and null deref in processmp4 2026-01-23 23:06:35 -08:00
Carlos Fernandez
d0a82447ff fix(rust): resolve clippy unnecessary_unwrap warnings for Rust 1.93
Use if-let patterns instead of is_some() + unwrap() to satisfy
the stricter clippy::unnecessary_unwrap lint in Rust 1.93.
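
A minimal sketch of the rewrite this commit describes. The helper `language_len` is hypothetical and not taken from the CCExtractor sources; it only shows the shape of the change from `is_some()` plus `unwrap()` to `if let`.

```rust
// Hypothetical helper for illustration; not part of the CCExtractor code base.
fn language_len(lang: Option<&str>) -> usize {
    // Form the lint flags:
    //     if lang.is_some() { lang.unwrap().len() } else { 0 }
    // Equivalent form that binds the inner value directly:
    if let Some(code) = lang {
        code.len()
    } else {
        0
    }
}
```

Calling `language_len(Some("eng"))` takes the `if let` branch, and `language_len(None)` falls through to the `else` branch without any `unwrap()`.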

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 20:58:03 -08:00
Carlos Fernandez
5c19c7b932 style: fix Rust formatting in parser.rs test
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 20:14:26 -08:00
Carlos Fernandez
fd7271bae2 fix: prevent heap overflow in parse_PAT/parse_PMT and null deref in processmp4
- parse_PAT: Add bounds check for payload_length >= 8 before accessing
  header fields (fixes #2053)
- parse_PMT: Add ES_info_length validation and 2-byte minimum check
  before reading descriptor_tag and desc_len in PRIVATE_USER_MPEG2
  and teletext parsing loops (fixes #2054)
- processmp4: Add NULL check for file parameter before passing to
  mprint (fixes #2055)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 20:12:09 -08:00
Chandragupt Singh
05c68349d5 Merge branch 'master' into feat/snap-distribution-support 2026-01-23 15:26:59 +05:30
Chandragupt Singh
09f21f64e4 fix(snap): resolve GPAC dependency and runtime issues in core22 snap 2026-01-23 15:23:33 +05:30
Carlos Fernandez Sanz
c65fb0874e fix(rust): correct mkvlang test to use MkvLangFilter type 2026-01-19 07:43:15 -08:00
Carlos Fernandez
9db727d593 fix(rust): correct mkvlang test to use MkvLangFilter type
The test_mkvlang_sets_mkv_language test was comparing against
Language::Eng, but the mkvlang field type was changed to MkvLangFilter
when BCP 47 language tag support was added in PR #2038.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 07:41:36 -08:00
Amrit kumar Mahto
fe6dad83b7 use u32 for SEI payload type and size 2026-01-19 14:16:50 +05:30
Carlos Fernandez Sanz
d494286082 ci: add workflow to build .deb packages 2026-01-18 20:37:22 -08:00
Carlos Fernandez
259e881483 fix(ci): add missing FFmpeg dependencies to hardsubx .deb packages
Add libavdevice, libswresample, and libavfilter dependencies for
the hardsubx variant on both Ubuntu 24.04 and Debian 13 workflows.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 20:11:10 -08:00
Carlos Fernandez
197069d3b8 ci: add Debian 13 (Trixie) .deb build workflow
Creates .deb packages for Debian 13 using a Docker container.
- Builds GPAC from source (abi-16.4 tag)
- Creates basic and hardsubx variants
- Uses Debian 13's library versions:
  - libtesseract5, libleptonica6
  - libavcodec61, libavformat61, libavutil59, libswscale8

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 20:02:16 -08:00
Carlos Fernandez
7a810d736d fix(ci): add libcurl3t64-gnutls dependency to .deb package
CCExtractor is linked against libcurl-gnutls which requires this
runtime dependency on Ubuntu 24.04.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 19:55:09 -08:00
Carlos Fernandez
1413c948c4 fix(ci): correct leptonica package name for Ubuntu 24.04
Ubuntu 24.04 uses liblept5, not libleptonica6 (which is Ubuntu 25.04).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 19:39:42 -08:00
Carlos Fernandez
bb5385913b fix(ci): use apt install to handle .deb dependencies in test step
apt install automatically resolves and installs dependencies,
unlike dpkg -i which fails if dependencies are missing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 19:13:41 -08:00
Carlos Fernandez Sanz
f8981e8e1e refactor(rust): Rename parser tests with descriptive names and expand coverage 2026-01-18 19:12:34 -08:00
Carlos Fernandez
a1871abf04 fix(ci): switch .deb build to Ubuntu 24.04
- Use ubuntu-24.04 runner instead of ubuntu-22.04
- Update dependencies to match Ubuntu 24.04 library versions
  (libtesseract5, libleptonica6, libavcodec60, etc.)
- Update GPAC cache key for new Ubuntu version

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 19:08:09 -08:00
Carlos Fernandez
20b3773bb9 fix(ci): correct version and add missing dependencies in .deb workflow
- Update CMakeLists.txt version from 0.89 to 0.96 to match lib_ccx.h
- Extract version from lib_ccx.h instead of CMakeLists.txt for accuracy
- Add missing runtime dependencies: libtesseract, libleptonica
- Add FFmpeg dependencies for hardsubx variant

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 19:02:48 -08:00
Carlos Fernandez
8786b4cf75 fix(ci): correct LICENSE filename to LICENSE.txt 2026-01-18 18:08:04 -08:00
Carlos Fernandez
8632ecda5b ci: add workflow to build .deb packages
Add GitHub Actions workflow to build Debian packages (.deb) for Linux.

Features:
- Builds GPAC from source (abi-16.4 tag) since libgpac-dev is not
  available in newer Debian/Ubuntu releases
- Creates two variants: basic (with OCR) and hardsubx (with FFmpeg)
- Bundles GPAC library with the package using patchelf for rpath
- Includes proper Debian package structure with control, postinst, postrm
- Runs on releases, manual trigger, or workflow file changes
- Uploads packages as artifacts and attaches to releases

This provides an unofficial .deb package for users who prefer that
format over AppImage or snap.

Relates to #1610

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 18:00:45 -08:00
Carlos Fernandez Sanz
475153a9dd fix(build): resolve Rust-to-C linking issues on Linux 2026-01-18 17:39:27 -08:00
Carlos Fernandez
df90009f73 ci: add CMakeLists.txt to workflow path filters
Build workflows were not triggering on CMakeLists.txt changes.
Added **CMakeLists.txt and **.cmake patterns to path filters for:
- build_linux.yml
- build_mac.yml
- build_windows.yml
- build_docker.yml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 17:23:34 -08:00
Carlos Fernandez
2352ea21e3 fix(build): resolve Rust-to-C linking issues on Linux
Two fixes for static library linking:

1. Preserve CMAKE_C_FLAGS in lib_ccx/CMakeLists.txt instead of
   overwriting them. This allows passing include paths via
   -DCMAKE_C_FLAGS which is needed for some build configurations.

2. Add target_link_options with --undefined flags for C functions
   called from Rust (decode_vbi, do_cb, store_hdcc). With static
   libraries, the linker processes them in order and only pulls
   symbols that are currently unresolved. Since ccx is processed
   before ccx_rust, these symbols weren't being pulled in.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 17:11:44 -08:00
Carlos Fernandez Sanz
dc041a35e8 fix(rust): Support BCP 47 language tags in --mkvlang option 2026-01-18 16:33:39 -08:00
Carlos Fernandez Sanz
e99ba1d177 fix(rust): Remove dead code returning pointer to stack variable 2026-01-18 14:11:39 -08:00
Carlos Fernandez
298665faa4 chore: fix cargo fmt formatting
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 13:57:53 -08:00
Carlos Fernandez
735a01bf04 refactor(rust): rename parser tests with descriptive names and expand coverage
Replace poorly-named tests (options_1 through options_51, broken_1, etc.)
with 201 descriptively-named tests organized by category:

- Input/output format tests
- Encoding tests
- Stream/program selection tests
- CEA-708 service tests
- Codec selection tests
- Timing option tests
- Debug flag tests
- Teletext option tests
- XMLTV option tests
- Credits option tests
- Buffering option tests
- And more

Each test name now clearly indicates what CLI option is being tested
and what behavior is expected, e.g.:
- test_input_ts_sets_transport_stream_mode
- test_608_enables_decoder_608_debug
- test_service_enables_708_with_single_service

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 13:55:56 -08:00
Amrit kumar Mahto
3618c23b5a rust/avc: fix SEI payload size handling and type correctness 2026-01-19 03:09:07 +05:30
Carlos Fernandez Sanz
b7c9da75dd Revert "Automatic extraction of multiple DVB subtitle streams (--split-dvb-subs) fixes#447 #1864"
Was incorrectly merged
2026-01-18 13:37:53 -08:00
Carlos Fernandez Sanz
449d55d5e5 Revert "Automatic extraction of multiple DVB subtitle streams (--split-dvb-subs) fixes#447 #1864" 2026-01-18 13:37:26 -08:00
Carlos Fernandez Sanz
60aa370899 fix(rust): Correct version number in CLI parser 2026-01-18 13:35:25 -08:00
Carlos Fernandez
3d18b38c32 Revert "Merge pull request #1912 from Rahul-2k4/final"
This reverts commit 2a6d27f9ff, reversing
changes made to 74e64c0421.
2026-01-18 13:28:15 -08:00
Carlos Fernandez Sanz
2a6d27f9ff Merge pull request #1912 from Rahul-2k4/final
Automatic extraction of multiple DVB subtitle streams (--split-dvb-subs) fixes#447 #1864
2026-01-18 13:27:17 -08:00
Carlos Fernandez
91d3512bcc fix(rust): Support BCP 47 language tags in --mkvlang option
The --mkvlang option previously only supported single ISO 639-2 codes
due to using a Language enum with a fixed list of variants. Extended
codes (like "fre-ca") and multiple codes (like "eng,chi") would panic.

This change introduces MkvLangFilter, a proper type for language
filtering that:

- Validates language codes per BCP 47 specification
- Supports ISO 639-2 (3-letter codes like "eng")
- Supports BCP 47 tags (like "en-US", "zh-Hans-CN")
- Supports comma-separated multiple codes
- Provides clean error messages for invalid input
- Includes comprehensive unit tests

The C code continues to receive the raw string for strstr() matching,
maintaining backward compatibility.
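
The validation described above could look roughly like the sketch below. It is not the actual `MkvLangFilter` implementation: the function names are hypothetical, and the rule used (a primary subtag of 2 to 3 ASCII letters followed by subtags of 1 to 8 ASCII alphanumerics) is a deliberate simplification of BCP 47.

```rust
/// Simplified plausibility check, assumed for illustration only.
/// Full BCP 47 has further rules that are not modelled here.
fn is_plausible_tag(tag: &str) -> bool {
    let mut subtags = tag.split('-');
    // Primary subtag: 2 or 3 ASCII letters, e.g. "en" or "eng".
    let primary_ok = subtags.next().map_or(false, |s| {
        (2..=3).contains(&s.len()) && s.chars().all(|c| c.is_ascii_alphabetic())
    });
    // Remaining subtags: 1 to 8 ASCII alphanumerics, e.g. "US" or "Hans".
    primary_ok
        && subtags.all(|s| {
            (1..=8).contains(&s.len()) && s.chars().all(|c| c.is_ascii_alphanumeric())
        })
}

/// Parses a comma-separated list such as "eng,chi" and returns an error
/// message for the first invalid entry instead of panicking.
fn parse_lang_list(input: &str) -> Result<Vec<String>, String> {
    input
        .split(',')
        .map(|tag| {
            if is_plausible_tag(tag) {
                Ok(tag.to_string())
            } else {
                Err(format!("invalid language tag: {tag:?}"))
            }
        })
        .collect()
}
```

Returning `Result` rather than indexing into a fixed enum is what lets extended codes like "fre-ca" and lists like "eng,chi" be handled without a panic.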

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 13:23:39 -08:00
Carlos Fernandez Sanz
74e64c0421 Merge pull request #2035 from THE-Amrit-mahto-05/fix/mkvlang-params-check
fix mkvlang_params_check: prevent panic on multi-byte characters
2026-01-18 13:07:44 -08:00
Carlos Fernandez
c175750ebe fix(rust): Correct version number in CLI parser
The Rust CLI parser was showing "CCExtractor 1.0" instead of the
actual version (0.96.5). This was a placeholder value from when
the parser was first ported to Rust in August 2024 that was never
updated.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 12:55:21 -08:00
Carlos Fernandez Sanz
e7dc4d19f7 Merge pull request #2036 from THE-Amrit-mahto-05/fix/process-word-file-safely
fix: process_word_file propagates errors instead of panicking
2026-01-18 12:51:33 -08:00
Carlos Fernandez Sanz
1fbb51056d Merge pull request #1992 from THE-Amrit-mahto-05/fix/teletext-panic
fix: Teletext decoder panic on malformed BCD data
2026-01-18 12:46:56 -08:00
Carlos Fernandez Sanz
5d9a8cc6f2 Merge pull request #2031 from THE-Amrit-mahto-05/fix/rust-userdata-uaf
Fix use-after-free bugs in Rust userdata handling
2026-01-18 12:24:10 -08:00
Amrit kumar Mahto
17abad79f2 fix: process_word_file propagates errors instead of panicking 2026-01-19 01:53:19 +05:30
Amrit kumar Mahto
707e1f01fe updating 2026-01-19 01:34:41 +05:30
Amrit kumar Mahto
efc8b791e7 fix mkvlang_params_check: prevent panic on multi-byte characters 2026-01-19 01:28:25 +05:30
Carlos Fernandez Sanz
a856bbde10 Merge pull request #2015 from Harsh-Sahu43/tests/validate-cc-pair
[FIX] rust: add defensive length check to validate_cc_pair
2026-01-18 11:52:49 -08:00
Carlos Fernandez Sanz
9390b876fa Merge pull request #2034 from THE-Amrit-mahto-05/fix/parser-atol-bug
Fix atol Parsing Bug in parser.rs for Numeric Values and Suffixes
2026-01-18 11:38:53 -08:00
Amrit kumar Mahto
ead0a4beed little fix 2026-01-19 00:45:30 +05:30
Amrit kumar Mahto
b2e9cb74c1 Fix atol parsing bug for numeric values and K/M/G suffixes 2026-01-19 00:31:25 +05:30
Amrit kumar Mahto
20b194aac4 Consolidate Rust userdata fixes: UAF, bounds checks, and VBI safety 2026-01-18 23:34:43 +05:30
Harsh Sahu
2d9b480972 Merge branch 'CCExtractor:master' into tests/validate-cc-pair 2026-01-18 14:48:46 +05:30
Harsh Sahu
1447b021cb Fixed : formatting 2026-01-18 13:58:31 +05:30
Amrit kumar Mahto
e0ac126cff Fix use-after-free bugs in Rust userdata handling 2026-01-18 05:37:44 +05:30
Carlos Fernandez Sanz
b8019bdb35 [FIX] Resolve output artifact on Linux/WSL (line clearing) 2026-01-17 06:02:59 -08:00
Carlos Fernandez Sanz
9d921dec43 fix(matroska): prevent out-of-bounds NAL parsing in AVC/HEVC blocks 2026-01-17 06:00:12 -08:00
Carlos Fernandez Sanz
3ada2b5002 fix(avc): prevent segfault in report-only mode (-out=report) 2026-01-17 05:58:03 -08:00
Rahul Tripathi
50ec9866db style: Fix clang-format ternary operator alignment 2026-01-17 14:12:59 +05:30
Rahul Tripathi
ce87d01fbd fix: Cap DVB subtitle duration to 10s to prevent 65s page timeout bug
Root cause: When FTS timestamps were invalid due to PTS discontinuities,
the code fell back to DVB page timeout (65 seconds) as subtitle duration.
This caused impossible 65-second subtitle durations in split output.

Fix: Added DVB_MAX_SUBTITLE_DURATION_MS constant (10s) and simplified the
duration capping logic to always enforce reasonable subtitle durations.
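
As a rough illustration of the capping rule, written in Rust rather than the C of the actual decoder: the constant name and value come from this commit message, while the helper `capped_end_ms` and its signature are hypothetical.

```rust
/// Cap named in the commit message: 10 seconds, expressed in milliseconds.
const DVB_MAX_SUBTITLE_DURATION_MS: i64 = 10_000;

/// Hypothetical helper: returns an end time that is never more than the
/// cap after the start time, whatever the computed end time was.
fn capped_end_ms(start_ms: i64, end_ms: i64) -> i64 {
    end_ms.min(start_ms + DVB_MAX_SUBTITLE_DURATION_MS)
}
```

With this rule, a subtitle that would otherwise inherit the 65-second page timeout ends 10 seconds after it starts, while shorter subtitles are left unchanged.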

Tested with: multiprogram_spain.ts, BBC1.ts, BBC2.ts - all outputs now
have properly capped durations with no timestamps exceeding 10 seconds.
2026-01-17 12:14:12 +05:30
Carlos Fernandez
fecd24d08e fix(avc): prevent segfault in report-only mode (-out=report)
When using -out=report mode, the encoder context (enc_ctx) is NULL
because no output file needs to be created. The Rust FFI function
ccxr_process_avc was dereferencing this NULL pointer, causing a
segmentation fault.

Add NULL pointer checks at the FFI boundary to skip AVC processing
when enc_ctx is NULL. This is safe because report mode only needs
stream analysis, not caption extraction.
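
A minimal sketch of a NULL check at an FFI boundary, under stated assumptions: `EncCtxSketch` and `process_avc_sketch` are hypothetical stand-ins and do not reproduce the real `ccxr_process_avc` signature or the real encoder context.

```rust
/// Stand-in for the C encoder context; the real struct is not reproduced here.
#[repr(C)]
struct EncCtxSketch {
    bytes_seen: usize,
}

/// Hypothetical FFI-style entry point. Returns the number of bytes processed,
/// or 0 when no encoder context exists (as in report-only mode).
fn process_avc_sketch(enc_ctx: *mut EncCtxSketch, data: &[u8]) -> usize {
    // Check before any dereference: a null context means "nothing to encode".
    if enc_ctx.is_null() {
        return 0;
    }
    // SAFETY: non-null was checked above; the caller must still pass a
    // valid, exclusively owned pointer.
    let ctx = unsafe { &mut *enc_ctx };
    ctx.bytes_seen += data.len();
    data.len()
}
```

The early return keeps the unsafe dereference confined to the path where the pointer is known to be non-null.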

Fixes #2023

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 20:50:42 -08:00
Rahul Tripathi
482544c5bf docs: Add DVB deduplication feature and double-free fix to CHANGES.TXT 2026-01-16 16:41:46 +05:30
Rahul Tripathi
84a7a1fb41 style: Fix remaining clang-format indentation issues 2026-01-16 16:34:26 +05:30
Rahul Tripathi
f198bcd2ec style: Fix clang-format issues across modified files 2026-01-16 16:31:09 +05:30
Rahul Tripathi
4b6016ca1c style: Fix clang-format issues in dvb_dedup files 2026-01-16 16:26:28 +05:30
Rahul Tripathi
9c2ea47eda fix: Add dvb_dedup.c to Windows and Mac build systems 2026-01-16 16:24:52 +05:30
Rahul Tripathi
170b466a20 fix: Add dvb_dedup.c to autoconf build for GitHub Actions Linux CI 2026-01-16 16:23:43 +05:30
Rahul Tripathi
2bdcd20115 cleanup: Remove temporary debug, test, and tool artifacts from final branch
Remove 186 unwanted files including:
- Debug logs and diagnostic output (debug_*.log, debug_output/, diagnosis_output/)
- Test artifacts and binaries (linux/alltests_*, test_output/, test_split_verification/)
- Tool state files (.agent/, .claude/, .ralph/, .mcp.json, etc.)
- Root-level scripts and temporary Python utilities
- Working notes and temporary documentation (DVB_SPLIT_*.md, progress.json, etc.)
- Unfinished MCP server (tools/mcp-ccextractor/)
- Project-specific working notes (CLAUDE.md)

Update .gitignore to prevent re-adding unwanted artifacts.

Result: final branch now contains only DVB-split feature implementation
and core project files, matching upstream structure while preserving
all functional changes.
2026-01-16 16:18:02 +05:30
Rahul Tripathi
ab18d234d2 Merge branch 'CCExtractor:master' into final 2026-01-16 16:05:36 +05:30
Rahul Tripathi
3ff02617b0 fix: Resolve double-free crash in DVB split pipeline cleanup
- Remove redundant free() after free_subtitle() in pipeline cleanup
  (free_subtitle already frees the struct via freep(&sub))
- Add ctx->prev = NULL after free_encoder_context in dinit_encoder
- Keep free_encoder_context non-recursive for prev (dinit_encoder owns it)
- Remove debug output from general_loop.c
2026-01-16 16:02:59 +05:30
Rahul Tripathi
c7fad95e24 test: Fix DVB dedup test suite - DVB-005 and DVB-007 corrections
- DVB-005: Changed from Teletext-only file to proper DVB extraction using --program-number 530
- DVB-007: Fixed shell script globbing error and variable parsing for dedup effectiveness check
- All test cases now pass: DVB-004 (multilingual split), DVB-005 (single program), DVB-006 (non-DVB), DVB-007 (dedup check), DVB-008 (no-dedup flag)
- Verified: No 0-byte files, deduplication removes 19-29 duplicate lines per stream
2026-01-16 15:05:35 +05:30
Rahul Tripathi
c018f1f43c docs: Mark DVB-004 through DVB-008 as complete
- All deduplication infrastructure implemented and tested
- Test script validates code paths execute correctly
- Dedup ring buffer integrated into all DVB subtitle processing
- Full validation requires OCR build (-DWITH_OCR=ON)
- Code review confirms all 8 stories are complete
2026-01-16 14:15:44 +05:30
Rahul Tripathi
98b50b2a35 test: Add DVB dedup test suite script
- Created dvb_dedup_test.sh to test DVB-001 through DVB-008
- Tests multilingual split, single stream, non-DVB files
- Tests --no-dvb-dedup flag functionality
- Checks for excessive duplication in output
- Note: Requires OCR (Tesseract) for full validation
- Without OCR, files are empty but dedup logic still executes
2026-01-16 14:15:03 +05:30
Rahul Tripathi
46cee0893a feat: DVB-003 - Add --no-dvb-dedup CLI flag
- Added no_dvb_dedup field to ccx_s_options structure
- Initialized to 0 (deduplication enabled by default)
- Added --no-dvb-dedup CLI flag in Rust args parser
- Added flag to Options struct in lib_ccxr
- Wired flag through Rust-to-C FFI boundary in common.rs
- Modified dvbsub_handle_display_segment to respect flag
- Dedup logic only runs when no_dvb_dedup is false (default)
- Added help text describing flag purpose
2026-01-16 14:11:13 +05:30
Rahul Tripathi
42ad48ca7f feat: DVB-001 - Add per-stream dedup ring buffer
- Created dvb_dedup.h with dedup_entry and dedup_ring structures
- Implemented dvb_dedup.c with init, is_duplicate, and add functions
- Integrated dedup_ring into DVBSubContext structure
- Added deduplication check in dvbsub_handle_display_segment
- Dedup uses PTS + PID + composition_id + ancillary_id as unique key
- 8-slot ring buffer to track recently emitted subtitles
- Prevents duplicate subtitles from propagating to output files
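
The ring buffer described in the list above can be sketched as follows. This is an illustration in Rust, not the C code in dvb_dedup.c; the type names and field widths are assumptions, and only the 8-slot size and the four key components come from the commit message.

```rust
/// Key fields named in the commit message; the integer widths are assumed.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DedupKey {
    pts: u64,
    pid: u16,
    composition_id: u16,
    ancillary_id: u16,
}

/// Fixed-size ring that remembers the 8 most recently added keys.
struct DedupRing {
    slots: [Option<DedupKey>; 8],
    next: usize,
}

impl DedupRing {
    fn new() -> Self {
        DedupRing { slots: [None; 8], next: 0 }
    }

    /// True if an identical key is still held in the ring.
    fn is_duplicate(&self, key: &DedupKey) -> bool {
        self.slots.contains(&Some(*key))
    }

    /// Stores the key, overwriting the oldest slot once the ring is full.
    fn add(&mut self, key: DedupKey) {
        self.slots[self.next] = Some(key);
        self.next = (self.next + 1) % self.slots.len();
    }
}
```

A caller would check `is_duplicate` before emitting a subtitle and call `add` only for subtitles it actually emits, so older keys age out after eight newer ones.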
2026-01-16 14:04:00 +05:30
Akhilesh
ed26a595bd style(matroska): apply clang-format 2026-01-14 13:42:22 +05:30
Akhilesh
b1c2aabb22 fix(matroska): prevent out-of-bounds NAL parsing in AVC/HEVC blocks 2026-01-14 13:20:23 +05:30
Rahul Tripathi
bb2ae1e70f Fix DVB subtitle repetition bug and memory safety issues 2026-01-13 20:29:44 +05:30
Rahul Tripathi
6464fa486e Fix DVB Split: Remove forced dirty flag, rely on natural dirty + clear 2026-01-13 18:16:41 +05:30
Rahul Tripathi
5aa747ab33 Fix DVB Split bugs: Prevent subtitle repetition and buffer overflow crash 2026-01-13 17:53:30 +05:30
Rahul Tripathi
39adfa59b0 Fix Bug 1: Clear OCR text leakage preventing subtitle repetition
- Clear enc_ctx->prev->last_str after encode_sub() in dvb_subtitle_decoder.c
- This prevents OCR-recognized text from leaking into subsequent subtitles
- Tested: All subtitle output shows unique text with zero duplicates
2026-01-12 11:00:27 +05:30
Carlos Fernandez Sanz
20287548cb fix: Correct progress time display for multi-program TS files 2026-01-11 20:56:59 +01:00
collectnis
b7b10419ec style: fix formatting alignment 2026-01-11 13:46:00 +00:00
collectnis
8fbfd68426 style: fix formatting alignment 2026-01-11 13:31:55 +00:00
collectnis
7159d0b6d0 fix: resolve merge conflict in changelog 2026-01-11 11:48:58 +00:00
collectnis
c515578e37 docs: update changelog 2026-01-11 11:30:54 +00:00
collectnis
e55b8eb764 [CLI] Fix output artifacts on Linux/WSL by clearing line on \r 2026-01-11 10:34:16 +00:00
Carlos Fernandez Sanz
0228fbcbfa fix: Skip moov box if buffer too small to verify mvhd 2026-01-11 10:30:32 +01:00
Carlos Fernandez Sanz
0e190e0962 docs: Add changelog for 0.96.6 2026-01-11 10:29:57 +01:00
Carlos Fernandez
13f1b5ab53 docs: Add changelog for 0.96.6
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 10:28:56 +01:00
Carlos Fernandez Sanz
b39f923c46 docs: Clarify PS probe limit calculation (explain magic number) 2026-01-11 08:55:17 +01:00
Harsh Sahu
7e32d6a553 Merge branch 'CCExtractor:master' into tests/validate-cc-pair 2026-01-11 04:51:33 +05:30
Carlos Fernandez
3bde3dceec fix: Skip moov box if buffer too small to verify mvhd
The previous fix (#1996) prevented a panic when the buffer was too small
to verify if a "moov" box contains "mvhd", but it incorrectly accepted
the box without verification.

The original intent was: "moov without mvhd is invalid, skip it."

This fix maintains that intent:
- If buffer too small to verify mvhd → skip the box
- If moov has mvhd → accept (valid)
- If moov lacks mvhd → skip (invalid)
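
The three cases above can be sketched as a small decision function. The names `MoovVerdict` and `check_moov` are hypothetical, and the layout (an 8-byte "moov" header followed directly by the first child's 4-byte size and 4-byte type) is an assumption made for this illustration, not a description of the actual probe code.

```rust
#[derive(Debug, PartialEq)]
enum MoovVerdict {
    Accept,
    Skip,
}

/// `box_start` is the offset of the moov box's size field within `buf`.
fn check_moov(buf: &[u8], box_start: usize) -> MoovVerdict {
    // The child's type field would sit at bytes 12..16 of the moov box.
    // `get` returns None instead of panicking when the buffer is too short.
    let child_type = box_start
        .checked_add(12)
        .and_then(|start| start.checked_add(4).map(|end| (start, end)))
        .and_then(|(start, end)| buf.get(start..end));

    match child_type {
        Some(t) if t == &b"mvhd"[..] => MoovVerdict::Accept, // moov has mvhd
        Some(_) => MoovVerdict::Skip,                        // moov lacks mvhd
        None => MoovVerdict::Skip,                           // cannot verify
    }
}
```

Both "lacks mvhd" and "cannot verify" map to `Skip`, which is the behavioural change this commit describes relative to accepting an unverifiable box.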

This is safe for format detection since:
1. The probe reads up to 1MB of start bytes
2. The scoring system requires multiple valid boxes
3. Skipping an unverifiable box is safer than accepting it

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 00:13:11 +01:00
Carlos Fernandez
d5201b1129 docs: Clarify PS probe limit calculation with inline comment
Replace magic number 49997 with `50000 - 3` and add a comment explaining:
- Why we subtract 3 (the loop accesses i+3, so we stop 3 bytes early)
- Why we cap at 50000 (don't scan huge buffers entirely)
- Why we use saturating_sub (handle tiny buffers safely)
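
A sketch of the limit calculation explained above, assuming the loop looks for the 4-byte MPEG-PS pack start code (00 00 01 BA). The function names are hypothetical and the real probe may be structured differently.

```rust
/// Upper bound for a loop that reads buf[i] through buf[i + 3].
fn ps_probe_limit(buf_len: usize) -> usize {
    // Do not scan more than the first 50000 bytes of a large buffer.
    const MAX_SCAN: usize = 50_000;
    // Stop 3 bytes early because the loop reads up to index i + 3;
    // saturating_sub yields 0 for buffers shorter than 3 bytes.
    buf_len.min(MAX_SCAN).saturating_sub(3)
}

/// Counts pack start codes within the probed range.
fn count_pack_starts(buf: &[u8]) -> usize {
    (0..ps_probe_limit(buf.len()))
        .filter(|&i| buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1 && buf[i + 3] == 0xBA)
        .count()
}
```

For a buffer of 100000 bytes the limit is 49997, the value the old magic number encoded, and for a 2-byte buffer it is 0, so the loop body never runs.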

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 00:07:35 +01:00
Carlos Fernandez Sanz
a199f4f8af Merge pull request #1996 from THE-Amrit-mahto-05/fix/demuxer-panics
fix prevent MP4 & PS demuxer panics due to out-of-bounds/underflow
2026-01-11 00:06:35 +01:00
Harsh Sahu
eea049923d add defensive length check to validate_cc_pair 2026-01-11 04:21:00 +05:30
Carlos Fernandez Sanz
d999c3e0e0 Merge pull request #1985 from x15sr71/docs/homebrew-install
docs: Add Homebrew installation instructions to COMPILATION.MD
2026-01-10 23:43:42 +01:00
Carlos Fernandez
aac90d5a5f fix(rust): Remove dead code returning pointer to stack variable
Delete the unused `impl FromCType<*mut PMT_entry> for *mut PMTEntry`
implementation which had a critical bug: it returned a pointer to a
stack-allocated PMTEntry, causing undefined behavior (dangling pointer).

This code was never called anywhere in the codebase. The actual usage
in demuxer.rs uses the value-returning variant `FromCType<PMT_entry>
for PMTEntry` with explicit `Box::into_raw(Box::new(...))` wrapping,
which is the correct pattern.
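
For reference, the heap-allocating pattern mentioned above looks roughly like this. `PmtEntrySketch` and both helper functions are hypothetical stand-ins, not the real `PMTEntry` type or any function in demuxer.rs.

```rust
/// Stand-in for the real PMTEntry; only one field is modelled.
struct PmtEntrySketch {
    program_number: u32,
}

/// Moves the value to the heap and hands out a raw pointer that stays valid
/// after this function returns (unlike a pointer to a local variable).
fn into_heap_ptr(entry: PmtEntrySketch) -> *mut PmtEntrySketch {
    Box::into_raw(Box::new(entry))
}

/// Reclaims a pointer produced by `into_heap_ptr`.
///
/// # Safety
/// `ptr` must be null or come from `into_heap_ptr`, and must not be freed twice.
unsafe fn free_heap_ptr(ptr: *mut PmtEntrySketch) {
    if !ptr.is_null() {
        drop(Box::from_raw(ptr));
    }
}
```

Ownership passes to whoever holds the raw pointer, so each `into_heap_ptr` call needs exactly one matching `free_heap_ptr` call on the Rust side.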

Rather than fixing dead buggy code, just remove it.

Supersedes #1988

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 23:41:32 +01:00
Carlos Fernandez Sanz
618df184c6 Merge pull request #2011 from THE-Amrit-mahto-05/fix/demuxer-allocator-mismatch
Fix allocator mismatch in Rust demuxer (use malloc/free instead of Box)
2026-01-10 23:21:16 +01:00
Chandragupt
5e6aab8972 fix(snap): drop snap-injected command argument in runtime wrapper 2026-01-11 01:10:29 +05:30
Amrit kumar Mahto
a77c21c06c fix: allocator mismatch in demuxer (use malloc/free instead of Box) 2026-01-11 00:49:17 +05:30
Carlos Fernandez Sanz
4252703431 fix(matroska): Prevent infinite loop on truncated MKV files 2026-01-10 13:16:12 +01:00
Carlos Fernandez Sanz
1af2a29a3c fix: Prevent NULL pointer dereference in DVB subtitle decoder 2026-01-10 11:18:56 +01:00
Carlos Fernandez Sanz
8ab474c593 fix: Remove debug println that printed spurious numbers during processing 2026-01-10 11:18:20 +01:00
Carlos Fernandez
1c781c2a38 fix: Correct progress time display for multi-program TS files
Multi-program transport stream files can have different PCR (Program
Clock Reference) bases for each program. For example, one program might
have timestamps starting at 23 hours, another at 25 hours. This caused
the progress time display to show wildly incorrect values like "265:45"
for a 6-second file.

The fix tracks the minimum timestamp offset seen across all programs and
uses that as the baseline. When timestamps from programs with higher PCR
bases are encountered (offset > 60 seconds from minimum), the display
falls back to showing time relative to the minimum baseline.

Changes:
- Add min_global_timestamp_offset field to lib_ccx_ctx to track the
  minimum PCR-based offset seen
- Update progress display logic in general_loop.c to normalize times
  relative to the minimum offset
- Apply same fix to both live stream and file processing modes

Test results with multi-program DVB teletext sample (dvbt.ts):
- Before: 1% | 265:45, 2% | 00:00, 3% | 263:11, ... (jumping wildly)
- After:  1% | 00:00, 2% | 00:00, ... 87% | 00:05, 100% | 00:00 (stable)

Single-program files continue to work correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 10:57:57 +01:00
Carlos Fernandez
4d718378d5 fix: Remove debug println that printed spurious numbers during processing
Removes a debug println statement in the Rust timestamp conversion code
that was printing the hours value when it exceeded 24. This caused
spurious numbers (like "25") to appear in the output when processing
files with PTS timestamps that exceeded 24 hours.

The debug code was likely left over from development/debugging and
should not be present in production code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 10:50:33 +01:00
Carlos Fernandez
1bd4cd5c0a fix: Prevent NULL pointer dereference in DVB subtitle decoder
Add NULL check for `region` before accessing `region->bgcolor` in
the OCR processing block of `write_dvb_sub()`.

The bug occurs when processing DVB subtitles where `get_region()`
returns NULL for all display items in the list. After the display
processing loop, `region` may be NULL, but the code attempted to
access `region->bgcolor` unconditionally, causing a segfault.

The crash manifested as:
- Valgrind: "Invalid read of size 4 at address 0x18"
- The 0x18 offset corresponds to the `bgcolor` field in DVBSubRegion

Testing with bbc_small.ts:
- Before: SIGSEGV crash at 0% processing
- After: 100% processing, 50+ subtitles extracted successfully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 10:20:50 +01:00
Carlos Fernandez
067045ce92 fix(matroska): Prevent infinite loop on truncated MKV files
When parsing truncated MKV files, the Matroska parser would enter an
infinite loop. This happened because:

1. At EOF, fgetc() returns -1 which becomes 0xFF when cast to UBYTE
2. Reading 4 EOF bytes creates element code 0xFFFFFFFF (unknown element)
3. The "skip unknown element" logic reads another 0xFF as vint length (127)
4. FSEEK past EOF clears the EOF flag without error
5. The while loop condition (pos + len > get_current_byte) never becomes
   false because the recorded segment length is larger than the file

The fix adds feof() checks after each mkv_read_byte() call in all
parsing loops. This detects EOF immediately after reading and breaks
out of the loop cleanly.

Tested with truncated MKV samples (ticket1398-orig.mkv, azumi.mkv)
that previously caused timeouts - now complete in under a second.
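
Steps 1 and 2 above can be reproduced in isolation. The following is a minimal Rust sketch, not the parser's code: `cast_like_ubyte`, `element_code` and `element_code_checked` are hypothetical names, and the sentinel test stands in for the `feof()` check that the actual fix performs after each read.

```rust
/// Hypothetical stand-in for a C reader that casts fgetc()'s int result
/// to an unsigned byte: the EOF sentinel (-1) silently becomes 0xFF.
fn cast_like_ubyte(fgetc_result: i32) -> u8 {
    fgetc_result as u8
}

/// Builds a 4-byte big-endian element code from four raw read results.
fn element_code(reads: [i32; 4]) -> u32 {
    reads
        .iter()
        .fold(0u32, |code, &r| (code << 8) | u32::from(cast_like_ubyte(r)))
}

/// Guarded variant: report end-of-file as soon as any read returned the
/// sentinel, instead of folding it into the element code.
fn element_code_checked(reads: [i32; 4]) -> Option<u32> {
    if reads.iter().any(|&r| r < 0) {
        return None;
    }
    Some(element_code(reads))
}
```

With four EOF reads the unguarded version yields 0xFFFFFFFF, which is the bogus "unknown element" from step 2; the guarded version returns `None` so the caller can break out of its loop.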

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 09:50:21 +01:00
Carlos Fernandez Sanz
2f2904041c prevent unsafe Vec::set_len causing heap corruption 2026-01-09 23:45:34 +01:00
Carlos Fernandez Sanz
d837c369e5 fix: prevent FFI memory leaks in demuxer sync 2026-01-09 23:44:52 +01:00
Carlos Fernandez Sanz
686ff69fdc Docs: clarify Linux autotools build and Rust dependency 2026-01-09 23:43:10 +01:00
Carlos Fernandez Sanz
126835d998 Merge pull request #1850 from gaurav02081/gaurav-v1
[FIX] -out=spupng with EIA608/teletext: offset values in XML may not be correct #893
2026-01-09 23:25:58 +01:00
Akhilesh
6e170cd812 Docs: clarify Linux autotools build and Rust dependency 2026-01-09 21:02:18 +05:30
Rahul Tripathi
fe921626e1 Fix: Off-by-one bounds check and encoding corruption
- telxcc.c: Use array_length macro for G0_LATIN_NATIONAL_SUBSETS
  bounds check instead of hardcoded value. Prevents potential
  access to uninitialized memory when index equals array size.
- misc.h: Fix UTF-8 encoding of author name (Iñaki García Etxebarria)
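
The off-by-one pattern behind the telxcc.c change can be illustrated with a small Rust sketch. The exact original condition is not shown in this message, so the `<=` comparison below is an assumption, and `SUBSETS` is a hypothetical three-entry table rather than the real `G0_LATIN_NATIONAL_SUBSETS`.

```rust
/// Hypothetical table standing in for a fixed-size lookup array.
const SUBSETS: [&str; 3] = ["english", "french", "german"];

/// Off-by-one: accepts index == length, one past the last valid slot.
fn in_bounds_buggy(index: usize) -> bool {
    index <= SUBSETS.len()
}

/// Correct: derive the limit from the array itself and compare strictly.
fn in_bounds_fixed(index: usize) -> bool {
    index < SUBSETS.len()
}

/// Idiomatic Rust alternative: let the slice perform the check.
fn lookup(index: usize) -> Option<&'static str> {
    SUBSETS.get(index).copied()
}
```

Deriving the limit from the array (as the `array_length` macro does in C) keeps the check correct if the table later grows or shrinks.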
2026-01-09 16:02:10 +05:30
Amrit kumar Mahto
6578f0ff34 fix(avc): prevent unsafe Vec::set_len causing heap corruption 2026-01-09 05:15:57 +05:30
Amrit kumar Mahto
1911068e92 fix(rust): prevent FFI memory leaks in demuxer sync 2026-01-08 14:46:56 +05:30
Chandragupt
493495361d ci(snap): use stable GitHub Actions v6 and make runtime library resolution robust 2026-01-08 09:24:25 +05:30
Chandragupt
643857e98f docs: add changelog entry for Snap packaging 2026-01-08 06:09:33 +05:30
Chandragupt
05adb5f47e snap: add website and source-code metadata 2026-01-08 06:08:29 +05:30
Chandragupt
504877b928 ci(snap): remove temporary push trigger 2026-01-08 06:08:29 +05:30
Chandragupt
64ee63a560 ci(snap): enable push trigger for snap workflow (temporary) 2026-01-08 06:08:00 +05:30
Chandragupt
270c603bd2 ci(snap): add GitHub Actions workflow for Snapcraft-based builds 2026-01-08 06:06:13 +05:30
dependabot[bot]
6d356b4458 chore(deps): bump dawidd6/action-homebrew-bump-formula from 4 to 7 (#1989)
Bumps [dawidd6/action-homebrew-bump-formula](https://github.com/dawidd6/action-homebrew-bump-formula) from 4 to 7.
- [Release notes](https://github.com/dawidd6/action-homebrew-bump-formula/releases)
- [Commits](https://github.com/dawidd6/action-homebrew-bump-formula/compare/v4...v7)

---
updated-dependencies:
- dependency-name: dawidd6/action-homebrew-bump-formula
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-08 01:24:47 +01:00
Carlos Fernandez Sanz
cfb10d4b91 fix: Delete empty output files instead of leaving 0-byte files (#1282) (#1877)
When using --output-field both (formerly -12), CCExtractor creates
separate output files for each field. If one field has no captions,
a 0-byte file was left behind, which is confusing for users.

This fix checks the file size in dinit_write() before closing.
If the file is empty (0 bytes), it deletes the file and prints
an informational message.

This is a simpler approach than deferred file creation - files are
still created at initialization but cleaned up if they remain empty.
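
The cleanup idea can be sketched in Rust as follows. This is an illustration only: the real check lives in the C function `dinit_write()`, and `remove_if_empty` is a hypothetical helper name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Deletes `path` when it is 0 bytes long.
/// Returns Ok(true) if the file was removed, Ok(false) if it was kept,
/// and an error if the file could not be inspected or deleted.
fn remove_if_empty(path: &Path) -> io::Result<bool> {
    if fs::metadata(path)?.len() == 0 {
        fs::remove_file(path)?;
        return Ok(true);
    }
    Ok(false)
}
```

Checking at close time keeps file creation simple at the cost of briefly having an empty file on disk while processing runs.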

Fixes #1282

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 01:23:28 +01:00
Amrit kumar Mahto
ca2b708023 fix: prevent MP4 & PS demuxer panics due to out-of-bounds/underflow (#1995) 2026-01-08 02:36:30 +05:30
Amrit kumar Mahto
10ac5ca6ce add safety checks and comments in Teletext decoder 2026-01-08 01:42:09 +05:30
Amrit kumar Mahto
333cfb3726 fix: Teletext decoder panic on malformed BCD data (#1990) 2026-01-08 01:26:17 +05:30
GAURAV KARMAKAR
c609f66c02 Removed Build Artifact 2026-01-08 01:03:54 +05:30
Gaurav karmakar
91f254017b Merge branch 'master' into gaurav-v1 2026-01-08 00:47:22 +05:30
GAURAV KARMAKAR
1f5d3df0ae Merge branch 'master' of https://github.com/gaurav02081/ccextractor into gaurav-v1 2026-01-08 00:35:33 +05:30
Rahul Tripathi
e36d81c237 Git Cleanup: Update .gitignore and untrack build artifacts 2026-01-07 21:38:36 +05:30
Rahul Tripathi
8d338dc362 Fix DVB subtitle repeating bug: initialize nb_data 2026-01-07 21:37:23 +05:30
Rahul Tripathi
c78e01d186 Merge branch 'CCExtractor:master' into final 2026-01-06 12:31:17 +05:30
Chandragupt Singh
401ff6c105 docs: note Homebrew availability in changelog 2026-01-06 06:04:57 +05:30
Chandragupt Singh
83eb51ed6f docs: add Homebrew installation instructions 2026-01-06 06:01:56 +05:30
Carlos Fernandez
bce0c92fdd ci: Add Homebrew formula auto-bump workflow
Automatically creates a PR to homebrew-core when a new release
is published, updating the ccextractor formula to the new version.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 00:08:40 +01:00
Rahul Tripathi
ea4859fd54 Fix: Add split_dvb_subs to Options default 2026-01-05 21:39:54 +05:30
Rahul Tripathi
8d7890c743 Merge branch 'master' into final 2026-01-05 21:10:51 +05:30
Carlos Fernandez Sanz
477307e438 chore: Bump version to 0.96.5 2026-01-05 16:02:39 +01:00
Carlos Fernandez
4a4911bcec chore: Bump version to 0.96.5
Update version number across all packaging and build files for the
0.96.5 release.

Files updated:
- docs/CHANGES.TXT - Added changelog entry
- src/lib_ccx/lib_ccx.h - VERSION define
- linux/configure.ac - AC_INIT version
- mac/configure.ac - AC_INIT version
- OpenBSD/Makefile - V variable
- package_creators/PKGBUILD - pkgver
- package_creators/ccextractor.spec - Version
- package_creators/debian.sh - VERSION
- packaging/chocolatey/ccextractor.nuspec - version
- packaging/chocolatey/tools/chocolateyInstall.ps1 - URL
- packaging/winget/*.yaml - PackageVersion and URLs

Note: SHA256 checksums in chocolatey and winget files will need to be
updated after the MSI is built.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 12:44:06 +01:00
Carlos Fernandez Sanz
dc946168e7 Fix OOB read/write and length handling in CEA-608/708 decoders 2026-01-05 12:36:31 +01:00
Carlos Fernandez Sanz
3a60b1268b Merge pull request #1981 from CCExtractor/fix/epg-snprintf-buffer-warning
fix(epg): Silence snprintf buffer truncation warnings
2026-01-05 12:33:15 +01:00
Carlos Fernandez
e3d1c56ad0 fix(epg): Silence snprintf buffer truncation warnings
Extend EPG time string buffers from 21 to 74 bytes to silence
compiler warnings about potential buffer truncation.

The actual output is always 20 chars ("YYYYMMDDHHMMSS +0000") plus
null terminator, but the compiler warns because %02d with int
arguments could theoretically produce larger output.
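
Why the compiler is technically right can be shown with Rust's formatter, whose width specifiers behave like C's `%02d` in this respect: they pad but never truncate. A sketch, assuming a four-digit year field followed by five two-digit fields; `epg_time` is a hypothetical name.

```rust
/// Formats an XMLTV-style timestamp ("YYYYMMDDHHMMSS +0000").
/// The widths are minimums: short values are zero-padded, but an
/// out-of-range value is printed in full and lengthens the string.
fn epg_time(y: i32, mo: i32, d: i32, h: i32, mi: i32, s: i32) -> String {
    format!("{y:04}{mo:02}{d:02}{h:02}{mi:02}{s:02} +0000")
}
```

For valid calendar values the result is always 20 characters; only a nonsensical argument such as `i32::MIN` makes it longer, which is the case the warning is about.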

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 11:29:48 +01:00
Carlos Fernandez Sanz
b5bc0e2616 Fix OOB read/write in Teletext G0 charset remapping 2026-01-05 11:28:11 +01:00
Carlos Fernandez Sanz
600a9a0e75 Add support for raw CDP (Caption Distribution Packet) files 2026-01-05 10:55:59 +01:00
Amrit kumar Mahto
694b61f862 Fix OOB read/write in Teletext G0 charset remapping 2026-01-04 23:47:08 +05:30
Carlos Fernandez
86925727e0 Merge remote-tracking branch 'origin/master' into feat/issue-1406-raw-cdp-support 2026-01-04 17:20:04 +01:00
Carlos Fernandez Sanz
1c7515681e Fix MXF files containing CEA-708 captions not being detected/extracted 2026-01-04 17:17:33 +01:00
Carlos Fernandez Sanz
2bcac83761 Docs: Add Windows WSL build instructions 2026-01-04 14:33:34 +01:00
Carlos Fernandez
efc28d87d5 Trigger CI 2026-01-04 14:08:41 +01:00
Carlos Fernandez
b4d8e0ffaf Trigger CI 2026-01-04 14:08:26 +01:00
Carlos Fernandez
0b7b7fd031 Trigger CI 2026-01-04 12:56:44 +01:00
Carlos Fernandez
90041554a3 Fix Rust formatting and clippy issues
- Apply cargo fmt to decoder/mod.rs
- Fix clippy manual_flatten warning in build.rs by using .flatten()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 12:55:35 +01:00
Carlos Fernandez
6950a7661e Fix MXF files containing CEA-708 captions not being detected/extracted
Root cause: CCX_RAW_TYPE data from MXF demuxer was not being passed to
the DTVCC decoder, only to the legacy 608 decoder via process_raw_with_field.

Changes:
- general_loop.c: Changed CCX_RAW_TYPE handling to use process_cc_data
  instead of process_raw_with_field to properly invoke DTVCC decoder
- general_loop.c: Added DTVCC activation for MXF/GXF sources since they
  may contain 708 captions
- general_loop.c: Initialize timing from caption PTS when not set
- ccx_dtvcc.h: Added ccxr_dtvcc_set_active FFI declaration
- lib.rs: Added ccxr_dtvcc_set_active function to enable DTVCC decoder
- decoder/mod.rs: Fixed flush logic to always process visible windows
- ccx_demuxer_mxf.c: Fixed PTS calculation to use 90kHz units based on
  edit_rate, and changed verbose logging to debug()
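
For the PTS item, one plausible conversion from a frame index to 90 kHz units is sketched below. The commit only states that the calculation is based on edit_rate; the formula, the rational representation of the rate and the name `frame_to_pts_90khz` are assumptions.

```rust
/// Converts a frame index to a 90 kHz PTS given an edit rate of
/// `rate_num / rate_den` frames per second (for example 30000/1001).
/// Returns None for a zero rate numerator or on arithmetic overflow.
fn frame_to_pts_90khz(frame: u64, rate_num: u64, rate_den: u64) -> Option<u64> {
    if rate_num == 0 {
        return None;
    }
    frame
        .checked_mul(90_000)?
        .checked_mul(rate_den)
        .map(|ticks| ticks / rate_num)
}
```

At 30000/1001 fps, frame 30 lands at 90090 ticks, slightly more than one second, as expected for NTSC-style rates.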

Fixes #1647

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 11:17:54 +01:00
Carlos Fernandez
41fb966f6f Add support for raw CDP (Caption Distribution Packet) files
Adds support for processing raw CDP files captured from SDI VANC
(e.g., from Blackmagic Decklink capture cards). CDP packets are
automatically detected by their 0x9669 identifier when using -in=raw.

Changes:
- Added process_raw_cdp() function to parse concatenated CDP packets
- Added CDP format detection in raw_loop() (checks for 0x9669 header)
- Extracts cc_data triplets from CDP packets and processes them
  through process_cc_data() for both CEA-608 and CEA-708 support
- Calculates timing based on CDP frame rate and packet count

Usage:
  ccextractor -in=raw captured_vanc.bin -o output.srt
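
The detection step can be sketched in Rust. Only the 0x9669 identifier comes from this message; treating the byte after it as the total packet length follows my reading of the CDP layout and should be taken as an assumption, and both function names are hypothetical.

```rust
/// Two-byte identifier that opens every CDP packet.
const CDP_IDENTIFIER: [u8; 2] = [0x96, 0x69];

/// True when the buffer starts with the CDP identifier.
fn looks_like_cdp(buf: &[u8]) -> bool {
    buf.starts_with(&CDP_IDENTIFIER)
}

/// Walks concatenated packets and counts them.
/// Assumption: byte 2 holds the total length of the packet in bytes.
fn count_cdp_packets(mut buf: &[u8]) -> usize {
    let mut count = 0;
    while looks_like_cdp(buf) && buf.len() >= 3 {
        let len = buf[2] as usize;
        // Stop on a length that is impossible or runs past the data.
        if len < 3 || len > buf.len() {
            break;
        }
        count += 1;
        buf = &buf[len..];
    }
    count
}
```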

Fixes #1406

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 09:54:37 +01:00
Carlos Fernandez
04ed95f8b5 Fix MXF files containing CEA-708 captions not being detected/extracted
Root cause: CCX_RAW_TYPE data from MXF demuxer was not being passed to
the DTVCC decoder, only to the legacy 608 decoder via process_raw_with_field.

Changes:
- general_loop.c: Changed CCX_RAW_TYPE handling to use process_cc_data
  instead of process_raw_with_field to properly invoke DTVCC decoder
- general_loop.c: Added DTVCC activation for MXF/GXF sources since they
  may contain 708 captions
- general_loop.c: Initialize timing from caption PTS when not set
- ccx_dtvcc.h: Added ccxr_dtvcc_set_active FFI declaration
- lib.rs: Added ccxr_dtvcc_set_active function to enable DTVCC decoder
- decoder/mod.rs: Fixed flush logic to always process visible windows
- ccx_demuxer_mxf.c: Fixed PTS calculation to use 90kHz units based on
  edit_rate, and changed verbose logging to debug()

Fixes #1647

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 09:54:20 +01:00
Carlos Fernandez
ddf29672fd Fix MXF files containing CEA-708 captions not being detected/extracted
Root cause: CCX_RAW_TYPE data from MXF demuxer was not being passed to
the DTVCC decoder, only to the legacy 608 decoder via process_raw_with_field.

Changes:
- general_loop.c: Changed CCX_RAW_TYPE handling to use process_cc_data
  instead of process_raw_with_field to properly invoke DTVCC decoder
- general_loop.c: Added DTVCC activation for MXF/GXF sources since they
  may contain 708 captions
- general_loop.c: Initialize timing from caption PTS when not set
- ccx_dtvcc.h: Added ccxr_dtvcc_set_active FFI declaration
- lib.rs: Added ccxr_dtvcc_set_active function to enable DTVCC decoder
- decoder/mod.rs: Fixed flush logic to always process visible windows
- ccx_demuxer_mxf.c: Fixed PTS calculation to use 90kHz units based on
  edit_rate, and changed verbose logging to debug()

Fixes #1647

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 09:53:30 +01:00
Kurma Ritish
0890e06d84 docs: add Windows WSL build instructions 2026-01-04 08:47:48 +00:00
Carlos Fernandez Sanz
8c33412888 Merge pull request #1971 from ujjwalr27/scc-accurate-timing
Tested with Broadcast Source sample from issue #1120. Pre-roll timing calculation works correctly, output structure matches broadcast reference patterns.
2026-01-03 21:50:43 +01:00
ujjwalr27
f40294cc5c minor fix 2026-01-03 23:38:16 +05:30
ujjwalr27
22d5d35158 Fix SCC accurate timing: separate load/display timestamps, skip clear commands, pass YouTube validation 2026-01-03 22:38:16 +05:30
Amrit kumar Mahto
51cae1c2f0 Fix OOB read/write and length handling in CEA-608/708 decoders 2026-01-03 17:42:38 +05:30
Carlos Fernandez Sanz
dfaebd5db8 Merge pull request #1968 from THE-Amrit-mahto-05/fix/dtvcc-critical-bugs
fix DTVCC: Heap Buffer Overflow & Out-of-Bounds Read
2026-01-03 11:54:19 +01:00
Carlos Fernandez Sanz
cfa7d912ca fix(rust): Flush stdout after print to fix stream mode display 2026-01-03 11:38:25 +01:00
Carlos Fernandez
ad971f0e72 fix(rust): Flush stdout after print to fix stream mode display
When using --input <format>, the startup output showed [Stream mode: ]
(empty) instead of showing the format name like [Stream mode: SCC].

Root cause: The Rust logger's print() function uses print!() which
doesn't automatically flush stdout. When mixing C and Rust code that
both write to stdout, the Rust output was getting buffered and not
appearing before the C code continued writing.

The fix adds explicit std::io::stdout().flush() after each print!()
call to ensure output appears immediately and interleaves correctly
with C code.
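
A minimal sketch of the write-then-flush pattern, using only the standard library. `write_flushed` and `print_flushed` are hypothetical names, not the logger's actual functions.

```rust
use std::io::{self, Write};

/// Writes `msg` and flushes immediately, so the text reaches the
/// underlying stream before any other writer continues.
fn write_flushed<W: Write>(out: &mut W, msg: &str) -> io::Result<()> {
    out.write_all(msg.as_bytes())?;
    out.flush()
}

/// Convenience wrapper for stdout, equivalent to print!() plus flush().
fn print_flushed(msg: &str) -> io::Result<()> {
    write_flushed(&mut io::stdout().lock(), msg)
}
```

Making the helper generic over `Write` keeps it testable with an in-memory buffer.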

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 23:22:46 +01:00
Carlos Fernandez Sanz
8aadbfb5f2 feat: Add --input scc option for SCC input format 2026-01-02 23:09:56 +01:00
Amrit kumar Mahto
44eb665cd8 chore: apply clang-format fixes 2026-01-03 03:12:19 +05:30
Amrit kumar Mahto
1255b318ae [FIX] Remove dead safety checks per reviewer feedback 2026-01-03 03:06:23 +05:30
Carlos Fernandez
1b0e66bc67 feat: Add --input scc option for SCC input format
Add support for `--input scc` command line option to explicitly specify
SCC (Scenarist Closed Caption) input format, for consistency with other
input format options.

Changes:
- Add `Scc` variant to `InFormat` enum in args.rs
- Handle `InFormat::Scc` in parser.rs to set StreamMode::Scc
- Add `StreamMode::Scc` case in print_cfg() in both Rust and C code

Fixes #1972

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 21:45:08 +01:00
Carlos Fernandez Sanz
f5dc1cf467 fix: Make --quiet flag work again 2026-01-02 21:35:42 +01:00
ujjwalr27
aaf937a135 Fix rustfmt style issues in lib_ccxr 2026-01-03 01:05:59 +05:30
ujjwalr27
317c66f14e Fix clang-format style issues 2026-01-03 01:02:19 +05:30
ujjwalr27
946c5859d4 Add --scc-accurate-timing option for bandwidth-aware SCC output (fixes #1120) 2026-01-03 00:28:16 +05:30
ujjwalr27
7166e48698 Add --scc-accurate-timing option for bandwidth-aware SCC output (fixes #1120) 2026-01-03 00:27:17 +05:30
Carlos Fernandez
d31ea87c03 fix: Make --quiet flag work again
The --quiet flag was broken due to two issues:

1. Inverted mapping in Rust FFI: The C→Rust constant mapping was wrong.
   CCX_MESSAGES_QUIET=0, CCX_MESSAGES_STDOUT=1, CCX_MESSAGES_STDERR=2
   but the Rust code mapped 0→Stdout, 1→Stderr, 2→Quiet.

2. Logger initialization timing: The Rust logger was initialized BEFORE
   command-line arguments were parsed, so --quiet had no effect.

Changes:
- Fix the OutputTarget mapping in ccxr_init_basic_logger()
- Add set_target() method to CCExtractorLogger
- Add ccxr_update_logger_target() to update logger after arg parsing
- Call ccxr_update_logger_target() after ccxr_parse_parameters()

Fixes #1956
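
The first issue is easy to see side by side. A sketch, assuming an enum shaped like the `OutputTarget` named above; the two function names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum OutputTarget {
    Quiet,
    Stdout,
    Stderr,
}

/// Correct mapping of the C constants (QUIET=0, STDOUT=1, STDERR=2).
fn target_from_c(value: i32) -> Option<OutputTarget> {
    match value {
        0 => Some(OutputTarget::Quiet),
        1 => Some(OutputTarget::Stdout),
        2 => Some(OutputTarget::Stderr),
        _ => None,
    }
}

/// The inverted mapping described in issue 1, kept for comparison.
fn target_from_c_inverted(value: i32) -> Option<OutputTarget> {
    match value {
        0 => Some(OutputTarget::Stdout),
        1 => Some(OutputTarget::Stderr),
        2 => Some(OutputTarget::Quiet),
        _ => None,
    }
}
```

With the inverted table, a request for quiet output (0) selected stdout instead.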

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 19:49:06 +01:00
Amrit kumar Mahto
028ce9d0b5 [FIX] DTVCC: Heap Overflow & OOB Read 2026-01-02 18:33:26 +05:30
Amrit kumar Mahto
cc7a43b5e2 [FIX] Teletext decoder: fix OOB read/write and loop overflow (#1965) 2026-01-02 18:09:15 +05:30
Amrit kumar Mahto
3e1424cda8 Fix TS/ES: Integer overflow, stack overflow, heap over-read 2026-01-02 17:52:25 +05:30
Amrit kumar Mahto
82109e6cd9 Fix DTVCC structural type confusion and OOB writes (#1961) 2026-01-02 17:27:15 +05:30
Amrit kumar Mahto
5dc8292dd2 Fix out-of-bounds read in H.264 SEI parsing 2026-01-02 16:58:09 +05:30
Carlos Fernandez Sanz
a5b8bc8bf6 fix(rust): Update palette crate to 0.7 for Fedora compatibility 2026-01-02 10:00:00 +01:00
Rahul Tripathi
29158b2c38 Merge branch 'master' into final 2026-01-02 14:18:45 +05:30
Carlos Fernandez
ad2ee70743 fix(rust): Update palette crate to 0.7 for Fedora compatibility
The palette crate renamed `to_positive_degrees()` to `into_positive_degrees()`
in version 0.7.0. This was causing build failures on Fedora which uses
system-packaged Rust crates with newer versions.

Changes:
- Update palette dependency from 0.6.1 to 0.7
- Change method call from to_positive_degrees() to into_positive_degrees()

Fixes build failure reported in #1954.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 08:11:47 +01:00
Carlos Fernandez Sanz
562de8893b Merge pull request #1953 from THE-Amrit-mahto-05/fix/ts-heap-overflow
Fix/ts heap overflow
2026-01-02 08:09:39 +01:00
Carlos Fernandez Sanz
12adb5e92b fix(ci): Fix Windows CI cargo build cache path 2026-01-02 08:06:22 +01:00
Carlos Fernandez Sanz
203eb23030 fix(build): Support FFMPEG_INCLUDE_DIR on Linux for hardsubx 2026-01-02 08:02:46 +01:00
Amrit Kumar Mahto
774c3a0d3a Update CHANGES.TXT 2026-01-02 04:31:39 +05:30
Amrit Kumar Mahto
07f1ddc3fe Fix capbufsize and capbuflen assignments to use size_t 2026-01-02 04:26:23 +05:30
Carlos Fernandez
303bec8d5d fix(build): Support FFMPEG_INCLUDE_DIR on Linux for hardsubx
The FFMPEG_INCLUDE_DIR environment variable was only checked inside
the macOS-specific block, so it had no effect on Linux builds.

Changes:
- Move FFMPEG_INCLUDE_DIR check outside platform-specific blocks so
  it works on all platforms
- Add pkg-config fallback on Linux to automatically find FFmpeg
  include paths

This fixes compilation on systems like Fedora where FFmpeg headers
are installed in non-standard locations (e.g., /usr/include/ffmpeg).

Fixes #1954

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 23:24:44 +01:00
Amrit kumar Mahto
e43a6b5ced Fix TS Heap Buffer Overflow in copy_payload_to_capbuf (ts_functions.c) 2026-01-02 00:59:31 +05:30
Amrit kumar Mahto
64484af49e [FIX] Prevent stack buffer overflow in ISDB-CC decoder parse_csi 2026-01-02 00:40:07 +05:30
Amrit kumar Mahto
7526da884c Prevent integer overflow in EIA-608 screen buffer reallocation 2026-01-01 23:20:25 +05:30
Carlos Fernandez Sanz
3529bb29b4 fix(avc): Remove unnecessary TODO for idr_pic_id 2026-01-01 13:02:25 +01:00
Carlos Fernandez
925560f773 fix(avc): Remove unnecessary TODO for idr_pic_id
The idr_pic_id is read to advance the bitstream position (required for
correct parsing of subsequent fields), but the value itself is not
needed for caption extraction. CCExtractor uses pic_order_cnt_lsb for
frame ordering and PTS for timing - idr_pic_id serves no purpose here.

Closes #1895

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 12:58:55 +01:00
Carlos Fernandez
200eb1750a fix(ci): Fix Windows CI cargo build cache path
- Fix cargo build cache path: rust.bat sets CARGO_TARGET_DIR to the
  windows/ directory, which results in artifacts at
  windows/x86_64-pc-windows-msvc/, not windows/target/
- Remove redundant CARGO_TARGET_DIR from build steps since rust.bat
  overrides it anyway

Note: vcpkg.json builtin-baseline intentionally not changed to avoid
breaking transitive dependencies (libxml2 etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 12:44:18 +01:00
Carlos Fernandez Sanz
6dcdb4b2d8 chore: Bump version to 0.96.4 2026-01-01 10:52:36 +01:00
Carlos Fernandez Sanz
a2d2c4f063 Merge branch 'master' into release/0.96.4 2026-01-01 10:39:12 +01:00
Carlos Fernandez
4ab6c83c27 chore: Bump version to 0.96.4
Update version numbers across all packaging and build files for the
0.96.4 release.

Changes in 0.96.4:
- New: Persistent CEA-708 decoder context
- New: OCR character blacklist options
- New: OCR line-split option
- Fix: 32-bit build failures (i686, armv7l)
- Fix: Legacy argument compatibility (-1, -2, -12, --sc, --svc)
- Fix: Prevent heap buffer overflow in Teletext (security)
- Fix: Lazy OCR initialization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 10:17:56 +01:00
Carlos Fernandez Sanz
e66a0183c3 Merge pull request #1941 from Harshdhall01/cleanup-rust-todos
[RUST] Document EIA-708 buffer size and remove debug logging
2026-01-01 09:59:22 +01:00
Carlos Fernandez Sanz
a8ec28630a Merge pull request #1934 from THE-Amrit-mahto-05/fix/teletext-overflow
prevent heap buffer overflow in Teletext demux path
2026-01-01 09:53:01 +01:00
Carlos Fernandez Sanz
432d4237ec ci(windows): Optimize Windows build workflow for faster CI 2026-01-01 09:42:19 +01:00
Carlos Fernandez
e9519c4a67 fix(ci): Remove broken Chocolatey caching for GPAC
The Chocolatey cache only stored package metadata, not the actual
installed SDK files at C:\Program Files\GPAC\sdk\include. This caused
build failures when the cache hit but GPAC headers weren't available.

GPAC install is fast (~30s) so caching isn't worth the complexity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 09:31:11 +01:00
Carlos Fernandez Sanz
fef005ddaf perf(dvb): Lazy OCR initialization for DVB subtitle decoder 2026-01-01 02:48:22 +01:00
Carlos Fernandez
546c776e57 ci(windows): Optimize Windows build workflow for faster CI
Major optimizations to reduce Windows build time from ~45 min to ~10 min:

1. **Single consolidated job** - Previously two parallel jobs (Release/Debug)
   duplicated the entire 34-minute vcpkg install. Now builds both
   configurations sequentially in one job, sharing all cached dependencies.

2. **lukka/run-vcpkg action** - Replaces manual git clone + bootstrap with
   the official vcpkg action that has built-in caching and better handling.

3. **Cache vcpkg installed packages** - Separately cache the installed/
   directory with hash-based keys for faster cache hits.

4. **Cargo caching** - Add caching for Rust registry and build artifacts,
   similar to the Linux build workflow.

5. **Chocolatey caching** - Cache gpac package to skip download on hits.

6. **Conditional installs** - Skip vcpkg install and choco install when
   cache is available.

7. **Updated Rust toolchain action** - Replace deprecated actions-rs/toolchain
   with dtolnay/rust-toolchain.

Expected improvements:
- Cold build: ~20 minutes (down from ~45 min)
- Warm build (cache hit): ~5-10 minutes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 02:03:35 +01:00
Carlos Fernandez Sanz
daeed5df71 fix(args): Add legacy aliases for backwards compatibility 2026-01-01 01:49:59 +01:00
Carlos Fernandez
b56ab005a8 perf(dvb): Lazy OCR initialization for DVB subtitle decoder
Previously, Tesseract OCR was initialized eagerly when a DVB subtitle
stream was detected in the transport stream. This caused ~10 second
startup overhead even for files that:
- Have DVB streams but no actual bitmap subtitles
- Have DVB streams alongside CEA-608 text captions (which don't need OCR)
- Have DVB streams but the user only wants raw bitmap output

The initialization also created OpenMP worker threads that generated
hundreds of thousands of futex syscalls, causing valgrind tests to
take 15+ minutes instead of seconds.

This change defers OCR initialization until a DVB bitmap region actually
needs to be processed with OCR. Benefits:

- Files with DVB streams but no bitmap content: 10s → 0.1s
- Files with DVB + CEA-608 captions: 10s → 1-3s
- Valgrind test performance: 15+ min → seconds (no thread pool overhead
  when OCR isn't used)

The ocr_initialized flag ensures init_ocr() is called only once, on
first bitmap encounter.
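
The deferred-initialization pattern, sketched in Rust. The actual change is in the C decoder and uses an `ocr_initialized` flag; `OcrEngine`, `DvbDecoder` and the `init_calls` counter below are hypothetical stand-ins used only to make the behavior observable.

```rust
/// Hypothetical stand-in for an expensive OCR engine handle.
struct OcrEngine;

struct DvbDecoder {
    ocr: Option<OcrEngine>,
    /// Counts how often the costly initialization actually ran.
    init_calls: u32,
}

impl DvbDecoder {
    /// Nothing expensive happens when the stream is detected.
    fn new() -> Self {
        DvbDecoder { ocr: None, init_calls: 0 }
    }

    /// Returns the engine, creating it on first use only.
    fn ocr(&mut self) -> &OcrEngine {
        if self.ocr.is_none() {
            self.init_calls += 1;
            self.ocr = Some(OcrEngine);
        }
        self.ocr.as_ref().expect("initialized above")
    }

    /// Called only when a bitmap region really needs OCR.
    fn process_bitmap(&mut self, _pixels: &[u8]) {
        let _engine = self.ocr();
        // Recognition would happen here.
    }
}
```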

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 01:26:27 +01:00
Carlos Fernandez
f1681ee929 fix(args): Add support for legacy -1, -2, -12 numeric options
Map legacy CEA-608 field extraction options to their modern equivalents:
- -1  → --output-field=1 (extract field 1 only)
- -2  → --output-field=2 (extract field 2 only)
- -12 → --output-field=12 (extract both fields)

These options are documented in the help text and were commonly used
but stopped working after the Rust argument parser migration.
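
The mapping listed above amounts to a small lookup. A sketch with a hypothetical function name; how the real parser wires this in is not shown here.

```rust
/// Translates the legacy numeric flags to their modern spelling and
/// leaves every other argument untouched.
fn map_legacy_field_option(arg: &str) -> String {
    match arg {
        "-1" => "--output-field=1".to_string(),
        "-2" => "--output-field=2".to_string(),
        "-12" => "--output-field=12".to_string(),
        other => other.to_string(),
    }
}
```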

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 01:02:54 +01:00
Carlos Fernandez
031f463b5c fix(args): Add legacy aliases for backwards compatibility
Add aliases for options that were commonly used with single-dash
or without hyphens in older versions of ccextractor:

- --parsePAT: add alias "pat" (for -pat)
- --parsePMT: add alias "pmt" (for -pmt)
- --no-teletext: add alias "noteletext" (for -noteletext)
- --no-rollup: add alias "noru" (for -noru)
- --no-bom: add alias "nobom" (for -nobom)
- --no-autotimeref: add alias "noautotimeref" (for -noautotimeref)
- --no-scte20: add alias "noscte20" (for -noscte20)

These aliases, combined with normalize_legacy_option() which converts
single-dash to double-dash (e.g., -noteletext -> --noteletext), allow
old scripts using legacy syntax to continue working.
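
The single-dash to double-dash conversion can be sketched as below. This is not the body of `normalize_legacy_option()`; the rules chosen here (leave one-letter short options and already-normalized options alone, require a leading letter) are assumptions, and `normalize_legacy_sketch` is a hypothetical name.

```rust
/// Rewrites "-name" to "--name" for multi-character legacy options.
fn normalize_legacy_sketch(arg: &str) -> String {
    match arg.strip_prefix('-') {
        // Exactly one leading dash followed by a multi-character name
        // that starts with a letter: "-noteletext" -> "--noteletext".
        Some(rest)
            if rest.len() > 1
                && rest.starts_with(|c: char| c.is_ascii_alphabetic()) =>
        {
            format!("--{rest}")
        }
        // Short options, long options and plain arguments pass through.
        _ => arg.to_string(),
    }
}
```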

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 00:42:23 +01:00
Carlos Fernandez Sanz
b23866f5a8 feat(rust): Add persistent DtvccRust context for CEA-708 decoder 2026-01-01 00:21:40 +01:00
Carlos Fernandez
2ec93c3d3d fix(rust): Check dtvcc_rust instead of dtvcc in ccxr_process_cc_data
When Rust CEA-708 decoder is enabled, dec_ctx.dtvcc is set to NULL
and dec_ctx.dtvcc_rust holds the actual DtvccRust context. The null
check was incorrectly checking dtvcc, causing the function to return
early and skip all CEA-708 data processing.

This fixes tests 21, 31, 32, 105, 137, 141-149 which were failing
with exit code 10 (EXIT_NO_CAPTIONS) because no captions were being
extracted from CEA-708 streams.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 19:47:24 +01:00
Harshdhall01
5564aa8a54 Merge upstream/master and resolve CHANGES.TXT conflict 2025-12-31 23:51:24 +05:30
Harshdhall01
868fac5423 Update CHANGES.TXT with Rust documentation improvements 2025-12-31 23:33:49 +05:30
Harshdhall01
9ca26171d6 Document EIA-708 buffer size and remove debug logging
- Added documentation for EIA_708_BUFFER_LENGTH explaining that 2048 bytes
  is 16x the CEA-708 specification minimum of 128 bytes per service
- Removed debug logging of target address from target.rs as per TODO
- References CEA-708-E Section 8.4.3 for buffer specifications

Addresses two TODO items in the Rust codebase cleanup effort.
2025-12-31 23:24:39 +05:30
Carlos
ead4cbb278 fix(rust): remove double-increment of cb_708 counter
The cb_708 counter was being incremented twice for each CEA-708 data block:
1. In do_cb_dtvcc_rust() in Rust (src/rust/src/lib.rs)
2. In do_cb() in C (src/lib_ccx/ccx_decoders_common.c)

Since FTS calculation uses cb_708 (fts = fts_now + fts_global + cb_708 * 1001 / 30),
the double-increment caused timestamps to advance ~2x as fast as expected,
resulting in incorrect milliseconds in start timestamps.

This fix removes the increment from the Rust code since the C code already
handles it in do_cb().

Fixes timestamp issues reported in PR #1782 tests where start times like
00:00:20,688 were incorrectly output as 00:00:20,737.
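
The effect of the double increment on the FTS formula quoted above can be checked with a short sketch. Integer arithmetic and millisecond units are assumptions, and both function names are hypothetical.

```rust
/// FTS following the formula quoted in this message:
/// fts = fts_now + fts_global + cb_708 * 1001 / 30
fn fts_ms(fts_now: i64, fts_global: i64, cb_708: i64) -> i64 {
    fts_now + fts_global + cb_708 * 1001 / 30
}

/// Simulates the counter after `blocks` data blocks, with the given
/// number of increments applied per block (1 = fixed, 2 = the bug).
fn cb_708_after(blocks: u32, increments_per_block: i64) -> i64 {
    let mut cb_708 = 0;
    for _ in 0..blocks {
        cb_708 += increments_per_block;
    }
    cb_708
}
```

After 30 blocks the single increment contributes 1001 to the result, while the double increment contributes 2002, so the caption clock runs twice as fast.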

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:18:13 +01:00
Carlos
dfd7101f54 chore: Remove plan file from repo and add plans/ to .gitignore
- Move PLAN_PR1618_REIMPLEMENTATION.md to local plans/ folder
- Add plans/ to .gitignore to keep plans local

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:18:13 +01:00
Carlos
9659d3cf4c fix(rust): Use persistent DtvccRust context in ccxr_process_cc_data
The ccxr_process_cc_data function was still accessing dec_ctx.dtvcc
(which is NULL when Rust is enabled), causing a null pointer panic.

Changed to use dec_ctx.dtvcc_rust (the persistent DtvccRust context)
instead, which fixes the crash when processing CEA-708 data.

Added do_cb_dtvcc_rust() function that works with DtvccRust instead
of the old Dtvcc struct.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:18:13 +01:00
Carlos
34c7cd6d2e style(c): Fix clang-format issues in Phase 3 code
- Remove extra space before comment in ccx_decoders_common.c
- Fix comment indentation in mp4.c

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:16:31 +01:00
Carlos
7448a260c7 feat(c): Use Rust CEA-708 decoder in C code (Phase 3)
- init_cc_decode(): Initialize dtvcc_rust via ccxr_dtvcc_init()
- dinit_cc_decode(): Free dtvcc_rust via ccxr_dtvcc_free()
- flush_cc_decode(): Flush via ccxr_flush_active_decoders()
- general_loop.c: Set encoder via ccxr_dtvcc_set_encoder() (3 locations)
- mp4.c: Use ccxr_dtvcc_set_encoder() and ccxr_dtvcc_process_data()
- Add ccxr_dtvcc_is_active() declaration to ccx_dtvcc.h
- Fix clippy warnings in tv_screen.rs (unused assignments)
- All changes guarded with #ifndef DISABLE_RUST
- Update implementation plan to mark Phase 3 complete

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:16:31 +01:00
Carlos
54236f840c feat(c): Add C header declarations for Rust CEA-708 FFI (Phase 2)
- Add void *dtvcc_rust field to lib_cc_decode struct
- Declare ccxr_dtvcc_init, ccxr_dtvcc_free, ccxr_dtvcc_process_data in ccx_dtvcc.h
- Declare ccxr_dtvcc_set_encoder in lib_ccx.h
- Declare ccxr_flush_active_decoders in ccx_decoders_common.h
- All declarations guarded with #ifndef DISABLE_RUST
- Update implementation plan to mark Phase 2 complete

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:16:31 +01:00
Carlos Fernandez Sanz
f2aeef167b feat(ocr): Add character blacklist and line-split options for better accuracy 2025-12-31 14:16:15 +01:00
Carlos
6a4a1c97ec fix(rust): Address PR review - use existing DTVCC_MAX_SERVICES constant
- Remove duplicate CCX_DTVCC_MAX_SERVICES constant from decoder/mod.rs
- Import existing DTVCC_MAX_SERVICES from lib_ccxr::common
- Fix clippy uninlined_format_args warnings in avc/core.rs and decoder/mod.rs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:15:29 +01:00
Carlos
f369959096 style(rust): Apply cargo fmt formatting
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:15:29 +01:00
Carlos
1c2bcb5088 feat(rust): Add persistent DtvccRust context for CEA-708 decoder (Phase 1)
This is Phase 1 of the fix for issue #1499. It adds the Rust-side
infrastructure for a persistent CEA-708 decoder context without
modifying any C code, ensuring backward compatibility.

Problem:
The current Rust CEA-708 decoder creates a new Dtvcc struct on every
call to ccxr_process_cc_data(), causing all state to be reset. This
breaks stateful caption processing.

Solution:
Add a new DtvccRust struct that:
- Owns its decoder state (rather than borrowing from C)
- Persists across processing calls
- Is managed via FFI functions callable from C

Changes:
- Add DtvccRust struct in decoder/mod.rs with owned decoders
- Add CCX_DTVCC_MAX_SERVICES constant (63)
- Add FFI functions in lib.rs:
  - ccxr_dtvcc_init(): Create persistent context
  - ccxr_dtvcc_free(): Free context and all owned memory
  - ccxr_dtvcc_set_encoder(): Set encoder (not available at init)
  - ccxr_dtvcc_process_data(): Process CC data
  - ccxr_flush_active_decoders(): Flush all active decoders
  - ccxr_dtvcc_is_active(): Check if context is active
- Add unit tests for DtvccRust
- Use heap allocation for large structs to avoid stack overflow

The existing Dtvcc struct and ccxr_process_cc_data() remain unchanged
for backward compatibility. Phase 2-3 will add C header declarations
and modify C code to use the new functions.

Fixes: #1499 (partial)
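The ownership pattern described above can be sketched in a few lines. This is only an illustration, not the code in decoder/mod.rs or lib.rs: `DemoCtx`, `demo_init`, `demo_process` and `demo_free` are hypothetical stand-ins for `DtvccRust` and the `ccxr_dtvcc_*` functions, and the only state kept here is a byte counter.

```rust
/// Hypothetical stand-in for a decoder context that must survive between calls.
pub struct DemoCtx {
    bytes_seen: u64,
}

/// Create the context on the heap and return an opaque pointer the C side can store.
/// (A real export would also need a no_mangle attribute; omitted in this sketch.)
pub extern "C" fn demo_init() -> *mut DemoCtx {
    Box::into_raw(Box::new(DemoCtx { bytes_seen: 0 }))
}

/// Process one chunk. State accumulates because the same context is reused
/// instead of being rebuilt on every call.
///
/// # Safety
/// `ctx` must be null or a pointer returned by `demo_init` that was not freed yet.
pub unsafe extern "C" fn demo_process(ctx: *mut DemoCtx, len: usize) -> u64 {
    match unsafe { ctx.as_mut() } {
        Some(c) => {
            c.bytes_seen += len as u64;
            c.bytes_seen
        }
        None => 0,
    }
}

/// Free the context and everything it owns.
///
/// # Safety
/// `ctx` must be null or a pointer returned by `demo_init`, and must not be used afterwards.
pub unsafe extern "C" fn demo_free(ctx: *mut DemoCtx) {
    if !ctx.is_null() {
        drop(unsafe { Box::from_raw(ctx) });
    }
}
```

Calling `demo_process` twice on the same pointer returns a running total, which is the behaviour a per-call struct cannot provide.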

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:15:29 +01:00
Carlos Fernandez Sanz
da79ee44d9 fix(rust): Fix 32-bit build failures (i686, armv7l) 2025-12-31 13:16:17 +01:00
Carlos Fernandez Sanz
26434a7f89 fix(args): Add --sc alias for --sentencecap for backwards compatibility 2025-12-31 13:02:50 +01:00
Carlos Fernandez
718eb1a37f fix(args): Add --sc alias for --sentencecap for backwards compatibility
The -sc flag was used in older versions (0.94 and earlier) for sentence
capitalization. The Rust argument parser only accepts --sentencecap now.
This adds --sc as an alias to maintain backwards compatibility with
older documentation and user scripts.

Related to #1917

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 12:57:42 +01:00
Carlos Fernandez
ace6361bfb fix(rust): Fix armv7l build failure with 64-bit literal
The literal `0xcdcdcdcdcdcdcdcd` is a 64-bit value used as a "poison"
pattern to detect uninitialized pointers. On 32-bit systems like
armv7l, this causes a compile error because `usize` is only 32 bits.

The fix defines a platform-appropriate constant:
- 64-bit: 0xcdcdcdcdcdcdcdcd
- 32-bit: 0xcdcdcdcd

Fixes #1938
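A platform-appropriate constant of this kind can be written with `cfg(target_pointer_width)`. The sketch below assumes the names `POISON_PATTERN` and `looks_poisoned`, which are illustrative and not taken from the CCExtractor sources.

```rust
/// Poison pattern sized to the pointer width of the target (hypothetical name).
#[cfg(target_pointer_width = "64")]
pub const POISON_PATTERN: usize = 0xcdcd_cdcd_cdcd_cdcd;
#[cfg(target_pointer_width = "32")]
pub const POISON_PATTERN: usize = 0xcdcd_cdcd;

/// Returns true when a pointer still holds the "uninitialized" fill pattern.
pub fn looks_poisoned<T>(ptr: *const T) -> bool {
    (ptr as usize) == POISON_PATTERN
}
```

Only one of the two definitions is compiled for a given target, so the literal always fits in `usize`.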

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 12:46:39 +01:00
Carlos Fernandez
7041441d39 fix(rust): Fix 32-bit x86 (i686) build failure
The code was using `std::arch::x86_64::*` unconditionally for both
x86 and x86_64 architectures. On 32-bit x86 (i686), the correct
module is `std::arch::x86`, not `std::arch::x86_64`.

This caused a build failure on i686:
  error[E0432]: unresolved import `std::arch::x86_64`

The fix uses separate conditional imports:
- `std::arch::x86::*` for 32-bit x86
- `std::arch::x86_64::*` for 64-bit x86_64

Both modules provide the same SSE2 intrinsics used by find_next_zero().

Fixes #1937
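The conditional imports can be sketched as follows. `first_zero` and `first_zero_sse2` are hypothetical helpers written for this note; they are not the project's `find_next_zero()`, and the scalar fallback is an assumption added so the sketch also builds on non-x86 targets.

```rust
// Pick the intrinsics module that matches the target architecture.
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Index of the first zero byte in `data`, if any.
pub fn first_zero(data: &[u8]) -> Option<usize> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse2") {
            return unsafe { first_zero_sse2(data) };
        }
    }
    // Scalar fallback for other architectures or CPUs without SSE2.
    data.iter().position(|&b| b == 0)
}

/// SSE2 path: compare 16 bytes at a time against zero.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2")]
unsafe fn first_zero_sse2(data: &[u8]) -> Option<usize> {
    let zero = _mm_setzero_si128();
    let mut offset = 0;
    while offset + 16 <= data.len() {
        let chunk = _mm_loadu_si128(data.as_ptr().add(offset) as *const __m128i);
        // Each set bit in the mask marks a byte equal to zero.
        let mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if mask != 0 {
            return Some(offset + mask.trailing_zeros() as usize);
        }
        offset += 16;
    }
    // Handle the tail that does not fill a whole 16-byte chunk.
    data[offset..].iter().position(|&b| b == 0).map(|i| offset + i)
}
```

The same intrinsic names resolve from either module, so only the `use` lines differ between i686 and x86_64.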

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 12:42:12 +01:00
Rahul-2k4
1589c31774 fix: Revert credits text deep-copy to fix CI startcredits regressions 2025-12-31 15:23:55 +05:30
Rahul-2k4
c96d3ff3f1 fix(encoder): Deep copy start/end credits text to prevent memory corruption
The start_credits_text and end_credits_text pointers were being copied
directly from the encoder config options, but free_encoder_context()
would later free them. This caused memory corruption when the pointers
referred to memory owned by ccx_options.

Now these strings are deep-copied in init_encoder() so each encoder
context owns its own copy, fixing the --startcreditstext regression.
2025-12-31 14:18:29 +05:30
Rahul-2k4
598a48e260 style: Apply clang-format to pass CI formatting check 2025-12-31 12:45:56 +05:30
Rahul-2k4
0cc3626261 ci: Trigger workflow run 2025-12-31 12:18:27 +05:30
Rahul-2k4
e0e66bd0ba style: Apply clang-format and update CHANGES.TXT
- Run clang-format on all source files to fix CI formatting check
- Add Issue #447 DVB multi-stream feature to CHANGES.TXT
2025-12-31 12:08:56 +05:30
Rahul-2k4
2642ca8805 Merge upstream/master into final branch
Resolves conflicts while preserving Issue #447 fix for DVB multi-stream handling:
- Kept DVB metadata update logic in ts_tables.c for split mode
- Adapted to upstream's single-param dvbsub_init_decoder signature
- Updated lib_ccx.c and general_loop.c to match new API
2025-12-31 11:42:08 +05:30
Rahul-2k4
a108302dc0 fix(dvb): Reinitialize decoder after PAT change for continuous extraction
After PAT changes, the pipeline's decoder was NULLed out to prevent
crashes, but this caused all subsequent DVB data to be skipped.

Now the decoder is reinitialized when detected as NULL, allowing
subtitle extraction to continue across PAT changes.
2025-12-31 11:19:56 +05:30
Rahul-2k4
ce90b61923 fix(dvb): Add NULL checks to prevent crash after PAT change
Fixes segmentation fault at 99% when PAT changes occur during DVB
subtitle processing. The crash happened because decoder context
private_data was freed but still accessed.

Changes:
- Add NULL check in process_data() before dvbsub_decode call
- Add defensive NULL check at start of dvbsub_decode()
- Add defensive NULL check at start of write_dvb_sub()
- Deep copy DVB bitmap data in copy_subtitle() to avoid aliasing
- Safe DVBSubContext copy that doesn't alias linked list pointers
- Clean up pipeline decoder refs in dinit_cap() after PAT change
- Direct FTS calculation for DVB-only streams

Tested with 11GB TS file with 23 PAT changes - no crash.
2025-12-31 10:44:00 +05:30
Rahul-2k4
18566f2213 fix(dvb): Improve multi-stream DVB subtitle handling for Issue #447
- Replace spin-lock with proper mutex (CRITICAL_SECTION/pthread_mutex)
- Add per-pipeline OCR contexts for thread safety
- Include PID in output filenames to handle duplicate languages
- Add dvbsub_get_context_size() and dvbsub_copy_context() for state management
- Improve language code validation (ISO 639-2 compliant)
- Change fatal error to warning for oversized PES packets
- Better language lookup from potential_streams before cinfo fallback
- Reset potential_stream data in demuxer cleanup
2025-12-30 21:58:40 +05:30
Amrit Kumar Mahto
125c5e8821 Update ts_functions.c 2025-12-30 15:13:19 +05:30
Carlos Fernandez Sanz
64ce4ac84f fix(args): Add --svc alias for --service for backwards compatibility 2025-12-30 09:49:44 +01:00
Carlos Fernandez
674b859284 fix(args): Add --svc alias for --service for backwards compatibility
The help text references -svc for CEA-708 service selection, but the
Rust argument parser only accepted --service. This adds --svc as an
alias to maintain backwards compatibility with older documentation
and user scripts.

Fixes #1917
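As a rough illustration of what an alias means for callers, the sketch below maps the legacy spellings to their canonical flags. `canonical_flag` is a hypothetical helper; the commit does not show how the Rust argument parser registers aliases, so this is not its actual mechanism.

```rust
/// Map a legacy flag spelling to the canonical one; unknown flags pass through.
pub fn canonical_flag(arg: &str) -> &str {
    match arg {
        "--sc" => "--sentencecap",
        "--svc" => "--service",
        other => other,
    }
}
```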

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 09:30:15 +01:00
Carlos Fernandez Sanz
9a761331f8 Merge pull request #1905 from VS7686/fix-networking-warnings
The fix looks correct - properly adding `return;` after Rust calls to prevent the C code from also executing, and using `(void)` to silence return value warnings.

Windows CI passes (which was the target for this MSVC fix). The Linux CI failure appears unrelated since networking code isn't typically part of the regression test suite.

Merging - thanks for the fix!
2025-12-30 09:02:52 +01:00
Carlos Fernandez Sanz
046ee71eda Merge pull request #1921 from ChubbyChipmunk77/simplify-and-document
Excellent work addressing the feedback! The separation of CC_SOLID_BLANK and PARITY_BIT_MASK makes the code much clearer - even though they have the same value, they serve different purposes and that's now well-documented.

The additional documentation for validate_cc_pair is very helpful for understanding the CEA-608/708 validation logic.

Merging - thanks for the thorough fix!
2025-12-30 08:51:30 +01:00
Carlos Fernandez Sanz
b5fc3e63c4 Merge pull request #1924 from Harshdhall01/cleanup-vcl-hrd-todo
Looks good! The explanation is clearer and removing the dead code (commented exit) is a nice cleanup. Tests pass.

Merging - thanks!
2025-12-30 08:49:18 +01:00
VS7686
5eaf805d27 Add missing returns after Rust calls to prevent fallthrough 2025-12-30 09:20:59 +05:30
Amrit kumar Mahto
0ba941e8c0 ts: prevent heap buffer overflow in Teletext demux path 2025-12-30 07:13:04 +05:30
Carlos Fernandez Sanz
a9413a2312 fix(dvb): Enable OCR for all DVB subtitle streams, not just first 2025-12-29 23:09:18 +01:00
Carlos Fernandez Sanz
a2eb03cb73 docs: Add Windows package manager installation instructions 2025-12-29 23:04:41 +01:00
Carlos Fernandez
06063f26a4 docs: Add Windows package manager installation instructions
Add instructions for installing CCExtractor via:
- WinGet (winget install CCExtractor.CCExtractor)
- Chocolatey (choco install ccextractor)
- Scoop (scoop bucket add extras && scoop install ccextractor)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 22:56:45 +01:00
Carlos Fernandez Sanz
82daa7fb2b fix: Properly handle ATSC CC in private MPEG-2 streams 2025-12-29 22:55:00 +01:00
Carlos Fernandez
a71687e19f fix(dvb): Enable OCR for all DVB subtitle streams, not just first
Previously, the `initialized_ocr` flag was stored at the program level
and shared across all DVB subtitle streams within a program. This caused
OCR to only initialize for the first DVB stream, leaving subsequent
streams without an OCR context and unable to extract subtitles.

The fix removes the `initialized_ocr` flag entirely. Each DVB subtitle
decoder now gets its own OCR context, matching the behavior of DVD and
VOBSUB decoders which already worked correctly with multiple streams.

Test results with multi-language DVB sample:
- Before: Second stream (0xCE0) → "No captions were found"
- After: Second stream (0xCE0) → 5 subtitles extracted correctly

Fixes #1067

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 21:26:56 +01:00
Carlos Fernandez
25162fe40a chore: Add build directories to .gitignore
Add build_*/ pattern and linux/build_scan/ to ignore various build
output directories (build_ocr/, build_ocr_asan/, etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 21:11:51 +01:00
Carlos Fernandez
3365a715a6 fix: Properly handle ATSC CC in private MPEG-2 streams
This commit fixes two issues:

1. ATSC CC data in private MPEG-2 streams (stream type 0x06) was not
   being processed. The code returned CCX_PRIVATE_MPEG2_CC buffer type
   which was never properly implemented - it just dumped debug output
   and returned placeholder bytes.

   Fix: Treat ATSC CC in private MPEG-2 streams the same as in
   user-private streams (0x80-0x8F) by returning CCX_PES buffer type.
   Both contain the same CC data format and should use the same
   processing path.

2. Several dump() calls were using CCX_DMT_GENERIC_NOTICES which is
   enabled by default, causing binary output to flood the terminal
   when processing certain files.

   Fix: Changed to appropriate debug-only masks (CCX_DMT_VERBOSE,
   CCX_DMT_PARSE) so binary dumps only appear when debug mode is
   explicitly enabled.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 21:10:11 +01:00
Carlos Fernandez Sanz
26e0f64720 fix(windows): Configure MSI as 64-bit installer 2025-12-29 20:25:41 +01:00
Carlos Fernandez
a1ed940c8b fix(build): Use -arch x64 flag for WiX build instead of Package attribute
The Platform attribute on the Package element is not valid in WiX v4+. Instead, specify the
target architecture at build time using the -arch x64 flag.

Changes:
- Remove invalid Platform="x64" attribute from Package element
- Add -arch x64 to wix build command in release workflow
- Keep ProgramFiles64Folder for explicit 64-bit installation path

This ensures the MSI is built as a proper 64-bit package that installs
to "Program Files" instead of "Program Files (x86)".

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 19:03:04 +01:00
ChubbyChipmunk77
f5f4768503 style: fix doc comment formatting for Clippy 2025-12-29 22:01:07 +05:30
Carlos Fernandez
e4374204bd fix(windows): Configure MSI as 64-bit installer
Add Platform="x64" to the WiX Package element and use ProgramFiles64Folder
instead of ProgramFiles6432Folder to ensure the MSI:
- Is recognized as a 64-bit installer by tools like winget/komac
- Installs to "Program Files" instead of "Program Files (x86)"

This fixes winget manifest detection issues where the installer was
incorrectly identified as x86 architecture.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 17:05:51 +01:00
ChubbyChipmunk77
7f55ae5c1d Fixed semantic naming and updated doc comments 2025-12-29 21:30:40 +05:30
Harshdhall01
8bf1bc16de Remove blank line to fix formatting check 2025-12-29 21:14:35 +05:30
Harshdhall01
5352a8b877 Fix formatting: use consistent tab indentation and remove trailing whitespace
- Line 908: Changed spaces+tabs to consistent tabs only
- Line 911: Removed trailing tabs on empty line
2025-12-29 21:05:17 +05:30
Carlos Fernandez Sanz
fd155285d2 0.96.3 2025-12-29 14:56:33 +01:00
Carlos Fernandez
a6fd8d468a chore: Bump version to 0.96.3
Update version number across all files:
- src/lib_ccx/lib_ccx.h (main version define)
- linux/configure.ac, mac/configure.ac (autoconf)
- OpenBSD/Makefile
- package_creators/ (PKGBUILD, ccextractor.spec, debian.sh)
- packaging/winget/ (all yaml manifests)
- packaging/chocolatey/ (nuspec and install script)

Note: Checksums in winget/chocolatey will need to be updated
when the actual release MSI is built.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 14:11:23 +01:00
Carlos Fernandez
5b05ce5073 docs: Add changelog entries for version 0.96.3
Document all changes since 0.96.2 including:
- VOBSUB subtitle extraction for MP4 and MKV files
- Native SCC input file support
- SCC output improvements (frame rate, styled PAC codes)
- Various bug fixes for timing, builds, and OCR

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 13:28:24 +01:00
Carlos Fernandez
d28bc4e114 style: Fix formatting issues in ocr.c and options.rs
- Use tabs for continuation indentation in C code (clang-format)
- Remove extra trailing spaces in Rust code (rustfmt)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 12:39:08 +01:00
Carlos Fernandez Sanz
285e81f9a7 Merge pull request #1898 from hridyasadanand/docs-remove-travis-badge
Good cleanup - removing the outdated Travis CI badge and adding a usage example helps new users. Merging.
2025-12-29 12:23:58 +01:00
Carlos Fernandez Sanz
730156f33b Merge pull request #1914 from VS7686/fix-epg-warnings
Clean fix for unused variable warnings. Verified locally. Merging.
2025-12-29 11:49:37 +01:00
Carlos Fernandez Sanz
152bbd308c Merge pull request #1922 from x15sr71/fix/utf8proc-include-path
Excellent fix! The `__has_include()` approach is clean and removes the symlink workaround.

Verified locally:
- Normal build: 
- `-system-libs` build: 

Merging.
2025-12-29 11:44:48 +01:00
Carlos Fernandez
8c586bccbd feat(ocr): Add character blacklist and line-split options for better accuracy
Add two new OCR options to improve subtitle recognition:

1. Character blacklist (enabled by default):
   - Blacklists characters |, \, `, _, ~ that are commonly misrecognized
   - Prevents "I" being recognized as "|" (pipe character)
   - Use --no-ocr-blacklist to disable if needed

2. Line-split mode (opt-in via --ocr-line-split):
   - Splits multi-line subtitle images into individual lines
   - Uses PSM 7 (single text line mode) for each line
   - Adds 10px padding around each line for better edge recognition
   - May improve accuracy for some VOBSUB subtitles

Test results with VOBSUB sample:
- Blacklist: Reduces pipe errors from 14 to 0
- Matches subtile-ocr's approach for preventing misrecognition

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 11:33:29 +01:00
Carlos Fernandez Sanz
434cd3959a fix(mp4): Use fixed-width integer types in bswap functions 2025-12-29 11:13:38 +01:00
Harshdhall01
3cb0f61b0c Clean up VCL HRD TODO comment
Replace unclear TODO with explanation of why VCL HRD parameters
are skipped. VCL HRD is for video buffering compliance and not
needed for caption extraction.

Changes:
- Replace TODO comment with clear explanation
- Update mprint message to be more informative
- Remove commented-out exit(1)

Addresses #1894
2025-12-29 15:01:40 +05:30
Chandragupt Singh
a18eaa2c96 fix: utf8proc include path for system library builds 2025-12-29 13:37:39 +05:30
Carlos Fernandez
69b7f9f4c3 fix(mp4): Use fixed-width integer types in bswap functions
Change bswap16 and bswap32 to use int16_t and int32_t instead of
short and long for consistent behavior across platforms.

On Windows x64, `long` is 4 bytes (LLP64 model), while on Linux x64
`long` is 8 bytes (LP64 model). This difference could cause
inconsistent NAL unit length parsing in MP4/MOV files, potentially
affecting timestamp calculations.

This fix ensures the byte-swapping functions work identically on
both platforms by using fixed-width integer types from <stdint.h>.
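For comparison, here is the same idea in Rust, where integer widths are always explicit. This is a sketch rather than a translation of mp4.c: the function names mirror the commit, but the bodies are assumptions.

```rust
/// Swap the two bytes of a 16-bit value.
pub fn bswap16(v: i16) -> i16 {
    let u = v as u16;
    ((u << 8) | (u >> 8)) as i16
}

/// Reverse the four bytes of a 32-bit value.
pub fn bswap32(v: i32) -> i32 {
    let u = v as u32;
    (((u & 0x0000_00ff) << 24)
        | ((u & 0x0000_ff00) << 8)
        | ((u & 0x00ff_0000) >> 8)
        | ((u & 0xff00_0000) >> 24)) as i32
}
```

Because the operand is exactly 32 bits wide on every platform, the result cannot depend on whether the C `long` of the host is 4 or 8 bytes.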

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 08:52:33 +01:00
Carlos Fernandez Sanz
63dde6f3b2 feat(mp4): Add VOBSUB subtitle extraction with OCR for MP4 files 2025-12-29 08:47:33 +01:00
Carlos Fernandez
8f64eeb54f ci: Trigger CI tests
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 19:57:11 +01:00
ChubbyChipmunk77
02d91c4a03 REFACTOR: 1. Simplified verify_parity function. 2. Improved documentation for public function validate_cc_pair. 3. Added constant for 0x7F. 2025-12-29 00:00:38 +05:30
Carlos Fernandez
463a4a85a1 build(windows): Add vobsub_decoder to Windows build
Add vobsub_decoder.c and vobsub_decoder.h to the Visual Studio project
and filters files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 18:44:32 +01:00
Carlos Fernandez
ba2833b819 style: Fix clang-format indentation in vobsub_decoder.c
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 17:49:34 +01:00
Carlos Fernandez
635a305c37 build: Add vobsub_decoder to autoconf build system
Add vobsub_decoder.c and vobsub_decoder.h to linux and mac Makefile.am
to fix autoconf build failures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 17:42:08 +01:00
Carlos Fernandez
6fe612db3e fix: Guard ocr_text access with ENABLE_OCR preprocessor check
The ocr_text field in struct cc_bitmap is only defined when ENABLE_OCR
is set. Wrap the free() calls with #ifdef ENABLE_OCR to fix build
failures in non-OCR configurations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 17:37:05 +01:00
Carlos Fernandez
2930c61420 feat(mp4): Add VOBSUB subtitle extraction with OCR for MP4 files
Add support for extracting VOBSUB (bitmap) subtitles from MP4 files
and converting them to text formats via OCR. This complements the
existing MKV VOBSUB support added in commit 1fccb783.

Changes:
- Add shared vobsub_decoder module for SPU parsing and OCR
- Add process_vobsub_track() function in mp4.c for subp:MPEG tracks
- Detect and count VOBSUB tracks in MP4 container
- Extract palette from decoder config when available
- Process SPU samples through OCR pipeline

The VOBSUB decoder module provides:
- SPU control sequence parsing (timing, colors, coordinates)
- RLE-encoded bitmap decoding (interlaced format)
- Palette parsing from idx header format
- Integration with Tesseract OCR via ocr_rect()

Tested with sample from issue #1349 - successfully extracted 61
subtitles from 128 SPU samples with accurate OCR text output.

Fixes #1349

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 17:32:24 +01:00
Carlos Fernandez Sanz
173db88dcf feat(matroska): Add VOBSUB subtitle extraction support for MKV files 2025-12-28 14:28:02 +01:00
VS7686
29c3f4e684 Trigger CI re-run 2 2025-12-28 18:04:30 +05:30
VS7686
d4a7b1d6ed Trigger CI re-run 2025-12-28 16:05:22 +05:30
Carlos Fernandez
9d14766b0d fix: Use #define instead of const int for VOBSUB_BLOCK_SIZE
MSVC doesn't support variable-length arrays (VLAs). The const int
declaration wasn't being treated as a compile-time constant,
causing Windows build failure with errors C2057, C2466, C2133.

Changed to #define which is a true compile-time constant.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 11:32:48 +01:00
Carlos Fernandez
6f2a73d706 docs: Add VOBSUB extraction documentation and subtile-ocr Dockerfile
- Add docs/VOBSUB.md explaining the VOBSUB extraction workflow
- Add tools/vobsubocr/Dockerfile for building subtile-ocr OCR tool
- Document how to convert VOBSUB (.idx/.sub) to SRT using OCR

The Dockerfile uses subtile-ocr (https://github.com/gwen-lg/subtile-ocr),
an actively maintained fork of vobsubocr with better accuracy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 10:26:41 +01:00
Carlos Fernandez
1fccb783f2 feat(matroska): Add VOBSUB subtitle extraction support for MKV files
Previously, CCExtractor would only print "Error: VOBSUB not supported"
when encountering VOBSUB (S_VOBSUB) subtitle tracks in Matroska files.
This left users without any usable output.

This commit adds full VOBSUB extraction support:
- Generate proper .idx index files with timestamps and file positions
- Generate proper .sub files with PS-wrapped SPU data
- Correct PS Pack header with SCR derived from timestamps
- Correct PES header with PTS for each subtitle
- 2048-byte block alignment (standard VOBSUB format)

The output is compatible with VLC, FFmpeg, and other players that
support VobSub subtitle format.

Tested with sample from issue #1371 - output validates correctly
with FFprobe and produces identical subtitle data to mkvextract.

Fixes #1371
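Two of the points above, block alignment and index entries, reduce to simple arithmetic. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the `timestamp: ..., filepos: ...` layout follows the commonly seen VobSub .idx convention rather than anything quoted from this commit.

```rust
/// Block size used for alignment of the .sub file.
pub const BLOCK_SIZE: u64 = 2048;

/// Round a byte length up to the next multiple of the block size.
pub fn padded_len(len: u64) -> u64 {
    (len + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE
}

/// Convert milliseconds to a 90 kHz PTS value.
pub fn ms_to_pts(ms: u64) -> u64 {
    ms * 90
}

/// Format one index entry: start time plus hexadecimal file position.
pub fn idx_entry(start_ms: u64, filepos: u64) -> String {
    let h = start_ms / 3_600_000;
    let m = start_ms / 60_000 % 60;
    let s = start_ms / 1000 % 60;
    let ms = start_ms % 1000;
    format!(
        "timestamp: {:02}:{:02}:{:02}:{:03}, filepos: {:09x}",
        h, m, s, ms, filepos
    )
}
```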

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 10:02:19 +01:00
Carlos Fernandez Sanz
ec30a79be9 fix(mp4): Fix 200ms timing offset for MOV/MP4 caption extraction 2025-12-28 09:37:46 +01:00
Carlos Fernandez Sanz
5beb4389f6 fix: Apply --delay option to DVB/bitmap subtitles 2025-12-28 09:36:59 +01:00
Carlos Fernandez
a6ccf29630 fix: Apply --delay option to DVB/bitmap subtitles
The --delay option was not being applied to DVB and other bitmap-based
subtitles (DVD subtitles, etc.), only to CEA-608 subtitles. This made
it impossible for users to correct timing offsets in DVB subtitle
extraction.

Changes:
- Add subs_delay to sub->start_time and sub->end_time for CC_BITMAP
  subtitles in encode_sub(), matching the behavior for CC_608
- Add bounds checking to skip subtitles whose timestamps become negative after
  applying a negative delay
- Properly free bitmap data when skipping to avoid memory leaks

This provides a workaround for issue #1248 where DVB subtitles were
extracted with incorrect timing offset. Users can now use --delay to
adjust the timing.

Fixes #1248
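The delay handling can be sketched as a small pure function. `apply_delay` is a hypothetical helper, and treating a subtitle as skipped when either shifted timestamp is negative is an assumption; the commit does not say which timestamp is checked in encode_sub().

```rust
/// Shift a subtitle by `delay_ms`. Returns None when the shifted times would be
/// negative, which the caller treats as "skip this subtitle".
pub fn apply_delay(start_ms: i64, end_ms: i64, delay_ms: i64) -> Option<(i64, i64)> {
    let start = start_ms + delay_ms;
    let end = end_ms + delay_ms;
    if start < 0 || end < 0 {
        None
    } else {
        Some((start, end))
    }
}
```

In the C code the skip path also has to free the bitmap data, as the third bullet notes; the `Option` return here only models the decision.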

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 07:58:58 +01:00
Carlos Fernandez Sanz
b6d7c7e778 feat(scc): Add configurable frame rate and styled PAC codes for SCC output 2025-12-28 06:54:29 +01:00
Rahul-2k4
117c2fce69 fix(dvb): Apply 3 code review fixes for Issue #447
- Fix escaped newline in debug print (dvb_subtitle_decoder.c:1861)
- Replace hardcoded PID 0x106 with 0 in debug calls (lines 1822, 1835)
- Accept uppercase letters in language code validation (ts_tables.c:396)
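A language code check that accepts both cases might look like the sketch below. It assumes the rule is "exactly three ASCII letters", which matches the shape of ISO 639-2 codes; the actual condition at ts_tables.c:396 is not shown here, and `is_plausible_lang_code` is a hypothetical name.

```rust
/// True for three-letter ASCII codes such as "eng" or "ENG".
pub fn is_plausible_lang_code(code: &str) -> bool {
    code.len() == 3 && code.bytes().all(|b| b.is_ascii_alphabetic())
}
```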
2025-12-28 11:06:31 +05:30
Rahul-2k4
ffd6a34c30 Fix Windows CI: change PlatformToolset from v145 to v143 for VS 2022 2025-12-28 10:34:46 +05:30
Rahul-2k4
70af627078 Fix syntax errors in lib_ccx.c: add missing ocr.h include and fix brace structure 2025-12-28 10:32:08 +05:30
Rahul-2k4
b0a5c069ed style: fix clang-format issues for Linux CI compatibility 2025-12-28 10:22:44 +05:30
Rahul-2k4
53ee63894c style: apply clang-format to fix CI formatting check 2025-12-28 10:12:40 +05:30
Rahul-2k4
50ece42e0a style: apply clang-format and normalize line endings to all source files 2025-12-28 00:47:25 +05:30
Rahul-2k4
3d00e718f6 style: normalize line endings and apply clang-format 2025-12-28 00:26:17 +05:30
Carlos Fernandez
021b788461 feat(scc): Add configurable frame rate and styled PAC codes for SCC output
This commit addresses the remaining items from issue #1191:

1. SCC Output Frame Rate:
   - Added scc_framerate to encoder_cfg and encoder_ctx structs
   - The --scc-framerate option now affects both input parsing AND output
   - Supports 24, 25, 29.97 (default), and 30 fps

2. Styled PAC (Preamble Address Code) Optimization:
   - Added support for styled PACs that encode color/font at column 0
   - When captions start at column 0 with non-default style, uses a single
     styled PAC instead of indent PAC + mid-row code
   - More efficient output that matches professional SCC files

Files changed:
- ccx_common_option.h/c: Added scc_framerate to encoder_cfg
- ccx_encoders_common.h/c: Added scc_framerate to encoder_ctx
- ccx_encoders_scc.c: Added get_scc_fps(), styled PAC functions,
  and optimized write_cc_buffer_as_scenarist()
- common.rs: Copy scc_framerate to enc_cfg

Fixes #1191

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 19:45:05 +01:00
Rahul-2k4
86e5d47141 style: apply clang-format to all source files 2025-12-28 00:14:16 +05:30
Rahul-2k4
5b36356456 style: apply clang-format fixes 2025-12-28 00:04:26 +05:30
Rahul-2k4
ba04aedae1 fix: add missing set_pipeline_pts and dump_rect_and_log functions 2025-12-27 23:58:26 +05:30
Rahul-2k4
5001df0d6c fix(rust): add missing lang field to cap_info initializer 2025-12-27 23:56:26 +05:30
Rahul-2k4
28506fee7b Add lang member to struct cap_info for DVB split mode 2025-12-27 23:49:29 +05:30
Rahul-2k4
47d8aaddb9 Merge upstream/master into final: Resolve conflicts in option structs (kept both split_dvb_subs and scc_framerate) 2025-12-27 23:34:40 +05:30
Rahul-2k4
1b2254f911 Fix DVB split output: include core logic handling and memory safety fixes 2025-12-27 23:27:36 +05:30
Rahul-2k4
dc34b26afb Fix DVB split output: handle empty PBUS and missing OCR init (Issue #447) 2025-12-27 23:21:08 +05:30
Carlos Fernandez
c06102678e fix(mp4): Fix 200ms timing offset for MOV/MP4 caption extraction
Set in_bufferdatatype for MP4/MOV container tracks to prevent incorrect
cb_field counter increments that were adding ~200ms to caption timestamps.

Root Cause:
-----------
The in_bufferdatatype variable was never set in mp4.c, remaining as
CCX_UNKNOWN. This caused the check in do_cb() (ccx_decoders_common.c)
to fail:

  if (ctx->in_bufferdatatype != CCX_H264 && ctx->in_bufferdatatype != CCX_PES)
      cb_field1++;

With in_bufferdatatype == CCX_UNKNOWN, cb_field1 was incremented for
each CEA-608 caption block processed. When get_fts() was called to
timestamp captions, it added cb_field1 * 1001/30 ms to the base time.

With ~6 caption blocks per frame (typical for roll-up captions), this
added approximately 200ms (6 × 33.37ms ≈ 200ms) to caption start times.

Analysis:
---------
Sample file: 1974a299f0502fc8199dabcaadb20e422e79df45972e554d58d1d025ef7d0686.mov

Before fix:
- FFmpeg first caption: 13,847ms
- CCExtractor first caption: 14,047ms
- Offset: 200ms late

The timing flow:
1. MP4 sample has PTS=1246245 (13,847ms at 90kHz)
2. set_fts() correctly sets fts_now based on PTS
3. do_cb() processes caption blocks, incrementing cb_field1 each time
4. get_fts() returns: fts_now + fts_global + cb_field1 * 1001/30
5. With cb_field1=6: adds 6 * 33.37 = 200ms offset

The fix ensures cb_field counters are not incremented for container
formats (MP4, MOV, MKV) because these formats associate all caption
data with the frame's PTS directly - there's no sub-frame timing.
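The arithmetic in the analysis above can be reproduced directly. The function names are illustrative, and integer division is used throughout, so this is a sketch of the calculation rather than the real get_fts().

```rust
/// Convert a 90 kHz PTS to milliseconds.
pub fn pts_to_ms(pts: i64) -> i64 {
    pts / 90
}

/// Extra time contributed by the cb_field1 counter, in milliseconds.
pub fn cb_field_offset_ms(cb_field1: i64) -> i64 {
    cb_field1 * 1001 / 30
}

/// fts_now + fts_global + cb_field1 * 1001/30, with fts_now derived from the PTS.
pub fn caption_time_ms(pts: i64, fts_global: i64, cb_field1: i64) -> i64 {
    pts_to_ms(pts) + fts_global + cb_field_offset_ms(cb_field1)
}
```

With PTS 1246245 and no global offset, a counter of 6 gives 14047 ms and a counter of 0 gives 13847 ms, matching the before and after figures.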

Fix:
----
Set in_bufferdatatype in the three MP4 track processing functions:
- process_avc_track(): CCX_H264 for H.264/AVC tracks
- process_hevc_track(): CCX_H264 for H.265/HEVC tracks
- process_xdvb_track(): CCX_PES for MPEG-2 video tracks

After fix:
- FFmpeg first caption: 13,847ms
- CCExtractor first caption: 13,847ms
- Offset: 0ms (exact match)

This fix resolves timing issues for tests 226-230 on the sample platform.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 16:34:05 +01:00
Carlos Fernandez Sanz
b0800a112c feat(input): Add native SCC (Scenarist Closed Caption) input support 2025-12-27 16:16:31 +01:00
Carlos Fernandez
2b0d9ed427 chore: trigger CI rebuild
Timing issues in tests 226-230 are pre-existing and unrelated to SCC support.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 15:37:49 +01:00
Carlos Fernandez
fd4db0e7bf chore: Trigger CI re-run 2025-12-27 11:18:02 +01:00
VS7686
00d8c9cb0a Fix unused variable warnings in ts_tables_epg.c 2025-12-27 14:01:13 +05:30
Carlos Fernandez
7829c14c60 fix: Initialize scc_framerate in init_options()
The scc_framerate field was not being initialized in the C init_options()
function, leaving it with an undefined value. This could cause undefined
behavior when the options struct is used before the Rust code initializes
the field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:38:32 +01:00
Rahul-2k4
d3602ec938 Fix: Defensive handling of invalid caption_field in DVB subtitle timing (fixes #447) 2025-12-27 12:48:28 +05:30
Rahul-2k4
f9b5e081a7 Remove duplicate comment in parser.rs 2025-12-27 11:46:24 +05:30
Rahul-2k4
bdc3eaa81b Fix: update Rust parser to allow text based formats for DVB split 2025-12-27 10:16:36 +05:30
Carlos Fernandez
2820042c1d style: Fix formatting and clippy warnings
- Replace tabs with spaces in doc comments
- Use #[derive(Default)] with #[default] attribute
- Use array syntax for char pattern matching
- Apply clang-format to C files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 01:19:00 +01:00
Carlos Fernandez
d4d228125a feat(input): Add native SCC (Scenarist Closed Caption) input support
Add native support for reading SCC files directly, eliminating the need
for external conversion tools like SCC2RAW.exe or Perl scripts.

Implementation:
- New Rust parser module (src/rust/src/demuxer/scc.rs) with:
  - SMPTE timecode parsing (HH:MM:SS:FF format)
  - Configurable frame rates: 29.97 (default), 24, 25, 30 fps
  - CEA-608 hex pair extraction
  - UTF-8 BOM handling
  - 12 comprehensive unit tests
- Stream mode detection in both C and Rust code
- FFI exports for C integration (ccxr_is_scc_file, ccxr_process_scc)
- New --scc-framerate command line option
- Integration in raw_loop() following the McPoodle DVD raw pattern
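
The timecode and hex-pair steps listed above can be illustrated with a short Python sketch. This is not the code in src/rust/src/demuxer/scc.rs: the names `timecode_to_ms` and `parse_scc_line` are hypothetical, drop-frame timecodes are out of scope, and the frame field is simply treated as a fraction of a second at the chosen frame rate, which is an assumption rather than the parser's confirmed behavior.

```python
import re

# Hypothetical helpers for illustration; not the parser from scc.rs.
TIMECODE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):(\d{2})$")


def timecode_to_ms(timecode: str, fps: float = 29.97) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to milliseconds.

    Simplifying assumption: the FF field is a fraction of a second at `fps`.
    """
    match = TIMECODE_RE.match(timecode)
    if match is None:
        raise ValueError(f"not an HH:MM:SS:FF timecode: {timecode!r}")
    hours, minutes, seconds, frames = (int(part) for part in match.groups())
    whole_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000
    return whole_ms + round(frames * 1000 / fps)


def parse_scc_line(line: str, fps: float = 29.97):
    """Split one SCC caption line into (start_ms, [(byte1, byte2), ...])."""
    # Drop a UTF-8 BOM if it survived decoding, then surrounding whitespace.
    line = line.lstrip("\ufeff").strip()
    timecode, _, payload = line.partition("\t")
    pairs = []
    for word in payload.split():
        value = int(word, 16)  # each word is four hex digits, i.e. two bytes
        pairs.append((value >> 8, value & 0xFF))
    return timecode_to_ms(timecode, fps), pairs
```

With `fps=25`, the timecode `00:00:01:12` maps to 1480 ms, since 12 frames at 25 fps is 480 ms.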

Testing performed:
- Round-trip test: video → SRT, video → SCC, SCC → SRT
  Result: 118/118 captions matched (100% accuracy)
- Multiple output formats verified (SRT, WebVTT, transcript)
- Frame rate option tested with 24fps sample
- UTF-8 BOM handling verified
- All 260 Rust tests pass

Usage:
  ccextractor input.scc -o output.srt
  ccextractor input.scc --scc-framerate 25 -o output.srt

Closes #1293

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 00:54:44 +01:00
Rahul-2k4
43d5ba2f34 Improve error message for incompatible OutputFormat in Rust parser 2025-12-27 02:03:51 +05:30
Rahul-2k4
557774b202 Apply code style fixes from clang-format 2025-12-27 01:59:48 +05:30
Rahul-2k4
4e0472bddf Fix DVB split critical bugs: per-pipeline state separation and timing sync 2025-12-27 01:56:12 +05:30
Rahul-2k4
9a2fe6221e Switch platform toolset from v145 to v143 for GitHub Actions compatibility 2025-12-27 01:12:40 +05:30
Rahul Tripathi
182b23a283 Merge branch 'CCExtractor:master' into final 2025-12-27 00:13:39 +05:30
Rahul-2k4
77f3fd35f4 Fix #447: Resolve DVB split mode crash and routing logic
- Fixed NULL pointer dereference in dvb_subtitle_decoder.c (sub->prev check).
- Corrected logic in dvbsub_handle_display_segment to prevent dropped subtitles.
- Implemented robust encoder context swapping in general_loop.c for DVB streams.
- Added regression test: tests/regression/dvb_split.txt.
- Verified 100% completion in split mode and correct Teletext/DVB routing.
2025-12-27 00:11:09 +05:30
Carlos Fernandez Sanz
14e6919f2e ci: Add winget and Chocolatey packaging workflows 2025-12-26 18:20:55 +01:00
Carlos Fernandez
353a37010d ci: Add winget and Chocolatey packaging workflows
Add automated package publishing for Windows package managers:

## Winget
- Initial manifest files for CCExtractor.CCExtractor
- Workflow to auto-submit PRs to microsoft/winget-pkgs on release

## Chocolatey
- Package files (nuspec, install/uninstall scripts)
- Workflow to build and push packages on release

## Setup Required
- WINGET_TOKEN secret (GitHub PAT with public_repo scope)
- CHOCOLATEY_API_KEY secret (from chocolatey.org account)

Closes #1308

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 18:19:11 +01:00
Carlos Fernandez Sanz
921cbe0c57 ci(linux): Add workflow for system-libs builds 2025-12-26 18:08:11 +01:00
VS7686
f0523ceaa3 Fix logic error: removed early returns to restore C implementation 2025-12-26 21:44:12 +05:30
Carlos Fernandez
7284430fc6 fix(build): Preserve FFmpeg libs with -system-libs -hardsubx
The -system-libs mode was overwriting BLD_LINKER and losing the FFmpeg
libraries that -hardsubx adds. This fix preserves the FFmpeg libraries
when both flags are used together.

Also add permissions: contents: write to the workflow to allow
uploading assets to releases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 15:49:51 +01:00
Carlos Fernandez
68d0d4094e ci(linux): Add workflow for system-libs builds
Add a new GitHub Actions workflow that builds CCExtractor using the
-system-libs flag, creating binaries that dynamically link against
system libraries instead of bundling dependencies.

This is useful for:
- Linux distribution packaging (Debian, Ubuntu, Fedora, etc.)
- Homebrew/Linuxbrew packaging
- Users who prefer smaller binaries with system library updates

Two variants are built:
- basic: Standard OCR-enabled build
- hardsubx: Build with HardSubX (burned-in subtitle extraction)

The workflow runs on releases and can be manually triggered.

Related to #1907

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 15:38:41 +01:00
Carlos Fernandez Sanz
7075f6291d build(linux): Add -system-libs flag for package manager compatibility 2025-12-26 15:32:32 +01:00
Carlos Fernandez Sanz
170d769476 Merge branch 'master' into build/linux-system-libs-flag 2025-12-26 15:31:31 +01:00
Carlos Fernandez
1ff3457744 Updated CHANGES.TXT for 0.96.2 2025-12-26 15:27:02 +01:00
Carlos Fernandez Sanz
dc352a2202 fix(windows): Bundle tessdata for OCR support out of the box 2025-12-26 15:23:34 +01:00
Chandragupt Singh
c8750e42d1 build(linux): use pkg-config cflags for system-libs includes 2025-12-26 18:51:16 +05:30
Carlos Fernandez
20448bfeb2 fix(windows): Bundle tessdata for OCR support out of the box
The Windows release was missing Tesseract OCR runtime dependencies
(tessdata files) needed for the HardSubx feature to work. Users had
to manually install Tesseract OCR and set TESSDATA_PREFIX.

Changes:
- Add get_executable_directory() to ocr.c that returns the directory
  containing the executable (works on Windows, Linux, and macOS)
- Update probe_tessdata_location() to search for tessdata in the
  executable directory, enabling bundled tessdata to be found
- Update release workflow to download eng.traineddata and osd.traineddata
  from tesseract-ocr/tessdata_fast during release builds
- Update WiX installer to include tessdata directory with the
  traineddata files
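
The lookup described above can be pictured with a small Python sketch. The real logic lives in C in ocr.c; the function name `find_tessdata`, its parameters, and the exact file it checks for are hypothetical and only illustrate the idea of probing a list of directories that includes the executable's own directory.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_tessdata(search_dirs: Iterable[str], lang: str = "eng") -> Optional[Path]:
    """Return the first <dir>/tessdata that contains <lang>.traineddata.

    Hypothetical helper: `search_dirs` would include the directory that
    holds the executable, so bundled files are found without any setup.
    """
    for directory in search_dirs:
        tessdata = Path(directory) / "tessdata"
        if (tessdata / f"{lang}.traineddata").is_file():
            return tessdata
    return None  # nothing found; the caller decides how to report it
```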

Now the Windows release includes tessdata files, and CCExtractor will
automatically find them in the installation directory without requiring
users to install Tesseract separately or set environment variables.

Fixes #1578

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 13:05:46 +01:00
VS7686
807df0339e Fix styling: Apply clang-format 2025-12-26 16:11:57 +05:30
Rahul-2k4
6642973c63 CLI + option plumbing for --split-dvb-subs 2025-12-26 14:43:36 +05:30
Chandragupt Singh
f08fd658e6 build(linux): Add -system-libs flag for Homebrew compatibility 2025-12-26 13:07:09 +05:30
VS7686
5ae3116a6c Fix indentation: reduce to 4 spaces 2025-12-26 10:29:36 +05:30
VS7686
826afcd991 Fix styling: increase indentation inside ifndef 2025-12-26 10:14:18 +05:30
VS7686
46af5ce9bb Fix coding style and formatting 2025-12-26 09:59:38 +05:30
VS7686
123b35ae69 Fix coding style and formatting 2025-12-26 09:49:17 +05:30
Carlos Fernandez Sanz
f6e9d55838 fix(release): Update Flutter GUI files and add versioned filenames 2025-12-25 22:34:24 +01:00
VS7686
6f7d3f6169 Fix C4098 warnings in networking.c 2025-12-26 00:26:11 +05:30
Carlos Fernandez
07cc78c2f1 feat(release): Add version numbers to release asset filenames 2025-12-25 16:36:18 +01:00
Carlos Fernandez
affa34848c fix(installer): Update Flutter GUI files for v0.7.0 2025-12-25 13:47:57 +01:00
Carlos Fernandez Sanz
45ee03aecc fix(release): Support 3-part version numbers (e.g., v0.96.1) 2025-12-25 12:58:04 +01:00
Carlos Fernandez
c6e27ca809 fix(release): Support 3-part version numbers (e.g., v0.96.1)
Update the version extraction logic in the release workflow to properly
handle 3-part semantic versions like v0.96.1 in addition to existing
2-part versions like v0.96.

MSI installers require 4-part versions (major.minor.build.revision):
- v0.96 → 0.96.0.0 (unchanged behavior)
- v0.96.1 → 0.96.1.0 (new support)
- v0.96.1.2 → 0.96.1.2 (passthrough)
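
The padding rule above can be sketched in Python as follows. The release workflow itself is not written in Python, and `to_msi_version` is a hypothetical name; the sketch only mirrors the three mappings listed above.

```python
def to_msi_version(tag: str) -> str:
    """Pad a release tag such as 'v0.96.1' to a four-part MSI version."""
    version = tag[1:] if tag.startswith("v") else tag
    parts = version.split(".")
    if not 1 <= len(parts) <= 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported version tag: {tag!r}")
    # Append zeros until there are exactly four numeric components.
    parts += ["0"] * (4 - len(parts))
    return ".".join(parts)
```

For example, `to_msi_version("v0.96.1")` returns `"0.96.1.0"`, and a tag that already has four parts is returned without its leading `v`.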

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 12:56:13 +01:00
Carlos Fernandez Sanz
a8f25ce25e fix(installer): Fix Windows MSI installer for WiX v6 2025-12-25 11:53:45 +01:00
Carlos Fernandez Sanz
2781a7f7d6 docs(mac): Add documentation for -system-libs build mode 2025-12-25 11:00:47 +01:00
Carlos Fernandez
903ccc1442 chore: trigger CI rerun 2025-12-25 09:59:16 +01:00
Hridya
857a3bc9c6 docs: add basic usage example to documentation 2025-12-25 13:53:15 +05:30
Hridya
c2c589d6f6 docs: remove outdated Travis CI badge from README 2025-12-25 12:44:02 +05:30
GAURAV KARMAKAR
941604b33c docs(mac): Add documentation for -system-libs build mode 2025-12-25 02:15:02 +05:30
Carlos Fernandez
1950f096b6 fix(workflow): Extract only numeric version for MSI
MSI version numbers must be numeric (major.minor.build format).
Strip everything after the first dash from tag names to get valid
version numbers (e.g., v1.08-test becomes 1.08).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 20:05:20 +01:00
Carlos Fernandez
1fc5ec00d4 fix(installer): Use correct WiX v4+ attribute name 'Scope' not 'InstallScope'
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 19:03:53 +01:00
Carlos Fernandez
c0deae4b0c fix(installer): Add InstallScope=perMachine and update InstallerVersion
- Set InstallScope="perMachine" to ensure proper admin-level registry access
- Bump InstallerVersion from 200 to 500 (Windows Installer 5.0)

This should fix the "Could not write key VersionMinor to Product" error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 18:20:18 +01:00
Carlos Fernandez
84692b5658 fix(installer): Disable path validation to avoid local drive errors
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 17:23:09 +01:00
Carlos Fernandez
4a51ad114e fix(installer): Use custom UI without license dialog
Instead of trying to override WixUI_InstallDir, create a custom UI
based on it but without the LicenseAgreementDlg. This is the proper
way to remove dialogs from WiX UI sets.

- Add CustomUI.wxs with dialog flow: Welcome -> InstallDir -> VerifyReady
- Update installer.wxs to use CustomInstallDirUI instead of WixUI_InstallDir
- Update workflow to build both .wxs files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 16:25:35 +01:00
Carlos Fernandez
6789376b92 fix(installer): Try Order=999 to force dialog override to fire last
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 16:01:43 +01:00
Carlos Fernandez
ea5125f030 fix(installer): Use Order attribute to override license dialog navigation
The previous Publish elements without Order didn't override the defaults.
Adding Order="1" ensures our overrides fire after the WixUI defaults,
making our InstallDirDlg navigation take precedence.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 13:45:28 +01:00
Carlos Fernandez Sanz
000b39775c Fix typo: 'sring' -> 'string' in DVB subtitle decoder 2025-12-24 12:02:34 +01:00
Carlos Fernandez
23fe02f0d2 fix(installer): Skip license dialog with Publish overrides
Override the WixUI_InstallDir dialog sequence to skip the license
agreement dialog, restoring the original behavior before WiX v6 migration.

- WelcomeDlg Next button now goes directly to InstallDirDlg
- InstallDirDlg Back button returns to WelcomeDlg

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 11:47:33 +01:00
Carlos Fernandez
394fb39a9c fix(installer): Update DLL list to match current build output
The installer.wxs was referencing old FFmpeg DLLs that no longer exist:
- avcodec-57.dll → avcodec-60.dll
- avformat-57.dll → avformat-60.dll
- avutil-55.dll → avutil-58.dll
- swresample-2.dll → swresample-4.dll
- swscale-4.dll → swscale-7.dll

Added new DLLs that are now part of the build:
- avdevice-60.dll, avfilter-9.dll, postproc-57.dll
- libgpac.dll, OpenSVCDecoder.dll
- libcryptoMD.dll, libsslMD.dll
- desktop_drop_plugin.dll

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 10:07:01 +01:00
Harshdhall01
294bf5bc18 Fix typo: 'sring' -> 'string' in DVB subtitle decoder 2025-12-24 13:54:47 +05:30
Carlos Fernandez
4e52e61c91 fix: Remove duplicate WiX property declarations
The <ui:WixUI Id="WixUI_InstallDir" InstallDirectory="INSTALLFOLDER" />
element already defines WIXUI_INSTALLDIR (via the InstallDirectory attribute)
and ARPNOMODIFY (in the wixlib). Declaring them again causes WIX0091 errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 09:05:55 +01:00
Carlos Fernandez
faaaabf63c fix(installer): Add missing WIXUI_INSTALLDIR property and fix RemoveFolder ID
- Added WIXUI_INSTALLDIR property (required per WiX issue #7105)
- Changed RemoveFolder Id from "DesktopFolder" to "RemoveDesktopShortcut"
  to avoid ID conflict with StandardDirectory element

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 07:28:33 +01:00
Carlos Fernandez
f5a9018ef0 fix(release): Upgrade WiX from v4.0.0-preview.0 to v6.0.2 stable
The WiX build was failing due to several WiX v4 to v6 migration issues.

Workflow changes:
- Uninstall existing WiX before installing v6.0.2 (force clean install)
- WiX version: 4.0.0-preview.0 -> 6.0.2
- Extension: WixToolset.UI.wixext/4.0.0-preview.0 -> WixToolset.UI.wixext/6.0.2
- Fixed extension command syntax: "extension -g add" -> "extension add -g"

installer.wxs changes (WiX v6 migration):
- Added ui namespace: xmlns:ui="http://wixtoolset.org/schemas/v4/wxs/ui"
- Replaced custom inline UI with standard <ui:WixUI Id="WixUI_InstallDir">
  (fixes WIX0094 error for WixUIValidatePath custom action)
- Changed Directory to StandardDirectory for DesktopFolder (fixes WIX5437)

See: https://github.com/orgs/wixtoolset/discussions/6516
     https://github.com/wixtoolset/issues/issues/6998

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 07:14:18 +01:00
Carlos Fernandez
e01720c05e fix: Use WiX extension by name instead of hardcoded path
The WiX v4 extension path was hardcoded and didn't match the actual
installed location. WiX v4 allows referencing globally installed
extensions by name directly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 23:35:10 +01:00
Carlos Fernandez
f80b1f26ca fix(ci): Add -Force to Expand-Archive for Flutter GUI
The installer directory already has files from the copy step, so
Expand-Archive needs -Force to overwrite/merge.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 22:43:07 +01:00
GAURAV KARMAKAR
e42bc2b9f9 Fixed the merge conflict in ccx_encoders_common.h 2025-12-24 02:25:53 +05:30
Carlos Fernandez
f9ebfd2a32 fix(ci): Add vcpkg setup and fix permissions in release workflow
- Add permissions: contents: write for upload-release-assets
- Add vcpkg environment variables and setup steps from build_windows.yml
- Add gpac installation
- Add vcpkg clone, bootstrap, and dependency installation
- Add VCPKG_ROOT env var to build step
- Change runner to windows-2022 to match build workflow
- Add msbuild-architecture: x64
- Remove redundant llvm/clang setup (pre-installed on runner)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 21:53:40 +01:00
Gaurav karmakar
bf9841a255 Merge branch 'master' into gaurav-v1 2025-12-24 01:55:53 +05:30
Carlos Fernandez
9f670de8ed fix(windows): Use latest Windows SDK instead of hardcoded version
Changed WindowsTargetPlatformVersion from 10.0.22621.0 to 10.0 to
automatically use whichever Windows 10 SDK is installed on the build
machine. This fixes CI failures when the runner has a different SDK
version installed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 21:20:51 +01:00
Carlos Fernandez
fc4a14e7d6 0.96 release, for real 2025-12-23 21:09:47 +01:00
Carlos Fernandez Sanz
4f13b861cd Merge pull request #1888 from CCExtractor/fix/release-workflow-x64
fix(ci): Update Windows release build to use x64 platform
2025-12-23 21:03:16 +01:00
Carlos Fernandez
df692f296d fix(ci): Update Windows release build to use x64 platform
The solution file only has x64 configurations (Release-Full|x64,
Debug-Full|x64). The workflow was incorrectly trying to build with
Win32 platform which doesn't exist.

Changes:
- Platform=Win32 → Platform=x64
- Output path ./Release-Full/ → ./x64/Release-Full/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 20:58:56 +01:00
Carlos Fernandez Sanz
419fc4694d Changelog clean up and start of new version
docs: Add Upcoming section to changelog
2025-12-23 19:38:25 +01:00
Carlos Fernandez Sanz
fc230fc217 feat(teletext): Add multi-page extraction with separate output files (#665) 2025-12-23 19:37:12 +01:00
Carlos Fernandez
825e160e72 Clean up CHANGES.TXT 2025-12-23 19:33:23 +01:00
Carlos Fernandez
8e24c17c1e Clean up CHANGES.TXT 2025-12-23 19:30:32 +01:00
Carlos Fernandez
4e21fae053 docs: Add Upcoming section to changelog with teletext multi-page feature
Start new changelog section for unreleased changes. First entry is
the multi-page teletext extraction feature (#665) which allows
extracting multiple teletext pages simultaneously with separate
output files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 17:42:50 +01:00
Carlos Fernandez
be239a5c46 fix: Restore teletext auto-detect mode for single-page extraction
The page update logic at lines 1029-1035 was incorrectly updating
tlt_config.page for all accepted pages, even in single-page auto-detect
mode. This caused the auto-detect logic at line 979 to be bypassed
because the first packet (even with an invalid page number like 0xFF)
would set tlt_config.page, preventing proper auto-detection.

The fix restricts the page update to multi-page mode only. In single-page
mode, tlt_config.page is set exclusively by:
1. User specification (--tpage option)
2. Auto-detect logic (first valid subtitle page found)

This fixes regression in SP Test 76 which uses sample
8c1615c1a84d4b9b34134bde8085214bb93305407e935edcdfd4c2fc522c215f.mpg
with --autoprogram --out=ttxt --latin1.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 16:36:02 +01:00
Carlos Fernandez
1d9f32239e docs: Add doxygen comments to should_accept_page function
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 15:43:54 +01:00
Carlos Fernandez
cbb5f0b0a8 fix(clippy): Use RangeInclusive::contains() instead of manual range check
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 14:41:18 +01:00
Carlos Fernandez
fd063931ea feat(teletext): Add multi-page extraction with separate output files (#665)
Implement support for extracting multiple teletext pages simultaneously,
with each page output to a separate file.

Changes:
- Support multiple --tpage arguments (e.g., --tpage 397 --tpage 398)
- Create separate output files per page with _pNNN suffix
  (e.g., output_p397.srt, output_p398.srt)
- Maintain backward compatibility for single-page extraction (no suffix)
- Add per-page SRT counters for correct subtitle numbering
- Fix BCD to decimal page number conversion in telxcc.c
- Add --tpages-all mode support for auto-detecting all pages
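
Two of the changes above, the BCD page conversion and the _pNNN suffix, can be sketched in Python. The actual implementation is C code in telxcc.c and the output layer; `bcd_page_to_decimal` and `page_output_name` are hypothetical names used only for illustration.

```python
from pathlib import Path


def bcd_page_to_decimal(bcd: int) -> int:
    """Read each hex nibble of a BCD page number as one decimal digit."""
    hundreds, tens, units = (bcd >> 8) & 0xF, (bcd >> 4) & 0xF, bcd & 0xF
    if max(hundreds, tens, units) > 9:
        raise ValueError(f"not a valid BCD page number: {bcd:#x}")
    return hundreds * 100 + tens * 10 + units


def page_output_name(base: str, page: int, multi_page: bool) -> str:
    """Add a _pNNN suffix before the extension in multi-page mode only."""
    if not multi_page:
        return base  # single-page extraction keeps the original name
    path = Path(base)
    return str(path.with_name(f"{path.stem}_p{page:03d}{path.suffix}"))
```

Here `bcd_page_to_decimal(0x397)` gives 397, and `page_output_name("output.srt", 397, True)` gives `output_p397.srt`, matching the example filenames above.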

Tested with 21 teletext samples from the sample platform, all passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 14:28:15 +01:00
Carlos Fernandez Sanz
7a9acb7bd2 Merge pull request #1883 from CCExtractor/dependabot/github_actions/actions/upload-artifact-6
build(deps): Bump actions/upload-artifact from 4 to 6
2025-12-23 10:19:30 +01:00
Carlos Fernandez Sanz
cbf180eb39 build(deps): Bump actions/checkout from 4 to 6 2025-12-23 10:19:16 +01:00
Carlos Fernandez Sanz
614e6c42b5 build(deps): Bump softprops/action-gh-release from 1 to 2 2025-12-23 10:18:50 +01:00
Carlos Fernandez Sanz
38bcb7ed85 Merge pull request #1884 from CCExtractor/dependabot/github_actions/actions/cache-5
Routine dependency update for GitHub Actions
2025-12-23 09:32:05 +01:00
Carlos Fernandez Sanz
d57354830e chore: Bump version to 0.96 2025-12-23 00:06:45 +01:00
Carlos Fernandez Sanz
7b43201ce1 fix(mp4/mkv): Add HEVC/H.265 caption extraction for MP4 and Matroska containers 2025-12-23 00:06:12 +01:00
Carlos Fernandez Sanz
ea1c82ac17 [FIX] Handle NULL bitmap gracefully in OCR instead of crashing (#1010) 2025-12-23 00:05:32 +01:00
dependabot[bot]
b3f1e27f5c build(deps): Bump actions/cache from 4 to 5
Bumps [actions/cache](https://github.com/actions/cache) from 4 to 5.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-22 18:02:20 +00:00
dependabot[bot]
82c92d3910 build(deps): Bump actions/upload-artifact from 4 to 6
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 6.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-22 18:02:11 +00:00
dependabot[bot]
5bf8e7de0d build(deps): Bump actions/checkout from 4 to 6
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-22 18:02:04 +00:00
dependabot[bot]
5b8a9709df build(deps): Bump softprops/action-gh-release from 1 to 2
Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 1 to 2.
- [Release notes](https://github.com/softprops/action-gh-release/releases)
- [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md)
- [Commits](https://github.com/softprops/action-gh-release/compare/v1...v2)

---
updated-dependencies:
- dependency-name: softprops/action-gh-release
  dependency-version: '2'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-22 18:01:54 +00:00
Carlos Fernandez Sanz
063786c4b7 [FEATURE] Add AppImage build variants and CI workflow (#1348) 2025-12-22 09:12:36 +01:00
GAURAV KARMAKAR
6ed09ea397 SPUPNG: fix formatting to match clang-format 2025-12-22 13:22:25 +05:30
Carlos Fernandez
44363c0acd fix(mkv): Add HEVC/H.265 caption extraction for Matroska containers
Extends HEVC caption extraction support to MKV files.

Changes to matroska.h:
- Add hevc_codec_id constant for V_MPEGH/ISO/HEVC
- Add hevc_track_number field to matroska_ctx structure
- Add process_hevc_frame_mkv() function declaration

Changes to matroska.c:
- Detect HEVC tracks in parse_segment_track_entry()
- Modify parse_simple_block() to route HEVC tracks to HEVC processor
- Add process_hevc_frame_mkv() with is_hevc flag and store_hdcc() call
- Parse HEVCDecoderConfigurationRecord in parse_private_codec_data()
- Initialize hevc_track_number in matroska_loop()
- Update output messages to report HEVC tracks

Tested with HEVC MKV file - extracts 73 captions matching MP4 output.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 05:59:23 +01:00
Carlos Fernandez
701271ec82 fix(mp4): Add HEVC/H.265 caption extraction for MP4 containers
PR #1852 added HEVC caption extraction for MPEG-TS containers,
but MP4/MKV containers weren't supported. This adds HEVC support
for MP4 containers using GPAC.

Changes:
- Add HEVC subtype definitions (hev1, hvc1)
- Add process_hevc_sample() to parse HEVC NAL units and extract CC
- Add process_hevc_track() to iterate through HEVC track samples
- Detect and process HEVC tracks in processmp4()
- Add store_hdcc() call to flush buffered CC data after each sample

The key fix was adding store_hdcc() after processing each sample.
Without this, CC data was being parsed but never output because
store_hdcc() is normally called from slice_header() which is
AVC-only.

Closes #1690 (for MP4 containers)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 05:59:23 +01:00
Carlos Fernandez
7c74ea4112 docs: Add 0.96 (Unreleased) section to CHANGES.TXT
Move all changes made after the 0.95 version bump (commit ee232b5)
to a new 0.96 section marked as "Unreleased".

This separates the released 0.95 content from ongoing development
work that will be included in the next release.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 05:58:01 +01:00
Carlos Fernandez
ed42525f44 chore: Bump version to 0.96
Update version strings across all build configurations:
- src/lib_ccx/lib_ccx.h
- linux/configure.ac
- mac/configure.ac
- package_creators/PKGBUILD
- package_creators/ccextractor.spec
- package_creators/debian.sh
- OpenBSD/Makefile

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 05:58:01 +01:00
Carlos Fernandez
b88d1ebab2 fix(ci): Fix AppImage build failures for OCR and HardSubX variants
OCR build fix:
- linuxdeploy was failing with "Invalid magic bytes in file header"
  because it was passed the wrapper script instead of the actual binary
- When OCR is enabled, ccextractor is renamed to ccextractor.bin and
  a wrapper script sets TESSDATA_PREFIX before executing the binary
- Now correctly passes ccextractor.bin to linuxdeploy when it exists

HardSubX build fix:
- Add libavdevice-dev to FFmpeg dependencies in CI workflow
- rusty_ffmpeg requires libavdevice which was missing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 22:47:24 +01:00
Carlos Fernandez
ec11b00f9f fix(ci): Use correct Rust toolchain action name 2025-12-21 22:40:06 +01:00
Carlos Fernandez
8c0fe08781 feat: Add AppImage build variants and CI workflow (#1348)
Rewrites the AppImage build script to support three build variants
matching the Docker build options:
- minimal: Basic CCExtractor without OCR (smallest size)
- ocr: CCExtractor with OCR support (default)
- hardsubx: CCExtractor with burned-in subtitle extraction

Changes to build_appimage.sh:
- Add BUILD_TYPE environment variable to select variant
- Fix CMake options (was incorrectly using make flags)
- Bundle tessdata for OCR builds with wrapper script
- Create proper desktop file and icon handling
- Improve error handling and cleanup

New GitHub Actions workflow (build_appimage.yml):
- Builds all three variants on release
- Uploads AppImages as release assets
- Can be manually triggered for specific variants
- Caches GPAC build for faster CI runs

Usage:
  ./build_appimage.sh              # Builds 'ocr' variant
  BUILD_TYPE=minimal ./build_appimage.sh
  BUILD_TYPE=hardsubx ./build_appimage.sh

Closes #1348

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 22:37:22 +01:00
Carlos Fernandez
3304c1b094 fix(ocr): Handle NULL bitmap gracefully instead of crashing (#1010)
When processing DVB subtitles from live streams or corrupted files,
the bitmap clipping operation can fail, resulting in a NULL pix object.
Previously, this would cause a fatal crash with "Failed to perform OCR -
Failed to get text" because the code continued to call TessBaseAPIGetUTF8Text
even when no image was set.

Changes:
- Handle cpix_gs == NULL by logging a message and returning NULL
  (skip this bitmap) instead of continuing and crashing
- Change the fatal error when TessBaseAPIGetUTF8Text returns NULL
  to a non-fatal skip, since this can happen with empty/invalid bitmaps
- Both cases now properly clean up allocated resources before returning

This allows CCExtractor to gracefully skip problematic subtitle frames
instead of crashing, which is especially important for live streams
where packet loss or discontinuities can occur.

Fixes #1010

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 22:25:35 +01:00
Carlos Fernandez
5bad3732c3 chore: Remove plan files from git tracking
The plans/ directory is in .gitignore but these files were added
before that entry existed. Removing from tracking while keeping
files on disk.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 21:46:39 +01:00
Carlos Fernandez Sanz
e3b0defb49 build(rust): Upgrade bindgen to 0.72.1 for Fedora packaging 2025-12-21 21:38:02 +01:00
Carlos Fernandez Sanz
2065c5509d fix(windows): Fix c_long ABI mismatch causing Windows CI failures 2025-12-21 20:16:56 +01:00
Carlos Fernandez
5458370346 refactor: Replace c_longlong with i64 for consistency
For clarity and consistency, use explicit i64 instead of c_longlong.
While c_longlong is 64-bit on all platforms, i64 is clearer and
follows the same pattern as the previous commit that removed c_long.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 17:55:57 +01:00
Carlos Fernandez
9e19c58edf refactor: Replace platform-dependent 'long' with 'int64_t'
The C type 'long' has different sizes on different platforms:
- 64-bit Linux (LP64): 64-bit
- Windows (LLP64, including 64-bit builds): 32-bit

This causes ABI mismatches when interfacing with Rust, since Rust's
c_long matches the platform's long size, but we were treating these
values as 64-bit throughout.
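
The size difference is easy to observe from Python's standard ctypes module, which exposes the platform's C types. This is only a demonstration of the underlying ABI fact, not part of the CCExtractor code base.

```python
import ctypes

# Size in bytes of C `long` on the platform running this interpreter:
# typically 8 on 64-bit Linux and macOS, 4 on Windows.
long_size = ctypes.sizeof(ctypes.c_long)

# Fixed-width types have the same size everywhere.
int64_size = ctypes.sizeof(ctypes.c_int64)

print(f"long: {long_size} bytes, int64_t: {int64_size} bytes")
```

A field declared as `int64_t` therefore has the same layout on every platform, which is what the Rust side assumes when it uses `i64`.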

Changed the following fields from 'long' to 'int64_t':
- asf_constants.h: parsebufsize
- avc_functions.h: cc_databufsize, num_nal_unit_type_7, num_vcl_hrd,
  num_nal_hrd, num_jump_in_frames, num_unexpected_sei_length
- ccx_decoders_608.h: bytes_processed_608
- ccx_demuxer.h: capbufsize, capbuflen
- lib_ccx.h: ts_readstream() return type, FILEBUFFERSIZE
- file_functions.c: FILEBUFFERSIZE definition
- ts_functions.c: ts_readstream() implementation

Also updated Rust code in common.rs to remove c_long casts, since
bindgen will now generate i64 for these fields.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 17:52:24 +01:00
Carlos Fernandez Sanz
0bb56d508a fix(timing): Fix --goptime producing compressed timestamps 2025-12-21 17:50:53 +01:00
Carlos Fernandez
2c67381d2b fix(windows): Fix c_long ABI mismatch in demuxer.rs
The extern declaration for ccxr_add_current_pts used c_long, but the
actual implementation in time.rs uses i64. This caused an ABI mismatch
on Windows where:
- c_long = i32 (32-bit)
- i64 = 64-bit

On Linux both are 64-bit so it worked, but on Windows the type
mismatch could cause incorrect parameter passing.

Changes:
- Change extern fn declaration from c_long to i64
- Remove unnecessary cast (FRAME_DURATION_TICKS is already i64)
- Remove unused c_long import

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 17:00:50 +01:00
GAURAV KARMAKAR
2b708c4a31 Enhance SPUPNG offset calculations and XML tag handling in EIA608 encoder
- Introduced a forward declaration for .
- Updated  to calculate and set image dimensions before writing XML tags.
- Adjusted offset calculations based on screen size for better alignment of subtitles.
- Improved handling of the opening XML tag based on subtitle data presence.
2025-12-21 19:20:28 +05:30
Carlos Fernandez
94a43928ad fix(timing): Fix --goptime producing compressed timestamps (Test 163)
When using --goptime, timestamps were compressed to 00:00:01-02 instead
of actual GOP times (17:56:40-47). This was caused by conflicts between:
- GOP timing set from GOP headers (wall-clock time, e.g., 17:56:40)
- PES PTS timing (stream-relative time, e.g., 00:00:02)

The sync detection saw these as 64,598-second "jumps" and kept resetting
timing, corrupting the output.

Fixes:
1. Guard video PES timing in general_loop.c - skip set_current_pts and
   set_fts when use_gop_as_pts == 1 to prevent PES PTS from overwriting
   GOP-based timing
2. Disable sync check in ccextractor.c when use_gop_as_pts == 1 since
   GOP time and PES PTS are in different time bases and sync detection
   is meaningless

Test results:
- Before: 00:00:01,231 --> 00:00:01,729
- After:  17:56:41,319 --> 17:56:43,084

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 12:34:05 +01:00
Carlos Fernandez Sanz
25d68b75bd fix(708): Support Korean EUC-KR encoding in CEA-708 decoder 2025-12-21 12:23:39 +01:00
Carlos Fernandez
73cd19f5d0 fix(rust): Use i64 instead of c_long for Windows compatibility
On Windows, c_long is i32 while on 64-bit Linux it's i64. The function
ccxr_print_mstime_static expects i64, so casting to c_long caused
a type mismatch error on Windows builds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 09:43:27 +01:00
Carlos Fernandez
d0caf23a82 fix(timing): Use i64 instead of c_long for Windows compatibility
The Rust FFI functions were using c_long for PTS/FTS timestamps, but:
- C code uses LLONG (int64_t, 64 bits on all platforms)
- Rust c_long is 32 bits on Windows, 64 bits on 64-bit Linux

This caused timestamp truncation on Windows when PTS values exceeded
2^31 (~6.6 hours at 90kHz), resulting in wrong subtitle timestamps.

For example, a file with Min PTS of 23:50:45 (7,726,090,500 ticks)
would have its PTS truncated, breaking the teletext delta calculation
that normalizes timestamps to start at 0.

Changes:
- ccxr_add_current_pts: pts parameter i64
- ccxr_set_current_pts: pts parameter i64
- ccxr_get_fts: return type i64
- ccxr_get_visible_end: return type i64
- ccxr_get_visible_start: return type i64
- ccxr_get_fts_max: return type i64
- ccxr_print_mstime_static: mstime parameter i64
- fts_at_gop_start: extern static i64

Fixes tests 18 and 19 on Windows CI which showed raw PTS timestamps
(23:50:46) instead of normalized timestamps (00:00:00).
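
The truncation can be illustrated with a minimal Rust sketch. It is not
code from the repository: `truncate_like_win_c_long` and
`max_seconds_in_i32_at_90khz` are hypothetical helpers, and `i32` stands in
for a 32-bit `c_long`.

```rust
/// Hypothetical helper: models a 64-bit PTS passed through a 32-bit `long`
/// (the Windows `c_long`), using `i32` as a stand-in.
fn truncate_like_win_c_long(pts: i64) -> i32 {
    // `as` keeps only the low 32 bits and reinterprets them as signed.
    pts as i32
}

/// Whole seconds of 90 kHz ticks that fit in a signed 32-bit value.
fn max_seconds_in_i32_at_90khz() -> i64 {
    i64::from(i32::MAX) / 90_000
}
```

With the Min PTS from the example (7,726,090,500 ticks), the 32-bit value
comes out negative, so any delta computed from it is meaningless. A signed
32-bit value only covers 23,860 seconds of 90 kHz ticks.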

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 09:43:27 +01:00
Carlos Fernandez
da3dc52b45 fix(708): Support Korean EUC-KR encoding in CEA-708 decoder
Korean broadcasts use EUC-KR encoding (variable-width) in CEA-708
captions, where ASCII is 1 byte and Korean characters are 2 bytes.
The decoder was always writing 2 bytes per character (UTF-16BE style),
causing NULL bytes to be inserted before every ASCII character.

Changes:
- Add is_utf16_charset() to detect fixed-width 16-bit encodings
- Modify write_char() to accept use_utf16 flag:
  - true: Always 2 bytes (UTF-16BE for Japanese, issue #1451)
  - false: 1 byte for ASCII, 2 bytes for extended (EUC-KR for Korean)
- Detect charset type in write_row() before building output buffer

This fixes Korean subtitle extraction when using --service "1[EUC-KR]"
while maintaining compatibility with Japanese UTF-16BE (issue #1451).
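
The two output modes could look roughly like the sketch below. This is an
illustration only, not the decoder's actual `write_char()`: the signature is
hypothetical, and it assumes that a symbol whose high byte is zero is a
single-byte (ASCII) character.

```rust
/// Hypothetical stand-in for the decoder's `write_char()`.
/// Assumption: a symbol with a zero high byte is a single-byte character.
fn write_char(sym: u16, use_utf16: bool, out: &mut Vec<u8>) {
    let [hi, lo] = sym.to_be_bytes();
    if use_utf16 || hi != 0 {
        // Fixed-width 16-bit output, or a two-byte extended character.
        out.push(hi);
        out.push(lo);
    } else {
        // Variable-width charset: ASCII is written as one byte, no NUL prefix.
        out.push(lo);
    }
}
```

In variable-width mode 'A' (0x0041) is written as the single byte 0x41,
while in UTF-16BE mode it is written as 0x00 0x41.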

Closes #1065

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 09:43:27 +01:00
Carlos Fernandez Sanz
0fdfb751ba fix(708): Handle null timing pointer in CEA-708 settings conversion 2025-12-21 09:41:25 +01:00
Carlos Fernandez Sanz
0b5f13e2c4 feat(wtv): Add DVB teletext stream detection in WTV files 2025-12-21 09:40:59 +01:00
Carlos Fernandez
60cec9e6de style: Fix clang-format indentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 09:38:50 +01:00
Carlos Fernandez Sanz
d758f3156a fix(windows): Prevent CEA-708 output file truncation on Windows 2025-12-21 09:36:32 +01:00
Carlos Fernandez Sanz
da802a0a39 fix(security): Add bounds checks for buffer overflow vulnerabilities 2025-12-21 09:35:47 +01:00
Carlos Fernandez
8f78a8bbb2 fix(708): Handle null timing pointer in CEA-708 settings conversion
When converting CEA-708 decoder settings from C to Rust via from_ctype(),
a null timing pointer would cause the entire conversion to fail and return
None. This triggered the unwrap_or(default()) fallback, resetting critical
settings like `enabled` and `services_enabled` to false/0.

This caused CEA-708 captions to not be extracted (exit code 10) even when
--service was specified, because the decoder's is_active flag was reset
to 0 during demuxer initialization.

The fix handles null timing pointer gracefully by using a default
CommonTimingCtx instead of propagating None, preserving the other
decoder settings.
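
A minimal sketch of the idea, using hypothetical types (`TimingCtx`,
`DecoderSettings`, `settings_from_c`) rather than the real `from_ctype()`
code: a null timing pointer falls back to a default timing context while
the other fields are still converted.

```rust
/// Hypothetical, simplified timing context (not the real CommonTimingCtx).
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct TimingCtx {
    min_pts: i64,
    fts_now: i64,
}

/// Hypothetical, simplified decoder settings.
#[derive(Debug, PartialEq)]
struct DecoderSettings {
    enabled: bool,
    timing: TimingCtx,
}

/// Converts C-side values; a null `timing` pointer yields the default
/// timing context instead of failing the whole conversion.
///
/// Safety: `timing` must be null or point to a valid `TimingCtx`.
unsafe fn settings_from_c(enabled: i32, timing: *const TimingCtx) -> DecoderSettings {
    let timing = unsafe { timing.as_ref() }.copied().unwrap_or_default();
    DecoderSettings {
        enabled: enabled != 0,
        timing,
    }
}
```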

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 22:34:44 +01:00
Carlos Fernandez
e87807ec27 feat(wtv): Add DVB teletext stream detection in WTV files
This commit adds detection and basic handling of DVB teletext streams
in WTV (Windows TV) files. Previously, teletext streams were silently
ignored.

Changes:
- Add WTV_STREAM_TELETEXT GUID to wtv_constants.h
- Detect teletext streams by examining the format GUID at offset 0x4C
  in MSTVCAPTION stream metadata
- Initialize teletext decoder when teletext stream is found
- Add timing support for teletext streams
- Wrap teletext data in PES headers for the teletext decoder

Limitation: WTV files store teletext in Microsoft's VBI sample format,
which differs from standard DVB teletext data units. The decoder will
process the data but may not extract subtitles from all WTV files.
This is noted in a warning message shown when teletext is detected.
Even FFmpeg's libzvbi fails to decode this format in the test sample.

Addresses: #1391

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 21:58:50 +01:00
Carlos Fernandez
d097ec881c build(rust): Upgrade bindgen to 0.72.1 for Fedora packaging
Fixes #1608 - Update bindgen to enable Fedora Linux packaging.

- Upgrade bindgen from 0.64.0 to 0.72.1
- Fix deprecated CargoCallbacks API
- Replace (?i) regex flags with character classes for compatibility

The inline case-insensitivity flag (?i) causes bindgen 0.72.1 to
silently produce empty bindings. This fix uses [Dd][Tt][Vv][Cc][Cc]
character classes to match both lowercase (dtvcc_*) and uppercase
(DTVCC_*) type/function names.
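
For illustration, the character-class form can be derived mechanically from
a plain prefix. The helper below is hypothetical (the build script may
simply hard-code the pattern) and assumes the prefix contains only
identifier characters, so nothing needs regex escaping.

```rust
/// Hypothetical helper: expands each ASCII letter into an `[Xx]` class so
/// the pattern matches both cases without the inline `(?i)` flag.
/// Assumption: `prefix` holds only identifier characters (no regex escapes).
fn case_insensitive_class(prefix: &str) -> String {
    let mut pattern = String::new();
    for c in prefix.chars() {
        if c.is_ascii_alphabetic() {
            pattern.push('[');
            pattern.push(c.to_ascii_uppercase());
            pattern.push(c.to_ascii_lowercase());
            pattern.push(']');
        } else {
            pattern.push(c);
        }
    }
    pattern
}
```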

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 21:04:28 +01:00
Carlos Fernandez Sanz
87c898497a build(linux): Suppress find error when GPAC is not installed 2025-12-20 19:56:30 +01:00
Carlos Fernandez
49b698259d fix(windows): Prevent CEA-708 output file truncation on Windows
On Windows, when processing MP4/MOV files with CEA-708 captions, the
output file was being truncated to only the last subtitle. This occurred
because:

1. C code opened the file using open() and stored the fd in writer->fd
2. At end of processing, Rust's ccxr_flush_decoder was called
3. Rust checked writer->fhandle (a separate Windows-specific field)
4. Since fhandle was null (C only set fd), Rust called File::create()
5. File::create() truncates existing files, losing all previous content

The fix checks if fd is already valid before creating a new file. If fd
is valid, it converts it to a Windows handle using _get_osfhandle(),
avoiding the file truncation.
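
The truncating behaviour of `File::create()` is easy to reproduce in
isolation. The sketch below is not the actual fix (which reuses the existing
fd via _get_osfhandle()); `write_twice` is a hypothetical helper that only
contrasts re-creating a file with reopening it in append mode.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical demo: writes one line, reopens the file, writes another,
/// and returns the final file content.
fn write_twice(path: &Path, reopen_with_create: bool) -> io::Result<String> {
    let mut first = File::create(path)?;
    first.write_all(b"subtitle 1\n")?;
    drop(first);

    let mut second = if reopen_with_create {
        // Truncates the existing file: "subtitle 1" is lost.
        File::create(path)?
    } else {
        // Keeps the existing content and appends to it.
        OpenOptions::new().append(true).open(path)?
    };
    second.write_all(b"subtitle 2\n")?;
    drop(second);

    fs::read_to_string(path)
}
```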

Fixes #1449

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 19:55:12 +01:00
Carlos Fernandez
5715d6d315 build(linux): Suppress find error when GPAC is not installed
Redirect stderr to /dev/null for the GPAC source file search to avoid
showing "No such file or directory" error when GPAC is not installed.
The build continues to work correctly in both cases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 19:35:35 +01:00
Carlos Fernandez
9fddaab3b0 fix(security): Add bounds checks for buffer overflow vulnerabilities
Fixes two buffer overflow vulnerabilities reported in issues #1427 and #1428:

- #1428 (Global buffer overflow in slice_header): The slice_type value
  read from H.264 exp-golomb data was used to index slice_types[] array
  without bounds checking. Valid values are 0-9 per H.264 spec Table 7-6.
  Now validates slice_type < 10 before use.

- #1427 (Heap buffer overflow in parse_PMT): ES_info_length from PMT
  descriptor data was trusted without validation against buffer bounds.
  Malformed PMT with excessive ES_info_length could read past buffer end.
  Now validates ES_info_length and descriptor lengths against buffer.
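
The slice_type check amounts to a bounds-checked table lookup. The actual
fix is in C; the Rust sketch below only illustrates the idea, and
`SLICE_TYPES` and `slice_type_name` are hypothetical names. The table
follows H.264 Table 7-6, where values 5-9 repeat the types of 0-4.

```rust
/// Slice type names for slice_type 0..=9 (H.264 Table 7-6).
const SLICE_TYPES: [&str; 10] = ["P", "B", "I", "SP", "SI", "P", "B", "I", "SP", "SI"];

/// Hypothetical helper: returns `None` for out-of-range values instead of
/// indexing past the end of the table.
fn slice_type_name(slice_type: u32) -> Option<&'static str> {
    SLICE_TYPES.get(slice_type as usize).copied()
}
```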

Both issues were discovered using AddressSanitizer with crafted TS files.

Fixes #1427
Fixes #1428

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 19:34:22 +01:00
Carlos Fernandez Sanz
6fdfde0838 fix(mac): Fix HARDSUBX configure script and add documentation 2025-12-20 19:06:17 +01:00
Carlos Fernandez
8db7fc7a6d fix(mac): Correct leptonica library name in configure.ac
Homebrew installs leptonica as 'libleptonica.dylib', not 'liblept.dylib'.
Changed AC_CHECK_LIB from [lept] to [leptonica] to match the actual
library name on macOS.
2025-12-20 18:56:02 +01:00
Carlos Fernandez
d8504f80bd ci(mac): Set Homebrew paths for autoconf HARDSUBX build
The AC_CHECK_LIB checks in configure.ac need LDFLAGS and CPPFLAGS
to find libraries installed via Homebrew (in /opt/homebrew on Apple
Silicon or /usr/local on Intel Macs).
2025-12-20 18:48:43 +01:00
Carlos Fernandez
70404c29ca fix(mac): Fix HARDSUBX configure script and add documentation
Fixes #1173 - Error in ./configure enabling hardsubx on Mac
Fixes #1306 - Add HARDSUBX compilation docs for macOS

The configure.ac script failed on macOS with "binary operator expected"
because pkg-config output was unquoted. When pkg-config returns multiple
libraries (e.g., "-ltesseract -lcurl"), the unquoted expansion caused
`test ! -z` to receive multiple arguments instead of a single string.

Changes:
- Quote pkg-config output in TESSERACT_PRESENT conditional (mac & linux)
- Add macOS section to docs/HARDSUBX.txt with all build methods
- Add GitHub Actions jobs to test HARDSUBX builds on macOS:
  - build_shell_hardsubx: Tests ./build.command -hardsubx
  - build_autoconf_hardsubx: Tests ./configure --enable-hardsubx

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 18:41:37 +01:00
Carlos Fernandez Sanz
feb2a61c1d fix(ts): Skip broken PES packets instead of terminating file processing 2025-12-20 18:22:22 +01:00
Carlos Fernandez Sanz
6503502624 fix(mcc): Add MCC output support for raw caption files 2025-12-20 18:21:39 +01:00
Carlos Fernandez Sanz
bf271de52c build(mac): Add -system-libs flag for Homebrew compatibility 2025-12-20 18:20:59 +01:00
Carlos Fernandez Sanz
67e560d288 build(autoconf): Add GPAC library detection to configure 2025-12-20 18:19:57 +01:00
Carlos Fernandez Sanz
54bc97a3f8 fix(hevc): Add HEVC/H.265 caption extraction support with B-frame reordering 2025-12-20 18:18:27 +01:00
Carlos Fernandez Sanz
3d7c534824 ci: Add Docker build workflow to test all image variants 2025-12-20 18:13:49 +01:00
Carlos Fernandez
eda489265d fix(mac): Correct lib_hash include path for system-libs build
The include "../lib_hash/sha2.h" in params.c requires an include path
that makes "../lib_hash" resolve to "thirdparty/lib_hash".

Changed -I../src/lib_hash (which doesn't exist) to
-I../src/thirdparty/lib_hash. With this path, the compiler searches
for "../lib_hash/sha2.h" as:
  ../src/thirdparty/lib_hash/../lib_hash/sha2.h
  = ../src/thirdparty/lib_hash/sha2.h ✓

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 18:13:12 +01:00
Carlos Fernandez
0ac093e4b2 ci: Add Docker build workflow to test all image variants
Tests all three Dockerfile build types in parallel:
- minimal: Basic CCExtractor without OCR
- ocr: CCExtractor with Tesseract OCR support
- hardsubx: CCExtractor with burned-in subtitle extraction

Each job builds from local source and verifies the image works
by running --version. Uses GitHub Actions cache for faster rebuilds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 18:06:27 +01:00
Carlos Fernandez
6838666b79 build(mac): Add -system-libs flag for Homebrew compatibility
Add a new `-system-libs` flag to mac/build.command that uses
system-installed libraries via pkg-config instead of bundled ones.
This enables Homebrew formula compatibility while preserving the
default standalone build behavior.

When `-system-libs` is passed:
- Uses pkg-config for: freetype2, gpac, libpng, libprotobuf-c,
  libutf8proc, zlib
- Does not compile bundled thirdparty sources
- Links against system libraries

Default behavior (no flag):
- Compiles bundled libraries as before
- No change to existing builds

Also adds a CI job `build_shell_system_libs` to test the new flag.

Refs #1580, #1534

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 17:58:46 +01:00
Carlos Fernandez
08d59ecb5f build(autoconf): Add GPAC library detection to configure
Previously, configure would succeed even without GPAC installed,
leading to a confusing compile-time error:
  "gpac/isomedia.h: No such file or directory"

Now configure checks for GPAC via pkg-config and fails early with
a helpful error message listing the package names for common distros:
  - gpac-devel (Fedora/RHEL)
  - libgpac-dev (Debian/Ubuntu)
  - gpac (Arch)

Fixes #1584

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 17:36:54 +01:00
Carlos Fernandez Sanz
2ce3e0c0de fix(docker): Rewrite Dockerfile to fix broken builds 2025-12-20 17:29:14 +01:00
Carlos Fernandez
3f45a4e136 fix(docker): Rewrite Dockerfile to fix broken builds
Fixes #1550 - Docker builds were broken after PR #1535 switched from
vendored GPAC to system GPAC.

Changes:
- Switch from Alpine to Debian Bookworm (Alpine's musl libc has issues
  with Rust bindgen's libclang dynamic loading)
- Support three build variants via BUILD_TYPE argument:
  - minimal: No OCR support
  - ocr (default): Tesseract OCR for bitmap subtitles
  - hardsubx: OCR + FFmpeg for burned-in subtitle extraction
- Support dual source modes via USE_LOCAL_SOURCE argument:
  - 0 (default): Clone from GitHub (standalone Dockerfile)
  - 1: Use local source (faster for developers)
- Add .dockerignore to exclude build artifacts (~2.7GB -> ~900KB context)
- Update README.md with comprehensive build instructions

Tested all three variants successfully:
- minimal: ~130MB image
- ocr: ~215MB image
- hardsubx: ~610MB image

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 17:27:42 +01:00
Carlos Fernandez
d0d46fc176 fix(mcc): Add MCC output support for raw caption files
Previously, when using -out=mcc with raw input files (-in=raw),
CCExtractor would print "Output format not supported" and produce
no output. This was because the raw file processing path decoded
CEA-608 data to text, but MCC format requires raw cc_data bytes.

The fix adds a new code path that bypasses the 608 decoder when
MCC output is requested:

- Added process_raw_for_mcc() helper function that:
  - Converts 2-byte raw pairs to 3-byte cc_data format
  - Wraps each CC pair in CDP format via mcc_encode_cc_data()
  - Maintains proper timing at 29.97fps

- Modified raw_loop() to detect MCC output and use the new path

Test results with McPoodle raw files:
- Before: "Output format not supported" (exit code 10)
- After: Valid MCC file with proper timing and CDP-wrapped data
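
The 2-byte to 3-byte conversion can be sketched as follows. This is not
process_raw_for_mcc() itself (which is C): the function names are
hypothetical, the 0xFC header byte (marker bits set, cc_valid=1, cc_type=0)
is an assumption about field 1 data, and one pair per frame is assumed for
the timing helper.

```rust
/// Assumed cc_data header byte: marker bits, cc_valid = 1, cc_type = 0.
const CC_VALID_FIELD1: u8 = 0xFC;

/// Hypothetical helper: prefixes each raw 2-byte pair with a header byte.
/// A trailing odd byte is ignored.
fn raw_pairs_to_cc_data(raw: &[u8]) -> Vec<[u8; 3]> {
    raw.chunks_exact(2)
        .map(|pair| [CC_VALID_FIELD1, pair[0], pair[1]])
        .collect()
}

/// Time of frame `n` in milliseconds at 29.97 fps (30000/1001),
/// assuming one caption pair per frame.
fn frame_time_ms(n: u64) -> u64 {
    n * 1001 / 30
}
```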

Fixes #1542

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 11:53:50 +01:00
Carlos Fernandez
3e9ed3043b fix(ts): Skip broken PES packets instead of terminating file processing
Fixes #1455

When read_video_pes_header() encounters a malformed or truncated PES
packet (returns -1), copy_capbuf_demux_data() previously returned
CCX_EOF which terminated the entire file processing. This was overly
aggressive - a single broken PES packet should be skipped, not
terminate the entire file.

UK Freeview DVB recordings from September 2022 onwards contain some
malformed PES packets in the DVB subtitle stream that triggered this
condition, causing ccextractor to stop at 0% with "Processing ended
prematurely" error even though VLC could display the subtitles.

The fix changes the error handling to skip the broken packet and
continue processing:
- Before: return CCX_EOF (terminates file)
- After: return CCX_OK (skips packet, continues)

Test results with UK Freeview sample:
- Before: 0% processed, 0 subtitles extracted
- After: 100% processed, 10 subtitles extracted correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 11:08:18 +01:00
Carlos Fernandez
1bdd9abd35 fix(clippy): Suppress dead_code warnings for unused HEVC NAL constants
The HEVC NAL type constants are defined for completeness and reference,
but not all are currently used in the codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 10:50:47 +01:00
Carlos Fernandez
9e970fd788 style: Run cargo fmt on avc/core.rs
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 10:35:18 +01:00
Carlos Fernandez
87bc1d9613 style: Fix clang-format issue in ts_functions.c
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 10:34:50 +01:00
Carlos Fernandez
440cd5527f fix(hevc): Fix garbled captions by implementing B-frame reordering
HEVC uses B-frames extensively, causing CC data to arrive in decode
order instead of presentation order. This was causing character pairs
to be scrambled (e.g., "MEDIOCRE" became "MIOEDCRE").

Changes:
- Implement PTS-based sequence numbering for HEVC CC data (similar to H.264)
- Change flush logic to only trigger on IDR frames (not every VCL NAL)
- Add HEVC fallback detection for streams without PAT/PMT
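
The reordering idea, reduced to a sketch: collect caption pairs together
with their PTS and emit them sorted by PTS. `CcPair`, `reorder_by_pts` and
`to_text` are hypothetical names, not the identifiers used in the patch.

```rust
/// Hypothetical caption pair tagged with its presentation timestamp.
#[derive(Debug, Clone, PartialEq)]
struct CcPair {
    pts: i64,
    data: [u8; 2],
}

/// Sorts pairs from decode order into presentation order.
/// `sort_by_key` is stable, so pairs with equal PTS keep their order.
fn reorder_by_pts(mut pairs: Vec<CcPair>) -> Vec<CcPair> {
    pairs.sort_by_key(|p| p.pts);
    pairs
}

/// Renders the pairs as text, treating each byte as one character.
fn to_text(pairs: &[CcPair]) -> String {
    pairs
        .iter()
        .flat_map(|p| p.data.iter().map(|&b| b as char))
        .collect()
}
```

For example, pairs arriving in decode order as "ME", "RE", "DI", "OC" with
PTS 0, 9009, 3003, 6006 read "MEDIOCRE" once sorted.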

Fixes #1639 (ATSC 3.0 HEVC caption extraction)
Tested with issue_1639_sample.ts and caption_test_1690.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 10:34:50 +01:00
Carlos Fernandez
0fbbc06bcf fix(hevc): Add HEVC/H.265 caption extraction support
Fixes #1690 - Captions fail to extract on HEVC video stream

HEVC video streams with embedded EIA-608/708 captions weren't being
extracted, even though VLC/MPV could display them.

Root causes fixed:
1. HEVC stream type (0x24) wasn't recognized for CC extraction
2. HEVC NAL parsing used H.264 format (1-byte) instead of HEVC (2-byte)
3. HEVC SEI types (39/40) weren't handled (only H.264 SEI type 6)
4. CC data accumulation across SEIs caused u8 overflow/garbled output

Changes:
- C code: Add HEVC stream detection, CCX_HEVC buffer type, is_hevc flag
- Rust code: HEVC NAL header parsing (2-byte, type=(byte[0]>>1)&0x3F),
  HEVC SEI handling (PREFIX_SEI=39, SUFFIX_SEI=40), immediate CC flush
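
The header difference can be shown in a few lines. This is a sketch with
hypothetical function names, not the code from the patch; the bit layout
and the SEI type values are the ones quoted above.

```rust
const HEVC_NAL_PREFIX_SEI: u8 = 39;
const HEVC_NAL_SUFFIX_SEI: u8 = 40;

/// H.264: 1-byte NAL header, type in the low 5 bits.
fn h264_nal_type(header: u8) -> u8 {
    header & 0x1F
}

/// HEVC: 2-byte NAL header, type in bits 1..=6 of the first byte.
fn hevc_nal_type(header: [u8; 2]) -> u8 {
    (header[0] >> 1) & 0x3F
}

/// True for the two HEVC SEI NAL types.
fn is_hevc_sei(header: [u8; 2]) -> bool {
    matches!(
        hevc_nal_type(header),
        HEVC_NAL_PREFIX_SEI | HEVC_NAL_SUFFIX_SEI
    )
}
```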

Thanks to @trufio465-bot for the initial research in PR #1735.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 10:34:50 +01:00
Carlos Fernandez Sanz
5f0c6728bf fix(avc): Handle streams that don't start with NAL start codes 2025-12-20 01:33:37 -08:00
Carlos Fernandez Sanz
b9aabcd60d fix(raw): Fix premature EOF and timing overflow in raw_loop 2025-12-20 01:32:43 -08:00
Carlos Fernandez Sanz
d0243237db fix(args): Add backward compatibility for single-dash long options 2025-12-20 01:32:08 -08:00
Carlos Fernandez Sanz
a86a4ca7ce feat: Add --list-tracks option to list media file tracks 2025-12-20 01:31:38 -08:00
Carlos Fernandez
77624ec678 style: Run cargo fmt on Rust code
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 10:27:22 +01:00
Carlos Fernandez
73db3a2c39 fix(avc): Handle streams that don't start with NAL start codes (#1626)
The AVC parser would fail with "Leading bytes are non-zero" error when
processing HLS/Twitch stream segments that start mid-stream without
proper NAL unit headers at the beginning.

Root cause: When process_avc encountered non-zero leading bytes, it
returned an error with 0 bytes processed. The C code would not remove
any bytes from the buffer, causing subsequent data to accumulate with
the corrupt beginning, leading to infinite errors.

Fix:
- Add find_nal_start_code() to search for valid NAL start codes
- If buffer doesn't start with 0x00 0x00, search for first NAL start
- Skip garbage data before first valid NAL unit
- Return full buffer length when no NAL found (clears the buffer)
- Change forbidden_zero_bit error from fatal to skip-and-continue
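
A minimal sketch of the search, assuming the 3-byte start code
0x00 0x00 0x01 (a 4-byte 0x00 0x00 0x00 0x01 code is found one byte into
its leading zeros). The function bodies are illustrative and may differ
from the real find_nal_start_code(); `bytes_to_skip` is a hypothetical
helper.

```rust
/// Returns the offset of the first 0x00 0x00 0x01 sequence, if any.
fn find_nal_start_code(buf: &[u8]) -> Option<usize> {
    buf.windows(3).position(|w| w == [0x00, 0x00, 0x01])
}

/// Hypothetical helper: number of leading bytes to discard. When no start
/// code exists the whole buffer is reported, so the caller clears it.
fn bytes_to_skip(buf: &[u8]) -> usize {
    find_nal_start_code(buf).unwrap_or(buf.len())
}
```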

Tested with 6 Twitch HLS sample files - all now process correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 09:08:14 +01:00
Carlos Fernandez
dd3dab7d52 fix(args): Add backward compatibility for single-dash long options (#1576)
Old versions of ccextractor accepted single-dash long options like
-quiet, -stdout, -autoprogram. The new Rust-based argument parser
(clap) only accepts double-dash options (--quiet, --stdout, etc.).

When users ran scripts with -quiet, clap parsed it as individual
short options -q -u -i -e -t and failed with exit code 7. Users
with stderr redirected never saw the error, causing silent failures
with zero-length output files.

This adds a normalize_legacy_option() function that pre-processes
arguments before passing them to clap:
- Single-dash long options (e.g., -quiet) convert to --quiet
- Double-dash options remain unchanged
- Short options like -o remain unchanged
- Numeric options like -1, -12 remain unchanged
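
The four rules above can be sketched as follows. This is an illustration
only; the real normalize_legacy_option() may have a different signature and
handle more cases.

```rust
/// Sketch of the normalization rules (signature is an assumption).
fn normalize_legacy_option(arg: &str) -> String {
    if let Some(rest) = arg.strip_prefix('-') {
        let is_legacy_long = rest.len() > 1            // not a short option like -o
            && !rest.starts_with('-')                  // not already --long
            && !rest.chars().all(|c| c.is_ascii_digit()); // not numeric like -12
        if is_legacy_long {
            return format!("--{rest}");
        }
    }
    arg.to_string()
}
```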

Includes 6 unit tests for the new function.

Fixes #1576

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 08:54:48 +01:00
Carlos Fernandez
ebfa31c333 fix(raw): Fix premature EOF and timing overflow in raw_loop (#1565)
Fix raw caption file processing that would stop at exactly 9:43:00 (2MB).

Root causes and fixes:
1. Premature EOF: After processing first chunk (BUFSIZE ~2MB), data->len
   was never reset. On next iteration, general_get_more_data() calculated
   want = BUFSIZE - len = 0 and returned EOF immediately.
   Fix: Reset data->len = 0 after each chunk and change loop condition.

2. 32-bit integer overflow: The calculation cb_field1 * 1001 / 30 * 90
   overflowed for large cb_field1 values (>1M). For example,
   34,989,487 * 90 = 3,149,053,830 exceeds 32-bit signed max.
   Fix: Cast cb_field1 to LLONG before multiplication.

3. Timing initialization: Raw mode needs min_pts=0, sync_pts=0, and
   pts_set=MinPtsSet for correct fts_now calculation.
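
The overflow in item 2 can be reproduced with checked arithmetic. The
sketch below uses hypothetical function names; the input 1,048,636 is
chosen here so that the intermediate value equals the 34,989,487 quoted
above.

```rust
/// The original calculation in 32-bit arithmetic; `None` signals overflow.
fn pts_ticks_i32(cb_field1: i32) -> Option<i32> {
    cb_field1.checked_mul(1001)?.checked_div(30)?.checked_mul(90)
}

/// The same calculation after widening to 64 bits first
/// (the equivalent of casting to LLONG in C).
fn pts_ticks_i64(cb_field1: i32) -> i64 {
    i64::from(cb_field1) * 1001 / 30 * 90
}
```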

Tested with sample files from issue #1565:
- DTV3.raw: Now processes to 17:59:56 (was stopping at 9:43)
- DTV4.raw: Now processes to 14:00:00 (was stopping at 9:43)
- DTV5.raw: Now processes to 13:19:59 (was stopping at 9:43)

Closes #1565

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 08:37:52 +01:00
Carlos Fernandez
d52d26baf8 style: Format Rust code with cargo fmt
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 07:47:17 +01:00
Carlos Fernandez
3a852b7915 feat: Add --list-tracks option to list media file tracks
Add a new --list-tracks (-L) option that lists all tracks found in
media files without processing them. This is useful for exploring
media files before caption extraction.

Supports:
- Matroska (MKV/WebM) files
- MP4/MOV files
- MPEG Transport Stream files

The feature is implemented entirely in Rust with native parsers for
each format, avoiding dependency on external libraries.

Closes #1669

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 07:42:38 +01:00
Carlos Fernandez Sanz
c3f637a10e fix(rust): Handle NULL file pointer in ccxr_demuxer_open for UDP/TCP input 2025-12-19 07:44:16 -08:00
Carlos Fernandez Sanz
f3768625c6 fix(wtv): Set sync_pts alongside min_pts to prevent PTS jump detection 2025-12-19 07:43:39 -08:00
Carlos Fernandez
c733902473 fix(wtv): Set sync_pts alongside min_pts to prevent PTS jump detection
The previous WTV timing fix (commit 300f8ca6) set min_pts and pts_set=2
(MinPtsSet) but didn't set sync_pts. This caused the Rust timing code
to detect a massive PTS jump when processing WTV files with large
initial timestamps (e.g., files recorded at 18:38:23).

The PTS jump detection computes (current_pts - sync_pts), and with
sync_pts=0 but current_pts=6039323550 (18:38:23 in PTS units), the
difference exceeded MAX_DIF and triggered the jump handling, resulting
in empty output.

This fix sets sync_pts to the same value as min_pts when first
initializing timing, preventing the false PTS jump detection.
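
A sketch of the check, under stated assumptions: `is_pts_jump` is a
hypothetical function, the threshold is passed in because the value of
MAX_DIF is not given here, and the real detection logic may differ in
detail.

```rust
/// Hypothetical jump check: true when the distance between the current PTS
/// and the sync PTS exceeds `max_dif_ticks` (all values in 90 kHz ticks).
fn is_pts_jump(current_pts: i64, sync_pts: i64, max_dif_ticks: i64) -> bool {
    (current_pts - sync_pts).abs() > max_dif_ticks
}
```

With sync_pts left at 0 and current_pts = 6039323550, any threshold of a few
seconds reports a jump; with sync_pts initialized to the same value as
min_pts, the first difference is 0 and no jump is reported.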

Test results:
- Before: WTV files with large initial PTS produced empty output
- After: Timestamps match expected ground truth exactly
  (e.g., 00:00:00,601 --> 00:00:02,801 for first caption)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 16:40:58 +01:00
Carlos Fernandez
6c44100f97 fix(rust): Handle NULL file pointer in ccxr_demuxer_open for UDP/TCP input
When using --udp or --tcp options, ccxr_demuxer_open() was called with
a NULL file pointer, causing a crash in CStr::from_ptr().

The fix checks if the file pointer is NULL before dereferencing it,
and uses an empty string for network input modes.

Fixes #1846

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 15:30:41 +01:00
Carlos Fernandez Sanz
a0593c60e3 fix: RCWT/WTV timing fixes, Latin-1 music note encoding 2025-12-19 06:25:05 -08:00
Carlos Fernandez
300f8ca65a fix(wtv,encoding): Fix WTV timing and Latin-1 music note encoding
WTV timing fix:
- Set min_pts on first valid timestamp to enable fts_now calculation
- Set pts_set = 2 (MinPtsSet) instead of 1 (Received)
- This fixes WTV files where all timestamps were clustered around 1 second
  instead of being spread across the actual video duration

Latin-1 encoding fix:
- Change music note substitution from pilcrow (0xB6) to '#' (0x23)
- Pilcrow caused grep to treat output files as binary
- '#' is a more recognizable substitute for the musical note character

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 14:00:35 +01:00
Carlos Fernandez
8988152fa5 fix(rcwt): Fix timestamp calculation for RCWT/BIN format files
The rcwt_loop() function set min_pts = 0 for RCWT files but did not
set pts_set = 2 (MinPtsSet). This caused the Rust timing code to skip
the fts_now calculation (which checks pts_set == MinPtsSet), resulting
in all captions having timestamps compressed near 0 instead of their
correct times spread across the file duration.

The fix adds pts_set = 2 after setting min_pts, which tells the timing
system that min_pts is valid and fts_now can be calculated properly.

Fixes Test 217 timing issue where:
- Before: 00:00:00,001 --> 00:00:00,091 (wrong)
- After:  00:00:02,402 --> 00:00:04,536 (correct)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 11:50:57 +01:00
Carlos Fernandez
78642bcf02 ci: Retrigger Sample Platform CI 2025-12-19 09:24:12 +01:00
GAURAV KARMAKAR
609a53f373 [BUG] -out=spupng with EIA608/teletext: offset values in XML may be not correct #893 2025-12-19 13:27:08 +05:30
Carlos Fernandez
0c0e44472d ci: Trigger verification run after merging PRs #1847 and #1848
This PR triggers a fresh CI run to verify the combined effect of:
- PR #1847: Hardsubx crash fix, memory leak fixes, rcwt exit code fix
- PR #1848: XDS empty content entries fix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 07:08:59 +01:00
Carlos Fernandez Sanz
2060db99c8 fix(hardsubx): Fix heap corruption from Rust/C allocator mismatch 2025-12-18 22:02:30 -08:00
Carlos Fernandez Sanz
a299d06d97 fix(xds): Don't output empty XDS content entries 2025-12-18 22:02:04 -08:00
Carlos Fernandez
50b51e4234 fix(xds): Don't output empty XDS content entries
When outputting US TV Parental Guidelines ContentAdvisory XDS data,
the code was always calling xdsprint() for both the age rating and
the content flags (violence, language, etc). However, if there are
no content flags (e.g., for TV-G which has no additional advisories),
the content string is empty.

This caused duplicate XDS entries in the output - one with the age
rating and one with an empty string. The fix only outputs the content
string if it is not empty.

Fixes regression test 113 output mismatch.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 05:48:51 +01:00
Carlos Fernandez
0b74c9226a fix(rcwt): Fix incorrect exit code when captions are found in BIN format
The rcwt_loop function was returning exit code 10 (no captions) even
when CEA-608 captions were successfully extracted from RCWT/BIN format
files. This happened because CEA-608 decoding writes directly to the
encoder via printdata() without setting dec_sub->got_output.

Add a check after the main loop (similar to general_loop) that also
considers enc_ctx->srt_counter, enc_ctx->cea_708_counter, and
dec_ctx->saw_caption_block to properly detect when captions were found.

Fixes regression test 217 which was failing with exit code 10.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 05:40:52 +01:00
Carlos Fernandez
80957d645b fix(hardsubx): Fix heap corruption from Rust/C allocator mismatch
The hardsubx code was using C's free() on strings allocated by Rust's
CString::into_raw(). Since Rust and C use different memory allocators,
this caused heap corruption that manifested as garbage OCR output after
processing ~27 subtitle frames.

Changes:
- Export free_rust_c_string() from Rust as extern "C" function
- Declare free_rust_c_string() in hardsubx.h for C code
- Replace free(subtitle_text) with free_rust_c_string(subtitle_text)
  in hardsubx_decoder.c for Rust-allocated strings
- Fix memory leaks in process_hardsubx_linear_frames_and_normal_subs()
  where subtitle_text_hard and prev_subtitle_text_hard were not freed
- Remove dummy CI trigger file (no longer needed)

Testing:
- AddressSanitizer: No memory errors detected
- Valgrind: 0 bytes definitely lost, 0 bytes indirectly lost
- Manual testing: OCR output now correct for entire video duration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 05:29:04 +01:00
Carlos Fernandez
80a117e643 fix(hardsubx): Fix memory leaks in hardsubx processing
- Free basefilename in _dinit_hardsubx (allocated by get_basename)
- Free subtitle_text after each frame processing iteration
- Free prev_subtitle_text when replaced and at end of function
- Free sws_ctx with sws_freeContext (was never freed)

Reduces memory leaks from 63,926 bytes to 0 bytes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 04:46:19 +01:00
Carlos Fernandez
63999369b7 fix(hardsubx): Fix multiple memory bugs causing crashes
1. Remove invalid free(tessdata_path) - probe_tessdata_location() returns
   a pointer to static strings or getenv() result, not heap memory.

2. Fix alloc-dealloc mismatch in OCR text handling:
   - TessBaseAPIGetUTF8Text() allocates with C++ operator new[]
   - The code was freeing with C free() causing allocator mismatch
   - Now properly copy string and use TessDeleteText() before returning
   - Unified all OCR text return paths to use Rust-allocated strings

3. Previous fix: freep(&lctx->dec_sub) instead of freep(lctx->dec_sub)

These fixes resolve Test 241 (Hardsubx) crash on Sample Platform.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 04:40:31 +01:00
Carlos Fernandez
0e815c6e2d fix(hardsubx): Fix crash in _dinit_hardsubx due to incorrect freep usage
The freep() function expects a pointer-to-pointer (void**) so it can
dereference, free, and NULL-out the pointer. The code was passing
lctx->dec_sub directly instead of &lctx->dec_sub.

This caused freep to interpret the first 8 bytes of the cc_subtitle
struct as a pointer and attempt to free() it, resulting in a crash
(SIGABRT/exit code 134) in the memory allocator.

Fixes Test 241 (Hardsubx) crash on Sample Platform.
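
The pointer-to-pointer contract described above can be sketched as follows. This is only an illustration: `freep_sketch`, `ctx_sketch`, and `demo_freep` are hypothetical names, not the project's actual `freep()` implementation or hardsubx context.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for freep(): takes the ADDRESS of a pointer,
 * frees what that pointer refers to, and resets the pointer to NULL. */
static void freep_sketch(void *arg)
{
    void *target;

    assert(arg != NULL);
    memcpy(&target, arg, sizeof(target)); /* read the pointer stored at arg */
    free(target);                         /* free(NULL) is a harmless no-op */
    target = NULL;
    memcpy(arg, &target, sizeof(target)); /* overwrite it with NULL */
}

/* Hypothetical struct standing in for the hardsubx context. */
struct ctx_sketch
{
    char *dec_sub;
};

/* Returns 1 when the field was freed and cleared, -1 if malloc failed. */
static int demo_freep(void)
{
    struct ctx_sketch ctx;

    ctx.dec_sub = malloc(32);
    if (ctx.dec_sub == NULL)
        return -1;

    freep_sketch(&ctx.dec_sub); /* correct: pass the address of the field */
    /* freep_sketch(ctx.dec_sub) would instead read the first bytes of the
     * buffer as if they were a pointer and hand them to free(). */
    return ctx.dec_sub == NULL;
}
```

Because the helper NULLs the field, a second call on the same field is harmless, which is the main reason to prefer this shape over a bare `free()`.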

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 04:33:11 +01:00
Carlos Fernandez
0ef7227d7e ci: Add dummy C file to trigger Sample Platform CI
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 04:04:58 +01:00
Carlos Fernandez
2fa023b9fe ci: Add triage tracking file for December 2025 CI analysis
This PR triggers a fresh CI run to analyze all failing regression tests
and determine whether each needs a ground truth update or a code fix.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 04:01:18 +01:00
Carlos Fernandez Sanz
2f0770d45f docs: Update CHANGES.TXT with recent bug fixes 2025-12-18 04:20:41 -08:00
Carlos Fernandez
ee36ac1d4d docs: Update CHANGES.TXT with recent bug fixes
Add changelog entries for recent merged PRs:
- Fix: Garbled captions from HDHomeRun and I/P-only H.264 streams (#1109)
- Fix: Enable stdout output for CEA-708 captions on Windows (#1693)
- Fix: McPoodle DVD raw format read/write (#1524)
- Fix: Variable shadowing in general_loop
- Fix: Double-free crash in teletext cleanup
- Fix: Uninitialized memory and memory leaks (Valgrind)
- Fix: Dangling pointers in Rust FFI
- New: Teletext subtitle pages in -out=report (#1034)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:19:13 +01:00
Carlos Fernandez Sanz
e160a533b0 fix: McPoodle DVD raw format read/write (Issue #1524) 2025-12-18 04:16:47 -08:00
Carlos Fernandez Sanz
083c12698f fix: Enable stdout output for CEA-708 captions on Windows 2025-12-18 04:11:42 -08:00
Carlos Fernandez
88fbe9190a style: Fix formatting and clippy warnings
- Fix comment spacing (single space before //)
- Mark is_two_byte_loop_marker as #[cfg(test)] since it's only used in tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:08:21 +01:00
Carlos Fernandez
ac49bb5978 fix: McPoodle DVD raw format read/write (Issue #1524)
Reading:
- Migrate DVD raw parser from C to Rust (src/rust/src/demuxer/dvdraw.rs)
- Add FFI exports: ccxr_process_dvdraw(), ccxr_is_dvdraw_header()
- Handle both McPoodle's single-byte and legacy 2-byte loop markers
- Add 15 unit tests covering all edge cases

Writing:
- Fix LC3/LC4 constants from 2-byte to 1-byte to match McPoodle's format
- Output files now have identical size to McPoodle's original

Fixes #1524

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:03:29 +01:00
Carlos Fernandez Sanz
138ccd01c2 fix: Fix garbled captions from HDHomeRun and I/P-only H.264 streams 2025-12-18 04:01:44 -08:00
Carlos Fernandez
9fe2dab6d4 style: Remove unused mut from current_index variable
Fix clippy warning: variable does not need to be mutable.
The current_index variable is only assigned once during initialization
and never modified afterward.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 12:57:06 +01:00
Carlos Fernandez Sanz
a28561ad0d Merge pull request #1841 from CCExtractor/fix/general-loop-ret-shadowing
fix: Fix variable shadowing and teletext context refresh issues
2025-12-18 03:26:37 -08:00
Carlos Fernandez
c8f6b565fd fix: Fix garbled captions from HDHomeRun and I/P-only H.264 streams
For I/P-only streams (like HDHomeRun recordings), the caption buffer was
being flushed on every reference frame (I and P). Since ALL frames in these
streams are reference frames, this defeated the caption reordering mechanism,
causing garbled output.

The fix:
- Only flush the buffer and reset reference PTS on IDR frames (NAL type 5),
  not on P-frames
- Initialize currefpts on first frame to avoid huge indices at stream start
- Properly flush buffer and reset reference when large PTS gaps are detected

This allows P-frames to accumulate in the buffer and be sorted by their
PTS-based indices before output.
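
The flush rule described above might be sketched like this. The value 5 for IDR slices comes from the commit message; the function name, the `max_gap` parameter, and the way the gap is measured are assumptions for illustration, not the code in the commit.

```c
#include <assert.h>
#include <stdint.h>

enum { NAL_UNIT_TYPE_IDR_SKETCH = 5 }; /* IDR slice, as named in the commit message */

/* Hypothetical decision helper: flush the caption reorder buffer only on
 * IDR frames or when the PTS jumps by more than max_gap (assumed threshold).
 * Other reference frames (P-frames) keep accumulating in the buffer. */
static int should_flush_captions(int nal_unit_type, int64_t pts,
                                 int64_t ref_pts, int64_t max_gap)
{
    int64_t gap;

    assert(max_gap >= 0);
    gap = (pts >= ref_pts) ? (pts - ref_pts) : (ref_pts - pts);
    return nal_unit_type == NAL_UNIT_TYPE_IDR_SKETCH || gap > max_gap;
}
```

With this shape, an I/P-only stream no longer flushes on every frame, so the buffered captions can still be sorted by their PTS-based index.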

Fixes #1109

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 10:35:20 +01:00
Carlos Fernandez
442ce1015d fix: Fix variable shadowing and teletext context refresh issues
This commit fixes two issues uncovered during Sample Platform testing:

1. Variable shadowing in general_loop() (general_loop.c):
   - The inner `int ret = process_non_multiprogram_general_loop(...)`
     was shadowing the outer `ret` variable
   - This caused the return value to always be 0, making ccextractor
     report "No captions found" even when captions were extracted
   - Also added `ret = 1` when captions are detected via counters,
     needed for CEA-708 which writes directly via Rust

2. Missing private_data refresh in update_decoder_list_cinfo (lib_ccx.c):
   - After PAT changes, dinit_cap() frees the teletext context and
     NULLs dec_ctx->private_data
   - But update_decoder_list_cinfo() returned existing decoder without
     refreshing private_data from the new cap_info
   - This caused all subsequent teletext processing to be skipped
   - Fixed by updating dec_ctx->private_data when returning existing decoder
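
The shadowing problem from the first item is easy to reproduce in isolation. A minimal sketch, with hypothetical function names standing in for `general_loop()` and its helper:

```c
#include <assert.h>

/* Hypothetical helper: pretend that captions were found. */
static int process_once_sketch(void)
{
    return 1;
}

/* Buggy shape: the inner declaration creates a second variable named ret,
 * so the outer one is never updated. */
static int loop_shadowed(void)
{
    int ret = 0;
    {
        int ret = process_once_sketch(); /* hides the outer ret */
        (void)ret;
    }
    return ret; /* still 0 */
}

/* Fixed shape: assign to the existing variable instead of redeclaring it. */
static int loop_fixed(void)
{
    int ret = 0;
    {
        ret = process_once_sketch();
    }
    return ret;
}
```

Compilers can flag this pattern; GCC and Clang both accept `-Wshadow` for that purpose.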

These fixes resolve Sample Platform test failures in CEA-708 and Teletext
categories where tests returned exit code 10 (no captions) unexpectedly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 10:10:25 +01:00
Carlos Fernandez
e2dfdaa6a8 Merge branch 'master' into fix/issue-1693-stdout-crash
Resolved conflict in src/rust/src/lib.rs:
- Kept stderr target change from this branch (for --stdout option)
- Merged safety documentation from master
2025-12-18 09:18:50 +01:00
Carlos Fernandez Sanz
a0809caa94 fix(memory): Fix uninitialized memory and memory leaks found by Valgrind 2025-12-18 00:16:01 -08:00
Carlos Fernandez
859741a22c fix(rust): Remove unused import free_rust_c_string_array
This fixes the clippy error: "unused import: crate::utils::free_rust_c_string_array"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 07:41:29 +01:00
Carlos Fernandez
4429067965 fix(rust): Fix Drop compatibility and formatting issues
- demux.rs: Update dummy_demuxer() to explicitly initialize all fields
  instead of using ..Default::default(), which is not allowed when the
  struct implements Drop
- common.rs, demuxer.rs: Apply cargo fmt formatting fixes

This fixes the Rust test compilation error:
"cannot move out of type CcxDemuxer which implements the Drop trait"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 07:37:39 +01:00
Carlos Fernandez
d72646ac85 fix(memory): Fix XDS memory leak in rcwt_loop path
Add proper cleanup of xds_ctx in rcwt_loop() for --in=bin and --in=raw
formats. The general_loop() path already frees xds_ctx, but rcwt_loop()
was missing this cleanup, causing an 880-byte leak.

This fixes Valgrind tests 217 (--in=bin) and 218 (--in=raw).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 07:31:50 +01:00
Carlos Fernandez
4a304346c9 fix(memory): Fix XDS memory leaks in encoder and decoder cleanup
- XDS encoder leak: Free xds_str when skipping subtitles with invalid timestamps
- XDS decoder cleanup: Add proper cleanup for leftover XDS strings in dinit_cc_decode()
- Remove incorrect free(p) after write_xds_string() - the pointer is stored
  for later use by the encoder and must not be freed immediately
- Remove xds_ctx free from dinit_cc_decode() to avoid double-free

These fixes address the 100-byte XDS leak found in Valgrind test 114.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 16:54:38 +01:00
Carlos Fernandez
627e0855ce fix(memory): Fix 608 decoder memory leak in dec_sub.data
The embedded dec_sub struct in lib_cc_decode had its data field
allocated by write_cc_buffer() but never freed during cleanup.

Added cleanup in dinit_cc_decode() to:
- Free DVB bitmap data (data0/data1) if present
- Free the dec_sub.data field itself

This fixes ~1.7MB to ~2.6MB leaks seen in tests 89, 93, and 96.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 13:58:15 +01:00
Carlos Fernandez
7b1a169b8f fix(memory): Fix use-after-free in Teletext and uninitialized variables
This commit fixes several Valgrind-detected memory issues:

1. Use-after-free in Teletext during PAT changes:
   - When parse_PAT() calls dinit_cap() to reinitialize stream info,
     it freed the Teletext context but dec_ctx->private_data still
     pointed to the freed memory
   - Fixed by NULLing out dec_ctx->private_data in dinit_cap() when
     freeing shared codec private data
   - Also added NULL check in process_data() before calling teletext
     functions to gracefully handle freed contexts

2. Uninitialized variables in general_loop():
   - stream_mode, get_more_data, ret, and program_iter were declared
     without initialization
   - While logically set before use, Valgrind tracked them as
     potentially uninitialized through complex control flow
   - Fixed by initializing all variables at declaration

These fixes eliminate millions of Valgrind errors in teletext tests
(tests 78, 80) and uninitialized value warnings (tests 67, 84, 86).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 13:44:13 +01:00
Carlos Fernandez
3d5d8e2a0a fix(memory): Fix major memory leaks in Rust FFI demuxer and decoder
This commit fixes several significant memory leaks found by Valgrind testing:

1. Dtvcc::new encoder leak (decoder/mod.rs):
   - Previously always allocated a new encoder_ctx even when ctx.encoder
     was not null, then threw away the allocation
   - Fix: Only allocate when ctx.encoder is null
   - Impact: Eliminated 55MB-331MB leaks per video processing run

2. ccxr_demuxer_isopen optimization (demuxer.rs):
   - Previously copied entire demuxer structure just to check infd
   - Fix: Directly check (*ctx).infd != -1
   - Impact: Eliminated repeated allocations during file processing

3. ccxr_demuxer_close optimization (demuxer.rs):
   - Previously did full copy roundtrip (C->Rust->C) to close a file
   - Fix: Work directly on C struct, call close() and activity callback
   - Impact: Eliminated copy-related allocations and leaks

4. CcxDemuxer Drop implementation (common_types.rs):
   - pid_buffers and pids_programs contain raw pointers from Box::into_raw
   - These were never freed when CcxDemuxer was dropped
   - Fix: Implement Drop to free all non-null Box pointers
   - Impact: Eliminates remaining FFI-related leaks

Test results show dramatic improvement:
- Test 24: 55MB leak -> 0 bytes (PERFECT)
- Test 26: 9.75MB leak -> 0 bytes (PERFECT)
- Test 27: 237MB leak -> 0 bytes (PERFECT)
- Test 28: 331MB leak -> 0 bytes (PERFECT)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 12:48:51 +01:00
Carlos Fernandez
683468e233 fix(memory): Fix use-after-free and memory leaks in Rust FFI
This commit fixes critical memory issues found during comprehensive
Valgrind testing:

1. **Use-after-free in inputfile array** (common.rs):
   - Problem: `copy_from_rust` was called multiple times (parse_parameters,
     demuxer_open, demuxer_close), and each call freed and reallocated the
     inputfile array. C code holding references to the old array would then
     access freed memory.
   - Fix: Only set inputfile on the first call (when inputfile is null).
     Subsequent calls skip modifying inputfile since it shouldn't change
     during processing.

2. **Memory leak in enc_cfg strings** (common.rs):
   - Problem: Each call to `copy_from_rust` allocated new encoder config
     strings without freeing the old ones, causing 1,536 bytes leaked per
     demuxer open/close cycle.
   - Fix: Only set enc_cfg on the first call (when output_filename is null).
     Encoder config is static and doesn't need to be re-synced.

3. **Uninitialized memory in telxcc_init** (telxcc.c):
   - Problem: `malloc` was used to allocate TeletextCtx but not all fields
     were explicitly initialized, causing Valgrind to report 400+ errors
     about conditional jumps on uninitialized values.
   - Fix: Changed to `calloc` to zero-initialize all fields.
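
The `malloc` to `calloc` change in the third item can be sketched as below. The struct is a hypothetical miniature; the real TeletextCtx has many more fields.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature context, not the real TeletextCtx layout. */
struct telx_ctx_sketch
{
    int page;
    int seen_sub_page[8];
    void *page_buffer;
};

/* calloc() returns zero-filled memory, so every field starts from a known
 * value even if a later initializer forgets to set it. With malloc() the
 * fields would hold indeterminate values until explicitly assigned.
 * Note: all-bits-zero reads back as NULL on mainstream platforms, although
 * the C standard does not strictly guarantee that for pointers. */
static struct telx_ctx_sketch *telx_ctx_new(void)
{
    return calloc(1, sizeof(struct telx_ctx_sketch));
}
```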

**Valgrind results improvement (Test 3):**
- Errors: 458 → 21 (95% reduction)
- Definitely lost: 2,304 → 768 bytes (67% reduction)
- Use-after-free bugs: Eliminated
- Double-free bugs: Eliminated

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 11:04:19 +01:00
Carlos Fernandez
89849d321f fix(memory): Fix uninitialized memory and memory leaks found by Valgrind
Addresses memory issues identified during Phase 5 (Runtime Analysis) of
the bug analysis plan using Valgrind memory checking.

## Changes

### C Code (Uninitialized Memory)
- ccx_demuxer.c: Use calloc() instead of malloc() in init_demuxer() to
  ensure all struct fields are zero-initialized before use
- lib_ccx.c: Use calloc() instead of malloc() in init_decoder_setting()
  for consistent initialization

### Rust FFI Code (Memory Leaks)
- utils.rs: Add helper functions for proper FFI string memory management:
  - free_rust_c_string(): Free a Rust-allocated CString
  - replace_rust_c_string(): Free old string before allocating new one
  - free_rust_c_string_array(): Free an array of Rust-allocated CStrings
- common.rs: Update copy_from_rust() to properly manage string memory:
  - Free old strings before allocating new ones for all string fields
  - Add free_encoder_cfg_strings() to clean up encoder config strings
  - Free old inputfile array before allocating new one

## Valgrind Results Comparison

| Metric              | Before    | After     | Improvement     |
|---------------------|-----------|-----------|-----------------|
| Definitely lost     | 2,371 B   | 1,536 B   | 35% reduction   |
| Indirectly lost     | 212 B     | 0 B       | 100% fixed      |
| Uninitialized errors| 131,095   | 0         | 100% fixed      |

The remaining 1,536 bytes are from services_charsets array in
EncoderConfig (low priority, rare use case).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 09:21:51 +01:00
Carlos Fernandez Sanz
588ad5260a fix(rust-ffi): Prevent dangling pointers in copy_from_rust 2025-12-17 00:07:27 -08:00
Carlos Fernandez Sanz
ebd8148cad Merge pull request #1838 from CCExtractor/fix/teletext-double-free-crash
fix(teletext): Prevent double-free crash in teletext cleanup
2025-12-17 00:06:32 -08:00
Carlos Fernandez
ba33f7572d fix(rust-ffi): Prevent dangling pointers in copy_from_rust
The `to_ctype()` implementations for `DecoderDtvccSettings` and
`Decoder608Settings` were creating temporaries on the stack and
returning pointers to them. These pointers became dangling after
the function returned, causing memory corruption when
`copy_from_rust()` was called.

This fix:
- Preserves the original C-managed `report` and `timing` pointers
  in `copy_from_rust()` instead of overwriting them with dangling
  pointers to temporaries
- Adds explicit `settings_dtvcc.timing = NULL` initialization in
  `init_options()` for completeness

Before this fix, valgrind reported:
- "Invalid write of size 4" in `dtvcc_init` (4016 bytes below stack
   pointer)
- "Invalid read" errors in `copy_to_rust` /
   `DecoderDtvccSettings::from_ctype`

After this fix, these critical memory corruption errors are resolved.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 09:05:48 +01:00
Carlos Fernandez
9cf96b1899 fix(teletext): Prevent double-free crash in teletext cleanup
This fixes a double-free bug that caused CCExtractor to crash with
exit code 134 (SIGABRT) when processing teletext streams.

## Root Cause

The teletext context (TeletextCtx) pointer was shared between two
structures:
- `dec_ctx->private_data` (decoder context)
- `cinfo->codec_private_data` (capture info in cinfo_tree)

When `general_loop()` ended, it called `telxcc_close()` which freed
the TeletextCtx and NULLed `dec_ctx->private_data`. However, the
shared pointer in `cinfo->codec_private_data` was NOT NULLed.

Later, during cleanup in `dinit_cap()`, the code would find the
non-NULL `cinfo->codec_private_data` and attempt to free it again,
causing a double-free crash.

## The Fix

After `telxcc_close()` frees the teletext context in `general_loop()`,
iterate through all cinfo entries and NULL out any that shared the
same pointer. This prevents `dinit_cap()` from attempting to free
already-freed memory.
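
A minimal sketch of that idea, assuming a flat array of entries rather than the real cinfo tree; `cap_entry_sketch` and `release_shared_ctx` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cap_info entry. */
struct cap_entry_sketch
{
    void *codec_private_data;
};

/* Frees the context held in *owner and clears every entry that aliases it,
 * so later cleanup code cannot free the same block a second time.
 * Returns the number of aliases that were cleared. */
static size_t release_shared_ctx(void **owner,
                                 struct cap_entry_sketch *entries, size_t n)
{
    void *ctx = *owner;
    size_t cleared = 0;
    size_t i;

    if (ctx == NULL)
        return 0;

    /* Compare while the pointer is still valid, then free once. */
    for (i = 0; i < n; i++)
    {
        if (entries[i].codec_private_data == ctx)
        {
            entries[i].codec_private_data = NULL;
            cleared++;
        }
    }
    free(ctx);
    *owner = NULL;
    return cleared;
}
```

Calling the helper a second time is a no-op, because the owner pointer is already NULL.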

## Regression

This bug was exposed by commit 7e1a01447 which added cleanup code
to `dinit_cap()` to free `codec_private_data`. The `telxcc_close()`
call in `general_loop()` has existed since 2015, but the double-free
only became possible after the new cleanup code was added.

## Testing

Validated fix against all 27 teletext-related CI tests that were
failing with exit code 134:

Teletext section (21 tests): 63-83 - all PASS
DVB section: 18, 19 - all PASS
Other teletext tests: 224, 234, 235, 236 - all PASS

Verified with valgrind that no "Invalid free" or "double free"
errors occur after the fix.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 08:46:37 +01:00
Carlos Fernandez Sanz
0b3ad40377 Merge pull request #1837 from x15sr71/fix/atsc-vct-xmltv-mapping
[FIX]: Add ATSC VCT virtual channel numbers and call signs to XMLTV output
2025-12-16 22:29:04 -08:00
Chandragupt Singh
ac72625030 Fix ATSC XMLTV output to include VCT virtual channels and call signs 2025-12-17 10:49:41 +05:30
Carlos Fernandez Sanz
f6cb862dcb bump MSRV from 1.54.0 to 1.87.0 (rust) 2025-12-15 23:25:22 -08:00
Carlos Fernandez Sanz
53c0f56b6f Merge pull request #1833 from CCExtractor/dependabot/github_actions/actions/upload-artifact-6
chore(deps): bump actions/upload-artifact from 5 to 6
2025-12-15 23:07:50 -08:00
Carlos Fernandez Sanz
62272e7be6 [FIX] Correct typos in warning message and code comment
2025-12-15 23:06:59 -08:00
Carlos Fernandez Sanz
a7e05c265c fix(ocr): Improve DVB subtitle OCR quality (fixes #243)
2025-12-15 23:05:58 -08:00
Carlos Fernandez Sanz
9ce13cf45f [FIX]: Restore XMLTV generation for ATSC EIT/VCT streams and correct EIT bounds checks
2025-12-15 13:27:41 -08:00
Chandragupt Singh
e0ac99a241 fix(atsc): restore XMLTV generation and ATSC EPG parsing 2025-12-16 01:46:28 +05:30
GAURAV KARMAKAR
6ebf98ea4a Fix typos in encoder warning and comment 2025-12-16 00:59:45 +05:30
dependabot[bot]
9372e15024 chore(deps): bump actions/upload-artifact from 5 to 6
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 5 to 6.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-15 18:02:19 +00:00
Carlos
7e1a01447a fix(ocr): Improve DVB subtitle OCR quality (fixes #243)
This commit addresses Issue #243 where DVB subtitles from Spanish
broadcasts were producing corrupt/garbled OCR output like
"alajentiegaranual dep jemios" instead of "a la entrega anual de premios".

Root cause analysis:
1. Image preprocessing was degrading quality - pixContrastNorm was
   causing issues for some DVB sources
2. Default quantization mode (ocr_quantmode=1) was too aggressive,
   reducing images to just 3 colors which lost important detail

Changes:
- Remove pixContrastNorm calls from ocr.c (both main OCR and color
  detection passes) - these were causing more harm than good
- Change default ocr_quantmode from 1 to 0 (no quantization) in both
  C code (ccx_common_option.c) and Rust code (options.rs)
- Add NULL checks in dvbsub_close_decoder() and telxcc_close() for
  safety
- Add proper cleanup of codec_private_data pointers in lib_ccx.c and
  ts_info.c to prevent double-free crashes

Testing performed:
- Test 21 (English DVB): Completes in ~1 second with good OCR quality
- Test 239 (DVB timing): All 8 subtitles have correct timing
- Spanish DVB (Issue #243): Now produces readable text like
  "¡Bienvenidos a la entrega anual de premios" instead of garbage

Users can still use --quant 1 to restore the old quantization behavior
if needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 11:51:30 +01:00
Carlos Fernandez Sanz
b728ddadfa fix: Comprehensive bug fixes - Phases 2-4 (Memory, Buffer, Rust FFI)
Extensive sanitization work: always free allocated memory, validate buffer sizes, etc.
2025-12-15 02:50:06 -08:00
Carlos Fernandez Sanz
300541b873 Merge pull request #1809 from Rahul-2k4/master
Improve -out=report to show detected Teletext subtitle pages (Fixes #1034)
2025-12-14 23:36:41 -08:00
Carlos Fernandez Sanz
2f1c1bf227 Merge pull request #1721 from Ari1009/mcc_encoder
fix: MCC encoder 16-bit sequence
2025-12-14 23:27:08 -08:00
Carlos Fernandez Sanz
0bcb532428 Merge pull request #1829 from CCExtractor/fix/autoconf-hardsubx-tesseract
build(autoconf): add tesseract/leptonica linking for HARDSUBX
2025-12-14 23:18:12 -08:00
Carlos
d8698dc9cb build(autoconf): add tesseract/leptonica linking for HARDSUBX
This is the autoconf equivalent of the CMake fix in PR #1760.

When building with HARDSUBX enabled but OCR disabled, the autoconf
build system was missing explicit tesseract/leptonica linking in the
HARDSUBX block. While configure.ac sets OCR_IS_ENABLED when HARDSUBX
is enabled (so it would work via the OCR block), this change makes
the dependency explicit and consistent with the CMake fix.

Related: PR #1760, Issue #1719

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 08:12:16 +01:00
Carlos Fernandez Sanz
4cc9231fc8 Merge pull request #1760 from DhanushVarma-2/fix-tesseract-linking-1719
build: add tesseract library linking for hardsubx feature
2025-12-14 23:08:00 -08:00
Carlos
d202a66fd0 style(rust): Apply cargo fmt formatting
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 07:07:41 +01:00
Carlos
d8048bc95a fix(rust): Complete Phase 4 - FFI safety and documentation
Phase 4 of the bug analysis cycle addresses all Rust/FFI boundary issues:

Safety Documentation:
- Added # Safety docs to all 83 production FFI functions
- lib.rs: ccxr_init_logger, ccxr_close_handle
- decoder/encoding.rs: 4 G0/G1/G2/G3 conversion functions
- decoder/service_decoder.rs: ccxr_flush_decoder
- hardsubx/imgops.rs: rgb_to_hsv, rgb_to_lab
- hardsubx/utility.rs: convert_pts_to_ns/ms/s

Panic Prevention (FFI function bodies):
- hardsubx/decoder.rs: Replaced 8 .try_into().unwrap() calls with
  safe `as` casts to prevent potential panics across FFI boundary
- libccxr_exports/net.rs: Replaced expect() with safe error handling
- libccxr_exports/mod.rs: Removed panic!/expect(), use defaults
- libccxr_exports/time.rs: Replaced try_into().unwrap() with unwrap_or()

Clippy Fixes:
- Fixed 72 Clippy warnings across the codebase
- Replaced assert!(false) with unreachable!()
- Added #[allow] attributes for acceptable test code patterns

All 269 tests pass, Clippy reports 0 warnings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 07:03:25 +01:00
Carlos
af3ab5acd4 fix(buffer): Replace unsafe string functions with safe alternatives
Phase 3: Buffer overrun fixes

Changes:
- Replace 17 sprintf calls with snprintf
- Replace 3 strcpy calls with memcpy (known length)
- Replace 9 strcat calls with safer alternatives (snprintf, memcpy, strncat)
- Fix telxcc.c buffer size for page number formatting
- Add bounds checking to eia608_to_str function

Files modified:
- ocr.c: 7 sprintf→snprintf, 2 strcat→snprintf
- ts_tables_epg.c: 4 sprintf→snprintf, 1 strcat→snprintf
- ccx_encoders_spupng.c: 4 sprintf→snprintf, 1 strcpy→memcpy, 2 strcat→strncat/memcpy
- ccx_encoders_splitbysentence.c: 2 sprintf→snprintf (commented debug code)
- utility.c: 2 strcpy→memcpy, 4 strcat→snprintf/memcpy
- telxcc.c: increased buffer size from 4 to 8 bytes
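
The `sprintf` to `snprintf` change can be sketched as follows. The helper name and the `%03x` page format are assumptions for illustration; the point is that the destination size is passed in and truncation is detected instead of overrunning the buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a page number into a caller-supplied buffer.
 * Returns 0 on success, -1 if the text did not fit or formatting failed. */
static int format_page_sketch(char *dst, size_t dst_size, unsigned page)
{
    int n;

    assert(dst != NULL && dst_size > 0);
    /* snprintf never writes more than dst_size bytes, terminator included,
     * and returns the length the full output would have had. */
    n = snprintf(dst, dst_size, "%03x", page);
    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

Checking the return value matters: a truncated string is safe in memory terms but may still be wrong output.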

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 06:33:17 +01:00
Carlos
90519e2296 fix(memory): Fix memory issues in final batch of files (Batch 2.7)
Files fixed:
- hardsubx.c: Add free() calls before return NULL at lines 247, 255;
  add null check for dec_sub malloc; free tessdata_path
- ccx_gxf.c: Fix unsafe realloc pattern for ctx->cdp
- wtv_functions.c: Add null checks for malloc calls at lines 143, 192,
  283, 384
- dvd_subtitle_decoder.c: Fix memset before null check; add null checks
  for rect->data0 and rect->data1; add null checks in init_dvdsub_decode
- ts_tables.c: Add null check for PID_buffers malloc; add null check for
  buffer malloc; fix unsafe realloc pattern
- myth.c: Fix unsafe realloc pattern for desp buffer
- ffmpeg_intgr.c: Fix memory leaks in init_ffmpeg error paths; add proper
  cleanup labels; properly allocate codec context instead of using codecpar

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 06:17:45 +01:00
Carlos
494b14b651 fix(memory): Fix memory issues in helpers, splitbysentence, and output
- ccx_encoders_helpers.c:
  - add_word(): Fix unsafe realloc pattern, preserve original pointer
  - shell_sort(): Add null check for temp buffer allocation

- ccx_encoders_splitbysentence.c:
  - init_sbs_context(): Add null checks for context and buffer allocations
  - sbs_append_string(): Fix unsafe realloc pattern for buffer
  - sbs_append_string(): Add null check for cc_subtitle allocation

- output.c:
  - writeraw(): Fix unsafe realloc pattern, preserve original pointer
    and set to NULL on failure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 06:07:58 +01:00
Carlos
5b286c5b8d fix(memory): Fix potential memory leaks in encoder files
- ccx_encoders_ssa.c: Fix combined malloc check pattern
  - Check each allocation separately
  - Free first allocation if second fails before calling fatal

- ccx_encoders_webvtt.c: Fix 2 combined check patterns
  - write_stringz_as_webvtt: Separate checks with proper cleanup
  - write_cc_bitmap_as_webvtt: Separate calloc checks with cleanup

- ccx_encoders_smptett.c: Fix combined malloc check pattern
  - Check each allocation separately
  - Free first allocation if second fails before calling fatal
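
The "check each allocation separately" pattern might look like this. `alloc_pair_sketch` is a hypothetical helper that returns an error code where the real encoders call fatal.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate two buffers of n bytes (n > 0) or neither.
 * A combined check such as `if (!a || !b)` cannot tell which allocation
 * succeeded, so the successful one is never released. Checking each result
 * in turn lets the second failure path free the first buffer.
 * Returns 0 on success, -1 on failure. */
static int alloc_pair_sketch(size_t n, char **out_a, char **out_b)
{
    char *a;
    char *b;

    assert(n > 0 && out_a != NULL && out_b != NULL);

    a = malloc(n);
    if (a == NULL)
        return -1;

    b = malloc(n);
    if (b == NULL)
    {
        free(a); /* release the allocation that did succeed */
        return -1;
    }

    *out_a = a;
    *out_b = b;
    return 0;
}
```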

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 06:04:53 +01:00
Carlos
ea4f884b9d fix(memory): Fix unsafe realloc patterns in asf_functions, telxcc, and ccx_encoders_srt
- asf_functions.c: Fix 2 unsafe realloc patterns
  - Use temporary pointer to preserve original buffer reference
  - Free original buffer before calling fatal on allocation failure

- telxcc.c: Fix 2 unsafe realloc patterns in teletext buffer functions
  - page_buffer_add_string: Use safe realloc pattern with temp pointer
  - ucs2_buffer_add_char: Use safe realloc pattern with temp pointer

- ccx_encoders_srt.c: Fix potential memory leak in write_stringz_as_srt
  - Check each allocation separately
  - Free successful allocation before fatal if second allocation fails

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 06:01:36 +01:00
Carlos
3b0a63d9c6 fix(memory): Fix memory leaks and unsafe realloc patterns in lib_ccx, utility, avc_functions
- lib_ccx.c: Fix memory leaks in init_libraries error paths
  - Add proper cleanup for report_608, EPG buffers, and ctx when
    init_decoder_setting fails
  - Add comprehensive cleanup at the `end:` label, used when init_ctx_outbase fails

- utility.c: Fix unsafe realloc in str_reallocncat
  - Preserve original pointer and free it on realloc failure
  - Prevents memory leak when realloc returns NULL

- avc_functions.c: Fix unsafe realloc patterns in user_data_registered_itu_t_t35
  - Use temporary pointer for realloc result
  - Free original buffer before calling fatal on allocation failure
  - Fixes two instances of unsafe realloc pattern
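
The temporary-pointer realloc pattern described above can be sketched as follows. `grow_buffer` is a hypothetical name used for illustration only; the real code calls fatal() on failure, while this sketch returns -1:

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *buf to new_size bytes (new_size > 0).
 * On failure the original block is freed, *buf is set to NULL,
 * and -1 is returned so the caller can report the error. */
int grow_buffer(unsigned char **buf, size_t new_size)
{
    /* Assign to a temporary so *buf still points at the old block
     * if realloc fails. */
    unsigned char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
    {
        free(*buf); /* realloc leaves the old block allocated on failure */
        *buf = NULL;
        return -1;
    }
    *buf = tmp;
    return 0;
}
```

Writing `buf = realloc(buf, n)` directly overwrites the only reference to the old block when realloc returns NULL, which is the leak these fixes remove.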

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 05:58:20 +01:00
Carlos
390c96f00d fix(memory): Fix memory leaks and unsafe realloc patterns in multiple files
Batch 2.2 memory fixes:

dvb_subtitle_decoder.c:
- Fix memory leak in write_dvb_sub: free rect->data1 and rect before fatal
  when data0 allocation fails

general_loop.c:
- Fix unsafe realloc in rcwt_loop: use temp variable to preserve original
  parsebuf pointer on failure
- Fix memory leak: free parsebuf on early return in rcwt_loop

ts_functions.c:
- Fix unsafe realloc in copy_payload_to_capbuf: use temp variable to
  preserve original cinfo->capbuf on failure
- Fix unsafe realloc in hauppauge buffer handling: free original buffer
  before fatal on failure

ccx_decoders_608.c:
- Fix two unsafe realloc patterns in write_cc_buffer_as_transcript and
  write_cc_buffer_to_gui: use temp variable to preserve original sub->data
  on failure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 05:52:12 +01:00
Carlos
95f6f09659 fix(memory): Fix memory leaks in ocr.c and ts_tables_epg.c
In ocr.c:
- Fix realloc failure leak in search_language_pack (free dirname)
- Fix malloc failure leaks in ocr_bitmap (free histogram, iot, mcit)
- Fix realloc failure leak for new_text_out
- Fix multiple allocation failure paths in ocr_rect with proper cleanup

In ts_tables_epg.c:
- Fix malloc failure leak in EPG_ATSC_decode_multiple_string (free event_name)
- Fix realloc failure leak in parse_EPG_packet (free buffer)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 05:46:52 +01:00
Carlos Fernandez Sanz
42885caedd fix(dvb): Multiple fixes for DVB subtitles - timing, OCR quality, memory access bugs (#224) (#1826)
* fix(dvb): Multiple fixes for DVB subtitle extraction from Chinese broadcasts (#224)

This commit addresses multiple issues with DVB subtitle extraction reported in #224:

1. **PMT parsing crash fix** (ts_tables.c):
   - Added minimum length check (16 bytes) to prevent out-of-bounds access
   - Added bounds check before memcpy to prevent buffer overflow when section > 1021 bytes

2. **Negative subtitle timing fix** (general_loop.c):
   - For DVB subtitle streams, properly initialize min_pts from audio/subtitle PTS
   - This fixes the issue where all timestamps were negative (~95000 seconds off)

3. **OCR improvements** (ocr.c):
   - Fixed ignore_alpha_at_edge() which could create invalid crop windows
   - Added image inversion for DVB subtitles (light text on dark background)
     to improve Tesseract OCR accuracy
   - Added contrast normalization to further improve character recognition
   - Fixed nofontcolor check to respect --no-fontcolor parameter
   - Added iteration safety limit in color detection loop

4. **--ocrlang parameter fix** (Rust files):
   - Changed ocrlang from Language enum to String to accept Tesseract language
     names directly (e.g., "chi_tra", "chi_sim", "eng")
   - Added case-insensitive matching for --dvblang parameter
   - Added better error messages for invalid language codes

Tested with 12GB Chinese DVB broadcast file:
- Timing: All timestamps now positive (0.235s, 2.594s, etc.)
- OCR: ~80-90% accuracy with chi_tra traineddata (improved from ~70%)
- No crashes during full file processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(ocr): Fix crashes in DVB subtitle color detection

Two issues fixed in the OCR color detection code:

1. Tesseract crash during iteration:
   - The color detection pass used raw color images without preprocessing
   - Tesseract expects dark text on light background, but DVB subtitles
     have light text on dark background
   - Added grayscale conversion, inversion, and contrast enhancement
     (same preprocessing as the main OCR pass)

2. Heap corruption in histogram calculation:
   - The histogram loop had no bounds checking on array accesses
   - Tesseract could return invalid bounding boxes causing buffer overflows
   - Added validation of bounding box coordinates before processing
   - Added safe index checking for copy->data and histogram arrays

Also added skip_color_detection label for clean error handling and
proper cleanup of the preprocessed image.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(dvb): Fix zero-duration subtitles and overlaps during PTS jumps

Add start_pts field to cc_subtitle struct to track raw PTS values
independent of FTS timeline resets. Modify end_time calculation in
dvbsub_handle_display_segment() to cap duration at 4 seconds when
PTS jumps cause timeline discontinuities, preventing zero-duration
and overlapping subtitles.

Also update .gitignore to exclude plans/ directory and temp files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 20:03:55 -08:00
Carlos Fernandez Sanz
8d95ad0e7b chore: Apply code formatting and update changelog (#1825)
- Apply clang-format to all C/H files in src/
- Apply cargo fmt to Rust code
- Update Cargo.lock with latest compatible dependency versions
- Add 24 new entries to CHANGES.TXT for recent fixes and features

Changes in CHANGES.TXT cover:
- CEA-708 bounds checks and UTF-16BE encoding fixes
- New --ttxtforcelatin option for Teletext
- TS files without PAT/PMT fallback support
- Timing accuracy improvements across MP4/MPEG/TS
- Memory safety improvements (null checks, buffer overruns)
- Multi-file processing fixes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 13:34:16 -08:00
Carlos Fernandez Sanz
1f0980185f fix(rust): Add bounds checks to prevent panic on malformed CEA-708 data (#1817)
* fix(rust): Add bounds checks to prevent panic on malformed CEA-708 data

Fixes #1616 - Segmentation fault when extracting from MP4 remuxed from HLS

The CEA-708 decoder could panic when processing truncated or malformed
caption data blocks:

1. Fixed EXT1 command handling in process_service_block():
   - Changed &block[1..] to &block[(i+1)..] for correct slice offset
   - Added bounds check before accessing the next byte after EXT1

2. Added bounds checks in handle_extended_char():
   - Check for empty block before accessing block[0]
   - Check block.len() >= 2 before accessing block[1] for C3 commands

3. Removed unnecessary `as i64` cast in es/pic.rs to fix clippy warning

Added 4 unit tests to verify the bounds checking:
- test_handle_extended_char_empty_block
- test_handle_extended_char_c3_insufficient_bytes
- test_process_service_block_ext1_at_end
- test_process_service_block_ext1_with_truncated_c3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(rust): cast c_long to i64 in pic.rs for Windows compatibility

On Windows, c_long is i32 (32-bit), while on 64-bit Linux it's i64 (64-bit).
The addition of fts_at_gop_start + frame_offset_ms was failing on Windows
because fts_at_gop_start (c_long = i32) couldn't be added to frame_offset_ms (i64).

Added explicit cast to i64 with #[allow(clippy::unnecessary_cast)] since
the cast is necessary for Windows even though it's redundant on Linux.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 07:47:47 -08:00
Carlos Fernandez Sanz
6c764aa56c fix: Correct is_decoder_processed_enough() multiprogram logic and suppress false warnings (#1823)
Fixes #1701

The `is_decoder_processed_enough()` function had a bug where it would always
return FALSE in multiprogram mode due to the condition:
  `dec_ctx->processed_enough == CCX_TRUE && ctx->multiprogram == CCX_FALSE`

This caused the "Error in switch_to_next_file()" warning to trigger incorrectly
for files without captions or in multiprogram mode.

Changes:
- Fix `is_decoder_processed_enough()` in C and Rust:
  - In single-program mode: return TRUE if ANY decoder has processed enough
  - In multiprogram mode: return TRUE only if ALL decoders have processed enough
- Add check for empty decoder list in `switch_to_next_file()`:
  - If no decoders exist (no captions found), suppress the premature ending warning
  - This is a normal condition, not an error
- Update Rust tests to verify the new behavior
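
A simplified sketch of the ANY/ALL rule listed above. The `struct decoder` type, the array layout and the function name are assumptions made for illustration; the real code works on CCExtractor's own decoder contexts:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a decoder context. */
struct decoder
{
    bool processed_enough;
};

/* Single-program mode: true if ANY decoder has processed enough.
 * Multiprogram mode: true only if ALL decoders have processed enough. */
bool decoders_processed_enough(const struct decoder *decs, size_t count,
                               bool multiprogram)
{
    if (count == 0)
        return false; /* assumption: the empty list is handled by the caller */

    for (size_t i = 0; i < count; i++)
    {
        if (multiprogram && !decs[i].processed_enough)
            return false; /* one unfinished decoder is enough to say no */
        if (!multiprogram && decs[i].processed_enough)
            return true; /* one finished decoder is enough to say yes */
    }
    /* Loop ended: in multiprogram mode every decoder was done,
     * in single-program mode none was. */
    return multiprogram;
}
```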

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 07:45:16 -08:00
Carlos Fernandez Sanz
a0129df16c fix(708): Write consistent 2-byte UTF-16BE encoding for CEA-708 captions (#1820)
* fix(708): Write consistent 2-byte UTF-16BE encoding for CEA-708 captions

Previously, the write_utf16_char (C) and write_char (Rust) functions
wrote 1 byte for ASCII characters (high byte = 0) and 2 bytes for
non-ASCII characters. This created an invalid mix of 8-bit and 16-bit
values that iconv/encoding_rs couldn't convert properly when UTF-16BE
encoding was specified.

The fix always writes 2 bytes per character, ensuring consistent
UTF-16BE encoding. This allows iconv to properly convert the data to
UTF-8, fixing garbled output for Japanese and Chinese captions.
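
As an illustration of "always write 2 bytes", a minimal sketch follows. `put_utf16be` is a hypothetical helper, not the project's write_utf16_char, and it handles a single 16-bit code unit only:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes one UTF-16 code unit in big-endian order.
 * Always emits 2 bytes, even when the high byte is 0 (ASCII).
 * `out` must have room for at least 2 bytes. Returns the bytes written. */
size_t put_utf16be(uint16_t code_unit, unsigned char *out)
{
    out[0] = (unsigned char)(code_unit >> 8);   /* high byte first */
    out[1] = (unsigned char)(code_unit & 0xFF); /* then the low byte */
    return 2;
}
```

With this shape, 'A' (0x0041) becomes the bytes 00 41 rather than a lone 41, so every character occupies the same width and a UTF-16BE converter stays aligned.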

Before fix (garbled):
人々が私を知‰挰弰栰䴰Ź섰漠時間管理につい‰晦<U+F830>䐰昰䐰縰

After fix (correct):
人々が私を知 ったとき、私は 時間管理につい て書いています

Fixes #1451

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(708): Update write_char test to expect 2-byte UTF-16BE output

The test was checking for the old (incorrect) behavior where ASCII
characters were written as 1 byte. The fix for issue #1451 correctly
changed write_char to always write 2 bytes for proper UTF-16BE encoding.
Updated the test to match this correct behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 07:42:46 -08:00
Carlos Fernandez Sanz
d2ab31fe38 fix(teletext): Add --ttxtforcelatin option to force Latin G0 charset (#1821)
Some broadcast streams incorrectly signal Cyrillic character set (via
X/28 or M/29 packets) when the actual content is Latin text. This causes
garbled output where Latin text like "No. Not back then, anyway." appears
as Cyrillic "Но. Нот бацк тхен, анiваi."

This fix adds a new --ttxtforcelatin option that forces the teletext G0
character set to Latin, ignoring any Cyrillic designation in the stream.

Root cause: The broadcast contained triplet 0x1290 which has bits 10-13
set to 0x1 (Cyrillic family) and bits 7-9 set to 0x5 (Ukrainian option),
causing CCExtractor to use CYRILLIC3 charset instead of Latin.

Usage: ccextractor input.ts --ttxtforcelatin -o output.srt

Before fix (without option):
  Subtitle 3: Но. Нот бацк тхен, анiваi.

After fix (with --ttxtforcelatin):
  Subtitle 3: No. Not back then, anyway.

Fixes #1395

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 07:42:06 -08:00
Carlos Fernandez Sanz
3f6656176e fix(ts): Add fallback for TS files without PAT/PMT tables (#1822)
Some DVR recordings (e.g., Channel Master DVR+) create transport stream
files that contain valid video and audio data but lack PAT (Program
Association Table) and PMT (Program Map Table). Without these tables,
CCExtractor couldn't identify which PIDs contain video streams with
embedded captions.

This change adds a fallback mechanism that:
1. Enables packet analysis mode when no PAT is found after reading ~1000
   TS packets (188KB)
2. Detects video streams by analyzing PES headers (stream_id 0xE0-0xEF)
3. Identifies stream type (MPEG-2 vs H.264) from elementary stream data
4. Registers detected video streams for caption extraction
5. Also detects GA94 caption markers to identify caption-carrying PIDs
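
Step 2 above can be sketched as a check on the start of a PES packet. The function name is hypothetical and the sketch covers only the stream_id test, not the rest of the fallback logic:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if `payload` begins with a PES packet
 * whose stream_id is in the video range 0xE0-0xEF. */
bool is_video_pes_start(const uint8_t *payload, size_t len)
{
    if (payload == NULL || len < 4)
        return false; /* need start code prefix (3 bytes) + stream_id */

    /* PES packets begin with the start code prefix 0x000001. */
    if (payload[0] != 0x00 || payload[1] != 0x00 || payload[2] != 0x01)
        return false;

    return payload[3] >= 0xE0 && payload[3] <= 0xEF;
}
```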

The fix allows CCExtractor to extract CEA-608/708 captions from TS files
without PAT/PMT, matching the behavior when FFmpeg is enabled.

Fixes #805

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 07:40:52 -08:00
Carlos Fernandez Sanz
f2f63ed65f fix(timing): Set pts_set to MinPtsSet after PTS jump to continue fts_now updates (#1824)
When a PTS discontinuity (jump) is detected, the code updates fts_offset
and min_pts to establish a new timeline. However, it was not setting
pts_set back to MinPtsSet, which meant fts_now calculation (which only
runs when pts_set == MinPtsSet) would stop working. This caused all
timestamps after the PTS jump to be stuck.

This fixes issue #1277 where DVD VOB files with PTS discontinuities
(common at chapter boundaries) would stop extracting captions after
about 6 minutes. Version 0.84 worked correctly, but 0.85+ had this
regression.

Closes #1277

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-14 07:36:01 -08:00
Carlos Fernandez Sanz
3738540804 style: use CCX_STREAM_TYPE_VIDEO_HEVC enum instead of raw 0x24 (#1819)
Follow-up to PR #1769 - use the defined enum constant for HEVC stream
type (0x24) instead of magic numbers for better code maintainability.

Also simplifies the case statement in get_printable_stream_type() by
removing redundant assignment since the enum value passes through
unchanged.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:55:38 -08:00
Carlos Fernandez Sanz
31c6e94e25 fix(memory): Add null checks for unchecked memory allocations (#1815)
Add proper null checks after malloc/calloc/realloc calls to prevent
potential NULL pointer dereferences on out-of-memory conditions.

Files fixed:
- general_loop.c: Add null checks for line buffer and parsebuf; remove
  duplicate allocation that shadowed outer variable (memory leak fix)
- ccx_encoders_webvtt.c: Add null check for color_events/font_events
- ccx_decoders_isdb.c: Add null check for text->buf before dereference
- dvb_subtitle_decoder.c: Move null check before memset
- mp4.c: Add null check for dec_sub->data before memcpy
- ccx_decoders_608.c: Add null check for decoder context
- ccx_decoders_xds.c: Add null check for string buffer
- asf_functions.c: Add null check after struct initialization with malloc
- ccx_dtvcc.c: Move null check before dereferences (was checking after use)
- lib_ccx.c: Fix memset-before-check ordering; add checks for pesheaderbuf
  and DVB context allocations
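
The "memset-before-check" ordering issue can be illustrated with a small sketch. `alloc_zeroed` is a hypothetical helper, not project code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates `size` bytes and zeroes them.
 * The NULL check comes BEFORE memset, because passing NULL to memset
 * is undefined behaviour. Returns NULL on allocation failure. */
void *alloc_zeroed(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL; /* the project would call fatal() here instead */
    memset(p, 0, size);
    return p;
}
```

For plain zero-initialisation, `calloc(1, size)` gives the same result with a single call; the sketch keeps malloc and memset separate only to show the ordering.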

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:55:06 -08:00
Carlos Fernandez Sanz
33f41f6045 fix(rust): Add null checks and handle invalid UTF-8 in FFI functions (#1816)
- ccxr_process_cc_data: Add null pointer checks for dec_ctx, data, and
  dec_ctx.dtvcc before dereferencing. Also check cc_count > 0.
- ccxr_parse_parameters: Add null check for argv pointer and use
  to_string_lossy() instead of expect() to handle invalid UTF-8
  gracefully without panicking.

These changes prevent potential crashes when FFI functions are called
with invalid arguments from C code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:48:29 -08:00
Chandragupt Singh
137719ebea [FIX]: Add HEVC/H.265 stream type recognition to prevent crashes on ATSC 3.0 streams (#1769)
* Add basic HEVC (0x24) TS stream detection to avoid unknown buffer type errors

* docs: update CHANGES.TXT with HEVC/H.265 stream type fix entry
2025-12-14 03:25:05 -08:00
Carlos
ecb0780af5 fix: Enable stdout output for CEA-708 captions on Windows
Fixes #1693 - ccextractorwinfull.exe can't print captions to stdout

The CEA-708 decoder crashed on Windows when using --stdout because the
dtvcc_writer was not properly initialized for stdout output:

1. Fixed Windows stdout handle initialization in ccx_encoders_common.c:
   - Use GetStdHandle(STD_OUTPUT_HANDLE) instead of NULL for fhandle
   - This allows the Rust writer to detect stdout mode properly

2. Changed env_logger target from Stdout to Stderr in lib.rs:
   - Debug messages no longer pollute stdout when using --stdout
   - This prevents mixing debug output with subtitle content

3. Removed redundant debug statement in service_decoder.rs:
   - The bare `debug!("{}", self.current_window)` was noisy and
     duplicated by a more detailed debug statement below it

Added tests:
- test_writer_output_with_valid_fd: Verifies stdout mode works
- test_writer_output_missing_filename_and_fd: Verifies proper error handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 12:09:24 +01:00
Carlos Fernandez Sanz
abce0864a5 fix(rust): prevent panics in timing code when processing multiple files
2025-12-14 02:17:29 -08:00
Carlos Fernandez Sanz
9ff46656be fix(timing): correct caption start/end times to match FFmpeg in mp4 / mpeg / ts 2025-12-14 02:13:03 -08:00
Rahul Tripathi
446923c79d Merge pull request #3 from Rahul-2k4/copilot/apply-clang-format-to-source-files
[FIX] Apply clang-format to ensure CI formatting checks pass
2025-12-14 15:11:57 +05:30
copilot-swe-agent[bot]
cde9e1f842 Initial plan 2025-12-14 09:34:22 +00:00
Rahul Tripathi
6c75b26484 Merge branch 'CCExtractor:master' into master 2025-12-14 14:47:03 +05:30
Rahul Tripathi
9c4d5a8a58 patch on teletext
Added a conditional check so that the notice about teletext pages is printed depending on the file report settings.
2025-12-14 14:45:04 +05:30
Carlos
a49ebf4230 fix(rust): cast c_long to i64 for cross-platform compatibility
On Windows, c_long is i32, while on 64-bit Linux it's i64. This causes
a type mismatch when adding fts_at_gop_start (c_long) to
frame_offset_ms (i64). Fix by explicitly casting to i64.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 09:58:55 +01:00
Carlos
7b8533a2dc Merge branch 'master' into fix/caption-timing-accuracy 2025-12-14 09:58:42 +01:00
Carlos Fernandez Sanz
134cd75d3b Merge pull request #1811 from CCExtractor/fix/multi-file-processing
fix(rust): correctly count and store multiple input files
2025-12-14 00:47:07 -08:00
Carlos
80e21171b1 style: apply cargo fmt formatting
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 09:43:42 +01:00
Carlos
0b262d0e17 fix(rust): prevent panics in timing code when processing multiple files
Replace `.unwrap()` and `.expect()` calls with safe alternatives to prevent
Rust panics when processing multiple files with different characteristics
(e.g., DVD-type followed by HDTV-type).

Changes:
- Use `unwrap_or(0)` for all type conversions that could fail
- Handle RwLock poisoning gracefully in apply_timing_info/write_back_from_timing_info
- Add fps validation and millis capping in GopTimeCode::new()
- Add fallback calculation in ccxr_calculate_ms_gop_time when GopTimeCode
  creation fails

Fixes #1377

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 09:39:44 +01:00
Rahul Tripathi
f579cbe45d Merge branch 'CCExtractor:master' into master 2025-12-14 14:02:16 +05:30
Carlos Fernandez Sanz
1a83913540 Merge pull request #1806 from CCExtractor/fix/ttxt-timestamp-milliseconds
fix(parser): use HHMMSSFFF format for ttxt output timestamps
2025-12-14 00:11:08 -08:00
Carlos
075ae04f1d fix(rust): correctly count and store multiple input files
Fix two bugs that prevented multi-file processing from working:

1. In common.rs: `options.inputfile.iter()` was iterating over the
   Option itself (yielding 0 or 1 items) instead of the Vec contents,
   causing num_input_files to always be 1.

2. In parser.rs: append_file_to_queue() was using vec.len() as the
   index for new files after resizing with empty strings, causing
   files to be placed at positions 0, 10, 20... instead of 0, 1, 2...

Fixes #1810

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 08:52:51 +01:00
Carlos
d4949ccfa3 style: apply clang-format and cargo fmt formatting fixes
Fix formatting issues detected by CI:
- C files: Tab alignment, trailing whitespace, blank line cleanup
- Rust: Import statement grouping in pic.rs
- Cargo.lock: Remove duplicate bindgen dependency entries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 18:41:42 +01:00
Carlos
588c981184 docs: update timing verification plan with Fix 7 results
- Document Fix 7: MP4 c608 track timing and garbage frame detection
- Mark all regressions as fixed or documented as known limitations
- Update status to "Ready for Merge"
- MPEG-PS 66ms offset documented as known limitation (FFmpeg uses
  different timing reference for MPEG-PS vs TS containers)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 18:38:34 +01:00
Carlos
941b88f3f9 fix(timing): handle MP4 c608 tracks and improve garbage frame detection
- Fix MP4 c608/c708 caption tracks by setting frame type to I-frame
  before calling set_fts(). Without video frames, frame type would stay
  Unknown and min_pts would never be set, causing broken timestamps.

- Fix premature pts_set = MinPtsSet assignment. Now only set after
  min_pts is actually set, preventing fts_now calculation with
  uninitialized min_pts (0x01FFFFFFFF) which caused negative timestamps.

- Add garbage frame detection threshold (100ms). When an I-frame arrives:
  - If gap between pending_min_pts and I-frame PTS > 100ms: use I-frame
    PTS (garbage leading frames from truncated GOP)
  - If gap <= 100ms: use pending_min_pts (valid B-frames)

- Track pending_min_pts for all frames (not just unknown type) to enable
  proper garbage vs valid B-frame detection.
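
A hedged sketch of the 100ms threshold decision described above. It assumes both PTS values are in 90 kHz ticks and that pending_min_pts is not later than the I-frame PTS; `choose_min_pts` and `GARBAGE_GAP_TICKS` are hypothetical names:

```c
#include <stdint.h>

/* 100 ms expressed in 90 kHz PTS ticks (assumed unit). */
#define GARBAGE_GAP_TICKS (100 * 90)

/* Hypothetical helper: picks the timing reference when an I-frame arrives. */
int64_t choose_min_pts(int64_t pending_min_pts, int64_t iframe_pts)
{
    int64_t gap = iframe_pts - pending_min_pts;

    if (gap > GARBAGE_GAP_TICKS)
        return iframe_pts; /* leading frames are garbage from a truncated GOP */

    return pending_min_pts; /* small gap: the earlier frames are valid B-frames */
}
```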

Results:
- 5df914ce...mp4: 666ms -> 0ms (FIXED)
- c032183e...ts: 284ms -> 0ms (FIXED)
- addf5e2f...ts: 68ms -> ~1ms (FIXED)
- 80848c45...mpg: remains 66ms (FFmpeg uses different reference for MPEG-PS)
- da904de3...mpg: remains 66ms (FFmpeg uses different reference for MPEG-PS)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 18:35:15 +01:00
Carlos
071d017b27 docs: update timing verification plan with Fix 6 results
- Added Fix 6: Elementary stream frame-by-frame timing
- Updated Category 3 testing results:
  - dc7169d7...h264: FIXED (~500ms, acceptable for roll-up)
  - 6395b281...asf: FIXED (1ms)
  - 0069dffd...mpg: Comparison invalid (mixed language CC)
  - b2771c84...mp4: No captions in file

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 13:55:20 +01:00
Carlos
65d9a7ed1a fix(timing): update fts_now for each frame in elementary streams
For elementary streams with GOP timing (use_gop_as_pts=1), fts_now was
only updated when a GOP header was parsed, not for each frame. This
caused all frames within a GOP to have the same timestamp, resulting
in broken caption timing (1ms, 9ms, 17ms instead of proper times).

The fix calculates fts_now for each frame based on:
  fts_at_gop_start + (frames_since_last_gop * 1000 / fps)
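
The formula above can be written out as a small function. This is a sketch only: the function name, the millisecond units and the fallback for a non-positive fps are assumptions, not the project's code:

```c
#include <stdint.h>

/* Hypothetical helper: timestamp in milliseconds of the current frame,
 * derived from the GOP start time and the number of frames since the GOP. */
int64_t frame_fts_ms(int64_t fts_at_gop_start, int64_t frames_since_last_gop,
                     double fps)
{
    if (fps <= 0.0)
        return fts_at_gop_start; /* assumption: fall back to the GOP time */

    /* frames * 1000 / fps, truncated to whole milliseconds */
    return fts_at_gop_start +
           (int64_t)((double)frames_since_last_gop * 1000.0 / fps);
}
```

At 25 fps, for example, each frame advances the timestamp by 40ms instead of every frame in the GOP sharing one value.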

Test results for dc7169d7...h264 (raw MPEG-2 elementary stream):
- Before: 1ms, 9ms, 17ms, 25ms (broken)
- After: 2867ms, 4634ms, 6368ms (correct range)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 13:51:59 +01:00
Carlos
54df50f4fe fix(timing): preserve CR time during pop-on to roll-up transition
When transitioning from pop-on to roll-up mode, the first CR command
(with only 1 line visible, changes=0) was resetting ts_start_of_current_line
to -1. This caused the next caption's start time to be set when characters
were typed (~133ms later), not when the CR command was received.

The fix preserves the CR time when rollup_from_popon=1 and changes=0,
ensuring the caption start time matches when the display state changed.

Test results:
- c83f765c...ts: 134ms offset → 1ms (fixed)
- 725a49f8...mpg: 133ms offset → 0ms (fixed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 13:37:57 +01:00
Carlos
bc5d605543 fix(timing): handle pop-on to roll-up mode transition timing
When transitioning from pop-on to roll-up mode, CCExtractor was setting
the caption start time when the first character was typed. FFmpeg uses
the time when the display state changed to show multiple lines. This
caused the first roll-up caption after a mode switch to be timestamped
too early.

Changes:
- Add rollup_from_popon flag to track mode transitions
- Reset ts_start_of_current_line on mode switch
- Defer start time until CR causes scrolling in transition mode
- Use ts_start_of_current_line when buffer scrolls during transition

Test results for 725a49f8...mpg:
- Before: 484ms early
- After: 133ms late (~4 frames, acceptable)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 13:21:20 +01:00
Carlos
a1a0094167 fix(timing): defer min_pts until frame type is known
The previous timing fixes were being bypassed because set_fts() is called
multiple times per frame - first from the PES/TS layer (with unknown frame
type) and later from the ES parsing layer (with known frame type). The first
call was setting min_pts before we knew whether it was an I-frame.

Changes:
- When frame type is unknown, track PTS in pending_min_pts but DON'T set min_pts
- Only set min_pts when frame type is known AND it's an I-frame
- Added unknown_frame_count for fallback handling of H.264 streams
- After 100+ calls with unknown frame type, use pending_min_pts as fallback

Test results:
- 8e8229b88bc6...mpg: 101ms -> 1ms offset ✓
- c032183ef018...ts: 284ms -> 0ms offset ✓
- add511677cc42...vob: 366ms -> 34ms offset ✓

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 12:12:49 +01:00
Carlos
5b8d8a72d8 fix(timing): add frame type tracking for future timing improvements
Add seen_known_frame_type and pending_min_pts fields to track frame
types during initial stream parsing. This infrastructure supports
distinguishing between MPEG-2 streams (where frame types are set) and
H.264 in MPEG-PS (where frame types remain unknown).

Current behavior maintains compatibility by allowing min_pts to be set
from any frame type, which correctly handles both stream types and
matches FFmpeg timing output.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 11:58:23 +01:00
Carlos
621871eb7c fix(timing): skip leading non-I-frames when setting min_pts
Streams recorded mid-broadcast often start with trailing B/P frames from
a previous GOP. These frames have earlier PTS values than the first
decodable I-frame.

Previously, CCExtractor set min_pts from the first PES packet with a PTS,
which could be an undecodable B/P frame. FFmpeg's cc_dec uses the first
decoded frame (necessarily an I-frame) as its timing reference.

This caused consistent timing offsets. For example, c032183ef01...ts had
a 284ms offset because:
- First PES packet PTS: 2508198438
- First I-frame PTS: 2508223963
- Difference: 25525 ticks = 284ms
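
The tick-to-millisecond arithmetic above follows from the 90 kHz PTS clock (90 ticks per millisecond). A minimal sketch, with hypothetical names and assuming the second PTS is not earlier than the first:

```c
#include <stdint.h>

#define PTS_TICKS_PER_MS 90 /* MPEG PTS values use a 90 kHz clock */

/* Hypothetical helper: difference between two PTS values in milliseconds,
 * rounded to the nearest millisecond. Assumes second_pts >= first_pts. */
int64_t pts_diff_ms(int64_t first_pts, int64_t second_pts)
{
    int64_t ticks = second_pts - first_pts;
    return (ticks + PTS_TICKS_PER_MS / 2) / PTS_TICKS_PER_MS;
}
```

For the values quoted above, 2508223963 - 2508198438 = 25525 ticks, and 25525 / 90 is about 283.6, which rounds to the 284ms offset reported.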

Changes:
- timing.rs: Only set min_pts when current_picture_coding_type == IFrame
- ccx_decoders_common.c: Don't increment cb_field counters for container
  formats (CCX_H264, CCX_PES) since frame PTS is already correct
- sequencing.c: Include CCX_PES in reset_cb logic alongside CCX_H264

Test results for c032183ef01...ts:
- Before: CCExtractor 1,836ms vs FFmpeg 1,552ms = 284ms offset
- After: CCExtractor 1,552ms vs FFmpeg 1,552ms = 0ms offset

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 11:29:07 +01:00
Carlos Fernandez Sanz
ffcb5fe149 Merge pull request #1802 from CCExtractor/fix/utility-buffer-overruns
fix(utility): prevent buffer overruns and add OOM checks in change_filename
2025-12-13 01:36:57 -08:00
Carlos Fernandez Sanz
1b0808b4f3 Merge pull request #1807 from CCExtractor/fix/phase3-buffer-safety-medium-priority
fix(lib_ccx): replace unsafe string functions with bounds-checked versions
2025-12-13 01:25:25 -08:00
Carlos
68da0a044d style: fix clang-format issues
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:38:23 +01:00
Carlos
87b0d22057 fix(ts_tables_epg): add NULL checks and fix memory leaks
- EPG_output_live: add NULL checks for filename/finalfilename malloc,
  add fopen failure check
- EPG_DVB_decode_string: add NULL checks for decode_buffer and out
  malloc
- EPG_decode_content_descriptor: add NULL check for categories malloc
- EPG_decode_parental_rating_descriptor: add NULL check for ratings
  malloc
- EPG_decode_extended_event_descriptor: add NULL checks for net and
  extended_text malloc
- EPG_ATSC_decode_multiple_string: add NULL checks for event_name and
  text malloc
- parse_EPG_packet: add NULL check for buffer malloc, fix unsafe
  realloc that lost original pointer on failure
- EPG_decode_short_event_descriptor: fix memory leak - free event_name
  on early return
- EPG_DVB_decode_EIT: fix memory leak - call EPG_free_event on early
  return

All OOM conditions now use fatal(EXIT_NOT_ENOUGH_MEMORY, ...) following
the project's coding patterns.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:38:23 +01:00
Carlos
af5e36cdab style: fix clang-format issues in macro definitions
Fix macro formatting to have 'do' and '{' on separate lines and
align backslashes consistently, as required by clang-format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:38:23 +01:00
Carlos
8329257b99 fix(708_output): replace sprintf with snprintf for buffer safety
Replace all sprintf calls with snprintf to prevent potential buffer
overflows in CEA-708 output functions. Key changes:

- dtvcc_change_pen_colors: add bounds checking for font color tags
- dtvcc_change_pen_attribs: add bounds checking for italic/underline tags
- dtvcc_write_srt: track buffer length with snprintf
- dtvcc_write_transcript: add bounds checking for CC/mode labels
- dtvcc_write_sami_header: use snprintf macro for all SAMI tags
- dtvcc_write_sami_footer: use snprintf with length check
- dtvcc_write_sami: add bounds checking for sync tags
- dtvcc_write_scc_header: use snprintf for SCC header
- add_needed_scc_labels: add buffer size parameter for safe writes
- dtvcc_write_scc: use snprintf macro for all SCC formatting
- dtvcc_writer_init: use snprintf for filename suffix
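
A rough sketch of "track buffer length with snprintf" follows. The helper `append_fmt` is hypothetical and is not claimed to match the macro the commit mentions; it only shows the general idea of appending at a tracked offset and refusing writes that would not fit.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: appends formatted text at buf[*len].
 * Returns 0 on success, -1 if the text would not fit. On failure the
 * buffer keeps its previous, NUL-terminated contents. */
int append_fmt(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*len >= cap)
        return -1;
    va_start(ap, fmt);
    n = vsnprintf(buf + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= cap - *len)
    {
        buf[*len] = '\0'; /* discard the truncated write */
        return -1;
    }
    *len += (size_t)n;
    return 0;
}
```

Because vsnprintf returns the length the full output would have had, comparing that value against the remaining space detects truncation without a second pass.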

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:38:23 +01:00
Carlos
1869c4c713 fix(mcc_encoder): prevent buffer overruns and add OOM checks
- Add NULL checks after malloc calls for compressed_data_buffer and buff_ptr
- Replace sprintf with snprintf for all string formatting operations
- Replace strcat with bounds-checked direct character assignment
- Replace vsprintf with vsnprintf in debug_log function
- Replace sprintf loop in random_chars with direct character lookup table
- Increase buffer sizes for date_str (50->64), time_str (30->32), tcr_str (25->32)
- Initialize tcr_str in default case to prevent uninitialized use
- Add lib_ccx.h include for fatal() function declaration

Functions modified:
- mcc_encode_cc_data: OOM check + sprintf -> snprintf + strcat -> direct assignment
- generate_mcc_header: sprintf -> snprintf for uuid_str, date_str, time_str, tcr_str
- add_boilerplate: OOM check for buff_ptr
- random_chars: sprintf -> direct character lookup (more efficient)
- debug_log: vsprintf -> vsnprintf + safer strlen check
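
As an illustration of the "direct character lookup" idea for random_chars, the sketch below fills a buffer from a fixed table instead of calling sprintf once per character. The function name `random_hex`, the hexadecimal alphabet, and the use of rand() are assumptions for this example; the real function may use a different character set or random source.

```c
#include <stdlib.h>

/* Hypothetical sketch: writes `count` random hex digits plus a NUL
 * terminator into `out`, which must hold at least count + 1 bytes. */
void random_hex(char *out, size_t count)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;

    for (i = 0; i < count; i++)
        out[i] = digits[rand() % 16]; /* one table lookup per character */
    out[count] = '\0';
}
```

Each character costs one array index, and the only bound to get right is the caller's count + 1.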

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:38:23 +01:00
Carlos
b3c3bdcdac fix(ocr): add NULL checks and fix memory leaks
- search_language_pack: add NULL check after strdup(), fix unsafe
  realloc() that lost original pointer on failure
- init_ocr: fix memory leak where ctx wasn't freed on early return
  when tessdata not found, add NULL checks for strdup() calls
- ocr_bitmap: fix memory leak when pixCreate partially fails, add
  missing boxDestroy for crop_points on early return, add NULL checks
  for histogram/iot/mcit allocations, fix unsafe realloc() calls,
  add NULL check for text_out strdup
- ocr_rect: add NULL check for copy allocation, initialize copy->data
  to NULL to prevent freep on uninitialized pointer, add NULL check
  for copy->data allocation
- paraof_ocrtext: use fatal() on malloc failure for consistent OOM
  handling

All OOM conditions now use fatal(EXIT_NOT_ENOUGH_MEMORY, ...) following
the project's coding patterns.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:38:23 +01:00
Carlos
6e295ac374 fix(ccx_encoders_spupng): add NULL checks and fix memory leaks
This commit addresses multiple memory safety issues in ccx_encoders_spupng.c:

**NULL pointer dereference fixes (crash prevention):**

1. write_cc_bitmap_as_spupng() line 440: Added NULL check after malloc
   for pbuf - previously would crash on memset if allocation failed.

2. write_image() line 541: Added NULL check after malloc for row buffer
   with proper cleanup via goto finalise.

3. center_justify() line 611: Added NULL check after malloc for
   temp_buffer - previously would crash immediately on use.

4. utf8_to_utf32() line 718: Added NULL check after calloc for
   string_utf32 - previously would crash on use by iconv.

5. spupng_export_string2png() line 780: Fixed existing NULL check that
   printed error but did not return/exit - code would continue to
   memset(NULL, ...) causing a crash.

**Memory leak fixes:**

6. spupng_export_string2png() line 789: Fixed leak where buffer was not
   freed when strdup(str) failed and function returned early.

7. spupng_export_string2png() line 901: Fixed leak on realloc failure
   where buffer, tmp, and string_utf32 were leaked. Now properly frees
   all three before calling fatal().

All fatal() calls include diagnostic information (function name and
bytes requested where applicable) to aid debugging OOM conditions.
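
The two failure modes listed above (a NULL check that does not stop execution, and buffers leaked on a later failure) can be sketched together. The function `make_two_buffers` is hypothetical and returns an error code rather than calling fatal(), which is what the commit says the project does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: allocates and zeroes two buffers of n bytes.
 * Returns 0 on success, -1 on allocation failure with nothing leaked. */
int make_two_buffers(size_t n, unsigned char **a_out, unsigned char **b_out)
{
    unsigned char *a = NULL;
    unsigned char *b = NULL;

    a = malloc(n);
    if (a == NULL)
        goto fail; /* leaving the function here avoids memset(NULL, ...) */
    memset(a, 0, n);

    b = malloc(n);
    if (b == NULL)
        goto fail; /* the cleanup path releases `a` as well */
    memset(b, 0, n);

    *a_out = a;
    *b_out = b;
    return 0;

fail:
    free(b); /* free(NULL) is a no-op */
    free(a);
    return -1;
}
```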

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:38:23 +01:00
Carlos
468bd2c156 style: fix clang-format issues in macro definitions
Fix macro formatting to have 'do' and '{' on separate lines and
align backslashes consistently, as required by clang-format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:31:10 +01:00
Carlos
bcf7eb2a50 fix(708_output): replace sprintf with snprintf for buffer safety
Replace all sprintf calls with snprintf to prevent potential buffer
overflows in CEA-708 output functions. Key changes:

- dtvcc_change_pen_colors: add bounds checking for font color tags
- dtvcc_change_pen_attribs: add bounds checking for italic/underline tags
- dtvcc_write_srt: track buffer length with snprintf
- dtvcc_write_transcript: add bounds checking for CC/mode labels
- dtvcc_write_sami_header: use snprintf macro for all SAMI tags
- dtvcc_write_sami_footer: use snprintf with length check
- dtvcc_write_sami: add bounds checking for sync tags
- dtvcc_write_scc_header: use snprintf for SCC header
- add_needed_scc_labels: add buffer size parameter for safe writes
- dtvcc_write_scc: use snprintf macro for all SCC formatting
- dtvcc_writer_init: use snprintf for filename suffix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:31:10 +01:00
Carlos
54c7dfa45f fix(mcc_encoder): prevent buffer overruns and add OOM checks
- Add NULL checks after malloc calls for compressed_data_buffer and buff_ptr
- Replace sprintf with snprintf for all string formatting operations
- Replace strcat with bounds-checked direct character assignment
- Replace vsprintf with vsnprintf in debug_log function
- Replace sprintf loop in random_chars with direct character lookup table
- Increase buffer sizes for date_str (50->64), time_str (30->32), tcr_str (25->32)
- Initialize tcr_str in default case to prevent uninitialized use
- Add lib_ccx.h include for fatal() function declaration

Functions modified:
- mcc_encode_cc_data: OOM check + sprintf -> snprintf + strcat -> direct assignment
- generate_mcc_header: sprintf -> snprintf for uuid_str, date_str, time_str, tcr_str
- add_boilerplate: OOM check for buff_ptr
- random_chars: sprintf -> direct character lookup (more efficient)
- debug_log: vsprintf -> vsnprintf + safer strlen check

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:31:10 +01:00
Carlos
984123521d fix(ocr): add NULL checks and fix memory leaks
- search_language_pack: add NULL check after strdup(), fix unsafe
  realloc() that lost original pointer on failure
- init_ocr: fix memory leak where ctx wasn't freed on early return
  when tessdata not found, add NULL checks for strdup() calls
- ocr_bitmap: fix memory leak when pixCreate partially fails, add
  missing boxDestroy for crop_points on early return, add NULL checks
  for histogram/iot/mcit allocations, fix unsafe realloc() calls,
  add NULL check for text_out strdup
- ocr_rect: add NULL check for copy allocation, initialize copy->data
  to NULL to prevent freep on uninitialized pointer, add NULL check
  for copy->data allocation
- paraof_ocrtext: use fatal() on malloc failure for consistent OOM
  handling

All OOM conditions now use fatal(EXIT_NOT_ENOUGH_MEMORY, ...) following
the project's coding patterns.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:31:10 +01:00
Carlos
a2cb65f181 fix(ccx_encoders_spupng): add NULL checks and fix memory leaks
This commit addresses multiple memory safety issues in ccx_encoders_spupng.c:

**NULL pointer dereference fixes (crash prevention):**

1. write_cc_bitmap_as_spupng() line 440: Added NULL check after malloc
   for pbuf - previously would crash on memset if allocation failed.

2. write_image() line 541: Added NULL check after malloc for row buffer
   with proper cleanup via goto finalise.

3. center_justify() line 611: Added NULL check after malloc for
   temp_buffer - previously would crash immediately on use.

4. utf8_to_utf32() line 718: Added NULL check after calloc for
   string_utf32 - previously would crash on use by iconv.

5. spupng_export_string2png() line 780: Fixed existing NULL check that
   printed error but did not return/exit - code would continue to
   memset(NULL, ...) causing a crash.

**Memory leak fixes:**

6. spupng_export_string2png() line 789: Fixed leak where buffer was not
   freed when strdup(str) failed and function returned early.

7. spupng_export_string2png() line 901: Fixed leak on realloc failure
   where buffer, tmp, and string_utf32 were leaked. Now properly frees
   all three before calling fatal().

All fatal() calls include diagnostic information (function name and
bytes requested where applicable) to aid debugging OOM conditions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:31:10 +01:00
Carlos Fernandez Sanz
fe7a4b3f45 Merge pull request #1799 from CCExtractor/fix/ts-tables-epg-memory-safety
fix(ts_tables_epg): add NULL checks and fix memory leaks
2025-12-12 23:30:02 -08:00
Carlos Fernandez Sanz
d4ec0fe49b Merge pull request #1800 from CCExtractor/fix/708-output-buffer-safety
fix(708_output): replace sprintf with snprintf for buffer safety
2025-12-12 23:24:26 -08:00
Carlos Fernandez Sanz
4a98bf5290 Merge pull request #1804 from CCExtractor/fix/mcc-encoder-buffer-overruns
fix(mcc_encoder): prevent buffer overruns and add OOM checks
2025-12-12 23:23:25 -08:00
Carlos Fernandez Sanz
249cac359f Merge pull request #1798 from CCExtractor/fix/ocr-memory-safety
fix(ocr): add NULL checks and fix memory leaks
2025-12-12 23:21:11 -08:00
Carlos
69e521b320 fix(timing): correct caption start/end times to match video frame PTS
The get_visible_start() and get_visible_end() functions were adding a
cb_field offset (cb_field * 1001/30 ms) to caption timestamps. This
offset was designed for broadcast MPEG-TS streams where caption data
arrives continuously at field rate (59.94 fields/sec).

However, for container formats like MP4, all caption data for a video
frame is bundled together and should use the frame's PTS directly. The
offset was causing caption start times to be ~300ms (9 frames) later
than the actual video frame timestamp.

Root cause analysis:
1. Previous caption ends → get_visible_end() returns inflated time
   due to cb_field offset → minimum_fts set to this inflated value
2. New caption starts → get_visible_start() constrained by
   minimum_fts + 1 → start time incorrectly pushed forward

Fix:
- Add new Rust FFI functions ccxr_get_visible_start() and
  ccxr_get_visible_end() that return base FTS (fts_now + fts_global)
  without the cb_field offset
- Update C wrappers to call the new Rust functions
- Update Rust decoder timing to use base FTS

Verification against ffmpeg:
- Before fix: 00:16:06,799 (300ms late)
- After fix:  00:16:06,499 (matches ffmpeg exactly)
- ffmpeg ref: 00:16:06,499

The get_fts() function is unchanged - it still returns the
offset-adjusted time for use cases that need it (like extraction
time boundary checking).
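
The arithmetic behind the 300 ms discrepancy can be checked with a small sketch. It assumes integer division and a cb_field value of 9 (inferred from the "9 frames" remark above); `cb_field_offset_ms` and `to_ms` are hypothetical helpers, not functions from the codebase.

```c
#include <stdint.h>

/* Offset described in the commit message: cb_field * 1001 / 30 ms.
 * Integer division is an assumption made for this sketch. */
int64_t cb_field_offset_ms(int cb_field)
{
    return (int64_t)cb_field * 1001 / 30;
}

/* Hypothetical helper: converts HH:MM:SS,mmm to milliseconds. */
int64_t to_ms(int h, int m, int s, int ms)
{
    return (((int64_t)h * 60 + m) * 60 + s) * 1000 + ms;
}
```

Under these assumptions, cb_field_offset_ms(9) is 300, and adding it to 00:16:06,499 gives 00:16:06,799, the "before fix" value quoted above.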

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 08:01:29 +01:00
Carlos
8af19df556 fix(lib_ccx): replace remaining unsafe string functions with bounds-checked versions
Replace sprintf/strcpy with snprintf/memcpy in LOW priority files:
- general_loop.c: proper buffer allocation with OOM check, snprintf
- ccx_encoders_g608.c: snprintf with sizeof for timeline buffer
- lib_ccx.c: fix buffer size calculation, add missing null check, snprintf
- ccx_common_timing.c: snprintf with documented max size for time functions
- ts_functions.c: snprintf with sizeof in debug code
- matroska.c: bounded memcpy to prevent overflow from malformed language codes
- output.c: snprintf with known allocated size

This completes Phase 3.1 of the buffer safety audit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 07:10:54 +01:00
Carlos
bff08bec9e fix(encoders): replace unsafe string functions with bounds-checked versions
Replace sprintf/strcpy/strcat with snprintf/strncat/memcpy/memmove in:
- ccx_encoders_common.c: 4 sprintf -> snprintf
- ccx_encoders_helpers.c: 3 strcat -> strncat, 1 strcpy -> memcpy
- telxcc.c: 3 sprintf -> snprintf
- asf_functions.c: 3 sprintf -> snprintf
- ccx_encoders_ssa.c: 3 sprintf -> snprintf
- ccx_encoders_curl.c: 1 sprintf -> snprintf, strcpy+strcat -> snprintf with OOM check
- ccx_encoders_splitbysentence.c: 1 strcpy -> memmove (overlapping memory fix), 2 strcat -> strncat

This is part of Phase 3.1 of the buffer safety audit, addressing MEDIUM priority files.
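
The "strcpy -> memmove (overlapping memory fix)" item refers to a general C rule: strcpy has undefined behavior when source and destination overlap, while memmove is defined for overlapping ranges. A minimal sketch, using a hypothetical `drop_prefix` helper that is not taken from ccx_encoders_splitbysentence.c:

```c
#include <string.h>

/* Hypothetical helper: removes the first n characters of s in place. */
void drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);

    if (n > len)
        n = len;
    /* strcpy(s, s + n) would be undefined because the ranges overlap;
     * memmove handles overlap. The +1 copies the NUL terminator. */
    memmove(s, s + n, len - n + 1);
}
```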

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 07:00:44 +01:00
Carlos
a66fb8c661 fix(utility): prevent buffer overruns and add OOM checks in change_filename
- Add NULL checks after malloc calls for temp_encoder, current_name, and newname
- Replace sprintf with snprintf for safe string formatting
- Replace strcpy/strcat with strncpy and snprintf to prevent buffer overflows
- Increase buffer sizes from 6/10/15 to 16 chars to safely hold extension numbers
- Use proper size tracking with filename_len and buffer size variables

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 06:46:28 +01:00
Carlos Fernandez Sanz
042716adde fix(xds_decoder): prevent buffer overruns and fix sprintf logic bug (#1803)
- Replace sprintf with snprintf for all string formatting operations
- Replace strcpy/strcat chains with snprintf for bounds-safe concatenation
- Replace strcpy with strncpy + null terminator for fixed-size buffers
- Fix bug in xds_do_private_data: sprintf in loop was overwriting instead
  of appending hex bytes to output string

Functions modified:
- xds_do_copy_generation_management_system: 3 sprintf -> snprintf
- xds_do_content_advisory: 5 sprintf -> snprintf, strcpy/strcat chain fixed
- xds_do_current_and_future: strcpy -> strncpy for program description
- xds_do_channel: strcpy -> strncpy for network name
- xds_do_private_data: fixed loop to properly append hex bytes
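
The loop bug in xds_do_private_data can be sketched as follows. The function `hex_dump`, its signature, and the "%02X " format are assumptions for illustration; only the overwrite-versus-append distinction comes from the commit message.

```c
#include <stdio.h>

/* Hypothetical sketch: writes the bytes of `data` as "XX " groups into
 * `out` and returns the number of characters written (excluding NUL).
 * Stops early, leaving a valid string, if the buffer is too small. */
size_t hex_dump(char *out, size_t cap, const unsigned char *data, size_t n)
{
    size_t used = 0;
    size_t i;

    if (cap == 0)
        return 0;
    out[0] = '\0';
    for (i = 0; i < n; i++)
    {
        /* Buggy form: sprintf(out, "%02X ", data[i]) restarts at out[0]
         * on every iteration, so only the last byte survives. */
        int w = snprintf(out + used, cap - used, "%02X ", (unsigned)data[i]);
        if (w < 0 || (size_t)w >= cap - used)
        {
            out[used] = '\0'; /* drop the partial group */
            break;
        }
        used += (size_t)w; /* advance so the next write appends */
    }
    return used;
}
```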

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 21:40:54 -08:00
Carlos
1342e4edee fix(ocr): add NULL checks and fix memory leaks
- search_language_pack: add NULL check after strdup(), fix unsafe
  realloc() that lost original pointer on failure
- init_ocr: fix memory leak where ctx wasn't freed on early return
  when tessdata not found, add NULL checks for strdup() calls
- ocr_bitmap: fix memory leak when pixCreate partially fails, add
  missing boxDestroy for crop_points on early return, add NULL checks
  for histogram/iot/mcit allocations, fix unsafe realloc() calls,
  add NULL check for text_out strdup
- ocr_rect: add NULL check for copy allocation, initialize copy->data
  to NULL to prevent freep on uninitialized pointer, add NULL check
  for copy->data allocation
- paraof_ocrtext: use fatal() on malloc failure for consistent OOM
  handling

All OOM conditions now use fatal(EXIT_NOT_ENOUGH_MEMORY, ...) following
the project's coding patterns.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 06:26:59 +01:00
Carlos
4d1d874243 fix(ccx_encoders_spupng): add NULL checks and fix memory leaks
This commit addresses multiple memory safety issues in ccx_encoders_spupng.c:

**NULL pointer dereference fixes (crash prevention):**

1. write_cc_bitmap_as_spupng() line 440: Added NULL check after malloc
   for pbuf - previously would crash on memset if allocation failed.

2. write_image() line 541: Added NULL check after malloc for row buffer
   with proper cleanup via goto finalise.

3. center_justify() line 611: Added NULL check after malloc for
   temp_buffer - previously would crash immediately on use.

4. utf8_to_utf32() line 718: Added NULL check after calloc for
   string_utf32 - previously would crash on use by iconv.

5. spupng_export_string2png() line 780: Fixed existing NULL check that
   printed error but did not return/exit - code would continue to
   memset(NULL, ...) causing a crash.

**Memory leak fixes:**

6. spupng_export_string2png() line 789: Fixed leak where buffer was not
   freed when strdup(str) failed and function returned early.

7. spupng_export_string2png() line 901: Fixed leak on realloc failure
   where buffer, tmp, and string_utf32 were leaked. Now properly frees
   all three before calling fatal().

All fatal() calls include diagnostic information (function name and
bytes requested where applicable) to aid debugging OOM conditions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 06:26:59 +01:00
Carlos
155f56ede7 style: fix clang-format issues in macro definitions
Fix macro formatting to have 'do' and '{' on separate lines and
align backslashes consistently, as required by clang-format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 06:24:27 +01:00
Carlos
fb49d9460d fix(708_output): replace sprintf with snprintf for buffer safety
Replace all sprintf calls with snprintf to prevent potential buffer
overflows in CEA-708 output functions. Key changes:

- dtvcc_change_pen_colors: add bounds checking for font color tags
- dtvcc_change_pen_attribs: add bounds checking for italic/underline tags
- dtvcc_write_srt: track buffer length with snprintf
- dtvcc_write_transcript: add bounds checking for CC/mode labels
- dtvcc_write_sami_header: use snprintf macro for all SAMI tags
- dtvcc_write_sami_footer: use snprintf with length check
- dtvcc_write_sami: add bounds checking for sync tags
- dtvcc_write_scc_header: use snprintf for SCC header
- add_needed_scc_labels: add buffer size parameter for safe writes
- dtvcc_write_scc: use snprintf macro for all SCC formatting
- dtvcc_writer_init: use snprintf for filename suffix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 06:24:27 +01:00
Carlos
37fed5e5b5 fix(mcc_encoder): prevent buffer overruns and add OOM checks
- Add NULL checks after malloc calls for compressed_data_buffer and buff_ptr
- Replace sprintf with snprintf for all string formatting operations
- Replace strcat with bounds-checked direct character assignment
- Replace vsprintf with vsnprintf in debug_log function
- Replace sprintf loop in random_chars with direct character lookup table
- Increase buffer sizes for date_str (50->64), time_str (30->32), tcr_str (25->32)
- Initialize tcr_str in default case to prevent uninitialized use
- Add lib_ccx.h include for fatal() function declaration

Functions modified:
- mcc_encode_cc_data: OOM check + sprintf -> snprintf + strcat -> direct assignment
- generate_mcc_header: sprintf -> snprintf for uuid_str, date_str, time_str, tcr_str
- add_boilerplate: OOM check for buff_ptr
- random_chars: sprintf -> direct character lookup (more efficient)
- debug_log: vsprintf -> vsnprintf + safer strlen check

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 06:23:45 +01:00
Carlos
7113036719 fix(parser): use HHMMSSFFF format for ttxt output timestamps
The Rust parser was incorrectly setting date_format to HHMMSS (no
milliseconds) instead of HHMMSSFFF (with milliseconds) for --out=ttxt.

This bug was introduced in PR #1619 when porting the parser to Rust.
The original C code correctly used ODF_HHMMSSMS which includes
milliseconds in the timestamp format (HH:MM:SS,mmm).

Before: 10:25:16 (missing milliseconds)
After:  10:25:16,000 (correct format matching original C behavior)
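
For reference, the HH:MM:SS,mmm layout can be produced from a millisecond count as sketched below. This is written in C for illustration only (the fix itself is in the Rust parser), and `format_hhmmssfff` is a hypothetical name; the input is assumed to be non-negative.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: formats a non-negative millisecond count as
 * HH:MM:SS,mmm. Returns 0 on success, -1 if `out` is too small. */
int format_hhmmssfff(char *out, size_t cap, int64_t ms)
{
    long long h = (long long)(ms / 3600000);
    int m = (int)(ms / 60000 % 60);
    int s = (int)(ms / 1000 % 60);
    int f = (int)(ms % 1000);
    int n = snprintf(out, cap, "%02lld:%02d:%02d,%03d", h, m, s, f);

    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

With an input of 37516000 ms this produces 10:25:16,000, the "After" form shown above.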

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 06:21:09 +01:00
Carlos Fernandez Sanz
d93d6731ba fix(encoders): replace sprintf/strcpy with bounds-checked versions (#1805)
Replace unsafe string functions with safer alternatives:
- ccx_encoders_sami.c: sprintf -> snprintf (10 fixes)
- ccx_encoders_srt.c: sprintf -> snprintf (6 fixes)
- mp4.c: sprintf/strcpy/strcat -> snprintf (6 fixes, including
  buffer overflow fix in format_duration where 20-byte buffer
  was too small for long duration strings)
- ccx_encoders_webvtt.c: sprintf -> snprintf (6 fixes), plus:
  - Fixed malloc size bug (+4 instead of +5 for null terminator)
  - Added OOM checks for css_file_name and outline_css_file
  - Fixed memory leaks (css_file_name and outline_css_file not freed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 21:16:20 -08:00
Carlos Fernandez Sanz
77e1dff779 fix(smptett): replace unsafe string operations with bounds-checked versions (#1801)
Replace sprintf, strcpy, and strcat calls with snprintf and bounds-checked
operations to prevent potential buffer overflows. Key changes:

- write_stringz_as_smptett: use snprintf for timestamp formatting
- write_cc_bitmap_as_smptett: use snprintf with INITIAL_ENC_BUFFER_CAPACITY
- write_cc_buffer_as_smptett:
  - Add NULL checks for malloc allocations
  - Track buffer size and use snprintf throughout
  - Replace strcpy/strcat chains with bounds-checked memcpy/snprintf
  - Use snprintf for style tag and color code formatting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 21:15:39 -08:00
Carlos Fernandez Sanz
58dedba93f fix(scc): Always emit position codes at start of caption (fixes #1776) (#1791)
* fix(scc): Always emit position codes at start of caption (fixes #1776)

The SCC encoder was initializing current_row=14 and current_column=0,
which caused the first position code (PAC) to be skipped when caption
content started at row 14 (the last row), column 0. This happened because
the condition checking if row/column changed would be false.

For example, a caption starting at row 15 (1-indexed), column 0 should
output the PAC code 9470/{1500} but this was being omitted.

Fix by initializing current_row and current_column to UINT8_MAX, which
is an impossible value that will never match any valid row (0-14) or
column (0-31), ensuring the position code is always written for the
first character of each caption.
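
The sentinel idea can be shown with a small sketch. The `struct cursor` type and both functions are hypothetical and do not reproduce the encoder's actual data structures; they only model the "emit a position code when the position changed" check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the encoder's current position. */
struct cursor
{
    uint8_t row;
    uint8_t column;
};

/* Resets to a position no caption can have (valid rows are 0-14 and
 * valid columns are 0-31), so the first move always differs. */
void cursor_reset(struct cursor *c)
{
    c->row = UINT8_MAX;
    c->column = UINT8_MAX;
}

/* Returns true when a position code must be emitted, and records the
 * new position. */
bool cursor_move(struct cursor *c, uint8_t row, uint8_t column)
{
    if (c->row == row && c->column == column)
        return false;
    c->row = row;
    c->column = column;
    return true;
}
```

With the old initial values (row 14, column 0), the first cursor_move to row 14, column 0 returns false, which corresponds to the skipped PAC described above.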

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(rust): Remove unused assignments to fix clippy warnings

Remove unnecessary `time_show.time_in_ms += 1000 / 29.97` operations
that were restoring values that were never read afterwards.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 21:13:02 -08:00
Carlos
9eb266914a fix(ts_tables_epg): add NULL checks and fix memory leaks
- EPG_output_live: add NULL checks for filename/finalfilename malloc,
  add fopen failure check
- EPG_DVB_decode_string: add NULL checks for decode_buffer and out
  malloc
- EPG_decode_content_descriptor: add NULL check for categories malloc
- EPG_decode_parental_rating_descriptor: add NULL check for ratings
  malloc
- EPG_decode_extended_event_descriptor: add NULL checks for net and
  extended_text malloc
- EPG_ATSC_decode_multiple_string: add NULL checks for event_name and
  text malloc
- parse_EPG_packet: add NULL check for buffer malloc, fix unsafe
  realloc that lost original pointer on failure
- EPG_decode_short_event_descriptor: fix memory leak - free event_name
  on early return
- EPG_DVB_decode_EIT: fix memory leak - call EPG_free_event on early
  return

All OOM conditions now use fatal(EXIT_NOT_ENOUGH_MEMORY, ...) following
the project's coding patterns.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 02:00:22 +01:00
Carlos Fernandez Sanz
1510396aa0 fix(ccx_decoders_common): add NULL checks and fix memory safety issues (#1796)
- Add NULL checks after malloc calls in copy_encoder_context(),
  copy_decoder_context(), copy_subtitle(), and init_cc_decode()
- Fix buffer overflows in copy_encoder_context() where string
  allocations were missing +1 for null terminator
- Call fatal(EXIT_NOT_ENOUGH_MEMORY, ...) on allocation failure
  following the pattern used in matroska.c
- Initialize pointers to NULL after memcpy to prevent use of
  stale pointers from the copied structure
- Prevent null pointer dereference in init_cc_decode() when dtvcc_init
  returns NULL
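
The "+1 for null terminator" item reflects that strlen() does not count the terminating NUL. A minimal sketch with a hypothetical `dup_string` helper (the real code calls fatal() on failure instead of returning NULL):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL on OOM. */
char *dup_string(const char *src)
{
    size_t len = strlen(src);     /* excludes the terminating NUL */
    char *copy = malloc(len + 1); /* malloc(len) would be one byte short */

    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len + 1);   /* copies the NUL as well */
    return copy;
}
```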

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 16:01:47 -08:00
dependabot[bot]
a7dfaea559 chore(deps): bump actions/cache from 4 to 5 (#1790)
Bumps [actions/cache](https://github.com/actions/cache) from 4 to 5.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-12 14:31:22 -08:00
Carlos Fernandez Sanz
e8383c84ee fix(rust): remove unused assignments in tv_screen.rs (#1795)
Remove three unused assignments to `time_show.time_in_ms` that were
flagged by Clippy as "value assigned is never read".

The pattern was: subtract frame delay, use the value, then restore it.
However, since `time_show` is not used after the match statement, the
restoration assignments were unnecessary dead code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 13:56:03 -08:00
Carlos Fernandez Sanz
810c869bc5 fix(dvb_subtitle_decoder): add NULL checks after malloc calls (#1794)
* fix(matroska): add memory safety checks and fix memory leaks

This commit addresses multiple memory safety issues in the Matroska
parser identified through static analysis (cppcheck).

## Null pointer dereference after malloc (15 fixes)

Added null checks after all malloc/calloc calls to prevent crashes
when memory allocation fails:

- read_byte_block(): line 28
- read_bytes_signed(): line 38
- generate_timestamp_ass_ssa(): line 267
- parse_segment_cluster_block_group_block(): lines 306, 361
- parse_segment_cluster_block_group_block_additions(): line 405
- parse_segment_cluster_block_group(): line 476
- parse_segment_track_entry(): lines 958, 973
- parse_private_codec_data(): line 1019
- generate_filename_from_track(): line 1167
- ass_ssa_sentence_erase_read_order(): line 1191
- save_sub_track(): lines 1264, 1271, 1303, 1310
- matroska_loop(): lines 1496, 1505

## Buffer overflow fixes (3 fixes)

- generate_timestamp_ass_ssa(): Increased buffer from 15 to 32 bytes,
  changed sprintf to snprintf. GCC warned output could be 11-23 bytes.
- save_sub_track(): Increased number[] buffer from 9 to 16 bytes,
  changed sprintf to snprintf.
- generate_filename_from_track(): Now calculates required buffer size
  dynamically instead of using fixed 200 bytes.

## Memory leak fixes (7 fixes)

- parse_ebml(): Fixed leak of read_vint_block_string() return value
- parse_segment_info(): Fixed 4 leaks of read_vint_block_string()
  returns (filename, title, muxing_app, writing_app)
- parse_segment_track_entry(): Added free(lang) before reassignment
- save_sub_track(): Fixed leak where text pointer was advanced,
  losing original allocation

## Realloc error handling (3 fixes)

Fixed realloc calls to use temporary variable, preventing loss of
original pointer if realloc fails:

- parse_segment_cluster_block_group_block(): line 366
- parse_segment_cluster_block_group(): line 475
- parse_segment_track_entry(): line 973

## Use-after-free fix (1 fix)

- matroska_loop(): Saved avc_track_number and dec_sub.got_output
  before calling matroska_free_all(), then used saved values

## Missing free fixes (2 fixes)

- free_sub_track(): Added free(track->sentences) for the array itself
- matroska_free_all(): Added free(mkv_ctx->sub_tracks) for the array

## Other improvements

- Initialized sub_track->sentences to NULL in parse_segment_track_entry()
  to ensure safe NULL check in free_sub_track()

All changes use EXIT_NOT_ENOUGH_MEMORY (exit code 500) for
out-of-memory conditions, consistent with the rest of the codebase.
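
The use-after-free fix follows a common pattern: copy the fields that are still needed before releasing the structure that owns them. The sketch below uses a hypothetical `struct demo_ctx` and `finish` function; it does not reproduce matroska_loop() or matroska_free_all().

```c
#include <stdlib.h>

/* Hypothetical stand-in for a context owning the fields of interest. */
struct demo_ctx
{
    int track_number;
    int got_output;
};

/* Frees ctx and reports whether a valid track produced output. */
int finish(struct demo_ctx *ctx)
{
    /* Copy what is needed first... */
    int track_number = ctx->track_number;
    int got_output = ctx->got_output;

    free(ctx);
    /* ...because reading ctx->track_number here would be a
     * use-after-free. Only the saved copies are used from now on. */
    return (track_number >= 0 && got_output) ? 1 : 0;
}
```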

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(dvb_subtitle_decoder): add NULL checks after malloc calls

Add missing NULL checks for 9 malloc() calls in the DVB subtitle decoder
that could cause crashes or undefined behavior if memory allocation fails.

All checks use fatal(EXIT_NOT_ENOUGH_MEMORY, ...) to terminate gracefully
with an appropriate error message, consistent with the approach used in
matroska.c and other parts of the codebase.

Affected functions and allocations:
- dvbsub_init_decoder(): DVBSubContext allocation
- dvbsub_parse_clut_segment(): DVBSubCLUT allocation
- dvbsub_parse_region_segment(): DVBSubRegion, pbuf, DVBSubObject,
  and DVBSubObjectDisplay allocations
- dvbsub_parse_page_segment(): DVBSubRegionDisplay allocation
- write_dvb_sub(): cc_bitmap (rect), data1, and data0 allocations
- dvbsub_handle_display_segment(): private_data allocation

This also fixes a potential memory leak in write_dvb_sub() where rect
and rect->data1 would be leaked if the rect->data0 allocation failed
(previously returned -1 without cleanup, now terminates via fatal()).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 13:49:09 -08:00
Carlos Fernandez Sanz
b32c120e89 fix(matroska): add memory safety checks and fix memory leaks (#1792)
This commit addresses multiple memory safety issues in the Matroska
parser identified through static analysis (cppcheck).

## Null pointer dereference after malloc (15 fixes)

Added null checks after all malloc/calloc calls to prevent crashes
when memory allocation fails:

- read_byte_block(): line 28
- read_bytes_signed(): line 38
- generate_timestamp_ass_ssa(): line 267
- parse_segment_cluster_block_group_block(): lines 306, 361
- parse_segment_cluster_block_group_block_additions(): line 405
- parse_segment_cluster_block_group(): line 476
- parse_segment_track_entry(): lines 958, 973
- parse_private_codec_data(): line 1019
- generate_filename_from_track(): line 1167
- ass_ssa_sentence_erase_read_order(): line 1191
- save_sub_track(): lines 1264, 1271, 1303, 1310
- matroska_loop(): lines 1496, 1505

## Buffer overflow fixes (3 fixes)

- generate_timestamp_ass_ssa(): Increased buffer from 15 to 32 bytes,
  changed sprintf to snprintf. GCC warned output could be 11-23 bytes.
- save_sub_track(): Increased number[] buffer from 9 to 16 bytes,
  changed sprintf to snprintf.
- generate_filename_from_track(): Now calculates required buffer size
  dynamically instead of using fixed 200 bytes.
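
One common way to size a buffer from the formatted output, as the last fix above describes, is to call `snprintf` twice: once with a NULL buffer to measure, once to write. The sketch below assumes a hypothetical `format_track_name()` helper and a made-up `base_track.ext` naming scheme; the real generate_filename_from_track() may build its names differently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Format "<base>_<track>.<ext>" into a heap buffer sized from the actual
 * output length. Returns NULL on failure; the caller frees the result. */
char *format_track_name(const char *base, unsigned track, const char *ext)
{
	assert(base != NULL && ext != NULL);

	/* First pass: a NULL buffer with size 0 only measures the output. */
	int needed = snprintf(NULL, 0, "%s_%u.%s", base, track, ext);
	if (needed < 0)
		return NULL;

	char *out = malloc((size_t)needed + 1); /* +1 for the terminator */
	if (out == NULL)
		return NULL;

	/* Second pass: snprintf never writes past the size it is given. */
	snprintf(out, (size_t)needed + 1, "%s_%u.%s", base, track, ext);
	return out;
}
```

Unlike a fixed 200-byte array, the buffer grows with the input, and `snprintf` keeps even a miscalculated size from overflowing.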

## Memory leak fixes (7 fixes)

- parse_ebml(): Fixed leak of read_vint_block_string() return value
- parse_segment_info(): Fixed 4 leaks of read_vint_block_string()
  returns (filename, title, muxing_app, writing_app)
- parse_segment_track_entry(): Added free(lang) before reassignment
- save_sub_track(): Fixed leak where text pointer was advanced,
  losing original allocation
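
The last leak in the list above is easy to reproduce: once the only pointer to a heap block is advanced, the address that `free()` needs is gone. A minimal sketch of the usual remedy, keeping one pointer for ownership and a second for scanning, is shown here; `copy_after_separator()` is a hypothetical helper, not a function from matroska.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a fresh copy of the text after the first `sep` in `line`, or of
 * the whole string when `sep` is absent. The caller frees the result. */
char *copy_after_separator(const char *line, char sep)
{
	assert(line != NULL);

	size_t len = strlen(line);
	char *base = malloc(len + 1); /* owning pointer: never moved */
	if (base == NULL)
		return NULL;
	memcpy(base, line, len + 1);

	char *cursor = base; /* scanning pointer: free to move */
	char *hit = (sep != '\0') ? strchr(cursor, sep) : NULL;
	if (hit != NULL)
		cursor = hit + 1;

	size_t rest = strlen(cursor);
	char *result = malloc(rest + 1);
	if (result != NULL)
		memcpy(result, cursor, rest + 1);

	free(base); /* release the original allocation, not the cursor */
	return result;
}
```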

## Realloc error handling (3 fixes)

Fixed realloc calls to use temporary variable, preventing loss of
original pointer if realloc fails:

- parse_segment_cluster_block_group_block(): line 366
- parse_segment_cluster_block_group(): line 475
- parse_segment_track_entry(): line 973
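
The temporary-variable idiom mentioned above looks roughly like this. `grow_buffer()` is a hypothetical helper written for illustration; the commit applies the same idea inline at the listed call sites.

```c
#include <assert.h>
#include <stdlib.h>

/* Resize *buf to new_size bytes. Returns 0 on success and -1 on failure.
 * On failure *buf is unchanged and still owned by the caller. */
int grow_buffer(unsigned char **buf, size_t new_size)
{
	assert(buf != NULL);
	if (new_size == 0)
		return -1; /* realloc(ptr, 0) is not portable, so reject it */

	/* Writing `*buf = realloc(*buf, n)` would overwrite the only copy of
	 * the pointer with NULL on failure and leak the old block. */
	unsigned char *tmp = realloc(*buf, new_size);
	if (tmp == NULL)
		return -1;

	*buf = tmp;
	return 0;
}
```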

## Use-after-free fix (1 fix)

- matroska_loop(): Saved avc_track_number and dec_sub.got_output
  before calling matroska_free_all(), then used saved values
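
In outline, the fix copies the fields that are still needed into locals before the owning structure is released. The struct, its fields, and the return expression below are hypothetical simplifications, not the real Matroska context.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for the parser context. */
struct demo_ctx {
	int track_number;
	int got_output;
};

/* Free ctx and report whether a valid track produced output. */
int finish_and_report(struct demo_ctx *ctx)
{
	assert(ctx != NULL);

	/* Copy what is needed while ctx is still valid. */
	int track = ctx->track_number;
	int got = ctx->got_output;

	free(ctx);

	/* Only the saved copies are read from here on; ctx is dangling. */
	return (track > -1 && got) ? 1 : 0;
}
```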

## Missing free fixes (2 fixes)

- free_sub_track(): Added free(track->sentences) for the array itself
- matroska_free_all(): Added free(mkv_ctx->sub_tracks) for the array

## Other improvements

- Initialized sub_track->sentences to NULL in parse_segment_track_entry()
  to ensure safe NULL check in free_sub_track()

All changes use EXIT_NOT_ENOUGH_MEMORY (exit code 500) for
out-of-memory conditions, consistent with the rest of the codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 13:29:55 -08:00
Vidit
3d7553349f remove label -without-rust (#1780)
* fix minor issue

* remove -without-rust

* fixed
2025-12-09 20:38:55 +05:30
Rahul Tripathi
d524a0247f Merge pull request #2 from Rahul-2k4/copilot/fix-teletext-page-detection-issue-1034 2025-12-09 13:00:03 +05:30
copilot-swe-agent[bot]
f30f276456 Apply code style fixes from clang-format
Co-authored-by: Rahul-2k4 <216878448+Rahul-2k4@users.noreply.github.com>
2025-12-09 06:28:15 +00:00
copilot-swe-agent[bot]
17a8e1ec7b Remove unintended Cargo.lock changes
Co-authored-by: Rahul-2k4 <216878448+Rahul-2k4@users.noreply.github.com>
2025-12-09 06:23:19 +00:00
copilot-swe-agent[bot]
ebe25af476 Fix indentation to use tabs consistently
Co-authored-by: Rahul-2k4 <216878448+Rahul-2k4@users.noreply.github.com>
2025-12-09 06:16:17 +00:00
copilot-swe-agent[bot]
1f7120f32f Apply teletext page detection fix from fix branch
Co-authored-by: Rahul-2k4 <216878448+Rahul-2k4@users.noreply.github.com>
2025-12-09 06:15:23 +00:00
copilot-swe-agent[bot]
9e9023c258 Initial plan 2025-12-09 06:05:32 +00:00
Dhanush
b2930178be Fix G608 output extra NULL character (#1777) (#1786)
Co-authored-by: dhanush varma <dhanushvarma@dhanushs-MacBook-Air.local>
2025-12-08 20:37:29 -08:00
rudera-byte
759c3f5d41 fix: Issue #1162 TESSDATA_PREFIX requires path separator at its end (#1674) 2025-12-09 04:30:26 +05:30
moveman
3c51fb6536 Handle row_count decrease (#1702)
Co-authored-by: ewong <Edmond.Wong@harmonicinc.com>
Co-authored-by: Prateek Sunal <prtksunal@gmail.com>
Co-authored-by: Carlos Fernandez Sanz <carlos@ccextractor.org>
2025-12-09 04:19:13 +05:30
Deepnarayan Sett
494df3edae [FEAT] added demuxer and file_functions module (#1662)
* feat: added demuxer module

* Cargo Lock Update

* Completed file_functions and demuxer

* Completed file_functions and demuxer

* written extern functions for demuxer

* Removed libc completely, added tests for gxf and ported gxf to C

* Hardsubx error fixed

* Fixing format issues

* clippy errors fixed

* fixing format issues

* fixing format issues

* Windows failing tests

* Windows failing tests

* demuxer: added demuxer data transfer functions and removed some structs

* made Demuxer and File Functions

* Minor formatting changes

* Minor Rebasing changes

* demuxer: format rust and unit test rust checks

* C formatting

* Windows Failing test

* Windows Failing test

* Update CHANGES.TXT

* Update CHANGES.TXT

* Windows Failing Tests

* Windows Failing Tests

* Problem in Copy to Rust and some typos that copilot review suggested

* Minor Formatting Error

* Windows Failing Regressions

* Windows Failing Regressions

* Minor Comment Change

* Data transfer module for DemuxerData added and more rustlike syntax to ctorust.rs

* Minor Formatting Changes

* demuxer: Rebase and a few tweaks to file_functions

* demuxer: Minor Formatting Error

* [FIX] 134 Codes in XDS and General Tests (#1708)

* Made pointers valid in Unit Tests of Decoder

* fix: test_do_cb

* Copilot Suggestions

* Suggestions about Redundancy

* Suggestions about Redundancy

* [FEAT] Add `bitstream` module in `lib_ccxr` (#1649)

* feat: Add bitstream module

* run code formatters

* Run cargo clippy --fix

* Run cargo fmt --all

* refactor: remove rust pointer from C struct

* feat: Add bitstream module

* run code formatters

* Run cargo clippy --fix

* Run cargo fmt --all

* refactor: remove rust pointer from C struct

* Added Bitstream to libccxr_exports

* Minor Formatting Issue

* Bitstream: Removed redundant CType

* bitstream: recommended changes for is_byte_aligned

* bitstream: recommended changes for long comments

* bitstream: comment fix

* bitstream: removed redundant comparison comments

---------

Co-authored-by: Deepnarayan Sett <depnra1@gmail.com>
Co-authored-by: Deepnarayan Sett <71217129+steel-bucket@users.noreply.github.com>

* demuxer: minor formatting changes

* Demuxer: Changes to mistakes in CHANGES.txt

* Demuxer: Removed extra newline in ccextractor.c

* Demuxer: Changes to Encoding resolved

* Demuxer: Moved CCX_NOPTS to common structs and some changes to Demuxer Data regarding MPEG_CLOCK_FREQ

* some refactoring to CCX_NOPTS

* Demuxer: Minor Mistake regarding CHANGES.txt

* Demuxer: Unit test rust failing because of CCX_NOPTS

* Demuxer: changed common_structs to common_types

* Demuxer: Removed redundant libraries from Cargo.toml and moved tempfile to dev-dependencies

* Demuxer: Removed to_vec function and renamed PSIBuffer/PMTEntry from_ctype functions

* Demuxer:  Renamed Stream_Type, improved Time complexity of the default() function and removed redundant comments

* Demuxer:  Removed two repeated code blocks and removed redundant comments

* Demuxer:  Removed two code blocks

* Demuxer: Review Changes

* Demuxer: Removed redundant tests

* Update src/rust/src/demuxer/demux.rs

Co-authored-by: Prateek Sunal <prtksunal@gmail.com>

* Demuxer: Errors due to Rebase

* Demuxer: Removed get_stream_mode

* Demuxer: Errors due to rebasing and removing redundant CType Functions

* Demuxer: Failing ES regressions

* Demuxer: MythTV failing regression

* Demuxer: Removed redundant comments

* Demuxer: Unplugged ES for now

* Demuxer: Replugged in ES

* Demuxer: Formatting error

* Demuxer: Windows failing CI

* Demuxer: Windows failing CI

* Demuxer: Windows failing Regressions

* Demuxer: Formatting

* Demuxer: Minor Cargo Clippy change

* Demuxer: running regressions again

* Demuxer: Cargo Lockfile Change

* Demuxer: running regressions again

* Demuxer: running regressions again

---------

Co-authored-by: Swastik Patel <swastikpatel29@gmail.com>
Co-authored-by: Prateek Sunal <prtksunal@gmail.com>
2025-12-08 22:26:20 +05:30
Carlos Fernandez Sanz
810e02f7fa Fix Issue#1235: Sanitize XML comment to prevent invalid token errors (#1783)
Original description:

Pull request description:
Added logic to detect any occurrence of "--" in comments and replace it with a single "-", because "--" is not allowed inside an XML comment.
Used a bulk write ('fwrite') to efficiently handle portions of the string that don't contain invalid sequences.
Ensured that comments are written correctly without altering the original structure of the code.
Updated function 'write_spucomment' to handle the sanitization process efficiently.
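
A minimal sketch of the sanitization idea follows. It is not the code of write_spucomment(): `sanitize_xml_comment()` is a hypothetical helper that writes into a caller-supplied buffer and uses `memcpy` for the bulk copies where the description above mentions `fwrite`. It also collapses longer runs such as "---" to a single "-", which is an assumption about the intended behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, collapsing every run of '-' into a single '-'.
 * dst must hold at least strlen(src) + 1 bytes.
 * Returns the number of bytes written, excluding the terminator. */
size_t sanitize_xml_comment(const char *src, char *dst)
{
	assert(src != NULL && dst != NULL);

	size_t out = 0;
	const char *p = src;

	while (*p != '\0') {
		const char *dash = strstr(p, "--");
		if (dash == NULL) {
			/* No invalid sequence left: copy the rest in one go. */
			size_t rest = strlen(p);
			memcpy(dst + out, p, rest);
			out += rest;
			break;
		}
		/* Bulk-copy the clean span plus the first '-' of the run. */
		size_t span = (size_t)(dash - p) + 1;
		memcpy(dst + out, p, span);
		out += span;
		/* Skip the remaining hyphens of this run. */
		p = dash + 1;
		while (*p == '-')
			p++;
	}
	dst[out] = '\0';
	return out;
}
```

The output is never longer than the input, so a destination buffer the size of the source string is always sufficient.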
2025-12-07 22:41:11 -08:00
dependabot[bot]
2720448e87 chore(deps): bump actions/checkout from 4 to 6 (#1766)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-07 18:54:15 -08:00
dependabot[bot]
5fceac5e90 chore(deps): bump actions/upload-artifact from 4 to 5 (#1757)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 5.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-07 18:53:35 -08:00
Carlos Fernandez Sanz
60ae6fb760 [FIX] Fix Windows build by updating vcpkg baseline and other packages (#1778)
* [FIX] Update vcpkg baseline and use forked rsmpeg for FFmpeg 7

Update vcpkg baseline from Feb 2024 to Dec 2025 to resolve libxml2
hash mismatch. GitLab regenerates archives dynamically, causing
SHA512 verification failures with old baselines.

Switch to CCExtractor's forked rsmpeg (github.com/CCExtractor/rsmpeg)
which pins rusty_ffmpeg to 0.16.4 for FFmpeg 7.1 compatibility.
This provides consistent FFmpeg 7 support across all platforms.

Changes:
- Update vcpkg baseline in workflow and vcpkg.json
- Use forked rsmpeg from git for all platforms
- Use ffmpeg7_1 feature instead of ffmpeg6/ffmpeg8
- Use link_vcpkg_ffmpeg for Windows

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Enable use_prebuilt_binding feature for rsmpeg

This ensures consistent FFmpeg 7 API signatures across all platforms,
regardless of the system FFmpeg version installed. Ubuntu's FFmpeg 6
has different function signatures than FFmpeg 7.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Standardize on FFmpeg 6.1.1 across all platforms

Use FFmpeg 6 consistently:
- Linux: uses apt packages (libavcodec-dev, etc.) which provide FFmpeg 6
- Windows: vcpkg baseline pinned to FFmpeg 6.1.1 (commit 5a58e645)
- macOS: uses system FFmpeg 6

This ensures consistent behavior and API compatibility across all platforms.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Use platform-appropriate FFmpeg versions

- Linux: FFmpeg 6 (from Ubuntu apt packages)
- Windows: FFmpeg 7 (from vcpkg with recent baseline)
- macOS: FFmpeg 7 (from Homebrew)

This fixes the Windows build which was failing due to vcpkg
baseline hash mismatch for libxml2 in older baselines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Use FFmpeg 7 with prebuilt bindings for Linux

Use ffmpeg7 feature everywhere and use_prebuilt_binding for Linux
to ensure FFmpeg 7 API signatures regardless of system FFmpeg version.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix library names for Windows build with updated vcpkg

- Update leptonica library name from 1.83.1 to 1.85.0
- Update tesseract library name from tesseract53 to tesseract55 (v5.5.1)
- Update libiconv library names: charset.lib -> libcharset.lib, iconv.lib -> libiconv.lib

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix iconv library name for vcpkg static build

vcpkg libiconv for x64-windows-static produces only iconv.lib
with charset functionality bundled in, not separate libcharset.lib
and libiconv.lib files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix iconv library names: use charset.lib and iconv.lib

Restores the correct vcpkg libiconv library names:
- charset.lib (libcharset library)
- iconv.lib (libiconv library)

These are the original names from vcpkg libiconv package for x64-windows-static.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* try: New Hash

Updated the builtin baseline hash for ccextractor.

* Remove charset.lib and iconv.lib from dependencies

The project has its own win_iconv.c implementation in src/thirdparty/win_iconv/
which provides iconv functionality. With the updated vcpkg baseline (ab2977be),
the libiconv library doesn't produce charset.lib or libcharset.lib files.

FFmpeg is also built with --disable-iconv in this vcpkg configuration, so
the external iconv libraries are not needed by any of the vcpkg dependencies.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Deepnarayan Sett <71217129+steel-bucket@users.noreply.github.com>
2025-12-07 13:20:41 -08:00
dhanush varma
c9d80e12b8 bump: update MSRV from 1.54.0 to 1.87.0
- Update all build configuration files to require Rust 1.87.0+
- Add clippy.toml with MSRV configuration as requested
- Maintain modern Rust features like is_multiple_of()
- Fixes build compatibility issue #1765
2025-11-23 00:10:04 +05:30
dhanush varma
a0aa9e4616 fix(rust): revert is_multiple_of to maintain MSRV 1.54.0
- Reverts is_multiple_of(2) to stable % 2 == 0 check to maintain
  compatibility with Rust 1.54.0 (project MSRV)
- Adds clippy.toml with msrv = '1.54.0' to prevent Clippy from
  suggesting APIs that aren't available in the MSRV

Fixes: #1765
2025-11-21 22:39:26 +05:30
dhanush varma
1515f5c1be build: add tesseract library linking for hardsubx feature
Fixes #1719 - build was failing with --enable-hardsubx due to missing
tesseract library linking. Added pkg_check_modules for tesseract and
leptonica in the HARDSUBX section of CMakeLists.txt.

Tested with: cmake -DWITH_HARDSUBX=ON -DWITH_OCR=ON -DWITH_FFMPEG=ON
2025-11-08 11:42:54 +05:30
Prateek Sunal
42d750950a [FIX] add mac-ocr-hardsubx workflow & ffmpeg variants support (#1745)
## Fix
- Update params and their docs

## macOS:
- Fix FFmpeg and Tesseract compilation
- Re-add the macOS hardsubx build workflow

## FFmpeg versions used in the workflows:
- macOS: `8.*`
- Windows: `6.*` (pinned VCPKG supports this)
- Linux: `6.*` (Latest ubuntu runner supports this)
2025-11-03 23:47:42 +05:30
Deepnarayan Sett
5338c15f8d fix: Cargo Clippy failing on 1.91 (#1758) 2025-10-31 23:38:10 -07:00
Hridesh MG
ee232b5ded bump version 0.94 -> 0.95 (#1751) 2025-10-26 20:19:55 +05:30
Ari1009
5a016d09b1 fix: MCC encoder 16-bit sequence 2025-07-29 13:25:09 +05:30
232 changed files with 28708 additions and 7026 deletions

37
.dockerignore Normal file

@@ -0,0 +1,37 @@
# Build artifacts
linux/ccextractor
linux/rust/
linux/*.o
linux/*.a
mac/ccextractor
mac/rust/
build/
build_*/
# Git
.git/
.github/
# IDE
.vscode/
.idea/
*.swp
*.swo
# Docker
docker/
# Documentation (not needed for build)
docs/
*.md
!README.md
# Test files
*.ts
*.mp4
*.mkv
*.srt
*.vtt
# Plans
plans/

157
.github/workflows/build_appimage.yml vendored Normal file

@@ -0,0 +1,157 @@
name: Build Linux AppImage
on:
  # Build on releases
  release:
    types: [published]
  # Allow manual trigger
  workflow_dispatch:
    inputs:
      build_type:
        description: 'Build type (all, minimal, ocr, hardsubx)'
        required: false
        default: 'all'
  # Build on pushes to workflow file for testing
  push:
    paths:
      - '.github/workflows/build_appimage.yml'
      - 'linux/build_appimage.sh'
jobs:
  build-appimage:
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        build_type: [minimal, ocr, hardsubx]
    steps:
      - name: Check if should build this variant
        id: should_build
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            INPUT_TYPE="${{ github.event.inputs.build_type }}"
            if [ "$INPUT_TYPE" = "all" ] || [ "$INPUT_TYPE" = "${{ matrix.build_type }}" ]; then
              echo "should_build=true" >> $GITHUB_OUTPUT
            else
              echo "should_build=false" >> $GITHUB_OUTPUT
            fi
          else
            echo "should_build=true" >> $GITHUB_OUTPUT
          fi
      - name: Checkout repository
        if: steps.should_build.outputs.should_build == 'true'
        uses: actions/checkout@v6
      - name: Install base dependencies
        if: steps.should_build.outputs.should_build == 'true'
        run: |
          sudo apt-get update
          sudo apt-get install -y --no-install-recommends \
            build-essential \
            cmake \
            pkg-config \
            wget \
            file \
            libfuse2 \
            zlib1g-dev \
            libpng-dev \
            libjpeg-dev \
            libfreetype-dev \
            libxml2-dev \
            libcurl4-gnutls-dev \
            libssl-dev \
            clang \
            libclang-dev
      - name: Install OCR dependencies
        if: steps.should_build.outputs.should_build == 'true' && (matrix.build_type == 'ocr' || matrix.build_type == 'hardsubx')
        run: |
          sudo apt-get install -y --no-install-recommends \
            tesseract-ocr \
            libtesseract-dev \
            libleptonica-dev \
            tesseract-ocr-eng
      - name: Install FFmpeg dependencies (HardSubX)
        if: steps.should_build.outputs.should_build == 'true' && matrix.build_type == 'hardsubx'
        run: |
          sudo apt-get install -y --no-install-recommends \
            libavcodec-dev \
            libavformat-dev \
            libavutil-dev \
            libswscale-dev \
            libswresample-dev \
            libavfilter-dev \
            libavdevice-dev
      - name: Install Rust toolchain
        if: steps.should_build.outputs.should_build == 'true'
        uses: dtolnay/rust-toolchain@stable
      - name: Cache GPAC build
        if: steps.should_build.outputs.should_build == 'true'
        id: cache-gpac
        uses: actions/cache@v5
        with:
          path: /usr/local/lib/libgpac*
          key: gpac-v2.4.0-ubuntu22
      - name: Build and install GPAC
        if: steps.should_build.outputs.should_build == 'true' && steps.cache-gpac.outputs.cache-hit != 'true'
        run: |
          git clone -b v2.4.0 --depth 1 https://github.com/gpac/gpac
          cd gpac
          ./configure
          make -j$(nproc) lib
          sudo make install-lib
          sudo ldconfig
      - name: Update library cache
        if: steps.should_build.outputs.should_build == 'true'
        run: sudo ldconfig
      - name: Build AppImage
        if: steps.should_build.outputs.should_build == 'true'
        run: |
          cd linux
          chmod +x build_appimage.sh
          BUILD_TYPE=${{ matrix.build_type }} ./build_appimage.sh
      - name: Get AppImage name
        if: steps.should_build.outputs.should_build == 'true'
        id: appimage_name
        run: |
          case "${{ matrix.build_type }}" in
            minimal)
              echo "name=ccextractor-minimal-x86_64.AppImage" >> $GITHUB_OUTPUT
              ;;
            ocr)
              echo "name=ccextractor-x86_64.AppImage" >> $GITHUB_OUTPUT
              ;;
            hardsubx)
              echo "name=ccextractor-hardsubx-x86_64.AppImage" >> $GITHUB_OUTPUT
              ;;
          esac
      - name: Test AppImage
        if: steps.should_build.outputs.should_build == 'true'
        run: |
          chmod +x linux/${{ steps.appimage_name.outputs.name }}
          linux/${{ steps.appimage_name.outputs.name }} --version
      - name: Upload AppImage artifact
        if: steps.should_build.outputs.should_build == 'true'
        uses: actions/upload-artifact@v6
        with:
          name: ${{ steps.appimage_name.outputs.name }}
          path: linux/${{ steps.appimage_name.outputs.name }}
      - name: Upload to Release
        if: steps.should_build.outputs.should_build == 'true' && github.event_name == 'release'
        uses: softprops/action-gh-release@v2
        with:
          files: linux/${{ steps.appimage_name.outputs.name }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

283
.github/workflows/build_deb.yml vendored Normal file

@@ -0,0 +1,283 @@
name: Build Linux .deb Package
on:
# Build on releases
release:
types: [published]
# Allow manual trigger
workflow_dispatch:
inputs:
build_type:
description: 'Build type (all, basic, hardsubx)'
required: false
default: 'all'
# Build on pushes to workflow file for testing
push:
paths:
- '.github/workflows/build_deb.yml'
jobs:
build-deb:
runs-on: ubuntu-24.04
strategy:
fail-fast: false
matrix:
build_type: [basic, hardsubx]
steps:
- name: Check if should build this variant
id: should_build
run: |
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
INPUT_TYPE="${{ github.event.inputs.build_type }}"
if [ "$INPUT_TYPE" = "all" ] || [ "$INPUT_TYPE" = "${{ matrix.build_type }}" ]; then
echo "should_build=true" >> $GITHUB_OUTPUT
else
echo "should_build=false" >> $GITHUB_OUTPUT
fi
else
echo "should_build=true" >> $GITHUB_OUTPUT
fi
- name: Checkout repository
if: steps.should_build.outputs.should_build == 'true'
uses: actions/checkout@v6
- name: Get version
if: steps.should_build.outputs.should_build == 'true'
id: version
run: |
# Extract version from source or use tag
if [ "${{ github.event_name }}" = "release" ]; then
VERSION="${{ github.event.release.tag_name }}"
VERSION="${VERSION#v}" # Remove 'v' prefix if present
else
# Extract version from lib_ccx.h (e.g., #define VERSION "0.96.5")
VERSION=$(grep -oP '#define VERSION "\K[^"]+' src/lib_ccx/lib_ccx.h || echo "0.96")
fi
echo "version=$VERSION" >> $GITHUB_OUTPUT
echo "Building version: $VERSION"
- name: Install base dependencies
if: steps.should_build.outputs.should_build == 'true'
run: |
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential \
cmake \
pkg-config \
zlib1g-dev \
libpng-dev \
libjpeg-dev \
libfreetype-dev \
libxml2-dev \
libcurl4-gnutls-dev \
libssl-dev \
clang \
libclang-dev \
tesseract-ocr \
libtesseract-dev \
libleptonica-dev \
patchelf
- name: Install FFmpeg dependencies (HardSubX)
if: steps.should_build.outputs.should_build == 'true' && matrix.build_type == 'hardsubx'
run: |
sudo apt-get install -y --no-install-recommends \
libavcodec-dev \
libavformat-dev \
libavutil-dev \
libswscale-dev \
libswresample-dev \
libavfilter-dev \
libavdevice-dev
- name: Install Rust toolchain
if: steps.should_build.outputs.should_build == 'true'
uses: dtolnay/rust-toolchain@stable
- name: Cache GPAC build
if: steps.should_build.outputs.should_build == 'true'
id: cache-gpac
uses: actions/cache@v5
with:
path: ~/gpac-install
key: gpac-abi-16.4-ubuntu24-deb
- name: Build GPAC
if: steps.should_build.outputs.should_build == 'true' && steps.cache-gpac.outputs.cache-hit != 'true'
run: |
git clone -b abi-16.4 --depth 1 https://github.com/gpac/gpac
cd gpac
./configure --prefix=/usr
make -j$(nproc)
make DESTDIR=$HOME/gpac-install install-lib
- name: Install GPAC to system
if: steps.should_build.outputs.should_build == 'true'
run: |
sudo cp -r $HOME/gpac-install/usr/lib/* /usr/lib/
sudo cp -r $HOME/gpac-install/usr/include/* /usr/include/
sudo ldconfig
- name: Build CCExtractor
if: steps.should_build.outputs.should_build == 'true'
run: |
mkdir build && cd build
if [ "${{ matrix.build_type }}" = "hardsubx" ]; then
cmake ../src -DCMAKE_BUILD_TYPE=Release -DWITH_OCR=ON -DWITH_HARDSUBX=ON
else
cmake ../src -DCMAKE_BUILD_TYPE=Release -DWITH_OCR=ON
fi
make -j$(nproc)
- name: Test build
if: steps.should_build.outputs.should_build == 'true'
run: ./build/ccextractor --version
- name: Create .deb package structure
if: steps.should_build.outputs.should_build == 'true'
run: |
VERSION="${{ steps.version.outputs.version }}"
VARIANT="${{ matrix.build_type }}"
if [ "$VARIANT" = "basic" ]; then
PKG_NAME="ccextractor_${VERSION}_amd64"
else
PKG_NAME="ccextractor-${VARIANT}_${VERSION}_amd64"
fi
mkdir -p ${PKG_NAME}/DEBIAN
mkdir -p ${PKG_NAME}/usr/bin
mkdir -p ${PKG_NAME}/usr/lib/ccextractor
mkdir -p ${PKG_NAME}/usr/share/doc/ccextractor
mkdir -p ${PKG_NAME}/usr/share/man/man1
# Copy binary
cp build/ccextractor ${PKG_NAME}/usr/bin/
# Copy GPAC library
cp $HOME/gpac-install/usr/lib/libgpac.so* ${PKG_NAME}/usr/lib/ccextractor/
# Set rpath so ccextractor finds bundled libgpac
patchelf --set-rpath '/usr/lib/ccextractor:$ORIGIN/../lib/ccextractor' ${PKG_NAME}/usr/bin/ccextractor
# Copy documentation
cp docs/CHANGES.TXT ${PKG_NAME}/usr/share/doc/ccextractor/changelog
cp LICENSE.txt ${PKG_NAME}/usr/share/doc/ccextractor/copyright
gzip -9 -n ${PKG_NAME}/usr/share/doc/ccextractor/changelog
# Generate man page
help2man --no-info --name="closed captions and teletext subtitle extractor" \
./build/ccextractor > ${PKG_NAME}/usr/share/man/man1/ccextractor.1 2>/dev/null || true
if [ -f ${PKG_NAME}/usr/share/man/man1/ccextractor.1 ]; then
gzip -9 -n ${PKG_NAME}/usr/share/man/man1/ccextractor.1
fi
# Create control file
if [ "$VARIANT" = "basic" ]; then
PKG_DESCRIPTION="CCExtractor - closed captions and teletext subtitle extractor"
else
PKG_DESCRIPTION="CCExtractor (with HardSubX) - closed captions and teletext subtitle extractor"
fi
INSTALLED_SIZE=$(du -sk ${PKG_NAME}/usr | cut -f1)
# Determine dependencies based on build variant (Ubuntu 24.04)
if [ "$VARIANT" = "hardsubx" ]; then
DEPENDS="libc6, libtesseract5, liblept5, libcurl3t64-gnutls, libavcodec60, libavformat60, libavutil58, libswscale7, libavdevice60, libswresample4, libavfilter9"
else
DEPENDS="libc6, libtesseract5, liblept5, libcurl3t64-gnutls"
fi
cat > ${PKG_NAME}/DEBIAN/control << CTRL
Package: ccextractor
Version: ${VERSION}
Section: utils
Priority: optional
Architecture: amd64
Installed-Size: ${INSTALLED_SIZE}
Depends: ${DEPENDS}
Maintainer: CCExtractor Development Team <carlos@ccextractor.org>
Homepage: https://www.ccextractor.org
Description: ${PKG_DESCRIPTION}
CCExtractor is a tool that extracts closed captions and teletext subtitles
from video files and streams. It supports a wide variety of input formats
including MPEG, H.264/AVC, H.265/HEVC, MP4, MKV, WTV, and transport streams.
.
This package includes a bundled GPAC library for MP4 support.
CTRL
# Remove leading spaces from control file
sed -i 's/^ //' ${PKG_NAME}/DEBIAN/control
# Create postinst to update library cache
cat > ${PKG_NAME}/DEBIAN/postinst << 'POSTINST'
#!/bin/sh
set -e
ldconfig
POSTINST
chmod 755 ${PKG_NAME}/DEBIAN/postinst
# Create postrm to update library cache
cat > ${PKG_NAME}/DEBIAN/postrm << 'POSTRM'
#!/bin/sh
set -e
ldconfig
POSTRM
chmod 755 ${PKG_NAME}/DEBIAN/postrm
# Set permissions
chmod 755 ${PKG_NAME}/usr/bin/ccextractor
chmod 755 ${PKG_NAME}/usr/lib/ccextractor
find ${PKG_NAME}/usr/lib/ccextractor -name "*.so*" -exec chmod 644 {} \;
# Build the .deb
dpkg-deb --build --root-owner-group ${PKG_NAME}
echo "deb_name=${PKG_NAME}.deb" >> $GITHUB_OUTPUT
- name: Test .deb package
if: steps.should_build.outputs.should_build == 'true'
run: |
VERSION="${{ steps.version.outputs.version }}"
VARIANT="${{ matrix.build_type }}"
if [ "$VARIANT" = "basic" ]; then
PKG_NAME="ccextractor_${VERSION}_amd64"
else
PKG_NAME="ccextractor-${VARIANT}_${VERSION}_amd64"
fi
# Install and test (apt handles dependencies automatically)
sudo apt-get update
sudo apt-get install -y ./${PKG_NAME}.deb
ccextractor --version
- name: Get .deb filename
if: steps.should_build.outputs.should_build == 'true'
id: deb_name
run: |
VERSION="${{ steps.version.outputs.version }}"
VARIANT="${{ matrix.build_type }}"
if [ "$VARIANT" = "basic" ]; then
echo "name=ccextractor_${VERSION}_amd64.deb" >> $GITHUB_OUTPUT
else
echo "name=ccextractor-${VARIANT}_${VERSION}_amd64.deb" >> $GITHUB_OUTPUT
fi
- name: Upload .deb artifact
if: steps.should_build.outputs.should_build == 'true'
uses: actions/upload-artifact@v6
with:
name: ${{ steps.deb_name.outputs.name }}
path: ${{ steps.deb_name.outputs.name }}
- name: Upload to Release
if: steps.should_build.outputs.should_build == 'true' && github.event_name == 'release'
uses: softprops/action-gh-release@v2
with:
files: ${{ steps.deb_name.outputs.name }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

275
.github/workflows/build_deb_debian13.yml vendored Normal file

@@ -0,0 +1,275 @@
name: Build Debian 13 .deb Package
on:
# Build on releases
release:
types: [published]
# Allow manual trigger
workflow_dispatch:
inputs:
build_type:
description: 'Build type (all, basic, hardsubx)'
required: false
default: 'all'
# Build on pushes to workflow file for testing
push:
paths:
- '.github/workflows/build_deb_debian13.yml'
jobs:
build-deb:
runs-on: ubuntu-latest
container:
image: debian:trixie
strategy:
fail-fast: false
matrix:
build_type: [basic, hardsubx]
steps:
- name: Check if should build this variant
id: should_build
run: |
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
INPUT_TYPE="${{ github.event.inputs.build_type }}"
if [ "$INPUT_TYPE" = "all" ] || [ "$INPUT_TYPE" = "${{ matrix.build_type }}" ]; then
echo "should_build=true" >> $GITHUB_OUTPUT
else
echo "should_build=false" >> $GITHUB_OUTPUT
fi
else
echo "should_build=true" >> $GITHUB_OUTPUT
fi
- name: Install git and dependencies for checkout
if: steps.should_build.outputs.should_build == 'true'
run: |
apt-get update
apt-get install -y git ca-certificates
- name: Checkout repository
if: steps.should_build.outputs.should_build == 'true'
uses: actions/checkout@v6
- name: Get version
if: steps.should_build.outputs.should_build == 'true'
id: version
run: |
# Extract version from source or use tag
if [ "${{ github.event_name }}" = "release" ]; then
VERSION="${{ github.event.release.tag_name }}"
VERSION="${VERSION#v}" # Remove 'v' prefix if present
else
# Extract version from lib_ccx.h (e.g., #define VERSION "0.96.5")
VERSION=$(grep -oP '#define VERSION "\K[^"]+' src/lib_ccx/lib_ccx.h || echo "0.96")
fi
echo "version=$VERSION" >> $GITHUB_OUTPUT
echo "Building version: $VERSION"
- name: Install base dependencies
if: steps.should_build.outputs.should_build == 'true'
run: |
apt-get install -y --no-install-recommends \
build-essential \
cmake \
pkg-config \
zlib1g-dev \
libpng-dev \
libjpeg-dev \
libfreetype-dev \
libxml2-dev \
libcurl4-gnutls-dev \
libssl-dev \
clang \
libclang-dev \
tesseract-ocr \
libtesseract-dev \
libleptonica-dev \
patchelf \
curl
- name: Install FFmpeg dependencies (HardSubX)
if: steps.should_build.outputs.should_build == 'true' && matrix.build_type == 'hardsubx'
run: |
apt-get install -y --no-install-recommends \
libavcodec-dev \
libavformat-dev \
libavutil-dev \
libswscale-dev \
libswresample-dev \
libavfilter-dev \
libavdevice-dev
- name: Install Rust toolchain
if: steps.should_build.outputs.should_build == 'true'
run: |
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
echo "$HOME/.cargo/bin" >> $GITHUB_PATH
- name: Build GPAC
if: steps.should_build.outputs.should_build == 'true'
run: |
git clone -b abi-16.4 --depth 1 https://github.com/gpac/gpac
cd gpac
./configure --prefix=/usr
make -j$(nproc)
make install-lib
ldconfig
- name: Build CCExtractor
if: steps.should_build.outputs.should_build == 'true'
run: |
export PATH="$HOME/.cargo/bin:$PATH"
mkdir build && cd build
if [ "${{ matrix.build_type }}" = "hardsubx" ]; then
cmake ../src -DCMAKE_BUILD_TYPE=Release -DWITH_OCR=ON -DWITH_HARDSUBX=ON
else
cmake ../src -DCMAKE_BUILD_TYPE=Release -DWITH_OCR=ON
fi
make -j$(nproc)
- name: Test build
if: steps.should_build.outputs.should_build == 'true'
run: ./build/ccextractor --version
- name: Create .deb package structure
if: steps.should_build.outputs.should_build == 'true'
id: create_deb
run: |
VERSION="${{ steps.version.outputs.version }}"
VARIANT="${{ matrix.build_type }}"
if [ "$VARIANT" = "basic" ]; then
PKG_NAME="ccextractor_${VERSION}_debian13_amd64"
else
PKG_NAME="ccextractor-${VARIANT}_${VERSION}_debian13_amd64"
fi
mkdir -p ${PKG_NAME}/DEBIAN
mkdir -p ${PKG_NAME}/usr/bin
mkdir -p ${PKG_NAME}/usr/lib/ccextractor
mkdir -p ${PKG_NAME}/usr/share/doc/ccextractor
mkdir -p ${PKG_NAME}/usr/share/man/man1
# Copy binary
cp build/ccextractor ${PKG_NAME}/usr/bin/
# Copy GPAC library
cp /usr/lib/libgpac.so* ${PKG_NAME}/usr/lib/ccextractor/
# Set rpath so ccextractor finds bundled libgpac
patchelf --set-rpath '/usr/lib/ccextractor:$ORIGIN/../lib/ccextractor' ${PKG_NAME}/usr/bin/ccextractor
# Copy documentation
cp docs/CHANGES.TXT ${PKG_NAME}/usr/share/doc/ccextractor/changelog
cp LICENSE.txt ${PKG_NAME}/usr/share/doc/ccextractor/copyright
gzip -9 -n ${PKG_NAME}/usr/share/doc/ccextractor/changelog
# Create control file
if [ "$VARIANT" = "basic" ]; then
PKG_DESCRIPTION="CCExtractor - closed captions and teletext subtitle extractor"
else
PKG_DESCRIPTION="CCExtractor (with HardSubX) - closed captions and teletext subtitle extractor"
fi
INSTALLED_SIZE=$(du -sk ${PKG_NAME}/usr | cut -f1)
# Determine dependencies based on build variant (Debian 13 Trixie)
if [ "$VARIANT" = "hardsubx" ]; then
DEPENDS="libc6, libtesseract5, libleptonica6, libcurl3t64-gnutls, libavcodec61, libavformat61, libavutil59, libswscale8, libavdevice61, libswresample5, libavfilter10"
else
DEPENDS="libc6, libtesseract5, libleptonica6, libcurl3t64-gnutls"
fi
cat > ${PKG_NAME}/DEBIAN/control << CTRL
Package: ccextractor
Version: ${VERSION}
Section: utils
Priority: optional
Architecture: amd64
Installed-Size: ${INSTALLED_SIZE}
Depends: ${DEPENDS}
Maintainer: CCExtractor Development Team <carlos@ccextractor.org>
Homepage: https://www.ccextractor.org
Description: ${PKG_DESCRIPTION}
CCExtractor is a tool that extracts closed captions and teletext subtitles
from video files and streams. It supports a wide variety of input formats
including MPEG, H.264/AVC, H.265/HEVC, MP4, MKV, WTV, and transport streams.
.
This package includes a bundled GPAC library for MP4 support.
Built for Debian 13 (Trixie).
CTRL
# Remove leading spaces from control file
sed -i 's/^ //' ${PKG_NAME}/DEBIAN/control
# Create postinst to update library cache
cat > ${PKG_NAME}/DEBIAN/postinst << 'POSTINST'
#!/bin/sh
set -e
ldconfig
POSTINST
chmod 755 ${PKG_NAME}/DEBIAN/postinst
# Create postrm to update library cache
cat > ${PKG_NAME}/DEBIAN/postrm << 'POSTRM'
#!/bin/sh
set -e
ldconfig
POSTRM
chmod 755 ${PKG_NAME}/DEBIAN/postrm
# Set permissions
chmod 755 ${PKG_NAME}/usr/bin/ccextractor
chmod 755 ${PKG_NAME}/usr/lib/ccextractor
find ${PKG_NAME}/usr/lib/ccextractor -name "*.so*" -exec chmod 644 {} \;
# Build the .deb
dpkg-deb --build --root-owner-group ${PKG_NAME}
echo "deb_name=${PKG_NAME}.deb" >> $GITHUB_OUTPUT
- name: Test .deb package
if: steps.should_build.outputs.should_build == 'true'
run: |
VERSION="${{ steps.version.outputs.version }}"
VARIANT="${{ matrix.build_type }}"
if [ "$VARIANT" = "basic" ]; then
PKG_NAME="ccextractor_${VERSION}_debian13_amd64"
else
PKG_NAME="ccextractor-${VARIANT}_${VERSION}_debian13_amd64"
fi
# Install and test (apt handles dependencies automatically)
apt-get update
apt-get install -y ./${PKG_NAME}.deb
ccextractor --version
- name: Get .deb filename
if: steps.should_build.outputs.should_build == 'true'
id: deb_name
run: |
VERSION="${{ steps.version.outputs.version }}"
VARIANT="${{ matrix.build_type }}"
if [ "$VARIANT" = "basic" ]; then
echo "name=ccextractor_${VERSION}_debian13_amd64.deb" >> $GITHUB_OUTPUT
else
echo "name=ccextractor-${VARIANT}_${VERSION}_debian13_amd64.deb" >> $GITHUB_OUTPUT
fi
- name: Upload .deb artifact
if: steps.should_build.outputs.should_build == 'true'
uses: actions/upload-artifact@v6
with:
name: ${{ steps.deb_name.outputs.name }}
path: ${{ steps.deb_name.outputs.name }}
- name: Upload to Release
if: steps.should_build.outputs.should_build == 'true' && github.event_name == 'release'
uses: softprops/action-gh-release@v2
with:
files: ${{ steps.deb_name.outputs.name }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
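
The package-name rule above is repeated in three steps of this workflow (create, test, and filename lookup). As a minimal sketch, assuming a helper named `deb_pkg_name` (hypothetical, not part of the workflow), the same rule could be written once:

```shell
# Hypothetical helper (not in the workflow): derive the .deb base name
# from the build variant and version, mirroring the if/else used above.
# $1 = variant ("basic" or another matrix value), $2 = version string.
deb_pkg_name() {
    variant="$1"
    version="$2"
    if [ "$variant" = "basic" ]; then
        # The basic variant carries no suffix after "ccextractor".
        printf '%s\n' "ccextractor_${version}_debian13_amd64"
    else
        # Other variants are appended to the package name with a dash.
        printf '%s\n' "ccextractor-${variant}_${version}_debian13_amd64"
    fi
}

# Example calls; 0.96.1 is only an illustrative version.
deb_pkg_name basic 0.96.1
deb_pkg_name hardsubx 0.96.1
```

Appending `.deb` to the result gives the artifact filename used by the upload steps.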

.github/workflows/build_docker.yml vendored Normal file

@@ -0,0 +1,96 @@
name: Build CCExtractor Docker Images
on:
workflow_dispatch:
push:
paths:
- '.github/workflows/build_docker.yml'
- 'docker/**'
- '**.c'
- '**.h'
- '**CMakeLists.txt'
- '**.cmake'
- 'src/rust/**'
pull_request:
types: [opened, synchronize, reopened]
paths:
- '.github/workflows/build_docker.yml'
- 'docker/**'
- '**.c'
- '**.h'
- '**CMakeLists.txt'
- '**.cmake'
- 'src/rust/**'
jobs:
build_minimal:
name: Docker build (minimal)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build minimal image
uses: docker/build-push-action@v6
with:
context: .
file: docker/Dockerfile
build-args: |
BUILD_TYPE=minimal
USE_LOCAL_SOURCE=1
tags: ccextractor:minimal
load: true
cache-from: type=gha,scope=docker-minimal
cache-to: type=gha,mode=max,scope=docker-minimal
- name: Test minimal image
run: |
docker run --rm ccextractor:minimal --version
echo "Minimal build successful"
build_ocr:
name: Docker build (ocr)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build OCR image
uses: docker/build-push-action@v6
with:
context: .
file: docker/Dockerfile
build-args: |
BUILD_TYPE=ocr
USE_LOCAL_SOURCE=1
tags: ccextractor:ocr
load: true
cache-from: type=gha,scope=docker-ocr
cache-to: type=gha,mode=max,scope=docker-ocr
- name: Test OCR image
run: |
docker run --rm ccextractor:ocr --version
echo "OCR build successful"
build_hardsubx:
name: Docker build (hardsubx)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build HardSubX image
uses: docker/build-push-action@v6
with:
context: .
file: docker/Dockerfile
build-args: |
BUILD_TYPE=hardsubx
USE_LOCAL_SOURCE=1
tags: ccextractor:hardsubx
load: true
cache-from: type=gha,scope=docker-hardsubx
cache-to: type=gha,mode=max,scope=docker-hardsubx
- name: Test HardSubX image
run: |
docker run --rm ccextractor:hardsubx --version
echo "HardSubX build successful"
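
The three jobs above differ only in the `BUILD_TYPE` build argument and the image tag. A rough local equivalent, assuming the plain `docker build` CLI instead of the Buildx action and a helper named `docker_build_cmd` (hypothetical), can be sketched as a dry run that only prints the commands:

```shell
# Hypothetical dry-run helper: print (do not execute) the docker build
# command for one variant, using the Dockerfile path and build arguments
# that appear in the workflow above.
docker_build_cmd() {
    build_type="$1"
    printf 'docker build -f docker/Dockerfile --build-arg BUILD_TYPE=%s --build-arg USE_LOCAL_SOURCE=1 -t ccextractor:%s .\n' \
        "$build_type" "$build_type"
}

# Print one command per variant built by the workflow.
for t in minimal ocr hardsubx; do
    docker_build_cmd "$t"
done
```

Running a printed command from the repository root, followed by `docker run --rm ccextractor:<variant> --version`, would approximate the workflow's test step; the GitHub Actions cache settings have no counterpart in this sketch.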


@@ -7,6 +7,8 @@ on:
- '.github/workflows/build_linux.yml'
- '**.c'
- '**.h'
- '**CMakeLists.txt'
- '**.cmake'
- '**Makefile**'
- 'linux/**'
- 'package_creators/**'
@@ -17,6 +19,8 @@ on:
- '.github/workflows/build_linux.yml'
- '**.c'
- '**.h'
- '**CMakeLists.txt'
- '**.cmake'
- '**Makefile**'
- 'linux/**'
- 'package_creators/**'
@@ -27,7 +31,7 @@ jobs:
steps:
- name: Install dependencies
run: sudo apt update && sudo apt-get install libgpac-dev libtesseract-dev libavcodec-dev libavdevice-dev libx11-dev libxcb1-dev libxcb-shm0-dev
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: build
run: ./build -hardsubx
working-directory: ./linux
@@ -38,7 +42,7 @@ jobs:
run: mkdir ./linux/artifacts
- name: Copy release artifact
run: cp ./linux/ccextractor ./linux/artifacts/
- uses: actions/upload-artifact@v4
- uses: actions/upload-artifact@v6
with:
name: CCExtractor Linux build
path: ./linux/artifacts
@@ -47,7 +51,7 @@ jobs:
steps:
- name: Install dependencies
run: sudo apt update && sudo apt-get install libgpac-dev
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: run autogen
run: ./autogen.sh
working-directory: ./linux
@@ -65,7 +69,7 @@ jobs:
steps:
- name: Install dependencies
run: sudo apt update && sudo apt-get install libgpac-dev
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: cmake
run: mkdir build && cd build && cmake ../src
- name: build
@@ -76,7 +80,7 @@ jobs:
cmake_ocr_hardsubx:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: Install dependencies
run: sudo apt update && sudo apt install libgpac-dev libtesseract-dev libavformat-dev libavdevice-dev libswscale-dev yasm
- name: cmake
@@ -94,9 +98,9 @@ jobs:
steps:
- name: Install dependencies
run: sudo apt update && sudo apt-get install libgpac-dev
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: cache
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: |
src/rust/.cargo/registry


@@ -0,0 +1,154 @@
name: Build Linux (System Libs)
on:
# Build on releases
release:
types: [published]
# Allow manual trigger
workflow_dispatch:
inputs:
build_type:
description: 'Build type (all, basic, hardsubx)'
required: false
default: 'all'
# Build on pushes to workflow file for testing
push:
paths:
- '.github/workflows/build_linux_systemlibs.yml'
- 'linux/build'
permissions:
contents: write
jobs:
build-systemlibs:
runs-on: ubuntu-22.04
strategy:
fail-fast: false
matrix:
build_type: [basic, hardsubx]
steps:
- name: Check if should build this variant
id: should_build
run: |
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
INPUT_TYPE="${{ github.event.inputs.build_type }}"
if [ "$INPUT_TYPE" = "all" ] || [ "$INPUT_TYPE" = "${{ matrix.build_type }}" ]; then
echo "should_build=true" >> $GITHUB_OUTPUT
else
echo "should_build=false" >> $GITHUB_OUTPUT
fi
else
echo "should_build=true" >> $GITHUB_OUTPUT
fi
- name: Checkout repository
if: steps.should_build.outputs.should_build == 'true'
uses: actions/checkout@v6
- name: Install base dependencies
if: steps.should_build.outputs.should_build == 'true'
run: |
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential \
pkg-config \
zlib1g-dev \
libpng-dev \
libfreetype-dev \
libutf8proc-dev \
libgpac-dev \
libtesseract-dev \
libleptonica-dev \
tesseract-ocr-eng \
clang \
libclang-dev
- name: Install FFmpeg dependencies (HardSubX)
if: steps.should_build.outputs.should_build == 'true' && matrix.build_type == 'hardsubx'
run: |
sudo apt-get install -y --no-install-recommends \
libavcodec-dev \
libavformat-dev \
libavutil-dev \
libswscale-dev \
libswresample-dev \
libavfilter-dev \
libavdevice-dev \
libxcb1-dev \
libxcb-shm0-dev \
libx11-dev \
liblzma-dev
- name: Install Rust toolchain
if: steps.should_build.outputs.should_build == 'true'
uses: dtolnay/rust-toolchain@stable
- name: Build with system libraries
if: steps.should_build.outputs.should_build == 'true'
run: |
cd linux
if [ "${{ matrix.build_type }}" = "hardsubx" ]; then
./build -system-libs -hardsubx
else
./build -system-libs
fi
- name: Verify build
if: steps.should_build.outputs.should_build == 'true'
run: |
./linux/ccextractor --version
echo "=== Library dependencies ==="
ldd ./linux/ccextractor | grep -E 'freetype|png|utf8proc|tesseract|leptonica' || true
- name: Get output name
if: steps.should_build.outputs.should_build == 'true'
id: output_name
run: |
case "${{ matrix.build_type }}" in
basic)
echo "name=ccextractor-linux-systemlibs-x86_64" >> $GITHUB_OUTPUT
;;
hardsubx)
echo "name=ccextractor-linux-systemlibs-hardsubx-x86_64" >> $GITHUB_OUTPUT
;;
esac
- name: Package binary
if: steps.should_build.outputs.should_build == 'true'
run: |
mkdir -p package
cp linux/ccextractor package/
# Create a simple README for the package
cat > package/README.txt << 'EOF'
CCExtractor - System Libraries Build
=====================================
This build uses system libraries (dynamic linking).
Required system packages (Debian/Ubuntu):
sudo apt install libgpac12 libtesseract5 libleptonica6 \
libpng16-16 libfreetype6 libutf8proc3
For HardSubX builds, also install:
sudo apt install libavcodec60 libavformat60 libswscale7 libavfilter9
Run with: ./ccextractor --help
EOF
tar -czvf ${{ steps.output_name.outputs.name }}.tar.gz -C package .
- name: Upload artifact
if: steps.should_build.outputs.should_build == 'true'
uses: actions/upload-artifact@v6
with:
name: ${{ steps.output_name.outputs.name }}
path: ${{ steps.output_name.outputs.name }}.tar.gz
- name: Upload to Release
if: steps.should_build.outputs.should_build == 'true' && github.event_name == 'release'
uses: softprops/action-gh-release@v2
with:
files: ${{ steps.output_name.outputs.name }}.tar.gz
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
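
The "Check if should build this variant" step above gates every later step in the job. Its decision can be restated as a small function, sketched here under the assumption of a helper named `should_build` (hypothetical) that takes the event name, the manual `build_type` input, and the matrix variant:

```shell
# Hypothetical restatement of the gating step: prints "true" or "false".
# $1 = event name, $2 = workflow_dispatch build_type input, $3 = matrix variant.
should_build() {
    event="$1"
    input_type="$2"
    variant="$3"
    if [ "$event" = "workflow_dispatch" ]; then
        # Manual runs build only the requested variant, or everything for "all".
        if [ "$input_type" = "all" ] || [ "$input_type" = "$variant" ]; then
            echo true
        else
            echo false
        fi
    else
        # Releases and pushes always build every matrix variant.
        echo true
    fi
}

should_build workflow_dispatch basic hardsubx
should_build release "" hardsubx
```

Because the skipped variant still starts a job and then skips each step, the matrix entry shows up as a successful but empty run.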


@@ -7,6 +7,8 @@ on:
- '.github/workflows/build_mac.yml'
- '**.c'
- '**.h'
- '**CMakeLists.txt'
- '**.cmake'
- '**Makefile**'
- 'mac/**'
- 'package_creators/**'
@@ -17,6 +19,8 @@ on:
- '.github/workflows/build_mac.yml'
- '**.c'
- '**.h'
- '**CMakeLists.txt'
- '**.cmake'
- '**Makefile**'
- 'mac/**'
- 'package_creators/**'
@@ -27,7 +31,7 @@ jobs:
steps:
- name: Install dependencies
run: brew install pkg-config autoconf automake libtool tesseract leptonica gpac
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: build
run: ./build.command
working-directory: ./mac
@@ -38,14 +42,27 @@ jobs:
run: mkdir ./mac/artifacts
- name: Copy release artifact
run: cp ./mac/ccextractor ./mac/artifacts/
- uses: actions/upload-artifact@v4
- uses: actions/upload-artifact@v6
with:
name: CCExtractor mac build
path: ./mac/artifacts
build_shell_system_libs:
# Test building with system libraries via pkg-config (for Homebrew formula compatibility)
runs-on: macos-latest
steps:
- name: Install dependencies
run: brew install pkg-config autoconf automake libtool tesseract leptonica gpac freetype libpng protobuf-c utf8proc zlib
- uses: actions/checkout@v6
- name: build with system libs
run: ./build.command -system-libs
working-directory: ./mac
- name: Display version information
run: ./ccextractor --version
working-directory: ./mac
build_autoconf:
runs-on: macos-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: Install dependencies
run: brew install pkg-config autoconf automake libtool gpac
- name: run autogen
@@ -63,10 +80,10 @@ jobs:
cmake:
runs-on: macos-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: dependencies
run: brew install gpac
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: cmake
run: mkdir build && cd build && cmake ../src
- name: build
@@ -74,12 +91,76 @@ jobs:
working-directory: build
- name: Display version information
run: ./build/ccextractor --version
cmake_ocr_hardsubx:
runs-on: macos-latest
steps:
- uses: actions/checkout@v6
- name: Install dependencies
run: brew install pkg-config autoconf automake libtool tesseract leptonica gpac ffmpeg
- name: cmake
run: |
mkdir build && cd build
cmake -DWITH_OCR=ON -DWITH_HARDSUBX=ON ../src
- name: build
run: |
make -j$(nproc)
working-directory: build
- name: Display version information
run: ./build/ccextractor --version
build_shell_hardsubx:
# Test build.command with -hardsubx flag (burned-in subtitle extraction)
runs-on: macos-latest
steps:
- name: Install dependencies
run: brew install pkg-config autoconf automake libtool tesseract leptonica gpac ffmpeg
- uses: actions/checkout@v6
- name: build with hardsubx
run: ./build.command -hardsubx
working-directory: ./mac
- name: Display version information
run: ./ccextractor --version
working-directory: ./mac
- name: Verify hardsubx support
run: |
# Check that -hardsubx is recognized (will fail if not compiled in)
./ccextractor -hardsubx --help 2>&1 | head -20 || true
working-directory: ./mac
build_autoconf_hardsubx:
# Test autoconf build with HARDSUBX enabled (fixes issue #1173)
runs-on: macos-latest
steps:
- uses: actions/checkout@v6
- name: Install dependencies
run: brew install pkg-config autoconf automake libtool tesseract leptonica gpac ffmpeg
- name: run autogen
run: ./autogen.sh
working-directory: ./mac
- name: configure with hardsubx
run: |
# Set Homebrew paths for configure to find libraries
export HOMEBREW_PREFIX="$(brew --prefix)"
export LDFLAGS="-L${HOMEBREW_PREFIX}/lib"
export CPPFLAGS="-I${HOMEBREW_PREFIX}/include"
export PKG_CONFIG_PATH="${HOMEBREW_PREFIX}/lib/pkgconfig"
./configure --enable-hardsubx --enable-ocr
working-directory: ./mac
- name: make
run: make
working-directory: ./mac
- name: Display version information
run: ./ccextractor --version
working-directory: ./mac
- name: Verify hardsubx support
run: |
# Check that -hardsubx is recognized
./ccextractor -hardsubx --help 2>&1 | head -20 || true
working-directory: ./mac
build_rust:
runs-on: macos-latest
steps:
- uses: actions/checkout@v4
- name: cache
uses: actions/cache@v4
- uses: actions/checkout@v6
- name: cache
uses: actions/cache@v5
with:
path: |
src/rust/.cargo/registry
@@ -92,5 +173,5 @@ jobs:
toolchain: stable
override: true
- name: build
run: cargo build
run: cargo build
working-directory: ./src/rust
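
The macOS `cmake_ocr_hardsubx` job above runs `make -j$(nproc)`. `nproc` is a GNU coreutils tool, and this diff does not establish whether it is on `PATH` on `macos-latest` runners. A portable fallback, assuming a helper named `cpu_count` (hypothetical), might look like this:

```shell
# Hypothetical helper: print the number of available CPUs, trying the
# GNU tool first, then the macOS sysctl key, then the getconf variable.
cpu_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif sysctl -n hw.ncpu >/dev/null 2>&1; then
        # hw.ncpu is the macOS/BSD key; the query fails on Linux.
        sysctl -n hw.ncpu
    else
        getconf _NPROCESSORS_ONLN
    fi
}

# Show the make invocation that would result on this machine.
echo "make -j$(cpu_count)"
```

If `nproc` is missing, the original command degrades to a bare `make -j`, which GNU make treats as an unlimited job count rather than an error.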

.github/workflows/build_snap.yml vendored Normal file

@@ -0,0 +1,51 @@
name: Build CCExtractor Snap
on:
workflow_dispatch:
release:
types: [published]
jobs:
build_snap:
name: Build Snap package
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Install snapd
run: |
sudo apt update
sudo apt install -y snapd
- name: Start snapd
run: |
sudo systemctl start snapd.socket
sudo systemctl start snapd
- name: Install Snapcraft
run: |
sudo snap install core22
sudo snap install snapcraft --classic
- name: Show Snapcraft version
run: snapcraft --version
- name: Build snap
run: sudo snapcraft --destructive-mode
- name: List generated snap
run: ls -lh *.snap
- name: Upload snap as workflow artifact
uses: actions/upload-artifact@v6
with:
name: CCExtractor Snap
path: "*.snap"
- name: Upload snap to GitHub Release
if: github.event_name == 'release'
uses: softprops/action-gh-release@v2
with:
files: "*.snap"


@@ -3,8 +3,7 @@ name: Build CCExtractor on Windows
env:
RUSTFLAGS: -Ctarget-feature=+crt-static
VCPKG_DEFAULT_TRIPLET: x64-windows-static
VCPKG_DEFAULT_BINARY_CACHE: C:\vcpkg\.cache
VCPKG_COMMIT: fba75d09065fcc76a25dcf386b1d00d33f5175af
VCPKG_COMMIT: ab2977be50c702126336e5088f4836060733c899
on:
workflow_dispatch:
@@ -13,6 +12,8 @@ on:
- ".github/workflows/build_windows.yml"
- "**.c"
- "**.h"
- "**CMakeLists.txt"
- "**.cmake"
- "windows/**"
- "src/rust/**"
pull_request:
@@ -21,108 +22,118 @@ on:
- ".github/workflows/build_windows.yml"
- "**.c"
- "**.h"
- "**CMakeLists.txt"
- "**.cmake"
- "windows/**"
- "src/rust/**"
jobs:
build_release:
build:
runs-on: windows-2022
steps:
- name: Check out repository
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Setup MSBuild.exe
uses: microsoft/setup-msbuild@v2.0.0
with:
msbuild-architecture: x64
# Install GPAC (fast, ~30s, not worth caching complexity)
- name: Install gpac
run: choco install gpac --version 2.4.0
run: choco install gpac --version 2.4.0 --no-progress
# Use lukka/run-vcpkg for better caching
- name: Setup vcpkg
run: mkdir C:\vcpkg\.cache
- name: Cache vcpkg
id: cache
uses: actions/cache@v4
uses: lukka/run-vcpkg@v11
id: runvcpkg
with:
path: |
C:\vcpkg\.cache
key: vcpkg-${{ runner.os }}-${{ env.VCPKG_COMMIT }}
- name: Build vcpkg
run: |
git clone https://github.com/microsoft/vcpkg
./vcpkg/bootstrap-vcpkg.bat
- name: Install dependencies
vcpkgGitCommitId: ${{ env.VCPKG_COMMIT }}
vcpkgDirectory: ${{ github.workspace }}/vcpkg
vcpkgJsonGlob: 'windows/vcpkg.json'
# Cache vcpkg installed packages separately for faster restores
- name: Cache vcpkg installed packages
id: vcpkg-installed-cache
uses: actions/cache@v5
with:
path: ${{ github.workspace }}/vcpkg/installed
key: vcpkg-installed-${{ runner.os }}-${{ env.VCPKG_COMMIT }}-${{ hashFiles('windows/vcpkg.json') }}
restore-keys: |
vcpkg-installed-${{ runner.os }}-${{ env.VCPKG_COMMIT }}-
- name: Install vcpkg dependencies
if: steps.vcpkg-installed-cache.outputs.cache-hit != 'true'
run: ${{ github.workspace }}/vcpkg/vcpkg.exe install --x-install-root ${{ github.workspace }}/vcpkg/installed/
working-directory: windows
- uses: actions-rs/toolchain@v1
# Cache Rust/Cargo artifacts
- name: Cache Cargo registry
uses: actions/cache@v5
with:
toolchain: stable
override: true
path: |
~/.cargo/registry
~/.cargo/git
key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
restore-keys: |
${{ runner.os }}-cargo-registry-
# Cache Cargo build artifacts - rust.bat sets CARGO_TARGET_DIR to windows/
# which results in artifacts at windows/x86_64-pc-windows-msvc/
- name: Cache Cargo build artifacts
uses: actions/cache@v5
with:
path: ${{ github.workspace }}/windows/x86_64-pc-windows-msvc
key: ${{ runner.os }}-cargo-build-${{ hashFiles('**/Cargo.lock') }}-${{ hashFiles('src/rust/**/*.rs') }}
restore-keys: |
${{ runner.os }}-cargo-build-${{ hashFiles('**/Cargo.lock') }}-
${{ runner.os }}-cargo-build-
- name: Setup Rust toolchain
uses: dtolnay/rust-toolchain@stable
- name: Install Win 10 SDK
uses: ilammy/msvc-dev-cmd@v1
- name: build Release-Full
# Build Release-Full
- name: Build Release-Full
env:
LIBCLANG_PATH: "C:\\Program Files\\LLVM\\lib"
LLVM_CONFIG_PATH: "C:\\Program Files\\LLVM\\bin\\llvm-config"
CARGO_TARGET_DIR: "..\\..\\windows"
BINDGEN_EXTRA_CLANG_ARGS: -fmsc-version=0
VCPKG_ROOT: ${{ github.workspace }}/vcpkg
run: msbuild ccextractor.sln /p:Configuration=Release-Full /p:Platform=x64
working-directory: ./windows
- name: Display version information
- name: Display Release version information
run: ./ccextractorwinfull.exe --version
working-directory: ./windows/x64/Release-Full
- uses: actions/upload-artifact@v4
- name: Upload Release artifact
uses: actions/upload-artifact@v6
with:
name: CCExtractor Windows Release build
path: |
./windows/x64/Release-Full/ccextractorwinfull.exe
./windows/x64/Release-Full/*.dll
build_debug:
runs-on: windows-2022
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Setup MSBuild.exe
uses: microsoft/setup-msbuild@v2.0.0
with:
msbuild-architecture: x64
- name: Install gpac
run: choco install gpac --version 2.4.0
- name: Setup vcpkg
run: mkdir C:\vcpkg\.cache
- name: Cache vcpkg
id: cache
uses: actions/cache@v4
with:
path: |
C:\vcpkg\.cache
key: vcpkg-${{ runner.os }}-${{ env.VCPKG_COMMIT }}
- name: Build vcpkg
run: |
git clone https://github.com/microsoft/vcpkg
./vcpkg/bootstrap-vcpkg.bat
- name: Install dependencies
run: ${{ github.workspace }}/vcpkg/vcpkg.exe install --x-install-root ${{ github.workspace }}/vcpkg/installed/
working-directory: windows
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
- name: Install Win 10 SDK
uses: ilammy/msvc-dev-cmd@v1
- name: build Debug-Full
# Build Debug-Full (reuses cached Cargo artifacts)
- name: Build Debug-Full
env:
LIBCLANG_PATH: "C:\\Program Files\\LLVM\\lib"
LLVM_CONFIG_PATH: "C:\\Program Files\\LLVM\\bin\\llvm-config"
CARGO_TARGET_DIR: "..\\..\\windows"
BINDGEN_EXTRA_CLANG_ARGS: -fmsc-version=0
VCPKG_ROOT: ${{ github.workspace }}/vcpkg
run: msbuild ccextractor.sln /p:Configuration=Debug-Full /p:Platform=x64
working-directory: ./windows
- name: Display version information
- name: Display Debug version information
continue-on-error: true
run: ./ccextractorwinfull.exe --version
working-directory: ./windows/x64/Debug-Full
- uses: actions/upload-artifact@v4
- name: Upload Debug artifact
uses: actions/upload-artifact@v6
with:
name: CCExtractor Windows Debug build
path: |


@@ -19,7 +19,7 @@ jobs:
format:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: Format code
run: |
find src/ -type f -not -path "src/thirdparty/*" -not -path "src/lib_ccx/zvbi/*" -name '*.c' -not -path "src/GUI/icon_data.c" | xargs clang-format -i
@@ -33,9 +33,9 @@ jobs:
run:
working-directory: ${{ matrix.workdir }}
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: cache
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: |
${{ matrix.workdir }}/.cargo/registry

.github/workflows/homebrew.yml vendored Normal file

@@ -0,0 +1,15 @@
name: Bump Homebrew Formula
on:
release:
types: [published]
jobs:
homebrew:
runs-on: ubuntu-latest
steps:
- name: Update Homebrew formula
uses: dawidd6/action-homebrew-bump-formula@v7
with:
token: ${{ secrets.HOMEBREW_GITHUB_API_TOKEN }}
formula: ccextractor

.github/workflows/publish_chocolatey.yml vendored Normal file

@@ -0,0 +1,136 @@
# Publish to Chocolatey Community Repository
#
# PREREQUISITES:
# 1. Create a Chocolatey account at https://community.chocolatey.org/account/Register
# 2. Get your API key from https://community.chocolatey.org/account
# 3. Add the API key as repository secret: CHOCOLATEY_API_KEY
#
# Reference: https://docs.chocolatey.org/en-us/create/create-packages-quick-start
name: Publish to Chocolatey
on:
release:
types: [released]
workflow_dispatch:
inputs:
release_tag:
description: 'Release tag to publish (e.g., v0.96.1)'
required: true
type: string
jobs:
publish:
runs-on: windows-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Get version from tag
id: version
shell: bash
run: |
TAG="${{ github.event.inputs.release_tag || github.event.release.tag_name }}"
# Strip 'v' prefix if present
VERSION="${TAG#v}"
echo "version=$VERSION" >> $GITHUB_OUTPUT
echo "tag=$TAG" >> $GITHUB_OUTPUT
- name: Download MSI from release
shell: pwsh
run: |
$version = "${{ steps.version.outputs.version }}"
$tag = "${{ steps.version.outputs.tag }}"
$msiUrl = "https://github.com/CCExtractor/ccextractor/releases/download/$tag/CCExtractor.$version.msi"
Write-Host "Downloading MSI from: $msiUrl"
Invoke-WebRequest -Uri $msiUrl -OutFile "CCExtractor.msi"
# Calculate SHA256 checksum
$hash = (Get-FileHash -Path "CCExtractor.msi" -Algorithm SHA256).Hash
Write-Host "SHA256: $hash"
echo "MSI_CHECKSUM=$hash" >> $env:GITHUB_ENV
- name: Update nuspec version
shell: pwsh
run: |
$version = "${{ steps.version.outputs.version }}"
$nuspecPath = "packaging/chocolatey/ccextractor.nuspec"
$content = Get-Content $nuspecPath -Raw
$content = $content -replace '<version>.*</version>', "<version>$version</version>"
Set-Content -Path $nuspecPath -Value $content
Write-Host "Updated nuspec to version $version"
- name: Update install script
shell: pwsh
run: |
$version = "${{ steps.version.outputs.version }}"
$tag = "${{ steps.version.outputs.tag }}"
$checksum = $env:MSI_CHECKSUM
$installScript = "packaging/chocolatey/tools/chocolateyInstall.ps1"
$content = Get-Content $installScript -Raw
# Update URL
$newUrl = "https://github.com/CCExtractor/ccextractor/releases/download/$tag/CCExtractor.$version.msi"
$content = $content -replace "url64bit\s*=\s*'[^']*'", "url64bit = '$newUrl'"
# Update checksum
$content = $content -replace "checksum64\s*=\s*'[^']*'", "checksum64 = '$checksum'"
Set-Content -Path $installScript -Value $content
Write-Host "Updated install script with URL and checksum"
- name: Build Chocolatey package
shell: pwsh
run: |
cd packaging/chocolatey
choco pack ccextractor.nuspec
# List the generated package
Get-ChildItem *.nupkg
- name: Test package locally
shell: pwsh
run: |
cd packaging/chocolatey
$nupkg = Get-ChildItem *.nupkg | Select-Object -First 1
Write-Host "Testing package: $($nupkg.Name)"
# Install from local package
choco install ccextractor --source="'.;https://community.chocolatey.org/api/v2/'" --yes --force
# Verify installation
$ccx = Get-Command ccextractor -ErrorAction SilentlyContinue
if ($ccx) {
Write-Host "CCExtractor found at: $($ccx.Source)"
& ccextractor --version
} else {
Write-Host "CCExtractor not found in PATH, checking Program Files..."
$exePath = Join-Path $env:ProgramFiles "CCExtractor\ccextractor.exe"
if (Test-Path $exePath) {
& $exePath --version
}
}
- name: Push to Chocolatey
shell: pwsh
env:
CHOCOLATEY_API_KEY: ${{ secrets.CHOCOLATEY_API_KEY }}
run: |
cd packaging/chocolatey
$nupkg = Get-ChildItem *.nupkg | Select-Object -First 1
Write-Host "Pushing $($nupkg.Name) to Chocolatey..."
choco push $nupkg.Name --source="https://push.chocolatey.org/" --api-key="$env:CHOCOLATEY_API_KEY"
Write-Host "Package submitted to Chocolatey! It may take some time to be moderated and published."
- name: Upload package artifact
uses: actions/upload-artifact@v6
with:
name: chocolatey-package
path: packaging/chocolatey/*.nupkg
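
The workflow above derives the MSI download URL from the release tag in two places (the download step and the install-script update). As a minimal sketch, assuming a helper named `msi_url` (hypothetical), the tag-to-URL mapping is:

```shell
# Hypothetical helper: build the release MSI URL from a tag such as v0.96.1.
# The tag is kept as-is in the path; the leading "v" is stripped only for
# the file name, as in the "Get version from tag" step.
msi_url() {
    tag="$1"
    version="${tag#v}"   # strip a single leading "v" if present
    printf 'https://github.com/CCExtractor/ccextractor/releases/download/%s/CCExtractor.%s.msi\n' \
        "$tag" "$version"
}

msi_url v0.96.1
```

The URL only resolves if the release workflow has already uploaded an asset named `CCExtractor.<version>.msi` for that tag.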

.github/workflows/publish_winget.yml vendored Normal file

@@ -0,0 +1,38 @@
# Publish to Windows Package Manager (winget)
#
# PREREQUISITES:
# 1. CCExtractor must already have ONE version in winget-pkgs before this works
# - Submit the initial manifest manually from packaging/winget/
# - PR to: https://github.com/microsoft/winget-pkgs
#
# 2. Create a fork of microsoft/winget-pkgs under the CCExtractor organization
# - https://github.com/CCExtractor/winget-pkgs (needs to be created)
#
# 3. Create a GitHub Personal Access Token (classic) with 'public_repo' scope
# - Add as repository secret: WINGET_TOKEN
#
# Reference: https://github.com/vedantmgoyal9/winget-releaser
name: Publish to WinGet
on:
release:
types: [released]
workflow_dispatch:
inputs:
release_tag:
description: 'Release tag to publish (e.g., v0.96.1)'
required: true
type: string
jobs:
publish:
runs-on: windows-latest
steps:
- name: Publish to WinGet
uses: vedantmgoyal9/winget-releaser@v2
with:
identifier: CCExtractor.CCExtractor
installers-regex: '\.msi$' # Only use the MSI installer
token: ${{ secrets.WINGET_TOKEN }}
release-tag: ${{ github.event.inputs.release_tag || github.event.release.tag_name }}


@@ -5,23 +5,64 @@ on:
types:
- created
permissions:
contents: write
env:
RUSTFLAGS: -Ctarget-feature=+crt-static
VCPKG_DEFAULT_TRIPLET: x64-windows-static
VCPKG_DEFAULT_BINARY_CACHE: C:\vcpkg\.cache
VCPKG_COMMIT: ab2977be50c702126336e5088f4836060733c899
jobs:
build_windows:
runs-on: windows-latest
runs-on: windows-2022
steps:
- name: Check out repository
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Get the version
id: get_version
run: echo ::set-output name=VERSION::${GITHUB_REF/refs\/tags\/v/}
run: |
# Extract version from tag, strip 'v' prefix and everything after first dash
VERSION=${GITHUB_REF/refs\/tags\/v/}
VERSION=${VERSION%%-*}
# Save display version for filenames (e.g., 0.96.1)
echo ::set-output name=DISPLAY_VERSION::$VERSION
# Count dots to determine version format
DOTS="${VERSION//[^.]}"
PART_COUNT=$((${#DOTS} + 1))
# MSI requires 4-part version (major.minor.build.revision)
if [ "$PART_COUNT" -eq 2 ]; then
MSI_VERSION="${VERSION}.0.0"
elif [ "$PART_COUNT" -eq 3 ]; then
MSI_VERSION="${VERSION}.0"
else
MSI_VERSION="${VERSION}"
fi
echo ::set-output name=VERSION::$MSI_VERSION
shell: bash
- name: Setup MSBuild.exe
uses: microsoft/setup-msbuild@v2.0.0
- name: Install llvm and clang
uses: egor-tensin/setup-clang@v1
with:
version: latest
platform: x64
msbuild-architecture: x64
- name: Install gpac
run: choco install gpac --version 2.4.0
- name: Setup vcpkg
run: mkdir C:\vcpkg\.cache
- name: Cache vcpkg
id: cache
uses: actions/cache@v5
with:
path: |
C:\vcpkg\.cache
key: vcpkg-${{ runner.os }}-${{ env.VCPKG_COMMIT }}
- name: Build vcpkg
run: |
git clone https://github.com/microsoft/vcpkg
./vcpkg/bootstrap-vcpkg.bat
- name: Install dependencies
run: ${{ github.workspace }}/vcpkg/vcpkg.exe install --x-install-root ${{ github.workspace }}/vcpkg/installed/
working-directory: windows
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
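
The "Get the version" step in the hunk above pads the tag version to the four-part form that MSI expects. The sketch below restates that logic in POSIX sh, assuming a helper named `msi_version` (hypothetical). It strips the `refs/tags/v` prefix with `${1#...}` rather than the bash-only substitution used in the workflow, and it prints the result instead of emitting a step output:

```shell
# Hypothetical POSIX restatement of the version padding in the step above.
# $1 = a ref such as refs/tags/v0.96.1 or refs/tags/v0.96.1-beta.
msi_version() {
    version="${1#refs/tags/v}"   # drop the ref prefix and the leading "v"
    version="${version%%-*}"     # drop everything from the first dash on
    # Count the dots to find how many numeric parts the version has.
    parts=$(( $(printf '%s' "$version" | tr -cd '.' | wc -c) + 1 ))
    case "$parts" in
        2) printf '%s.0.0\n' "$version" ;;   # major.minor       -> pad two parts
        3) printf '%s.0\n' "$version" ;;     # major.minor.build -> pad one part
        *) printf '%s\n' "$version" ;;       # anything else is left unchanged
    esac
}

msi_version refs/tags/v0.96.1
```

In a workflow step the printed value would typically be appended to `$GITHUB_OUTPUT`, as the other workflows in this diff do, instead of using the `::set-output` command.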
@@ -34,15 +75,24 @@ jobs:
LLVM_CONFIG_PATH: "C:\\Program Files\\LLVM\\bin\\llvm-config"
CARGO_TARGET_DIR: "..\\..\\windows"
BINDGEN_EXTRA_CLANG_ARGS: -fmsc-version=0
run: msbuild ccextractor.sln /p:Configuration=Release-Full /p:Platform=Win32
VCPKG_ROOT: ${{ github.workspace }}/vcpkg
run: msbuild ccextractor.sln /p:Configuration=Release-Full /p:Platform=x64
working-directory: ./windows
- name: Copy files to directory for installer
run: mkdir installer; cp ./Release-Full/ccextractorwinfull.exe ./installer; cp ./Release-Full/*.dll ./installer
run: mkdir installer; cp ./x64/Release-Full/ccextractorwinfull.exe ./installer; cp ./x64/Release-Full/*.dll ./installer
working-directory: ./windows
- name: Download tessdata for OCR support
run: |
mkdir -p ./installer/tessdata
# Download English traineddata from tessdata_fast (smaller, faster, good for most use cases)
Invoke-WebRequest -Uri "https://github.com/tesseract-ocr/tessdata_fast/raw/main/eng.traineddata" -OutFile "./installer/tessdata/eng.traineddata"
# Download OSD (Orientation and Script Detection) for automatic script detection
Invoke-WebRequest -Uri "https://github.com/tesseract-ocr/tessdata_fast/raw/main/osd.traineddata" -OutFile "./installer/tessdata/osd.traineddata"
working-directory: ./windows
- name: install WiX
run: dotnet tool install --global wix --version 4.0.0-preview.0 && wix extension -g add WixToolset.UI.wixext
run: dotnet tool uninstall --global wix; dotnet tool install --global wix --version 6.0.2 && wix extension add -g WixToolset.UI.wixext/6.0.2
- name: Make sure WiX works
run: wix --version && wix extension -g list
run: wix --version && wix extension list -g
- name: Download Flutter GUI
run: ((Invoke-WebRequest -UseBasicParsing https://api.github.com/repos/CCExtractor/ccextractorfluttergui/releases/latest).Content | ConvertFrom-Json).assets | ForEach-Object {if ($_.name -eq "windows.zip") { Invoke-WebRequest -UseBasicParsing -Uri $_.browser_download_url -OutFile windows.zip}}
working-directory: ./windows
@@ -50,32 +100,38 @@ jobs:
run: ls
working-directory: ./windows
- name: Unzip Flutter GUI
run: Expand-Archive -Path ./windows.zip -DestinationPath ./installer
run: Expand-Archive -Path ./windows.zip -DestinationPath ./installer -Force
working-directory: ./windows
- name: Display installer folder contents
run: Get-ChildItem -Recurse ./installer
working-directory: ./windows
- name: Create portable zip
run: Compress-Archive -Path ./installer/* -DestinationPath ./CCExtractor_win_portable.zip
run: Compress-Archive -Path ./installer/* -DestinationPath ./CCExtractor.${{ steps.get_version.outputs.DISPLAY_VERSION }}_win_portable.zip
working-directory: ./windows
- name: Build installer
run: wix build -ext "$HOME\.wix\extensions\WixToolset.UI.wixext\4.0.0-preview.0\tools\WixToolset.UI.wixext.dll" -d "AppVersion=${{ steps.get_version.outputs.VERSION }}.0.0" -o CCExtractor.msi installer.wxs
run: wix build -arch x64 -ext WixToolset.UI.wixext -d "AppVersion=${{ steps.get_version.outputs.VERSION }}" -o CCExtractor.${{ steps.get_version.outputs.DISPLAY_VERSION }}.msi installer.wxs CustomUI.wxs
working-directory: ./windows
- name: Upload as asset
uses: AButler/upload-release-assets@v3.0
with:
files: './windows/CCExtractor.msi;./windows/CCExtractor_win_portable.zip'
repo-token: ${{ secrets.GITHUB_TOKEN }}
files: './windows/CCExtractor.${{ steps.get_version.outputs.DISPLAY_VERSION }}.msi;./windows/CCExtractor.${{ steps.get_version.outputs.DISPLAY_VERSION }}_win_portable.zip'
repo-token: ${{ secrets.GITHUB_TOKEN }}
create_linux_package:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
with:
path: ./ccextractor
- name: Get the version
id: get_version
run: |
VERSION=${GITHUB_REF/refs\/tags\/v/}
VERSION=${VERSION%%-*}
echo ::set-output name=DISPLAY_VERSION::$VERSION
- name: Create .tar.gz without git and windows folders
run: tar -pczf ./ccextractor_minimal.tar.gz --exclude "ccextractor/windows" --exclude "ccextractor/.git" ccextractor
run: tar -pczf ./ccextractor.${{ steps.get_version.outputs.DISPLAY_VERSION }}.tar.gz --exclude "ccextractor/windows" --exclude "ccextractor/.git" ccextractor
- name: Upload as asset
uses: AButler/upload-release-assets@v3.0
with:
files: './ccextractor_minimal.tar.gz'
files: './ccextractor.${{ steps.get_version.outputs.DISPLAY_VERSION }}.tar.gz'
repo-token: ${{ secrets.GITHUB_TOKEN }}


@@ -18,9 +18,9 @@ jobs:
run:
working-directory: ./src/rust
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: cache
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: |
src/rust/.cargo/registry

11
.gitignore vendored

@@ -17,6 +17,7 @@ CVS
mac/ccextractor
linux/ccextractor
linux/depend
linux/build_scan/
windows/x86_64-pc-windows-msvc/**
windows/Debug/**
windows/Debug-OCR/**
@@ -28,6 +29,7 @@ windows/Debug-Full/**
windows/x64/**
windows/ccextractor.VC.db
build/
build_*/
####
# Python
@@ -143,6 +145,9 @@ bazel*
#Intellij IDEs
.idea/
# Plans (local only)
plans/
# Rust build and MakeFiles (and CMake files)
src/rust/CMakeFiles/
src/rust/CMakeCache.txt
@@ -155,3 +160,9 @@ windows/*/debug/*
windows/*/CACHEDIR.TAG
windows/.rustc_info.json
linux/configure~
# Plans and temporary files
plans/
tess.log
**/tess.log
ut=srt*


@@ -4,7 +4,7 @@ MAINTAINER = Marc Espie <espie@openbsd.org>
CATEGORIES = multimedia
COMMENT = closed caption subtitles extractor
HOMEPAGE = https://ccextractor.org
V = 0.94
V = 0.96.5
DISTFILES = ccextractor.${V:S/.//}-src.zip
MASTER_SITES = ${MASTER_SITE_SOURCEFORGE:=ccextractor/}
DISTNAME = ccextractor-$V


@@ -2,7 +2,6 @@
# CCExtractor
<a href="https://travis-ci.org/CCExtractor/ccextractor"><img src="https://raw.githubusercontent.com/CCExtractor/ccextractor-org-media/master/static/macOS-build-badge-logo.png" width="20"></a> [![Build Status](https://travis-ci.org/CCExtractor/ccextractor.svg?branch=master)](https://travis-ci.org/CCExtractor/ccextractor)
[![Sample-Platform Build Status Windows](https://sampleplatform.ccextractor.org/static/img/status/build-windows.svg?maxAge=1800)](https://sampleplatform.ccextractor.org/test/master/windows)
[![Sample-Platform Build Status Linux](https://sampleplatform.ccextractor.org/static/img/status/build-linux.svg?maxAge=1800)](https://sampleplatform.ccextractor.org/test/master/linux)
[![SourceForge](https://img.shields.io/badge/SourceForge%20downloads-213k%2Ftotal-brightgreen.svg)](https://sourceforge.net/projects/ccextractor/)
@@ -29,6 +28,25 @@ The core functionality is written in C. Other languages used include C++ and Pyt
Downloads for precompiled binaries and source code can be found [on our website](https://ccextractor.org/public/general/downloads/).
### Windows Package Managers
**WinGet:**
```powershell
winget install CCExtractor.CCExtractor
```
**Chocolatey:**
```powershell
choco install ccextractor
```
**Scoop:**
```powershell
scoop bucket add extras
scoop install ccextractor
```
Extracting subtitles is relatively simple. Just run the following command:
`ccextractor <input>`
@@ -44,6 +62,34 @@ You can also find the list of parameters and their brief description by running
You can find sample files on [our website](https://ccextractor.org/public/general/tvsamples/) to test the software.
### Building from Source
- [Building on Windows using WSL](docs/build-wsl.md)
#### Linux (Autotools) build notes
CCExtractor also supports an autotools-based build system under the `linux/`
directory.
Important notes:
- The autotools workflow lives inside `linux/`. The `configure` script is
generated there and should be run from that directory.
- Typical build steps are:
```
cd linux
./autogen.sh
./configure
make
```
- Rust support is enabled automatically if `cargo` and `rustc` are available
on the system. In that case, Rust components are built and linked during
`make`.
- If you encounter unexpected build or linking issues, a clean rebuild
(`make clean` or a fresh clone) is recommended, especially when Rust is
involved.
This build flow has been tested on Linux and WSL.
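Before running the steps above, it can help to check whether the Rust toolchain is visible, since that decides whether the Rust components are built. The following is a minimal sketch and not part of the repository: the function name `check_rust_toolchain` is hypothetical, and it only reports whether `cargo` and `rustc` are on `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CCExtractor): report whether the Rust
# tools that the autotools build looks for are available on PATH.
check_rust_toolchain() {
    local tool missing=0
    for tool in cargo rustc; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status 0 only when both tools were found.
    return "$missing"
}

if check_rust_toolchain; then
    echo "Rust components should be built and linked during make"
else
    echo "Rust toolchain incomplete; Rust support may not be enabled"
fi
```

If the check reports a missing tool, installing the toolchain and then doing a clean rebuild avoids mixing objects from two different configurations.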
## Compiling CCExtractor
To learn more about how to compile and build CCExtractor for your platform check the [compilation guide](https://github.com/CCExtractor/ccextractor/blob/master/docs/COMPILATION.MD).

239
docker/Dockerfile Normal file

@@ -0,0 +1,239 @@
# CCExtractor Docker Build
#
# Build variants via BUILD_TYPE argument:
# - minimal: Basic CCExtractor without OCR
# - ocr: CCExtractor with OCR support (default)
# - hardsubx: CCExtractor with burned-in subtitle extraction (requires FFmpeg)
#
# Source options via USE_LOCAL_SOURCE argument:
# - 0 (default): Clone from GitHub (standalone Dockerfile usage)
# - 1: Use local source (when building from cloned repo)
#
# Build examples:
#
# # Standalone (just the Dockerfile, clones from GitHub):
# docker build -t ccextractor docker/
# docker build --build-arg BUILD_TYPE=hardsubx -t ccextractor docker/
#
# # From cloned repository (faster, uses local source):
# docker build --build-arg USE_LOCAL_SOURCE=1 -f docker/Dockerfile -t ccextractor .
# docker build --build-arg USE_LOCAL_SOURCE=1 --build-arg BUILD_TYPE=minimal -f docker/Dockerfile -t ccextractor .
ARG DEBIAN_VERSION=bookworm-slim
FROM debian:${DEBIAN_VERSION} AS base
FROM base AS builder
# Build arguments
ARG BUILD_TYPE=ocr
ARG USE_LOCAL_SOURCE=0
# BUILD_TYPE: minimal, ocr, hardsubx
# USE_LOCAL_SOURCE: 0 = git clone, 1 = copy local source
# Avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive
# Install base build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
curl \
ca-certificates \
gcc \
g++ \
cmake \
make \
pkg-config \
bash \
zlib1g-dev \
libpng-dev \
libjpeg-dev \
libssl-dev \
libfreetype-dev \
libxml2-dev \
libcurl4-gnutls-dev \
clang \
libclang-dev \
&& rm -rf /var/lib/apt/lists/*
# Install Rust toolchain
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
ENV PATH="/root/.cargo/bin:${PATH}"
# Install OCR dependencies (for ocr and hardsubx builds)
RUN if [ "$BUILD_TYPE" = "ocr" ] || [ "$BUILD_TYPE" = "hardsubx" ]; then \
apt-get update && apt-get install -y --no-install-recommends \
tesseract-ocr \
libtesseract-dev \
libleptonica-dev \
&& rm -rf /var/lib/apt/lists/*; \
fi
# Install FFmpeg dependencies (for hardsubx build)
RUN if [ "$BUILD_TYPE" = "hardsubx" ]; then \
apt-get update && apt-get install -y --no-install-recommends \
libavcodec-dev \
libavformat-dev \
libavutil-dev \
libswscale-dev \
libswresample-dev \
libavfilter-dev \
libavdevice-dev \
&& rm -rf /var/lib/apt/lists/*; \
fi
# Build and install GPAC library
WORKDIR /root
RUN git clone -b v2.4.0 --depth 1 https://github.com/gpac/gpac
WORKDIR /root/gpac
RUN ./configure && make -j$(nproc) lib && make install-lib && ldconfig
WORKDIR /root
RUN rm -rf /root/gpac
# Get CCExtractor source (either clone or copy based on USE_LOCAL_SOURCE)
WORKDIR /root
# First, copy local source if provided (will be empty dir if building standalone)
COPY . /root/ccextractor-local/
# Then get source: use local copy if USE_LOCAL_SOURCE=1 and source exists,
# otherwise clone from GitHub
RUN if [ "$USE_LOCAL_SOURCE" = "1" ] && [ -f /root/ccextractor-local/src/ccextractor.c ]; then \
echo "Using local source"; \
mv /root/ccextractor-local /root/ccextractor; \
else \
echo "Cloning from GitHub"; \
rm -rf /root/ccextractor-local; \
git clone --depth 1 https://github.com/CCExtractor/ccextractor.git /root/ccextractor; \
fi
WORKDIR /root/ccextractor/linux
# Generate build info
RUN ./pre-build.sh
# Build Rust library with appropriate features
RUN if [ "$BUILD_TYPE" = "hardsubx" ]; then \
cd ../src/rust && \
CARGO_TARGET_DIR=../../linux/rust cargo build --release --features hardsubx_ocr; \
else \
cd ../src/rust && \
CARGO_TARGET_DIR=../../linux/rust cargo build --release; \
fi
RUN cp rust/release/libccx_rust.a ./libccx_rust.a
# Compile CCExtractor
RUN if [ "$BUILD_TYPE" = "minimal" ]; then \
BLD_FLAGS="-std=gnu99 -Wno-write-strings -Wno-pointer-sign -D_FILE_OFFSET_BITS=64 -DVERSION_FILE_PRESENT -DFT2_BUILD_LIBRARY -DGPAC_DISABLE_VTT -DGPAC_DISABLE_OD_DUMP -DGPAC_DISABLE_REMOTERY -DNO_GZIP -DGPAC_64_BITS"; \
BLD_INCLUDE="-I../src -I../src/lib_ccx/ -I /usr/include/gpac/ -I../src/thirdparty/libpng -I../src/thirdparty/zlib -I../src/lib_ccx/zvbi -I../src/thirdparty/lib_hash -I../src/thirdparty -I../src/thirdparty/freetype/include"; \
BLD_LINKER="-lm -Wl,--allow-multiple-definition -lpthread -ldl -lgpac ./libccx_rust.a"; \
elif [ "$BUILD_TYPE" = "hardsubx" ]; then \
BLD_FLAGS="-std=gnu99 -Wno-write-strings -Wno-pointer-sign -D_FILE_OFFSET_BITS=64 -DVERSION_FILE_PRESENT -DENABLE_OCR -DENABLE_HARDSUBX -DFT2_BUILD_LIBRARY -DGPAC_DISABLE_VTT -DGPAC_DISABLE_OD_DUMP -DGPAC_DISABLE_REMOTERY -DNO_GZIP -DGPAC_64_BITS"; \
BLD_INCLUDE="-I../src -I /usr/include/leptonica/ -I /usr/include/tesseract/ -I../src/lib_ccx/ -I /usr/include/gpac/ -I../src/thirdparty/libpng -I../src/thirdparty/zlib -I../src/lib_ccx/zvbi -I../src/thirdparty/lib_hash -I../src/thirdparty -I../src/thirdparty/freetype/include"; \
BLD_LINKER="-lm -Wl,--allow-multiple-definition -ltesseract -lleptonica -lpthread -ldl -lgpac -lswscale -lavutil -lavformat -lavcodec -lavfilter -lswresample ./libccx_rust.a"; \
else \
BLD_FLAGS="-std=gnu99 -Wno-write-strings -Wno-pointer-sign -D_FILE_OFFSET_BITS=64 -DVERSION_FILE_PRESENT -DENABLE_OCR -DFT2_BUILD_LIBRARY -DGPAC_DISABLE_VTT -DGPAC_DISABLE_OD_DUMP -DGPAC_DISABLE_REMOTERY -DNO_GZIP -DGPAC_64_BITS"; \
BLD_INCLUDE="-I../src -I /usr/include/leptonica/ -I /usr/include/tesseract/ -I../src/lib_ccx/ -I /usr/include/gpac/ -I../src/thirdparty/libpng -I../src/thirdparty/zlib -I../src/lib_ccx/zvbi -I../src/thirdparty/lib_hash -I../src/thirdparty -I../src/thirdparty/freetype/include"; \
BLD_LINKER="-lm -Wl,--allow-multiple-definition -ltesseract -lleptonica -lpthread -ldl -lgpac ./libccx_rust.a"; \
fi && \
SRC_LIBPNG="$(find ../src/thirdparty/libpng/ -name '*.c')" && \
SRC_ZLIB="$(find ../src/thirdparty/zlib/ -name '*.c')" && \
SRC_CCX="$(find ../src/lib_ccx/ -name '*.c')" && \
SRC_GPAC="$(find /usr/include/gpac/ -name '*.c' 2>/dev/null || true)" && \
SRC_HASH="$(find ../src/thirdparty/lib_hash/ -name '*.c')" && \
SRC_UTF8PROC="../src/thirdparty/utf8proc/utf8proc.c" && \
SRC_FREETYPE="../src/thirdparty/freetype/autofit/autofit.c \
../src/thirdparty/freetype/base/ftbase.c \
../src/thirdparty/freetype/base/ftbbox.c \
../src/thirdparty/freetype/base/ftbdf.c \
../src/thirdparty/freetype/base/ftbitmap.c \
../src/thirdparty/freetype/base/ftcid.c \
../src/thirdparty/freetype/base/ftfntfmt.c \
../src/thirdparty/freetype/base/ftfstype.c \
../src/thirdparty/freetype/base/ftgasp.c \
../src/thirdparty/freetype/base/ftglyph.c \
../src/thirdparty/freetype/base/ftgxval.c \
../src/thirdparty/freetype/base/ftinit.c \
../src/thirdparty/freetype/base/ftlcdfil.c \
../src/thirdparty/freetype/base/ftmm.c \
../src/thirdparty/freetype/base/ftotval.c \
../src/thirdparty/freetype/base/ftpatent.c \
../src/thirdparty/freetype/base/ftpfr.c \
../src/thirdparty/freetype/base/ftstroke.c \
../src/thirdparty/freetype/base/ftsynth.c \
../src/thirdparty/freetype/base/ftsystem.c \
../src/thirdparty/freetype/base/fttype1.c \
../src/thirdparty/freetype/base/ftwinfnt.c \
../src/thirdparty/freetype/bdf/bdf.c \
../src/thirdparty/freetype/bzip2/ftbzip2.c \
../src/thirdparty/freetype/cache/ftcache.c \
../src/thirdparty/freetype/cff/cff.c \
../src/thirdparty/freetype/cid/type1cid.c \
../src/thirdparty/freetype/gzip/ftgzip.c \
../src/thirdparty/freetype/lzw/ftlzw.c \
../src/thirdparty/freetype/pcf/pcf.c \
../src/thirdparty/freetype/pfr/pfr.c \
../src/thirdparty/freetype/psaux/psaux.c \
../src/thirdparty/freetype/pshinter/pshinter.c \
../src/thirdparty/freetype/psnames/psnames.c \
../src/thirdparty/freetype/raster/raster.c \
../src/thirdparty/freetype/sfnt/sfnt.c \
../src/thirdparty/freetype/smooth/smooth.c \
../src/thirdparty/freetype/truetype/truetype.c \
../src/thirdparty/freetype/type1/type1.c \
../src/thirdparty/freetype/type42/type42.c \
../src/thirdparty/freetype/winfonts/winfnt.c" && \
BLD_SOURCES="../src/ccextractor.c $SRC_CCX $SRC_GPAC $SRC_ZLIB $SRC_LIBPNG $SRC_HASH $SRC_UTF8PROC $SRC_FREETYPE" && \
gcc $BLD_FLAGS $BLD_INCLUDE -o ccextractor $BLD_SOURCES $BLD_LINKER
# Copy binary to known location
RUN cp /root/ccextractor/linux/ccextractor /ccextractor
# Final minimal image
FROM base AS final
ARG BUILD_TYPE=ocr
# Avoid interactive prompts
ENV DEBIAN_FRONTEND=noninteractive
# Install runtime dependencies based on build type
RUN apt-get update && apt-get install -y --no-install-recommends \
libpng16-16 \
libjpeg62-turbo \
zlib1g \
libssl3 \
libcurl4 \
&& rm -rf /var/lib/apt/lists/*
# OCR runtime dependencies
RUN if [ "$BUILD_TYPE" = "ocr" ] || [ "$BUILD_TYPE" = "hardsubx" ]; then \
apt-get update && apt-get install -y --no-install-recommends \
tesseract-ocr \
liblept5 \
&& rm -rf /var/lib/apt/lists/*; \
fi
# HardSubX runtime dependencies
RUN if [ "$BUILD_TYPE" = "hardsubx" ]; then \
apt-get update && apt-get install -y --no-install-recommends \
libavcodec59 \
libavformat59 \
libavutil57 \
libswscale6 \
libswresample4 \
libavfilter8 \
libavdevice59 \
&& rm -rf /var/lib/apt/lists/*; \
fi
# Copy GPAC library from builder
COPY --from=builder /usr/local/lib/libgpac.so* /usr/local/lib/
# Update library cache
RUN ldconfig
# Copy CCExtractor binary
COPY --from=builder /ccextractor /ccextractor
ENTRYPOINT ["/ccextractor"]


@@ -1,61 +1,91 @@
# CCExtractor Docker image
# CCExtractor Docker Image
This dockerfile prepares a minimalist Docker image with CCExtractor. It compiles CCExtractor from sources following instructions from the [Compilation Guide](https://github.com/CCExtractor/ccextractor/blob/master/docs/COMPILATION.MD).
This Dockerfile builds CCExtractor with support for multiple build variants.
You can install the latest build of this image by running `docker pull CCExtractor/ccextractor`
## Build Variants
## Build
| Variant | Description | Features |
|---------|-------------|----------|
| `minimal` | Basic CCExtractor | No OCR support |
| `ocr` | With OCR support (default) | Tesseract OCR for bitmap subtitles |
| `hardsubx` | With burned-in subtitle extraction | OCR + FFmpeg for hardcoded subtitles |
You can build the Docker image directly from the Dockerfile provided in the [docker](https://github.com/CCExtractor/ccextractor/tree/master/docker) directory of the CCExtractor source.
## Building
### Standalone Build (from Dockerfile only)
You can build CCExtractor using just the Dockerfile - it will clone the source from GitHub:
```bash
$ git clone https://github.com/CCExtractor/ccextractor.git && cd ccextractor
$ cd docker/
$ docker build -t ccextractor .
# Default build (OCR enabled)
docker build -t ccextractor docker/
# Minimal build (no OCR)
docker build --build-arg BUILD_TYPE=minimal -t ccextractor docker/
# HardSubX build (OCR + FFmpeg for burned-in subtitles)
docker build --build-arg BUILD_TYPE=hardsubx -t ccextractor docker/
```
### Build from Cloned Repository (faster)
If you have already cloned the repository, you can use local source for faster builds:
```bash
git clone https://github.com/CCExtractor/ccextractor.git
cd ccextractor
# Default build (OCR enabled)
docker build --build-arg USE_LOCAL_SOURCE=1 -f docker/Dockerfile -t ccextractor .
# Minimal build
docker build --build-arg USE_LOCAL_SOURCE=1 --build-arg BUILD_TYPE=minimal -f docker/Dockerfile -t ccextractor .
# HardSubX build
docker build --build-arg USE_LOCAL_SOURCE=1 --build-arg BUILD_TYPE=hardsubx -f docker/Dockerfile -t ccextractor .
```
## Build Arguments
| Argument | Default | Description |
|----------|---------|-------------|
| `BUILD_TYPE` | `ocr` | Build variant: `minimal`, `ocr`, or `hardsubx` |
| `USE_LOCAL_SOURCE` | `0` | Set to `1` to use local source instead of cloning |
| `DEBIAN_VERSION` | `bookworm-slim` | Debian version to use as base |
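
The build arguments above can be combined. As a rough sketch, a helper can assemble the `docker build` command for a given variant. The function name `ccx_build_cmd` and the `ccextractor:<variant>` tag scheme are illustrative assumptions, not something the repository defines. The helper only prints the command so it can be reviewed before it is run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the docker build command for one variant.
# Usage: ccx_build_cmd <minimal|ocr|hardsubx> [use_local_source: 0|1]
ccx_build_cmd() {
    local build_type="${1:-ocr}"
    local use_local="${2:-0}"

    # Reject anything that is not one of the documented variants.
    case "$build_type" in
        minimal|ocr|hardsubx) ;;
        *) echo "unknown BUILD_TYPE: $build_type" >&2; return 1 ;;
    esac

    if [ "$use_local" = "1" ]; then
        # From a cloned repository: the build context is the repository root.
        echo "docker build --build-arg USE_LOCAL_SOURCE=1 --build-arg BUILD_TYPE=$build_type -f docker/Dockerfile -t ccextractor:$build_type ."
    else
        # Standalone: the build context is the docker/ directory.
        echo "docker build --build-arg BUILD_TYPE=$build_type -t ccextractor:$build_type docker/"
    fi
}

# Print the commands for all three variants using local source.
for variant in minimal ocr hardsubx; do
    ccx_build_cmd "$variant" 1
done
```

Tagging each variant separately (an assumption of this sketch) keeps the three images side by side, which makes it easier to compare their sizes.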
## Usage
The CCExtractor Docker image can be used in several ways, depending on your needs.
### Basic Usage
```bash
# General usage
$ docker run ccextractor:latest <features>
# Show version
docker run --rm ccextractor --version
# Show help
docker run --rm ccextractor --help
```
1. Process a local file & use `-o` flag
### Processing Local Files
To process a local video file, mount a directory containing the input file inside the container:
Mount your local directory to process files:
```bash
# Use `-o` to specify the output file
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) ccextractor:latest input.mp4 -o output.srt
# Process a video file with output file
docker run --rm -v $(pwd):$(pwd) -w $(pwd) ccextractor input.mp4 -o output.srt
# Alternatively use `--stdout` feature
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) ccextractor:latest input.mp4 --stdout > output.srt
# Process using stdout
docker run --rm -v $(pwd):$(pwd) -w $(pwd) ccextractor input.mp4 --stdout > output.srt
```
Run this command from the directory that contains your input video file, replacing `input.mp4` and `output.srt` with the actual file names.
2. Enter an interactive environment
If you need to run CCExtractor with additional options or perform other tasks within the container, you can enter an interactive environment:
bash
### Interactive Mode
```bash
$ docker run --rm -it --entrypoint='sh' ccextractor:latest
docker run --rm -it --entrypoint=/bin/bash ccextractor
```
This will start a Bash shell inside the container, allowing you to run CCExtractor commands manually or perform other operations.
## Image Size
### Example
Run the help command in an image built from the `Dockerfile`:
```bash
$ docker build -t ccextractor .
$ docker run --rm ccextractor:latest --help
```
This will show the `--help` message of CCExtractor tool
From there you can see all the features and flags which can be used.
The multi-stage build produces runtime images:
- `minimal`: ~130MB
- `ocr`: ~215MB (includes Tesseract)
- `hardsubx`: ~610MB (includes Tesseract + FFmpeg)


@@ -1,46 +0,0 @@
FROM alpine:latest as base
FROM base as builder
RUN apk add --no-cache --update git curl gcc cmake glew glfw \
tesseract-ocr-dev leptonica-dev clang-dev llvm-dev make pkgconfig \
zlib-dev libpng-dev libjpeg-turbo-dev openssl-dev freetype-dev libxml2-dev bash cargo
WORKDIR /root
RUN git clone -b v2.4.0 https://github.com/gpac/gpac
WORKDIR /root/gpac/
RUN ./configure && make -j$(nproc) && make install-lib
WORKDIR /root
RUN rm -rf /root/gpac
RUN git clone https://github.com/CCExtractor/ccextractor.git
WORKDIR /root/ccextractor/linux
RUN ./pre-build.sh && ./build
RUN cp /root/ccextractor/linux/ccextractor /ccextractor && rm -rf ~/ccextractor
FROM base as final
COPY --from=builder /lib/ld-musl-x86_64.so.1 /lib/
COPY --from=builder /usr/lib/libtesseract.so.5 /usr/lib/
COPY --from=builder /usr/lib/libleptonica.so.6 /usr/lib/
COPY --from=builder /usr/local/lib/libgpac.so.12 /usr/local/lib/
COPY --from=builder /usr/lib/libstdc++.so.6 /usr/lib/
COPY --from=builder /usr/lib/libgcc_s.so.1 /usr/lib/
COPY --from=builder /usr/lib/libgomp.so.1 /usr/lib/
COPY --from=builder /usr/lib/libpng16.so.16 /usr/lib/
COPY --from=builder /usr/lib/libjpeg.so.8 /usr/lib/
COPY --from=builder /usr/lib/libgif.so.7 /usr/lib/
COPY --from=builder /usr/lib/libtiff.so.6 /usr/lib/
COPY --from=builder /usr/lib/libwebp.so.7 /usr/lib/
COPY --from=builder /usr/lib/libwebpmux.so.3 /usr/lib/
COPY --from=builder /usr/lib/libz.so.1 /lib/
COPY --from=builder /usr/lib/libssl.so.3 /lib/
COPY --from=builder /usr/lib/libcrypto.so.3 /lib/
COPY --from=builder /usr/lib/liblzma.so.5 /usr/lib/
COPY --from=builder /usr/lib/libzstd.so.1 /usr/lib/
COPY --from=builder /usr/lib/libsharpyuv.so.0 /usr/lib/
COPY --from=builder /ccextractor /
ENTRYPOINT [ "/ccextractor" ]


@@ -0,0 +1,157 @@
# Building CCExtractor on macOS using System Libraries (-system-libs)
## Overview
This document explains how to build CCExtractor on macOS using system-installed libraries instead of bundled third-party libraries.
This build mode is required for Homebrew compatibility and is enabled via the `-system-libs` flag introduced in PR #1862.
## Why is -system-libs needed?
### Background
CCExtractor was removed from Homebrew (homebrew-core) because:
- Homebrew does not allow bundling third-party libraries
- The default CCExtractor build compiles libraries from `src/thirdparty/`
- This violates Homebrew packaging policies
### What -system-libs fixes
The `-system-libs` flag allows CCExtractor to:
- Use system-installed libraries via Homebrew
- Resolve headers and linker flags using `pkg-config`
- Skip compiling bundled copies of common libraries
This makes CCExtractor acceptable for Homebrew packaging.
## Build Modes Explained
### 1. Default Build (Bundled Libraries)
**Command:**
```bash
./mac/build.command
```
**Behavior:**
- Compiles bundled libraries:
- `freetype`
- `libpng`
- `zlib`
- `utf8proc`
- Self-contained binary
- Larger size
- Suitable for standalone builds
### 2. System Libraries Build (Homebrew-compatible)
**Command:**
```bash
./mac/build.command -system-libs
```
**Behavior:**
- Uses system libraries via `pkg-config`
- Does not compile bundled libraries
- Smaller binary
- Faster build
- Required for Homebrew
## Required Homebrew Dependencies
Install required dependencies:
```bash
brew install pkg-config autoconf automake libtool \
gpac freetype libpng protobuf-c utf8proc zlib
```
**Optional** (OCR / HARDSUBX support):
```bash
brew install tesseract leptonica ffmpeg
```
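
Because the `-system-libs` build resolves headers and linker flags through `pkg-config`, a quick pre-flight check can show which libraries are visible before the build starts. This is a minimal sketch, not part of `build.command`: the function name `check_system_libs` is hypothetical, and the module names passed to it (`freetype2`, `libpng`, `zlib`, `libutf8proc`, `gpac`) are assumptions that should be confirmed with `pkg-config --list-all`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of build.command): ask pkg-config
# whether each named module is visible. The module names used below are
# assumptions; confirm them with `pkg-config --list-all` on your system.
check_system_libs() {
    # PKG_CONFIG can be overridden, e.g. to point at a specific binary.
    local pkg_config="${PKG_CONFIG:-pkg-config}"
    local module missing=0
    for module in "$@"; do
        if "$pkg_config" --exists "$module"; then
            echo "ok: $module"
        else
            echo "missing: $module"
            missing=1
        fi
    done
    # Exit status 0 only when every module was found.
    return "$missing"
}

check_system_libs freetype2 libpng zlib libutf8proc gpac \
    || echo "Install the missing libraries with Homebrew before using -system-libs"
```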
## How to Build
```bash
cd mac
./build.command -system-libs
```
**Verify:**
```bash
./ccextractor --version
```
## What Changes Internally with -system-libs
### Libraries NOT compiled (system-provided)
- **FreeType**
- **libpng**
- **zlib**
- **utf8proc**
### Libraries STILL bundled
- **lib_hash** (Custom SHA-256 implementation, no system equivalent)
## CI Coverage
A new CI job was added:
- `build_shell_system_libs`
**What it does:**
- Installs Homebrew dependencies
- Runs `./build.command -system-libs`
- Verifies the binary runs correctly
This ensures Homebrew-compatible builds stay working.
## Verification (Local)
You can confirm system libraries are used:
```bash
otool -L mac/ccextractor
```
**Expected output includes paths like:**
```
/opt/homebrew/opt/gpac/lib/libgpac.dylib
```
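
To make that check easier to read, the `otool -L` output can be filtered down to the libraries that resolve under a Homebrew prefix. The filter below is a sketch under stated assumptions: the function name `homebrew_libs` is hypothetical, and it only recognizes the two default Homebrew prefixes (`/opt/homebrew` on Apple Silicon, `/usr/local` on Intel Macs).

```bash
#!/usr/bin/env bash
# Hypothetical filter: read `otool -L` output on stdin and print only the
# library paths that live under a default Homebrew prefix.
# /opt/homebrew is the default prefix on Apple Silicon; /usr/local is the
# default on Intel Macs. A custom prefix would need to be added here.
homebrew_libs() {
    # otool indents each dependency line; the first field is the path.
    grep -E '^[[:space:]]+(/opt/homebrew|/usr/local)/' | awk '{print $1}'
}

# Typical use (macOS only):
#   otool -L mac/ccextractor | homebrew_libs
```

An empty result from the filter suggests the binary was built against the bundled libraries rather than the system ones.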
## Homebrew Formula Usage (Future)
Example formula snippet:
```ruby
def install
system "./mac/build.command", "-system-libs"
bin.install "mac/ccextractor"
end
```
## Summary
- `-system-libs` is opt-in
- Default build remains unchanged
- Enables CCExtractor to return to Homebrew
- Fully tested in CI and locally
## Related
- **PR #1862** — Add `-system-libs` flag
- **Issue #1580** — Homebrew compatibility
- **Issue #1534** — System library support


@@ -1,58 +1,98 @@
1.0 (to be released)
0.96.6 (unreleased)
-------------------
- Fix: DVB EIT start time BCD decoding in XMLTV output causing invalid timestamps (#1835)
- New: Add Snap packaging support with Snapcraft configuration and GitHub Actions CI workflow.
- Fix: Clear status line output on Linux/WSL to prevent text artifacts (#2017)
- Fix: Prevent infinite loop on truncated MKV files
- Fix: Various memory safety and stability fixes in demuxers (MP4, PS, MKV, DVB)
- Fix: Delete empty output files instead of leaving 0-byte files (#1282)
- Fix: --mkvlang now supports BCP 47 language tags (e.g., en-US, zh-Hans-CN) and multiple codes
- Fix: segmentation fault when using --multiprogram
0.96.5 (2026-01-05)
-------------------
- New: CCExtractor is available again via Homebrew on macOS and Linux.
- New: Add support for raw CDP (Caption Distribution Packet) files (#1406)
- New: Add --scc-accurate-timing option for bandwidth-aware SCC output (#1120)
- Fix: MXF files containing CEA-708 captions not being detected/extracted (#1647)
- Docs: Add Windows WSL build instructions
- Fix: Security fixes (out-of-bounds read/write) in a few places in the legacy C code.
0.96.4 (2026-01-01)
-------------------
- New: Persistent CEA-708 decoder context - maintains state across multiple calls for proper subtitle continuity
- New: OCR character blacklist options (--ocr-blacklist, --ocr-blacklist-file) for improved accuracy
- New: OCR line-split option (--ocr-splitontimechange) for better subtitle segmentation
- Fix: 32-bit build failures on i686 and armv7l architectures
- Fix: Legacy command-line argument compatibility (-1, -2, -12, --sc, --svc)
- Fix: Prevent heap buffer overflow in Teletext processing (security fix)
- Fix: Prevent integer overflow leading to heap buffer overflow in Transport Stream handling (security fix)
- Fix: Lazy OCR initialization - only initialize when first DVB subtitle is encountered
- Build: Optimized Windows CI workflow for faster builds
- Fix: Updated GUI to version 0.7.1. A blind attempt to fix a hang on startup on some Windows systems.
0.96.3 (2025-12-29)
-------------------
- New: VOBSUB subtitle extraction with OCR support for MP4 files
- New: VOBSUB subtitle extraction support for MKV/Matroska files
- New: Native SCC (Scenarist Closed Caption) input file support - CCExtractor can now read SCC files
- New: Configurable frame rate (--scc-framerate) and styled PAC codes for SCC output
- Fix: Apply --delay option to DVB/bitmap subtitles (previously only worked with text-based subtitles)
- Fix: 200ms timing offset in MOV/MP4 caption extraction
- Fix: utf8proc include path for system library builds
- Fix: Use fixed-width integer types in MP4 bswap functions for better portability
- Fix: Guard ocr_text access with ENABLE_OCR preprocessor check
- Fix: Preserve FFmpeg libs when building with -system-libs -hardsubx
- Build: Add vobsub_decoder to Windows and autoconf build systems
- Build: Add winget and Chocolatey packaging workflows for Windows distribution
- Docs: Add VOBSUB extraction documentation and subtile-ocr Dockerfile
0.96.2 (2025-12-26)
-------------------
- Fix: Resolve utf8proc header include path when building against system libraries on Linux.
- Rebundle Windows version to include required runtime files to process hardcoded subtitles
(hardsubx mode).
- New: Add optional -system-libs flag to Linux build script for package manager compatibility
0.96.1 (2025-12-25)
-------------------
- Rebundle Windows version to include an updated GUI. No changes in CCExtractor itself.
0.96 (2025-12-23)
-----------------
- New: Multi-page teletext extraction support (#665)
- Extract multiple teletext pages simultaneously with separate output files
- Use --tpage multiple times (e.g., --tpage 100 --tpage 200)
- Output files are named with page suffix (e.g., output_p100.srt, output_p200.srt)
- Fix: SPUPNG subtitle offset calculation to center based on actual image dimensions
- New: Added --list-tracks (-L) option to list all tracks in media files without processing
New: Chinese, Korean, Japanese support - proper encoding and OCR.
New: Correct McPoodle DVD raw format support
Fix: Timing is now frame perfect (using FFMpeg timing dump as reference) in all formats.
Fix: Solved garbling in all the pending issues we had on GitHub.
Fix: All causes of "premature end of file" messages due to bugs and not actual file cuts.
Fix: All memory leaks, double frees and usual C nastiness that valgrind could find.
- Fix: Include ATSC VCT virtual channel numbers and call signs in XMLTV output
- Fix: Restore ATSC XMLTV generation with ETT parsing for extended descriptions, multi-segment handling, extended table IDs (EIT/VCT), corrected <programme> XMLTV formatting, buffer bounds fixes
- Fix: Add HEVC/H.265 stream type recognition to prevent crashes on ATSC 3.0 streams.
Fix: Tolerance to damaged streams - recover where possible instead of terminating.
Issues closed: Over 40! Too many to list here, but each of them was either a bug squashed or a feature implemented.
0.95 (2025-09-15 - never formally packaged)
-----------------
- Fix: ARM64/aarch64 build failure due to c_char type mismatch in nal.rs
- Fix: HardSubX OCR on Rust
- Removed the Share Module
- Fix: Regression failures on DVD files
- Fix: Segmentation faults on MP4 files with CEA-708 captions
- Refactor: Remove API structures from ccextractor
- New: Add Encoder Module to Rust
- Fix: Elementary stream regressions
- Fix: Segmentation faults on XDS files
- Fix: Clippy Errors Based on Rust 1.88
- IMPROVEMENT: Refactor and optimize Dockerfile
- Fix: Improved handling of IETF language tags in Matroska files (#1665)
- New: Create unit test for rust code (#1615)
- Breaking: Major argument flags revamp for CCExtractor (#1564 & #1619)
- New: Create a Docker image to simplify CCExtractor usage without any environment setup hassle (#1611)
- New: Add time units module in lib_ccxr (#1623)
- New: Add bits and levenshtein module in lib_ccxr (#1627)
- New: Add constants module in lib_ccxr (#1624)
- New: Add log module in lib_ccxr (#1622)
- New: Create `lib_ccxr` and `libccxr_exports` (#1621)
- Fix: Unexpected behavior of get_write_interval (#1609)
- Update: Bump rsmpeg to latest version for ffmpeg bindings (#1600)
- New: Add SCC support for CEA-708 decoder (#1595)
- Fix: respect `-stdout` even if multiple CC tracks are present in a Matroska input file (#1453)
- Fix: crash in Rust decoder on ATSC1.0 TS Files (#1407)
- Removed the --with-gui flag for linux/configure and mac/configure (use the Flutter GUI instead)
Refactor: Lots of code ported to Rust.
- Fix: Improved handling of IETF language tags in Matroska files (#1665)
- Breaking: Major argument flags revamp for CCExtractor (#1564 & #1619)
- Fix: segmentation fault when using hardsubx
- New: Add function (and command) that extracts closed caption subtitles as well as burnt-in subtitles from a file in a single pass. (As proposed in issue 726)
- Refactored: the `general_loop` function has some code moved to a new function
- Fix: WebVTT X-TIMESTAMP-MAP placement (#1463)
- Disable X-TIMESTAMP-MAP by default (changed option --no-timestamp-map to --timestamp-map)
- Fix: missing `#` in color attribute of font tag
- Fix: ffmpeg 5.0, tesseract 5.0 compatibility and remove deprecated methods
- Fix: tesseract 5.x traineddata location in ocr
- Fix: fix autoconf tesseract detection problem (#1503)
- Fix: add missing compile_info_real.h source to Autotools build
- Fix: add missing `-lavfilter` for hardsubx linking
- Fix: make webvtt-full work correctly with multi-byte utf-8 characters
- Fix: encoding of solid block in latin-1 and unicode
- Fix: McPoodle Broadcast Raw format for field 1
- Fix: Incorrect skipping of packets
- Fix: Repeated values for enums
- Cleanup: Remove the (unmaintained) Nuklear GUI code
- Cleanup: Reduce the amount of Windows build options in the project file
- Fix: infinite loop in MP4 file type detector.
- Improvement: Use Corrosion to build Rust code
- Improvement: Ignore MXF Caption Essence Container version byte to enhance SRT subtitle extraction compatibility
- New: Add tesseract page segmentation modes control with `--psm` flag
- Fix: Resolve compile-time error about implicit declarations (#1646)
- Fix: fatal out of memory error extracting from a VOB PS
- Fix: Unit Test Rust failing due to changes in Rust Version 1.86.0
- Fix: Support for MINGW-w64 cross compiling
- Fix: Build with ENABLE_FFMPEG to support ffmpeg 5
0.94 (2021-12-14)
-----------------


@@ -1,3 +1,16 @@
# Installation
## Homebrew
The easiest way to install CCExtractor on macOS and Linux is through Homebrew:
```bash
brew install ccextractor
```
Note: If you don't have Homebrew installed, see [brew.sh](https://brew.sh/)
for installation instructions.
---
# Compiling CCExtractor
You can compile CCExtractor on all major platforms using the `CMakeLists.txt` stored under the `ccextractor/src/` directory. Autoconf and custom build scripts are also available. See the platform-specific instructions in the sections below.
@@ -10,6 +23,16 @@ Clone the latest repository from Github
git clone https://github.com/CCExtractor/ccextractor.git
```
### Hardsubx (Burned-in Subtitles) and FFmpeg Versions
CCExtractor's hardsubx feature extracts burned-in subtitles from videos using OCR. It requires FFmpeg libraries. The build system automatically selects appropriate FFmpeg versions for each platform:
- **Linux**: FFmpeg 6.x (default)
- **Windows**: FFmpeg 6.x (default)
- **macOS**: FFmpeg 8.x (default)
You can override the default by setting the `FFMPEG_VERSION` environment variable to `ffmpeg6`, `ffmpeg7`, or `ffmpeg8` before building. This flexibility ensures compatibility with different FFmpeg installations across platforms.
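The selection rule described above can be sketched in a few lines of shell. This is only an illustration of the documented defaults, not the build system's actual logic; `default_ffmpeg_feature` is a hypothetical helper name, and the OS string is assumed to be what `uname -s` prints.

```bash
# Hypothetical helper: print the FFmpeg feature name a build would use.
# $1: OS name as printed by `uname -s` (defaults to the current OS).
default_ffmpeg_feature() {
    os="${1:-$(uname -s)}"
    # An explicit FFMPEG_VERSION always wins over the platform default.
    if [ -n "${FFMPEG_VERSION:-}" ]; then
        printf '%s\n' "$FFMPEG_VERSION"
        return
    fi
    case "$os" in
        Darwin) printf 'ffmpeg8\n' ;;  # macOS default per the list above
        *)      printf 'ffmpeg6\n' ;;  # Linux and Windows default
    esac
}
```

Calling `default_ffmpeg_feature Linux` with `FFMPEG_VERSION` unset prints `ffmpeg6`; setting `FFMPEG_VERSION=ffmpeg7` makes it print `ffmpeg7` on any platform.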
## Docker
You can use the Docker image to build the latest CCExtractor source without any environment setup hassle. Follow these [instructions](https://github.com/CCExtractor/ccextractor/tree/master/docker/README.md) to build and use the Docker image.
@@ -33,6 +56,10 @@ Arch:
```bash
sudo paru -S glew glfw curl tesseract leptonica cmake gcc clang gpac
```
or
```bash
sudo pacman -S glew glfw curl tesseract leptonica cmake gcc clang gpac
```
Rust 1.87 or above is also required. [Install Rust](https://www.rust-lang.org/tools/install). See the specific compilation methods below for how to compile without Rust.
@@ -56,21 +83,26 @@ cd ccextractor/linux
# compile without debug flags
./build
# compile without rust
./build -without-rust
# compile with debug info
./build -debug # same as ./builddebug
# compile with hardsubx
[Optional] You need to set these environment variables correctly according to your machine,
FFMPEG_INCLUDE_DIR=/usr/include
FFMPEG_PKG_CONFIG_PATH=/usr/lib/pkgconfig
# compile with hardsubx (burned-in subtitle extraction)
# Hardsubx requires FFmpeg libraries. Different FFmpeg versions are used by default:
# - Linux: FFmpeg 6.x (automatic)
# - Windows: FFmpeg 6.x (automatic)
# - macOS: FFmpeg 8.x (automatic)
./build -hardsubx # same as ./build_hardsubx
./build -hardsubx # uses platform-specific FFmpeg version
# To override the default FFmpeg version, set FFMPEG_VERSION:
FFMPEG_VERSION=ffmpeg8 ./build -hardsubx # force FFmpeg 8 on any platform
FFMPEG_VERSION=ffmpeg6 ./build -hardsubx # force FFmpeg 6 on any platform
FFMPEG_VERSION=ffmpeg7 ./build -hardsubx # force FFmpeg 7 on any platform
# [Optional] For custom FFmpeg installations, set these environment variables:
FFMPEG_INCLUDE_DIR=/usr/include
FFMPEG_PKG_CONFIG_PATH=/usr/lib/pkgconfig
# compile in debug mode without rust
./build -debug -without-rust
# test your build
./ccextractor
@@ -82,7 +114,7 @@ cd ccextractor/linux
sudo apt-get install autoconf # dependency to generate configuration script
cd ccextractor/linux
./autogen.sh
./configure # OR ./configure --without-rust
./configure
make
# test your build
@@ -113,9 +145,15 @@ sudo make install
`cmake` also accepts the options:
`-DWITH_OCR=ON` to enable OCR
`-DWITH_HARDSUBX=ON` to enable burned-in subtitles
`-DWITH_HARDSUBX=ON` to enable burned-in subtitles (requires FFmpeg)
([OPTIONAL] For hardsubx, you also need to set these environment variables correctly according to your machine)
For hardsubx with specific FFmpeg versions:
Set `FFMPEG_VERSION=ffmpeg6` for FFmpeg 6.x (default on Linux and Windows)
Set `FFMPEG_VERSION=ffmpeg7` for FFmpeg 7.x
Set `FFMPEG_VERSION=ffmpeg8` for FFmpeg 8.x
(Defaults: Linux=FFmpeg 6, Windows=FFmpeg 6, macOS=FFmpeg 8)
([OPTIONAL] For custom FFmpeg installations, set these environment variables)
FFMPEG_INCLUDE_DIR=/usr/include
FFMPEG_PKG_CONFIG_PATH=/usr/lib/pkgconfig
@@ -136,6 +174,8 @@ brew install cmake gpac
# optional if you want OCR:
brew install tesseract
brew install leptonica
# optional if you want hardsubx (burned-in subtitle extraction):
brew install ffmpeg
```
If configuring OCR, use pkg-config to verify tesseract and leptonica dependencies, e.g.
@@ -151,7 +191,12 @@ pkg-config --exists --print-errors lept
```bash
cd ccextractor/mac
./build.command # OR ./build.command OCR
./build.command # basic build
./build.command -ocr # build with OCR support
./build.command -hardsubx # build with hardsubx (uses FFmpeg 8 by default on macOS)
# Override FFmpeg version if needed:
FFMPEG_VERSION=ffmpeg7 ./build.command -hardsubx
# test your build
./ccextractor
@@ -182,7 +227,7 @@ make
```bash
cd ccextractor/mac
./autogen.sh
./configure # OR ./configure --without-rust
./configure
make
# test your build
@@ -220,6 +265,12 @@ Other dependencies are required through vcpkg, so you can follow below steps:
```
vcpkg install ffmpeg leptonica tesseract --triplet x64-windows-static
```
Note: Windows builds use FFmpeg 6 by default. To override:
```
set FFMPEG_VERSION=ffmpeg8
msbuild ccextractor.sln /p:Configuration=Debug-Full /p:Platform=x64
```
otherwise if you have Debug, Release
```
vcpkg install libpng --triplet x64-windows-static


@@ -54,6 +54,32 @@ To build the program with hardsubx support,
NOTE: The build has been tested with FFMpeg version 3.1.0, and Tesseract 3.04.
macOS
-----
Install the required dependencies using Homebrew:
brew install tesseract leptonica ffmpeg
To build the program with hardsubx support, use one of these methods:
== Using build.command (Recommended):
cd ccextractor/mac
./build.command -hardsubx
== Using autoconf:
cd ccextractor/mac
./autogen.sh
./configure --enable-hardsubx --enable-ocr
make
== Using cmake:
cd ccextractor
mkdir build && cd build
cmake -DWITH_OCR=ON -DWITH_HARDSUBX=ON ../src/
make
NOTE: The -hardsubx parameter of build.command uses a single dash (not --hardsubx).
Windows
-------


@@ -26,6 +26,14 @@ Running ccextractor without parameters shows the help screen. Usage is
trivial - you just need to pass the input file and (optionally) some
details about the input and output files.
Example:
ccextractor input_video.ts
This command extracts subtitles from the input video file and generates a subtitle output file
(such as .srt) in the same directory.
## Languages
Usually English captions are transmitted in line 21 field 1 data,

docs/VOBSUB.md Normal file

@@ -0,0 +1,129 @@
# VOBSUB Subtitle Extraction from MKV Files
CCExtractor supports extracting VOBSUB (S_VOBSUB) subtitles from Matroska (MKV) containers. VOBSUB is an image-based subtitle format originally from DVD video.
## Overview
VOBSUB subtitles consist of two files:
- `.idx` - Index file containing metadata, palette, and timestamp/position entries
- `.sub` - Binary file containing the actual subtitle bitmap data in MPEG Program Stream format
## Basic Usage
```bash
ccextractor movie.mkv
```
This will extract all VOBSUB tracks and create paired `.idx` and `.sub` files:
- `movie_eng.idx` + `movie_eng.sub` (first English track)
- `movie_eng_1.idx` + `movie_eng_1.sub` (second English track, if present)
- etc.
## Converting VOBSUB to SRT (Text)
Since VOBSUB subtitles are images, you need OCR (Optical Character Recognition) to convert them to text-based formats like SRT.
### Using subtile-ocr (Recommended)
[subtile-ocr](https://github.com/gwen-lg/subtile-ocr) is an actively maintained Rust tool that provides accurate OCR conversion.
#### Option 1: Docker (Easiest)
We provide a Dockerfile that builds subtile-ocr with all dependencies:
```bash
# Build the Docker image (one-time)
cd tools/vobsubocr
docker build -t subtile-ocr .
# Extract VOBSUB from MKV
ccextractor movie.mkv
# Convert to SRT using OCR
docker run --rm -v "$(pwd)":/data subtile-ocr -l eng -o /data/movie_eng.srt /data/movie_eng.idx
```
#### Option 2: Install subtile-ocr Natively
If you have Rust and Tesseract development libraries installed:
```bash
# Install dependencies (Ubuntu/Debian)
sudo apt-get install libleptonica-dev libtesseract-dev tesseract-ocr tesseract-ocr-eng
# Install subtile-ocr
cargo install --git https://github.com/gwen-lg/subtile-ocr
# Convert
subtile-ocr -l eng -o movie_eng.srt movie_eng.idx
```
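When an MKV yields several tracks, the conversion step can be looped over every extracted `.idx` file. The following is a minimal sketch, assuming `subtile-ocr` is on your `PATH` and that all tracks use the same Tesseract language; `srt_name_for` and `convert_all` are hypothetical helper names.

```bash
# Hypothetical helper: derive the .srt output name from an .idx file name.
srt_name_for() {
    printf '%s\n' "${1%.idx}.srt"
}

# Hypothetical helper: run subtile-ocr on every .idx file given.
# Usage: convert_all eng movie_*.idx
convert_all() {
    lang="$1"
    shift
    for idx in "$@"; do
        [ -f "$idx" ] || continue   # skip patterns that matched nothing
        subtile-ocr -l "$lang" -o "$(srt_name_for "$idx")" "$idx"
    done
}
```

For example, `convert_all eng movie_*.idx` would write `movie_eng.srt` next to `movie_eng.idx`. Tracks in other languages need their own call with the matching language code.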
### subtile-ocr Options
| Option | Description |
|--------|-------------|
| `-l, --lang <LANG>` | Tesseract language code (required). Examples: `eng`, `fra`, `deu`, `chi_sim` |
| `-o, --output <FILE>` | Output SRT file (stdout if not specified) |
| `-t, --threshold <0.0-1.0>` | Binarization threshold (default: 0.6) |
| `-d, --dpi <DPI>` | Image DPI for OCR (default: 150) |
| `--dump` | Save processed subtitle images as PNG files |
### Language Codes
Install additional Tesseract language packs as needed:
```bash
# Examples
sudo apt-get install tesseract-ocr-fra # French
sudo apt-get install tesseract-ocr-deu # German
sudo apt-get install tesseract-ocr-spa # Spanish
sudo apt-get install tesseract-ocr-chi-sim # Simplified Chinese
```
## Technical Details
### .idx File Format
The index file contains:
1. Header with metadata (size, palette, alignment settings)
2. Language identifier line
3. Timestamp entries with file positions
Example:
```
# VobSub index file, v7 (do not modify this line!)
size: 720x576
palette: 000000, 828282, ...
id: eng, index: 0
timestamp: 00:01:12:920, filepos: 000000000
timestamp: 00:01:18:640, filepos: 000000800
...
```
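The timestamp entries are easy to inspect from the shell. The sketch below assumes the `HH:MM:SS:mmm` layout shown in the example; `idx_ts_to_ms` and `idx_entries` are hypothetical helper names, not part of CCExtractor.

```bash
# Hypothetical helper: convert an .idx timestamp (HH:MM:SS:mmm) to milliseconds.
idx_ts_to_ms() {
    printf '%s\n' "$1" | awk -F: '{ print $4 + 1000 * ($3 + 60 * ($2 + 60 * $1)) }'
}

# Hypothetical helper: print "<milliseconds> <filepos>" for each entry of an .idx file.
idx_entries() {
    sed -n 's/^timestamp: \([0-9:]*\), filepos: \([0-9A-Fa-f]*\).*/\1 \2/p' "$1" |
    while read -r ts pos; do
        printf '%s %s\n' "$(idx_ts_to_ms "$ts")" "$pos"
    done
}
```

With the example above, `idx_ts_to_ms 00:01:12:920` prints `72920`.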
### .sub File Format
The binary file contains MPEG Program Stream packets:
- Each subtitle is wrapped in a PS Pack header (14 bytes) + PES header (15 bytes)
- Subtitles are aligned to 2048-byte boundaries
- Contains raw SPU (SubPicture Unit) bitmap data
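
The layout above implies some simple size arithmetic, sketched below. It assumes one PS Pack header and one PES header in front of each subtitle's SPU payload, and it reads `filepos` values from the `.idx` file as hexadecimal (an assumption that matches the 2048-byte alignment in the example). All function names are hypothetical.

```bash
# Sizes taken from the description above.
PS_PACK_HEADER=14
PES_HEADER=15
SECTOR=2048

# Hypothetical helper: round a byte count up to the next 2048-byte boundary.
align_up() {
    echo $(( ($1 + SECTOR - 1) / SECTOR * SECTOR ))
}

# Hypothetical helper: on-disk size of one subtitle with the given payload size.
padded_packet_size() {
    align_up $(( PS_PACK_HEADER + PES_HEADER + $1 ))
}

# Hypothetical helper: succeed if an .idx filepos (read as hex) is sector-aligned.
filepos_is_aligned() {
    pos=$(printf '%d' "0x$1")
    [ $(( pos % SECTOR )) -eq 0 ]
}
```

A 100-byte payload therefore occupies 2048 bytes, and `filepos_is_aligned 000000800` succeeds because hexadecimal 800 equals 2048.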
## Troubleshooting
### Empty output files
- Ensure the MKV file actually contains VOBSUB tracks (check with `mediainfo` or `ffprobe`)
- CCExtractor will report "No VOBSUB subtitles to write" if the track is empty
### OCR quality issues
- Try adjusting the `-t` threshold parameter
- Ensure the correct language pack is installed
- Use `--dump` to inspect the processed images
### Docker permission issues
- The output files may be owned by root; use `sudo chown` to fix ownership
- Or run Docker with `--user $(id -u):$(id -g)`
## See Also
- [OCR.md](OCR.md) - General OCR support in CCExtractor
- [subtile-ocr GitHub](https://github.com/gwen-lg/subtile-ocr) - OCR tool documentation

docs/build-wsl.md Normal file

@@ -0,0 +1,137 @@
# Building CCExtractor on Windows using WSL
This guide explains how to build CCExtractor on Windows using WSL (Ubuntu).
It is based on a fresh setup and includes all required dependencies and
common build issues encountered during compilation.
---
## Prerequisites
- Windows 10 or Windows 11
- WSL enabled
- Ubuntu installed via Microsoft Store
---
## Install WSL and Ubuntu
From PowerShell (run as Administrator):
```powershell
wsl --install -d Ubuntu
```
Restart the system if prompted, then launch Ubuntu from the Start menu.
---
## Update system packages
```bash
sudo apt update
```
---
## Install basic build tools
```bash
sudo apt install -y build-essential git pkg-config
```
---
## Install Rust (required)
CCExtractor includes Rust components, so Rust and Cargo are required.
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
```
Verify installation:
```bash
cargo --version
rustc --version
```
---
## Install required libraries
```bash
sudo apt install -y \
libclang-dev clang \
libtesseract-dev tesseract-ocr \
libgpac-dev
```
---
## Clone the repository
```bash
git clone https://github.com/CCExtractor/ccextractor.git
cd ccextractor
```
---
## Build CCExtractor
```bash
cd linux
./build
```
After a successful build, verify by running:
```bash
./ccextractor
```
You should see the help/usage output.
---
## Common build issues
### cargo: command not found
```bash
source ~/.cargo/env
```
---
### Unable to find libclang
```bash
sudo apt install libclang-dev clang
```
---
### gpac/isomedia.h: No such file or directory
```bash
sudo apt install libgpac-dev
```
---
### please install tesseract development library
```bash
sudo apt install libtesseract-dev tesseract-ocr
```
---
## Notes
- Compiler warnings during the build process are expected and do not indicate failure.
- This guide was tested on Ubuntu (WSL) running on Windows 11.
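
Most of the issues listed under "Common build issues" come down to a missing tool, so a quick preflight check before running `./build` can save a failed compile. This is a minimal sketch; `missing_tools` is a hypothetical helper, and the tool list is an assumption based on the packages installed in this guide.

```bash
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: report anything this guide installs that is still missing.
# missing_tools git gcc pkg-config cargo rustc clang tesseract
```

An empty result means every listed tool was found. Library headers such as `libgpac-dev` are not covered by this check.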


@@ -151,6 +151,8 @@ ccextractor_SOURCES = \
../src/lib_ccx/list.h \
../src/lib_ccx/matroska.c \
../src/lib_ccx/matroska.h \
../src/lib_ccx/vobsub_decoder.c \
../src/lib_ccx/vobsub_decoder.h \
../src/lib_ccx/mp4.c \
../src/lib_ccx/myth.c \
../src/lib_ccx/networking.c \
@@ -294,12 +296,18 @@ ccextractor_CPPFLAGS+= ${libavformat_CFLAGS}
ccextractor_CPPFLAGS+= ${libavfilter_CFLAGS}
ccextractor_CPPFLAGS+= ${libavutil_CFLAGS}
ccextractor_CPPFLAGS+= ${libswscale_CFLAGS}
# HARDSUBX requires tesseract/leptonica for OCR (same as OCR feature)
ccextractor_CPPFLAGS+= ${tesseract_CFLAGS}
ccextractor_CPPFLAGS+= ${lept_CFLAGS}
AV_LIB = ${libavcodec_LIBS}
AV_LIB += ${libavformat_LIBS}
AV_LIB += ${libavfilter_LIBS}
AV_LIB += ${libavutil_LIBS}
AV_LIB += ${libswscale_LIBS}
ccextractor_LDADD += $(AV_LIB)
# HARDSUBX requires tesseract/leptonica libs for OCR
ccextractor_LDADD += ${tesseract_LIBS}
ccextractor_LDADD += ${lept_LIBS}
HARDSUBX_FEATURE_RUST += --features "hardsubx_ocr"
endif


@@ -2,6 +2,7 @@
RUST_LIB="rust/release/libccx_rust.a"
RUST_PROFILE="--release"
USE_SYSTEM_LIBS=false
while [[ $# -gt 0 ]]; do
case $1 in
-debug)
@@ -13,11 +14,20 @@ while [[ $# -gt 0 ]]; do
;;
-hardsubx)
HARDSUBX=true
RUST_FEATURES="--features hardsubx_ocr"
# Allow overriding FFmpeg version via environment variable
if [ -n "$FFMPEG_VERSION" ]; then
RUST_FEATURES="--features hardsubx_ocr,$FFMPEG_VERSION"
else
RUST_FEATURES="--features hardsubx_ocr"
fi
BLD_FLAGS="$BLD_FLAGS -DENABLE_HARDSUBX"
BLD_LINKER="$BLD_LINKER -lswscale -lavutil -pthread -lavformat -lavcodec -lavfilter -lxcb-shm -lxcb -lX11 -llzma -lswresample"
shift
;;
-system-libs)
USE_SYSTEM_LIBS=true
shift
;;
-*)
echo "Unknown option $1"
exit 1
@@ -25,7 +35,42 @@ while [[ $# -gt 0 ]]; do
esac
done
BLD_FLAGS="$BLD_FLAGS -std=gnu99 -Wno-write-strings -Wno-pointer-sign -D_FILE_OFFSET_BITS=64 -DVERSION_FILE_PRESENT -DENABLE_OCR -DFT2_BUILD_LIBRARY -DGPAC_DISABLE_VTT -DGPAC_DISABLE_OD_DUMP -DGPAC_DISABLE_REMOTERY -DNO_GZIP"
if [ "$USE_SYSTEM_LIBS" = true ]; then
command -v pkg-config >/dev/null || {
echo "Error: pkg-config is required for -system-libs mode"
exit 1
}
MISSING=""
for lib in libpng zlib freetype2 libutf8proc; do
if ! pkg-config --exists "$lib" 2>/dev/null; then
MISSING="$MISSING $lib"
fi
done
if [ -n "$MISSING" ]; then
echo "Error: Missing required system libraries:$MISSING"
echo ""
echo "On Debian/Ubuntu: sudo apt install libpng-dev zlib1g-dev libfreetype-dev libutf8proc-dev"
exit 1
fi
for hdr in leptonica/allheaders.h tesseract/capi.h; do
if ! echo "#include <$hdr>" | gcc -E - >/dev/null 2>&1; then
echo "Error: Missing headers for <$hdr>"
echo "On Debian/Ubuntu: sudo apt install libleptonica-dev libtesseract-dev"
exit 1
fi
done
PKG_CFLAGS="$(pkg-config --cflags libpng zlib freetype2 libutf8proc)"
PKG_LIBS="$(pkg-config --libs libpng zlib freetype2 libutf8proc)"
fi
BLD_FLAGS="$BLD_FLAGS -std=gnu99 -Wno-write-strings -Wno-pointer-sign -D_FILE_OFFSET_BITS=64 -DVERSION_FILE_PRESENT -DENABLE_OCR -DGPAC_DISABLE_VTT -DGPAC_DISABLE_OD_DUMP -DGPAC_DISABLE_REMOTERY -DNO_GZIP"
if [ "$USE_SYSTEM_LIBS" != true ]; then
BLD_FLAGS="$BLD_FLAGS -DFT2_BUILD_LIBRARY"
fi
bit_os=$(getconf LONG_BIT)
if [ "$bit_os" == "64" ]
then
@@ -35,7 +80,7 @@ BLD_INCLUDE="-I../src -I /usr/include/leptonica/ -I /usr/include/tesseract/ -I..
SRC_LIBPNG="$(find ../src/thirdparty/libpng/ -name '*.c')"
SRC_ZLIB="$(find ../src/thirdparty/zlib/ -name '*.c')"
SRC_CCX="$(find ../src/lib_ccx/ -name '*.c')"
SRC_GPAC="$(find /usr/include/gpac/ -name '*.c')"
SRC_GPAC="$(find /usr/include/gpac/ -name '*.c' 2>/dev/null)"
SRC_HASH="$(find ../src/thirdparty/lib_hash/ -name '*.c')"
SRC_UTF8PROC="../src/thirdparty/utf8proc/utf8proc.c"
SRC_FREETYPE="../src/thirdparty/freetype/autofit/autofit.c
@@ -82,6 +127,24 @@ SRC_FREETYPE="../src/thirdparty/freetype/autofit/autofit.c
BLD_SOURCES="../src/ccextractor.c $SRC_CCX $SRC_GPAC $SRC_ZLIB $SRC_LIBPNG $SRC_HASH $SRC_UTF8PROC $SRC_FREETYPE"
BLD_LINKER="$BLD_LINKER -lm -zmuldefs -l tesseract -l leptonica -lpthread -ldl -lgpac"
if [ "$USE_SYSTEM_LIBS" = true ]; then
LEPTONICA_CFLAGS="$(pkg-config --cflags --silence-errors lept)"
TESSERACT_CFLAGS="$(pkg-config --cflags --silence-errors tesseract)"
GPAC_CFLAGS="$(pkg-config --cflags --silence-errors gpac)"
BLD_INCLUDE="-I../src -I../src/lib_ccx -I../src/lib_ccx/zvbi -I../src/thirdparty/lib_hash \
$PKG_CFLAGS $LEPTONICA_CFLAGS $TESSERACT_CFLAGS $GPAC_CFLAGS"
BLD_SOURCES="../src/ccextractor.c $SRC_CCX $SRC_HASH"
# Preserve FFmpeg libraries if -hardsubx was specified
FFMPEG_LIBS=""
if [ "$HARDSUBX" = true ]; then
FFMPEG_LIBS="-lswscale -lavutil -pthread -lavformat -lavcodec -lavfilter -lxcb-shm -lxcb -lX11 -llzma -lswresample"
fi
BLD_LINKER="$PKG_LIBS -ltesseract -lleptonica -lgpac -lpthread -ldl -lm $FFMPEG_LIBS"
fi
echo "Running pre-build script..."
./pre-build.sh
echo "Trying to compile..."
@@ -96,7 +159,7 @@ fi
rustc_version="$(rustc --version)"
semver=( ${rustc_version//./ } )
version="${semver[1]}.${semver[2]}.${semver[3]}"
MSRV="1.54.0"
MSRV="1.87.0"
if [ "$(printf '%s\n' "$MSRV" "$version" | sort -V | head -n1)" = "$MSRV" ]; then
echo "rustc >= MSRV(${MSRV})"
else
@@ -144,3 +207,7 @@ if [[ "$out" != "" ]] ; then
else
echo "Compilation successful, no compiler messages."
fi
if [ -d ./utf8proc_compat ]; then
rm -rf ./utf8proc_compat
fi


@@ -1,63 +1,230 @@
#!/bin/bash
#
# CCExtractor AppImage Build Script
#
# Build variants via BUILD_TYPE environment variable:
# - minimal: Basic CCExtractor without OCR (smallest size)
# - ocr: CCExtractor with OCR support (default)
# - hardsubx: CCExtractor with burned-in subtitle extraction (requires FFmpeg)
#
# Usage:
# ./build_appimage.sh # Builds 'ocr' variant (default)
# BUILD_TYPE=minimal ./build_appimage.sh
# BUILD_TYPE=hardsubx ./build_appimage.sh
#
# Requirements:
# - CMake, GCC, pkg-config, Rust toolchain
# - For OCR: tesseract-ocr, libtesseract-dev, libleptonica-dev
# - For HardSubX: libavcodec-dev, libavformat-dev, libswscale-dev, etc.
# - wget for downloading linuxdeploy
#
set -x
set -e
# store the path of where the script is
OLD_CWD=$(readlink -f .)
# Build type: minimal, ocr, hardsubx (default: ocr)
BUILD_TYPE="${BUILD_TYPE:-ocr}"
# store repo root as variable
REPO_ROOT=$(dirname $OLD_CWD)
echo "=========================================="
echo "CCExtractor AppImage Builder"
echo "Build type: $BUILD_TYPE"
echo "=========================================="
# Make a temp directory for building stuff which will be cleaned automatically
BUILD_DIR="$OLD_CWD/temp"
# Validate build type
case "$BUILD_TYPE" in
minimal|ocr|hardsubx)
;;
*)
echo "Error: Invalid BUILD_TYPE '$BUILD_TYPE'"
echo "Valid options: minimal, ocr, hardsubx"
exit 1
;;
esac
# Check if temp directory exist, and if so then remove contents from it
# if not then create temp directory
if [ -d "$BUILD_DIR" ]; then
rm -r "$BUILD_DIR/*" | true
else
mkdir -p "$BUILD_DIR"
fi
# Store paths
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(dirname "$SCRIPT_DIR")"
BUILD_DIR="$SCRIPT_DIR/appimage_build"
# make sure to clean up build dir, even if errors occur
# Clean up function
cleanup() {
if [ -d "$BUILD_DIR" ]; then
rm -rf "$BUILD_DIR"
fi
if [ -d "$BUILD_DIR" ]; then
echo "Cleaning up build directory..."
rm -rf "$BUILD_DIR"
fi
}
# Automatically trigger Cleanup function
# Cleanup on exit (comment out for debugging)
trap cleanup EXIT
# switch to build dir
pushd "$BUILD_DIR"
# Create fresh build directory
rm -rf "$BUILD_DIR" 2>/dev/null || true
mkdir -p "$BUILD_DIR"
# configure build files with CMake
# we need to explicitly set the install prefix, as CMake's default is /usr/local for some reason...
cmake "$REPO_ROOT/src"
cd "$BUILD_DIR"
# build project and install files into AppDir
make -j$(nproc) ENABLE_OCR=yes
# Determine CMake options based on build type
CMAKE_OPTIONS=""
case "$BUILD_TYPE" in
minimal)
CMAKE_OPTIONS=""
;;
ocr)
CMAKE_OPTIONS="-DWITH_OCR=ON"
;;
hardsubx)
CMAKE_OPTIONS="-DWITH_OCR=ON -DWITH_HARDSUBX=ON -DWITH_FFMPEG=ON"
;;
esac
# download linuxdeploy tool
wget https://github.com/linuxdeploy/linuxdeploy/releases/download/continuous/linuxdeploy-x86_64.AppImage
echo "CMake options: $CMAKE_OPTIONS"
# make them executable
chmod +x linuxdeploy*.AppImage
# Configure with CMake
echo "Configuring with CMake..."
cmake $CMAKE_OPTIONS "$REPO_ROOT/src"
# Create AppDir
mkdir -p "$BUILD_DIR/AppDir"
# Build
echo "Building CCExtractor..."
make -j$(nproc)
# Link of CCExtractor image of any of these resolution(8x8, 16x16, 20x20, 22x22, 24x24, 28x28, 32x32, 36x36, 42x42,
# 48x48, 64x64, 72x72, 96x96, 128x128, 160x160, 192x192, 256x256, 384x384, 480x480, 512x512) in png extension
PNG_LINK="https://ccextractor.org/images/ccextractor.png"
# Verify binary was built
if [ ! -f "$BUILD_DIR/ccextractor" ]; then
echo "Error: ccextractor binary not found after build"
exit 1
fi
# Download the image and put it in AppDir
wget "$PNG_LINK" -P AppDir
echo "Build successful!"
"$BUILD_DIR/ccextractor" --version
# now, build AppImage using linuxdeploy
./linuxdeploy-x86_64.AppImage --appdir=AppDir -e ccextractor --create-desktop-file --output appimage -i AppDir/ccextractor.png
# Download linuxdeploy
echo "Downloading linuxdeploy..."
LINUXDEPLOY_URL="https://github.com/linuxdeploy/linuxdeploy/releases/download/continuous/linuxdeploy-x86_64.AppImage"
wget -q --show-progress "$LINUXDEPLOY_URL" -O linuxdeploy-x86_64.AppImage
chmod +x linuxdeploy-x86_64.AppImage
# Move resulted AppImage binary to base directory
mv ccextractor*.AppImage "$OLD_CWD"
# Create AppDir structure
echo "Creating AppDir structure..."
mkdir -p AppDir/usr/bin
mkdir -p AppDir/usr/share/icons/hicolor/256x256/apps
mkdir -p AppDir/usr/share/applications
mkdir -p AppDir/usr/share/tessdata
# Copy binary
cp "$BUILD_DIR/ccextractor" AppDir/usr/bin/
# Download icon
echo "Downloading icon..."
PNG_URL="https://ccextractor.org/images/ccextractor.png"
if wget -q "$PNG_URL" -O AppDir/usr/share/icons/hicolor/256x256/apps/ccextractor.png 2>/dev/null; then
echo "Icon downloaded successfully"
else
# Create a simple placeholder icon if download fails
echo "Warning: Could not download icon, creating placeholder"
convert -size 256x256 xc:navy -fill white -gravity center -pointsize 40 -annotate 0 "CCX" \
AppDir/usr/share/icons/hicolor/256x256/apps/ccextractor.png 2>/dev/null || \
echo "P3 256 256 255" > AppDir/usr/share/icons/hicolor/256x256/apps/ccextractor.ppm
fi
# Create desktop file
cat > AppDir/usr/share/applications/ccextractor.desktop << 'EOF'
[Desktop Entry]
Type=Application
Name=CCExtractor
Comment=Extract closed captions and subtitles from video files
Exec=ccextractor
Icon=ccextractor
Categories=AudioVideo;Video;
Terminal=true
NoDisplay=true
EOF
# Copy desktop file to AppDir root (required by linuxdeploy)
cp AppDir/usr/share/applications/ccextractor.desktop AppDir/
# Copy icon to AppDir root
cp AppDir/usr/share/icons/hicolor/256x256/apps/ccextractor.png AppDir/ 2>/dev/null || true
# For OCR builds, bundle tessdata
if [ "$BUILD_TYPE" = "ocr" ] || [ "$BUILD_TYPE" = "hardsubx" ]; then
echo "Bundling tessdata for OCR support..."
# Try to find system tessdata
TESSDATA_PATHS=(
"/usr/share/tesseract-ocr/5/tessdata"
"/usr/share/tesseract-ocr/4.00/tessdata"
"/usr/share/tessdata"
"/usr/local/share/tessdata"
)
TESSDATA_SRC=""
for path in "${TESSDATA_PATHS[@]}"; do
if [ -d "$path" ] && [ -f "$path/eng.traineddata" ]; then
TESSDATA_SRC="$path"
break
fi
done
if [ -n "$TESSDATA_SRC" ]; then
echo "Found tessdata at: $TESSDATA_SRC"
# Copy English language data (most common)
cp "$TESSDATA_SRC/eng.traineddata" AppDir/usr/share/tessdata/ 2>/dev/null || true
# Copy OSD (orientation and script detection) if available
cp "$TESSDATA_SRC/osd.traineddata" AppDir/usr/share/tessdata/ 2>/dev/null || true
else
echo "Warning: tessdata not found, downloading English language data..."
wget -q "https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata" \
-O AppDir/usr/share/tessdata/eng.traineddata || true
fi
# Create wrapper script that sets TESSDATA_PREFIX
mv AppDir/usr/bin/ccextractor AppDir/usr/bin/ccextractor.bin
cat > AppDir/usr/bin/ccextractor << 'WRAPPER'
#!/bin/bash
SELF_DIR="$(dirname "$(readlink -f "$0")")"
export TESSDATA_PREFIX="${SELF_DIR}/../share/tessdata"
exec "${SELF_DIR}/ccextractor.bin" "$@"
WRAPPER
chmod +x AppDir/usr/bin/ccextractor
fi
# Determine output name based on build type
ARCH="x86_64"
case "$BUILD_TYPE" in
minimal)
OUTPUT_NAME="ccextractor-minimal-${ARCH}.AppImage"
;;
ocr)
OUTPUT_NAME="ccextractor-${ARCH}.AppImage"
;;
hardsubx)
OUTPUT_NAME="ccextractor-hardsubx-${ARCH}.AppImage"
;;
esac
# Build AppImage
echo "Building AppImage..."
export OUTPUT="$OUTPUT_NAME"
# Determine which executable to pass to linuxdeploy
# For OCR builds, we have a wrapper script, so pass the actual binary (.bin)
if [ -f "AppDir/usr/bin/ccextractor.bin" ]; then
LINUXDEPLOY_EXEC="AppDir/usr/bin/ccextractor.bin"
else
LINUXDEPLOY_EXEC="AppDir/usr/bin/ccextractor"
fi
./linuxdeploy-x86_64.AppImage \
--appdir=AppDir \
--executable="$LINUXDEPLOY_EXEC" \
--desktop-file=AppDir/ccextractor.desktop \
--icon-file=AppDir/ccextractor.png \
--output=appimage
# Move to output directory
mv "$OUTPUT_NAME" "$SCRIPT_DIR/"
echo "=========================================="
echo "AppImage built successfully!"
echo "Output: $SCRIPT_DIR/$OUTPUT_NAME"
echo ""
echo "Test with: $SCRIPT_DIR/$OUTPUT_NAME --version"
echo "=========================================="


@@ -2,7 +2,7 @@
# Process this file with autoconf to produce a configure script.
AC_PREREQ([2.71])
AC_INIT([CCExtractor], [0.94], [carlos@ccextractor.org])
AC_INIT([CCExtractor], [0.96.5], [carlos@ccextractor.org])
AC_CONFIG_AUX_DIR([build-conf])
AC_CONFIG_SRCDIR([../src/ccextractor.c])
AM_INIT_AUTOMAKE([foreign subdir-objects])
@@ -32,6 +32,11 @@ AC_CHECK_LIB([avformat], [avformat_version], [HAS_AVFORMAT=1 && PKG_CHECK_MODULE
AC_CHECK_LIB([avutil], [avutil_version], [HAS_AVUTIL=1 && PKG_CHECK_MODULES([libavutil], [libavutil])], [HAS_AVUTIL=0])
AC_CHECK_LIB([swscale], [swscale_version], [HAS_SWSCALE=1 && PKG_CHECK_MODULES([libswscale], [libswscale])], [HAS_SWSCALE=0])
# Check for GPAC library (required for MP4 support)
PKG_CHECK_MODULES([gpac], [gpac], [HAS_GPAC=1], [HAS_GPAC=0])
AS_IF([test $HAS_GPAC -eq 0],
[AC_MSG_ERROR([GPAC library not found. Install gpac-devel (Fedora/RHEL), libgpac-dev (Debian/Ubuntu), or gpac (Arch) before proceeding.])])
# Checks for header files.
AC_CHECK_HEADERS([arpa/inet.h fcntl.h float.h inttypes.h limits.h locale.h malloc.h netdb.h netinet/in.h stddef.h stdint.h stdlib.h string.h sys/socket.h sys/time.h sys/timeb.h termios.h unistd.h wchar.h])
@@ -104,7 +109,7 @@ if test "x$with_rust" = "xyes" ; then
AS_IF([test "$RUSTC" = "notfound"], [AC_MSG_ERROR([rustc is required])])
rustc_version=$(rustc --version)
MSRV="1.54.0"
MSRV="1.87.0"
AX_COMPARE_VERSION($rustc_version, [ge], [$MSRV],
[AC_MSG_RESULT(rustc >= $MSRV)],
[AC_MSG_ERROR([Minimum supported rust version(MSRV) is $MSRV, please upgrade rust])])
@@ -149,7 +154,7 @@ AS_IF([ (test x$ocr = xtrue || test x$hardsubx = xtrue) && test ! $HAS_LEPT -gt
AM_CONDITIONAL(HARDSUBX_IS_ENABLED, [ test x$hardsubx = xtrue ])
AM_CONDITIONAL(OCR_IS_ENABLED, [ test x$ocr = xtrue || test x$hardsubx = xtrue ])
AM_CONDITIONAL(FFMPEG_IS_ENABLED, [ test x$ffmpeg = xtrue ])
AM_CONDITIONAL(TESSERACT_PRESENT, [ test ! -z $(pkg-config --libs-only-l --silence-errors tesseract) ])
AM_CONDITIONAL(TESSERACT_PRESENT, [ test ! -z "$(pkg-config --libs-only-l --silence-errors tesseract)" ])
AM_CONDITIONAL(TESSERACT_PRESENT_RPI, [ test -d "/usr/include/tesseract" && test $(ls -A /usr/include/tesseract | wc -l) -gt 0 ])
AM_CONDITIONAL(SYS_IS_LINUX, [ test $(uname -s) = "Linux"])
AM_CONDITIONAL(SYS_IS_MAC, [ test $(uname -s) = "Darwin"])


@@ -123,6 +123,8 @@ ccextractor_SOURCES = \
../src/lib_ccx/list.h \
../src/lib_ccx/matroska.c \
../src/lib_ccx/matroska.h \
../src/lib_ccx/vobsub_decoder.c \
../src/lib_ccx/vobsub_decoder.h \
../src/lib_ccx/mp4.c \
../src/lib_ccx/myth.c \
../src/lib_ccx/networking.c \


@@ -20,7 +20,19 @@ while [[ $# -gt 0 ]]; do
;;
-hardsubx)
HARDSUBX=true
RUST_FEATURES="--features hardsubx_ocr"
ENABLE_OCR=true
# Allow overriding FFmpeg version via environment variable
if [ -n "$FFMPEG_VERSION" ]; then
RUST_FEATURES="--features hardsubx_ocr,$FFMPEG_VERSION"
else
RUST_FEATURES="--features hardsubx_ocr"
fi
shift
;;
-system-libs)
# Use system-installed libraries via pkg-config instead of bundled ones
# This is required for Homebrew formula compatibility
USE_SYSTEM_LIBS=true
shift
;;
-*)
@@ -30,7 +42,21 @@ while [[ $# -gt 0 ]]; do
esac
done
BLD_FLAGS="-std=gnu99 -Wno-write-strings -Wno-pointer-sign -D_FILE_OFFSET_BITS=64 -DVERSION_FILE_PRESENT -Dfopen64=fopen -Dopen64=open -Dlseek64=lseek -DFT2_BUILD_LIBRARY -DGPAC_DISABLE_VTT -DGPAC_DISABLE_OD_DUMP -DGPAC_DISABLE_REMOTERY -DNO_GZIP"
# Determine architecture based on cargo (to ensure consistency with Rust part)
CARGO_ARCH=$(file $(which cargo) | grep -o 'x86_64\|arm64')
if [[ "$CARGO_ARCH" == "x86_64" ]]; then
echo "Detected Intel (x86_64) Cargo. Forcing x86_64 build to match Rust and libraries..."
BLD_ARCH="-arch x86_64"
else
BLD_ARCH="-arch arm64"
fi
BLD_FLAGS="$BLD_ARCH -std=gnu99 -Wno-write-strings -Wno-pointer-sign -D_FILE_OFFSET_BITS=64 -DVERSION_FILE_PRESENT -Dfopen64=fopen -Dopen64=open -Dlseek64=lseek"
# Add flags for bundled libraries (not needed when using system libs)
if [[ "$USE_SYSTEM_LIBS" != "true" ]]; then
BLD_FLAGS="$BLD_FLAGS -DFT2_BUILD_LIBRARY -DGPAC_DISABLE_VTT -DGPAC_DISABLE_OD_DUMP -DGPAC_DISABLE_REMOTERY -DNO_GZIP"
fi
# Add debug flags if needed
if [[ "$DEBUG" == "true" ]]; then
@@ -47,7 +73,68 @@ if [[ "$HARDSUBX" == "true" ]]; then
BLD_FLAGS="$BLD_FLAGS -DENABLE_HARDSUBX"
fi
BLD_INCLUDE="-I../src/ -I../src/lib_ccx -I../src/lib_hash -I../src/thirdparty/libpng -I../src/thirdparty -I../src/thirdparty/zlib -I../src/thirdparty/freetype/include `pkg-config --cflags --silence-errors gpac`"
# Set up include paths based on whether we're using system libs or bundled
if [[ "$USE_SYSTEM_LIBS" == "true" ]]; then
# Use system libraries via pkg-config (for Homebrew compatibility)
# Note: -I../src/thirdparty/lib_hash is needed so that "../lib_hash/sha2.h" resolves correctly
# (the .. goes up from lib_hash to thirdparty, then lib_hash/sha2.h finds the file)
BLD_INCLUDE="-I../src/ -I../src/lib_ccx -I../src/thirdparty/lib_hash -I../src/thirdparty"
BLD_INCLUDE="$BLD_INCLUDE $(pkg-config --cflags --silence-errors freetype2)"
BLD_INCLUDE="$BLD_INCLUDE $(pkg-config --cflags --silence-errors gpac)"
BLD_INCLUDE="$BLD_INCLUDE $(pkg-config --cflags --silence-errors libpng)"
BLD_INCLUDE="$BLD_INCLUDE $(pkg-config --cflags --silence-errors libprotobuf-c)"
BLD_INCLUDE="$BLD_INCLUDE $(pkg-config --cflags --silence-errors libutf8proc)"
else
# Use bundled libraries (default for standalone builds)
BLD_INCLUDE="-I../src/ -I../src/lib_ccx -I../src/thirdparty/lib_hash -I../src/thirdparty/libpng -I../src/thirdparty -I../src/thirdparty/zlib -I../src/thirdparty/freetype/include $(pkg-config --cflags --silence-errors gpac)"
fi
# Add FFmpeg include path for Mac
if [[ -d "/opt/homebrew/Cellar/ffmpeg" ]]; then
FFMPEG_VERSION=$(ls -1 /opt/homebrew/Cellar/ffmpeg | head -1)
if [[ -n "$FFMPEG_VERSION" ]]; then
BLD_INCLUDE="$BLD_INCLUDE -I/opt/homebrew/Cellar/ffmpeg/$FFMPEG_VERSION/include"
fi
elif [[ -d "/usr/local/Cellar/ffmpeg" ]]; then
FFMPEG_VERSION=$(ls -1 /usr/local/Cellar/ffmpeg | head -1)
if [[ -n "$FFMPEG_VERSION" ]]; then
BLD_INCLUDE="$BLD_INCLUDE -I/usr/local/Cellar/ffmpeg/$FFMPEG_VERSION/include"
fi
fi
# Add Leptonica include path for Mac
if [[ -d "/opt/homebrew/Cellar/leptonica" ]]; then
LEPT_VERSION=$(ls -1 /opt/homebrew/Cellar/leptonica | head -1)
if [[ -n "$LEPT_VERSION" ]]; then
BLD_INCLUDE="$BLD_INCLUDE -I/opt/homebrew/Cellar/leptonica/$LEPT_VERSION/include"
fi
elif [[ -d "/usr/local/Cellar/leptonica" ]]; then
LEPT_VERSION=$(ls -1 /usr/local/Cellar/leptonica | head -1)
if [[ -n "$LEPT_VERSION" ]]; then
BLD_INCLUDE="$BLD_INCLUDE -I/usr/local/Cellar/leptonica/$LEPT_VERSION/include"
fi
elif [[ -d "/opt/homebrew/include/leptonica" ]]; then
BLD_INCLUDE="$BLD_INCLUDE -I/opt/homebrew/include"
elif [[ -d "/usr/local/include/leptonica" ]]; then
BLD_INCLUDE="$BLD_INCLUDE -I/usr/local/include"
fi
# Add Tesseract include path for Mac
if [[ -d "/opt/homebrew/Cellar/tesseract" ]]; then
TESS_VERSION=$(ls -1 /opt/homebrew/Cellar/tesseract | head -1)
if [[ -n "$TESS_VERSION" ]]; then
BLD_INCLUDE="$BLD_INCLUDE -I/opt/homebrew/Cellar/tesseract/$TESS_VERSION/include"
fi
elif [[ -d "/usr/local/Cellar/tesseract" ]]; then
TESS_VERSION=$(ls -1 /usr/local/Cellar/tesseract | head -1)
if [[ -n "$TESS_VERSION" ]]; then
BLD_INCLUDE="$BLD_INCLUDE -I/usr/local/Cellar/tesseract/$TESS_VERSION/include"
fi
elif [[ -d "/opt/homebrew/include/tesseract" ]]; then
BLD_INCLUDE="$BLD_INCLUDE -I/opt/homebrew/include"
elif [[ -d "/usr/local/include/tesseract" ]]; then
BLD_INCLUDE="$BLD_INCLUDE -I/usr/local/include"
fi
if [[ "$ENABLE_OCR" == "true" ]]; then
BLD_INCLUDE="$BLD_INCLUDE `pkg-config --cflags --silence-errors tesseract`"
@@ -55,61 +142,111 @@ fi
SRC_CCX="$(find ../src/lib_ccx -name '*.c')"
SRC_LIB_HASH="$(find ../src/thirdparty/lib_hash -name '*.c')"
SRC_LIBPNG="$(find ../src/thirdparty/libpng -name '*.c')"
SRC_UTF8="../src/thirdparty/utf8proc/utf8proc.c"
SRC_ZLIB="$(find ../src/thirdparty/zlib -name '*.c')"
SRC_FREETYPE="../src/thirdparty/freetype/autofit/autofit.c \
../src/thirdparty/freetype/base/ftbase.c \
../src/thirdparty/freetype/base/ftbbox.c \
../src/thirdparty/freetype/base/ftbdf.c \
../src/thirdparty/freetype/base/ftbitmap.c \
../src/thirdparty/freetype/base/ftcid.c \
../src/thirdparty/freetype/base/ftfntfmt.c \
../src/thirdparty/freetype/base/ftfstype.c \
../src/thirdparty/freetype/base/ftgasp.c \
../src/thirdparty/freetype/base/ftglyph.c \
../src/thirdparty/freetype/base/ftgxval.c \
../src/thirdparty/freetype/base/ftinit.c \
../src/thirdparty/freetype/base/ftlcdfil.c \
../src/thirdparty/freetype/base/ftmm.c \
../src/thirdparty/freetype/base/ftotval.c \
../src/thirdparty/freetype/base/ftpatent.c \
../src/thirdparty/freetype/base/ftpfr.c \
../src/thirdparty/freetype/base/ftstroke.c \
../src/thirdparty/freetype/base/ftsynth.c \
../src/thirdparty/freetype/base/ftsystem.c \
../src/thirdparty/freetype/base/fttype1.c \
../src/thirdparty/freetype/base/ftwinfnt.c \
../src/thirdparty/freetype/bdf/bdf.c \
../src/thirdparty/freetype/bzip2/ftbzip2.c \
../src/thirdparty/freetype/cache/ftcache.c \
../src/thirdparty/freetype/cff/cff.c \
../src/thirdparty/freetype/cid/type1cid.c \
../src/thirdparty/freetype/gzip/ftgzip.c \
../src/thirdparty/freetype/lzw/ftlzw.c \
../src/thirdparty/freetype/pcf/pcf.c \
../src/thirdparty/freetype/pfr/pfr.c \
../src/thirdparty/freetype/psaux/psaux.c \
../src/thirdparty/freetype/pshinter/pshinter.c \
../src/thirdparty/freetype/psnames/psnames.c \
../src/thirdparty/freetype/raster/raster.c \
../src/thirdparty/freetype/sfnt/sfnt.c \
../src/thirdparty/freetype/smooth/smooth.c \
../src/thirdparty/freetype/truetype/truetype.c \
../src/thirdparty/freetype/type1/type1.c \
../src/thirdparty/freetype/type42/type42.c \
../src/thirdparty/freetype/winfonts/winfnt.c"
BLD_SOURCES="../src/ccextractor.c $SRC_CCX $SRC_LIB_HASH $SRC_LIBPNG $SRC_UTF8 $SRC_ZLIB $SRC_FREETYPE"
# Set up sources and linker based on whether we're using system libs or bundled
if [[ "$USE_SYSTEM_LIBS" == "true" ]]; then
# Use system libraries - don't compile bundled sources
BLD_SOURCES="../src/ccextractor.c $SRC_CCX $SRC_LIB_HASH"
BLD_LINKER="-lm -liconv -lpthread -ldl `pkg-config --libs --silence-errors gpac`"
BLD_LINKER="-lm -liconv -lpthread -ldl"
BLD_LINKER="$BLD_LINKER $(pkg-config --libs --silence-errors freetype2)"
BLD_LINKER="$BLD_LINKER $(pkg-config --libs --silence-errors gpac)"
BLD_LINKER="$BLD_LINKER $(pkg-config --libs --silence-errors libpng)"
BLD_LINKER="$BLD_LINKER $(pkg-config --libs --silence-errors libprotobuf-c)"
BLD_LINKER="$BLD_LINKER $(pkg-config --libs --silence-errors libutf8proc)"
BLD_LINKER="$BLD_LINKER $(pkg-config --libs --silence-errors zlib)"
else
# Use bundled libraries (default)
SRC_LIBPNG="$(find ../src/thirdparty/libpng -name '*.c')"
SRC_UTF8="../src/thirdparty/utf8proc/utf8proc.c"
SRC_ZLIB="$(find ../src/thirdparty/zlib -name '*.c')"
SRC_FREETYPE="../src/thirdparty/freetype/autofit/autofit.c \
../src/thirdparty/freetype/base/ftbase.c \
../src/thirdparty/freetype/base/ftbbox.c \
../src/thirdparty/freetype/base/ftbdf.c \
../src/thirdparty/freetype/base/ftbitmap.c \
../src/thirdparty/freetype/base/ftcid.c \
../src/thirdparty/freetype/base/ftfntfmt.c \
../src/thirdparty/freetype/base/ftfstype.c \
../src/thirdparty/freetype/base/ftgasp.c \
../src/thirdparty/freetype/base/ftglyph.c \
../src/thirdparty/freetype/base/ftgxval.c \
../src/thirdparty/freetype/base/ftinit.c \
../src/thirdparty/freetype/base/ftlcdfil.c \
../src/thirdparty/freetype/base/ftmm.c \
../src/thirdparty/freetype/base/ftotval.c \
../src/thirdparty/freetype/base/ftpatent.c \
../src/thirdparty/freetype/base/ftpfr.c \
../src/thirdparty/freetype/base/ftstroke.c \
../src/thirdparty/freetype/base/ftsynth.c \
../src/thirdparty/freetype/base/ftsystem.c \
../src/thirdparty/freetype/base/fttype1.c \
../src/thirdparty/freetype/base/ftwinfnt.c \
../src/thirdparty/freetype/bdf/bdf.c \
../src/thirdparty/freetype/bzip2/ftbzip2.c \
../src/thirdparty/freetype/cache/ftcache.c \
../src/thirdparty/freetype/cff/cff.c \
../src/thirdparty/freetype/cid/type1cid.c \
../src/thirdparty/freetype/gzip/ftgzip.c \
../src/thirdparty/freetype/lzw/ftlzw.c \
../src/thirdparty/freetype/pcf/pcf.c \
../src/thirdparty/freetype/pfr/pfr.c \
../src/thirdparty/freetype/psaux/psaux.c \
../src/thirdparty/freetype/pshinter/pshinter.c \
../src/thirdparty/freetype/psnames/psnames.c \
../src/thirdparty/freetype/raster/raster.c \
../src/thirdparty/freetype/sfnt/sfnt.c \
../src/thirdparty/freetype/smooth/smooth.c \
../src/thirdparty/freetype/truetype/truetype.c \
../src/thirdparty/freetype/type1/type1.c \
../src/thirdparty/freetype/type42/type42.c \
../src/thirdparty/freetype/winfonts/winfnt.c"
BLD_SOURCES="../src/ccextractor.c $SRC_CCX $SRC_LIB_HASH $SRC_LIBPNG $SRC_UTF8 $SRC_ZLIB $SRC_FREETYPE"
BLD_LINKER="-lm -liconv -lpthread -ldl $(pkg-config --libs --silence-errors gpac)"
fi
if [[ "$ENABLE_OCR" == "true" ]]; then
BLD_LINKER="$BLD_LINKER `pkg-config --libs --silence-errors tesseract` `pkg-config --libs --silence-errors lept`"
fi
if [[ "$HARDSUBX" == "true" ]]; then
BLD_LINKER="$BLD_LINKER -lswscale -lavutil -pthread -lavformat -lavcodec -lavfilter"
# Add FFmpeg library path for Mac
if [[ -d "/opt/homebrew/Cellar/ffmpeg" ]]; then
FFMPEG_VERSION=$(ls -1 /opt/homebrew/Cellar/ffmpeg | head -1)
if [[ -n "$FFMPEG_VERSION" ]]; then
BLD_LINKER="$BLD_LINKER -L/opt/homebrew/Cellar/ffmpeg/$FFMPEG_VERSION/lib"
fi
elif [[ -d "/usr/local/Cellar/ffmpeg" ]]; then
FFMPEG_VERSION=$(ls -1 /usr/local/Cellar/ffmpeg | head -1)
if [[ -n "$FFMPEG_VERSION" ]]; then
BLD_LINKER="$BLD_LINKER -L/usr/local/Cellar/ffmpeg/$FFMPEG_VERSION/lib"
fi
fi
# Add library paths for Leptonica and Tesseract from Cellar
if [[ -d "/opt/homebrew/Cellar/leptonica" ]]; then
LEPT_VERSION=$(ls -1 /opt/homebrew/Cellar/leptonica | head -1)
if [[ -n "$LEPT_VERSION" ]]; then
BLD_LINKER="$BLD_LINKER -L/opt/homebrew/Cellar/leptonica/$LEPT_VERSION/lib"
fi
fi
if [[ -d "/opt/homebrew/Cellar/tesseract" ]]; then
TESS_VERSION=$(ls -1 /opt/homebrew/Cellar/tesseract | head -1)
if [[ -n "$TESS_VERSION" ]]; then
BLD_LINKER="$BLD_LINKER -L/opt/homebrew/Cellar/tesseract/$TESS_VERSION/lib"
fi
fi
# Also add homebrew lib path as fallback
if [[ -d "/opt/homebrew/lib" ]]; then
BLD_LINKER="$BLD_LINKER -L/opt/homebrew/lib"
elif [[ -d "/usr/local/lib" ]]; then
BLD_LINKER="$BLD_LINKER -L/usr/local/lib"
fi
BLD_LINKER="$BLD_LINKER -lswscale -lavutil -pthread -lavformat -lavcodec -lavfilter -lleptonica -ltesseract"
fi
echo "Running pre-build script..."
@@ -127,7 +264,7 @@ fi
rustc_version="$(rustc --version)"
semver=( ${rustc_version//./ } )
version="${semver[1]}.${semver[2]}.${semver[3]}"
MSRV="1.54.0"
MSRV="1.87.0"
if [ "$(printf '%s\n' "$MSRV" "$version" | sort -V | head -n1)" = "$MSRV" ]; then
echo "rustc >= MSRV(${MSRV})"
else
@@ -180,4 +317,4 @@ if [[ "$out" != "" ]]; then
echo "Compilation successful, compiler message shown in previous lines"
else
echo "Compilation successful, no compiler messages."
fi
fi
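
Review note: the MSRV check in this script pulls the version out of `rustc --version` and compares it with `sort -V`. A minimal sketch of the same comparison as standalone functions follows. `rustc_semver` and `version_ge` are hypothetical names that do not appear in the script, and the sample version string is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the MSRV check; not part of the build script.

# Extract "X.Y.Z" from a string shaped like "rustc X.Y.Z (hash date)".
rustc_semver() {
    local s="${1#rustc }"     # drop the leading "rustc "
    printf '%s\n' "${s%% *}"  # keep everything up to the first space
}

# Succeed when version $1 >= version $2, using version sort.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

sample="rustc 1.87.0 (0000000 2025-01-01)"  # made-up sample output
if version_ge "$(rustc_semver "$sample")" "1.87.0"; then
    echo "rustc >= MSRV"
else
    echo "rustc too old"
fi
```

The sketch relies on `sort -V`, which the script already assumes is available.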


@@ -2,7 +2,7 @@
# Process this file with autoconf to produce a configure script.
AC_PREREQ([2.71])
AC_INIT([CCExtractor],[0.94],[carlos@ccextractor.org])
AC_INIT([CCExtractor],[0.96.5],[carlos@ccextractor.org])
AC_CONFIG_AUX_DIR([build-conf])
AC_CONFIG_SRCDIR([../src/ccextractor.c])
AM_INIT_AUTOMAKE([foreign subdir-objects])
@@ -25,7 +25,7 @@ fi
# Checks for libraries.
AC_CHECK_LIB([m], [sin], [], [AC_MSG_ERROR(Math library not installed. Install it before proceeding.)])
AC_CHECK_LIB([lept], [getLeptonicaVersion], [HAS_LEPT=1 && PKG_CHECK_MODULES([lept], [lept])], [HAS_LEPT=0])
AC_CHECK_LIB([leptonica], [getLeptonicaVersion], [HAS_LEPT=1 && PKG_CHECK_MODULES([lept], [lept])], [HAS_LEPT=0])
AC_CHECK_LIB([tesseract], [TessVersion], [HAS_TESSERACT=1 && PKG_CHECK_MODULES([tesseract], [tesseract])], [HAS_TESSERACT=0])
AC_CHECK_LIB([avcodec], [avcodec_version], [HAS_AVCODEC=1 && PKG_CHECK_MODULES([libavcodec], [libavcodec])], [HAS_AVCODEC=0])
AC_CHECK_LIB([avformat], [avformat_version], [HAS_AVFORMAT=1 && PKG_CHECK_MODULES([libavformat], [libavformat])], [HAS_AVFORMAT=0])
@@ -104,7 +104,7 @@ if test "x$with_rust" = "xyes" ; then
AS_IF([test "$RUSTC" = "notfound"], [AC_MSG_ERROR([rustc is required])])
rustc_version=$(rustc --version)
MSRV="1.54.0"
MSRV="1.87.0"
AX_COMPARE_VERSION($rustc_version, [ge], [$MSRV],
[AC_MSG_RESULT(rustc >= $MSRV)],
[AC_MSG_ERROR([Minimum supported rust version(MSRV) is $MSRV, please upgrade rust])])
@@ -148,7 +148,7 @@ AS_IF([ (test x$ocr = xtrue || test x$hardsubx = xtrue) && test ! $HAS_LEPT -gt
AM_CONDITIONAL(HARDSUBX_IS_ENABLED, [ test x$hardsubx = xtrue ])
AM_CONDITIONAL(OCR_IS_ENABLED, [ test x$ocr = xtrue || test x$hardsubx = xtrue ])
AM_CONDITIONAL(FFMPEG_IS_ENABLED, [ test x$ffmpeg = xtrue ])
AM_CONDITIONAL(TESSERACT_PRESENT, [ test ! -z $(pkg-config --libs-only-l --silence-errors tesseract) ])
AM_CONDITIONAL(TESSERACT_PRESENT, [ test ! -z "$(pkg-config --libs-only-l --silence-errors tesseract)" ])
AM_CONDITIONAL(TESSERACT_PRESENT_RPI, [ test -d "/usr/include/tesseract" && test $(ls -A /usr/include/tesseract | wc -l) -gt 0 ])
AM_CONDITIONAL(SYS_IS_LINUX, [ test $(uname -s) = "Linux"])
AM_CONDITIONAL(SYS_IS_MAC, [ test $(uname -s) = "Darwin"])
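
Review note: the `TESSERACT_PRESENT` change wraps the command substitution in double quotes. A small sketch of why that matters, assuming `pkg-config` prints more than one word. The `libs` value below is a made-up example rather than real `pkg-config` output, and the behaviour described is that of bash's `test` builtin.

```shell
#!/usr/bin/env bash
libs="-ltesseract -lleptonica"   # made-up multi-word value for illustration

# Unquoted: the value is split into two words, so `test` receives
# `! -z -ltesseract -lleptonica` and exits non-zero with a syntax error.
if test ! -z $libs 2>/dev/null; then unquoted="present"; else unquoted="absent"; fi

# Quoted: `test` receives a single non-empty string, so the check succeeds.
if test ! -z "$libs"; then quoted="present"; else quoted="absent"; fi

echo "unquoted=$unquoted quoted=$quoted"
```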


@@ -1,5 +1,5 @@
pkgname=ccextractor
pkgver=0.94
pkgver=0.96.5
pkgrel=1
pkgdesc="A closed captions and teletext subtitles extractor for video streams."
arch=('i686' 'x86_64')


@@ -1,5 +1,5 @@
Name: ccextractor
Version: 0.94
Version: 0.96.5
Release: 1
Summary: A closed captions and teletext subtitles extractor for video streams.
Group: Applications/Internet


@@ -1,7 +1,7 @@
#!/bin/bash
TYPE="debian" # can be one of 'slackware', 'debian', 'rpm'
PROGRAM_NAME="ccextractor"
VERSION="0.94"
VERSION="0.96.5"
RELEASE="1"
LICENSE="GPL-2.0"
MAINTAINER="carlos@ccextractor.org"

packaging/README.md Normal file

@@ -0,0 +1,96 @@
# CCExtractor Packaging
This directory contains packaging configurations for Windows package managers.
## Windows Package Manager (winget)
### Initial Setup (One-time)
1. **Calculate MSI hash** for the current release:
```powershell
certutil -hashfile CCExtractor.0.96.1.msi SHA256
```
2. **Update the manifest files** in `winget/` with the SHA256 hash
3. **Fork microsoft/winget-pkgs** to the CCExtractor organization:
- Go to https://github.com/microsoft/winget-pkgs
- Fork to https://github.com/CCExtractor/winget-pkgs
4. **Submit initial manifest** via PR:
- Clone your fork
- Create directory: `manifests/c/CCExtractor/CCExtractor/0.96.1/`
- Copy the three YAML files from `winget/`
- Submit PR to microsoft/winget-pkgs
5. **Create GitHub token** for automation:
- Go to GitHub Settings > Developer settings > Personal access tokens > Tokens (classic)
- Create token with `public_repo` scope
- Add as secret `WINGET_TOKEN` in CCExtractor/ccextractor repository
### Automated Updates
After the initial submission is merged, the `publish_winget.yml` workflow will automatically submit PRs for new releases.
## Chocolatey
### Initial Setup (One-time)
1. **Create Chocolatey account**:
- Register at https://community.chocolatey.org/account/Register
2. **Get API key**:
- Go to https://community.chocolatey.org/account
- Copy your API key
3. **Add secret**:
- Add `CHOCOLATEY_API_KEY` secret to CCExtractor/ccextractor repository
### Package Structure
```
chocolatey/
├── ccextractor.nuspec # Package metadata
└── tools/
├── chocolateyInstall.ps1 # Installation script
└── chocolateyUninstall.ps1 # Uninstallation script
```
### Manual Testing
```powershell
cd packaging/chocolatey
# Update version and checksum in files first, then:
choco pack ccextractor.nuspec
# Test locally
choco install ccextractor --source="'.'" --yes --force
# Verify
ccextractor --version
```
### Automated Updates
The `publish_chocolatey.yml` workflow automatically:
1. Downloads the MSI from the release
2. Calculates the SHA256 checksum
3. Updates the nuspec and install script
4. Builds and tests the package
5. Pushes to Chocolatey
Note: Chocolatey packages go through moderation before being publicly available.
## Workflow Triggers
Both workflows trigger on:
- **Release published**: Automatic publishing when a new release is created
- **Manual dispatch**: Can be triggered manually with a specific tag
## Secrets Required
| Secret | Purpose |
|--------|---------|
| `WINGET_TOKEN` | GitHub PAT with `public_repo` scope for winget PRs |
| `CHOCOLATEY_API_KEY` | Chocolatey API key for package uploads |
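
The winget setup steps compute the MSI hash with `certutil` on Windows. For maintainers preparing a manifest on Linux or macOS, a roughly equivalent sketch is shown below. `sha256_upper` is a hypothetical helper name, and the sketch assumes that either `sha256sum` or `shasum` is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SHA256 of a file in uppercase hex,
# matching the style of the hashes in the winget and Chocolatey manifests.
sha256_upper() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1"
    else
        shasum -a 256 "$1"
    fi | cut -d' ' -f1 | tr 'a-f' 'A-F'
}

# Usage (file name taken from the setup step; adjust to the release in hand):
# sha256_upper CCExtractor.0.96.1.msi
```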


@@ -0,0 +1,43 @@
<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2015/06/nuspec.xsd">
<metadata>
<id>ccextractor</id>
<version>0.96.5</version>
<title>CCExtractor</title>
<authors>CCExtractor Development Team</authors>
<owners>CCExtractor</owners>
<licenseUrl>https://github.com/CCExtractor/ccextractor/blob/master/LICENSE.txt</licenseUrl>
<projectUrl>https://ccextractor.org</projectUrl>
<iconUrl>https://raw.githubusercontent.com/CCExtractor/ccextractor/master/windows/CCX.ico</iconUrl>
<requireLicenseAcceptance>false</requireLicenseAcceptance>
<description>CCExtractor is a tool that analyzes video files and produces independent subtitle files from the closed captions data.
### Features
- Extracts closed captions from various video formats (MPEG, H.264, MKV, MP4, etc.)
- Supports multiple input sources including DVDs, DVRs, and live TV captures
- Outputs to multiple formats (SRT, WebVTT, SAMI, transcript, etc.)
- OCR support for bitmap-based subtitles (DVB, teletext)
- Includes a graphical user interface
### Usage
After installation, run `ccextractor` from the command line or use the GUI.
```
ccextractor video.ts -o output.srt
```
For more options: `ccextractor --help`
</description>
<summary>Extract closed captions and subtitles from video files</summary>
<releaseNotes>https://github.com/CCExtractor/ccextractor/releases</releaseNotes>
<copyright>Copyright (c) CCExtractor Development</copyright>
<tags>subtitles closed-captions video extraction accessibility srt dvb teletext ocr media cli</tags>
<projectSourceUrl>https://github.com/CCExtractor/ccextractor</projectSourceUrl>
<packageSourceUrl>https://github.com/CCExtractor/ccextractor/tree/master/packaging/chocolatey</packageSourceUrl>
<docsUrl>https://github.com/CCExtractor/ccextractor/wiki</docsUrl>
<bugTrackerUrl>https://github.com/CCExtractor/ccextractor/issues</bugTrackerUrl>
</metadata>
<files>
<file src="tools\**" target="tools" />
</files>
</package>


@@ -0,0 +1,24 @@
$ErrorActionPreference = 'Stop'
$packageName = 'ccextractor'
$toolsDir = "$(Split-Path -parent $MyInvocation.MyCommand.Definition)"
# Package parameters
$packageArgs = @{
packageName = $packageName
fileType = 'MSI'
url64bit = 'https://github.com/CCExtractor/ccextractor/releases/download/v0.96.5/CCExtractor.0.96.5.msi'
checksum64 = 'FFCAB0D766180AFC2832277397CDEC885D15270DECE33A9A51947B790F1F095B'
checksumType64 = 'sha256'
silentArgs = '/quiet /norestart'
validExitCodes = @(0, 3010, 1641)
}
Install-ChocolateyPackage @packageArgs
# Add to PATH if not already there
$installPath = Join-Path $env:ProgramFiles 'CCExtractor'
if (Test-Path $installPath) {
Install-ChocolateyPath -PathToInstall $installPath -PathType 'Machine'
Write-Host "CCExtractor installed to: $installPath"
}


@@ -0,0 +1,23 @@
$ErrorActionPreference = 'Stop'
$packageName = 'ccextractor'
# Get the uninstall registry key
$regKey = Get-UninstallRegistryKey -SoftwareName 'CCExtractor*'
if ($regKey) {
$silentArgs = '/quiet /norestart'
$file = $regKey.UninstallString -replace 'msiexec.exe','msiexec.exe ' -replace '/I','/X'
$packageArgs = @{
packageName = $packageName
fileType = 'MSI'
silentArgs = "$($regKey.PSChildName) $silentArgs"
file = ''
validExitCodes = @(0, 3010, 1605, 1614, 1641)
}
Uninstall-ChocolateyPackage @packageArgs
} else {
Write-Warning "CCExtractor was not found in the registry. It may have been uninstalled already."
}


@@ -0,0 +1,21 @@
# yaml-language-server: $schema=https://aka.ms/winget-manifest.installer.1.9.0.schema.json
PackageIdentifier: CCExtractor.CCExtractor
PackageVersion: 0.96.5
Platform:
- Windows.Desktop
MinimumOSVersion: 10.0.0.0
InstallModes:
- interactive
- silent
- silentWithProgress
InstallerSwitches:
Silent: /quiet
SilentWithProgress: /passive
UpgradeBehavior: install
Installers:
- Architecture: x64
InstallerType: msi
InstallerUrl: https://github.com/CCExtractor/ccextractor/releases/download/v0.96.5/CCExtractor.0.96.5.msi
InstallerSha256: FFCAB0D766180AFC2832277397CDEC885D15270DECE33A9A51947B790F1F095B
ManifestType: installer
ManifestVersion: 1.9.0


@@ -0,0 +1,39 @@
# yaml-language-server: $schema=https://aka.ms/winget-manifest.defaultLocale.1.9.0.schema.json
PackageIdentifier: CCExtractor.CCExtractor
PackageVersion: 0.96.5
PackageLocale: en-US
Publisher: CCExtractor Development
PublisherUrl: https://ccextractor.org
PublisherSupportUrl: https://github.com/CCExtractor/ccextractor/issues
Author: CCExtractor Development Team
PackageName: CCExtractor
PackageUrl: https://ccextractor.org
License: GPL-2.0
LicenseUrl: https://github.com/CCExtractor/ccextractor/blob/master/LICENSE.txt
Copyright: Copyright (c) CCExtractor Development
ShortDescription: A tool to extract subtitles from video files
Description: |-
CCExtractor is a tool that analyzes video files and produces independent subtitle files from the closed captions data.
Key features:
- Extracts closed captions from various video formats (MPEG, H.264, MKV, MP4, etc.)
- Supports multiple input sources including DVDs, DVRs, and live TV captures
- Outputs to multiple formats (SRT, WebVTT, SAMI, transcript, etc.)
- OCR support for bitmap-based subtitles (DVB, teletext)
- Cross-platform (Windows, Linux, macOS)
- Includes a GUI for easy operation
Moniker: ccextractor
Tags:
- subtitles
- closed-captions
- video
- extraction
- accessibility
- srt
- dvb
- teletext
- ocr
- media
ReleaseNotesUrl: https://github.com/CCExtractor/ccextractor/releases
ManifestType: defaultLocale
ManifestVersion: 1.9.0


@@ -0,0 +1,6 @@
# yaml-language-server: $schema=https://aka.ms/winget-manifest.version.1.9.0.schema.json
PackageIdentifier: CCExtractor.CCExtractor
PackageVersion: 0.96.5
DefaultLocale: en-US
ManifestType: version
ManifestVersion: 1.9.0

snap/local/run-ccextractor.sh Executable file

@@ -0,0 +1,19 @@
#!/bin/sh
set -e
# Default fallback
LIB_TRIPLET="x86_64-linux-gnu"
# Detect multiarch directory if present
for d in "$SNAP/usr/lib/"*-linux-gnu; do
if [ -d "$d" ]; then
LIB_TRIPLET=$(basename "$d")
break
fi
done
export LD_LIBRARY_PATH="$SNAP/usr/lib:\
$SNAP/usr/lib/$LIB_TRIPLET:\
$SNAP/usr/lib/$LIB_TRIPLET/blas:\
$SNAP/usr/lib/$LIB_TRIPLET/lapack:\
$SNAP/usr/lib/$LIB_TRIPLET/pulseaudio:\
${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
shift
exec "$SNAP/usr/local/bin/ccextractor" "$@"
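
Review note: as written, the `LD_LIBRARY_PATH` assignment appears to end in a trailing `:` when the variable was previously unset, and to contain `::` when it was set, because the line before the `${LD_LIBRARY_PATH:+...}` expansion already ends with a colon. The dynamic loader treats an empty entry as the current directory. A minimal sketch of joining only non-empty entries follows; `join_paths` is a hypothetical helper that is not part of the snap, and the directories are made up.

```shell
#!/bin/sh
# Hypothetical helper: join non-empty arguments with ':' so the result
# never contains an empty path element.
join_paths() {
    joined=""
    for p in "$@"; do
        [ -n "$p" ] || continue
        joined="${joined:+$joined:}$p"
    done
    printf '%s\n' "$joined"
}

# Example with made-up directories; the empty last argument stands in
# for a previously unset LD_LIBRARY_PATH.
join_paths "/snap/x/usr/lib" "/snap/x/usr/lib/x86_64-linux-gnu" ""
```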

snap/snapcraft.yaml Normal file

@@ -0,0 +1,104 @@
name: ccextractor
base: core22
version: '0.96.5'
summary: Closed Caption Extractor
description: |
CCExtractor is a tool for extracting closed captions from video files.
website: https://www.ccextractor.org
source-code: https://github.com/CCExtractor/ccextractor
confinement: classic
apps:
ccextractor:
command: usr/local/bin/ccextractor
command-chain:
- local/run-ccextractor.sh
plugs:
- home
parts:
gpac:
plugin: make
source: https://github.com/gpac/gpac.git
source-tag: abi-16.4
build-packages:
- build-essential
- pkg-config
- zlib1g-dev
- libssl-dev
- libfreetype6-dev
- libjpeg-dev
- libpng-dev
override-build: |
set -eux
./configure --prefix=/usr
make -j$(nproc)
make DESTDIR=$SNAPCRAFT_PART_INSTALL install-lib
sed -i "s|^prefix=.*|prefix=$SNAPCRAFT_STAGE/usr|" $SNAPCRAFT_PART_INSTALL/usr/lib/pkgconfig/gpac.pc
stage:
- usr/lib/libgpac*
- usr/lib/pkgconfig/gpac.pc
- usr/include/gpac
ccextractor:
after: [gpac]
plugin: cmake
source: .
source-subdir: src
build-environment:
- PKG_CONFIG_PATH: "$SNAPCRAFT_STAGE/usr/lib/pkgconfig:$PKG_CONFIG_PATH"
build-snaps:
- cmake/latest/stable
- rustup/latest/stable
build-packages:
- build-essential
- pkg-config
- clang
- llvm-dev
- libclang-dev
- libzvbi-dev
- libtesseract-dev
- libavcodec-dev
- libavformat-dev
- libavdevice-dev
- libavfilter-dev
- libswscale-dev
- libx11-dev
- libxcb1-dev
- libxcb-shm0-dev
- libpng-dev
- zlib1g-dev
- libblas3
- liblapack3
stage-packages:
- libzvbi0
- libfreetype6
- libpng16-16
- libprotobuf-c1
- libutf8proc2
- libgl1
- libglu1-mesa
- libavcodec58
- libavformat58
- libavutil56
- libavdevice58
- libavfilter7
- libswscale5
- libjpeg-turbo8
- libvorbis0a
- libtheora0
- libxvidcore4
- libfaad2
- libmad0
- liba52-0.7.4
- libpulse0
- pulseaudio-utils
override-build: |
set -eux
rustup toolchain install stable
rustup default stable
export PATH="$HOME/.cargo/bin:$PATH"
snapcraftctl build
install -D -m 0755 \
$SNAPCRAFT_PROJECT_DIR/snap/local/run-ccextractor.sh \
$SNAPCRAFT_PART_INSTALL/local/run-ccextractor.sh


@@ -9,7 +9,7 @@ option (WITH_HARDSUBX "Build with support for burned-in subtitles" OFF)
# Version number
set (CCEXTRACTOR_VERSION_MAJOR 0)
set (CCEXTRACTOR_VERSION_MINOR 89)
set (CCEXTRACTOR_VERSION_MINOR 96)
# Get project directory
get_filename_component(BASE_PROJ_DIR ../ ABSOLUTE)
@@ -230,6 +230,14 @@ if (PKG_CONFIG_FOUND AND WITH_HARDSUBX)
set (EXTRA_INCLUDES ${EXTRA_INCLUDES} ${SWSCALE_INCLUDE_DIRS})
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DENABLE_HARDSUBX")
pkg_check_modules (TESSERACT REQUIRED tesseract)
pkg_check_modules (LEPTONICA REQUIRED lept)
set (EXTRA_LIBS ${EXTRA_LIBS} ${TESSERACT_LIBRARIES})
set (EXTRA_LIBS ${EXTRA_LIBS} ${LEPTONICA_LIBRARIES})
set (EXTRA_INCLUDES ${EXTRA_INCLUDES} ${TESSERACT_INCLUDE_DIRS})
set (EXTRA_INCLUDES ${EXTRA_INCLUDES} ${LEPTONICA_INCLUDE_DIRS})
endif (PKG_CONFIG_FOUND AND WITH_HARDSUBX)
add_executable (ccextractor ${SOURCEFILE} ${FREETYPE_SOURCE} ${UTF8PROC_SOURCE})
@@ -247,4 +255,13 @@ endif (PKG_CONFIG_FOUND)
target_link_libraries (ccextractor ${EXTRA_LIBS})
target_include_directories (ccextractor PUBLIC ${EXTRA_INCLUDES})
# ccx_rust (Rust) calls C functions from ccx (like decode_vbi).
# Force the linker to pull these symbols from ccx before processing ccx_rust.
if (NOT WIN32 AND NOT APPLE)
target_link_options (ccextractor PRIVATE
-Wl,--undefined=decode_vbi
-Wl,--undefined=do_cb
-Wl,--undefined=store_hdcc)
endif()
install (TARGETS ccextractor DESTINATION bin)


@@ -2,6 +2,8 @@
/* CCExtractor, originally by carlos at ccextractor.org, now a lot of people.
Credits: See AUTHORS.TXT
License: GPL 2.0
CI verification run: 2025-12-19T08:30 - Testing merged fixes from PRs #1847 and #1848
*/
#include "ccextractor.h"
#include <stdio.h>
@@ -184,6 +186,11 @@ int start_ccx()
ccx_options.use_gop_as_pts = 0;
if (ccx_options.ignore_pts_jumps)
ccx_common_timing_settings.disable_sync_check = 1;
// When using GOP timing (--goptime), disable sync check because
// GOP time (wall-clock) and PES PTS (stream-relative) are in
// different time bases and will always appear as huge jumps.
if (ccx_options.use_gop_as_pts == 1)
ccx_common_timing_settings.disable_sync_check = 1;
mprint("\rAnalyzing data in general mode\n");
tmp = general_loop(ctx);
if (!ret)
@@ -195,6 +202,12 @@ int start_ccx()
if (!ret)
ret = tmp;
break;
case CCX_SM_SCC:
mprint("\rAnalyzing data in SCC (Scenarist Closed Caption) mode\n");
tmp = raw_loop(ctx);
if (!ret)
ret = tmp;
break;
case CCX_SM_RCWT:
mprint("\rAnalyzing data in CCExtractor's binary format\n");
tmp = rcwt_loop(ctx);
@@ -422,6 +435,9 @@ int main(int argc, char *argv[])
int compile_ret = ccxr_parse_parameters(argc, argv);
// Update the Rust logger target after parsing so --quiet is respected
ccxr_update_logger_target();
if (compile_ret == EXIT_NO_INPUT_FILES)
{
print_usage();


@@ -29,17 +29,16 @@ CURLcode res;
extern struct ccx_s_options ccx_options;
extern struct lib_ccx_ctx *signal_ctx;
//volatile int terminate_asap = 0;
// volatile int terminate_asap = 0;
struct ccx_s_options* api_init_options();
struct ccx_s_options *api_init_options();
int api_start(struct ccx_s_options api_options);
void sigterm_handler(int sig);
void sigint_handler(int sig);
void print_end_msg(void);
int main(int argc, char *argv[]);
#endif //CCEXTRACTOR_H
#endif // CCEXTRACTOR_H


@@ -1,9 +1,9 @@
cmake_policy (SET CMP0037 NEW)
if(MSVC)
set (CMAKE_C_FLAGS "-W3 /wd4005 /wd4996")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -W3 /wd4005 /wd4996")
else (MSVC)
set (CMAKE_C_FLAGS "-Wall -Wno-pointer-sign -g -std=gnu99")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -Wno-pointer-sign -g -std=gnu99")
endif(MSVC)
if(WIN32)


@@ -2,16 +2,16 @@
#define ACTIVITY_H
extern unsigned long net_activity_gui;
void activity_header (void);
void activity_progress (int percentaje, int cur_min, int cur_sec);
void activity_report_version (void);
void activity_input_file_closed (void);
void activity_input_file_open (const char *filename);
void activity_message (const char *fmt, ...);
void activity_video_info (int hor_size,int vert_size,
const char *aspect_ratio, const char *framerate);
void activity_program_number (unsigned program_number);
void activity_header(void);
void activity_progress(int percentaje, int cur_min, int cur_sec);
void activity_report_version(void);
void activity_input_file_closed(void);
void activity_input_file_open(const char *filename);
void activity_message(const char *fmt, ...);
void activity_video_info(int hor_size, int vert_size,
const char *aspect_ratio, const char *framerate);
void activity_program_number(unsigned program_number);
void activity_library_process(enum ccx_common_logging_gui message_type, ...);
void activity_report_data_read (void);
void activity_report_data_read(void);
#endif


@@ -30,28 +30,30 @@
// 10.13 - Undocumented DVR-MS properties
#define DVRMS_PTS "\x2A\xC0\x3C\xFD\xDB\x06\xFA\x4C\x80\x1C\x72\x12\xD3\x87\x45\xE4"
typedef struct {
typedef struct
{
int VideoStreamNumber;
int AudioStreamNumber;
int CaptionStreamNumber;
int CaptionStreamStyle; // 1 = NTSC, 2 = ATSC
int DecodeStreamNumber; // The stream that is chosen to be decoded
int DecodeStreamPTS; // This will be used for the next returned block
int CaptionStreamStyle; // 1 = NTSC, 2 = ATSC
int DecodeStreamNumber; // The stream that is chosen to be decoded
int DecodeStreamPTS; // This will be used for the next returned block
int currDecodeStreamPTS; // Time of the data returned by the function
int prevDecodeStreamPTS; // Previous time
int VideoStreamMS; // See ableve, just for video
int VideoStreamMS; // See ableve, just for video
int currVideoStreamMS;
int prevVideoStreamMS;
int VideoJump; // Remember a jump in the video timeline
int VideoJump; // Remember a jump in the video timeline
} asf_data_stream_properties;
#define STREAMNUM 10
#define STREAMNUM 10
#define PAYEXTNUM 10
typedef struct {
typedef struct
{
// Generic buffer to hold data
unsigned char *parsebuf;
long parsebufsize;
int64_t parsebufsize;
// Header Object variables
int64_t HeaderObjectSize;
int64_t FileSize;
@@ -72,23 +74,23 @@ typedef struct {
uint32_t TotalDataPackets;
int VideoClosedCaptioningFlag;
// Payload data
int PayloadLType; // ASF - Payload Length Type. <>0 for multiple payloads
uint32_t PayloadLength; // ASF - Payload Length
int NumberOfPayloads; // ASF - Number of payloads.
int payloadcur; // local
int PayloadLType; // ASF - Payload Length Type. <>0 for multiple payloads
uint32_t PayloadLength; // ASF - Payload Length
int NumberOfPayloads; // ASF - Number of payloads.
int payloadcur; // local
int PayloadStreamNumber; // ASF
int KeyFrame; // ASF
int KeyFrame; // ASF
uint32_t PayloadMediaNumber; // ASF
// Data Object Loop
uint32_t datapacketcur; // Current packet number
int64_t dobjectread; // Bytes read in Data Object
uint32_t datapacketcur; // Current packet number
int64_t dobjectread; // Bytes read in Data Object
// Payload parsing information
int MultiplePayloads; // ASF
int PacketLType; // ASF
int ReplicatedLType; // ASF
int OffsetMediaLType; // ASF
int MediaNumberLType; // ASF
int StreamNumberLType; // ASF
int MultiplePayloads; // ASF
int PacketLType; // ASF
int ReplicatedLType; // ASF
int OffsetMediaLType; // ASF
int MediaNumberLType; // ASF
int StreamNumberLType; // ASF
uint32_t PacketLength;
uint32_t PaddingLength;
} asf_data;


@@ -42,14 +42,14 @@ char *gui_data_string(void *val)
{
static char sbuf[40];
sprintf(sbuf, "%08lX-%04X-%04X-",
(long)*((uint32_t *)((char *)val + 0)),
(int)*((uint16_t *)((char *)val + 4)),
(int)*((uint16_t *)((char *)val + 6)));
snprintf(sbuf, sizeof(sbuf), "%08lX-%04X-%04X-",
(long)*((uint32_t *)((char *)val + 0)),
(int)*((uint16_t *)((char *)val + 4)),
(int)*((uint16_t *)((char *)val + 6)));
for (int ii = 0; ii < 2; ii++)
sprintf(sbuf + 19 + ii * 2, "%02X-", *((unsigned char *)val + 8 + ii));
snprintf(sbuf + 19 + ii * 2, sizeof(sbuf) - 19 - ii * 2, "%02X-", *((unsigned char *)val + 8 + ii));
for (int ii = 0; ii < 6; ii++)
sprintf(sbuf + 24 + ii * 2, "%02X", *((unsigned char *)val + 10 + ii));
snprintf(sbuf + 24 + ii * 2, sizeof(sbuf) - 24 - ii * 2, "%02X", *((unsigned char *)val + 10 + ii));
return sbuf;
}
@@ -150,6 +150,10 @@ int asf_get_more_data(struct lib_ccx_ctx *ctx, struct demuxer_data **ppdata)
.StreamNumberLType = 0,
.PacketLength = 0,
.PaddingLength = 0};
// Check for allocation failure
if (!asf_data_container.parsebuf)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In asf_getmoredata: Out of memory allocating initial parse buffer.");
// Initialize the Payload Extension System
for (int stream = 0; stream < STREAMNUM; stream++)
{
@@ -185,9 +189,13 @@ int asf_get_more_data(struct lib_ccx_ctx *ctx, struct demuxer_data **ppdata)
if (asf_data_container.HeaderObjectSize > asf_data_container.parsebufsize)
{
asf_data_container.parsebuf = (unsigned char *)realloc(asf_data_container.parsebuf, (size_t)asf_data_container.HeaderObjectSize);
if (!asf_data_container.parsebuf)
unsigned char *tmp = (unsigned char *)realloc(asf_data_container.parsebuf, (size_t)asf_data_container.HeaderObjectSize);
if (!tmp)
{
free(asf_data_container.parsebuf);
fatal(EXIT_NOT_ENOUGH_MEMORY, "In asf_getmoredata: Out of memory requesting buffer for data container.");
}
asf_data_container.parsebuf = tmp;
asf_data_container.parsebufsize = (long)asf_data_container.HeaderObjectSize;
}
@@ -751,9 +759,13 @@ int asf_get_more_data(struct lib_ccx_ctx *ctx, struct demuxer_data **ppdata)
if ((long)replicated_length > asf_data_container.parsebufsize)
{
asf_data_container.parsebuf = (unsigned char *)realloc(asf_data_container.parsebuf, replicated_length);
if (!asf_data_container.parsebuf)
unsigned char *tmp = (unsigned char *)realloc(asf_data_container.parsebuf, replicated_length);
if (!tmp)
{
free(asf_data_container.parsebuf);
fatal(EXIT_NOT_ENOUGH_MEMORY, "In asf_getmoredata: Not enough memory for buffer, unable to continue.\n");
}
asf_data_container.parsebuf = tmp;
asf_data_container.parsebufsize = replicated_length;
}
result = buffered_read(ctx->demux_ctx, asf_data_container.parsebuf, (long)replicated_length);
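
The two realloc() hunks above follow the same idea: keep the result in a temporary so the original pointer is not lost when realloc() fails. A minimal sketch of that pattern, assuming a hypothetical helper `grow_buffer` (it is not part of the codebase, and it returns an error instead of calling fatal()):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only.
 * Grows *buf so it can hold at least `needed` bytes.
 * Returns 0 on success and -1 on allocation failure. On failure *buf and
 * *cap are left untouched, so the caller can still free the old block. */
static int grow_buffer(unsigned char **buf, size_t *cap, size_t needed)
{
	assert(buf != NULL && cap != NULL);

	if (needed <= *cap)
		return 0; /* already large enough, nothing to do */

	/* Never assign realloc()'s result directly to *buf: a NULL return
	 * would overwrite the only pointer to the old allocation. */
	unsigned char *tmp = (unsigned char *)realloc(*buf, needed);
	if (tmp == NULL)
		return -1;

	*buf = tmp;
	*cap = needed;
	return 0;
}
```

A caller in the style of the diff would check the return value, free the old buffer and then report the out-of-memory condition.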

@@ -48,6 +48,7 @@ struct avc_ctx *init_avc(void)
ctx->cc_databufsize = 1024;
ctx->cc_buffer_saved = CCX_TRUE; // Was the CC buffer saved after it was last updated?
ctx->is_hevc = 0;
ctx->got_seq_para = 0;
ctx->nal_ref_idc = 0;
ctx->seq_parameter_set_id = 0;
@@ -87,16 +88,43 @@ struct avc_ctx *init_avc(void)
return ctx;
}
// HEVC NAL unit types for SEI messages
#define HEVC_NAL_PREFIX_SEI 39
#define HEVC_NAL_SUFFIX_SEI 40
#define HEVC_NAL_VPS 32
#define HEVC_NAL_SPS 33
#define HEVC_NAL_PPS 34
void do_NAL(struct encoder_ctx *enc_ctx, struct lib_cc_decode *dec_ctx, unsigned char *NAL_start, LLONG NAL_length, struct cc_subtitle *sub)
{
unsigned char *NAL_stop;
enum ccx_avc_nal_types nal_unit_type = *NAL_start & 0x1F;
int nal_unit_type;
int nal_header_size;
unsigned char *payload_start;
// Determine if this is HEVC or H.264 based on NAL header
// H.264 NAL header: 1 byte, type in bits [4:0]
// HEVC NAL header: 2 bytes, type in bits [6:1] of first byte
if (dec_ctx->avc_ctx->is_hevc)
{
// HEVC: NAL type is in bits [6:1] of byte 0
nal_unit_type = (NAL_start[0] >> 1) & 0x3F;
nal_header_size = 2;
}
else
{
// H.264: NAL type is in bits [4:0] of byte 0
nal_unit_type = NAL_start[0] & 0x1F;
nal_header_size = 1;
}
NAL_stop = NAL_length + NAL_start;
NAL_stop = remove_03emu(NAL_start + 1, NAL_stop); // Add +1 to NAL_stop for TS, without it for MP4. Still don't know why
NAL_stop = remove_03emu(NAL_start + nal_header_size, NAL_stop);
payload_start = NAL_start + nal_header_size;
dvprint("BEGIN NAL unit type: %d length %d ref_idc: %d - Buffered captions before: %d\n",
nal_unit_type, NAL_stop - NAL_start - 1, dec_ctx->avc_ctx->nal_ref_idc, !dec_ctx->avc_ctx->cc_buffer_saved);
dvprint("BEGIN NAL unit type: %d length %d ref_idc: %d - Buffered captions before: %d (HEVC: %d)\n",
nal_unit_type, NAL_stop - NAL_start - nal_header_size, dec_ctx->avc_ctx->nal_ref_idc,
!dec_ctx->avc_ctx->cc_buffer_saved, dec_ctx->avc_ctx->is_hevc);
if (NAL_stop == NULL) // remove_03emu failed.
{
@@ -104,45 +132,64 @@ void do_NAL(struct encoder_ctx *enc_ctx, struct lib_cc_decode *dec_ctx, unsigned
return;
}
if (nal_unit_type == CCX_NAL_TYPE_ACCESS_UNIT_DELIMITER_9)
if (dec_ctx->avc_ctx->is_hevc)
{
// Found Access Unit Delimiter
// HEVC NAL unit processing
if (nal_unit_type == HEVC_NAL_VPS || nal_unit_type == HEVC_NAL_SPS || nal_unit_type == HEVC_NAL_PPS)
{
// Found HEVC parameter set - mark as having sequence params
// We don't parse HEVC SPS fully, but we need to enable SEI processing
dec_ctx->avc_ctx->got_seq_para = 1;
}
else if (nal_unit_type == HEVC_NAL_PREFIX_SEI || nal_unit_type == HEVC_NAL_SUFFIX_SEI)
{
// Found HEVC SEI (used for subtitles)
// SEI payload format is similar to H.264
sei_rbsp(dec_ctx->avc_ctx, payload_start, NAL_stop);
}
}
else if (nal_unit_type == CCX_NAL_TYPE_SEQUENCE_PARAMETER_SET_7)
else
{
// Found sequence parameter set
// We need this to parse NAL type 1 (CCX_NAL_TYPE_CODED_SLICE_NON_IDR_PICTURE_1)
dec_ctx->avc_ctx->num_nal_unit_type_7++;
seq_parameter_set_rbsp(dec_ctx->avc_ctx, NAL_start + 1, NAL_stop);
dec_ctx->avc_ctx->got_seq_para = 1;
}
else if (dec_ctx->avc_ctx->got_seq_para && (nal_unit_type == CCX_NAL_TYPE_CODED_SLICE_NON_IDR_PICTURE_1 ||
nal_unit_type == CCX_NAL_TYPE_CODED_SLICE_IDR_PICTURE)) // Only if nal_unit_type=1
{
// Found coded slice of a non-IDR picture
// We only need the slice header data, no need to implement
// slice_layer_without_partitioning_rbsp( );
slice_header(enc_ctx, dec_ctx, NAL_start + 1, NAL_stop, nal_unit_type, sub);
}
else if (dec_ctx->avc_ctx->got_seq_para && nal_unit_type == CCX_NAL_TYPE_SEI)
{
// Found SEI (used for subtitles)
// set_fts(ctx->timing); // FIXME - check this!!!
sei_rbsp(dec_ctx->avc_ctx, NAL_start + 1, NAL_stop);
}
else if (dec_ctx->avc_ctx->got_seq_para && nal_unit_type == CCX_NAL_TYPE_PICTURE_PARAMETER_SET)
{
// Found Picture parameter set
// H.264 NAL unit processing (original code)
if (nal_unit_type == CCX_NAL_TYPE_ACCESS_UNIT_DELIMITER_9)
{
// Found Access Unit Delimiter
}
else if (nal_unit_type == CCX_NAL_TYPE_SEQUENCE_PARAMETER_SET_7)
{
// Found sequence parameter set
// We need this to parse NAL type 1 (CCX_NAL_TYPE_CODED_SLICE_NON_IDR_PICTURE_1)
dec_ctx->avc_ctx->num_nal_unit_type_7++;
seq_parameter_set_rbsp(dec_ctx->avc_ctx, payload_start, NAL_stop);
dec_ctx->avc_ctx->got_seq_para = 1;
}
else if (dec_ctx->avc_ctx->got_seq_para && (nal_unit_type == CCX_NAL_TYPE_CODED_SLICE_NON_IDR_PICTURE_1 ||
nal_unit_type == CCX_NAL_TYPE_CODED_SLICE_IDR_PICTURE))
{
// Found coded slice of a non-IDR picture
// We only need the slice header data
slice_header(enc_ctx, dec_ctx, payload_start, NAL_stop, nal_unit_type, sub);
}
else if (dec_ctx->avc_ctx->got_seq_para && nal_unit_type == CCX_NAL_TYPE_SEI)
{
// Found SEI (used for subtitles)
sei_rbsp(dec_ctx->avc_ctx, payload_start, NAL_stop);
}
else if (dec_ctx->avc_ctx->got_seq_para && nal_unit_type == CCX_NAL_TYPE_PICTURE_PARAMETER_SET)
{
// Found Picture parameter set
}
}
if (temp_debug)
{
int len = NAL_stop - (NAL_start + 1);
int len = NAL_stop - payload_start;
dbg_print(CCX_DMT_VIDES, "\n After decoding, the actual thing was (length =%d)\n", len);
dump(CCX_DMT_VIDES, NAL_start + 1, len > 160 ? 160 : len, 0, 0);
dump(CCX_DMT_VIDES, payload_start, len > 160 ? 160 : len, 0, 0);
}
dvprint("END NAL unit type: %d length %d ref_idc: %d - Buffered captions after: %d\n",
nal_unit_type, NAL_stop - NAL_start - 1, dec_ctx->avc_ctx->nal_ref_idc, !dec_ctx->avc_ctx->cc_buffer_saved);
nal_unit_type, NAL_stop - NAL_start - nal_header_size, dec_ctx->avc_ctx->nal_ref_idc, !dec_ctx->avc_ctx->cc_buffer_saved);
}
// Process inbuf bytes in buffer holding and AVC (H.264) video stream.
@@ -332,11 +379,10 @@ void sei_rbsp(struct avc_ctx *ctx, unsigned char *seibuf, unsigned char *seiend)
}
else
{
// TODO: This really really looks bad
mprint("WARNING: Unexpected SEI unit length...trying to continue.");
temp_debug = 1;
mprint("\n Failed block (at sei_rbsp) was:\n");
dump(CCX_DMT_GENERIC_NOTICES, (unsigned char *)seibuf, seiend - seibuf, 0, 0);
// Unexpected SEI length - common with malformed streams, don't spam output
dbg_print(CCX_DMT_VERBOSE, "WARNING: Unexpected SEI unit length (parsed to %p, expected %p)...trying to continue.\n",
(void *)tbuf, (void *)(seiend - 1));
dump(CCX_DMT_VERBOSE, (unsigned char *)seibuf, seiend - seibuf, 0, 0);
ctx->num_unexpected_sei_length++;
}
@@ -346,20 +392,24 @@ void sei_rbsp(struct avc_ctx *ctx, unsigned char *seibuf, unsigned char *seiend)
unsigned char *sei_message(struct avc_ctx *ctx, unsigned char *seibuf, unsigned char *seiend)
{
int payload_type = 0;
while (*seibuf == 0xff)
while (seibuf < seiend && *seibuf == 0xff)
{
payload_type += 255;
seibuf++;
}
if (seibuf >= seiend)
return NULL;
payload_type += *seibuf;
seibuf++;
int payload_size = 0;
while (*seibuf == 0xff)
while (seibuf < seiend && *seibuf == 0xff)
{
payload_size += 255;
seibuf++;
}
if (seibuf >= seiend)
return NULL;
payload_size += *seibuf;
seibuf++;
@@ -510,9 +560,13 @@ void user_data_registered_itu_t_t35(struct avc_ctx *ctx, unsigned char *userbuf,
// Save the data and process once we know the sequence number
if (((ctx->cc_count + local_cc_count) * 3) + 1 > ctx->cc_databufsize)
{
ctx->cc_data = (unsigned char *)realloc(ctx->cc_data, (size_t)((ctx->cc_count + local_cc_count) * 6) + 1);
if (!ctx->cc_data)
unsigned char *tmp = (unsigned char *)realloc(ctx->cc_data, (size_t)((ctx->cc_count + local_cc_count) * 6) + 1);
if (!tmp)
{
free(ctx->cc_data);
fatal(EXIT_NOT_ENOUGH_MEMORY, "In user_data_registered_itu_t_t35: Out of memory to allocate buffer for CC data.");
}
ctx->cc_data = tmp;
ctx->cc_databufsize = (long)((ctx->cc_count + local_cc_count) * 6) + 1;
}
// Copy new cc data into cc_data
@@ -581,9 +635,13 @@ void user_data_registered_itu_t_t35(struct avc_ctx *ctx, unsigned char *userbuf,
// Save the data and process once we know the sequence number
if ((((local_cc_count + ctx->cc_count) * 3) + 1) > ctx->cc_databufsize)
{
ctx->cc_data = (unsigned char *)realloc(ctx->cc_data, (size_t)(((local_cc_count + ctx->cc_count) * 6) + 1));
if (!ctx->cc_data)
unsigned char *tmp = (unsigned char *)realloc(ctx->cc_data, (size_t)(((local_cc_count + ctx->cc_count) * 6) + 1));
if (!tmp)
{
free(ctx->cc_data);
fatal(EXIT_NOT_ENOUGH_MEMORY, "In user_data_registered_itu_t_t35: Not enough memory trying to allocate buffer for CC data.");
}
ctx->cc_data = tmp;
ctx->cc_databufsize = (long)(((local_cc_count + ctx->cc_count) * 6) + 1);
}
// Copy new cc data into cc_data - replace command below.
@@ -849,10 +907,10 @@ void seq_parameter_set_rbsp(struct avc_ctx *ctx, unsigned char *seqbuf, unsigned
dvprint("vcl_hrd_parameters_present_flag= %llX\n", tmp1);
if (tmp)
{
// TODO.
mprint("vcl_hrd. Not implemented for now. Hopefully not needed. Skipping rest of NAL\n");
// VCL HRD parameters are for video buffering compliance, not needed for caption extraction.
// Just skip and continue - this doesn't affect our ability to extract captions.
mprint("Skipping VCL HRD parameters (not needed for caption extraction)\n");
ctx->num_vcl_hrd++;
// exit(1);
}
if (tmp || tmp1)
{
@@ -899,6 +957,15 @@ void slice_header(struct encoder_ctx *enc_ctx, struct lib_cc_decode *dec_ctx, un
dvprint("first_mb_in_slice= % 4lld (%#llX)\n", tmp, tmp);
slice_type = read_exp_golomb_unsigned(&q1);
dvprint("slice_type= % 4llX\n", slice_type);
// Validate slice_type to prevent buffer overflow in slice_types[] array
// Valid H.264 slice_type values are 0-9 (H.264 spec Table 7-6)
if (slice_type >= 10)
{
mprint("Invalid slice_type %lld in slice header, skipping.\n", slice_type);
return;
}
tmp = read_exp_golomb_unsigned(&q1);
dvprint("pic_parameter_set_id= % 4lld (%#llX)\n", tmp, tmp);
@@ -929,9 +996,9 @@ void slice_header(struct encoder_ctx *enc_ctx, struct lib_cc_decode *dec_ctx, un
if (nal_unit_type == 5)
{
// idr_pic_id: Read to advance bitstream position; value not needed for caption extraction
tmp = read_exp_golomb_unsigned(&q1);
dvprint("idr_pic_id= % 4lld (%#llX)\n", tmp, tmp);
// TODO
}
if (dec_ctx->avc_ctx->pic_order_cnt_type == 0)
{
@@ -1038,7 +1105,12 @@ void slice_header(struct encoder_ctx *enc_ctx, struct lib_cc_decode *dec_ctx, un
}
// if slices are buffered - flush
if (isref)
// For I/P-only streams (like HDHomeRun recordings), flushing on every
// reference frame defeats reordering since all frames are reference frames.
// Only flush and reset on IDR frames (nal_unit_type==5), not P-frames.
// This allows P-frames to accumulate in the buffer and be sorted by PTS.
int is_idr = (nal_unit_type == CCX_NAL_TYPE_CODED_SLICE_IDR_PICTURE);
if (isref && is_idr)
{
dvprint("\nReference pic! [%s]\n", slice_types[slice_type]);
dbg_print(CCX_DMT_TIME, "\nReference pic! [%s] maxrefcnt: %3d\n",
@@ -1133,8 +1205,32 @@ void slice_header(struct encoder_ctx *enc_ctx, struct lib_cc_decode *dec_ctx, un
if (abs(current_index) >= MAXBFRAMES)
{
// Probably a jump in the timeline. Warn and handle gracefully.
mprint("\nFound large gap(%d) in PTS! Trying to recover ...\n", current_index);
// Large PTS gap detected. This can happen with certain encoders
// (like HDHomeRun) that produce streams where PTS jumps are common.
// Instead of just resetting current_index to 0 (which causes captions
// to pile up at the same buffer slot and become garbled), we need to:
// 1. Flush any buffered captions
// 2. Reset the reference PTS to the current PTS
// 3. Set current_index to 0 for a fresh start
// This ensures subsequent frames use the new reference point.
dbg_print(CCX_DMT_VERBOSE, "\nLarge PTS gap(%d) detected, flushing buffer and resetting reference.\n", current_index);
// Flush any buffered captions before resetting
if (dec_ctx->has_ccdata_buffered)
{
process_hdcc(enc_ctx, dec_ctx, sub);
}
// Reset the reference point to current PTS
dec_ctx->avc_ctx->currefpts = dec_ctx->timing->current_pts;
// Reset tracking variables for the new reference
dec_ctx->avc_ctx->lastmaxidx = -1;
dec_ctx->avc_ctx->maxidx = 0;
dec_ctx->avc_ctx->lastminidx = 10000;
dec_ctx->avc_ctx->minidx = 10000;
// Start with index 0 relative to the new reference
current_index = 0;
}
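
The do_NAL() and sei_message() hunks above rely on two small parsing rules: where the NAL unit type sits in the header for H.264 versus HEVC, and how the 0xFF-extended SEI payload type/size fields are read without running past the buffer. A minimal standalone sketch of both, assuming hypothetical helper names (`nal_type`, `read_sei_value`) and treating `end` as one past the last readable byte:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the NAL unit type and stores the header
 * size in bytes. H.264 uses a 1-byte header with the type in bits [4:0];
 * HEVC uses a 2-byte header with the type in bits [6:1] of byte 0. */
static int nal_type(const unsigned char *nal, int is_hevc, int *header_size)
{
	assert(nal != NULL && header_size != NULL);
	if (is_hevc)
	{
		*header_size = 2;
		return (nal[0] >> 1) & 0x3F;
	}
	*header_size = 1;
	return nal[0] & 0x1F;
}

/* Hypothetical helper: reads one SEI payload_type or payload_size field.
 * Each 0xFF byte adds 255; the first non-0xFF byte is added and ends the
 * field. Returns a pointer just past the field, or NULL if the buffer
 * ends before the terminating byte is found. */
static const unsigned char *read_sei_value(const unsigned char *p,
					   const unsigned char *end, int *value)
{
	assert(p != NULL && end != NULL && value != NULL);
	int v = 0;
	while (p < end && *p == 0xFF)
	{
		v += 255;
		p++;
	}
	if (p >= end)
		return NULL; /* truncated field */
	v += *p++;
	*value = v;
	return p;
}
```

For example, a first byte of 0x4E in HEVC mode yields type 39 (prefix SEI), while 0x06 in H.264 mode yields type 6 (SEI).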

@@ -1,15 +1,15 @@
#ifndef AVC_FUNCTION_H
#define AVC_FUNCTION_H
struct avc_ctx
{
unsigned char cc_count;
// buffer to hold cc data
unsigned char *cc_data;
long cc_databufsize;
int64_t cc_databufsize;
int cc_buffer_saved; // Was the CC buffer saved after it was last updated?
int is_hevc; // Flag to indicate HEVC (H.265) mode vs H.264
int got_seq_para;
unsigned nal_ref_idc;
LLONG seq_parameter_set_id;
@@ -19,11 +19,11 @@ struct avc_ctx
int frame_mbs_only_flag;
// Use and throw stats for debug, remove this ugliness soon
long num_nal_unit_type_7;
long num_vcl_hrd;
long num_nal_hrd;
long num_jump_in_frames;
long num_unexpected_sei_length;
int64_t num_nal_unit_type_7;
int64_t num_vcl_hrd;
int64_t num_nal_hrd;
int64_t num_jump_in_frames;
int64_t num_unexpected_sei_length;
int ccblocks_in_avc_total;
int ccblocks_in_avc_lost;
@@ -50,6 +50,6 @@ struct avc_ctx
struct avc_ctx *init_avc(void);
void dinit_avc(struct avc_ctx **ctx);
void do_NAL (struct encoder_ctx *enc_ctx, struct lib_cc_decode *ctx, unsigned char *NAL_start, LLONG NAL_length, struct cc_subtitle *sub);
void do_NAL(struct encoder_ctx *enc_ctx, struct lib_cc_decode *ctx, unsigned char *NAL_start, LLONG NAL_length, struct cc_subtitle *sub);
size_t process_avc(struct encoder_ctx *enc_ctx, struct lib_cc_decode *ctx, unsigned char *avcbuf, size_t avcbuflen, struct cc_subtitle *sub);
#endif

@@ -26,26 +26,25 @@ struct bitstream
int _i_bpos;
};
#define read_u8(bstream) (uint8_t)bitstream_get_num(bstream,1,1)
#define read_u16(bstream) (uint16_t)bitstream_get_num(bstream,2,1)
#define read_u32(bstream) (uint32_t)bitstream_get_num(bstream,4,1)
#define read_u64(bstream) (uint64_t)bitstream_get_num(bstream,8,1)
#define read_i8(bstream) (int8_t)bitstream_get_num(bstream,1,1)
#define read_i16(bstream) (int16_t)bitstream_get_num(bstream,2,1)
#define read_i32(bstream) (int32_t)bitstream_get_num(bstream,4,1)
#define read_i64(bstream) (int64_t)bitstream_get_num(bstream,8,1)
#define read_u8(bstream) (uint8_t)bitstream_get_num(bstream, 1, 1)
#define read_u16(bstream) (uint16_t)bitstream_get_num(bstream, 2, 1)
#define read_u32(bstream) (uint32_t)bitstream_get_num(bstream, 4, 1)
#define read_u64(bstream) (uint64_t)bitstream_get_num(bstream, 8, 1)
#define read_i8(bstream) (int8_t)bitstream_get_num(bstream, 1, 1)
#define read_i16(bstream) (int16_t)bitstream_get_num(bstream, 2, 1)
#define read_i32(bstream) (int32_t)bitstream_get_num(bstream, 4, 1)
#define read_i64(bstream) (int64_t)bitstream_get_num(bstream, 8, 1)
#define skip_u32(bstream) (void)bitstream_get_num(bstream,4,1)
#define next_u8(bstream) (uint8_t)bitstream_get_num(bstream,1,0)
#define next_u16(bstream) (uint16_t)bitstream_get_num(bstream,2,0)
#define next_u32(bstream) (uint32_t)bitstream_get_num(bstream,4,0)
#define next_u64(bstream) (uint64_t)bitstream_get_num(bstream,8,0)
#define next_i8(bstream) (int8_t)bitstream_get_num(bstream,1,0)
#define next_i16(bstream) (int16_t)bitstream_get_num(bstream,2,0)
#define next_i32(bstream) (int32_t)bitstream_get_num(bstream,4,0)
#define next_i64(bstream) (int64_t)bitstream_get_num(bstream,8,0)
#define skip_u32(bstream) (void)bitstream_get_num(bstream, 4, 1)
#define next_u8(bstream) (uint8_t)bitstream_get_num(bstream, 1, 0)
#define next_u16(bstream) (uint16_t)bitstream_get_num(bstream, 2, 0)
#define next_u32(bstream) (uint32_t)bitstream_get_num(bstream, 4, 0)
#define next_u64(bstream) (uint64_t)bitstream_get_num(bstream, 8, 0)
#define next_i8(bstream) (int8_t)bitstream_get_num(bstream, 1, 0)
#define next_i16(bstream) (int16_t)bitstream_get_num(bstream, 2, 0)
#define next_i32(bstream) (int32_t)bitstream_get_num(bstream, 4, 0)
#define next_i64(bstream) (int64_t)bitstream_get_num(bstream, 8, 0)
int init_bitstream(struct bitstream *bstr, unsigned char *start, unsigned char *end);
uint64_t next_bits(struct bitstream *bstr, unsigned bnum);

@@ -10,47 +10,47 @@
<100 means display whatever was output to stderr as a warning
>=100 means display whatever was output to stdout as an error
*/
#define EXIT_OK 0
#define EXIT_NO_INPUT_FILES 2
#define EXIT_TOO_MANY_INPUT_FILES 3
#define EXIT_INCOMPATIBLE_PARAMETERS 4
#define EXIT_UNABLE_TO_DETERMINE_FILE_SIZE 6
#define EXIT_MALFORMED_PARAMETER 7
#define EXIT_READ_ERROR 8
#define EXIT_NO_CAPTIONS 10
#define EXIT_WITH_HELP 11
#define EXIT_NOT_CLASSIFIED 300
#define EXIT_ERROR_IN_CAPITALIZATION_FILE 501
#define EXIT_BUFFER_FULL 502
#define EXIT_MISSING_ASF_HEADER 1001
#define EXIT_MISSING_RCWT_HEADER 1002
#define EXIT_OK 0
#define EXIT_NO_INPUT_FILES 2
#define EXIT_TOO_MANY_INPUT_FILES 3
#define EXIT_INCOMPATIBLE_PARAMETERS 4
#define EXIT_UNABLE_TO_DETERMINE_FILE_SIZE 6
#define EXIT_MALFORMED_PARAMETER 7
#define EXIT_READ_ERROR 8
#define EXIT_NO_CAPTIONS 10
#define EXIT_WITH_HELP 11
#define EXIT_NOT_CLASSIFIED 300
#define EXIT_ERROR_IN_CAPITALIZATION_FILE 501
#define EXIT_BUFFER_FULL 502
#define EXIT_MISSING_ASF_HEADER 1001
#define EXIT_MISSING_RCWT_HEADER 1002
#define CCX_COMMON_EXIT_FILE_CREATION_FAILED 5
#define CCX_COMMON_EXIT_UNSUPPORTED 9
#define EXIT_NOT_ENOUGH_MEMORY 500
#define CCX_COMMON_EXIT_BUG_BUG 1000
#define CCX_COMMON_EXIT_FILE_CREATION_FAILED 5
#define CCX_COMMON_EXIT_UNSUPPORTED 9
#define EXIT_NOT_ENOUGH_MEMORY 500
#define CCX_COMMON_EXIT_BUG_BUG 1000
#define CCX_OK 0
#define CCX_FALSE 0
#define CCX_TRUE 1
#define CCX_EAGAIN -100
#define CCX_EOF -101
#define CCX_EINVAL -102
#define CCX_OK 0
#define CCX_FALSE 0
#define CCX_TRUE 1
#define CCX_EAGAIN -100
#define CCX_EOF -101
#define CCX_EINVAL -102
#define CCX_ENOSUPP -103
#define CCX_ENOMEM -104
#define CCX_ENOMEM -104
// Declarations
int cc608_parity(unsigned int byte);
int fdprintf(int fd, const char *fmt, ...);
void millis_to_time(LLONG milli, unsigned *hours, unsigned *minutes,unsigned *seconds, unsigned *ms);
void millis_to_time(LLONG milli, unsigned *hours, unsigned *minutes, unsigned *seconds, unsigned *ms);
extern void ccxr_millis_to_time(LLONG milli, unsigned *hours, unsigned *minutes,unsigned *seconds, unsigned *ms);
extern void ccxr_millis_to_time(LLONG milli, unsigned *hours, unsigned *minutes, unsigned *seconds, unsigned *ms);
void freep(void *arg);
void dbg_print(LLONG mask, const char *fmt, ...);
unsigned char *debug_608_to_ASC(unsigned char *ccdata, int channel);
int add_cc_sub_text(struct cc_subtitle *sub, char *str, LLONG start_time,
LLONG end_time, char *info, char *mode, enum ccx_encoding_type);
LLONG end_time, char *info, char *mode, enum ccx_encoding_type);
extern int cc608_parity_table[256]; // From myth
#endif

@@ -17,8 +17,8 @@ const unsigned char UTF8_BOM[] = {0xef, 0xbb, 0xbf};
const unsigned char DVD_HEADER[8] = {0x00, 0x00, 0x01, 0xb2, 0x43, 0x43, 0x01, 0xf8};
const unsigned char lc1[1] = {0x8a};
const unsigned char lc2[1] = {0x8f};
const unsigned char lc3[2] = {0x16, 0xfe};
const unsigned char lc4[2] = {0x1e, 0xfe};
const unsigned char lc3[1] = {0x16}; // McPoodle uses single-byte loop markers
const unsigned char lc4[1] = {0x1e};
const unsigned char lc5[1] = {0xff};
const unsigned char lc6[1] = {0xfe};

@@ -22,42 +22,42 @@ extern const unsigned char UTF8_BOM[3];
extern const unsigned char DVD_HEADER[8];
extern const unsigned char lc1[1];
extern const unsigned char lc2[1];
extern const unsigned char lc3[2];
extern const unsigned char lc4[2];
extern const unsigned char lc3[1];
extern const unsigned char lc4[1];
extern const unsigned char lc5[1];
extern const unsigned char lc6[1];
extern unsigned char rcwt_header[11];
#define ONEPASS 120 /* Bytes we can always look ahead without going out of limits */
#define BUFSIZE (2048*1024+ONEPASS) /* 2 Mb plus the safety pass */
#define ONEPASS 120 /* Bytes we can always look ahead without going out of limits */
#define BUFSIZE (2048 * 1024 + ONEPASS) /* 2 Mb plus the safety pass */
#define MAX_CLOSED_CAPTION_DATA_PER_PICTURE 32
#define EIA_708_BUFFER_LENGTH 2048 // TODO: Find out what the real limit is
#define TS_PACKET_PAYLOAD_LENGTH 184 // From specs
#define SUBLINESIZE 2048 // Max. length of a .srt line - TODO: Get rid of this
#define STARTBYTESLENGTH (1024*1024)
#define EIA_708_BUFFER_LENGTH 2048 // TODO: Find out what the real limit is
#define TS_PACKET_PAYLOAD_LENGTH 184 // From specs
#define SUBLINESIZE 2048 // Max. length of a .srt line - TODO: Get rid of this
#define STARTBYTESLENGTH (1024 * 1024)
#define UTF8_MAX_BYTES 6
#define XMLRPC_CHUNK_SIZE (64*1024) // 64 Kb per chunk, to avoid too many realloc()
#define XMLRPC_CHUNK_SIZE (64 * 1024) // 64 Kb per chunk, to avoid too many realloc()
enum ccx_debug_message_types
{
/* Each debug message now belongs to one of these types. Use bitmaps in case
we want one message to belong to more than one type. */
CCX_DMT_PARSE = 1, // Show information related to parsing the container
CCX_DMT_VIDES = 2, // Show video stream related information
CCX_DMT_TIME = 4, // Show GOP and PTS timing information
CCX_DMT_VERBOSE = 8, // Show lots of debugging output
CCX_DMT_DECODER_608 = 0x10, // Show CC-608 decoder debug?
CCX_DMT_708 = 0x20, // Show CC-708 decoder debug?
CCX_DMT_DECODER_XDS = 0x40, // Show XDS decoder debug?
CCX_DMT_CBRAW = 0x80, // Caption blocks with FTS timing
CCX_DMT_PARSE = 1, // Show information related to parsing the container
CCX_DMT_VIDES = 2, // Show video stream related information
CCX_DMT_TIME = 4, // Show GOP and PTS timing information
CCX_DMT_VERBOSE = 8, // Show lots of debugging output
CCX_DMT_DECODER_608 = 0x10, // Show CC-608 decoder debug?
CCX_DMT_708 = 0x20, // Show CC-708 decoder debug?
CCX_DMT_DECODER_XDS = 0x40, // Show XDS decoder debug?
CCX_DMT_CBRAW = 0x80, // Caption blocks with FTS timing
CCX_DMT_GENERIC_NOTICES = 0x100, // Generic, always displayed even if no debug is selected
CCX_DMT_TELETEXT = 0x200, // Show teletext debug?
CCX_DMT_PAT = 0x400, // Program Allocation Table dump
CCX_DMT_PMT = 0x800, // Program Map Table dump
CCX_DMT_LEVENSHTEIN = 0x1000, // Levenshtein distance calculations
CCX_DMT_DVB = 0x2000, // DVB
CCX_DMT_DUMPDEF = 0x4000 // Dump defective TS packets
CCX_DMT_TELETEXT = 0x200, // Show teletext debug?
CCX_DMT_PAT = 0x400, // Program Allocation Table dump
CCX_DMT_PMT = 0x800, // Program Map Table dump
CCX_DMT_LEVENSHTEIN = 0x1000, // Levenshtein distance calculations
CCX_DMT_DVB = 0x2000, // DVB
CCX_DMT_DUMPDEF = 0x4000 // Dump defective TS packets
};
// AVC NAL types
@@ -101,95 +101,95 @@ enum ccx_avc_nal_types
enum ccx_stream_type
{
CCX_STREAM_TYPE_UNKNOWNSTREAM = 0,
/*
/*
The later constants are defined by MPEG-TS standard
Explore at: https://exiftool.org/TagNames/M2TS.html
Explore at: https://exiftool.org/TagNames/M2TS.html
*/
CCX_STREAM_TYPE_VIDEO_MPEG1 = 0x01,
CCX_STREAM_TYPE_VIDEO_MPEG2 = 0x02,
CCX_STREAM_TYPE_AUDIO_MPEG1 = 0x03,
CCX_STREAM_TYPE_AUDIO_MPEG2 = 0x04,
CCX_STREAM_TYPE_PRIVATE_TABLE_MPEG2 = 0x05,
CCX_STREAM_TYPE_PRIVATE_MPEG2 = 0x06,
CCX_STREAM_TYPE_MHEG_PACKETS = 0x07,
CCX_STREAM_TYPE_MPEG2_ANNEX_A_DSM_CC = 0x08,
CCX_STREAM_TYPE_ITU_T_H222_1 = 0x09,
CCX_STREAM_TYPE_VIDEO_MPEG1 = 0x01,
CCX_STREAM_TYPE_VIDEO_MPEG2 = 0x02,
CCX_STREAM_TYPE_AUDIO_MPEG1 = 0x03,
CCX_STREAM_TYPE_AUDIO_MPEG2 = 0x04,
CCX_STREAM_TYPE_PRIVATE_TABLE_MPEG2 = 0x05,
CCX_STREAM_TYPE_PRIVATE_MPEG2 = 0x06,
CCX_STREAM_TYPE_MHEG_PACKETS = 0x07,
CCX_STREAM_TYPE_MPEG2_ANNEX_A_DSM_CC = 0x08,
CCX_STREAM_TYPE_ITU_T_H222_1 = 0x09,
CCX_STREAM_TYPE_ISO_IEC_13818_6_TYPE_A = 0x0A,
CCX_STREAM_TYPE_ISO_IEC_13818_6_TYPE_B = 0x0B,
CCX_STREAM_TYPE_ISO_IEC_13818_6_TYPE_C = 0x0C,
CCX_STREAM_TYPE_ISO_IEC_13818_6_TYPE_D = 0x0D,
CCX_STREAM_TYPE_AUDIO_AAC = 0x0f,
CCX_STREAM_TYPE_VIDEO_MPEG4 = 0x10,
CCX_STREAM_TYPE_VIDEO_H264 = 0x1b,
CCX_STREAM_TYPE_PRIVATE_USER_MPEG2 = 0x80,
CCX_STREAM_TYPE_AUDIO_AC3 = 0x81,
CCX_STREAM_TYPE_AUDIO_HDMV_DTS = 0x82,
CCX_STREAM_TYPE_AUDIO_DTS = 0x8a
CCX_STREAM_TYPE_AUDIO_AAC = 0x0f,
CCX_STREAM_TYPE_VIDEO_MPEG4 = 0x10,
CCX_STREAM_TYPE_VIDEO_H264 = 0x1b,
CCX_STREAM_TYPE_VIDEO_HEVC = 0x24,
CCX_STREAM_TYPE_PRIVATE_USER_MPEG2 = 0x80,
CCX_STREAM_TYPE_AUDIO_AC3 = 0x81,
CCX_STREAM_TYPE_AUDIO_HDMV_DTS = 0x82,
CCX_STREAM_TYPE_AUDIO_DTS = 0x8a
};
enum ccx_mpeg_descriptor
{
/*
/*
The later constants are defined by ETSI EN 300 468 standard
Explore at: https://www.etsi.org/deliver/etsi_en/300400_300499/300468/01.11.01_60/en_300468v011101p.pdf
Explore at: https://www.etsi.org/deliver/etsi_en/300400_300499/300468/01.11.01_60/en_300468v011101p.pdf
*/
CCX_MPEG_DSC_REGISTRATION = 0x05,
CCX_MPEG_DSC_DATA_STREAM_ALIGNMENT = 0x06,
CCX_MPEG_DSC_ISO639_LANGUAGE = 0x0A,
CCX_MPEG_DSC_VBI_DATA_DESCRIPTOR = 0x45,
CCX_MPEG_DSC_REGISTRATION = 0x05,
CCX_MPEG_DSC_DATA_STREAM_ALIGNMENT = 0x06,
CCX_MPEG_DSC_ISO639_LANGUAGE = 0x0A,
CCX_MPEG_DSC_VBI_DATA_DESCRIPTOR = 0x45,
CCX_MPEG_DSC_VBI_TELETEXT_DESCRIPTOR = 0x46,
CCX_MPEG_DSC_TELETEXT_DESCRIPTOR = 0x56,
CCX_MPEG_DSC_DVB_SUBTITLE = 0x59,
CCX_MPEG_DSC_TELETEXT_DESCRIPTOR = 0x56,
CCX_MPEG_DSC_DVB_SUBTITLE = 0x59,
/* User defined */
CCX_MPEG_DSC_CAPTION_SERVICE = 0x86,
CCX_MPEG_DESC_DATA_COMP = 0xfd // Consider to change DESC to DSC
CCX_MPEG_DSC_CAPTION_SERVICE = 0x86,
CCX_MPEG_DESC_DATA_COMP = 0xfd // Consider to change DESC to DSC
};
enum
{
CCX_MESSAGES_QUIET = 0,
CCX_MESSAGES_QUIET = 0,
CCX_MESSAGES_STDOUT = 1,
CCX_MESSAGES_STDERR = 2
};
enum ccx_datasource
{
CCX_DS_FILE = 0,
CCX_DS_STDIN = 1,
CCX_DS_FILE = 0,
CCX_DS_STDIN = 1,
CCX_DS_NETWORK = 2,
CCX_DS_TCP = 3
CCX_DS_TCP = 3
};
enum ccx_output_format
{
CCX_OF_RAW = 0,
CCX_OF_SRT = 1,
CCX_OF_SAMI = 2,
CCX_OF_RAW = 0,
CCX_OF_SRT = 1,
CCX_OF_SAMI = 2,
CCX_OF_TRANSCRIPT = 3,
CCX_OF_RCWT = 4,
CCX_OF_NULL = 5,
CCX_OF_SMPTETT = 6,
CCX_OF_SPUPNG = 7,
CCX_OF_DVDRAW = 8, // See -d at http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_TOOLS.HTML#CCExtract
CCX_OF_WEBVTT = 9,
CCX_OF_RCWT = 4,
CCX_OF_NULL = 5,
CCX_OF_SMPTETT = 6,
CCX_OF_SPUPNG = 7,
CCX_OF_DVDRAW = 8, // See -d at http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_TOOLS.HTML#CCExtract
CCX_OF_WEBVTT = 9,
CCX_OF_SIMPLE_XML = 10,
CCX_OF_G608 = 11,
CCX_OF_CURL = 12,
CCX_OF_SSA = 13,
CCX_OF_MCC = 14,
CCX_OF_SCC = 15,
CCX_OF_CCD = 16,
CCX_OF_G608 = 11,
CCX_OF_CURL = 12,
CCX_OF_SSA = 13,
CCX_OF_MCC = 14,
CCX_OF_SCC = 15,
CCX_OF_CCD = 16,
};
enum ccx_output_date_format
{
ODF_NONE = 0,
ODF_HHMMSS = 1,
ODF_SECONDS = 2,
ODF_DATE = 3,
ODF_HHMMSSMS = 4 // HH:MM:SS,MILIS (.srt style)
ODF_NONE = 0,
ODF_HHMMSS = 1,
ODF_SECONDS = 2,
ODF_DATE = 3,
ODF_HHMMSSMS = 4 // HH:MM:SS,MILIS (.srt style)
};
enum ccx_stream_mode_enum
@@ -199,9 +199,9 @@ enum ccx_stream_mode_enum
CCX_SM_PROGRAM = 2,
CCX_SM_ASF = 3,
CCX_SM_MCPOODLESRAW = 4,
CCX_SM_RCWT = 5, // Raw Captions With Time, not used yet.
CCX_SM_MYTH = 6, // Use the myth loop
CCX_SM_MP4 = 7, // MP4, ISO-
CCX_SM_RCWT = 5, // Raw Captions With Time, not used yet.
CCX_SM_MYTH = 6, // Use the myth loop
CCX_SM_MP4 = 7, // MP4, ISO-
#ifdef WTV_DEBUG
CCX_SM_HEX_DUMP = 8, // Hexadecimal dump generated by wtvccdump
#endif
@@ -212,6 +212,7 @@ enum ccx_stream_mode_enum
CCX_SM_GXF = 11,
CCX_SM_MKV = 12,
CCX_SM_MXF = 13,
CCX_SM_SCC = 14, // Scenarist Closed Caption input
CCX_SM_AUTODETECT = 16
};
@@ -220,8 +221,8 @@ enum ccx_encoding_type
{
CCX_ENC_UNICODE = 0,
CCX_ENC_LATIN_1 = 1,
CCX_ENC_UTF_8 = 2,
CCX_ENC_ASCII = 3
CCX_ENC_UTF_8 = 2,
CCX_ENC_ASCII = 3
};
enum ccx_bufferdata_type
@@ -237,7 +238,8 @@ enum ccx_bufferdata_type
CCX_ISDB_SUBTITLE = 8,
/* BUffer where cc data contain 3 byte cc_valid ccdata 1 ccdata 2 */
CCX_RAW_TYPE = 9,
CCX_DVD_SUBTITLE = 10
CCX_DVD_SUBTITLE = 10,
CCX_HEVC = 11
};
enum ccx_frame_type
@@ -249,32 +251,33 @@ enum ccx_frame_type
CCX_FRAME_TYPE_D_FRAME = 4
};
typedef enum {
NO = 0,
typedef enum
{
NO = 0,
YES = 1,
UNDEFINED = 0xff
} bool_t;
enum ccx_code_type
{
CCX_CODEC_ANY = 0,
CCX_CODEC_ANY = 0,
CCX_CODEC_TELETEXT = 1,
CCX_CODEC_DVB = 2,
CCX_CODEC_ISDB_CC = 3,
CCX_CODEC_ATSC_CC = 4,
CCX_CODEC_NONE = 5
CCX_CODEC_DVB = 2,
CCX_CODEC_ISDB_CC = 3,
CCX_CODEC_ATSC_CC = 4,
CCX_CODEC_NONE = 5
};
/* Caption Distribution Packet */
enum cdp_section_type
{
/*
/*
The later constants are defined by SMPTE ST 334
Purchase for 80$ at: https://ieeexplore.ieee.org/document/8255806
*/
CDP_SECTION_DATA = 0x72,
CDP_SECTION_DATA = 0x72,
CDP_SECTION_SVC_INFO = 0x73,
CDP_SECTION_FOOTER = 0x74
CDP_SECTION_FOOTER = 0x74
};
/*
@@ -287,9 +290,9 @@ enum cdp_section_type
*
*/
#define IS_VALID_TELETEXT_DESC(desc) ( ((desc) == CCX_MPEG_DSC_VBI_DATA_DESCRIPTOR )|| \
( (desc) == CCX_MPEG_DSC_VBI_TELETEXT_DESCRIPTOR ) || \
( (desc) == CCX_MPEG_DSC_TELETEXT_DESCRIPTOR ) )
#define IS_VALID_TELETEXT_DESC(desc) (((desc) == CCX_MPEG_DSC_VBI_DATA_DESCRIPTOR) || \
((desc) == CCX_MPEG_DSC_VBI_TELETEXT_DESCRIPTOR) || \
((desc) == CCX_MPEG_DSC_TELETEXT_DESCRIPTOR))
/*
* This macro to be used when you want to find out whether you
@@ -308,19 +311,19 @@ enum cdp_section_type
* @param f_sel pass the codec name whom you are testing to be feasible
* to parse.
*/
#define IS_FEASIBLE(u_sel,u_nsel,f_sel) ( ( (u_sel) == CCX_CODEC_ANY && (u_nsel) != (f_sel) ) || (u_sel) == (f_sel) )
#define CCX_TXT_FORBIDDEN 0 // Ignore teletext packets
#define CCX_TXT_AUTO_NOT_YET_FOUND 1
#define CCX_TXT_IN_USE 2 // Positive auto-detected, or forced, etc
#define IS_FEASIBLE(u_sel, u_nsel, f_sel) (((u_sel) == CCX_CODEC_ANY && (u_nsel) != (f_sel)) || (u_sel) == (f_sel))
#define CCX_TXT_FORBIDDEN 0 // Ignore teletext packets
#define CCX_TXT_AUTO_NOT_YET_FOUND 1
#define CCX_TXT_IN_USE 2 // Positive auto-detected, or forced, etc
#define NB_LANGUAGE 100
extern const char *language[NB_LANGUAGE];
#define DEF_VAL_STARTCREDITSNOTBEFORE "0"
#define DEF_VAL_STARTCREDITSNOTBEFORE "0"
// To catch the theme after the teaser in TV shows
#define DEF_VAL_STARTCREDITSNOTAFTER "5:00"
#define DEF_VAL_STARTCREDITSFORATLEAST "2"
#define DEF_VAL_STARTCREDITSFORATMOST "5"
#define DEF_VAL_ENDCREDITSFORATLEAST "2"
#define DEF_VAL_ENDCREDITSFORATMOST "5"
#define DEF_VAL_STARTCREDITSNOTAFTER "5:00"
#define DEF_VAL_STARTCREDITSFORATLEAST "2"
#define DEF_VAL_STARTCREDITSFORATMOST "5"
#define DEF_VAL_ENDCREDITSFORATLEAST "2"
#define DEF_VAL_ENDCREDITSFORATMOST "5"
#endif
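
The IS_FEASIBLE macro in the header above combines the user's selected codec and excluded codec into one test. A minimal sketch of how it evaluates, with the enum and macro copied from the diff and a hypothetical wrapper `can_parse` added only for illustration (the assertion that the candidate is a concrete codec is an assumption of this sketch, not a rule stated in the source):

```c
#include <assert.h>

enum ccx_code_type
{
	CCX_CODEC_ANY = 0,
	CCX_CODEC_TELETEXT = 1,
	CCX_CODEC_DVB = 2,
	CCX_CODEC_ISDB_CC = 3,
	CCX_CODEC_ATSC_CC = 4,
	CCX_CODEC_NONE = 5
};

#define IS_FEASIBLE(u_sel, u_nsel, f_sel) (((u_sel) == CCX_CODEC_ANY && (u_nsel) != (f_sel)) || (u_sel) == (f_sel))

/* Hypothetical wrapper: returns 1 if `candidate` may be parsed given the
 * user's selected codec (user_sel) and excluded codec (user_nsel). */
static int can_parse(enum ccx_code_type user_sel, enum ccx_code_type user_nsel,
		     enum ccx_code_type candidate)
{
	/* Sketch assumption: the candidate is a concrete codec. */
	assert(candidate != CCX_CODEC_ANY && candidate != CCX_CODEC_NONE);
	return IS_FEASIBLE(user_sel, user_nsel, candidate);
}
```

With no explicit selection (CCX_CODEC_ANY) every codec is feasible except the excluded one; with an explicit selection only that codec is feasible.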

@@ -73,7 +73,9 @@ void init_options(struct ccx_s_options *options)
options->ocrlang = NULL; // By default, autodetect .traineddata file
options->ocr_oem = -1; // By default, OEM mode depends on the tesseract version
options->psm = 3; // Default PSM mode (3 is the default tesseract as well)
options->ocr_quantmode = 1; // CCExtractor's internal
options->ocr_quantmode = 0; // No quantization (better OCR accuracy for DVB subtitles)
options->ocr_line_split = 0; // By default, don't split images into lines (pending testing)
options->ocr_blacklist = 1; // By default, use character blacklist to prevent common OCR errors (| vs I, etc.)
options->mkvlang = NULL; // By default, all the languages are extracted
options->ignore_pts_jumps = 1;
options->analyze_video_stream = 0;
@@ -139,7 +141,9 @@ void init_options(struct ccx_s_options *options)
options->enc_cfg.services_charsets = NULL;
options->enc_cfg.all_services_charset = NULL;
options->enc_cfg.with_semaphore = 0;
options->enc_cfg.force_dropframe = 0; // Assume No Drop Frame for MCC Encode.
options->enc_cfg.force_dropframe = 0; // Assume No Drop Frame for MCC Encode.
options->enc_cfg.scc_framerate = 0; // Default: 29.97fps for SCC output
options->enc_cfg.scc_accurate_timing = 0; // Default: off for backwards compatibility (issue #1120)
options->enc_cfg.extract_only_708 = 0;
options->settings_dtvcc.enabled = 0;
@@ -147,10 +151,13 @@ void init_options(struct ccx_s_options *options)
options->settings_dtvcc.print_file_reports = 1;
options->settings_dtvcc.no_rollup = 0;
options->settings_dtvcc.report = NULL;
options->settings_dtvcc.timing = NULL;
memset(
options->settings_dtvcc.services_enabled, 0,
CCX_DTVCC_MAX_SERVICES * sizeof(options->settings_dtvcc.services_enabled[0]));
options->scc_framerate = 0; // Default: 29.97fps
#ifdef WITH_LIBCURL
options->curlposturl = NULL;
#endif

@@ -15,38 +15,38 @@ struct demuxer_cfg
enum ccx_code_type codec;
enum ccx_code_type nocodec;
unsigned ts_autoprogram; // Try to find a stream with captions automatically (no -pn needed)
unsigned ts_autoprogram; // Try to find a stream with captions automatically (no -pn needed)
unsigned ts_allprogram;
unsigned ts_cappids[128]; // PID for stream that holds caption information
unsigned ts_cappids[128]; // PID for stream that holds caption information
int nb_ts_cappid;
unsigned ts_forced_cappid ; // If 1, never mess with the selected PID
int ts_forced_program; // Specific program to process in TS files, if ts_forced_program_selected==1
unsigned ts_forced_cappid; // If 1, never mess with the selected PID
int ts_forced_program; // Specific program to process in TS files, if ts_forced_program_selected==1
unsigned ts_forced_program_selected;
int ts_datastreamtype ; // User WANTED stream type (i.e. use the stream that has this type)
int ts_datastreamtype; // User WANTED stream type (i.e. use the stream that has this type)
unsigned ts_forced_streamtype; // User selected (forced) stream type
};
struct encoder_cfg
{
int extract; // Extract 1st, 2nd or both fields
int dtvcc_extract; // 1 or 0
int gui_mode_reports; // If 1, output in stderr progress updates so the GUI can grab them
int extract; // Extract 1st, 2nd or both fields
int dtvcc_extract; // 1 or 0
int gui_mode_reports; // If 1, output in stderr progress updates so the GUI can grab them
char *output_filename;
enum ccx_output_format write_format; // 0=Raw, 1=srt, 2=SMI
int keep_output_closed;
int force_flush; // Force flush on content write
int append_mode; // Append mode for output files
int ucla; // 1 if -UCLA used, 0 if not
int force_flush; // Force flush on content write
int append_mode; // Append mode for output files
int ucla; // 1 if -UCLA used, 0 if not
enum ccx_encoding_type encoding;
enum ccx_output_date_format date_format;
char millis_separator;
int autodash; // Add dashes (-) before each speaker automatically?
int trim_subs; // " Remove spaces at sides? "
int autodash; // Add dashes (-) before each speaker automatically?
int trim_subs; // " Remove spaces at sides? "
int sentence_cap; // FIX CASE? = Fix case?
int splitbysentence; // Split text into complete sentences and prorate time?
#ifdef WITH_LIBCURL
char *curlposturl; // If out=curl, where do we send the data to?
char *curlposturl; // If out=curl, where do we send the data to?
#endif
int filter_profanity; // Censors profane words from subtitles
@@ -54,45 +54,49 @@ struct encoder_cfg
/* Credit stuff */
char *start_credits_text;
char *end_credits_text;
struct ccx_boundary_time startcreditsnotbefore, startcreditsnotafter; // Where to insert start credits, if possible
struct ccx_boundary_time startcreditsnotbefore, startcreditsnotafter; // Where to insert start credits, if possible
struct ccx_boundary_time startcreditsforatleast, startcreditsforatmost; // How long to display them?
struct ccx_boundary_time endcreditsforatleast, endcreditsforatmost;
ccx_encoders_transcript_format transcript_settings; // Keeps the settings for generating transcript output files.
unsigned int send_to_srv;
int no_bom; // Set to 1 when no BOM (Byte Order Mark) should be used for files. Note, this might make files unreadable in windows!
int no_bom; // Set to 1 when no BOM (Byte Order Mark) should be used for files. Note, this might make files unreadable in windows!
char *first_input_file;
int multiple_files;
int no_font_color;
int no_type_setting;
int cc_to_stdout; // If this is set to 1, the stdout will be flushed when data was written to the screen during a process_608 call.
int line_terminator_lf; // 0 = CRLF, 1=LF
LLONG subs_delay; // ms to delay (or advance) subs
int cc_to_stdout; // If this is set to 1, the stdout will be flushed when data was written to the screen during a process_608 call.
int line_terminator_lf; // 0 = CRLF, 1=LF
LLONG subs_delay; // ms to delay (or advance) subs
int program_number;
unsigned char in_format;
int nospupngocr; // 1 if we don't want to OCR bitmaps to add the text as comments in the XML file in spupng
int nospupngocr; // 1 if we don't want to OCR bitmaps to add the text as comments in the XML file in spupng
// MCC File
int force_dropframe; // 1 if dropframe frame count should be used. defaults to no drop frame.
int force_dropframe; // 1 if dropframe frame count should be used. defaults to no drop frame.
// SCC output framerate
int scc_framerate; // SCC output framerate: 0=29.97 (default), 1=24, 2=25, 3=30
int scc_accurate_timing; // If 1, use bandwidth-aware timing for broadcast compliance (issue #1120)
// text -> png (text render)
char *render_font; // The font used to render text if needed (e.g. teletext->spupng)
char *render_font; // The font used to render text if needed (e.g. teletext->spupng)
char *render_font_italics;
//CEA-708
// CEA-708
int services_enabled[CCX_DTVCC_MAX_SERVICES];
char** services_charsets;
char* all_services_charset;
int extract_only_708; // 1 if only 708 subs extraction is enabled
char **services_charsets;
char *all_services_charset;
int extract_only_708; // 1 if only 708 subs extraction is enabled
};
struct ccx_s_options // Options from user parameters
{
int extract; // Extract 1st, 2nd or both fields
int no_rollup; // Disable roll-up emulation (no duplicate output in generated file)
int extract; // Extract 1st, 2nd or both fields
int no_rollup; // Disable roll-up emulation (no duplicate output in generated file)
int noscte20;
int webvtt_create_css;
int cc_channel; // Channel we want to dump in srt mode
int cc_channel; // Channel we want to dump in srt mode
int buffer_input;
int nofontcolor;
int nohtmlescape;
@@ -100,57 +104,59 @@ struct ccx_s_options // Options from user parameters
struct ccx_boundary_time extraction_start, extraction_end; // Segment we actually process
int print_file_reports;
ccx_decoder_608_settings settings_608; // Contains the settings for the 608 decoder.
ccx_decoder_dtvcc_settings settings_dtvcc; // Same for 708 decoder
int is_608_enabled; // Is 608 enabled by explicitly using flags(-1,-2,-12)
int is_708_enabled; // Is 708 enabled by explicitly using flags(-svc)
ccx_decoder_608_settings settings_608; // Contains the settings for the 608 decoder.
ccx_decoder_dtvcc_settings settings_dtvcc; // Same for 708 decoder
int is_608_enabled; // Is 608 enabled by explicitly using flags(-1,-2,-12)
int is_708_enabled; // Is 708 enabled by explicitly using flags(-svc)
char millis_separator;
int binary_concat; // Disabled by -ve or --videoedited
int use_gop_as_pts; // Use GOP instead of PTS timing (0=do as needed, 1=always, -1=never)
int fix_padding; // Replace 0000 with 8080 in HDTV (needed for some cards)
int gui_mode_reports; // If 1, output in stderr progress updates so the GUI can grab them
int no_progress_bar; // If 1, suppress the output of the progress to stdout
char *sentence_cap_file; // Extra capitalization word file
int live_stream; /* -1 -> Not a complete file but a live stream, without timeout
0 -> A regular file
>0 -> Live stream with a timeout of this value in seconds */
char *filter_profanity_file; // Extra profanity word file
int messages_target; // 0 = nowhere (quiet), 1=stdout, 2=stderr
int timestamp_map; // If 1, add WebVTT X-TIMESTAMP-MAP header
int binary_concat; // Disabled by -ve or --videoedited
int use_gop_as_pts; // Use GOP instead of PTS timing (0=do as needed, 1=always, -1=never)
int fix_padding; // Replace 0000 with 8080 in HDTV (needed for some cards)
int gui_mode_reports; // If 1, output in stderr progress updates so the GUI can grab them
int no_progress_bar; // If 1, suppress the output of the progress to stdout
char *sentence_cap_file; // Extra capitalization word file
int live_stream; /* -1 -> Not a complete file but a live stream, without timeout
0 -> A regular file
>0 -> Live stream with a timeout of this value in seconds */
char *filter_profanity_file; // Extra profanity word file
int messages_target; // 0 = nowhere (quiet), 1=stdout, 2=stderr
int timestamp_map; // If 1, add WebVTT X-TIMESTAMP-MAP header
/* Levenshtein's parameters, for string comparison */
int dolevdist; // 0 => don't attempt to correct typos with this algorithm
int dolevdist; // 0 => don't attempt to correct typos with this algorithm
int levdistmincnt, levdistmaxpct; // Means 2 fails or less is "the same", 10% or less is also "the same"
int investigate_packets; // Look for captions in all packets when everything else fails
int fullbin; // Disable pruning of padding cc blocks
int nosync; // Disable syncing
unsigned int hauppauge_mode; // If 1, use PID=1003, process specially and so on
int wtvconvertfix; // Fix broken Windows 7 conversion
int investigate_packets; // Look for captions in all packets when everything else fails
int fullbin; // Disable pruning of padding cc blocks
int nosync; // Disable syncing
unsigned int hauppauge_mode; // If 1, use PID=1003, process specially and so on
int wtvconvertfix; // Fix broken Windows 7 conversion
int wtvmpeg2;
int auto_myth; // Use myth-tv mpeg code? 0=no, 1=yes, 2=auto
int auto_myth; // Use myth-tv mpeg code? 0=no, 1=yes, 2=auto
/* MP4 related stuff */
unsigned mp4vidtrack; // Process the video track even if a CC dedicated track exists.
int extract_chapters; // If 1, extracts chapters (if present), from MP4 files.
unsigned mp4vidtrack; // Process the video track even if a CC dedicated track exists.
int extract_chapters; // If 1, extracts chapters (if present), from MP4 files.
/* General settings */
int usepicorder; // Force the use of pic_order_cnt_lsb in AVC/H.264 data streams
int xmltv; // 1 = full output. 2 = live output. 3 = both
int xmltvliveinterval; // interval in seconds between writing xmltv output files in live mode
int xmltvoutputinterval; // interval in seconds between writing xmltv full file output
int xmltvonlycurrent; // 0 off 1 on
int usepicorder; // Force the use of pic_order_cnt_lsb in AVC/H.264 data streams
int xmltv; // 1 = full output. 2 = live output. 3 = both
int xmltvliveinterval; // interval in seconds between writing xmltv output files in live mode
int xmltvoutputinterval; // interval in seconds between writing xmltv full file output
int xmltvonlycurrent; // 0 off 1 on
int keep_output_closed;
int force_flush; // Force flush on content write
int append_mode; // Append mode for output files
int ucla; // 1 if UCLA used, 0 if not
int tickertext; // 1 if ticker text style burned in subs, 0 if not
int hardsubx; // 1 if burned-in subtitles to be extracted
int hardsubx_and_common; // 1 if both burned-in and not burned in need to be extracted
char *dvblang; // The name of the language stream for DVB
const char *ocrlang; // The name of the .traineddata file to be loaded with tesseract
int ocr_oem; // The Tesseract OEM mode, could be 0 (default), 1 or 2
int psm; // The Tesseract PSM mode, could be between 0 and 13. 3 is tesseract default
int ocr_quantmode; // How to quantize the bitmap before passing to to tesseract (0=no quantization at all, 1=CCExtractor's internal)
char *mkvlang; // The name of the language stream for MKV
int analyze_video_stream; // If 1, the video stream will be processed even if we're using a different one for subtitles.
int force_flush; // Force flush on content write
int append_mode; // Append mode for output files
int ucla; // 1 if UCLA used, 0 if not
int tickertext; // 1 if ticker text style burned in subs, 0 if not
int hardsubx; // 1 if burned-in subtitles to be extracted
int hardsubx_and_common; // 1 if both burned-in and not burned in need to be extracted
char *dvblang; // The name of the language stream for DVB
const char *ocrlang; // The name of the .traineddata file to be loaded with tesseract
int ocr_oem; // The Tesseract OEM mode, could be 0 (default), 1 or 2
int psm; // The Tesseract PSM mode, could be between 0 and 13. 3 is tesseract default
int ocr_quantmode; // How to quantize the bitmap before passing to to tesseract (0=no quantization at all, 1=CCExtractor's internal)
int ocr_line_split; // If 1, split images into lines before OCR (uses PSM 7 for better accuracy)
int ocr_blacklist; // If 1, use character blacklist to prevent common OCR errors (default: enabled)
char *mkvlang; // The name of the language stream for MKV
int analyze_video_stream; // If 1, the video stream will be processed even if we're using a different one for subtitles.
/*HardsubX related stuff*/
int hardsubx_ocr_mode;
@@ -164,42 +170,43 @@ struct ccx_s_options // Options from user parameters
ccx_encoders_transcript_format transcript_settings; // Keeps the settings for generating transcript output files.
enum ccx_output_date_format date_format;
unsigned send_to_srv;
enum ccx_output_format write_format; // 0=Raw, 1=srt, 2=SMI
enum ccx_output_format write_format; // 0=Raw, 1=srt, 2=SMI
int write_format_rewritten;
int use_ass_instead_of_ssa;
int use_webvtt_styling;
LLONG debug_mask; // dbg_print will use this mask to print or ignore different types
LLONG debug_mask_on_debug; // If we're using temp_debug to enable/disable debug "live", this is the mask when temp_debug=1
LLONG debug_mask; // dbg_print will use this mask to print or ignore different types
LLONG debug_mask_on_debug; // If we're using temp_debug to enable/disable debug "live", this is the mask when temp_debug=1
/* Networking */
char *udpsrc;
char *udpaddr;
unsigned udpport; // Non-zero => Listen for UDP packets on this port, no files.
unsigned udpport; // Non-zero => Listen for UDP packets on this port, no files.
char *tcpport;
char *tcp_password;
char *tcp_desc;
char *srv_addr;
char *srv_port;
int noautotimeref; // Do NOT set time automatically?
enum ccx_datasource input_source; // Files, stdin or network
int noautotimeref; // Do NOT set time automatically?
enum ccx_datasource input_source; // Files, stdin or network
char *output_filename;
char **inputfile; // List of files to process
int num_input_files; // How many?
char **inputfile; // List of files to process
int num_input_files; // How many?
struct demuxer_cfg demux_cfg;
struct encoder_cfg enc_cfg;
LLONG subs_delay; // ms to delay (or advance) subs
int cc_to_stdout; // If this is set to 1, the stdout will be flushed when data was written to the screen during a process_608 call.
int pes_header_to_stdout; // If this is set to 1, the PES Header will be printed to console (debugging purposes)
int ignore_pts_jumps; // If 1, the program will ignore PTS jumps. Sometimes this parameter is required for DVB subs with > 30s pause time
LLONG subs_delay; // ms to delay (or advance) subs
int cc_to_stdout; // If this is set to 1, the stdout will be flushed when data was written to the screen during a process_608 call.
int pes_header_to_stdout; // If this is set to 1, the PES Header will be printed to console (debugging purposes)
int ignore_pts_jumps; // If 1, the program will ignore PTS jumps. Sometimes this parameter is required for DVB subs with > 30s pause time
int multiprogram;
int out_interval;
int segment_on_key_frames_only;
int scc_framerate; // SCC input framerate: 0=29.97 (default), 1=24, 2=25, 3=30
#ifdef WITH_LIBCURL
char *curlposturl;
#endif
};
extern struct ccx_s_options ccx_options;
void init_options (struct ccx_s_options *options);
void init_options(struct ccx_s_options *options);
#endif
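
Reviewer note: the new `scc_framerate` fields are documented as small integer codes (0=29.97, 1=24, 2=25, 3=30). A minimal sketch of how such a code could be turned into a frame rate follows. The function name `scc_framerate_to_fps` and its error convention are assumptions for illustration only; the diff does not show how the project performs this mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps the documented scc_framerate codes to frames
 * per second. Returns 0 on success and -1 for a code outside 0..3. */
static int scc_framerate_to_fps(int code, double *fps_out)
{
	/* Index = code, as listed in the struct comments: 0=29.97, 1=24, 2=25, 3=30 */
	static const double fps_by_code[] = {29.97, 24.0, 25.0, 30.0};
	const int count = (int)(sizeof fps_by_code / sizeof fps_by_code[0]);

	assert(fps_out != NULL);
	if (code < 0 || code >= count)
		return -1; /* Unknown code: leave *fps_out untouched */
	*fps_out = fps_by_code[code];
	return 0;
}
```

Rejecting unknown codes rather than silently falling back to 29.97 is a design choice made for this sketch, not something the diff specifies.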

@@ -1,123 +1,122 @@
#ifndef CCX_PLATFORM_H
#define CCX_PLATFORM_H
#define CCX_PLATFORM_H
// Default includes (cross-platform)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <time.h>
#include <fcntl.h>
#include <stdarg.h>
#include <errno.h>
// Default includes (cross-platform)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <time.h>
#include <fcntl.h>
#include <stdarg.h>
#include <errno.h>
#define __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS
#ifdef _WIN32
#define inline _inline
#define typeof decltype
#include <io.h>
#include <ws2tcpip.h>
#include <windows.h>
#define STDIN_FILENO 0
#define STDOUT_FILENO 1
#define STDERR_FILENO 2
#include "inttypes.h"
#undef UINT64_MAX
#define UINT64_MAX _UI64_MAX
typedef int socklen_t;
#if !defined(__MINGW64__) && !defined(__MINGW32__)
typedef int ssize_t;
#endif
typedef uint32_t in_addr_t;
#ifndef IN_CLASSD
#define IN_CLASSD(i) (((INT32)(i) & 0xf0000000) == 0xe0000000)
#define IN_MULTICAST(i) IN_CLASSD(i)
#endif
#include <direct.h>
#define mkdir(path, mode) _mkdir(path)
#ifndef snprintf
// Added ifndef because VS2013 warns for macro redefinition.
#define snprintf(buf, len, fmt, ...) _snprintf(buf, len, fmt, __VA_ARGS__)
#endif
#define sleep(sec) Sleep((sec) * 1000)
#ifdef _WIN32
#define inline _inline
#define typeof decltype
#include <io.h>
#include <ws2tcpip.h>
#include <windows.h>
#define STDIN_FILENO 0
#define STDOUT_FILENO 1
#define STDERR_FILENO 2
#include "inttypes.h"
#undef UINT64_MAX
#define UINT64_MAX _UI64_MAX
typedef int socklen_t;
#if !defined(__MINGW64__) && !defined(__MINGW32__)
typedef int ssize_t;
#endif
typedef uint32_t in_addr_t;
#ifndef IN_CLASSD
#define IN_CLASSD(i) (((INT32)(i) & 0xf0000000) == 0xe0000000)
#define IN_MULTICAST(i) IN_CLASSD(i)
#endif
#include <direct.h>
#define mkdir(path, mode) _mkdir(path)
#ifndef snprintf
// Added ifndef because VS2013 warns for macro redefinition.
#define snprintf(buf, len, fmt, ...) _snprintf(buf, len, fmt, __VA_ARGS__)
#endif
#define sleep(sec) Sleep((sec) * 1000)
#include <fcntl.h>
#else // _WIN32
#include <unistd.h>
#define __STDC_LIMIT_MACROS
#include <inttypes.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <sys/stat.h>
#include <sys/types.h>
#endif // _WIN32
#include <fcntl.h>
#else // _WIN32
#include <unistd.h>
#define __STDC_LIMIT_MACROS
#include <inttypes.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <sys/stat.h>
#include <sys/types.h>
#endif // _WIN32
//#include "disable_warnings.h"
// #include "disable_warnings.h"
#if defined(_MSC_VER) && !defined(__clang__)
#include "stdintmsc.h"
// Don't bug me with strcpy() deprecation warnings
#pragma warning(disable : 4996)
#else
#include <stdint.h>
#endif
#if defined(_MSC_VER) && !defined(__clang__)
#include "stdintmsc.h"
// Don't bug me with strcpy() deprecation warnings
#pragma warning(disable : 4996)
#else
#include <stdint.h>
#endif
#ifdef __OpenBSD__
#define FOPEN64 fopen
#define OPEN open
#define FSEEK fseek
#define FTELL ftell
#define LSEEK lseek
#define FSTAT fstat
#else
#ifdef _WIN32
#define OPEN _open
// 64 bit file functions
#if defined(_MSC_VER)
#define FSEEK _fseeki64
#define FTELL _ftelli64
#else
// For MinGW
#define FSEEK fseeko64
#define FTELL ftello64
#endif
#define TELL _telli64
#define LSEEK _lseeki64
typedef struct _stati64 FSTATSTRUCT;
#else
// Linux internally maps these functions to 64bit usage,
// if _FILE_OFFSET_BITS macro is set to 64
#define FOPEN64 fopen
#define OPEN open
#define LSEEK lseek
#define FSEEK fseek
#define FTELL ftell
#define FSTAT fstat
#define TELL tell
#include <stdint.h>
#endif
#endif
#ifdef __OpenBSD__
#define FOPEN64 fopen
#define OPEN open
#define FSEEK fseek
#define FTELL ftell
#define LSEEK lseek
#define FSTAT fstat
#else
#ifdef _WIN32
#define OPEN _open
// 64 bit file functions
#if defined(_MSC_VER)
#define FSEEK _fseeki64
#define FTELL _ftelli64
#else
// For MinGW
#define FSEEK fseeko64
#define FTELL ftello64
#endif
#define TELL _telli64
#define LSEEK _lseeki64
typedef struct _stati64 FSTATSTRUCT;
#else
// Linux internally maps these functions to 64bit usage,
// if _FILE_OFFSET_BITS macro is set to 64
#define FOPEN64 fopen
#define OPEN open
#define LSEEK lseek
#define FSEEK fseek
#define FTELL ftell
#define FSTAT fstat
#define TELL tell
#include <stdint.h>
#endif
#endif
#ifndef int64_t_C
#define int64_t_C(c) (c ## LL)
#define uint64_t_C(c) (c ## ULL)
#endif
#ifndef int64_t_C
#define int64_t_C(c) (c##LL)
#define uint64_t_C(c) (c##ULL)
#endif
#ifndef O_BINARY
#define O_BINARY 0 // Not present in Linux because it's always binary
#endif
#ifndef O_BINARY
#define O_BINARY 0 // Not present in Linux because it's always binary
#endif
#ifndef max
#define max(a,b) (((a) > (b)) ? (a) : (b))
#endif
#ifndef max
#define max(a, b) (((a) > (b)) ? (a) : (b))
#endif
typedef int64_t LLONG;
typedef uint64_t ULLONG;
typedef uint8_t UBYTE;
typedef int64_t LLONG;
typedef uint64_t ULLONG;
typedef uint8_t UBYTE;
#endif // CCX_PLATFORM_H

@@ -3,26 +3,28 @@
#include "ccx_common_constants.h"
enum ccx_common_logging_gui {
CCX_COMMON_LOGGING_GUI_XDS_PROGRAM_NAME, // Called with xds_program_name
CCX_COMMON_LOGGING_GUI_XDS_PROGRAM_ID_NR, // Called with current_xds_min, current_xds_hour, current_xds_date, current_xds_month
enum ccx_common_logging_gui
{
CCX_COMMON_LOGGING_GUI_XDS_PROGRAM_NAME, // Called with xds_program_name
CCX_COMMON_LOGGING_GUI_XDS_PROGRAM_ID_NR, // Called with current_xds_min, current_xds_hour, current_xds_date, current_xds_month
CCX_COMMON_LOGGING_GUI_XDS_PROGRAM_DESCRIPTION, // Called with line_num, xds_desc
CCX_COMMON_LOGGING_GUI_XDS_CALL_LETTERS // Called with current_xds_call_letters
CCX_COMMON_LOGGING_GUI_XDS_CALL_LETTERS // Called with current_xds_call_letters
};
struct ccx_common_logging_t {
LLONG debug_mask; // The debug mask that is used to determine if things should be printed or not.
void(*fatal_ftn) (int exit_code, const char *fmt, ...); // Used when an unrecoverable error happens. This should log output/save the error and then exit the program.
void(*debug_ftn) (LLONG mask, const char *fmt, ...); // Used to process debug output. Mask can be ignored (custom set by debug_mask).
void(*log_ftn)(const char *fmt, ...); // Used to print things. Replacement of standard printf, to allow more control.
void(*gui_ftn)(enum ccx_common_logging_gui message_type, ...); // Used to display things in a gui (if appropriate). Is called with the message_type and appropriate variables (described in enum)
struct ccx_common_logging_t
{
LLONG debug_mask; // The debug mask that is used to determine if things should be printed or not.
void (*fatal_ftn)(int exit_code, const char *fmt, ...); // Used when an unrecoverable error happens. This should log output/save the error and then exit the program.
void (*debug_ftn)(LLONG mask, const char *fmt, ...); // Used to process debug output. Mask can be ignored (custom set by debug_mask).
void (*log_ftn)(const char *fmt, ...); // Used to print things. Replacement of standard printf, to allow more control.
void (*gui_ftn)(enum ccx_common_logging_gui message_type, ...); // Used to display things in a gui (if appropriate). Is called with the message_type and appropriate variables (described in enum)
};
extern struct ccx_common_logging_t ccx_common_logging;
enum subdatatype
{
CC_DATATYPE_GENERIC=0,
CC_DATATYPE_DVB=1
CC_DATATYPE_GENERIC = 0,
CC_DATATYPE_DVB = 1
};
enum subtype
@@ -34,18 +36,18 @@ enum subtype
};
/**
* Raw Subtitle struct used as output of decoder (cc608)
* and input for encoder (sami, srt, transcript or smptett etc)
*
* if subtype CC_BITMAP then data contain nb_data numbers of rectangle
* which have to be displayed at same time.
*/
* Raw Subtitle struct used as output of decoder (cc608)
* and input for encoder (sami, srt, transcript or smptett etc)
*
* if subtype CC_BITMAP then data contain nb_data numbers of rectangle
* which have to be displayed at same time.
*/
struct cc_subtitle
{
/**
* A generic data which contain data according to decoder
* @warn decoder cant output multiple types of data
*/
* A generic data which contain data according to decoder
* @warn decoder cant output multiple types of data
*/
void *data;
enum subdatatype datatype;
@@ -56,11 +58,11 @@ struct cc_subtitle
enum subtype type;
/** Encoding type of Text, must be ignored in case of subtype as bitmap or cc_screen*/
enum ccx_encoding_type enc_type;
enum ccx_encoding_type enc_type;
/* set only when all the data is to be displayed at same time
* Unit is of time is ms
*/
* Unit is of time is ms
*/
LLONG start_time;
LLONG end_time;
@@ -72,13 +74,19 @@ struct cc_subtitle
/** flag to tell that decoder has given output */
int got_output;
char mode[5];
char info[4];
/** Used for DVB end time in ms */
int time_out;
/** Raw PTS value when this subtitle started (for DVB timing) */
LLONG start_pts;
/** Teletext page number (for multi-page extraction, issue #665) */
uint16_t teletext_page;
struct cc_subtitle *next;
struct cc_subtitle *prev;
};

@@ -35,6 +35,8 @@ void ccxr_set_current_pts(struct ccx_common_timing_ctx *ctx, LLONG pts);
int ccxr_set_fts(struct ccx_common_timing_ctx *ctx);
LLONG ccxr_get_fts(struct ccx_common_timing_ctx *ctx, int current_field);
LLONG ccxr_get_fts_max(struct ccx_common_timing_ctx *ctx);
LLONG ccxr_get_visible_start(struct ccx_common_timing_ctx *ctx, int current_field);
LLONG ccxr_get_visible_end(struct ccx_common_timing_ctx *ctx, int current_field);
char *ccxr_print_mstime_static(LLONG mstime, char *buf);
void ccxr_print_debug_timing(struct ccx_common_timing_ctx *ctx);
void ccxr_calculate_ms_gop_time(struct gop_time_code *g);
@@ -63,6 +65,9 @@ struct ccx_common_timing_ctx *init_timing_ctx(struct ccx_common_timing_settings_
ctx->current_pts = 0;
ctx->current_picture_coding_type = CCX_FRAME_TYPE_RESET_OR_UNKNOWN;
ctx->min_pts_adjusted = 0;
ctx->seen_known_frame_type = 0;
ctx->pending_min_pts = 0x01FFFFFFFFLL;
ctx->unknown_frame_count = 0;
ctx->min_pts = 0x01FFFFFFFFLL; // 33 bit
ctx->max_pts = 0;
ctx->sync_pts = 0;
@@ -108,15 +113,18 @@ LLONG get_fts_max(struct ccx_common_timing_ctx *ctx)
/**
* SCC Time formatting
* Note: buf must have at least 32 bytes available from the write position
*/
size_t print_scc_time(struct ccx_boundary_time time, char *buf)
{
char *fmt = "%02u:%02u:%02u;%02u";
double frame;
// Format produces "HH:MM:SS;FF" = 11 chars + null, use 32 for safety
const size_t max_time_len = 32;
frame = ((double)(time.time_in_ms - 1000 * (time.ss + 60 * (time.mm + 60 * time.hh))) * 29.97 / 1000);
return (size_t)sprintf(buf + time.set, fmt, time.hh, time.mm, time.ss, (unsigned)frame);
return (size_t)snprintf(buf + time.set, max_time_len, fmt, time.hh, time.mm, time.ss, (unsigned)frame);
}
struct ccx_boundary_time get_time(LLONG time)
@@ -137,11 +145,14 @@ struct ccx_boundary_time get_time(LLONG time)
/**
* Fill buffer with a time string using specified format
* @param fmt has to contain 4 format specifiers for h, m, s and ms respectively
* Note: buf must have at least 32 bytes available from the write position
*/
size_t print_mstime_buff(LLONG mstime, char *fmt, char *buf)
{
unsigned hh, mm, ss, ms;
int signoffset = (mstime < 0 ? 1 : 0);
// Typical format produces "HH:MM:SS:MSS" = 12 chars + null, use 32 for safety
const size_t max_time_len = 32;
if (mstime < 0) // Avoid loss of data warning with abs()
mstime = -mstime;
@@ -153,7 +164,7 @@ size_t print_mstime_buff(LLONG mstime, char *fmt, char *buf)
buf[0] = '-';
return (size_t)sprintf(buf + signoffset, fmt, hh, mm, ss, ms);
return (size_t)snprintf(buf + signoffset, max_time_len, fmt, hh, mm, ss, ms);
}
/* Fill a static buffer with a time string (hh:mm:ss:ms) corresponding
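
Reviewer note: the switch from `sprintf` to `snprintf` bounds the write, but `snprintf` returns the length that would have been written, so a caller that trusts the return value can still be misled after truncation. The sketch below shows the same `HH:MM:SS;FF` formatting with the 29.97 frame computation from the diff, plus an explicit truncation check. `format_scc_time` and its signature are hypothetical, and it takes a plain millisecond count instead of `struct ccx_boundary_time`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a non-negative millisecond timestamp as
 * "HH:MM:SS;FF" using 29.97 fps for the frame field.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the buffer was too small or formatting failed. */
static int format_scc_time(long long time_in_ms, char *buf, size_t buf_len)
{
	unsigned hh, mm, ss, frame;
	int n;

	assert(buf != NULL && time_in_ms >= 0);

	hh = (unsigned)(time_in_ms / 3600000);
	mm = (unsigned)(time_in_ms / 60000 % 60);
	ss = (unsigned)(time_in_ms / 1000 % 60);
	/* Leftover milliseconds within the second, converted to a frame index */
	frame = (unsigned)((double)(time_in_ms % 1000) * 29.97 / 1000);

	n = snprintf(buf, buf_len, "%02u:%02u:%02u;%02u", hh, mm, ss, frame);
	/* snprintf reports the would-be length, so detect truncation explicitly */
	return (n >= 0 && (size_t)n < buf_len) ? n : -1;
}
```

Passing the real buffer size from the caller, rather than a fixed constant, lets the function detect truncation instead of relying on the documented 32-byte contract.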

@@ -17,40 +17,43 @@ struct gop_time_code
struct ccx_common_timing_settings_t
{
int disable_sync_check; // If 1, timeline jumps will be ignored. This is important in several input formats that are assumed to have correct timing, no matter what.
int no_sync; // If 1, there will be no sync at all. Mostly useful for debugging.
int disable_sync_check; // If 1, timeline jumps will be ignored. This is important in several input formats that are assumed to have correct timing, no matter what.
int no_sync; // If 1, there will be no sync at all. Mostly useful for debugging.
int is_elementary_stream; // Needs to be set, as it's used in set_fts.
LLONG *file_position; // The position of the file
LLONG *file_position; // The position of the file
};
extern struct ccx_common_timing_settings_t ccx_common_timing_settings;
struct ccx_boundary_time
{
int hh,mm,ss;
int hh, mm, ss;
LLONG time_in_ms;
int set;
};
struct ccx_common_timing_ctx
{
int pts_set; // 0 = No, 1 = received, 2 = min_pts set
int min_pts_adjusted; // 0 = No, 1=Yes (don't adjust again)
int pts_set; // 0 = No, 1 = received, 2 = min_pts set
int min_pts_adjusted; // 0 = No, 1=Yes (don't adjust again)
int seen_known_frame_type; // 0 = No, 1 = Yes. Tracks if we've seen a frame with known type
LLONG pending_min_pts; // Minimum PTS seen while waiting for frame type determination
unsigned int unknown_frame_count; // Count of set_fts calls with unknown frame type
LLONG current_pts;
enum ccx_frame_type current_picture_coding_type;
int current_tref; // Store temporal reference of current frame
int current_tref; // Store temporal reference of current frame
LLONG min_pts;
LLONG max_pts;
LLONG sync_pts;
LLONG minimum_fts; // No screen should start before this FTS
LLONG fts_now; // Time stamp of current file (w/ fts_offset, w/o fts_global)
LLONG fts_now; // Time stamp of current file (w/ fts_offset, w/o fts_global)
LLONG fts_offset; // Time before first sync_pts
LLONG fts_fc_offset; // Time before first GOP
LLONG fts_max; // Remember the maximum fts that we saw in current file
LLONG fts_max; // Remember the maximum fts that we saw in current file
LLONG fts_global; // Duration of previous files (-ve mode)
int sync_pts2fts_set; // 0 = No, 1 = Yes
LLONG sync_pts2fts_fts;
LLONG sync_pts2fts_pts;
int pts_reset; // 0 = No, 1 = Yes. PTS resets when current_pts is lower than prev
int pts_reset; // 0 = No, 1 = Yes. PTS resets when current_pts is lower than prev
};
// Count 608 (per field) and 708 blocks since last set_fts() call
extern int cb_field1, cb_field2, cb_708;
@@ -60,7 +63,6 @@ extern int MPEG_CLOCK_FREQ; // This is part of the standard
extern int max_dif;
extern unsigned pts_big_change;
extern enum ccx_frame_type current_picture_coding_type;
extern double current_fps;
extern int frames_since_ref_time;
@@ -85,7 +87,7 @@ LLONG get_fts_max(struct ccx_common_timing_ctx *ctx);
char *print_mstime_static(LLONG mstime);
size_t print_mstime_buff(LLONG mstime, char *fmt, char *buf);
void print_debug_timing(struct ccx_common_timing_ctx *ctx);
int gop_accepted(struct gop_time_code* g);
int gop_accepted(struct gop_time_code *g);
void calculate_ms_gop_time(struct gop_time_code *g);
#endif

@@ -127,6 +127,8 @@ ccx_decoder_608_context *ccx_decoder_608_init_library(struct ccx_decoder_608_set
ccx_decoder_608_context *data = NULL;
data = malloc(sizeof(ccx_decoder_608_context));
if (!data)
return NULL;
data->cursor_column = 0;
data->cursor_row = 0;
@@ -147,10 +149,11 @@ ccx_decoder_608_context *ccx_decoder_608_init_library(struct ccx_decoder_608_set
data->my_field = field;
data->my_channel = channel;
data->have_cursor_position = 0;
+data->rollup_from_popon = 0;
data->output_format = output_format;
data->cc_to_stdout = cc_to_stdout;
data->textprinted = 0;
-data->ts_start_of_current_line = 0;
+// Note: ts_start_of_current_line already set to -1 above
data->halt = halt;
@@ -198,6 +201,9 @@ void delete_to_end_of_row(ccx_decoder_608_context *context)
{
if (context->mode != MODE_TEXT)
{
+if (context->cursor_row >= CCX_DECODER_608_SCREEN_ROWS)
+return;
struct eia608_screen *use_buffer = get_writing_buffer(context);
for (int i = context->cursor_column; i <= CCX_DECODER_608_SCREEN_WIDTH - 1; i++)
{
@@ -218,6 +224,10 @@ void write_char(const unsigned char c, ccx_decoder_608_context *context)
/* printf ("\rWriting char [%c] at %s:%d:%d\n",c,
use_buffer == &wb->data608->buffer1?"B1":"B2",
wb->data608->cursor_row,wb->data608->cursor_column); */
+if (context->cursor_row >= CCX_DECODER_608_SCREEN_ROWS || context->cursor_column >= CCX_DECODER_608_SCREEN_WIDTH)
+return;
use_buffer->characters[context->cursor_row][context->cursor_column] = c;
use_buffer->colors[context->cursor_row][context->cursor_column] = context->current_color;
use_buffer->fonts[context->cursor_row][context->cursor_column] = context->font;
@@ -225,7 +235,9 @@ void write_char(const unsigned char c, ccx_decoder_608_context *context)
if (use_buffer->empty)
{
-if (MODE_POPON != context->mode)
+// Don't set start time if we're in a transition from pop-on to roll-up
+// In this case, start time will be set when CR causes scrolling
+if (MODE_POPON != context->mode && !context->rollup_from_popon)
context->current_visible_start_ms = get_visible_start(context->timing, context->my_field);
}
use_buffer->empty = 0;
@@ -311,12 +323,23 @@ int write_cc_buffer(ccx_decoder_608_context *context, struct cc_subtitle *sub)
if (!data->empty && context->output_format != CCX_OF_NULL)
{
-sub->data = (struct eia608_screen *)realloc(sub->data, (sub->nb_data + 1) * sizeof(*data));
-if (!sub->data)
+size_t new_size;
+if (sub->nb_data + 1 > SIZE_MAX / sizeof(struct eia608_screen))
{
-ccx_common_logging.log_ftn("No Memory left");
+ccx_common_logging.log_ftn("Too many screens, cannot allocate more memory.\n");
return 0;
}
+new_size = (sub->nb_data + 1) * sizeof(struct eia608_screen);
+struct eia608_screen *new_data = (struct eia608_screen *)realloc(sub->data, new_size);
+if (!new_data)
+{
+ccx_common_logging.log_ftn("Out of memory while reallocating screen buffer\n");
+return 0;
+}
+sub->data = new_data;
sub->datatype = CC_DATATYPE_GENERIC;
memcpy(((struct eia608_screen *)sub->data) + sub->nb_data, data, sizeof(*data));
sub->nb_data++;
@@ -380,12 +403,23 @@ int write_cc_line(ccx_decoder_608_context *context, struct cc_subtitle *sub)
if (!data->empty)
{
-sub->data = (struct eia608_screen *)realloc(sub->data, (sub->nb_data + 1) * sizeof(*data));
-if (!sub->data)
+size_t new_size;
+if (sub->nb_data + 1 > SIZE_MAX / sizeof(struct eia608_screen))
{
-ccx_common_logging.log_ftn("No Memory left");
+ccx_common_logging.log_ftn("Too many screens, cannot allocate more memory.\n");
return 0;
}
+new_size = (sub->nb_data + 1) * sizeof(struct eia608_screen);
+struct eia608_screen *new_data = (struct eia608_screen *)realloc(sub->data, new_size);
+if (!new_data)
+{
+ccx_common_logging.log_ftn("Out of memory while reallocating screen buffer\n");
+return 0;
+}
+sub->data = new_data;
memcpy(((struct eia608_screen *)sub->data) + sub->nb_data, data, sizeof(*data));
data = (struct eia608_screen *)sub->data + sub->nb_data;
sub->datatype = CC_DATATYPE_GENERIC;
@@ -720,6 +754,10 @@ void handle_command(unsigned char c1, const unsigned char c2, ccx_decoder_608_co
if (write_cc_buffer(context, sub))
context->screenfuls_counter++;
erase_memory(context, true);
+// Track transition from pop-on/paint-on to roll-up for timing adjustment
+// Start time will be set when CR causes scrolling (matching FFmpeg behavior)
+context->rollup_from_popon = 1;
+context->ts_start_of_current_line = -1;
}
erase_memory(context, false);
@@ -772,6 +810,15 @@ void handle_command(unsigned char c1, const unsigned char c2, ccx_decoder_608_co
changes = check_roll_up(context);
if (changes)
{
+// Handle pop-on to roll-up transition timing
+// Use ts_start_of_current_line (when current line started) as the start time
+// This matches FFmpeg's behavior of timestamping when the display changed
+if (context->rollup_from_popon && context->ts_start_of_current_line > 0)
+{
+context->current_visible_start_ms = context->ts_start_of_current_line;
+context->rollup_from_popon = 0;
+}
// Only if the roll up would actually cause a line to disappear we write the buffer
if (context->output_format != CCX_OF_TRANSCRIPT)
{
@@ -781,8 +828,18 @@ void handle_command(unsigned char c1, const unsigned char c2, ccx_decoder_608_co
erase_memory(context, true); // Make sure the lines we just wrote aren't written again
}
}
roll_up(context); // The roll must be done anyway of course.
-context->ts_start_of_current_line = -1; // Unknown.
+// When in pop-on to roll-up transition with changes=0 (first CR, only 1 line),
+// preserve the CR time so the next caption uses the display state change time,
+// not the character typing time. This matches FFmpeg's timing behavior.
+if (context->rollup_from_popon && !changes)
+{
+context->ts_start_of_current_line = get_fts(context->timing, context->my_field);
+}
+else
+{
+context->ts_start_of_current_line = -1; // Unknown.
+}
if (changes)
context->current_visible_start_ms = get_visible_start(context->timing, context->my_field);
context->cursor_column = 0;
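
Review note on the realloc changes in write_cc_buffer() and write_cc_line() above: the new code checks the element count against SIZE_MAX before multiplying, and keeps the old pointer until realloc has succeeded. A minimal standalone sketch of that pattern is shown below. The names screen_t, screen_list and screen_list_push are hypothetical and exist only for this illustration; they are not CCExtractor identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct eia608_screen. */
typedef struct
{
	char text[32];
} screen_t;

/* Hypothetical growable list, playing the role of sub->data / sub->nb_data. */
typedef struct
{
	screen_t *data;
	size_t count;
} screen_list;

/* Appends one element. Returns 1 on success, 0 on failure.
 * On failure the list is left untouched, so the caller can still free it. */
int screen_list_push(screen_list *list, const screen_t *item)
{
	assert(list != NULL && item != NULL);

	/* Reject counts whose byte size would overflow size_t.
	 * Written as count >= limit so that count + 1 itself cannot wrap. */
	if (list->count >= SIZE_MAX / sizeof(screen_t))
		return 0;

	size_t new_size = (list->count + 1) * sizeof(screen_t);

	/* Use a temporary: assigning realloc's result straight to list->data
	 * would leak the old block if realloc returned NULL. */
	screen_t *new_data = realloc(list->data, new_size);
	if (!new_data)
		return 0;

	list->data = new_data;
	memcpy(list->data + list->count, item, sizeof(*item));
	list->count++;
	return 1;
}
```

Starting from a zero-initialised list, each successful push grows the array by one element; the caller releases it with free(list.data).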


@@ -1,5 +1,5 @@
#ifndef CCX_DECODER_608_H
#define CCX_DECODER_608_H
#include "ccx_common_platform.h"
#include "ccx_common_structs.h"
#include "ccx_decoders_structs.h"
@@ -19,11 +19,11 @@ struct ccx_decoder_608_report
typedef struct ccx_decoder_608_settings
{
int direct_rollup; // Write roll-up captions directly instead of line by line?
int force_rollup; // 0=Disabled, 1, 2 or 3=max lines in roll-up mode
int no_rollup; // If 1, write one line at a time
enum ccx_decoder_608_color_code default_color; // Default color to use.
int screens_to_process; // How many screenfuls we want? Use -1 for unlimited
struct ccx_decoder_608_report *report;
} ccx_decoder_608_settings;
@@ -34,33 +34,33 @@ typedef struct ccx_decoder_608_context
struct eia608_screen buffer2;
int cursor_row, cursor_column;
int visible_buffer;
int screenfuls_counter; // Number of meaningful screenfuls written
LLONG current_visible_start_ms; // At what time did the current visible buffer became so?
enum cc_modes mode;
unsigned char last_c1, last_c2;
int channel; // Currently selected channel
enum ccx_decoder_608_color_code current_color; // Color we are currently using to write
enum font_bits font; // Font we are currently using to write
int rollup_base_row;
LLONG ts_start_of_current_line; /* Time at which the first character for current line was received, =-1 no character received yet */
LLONG ts_last_char_received; /* Time at which the last written character was received, =-1 no character received yet */
int new_channel; // The new channel after a channel change
int my_field; // Used for sanity checks
int my_channel; // Used for sanity checks
-long bytes_processed_608; // To be written ONLY by process_608
+int rollup_from_popon; // Track transition from pop-on/paint-on to roll-up mode
+int64_t bytes_processed_608; // To be written ONLY by process_608
int have_cursor_position;
int *halt; // Can be used to halt the feeding of caption data. Set to 1 if screens_to_progress != -1 && screenfuls_counter >= screens_to_process
int cc_to_stdout; // If this is set to 1, the stdout will be flushed when data was written to the screen during a process_608 call.
struct ccx_decoder_608_report *report;
LLONG subs_delay; // ms to delay (or advance) subs
enum ccx_output_format output_format; // What kind of output format should be used?
int textprinted;
struct ccx_common_timing_ctx *timing;
} ccx_decoder_608_context;
#define MAX_COLOR 10
extern const char *color_text[MAX_COLOR][2];
@@ -80,7 +80,7 @@ enum command_code
COM_ERASENONDISPLAYEDMEMORY = 11,
COM_BACKSPACE = 12,
COM_RESUMETEXTDISPLAY = 13,
-COM_ALARMOFF =14,
+COM_ALARMOFF = 14,
COM_ALARMON = 15,
COM_DELETETOENDOFROW = 16,
COM_RESUMEDIRECTCAPTIONING = 17,
@@ -89,15 +89,14 @@ enum command_code
COM_FAKE_RULLUP1 = 18
};
void ccx_decoder_608_dinit_library(void **ctx);
/*
*
*/
-ccx_decoder_608_context* ccx_decoder_608_init_library(struct ccx_decoder_608_settings *settings, int channel,
-int field, int *halt,
-int cc_to_stdout,
-enum ccx_output_format output_format, struct ccx_common_timing_ctx *timing);
+ccx_decoder_608_context *ccx_decoder_608_init_library(struct ccx_decoder_608_settings *settings, int channel,
+int field, int *halt,
+int cc_to_stdout,
+enum ccx_output_format output_format, struct ccx_common_timing_ctx *timing);
/**
* @param data raw cc608 data to be processed


@@ -998,6 +998,14 @@ void dtvcc_handle_DFx_DefineWindow(dtvcc_service_decoder *decoder, int window_id
int row_count = (data[4] & 0xf) + 1; // according to CEA-708-D
int anchor_point = data[4] >> 4;
int col_count = (data[5] & 0x3f) + 1; // according to CEA-708-D
+if (row_count > CCX_DTVCC_MAX_ROWS || col_count > CCX_DTVCC_MAX_COLUMNS)
+{
+ccx_common_logging.log_ftn("[CEA-708] Invalid window size %dx%d (max %dx%d), rejecting window definition\n",
+row_count, col_count, CCX_DTVCC_MAX_ROWS, CCX_DTVCC_MAX_COLUMNS);
+return;
+}
int pen_style = data[6] & 0x7;
int win_style = (data[6] >> 3) & 0x7;
@@ -1025,6 +1033,18 @@ void dtvcc_handle_DFx_DefineWindow(dtvcc_service_decoder *decoder, int window_id
if (anchor_horizontal > CCX_DTVCC_SCREENGRID_COLUMNS - col_count)
anchor_horizontal = CCX_DTVCC_SCREENGRID_COLUMNS - col_count;
+if (window->is_defined)
+{
+if (row_count < window->row_count)
+{
+// Remove the oldest row if the row count is reduced
+for (int i = row_count; i < window->row_count; i++)
+{
+dtvcc_window_rollup(decoder, window);
+}
+}
+}
window->priority = priority;
window->col_lock = col_lock;
window->row_lock = row_lock;
@@ -1329,6 +1349,14 @@ void dtvcc_handle_SPL_SetPenLocation(dtvcc_service_decoder *decoder, unsigned ch
}
dtvcc_window *window = &decoder->windows[decoder->current_window];
+if (row >= window->row_count || col >= window->col_count)
+{
+ccx_common_logging.log_ftn("[CEA-708] dtvcc_handle_SPL_SetPenLocation: "
+"Invalid pen location %d:%d for window size %dx%d, rejecting command\n",
+row, col, window->row_count, window->col_count);
+return;
+}
window->pen_row = row;
window->pen_column = col;
}
@@ -1467,7 +1495,12 @@ int dtvcc_handle_C0(dtvcc_ctx *dtvcc,
else if (c0 >= 0x18 && c0 <= 0x1F)
{
if (c0 == DTVCC_C0_P16) // PE16
-dtvcc_handle_C0_P16(decoder, data + 1);
+{
+if (data_length >= 3)
+dtvcc_handle_C0_P16(decoder, data + 1);
+else
+ccx_common_logging.debug_ftn(CCX_DMT_708, "[CEA-708] dtvcc_handle_C0: Not enough data for P16\n");
+}
len = 3;
}
if (len == -1)
@@ -1621,6 +1654,9 @@ int dtvcc_handle_extended_char(dtvcc_service_decoder *decoder, unsigned char *da
ccx_common_logging.debug_ftn(CCX_DMT_708, "[CEA-708] In dtvcc_handle_extended_char, "
"first data code: [%c], length: [%u]\n",
data[0], data_length);
+if (data_length < 1)
+return 0;
unsigned char c = 0x20; // Default to space
unsigned char code = data[0];
if (/* data[i]>=0x00 && */ code <= 0x1F) // Comment to silence warning
@@ -1689,8 +1725,17 @@ void dtvcc_process_service_block(dtvcc_ctx *dtvcc,
}
else // Use extended set
{
-used = dtvcc_handle_extended_char(decoder, data + i + 1, data_length - 1);
-used++; // Since we had DTVCC_C0_EXT1
+if (i + 1 >= data_length)
+{
+used = 1; // skip EXT1
+}
+else
+{
+used = dtvcc_handle_extended_char(decoder,
+data + i + 1,
+data_length - i - 1) +
+1;
+}
}
i += used;
}
@@ -1742,6 +1787,12 @@ void dtvcc_process_current_packet(dtvcc_ctx *dtvcc, int len)
if (service_number == 7) // There is an extended header
{
+if (pos + 1 >= dtvcc->current_packet + len)
+{
+ccx_common_logging.debug_ftn(CCX_DMT_708, "[CEA-708] dtvcc_process_current_packet: "
+"Truncated extended header, stopping.\n");
+break;
+}
pos++;
service_number = (pos[0] & 0x3F); // 6 more significant bits
// printf ("Extended header: Service number: [%d]\n",service_number);
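
Review note on the length checks added above (P16, EXT1, extended service header): each one confirms that the byte following a prefix is still inside the buffer before it is read. The sketch below shows the same idea in isolation. It is a simplification, not the decoder's logic: count_symbols is a hypothetical helper, and every extended code is assumed to be exactly one byte after the prefix.

```c
#include <assert.h>
#include <stddef.h>

#define EXT1_PREFIX 0x10 /* same value as DTVCC_C0_EXT1 */

/* Hypothetical helper: walks a buffer in which EXT1_PREFIX introduces a
 * two-byte symbol and every other byte is a one-byte symbol.
 * Returns the number of complete symbols; *truncated counts prefixes that
 * had no following byte and were skipped instead of read past. */
size_t count_symbols(const unsigned char *data, size_t data_length, size_t *truncated)
{
	assert(data != NULL && truncated != NULL);

	size_t count = 0;
	size_t i = 0;
	*truncated = 0;

	while (i < data_length)
	{
		if (data[i] == EXT1_PREFIX)
		{
			if (i + 1 >= data_length)
			{
				/* Lone prefix at the end: consume it, never touch data[i + 1]. */
				(*truncated)++;
				i += 1;
				continue;
			}
			i += 2; /* prefix plus one extended byte */
		}
		else
		{
			i += 1;
		}
		count++;
	}
	return count;
}
```

A buffer ending in a bare 0x10 is reported through *truncated rather than causing a read beyond data_length.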


@@ -6,7 +6,7 @@
#include "ccx_common_constants.h"
#include "ccx_common_structs.h"
-#define CCX_DTVCC_MAX_PACKET_LENGTH 128 //According to EIA-708B, part 5
+#define CCX_DTVCC_MAX_PACKET_LENGTH 128 // According to EIA-708B, part 5
#define CCX_DTVCC_MAX_SERVICES 63
#define CCX_DTVCC_MAX_ROWS 15
@@ -14,7 +14,7 @@
* This value should be 32, but there were 16-bit encoded samples (from Korea),
* where RowCount calculated another way and equals 46 (23[8bit]*2)
*/
-#define CCX_DTVCC_MAX_COLUMNS (32*2)
+#define CCX_DTVCC_MAX_COLUMNS (32 * 2)
#define CCX_DTVCC_SCREENGRID_ROWS 75
#define CCX_DTVCC_SCREENGRID_COLUMNS 210
@@ -30,14 +30,14 @@
enum DTVCC_COMMANDS_C0_CODES
{
DTVCC_C0_NUL = 0x00,
DTVCC_C0_ETX = 0x03,
DTVCC_C0_BS = 0x08,
DTVCC_C0_FF = 0x0c,
DTVCC_C0_CR = 0x0d,
DTVCC_C0_HCR = 0x0e,
DTVCC_C0_EXT1 = 0x10,
DTVCC_C0_P16 = 0x18
};
enum DTVCC_COMMANDS_C1_CODES
@@ -86,21 +86,21 @@ struct DTVCC_S_COMMANDS_C1
enum dtvcc_window_justify
{
DTVCC_WINDOW_JUSTIFY_LEFT = 0,
DTVCC_WINDOW_JUSTIFY_RIGHT = 1,
DTVCC_WINDOW_JUSTIFY_CENTER = 2,
DTVCC_WINDOW_JUSTIFY_FULL = 3
};
-enum dtvcc_window_pd //Print Direction
+enum dtvcc_window_pd // Print Direction
{
-DTVCC_WINDOW_PD_LEFT_RIGHT = 0, //left -> right
+DTVCC_WINDOW_PD_LEFT_RIGHT = 0, // left -> right
DTVCC_WINDOW_PD_RIGHT_LEFT = 1,
DTVCC_WINDOW_PD_TOP_BOTTOM = 2,
DTVCC_WINDOW_PD_BOTTOM_TOP = 3
};
-enum dtvcc_window_sd //Scroll Direction
+enum dtvcc_window_sd // Scroll Direction
{
DTVCC_WINDOW_SD_LEFT_RIGHT = 0,
DTVCC_WINDOW_SD_RIGHT_LEFT = 1,
@@ -108,14 +108,14 @@ enum dtvcc_window_sd //Scroll Direction
DTVCC_WINDOW_SD_BOTTOM_TOP = 3
};
-enum dtvcc_window_sde //Scroll Display Effect
+enum dtvcc_window_sde // Scroll Display Effect
{
DTVCC_WINDOW_SDE_SNAP = 0,
DTVCC_WINDOW_SDE_FADE = 1,
DTVCC_WINDOW_SDE_WIPE = 2
};
-enum dtvcc_window_ed //Effect Direction
+enum dtvcc_window_ed // Effect Direction
{
DTVCC_WINDOW_ED_LEFT_RIGHT = 0,
DTVCC_WINDOW_ED_RIGHT_LEFT = 1,
@@ -123,91 +123,91 @@ enum dtvcc_window_ed //Effect Direction
DTVCC_WINDOW_ED_BOTTOM_TOP = 3
};
-enum dtvcc_window_fo //Fill Opacity
+enum dtvcc_window_fo // Fill Opacity
{
DTVCC_WINDOW_FO_SOLID = 0,
DTVCC_WINDOW_FO_FLASH = 1,
DTVCC_WINDOW_FO_TRANSLUCENT = 2,
DTVCC_WINDOW_FO_TRANSPARENT = 3
};
enum dtvcc_window_border
{
DTVCC_WINDOW_BORDER_NONE = 0,
DTVCC_WINDOW_BORDER_RAISED = 1,
DTVCC_WINDOW_BORDER_DEPRESSED = 2,
DTVCC_WINDOW_BORDER_UNIFORM = 3,
DTVCC_WINDOW_BORDER_SHADOW_LEFT = 4,
DTVCC_WINDOW_BORDER_SHADOW_RIGHT = 5
};
enum dtvcc_pen_size
{
DTVCC_PEN_SIZE_SMALL = 0,
DTVCC_PEN_SIZE_STANDART = 1,
DTVCC_PEN_SIZE_LARGE = 2
};
enum dtvcc_pen_font_style
{
DTVCC_PEN_FONT_STYLE_DEFAULT_OR_UNDEFINED = 0,
DTVCC_PEN_FONT_STYLE_MONOSPACED_WITH_SERIFS = 1,
DTVCC_PEN_FONT_STYLE_PROPORTIONALLY_SPACED_WITH_SERIFS = 2,
DTVCC_PEN_FONT_STYLE_MONOSPACED_WITHOUT_SERIFS = 3,
DTVCC_PEN_FONT_STYLE_PROPORTIONALLY_SPACED_WITHOUT_SERIFS = 4,
DTVCC_PEN_FONT_STYLE_CASUAL_FONT_TYPE = 5,
DTVCC_PEN_FONT_STYLE_CURSIVE_FONT_TYPE = 6,
DTVCC_PEN_FONT_STYLE_SMALL_CAPITALS = 7
};
enum dtvcc_pen_text_tag
{
DTVCC_PEN_TEXT_TAG_DIALOG = 0,
DTVCC_PEN_TEXT_TAG_SOURCE_OR_SPEAKER_ID = 1,
DTVCC_PEN_TEXT_TAG_ELECTRONIC_VOICE = 2,
DTVCC_PEN_TEXT_TAG_FOREIGN_LANGUAGE = 3,
DTVCC_PEN_TEXT_TAG_VOICEOVER = 4,
DTVCC_PEN_TEXT_TAG_AUDIBLE_TRANSLATION = 5,
DTVCC_PEN_TEXT_TAG_SUBTITLE_TRANSLATION = 6,
DTVCC_PEN_TEXT_TAG_VOICE_QUALITY_DESCRIPTION = 7,
DTVCC_PEN_TEXT_TAG_SONG_LYRICS = 8,
DTVCC_PEN_TEXT_TAG_SOUND_EFFECT_DESCRIPTION = 9,
DTVCC_PEN_TEXT_TAG_MUSICAL_SCORE_DESCRIPTION = 10,
DTVCC_PEN_TEXT_TAG_EXPLETIVE = 11,
DTVCC_PEN_TEXT_TAG_UNDEFINED_12 = 12,
DTVCC_PEN_TEXT_TAG_UNDEFINED_13 = 13,
DTVCC_PEN_TEXT_TAG_UNDEFINED_14 = 14,
DTVCC_PEN_TEXT_TAG_NOT_TO_BE_DISPLAYED = 15
};
enum dtvcc_pen_offset
{
DTVCC_PEN_OFFSET_SUBSCRIPT = 0,
DTVCC_PEN_OFFSET_NORMAL = 1,
DTVCC_PEN_OFFSET_SUPERSCRIPT = 2
};
enum dtvcc_pen_edge
{
DTVCC_PEN_EDGE_NONE = 0,
DTVCC_PEN_EDGE_RAISED = 1,
DTVCC_PEN_EDGE_DEPRESSED = 2,
DTVCC_PEN_EDGE_UNIFORM = 3,
DTVCC_PEN_EDGE_LEFT_DROP_SHADOW = 4,
DTVCC_PEN_EDGE_RIGHT_DROP_SHADOW = 5
};
enum dtvcc_pen_anchor_point
{
DTVCC_ANCHOR_POINT_TOP_LEFT = 0,
DTVCC_ANCHOR_POINT_TOP_CENTER = 1,
DTVCC_ANCHOR_POINT_TOP_RIGHT = 2,
DTVCC_ANCHOR_POINT_MIDDLE_LEFT = 3,
DTVCC_ANCHOR_POINT_MIDDLE_CENTER = 4,
DTVCC_ANCHOR_POINT_MIDDLE_RIGHT = 5,
DTVCC_ANCHOR_POINT_BOTTOM_LEFT = 6,
DTVCC_ANCHOR_POINT_BOTTOM_CENTER = 7,
DTVCC_ANCHOR_POINT_BOTTOM_RIGHT = 8
};
typedef struct dtvcc_pen_color
@@ -252,12 +252,20 @@ typedef struct dtvcc_window_attribs
*/
typedef struct dtvcc_symbol
{
-unsigned short sym; //symbol itself, at least 16 bit
-unsigned char init; //initialized or not. could be 0 or 1
+unsigned short sym; // symbol itself, at least 16 bit
+unsigned char init; // initialized or not. could be 0 or 1
} dtvcc_symbol;
-#define CCX_DTVCC_SYM_SET(x, c) {x.init = 1; x.sym = c;}
-#define CCX_DTVCC_SYM_SET_16(x, c1, c2) {x.init = 1; x.sym = (c1 << 8) | c2;}
+#define CCX_DTVCC_SYM_SET(x, c) \
+{ \
+x.init = 1; \
+x.sym = c; \
+}
+#define CCX_DTVCC_SYM_SET_16(x, c1, c2) \
+{ \
+x.init = 1; \
+x.sym = (c1 << 8) | c2; \
+}
#define CCX_DTVCC_SYM(x) ((unsigned char)(x.sym))
#define CCX_DTVCC_SYM_IS_EMPTY(x) (x.init == 0)
#define CCX_DTVCC_SYM_IS_SET(x) (x.init == 1)
@@ -344,7 +352,7 @@ typedef struct dtvcc_ctx
{
int is_active;
int active_services_count;
-int services_active[CCX_DTVCC_MAX_SERVICES]; //0 - inactive, 1 - active
+int services_active[CCX_DTVCC_MAX_SERVICES]; // 0 - inactive, 1 - active
int report_enabled;
ccx_decoder_dtvcc_report *report;
@@ -357,21 +365,20 @@ typedef struct dtvcc_ctx
int last_sequence;
-void *encoder; //we can't include header, so keeping it this way
+void *encoder; // we can't include header, so keeping it this way
int no_rollup;
struct ccx_common_timing_ctx *timing;
} dtvcc_ctx;
void dtvcc_clear_packet(dtvcc_ctx *ctx);
void dtvcc_windows_reset(dtvcc_service_decoder *decoder);
void dtvcc_decoder_flush(dtvcc_ctx *dtvcc, dtvcc_service_decoder *decoder);
void dtvcc_process_current_packet(dtvcc_ctx *dtvcc, int len);
void dtvcc_process_service_block(dtvcc_ctx *dtvcc,
dtvcc_service_decoder *decoder,
unsigned char *data,
int data_length);
void dtvcc_tv_clear(dtvcc_service_decoder *decoder);
int dtvcc_decoder_has_visible_windows(dtvcc_service_decoder *decoder);
@@ -381,9 +388,9 @@ void dtvcc_window_clear(dtvcc_service_decoder *decoder, int window_id);
void dtvcc_window_apply_style(dtvcc_window *window, dtvcc_window_attribs *style);
#ifdef DTVCC_PRINT_DEBUG
int dtvcc_is_win_row_empty(dtvcc_window *window, int row_index);
void dtvcc_get_win_write_interval(dtvcc_window *window, int row_index, int *first, int *last);
void dtvcc_window_dump(dtvcc_service_decoder *decoder, dtvcc_window *window);
#endif
void dtvcc_decoders_reset(dtvcc_ctx *dtvcc);
@@ -406,7 +413,7 @@ void dtvcc_process_character(dtvcc_service_decoder *decoder, dtvcc_symbol symbol
void dtvcc_handle_CWx_SetCurrentWindow(dtvcc_service_decoder *decoder, int window_id);
void dtvcc_handle_CLW_ClearWindows(dtvcc_ctx *dtvcc, dtvcc_service_decoder *decoder, int windows_bitmap);
void dtvcc_handle_DSW_DisplayWindows(dtvcc_service_decoder *decoder, int windows_bitmap, struct ccx_common_timing_ctx *timing);
-void dtvcc_handle_HDW_HideWindows(dtvcc_ctx *dtvcc,dtvcc_service_decoder *decoder,
+void dtvcc_handle_HDW_HideWindows(dtvcc_ctx *dtvcc, dtvcc_service_decoder *decoder,
int windows_bitmap);
void dtvcc_handle_TGW_ToggleWindows(dtvcc_ctx *dtvcc,
dtvcc_service_decoder *decoder,
@@ -426,13 +433,13 @@ void dtvcc_handle_C0_P16(dtvcc_service_decoder *decoder, unsigned char *data);
int dtvcc_handle_G0(dtvcc_service_decoder *decoder, unsigned char *data, int data_length);
int dtvcc_handle_G1(dtvcc_service_decoder *decoder, unsigned char *data, int data_length);
int dtvcc_handle_C0(dtvcc_ctx *dtvcc,
dtvcc_service_decoder *decoder,
unsigned char *data,
int data_length);
int dtvcc_handle_C1(dtvcc_ctx *dtvcc,
dtvcc_service_decoder *decoder,
unsigned char *data,
int data_length);
int dtvcc_handle_C2(dtvcc_service_decoder *decoder, unsigned char *data, int data_length);
int dtvcc_handle_C3(dtvcc_service_decoder *decoder, unsigned char *data, int data_length);
int dtvcc_handle_extended_char(dtvcc_service_decoder *decoder, unsigned char *data, int data_length);


@@ -57,6 +57,7 @@ void dtvcc_change_pen_colors(dtvcc_tv_screen *tv, dtvcc_pen_color pen_color, int
return;
char *buf = (char *)encoder->buffer;
+size_t remaining = INITIAL_ENC_BUFFER_CAPACITY - *buf_len;
dtvcc_pen_color new_pen_color;
if (column_index >= CCX_DTVCC_SCREENGRID_COLUMNS)
@@ -66,7 +67,11 @@ void dtvcc_change_pen_colors(dtvcc_tv_screen *tv, dtvcc_pen_color pen_color, int
if (pen_color.fg_color != new_pen_color.fg_color)
{
if (pen_color.fg_color != 0x3f && !open)
-(*buf_len) += sprintf(buf + (*buf_len), "</font>"); // should close older non-white color
+{
+int written = snprintf(buf + (*buf_len), remaining, "</font>"); // should close older non-white color
+if (written > 0 && (size_t)written < remaining)
+(*buf_len) += written;
+}
if (new_pen_color.fg_color != 0x3f && open)
{
@@ -75,7 +80,10 @@ void dtvcc_change_pen_colors(dtvcc_tv_screen *tv, dtvcc_pen_color pen_color, int
red = (255 / 3) * red;
green = (255 / 3) * green;
blue = (255 / 3) * blue;
-(*buf_len) += sprintf(buf + (*buf_len), "<font color=\"#%02x%02x%02x\">", red, green, blue);
+remaining = INITIAL_ENC_BUFFER_CAPACITY - *buf_len;
+int written = snprintf(buf + (*buf_len), remaining, "<font color=\"#%02x%02x%02x\">", red, green, blue);
+if (written > 0 && (size_t)written < remaining)
+(*buf_len) += written;
}
}
}
@@ -86,6 +94,8 @@ void dtvcc_change_pen_attribs(dtvcc_tv_screen *tv, dtvcc_pen_attribs pen_attribs
return;
char *buf = (char *)encoder->buffer;
+size_t remaining;
+int written;
dtvcc_pen_attribs new_pen_attribs;
if (column_index >= CCX_DTVCC_SCREENGRID_COLUMNS)
@@ -94,33 +104,47 @@ void dtvcc_change_pen_attribs(dtvcc_tv_screen *tv, dtvcc_pen_attribs pen_attribs
new_pen_attribs = tv->pen_attribs[row_index][column_index];
if (pen_attribs.italic != new_pen_attribs.italic)
{
+remaining = INITIAL_ENC_BUFFER_CAPACITY - *buf_len;
if (pen_attribs.italic && !open)
-(*buf_len) += sprintf(buf + (*buf_len), "</i>");
+{
+written = snprintf(buf + (*buf_len), remaining, "</i>");
+if (written > 0 && (size_t)written < remaining)
+(*buf_len) += written;
+}
if (!pen_attribs.italic && open)
-(*buf_len) += sprintf(buf + (*buf_len), "<i>");
+{
+written = snprintf(buf + (*buf_len), remaining, "<i>");
+if (written > 0 && (size_t)written < remaining)
+(*buf_len) += written;
+}
}
if (pen_attribs.underline != new_pen_attribs.underline)
{
+remaining = INITIAL_ENC_BUFFER_CAPACITY - *buf_len;
if (pen_attribs.underline && !open)
-(*buf_len) += sprintf(buf + (*buf_len), "</u>");
+{
+written = snprintf(buf + (*buf_len), remaining, "</u>");
+if (written > 0 && (size_t)written < remaining)
+(*buf_len) += written;
+}
if (!pen_attribs.underline && open)
-(*buf_len) += sprintf(buf + (*buf_len), "<u>");
+{
+written = snprintf(buf + (*buf_len), remaining, "<u>");
+if (written > 0 && (size_t)written < remaining)
+(*buf_len) += written;
+}
}
}
size_t write_utf16_char(unsigned short utf16_char, char *out)
{
-if ((utf16_char >> 8) != 0)
-{
-out[0] = (unsigned char)(utf16_char >> 8);
-out[1] = (unsigned char)(utf16_char & 0xff);
-return 2;
-}
-else
-{
-out[0] = (unsigned char)(utf16_char);
-return 1;
-}
+// Always write 2 bytes for consistent UTF-16BE encoding.
+// Previously, this function wrote 1 byte for ASCII characters and 2 bytes
+// for non-ASCII, creating an invalid mix that iconv couldn't handle properly.
+// This caused garbled output with Japanese/Chinese characters (issue #1451).
+out[0] = (unsigned char)(utf16_char >> 8);
+out[1] = (unsigned char)(utf16_char & 0xff);
+return 2;
}
void dtvcc_write_row(dtvcc_writer_ctx *writer, dtvcc_service_decoder *decoder, int row_index, struct encoder_ctx *encoder, int use_colors)
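
Review note on the write_utf16_char() change above: emitting two bytes for every code unit keeps the stream a valid fixed-width UTF-16BE sequence. The sketch below illustrates this outside the encoder. put_utf16be mirrors the patched function body, while encode_utf16be is a hypothetical wrapper added only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Writes one UTF-16 code unit as two big-endian bytes. */
size_t put_utf16be(unsigned short unit, unsigned char *out)
{
	assert(out != NULL);
	out[0] = (unsigned char)(unit >> 8);
	out[1] = (unsigned char)(unit & 0xff);
	return 2;
}

/* Hypothetical wrapper: encodes n code units into out, which must have
 * room for 2 * n bytes. Returns the number of bytes written. */
size_t encode_utf16be(const unsigned short *units, size_t n, unsigned char *out)
{
	assert(units != NULL && out != NULL);

	size_t len = 0;
	for (size_t i = 0; i < n; i++)
		len += put_utf16be(units[i], out + len);
	return len;
}
```

For the two units U+0041 and U+3042 this yields the bytes 00 41 30 42. The old variable-width behaviour would have produced 41 30 42, which a UTF-16BE decoder reads as U+4130 followed by a dangling byte.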
@@ -207,16 +231,31 @@ void dtvcc_write_srt(dtvcc_writer_ctx *writer, dtvcc_service_decoder *decoder, s
char *buf = (char *)encoder->buffer;
memset(buf, 0, INITIAL_ENC_BUFFER_CAPACITY);
size_t buf_len = 0;
+size_t remaining = INITIAL_ENC_BUFFER_CAPACITY;
+int written;
-sprintf(buf, "%u%s", encoder->cea_708_counter, encoder->encoded_crlf);
+written = snprintf(buf, remaining, "%u%s", encoder->cea_708_counter, encoder->encoded_crlf);
+if (written > 0 && (size_t)written < remaining)
+buf_len += written;
+remaining = INITIAL_ENC_BUFFER_CAPACITY - buf_len;
print_mstime_buff(tv->time_ms_show + encoder->subs_delay,
-"%02u:%02u:%02u,%03u", buf + strlen(buf));
-sprintf(buf + strlen(buf), " --> ");
+"%02u:%02u:%02u,%03u", buf + buf_len);
+buf_len = strlen(buf);
+remaining = INITIAL_ENC_BUFFER_CAPACITY - buf_len;
+written = snprintf(buf + buf_len, remaining, " --> ");
+if (written > 0 && (size_t)written < remaining)
+buf_len += written;
+remaining = INITIAL_ENC_BUFFER_CAPACITY - buf_len;
print_mstime_buff(tv->time_ms_hide + encoder->subs_delay,
-"%02u:%02u:%02u,%03u", buf + strlen(buf));
-sprintf(buf + strlen(buf), "%s", (char *)encoder->encoded_crlf);
+"%02u:%02u:%02u,%03u", buf + buf_len);
+buf_len = strlen(buf);
+remaining = INITIAL_ENC_BUFFER_CAPACITY - buf_len;
+written = snprintf(buf + buf_len, remaining, "%s", (char *)encoder->encoded_crlf);
+if (written > 0 && (size_t)written < remaining)
+buf_len += written;
-write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, strlen(buf));
+write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, buf_len);
for (int i = 0; i < CCX_DTVCC_SCREENGRID_ROWS; i++)
{
@@ -263,28 +302,47 @@ void dtvcc_write_transcript(dtvcc_writer_ctx *writer, dtvcc_service_decoder *dec
return;
char *buf = (char *)encoder->buffer;
+size_t buf_len;
+size_t remaining;
+int written;
for (int i = 0; i < CCX_DTVCC_SCREENGRID_ROWS; i++)
{
if (!dtvcc_is_row_empty(tv, i))
{
buf[0] = 0;
+buf_len = 0;
if (encoder->transcript_settings->showStartTime)
{
print_mstime_buff(tv->time_ms_show + encoder->subs_delay,
-"%02u:%02u:%02u,%03u|", buf + strlen(buf));
+"%02u:%02u:%02u,%03u|", buf + buf_len);
+buf_len = strlen(buf);
}
if (encoder->transcript_settings->showEndTime)
{
print_mstime_buff(tv->time_ms_hide + encoder->subs_delay,
-"%02u:%02u:%02u,%03u|", buf + strlen(buf));
+"%02u:%02u:%02u,%03u|", buf + buf_len);
+buf_len = strlen(buf);
}
if (encoder->transcript_settings->showCC)
-sprintf(buf + strlen(buf), "CC1|"); // always CC1 because CEA-708 is field-independent
+{
+remaining = INITIAL_ENC_BUFFER_CAPACITY - buf_len;
+written = snprintf(buf + buf_len, remaining, "CC1|"); // always CC1 because CEA-708 is field-independent
+if (written > 0 && (size_t)written < remaining)
+buf_len += written;
+}
if (encoder->transcript_settings->showMode)
-sprintf(buf + strlen(buf), "POP|"); // TODO caption mode(pop, rollup, etc.)
+{
+remaining = INITIAL_ENC_BUFFER_CAPACITY - buf_len;
+written = snprintf(buf + buf_len, remaining, "POP|"); // TODO caption mode(pop, rollup, etc.)
+if (written > 0 && (size_t)written < remaining)
+buf_len += written;
+}
-const size_t buf_len = strlen(buf);
if (buf_len != 0)
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, buf_len);
@@ -300,22 +358,33 @@ void dtvcc_write_sami_header(dtvcc_tv_screen *tv, struct encoder_ctx *encoder)
char *buf = (char *)encoder->buffer;
memset(buf, 0, INITIAL_ENC_BUFFER_CAPACITY);
size_t buf_len = 0;
+size_t remaining = INITIAL_ENC_BUFFER_CAPACITY;
+int written;
-buf_len += sprintf(buf + buf_len, "<sami>%s", encoder->encoded_crlf);
-buf_len += sprintf(buf + buf_len, "<head>%s", encoder->encoded_crlf);
-buf_len += sprintf(buf + buf_len, "<style type=\"text/css\">%s", encoder->encoded_crlf);
-buf_len += sprintf(buf + buf_len, "<!--%s", encoder->encoded_crlf);
-buf_len += sprintf(buf + buf_len,
-"p {margin-left: 16pt; margin-right: 16pt; margin-bottom: 16pt; margin-top: 16pt;%s",
-encoder->encoded_crlf);
-buf_len += sprintf(buf + buf_len,
-"text-align: center; font-size: 18pt; font-family: arial; font-weight: bold; color: #f0f0f0;}%s",
-encoder->encoded_crlf);
-buf_len += sprintf(buf + buf_len, ".unknowncc {Name:Unknown; lang:en-US; SAMIType:CC;}%s", encoder->encoded_crlf);
-buf_len += sprintf(buf + buf_len, "-->%s", encoder->encoded_crlf);
-buf_len += sprintf(buf + buf_len, "</style>%s", encoder->encoded_crlf);
-buf_len += sprintf(buf + buf_len, "</head>%s%s", encoder->encoded_crlf, encoder->encoded_crlf);
-buf_len += sprintf(buf + buf_len, "<body>%s", encoder->encoded_crlf);
+#define SAMI_SNPRINTF(fmt, ...) \
+do \
+{ \
+remaining = INITIAL_ENC_BUFFER_CAPACITY - buf_len; \
+written = snprintf(buf + buf_len, remaining, fmt, ##__VA_ARGS__); \
+if (written > 0 && (size_t)written < remaining) \
+buf_len += written; \
+} while (0)
+SAMI_SNPRINTF("<sami>%s", encoder->encoded_crlf);
+SAMI_SNPRINTF("<head>%s", encoder->encoded_crlf);
+SAMI_SNPRINTF("<style type=\"text/css\">%s", encoder->encoded_crlf);
+SAMI_SNPRINTF("<!--%s", encoder->encoded_crlf);
+SAMI_SNPRINTF("p {margin-left: 16pt; margin-right: 16pt; margin-bottom: 16pt; margin-top: 16pt;%s",
+encoder->encoded_crlf);
+SAMI_SNPRINTF("text-align: center; font-size: 18pt; font-family: arial; font-weight: bold; color: #f0f0f0;}%s",
+encoder->encoded_crlf);
+SAMI_SNPRINTF(".unknowncc {Name:Unknown; lang:en-US; SAMIType:CC;}%s", encoder->encoded_crlf);
+SAMI_SNPRINTF("-->%s", encoder->encoded_crlf);
+SAMI_SNPRINTF("</style>%s", encoder->encoded_crlf);
+SAMI_SNPRINTF("</head>%s%s", encoder->encoded_crlf, encoder->encoded_crlf);
+SAMI_SNPRINTF("<body>%s", encoder->encoded_crlf);
+#undef SAMI_SNPRINTF
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, buf_len);
}
@@ -323,8 +392,9 @@ void dtvcc_write_sami_header(dtvcc_tv_screen *tv, struct encoder_ctx *encoder)
void dtvcc_write_sami_footer(dtvcc_tv_screen *tv, struct encoder_ctx *encoder)
{
char *buf = (char *)encoder->buffer;
sprintf(buf, "</body></sami>");
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, strlen(buf));
int written = snprintf(buf, INITIAL_ENC_BUFFER_CAPACITY, "</body></sami>");
if (written > 0 && (size_t)written < INITIAL_ENC_BUFFER_CAPACITY)
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, written);
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd,
encoder->encoded_crlf, encoder->encoded_crlf_length);
}
@@ -342,12 +412,14 @@ void dtvcc_write_sami(dtvcc_writer_ctx *writer, dtvcc_service_decoder *decoder,
dtvcc_write_sami_header(tv, encoder);
char *buf = (char *)encoder->buffer;
int written;
buf[0] = 0;
sprintf(buf, "<sync start=%llu><p class=\"unknowncc\">%s",
(unsigned long long)tv->time_ms_show + encoder->subs_delay,
encoder->encoded_crlf);
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, strlen(buf));
written = snprintf(buf, INITIAL_ENC_BUFFER_CAPACITY, "<sync start=%llu><p class=\"unknowncc\">%s",
(unsigned long long)tv->time_ms_show + encoder->subs_delay,
encoder->encoded_crlf);
if (written > 0 && (size_t)written < INITIAL_ENC_BUFFER_CAPACITY)
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, written);
for (int i = 0; i < CCX_DTVCC_SCREENGRID_ROWS; i++)
{
@@ -361,10 +433,11 @@ void dtvcc_write_sami(dtvcc_writer_ctx *writer, dtvcc_service_decoder *decoder,
}
}
sprintf(buf, "<sync start=%llu><p class=\"unknowncc\">&nbsp;</p></sync>%s%s",
(unsigned long long)tv->time_ms_hide + encoder->subs_delay,
encoder->encoded_crlf, encoder->encoded_crlf);
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, strlen(buf));
written = snprintf(buf, INITIAL_ENC_BUFFER_CAPACITY, "<sync start=%llu><p class=\"unknowncc\">&nbsp;</p></sync>%s%s",
(unsigned long long)tv->time_ms_hide + encoder->subs_delay,
encoder->encoded_crlf, encoder->encoded_crlf);
if (written > 0 && (size_t)written < INITIAL_ENC_BUFFER_CAPACITY)
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, written);
}
unsigned char adjust_odd_parity(const unsigned char value)
@@ -388,11 +461,12 @@ unsigned char adjust_odd_parity(const unsigned char value)
void dtvcc_write_scc_header(dtvcc_tv_screen *tv, struct encoder_ctx *encoder)
{
char *buf = (char *)encoder->buffer;
// 18 characters long + 2 new lines
memset(buf, 0, 20);
sprintf(buf, "Scenarist_SCC V1.0\n\n");
// 18 characters long + 2 new lines = 20 characters total
memset(buf, 0, INITIAL_ENC_BUFFER_CAPACITY);
int written = snprintf(buf, INITIAL_ENC_BUFFER_CAPACITY, "Scenarist_SCC V1.0\n\n");
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, strlen(buf));
if (written > 0 && (size_t)written < INITIAL_ENC_BUFFER_CAPACITY)
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, written);
}
int count_captions_lines_scc(dtvcc_tv_screen *tv)
@@ -415,22 +489,31 @@ int count_captions_lines_scc(dtvcc_tv_screen *tv)
* 2 line length subtitles can be placed in 14th and 15th row
* 3 line length subtitles can be placed in 13th, 14th and 15th row
*/
void add_needed_scc_labels(char *buf, int total_subtitle_count, int current_subtitle_count)
void add_needed_scc_labels(char *buf, size_t buf_size, size_t *buf_len, int total_subtitle_count, int current_subtitle_count)
{
size_t remaining = buf_size - *buf_len;
int written;
const char *label;
switch (total_subtitle_count)
{
case 1:
// row 15, column 00
sprintf(buf + strlen(buf), " 94e0 94e0");
label = " 94e0 94e0";
break;
case 2:
// 9440: row 14, column 00 | 94e0: row 15, column 00
sprintf(buf + strlen(buf), current_subtitle_count == 1 ? " 9440 9440" : " 94e0 94e0");
label = (current_subtitle_count == 1) ? " 9440 9440" : " 94e0 94e0";
break;
default:
// 13e0: row 13, column 04 | 9440: row 14, column 00 | 94e0: row 15, column 00
sprintf(buf + strlen(buf), current_subtitle_count == 1 ? " 13e0 13e0" : (current_subtitle_count == 2 ? " 9440 9440" : " 94e0 94e0"));
label = (current_subtitle_count == 1) ? " 13e0 13e0" : ((current_subtitle_count == 2) ? " 9440 9440" : " 94e0 94e0");
break;
}
written = snprintf(buf + *buf_len, remaining, "%s", label);
if (written > 0 && (size_t)written < remaining)
*buf_len += written;
}
void dtvcc_write_scc(dtvcc_writer_ctx *writer, dtvcc_service_decoder *decoder, struct encoder_ctx *encoder)
@@ -447,38 +530,55 @@ void dtvcc_write_scc(dtvcc_writer_ctx *writer, dtvcc_service_decoder *decoder, s
dtvcc_write_scc_header(tv, encoder);
char *buf = (char *)encoder->buffer;
size_t buf_len;
size_t remaining;
int written;
struct ccx_boundary_time time_show = get_time(tv->time_ms_show + encoder->subs_delay);
// when hiding subtract a frame (1 frame = 34 ms)
struct ccx_boundary_time time_end = get_time(tv->time_ms_hide + encoder->subs_delay - 34);
#define SCC_SNPRINTF(fmt, ...) \
do \
{ \
remaining = INITIAL_ENC_BUFFER_CAPACITY - buf_len; \
written = snprintf(buf + buf_len, remaining, fmt, ##__VA_ARGS__); \
if (written > 0 && (size_t)written < remaining) \
buf_len += written; \
} while (0)
if (tv->old_cc_time_end > time_show.time_in_ms)
{
// Correct the frame delay
time_show.time_in_ms -= 1000 / 29.97;
print_scc_time(time_show, buf);
sprintf(buf + strlen(buf), "\t942c 942c");
buf_len = strlen(buf);
SCC_SNPRINTF("\t942c 942c");
time_show.time_in_ms += 1000 / 29.97;
// Clear the buffer and start pop on caption
sprintf(buf + strlen(buf), "94ae 94ae 9420 9420");
SCC_SNPRINTF("94ae 94ae 9420 9420");
}
else if (tv->old_cc_time_end < time_show.time_in_ms)
{
// Clear the screen for new caption
struct ccx_boundary_time time_to_display = get_time(tv->old_cc_time_end);
print_scc_time(time_to_display, buf);
sprintf(buf + strlen(buf), "\t942c 942c \n\n");
buf_len = strlen(buf);
SCC_SNPRINTF("\t942c 942c \n\n");
// Correct the frame delay
time_show.time_in_ms -= 1000 / 29.97;
// Clear the buffer and start pop on caption in new time
print_scc_time(time_show, buf);
sprintf(buf + strlen(buf), "\t94ae 94ae 9420 9420");
print_scc_time(time_show, buf + buf_len);
buf_len = strlen(buf);
SCC_SNPRINTF("\t94ae 94ae 9420 9420");
time_show.time_in_ms += 1000 / 29.97;
}
else
{
time_show.time_in_ms -= 1000 / 29.97;
print_scc_time(time_show, buf);
sprintf(buf + strlen(buf), "\t942c 942c 94ae 94ae 9420 9420");
buf_len = strlen(buf);
SCC_SNPRINTF("\t942c 942c 94ae 94ae 9420 9420");
time_show.time_in_ms += 1000 / 29.97;
}
@@ -490,27 +590,29 @@ void dtvcc_write_scc(dtvcc_writer_ctx *writer, dtvcc_service_decoder *decoder, s
if (!dtvcc_is_row_empty(tv, i))
{
current_subtitle_count++;
add_needed_scc_labels(buf, total_subtitle_count, current_subtitle_count);
add_needed_scc_labels(buf, INITIAL_ENC_BUFFER_CAPACITY, &buf_len, total_subtitle_count, current_subtitle_count);
int first, last, bytes_written = 0;
dtvcc_get_write_interval(tv, i, &first, &last);
for (int j = first; j <= last; j++)
{
if (bytes_written % 2 == 0)
sprintf(buf + strlen(buf), " ");
sprintf(buf + strlen(buf), "%x", adjust_odd_parity(tv->chars[i][j].sym));
SCC_SNPRINTF(" ");
SCC_SNPRINTF("%x", adjust_odd_parity(tv->chars[i][j].sym));
bytes_written += 1;
}
// If an odd number of bytes was written, pad with 0x80 to complete the byte pair
if (bytes_written % 2 == 1)
sprintf(buf + strlen(buf), "80 ");
SCC_SNPRINTF("80 ");
else
sprintf(buf + strlen(buf), " ");
SCC_SNPRINTF(" ");
}
}
// Display caption (942f 942f)
sprintf(buf + strlen(buf), "942f 942f \n\n");
SCC_SNPRINTF("942f 942f \n\n");
#undef SCC_SNPRINTF
write_wrapped(encoder->dtvcc_writers[tv->service_number - 1].fd, buf, strlen(buf));
tv->old_cc_time_end = time_end.time_in_ms;
@@ -579,7 +681,7 @@ void dtvcc_writer_init(dtvcc_writer_ctx *writer,
const char *ext = get_file_extension(write_format);
char suffix[32];
sprintf(suffix, CCX_DTVCC_FILENAME_TEMPLATE, program_number, service_number);
snprintf(suffix, sizeof(suffix), CCX_DTVCC_FILENAME_TEMPLATE, program_number, service_number);
writer->filename = create_outfilename(base_filename, suffix, ext);
if (!writer->filename)
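Note on the hunks above: each `sprintf(buf + strlen(buf), ...)` call is replaced by a bounded `snprintf` that tracks the used length and only advances it when the whole text fits. A minimal sketch of that pattern as a standalone function follows. The name `buf_appendf` and its return convention are assumptions for illustration and do not appear in the patch, which uses local macros (`SAMI_SNPRINTF`, `SCC_SNPRINTF`) and does not report truncation to the caller.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the patch.
 * buf holds *len bytes of text and has cap bytes of storage in total.
 * Appends the formatted text only if it fits completely, including the
 * terminating NUL. Returns 1 on success, 0 on truncation or format error. */
static int buf_appendf(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
	if (*len >= cap)
		return 0;

	size_t remaining = cap - *len;
	va_list ap;
	va_start(ap, fmt);
	int written = vsnprintf(buf + *len, remaining, fmt, ap);
	va_end(ap);

	/* vsnprintf returns the length it wanted to write; a value greater than
	 * or equal to remaining means the output was cut short. */
	if (written < 0 || (size_t)written >= remaining)
	{
		buf[*len] = '\0'; /* discard the partial write */
		return 0;
	}

	*len += (size_t)written;
	return 1;
}
```

Unlike the macros in the patch, this sketch lets the caller see when text was dropped, which is a design choice of the example rather than something the diff does.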


@@ -31,7 +31,7 @@ void dtvcc_write_sami_header(dtvcc_tv_screen *tv, struct encoder_ctx *encoder);
void dtvcc_write_sami_footer(dtvcc_tv_screen *tv, struct encoder_ctx *encoder);
void dtvcc_write_sami(dtvcc_writer_ctx *writer, dtvcc_service_decoder *decoder, struct encoder_ctx *encoder);
void dtvcc_write_scc_header(dtvcc_tv_screen *tv, struct encoder_ctx *encoder);
void add_needed_scc_labels(char *buf, int total_subtitle_count, int current_subtitle_count);
void add_needed_scc_labels(char *buf, size_t buf_size, size_t *buf_len, int total_subtitle_count, int current_subtitle_count);
void dtvcc_write_scc(dtvcc_writer_ctx *writer, dtvcc_service_decoder *decoder, struct encoder_ctx *encoder);
void dtvcc_write(dtvcc_writer_ctx *writer, dtvcc_service_decoder *decoder, struct encoder_ctx *encoder);


@@ -21,23 +21,22 @@ extern void ccxr_flush_decoder(struct dtvcc_ctx *dtvcc, struct dtvcc_service_dec
uint64_t utc_refvalue = UINT64_MAX; /* _UI64_MAX/UINT64_MAX means don't use UNIX, 0 = use current system time as reference, +1 use a specific reference */
extern int in_xds_mode;
LLONG ccxr_get_visible_start(struct ccx_common_timing_ctx *ctx, int current_field);
LLONG ccxr_get_visible_end(struct ccx_common_timing_ctx *ctx, int current_field);
/* This function returns an FTS that is guaranteed to be at least 1 ms later than the end of the previous screen.
This should not be necessary, but it guarantees there is no timing overlap. */
LLONG get_visible_start(struct ccx_common_timing_ctx *ctx, int current_field)
{
LLONG fts = get_fts(ctx, current_field);
if (fts <= ctx->minimum_fts)
fts = ctx->minimum_fts + 1;
LLONG fts = ccxr_get_visible_start(ctx, current_field);
ccx_common_logging.debug_ftn(CCX_DMT_DECODER_608, "Visible Start time=%s\n", print_mstime_static(fts));
return fts;
}
/* This function returns the current FTS and saves it so it can be used by ctxget_visible_start */
/* This function returns the current FTS and saves it so it can be used by get_visible_start */
LLONG get_visible_end(struct ccx_common_timing_ctx *ctx, int current_field)
{
LLONG fts = get_fts(ctx, current_field);
if (fts > ctx->minimum_fts)
ctx->minimum_fts = fts;
LLONG fts = ccxr_get_visible_end(ctx, current_field);
ccx_common_logging.debug_ftn(CCX_DMT_DECODER_608, "Visible End time=%s\n", print_mstime_static(fts));
return fts;
}
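Note on the hunk above: the two wrappers now delegate to `ccxr_get_visible_start` and `ccxr_get_visible_end`. The C lines they appear to replace clamp the start time against `minimum_fts` and raise `minimum_fts` on each end time. A minimal sketch of that removed logic follows, assuming a hypothetical one-field struct in place of `ccx_common_timing_ctx` and taking the FTS as a parameter instead of calling `get_fts`. Whether the Rust functions behave identically is not shown in this diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for the single field of the timing context used here. */
struct timing_sketch
{
	int64_t minimum_fts; /* end time of the previous screen, in ms */
};

/* Start time is pushed to at least 1 ms after the previous screen's end. */
static int64_t visible_start_sketch(const struct timing_sketch *t, int64_t fts)
{
	if (fts <= t->minimum_fts)
		fts = t->minimum_fts + 1;
	return fts;
}

/* End time is returned unchanged; the stored minimum only ever grows. */
static int64_t visible_end_sketch(struct timing_sketch *t, int64_t fts)
{
	if (fts > t->minimum_fts)
		t->minimum_fts = fts;
	return fts;
}
```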
@@ -148,7 +147,11 @@ int do_cb(struct lib_cc_decode *ctx, unsigned char *cc_block, struct cc_subtitle
else
writercwtdata(ctx, cc_block, sub);
}
cb_field1++;
// For container formats (H.264, MPEG-2 PES), don't increment cb_field
// because the frame PTS already represents the correct timestamp.
// The cb_field offset is only meaningful for raw/elementary streams.
if (ctx->in_bufferdatatype != CCX_H264 && ctx->in_bufferdatatype != CCX_PES)
cb_field1++;
break;
case 1:
dbg_print(CCX_DMT_CBRAW, " .. %s ..\n", debug_608_to_ASC(cc_block, 1));
@@ -172,7 +175,9 @@ int do_cb(struct lib_cc_decode *ctx, unsigned char *cc_block, struct cc_subtitle
else
writercwtdata(ctx, cc_block, sub);
}
cb_field2++;
// For container formats, don't increment cb_field (see comment above)
if (ctx->in_bufferdatatype != CCX_H264 && ctx->in_bufferdatatype != CCX_PES)
cb_field2++;
break;
case 2: // EIA-708
// DTVCC packet data
@@ -197,7 +202,9 @@ int do_cb(struct lib_cc_decode *ctx, unsigned char *cc_block, struct cc_subtitle
if (ctx->write_format == CCX_OF_RCWT)
writercwtdata(ctx, cc_block, sub);
}
cb_708++;
// For container formats, don't increment cb_708 (see comment above)
if (ctx->in_bufferdatatype != CCX_H264 && ctx->in_bufferdatatype != CCX_PES)
cb_708++;
// Check for bytes read
// printf ("Warning: Losing EIA-708 data!\n");
break;
@@ -217,13 +224,43 @@ int do_cb(struct lib_cc_decode *ctx, unsigned char *cc_block, struct cc_subtitle
void dinit_cc_decode(struct lib_cc_decode **ctx)
{
struct lib_cc_decode *lctx = *ctx;
#ifndef DISABLE_RUST
ccxr_dtvcc_free(lctx->dtvcc_rust);
lctx->dtvcc_rust = NULL;
#else
dtvcc_free(&lctx->dtvcc);
#endif
dinit_avc(&lctx->avc_ctx);
ccx_decoder_608_dinit_library(&lctx->context_cc608_field_1);
ccx_decoder_608_dinit_library(&lctx->context_cc608_field_2);
dinit_timing_ctx(&lctx->timing);
free_decoder_context(lctx->prev);
free_subtitle(lctx->dec_sub.prev);
/* Free the embedded dec_sub's data field (allocated by write_cc_buffer) */
if (lctx->dec_sub.datatype == CC_DATATYPE_DVB)
{
struct cc_bitmap *bitmap = (struct cc_bitmap *)lctx->dec_sub.data;
if (bitmap)
{
freep(&bitmap->data0);
freep(&bitmap->data1);
}
}
/* Free any leftover XDS strings that weren't processed by the encoder */
if (lctx->dec_sub.type == CC_608 && lctx->dec_sub.data)
{
struct eia608_screen *data = (struct eia608_screen *)lctx->dec_sub.data;
for (int i = 0; i < lctx->dec_sub.nb_data; i++, data++)
{
if (data->format == SFORMAT_XDS && data->xds_str)
{
freep(&data->xds_str);
}
}
}
freep(&lctx->dec_sub.data);
/* Note: xds_ctx is freed in general_loop.c, mp4.c etc. during normal processing.
Don't free it here as it may cause double-free if already freed elsewhere. */
freep(ctx);
}
@@ -233,11 +270,27 @@ struct lib_cc_decode *init_cc_decode(struct ccx_decoders_common_settings_t *sett
ctx = malloc(sizeof(struct lib_cc_decode));
if (!ctx)
return NULL;
fatal(EXIT_NOT_ENOUGH_MEMORY, "In init_cc_decode: Out of memory allocating ctx.");
// Initialize all pointers to NULL
ctx->avc_ctx = NULL;
ctx->timing = NULL;
ctx->dtvcc = NULL;
ctx->context_cc608_field_1 = NULL;
ctx->context_cc608_field_2 = NULL;
ctx->xds_ctx = NULL;
ctx->vbi_decoder = NULL;
ctx->prev = NULL;
memset(&ctx->dec_sub, 0, sizeof(ctx->dec_sub));
ctx->avc_ctx = init_avc();
if (!ctx->avc_ctx)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In init_cc_decode: Out of memory initializing avc_ctx.");
ctx->codec = setting->codec;
ctx->timing = init_timing_ctx(&ccx_common_timing_settings);
if (!ctx->timing)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In init_cc_decode: Out of memory initializing timing.");
setting->settings_dtvcc->timing = ctx->timing;
@@ -246,8 +299,16 @@ struct lib_cc_decode *init_cc_decode(struct ccx_decoders_common_settings_t *sett
ctx->no_rollup = setting->no_rollup;
ctx->noscte20 = setting->noscte20;
#ifndef DISABLE_RUST
ctx->dtvcc_rust = ccxr_dtvcc_init(setting->settings_dtvcc);
ctx->dtvcc = NULL; // Not used when Rust is enabled
#else
ctx->dtvcc = dtvcc_init(setting->settings_dtvcc);
if (!ctx->dtvcc)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In init_cc_decode: Out of memory initializing dtvcc.");
ctx->dtvcc->is_active = setting->settings_dtvcc->enabled;
ctx->dtvcc_rust = NULL;
#endif
if (setting->codec == CCX_CODEC_ATSC_CC)
{
@@ -260,6 +321,8 @@ struct lib_cc_decode *init_cc_decode(struct ccx_decoders_common_settings_t *sett
setting->cc_to_stdout,
setting->output_format,
ctx->timing);
if (!ctx->context_cc608_field_1)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In init_cc_decode: Out of memory initializing context_cc608_field_1.");
ctx->context_cc608_field_2 = ccx_decoder_608_init_library(
setting->settings_608,
setting->cc_channel,
@@ -268,11 +331,8 @@ struct lib_cc_decode *init_cc_decode(struct ccx_decoders_common_settings_t *sett
setting->cc_to_stdout,
setting->output_format,
ctx->timing);
}
else
{
ctx->context_cc608_field_1 = NULL;
ctx->context_cc608_field_2 = NULL;
if (!ctx->context_cc608_field_2)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In init_cc_decode: Out of memory initializing context_cc608_field_2.");
}
ctx->current_field = 1;
ctx->private_data = setting->private_data;
@@ -377,6 +437,8 @@ struct lib_cc_decode *init_cc_decode(struct ccx_decoders_common_settings_t *sett
setting->xds_write_to_file = 0;
}
ctx->xds_ctx = ccx_decoders_xds_init_library(ctx->timing, setting->xds_write_to_file);
if (!ctx->xds_ctx)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In init_cc_decode: Out of memory initializing xds_ctx.");
ctx->vbi_decoder = NULL;
ctx->ocr_quantmode = setting->ocr_quantmode;
@@ -426,6 +488,13 @@ void flush_cc_decode(struct lib_cc_decode *ctx, struct cc_subtitle *sub)
}
}
}
#ifndef DISABLE_RUST
if (ccxr_dtvcc_is_active(ctx->dtvcc_rust))
{
ctx->current_field = 3;
ccxr_flush_active_decoders(ctx->dtvcc_rust);
}
#else
if (ctx->dtvcc->is_active)
{
for (int i = 0; i < CCX_DTVCC_MAX_SERVICES; i++)
@@ -440,52 +509,86 @@ void flush_cc_decode(struct lib_cc_decode *ctx, struct cc_subtitle *sub)
}
}
}
#endif
}
struct encoder_ctx *copy_encoder_context(struct encoder_ctx *ctx)
{
struct encoder_ctx *ctx_copy = NULL;
ctx_copy = malloc(sizeof(struct encoder_ctx));
if (!ctx_copy)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_encoder_context: Out of memory allocating ctx_copy.");
memcpy(ctx_copy, ctx, sizeof(struct encoder_ctx));
// Initialize copied pointers to NULL before re-allocating
ctx_copy->buffer = NULL;
ctx_copy->first_input_file = NULL;
ctx_copy->out = NULL;
ctx_copy->timing = NULL;
ctx_copy->transcript_settings = NULL;
ctx_copy->subline = NULL;
ctx_copy->start_credits_text = NULL;
ctx_copy->end_credits_text = NULL;
ctx_copy->prev = NULL;
ctx_copy->last_string = NULL;
if (ctx->buffer)
{
ctx_copy->buffer = malloc(ctx->capacity * sizeof(unsigned char));
if (!ctx_copy->buffer)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_encoder_context: Out of memory allocating buffer.");
memcpy(ctx_copy->buffer, ctx->buffer, ctx->capacity * sizeof(unsigned char));
}
if (ctx->first_input_file)
{
ctx_copy->first_input_file = malloc(strlen(ctx->first_input_file) * sizeof(char));
memcpy(ctx_copy->first_input_file, ctx->first_input_file, strlen(ctx->first_input_file) * sizeof(char));
size_t len = strlen(ctx->first_input_file) + 1;
ctx_copy->first_input_file = malloc(len);
if (!ctx_copy->first_input_file)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_encoder_context: Out of memory allocating first_input_file.");
memcpy(ctx_copy->first_input_file, ctx->first_input_file, len);
}
if (ctx->out)
{
ctx_copy->out = malloc(sizeof(struct ccx_s_write));
if (!ctx_copy->out)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_encoder_context: Out of memory allocating out.");
memcpy(ctx_copy->out, ctx->out, sizeof(struct ccx_s_write));
}
if (ctx->timing)
{
ctx_copy->timing = malloc(sizeof(struct ccx_common_timing_ctx));
if (!ctx_copy->timing)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_encoder_context: Out of memory allocating timing.");
memcpy(ctx_copy->timing, ctx->timing, sizeof(struct ccx_common_timing_ctx));
}
if (ctx->transcript_settings)
{
ctx_copy->transcript_settings = malloc(sizeof(struct ccx_encoders_transcript_format));
if (!ctx_copy->transcript_settings)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_encoder_context: Out of memory allocating transcript_settings.");
memcpy(ctx_copy->transcript_settings, ctx->transcript_settings, sizeof(struct ccx_encoders_transcript_format));
}
if (ctx->subline)
{
ctx_copy->subline = malloc(SUBLINESIZE);
if (!ctx_copy->subline)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_encoder_context: Out of memory allocating subline.");
memcpy(ctx_copy->subline, ctx->subline, SUBLINESIZE);
}
if (ctx->start_credits_text)
{
ctx_copy->start_credits_text = malloc(strlen(ctx->start_credits_text) * sizeof(char));
memcpy(ctx_copy->start_credits_text, ctx->start_credits_text, (strlen(ctx->start_credits_text) + 1) * sizeof(char));
size_t len = strlen(ctx->start_credits_text) + 1;
ctx_copy->start_credits_text = malloc(len);
if (!ctx_copy->start_credits_text)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_encoder_context: Out of memory allocating start_credits_text.");
memcpy(ctx_copy->start_credits_text, ctx->start_credits_text, len);
}
if (ctx->end_credits_text)
{
ctx_copy->end_credits_text = malloc(strlen(ctx->end_credits_text) * sizeof(char));
memcpy(ctx_copy->end_credits_text, ctx->end_credits_text, (strlen(ctx->end_credits_text) + 1) * sizeof(char));
size_t len = strlen(ctx->end_credits_text) + 1;
ctx_copy->end_credits_text = malloc(len);
if (!ctx_copy->end_credits_text)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_encoder_context: Out of memory allocating end_credits_text.");
memcpy(ctx_copy->end_credits_text, ctx->end_credits_text, len);
}
return ctx_copy;
}
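Note on the hunk above: the old `first_input_file`, `start_credits_text` and `end_credits_text` copies allocated `strlen(s)` bytes, one short of the space needed for the terminating NUL, and two of them then copied `strlen(s) + 1` bytes into that allocation. The patch computes `strlen(s) + 1` once and uses it for both `malloc` and `memcpy`. A minimal sketch of the corrected pattern follows. The helper name `dup_cstring` is an assumption for illustration, and it returns NULL on allocation failure where the patch calls `fatal()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: duplicate a NUL-terminated
 * string. Returns NULL if src is NULL or the allocation fails. */
static char *dup_cstring(const char *src)
{
	if (!src)
		return NULL;

	size_t len = strlen(src) + 1; /* include the terminating NUL */
	char *copy = malloc(len);
	if (!copy)
		return NULL;

	memcpy(copy, src, len); /* same length for allocation and copy */
	return copy;
}
```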
@@ -493,42 +596,67 @@ struct lib_cc_decode *copy_decoder_context(struct lib_cc_decode *ctx)
{
struct lib_cc_decode *ctx_copy = NULL;
ctx_copy = malloc(sizeof(struct lib_cc_decode));
if (!ctx_copy)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_decoder_context: Out of memory allocating ctx_copy.");
memcpy(ctx_copy, ctx, sizeof(struct lib_cc_decode));
// Initialize copied pointers to NULL before re-allocating
ctx_copy->context_cc608_field_1 = NULL;
ctx_copy->context_cc608_field_2 = NULL;
ctx_copy->timing = NULL;
ctx_copy->avc_ctx = NULL;
ctx_copy->private_data = NULL;
ctx_copy->dtvcc = NULL;
ctx_copy->xds_ctx = NULL;
ctx_copy->vbi_decoder = NULL;
if (ctx->context_cc608_field_1)
{
ctx_copy->context_cc608_field_1 = malloc(sizeof(struct ccx_decoder_608_context));
if (!ctx_copy->context_cc608_field_1)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_decoder_context: Out of memory allocating context_cc608_field_1.");
memcpy(ctx_copy->context_cc608_field_1, ctx->context_cc608_field_1, sizeof(struct ccx_decoder_608_context));
}
if (ctx->context_cc608_field_2)
{
ctx_copy->context_cc608_field_2 = malloc(sizeof(struct ccx_decoder_608_context));
if (!ctx_copy->context_cc608_field_2)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_decoder_context: Out of memory allocating context_cc608_field_2.");
memcpy(ctx_copy->context_cc608_field_2, ctx->context_cc608_field_2, sizeof(struct ccx_decoder_608_context));
}
if (ctx->timing)
{
ctx_copy->timing = malloc(sizeof(struct ccx_common_timing_ctx));
if (!ctx_copy->timing)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_decoder_context: Out of memory allocating timing.");
memcpy(ctx_copy->timing, ctx->timing, sizeof(struct ccx_common_timing_ctx));
}
if (ctx->avc_ctx)
{
ctx_copy->avc_ctx = malloc(sizeof(struct avc_ctx));
if (!ctx_copy->avc_ctx)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_decoder_context: Out of memory allocating avc_ctx.");
memcpy(ctx_copy->avc_ctx, ctx->avc_ctx, sizeof(struct avc_ctx));
}
ctx_copy->private_data = NULL;
if (ctx->dtvcc)
{
ctx_copy->dtvcc = malloc(sizeof(struct dtvcc_ctx));
if (!ctx_copy->dtvcc)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_decoder_context: Out of memory allocating dtvcc.");
memcpy(ctx_copy->dtvcc, ctx->dtvcc, sizeof(struct dtvcc_ctx));
}
if (ctx->xds_ctx)
{
ctx_copy->xds_ctx = malloc(sizeof(struct ccx_decoders_xds_context));
if (!ctx_copy->xds_ctx)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_decoder_context: Out of memory allocating xds_ctx.");
memcpy(ctx_copy->xds_ctx, ctx->xds_ctx, sizeof(struct ccx_decoders_xds_context));
}
if (ctx->vbi_decoder)
{
ctx_copy->vbi_decoder = malloc(sizeof(struct ccx_decoder_vbi_ctx));
if (!ctx_copy->vbi_decoder)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_decoder_context: Out of memory allocating vbi_decoder.");
memcpy(ctx_copy->vbi_decoder, ctx->vbi_decoder, sizeof(struct ccx_decoder_vbi_ctx));
}
return ctx_copy;
@@ -537,12 +665,17 @@ struct cc_subtitle *copy_subtitle(struct cc_subtitle *sub)
{
struct cc_subtitle *sub_copy = NULL;
sub_copy = malloc(sizeof(struct cc_subtitle));
if (!sub_copy)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_subtitle: Out of memory allocating sub_copy.");
memcpy(sub_copy, sub, sizeof(struct cc_subtitle));
sub_copy->datatype = sub->datatype;
sub_copy->data = NULL;
if (sub->data)
{
sub_copy->data = malloc(sub->nb_data * sizeof(struct eia608_screen));
if (!sub_copy->data)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In copy_subtitle: Out of memory allocating data.");
memcpy(sub_copy->data, sub->data, sub->nb_data * sizeof(struct eia608_screen));
}
return sub_copy;


@@ -11,25 +11,31 @@
extern uint64_t utc_refvalue; // UTC referential value
// Declarations
LLONG get_visible_start (struct ccx_common_timing_ctx *ctx, int current_field);
LLONG get_visible_end (struct ccx_common_timing_ctx *ctx, int current_field);
LLONG get_visible_start(struct ccx_common_timing_ctx *ctx, int current_field);
LLONG get_visible_end(struct ccx_common_timing_ctx *ctx, int current_field);
unsigned int get_decoder_str_basic(unsigned char *buffer, unsigned char *line, int trim_subs, enum ccx_encoding_type encoding);
void ccx_decoders_common_settings_init(LLONG subs_delay, enum ccx_output_format output_format);
int validate_cc_data_pair (unsigned char *cc_data_pair);
int process_cc_data (struct encoder_ctx *enc_ctx, struct lib_cc_decode *ctx, unsigned char *cc_data, int cc_count, struct cc_subtitle *sub);
int do_cb (struct lib_cc_decode *ctx, unsigned char *cc_block, struct cc_subtitle *sub);
void printdata (struct lib_cc_decode *ctx, const unsigned char *data1, int length1,
const unsigned char *data2, int length2, struct cc_subtitle *sub);
struct lib_cc_decode* init_cc_decode (struct ccx_decoders_common_settings_t *setting);
int validate_cc_data_pair(unsigned char *cc_data_pair);
int process_cc_data(struct encoder_ctx *enc_ctx, struct lib_cc_decode *ctx, unsigned char *cc_data, int cc_count, struct cc_subtitle *sub);
int do_cb(struct lib_cc_decode *ctx, unsigned char *cc_block, struct cc_subtitle *sub);
void printdata(struct lib_cc_decode *ctx, const unsigned char *data1, int length1,
const unsigned char *data2, int length2, struct cc_subtitle *sub);
struct lib_cc_decode *init_cc_decode(struct ccx_decoders_common_settings_t *setting);
void dinit_cc_decode(struct lib_cc_decode **ctx);
void flush_cc_decode(struct lib_cc_decode *ctx, struct cc_subtitle *sub);
struct encoder_ctx* copy_encoder_context(struct encoder_ctx *ctx);
struct lib_cc_decode* copy_decoder_context(struct lib_cc_decode *ctx);
struct cc_subtitle* copy_subtitle(struct cc_subtitle *sub);
struct encoder_ctx *copy_encoder_context(struct encoder_ctx *ctx);
struct lib_cc_decode *copy_decoder_context(struct lib_cc_decode *ctx);
struct cc_subtitle *copy_subtitle(struct cc_subtitle *sub);
void free_encoder_context(struct encoder_ctx *ctx);
void free_decoder_context(struct lib_cc_decode *ctx);
void free_subtitle(struct cc_subtitle* sub);
void free_subtitle(struct cc_subtitle *sub);
#ifndef DISABLE_RUST
// Rust FFI function to flush active CEA-708 service decoders
extern void ccxr_flush_active_decoders(void *dtvcc_rust);
#endif
#endif


@@ -345,6 +345,11 @@ static struct ISDBText *allocate_text_node(ISDBSubLayout *ls)
text->used = 0;
text->buf = malloc(128);
if (!text->buf)
{
free(text);
return NULL;
}
text->len = 128;
*text->buf = 0;
return text;
@@ -719,16 +724,17 @@ static int parse_csi(ISDBSubContext *ctx, const uint8_t *buf, int len)
// Copy buf in arg
for (i = 0; *buf != 0x20; i++)
{
if (i >= (sizeof(arg)) + 1)
if (i >= sizeof(arg) - 1)
{
isdb_log("UnExpected CSI %d >= %d", sizeof(arg) + 1, i);
isdb_log("UnExpected CSI: too long");
break;
}
arg[i] = *buf;
buf++;
}
/* ignore terminating 0x20 character */
arg[i] = *buf++;
if (i < sizeof(arg))
arg[i] = *buf++;
switch (*buf)
{


@@ -9,5 +9,4 @@ int isdbsub_decode(struct lib_cc_decode *dec_ctx, const uint8_t *buf, size_t buf
void delete_isdb_decoder(void **isdb_ctx);
void *init_isdb_decoder(void);
#endif


@@ -11,11 +11,10 @@
#define CCX_DECODER_608_SCREEN_ROWS 15
#define CCX_DECODER_608_SCREEN_WIDTH 32
#define MAXBFRAMES 50
#define SORTBUF (2*MAXBFRAMES+1)
#define SORTBUF (2 * MAXBFRAMES + 1)
/* flag raised when end of display marker arrives in Dvb Subtitle */
#define SUB_EOD_MARKER (1 << 0 )
#define SUB_EOD_MARKER (1 << 0)
struct cc_bitmap
{
int x;
@@ -77,13 +76,13 @@ enum ccx_decoder_608_color_code
};
/**
* This structure has fields which need to be ignored according to format,
* for example if format is SFORMAT_XDS then all fields other than
* xds related (xds_str, xds_len and cur_xds_packet_class) should be
* ignored and not to be dereferenced.
*
* TODO use union inside struct for each kind of fields
*/
* This structure has fields which need to be ignored according to format,
* for example if format is SFORMAT_XDS then all fields other than
* xds related (xds_str, xds_len and cur_xds_packet_class) should be
* ignored and not to be dereferenced.
*
* TODO use union inside struct for each kind of fields
*/
struct eia608_screen // A CC buffer
{
/** format of data inside this structure */
@@ -91,8 +90,8 @@ struct eia608_screen // A CC buffer
unsigned char characters[CCX_DECODER_608_SCREEN_ROWS][CCX_DECODER_608_SCREEN_WIDTH + 1];
enum ccx_decoder_608_color_code colors[CCX_DECODER_608_SCREEN_ROWS][CCX_DECODER_608_SCREEN_WIDTH + 1];
enum font_bits fonts[CCX_DECODER_608_SCREEN_ROWS][CCX_DECODER_608_SCREEN_WIDTH + 1]; // Extra char at the end for a 0
int row_used[CCX_DECODER_608_SCREEN_ROWS]; // Any data in row?
int empty; // Buffer completely empty?
int row_used[CCX_DECODER_608_SCREEN_ROWS]; // Any data in row?
int empty; // Buffer completely empty?
/** start time of this CC buffer */
LLONG start_time;
/** end time of this CC buffer */
@@ -110,20 +109,20 @@ struct eia608_screen // A CC buffer
struct ccx_decoders_common_settings_t
{
LLONG subs_delay; // ms to delay (or advance) subs
enum ccx_output_format output_format; // What kind of output format should be used?
int fix_padding; // Replace 0000 with 8080 in HDTV (needed for some cards)
LLONG subs_delay; // ms to delay (or advance) subs
enum ccx_output_format output_format; // What kind of output format should be used?
int fix_padding; // Replace 0000 with 8080 in HDTV (needed for some cards)
struct ccx_boundary_time extraction_start, extraction_end; // Segment we actually process
int cc_to_stdout;
int extract; // Extract 1st, 2nd or both fields
int fullbin; // Disable pruning of padding cc blocks
int extract; // Extract 1st, 2nd or both fields
int fullbin; // Disable pruning of padding cc blocks
int no_rollup;
int noscte20;
struct ccx_decoder_608_settings *settings_608; // Contains the settings for the 608 decoder.
ccx_decoder_dtvcc_settings *settings_dtvcc; // Same for cea 708 captions decoder (dtvcc)
int cc_channel; // Channel we want to dump in srt mode
struct ccx_decoder_608_settings *settings_608; // Contains the settings for the 608 decoder.
ccx_decoder_dtvcc_settings *settings_dtvcc; // Same for cea 708 captions decoder (dtvcc)
int cc_channel; // Channel we want to dump in srt mode
unsigned send_to_srv;
unsigned int hauppauge_mode; // If 1, use PID=1003, process specially and so on
unsigned int hauppauge_mode; // If 1, use PID=1003, process specially and so on
int program_number;
enum ccx_code_type codec;
int xds_write_to_file;
@@ -142,17 +141,17 @@ struct lib_cc_decode
void *context_cc608_field_1;
void *context_cc608_field_2;
int no_rollup; // If 1, write one line at a time
int no_rollup; // If 1, write one line at a time
int noscte20;
int fix_padding; // Replace 0000 with 8080 in HDTV (needed for some cards)
enum ccx_output_format write_format; // 0 = Raw, 1 = srt, 2 = SMI
int fix_padding; // Replace 0000 with 8080 in HDTV (needed for some cards)
enum ccx_output_format write_format; // 0 = Raw, 1 = srt, 2 = SMI
struct ccx_boundary_time extraction_start, extraction_end; // Segment we actually process
LLONG subs_delay; // ms to delay (or advance) subs
int extract; // Extract 1st, 2nd or both fields
int fullbin; // Disable pruning of padding cc blocks
LLONG subs_delay; // ms to delay (or advance) subs
int extract; // Extract 1st, 2nd or both fields
int fullbin; // Disable pruning of padding cc blocks
struct cc_subtitle dec_sub;
enum ccx_bufferdata_type in_bufferdatatype;
unsigned int hauppauge_mode; // If 1, use PID=1003, process specially and so on
unsigned int hauppauge_mode; // If 1, use PID=1003, process specially and so on
int frames_since_last_gop;
/* GOP-based timing */
@@ -185,7 +184,7 @@ struct lib_cc_decode
int in_pic_data;
unsigned int current_progressive_sequence;
unsigned int current_pulldownfields ;
unsigned int current_pulldownfields;
int temporal_reference;
enum ccx_frame_type picture_coding_type;
@@ -197,18 +196,19 @@ struct lib_cc_decode
/* Required in es_function.c and es_userdata.c */
unsigned top_field_first; // Needs to be global
/* Stats. Modified in es_userdata.c*/
int stat_numuserheaders;
int stat_dvdccheaders;
int stat_scte20ccheaders;
int stat_replay5000headers;
int stat_replay4000headers;
int stat_dishheaders;
int stat_hdtv;
int stat_divicom;
int false_pict_header;
/* Stats. Modified in es_userdata.c*/
int stat_numuserheaders;
int stat_dvdccheaders;
int stat_scte20ccheaders;
int stat_replay5000headers;
int stat_replay4000headers;
int stat_dishheaders;
int stat_hdtv;
int stat_divicom;
int false_pict_header;
dtvcc_ctx *dtvcc;
void *dtvcc_rust; // Persistent Rust CEA-708 decoder context
int current_field;
// Analyse/use the picture information
int maxtref; // Used to remember the temporal reference number
@@ -217,7 +217,7 @@ struct lib_cc_decode
// Store fts;
LLONG cc_fts[SORTBUF];
// Store HD CC packets
unsigned char cc_data_pkts[SORTBUF][10*31*3+1]; // *10, because MP4 seems to have different limits
unsigned char cc_data_pkts[SORTBUF][10 * 31 * 3 + 1]; // *10, because MP4 seems to have different limits
// The sequence number of the current anchor frame. All currently read
// B-Frames belong to this I- or P-frame.
@@ -227,7 +227,7 @@ struct lib_cc_decode
int (*writedata)(const unsigned char *data, int length, void *private_data, struct cc_subtitle *sub);
//dvb subtitle related
// dvb subtitle related
int ocr_quantmode;
struct lib_cc_decode *prev;
};

@@ -18,11 +18,10 @@ struct ccx_decoder_vbi_ctx
{
int vbi_decoder_inited;
vbi_raw_decoder zvbi_decoder;
//vbi3_raw_decoder zvbi_decoder;
// vbi3_raw_decoder zvbi_decoder;
#ifdef VBI_DEBUG
FILE *vbi_debug_dump;
#endif
};
#endif

@@ -175,6 +175,8 @@ void xdsprint(struct cc_subtitle *sub, struct ccx_decoders_xds_context *ctx, con
if (n > -1 && n < size)
{
write_xds_string(sub, ctx, p, n);
/* Note: Don't free(p) here - the pointer is stored in data->xds_str
and will be freed by the encoder or decoder cleanup code */
return;
}
/* Else try again with more space. */
@@ -347,9 +349,9 @@ void xds_do_copy_generation_management_system(struct cc_subtitle *sub, struct cc
const char *copytext[4] = {"Copy permitted (no restrictions)", "No more copies (one generation copy has been made)",
"One generation of copies can be made", "No copying is permitted"};
const char *apstext[4] = {"No APS", "PSP On; Split Burst Off", "PSP On; 2 line Split Burst On", "PSP On; 4 line Split Burst On"};
sprintf(copy_permited, "CGMS: %s", copytext[cgms_a_b4 * 2 + cgms_a_b3]);
sprintf(aps, "APS: %s", apstext[aps_b2 * 2 + aps_b1]);
sprintf(rcd, "Redistribution Control Descriptor: %d", rcd0);
snprintf(copy_permited, sizeof(copy_permited), "CGMS: %s", copytext[cgms_a_b4 * 2 + cgms_a_b3]);
snprintf(aps, sizeof(aps), "APS: %s", apstext[aps_b2 * 2 + aps_b1]);
snprintf(rcd, sizeof(rcd), "Redistribution Control Descriptor: %d", rcd0);
}
xdsprint(sub, ctx, copy_permited);
@@ -407,30 +409,45 @@ void xds_do_content_advisory(struct cc_subtitle *sub, struct ccx_decoders_xds_co
const char *agetext[8] = {"None", "TV-Y (All Children)", "TV-Y7 (Older Children)",
"TV-G (General Audience)", "TV-PG (Parental Guidance Suggested)",
"TV-14 (Parents Strongly Cautioned)", "TV-MA (Mature Audience Only)", "None"};
sprintf(age, "ContentAdvisory: US TV Parental Guidelines. Age Rating: %s", agetext[g2 * 4 + g1 * 2 + g0]);
snprintf(age, sizeof(age), "ContentAdvisory: US TV Parental Guidelines. Age Rating: %s", agetext[g2 * 4 + g1 * 2 + g0]);
content[0] = 0;
size_t content_len = 0;
if (!g2 && g1 && !g0) // For TV-Y7 (Older children), the Violence bit is "fantasy violence"
{
if (FV)
strcpy(content, "[Fantasy Violence] ");
{
snprintf(content, sizeof(content), "[Fantasy Violence] ");
content_len = strlen(content);
}
}
else // For all others, is real
{
if (FV)
strcpy(content, "[Violence] ");
{
snprintf(content, sizeof(content), "[Violence] ");
content_len = strlen(content);
}
}
if (S)
strcat(content, "[Sexual Situations] ");
{
snprintf(content + content_len, sizeof(content) - content_len, "[Sexual Situations] ");
content_len = strlen(content);
}
if (La3)
strcat(content, "[Adult Language] ");
{
snprintf(content + content_len, sizeof(content) - content_len, "[Adult Language] ");
content_len = strlen(content);
}
if (Da2)
strcat(content, "[Sexually Suggestive Dialog] ");
{
snprintf(content + content_len, sizeof(content) - content_len, "[Sexually Suggestive Dialog] ");
}
supported = 1;
}
if (!a0) // MPA
{
const char *ratingtext[8] = {"N/A", "G", "PG", "PG-13", "R", "NC-17", "X", "Not Rated"};
sprintf(rating, "ContentAdvisory: MPA Rating: %s", ratingtext[r2 * 4 + r1 * 2 + r0]);
snprintf(rating, sizeof(rating), "ContentAdvisory: MPA Rating: %s", ratingtext[r2 * 4 + r1 * 2 + r0]);
supported = 1;
}
if (a0 && a1 && !Da2 && !La3) // Canadian English Language Rating
@@ -438,7 +455,7 @@ void xds_do_content_advisory(struct cc_subtitle *sub, struct ccx_decoders_xds_co
const char *ratingtext[8] = {"Exempt", "Children", "Children eight years and older",
"General programming suitable for all audiences", "Parental Guidance",
"Viewers 14 years and older", "Adult Programming", "[undefined]"};
sprintf(rating, "ContentAdvisory: Canadian English Rating: %s", ratingtext[g2 * 4 + g1 * 2 + g0]);
snprintf(rating, sizeof(rating), "ContentAdvisory: Canadian English Rating: %s", ratingtext[g2 * 4 + g1 * 2 + g0]);
supported = 1;
}
if (a0 && a1 && Da2 && !La3) // Canadian French Language Rating
@@ -447,7 +464,7 @@ void xds_do_content_advisory(struct cc_subtitle *sub, struct ccx_decoders_xds_co
"Cette ?mission peut ne pas convenir aux enfants de moins de 13 ans",
"Cette ?mission ne convient pas aux moins de 16 ans",
"Cette ?mission est r?serv?e aux adultes", "[invalid]", "[invalid]"};
sprintf(rating, "ContentAdvisory: Canadian French Rating: %s", ratingtext[g2 * 4 + g1 * 2 + g0]);
snprintf(rating, sizeof(rating), "ContentAdvisory: Canadian French Rating: %s", ratingtext[g2 * 4 + g1 * 2 + g0]);
supported = 1;
}
}
@@ -455,14 +472,17 @@ void xds_do_content_advisory(struct cc_subtitle *sub, struct ccx_decoders_xds_co
if (!a1 && a0) // US TV parental guidelines
{
xdsprint(sub, ctx, age);
xdsprint(sub, ctx, content);
if (content[0]) // Only output content if not empty
xdsprint(sub, ctx, content);
if (changed)
{
ccx_common_logging.log_ftn("\rXDS: %s\n ", age);
ccx_common_logging.log_ftn("\rXDS: %s\n ", content);
if (content[0])
ccx_common_logging.log_ftn("\rXDS: %s\n ", content);
}
ccx_common_logging.debug_ftn(CCX_DMT_DECODER_XDS, "\rXDS: %s\n", age);
ccx_common_logging.debug_ftn(CCX_DMT_DECODER_XDS, "\rXDS: %s\n", content);
if (content[0])
ccx_common_logging.debug_ftn(CCX_DMT_DECODER_XDS, "\rXDS: %s\n", content);
}
if (!a0 || // MPA
(a0 && a1 && !Da2 && !La3) || // Canadian English Language Rating
@@ -484,6 +504,10 @@ int xds_do_current_and_future(struct cc_subtitle *sub, struct ccx_decoders_xds_c
int was_proc = 0;
char *str = malloc(1024);
if (!str)
{
return CCX_ENOMEM;
}
char *tstr = NULL;
int str_len = 1024;
@@ -712,7 +736,8 @@ int xds_do_current_and_future(struct cc_subtitle *sub, struct ccx_decoders_xds_c
if (changed)
{
ccx_common_logging.log_ftn("\rXDS description line %d: %s\n", line_num, xds_desc);
strcpy(ctx->xds_program_description[line_num], xds_desc);
strncpy(ctx->xds_program_description[line_num], xds_desc, 32);
ctx->xds_program_description[line_num][32] = '\0';
}
else
{
@@ -749,7 +774,8 @@ int xds_do_channel(struct cc_subtitle *sub, struct ccx_decoders_xds_context *ctx
if (strcmp(xds_network_name, ctx->current_xds_network_name)) // Change of station
{
ccx_common_logging.log_ftn("XDS Notice: Network is now %s\n", xds_network_name);
strcpy(ctx->current_xds_network_name, xds_network_name);
strncpy(ctx->current_xds_network_name, xds_network_name, 32);
ctx->current_xds_network_name[32] = '\0';
}
break;
case XDS_TYPE_CALL_LETTERS_AND_CHANNEL:
@@ -800,12 +826,19 @@ int xds_do_private_data(struct cc_subtitle *sub, struct ccx_decoders_xds_context
if (!ctx)
return CCX_EINVAL;
str = malloc((ctx->cur_xds_payload_length * 3) + 1);
size_t str_size = (ctx->cur_xds_payload_length * 3) + 1;
str = malloc(str_size);
if (str == NULL) // Only thing we can do with private data is dump it.
return 1;
str[0] = '\0';
size_t offset = 0;
for (i = 2; i < ctx->cur_xds_payload_length - 1; i++)
sprintf(str, "%02X ", ctx->cur_xds_payload[i]);
{
int written = snprintf(str + offset, str_size - offset, "%02X ", ctx->cur_xds_payload[i]);
if (written > 0)
offset += written;
}
xdsprint(sub, ctx, str);
free(str);
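
The private-data hunk above replaces a `sprintf` that rewrote the start of the buffer on every iteration with an `snprintf` that appends at a running offset. A minimal, self-contained sketch of that append pattern follows. `demo_hex_dump` is a hypothetical helper written for illustration, not a function from the patched file, and it stops on truncation, which is slightly stricter than the check in the hunk.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the patched file): formats `len` bytes as
   "XX XX ... " into a freshly allocated string. Returns NULL on allocation
   failure; the caller owns the result and must free() it. */
static char *demo_hex_dump(const unsigned char *payload, size_t len)
{
    size_t str_size = len * 3 + 1; /* "XX " per byte plus the terminator */
    char *str = malloc(str_size);
    if (str == NULL)
        return NULL;

    str[0] = '\0';
    size_t offset = 0;
    for (size_t i = 0; i < len; i++)
    {
        /* Write at the current end of the string, bounded by the space left. */
        int written = snprintf(str + offset, str_size - offset, "%02X ", payload[i]);
        if (written < 0 || (size_t)written >= str_size - offset)
            break; /* encoding error or truncation: stop appending */
        offset += (size_t)written;
    }
    assert(offset < str_size); /* the terminator always fits */
    return str;
}
```

The same offset bookkeeping applies to the content-advisory flags earlier in this file, where each `strcat` becomes an `snprintf(content + content_len, sizeof(content) - content_len, ...)`.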

@@ -4,11 +4,11 @@
#include "ccx_decoders_common.h"
#define NUM_BYTES_PER_PACKET 35 // Class + type (repeated for convenience) + data + zero
#define NUM_XDS_BUFFERS 9 // CEA recommends no more than one level of interleaving. Play it safe
#define NUM_XDS_BUFFERS 9 // CEA recommends no more than one level of interleaving. Play it safe
struct ccx_decoders_xds_context;
void process_xds_bytes (struct ccx_decoders_xds_context *ctx, const unsigned char hi, int lo);
void do_end_of_xds (struct cc_subtitle *sub, struct ccx_decoders_xds_context *ctx, unsigned char expected_checksum);
void process_xds_bytes(struct ccx_decoders_xds_context *ctx, const unsigned char hi, int lo);
void do_end_of_xds(struct cc_subtitle *sub, struct ccx_decoders_xds_context *ctx, unsigned char expected_checksum);
struct ccx_decoders_xds_context *ccx_decoders_xds_init_library(struct ccx_common_timing_ctx *timing, int xds_write_to_file);

@@ -3,9 +3,20 @@
#include "lib_ccx.h"
#include "utility.h"
#include "ffmpeg_intgr.h"
#ifndef DISABLE_RUST
void ccxr_demuxer_reset(struct ccx_demuxer *ctx);
void ccxr_demuxer_close(struct ccx_demuxer *ctx);
int ccxr_demuxer_isopen(const struct ccx_demuxer *ctx);
int ccxr_demuxer_open(struct ccx_demuxer *ctx, const char *file);
LLONG ccxr_demuxer_get_file_size(struct ccx_demuxer *ctx);
void ccxr_demuxer_print_cfg(const struct ccx_demuxer *ctx);
#endif
static void ccx_demuxer_reset(struct ccx_demuxer *ctx)
{
#ifndef DISABLE_RUST
ccxr_demuxer_reset(ctx);
#else
ctx->startbytes_pos = 0;
ctx->startbytes_avail = 0;
ctx->num_of_PIDs = 0;
@@ -17,10 +28,14 @@ static void ccx_demuxer_reset(struct ccx_demuxer *ctx)
}
memset(ctx->stream_id_of_each_pid, 0, (MAX_PSI_PID + 1) * sizeof(uint8_t));
memset(ctx->PIDs_programs, 0, 65536 * sizeof(struct PMT_entry *));
#endif
}
static void ccx_demuxer_close(struct ccx_demuxer *ctx)
{
#ifndef DISABLE_RUST
ccxr_demuxer_close(ctx);
#else
ctx->past = 0;
if (ctx->infd != -1 && ccx_options.input_source == CCX_DS_FILE)
{
@@ -28,14 +43,23 @@ static void ccx_demuxer_close(struct ccx_demuxer *ctx)
ctx->infd = -1;
activity_input_file_closed();
}
#endif
}
static int ccx_demuxer_isopen(struct ccx_demuxer *ctx)
{
#ifndef DISABLE_RUST
return ccxr_demuxer_isopen(ctx);
#else
return ctx->infd != -1;
#endif
}
static int ccx_demuxer_open(struct ccx_demuxer *ctx, const char *file)
{
#ifndef DISABLE_RUST
return ccxr_demuxer_open(ctx, file);
#else
ctx->past = 0;
ctx->min_global_timestamp = 0;
ctx->global_timestamp_inited = 0;
@@ -193,9 +217,14 @@ static int ccx_demuxer_open(struct ccx_demuxer *ctx, const char *file)
}
return 0;
#endif
}
LLONG ccx_demuxer_get_file_size(struct ccx_demuxer *ctx)
{
#ifndef DISABLE_RUST
return ccxr_demuxer_get_file_size(ctx);
#else
LLONG ret = 0;
int in = ctx->infd;
LLONG current = LSEEK(in, 0, SEEK_CUR);
@@ -208,6 +237,7 @@ LLONG ccx_demuxer_get_file_size(struct ccx_demuxer *ctx)
return -1;
return length;
#endif
}
static int ccx_demuxer_get_stream_mode(struct ccx_demuxer *ctx)
@@ -217,6 +247,9 @@ static int ccx_demuxer_get_stream_mode(struct ccx_demuxer *ctx)
static void ccx_demuxer_print_cfg(struct ccx_demuxer *ctx)
{
#ifndef DISABLE_RUST
ccxr_demuxer_print_cfg(ctx);
#else
switch (ctx->auto_stream)
{
case CCX_SM_ELEMENTARY_OR_NOT_FOUND:
@@ -252,6 +285,9 @@ static void ccx_demuxer_print_cfg(struct ccx_demuxer *ctx)
case CCX_SM_MXF:
mprint("MXF");
break;
case CCX_SM_SCC:
mprint("SCC");
break;
#ifdef WTV_DEBUG
case CCX_SM_HEX_DUMP:
mprint("Hex");
@@ -261,6 +297,7 @@ static void ccx_demuxer_print_cfg(struct ccx_demuxer *ctx)
fatal(CCX_COMMON_EXIT_BUG_BUG, "BUG: Unknown stream mode. Please file a bug report on Github.\n");
break;
}
#endif
}
void ccx_demuxer_delete(struct ccx_demuxer **ctx)
@@ -292,7 +329,7 @@ void ccx_demuxer_delete(struct ccx_demuxer **ctx)
struct ccx_demuxer *init_demuxer(void *parent, struct demuxer_cfg *cfg)
{
int i;
struct ccx_demuxer *ctx = malloc(sizeof(struct ccx_demuxer));
struct ccx_demuxer *ctx = calloc(1, sizeof(struct ccx_demuxer));
if (!ctx)
return NULL;
@@ -314,7 +351,6 @@ struct ccx_demuxer *init_demuxer(void *parent, struct demuxer_cfg *cfg)
{
ctx->pinfo[i].got_important_streams_min_pts[j] = UINT64_MAX;
}
ctx->pinfo[i].initialized_ocr = 0;
ctx->pinfo[i].version = 0xFF; // Not real in a real stream since it's 5 bits. FF => Not initialized
}
@@ -407,4 +443,4 @@ struct demuxer_data *alloc_demuxer_data(void)
data->next_stream = 0;
data->next_program = 0;
return data;
}
}
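
`init_demuxer` switches from `malloc` to `calloc`, and the explicit `initialized_ocr = 0` assignment disappears along with the field. The sketch below illustrates why zero-initialising the whole context makes per-field resets unnecessary while non-zero sentinels still need an explicit store. The `demo_` types and `DEMO_MAX_PROGRAM` are assumptions for illustration and only loosely mirror the real structs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_PROGRAM 4 /* hypothetical; the real MAX_PROGRAM is not shown in this diff */

struct demo_program_info
{
    int pid;
    int has_all_min_pts;
    uint8_t version;
};

struct demo_demuxer
{
    int nb_program;
    struct demo_program_info pinfo[DEMO_MAX_PROGRAM];
};

/* Hypothetical constructor: returns a zeroed context or NULL on failure. */
static struct demo_demuxer *demo_init_demuxer(void)
{
    /* calloc sets every byte to zero, so all integer fields start at 0. */
    struct demo_demuxer *ctx = calloc(1, sizeof(*ctx));
    if (!ctx)
        return NULL;
    assert(ctx->nb_program == 0);

    /* Only fields whose initial value is not zero need explicit setup. */
    for (int i = 0; i < DEMO_MAX_PROGRAM; i++)
        ctx->pinfo[i].version = 0xFF; /* sentinel: not initialised yet */
    return ctx;
}
```

Fields added to the struct later also start from a defined value with this approach, which `malloc` does not guarantee.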

@@ -25,22 +25,21 @@ enum STREAM_TYPE
};
struct ccx_demux_report
{
unsigned program_cnt;
unsigned dvb_sub_pid[SUB_STREAMS_CNT];
unsigned tlt_sub_pid[SUB_STREAMS_CNT];
unsigned mp4_cc_track_cnt;
unsigned program_cnt;
unsigned dvb_sub_pid[SUB_STREAMS_CNT];
unsigned tlt_sub_pid[SUB_STREAMS_CNT];
unsigned mp4_cc_track_cnt;
};
struct program_info
{
int pid;
int program_number;
int initialized_ocr; // Avoid initializing the OCR more than once
uint8_t analysed_PMT_once:1;
uint8_t analysed_PMT_once : 1;
uint8_t version;
uint8_t saved_section[1021];
int32_t crc;
uint8_t valid_crc:1;
uint8_t valid_crc : 1;
char name[MAX_PROGRAM_NAME_LEN];
/**
* A pcr_pid of -1 means that no PCR PID is available
@@ -48,6 +47,7 @@ struct program_info
int16_t pcr_pid;
uint64_t got_important_streams_min_pts[COUNT];
int has_all_min_pts;
char virtual_channel[16]; // Stores ATSC virtual channel like "2.1"
};
struct cap_info
@@ -56,9 +56,9 @@ struct cap_info
int program_number;
enum ccx_stream_type stream;
enum ccx_code_type codec;
long capbufsize;
int64_t capbufsize;
unsigned char *capbuf;
long capbuflen; // Bytes read in capbuf
int64_t capbuflen; // Bytes read in capbuf
int saw_pesstart;
int prev_counter;
void *codec_private_data;
@@ -77,7 +77,6 @@ struct cap_info
List joining all sibling Stream in Program
*/
struct list_head pg_stream;
};
struct ccx_demuxer
{
@@ -97,7 +96,6 @@ struct ccx_demuxer
int flag_ts_forced_cappid;
int ts_datastreamtype;
struct program_info pinfo[MAX_PROGRAM];
int nb_program;
/* subtitle codec type */
@@ -119,10 +117,10 @@ struct ccx_demuxer
struct PSI_buffer *PID_buffers[MAX_PSI_PID];
int PIDs_seen[MAX_PID];
/*51 possible stream ids in total, 0xbd is private stream, 0xbe is padding stream,
0xbf private stream 2, 0xc0 - 0xdf audio, 0xe0 - 0xef video
/*51 possible stream ids in total, 0xbd is private stream, 0xbe is padding stream,
0xbf private stream 2, 0xc0 - 0xdf audio, 0xe0 - 0xef video
(stream ids range from 0xbd to 0xef so 0xef - 0xbd + 1 = 51)*/
//uint8_t found_stream_ids[MAX_NUM_OF_STREAMIDS];
// uint8_t found_stream_ids[MAX_NUM_OF_STREAMIDS];
uint8_t stream_id_of_each_pid[MAX_PSI_PID + 1];
uint64_t min_pts[MAX_PSI_PID + 1];
@@ -141,7 +139,7 @@ struct ccx_demuxer
unsigned last_pat_length;
unsigned char *filebuffer;
LLONG filebuffer_start; // Position of buffer start relative to file
LLONG filebuffer_start; // Position of buffer start relative to file
unsigned int filebuffer_pos; // Position of pointer relative to buffer start
unsigned int bytesinbuffer; // Number of bytes we actually have on buffer
@@ -156,7 +154,7 @@ struct ccx_demuxer
void *parent;
//Will contain actual Demuxer Context
// Will contain actual Demuxer Context
void *private_data;
void (*print_cfg)(struct ccx_demuxer *ctx);
void (*reset)(struct ccx_demuxer *ctx);
@@ -164,7 +162,7 @@ struct ccx_demuxer
int (*open)(struct ccx_demuxer *ctx, const char *file_name);
int (*is_open)(struct ccx_demuxer *ctx);
int (*get_stream_mode)(struct ccx_demuxer *ctx);
LLONG (*get_filesize) (struct ccx_demuxer *ctx);
LLONG (*get_filesize)(struct ccx_demuxer *ctx);
};
struct demuxer_data
@@ -182,21 +180,21 @@ struct demuxer_data
struct demuxer_data *next_program;
};
struct cap_info *get_sib_stream_by_type(struct cap_info* program, enum ccx_code_type type);
struct cap_info *get_sib_stream_by_type(struct cap_info *program, enum ccx_code_type type);
struct ccx_demuxer *init_demuxer(void *parent, struct demuxer_cfg *cfg);
void ccx_demuxer_delete(struct ccx_demuxer **ctx);
struct demuxer_data* alloc_demuxer_data(void);
struct demuxer_data *alloc_demuxer_data(void);
void delete_demuxer_data(struct demuxer_data *data);
int update_capinfo(struct ccx_demuxer *ctx, int pid, enum ccx_stream_type stream, enum ccx_code_type codec, int pn, void *private_data);
struct cap_info * get_cinfo(struct ccx_demuxer *ctx, int pid);
struct cap_info *get_cinfo(struct ccx_demuxer *ctx, int pid);
int need_cap_info(struct ccx_demuxer *ctx, int program_number);
int need_cap_info_for_pid(struct ccx_demuxer *ctx, int pid);
struct demuxer_data *get_best_data(struct demuxer_data *data);
struct demuxer_data *get_data_stream(struct demuxer_data *data, int pid);
int get_best_stream(struct ccx_demuxer *ctx);
void ignore_other_stream(struct ccx_demuxer *ctx, int pid);
void dinit_cap (struct ccx_demuxer *ctx);
void dinit_cap(struct ccx_demuxer *ctx);
int get_programme_number(struct ccx_demuxer *ctx, int pid);
struct cap_info* get_best_sib_stream(struct cap_info* program);
void ignore_other_sib_stream(struct cap_info* head, int pid);
struct cap_info *get_best_sib_stream(struct cap_info *program);
void ignore_other_sib_stream(struct cap_info *head, int pid);
#endif

@@ -14,13 +14,6 @@
#define IS_KLV_KEY(x, y) (!memcmp(x, y, sizeof(y)))
#define IS_KLV_KEY_ANY_VERSION(x, y) (!memcmp(x, y, 7) && !memcmp(x + 8, y + 8, sizeof(y) - 8))
enum MXFCaptionType
{
MXF_CT_VBI,
MXF_CT_ANC,
};
typedef uint8_t UID[16];
typedef struct KLVPacket
{
UID key;
@@ -35,29 +28,12 @@ typedef struct MXFCodecUL
typedef int ReadFunc(struct ccx_demuxer *ctx, uint64_t size);
typedef struct
{
int track_id;
uint8_t track_number[4];
} MXFTrack;
typedef struct MXFReadTableEntry
{
const UID key;
ReadFunc *read;
} MXFReadTableEntry;
typedef struct MXFContext
{
enum MXFCaptionType type;
int cap_track_id;
UID cap_essence_key;
MXFTrack tracks[32];
int nb_tracks;
int cap_count;
struct ccx_rational edit_rate;
} MXFContext;
typedef struct MXFLocalTAGS
{
uint16_t tag;
@@ -99,12 +75,15 @@ enum MXFLocalTag
void update_tid_lut(struct MXFContext *ctx, uint32_t track_id, uint8_t *track_number, struct ccx_rational edit_rate)
{
int i;
debug("update_tid_lut: track_id=%u (0x%x), track_number=%02X%02X%02X%02X, cap_track_id=%u\n",
track_id, track_id, track_number[0], track_number[1], track_number[2], track_number[3], ctx->cap_track_id);
// Update essence element key if we have track Id of caption
if (ctx->cap_track_id == track_id)
{
memcpy(ctx->cap_essence_key, mxf_essence_element_key, 12);
memcpy(ctx->cap_essence_key + 12, track_number, 4);
ctx->edit_rate = edit_rate;
debug("MXF: Found caption track, track_id=%u\n", track_id);
}
for (i = 0; i < ctx->nb_tracks; i++)
@@ -272,6 +251,7 @@ static int mxf_read_vanc_vbi_desc(struct ccx_demuxer *demux, uint64_t size)
{
case MXF_TAG_LTRACK_ID:
ctx->cap_track_id = buffered_get_be32(demux);
debug("MXF: VANC/VBI descriptor found, Linked Track ID = %u\n", ctx->cap_track_id);
update_cap_essence_key(ctx, ctx->cap_track_id);
break;
default:
@@ -328,6 +308,17 @@ static int mxf_read_cdp_data(struct ccx_demuxer *demux, int size, struct demuxer
log("Incomplete CDP packet\n");
ret = buffered_read(demux, data->buffer + data->len, cc_count * 3);
// Log first few bytes of cc_data for debugging
if (cc_count > 0)
{
unsigned char *cc_ptr = data->buffer + data->len;
debug("cc_data (first 6 triplets): ");
for (int j = 0; j < (cc_count < 6 ? cc_count : 6); j++)
{
debug("%02X%02X%02X ", cc_ptr[j * 3], cc_ptr[j * 3 + 1], cc_ptr[j * 3 + 2]);
}
debug("\n");
}
data->len += cc_count * 3;
demux->past += cc_count * 3;
len += ret;
@@ -385,7 +376,10 @@ static int mxf_read_vanc_data(struct ccx_demuxer *demux, uint64_t size, struct d
// uint8_t count; /* Currently unused */
if (size < 19)
{
debug("VANC data too small: %" PRIu64 " < 19\n", size);
goto error;
}
ret = buffered_read(demux, vanc_header, 16);
@@ -394,31 +388,39 @@ static int mxf_read_vanc_data(struct ccx_demuxer *demux, uint64_t size, struct d
return CCX_EOF;
len += ret;
debug("VANC header: num_packets=%d, line=0x%02x%02x, wrap_type=0x%02x, sample_config=0x%02x\n",
vanc_header[1], vanc_header[2], vanc_header[3], vanc_header[4], vanc_header[5]);
for (int i = 0; i < vanc_header[1]; i++)
{
DID = buffered_get_byte(demux);
len++;
debug("VANC packet %d: DID=0x%02x\n", i, DID);
if (!(DID == 0x61 || DID == 0x80))
{
debug("DID 0x%02x not recognized as caption DID\n", DID);
goto error;
}
SDID = buffered_get_byte(demux);
len++;
debug("VANC packet %d: SDID=0x%02x\n", i, SDID);
if (SDID == 0x01)
debug("Caption Type 708\n");
else if (SDID == 0x02)
debug("Caption Type 608\n");
cdp_size = buffered_get_byte(demux);
debug("VANC packet %d: cdp_size=%d\n", i, cdp_size);
if (cdp_size + 19 > size)
{
debug("Incomplete cdp(%d) in anc data(%d)\n", cdp_size, size);
log("Incomplete cdp(%d) in anc data(%" PRIu64 ")\n", cdp_size, size);
goto error;
}
len++;
ret = mxf_read_cdp_data(demux, cdp_size, data);
debug("mxf_read_cdp_data returned %d, data->len=%d\n", ret, data->len);
len += ret;
// len += (3 + count + 4);
}
@@ -435,15 +437,33 @@ static int mxf_read_essence_element(struct ccx_demuxer *demux, uint64_t size, st
int ret;
struct MXFContext *ctx = demux->private_data;
debug("mxf_read_essence_element: ctx->type=%d (ANC=%d, VBI=%d), size=%" PRIu64 "\n",
ctx->type, MXF_CT_ANC, MXF_CT_VBI, size);
if (ctx->type == MXF_CT_ANC)
{
data->bufferdatatype = CCX_RAW_TYPE;
ret = mxf_read_vanc_data(demux, size, data);
data->pts = ctx->cap_count;
debug("mxf_read_vanc_data returned %d, data->len=%d\n", ret, data->len);
// Calculate PTS in 90kHz units from frame count and edit rate
// edit_rate is frames per second (e.g., 25/1 for 25fps)
// PTS = frame_count * 90000 / fps = frame_count * 90000 * edit_rate.den / edit_rate.num
if (ctx->edit_rate.num > 0 && ctx->edit_rate.den > 0)
{
data->pts = (int64_t)ctx->cap_count * 90000 * ctx->edit_rate.den / ctx->edit_rate.num;
}
else
{
// Fallback to 25fps if edit_rate not set
data->pts = (int64_t)ctx->cap_count * 90000 / 25;
}
debug("Frame %d, PTS=%" PRId64 " (edit_rate=%d/%d)\n",
ctx->cap_count, data->pts, ctx->edit_rate.num, ctx->edit_rate.den);
ctx->cap_count++;
}
else
{
debug("Skipping essence element (not ANC type)\n");
ret = buffered_skip(demux, size);
demux->past += ret;
}
@@ -538,6 +558,7 @@ static int read_packet(struct ccx_demuxer *demux, struct demuxer_data *data)
KLVPacket klv;
const MXFReadTableEntry *reader;
struct MXFContext *ctx = demux->private_data;
static int first_essence_logged = 0;
while ((ret = klv_read_packet(&klv, demux)) == 0)
{
debug("Key %02X%02X%02X%02X%02X%02X%02X%02X.%02X%02X%02X%02X%02X%02X%02X%02X size %" PRIu64 "\n",
@@ -547,8 +568,25 @@ static int read_packet(struct ccx_demuxer *demux, struct demuxer_data *data)
klv.key[12], klv.key[13], klv.key[14], klv.key[15],
klv.length);
// Check if this is an essence element key (first 12 bytes match)
if (IS_KLV_KEY(klv.key, mxf_essence_element_key) && !first_essence_logged)
{
debug("MXF: First essence element key: %02X%02X%02X%02X%02X%02X%02X%02X.%02X%02X%02X%02X%02X%02X%02X%02X\n",
klv.key[0], klv.key[1], klv.key[2], klv.key[3],
klv.key[4], klv.key[5], klv.key[6], klv.key[7],
klv.key[8], klv.key[9], klv.key[10], klv.key[11],
klv.key[12], klv.key[13], klv.key[14], klv.key[15]);
debug("MXF: cap_essence_key: %02X%02X%02X%02X%02X%02X%02X%02X.%02X%02X%02X%02X%02X%02X%02X%02X\n",
ctx->cap_essence_key[0], ctx->cap_essence_key[1], ctx->cap_essence_key[2], ctx->cap_essence_key[3],
ctx->cap_essence_key[4], ctx->cap_essence_key[5], ctx->cap_essence_key[6], ctx->cap_essence_key[7],
ctx->cap_essence_key[8], ctx->cap_essence_key[9], ctx->cap_essence_key[10], ctx->cap_essence_key[11],
ctx->cap_essence_key[12], ctx->cap_essence_key[13], ctx->cap_essence_key[14], ctx->cap_essence_key[15]);
first_essence_logged = 1;
}
if (IS_KLV_KEY(klv.key, ctx->cap_essence_key))
{
debug("MXF: Found ANC essence element, size=%" PRIu64 "\n", klv.length);
mxf_read_essence_element(demux, klv.length, data);
if (data->len > 0)
break;
@@ -590,8 +628,15 @@ int ccx_mxf_getmoredata(struct lib_ccx_ctx *ctx, struct demuxer_data **ppdata)
data->program_number = 1;
data->stream_pid = 1;
data->codec = CCX_CODEC_ATSC_CC;
data->tb.num = 1001;
data->tb.den = 30000;
// PTS is already calculated in 90kHz units by mxf_read_essence_element
data->tb.num = 1;
data->tb.den = 90000;
// Enable CEA-708 (DTVCC) decoder for MXF files with VANC captions
if (ctx->dec_global_setting && ctx->dec_global_setting->settings_dtvcc)
{
ctx->dec_global_setting->settings_dtvcc->enabled = 1;
}
}
else
{
@@ -600,6 +645,11 @@ int ccx_mxf_getmoredata(struct lib_ccx_ctx *ctx, struct demuxer_data **ppdata)
ret = read_packet(ctx->demux_ctx, data);
// Ensure timebase is 90kHz since PTS is calculated in 90kHz units
// CDP parsing may have set a frame-based timebase which would cause incorrect conversion
data->tb.num = 1;
data->tb.den = 90000;
return ret;
}
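
The MXF changes above stop using the raw frame counter as the PTS and instead convert it to 90 kHz ticks, with the timebase set to 1/90000 to match. A small sketch of that conversion is shown here, including the 25 fps fallback from the hunk. `demo_rational` and `demo_frame_to_pts` are illustrative names that stand in for `struct ccx_rational` and the inline computation.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct ccx_rational; edit_rate is frames per second as num/den. */
struct demo_rational
{
    int num;
    int den;
};

/* Hypothetical helper: converts a frame index to a PTS in 90 kHz ticks.
   PTS = frame_count * 90000 / fps = frame_count * 90000 * den / num. */
static int64_t demo_frame_to_pts(int frame_count, struct demo_rational edit_rate)
{
    assert(frame_count >= 0);
    if (edit_rate.num > 0 && edit_rate.den > 0)
        return (int64_t)frame_count * 90000 * edit_rate.den / edit_rate.num;

    /* Fallback used by the patch when no edit rate was parsed: assume 25 fps. */
    return (int64_t)frame_count * 90000 / 25;
}
```

Casting to `int64_t` before multiplying matters: at 30000/1001 the intermediate product exceeds the 32-bit range after roughly 23 frames.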

@@ -3,6 +3,31 @@
#include "ccx_demuxer.h"
typedef uint8_t UID[16];
enum MXFCaptionType
{
MXF_CT_VBI,
MXF_CT_ANC,
};
typedef struct
{
int track_id;
uint8_t track_number[4];
} MXFTrack;
typedef struct MXFContext
{
enum MXFCaptionType type;
int cap_track_id;
UID cap_essence_key;
MXFTrack tracks[32];
int nb_tracks;
int cap_count;
struct ccx_rational edit_rate;
} MXFContext;
int ccx_probe_mxf(struct ccx_demuxer *ctx);
struct MXFContext *ccx_mxf_init(struct ccx_demuxer *demux);
#endif

@@ -25,7 +25,7 @@ void dtvcc_process_data(struct dtvcc_ctx *dtvcc,
ccx_common_logging.debug_ftn(CCX_DMT_708, "[CEA-708] dtvcc_process_data: DTVCC Channel Packet Data\n");
if (cc_valid && dtvcc->is_current_packet_header_parsed)
{
if (dtvcc->current_packet_length > 253)
if (dtvcc->current_packet_length + 2 > CCX_DTVCC_MAX_PACKET_LENGTH)
{
ccx_common_logging.debug_ftn(CCX_DMT_708, "[CEA-708] dtvcc_process_data: "
"Warning: Legal packet size exceeded (1), data not added.\n");
@@ -51,7 +51,7 @@ void dtvcc_process_data(struct dtvcc_ctx *dtvcc,
ccx_common_logging.debug_ftn(CCX_DMT_708, "[CEA-708] dtvcc_process_data: DTVCC Channel Packet Start\n");
if (cc_valid)
{
if (dtvcc->current_packet_length > CCX_DTVCC_MAX_PACKET_LENGTH - 1)
if (dtvcc->current_packet_length + 2 > CCX_DTVCC_MAX_PACKET_LENGTH)
{
ccx_common_logging.debug_ftn(CCX_DMT_708, "[CEA-708] dtvcc_process_data: "
"Warning: Legal packet size exceeded (2), data not added.\n");
@@ -115,10 +115,10 @@ dtvcc_ctx *dtvcc_init(struct ccx_decoder_dtvcc_settings *opts)
dtvcc_service_decoder *decoder = &ctx->decoders[i];
decoder->cc_count = 0;
decoder->tv = (dtvcc_tv_screen *)malloc(sizeof(dtvcc_tv_screen));
decoder->tv->service_number = i + 1;
decoder->tv->cc_count = 0;
if (!decoder->tv)
ccx_common_logging.fatal_ftn(EXIT_NOT_ENOUGH_MEMORY, "dtvcc_init");
decoder->tv->service_number = i + 1;
decoder->tv->cc_count = 0;
for (int j = 0; j < CCX_DTVCC_MAX_WINDOWS; j++)
decoder->windows[j].memory_reserved = 0;
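
Both length checks in `dtvcc_process_data` now use the form `length + 2 > MAX`, which states directly that two more bytes must fit before they are stored. The sketch below shows that guard in isolation. The buffer size of 128 and all `demo_` names are assumptions for illustration, since the real value of `CCX_DTVCC_MAX_PACKET_LENGTH` is not visible in this diff.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PACKET_LENGTH 128 /* assumed size, for illustration only */

struct demo_packet
{
    unsigned char data[DEMO_MAX_PACKET_LENGTH];
    int length;
};

/* Hypothetical helper: appends one byte pair.
   Returns 1 on success, 0 if the pair would not fit (nothing is written). */
static int demo_append_pair(struct demo_packet *pkt, unsigned char data1, unsigned char data2)
{
    assert(pkt != NULL);
    /* Both bytes must fit; a check such as `length > MAX - 1` would still
       accept length == MAX - 1 and let the second byte land out of bounds. */
    if (pkt->length + 2 > DEMO_MAX_PACKET_LENGTH)
        return 0;
    pkt->data[pkt->length++] = data1;
    pkt->data[pkt->length++] = data2;
    return 1;
}
```

The reordering in `dtvcc_init` follows the same defensive idea: the result of `malloc` is checked before `decoder->tv` is dereferenced.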

@@ -10,4 +10,14 @@ void dtvcc_process_data(struct dtvcc_ctx *dtvcc,
dtvcc_ctx *dtvcc_init(ccx_decoder_dtvcc_settings *opts);
void dtvcc_free(dtvcc_ctx **);
#endif //CCEXTRACTOR_CCX_DTVCC_H
#ifndef DISABLE_RUST
// Rust FFI functions for persistent CEA-708 decoder
extern void *ccxr_dtvcc_init(struct ccx_decoder_dtvcc_settings *settings_dtvcc);
extern void ccxr_dtvcc_free(void *dtvcc_rust);
extern void ccxr_dtvcc_process_data(void *dtvcc_rust, const unsigned char cc_valid,
const unsigned char cc_type, const unsigned char data1, const unsigned char data2);
extern int ccxr_dtvcc_is_active(void *dtvcc_rust);
extern void ccxr_dtvcc_set_active(void *dtvcc_rust, int active);
#endif
#endif // CCEXTRACTOR_CCX_DTVCC_H

@@ -30,7 +30,7 @@ ccx_encoders_transcript_format ccx_encoders_default_transcript_settings =
.useColors = 1,
.isFinal = 0};
// TODO sami header doesn't carry about CRLF/LF option
// TODO sami header doesn't care about CRLF/LF option
static const char *sami_header = // TODO: Revise the <!-- comments
"<SAMI>\n\
<HEAD>\n\
@@ -131,7 +131,7 @@ int write_subtitle_file_footer(struct encoder_ctx *ctx, struct ccx_s_write *out)
switch (ctx->write_format)
{
case CCX_OF_SAMI:
sprintf((char *)str, "</BODY></SAMI>\n");
snprintf((char *)str, sizeof(str), "</BODY></SAMI>\n");
if (ctx->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
@@ -144,7 +144,7 @@ int write_subtitle_file_footer(struct encoder_ctx *ctx, struct ccx_s_write *out)
}
break;
case CCX_OF_SMPTETT:
sprintf((char *)str, " </div>\n </body>\n</tt>\n");
snprintf((char *)str, sizeof(str), " </div>\n </body>\n</tt>\n");
if (ctx->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
@@ -160,7 +160,7 @@ int write_subtitle_file_footer(struct encoder_ctx *ctx, struct ccx_s_write *out)
write_spumux_footer(out);
break;
case CCX_OF_SIMPLE_XML:
sprintf((char *)str, "</captions>\n");
snprintf((char *)str, sizeof(str), "</captions>\n");
if (ctx->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
@@ -176,6 +176,14 @@ int write_subtitle_file_footer(struct encoder_ctx *ctx, struct ccx_s_write *out)
case CCX_OF_CCD:
ret = write(out->fh, ctx->encoded_crlf, ctx->encoded_crlf_length);
break;
case CCX_OF_WEBVTT:
// Ensure WebVTT header is written even if no subtitles were found (issue #1743)
// This is required for HLS compatibility
if (!ctx->wrote_webvtt_header)
{
write_webvtt_header(ctx);
}
break;
default: // Nothing to do, no footer on this format
break;
}
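
The new `CCX_OF_WEBVTT` footer case writes the header when no cue has triggered it, so an input without captions still yields a valid, non-empty WebVTT file. A minimal sketch of that "write header at most once" logic is given below. It assumes the header is just the `WEBVTT` signature line and writes into an in-memory buffer; the real `write_webvtt_header` may emit more, and every `demo_` name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory encoder used only for this illustration. */
struct demo_encoder
{
    char out[256];
    size_t out_len;
    int wrote_webvtt_header;
};

/* Appends text to the output buffer; the caller must ensure it fits. */
static void demo_write(struct demo_encoder *enc, const char *text)
{
    size_t n = strlen(text);
    assert(enc->out_len + n < sizeof(enc->out));
    memcpy(enc->out + enc->out_len, text, n + 1); /* copies the terminator too */
    enc->out_len += n;
}

/* Writes the signature line once; later calls are no-ops. */
static void demo_write_header(struct demo_encoder *enc)
{
    if (enc->wrote_webvtt_header)
        return;
    demo_write(enc, "WEBVTT\n\n");
    enc->wrote_webvtt_header = 1;
}

/* Cues trigger the header lazily, as in the usual encode path. */
static void demo_write_cue(struct demo_encoder *enc, const char *cue)
{
    demo_write_header(enc);
    demo_write(enc, cue);
}

/* The footer guarantees a header even when no cue was ever written. */
static void demo_write_footer(struct demo_encoder *enc)
{
    demo_write_header(enc);
}
```

Calling `demo_write_footer` on a fresh encoder leaves exactly the signature line in the buffer, while calling it after a cue adds nothing.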
@@ -193,7 +201,7 @@ static int write_bom(struct encoder_ctx *ctx, struct ccx_s_write *out)
ret = write(out->fh, UTF8_BOM, sizeof(UTF8_BOM));
if (ret < sizeof(UTF8_BOM))
{
mprint("WARNING: Unable tp write UTF BOM\n");
mprint("WARNING: Unable to write UTF BOM\n");
return -1;
}
}
@@ -667,8 +675,13 @@ static int init_output_ctx(struct encoder_ctx *ctx, struct encoder_cfg *cfg)
if (cfg->cc_to_stdout)
{
#ifdef WIN32
ctx->dtvcc_writers[i].fd = -1;
ctx->dtvcc_writers[i].fhandle = GetStdHandle(STD_OUTPUT_HANDLE);
#else
ctx->dtvcc_writers[i].fd = STDOUT_FILENO;
ctx->dtvcc_writers[i].fhandle = NULL;
#endif
ctx->dtvcc_writers[i].charset = NULL;
ctx->dtvcc_writers[i].filename = NULL;
ctx->dtvcc_writers[i].cd = (iconv_t)-1;
@@ -714,6 +727,9 @@ void dinit_encoder(struct encoder_ctx **arg, LLONG current_fts)
write_subtitle_file_footer(ctx, ctx->out + i);
}
// Clean up teletext multi-page output files (issue #665)
dinit_teletext_outputs(ctx);
free_encoder_context(ctx->prev);
dinit_output_ctx(ctx);
freep(&ctx->subline);
@@ -767,6 +783,7 @@ struct encoder_ctx *init_encoder(struct encoder_cfg *opt)
return NULL;
}
ctx->in_fileformat = opt->in_format;
ctx->is_pal = (opt->in_format == 2);
/** used in case of SUB_EOD_MARKER */
ctx->prev_start = -1;
@@ -832,6 +849,19 @@ struct encoder_ctx *init_encoder(struct encoder_cfg *opt)
ctx->segment_pending = 0;
ctx->segment_last_key_frame = 0;
ctx->nospupngocr = opt->nospupngocr;
ctx->scc_framerate = opt->scc_framerate;
ctx->scc_accurate_timing = opt->scc_accurate_timing;
ctx->scc_last_transmission_end = 0;
ctx->scc_last_display_end = 0;
// Initialize teletext multi-page output arrays (issue #665)
ctx->tlt_out_count = 0;
for (int i = 0; i < MAX_TLT_PAGES_EXTRACT; i++)
{
ctx->tlt_out[i] = NULL;
ctx->tlt_out_pages[i] = 0;
ctx->tlt_srt_counter[i] = 0;
}
ctx->prev = NULL;
return ctx;
@@ -867,8 +897,30 @@ int encode_sub(struct encoder_ctx *context, struct cc_subtitle *sub)
int wrote_something = 0;
int ret = 0;
/* If there is no encoder context (e.g. -out=report), we must still free
any allocated subtitle data to avoid memory leaks. */
if (!context)
{
if (sub)
{
/* DVB subtitles store bitmap planes inside cc_bitmap */
if (sub->datatype == CC_DATATYPE_DVB)
{
struct cc_bitmap *bitmap = (struct cc_bitmap *)sub->data;
if (bitmap)
{
freep(&bitmap->data0);
freep(&bitmap->data1);
}
}
/* Free generic subtitle payload buffer */
freep(&sub->data);
sub->nb_data = 0;
}
return CCX_OK;
}
context = change_filename(context);
@@ -902,6 +954,11 @@ int encode_sub(struct encoder_ctx *context, struct cc_subtitle *sub)
// After adding delay, if start/end time is lower than 0, then continue with the next subtitle
if (data->start_time < 0 || data->end_time <= 0)
{
// Free XDS string if skipping to avoid memory leak
if (data->format == SFORMAT_XDS && data->xds_str)
{
freep(&data->xds_str);
}
continue;
}
@@ -1001,6 +1058,28 @@ int encode_sub(struct encoder_ctx *context, struct cc_subtitle *sub)
freep(&sub->data);
break;
case CC_BITMAP:;
// Apply subs_delay to bitmap subtitles (DVB, DVD, etc.)
// This is the same as what's done for CC_608 above
sub->start_time += context->subs_delay;
sub->end_time += context->subs_delay;
// After adding delay, if start/end time is lower than 0, skip this subtitle
if (sub->start_time < 0 || sub->end_time <= 0)
{
// Free bitmap data to avoid memory leak
if (sub->datatype == CC_DATATYPE_DVB)
{
struct cc_bitmap *bitmap_tmp = (struct cc_bitmap *)sub->data;
if (bitmap_tmp)
{
freep(&bitmap_tmp->data0);
freep(&bitmap_tmp->data1);
}
}
freep(&sub->data);
sub->nb_data = 0;
break;
}
#ifdef ENABLE_OCR
struct cc_bitmap *rect;
@@ -1181,7 +1260,7 @@ unsigned int get_line_encoded(struct encoder_ctx *ctx, unsigned char *buffer, in
{
unsigned char *orig = buffer; // Keep for debugging
unsigned char *line = data->characters[line_num];
for (int i = 0; i < 33; i++)
for (int i = 0; i < 32; i++)
{
int bytes = 0;
switch (ctx->encoding)
@@ -1215,7 +1294,6 @@ unsigned int get_color_encoded(struct encoder_ctx *ctx, unsigned char *buffer, i
else
*buffer++ = 'E';
}
*buffer = 0;
return (unsigned)(buffer - orig); // Return length
}
unsigned int get_font_encoded(struct encoder_ctx *ctx, unsigned char *buffer, int line_num, struct eia608_screen *data)
@@ -1252,7 +1330,7 @@ void switch_output_file(struct lib_ccx_ctx *ctx, struct encoder_ctx *enc_ctx, in
}
const char *ext = get_file_extension(ctx->write_format);
char suffix[32];
sprintf(suffix, "_%d", track_id);
snprintf(suffix, sizeof(suffix), "_%d", track_id);
char *basename = get_basename(enc_ctx->out->original_filename);
if (basename != NULL)
{
@@ -1267,3 +1345,168 @@ void switch_output_file(struct lib_ccx_ctx *ctx, struct encoder_ctx *enc_ctx, in
enc_ctx->cea_708_counter = 0;
enc_ctx->srt_counter = 0;
}
/**
* Get or create the output file for a specific teletext page (issue #665)
* Creates output files on-demand with suffix _pNNN (e.g., output_p891.srt)
* Returns NULL if we're in stdout mode or if too many pages are being extracted
*/
struct ccx_s_write *get_teletext_output(struct encoder_ctx *ctx, uint16_t teletext_page)
{
// If teletext_page is 0, use the default output
if (teletext_page == 0 || ctx->out == NULL)
return ctx->out;
// Check if we're sending to stdout - can't do multi-page in that case
if (ctx->out[0].fh == STDOUT_FILENO)
return ctx->out;
// Check if we already have an output file for this page
for (int i = 0; i < ctx->tlt_out_count; i++)
{
if (ctx->tlt_out_pages[i] == teletext_page)
return ctx->tlt_out[i];
}
// If we only have one teletext page requested, use the default output
// (no suffix needed for backward compatibility)
extern struct ccx_s_teletext_config tlt_config;
if (tlt_config.num_user_pages <= 1 && !tlt_config.extract_all_pages)
return ctx->out;
// Need to create a new output file for this page
if (ctx->tlt_out_count >= MAX_TLT_PAGES_EXTRACT)
{
mprint("Warning: Too many teletext pages to extract (max %d), using default output for page %03d\n",
MAX_TLT_PAGES_EXTRACT, teletext_page);
return ctx->out;
}
// Allocate the new write structure
struct ccx_s_write *new_out = (struct ccx_s_write *)malloc(sizeof(struct ccx_s_write));
if (!new_out)
{
mprint("Error: Memory allocation failed for teletext output\n");
return ctx->out;
}
memset(new_out, 0, sizeof(struct ccx_s_write));
// Create the filename with page suffix
const char *ext = get_file_extension(ctx->write_format);
char suffix[16];
snprintf(suffix, sizeof(suffix), "_p%03d", teletext_page);
char *basefilename = NULL;
if (ctx->out[0].filename != NULL)
{
basefilename = get_basename(ctx->out[0].filename);
}
else if (ctx->first_input_file != NULL)
{
basefilename = get_basename(ctx->first_input_file);
}
else
{
basefilename = strdup("untitled");
}
if (basefilename == NULL)
{
free(new_out);
return ctx->out;
}
char *filename = create_outfilename(basefilename, suffix, ext);
free(basefilename);
if (filename == NULL)
{
free(new_out);
return ctx->out;
}
// Open the file
new_out->filename = filename;
new_out->fh = open(filename, O_RDWR | O_CREAT | O_TRUNC | O_BINARY, S_IREAD | S_IWRITE);
if (new_out->fh == -1)
{
mprint("Error: Failed to open output file %s: %s\n", filename, strerror(errno));
free(filename);
free(new_out);
return ctx->out;
}
mprint("Creating teletext output file: %s\n", filename);
// Store in our array
int idx = ctx->tlt_out_count;
ctx->tlt_out[idx] = new_out;
ctx->tlt_out_pages[idx] = teletext_page;
ctx->tlt_srt_counter[idx] = 0;
ctx->tlt_out_count++;
// Write the subtitle file header
write_subtitle_file_header(ctx, new_out);
return new_out;
}
/**
* Get the SRT counter for a specific teletext page (issue #665)
* Returns pointer to the counter, or NULL if page not found
*/
unsigned int *get_teletext_srt_counter(struct encoder_ctx *ctx, uint16_t teletext_page)
{
// If teletext_page is 0, use the default counter
if (teletext_page == 0)
return &ctx->srt_counter;
// Check if we're using multi-page mode
extern struct ccx_s_teletext_config tlt_config;
if (tlt_config.num_user_pages <= 1 && !tlt_config.extract_all_pages)
return &ctx->srt_counter;
// Find the counter for this page
for (int i = 0; i < ctx->tlt_out_count; i++)
{
if (ctx->tlt_out_pages[i] == teletext_page)
return &ctx->tlt_srt_counter[i];
}
// Not found, use default counter
return &ctx->srt_counter;
}
/**
* Clean up all teletext output files (issue #665)
*/
void dinit_teletext_outputs(struct encoder_ctx *ctx)
{
if (!ctx)
return;
for (int i = 0; i < ctx->tlt_out_count; i++)
{
if (ctx->tlt_out[i] != NULL)
{
// Write footer
write_subtitle_file_footer(ctx, ctx->tlt_out[i]);
// Close file
if (ctx->tlt_out[i]->fh != -1)
{
close(ctx->tlt_out[i]->fh);
}
// Free filename
if (ctx->tlt_out[i]->filename != NULL)
{
free(ctx->tlt_out[i]->filename);
}
free(ctx->tlt_out[i]);
ctx->tlt_out[i] = NULL;
}
}
ctx->tlt_out_count = 0;
}
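
Note on the teletext multi-page hunk above: the lookup in get_teletext_output() and get_teletext_srt_counter() amounts to a small fixed-size table keyed by page number, with a fallback to the default output when the table is full. The sketch below isolates that idea. The names page_table, page_slot and MAX_SLOTS are hypothetical and do not appear in the patch, and the real code also opens files and writes headers, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 8 /* assumption: mirrors MAX_TLT_PAGES_EXTRACT from the diff */

/* Hypothetical stand-in for the per-page arrays added to encoder_ctx. */
struct page_table
{
	uint16_t pages[MAX_SLOTS];        /* page number stored in each slot */
	unsigned int counters[MAX_SLOTS]; /* per-page subtitle counter */
	int count;                        /* number of slots in use */
};

/* Return the slot index for `page`, creating the slot on first use.
   Returns -1 when page is 0 (default output) or when the table is full;
   the caller is expected to fall back to the default output in both cases. */
int page_slot(struct page_table *t, uint16_t page)
{
	assert(t != NULL);
	if (page == 0)
		return -1;
	/* Reuse an existing slot if this page was seen before */
	for (int i = 0; i < t->count; i++)
	{
		if (t->pages[i] == page)
			return i;
	}
	/* No free slot left: signal the fallback */
	if (t->count >= MAX_SLOTS)
		return -1;
	int idx = t->count++;
	t->pages[idx] = page;
	t->counters[idx] = 0;
	return idx;
}
```

With a zero-initialized table, the first call for page 891 yields slot 0 and a later call for the same page yields slot 0 again, so each page keeps its own counter.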


@@ -2,13 +2,13 @@
#define _CC_ENCODER_COMMON_H
#ifdef WIN32
#if defined(__MINGW64__) || defined(__MINGW32__)
#include <iconv.h>
#else
#include "..\\thirdparty\\win_iconv\\iconv.h"
#endif
#if defined(__MINGW64__) || defined(__MINGW32__)
#include <iconv.h>
#else
#include "iconv.h"
#include "..\\thirdparty\\win_iconv\\iconv.h"
#endif
#else
#include "iconv.h"
#endif
#include "ccx_common_structs.h"
@@ -16,14 +16,25 @@
#include "ccx_encoders_structs.h"
#include "ccx_common_option.h"
#define REQUEST_BUFFER_CAPACITY(ctx,length) if (length>ctx->capacity) \
{ctx->capacity = length * 2; ctx->buffer = (unsigned char*)realloc(ctx->buffer, ctx->capacity); \
if (ctx->buffer == NULL) { fatal(EXIT_NOT_ENOUGH_MEMORY, "Not enough memory for reallocating buffer, bailing out\n"); } \
}
// Maximum number of teletext pages to extract simultaneously (issue #665)
#ifndef MAX_TLT_PAGES_EXTRACT
#define MAX_TLT_PAGES_EXTRACT 8
#endif
#define REQUEST_BUFFER_CAPACITY(ctx, length) \
if (length > ctx->capacity) \
{ \
ctx->capacity = length * 2; \
ctx->buffer = (unsigned char *)realloc(ctx->buffer, ctx->capacity); \
if (ctx->buffer == NULL) \
{ \
fatal(EXIT_NOT_ENOUGH_MEMORY, "Not enough memory for reallocating buffer, bailing out\n"); \
} \
}
// CC page dimensions
#define ROWS 15
#define COLUMNS 32
#define ROWS 15
#define COLUMNS 32
typedef struct dtvcc_writer_ctx
{
@@ -46,11 +57,11 @@ typedef struct ccx_sbs_utf8_character
struct ccx_mcc_caption_time
{
unsigned int hour;
unsigned int minute;
unsigned int second;
unsigned int millisecond;
unsigned int frame;
unsigned int hour;
unsigned int minute;
unsigned int second;
unsigned int millisecond;
unsigned int frame;
};
/**
@@ -84,7 +95,7 @@ struct encoder_ctx
/* number of member in array of write out array */
int nb_out;
/* Input file format used in Teletext for exceptional output */
unsigned int in_fileformat; //1 = Normal, 2 = Teletext
unsigned int in_fileformat; // 1 = Normal, 2 = Teletext
/* Keep output file closed when not actually writing to it and start over each time (add headers, etc) */
unsigned int keep_output_closed;
/* Force a flush on the file buffer whenever content is written */
@@ -96,22 +107,22 @@ struct encoder_ctx
/* Flag saying BOM to be written in each output file */
enum ccx_encoding_type encoding;
enum ccx_output_format write_format; // 0=Raw, 1=srt, 2=SMI
enum ccx_output_format write_format; // 0=Raw, 1=srt, 2=SMI
int generates_file;
struct ccx_encoders_transcript_format *transcript_settings; // Keeps the settings for generating transcript output files.
int no_bom;
int sentence_cap; // FIX CASE? = Fix case?
int sentence_cap; // FIX CASE? = Fix case?
int filter_profanity;
int trim_subs; // " Remove spaces at sides? "
int autodash; // Add dashes (-) before each speaker automatically?
int trim_subs; // " Remove spaces at sides? "
int autodash; // Add dashes (-) before each speaker automatically?
int no_font_color;
int no_type_setting;
int gui_mode_reports; // If 1, output in stderr progress updates so the GUI can grab them
unsigned char *subline; // Temp storage for storing each line
int gui_mode_reports; // If 1, output in stderr progress updates so the GUI can grab them
unsigned char *subline; // Temp storage for storing each line
int extract;
int dtvcc_extract; // 1 or 0 depending if we have to handle dtvcc
int dtvcc_extract; // 1 or 0 depending if we have to handle dtvcc
dtvcc_writer_ctx dtvcc_writers[CCX_DTVCC_MAX_SERVICES];
/* Timing related variables*/
@@ -126,7 +137,7 @@ struct encoder_ctx
int startcredits_displayed;
char *start_credits_text;
char *end_credits_text;
struct ccx_boundary_time startcreditsnotbefore, startcreditsnotafter; // Where to insert start credits, if possible
struct ccx_boundary_time startcreditsnotbefore, startcreditsnotafter; // Where to insert start credits, if possible
struct ccx_boundary_time startcreditsforatleast, startcreditsforatmost; // How long to display them?
struct ccx_boundary_time endcreditsforatleast, endcreditsforatmost;
@@ -138,9 +149,17 @@ struct encoder_ctx
// MCC File
int header_printed_flag;
struct ccx_mcc_caption_time next_caption_time;
unsigned int cdp_hdr_seq;
int force_dropframe;
struct ccx_mcc_caption_time next_caption_time;
unsigned int cdp_hdr_seq;
int force_dropframe;
// SCC output framerate
int scc_framerate; // SCC output framerate: 0=29.97 (default), 1=24, 2=25, 3=30
// SCC accurate timing (issue #1120)
int scc_accurate_timing; // If 1, use bandwidth-aware timing for broadcast compliance
LLONG scc_last_transmission_end; // When last caption transmission ends (ms)
LLONG scc_last_display_end; // When last caption display ends (ms)
int new_sentence; // Capitalize next letter?
@@ -150,12 +169,12 @@ struct encoder_ctx
/* split-by-sentence stuff */
int sbs_enabled;
//for dvb subs
struct encoder_ctx* prev;
// for dvb subs
struct encoder_ctx *prev;
int write_previous;
//for dvb in .mkv
int is_mkv; //are we working with .mkv file
char* last_string; //last recognized DVB sub
// for dvb in .mkv
int is_mkv; // are we working with .mkv file
char *last_string; // last recognized DVB sub
// Segmenting
int segment_pending;
@@ -163,9 +182,15 @@ struct encoder_ctx
// OCR in SPUPNG
int nospupngocr;
int is_pal;
struct ccx_s_write *tlt_out[MAX_TLT_PAGES_EXTRACT]; // Output files per teletext page
uint16_t tlt_out_pages[MAX_TLT_PAGES_EXTRACT]; // Page numbers for each output slot
unsigned int tlt_srt_counter[MAX_TLT_PAGES_EXTRACT]; // SRT counter per page
int tlt_out_count; // Number of teletext output files
};
#define INITIAL_ENC_BUFFER_CAPACITY 2048
#define INITIAL_ENC_BUFFER_CAPACITY 2048
/**
* Initialize encoder context with output context
* allocate initial memory to buffer of context
@@ -186,7 +211,7 @@ struct encoder_ctx *init_encoder(struct encoder_cfg *opt);
* after deallocating user need to allocate encoder ctx again
*
* @param arg pointer to initialized encoder ctx using init_encoder
*
*
* @param current_fts to calculate window for end credits
*/
void dinit_encoder(struct encoder_ctx **arg, LLONG current_fts);
@@ -195,52 +220,53 @@ void dinit_encoder(struct encoder_ctx **arg, LLONG current_fts);
* @param ctx encoder context
* @param sub subtitle context returned by decoder
*/
int encode_sub(struct encoder_ctx *ctx,struct cc_subtitle *sub);
int encode_sub(struct encoder_ctx *ctx, struct cc_subtitle *sub);
int write_cc_buffer_as_ccd (const struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_scc (const struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_srt (struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_ssa (struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_webvtt (struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_sami (struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_smptett (struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_spupng (struct eia608_screen *data, struct encoder_ctx *context);
void write_cc_buffer_to_gui (struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_ccd(const struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_scc(const struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_srt(struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_ssa(struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_webvtt(struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_sami(struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_smptett(struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_spupng(struct eia608_screen *data, struct encoder_ctx *context);
void write_cc_buffer_to_gui(struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_g608 (struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_transcript2 (struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_g608(struct eia608_screen *data, struct encoder_ctx *context);
int write_cc_buffer_as_transcript2(struct eia608_screen *data, struct encoder_ctx *context);
void write_cc_line_as_transcript2 (struct eia608_screen *data, struct encoder_ctx *context, int line_number);
void write_cc_line_as_transcript2(struct eia608_screen *data, struct encoder_ctx *context, int line_number);
int write_cc_subtitle_as_srt (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_ssa (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_webvtt (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_sami (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_smptett (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_spupng (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_transcript (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_srt(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_ssa(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_webvtt(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_sami(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_smptett(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_spupng(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_subtitle_as_transcript(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_stringz_as_srt(char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
int write_stringz_as_ssa(char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
int write_stringz_as_webvtt(char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
int write_stringz_as_sami(char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
void write_stringz_as_smptett(char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
int write_stringz_as_srt (char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
int write_stringz_as_ssa (char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
int write_stringz_as_webvtt (char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
int write_stringz_as_sami (char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
void write_stringz_as_smptett (char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
int write_cc_bitmap_as_srt (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_ssa (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_webvtt (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_sami (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_smptett (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_spupng (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_transcript (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_libcurl (struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_srt(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_ssa(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_webvtt(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_sami(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_smptett(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_spupng(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_transcript(struct cc_subtitle *sub, struct encoder_ctx *context);
int write_cc_bitmap_as_libcurl(struct cc_subtitle *sub, struct encoder_ctx *context);
void write_spumux_header(struct encoder_ctx *ctx, struct ccx_s_write *out);
void write_spumux_footer(struct ccx_s_write *out);
struct cc_subtitle * reformat_cc_bitmap_through_sentence_buffer (struct cc_subtitle *sub, struct encoder_ctx *context);
// WebVTT header writer (issue #1743 - ensures header is written even for empty files)
void write_webvtt_header(struct encoder_ctx *context);
struct cc_subtitle *reformat_cc_bitmap_through_sentence_buffer(struct cc_subtitle *sub, struct encoder_ctx *context);
void set_encoder_last_displayed_subs_ms(struct encoder_ctx *ctx, LLONG last_displayed_subs_ms);
void set_encoder_subs_delay(struct encoder_ctx *ctx, LLONG subs_delay);
@@ -251,8 +277,7 @@ int reset_output_ctx(struct encoder_ctx *ctx, struct encoder_cfg *cfg);
void find_limit_characters(const unsigned char *line, int *first_non_blank, int *last_non_blank, int max_len);
int get_str_basic(unsigned char *out_buffer, unsigned char *in_buffer, int trim_subs,
enum ccx_encoding_type in_enc, enum ccx_encoding_type out_enc, int max_len);
enum ccx_encoding_type in_enc, enum ccx_encoding_type out_enc, int max_len);
unsigned int get_line_encoded(struct encoder_ctx *ctx, unsigned char *buffer, int line_num, struct eia608_screen *data);
unsigned int get_color_encoded(struct encoder_ctx *ctx, unsigned char *buffer, int line_num, struct eia608_screen *data);
@@ -260,4 +285,9 @@ unsigned int get_font_encoded(struct encoder_ctx *ctx, unsigned char *buffer, in
struct lib_ccx_ctx;
void switch_output_file(struct lib_ccx_ctx *ctx, struct encoder_ctx *enc_ctx, int track_id);
// Teletext multi-page output (issue #665)
struct ccx_s_write *get_teletext_output(struct encoder_ctx *ctx, uint16_t teletext_page);
unsigned int *get_teletext_srt_counter(struct encoder_ctx *ctx, uint16_t teletext_page);
void dinit_teletext_outputs(struct encoder_ctx *ctx);
#endif


@@ -56,7 +56,7 @@ int write_cc_bitmap_as_libcurl(struct cc_subtitle *sub, struct encoder_ctx *cont
millis_to_time(ms_start, &h1, &m1, &s1, &ms1);
millis_to_time(ms_end - 1, &h2, &m2, &s2, &ms2); // -1 To prevent overlapping with next line.
context->srt_counter++;
sprintf(timeline, "group_id=ccextractordev&start_time=%" PRIu64 "&end_time=%" PRIu64 "&lang=en", ms_start, ms_end);
snprintf(timeline, sizeof(timeline), "group_id=ccextractordev&start_time=%" PRIu64 "&end_time=%" PRIu64 "&lang=en", ms_start, ms_end);
char *curlline = NULL;
curlline = str_reallocncat(curlline, timeline);
curlline = str_reallocncat(curlline, "&payload=");
@@ -65,9 +65,13 @@ int write_cc_bitmap_as_libcurl(struct cc_subtitle *sub, struct encoder_ctx *cont
curl_free(urlencoded);
mprint("%s", curlline);
char *result = malloc(strlen(ccx_options.curlposturl) + strlen("/frame/") + 1);
strcpy(result, ccx_options.curlposturl);
strcat(result, "/frame/");
size_t result_size = strlen(ccx_options.curlposturl) + strlen("/frame/") + 1;
char *result = malloc(result_size);
if (!result)
{
fatal(EXIT_NOT_ENOUGH_MEMORY, "In write_cc_bitmap_as_curl: Out of memory allocating result.");
}
snprintf(result, result_size, "%s/frame/", ccx_options.curlposturl);
curl_easy_setopt(curl, CURLOPT_URL, result);
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, curlline);
free(result);
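
Note on the hunk above: replacing strcpy/strcat with a single snprintf also makes it possible to detect truncation from the return value. A minimal sketch of that pattern follows. build_frame_url is a hypothetical helper, not part of the patch, and it returns NULL on failure where the patched code calls fatal().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "<base>/frame/" string,
   or NULL on allocation failure or truncation. The caller frees the result. */
char *build_frame_url(const char *base)
{
	assert(base != NULL);
	/* Room for the base, the suffix and the terminating NUL */
	size_t size = strlen(base) + strlen("/frame/") + 1;
	char *url = malloc(size);
	if (!url)
		return NULL;
	/* snprintf returns the length it wanted to write, excluding the NUL;
	   a value >= size means the output was truncated */
	int n = snprintf(url, size, "%s/frame/", base);
	if (n < 0 || (size_t)n >= size)
	{
		free(url);
		return NULL;
	}
	return url;
}
```

Called with a placeholder base such as "http://example.invalid", the helper returns "http://example.invalid/frame/".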


@@ -14,11 +14,11 @@ int write_cc_buffer_as_g608(struct eia608_screen *data, struct encoder_ctx *cont
millis_to_time(data->end_time - 1, &h2, &m2, &s2, &ms2); // -1 To prevent overlapping with next line.
char timeline[128];
context->srt_counter++;
sprintf(timeline, "%u%s", context->srt_counter, context->encoded_crlf);
snprintf(timeline, sizeof(timeline), "%u%s", context->srt_counter, context->encoded_crlf);
used = encode_line(context, context->buffer, (unsigned char *)timeline);
write_wrapped(context->out->fh, context->buffer, used);
sprintf(timeline, "%02u:%02u:%02u,%03u --> %02u:%02u:%02u,%03u%s",
h1, m1, s1, ms1, h2, m2, s2, ms2, context->encoded_crlf);
snprintf(timeline, sizeof(timeline), "%02u:%02u:%02u,%03u --> %02u:%02u:%02u,%03u%s",
h1, m1, s1, ms1, h2, m2, s2, ms2, context->encoded_crlf);
used = encode_line(context, context->buffer, (unsigned char *)timeline);
write_wrapped(context->out->fh, context->buffer, used);


@@ -316,7 +316,7 @@ unsigned get_decoder_line_encoded(struct encoder_ctx *ctx, unsigned char *buffer
}
if (color_text[its_color][1][0]) // That means a <font> was added to the buffer
{
strcat(tagstack, "F");
strncat(tagstack, "F", sizeof(tagstack) - strlen(tagstack) - 1);
changed_font++;
}
color = its_color;
@@ -326,7 +326,7 @@ unsigned get_decoder_line_encoded(struct encoder_ctx *ctx, unsigned char *buffer
if (is_underlined && underlined == 0 && !ctx->no_type_setting) // Open underline
{
buffer += encode_line(ctx, buffer, (unsigned char *)"<u>");
strcat(tagstack, "U");
strncat(tagstack, "U", sizeof(tagstack) - strlen(tagstack) - 1);
underlined++;
}
if (is_underlined == 0 && underlined && !ctx->no_type_setting) // Close underline
@@ -338,7 +338,7 @@ unsigned get_decoder_line_encoded(struct encoder_ctx *ctx, unsigned char *buffer
if (has_ita && italics == 0 && !ctx->no_type_setting) // Open italics
{
buffer += encode_line(ctx, buffer, (unsigned char *)"<i>");
strcat(tagstack, "I");
strncat(tagstack, "I", sizeof(tagstack) - strlen(tagstack) - 1);
italics++;
}
if (has_ita == 0 && italics && !ctx->no_type_setting) // Close italics
@@ -410,10 +410,13 @@ int add_word(struct word_list *list, const char *word)
if (list->len == list->capacity)
{
list->capacity += 50;
if ((list->words = realloc(list->words, list->capacity * sizeof(char *))) == NULL)
char **tmp = realloc(list->words, list->capacity * sizeof(char *));
if (!tmp)
{
list->capacity -= 50; // Restore original capacity
return -1;
}
list->words = tmp;
}
size_t word_len = strlen(word);
@@ -422,7 +425,7 @@ int add_word(struct word_list *list, const char *word)
return -1;
}
strcpy(list->words[list->len++], word);
memcpy(list->words[list->len++], word, word_len + 1);
return word_len;
}
@@ -466,6 +469,11 @@ void shell_sort(void *base, int nb, size_t size, int (*compar)(const void *p1, c
{
unsigned char *lbase = (unsigned char *)base;
unsigned char *tmp = (unsigned char *)malloc(size);
if (!tmp)
{
// Cannot sort without temporary buffer, return silently
return;
}
for (int gap = nb / 2; gap > 0; gap = gap / 2)
{
int p, j;
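
Note on the add_word() hunk above: assigning the result of realloc directly to list->words loses the only pointer to the old block if the call fails. The sketch below shows the temporary-pointer pattern on its own. str_list and str_list_add are hypothetical names, the growth step of 50 is copied from the diff, and unlike the patch the sketch only updates the capacity after realloc has succeeded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct word_list from the diff. */
struct str_list
{
	char **items;
	size_t len;
	size_t capacity;
};

/* Append a copy of `s`. Returns its length, or -1 if out of memory.
   On failure the list is left exactly as it was. */
int str_list_add(struct str_list *list, const char *s)
{
	assert(list != NULL && s != NULL);
	if (list->len == list->capacity)
	{
		size_t new_cap = list->capacity + 50;
		/* Keep the old pointer until realloc is known to have succeeded */
		char **tmp = realloc(list->items, new_cap * sizeof(char *));
		if (!tmp)
			return -1;
		list->items = tmp;
		list->capacity = new_cap;
	}
	size_t n = strlen(s);
	char *copy = malloc(n + 1);
	if (!copy)
		return -1;
	/* The length is already known, so memcpy including the NUL is enough */
	memcpy(copy, s, n + 1);
	list->items[list->len++] = copy;
	return (int)n;
}
```

Starting from a zero-initialized list, the first call allocates room for 50 pointers and stores one copied string.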


@@ -4,6 +4,7 @@
#include <stdarg.h>
#include "ccx_encoders_mcc.h"
#include "lib_ccx.h"
#include "utility.h"
#define MORE_DEBUG CCX_FALSE
@@ -142,10 +143,16 @@ boolean mcc_encode_cc_data(struct encoder_ctx *enc_ctx, struct lib_cc_decode *de
caption_time.second, caption_time.frame, num_chars_needed);
#endif
char *compressed_data_buffer = malloc(num_chars_needed + 13);
size_t compressed_data_size = num_chars_needed + 13;
char *compressed_data_buffer = malloc(compressed_data_size);
if (!compressed_data_buffer)
{
free(w_boilerplate_buffer);
fatal(EXIT_NOT_ENOUGH_MEMORY, "In mcc_encode_cc_data: Out of memory allocating compressed_data_buffer.");
}
sprintf(compressed_data_buffer, "%02d:%02d:%02d:%02d\t", caption_time.hour, caption_time.minute,
caption_time.second, caption_time.frame);
snprintf(compressed_data_buffer, compressed_data_size, "%02d:%02d:%02d:%02d\t", caption_time.hour, caption_time.minute,
caption_time.second, caption_time.frame);
compress_data(w_boilerplate_buffer, w_boilerplate_buff_size, (uint8 *)&compressed_data_buffer[12]);
free(w_boilerplate_buffer);
@@ -155,7 +162,12 @@ boolean mcc_encode_cc_data(struct encoder_ctx *enc_ctx, struct lib_cc_decode *de
caption_time.hour, caption_time.minute, caption_time.second, caption_time.frame);
#endif
strcat(compressed_data_buffer, "\n");
size_t current_len = strlen(compressed_data_buffer);
if (current_len + 1 < compressed_data_size)
{
compressed_data_buffer[current_len] = '\n';
compressed_data_buffer[current_len + 1] = '\0';
}
write_wrapped(enc_ctx->out->fh, compressed_data_buffer, strlen(compressed_data_buffer));
@@ -168,58 +180,59 @@ boolean mcc_encode_cc_data(struct encoder_ctx *enc_ctx, struct lib_cc_decode *de
static void generate_mcc_header(int fh, int fr_code, int dropframe_flag)
{
char uuid_str[50];
char date_str[50];
char time_str[30];
char tcr_str[25];
char date_str[64];
char time_str[32];
char tcr_str[32];
time_t t = time(NULL);
struct tm tm = *localtime(&t);
sprintf(uuid_str, "UUID=");
snprintf(uuid_str, sizeof(uuid_str), "UUID=");
uuid4(&uuid_str[5]);
uuid_str[41] = '\n';
uuid_str[42] = '\0';
ASSERT(tm.tm_wday < 7);
ASSERT(tm.tm_mon < 12);
sprintf(date_str, "Creation Date=%s, %s %d, %d\n", DayOfWeekStr[tm.tm_wday], MonthStr[tm.tm_mon], tm.tm_mday, tm.tm_year + 1900);
sprintf(time_str, "Creation Time=%d:%02d:%02d\n", tm.tm_hour, tm.tm_min, tm.tm_sec);
snprintf(date_str, sizeof(date_str), "Creation Date=%s, %s %d, %d\n", DayOfWeekStr[tm.tm_wday], MonthStr[tm.tm_mon], tm.tm_mday, tm.tm_year + 1900);
snprintf(time_str, sizeof(time_str), "Creation Time=%d:%02d:%02d\n", tm.tm_hour, tm.tm_min, tm.tm_sec);
switch (fr_code)
{
case 1:
case 2:
sprintf(tcr_str, "Time Code Rate=24\n\n");
snprintf(tcr_str, sizeof(tcr_str), "Time Code Rate=24\n\n");
break;
case 3:
sprintf(tcr_str, "Time Code Rate=25\n\n");
snprintf(tcr_str, sizeof(tcr_str), "Time Code Rate=25\n\n");
break;
case 4:
case 5:
if (dropframe_flag == CCX_TRUE)
{
sprintf(tcr_str, "Time Code Rate=30DF\n\n");
snprintf(tcr_str, sizeof(tcr_str), "Time Code Rate=30DF\n\n");
}
else
{
sprintf(tcr_str, "Time Code Rate=30\n\n");
snprintf(tcr_str, sizeof(tcr_str), "Time Code Rate=30\n\n");
}
break;
case 6:
sprintf(tcr_str, "Time Code Rate=50\n\n");
snprintf(tcr_str, sizeof(tcr_str), "Time Code Rate=50\n\n");
break;
case 7:
case 8:
if (dropframe_flag == CCX_TRUE)
{
sprintf(tcr_str, "Time Code Rate=60DF\n\n");
snprintf(tcr_str, sizeof(tcr_str), "Time Code Rate=60DF\n\n");
}
else
{
sprintf(tcr_str, "Time Code Rate=60\n\n");
snprintf(tcr_str, sizeof(tcr_str), "Time Code Rate=60\n\n");
}
break;
default:
LOG("ERROR: Invalid Framerate Code: %d", fr_code);
tcr_str[0] = '\0';
break;
}
@@ -271,6 +284,10 @@ static uint8 *add_boilerplate(struct encoder_ctx *ctx, unsigned char *cc_data, i
uint8 data_size = cc_count * 3;
uint8 *buff_ptr = malloc(data_size + 16);
if (!buff_ptr)
{
fatal(EXIT_NOT_ENOUGH_MEMORY, "In add_boilerplate: Out of memory allocating buff_ptr.");
}
uint8 cdp_frame_rate = CDP_FRAME_RATE_FORBIDDEN;
switch (fr_code)
@@ -323,15 +340,15 @@ static uint8 *add_boilerplate(struct encoder_ctx *ctx, unsigned char *cc_data, i
buff_ptr[5] = data_size + 12;
buff_ptr[6] = ((cdp_frame_rate << 4) | 0x0F);
buff_ptr[7] = 0x43; // Timecode not Present; Service Info not Present; Captions Present
buff_ptr[8] = (uint8)((ctx->cdp_hdr_seq & 0xF0) >> 8);
buff_ptr[9] = (uint8)(ctx->cdp_hdr_seq & 0x0F);
buff_ptr[8] = (uint8)((ctx->cdp_hdr_seq >> 8) & 0xFF);
buff_ptr[9] = (uint8)(ctx->cdp_hdr_seq & 0xFF);
buff_ptr[10] = CC_DATA_ID;
buff_ptr[11] = cc_count | 0xE0;
memcpy(&buff_ptr[12], cc_data, data_size);
uint8 *data_ptr = &buff_ptr[data_size + 12];
data_ptr[0] = CDP_FOOTER_ID;
data_ptr[1] = (uint8)((ctx->cdp_hdr_seq & 0xF0) >> 8);
data_ptr[2] = (uint8)(ctx->cdp_hdr_seq & 0x0F);
data_ptr[1] = (uint8)((ctx->cdp_hdr_seq >> 8) & 0xFF);
data_ptr[2] = (uint8)(ctx->cdp_hdr_seq & 0xFF);
data_ptr[3] = 0;
for (int loop = 0; loop < (data_size + 15); loop++)
@@ -639,9 +656,10 @@ static void compress_data(uint8 *data_ptr, uint16 num_elements, uint8 *out_data_
static void random_chars(char buffer[], int len)
{
static const char hex_chars[] = "0123456789ABCDEF";
for (int i = 0; i < len; i++)
{
sprintf(buffer + i, "%X", rand() % 16);
buffer[i] = hex_chars[rand() % 16];
}
}
@@ -674,14 +692,15 @@ static void debug_log(char *file, int line, ...)
va_start(args, line);
char *fmt = va_arg(args, char *);
vsprintf(message, fmt, args);
vsnprintf(message, sizeof(message), fmt, args);
va_end(args);
char *basename = strrchr(file, '/');
basename = basename ? basename + 1 : file;
if (message[(strlen(message) - 1)] == '\n')
message[(strlen(message) - 1)] = '\0';
size_t msg_len = strlen(message);
if (msg_len > 0 && message[msg_len - 1] == '\n')
message[msg_len - 1] = '\0';
dbg_print(CCX_DMT_VERBOSE, "[%s:%d] - %s\n", basename, line, message);
} // debug_log()
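
Note on the cdp_hdr_seq hunks above: the old expression (seq & 0xF0) >> 8 masks bits 4 to 7 and then shifts them out, so the high byte was always 0, and seq & 0x0F kept only the low nibble. The sketch below compares both forms. The function names are hypothetical; only the expressions come from the diff.

```c
#include <assert.h> /* used by the checks described after the block */
#include <stdint.h>

/* Expressions removed by the patch */
uint8_t seq_high_old(unsigned int seq)
{
	return (uint8_t)((seq & 0xF0) >> 8); /* always 0: the masked bits are shifted away */
}

uint8_t seq_low_old(unsigned int seq)
{
	return (uint8_t)(seq & 0x0F); /* keeps only the low 4 bits */
}

/* Expressions introduced by the patch */
uint8_t seq_high(unsigned int seq)
{
	return (uint8_t)((seq >> 8) & 0xFF); /* bits 8..15 */
}

uint8_t seq_low(unsigned int seq)
{
	return (uint8_t)(seq & 0xFF); /* bits 0..7 */
}

/* Rebuild the 16-bit counter from its two bytes */
unsigned int seq_join(uint8_t hi, uint8_t lo)
{
	return ((unsigned int)hi << 8) | lo;
}
```

For seq = 0x1234 the old pair gives 0x00 and 0x04, while the new pair gives 0x12 and 0x34, which seq_join turns back into 0x1234.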


@@ -9,22 +9,22 @@
/*-- Constants --*/
/*----------------------------------------------------------------------------*/
#define ANC_DID_CLOSED_CAPTIONING 0x61
#define ANC_SDID_CEA_708 0x01
#define CDP_IDENTIFIER_VALUE_HIGH 0x96
#define CDP_IDENTIFIER_VALUE_LOW 0x69
#define CC_DATA_ID 0x72
#define CDP_FOOTER_ID 0x74
#define ANC_DID_CLOSED_CAPTIONING 0x61
#define ANC_SDID_CEA_708 0x01
#define CDP_IDENTIFIER_VALUE_HIGH 0x96
#define CDP_IDENTIFIER_VALUE_LOW 0x69
#define CC_DATA_ID 0x72
#define CDP_FOOTER_ID 0x74
#define CDP_FRAME_RATE_FORBIDDEN 0x00
#define CDP_FRAME_RATE_23_976 0x01
#define CDP_FRAME_RATE_24 0x02
#define CDP_FRAME_RATE_25 0x03
#define CDP_FRAME_RATE_29_97 0x04
#define CDP_FRAME_RATE_30 0x05
#define CDP_FRAME_RATE_50 0x06
#define CDP_FRAME_RATE_59_94 0x07
#define CDP_FRAME_RATE_60 0x08
#define CDP_FRAME_RATE_FORBIDDEN 0x00
#define CDP_FRAME_RATE_23_976 0x01
#define CDP_FRAME_RATE_24 0x02
#define CDP_FRAME_RATE_25 0x03
#define CDP_FRAME_RATE_29_97 0x04
#define CDP_FRAME_RATE_30 0x05
#define CDP_FRAME_RATE_50 0x06
#define CDP_FRAME_RATE_59_94 0x07
#define CDP_FRAME_RATE_60 0x08
/*----------------------------------------------------------------------------*/
/*-- Types --*/
@@ -46,7 +46,9 @@ typedef long long int64;
/*----------------------------------------------------------------------------*/
#define LOG(...) debug_log(__FILE__, __LINE__, __VA_ARGS__)
#define ASSERT(x) if(!(x)) debug_log(__FILE__, __LINE__, "ASSERT FAILED!")
#define ASSERT(x) \
if (!(x)) \
debug_log(__FILE__, __LINE__, "ASSERT FAILED!")
/*----------------------------------------------------------------------------*/
/*-- Exposed Variables --*/

@@ -15,7 +15,7 @@ int write_stringz_as_sami(char *string, struct encoder_ctx *context, LLONG ms_st
unsigned char *el = NULL;
char str[1024];
sprintf(str, "<SYNC start=%llu><P class=\"UNKNOWNCC\">\r\n", (unsigned long long)ms_start);
snprintf(str, sizeof(str), "<SYNC start=%llu><P class=\"UNKNOWNCC\">\r\n", (unsigned long long)ms_start);
if (context->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
@@ -85,7 +85,7 @@ int write_stringz_as_sami(char *string, struct encoder_ctx *context, LLONG ms_st
begin += strlen((const char *)begin) + 1;
}
sprintf((char *)str, "</P></SYNC>\r\n");
snprintf(str, sizeof(str), "</P></SYNC>\r\n");
if (context->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
@@ -94,9 +94,9 @@ int write_stringz_as_sami(char *string, struct encoder_ctx *context, LLONG ms_st
ret = write(context->out->fh, context->buffer, used);
if (ret != used)
goto end;
sprintf((char *)str,
"<SYNC start=%llu><P class=\"UNKNOWNCC\">&nbsp;</P></SYNC>\r\n\r\n",
(unsigned long long)ms_end);
snprintf(str, sizeof(str),
"<SYNC start=%llu><P class=\"UNKNOWNCC\">&nbsp;</P></SYNC>\r\n\r\n",
(unsigned long long)ms_end);
if (context->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
@@ -127,8 +127,8 @@ int write_cc_bitmap_as_sami(struct cc_subtitle *sub, struct encoder_ctx *context
if (sub->data != NULL) // then we should write the sub
{
sprintf(buf,
"<SYNC start=%llu><P class=\"UNKNOWNCC\">\r\n", (unsigned long long)sub->start_time);
snprintf(buf, context->capacity,
"<SYNC start=%llu><P class=\"UNKNOWNCC\">\r\n", (unsigned long long)sub->start_time);
write_wrapped(context->out->fh, buf, strlen(buf));
for (int i = sub->nb_data - 1; i >= 0; i--)
{
@@ -137,7 +137,7 @@ int write_cc_bitmap_as_sami(struct cc_subtitle *sub, struct encoder_ctx *context
if (context->prev_start != -1 || !(sub->flags & SUB_EOD_MARKER))
{
token = strtok(rect[i].ocr_text, "\r\n");
sprintf(buf, "%s", token);
snprintf(buf, context->capacity, "%s", token);
token = strtok(NULL, "\r\n");
write_wrapped(context->out->fh, buf, strlen(buf));
if (i != 0)
@@ -146,13 +146,13 @@ int write_cc_bitmap_as_sami(struct cc_subtitle *sub, struct encoder_ctx *context
}
}
}
sprintf(buf, "</P></SYNC>\r\n");
snprintf(buf, context->capacity, "</P></SYNC>\r\n");
write_wrapped(context->out->fh, buf, strlen(buf));
}
else // we write an empty subtitle to clear the old one
{
sprintf(buf,
"<SYNC start=%llu><P class=\"UNKNOWNCC\">&nbsp;</P></SYNC>\r\n\r\n", (unsigned long long)sub->start_time);
snprintf(buf, context->capacity,
"<SYNC start=%llu><P class=\"UNKNOWNCC\">&nbsp;</P></SYNC>\r\n\r\n", (unsigned long long)sub->start_time);
write_wrapped(context->out->fh, buf, strlen(buf));
}
#endif
@@ -194,8 +194,8 @@ int write_cc_buffer_as_sami(struct eia608_screen *data, struct encoder_ctx *cont
int wrote_something = 0;
char str[1024];
sprintf(str, "<SYNC start=%llu><P class=\"UNKNOWNCC\">\r\n",
(unsigned long long)data->start_time);
snprintf(str, sizeof(str), "<SYNC start=%llu><P class=\"UNKNOWNCC\">\r\n",
(unsigned long long)data->start_time);
if (context->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
@@ -219,16 +219,16 @@ int write_cc_buffer_as_sami(struct eia608_screen *data, struct encoder_ctx *cont
write_wrapped(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);
}
}
sprintf((char *)str, "</P></SYNC>\r\n");
snprintf(str, sizeof(str), "</P></SYNC>\r\n");
if (context->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
}
used = encode_line(context, context->buffer, (unsigned char *)str);
write_wrapped(context->out->fh, context->buffer, used);
sprintf((char *)str,
"<SYNC start=%llu><P class=\"UNKNOWNCC\">&nbsp;</P></SYNC>\r\n\r\n",
(unsigned long long)data->end_time - 1); // - 1 to prevent overlap
snprintf(str, sizeof(str),
"<SYNC start=%llu><P class=\"UNKNOWNCC\">&nbsp;</P></SYNC>\r\n\r\n",
(unsigned long long)data->end_time - 1); // - 1 to prevent overlap
if (context->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);

@@ -10,6 +10,171 @@ unsigned char odd_parity(const unsigned char byte)
return byte | !(cc608_parity(byte) % 2) << 7;
}
/**
* SCC Accurate Timing Implementation (Issue #1120)
*
* EIA-608 bandwidth constraints:
* - 2 bytes per frame at 29.97 FPS (or configured frame rate)
* - Captions must be pre-loaded before display time
* - Each control code takes 2 bytes (sent twice for reliability = 4 bytes total)
* - Text characters take 1 byte each
*/
// Get frame rate value from scc_framerate setting
// 0=29.97 (default), 1=24, 2=25, 3=30
static float get_scc_fps_internal(int scc_framerate)
{
switch (scc_framerate)
{
case 1:
return 24.0f;
case 2:
return 25.0f;
case 3:
return 30.0f;
default:
return 29.97f;
}
}
/**
* Calculate total bytes needed to transmit a caption
*
* Byte costs:
* - Control code (RCL, EOC, ENM, EDM): 2 bytes x 2 (sent twice) = 4 bytes
* - Preamble code: 2 bytes x 2 = 4 bytes
* - Tab offset: 2 bytes x 2 = 4 bytes
* - Mid-row code (color/style): 2 bytes x 2 = 4 bytes
* - Text character: 1 byte each
* - Padding: 1 byte if odd number of text bytes
*/
static unsigned int calculate_caption_bytes(const struct eia608_screen *data)
{
unsigned int total_bytes = 0;
// RCL (Resume Caption Loading): 4 bytes
total_bytes += 4;
for (unsigned char row = 0; row < 15; ++row)
{
if (!data->row_used[row])
continue;
int first, last;
find_limit_characters(data->characters[row], &first, &last, CCX_DECODER_608_SCREEN_WIDTH);
if (first > last)
continue;
// Assume we need at least one preamble per row: 4 bytes
total_bytes += 4;
// Count characters on this row
unsigned int char_count = 0;
enum font_bits prev_font = FONT_REGULAR;
enum ccx_decoder_608_color_code prev_color = COL_WHITE;
int prev_col = -1;
for (int col = first; col <= last; ++col)
{
// Check if we need position codes
if (prev_col != col - 1 && prev_col != -1)
{
// Need preamble + possible tab offset: 4-8 bytes
total_bytes += 4;
if (col % 4 != 0)
total_bytes += 4; // Tab offset
}
// Check if we need mid-row style codes
if (data->fonts[row][col] != prev_font || data->colors[row][col] != prev_color)
{
total_bytes += 4; // Mid-row code
prev_font = data->fonts[row][col];
prev_color = data->colors[row][col];
}
// Text character
char_count++;
prev_col = col;
}
// Add text bytes (1 per character, rounded up to even)
total_bytes += char_count;
if (char_count % 2 == 1)
total_bytes++; // Padding
}
// EOC (End of Caption): 4 bytes
total_bytes += 4;
// ENM (Erase Non-displayed Memory): 4 bytes
total_bytes += 4;
return total_bytes;
}
/**
* Calculate the pre-roll start time for a caption
*
* @param display_time When the caption should appear on screen (ms)
* @param total_bytes Total bytes to transmit
* @param fps Frame rate
* @return Time to begin loading the caption (ms)
*/
static LLONG calculate_preroll_time(LLONG display_time, unsigned int total_bytes, float fps)
{
// Calculate transmission time in milliseconds
// 2 bytes per frame, so frames_needed = (total_bytes + 1) / 2
float ms_per_frame = 1000.0f / fps;
unsigned int frames_needed = (total_bytes + 1) / 2;
LLONG transmission_time_ms = (LLONG)(frames_needed * ms_per_frame);
// Add 1 frame for EOC to be sent before display
LLONG one_frame_ms = (LLONG)ms_per_frame;
LLONG preroll_start = display_time - transmission_time_ms - one_frame_ms;
// Don't go negative
if (preroll_start < 0)
preroll_start = 0;
return preroll_start;
}
/**
* Check for collision with previous caption transmission and resolve it
*
* @param context Encoder context with timing state
* @param preroll_start Proposed pre-roll start time (will be modified if collision)
* @param display_time Caption display time (may be adjusted)
* @param fps Frame rate
* @return true if timing was adjusted due to collision
*/
static bool resolve_collision(struct encoder_ctx *context, LLONG *preroll_start,
LLONG *display_time, float fps)
{
// Check if our preroll would start before previous caption finishes transmitting
// This prevents bandwidth collision but allows visual overlap (like scc_tools)
// Visual overlap is fine - the EOC command swaps buffers atomically
if (context->scc_last_transmission_end > 0 &&
*preroll_start < context->scc_last_transmission_end)
{
// Bandwidth collision detected - shift our caption forward
// Add 1 frame buffer to ensure no overlap
LLONG one_frame_ms = (LLONG)(1000.0f / fps);
LLONG new_preroll = context->scc_last_transmission_end + one_frame_ms;
LLONG shift = new_preroll - *preroll_start;
*preroll_start = new_preroll;
*display_time += shift;
return true;
}
return false;
}
struct control_code_info
{
unsigned int byte1_odd;
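Review note on the pre-roll arithmetic: a worked instance may help when checking the numbers. The sketch below repeats the body of `calculate_preroll_time` under a hypothetical name, assuming `LLONG` is `long long`. For 40 bytes at 29.97 fps it needs 20 frames, which truncates to 667 ms of transmission plus one 33 ms frame, so a caption shown at 5000 ms starts loading at 4300 ms.

```c
#include <assert.h>

typedef long long LLONG; /* assumption: matches the project's LLONG */

/* Same steps as calculate_preroll_time in the hunk above. */
static LLONG preroll_time(LLONG display_time, unsigned int total_bytes, float fps)
{
    assert(fps > 0.0f);

    /* 2 bytes per frame, rounded up to whole frames */
    float ms_per_frame = 1000.0f / fps;
    unsigned int frames_needed = (total_bytes + 1) / 2;
    LLONG transmission_time_ms = (LLONG)(frames_needed * ms_per_frame);

    /* one extra frame so EOC can be sent before display */
    LLONG one_frame_ms = (LLONG)ms_per_frame;
    LLONG preroll_start = display_time - transmission_time_ms - one_frame_ms;

    /* clamp: a caption cannot start loading before time 0 */
    return preroll_start < 0 ? 0 : preroll_start;
}

/* Usage */
void preroll_demo(void)
{
    assert(preroll_time(5000, 40, 29.97f) == 4300); /* 5000 - 667 - 33 */
    assert(preroll_time(5000, 41, 25.0f) == 4120);  /* 21 frames: 5000 - 840 - 40 */
    assert(preroll_time(100, 40, 29.97f) == 0);     /* clamped */
}
```

Both casts truncate, so the result can start up to about a frame later than exact arithmetic would give; whether that matters depends on how the timestamps are later quantized to frames.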
@@ -484,14 +649,156 @@ void write_control_code(const int fd, const unsigned char channel, const enum co
* @param row 0-14 (inclusive)
* @param column 0-31 (inclusive)
*
* //TODO: Preamble code need to take into account font as well
*
* Returns an indent-based preamble code (positions cursor at column with white color)
*/
enum control_code get_preamble_code(const unsigned char row, const unsigned char column)
{
return PREAMBLE_CC_START + 1 + (row * 8) + (column / 4);
}
/**
* Get byte2 value for a styled PAC (color/font at column 0)
* Returns 0x40-0x4F or 0x60-0x6F depending on the style
*
* @param color The color to use
* @param font The font style to use
* @param use_high_range If true, use 0x60-0x6F range instead of 0x40-0x4F
*
* PAC style encoding (byte2):
* 0x40/0x60: white, regular 0x41/0x61: white, underline
* 0x42/0x62: green, regular 0x43/0x63: green, underline
* 0x44/0x64: blue, regular 0x45/0x65: blue, underline
* 0x46/0x66: cyan, regular 0x47/0x67: cyan, underline
* 0x48/0x68: red, regular 0x49/0x69: red, underline
* 0x4a/0x6a: yellow, regular 0x4b/0x6b: yellow, underline
* 0x4c/0x6c: magenta, regular 0x4d/0x6d: magenta, underline
* 0x4e/0x6e: white, italics 0x4f/0x6f: white, italic underline
*/
static unsigned char get_styled_pac_byte2(enum ccx_decoder_608_color_code color, enum font_bits font, bool use_high_range)
{
unsigned char base = use_high_range ? 0x60 : 0x40;
unsigned char style_offset;
// Handle italics specially - they're always white
if (font == FONT_ITALICS)
return base + 0x0e;
if (font == FONT_UNDERLINED_ITALICS)
return base + 0x0f;
// Map color to base offset (0, 2, 4, 6, 8, 10, 12)
switch (color)
{
case COL_WHITE:
style_offset = 0x00;
break;
case COL_GREEN:
style_offset = 0x02;
break;
case COL_BLUE:
style_offset = 0x04;
break;
case COL_CYAN:
style_offset = 0x06;
break;
case COL_RED:
style_offset = 0x08;
break;
case COL_YELLOW:
style_offset = 0x0a;
break;
case COL_MAGENTA:
style_offset = 0x0c;
break;
default:
// For unsupported colors (black, transparent, userdefined), fall back to white
style_offset = 0x00;
break;
}
// Add 1 for underlined
if (font == FONT_UNDERLINED)
style_offset += 1;
return base + style_offset;
}
/**
* Check if the row uses high range (0x60-0x6F) or low range (0x40-0x4F) for styled PACs
* Rows that have byte2 in 0x70-0x7F range for indents use 0x60-0x6F for styles
*/
static bool row_uses_high_range(unsigned char row)
{
// Based on the preamble code table:
// Rows 2, 4, 6, 8, 10, 13, 15 use the "high" range (byte2 0x70-0x7F for indents)
// which corresponds to 0x60-0x6F for styled PACs
return (row == 1 || row == 3 || row == 5 || row == 7 || row == 9 || row == 12 || row == 14);
}
/**
* Write a styled PAC code (color/font at column 0) directly
* This is more efficient than using indent PAC + mid-row code when at column 0
*
* @param fd File descriptor
* @param channel Caption channel (1-4)
* @param row Row number (0-14)
* @param color Color to set
* @param font Font style to set
* @param disassemble If true, output assembly format
* @param bytes_written Pointer to byte counter
*/
static void write_styled_preamble(const int fd, const unsigned char channel, const unsigned char row,
enum ccx_decoder_608_color_code color, enum font_bits font,
const bool disassemble, unsigned int *bytes_written)
{
// Get the preamble code for column 0 to obtain byte1
enum control_code base_preamble = get_preamble_code(row, 0);
unsigned char byte1 = odd_parity(get_first_byte(channel, base_preamble));
// Get styled byte2
bool use_high_range = row_uses_high_range(row);
unsigned char byte2 = odd_parity(get_styled_pac_byte2(color, font, use_high_range));
check_padding(fd, disassemble, bytes_written);
if (disassemble)
{
// Output assembly format like {0100Gr} for row 1, green
const char *color_names[] = {"Wh", "Gr", "Bl", "Cy", "R", "Y", "Ma", "Wh", "Bk", "Wh"};
const char *font_suffix = "";
if (font == FONT_UNDERLINED)
font_suffix = "U";
else if (font == FONT_ITALICS)
font_suffix = "I";
else if (font == FONT_UNDERLINED_ITALICS)
font_suffix = "IU";
fdprintf(fd, "{%02d00%s%s}", row + 1, color_names[color], font_suffix);
}
else
{
if (*bytes_written % 2 == 0)
write_wrapped(fd, " ", 1);
fdprintf(fd, "%02x%02x", byte1, byte2);
}
*bytes_written += 2;
}
/**
* Check if a styled PAC can be used (when color/font differs from white/regular and column is 0)
*/
static bool can_use_styled_pac(enum ccx_decoder_608_color_code color, enum font_bits font, unsigned char column)
{
// Styled PACs can only be used at column 0
if (column != 0)
return false;
// If style is already white/regular, no need for styled PAC
if (color == COL_WHITE && font == FONT_REGULAR)
return false;
return true;
}
enum control_code get_tab_offset_code(const unsigned char column)
{
int offset = column % 4;
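Review note on the styled PAC table: the byte2 values in the comment follow a regular pattern (two slots per color, plus one for underline, with the last two slots reserved for white italics). The sketch below expresses that pattern arithmetically instead of with a `switch`. The enums are local stand-ins, not the project's `ccx_decoder_608_color_code` or `font_bits`, whose numeric values are not visible in this diff.

```c
#include <assert.h>

/* Hypothetical stand-in enums, ordered to match the PAC table. */
enum sketch_color { SK_WHITE, SK_GREEN, SK_BLUE, SK_CYAN, SK_RED, SK_YELLOW, SK_MAGENTA, SK_OTHER };
enum sketch_font { SK_REGULAR, SK_ITALICS, SK_UNDERLINED, SK_UNDERLINED_ITALICS };

static unsigned char styled_pac_byte2(enum sketch_color color, enum sketch_font font, int use_high_range)
{
    unsigned char base = use_high_range ? 0x60 : 0x40;

    /* Italics are always white and occupy the last two slots. */
    if (font == SK_ITALICS)
        return (unsigned char)(base + 0x0e);
    if (font == SK_UNDERLINED_ITALICS)
        return (unsigned char)(base + 0x0f);

    /* Two slots per color; unsupported colors fall back to white. */
    unsigned char offset = (color <= SK_MAGENTA) ? (unsigned char)(2 * color) : 0;
    if (font == SK_UNDERLINED)
        offset += 1;

    return (unsigned char)(base + offset);
}

/* Usage: spot checks against the table in the comment above. */
void styled_pac_demo(void)
{
    assert(styled_pac_byte2(SK_GREEN, SK_REGULAR, 0) == 0x42);
    assert(styled_pac_byte2(SK_RED, SK_UNDERLINED, 1) == 0x69);
    assert(styled_pac_byte2(SK_BLUE, SK_ITALICS, 0) == 0x4e);
}
```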
@@ -519,6 +826,23 @@ enum control_code get_font_code(enum font_bits font, enum ccx_decoder_608_color_
}
}
// Get frame rate value from scc_framerate setting
// 0=29.97 (default), 1=24, 2=25, 3=30
static float get_scc_fps(int scc_framerate)
{
switch (scc_framerate)
{
case 1:
return 24.0f;
case 2:
return 25.0f;
case 3:
return 30.0f;
default:
return 29.97f;
}
}
void add_timestamp(const struct encoder_ctx *context, LLONG time, const bool disassemble)
{
write_wrapped(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);
@@ -528,9 +852,15 @@ void add_timestamp(const struct encoder_ctx *context, LLONG time, const bool dis
unsigned hour, minute, second, milli;
millis_to_time(time, &hour, &minute, &second, &milli);
// SMPTE format
float frame = milli * 29.97 / 1000;
fdprintf(context->out->fh, "%02u:%02u:%02u:%02.f\t", hour, minute, second, frame);
// SMPTE format - use configurable frame rate (issue #1191)
float fps = get_scc_fps(context->scc_framerate);
// Calculate frame number from milliseconds, ensuring it stays in valid range 0 to fps-1
// Use floor to avoid rounding up to fps (e.g., 29.97 -> 30 is invalid)
int max_frames = (int)fps;
int frame = (int)(milli * fps / 1000.0f);
if (frame >= max_frames)
frame = max_frames - 1; // Cap at max valid frame (e.g., 29 for 29.97fps)
fdprintf(context->out->fh, "%02u:%02u:%02u:%02d\t", hour, minute, second, frame);
}
void clear_screen(const struct encoder_ctx *context, LLONG end_time, const unsigned char channel, const bool disassemble)
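Review note on the frame clamp in `add_timestamp`: the sketch below repeats the added arithmetic under a hypothetical name so the boundary cases can be checked. One observation worth confirming against the intended behavior: at 29.97 fps, `(int)fps` is 29, so a computed frame of 29 satisfies `frame >= max_frames` and is written as 28, which differs from the "29 for 29.97fps" wording in the comment.

```c
#include <assert.h>

/* Same arithmetic as the added lines in add_timestamp. */
static int frame_from_millis(unsigned milli, float fps)
{
    assert(milli < 1000); /* sub-second part only */

    int max_frames = (int)fps;                 /* 29 for 29.97, 25 for 25.0 */
    int frame = (int)(milli * fps / 1000.0f);  /* truncates toward zero */
    if (frame >= max_frames)
        frame = max_frames - 1;
    return frame;
}

/* Usage */
void frame_demo(void)
{
    assert(frame_from_millis(0, 29.97f) == 0);
    assert(frame_from_millis(500, 29.97f) == 14);
    assert(frame_from_millis(999, 25.0f) == 24);
    assert(frame_from_millis(999, 29.97f) == 28); /* 29 is clamped down */
}
```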
@@ -545,11 +875,56 @@ int write_cc_buffer_as_scenarist(const struct eia608_screen *data, struct encode
unsigned int bytes_written = 0;
enum font_bits current_font = FONT_REGULAR;
enum ccx_decoder_608_color_code current_color = COL_WHITE;
unsigned char current_row = 14;
unsigned char current_column = 0;
// Initialize to impossible values to ensure the first character always
// triggers a position code write (fixes issue #1776)
unsigned char current_row = UINT8_MAX;
unsigned char current_column = UINT8_MAX;
// Timing variables for accurate timing mode (issue #1120)
LLONG actual_start_time = data->start_time; // When caption should display
LLONG actual_end_time = data->end_time; // When caption should clear
LLONG preroll_start = data->start_time; // When to start loading (default: same as display)
float fps = get_scc_fps_internal(context->scc_framerate);
bool use_separate_display_time = false; // Whether to write EOC at separate timestamp
// If accurate timing is enabled, calculate pre-roll and handle collisions
if (context->scc_accurate_timing)
{
// Calculate total bytes needed for this caption
unsigned int total_bytes = calculate_caption_bytes(data);
// Calculate when we need to start loading
preroll_start = calculate_preroll_time(actual_start_time, total_bytes, fps);
// Check for collisions with previous caption and resolve
if (resolve_collision(context, &preroll_start, &actual_start_time, fps))
{
// Timing was adjusted due to collision
// Also adjust end time by the same amount
LLONG shift = actual_start_time - data->start_time;
actual_end_time = data->end_time + shift;
}
// Update timing state for next caption
float ms_per_frame = 1000.0f / fps;
unsigned int frames_needed = (total_bytes + 1) / 2;
LLONG transmission_time_ms = (LLONG)(frames_needed * ms_per_frame);
context->scc_last_transmission_end = preroll_start + transmission_time_ms;
context->scc_last_display_end = actual_end_time;
// Enable separate display timing (like scc_tools)
use_separate_display_time = true;
// 1. Load the caption at pre-roll time
add_timestamp(context, preroll_start, disassemble);
}
else
{
// Legacy mode: use original timing
// 1. Load the caption
add_timestamp(context, data->start_time, disassemble);
}
// 1. Load the caption
add_timestamp(context, data->start_time, disassemble);
write_control_code(context->out->fh, data->channel, RCL, disassemble, &bytes_written);
for (uint8_t row = 0; row < 15; ++row)
{
@@ -576,6 +951,23 @@ int write_cc_buffer_as_scenarist(const struct eia608_screen *data, struct encode
{
if (switch_font || switch_color)
{
// Optimization (issue #1191): Use styled PAC when at column 0 with non-default style
// This avoids needing a separate mid-row code
if (column == 0 && can_use_styled_pac(data->colors[row][column], data->fonts[row][column], 0))
{
write_styled_preamble(context->out->fh, data->channel, row,
data->colors[row][column], data->fonts[row][column],
disassemble, &bytes_written);
current_row = row;
current_column = 0;
current_font = data->fonts[row][column];
current_color = data->colors[row][column];
// Write the character and continue
write_character(context->out->fh, data->characters[row][column], disassemble, &bytes_written);
++current_column;
continue;
}
if (data->characters[row][column] == ' ')
{
// The MID-ROW code is going to move the cursor to the
@@ -615,12 +1007,26 @@ int write_cc_buffer_as_scenarist(const struct eia608_screen *data, struct encode
check_padding(context->out->fh, disassemble, &bytes_written);
}
// 2. Show the caption
// 2. Show the caption (EOC = End of Caption, makes it visible)
if (use_separate_display_time)
{
// For accurate timing: write display command at actual display time
// This matches scc_tools behavior where load and display are separate
add_timestamp(context, actual_start_time, disassemble);
}
write_control_code(context->out->fh, data->channel, EOC, disassemble, &bytes_written);
write_control_code(context->out->fh, data->channel, ENM, disassemble, &bytes_written);
// 3. Clear the caption
clear_screen(context, data->end_time, data->channel, disassemble);
// 3. Clear the caption at the end time
// In accurate timing mode, skip clear - the next caption's EOC will handle the transition
// This matches scc_tools behavior which doesn't write EDM between consecutive captions
if (!use_separate_display_time)
{
// Legacy mode: always write clear
clear_screen(context, actual_end_time, data->channel, disassemble);
}
// In accurate timing mode, scc_last_display_end is still tracked for reference
// but we don't write the clear command to avoid out-of-order timestamps
return 1;
}

@@ -36,18 +36,22 @@ void write_stringz_as_smptett(char *string, struct encoder_ctx *context, LLONG m
unsigned h2, m2, s2, ms2;
int len = strlen(string);
unsigned char *unescaped = (unsigned char *)malloc(len + 1);
if (!unescaped)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In write_stringz_as_smptett() - not enough memory for unescaped buffer.\n");
unsigned char *el = (unsigned char *)malloc(len * 3 + 1); // Be generous
if (!el)
{
free(unescaped);
fatal(EXIT_NOT_ENOUGH_MEMORY, "In write_stringz_as_smptett() - not enough memory for el buffer.\n");
}
int pos_r = 0;
int pos_w = 0;
char str[1024];
if (el == NULL || unescaped == NULL)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In write_stringz_as_smptett() - not enough memory.\n");
millis_to_time(ms_start, &h1, &m1, &s1, &ms1);
millis_to_time(ms_end - 1, &h2, &m2, &s2, &ms2);
sprintf((char *)str, "<p begin=\"%02u:%02u:%02u.%03u\" end=\"%02u:%02u:%02u.%03u\">\r\n", h1, m1, s1, ms1, h2, m2, s2, ms2);
snprintf(str, sizeof(str), "<p begin=\"%02u:%02u:%02u.%03u\" end=\"%02u:%02u:%02u.%03u\">\r\n", h1, m1, s1, ms1, h2, m2, s2, ms2);
if (context->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
@@ -87,7 +91,7 @@ void write_stringz_as_smptett(char *string, struct encoder_ctx *context, LLONG m
begin += strlen((const char *)begin) + 1;
}
sprintf((char *)str, "</p>\n");
snprintf(str, sizeof(str), "</p>\n");
if (context->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
@@ -126,13 +130,15 @@ int write_cc_bitmap_as_smptett(struct cc_subtitle *sub, struct encoder_ctx *cont
unsigned h2, m2, s2, ms2;
millis_to_time(sub->start_time, &h1, &m1, &s1, &ms1);
millis_to_time(sub->end_time - 1, &h2, &m2, &s2, &ms2); // -1 To prevent overlapping with next line.
sprintf((char *)context->buffer, "<p begin=\"%02u:%02u:%02u.%03u\" end=\"%02u:%02u:%02u.%03u\">\n", h1, m1, s1, ms1, h2, m2, s2, ms2);
write_wrapped(context->out->fh, buf, strlen(buf));
int written = snprintf(buf, INITIAL_ENC_BUFFER_CAPACITY, "<p begin=\"%02u:%02u:%02u.%03u\" end=\"%02u:%02u:%02u.%03u\">\n", h1, m1, s1, ms1, h2, m2, s2, ms2);
if (written > 0 && (size_t)written < INITIAL_ENC_BUFFER_CAPACITY)
write_wrapped(context->out->fh, buf, written);
len = strlen(rect[i].ocr_text);
write_wrapped(context->out->fh, rect[i].ocr_text, len);
write_wrapped(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);
sprintf(buf, "</p>\n");
write_wrapped(context->out->fh, buf, strlen(buf));
written = snprintf(buf, INITIAL_ENC_BUFFER_CAPACITY, "</p>\n");
if (written > 0 && (size_t)written < INITIAL_ENC_BUFFER_CAPACITY)
write_wrapped(context->out->fh, buf, written);
}
}
}
@@ -218,7 +224,7 @@ int write_cc_buffer_as_smptett(struct eia608_screen *data, struct encoder_ctx *c
{
wrote_something = 1;
sprintf(str, " <p begin=\"%02u:%02u:%02u.%03u\" end=\"%02u:%02u:%02u.%03u\" tts:origin=\"%1.3f%% %1.3f%%\">\n <span>", h1, m1, s1, ms1, h2, m2, s2, ms2, col1, row1);
snprintf(str, sizeof(str), " <p begin=\"%02u:%02u:%02u.%03u\" end=\"%02u:%02u:%02u.%03u\" tts:origin=\"%1.3f%% %1.3f%%\">\n <span>", h1, m1, s1, ms1, h2, m2, s2, ms2, col1, row1);
if (context->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);
@@ -236,8 +242,16 @@ int write_cc_buffer_as_smptett(struct eia608_screen *data, struct encoder_ctx *c
get_decoder_line_encoded(context, context->subline, row, data);
char *final = malloc(strlen((const char *)(context->subline)) + 1000); // Being overly generous? :P
char *temp = malloc(strlen((const char *)(context->subline)) + 1000);
size_t subline_len = strlen((const char *)(context->subline));
size_t buf_size = subline_len + 1000; // Being overly generous? :P
char *final = malloc(buf_size);
char *temp = malloc(buf_size);
if (!final || !temp)
{
freep(&final);
freep(&temp);
fatal(EXIT_NOT_ENOUGH_MEMORY, "In write_cc_buffer_as_smptett() - not enough memory.\n");
}
*final = 0;
*temp = 0;
/*
@@ -297,37 +311,56 @@ int write_cc_buffer_as_smptett(struct eia608_screen *data, struct encoder_ctx *c
if (end == NULL)
{
// Incorrect styling, writing as it is
strcpy(final, (const char *)(context->subline));
snprintf(final, buf_size, "%s", (const char *)(context->subline));
}
else
{
size_t final_len = 0;
int start_index = start - (char *)(context->subline);
int end_index = end - (char *)(context->subline);
strncat(final, (const char *)(context->subline), start_index); // copying content before opening tag e.g. <i>
// copying content before opening tag e.g. <i>
if (start_index > 0 && (size_t)start_index < buf_size - 1)
{
memcpy(final, (const char *)(context->subline), start_index);
final[start_index] = '\0';
final_len = start_index;
}
strcat(final, "<span>"); // adding <span> : replacement of <i>
// adding <span> : replacement of <i>
size_t remaining = buf_size - final_len;
int written = snprintf(final + final_len, remaining, "<span>");
if (written > 0 && (size_t)written < remaining)
final_len += written;
// The content in italics is between <i> and </i>, i.e. between (start_index + 3) and end_index.
int content_len = end_index - start_index - 3;
if (content_len > 0)
{
remaining = buf_size - final_len;
if ((size_t)content_len < remaining - 1)
{
memcpy(final + final_len, (const char *)(context->subline) + start_index + 3, content_len);
final_len += content_len;
final[final_len] = '\0';
}
}
strncat(temp, (const char *)(context->subline) + start_index + 3, end_index - start_index - 3); // the content in italics
strcat(final, temp); // attaching to final sentence.
// adding appropriate style tag
remaining = buf_size - final_len;
if (style == 1)
strcpy(temp, "<style tts:backgroundColor=\"#000000FF\" tts:fontSize=\"18px\" tts:fontStyle=\"italic\"/> </span>");
written = snprintf(final + final_len, remaining, "<style tts:backgroundColor=\"#000000FF\" tts:fontSize=\"18px\" tts:fontStyle=\"italic\"/> </span>");
else if (style == 2)
strcpy(temp, "<style tts:backgroundColor=\"#000000FF\" tts:fontSize=\"18px\" tts:fontWeight=\"bold\"/> </span>");
written = snprintf(final + final_len, remaining, "<style tts:backgroundColor=\"#000000FF\" tts:fontSize=\"18px\" tts:fontWeight=\"bold\"/> </span>");
else
strcpy(temp, "<style tts:backgroundColor=\"#000000FF\" tts:fontSize=\"18px\" tts:textDecoration=\"underline\"/> </span>");
written = snprintf(final + final_len, remaining, "<style tts:backgroundColor=\"#000000FF\" tts:fontSize=\"18px\" tts:textDecoration=\"underline\"/> </span>");
strcat(final, temp); // adding appropriate style tag.
if (written > 0 && (size_t)written < remaining)
final_len += written;
sprintf(temp, "%s", (const char *)(context->subline) + end_index + 4); // finding remaining sentence.
strcat(final, temp); // adding remaining sentence.
// finding remaining sentence and adding it
remaining = buf_size - final_len;
snprintf(final + final_len, remaining, "%s", (const char *)(context->subline) + end_index + 4);
}
}
else // No style or Font Color
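Review note on the `strcat` replacements: the hunk repeats one pattern several times (compute `remaining`, call `snprintf` at `final + final_len`, advance `final_len` only if nothing was truncated). A minimal sketch of that pattern as a single hypothetical helper, `append_bounded`, which is not part of the diff. Unlike the diff, it restores the terminator on truncation so a rejected piece leaves the buffer unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append piece to dst (capacity cap, current length *len).
 * Returns 1 on success, 0 if the piece does not fit. */
static int append_bounded(char *dst, size_t cap, size_t *len, const char *piece)
{
    assert(*len < cap);

    size_t remaining = cap - *len;
    int written = snprintf(dst + *len, remaining, "%s", piece);
    if (written < 0 || (size_t)written >= remaining)
    {
        dst[*len] = '\0'; /* undo the partial write */
        return 0;
    }
    *len += (size_t)written;
    return 1;
}

/* Usage: the oversized piece is rejected, later pieces still append. */
void append_bounded_demo(void)
{
    char buf[16] = "";
    size_t len = 0;

    assert(append_bounded(buf, sizeof(buf), &len, "<span>") == 1);
    assert(append_bounded(buf, sizeof(buf), &len, "0123456789") == 0);
    assert(append_bounded(buf, sizeof(buf), &len, "</span>") == 1);
    assert(strcmp(buf, "<span></span>") == 0);
}
```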
@@ -340,44 +373,75 @@ int write_cc_buffer_as_smptett(struct eia608_screen *data, struct encoder_ctx *c
if (end == NULL)
{
// Incorrect styling, writing as it is
strcpy(final, (const char *)(context->subline));
snprintf(final, buf_size, "%s", (const char *)(context->subline));
}
else
{
size_t final_len = 0;
int start_index = start - (char *)(context->subline);
int end_index = end - (char *)(context->subline);
strncat(final, (const char *)(context->subline), start_index); // copying content before opening tag e.g. <font ..>
// copying content before opening tag e.g. <font ..>
if (start_index > 0 && (size_t)start_index < buf_size - 1)
{
memcpy(final, (const char *)(context->subline), start_index);
final[start_index] = '\0';
final_len = start_index;
}
strcat(final, "<span>"); // adding <span> : replacement of <font ..>
// adding <span> : replacement of <font ..>
size_t remaining = buf_size - final_len;
int written = snprintf(final + final_len, remaining, "<span>");
if (written > 0 && (size_t)written < remaining)
final_len += written;
char *temp_pointer = strchr((const char *)(context->subline), '#'); // locating color code
char color_code[7];
strncpy(color_code, temp_pointer + 1, 6); // obtained color code
color_code[6] = '\0';
char color_code[8];
if (temp_pointer)
{
snprintf(color_code, sizeof(color_code), "%.6s", temp_pointer + 1); // obtained color code
}
else
{
color_code[0] = '\0';
}
temp_pointer = strchr((const char *)(context->subline), '>'); // The content is in between <font ..> and </font>
strncat(temp, temp_pointer + 1, end_index - (temp_pointer - (char *)(context->subline) + 1));
if (temp_pointer)
{
// Copy the content between <font ..> and </font>
int content_len = end_index - (temp_pointer - (char *)(context->subline) + 1);
if (content_len > 0)
{
remaining = buf_size - final_len;
if ((size_t)content_len < remaining - 1)
{
memcpy(final + final_len, temp_pointer + 1, content_len);
final_len += content_len;
final[final_len] = '\0';
}
}
}
strcat(final, temp); // attaching to final sentence.
// adding font color tag
remaining = buf_size - final_len;
written = snprintf(final + final_len, remaining, "<style tts:backgroundColor=\"#FFFF00FF\" tts:color=\"%s\" tts:fontSize=\"18px\"/></span>", color_code);
if (written > 0 && (size_t)written < remaining)
final_len += written;
sprintf(temp, "<style tts:backgroundColor=\"#FFFF00FF\" tts:color=\"%s\" tts:fontSize=\"18px\"/></span>", color_code);
strcat(final, temp); // adding font color tag
sprintf(temp, "%s", (const char *)(context->subline) + end_index + 7); // finding remaining sentence.
strcat(final, temp); // adding remaining sentence
// finding remaining sentence and adding it
remaining = buf_size - final_len;
snprintf(final + final_len, remaining, "%s", (const char *)(context->subline) + end_index + 7);
}
}
else
{
// NO styling, writing as it is
strcpy(final, (const char *)(context->subline));
snprintf(final, buf_size, "%s", (const char *)(context->subline));
}
}
@@ -386,7 +450,7 @@ int write_cc_buffer_as_smptett(struct eia608_screen *data, struct encoder_ctx *c
write_wrapped(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);
context->trim_subs = old_trim_subs;
sprintf(str, " <style tts:backgroundColor=\"#000000FF\" tts:fontSize=\"18px\"/></span>\n </p>\n");
snprintf(str, sizeof(str), " <style tts:backgroundColor=\"#000000FF\" tts:fontSize=\"18px\"/></span>\n </p>\n");
if (context->encoding != CCX_ENC_UNICODE)
{
dbg_print(CCX_DMT_DECODER_608, "\r%s\n", str);

@@ -80,10 +80,20 @@ ____sbs_context: [%p]\n\
LOG_DEBUG("SBS: init_sbs_context: INIT\n");
____sbs_context = malloc(sizeof(sbs_context_t));
if (!____sbs_context)
{
fatal(EXIT_NOT_ENOUGH_MEMORY, "In init_sbs_context: Out of memory allocating sbs_context_t.");
}
____sbs_context->time_from = -1;
____sbs_context->time_trim = -1;
____sbs_context->capacity = 16;
____sbs_context->buffer = malloc(____sbs_context->capacity * sizeof(unsigned char));
if (!____sbs_context->buffer)
{
free(____sbs_context);
____sbs_context = NULL;
fatal(EXIT_NOT_ENOUGH_MEMORY, "In init_sbs_context: Out of memory allocating buffer.");
}
____sbs_context->buffer[0] = 0;
____sbs_context->handled_len = 0;
}
@@ -222,7 +232,7 @@ char *sbs_find_insert_point_partial(char *old_tail, const char *new_start, size_
{
/*
#ifdef DEBUG_SBS
sprintf(fmtbuf, "SBS: sbs_find_insert_point_partial: compare\n\
snprintf(fmtbuf, sizeof(fmtbuf), "SBS: sbs_find_insert_point_partial: compare\n\
\tnot EQ: [TRUE]\n\
\tmaxerr: [%%d]\n\
\tL buffer: [%%.%zus]\n\
@@ -291,7 +301,7 @@ char *sbs_find_insert_point_partial(char *old_tail, const char *new_start, size_
/*
#ifdef DEBUG_SBS
sprintf(fmtbuf, "SBS: sbs_find_insert_point_partial: REPLACE ENTIRE TAIL !!\n\
snprintf(fmtbuf, sizeof(fmtbuf), "SBS: sbs_find_insert_point_partial: REPLACE ENTIRE TAIL !!\n\
\tmaxerr: [%%d]\n\
\tL buffer: [%%.%zus]\n\
\tL string: [%%.%zus]\n\
@@ -438,7 +448,9 @@ void sbs_strcpy_without_dup(const unsigned char *str, sbs_context_t *context)
skip_ws++;
}
strcpy(context->buffer, context->buffer + skip_ws);
// Use memmove for overlapping memory regions (strcpy is undefined for overlapping buffers)
size_t remaining_len = strlen((char *)(context->buffer + skip_ws)) + 1;
memmove(context->buffer, context->buffer + skip_ws, remaining_len);
context->handled_len = 0;
}
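Review note: the change above swaps `strcpy` for `memmove` because source and destination overlap when a string is shifted left inside its own buffer, and `strcpy` has undefined behavior in that case. The sketch below isolates the pattern; `strip_leading_ws` is a hypothetical name used only for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not in the patch): remove leading whitespace
 * from s in place. */
static void strip_leading_ws(char *s)
{
    size_t skip = 0;
    while (s[skip] != '\0' && isspace((unsigned char)s[skip]))
        skip++;

    if (skip > 0)
    {
        /* +1 so the terminating NUL is moved as well. memmove is
         * required here because the two regions overlap. */
        size_t n = strlen(s + skip) + 1;
        memmove(s, s + skip, n);
    }
}
```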
@@ -451,10 +463,12 @@ void sbs_strcpy_without_dup(const unsigned char *str, sbs_context_t *context)
&& !isspace(context->buffer[sbs_len - 1]) // not a space char at the end of existing buf
)
{
strcat(context->buffer, " ");
// Capacity is guaranteed by sbs_append_string before calling this function
strncat((char *)context->buffer, " ", context->capacity - sbs_len - 1);
sbs_len++;
}
strcat(context->buffer, str);
strncat((char *)context->buffer, (char *)str, context->capacity - sbs_len - 1);
}
/**
@@ -539,13 +553,16 @@ struct cc_subtitle *sbs_append_string(unsigned char *str, const LLONG time_from,
: new_capacity;
}
context->buffer = (unsigned char *)realloc(
unsigned char *tmp = (unsigned char *)realloc(
context->buffer,
new_capacity * sizeof(/*unsigned char*/ context->buffer[0]));
if (!context->buffer)
if (!tmp)
{
free(context->buffer);
fatal(EXIT_NOT_ENOUGH_MEMORY, "In sbs_append_string: Not enough memory to append buffer");
}
context->buffer = tmp;
context->capacity = new_capacity;
LOG_DEBUG("SBS: sbs_append_string: REALLOC BUF DONE:\n\
@@ -599,6 +616,10 @@ struct cc_subtitle *sbs_append_string(unsigned char *str, const LLONG time_from,
{
// it is new sentence!
tmpsub = malloc(sizeof(struct cc_subtitle));
if (!tmpsub)
{
fatal(EXIT_NOT_ENOUGH_MEMORY, "In sbs_append_string: Out of memory allocating cc_subtitle.");
}
tmpsub->type = CC_TEXT;
// length of new string:
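Review note: the `realloc` change in `sbs_append_string` above assigns the result to a temporary first, so the original pointer is not lost when allocation fails. A minimal sketch of that pattern on a standalone growable buffer is shown here. The `byte_buf` struct and `byte_buf_reserve` function are hypothetical, and the sketch returns an error code where the patch calls `fatal`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable byte buffer, for illustration only. */
struct byte_buf
{
    unsigned char *data;
    size_t capacity;
};

/* Ensure the buffer can hold at least `needed` bytes.
 * Returns 0 on success, -1 on failure. On failure buf is unchanged
 * and buf->data is still valid (and still owned by the caller). */
static int byte_buf_reserve(struct byte_buf *buf, size_t needed)
{
    if (needed <= buf->capacity)
        return 0;

    size_t new_capacity = buf->capacity ? buf->capacity : 16;
    while (new_capacity < needed)
    {
        if (new_capacity > SIZE_MAX / 2)
            return -1; /* doubling would overflow */
        new_capacity *= 2;
    }

    /* Keep the old pointer until realloc is known to have succeeded. */
    unsigned char *tmp = realloc(buf->data, new_capacity);
    if (!tmp)
        return -1;

    buf->data = tmp;
    buf->capacity = new_capacity;
    return 0;
}
```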


@@ -67,17 +67,21 @@ struct spupng_t *spunpg_init(struct ccx_s_write *out)
ccx_common_logging.fatal_ftn(CCX_COMMON_EXIT_FILE_CREATION_FAILED, "Cannot open %s: %s\n",
out->filename, strerror(errno));
}
size_t filename_len = strlen(out->filename);
sp->dirname = (char *)malloc(
sizeof(char) * (strlen(out->filename) + 3));
sizeof(char) * (filename_len + 3));
if (NULL == sp->dirname)
ccx_common_logging.fatal_ftn(EXIT_NOT_ENOUGH_MEMORY, "spunpg_init: Memory allocation failed (sp->dirname)");
strcpy(sp->dirname, out->filename);
memcpy(sp->dirname, out->filename, filename_len + 1);
char *p = strrchr(sp->dirname, '.');
if (NULL == p)
p = sp->dirname + strlen(sp->dirname);
*p = '\0';
strcat(sp->dirname, ".d");
// Buffer size is filename_len + 3, current length is at most filename_len, appending ".d" (2 chars)
size_t current_len = strlen(sp->dirname);
size_t remaining = filename_len + 3 - current_len;
strncat(sp->dirname, ".d", remaining - 1);
if (0 != mkdir(sp->dirname, 0777))
{
if (errno != EEXIST)
@@ -90,22 +94,23 @@ struct spupng_t *spunpg_init(struct ccx_s_write *out)
}
// enough to append /subNNNN.png
sp->pngfile = (char *)malloc(sizeof(char) * (strlen(sp->dirname) + 13));
sp->relative_path_png = (char *)malloc(sizeof(char) * (strlen(sp->dirname) + 13));
size_t pngfile_size = strlen(sp->dirname) + 13;
sp->pngfile = (char *)malloc(sizeof(char) * pngfile_size);
sp->relative_path_png = (char *)malloc(sizeof(char) * pngfile_size);
if (NULL == sp->pngfile || NULL == sp->relative_path_png)
ccx_common_logging.fatal_ftn(EXIT_NOT_ENOUGH_MEMORY, "spunpg_init: Memory allocation failed (sp->pngfile)");
sp->fileIndex = 0;
sprintf(sp->pngfile, "%s/sub%04d.png", sp->dirname, sp->fileIndex);
snprintf(sp->pngfile, pngfile_size, "%s/sub%04d.png", sp->dirname, sp->fileIndex);
// Make relative path
char *last_slash = strrchr(sp->dirname, '/');
if (last_slash == NULL)
last_slash = strrchr(sp->dirname, '\\');
if (last_slash != NULL)
sprintf(sp->relative_path_png, "%s/sub%04d.png", last_slash + 1, sp->fileIndex);
snprintf(sp->relative_path_png, pngfile_size, "%s/sub%04d.png", last_slash + 1, sp->fileIndex);
else // do NOT do sp->relative_path_png = sp->pngfile (to avoid double free).
strcpy(sp->relative_path_png, sp->pngfile);
memcpy(sp->relative_path_png, sp->pngfile, strlen(sp->pngfile) + 1);
// For NTSC closed captions and 720x480 DVD subtitle resolution:
// Each character is 16x26.
@@ -186,7 +191,38 @@ void write_sputag_close(struct spupng_t *sp)
}
void write_spucomment(struct spupng_t *sp, const char *str)
{
fprintf(sp->fpxml, "<!--\n%s\n-->\n", str);
fprintf(sp->fpxml, "<!--\n");
const char *p = str;
const char *last_safe_pos = str; // Track the last safe position to flush
while (*p)
{
if (*p == '-' && *(p + 1) == '-')
{
if (p > last_safe_pos)
{
fwrite(last_safe_pos, 1, p - last_safe_pos, sp->fpxml);
}
fputc('-', sp->fpxml);
p += 2;
last_safe_pos = p;
}
else
{
p++;
}
}
if (p > last_safe_pos)
{
fwrite(last_safe_pos, 1, p - last_safe_pos, sp->fpxml);
}
fprintf(sp->fpxml, "\n-->\n");
}
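Review note: XML does not allow `--` inside a comment, which is what the new `write_spucomment` loop guards against by writing one hyphen for each pair. As far as I can trace it, an odd run such as `---` is emitted as one hyphen for the pair plus the leftover hyphen, which would leave `--` in the output. The sketch below shows one possible alternative that collapses whole runs. It is an assumption about the intended behavior, not the patch's code, and `collapse_hyphens` is a hypothetical name. It writes to a buffer rather than a `FILE *` so it can be checked in isolation.

```c
#include <stddef.h>

/* Hypothetical helper (not in the patch): copy src into dst (capacity
 * cap), collapsing every run of '-' into a single '-', so the result
 * never contains "--". Returns the number of characters written,
 * excluding the terminating NUL. Output is truncated if dst is too small. */
static size_t collapse_hyphens(const char *src, char *dst, size_t cap)
{
    size_t w = 0;
    if (cap == 0)
        return 0;

    for (const char *p = src; *p != '\0' && w + 1 < cap; p++)
    {
        /* Skip a hyphen if the previous output character was one. */
        if (*p == '-' && w > 0 && dst[w - 1] == '-')
            continue;
        dst[w++] = *p;
    }
    dst[w] = '\0';
    return w;
}
```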
char *get_spupng_filename(void *ctx)
@@ -197,16 +233,17 @@ char *get_spupng_filename(void *ctx)
void inc_spupng_fileindex(struct spupng_t *sp)
{
sp->fileIndex++;
sprintf(sp->pngfile, "%s/sub%04d.png", sp->dirname, sp->fileIndex);
size_t pngfile_size = strlen(sp->dirname) + 13;
snprintf(sp->pngfile, pngfile_size, "%s/sub%04d.png", sp->dirname, sp->fileIndex);
// Make relative path
char *last_slash = strrchr(sp->dirname, '/');
if (last_slash == NULL)
last_slash = strrchr(sp->dirname, '\\');
if (last_slash != NULL)
sprintf(sp->relative_path_png, "%s/sub%04d.png", last_slash + 1, sp->fileIndex);
snprintf(sp->relative_path_png, pngfile_size, "%s/sub%04d.png", last_slash + 1, sp->fileIndex);
else // do NOT do sp->relative_path_png = sp->pngfile (to avoid double free).
strcpy(sp->relative_path_png, sp->pngfile);
memcpy(sp->relative_path_png, sp->pngfile, strlen(sp->pngfile) + 1);
}
void set_spupng_offset(void *ctx, int x, int y)
{
@@ -214,6 +251,9 @@ void set_spupng_offset(void *ctx, int x, int y)
sp->xOffset = x;
sp->yOffset = y;
}
// Forward declaration for calculate_spupng_offsets
static void calculate_spupng_offsets(struct spupng_t *sp, struct encoder_ctx *ctx);
int save_spupng(const char *filename, uint8_t *bitmap, int w, int h,
png_color *palette, png_byte *alpha, int nb_color)
{
@@ -347,7 +387,7 @@ int write_cc_bitmap_as_spupng(struct cc_subtitle *sub, struct encoder_ctx *conte
struct cc_bitmap *rect;
png_color *palette = NULL;
png_byte *alpha = NULL;
int wrote_opentag = 1;
int wrote_opentag = 0; // Track if we actually wrote the tag
x_pos = -1;
y_pos = -1;
@@ -358,13 +398,11 @@ int write_cc_bitmap_as_spupng(struct cc_subtitle *sub, struct encoder_ctx *conte
return 0;
inc_spupng_fileindex(sp);
write_sputag_open(sp, sub->start_time, sub->end_time - 1);
if (sub->nb_data == 0 && (sub->flags & SUB_EOD_MARKER))
{
context->prev_start = -1;
if (wrote_opentag)
write_sputag_close(sp);
// No subtitle data, skip writing
return 0;
}
rect = sub->data;
@@ -403,10 +441,20 @@ int write_cc_bitmap_as_spupng(struct cc_subtitle *sub, struct encoder_ctx *conte
}
}
filename = get_spupng_filename(sp);
set_spupng_offset(sp, x_pos, y_pos);
// Set image dimensions for offset calculation
sp->img_w = width;
sp->img_h = height;
// Calculate centered offsets based on screen size (PAL/NTSC)
calculate_spupng_offsets(sp, context);
if (sub->flags & SUB_EOD_MARKER)
context->prev_start = sub->start_time;
pbuf = (uint8_t *)malloc(width * height);
if (!pbuf)
{
fatal(EXIT_NOT_ENOUGH_MEMORY, "In write_cc_bitmap_as_spupng: Out of memory allocating pbuf.");
}
memset(pbuf, 0x0, width * height);
for (i = 0; i < sub->nb_data; i++)
@@ -434,6 +482,15 @@ int write_cc_bitmap_as_spupng(struct cc_subtitle *sub, struct encoder_ctx *conte
/* TODO do rectangle wise, one color table should not be used for all rectangles */
mapclut_paletee(palette, alpha, (uint32_t *)rect[0].data1, rect[0].nb_colors);
// Save PNG file first
save_spupng(filename, pbuf, width, height, palette, alpha, rect[0].nb_colors);
freep(&pbuf);
// Write XML tag with calculated centered offsets
write_sputag_open(sp, sub->start_time, sub->end_time - 1);
wrote_opentag = 1; // Mark that we wrote the tag
#ifdef ENABLE_OCR
if (!context->nospupngocr)
{
@@ -446,8 +503,6 @@ int write_cc_bitmap_as_spupng(struct cc_subtitle *sub, struct encoder_ctx *conte
}
}
#endif
save_spupng(filename, pbuf, width, height, palette, alpha, rect[0].nb_colors);
freep(&pbuf);
end:
if (wrote_opentag)
@@ -504,6 +559,12 @@ int write_image(struct pixel_t *buffer, FILE *fp, int width, int height)
// Allocate memory for one row (4 bytes per pixel - RGBA)
row = (png_bytep)malloc(4 * width * sizeof(png_byte));
if (!row)
{
mprint("\nFailed to allocate memory for row in write_image.\n");
ret_code = 0;
goto finalise;
}
// Write image data
int x, y;
@@ -523,10 +584,8 @@ int write_image(struct pixel_t *buffer, FILE *fp, int width, int height)
png_write_end(png_ptr, NULL);
finalise:
if (info_ptr != NULL)
png_free_data(png_ptr, info_ptr, PNG_FREE_ALL, -1);
if (png_ptr != NULL)
png_destroy_write_struct(&png_ptr, (png_infopp)NULL);
png_destroy_write_struct(&png_ptr, &info_ptr);
if (row != NULL)
free(row);
@@ -574,6 +633,10 @@ void center_justify(struct pixel_t *target, int target_w,
// Note that the copy is smaller than the line:
// its width is set to just enough to contain the valid area
struct pixel_t *temp_buffer = malloc(valid_w * valid_h * sizeof(struct pixel_t));
if (!temp_buffer)
{
fatal(EXIT_NOT_ENOUGH_MEMORY, "In center_justify: Out of memory allocating temp_buffer.");
}
// x,y here is input-based (i.e. `buffer`)
for (int x = 0; x < valid_w; ++x)
@@ -677,6 +740,10 @@ uint32_t *utf8_to_utf32(char *src)
len_dst = (len_src + 2) * 4; // one for FEFF and one for \0
uint32_t *string_utf32 = (uint32_t *)calloc(len_dst, 1);
if (!string_utf32)
{
fatal(EXIT_NOT_ENOUGH_MEMORY, "In utf8_to_utf32: Out of memory allocating string_utf32.");
}
size_t inbufbytesleft = len_src;
size_t outbufbytesleft = len_dst;
char *inbuf = src;
@@ -731,8 +798,8 @@ int spupng_export_string2png(struct spupng_t *sp, char *str, FILE *output)
struct pixel_t *buffer = malloc(canvas_width * canvas_height * sizeof(struct pixel_t));
if (buffer == NULL)
{
mprint("\nFailed to alloc memory for buffer. Need %d bytes.\n",
canvas_width * canvas_height * sizeof(struct pixel_t));
fatal(EXIT_NOT_ENOUGH_MEMORY, "In spupng_export_string2png: Out of memory allocating buffer. Need %lu bytes.",
(unsigned long)(canvas_width * canvas_height * sizeof(struct pixel_t)));
}
memset(buffer, 0, canvas_width * canvas_height * sizeof(struct pixel_t));
@@ -740,7 +807,10 @@ int spupng_export_string2png(struct spupng_t *sp, char *str, FILE *output)
char *tmp = strdup(str);
if (!tmp)
{
free(buffer);
return -1;
}
char *token = strtok(tmp, "<>");
@@ -849,9 +919,11 @@ int spupng_export_string2png(struct spupng_t *sp, char *str, FILE *output)
struct pixel_t *new_buffer = realloc(buffer, canvas_width * canvas_height * sizeof(struct pixel_t));
if (new_buffer == NULL)
{
mprint("\nFailed to alloc memory for buffer. Need %d bytes.\n",
canvas_width * canvas_height * sizeof(struct pixel_t));
return 0;
free(buffer);
free(tmp);
free(string_utf32);
fatal(EXIT_NOT_ENOUGH_MEMORY, "In spupng_export_string2png: Out of memory expanding buffer. Need %lu bytes.",
(unsigned long)(canvas_width * canvas_height * sizeof(struct pixel_t)));
}
memset(new_buffer + old_height * canvas_width, 0, (canvas_height - old_height) * canvas_width * sizeof(struct pixel_t));
buffer = new_buffer;
@@ -933,6 +1005,8 @@ int spupng_export_string2png(struct spupng_t *sp, char *str, FILE *output)
*/
// Save image
sp->img_w = canvas_width;
sp->img_h = canvas_height;
write_image(buffer, output, canvas_width, canvas_height);
free(tmp);
free(buffer);
@@ -944,11 +1018,13 @@ int spupng_export_string2png(struct spupng_t *sp, char *str, FILE *output)
// Convert EIA608 Data(buffer) to string
// out must have at least 256 characters' space
// Return value is the length of the output string
#define EIA608_OUT_SIZE 256
int eia608_to_str(struct encoder_ctx *context, struct eia608_screen *data, char *out)
{
int str_len = 0;
int first = 1;
out[0] = '\0'; // Initialize output buffer
for (int row = 0; row < ROWS; row++)
{
if (data->row_used[row])
@@ -997,11 +1073,22 @@ int eia608_to_str(struct encoder_ctx *context, struct eia608_screen *data, char
if (!first)
{ // Add '\n' if it is not the first line.
strcat(out, "\n");
str_len += 2;
if (str_len + 1 < EIA608_OUT_SIZE)
{
out[str_len++] = '\n';
out[str_len] = '\0';
}
str_len++; // Count the newline even if truncated (for return value)
}
first = 0;
strncat(out, start, len);
// Copy at most what fits in remaining buffer
size_t remaining = (str_len < EIA608_OUT_SIZE - 1) ? (EIA608_OUT_SIZE - 1 - str_len) : 0;
size_t to_copy = (len < remaining) ? len : remaining;
if (to_copy > 0)
{
memcpy(out + str_len, start, to_copy);
out[str_len + to_copy] = '\0';
}
str_len += len;
}
}
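Review note: the `eia608_to_str` hunk above copies only what fits into the 256-byte output while still counting the full length in `str_len`, similar in spirit to `strlcat`. A compact sketch of that contract follows. The helper name `append_bounded` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in the patch): append n bytes of src to out,
 * which has total capacity cap and logical length cur_len. Copies only
 * what fits (always leaving room for the NUL) and returns the length the
 * string would have had without truncation. */
static size_t append_bounded(char *out, size_t cap, size_t cur_len,
                             const char *src, size_t n)
{
    if (cap == 0)
        return cur_len + n;

    /* Space left for payload bytes, excluding the terminator. */
    size_t room = (cur_len < cap - 1) ? (cap - 1 - cur_len) : 0;
    size_t to_copy = (n < room) ? n : room;

    if (to_copy > 0)
    {
        memcpy(out + cur_len, src, to_copy);
        out[cur_len + to_copy] = '\0';
    }
    return cur_len + n;
}
```

A caller can detect truncation by comparing the returned length with `cap - 1`.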
@@ -1010,6 +1097,28 @@ int eia608_to_str(struct encoder_ctx *context, struct eia608_screen *data, char
// string needs to be in UTF-8 encoding.
// This function will take care of encoding.
static void calculate_spupng_offsets(struct spupng_t *sp, struct encoder_ctx *ctx)
{
int screen_w = 720;
int screen_h;
/* Teletext is always PAL */
if (ctx->in_fileformat == 2 || ctx->is_pal)
{
screen_h = 576;
}
else
{
screen_h = 480;
}
sp->xOffset = (screen_w - sp->img_w) / 2;
sp->yOffset = (screen_h - sp->img_h) / 2;
// SPU / DVD requires even yOffset (interlacing)
if (sp->yOffset & 1)
sp->yOffset++;
}
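Review note: the arithmetic in `calculate_spupng_offsets` can be checked in isolation. The sketch below repeats it as a pure function with the same 720x576 (PAL) and 720x480 (NTSC) screen sizes. The `offsets` struct and `centered_offsets` name are hypothetical and exist only for this example.

```c
/* Hypothetical result type, for illustration only. */
struct offsets
{
    int x;
    int y;
};

/* Center an img_w x img_h image on a 720-wide screen whose height is
 * 576 for PAL and 480 otherwise, then round an odd y offset up to the
 * next even value, as the patched function does. */
static struct offsets centered_offsets(int img_w, int img_h, int is_pal)
{
    const int screen_w = 720;
    const int screen_h = is_pal ? 576 : 480;
    struct offsets o;

    o.x = (screen_w - img_w) / 2;
    o.y = (screen_h - img_h) / 2;
    if (o.y & 1)
        o.y++;
    return o;
}
```

For a 640x101 NTSC image this gives x = 40 and y = 189 rounded up to 190. Images larger than the screen would produce negative offsets, and neither this sketch nor the hunk above clamps them.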
int spupng_write_string(struct spupng_t *sp, char *string, LLONG start_time, LLONG end_time,
struct encoder_ctx *context)
{
@@ -1028,6 +1137,7 @@ int spupng_write_string(struct spupng_t *sp, char *string, LLONG start_time, LLO
}
// free(string_utf32);
fclose(sp->fppng);
calculate_spupng_offsets(sp, context);
write_sputag_open(sp, start_time, end_time);
write_spucomment(sp, string);
write_sputag_close(sp);


@@ -6,9 +6,10 @@
#include "ocr.h"
#include "ccextractor.h"
/* The timing here is not PTS based, but output based, i.e. user delay must be accounted for
if there is any */
int write_stringz_as_srt(char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end)
/* Helper function to write SRT to a specific output file (issue #665 - teletext multi-page)
Takes output file descriptor and counter pointer as parameters */
static int write_stringz_as_srt_to_output(char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end,
int out_fh, unsigned int *srt_counter)
{
int used;
unsigned h1, m1, s1, ms1;
@@ -20,22 +21,27 @@ int write_stringz_as_srt(char *string, struct encoder_ctx *context, LLONG ms_sta
millis_to_time(ms_start, &h1, &m1, &s1, &ms1);
millis_to_time(ms_end - 1, &h2, &m2, &s2, &ms2); // -1 To prevent overlapping with next line.
context->srt_counter++;
sprintf(timeline, "%u%s", context->srt_counter, context->encoded_crlf);
(*srt_counter)++;
snprintf(timeline, sizeof(timeline), "%u%s", *srt_counter, context->encoded_crlf);
used = encode_line(context, context->buffer, (unsigned char *)timeline);
write_wrapped(context->out->fh, context->buffer, used);
sprintf(timeline, "%02u:%02u:%02u,%03u --> %02u:%02u:%02u,%03u%s",
h1, m1, s1, ms1, h2, m2, s2, ms2, context->encoded_crlf);
write_wrapped(out_fh, context->buffer, used);
snprintf(timeline, sizeof(timeline), "%02u:%02u:%02u,%03u --> %02u:%02u:%02u,%03u%s",
h1, m1, s1, ms1, h2, m2, s2, ms2, context->encoded_crlf);
used = encode_line(context, context->buffer, (unsigned char *)timeline);
dbg_print(CCX_DMT_DECODER_608, "\n- - - SRT caption - - -\n");
dbg_print(CCX_DMT_DECODER_608, "%s", timeline);
write_wrapped(context->out->fh, context->buffer, used);
write_wrapped(out_fh, context->buffer, used);
int len = strlen(string);
unsigned char *unescaped = (unsigned char *)malloc(len + 1);
if (!unescaped)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In write_stringz_as_srt() - not enough memory for unescaped buffer.\n");
unsigned char *el = (unsigned char *)malloc(len * 3 + 1); // Be generous
if (el == NULL || unescaped == NULL)
fatal(EXIT_NOT_ENOUGH_MEMORY, "In write_stringz_as_srt() - not enough memory.\n");
if (!el)
{
free(unescaped);
fatal(EXIT_NOT_ENOUGH_MEMORY, "In write_stringz_as_srt() - not enough memory for el buffer.\n");
}
int pos_r = 0;
int pos_w = 0;
// Scan for \n in the string and replace it with a 0
@@ -64,20 +70,28 @@ int write_stringz_as_srt(char *string, struct encoder_ctx *context, LLONG ms_sta
dbg_print(CCX_DMT_DECODER_608, "\r");
dbg_print(CCX_DMT_DECODER_608, "%s\n", context->subline);
}
write_wrapped(context->out->fh, el, u);
write_wrapped(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);
write_wrapped(out_fh, el, u);
write_wrapped(out_fh, context->encoded_crlf, context->encoded_crlf_length);
begin += strlen((const char *)begin) + 1;
}
dbg_print(CCX_DMT_DECODER_608, "- - - - - - - - - - - -\r\n");
write_wrapped(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);
write_wrapped(out_fh, context->encoded_crlf, context->encoded_crlf_length);
free(el);
free(unescaped);
return 0;
}
/* The timing here is not PTS based, but output based, i.e. user delay must be accounted for
if there is any */
int write_stringz_as_srt(char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end)
{
return write_stringz_as_srt_to_output(string, context, ms_start, ms_end,
context->out->fh, &context->srt_counter);
}
int write_cc_bitmap_as_srt(struct cc_subtitle *sub, struct encoder_ctx *context)
{
int ret = 0;
@@ -112,11 +126,11 @@ int write_cc_bitmap_as_srt(struct cc_subtitle *sub, struct encoder_ctx *context)
millis_to_time(sub->start_time, &h1, &m1, &s1, &ms1);
millis_to_time(sub->end_time - 1, &h2, &m2, &s2, &ms2); // -1 To prevent overlapping with next line.
context->srt_counter++;
sprintf(timeline, "%u%s", context->srt_counter, context->encoded_crlf);
snprintf(timeline, sizeof(timeline), "%u%s", context->srt_counter, context->encoded_crlf);
used = encode_line(context, context->buffer, (unsigned char *)timeline);
write_wrapped(context->out->fh, context->buffer, used);
sprintf(timeline, "%02u:%02u:%02u,%03u --> %02u:%02u:%02u,%03u%s",
h1, m1, s1, ms1, h2, m2, s2, ms2, context->encoded_crlf);
snprintf(timeline, sizeof(timeline), "%02u:%02u:%02u,%03u --> %02u:%02u:%02u,%03u%s",
h1, m1, s1, ms1, h2, m2, s2, ms2, context->encoded_crlf);
used = encode_line(context, context->buffer, (unsigned char *)timeline);
write_wrapped(context->out->fh, context->buffer, used);
len = strlen(str);
@@ -150,7 +164,18 @@ int write_cc_subtitle_as_srt(struct cc_subtitle *sub, struct encoder_ctx *contex
{
if (sub->type == CC_TEXT)
{
ret = write_stringz_as_srt(sub->data, context, sub->start_time, sub->end_time);
// For teletext multi-page extraction (issue #665), use page-specific output
struct ccx_s_write *out = get_teletext_output(context, sub->teletext_page);
unsigned int *counter = get_teletext_srt_counter(context, sub->teletext_page);
if (out && counter)
{
ret = write_stringz_as_srt_to_output(sub->data, context, sub->start_time, sub->end_time,
out->fh, counter);
}
else
{
ret = write_stringz_as_srt(sub->data, context, sub->start_time, sub->end_time);
}
freep(&sub->data);
sub->nb_data = 0;
ret = 1;
@@ -196,12 +221,12 @@ int write_cc_buffer_as_srt(struct eia608_screen *data, struct encoder_ctx *conte
char timeline[128];
++context->srt_counter;
sprintf(timeline, "%u%s", context->srt_counter, context->encoded_crlf);
snprintf(timeline, sizeof(timeline), "%u%s", context->srt_counter, context->encoded_crlf);
used = encode_line(context, context->buffer, (unsigned char *)timeline);
write_wrapped(context->out->fh, context->buffer, used);
sprintf(timeline, "%02u:%02u:%02u,%03u --> %02u:%02u:%02u,%03u%s",
h1, m1, s1, ms1, h2, m2, s2, ms2, context->encoded_crlf);
snprintf(timeline, sizeof(timeline), "%02u:%02u:%02u,%03u --> %02u:%02u:%02u,%03u%s",
h1, m1, s1, ms1, h2, m2, s2, ms2, context->encoded_crlf);
used = encode_line(context, context->buffer, (unsigned char *)timeline);
write_wrapped(context->out->fh, context->buffer, used);

Some files were not shown because too many files have changed in this diff