Fix SEI payload type handling: change payload_type and payload_size from i32 to u32 for type safety, keeping `as usize` casts only where needed for indexing.
Use if-let patterns instead of is_some() + unwrap() to satisfy
the stricter clippy::unnecessary_unwrap lint in Rust 1.93.
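
As an illustration of the pattern change, here is a minimal sketch. The `Decoder` type and its `last_pts` field are hypothetical and not taken from the project; only the is_some() + unwrap() versus if-let contrast reflects the commit.

```rust
/// Hypothetical type for illustration; not the project's actual struct.
struct Decoder {
    last_pts: Option<u64>,
}

/// Before: an is_some() check followed by unwrap(), which clippy flags.
#[allow(clippy::unnecessary_unwrap)]
fn pts_or_zero_unwrap(dec: &Decoder) -> u64 {
    if dec.last_pts.is_some() {
        dec.last_pts.unwrap()
    } else {
        0
    }
}

/// After: if-let binds the inner value directly, so no unwrap() is needed.
fn pts_or_zero_if_let(dec: &Decoder) -> u64 {
    if let Some(pts) = dec.last_pts {
        pts
    } else {
        0
    }
}
```

Both functions return the same values; the second form simply removes the separate check that the lint objects to.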
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- parse_PAT: Add bounds check for payload_length >= 8 before accessing
header fields (fixes #2053)
- parse_PMT: Add ES_info_length validation and 2-byte minimum check
before reading descriptor_tag and desc_len in PRIVATE_USER_MPEG2
and teletext parsing loops (fixes #2054)
- processmp4: Add NULL check for file parameter before passing to
mprint (fixes #2055)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The test_mkvlang_sets_mkv_language test was comparing against
Language::Eng, but the mkvlang field type was changed to MkvLangFilter
when BCP 47 language tag support was added in PR #2038.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add libavdevice, libswresample, and libavfilter dependencies for
the hardsubx variant on both Ubuntu 24.04 and Debian 13 workflows.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CCExtractor is linked against libcurl-gnutls which requires this
runtime dependency on Ubuntu 24.04.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
apt install automatically resolves and installs dependencies,
unlike dpkg -i which fails if dependencies are missing.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use ubuntu-24.04 runner instead of ubuntu-22.04
- Update dependencies to match Ubuntu 24.04 library versions
(libtesseract5, libleptonica6, libavcodec60, etc.)
- Update GPAC cache key for new Ubuntu version
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update CMakeLists.txt version from 0.89 to 0.96 to match lib_ccx.h
- Extract version from lib_ccx.h instead of CMakeLists.txt for accuracy
- Add missing runtime dependencies: libtesseract, libleptonica
- Add FFmpeg dependencies for hardsubx variant
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add GitHub Actions workflow to build Debian packages (.deb) for Linux.
Features:
- Builds GPAC from source (abi-16.4 tag) since libgpac-dev is not
available in newer Debian/Ubuntu releases
- Creates two variants: basic (with OCR) and hardsubx (with FFmpeg)
- Bundles GPAC library with the package using patchelf for rpath
- Includes proper Debian package structure with control, postinst, postrm
- Runs on releases, manual trigger, or workflow file changes
- Uploads packages as artifacts and attaches to releases
This provides an unofficial .deb package for users who prefer that
format over AppImage or snap.
Relates to #1610
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Build workflows were not triggering on CMakeLists.txt changes.
Added `**CMakeLists.txt` and `**.cmake` patterns to path filters for:
- build_linux.yml
- build_mac.yml
- build_windows.yml
- build_docker.yml
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two fixes for static library linking:
1. Preserve CMAKE_C_FLAGS in lib_ccx/CMakeLists.txt instead of
overwriting them. This allows passing include paths via
-DCMAKE_C_FLAGS which is needed for some build configurations.
2. Add target_link_options with --undefined flags for C functions
called from Rust (decode_vbi, do_cb, store_hdcc). With static
libraries, the linker processes them in order and only pulls
symbols that are currently unresolved. Since ccx is processed
before ccx_rust, these symbols weren't being pulled in.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace poorly-named tests (options_1 through options_51, broken_1, etc.)
with 201 descriptively-named tests organized by category:
- Input/output format tests
- Encoding tests
- Stream/program selection tests
- CEA-708 service tests
- Codec selection tests
- Timing option tests
- Debug flag tests
- Teletext option tests
- XMLTV option tests
- Credits option tests
- Buffering option tests
- And more
Each test name now clearly indicates what CLI option is being tested
and what behavior is expected, e.g.:
- test_input_ts_sets_transport_stream_mode
- test_608_enables_decoder_608_debug
- test_service_enables_708_with_single_service
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The --mkvlang option previously only supported single ISO 639-2 codes
due to using a Language enum with a fixed list of variants. Extended
codes (like "fre-ca") and multiple codes (like "eng,chi") would panic.
This change introduces MkvLangFilter, a proper type for language
filtering that:
- Validates language codes per BCP 47 specification
- Supports ISO 639-2 (3-letter codes like "eng")
- Supports BCP 47 tags (like "en-US", "zh-Hans-CN")
- Supports comma-separated multiple codes
- Provides clean error messages for invalid input
- Includes comprehensive unit tests
The C code continues to receive the raw string for strstr() matching,
maintaining backward compatibility.
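
A rough sketch of the idea follows. `LangFilter`, `is_valid_tag`, and `parse_lang_filter` are hypothetical names, and the validation is a deliberately simplified subset of the BCP 47 grammar (2 or 3 letter primary subtag, then 1 to 8 alphanumeric subtags); the real MkvLangFilter may differ.

```rust
/// Hypothetical, simplified stand-in for the MkvLangFilter described above.
#[derive(Debug, PartialEq)]
struct LangFilter {
    raw: String,        // original string, kept so the C side can still use it
    codes: Vec<String>, // individual validated tags
}

/// Simplified check: NOT the complete BCP 47 grammar.
fn is_valid_tag(tag: &str) -> bool {
    let mut parts = tag.split('-');
    // Primary subtag: 2 or 3 ASCII letters ("en", "eng").
    let primary_ok = match parts.next() {
        Some(p) => (2..=3).contains(&p.len()) && p.chars().all(|c| c.is_ascii_alphabetic()),
        None => false,
    };
    // Remaining subtags: 1 to 8 ASCII alphanumerics ("US", "Hans", "ca").
    primary_ok
        && parts.all(|p| (1..=8).contains(&p.len()) && p.chars().all(|c| c.is_ascii_alphanumeric()))
}

/// Splits on commas and returns an error instead of panicking on bad input.
fn parse_lang_filter(input: &str) -> Result<LangFilter, String> {
    let mut codes = Vec::new();
    for tag in input.split(',') {
        let tag = tag.trim();
        if !is_valid_tag(tag) {
            return Err(format!("invalid language code: '{tag}'"));
        }
        codes.push(tag.to_string());
    }
    Ok(LangFilter { raw: input.to_string(), codes })
}
```

Keeping the raw string next to the parsed codes mirrors the note above that the C code still receives the unmodified value.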
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Rust CLI parser was showing "CCExtractor 1.0" instead of the
actual version (0.96.5). This was a placeholder value from when
the parser was first ported to Rust in August 2024 that was never
updated.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: When FTS timestamps were invalid due to PTS discontinuities,
the code fell back to DVB page timeout (65 seconds) as subtitle duration.
This caused impossible 65-second subtitle durations in split output.
Fix: Added DVB_MAX_SUBTITLE_DURATION_MS constant (10s) and simplified the
duration capping logic to always enforce reasonable subtitle durations.
Tested with: multiprogram_spain.ts, BBC1.ts, BBC2.ts - all outputs now
have properly capped durations with no timestamps exceeding 10 seconds.
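
The capping rule can be sketched as below. The actual change is in C; this Rust version only illustrates the arithmetic, and `cap_end_time` with its millisecond `i64` parameters is an assumption.

```rust
/// 10 seconds, matching the cap named in the commit message.
const DVB_MAX_SUBTITLE_DURATION_MS: i64 = 10_000;

/// Returns an end time that is never more than the cap after `start_ms`.
/// Hypothetical helper; the C code may structure this differently.
fn cap_end_time(start_ms: i64, end_ms: i64) -> i64 {
    end_ms.min(start_ms + DVB_MAX_SUBTITLE_DURATION_MS)
}
```

With this rule, a subtitle that would have inherited the 65-second page timeout ends 10 seconds after it starts, while shorter subtitles are left unchanged.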
When using -out=report mode, the encoder context (enc_ctx) is NULL
because no output file needs to be created. The Rust FFI function
ccxr_process_avc was dereferencing this NULL pointer, causing a
segmentation fault.
Add NULL pointer checks at the FFI boundary to skip AVC processing
when enc_ctx is NULL. This is safe because report mode only needs
stream analysis, not caption extraction.
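
A minimal sketch of a NULL check at an FFI boundary is shown here. `EncoderCtx` and `process_avc_sketch` are hypothetical; the real ccxr_process_avc signature is not reproduced.

```rust
use std::os::raw::c_int;

/// Hypothetical stand-in for the C encoder context.
#[repr(C)]
pub struct EncoderCtx {
    pub captions_written: c_int,
}

/// Sketch of an FFI entry point; returns 0 when there is nothing to do.
///
/// # Safety
/// `enc_ctx` must be null or point to a valid, exclusively owned EncoderCtx.
pub unsafe extern "C" fn process_avc_sketch(enc_ctx: *mut EncoderCtx) -> c_int {
    // as_mut() yields None for a null pointer, so report mode
    // (no encoder context) skips processing instead of crashing.
    let enc = match unsafe { enc_ctx.as_mut() } {
        Some(enc) => enc,
        None => return 0,
    };
    enc.captions_written += 1;
    1
}
```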
Fixes #2023
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove redundant free() after free_subtitle() in pipeline cleanup
(free_subtitle already frees the struct via freep(&sub))
- Add ctx->prev = NULL after free_encoder_context in dinit_encoder
- Keep free_encoder_context non-recursive for prev (dinit_encoder owns it)
- Remove debug output from general_loop.c
- All deduplication infrastructure implemented and tested
- Test script validates code paths execute correctly
- Dedup ring buffer integrated into all DVB subtitle processing
- Full validation requires OCR build (-DWITH_OCR=ON)
- Code review confirms all 8 stories are complete
- Created dvb_dedup_test.sh to test DVB-001 through DVB-008
- Tests multilingual split, single stream, non-DVB files
- Tests --no-dvb-dedup flag functionality
- Checks for excessive duplication in output
- Note: Requires OCR (Tesseract) for full validation
- Without OCR, files are empty but dedup logic still executes
- Added no_dvb_dedup field to ccx_s_options structure
- Initialized to 0 (deduplication enabled by default)
- Added --no-dvb-dedup CLI flag in Rust args parser
- Added flag to Options struct in lib_ccxr
- Wired flag through Rust-to-C FFI boundary in common.rs
- Modified dvbsub_handle_display_segment to respect flag
- Dedup logic only runs when no_dvb_dedup is false (default)
- Added help text describing flag purpose
- Created dvb_dedup.h with dedup_entry and dedup_ring structures
- Implemented dvb_dedup.c with init, is_duplicate, and add functions
- Integrated dedup_ring into DVBSubContext structure
- Added deduplication check in dvbsub_handle_display_segment
- Dedup uses PTS + PID + composition_id + ancillary_id as unique key
- 8-slot ring buffer to track recently emitted subtitles
- Prevents duplicate subtitles from propagating to output files
- Clear enc_ctx->prev->last_str after encode_sub() in dvb_subtitle_decoder.c
- This prevents OCR-recognized text from leaking into subsequent subtitles
- Tested: All subtitle output shows unique text with zero duplicates
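
The 8-slot ring buffer described in the list above could look roughly like this. The implementation in dvb_dedup.c is C; the Rust types, field widths, and eviction order here are assumptions for illustration only.

```rust
/// Key fields named in the commit message; the integer widths are assumed.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DedupKey {
    pts: u64,
    pid: u32,
    composition_id: u16,
    ancillary_id: u16,
}

const DEDUP_SLOTS: usize = 8;

struct DedupRing {
    slots: [Option<DedupKey>; DEDUP_SLOTS],
    next: usize, // index of the slot that will be overwritten next
}

impl DedupRing {
    fn new() -> Self {
        DedupRing { slots: [None; DEDUP_SLOTS], next: 0 }
    }

    /// True if the key matches any recently emitted subtitle.
    fn is_duplicate(&self, key: &DedupKey) -> bool {
        self.slots.iter().any(|slot| slot.as_ref() == Some(key))
    }

    /// Records a key, overwriting the oldest slot once the ring is full.
    fn add(&mut self, key: DedupKey) {
        self.slots[self.next] = Some(key);
        self.next = (self.next + 1) % DEDUP_SLOTS;
    }
}
```

A caller would check `is_duplicate` before emitting a subtitle and call `add` only for subtitles that were actually emitted.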
The previous fix (#1996) prevented a panic when the buffer was too small
to verify if a "moov" box contains "mvhd", but it incorrectly accepted
the box without verification.
The original intent was: "moov without mvhd is invalid, skip it."
This fix maintains that intent:
- If buffer too small to verify mvhd → skip the box
- If moov has mvhd → accept (valid)
- If moov lacks mvhd → skip (invalid)
This is safe for format detection since:
1. The probe reads up to 1MB of start bytes
2. The scoring system requires multiple valid boxes
3. Skipping an unverifiable box is safer than accepting it
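
The three-way decision can be sketched as follows. This assumes the check looks for `mvhd` as the first child box directly after the 8-byte moov header; `check_moov` and `MoovCheck` are hypothetical names, and the real probe may locate mvhd differently.

```rust
#[derive(Debug, PartialEq)]
enum MoovCheck {
    Accept,
    Skip,
}

/// `buf` starts at the moov box header (4-byte size, 4-byte type).
/// Assumption: the first child's type sits at bytes 12..16.
fn check_moov(buf: &[u8]) -> MoovCheck {
    match buf.get(12..16) {
        None => MoovCheck::Skip,                          // too small to verify
        Some(t) if t == b"mvhd" => MoovCheck::Accept,     // moov has mvhd
        Some(_) => MoovCheck::Skip,                       // moov lacks mvhd
    }
}
```

Using `get` rather than direct indexing keeps the too-small case from panicking, which is the failure the earlier fix addressed.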
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace magic number 49997 with `50000 - 3` and add a comment explaining:
- Why we subtract 3 (the loop accesses i+3, so we stop 3 bytes early)
- Why we cap at 50000 (don't scan huge buffers entirely)
- Why we use saturating_sub (handle tiny buffers safely)
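
One way to express the bound is sketched below. `scan_limit` and the constant names are hypothetical, and the original code may order the min and the subtraction differently.

```rust
const MAX_SCAN: usize = 50_000; // do not scan huge buffers entirely
const LOOKAHEAD: usize = 3;     // the loop reads buf[i + 3]

/// Exclusive upper bound for `i` such that `buf[i + 3]` stays in range.
fn scan_limit(buf_len: usize) -> usize {
    // saturating_sub keeps tiny buffers (len < 3) at 0 instead of underflowing.
    buf_len.min(MAX_SCAN).saturating_sub(LOOKAHEAD)
}
```

For buffers of 50000 bytes or more this evaluates to 49997, the value that was previously hardcoded.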
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Delete the unused `impl FromCType<*mut PMT_entry> for *mut PMTEntry`
implementation which had a critical bug: it returned a pointer to a
stack-allocated PMTEntry, causing undefined behavior (dangling pointer).
This code was never called anywhere in the codebase. The actual usage
in demuxer.rs uses the value-returning variant `FromCType<PMT_entry>
for PMTEntry` with explicit `Box::into_raw(Box::new(...))` wrapping,
which is the correct pattern.
Rather than fixing dead buggy code, just remove it.
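
For reference, the heap-allocation pattern mentioned above looks roughly like this. `Entry`, `entry_into_raw`, and `entry_free` are hypothetical simplifications and do not mirror the real PMTEntry conversion.

```rust
/// Hypothetical, simplified stand-in for the real PMTEntry.
#[derive(Debug)]
struct Entry {
    program_number: u32,
}

// The removed code effectively returned the address of a local value,
// which is invalid as soon as the function returns.

/// Correct pattern: move the value to the heap and hand out the raw pointer.
fn entry_into_raw(program_number: u32) -> *mut Entry {
    Box::into_raw(Box::new(Entry { program_number }))
}

/// Reclaims ownership so the allocation is freed exactly once.
///
/// # Safety
/// `ptr` must come from `entry_into_raw` and must not be used afterwards.
unsafe fn entry_free(ptr: *mut Entry) {
    if !ptr.is_null() {
        drop(unsafe { Box::from_raw(ptr) });
    }
}
```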
Supersedes #1988
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Multi-program transport stream files can have different PCR (Program
Clock Reference) bases for each program. For example, one program might
have timestamps starting at 23 hours, another at 25 hours. This caused
the progress time display to show wildly incorrect values like "265:45"
for a 6-second file.
The fix tracks the minimum timestamp offset seen across all programs and
uses that as the baseline. When timestamps from programs with higher PCR
bases are encountered (offset > 60 seconds from minimum), the display
falls back to showing time relative to the minimum baseline.
Changes:
- Add min_global_timestamp_offset field to lib_ccx_ctx to track the
minimum PCR-based offset seen
- Update progress display logic in general_loop.c to normalize times
relative to the minimum offset
- Apply same fix to both live stream and file processing modes
Test results with multi-program DVB teletext sample (dvbt.ts):
- Before: 1% | 265:45, 2% | 00:00, 3% | 263:11, ... (jumping wildly)
- After: 1% | 00:00, 2% | 00:00, ... 87% | 00:05, 100% | 00:00 (stable)
Single-program files continue to work correctly.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Removes a debug println statement in the Rust timestamp conversion code
that was printing the hours value when it exceeded 24. This caused
spurious numbers (like "25") to appear in the output when processing
files with PTS timestamps that exceeded 24 hours.
The debug code was likely left over from development/debugging and
should not be present in production code.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add NULL check for `region` before accessing `region->bgcolor` in
the OCR processing block of `write_dvb_sub()`.
The bug occurs when processing DVB subtitles where `get_region()`
returns NULL for all display items in the list. After the display
processing loop, `region` may be NULL, but the code attempted to
access `region->bgcolor` unconditionally, causing a segfault.
The crash manifested as:
- Valgrind: "Invalid read of size 4 at address 0x18"
- The 0x18 offset corresponds to the `bgcolor` field in DVBSubRegion
Testing with bbc_small.ts:
- Before: SIGSEGV crash at 0% processing
- After: 100% processing, 50+ subtitles extracted successfully
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When parsing truncated MKV files, the Matroska parser would enter an
infinite loop. This happened because:
1. At EOF, fgetc() returns -1 which becomes 0xFF when cast to UBYTE
2. Reading 4 EOF bytes creates element code 0xFFFFFFFF (unknown element)
3. The "skip unknown element" logic reads another 0xFF as vint length (127)
4. FSEEK past EOF clears the EOF flag without error
5. The while loop condition (pos + len > get_current_byte) never becomes
false because the recorded segment length is larger than the file
The fix adds feof() checks after each mkv_read_byte() call in all
parsing loops. This detects EOF immediately after reading and breaks
out of the loop cleanly.
Tested with truncated MKV samples (ticket1398-orig.mkv, azumi.mkv)
that previously caused timeouts - now complete in under a second.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- telxcc.c: Use array_length macro for G0_LATIN_NATIONAL_SUBSETS
bounds check instead of hardcoded value. Prevents potential
access to uninitialized memory when index equals array size.
- misc.h: Fix UTF-8 encoding of author name (Iñaki García Etxebarria)
When using --output-field both (formerly -12), CCExtractor creates
separate output files for each field. If one field has no captions,
a 0-byte file was left behind, which is confusing for users.
This fix checks the file size in dinit_write() before closing.
If the file is empty (0 bytes), it deletes the file and prints
an informational message.
This is a simpler approach than deferred file creation - files are
still created at initialization but cleaned up if they remain empty.
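
The cleanup step can be sketched as below. The real change lives in the C function dinit_write(); `remove_if_empty` is a hypothetical Rust equivalent that only illustrates the size check and the delete.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Deletes `path` if it is a zero-byte file. Returns Ok(true) when removed.
fn remove_if_empty(path: &Path) -> io::Result<bool> {
    if fs::metadata(path)?.len() == 0 {
        fs::remove_file(path)?;
        return Ok(true);
    }
    Ok(false)
}
```

A caller could use the boolean result to decide whether to print the informational message.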
Fixes #1282
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Automatically creates a PR to homebrew-core when a new release
is published, updating the ccextractor formula to the new version.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update version number across all packaging and build files for the
0.96.5 release.
Files updated:
- docs/CHANGES.TXT - Added changelog entry
- src/lib_ccx/lib_ccx.h - VERSION define
- linux/configure.ac - AC_INIT version
- mac/configure.ac - AC_INIT version
- OpenBSD/Makefile - V variable
- package_creators/PKGBUILD - pkgver
- package_creators/ccextractor.spec - Version
- package_creators/debian.sh - VERSION
- packaging/chocolatey/ccextractor.nuspec - version
- packaging/chocolatey/tools/chocolateyInstall.ps1 - URL
- packaging/winget/*.yaml - PackageVersion and URLs
Note: SHA256 checksums in chocolatey and winget files will need to be
updated after the MSI is built.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extend EPG time string buffers from 21 to 74 bytes to silence
compiler warnings about potential buffer truncation.
The actual output is always 20 chars ("YYYYMMDDHHMMSS +0000") plus
null terminator, but the compiler warns because %02d with int
arguments could theoretically produce larger output.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Apply cargo fmt to decoder/mod.rs
- Fix clippy manual_flatten warning in build.rs by using .flatten()
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: CCX_RAW_TYPE data from MXF demuxer was not being passed to
the DTVCC decoder, only to the legacy 608 decoder via process_raw_with_field.
Changes:
- general_loop.c: Changed CCX_RAW_TYPE handling to use process_cc_data
instead of process_raw_with_field to properly invoke DTVCC decoder
- general_loop.c: Added DTVCC activation for MXF/GXF sources since they
may contain 708 captions
- general_loop.c: Initialize timing from caption PTS when not set
- ccx_dtvcc.h: Added ccxr_dtvcc_set_active FFI declaration
- lib.rs: Added ccxr_dtvcc_set_active function to enable DTVCC decoder
- decoder/mod.rs: Fixed flush logic to always process visible windows
- ccx_demuxer_mxf.c: Fixed PTS calculation to use 90kHz units based on
edit_rate, and changed verbose logging to debug()
Fixes #1647
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds support for processing raw CDP files captured from SDI VANC
(e.g., from Blackmagic Decklink capture cards). CDP packets are
automatically detected by their 0x9669 identifier when using -in=raw.
Changes:
- Added process_raw_cdp() function to parse concatenated CDP packets
- Added CDP format detection in raw_loop() (checks for 0x9669 header)
- Extracts cc_data triplets from CDP packets and processes them
through process_cc_data() for both CEA-608 and CEA-708 support
- Calculates timing based on CDP frame rate and packet count
Usage:
ccextractor -in=raw captured_vanc.bin -o output.srt
Fixes #1406
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When using --input <format>, the startup output showed [Stream mode: ]
(empty) instead of showing the format name like [Stream mode: SCC].
Root cause: The Rust logger's print() function uses print!() which
doesn't automatically flush stdout. When mixing C and Rust code that
both write to stdout, the Rust output was getting buffered and not
appearing before the C code continued writing.
The fix adds explicit std::io::stdout().flush() after each print!()
call to ensure output appears immediately and interleaves correctly
with C code.
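
A small sketch of the write-then-flush idea follows. `print_flushed` is a hypothetical helper that is generic over `Write` so it can be exercised without a terminal; the commit itself describes calling std::io::stdout().flush() after print!().

```rust
use std::io::{self, Write};

/// Writes `msg` and flushes immediately so it cannot sit in a buffer
/// while C code continues writing to the same stream.
fn print_flushed<W: Write>(out: &mut W, msg: &str) -> io::Result<()> {
    out.write_all(msg.as_bytes())?;
    out.flush()
}
```

For standard output this would be called as `print_flushed(&mut io::stdout(), "[Stream mode: SCC]\n")`.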
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add support for `--input scc` command line option to explicitly specify
SCC (Scenarist Closed Caption) input format, for consistency with other
input format options.
Changes:
- Add `Scc` variant to `InFormat` enum in args.rs
- Handle `InFormat::Scc` in parser.rs to set StreamMode::Scc
- Add `StreamMode::Scc` case in print_cfg() in both Rust and C code
Fixes #1972
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The --quiet flag was broken due to two issues:
1. Inverted mapping in Rust FFI: The C→Rust constant mapping was wrong.
CCX_MESSAGES_QUIET=0, CCX_MESSAGES_STDOUT=1, CCX_MESSAGES_STDERR=2
but the Rust code mapped 0→Stdout, 1→Stderr, 2→Quiet.
2. Logger initialization timing: The Rust logger was initialized BEFORE
command-line arguments were parsed, so --quiet had no effect.
Changes:
- Fix the OutputTarget mapping in ccxr_init_basic_logger()
- Add set_target() method to CCExtractorLogger
- Add ccxr_update_logger_target() to update logger after arg parsing
- Call ccxr_update_logger_target() after ccxr_parse_parameters()
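
The corrected mapping can be sketched as below. The numeric values come from the message above; the enum variant names and the `target_from_c` helper are assumptions.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum OutputTarget {
    Quiet,
    Stdout,
    Stderr,
}

/// Maps the C-side constants to the Rust enum (hypothetical helper).
fn target_from_c(value: i32) -> Option<OutputTarget> {
    match value {
        0 => Some(OutputTarget::Quiet),  // CCX_MESSAGES_QUIET
        1 => Some(OutputTarget::Stdout), // CCX_MESSAGES_STDOUT
        2 => Some(OutputTarget::Stderr), // CCX_MESSAGES_STDERR
        _ => None,
    }
}
```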
Fixes #1956
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The palette crate renamed `to_positive_degrees()` to `into_positive_degrees()`
in version 0.7.0. This was causing build failures on Fedora which uses
system-packaged Rust crates with newer versions.
Changes:
- Update palette dependency from 0.6.1 to 0.7
- Change method call from to_positive_degrees() to into_positive_degrees()
Fixes build failure reported in #1954.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The FFMPEG_INCLUDE_DIR environment variable was only checked inside
the macOS-specific block, so it had no effect on Linux builds.
Changes:
- Move FFMPEG_INCLUDE_DIR check outside platform-specific blocks so
it works on all platforms
- Add pkg-config fallback on Linux to automatically find FFmpeg
include paths
This fixes compilation on systems like Fedora where FFmpeg headers
are installed in non-standard locations (e.g., /usr/include/ffmpeg).
Fixes #1954
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The idr_pic_id is read to advance the bitstream position (required for
correct parsing of subsequent fields), but the value itself is not
needed for caption extraction. CCExtractor uses pic_order_cnt_lsb for
frame ordering and PTS for timing - idr_pic_id serves no purpose here.
Closes #1895
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix cargo build cache path: rust.bat sets CARGO_TARGET_DIR to the
windows/ directory, which results in artifacts at
windows/x86_64-pc-windows-msvc/, not windows/target/
- Remove redundant CARGO_TARGET_DIR from build steps since rust.bat
overrides it anyway
Note: vcpkg.json builtin-baseline intentionally not changed to avoid
breaking transitive dependencies (libxml2 etc.)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Chocolatey cache only stored package metadata, not the actual
installed SDK files at C:\Program Files\GPAC\sdk\include. This caused
build failures when the cache hit but GPAC headers weren't available.
GPAC install is fast (~30s) so caching isn't worth the complexity.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major optimizations to reduce Windows build time from ~45 min to ~10 min:
1. **Single consolidated job** - Previously two parallel jobs (Release/Debug)
duplicated the entire 34-minute vcpkg install. Now builds both
configurations sequentially in one job, sharing all cached dependencies.
2. **lukka/run-vcpkg action** - Replaces manual git clone + bootstrap with
the official vcpkg action that has built-in caching and better handling.
3. **Cache vcpkg installed packages** - Separately cache the installed/
directory with hash-based keys for faster cache hits.
4. **Cargo caching** - Add caching for Rust registry and build artifacts,
similar to the Linux build workflow.
5. **Chocolatey caching** - Cache gpac package to skip download on hits.
6. **Conditional installs** - Skip vcpkg install and choco install when
cache is available.
7. **Updated Rust toolchain action** - Replace deprecated actions-rs/toolchain
with dtolnay/rust-toolchain.
Expected improvements:
- Cold build: ~20 minutes (down from ~45 min)
- Warm build (cache hit): ~5-10 minutes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, Tesseract OCR was initialized eagerly when a DVB subtitle
stream was detected in the transport stream. This caused ~10 second
startup overhead even for files that:
- Have DVB streams but no actual bitmap subtitles
- Have DVB streams alongside CEA-608 text captions (which don't need OCR)
- Have DVB streams but the user only wants raw bitmap output
The initialization also created OpenMP worker threads that generated
hundreds of thousands of futex syscalls, causing valgrind tests to
take 15+ minutes instead of seconds.
This change defers OCR initialization until a DVB bitmap region actually
needs to be processed with OCR. Benefits:
- Files with DVB streams but no bitmap content: 10s → 0.1s
- Files with DVB + CEA-608 captions: 10s → 1-3s
- Valgrind test performance: 15+ min → seconds (no thread pool overhead
when OCR isn't used)
The ocr_initialized flag ensures init_ocr() is called only once, on
first bitmap encounter.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Map legacy CEA-608 field extraction options to their modern equivalent:
- -1 → --output-field=1 (extract field 1 only)
- -2 → --output-field=2 (extract field 2 only)
- -12 → --output-field=12 (extract both fields)
These options are documented in the help text and were commonly used
but stopped working after the Rust argument parser migration.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add aliases for options that were commonly used with single-dash
or without hyphens in older versions of ccextractor:
- --parsePAT: add alias "pat" (for -pat)
- --parsePMT: add alias "pmt" (for -pmt)
- --no-teletext: add alias "noteletext" (for -noteletext)
- --no-rollup: add alias "noru" (for -noru)
- --no-bom: add alias "nobom" (for -nobom)
- --no-autotimeref: add alias "noautotimeref" (for -noautotimeref)
- --no-scte20: add alias "noscte20" (for -noscte20)
These aliases, combined with normalize_legacy_option() which converts
single-dash to double-dash (e.g., -noteletext -> --noteletext), allow
old scripts using legacy syntax to continue working.
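
The single-dash to double-dash conversion might be sketched like this. `normalize_dash` is a hypothetical simplification; the real normalize_legacy_option() may handle more cases, and leaving two-character flags such as "-o" untouched is an assumption.

```rust
/// Hypothetical simplification of the legacy-option normalization.
fn normalize_dash(arg: &str) -> String {
    let is_legacy = arg.starts_with('-')
        && !arg.starts_with("--")
        && arg.len() > 2; // assumed: short flags such as "-o" stay as they are
    if is_legacy {
        // "-noteletext" becomes "--noteletext", which the alias then matches.
        format!("-{arg}")
    } else {
        arg.to_string()
    }
}
```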
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When Rust CEA-708 decoder is enabled, dec_ctx.dtvcc is set to NULL
and dec_ctx.dtvcc_rust holds the actual DtvccRust context. The null
check was incorrectly checking dtvcc, causing the function to return
early and skip all CEA-708 data processing.
This fixes tests 21, 31, 32, 105, 137, 141-149 which were failing
with exit code 10 (EXIT_NO_CAPTIONS) because no captions were being
extracted from CEA-708 streams.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added documentation for EIA_708_BUFFER_LENGTH explaining that 2048 bytes
is 16x the CEA-708 specification minimum of 128 bytes per service
- Removed debug logging of target address from target.rs as per TODO
- References CEA-708-E Section 8.4.3 for buffer specifications
Addresses two TODO items in the Rust codebase cleanup effort.
The cb_708 counter was being incremented twice for each CEA-708 data block:
1. In do_cb_dtvcc_rust() in Rust (src/rust/src/lib.rs)
2. In do_cb() in C (src/lib_ccx/ccx_decoders_common.c)
Since FTS calculation uses cb_708 (fts = fts_now + fts_global + cb_708 * 1001 / 30),
the double-increment caused timestamps to advance ~2x as fast as expected,
resulting in incorrect milliseconds in start timestamps.
This fix removes the increment from the Rust code since the C code already
handles it in do_cb().
Fixes timestamp issues reported in PR #1782 tests where start times like
00:00:20,688 were incorrectly output as 00:00:20,737.
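
The effect of the counter on the formula can be sketched as below. The `i64` millisecond types and the `fts_ms` name are assumptions; only the formula itself is taken from the message.

```rust
/// FTS in milliseconds, following the formula quoted above (integer math).
fn fts_ms(fts_now: i64, fts_global: i64, cb_708: i64) -> i64 {
    fts_now + fts_global + cb_708 * 1001 / 30
}
```

Each counted block adds roughly 33 ms, so counting every block twice doubles the accumulated offset.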
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move PLAN_PR1618_REIMPLEMENTATION.md to local plans/ folder
- Add plans/ to .gitignore to keep plans local
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The ccxr_process_cc_data function was still accessing dec_ctx.dtvcc
(which is NULL when Rust is enabled), causing a null pointer panic.
Changed to use dec_ctx.dtvcc_rust (the persistent DtvccRust context)
instead, which fixes the crash when processing CEA-708 data.
Added do_cb_dtvcc_rust() function that works with DtvccRust instead
of the old Dtvcc struct.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove extra space before comment in ccx_decoders_common.c
- Fix comment indentation in mp4.c
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- init_cc_decode(): Initialize dtvcc_rust via ccxr_dtvcc_init()
- dinit_cc_decode(): Free dtvcc_rust via ccxr_dtvcc_free()
- flush_cc_decode(): Flush via ccxr_flush_active_decoders()
- general_loop.c: Set encoder via ccxr_dtvcc_set_encoder() (3 locations)
- mp4.c: Use ccxr_dtvcc_set_encoder() and ccxr_dtvcc_process_data()
- Add ccxr_dtvcc_is_active() declaration to ccx_dtvcc.h
- Fix clippy warnings in tv_screen.rs (unused assignments)
- All changes guarded with #ifndef DISABLE_RUST
- Update implementation plan to mark Phase 3 complete
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add void *dtvcc_rust field to lib_cc_decode struct
- Declare ccxr_dtvcc_init, ccxr_dtvcc_free, ccxr_dtvcc_process_data in ccx_dtvcc.h
- Declare ccxr_dtvcc_set_encoder in lib_ccx.h
- Declare ccxr_flush_active_decoders in ccx_decoders_common.h
- All declarations guarded with #ifndef DISABLE_RUST
- Update implementation plan to mark Phase 2 complete
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove duplicate CCX_DTVCC_MAX_SERVICES constant from decoder/mod.rs
- Import existing DTVCC_MAX_SERVICES from lib_ccxr::common
- Fix clippy uninlined_format_args warnings in avc/core.rs and decoder/mod.rs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This is Phase 1 of the fix for issue #1499. It adds the Rust-side
infrastructure for a persistent CEA-708 decoder context without
modifying any C code, ensuring backward compatibility.
Problem:
The current Rust CEA-708 decoder creates a new Dtvcc struct on every
call to ccxr_process_cc_data(), causing all state to be reset. This
breaks stateful caption processing.
Solution:
Add a new DtvccRust struct that:
- Owns its decoder state (rather than borrowing from C)
- Persists across processing calls
- Is managed via FFI functions callable from C
Changes:
- Add DtvccRust struct in decoder/mod.rs with owned decoders
- Add CCX_DTVCC_MAX_SERVICES constant (63)
- Add FFI functions in lib.rs:
  - ccxr_dtvcc_init(): Create persistent context
  - ccxr_dtvcc_free(): Free context and all owned memory
  - ccxr_dtvcc_set_encoder(): Set encoder (not available at init)
  - ccxr_dtvcc_process_data(): Process CC data
  - ccxr_flush_active_decoders(): Flush all active decoders
  - ccxr_dtvcc_is_active(): Check if context is active
- Add unit tests for DtvccRust
- Use heap allocation for large structs to avoid stack overflow
The existing Dtvcc struct and ccxr_process_cc_data() remain unchanged
for backward compatibility. Phase 2-3 will add C header declarations
and modify C code to use the new functions.
Fixes: #1499 (partial)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
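The persistent-context pattern described in the commit above can be sketched as follows. This is a minimal illustration, not the actual DtvccRust code: the struct `PersistentCtx` and the `sketch_ctx_*` functions are hypothetical names, and the "decoder state" is reduced to a byte counter. Real exports would also need a no-mangle attribute, which is omitted here.

```rust
use std::os::raw::c_void;

/// Hypothetical stand-in for a persistent decoder context.
/// It owns its state instead of borrowing it from the C side.
pub struct PersistentCtx {
    /// Total bytes processed so far; survives across calls.
    bytes_seen: u64,
}

/// Allocate the context on the heap and hand an opaque pointer to the caller.
pub extern "C" fn sketch_ctx_init() -> *mut c_void {
    Box::into_raw(Box::new(PersistentCtx { bytes_seen: 0 })) as *mut c_void
}

/// Process one chunk against the existing context.
/// Returns the running byte count, or -1 if `ctx` is null.
///
/// # Safety
/// `ctx` must be null or a pointer returned by `sketch_ctx_init`
/// that has not yet been passed to `sketch_ctx_free`.
pub unsafe extern "C" fn sketch_ctx_process(
    ctx: *mut c_void,
    _data: *const u8,
    len: usize,
) -> i64 {
    if ctx.is_null() {
        return -1;
    }
    // Reborrow the heap allocation; state from earlier calls is still there.
    let ctx = unsafe { &mut *(ctx as *mut PersistentCtx) };
    ctx.bytes_seen += len as u64;
    ctx.bytes_seen as i64
}

/// Free the context and everything it owns.
///
/// # Safety
/// Same contract as `sketch_ctx_process`; the pointer must not be used afterwards.
pub unsafe extern "C" fn sketch_ctx_free(ctx: *mut c_void) {
    if ctx.is_null() {
        return;
    }
    // Rebuild the Box so Rust drops the allocation.
    unsafe { drop(Box::from_raw(ctx as *mut PersistentCtx)) };
}
```

Because the context lives behind a heap pointer stored on the C side, two consecutive calls see the same state, which is the property the per-call Dtvcc struct lacked.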
The -sc flag was used in older versions (0.94 and earlier) for sentence
capitalization. The Rust argument parser only accepts --sentencecap now.
This adds --sc as an alias to maintain backwards compatibility with
older documentation and user scripts.
Related to #1917
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
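A std-only sketch of the alias idea from the commit above. The real parser presumably uses its argument-parsing library's own alias mechanism; `canonical_flag` is a hypothetical helper that only illustrates the mapping, and the table lists just the two aliases named in this log.

```rust
/// Map a legacy flag spelling to its canonical long option.
/// Unknown arguments are returned unchanged.
pub fn canonical_flag(arg: &str) -> &str {
    match arg {
        "--sc" => "--sentencecap",
        "--svc" => "--service",
        other => other,
    }
}
```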
The literal `0xcdcdcdcdcdcdcdcd` is a 64-bit value used as a "poison"
pattern to detect uninitialized pointers. On 32-bit systems like
armv7l, this causes a compile error because `usize` is only 32 bits.
The fix defines a platform-appropriate constant:
- 64-bit: 0xcdcdcdcdcdcdcdcd
- 32-bit: 0xcdcdcdcd
Fixes #1938
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
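A minimal sketch of the platform-appropriate constant described above, assuming it is selected with `cfg(target_pointer_width)`. The names `POISON_PTR` and `is_poisoned` are hypothetical; the commit does not name the constant.

```rust
/// Poison pattern sized to the target's pointer width.
#[cfg(target_pointer_width = "64")]
pub const POISON_PTR: usize = 0xcdcd_cdcd_cdcd_cdcd;
#[cfg(target_pointer_width = "32")]
pub const POISON_PTR: usize = 0xcdcd_cdcd;

/// Returns true if `ptr` carries the "uninitialized" poison pattern.
pub fn is_poisoned<T>(ptr: *const T) -> bool {
    ptr as usize == POISON_PTR
}
```

Only one of the two `const` items is compiled for a given target, so the literal always fits in `usize`.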
The code was using `std::arch::x86_64::*` unconditionally for both
x86 and x86_64 architectures. On 32-bit x86 (i686), the correct
module is `std::arch::x86`, not `std::arch::x86_64`.
This caused a build failure on i686:
error[E0432]: unresolved import `std::arch::x86_64`
The fix uses separate conditional imports:
- `std::arch::x86::*` for 32-bit x86
- `std::arch::x86_64::*` for 64-bit x86_64
Both modules provide the same SSE2 intrinsics used by find_next_zero().
Fixes #1937
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
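The conditional imports described above can be sketched like this. The body of find_next_zero() is not shown in this log, so `first_zero_in_block` is a hypothetical helper that only demonstrates the same SSE2 intrinsics compiling on both x86 and x86_64, with a scalar fallback for other architectures.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Index of the first zero byte in a 16-byte block, if any (SSE2 path).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn first_zero_in_block(block: &[u8; 16]) -> Option<usize> {
    if !is_x86_feature_detected!("sse2") {
        // Scalar fallback for x86 CPUs without SSE2.
        return block.iter().position(|&b| b == 0);
    }
    // SAFETY: SSE2 was detected above, and the unaligned load reads
    // exactly the 16 bytes of `block`.
    let mask = unsafe {
        let v = _mm_loadu_si128(block.as_ptr() as *const __m128i);
        // Compare every byte with zero, then collect one bit per byte.
        _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128()))
    };
    if mask == 0 {
        None
    } else {
        Some(mask.trailing_zeros() as usize)
    }
}

/// Portable version for non-x86 targets.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn first_zero_in_block(block: &[u8; 16]) -> Option<usize> {
    block.iter().position(|&b| b == 0)
}
```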
The start_credits_text and end_credits_text pointers were being copied
directly from the encoder config options, but free_encoder_context()
would later free them. This caused memory corruption when the pointers
referred to memory owned by ccx_options.
Now these strings are deep-copied in init_encoder() so each encoder
context owns its own copy, fixing the --startcreditstext regression.
Resolves conflicts while preserving Issue #447 fix for DVB multi-stream handling:
- Kept DVB metadata update logic in ts_tables.c for split mode
- Adapted to upstream's single-param dvbsub_init_decoder signature
- Updated lib_ccx.c and general_loop.c to match new API
After PAT changes, the pipeline's decoder was NULLed out to prevent
crashes, but this caused all subsequent DVB data to be skipped.
Now the decoder is reinitialized when detected as NULL, allowing
subtitle extraction to continue across PAT changes.
Fixes segmentation fault at 99% when PAT changes occur during DVB
subtitle processing. The crash happened because decoder context
private_data was freed but still accessed.
Changes:
- Add NULL check in process_data() before dvbsub_decode call
- Add defensive NULL check at start of dvbsub_decode()
- Add defensive NULL check at start of write_dvb_sub()
- Deep copy DVB bitmap data in copy_subtitle() to avoid aliasing
- Safe DVBSubContext copy that doesn't alias linked list pointers
- Clean up pipeline decoder refs in dinit_cap() after PAT change
- Direct FTS calculation for DVB-only streams
Tested with 11GB TS file with 23 PAT changes - no crash.
- Replace spin-lock with proper mutex (CRITICAL_SECTION/pthread_mutex)
- Add per-pipeline OCR contexts for thread safety
- Include PID in output filenames to handle duplicate languages
- Add dvbsub_get_context_size() and dvbsub_copy_context() for state management
- Improve language code validation (ISO 639-2 compliant)
- Change fatal error to warning for oversized PES packets
- Better language lookup from potential_streams before cinfo fallback
- Reset potential_stream data in demuxer cleanup
The help text references -svc for CEA-708 service selection, but the
Rust argument parser only accepted --service. This adds --svc as an
alias to maintain backwards compatibility with older documentation
and user scripts.
Fixes #1917
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The fix looks correct - properly adding `return;` after Rust calls to prevent the C code from also executing, and using `(void)` to silence return value warnings.
Windows CI passes (which was the target for this MSVC fix). The Linux CI failure appears unrelated since networking code isn't typically part of the regression test suite.
Merging - thanks for the fix!
Excellent work addressing the feedback! The separation of CC_SOLID_BLANK and PARITY_BIT_MASK makes the code much clearer - even though they have the same value, they serve different purposes and that's now well-documented.
The additional documentation for validate_cc_pair is very helpful for understanding the CEA-608/708 validation logic.
Merging - thanks for the thorough fix!
Previously, the `initialized_ocr` flag was stored at the program level
and shared across all DVB subtitle streams within a program. This caused
OCR to only initialize for the first DVB stream, leaving subsequent
streams without an OCR context and unable to extract subtitles.
The fix removes the `initialized_ocr` flag entirely. Each DVB subtitle
decoder now gets its own OCR context, matching the behavior of DVD and
VOBSUB decoders which already worked correctly with multiple streams.
Test results with multi-language DVB sample:
- Before: Second stream (0xCE0) → "No captions were found"
- After: Second stream (0xCE0) → 5 subtitles extracted correctly
Fixes #1067
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add build_*/ pattern and linux/build_scan/ to ignore various build
output directories (build_ocr/, build_ocr_asan/, etc.)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit fixes two issues:
1. ATSC CC data in private MPEG-2 streams (stream type 0x06) was not
being processed. The code returned CCX_PRIVATE_MPEG2_CC buffer type
which was never properly implemented - it just dumped debug output
and returned placeholder bytes.
Fix: Treat ATSC CC in private MPEG-2 streams the same as in
user-private streams (0x80-0x8F) by returning CCX_PES buffer type.
Both contain the same CC data format and should use the same
processing path.
2. Several dump() calls were using CCX_DMT_GENERIC_NOTICES which is
enabled by default, causing binary output to flood the terminal
when processing certain files.
Fix: Changed to appropriate debug-only masks (CCX_DMT_VERBOSE,
CCX_DMT_PARSE) so binary dumps only appear when debug mode is
explicitly enabled.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Platform attribute is not valid in WiX v4+. Instead, specify the
target architecture at build time using the -arch x64 flag.
Changes:
- Remove invalid Platform="x64" attribute from Package element
- Add -arch x64 to wix build command in release workflow
- Keep ProgramFiles64Folder for explicit 64-bit installation path
This ensures the MSI is built as a proper 64-bit package that installs
to "Program Files" instead of "Program Files (x86)".
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add Platform="x64" to the WiX Package element and use ProgramFiles64Folder
instead of ProgramFiles6432Folder to ensure the MSI:
- Is recognized as a 64-bit installer by tools like winget/komac
- Installs to "Program Files" instead of "Program Files (x86)"
This fixes winget manifest detection issues where the installer was
incorrectly identified as x86 architecture.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update version number across all files:
- src/lib_ccx/lib_ccx.h (main version define)
- linux/configure.ac, mac/configure.ac (autoconf)
- OpenBSD/Makefile
- package_creators/ (PKGBUILD, ccextractor.spec, debian.sh)
- packaging/winget/ (all yaml manifests)
- packaging/chocolatey/ (nuspec and install script)
Note: Checksums in winget/chocolatey will need to be updated
when the actual release MSI is built.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Document all changes since 0.96.2 including:
- VOBSUB subtitle extraction for MP4 and MKV files
- Native SCC input file support
- SCC output improvements (frame rate, styled PAC codes)
- Various bug fixes for timing, builds, and OCR
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use tabs for continuation indentation in C code (clang-format)
- Remove extra trailing spaces in Rust code (rustfmt)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Excellent fix! The `__has_include()` approach is clean and removes the symlink workaround.
Verified locally:
- Normal build: ✅
- `-system-libs` build: ✅
Merging.
Add two new OCR options to improve subtitle recognition:
1. Character blacklist (enabled by default):
- Blacklists characters |, \, `, _, ~ that are commonly misrecognized
- Prevents "I" being recognized as "|" (pipe character)
- Use --no-ocr-blacklist to disable if needed
2. Line-split mode (opt-in via --ocr-line-split):
- Splits multi-line subtitle images into individual lines
- Uses PSM 7 (single text line mode) for each line
- Adds 10px padding around each line for better edge recognition
- May improve accuracy for some VOBSUB subtitles
Test results with VOBSUB sample:
- Blacklist: Reduces pipe errors from 14 to 0
- Matches subtile-ocr's approach for preventing misrecognition
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace unclear TODO with explanation of why VCL HRD parameters
are skipped. VCL HRD is for video buffering compliance and not
needed for caption extraction.
Changes:
- Replace TODO comment with clear explanation
- Update mprint message to be more informative
- Remove commented-out exit(1)
Addresses #1894
Change bswap16 and bswap32 to use int16_t and int32_t instead of
short and long for consistent behavior across platforms.
On Windows x64, `long` is 4 bytes (LLP64 model), while on Linux x64
`long` is 8 bytes (LP64 model). This difference could cause
inconsistent NAL unit length parsing in MP4/MOV files, potentially
affecting timestamp calculations.
This fix ensures the byte-swapping functions work identically on
both platforms by using fixed-width integer types from <stdint.h>.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
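For comparison, a Rust sketch of the same idea: byte swapping on fixed-width types behaves identically on every target. The function names mirror the C helpers, but this is an illustration rather than the project's code, and `nal_length` is a hypothetical helper for a 4-byte big-endian length prefix.

```rust
/// Swap the byte order of a 16-bit value; the width is the same on every target.
pub fn bswap16(v: i16) -> i16 {
    v.swap_bytes()
}

/// Swap the byte order of a 32-bit value; never 8 bytes wide, unlike C `long` on LP64.
pub fn bswap32(v: i32) -> i32 {
    v.swap_bytes()
}

/// Read a 4-byte big-endian NAL unit length prefix.
pub fn nal_length(prefix: [u8; 4]) -> u32 {
    u32::from_be_bytes(prefix)
}
```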
Add vobsub_decoder.c and vobsub_decoder.h to the Visual Studio project
and filters files.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add vobsub_decoder.c and vobsub_decoder.h to linux and mac Makefile.am
to fix autoconf build failures.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The ocr_text field in struct cc_bitmap is only defined when ENABLE_OCR
is set. Wrap the free() calls with #ifdef ENABLE_OCR to fix build
failures in non-OCR configurations.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add support for extracting VOBSUB (bitmap) subtitles from MP4 files
and converting them to text formats via OCR. This complements the
existing MKV VOBSUB support added in commit 1fccb783.
Changes:
- Add shared vobsub_decoder module for SPU parsing and OCR
- Add process_vobsub_track() function in mp4.c for subp:MPEG tracks
- Detect and count VOBSUB tracks in MP4 container
- Extract palette from decoder config when available
- Process SPU samples through OCR pipeline
The VOBSUB decoder module provides:
- SPU control sequence parsing (timing, colors, coordinates)
- RLE-encoded bitmap decoding (interlaced format)
- Palette parsing from idx header format
- Integration with Tesseract OCR via ocr_rect()
Tested with sample from issue #1349 - successfully extracted 61
subtitles from 128 SPU samples with accurate OCR text output.
Fixes #1349
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
MSVC doesn't support variable-length arrays (VLAs). The const int
declaration wasn't being treated as a compile-time constant,
causing Windows build failure with errors C2057, C2466, C2133.
Changed to #define which is a true compile-time constant.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add docs/VOBSUB.md explaining the VOBSUB extraction workflow
- Add tools/vobsubocr/Dockerfile for building subtile-ocr OCR tool
- Document how to convert VOBSUB (.idx/.sub) to SRT using OCR
The Dockerfile uses subtile-ocr (https://github.com/gwen-lg/subtile-ocr),
an actively maintained fork of vobsubocr with better accuracy.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, CCExtractor would only print "Error: VOBSUB not supported"
when encountering VOBSUB (S_VOBSUB) subtitle tracks in Matroska files.
This left users without any usable output.
This commit adds full VOBSUB extraction support:
- Generate proper .idx index files with timestamps and file positions
- Generate proper .sub files with PS-wrapped SPU data
- Correct PS Pack header with SCR derived from timestamps
- Correct PES header with PTS for each subtitle
- 2048-byte block alignment (standard VOBSUB format)
The output is compatible with VLC, FFmpeg, and other players that
support VobSub subtitle format.
Tested with sample from issue #1371 - output validates correctly
with FFprobe and produces identical subtitle data to mkvextract.
Fixes #1371
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
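The 2048-byte block alignment mentioned above comes down to simple padding arithmetic. A sketch, with hypothetical names; the fill byte is a parameter because this log does not say how the real writer fills the gap.

```rust
/// Block size used by the .sub output, per the commit above.
pub const VOBSUB_BLOCK_SIZE: usize = 2048;

/// Number of bytes needed to bring `len` up to the next block boundary.
pub fn padding_to_block(len: usize) -> usize {
    (VOBSUB_BLOCK_SIZE - len % VOBSUB_BLOCK_SIZE) % VOBSUB_BLOCK_SIZE
}

/// Extend `buf` with `fill` bytes until its length is a multiple of the block size.
pub fn pad_to_block(buf: &mut Vec<u8>, fill: u8) {
    let pad = padding_to_block(buf.len());
    buf.resize(buf.len() + pad, fill);
}
```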
The --delay option was not being applied to DVB and other bitmap-based
subtitles (DVD subtitles, etc.), only to CEA-608 subtitles. This made
it impossible for users to correct timing offsets in DVB subtitle
extraction.
Changes:
- Add subs_delay to sub->start_time and sub->end_time for CC_BITMAP
subtitles in encode_sub(), matching the behavior for CC_608
- Add bounds checking to skip subtitles whose timestamps become negative
after applying a negative delay
- Properly free bitmap data when skipping to avoid memory leaks
This provides a workaround for issue #1248 where DVB subtitles were
extracted with incorrect timing offset. Users can now use --delay to
adjust the timing.
Fixes #1248
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
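A sketch of the delay logic described above. `apply_delay` is a hypothetical helper, and skipping when either timestamp goes negative is an assumption; the commit only says that subtitles which become negative are skipped.

```rust
/// Shift a subtitle's start and end times (in ms) by `delay_ms`.
/// Returns `None` when the shifted subtitle should be skipped.
pub fn apply_delay(start_ms: i64, end_ms: i64, delay_ms: i64) -> Option<(i64, i64)> {
    let start = start_ms + delay_ms;
    let end = end_ms + delay_ms;
    // Assumption: a negative start or end means the subtitle is dropped.
    if start < 0 || end < 0 {
        None
    } else {
        Some((start, end))
    }
}
```

In the C code the skipped case also has to free the bitmap data; in this sketch there is nothing to free.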
This commit addresses the remaining items from issue #1191:
1. SCC Output Frame Rate:
- Added scc_framerate to encoder_cfg and encoder_ctx structs
- The --scc-framerate option now affects both input parsing AND output
- Supports 24, 25, 29.97 (default), and 30 fps
2. Styled PAC (Preamble Address Code) Optimization:
- Added support for styled PACs that encode color/font at column 0
- When captions start at column 0 with non-default style, uses a single
styled PAC instead of indent PAC + mid-row code
- More efficient output that matches professional SCC files
Files changed:
- ccx_common_option.h/c: Added scc_framerate to encoder_cfg
- ccx_encoders_common.h/c: Added scc_framerate to encoder_ctx
- ccx_encoders_scc.c: Added get_scc_fps(), styled PAC functions,
and optimized write_cc_buffer_as_scenarist()
- common.rs: Copy scc_framerate to enc_cfg
Fixes #1191
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Set in_bufferdatatype for MP4/MOV container tracks to prevent incorrect
cb_field counter increments that were adding ~200ms to caption timestamps.
Root Cause:
-----------
The in_bufferdatatype variable was never set in mp4.c, remaining as
CCX_UNKNOWN. This caused the check in do_cb() (ccx_decoders_common.c)
to fail:
    if (ctx->in_bufferdatatype != CCX_H264 && ctx->in_bufferdatatype != CCX_PES)
        cb_field1++;
With in_bufferdatatype == CCX_UNKNOWN, cb_field1 was incremented for
each CEA-608 caption block processed. When get_fts() was called to
timestamp captions, it added cb_field1 * 1001/30 ms to the base time.
With ~6 caption blocks per frame (typical for roll-up captions), this
added approximately 200ms (6 × 33.37ms ≈ 200ms) to caption start times.
Analysis:
---------
Sample file: 1974a299f0502fc8199dabcaadb20e422e79df45972e554d58d1d025ef7d0686.mov
Before fix:
- FFmpeg first caption: 13,847ms
- CCExtractor first caption: 14,047ms
- Offset: 200ms late
The timing flow:
1. MP4 sample has PTS=1246245 (13,847ms at 90kHz)
2. set_fts() correctly sets fts_now based on PTS
3. do_cb() processes caption blocks, incrementing cb_field1 each time
4. get_fts() returns: fts_now + fts_global + cb_field1 * 1001/30
5. With cb_field1=6: adds 6 * 33.37 = 200ms offset
The fix ensures cb_field counters are not incremented for container
formats (MP4, MOV, MKV) because these formats associate all caption
data with the frame's PTS directly - there's no sub-frame timing.
Fix:
----
Set in_bufferdatatype in the three MP4 track processing functions:
- process_avc_track(): CCX_H264 for H.264/AVC tracks
- process_hevc_track(): CCX_H264 for H.265/HEVC tracks
- process_xdvb_track(): CCX_PES for MPEG-2 video tracks
After fix:
- FFmpeg first caption: 13,847ms
- CCExtractor first caption: 13,847ms
- Offset: 0ms (exact match)
This fix resolves timing issues for tests 226-230 on the sample platform.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
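The arithmetic in the commit above can be reproduced with a few lines. This is a sketch, not the real get_fts(): the function names are hypothetical, and integer division (as C would do for `cb_field1 * 1001 / 30`) is assumed.

```rust
/// Convert a 90 kHz PTS value to milliseconds (integer division).
pub fn pts_to_ms(pts: i64) -> i64 {
    pts / 90
}

/// Extra time contributed by the cb_field1 counter, in milliseconds.
pub fn cb_field_offset_ms(cb_field1: i64) -> i64 {
    cb_field1 * 1001 / 30
}

/// Caption timestamp: base time plus global offset plus the cb_field1 term.
pub fn caption_time_ms(pts: i64, fts_global_ms: i64, cb_field1: i64) -> i64 {
    pts_to_ms(pts) + fts_global_ms + cb_field_offset_ms(cb_field1)
}
```

With PTS=1246245 and cb_field1=6 this gives 14,047 ms, the "before fix" value; with cb_field1=0 it gives 13,847 ms, matching FFmpeg.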
The scc_framerate field was not being initialized in the C init_options()
function, leaving it with an undefined value. This could cause undefined
behavior when the options struct is used before the Rust code initializes
the field.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace tabs with spaces in doc comments
- Use #[derive(Default)] with #[default] attribute
- Use array syntax for char pattern matching
- Apply clang-format to C files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add automated package publishing for Windows package managers:
## Winget
- Initial manifest files for CCExtractor.CCExtractor
- Workflow to auto-submit PRs to microsoft/winget-pkgs on release
## Chocolatey
- Package files (nuspec, install/uninstall scripts)
- Workflow to build and push packages on release
## Setup Required
- WINGET_TOKEN secret (GitHub PAT with public_repo scope)
- CHOCOLATEY_API_KEY secret (from chocolatey.org account)
Closes #1308
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The -system-libs mode was overwriting BLD_LINKER and losing the FFmpeg
libraries that -hardsubx adds. This fix preserves the FFmpeg libraries
when both flags are used together.
Also add permissions: contents: write to the workflow to allow
uploading assets to releases.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a new GitHub Actions workflow that builds CCExtractor using the
-system-libs flag, creating binaries that dynamically link against
system libraries instead of bundling dependencies.
This is useful for:
- Linux distribution packaging (Debian, Ubuntu, Fedora, etc.)
- Homebrew/Linuxbrew packaging
- Users who prefer smaller binaries with system library updates
Two variants are built:
- basic: Standard OCR-enabled build
- hardsubx: Build with HardSubX (burned-in subtitle extraction)
The workflow runs on releases and can be manually triggered.
Related to #1907
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Windows release was missing Tesseract OCR runtime dependencies
(tessdata files) needed for the HardSubX feature to work. Users had
to manually install Tesseract OCR and set TESSDATA_PREFIX.
Changes:
- Add get_executable_directory() to ocr.c that returns the directory
containing the executable (works on Windows, Linux, and macOS)
- Update probe_tessdata_location() to search for tessdata in the
executable directory, enabling bundled tessdata to be found
- Update release workflow to download eng.traineddata and osd.traineddata
from tesseract-ocr/tessdata_fast during release builds
- Update WiX installer to include tessdata directory with the
traineddata files
Now the Windows release includes tessdata files, and CCExtractor will
automatically find them in the installation directory without requiring
users to install Tesseract separately or set environment variables.
Fixes #1578
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update the version extraction logic in the release workflow to properly
handle 3-part semantic versions like v0.96.1 in addition to existing
2-part versions like v0.96.
MSI installers require 4-part versions (major.minor.build.revision):
- v0.96 → 0.96.0.0 (unchanged behavior)
- v0.96.1 → 0.96.1.0 (new support)
- v0.96.1.2 → 0.96.1.2 (passthrough)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
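The tag-to-MSI-version mapping listed above, sketched in Rust. The real logic lives in the release workflow, so this is only an illustration of the mapping; `msi_version` is a hypothetical name.

```rust
/// Turn a release tag such as "v0.96.1" into a 4-part MSI version.
/// Missing trailing components are filled with "0".
pub fn msi_version(tag: &str) -> String {
    // Drop the leading 'v' if present.
    let bare = tag.strip_prefix('v').unwrap_or(tag);
    let mut parts: Vec<&str> = bare.split('.').collect();
    while parts.len() < 4 {
        parts.push("0");
    }
    parts.join(".")
}
```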
MSI version numbers must be numeric (major.minor.build format).
Strip everything after the first dash from tag names to get valid
version numbers (e.g., v1.08-test becomes 1.08).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Set InstallScope="perMachine" to ensure proper admin-level registry access
- Bump InstallerVersion from 200 to 500 (Windows Installer 5.0)
This should fix the "Could not write key VersionMinor to Product" error.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of trying to override WixUI_InstallDir, create a custom UI
based on it but without the LicenseAgreementDlg. This is the proper
way to remove dialogs from WiX UI sets.
- Add CustomUI.wxs with dialog flow: Welcome -> InstallDir -> VerifyReady
- Update installer.wxs to use CustomInstallDirUI instead of WixUI_InstallDir
- Update workflow to build both .wxs files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous Publish elements without Order didn't override the defaults.
Adding Order="1" ensures our overrides fire after the WixUI defaults,
making our InstallDirDlg navigation take precedence.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Override the WixUI_InstallDir dialog sequence to skip the license
agreement dialog, restoring the original behavior before WiX v6 migration.
- WelcomeDlg Next button now goes directly to InstallDirDlg
- InstallDirDlg Back button returns to WelcomeDlg
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The installer.wxs was referencing old FFmpeg DLLs that no longer exist:
- avcodec-57.dll → avcodec-60.dll
- avformat-57.dll → avformat-60.dll
- avutil-55.dll → avutil-58.dll
- swresample-2.dll → swresample-4.dll
- swscale-4.dll → swscale-7.dll
Added new DLLs that are now part of the build:
- avdevice-60.dll, avfilter-9.dll, postproc-57.dll
- libgpac.dll, OpenSVCDecoder.dll
- libcryptoMD.dll, libsslMD.dll
- desktop_drop_plugin.dll
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The <ui:WixUI Id="WixUI_InstallDir" InstallDirectory="INSTALLFOLDER" />
element already defines WIXUI_INSTALLDIR (via the InstallDirectory attribute)
and ARPNOMODIFY (in the wixlib). Declaring them again causes WIX0091 errors.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added WIXUI_INSTALLDIR property (required per WiX issue #7105)
- Changed RemoveFolder Id from "DesktopFolder" to "RemoveDesktopShortcut"
to avoid ID conflict with StandardDirectory element
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The WiX v4 extension path was hardcoded and didn't match the actual
installed location. WiX v4 allows referencing globally installed
extensions by name directly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The installer directory already has files from the copy step, so
Expand-Archive needs -Force to overwrite/merge.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changed WindowsTargetPlatformVersion from 10.0.22621.0 to 10.0 to
automatically use whichever Windows 10 SDK is installed on the build
machine. This fixes CI failures when the runner has a different SDK
version installed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The solution file only has x64 configurations (Release-Full|x64,
Debug-Full|x64). The workflow was incorrectly trying to build with
Win32 platform which doesn't exist.
Changes:
- Platform=Win32 → Platform=x64
- Output path ./Release-Full/ → ./x64/Release-Full/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Start new changelog section for unreleased changes. First entry is
the multi-page teletext extraction feature (#665) which allows
extracting multiple teletext pages simultaneously with separate
output files.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The page update logic at lines 1029-1035 was incorrectly updating
tlt_config.page for all accepted pages, even in single-page auto-detect
mode. This caused the auto-detect logic at line 979 to be bypassed
because the first packet (even with an invalid page number like 0xFF)
would set tlt_config.page, preventing proper auto-detection.
The fix restricts the page update to multi-page mode only. In single-page
mode, tlt_config.page is set exclusively by:
1. User specification (--tpage option)
2. Auto-detect logic (first valid subtitle page found)
This fixes a regression in SP Test 76, which uses sample
8c1615c1a84d4b9b34134bde8085214bb93305407e935edcdfd4c2fc522c215f.mpg
with --autoprogram --out=ttxt --latin1.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement support for extracting multiple teletext pages simultaneously,
with each page output to a separate file.
Changes:
- Support multiple --tpage arguments (e.g., --tpage 397 --tpage 398)
- Create separate output files per page with _pNNN suffix
(e.g., output_p397.srt, output_p398.srt)
- Maintain backward compatibility for single-page extraction (no suffix)
- Add per-page SRT counters for correct subtitle numbering
- Fix BCD to decimal page number conversion in telxcc.c
- Add --tpages-all mode support for auto-detecting all pages
Tested with 21 teletext samples from the sample platform, all passing.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
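Two pieces of the commit above lend themselves to a short sketch: the BCD to decimal page conversion and the _pNNN output suffix. Both helpers are hypothetical and assume a page number packed as three BCD digits (for example 0x397 for page 397); the actual telxcc.c code is not shown in this log.

```rust
/// Convert a three-digit BCD page number (e.g. 0x397) to decimal (397).
pub fn bcd_page_to_decimal(bcd: u16) -> u16 {
    let hundreds = (bcd >> 8) & 0xF;
    let tens = (bcd >> 4) & 0xF;
    let units = bcd & 0xF;
    hundreds * 100 + tens * 10 + units
}

/// Build the output file name for a page.
/// Single-page extraction keeps the plain name for backward compatibility.
pub fn page_output_name(base: &str, ext: &str, page: u16, multi_page: bool) -> String {
    if multi_page {
        format!("{base}_p{page:03}.{ext}")
    } else {
        format!("{base}.{ext}")
    }
}
```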
Extends HEVC caption extraction support to MKV files.
Changes to matroska.h:
- Add hevc_codec_id constant for V_MPEGH/ISO/HEVC
- Add hevc_track_number field to matroska_ctx structure
- Add process_hevc_frame_mkv() function declaration
Changes to matroska.c:
- Detect HEVC tracks in parse_segment_track_entry()
- Modify parse_simple_block() to route HEVC tracks to HEVC processor
- Add process_hevc_frame_mkv() with is_hevc flag and store_hdcc() call
- Parse HEVCDecoderConfigurationRecord in parse_private_codec_data()
- Initialize hevc_track_number in matroska_loop()
- Update output messages to report HEVC tracks
Tested with HEVC MKV file - extracts 73 captions matching MP4 output.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
PR #1852 added HEVC caption extraction for MPEG-TS containers,
but MP4/MKV containers weren't supported. This adds HEVC support
for MP4 containers using GPAC.
Changes:
- Add HEVC subtype definitions (hev1, hvc1)
- Add process_hevc_sample() to parse HEVC NAL units and extract CC
- Add process_hevc_track() to iterate through HEVC track samples
- Detect and process HEVC tracks in processmp4()
- Add store_hdcc() call to flush buffered CC data after each sample
The key fix was adding store_hdcc() after processing each sample.
Without this, CC data was being parsed but never output because
store_hdcc() is normally called from slice_header() which is
AVC-only.
Closes #1690 (for MP4 containers)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move all changes made after the 0.95 version bump (commit ee232b5)
to a new 0.96 section marked as "Unreleased".
This separates the released 0.95 content from ongoing development
work that will be included in the next release.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
OCR build fix:
- linuxdeploy was failing with "Invalid magic bytes in file header"
because it was passed the wrapper script instead of the actual binary
- When OCR is enabled, ccextractor is renamed to ccextractor.bin and
a wrapper script sets TESSDATA_PREFIX before executing the binary
- Now correctly passes ccextractor.bin to linuxdeploy when it exists
HardSubX build fix:
- Add libavdevice-dev to FFmpeg dependencies in CI workflow
- rusty_ffmpeg requires libavdevice which was missing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rewrites the AppImage build script to support three build variants
matching the Docker build options:
- minimal: Basic CCExtractor without OCR (smallest size)
- ocr: CCExtractor with OCR support (default)
- hardsubx: CCExtractor with burned-in subtitle extraction
Changes to build_appimage.sh:
- Add BUILD_TYPE environment variable to select variant
- Fix CMake options (was incorrectly using make flags)
- Bundle tessdata for OCR builds with wrapper script
- Create proper desktop file and icon handling
- Improve error handling and cleanup
New GitHub Actions workflow (build_appimage.yml):
- Builds all three variants on release
- Uploads AppImages as release assets
- Can be manually triggered for specific variants
- Caches GPAC build for faster CI runs
Usage:
./build_appimage.sh # Builds 'ocr' variant
BUILD_TYPE=minimal ./build_appimage.sh
BUILD_TYPE=hardsubx ./build_appimage.sh
Closes #1348
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When processing DVB subtitles from live streams or corrupted files,
the bitmap clipping operation can fail, resulting in a NULL pix object.
Previously, this would cause a fatal crash with "Failed to perform OCR -
Failed to get text" because the code continued to call TessBaseAPIGetUTF8Text
even when no image was set.
Changes:
- Handle cpix_gs == NULL by logging a message and returning NULL
(skip this bitmap) instead of continuing and crashing
- Change the fatal error when TessBaseAPIGetUTF8Text returns NULL
to a non-fatal skip, since this can happen with empty/invalid bitmaps
- Both cases now properly clean up allocated resources before returning
This allows CCExtractor to gracefully skip problematic subtitle frames
instead of crashing, which is especially important for live streams
where packet loss or discontinuities can occur.
Fixes #1010
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The plans/ directory is in .gitignore but these files were added
before that entry existed. Removing from tracking while keeping
files on disk.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
For clarity and consistency, use explicit i64 instead of c_longlong.
While c_longlong is 64-bit on all platforms, i64 is clearer and
follows the same pattern as the previous commit that removed c_long.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The C type 'long' has different sizes on different platforms:
- Linux: 64-bit
- Windows: 32-bit
This causes ABI mismatches when interfacing with Rust, since Rust's
c_long matches the platform's long size, but we were treating these
values as 64-bit throughout.
Changed the following fields from 'long' to 'int64_t':
- asf_constants.h: parsebufsize
- avc_functions.h: cc_databufsize, num_nal_unit_type_7, num_vcl_hrd,
num_nal_hrd, num_jump_in_frames, num_unexpected_sei_length
- ccx_decoders_608.h: bytes_processed_608
- ccx_demuxer.h: capbufsize, capbuflen
- lib_ccx.h: ts_readstream() return type, FILEBUFFERSIZE
- file_functions.c: FILEBUFFERSIZE definition
- ts_functions.c: ts_readstream() implementation
Also updated Rust code in common.rs to remove c_long casts, since
bindgen will now generate i64 for these fields.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The extern declaration for ccxr_add_current_pts used c_long, but the
actual implementation in time.rs uses i64. This caused an ABI mismatch
on Windows where:
- c_long = i32 (32-bit)
- i64 = 64-bit
On Linux both are 64-bit so it worked, but on Windows the type
mismatch could cause incorrect parameter passing.
Changes:
- Change extern fn declaration from c_long to i64
- Remove unnecessary cast (FRAME_DURATION_TICKS is already i64)
- Remove unused c_long import
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Introduced a forward declaration for .
- Updated to calculate and set image dimensions before writing XML tags.
- Adjusted offset calculations based on screen size for better alignment of subtitles.
- Improved handling of the opening XML tag based on subtitle data presence.
When using --goptime, timestamps were compressed to 00:00:01-02 instead
of actual GOP times (17:56:40-47). This was caused by conflicts between:
- GOP timing set from GOP headers (wall-clock time, e.g., 17:56:40)
- PES PTS timing (stream-relative time, e.g., 00:00:02)
The sync detection saw these as 64,598-second "jumps" and kept resetting
timing, corrupting the output.
Fixes:
1. Guard video PES timing in general_loop.c - skip set_current_pts and
set_fts when use_gop_as_pts == 1 to prevent PES PTS from overwriting
GOP-based timing
2. Disable sync check in ccextractor.c when use_gop_as_pts == 1 since
GOP time and PES PTS are in different time bases and sync detection
is meaningless
Test results:
- Before: 00:00:01,231 --> 00:00:01,729
- After: 17:56:41,319 --> 17:56:43,084
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
On Windows, c_long is i32 while on Linux it's i64. The function
ccxr_print_mstime_static expects i64, so casting to c_long caused
a type mismatch error on Windows builds.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Rust FFI functions were using c_long for PTS/FTS timestamps, but:
- C code uses LLONG (int64_t, 64 bits on all platforms)
- Rust c_long is 32 bits on Windows, 64 bits on Linux
This caused timestamp truncation on Windows when PTS values exceeded
2^31 (~6.6 hours at 90kHz), resulting in wrong subtitle timestamps.
For example, a file with Min PTS of 23:50:45 (7,726,090,500 ticks)
would have its PTS truncated, breaking the teletext delta calculation
that normalizes timestamps to start at 0.
Changes:
- ccxr_add_current_pts: pts parameter i64
- ccxr_set_current_pts: pts parameter i64
- ccxr_get_fts: return type i64
- ccxr_get_visible_end: return type i64
- ccxr_get_visible_start: return type i64
- ccxr_get_fts_max: return type i64
- ccxr_print_mstime_static: mstime parameter i64
- fts_at_gop_start: extern static i64
Fixes tests 18 and 19 on Windows CI which showed raw PTS timestamps
(23:50:46) instead of normalized timestamps (00:00:00).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
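The truncation described in the commit above can be reproduced in a few lines. This is only an illustrative sketch: the helper names `through_32bit_long` and `seconds_until_i32_overflow` are hypothetical and do not exist in the codebase. It uses the example PTS from the message (7,726,090,500 ticks).

```rust
/// Ticks per second of the MPEG PTS clock.
const PTS_HZ: i64 = 90_000;

/// Mimics passing a 64-bit PTS through a 32-bit `long` (c_long on Windows).
/// Hypothetical helper, for illustration only.
fn through_32bit_long(pts: i64) -> i32 {
    pts as i32 // an `as` cast keeps only the low 32 bits
}

/// Whole seconds of stream time that fit before a signed 32-bit
/// tick counter overflows at 90 kHz.
fn seconds_until_i32_overflow() -> i64 {
    (i64::from(i32::MAX) + 1) / PTS_HZ
}
```

Small values survive the round trip, which is why the problem only shows up on streams whose PTS is already large.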
Korean broadcasts use EUC-KR encoding (variable-width) in CEA-708
captions, where ASCII is 1 byte and Korean characters are 2 bytes.
The decoder was always writing 2 bytes per character (UTF-16BE style),
causing NULL bytes to be inserted before every ASCII character.
Changes:
- Add is_utf16_charset() to detect fixed-width 16-bit encodings
- Modify write_char() to accept use_utf16 flag:
- true: Always 2 bytes (UTF-16BE for Japanese, issue #1451)
- false: 1 byte for ASCII, 2 bytes for extended (EUC-KR for Korean)
- Detect charset type in write_row() before building output buffer
This fixes Korean subtitle extraction when using --service "1[EUC-KR]"
while maintaining compatibility with Japanese UTF-16BE (issue #1451).
Closes #1065
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
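A minimal sketch of the fixed-width versus variable-width behaviour described in the commit above. The signature of `write_char` and the helper `encode` are assumptions made for illustration; the real decoder code may be shaped differently.

```rust
/// Appends one 16-bit symbol to `out`.
/// - `use_utf16 == true`: always two bytes (fixed-width, UTF-16BE style).
/// - `use_utf16 == false`: one byte when the high byte is zero (ASCII),
///   two bytes otherwise (variable-width, EUC-KR style).
fn write_char(sym: u16, use_utf16: bool, out: &mut Vec<u8>) {
    let [hi, lo] = sym.to_be_bytes();
    if use_utf16 || hi != 0 {
        out.push(hi);
    }
    out.push(lo);
}

/// Hypothetical helper: encodes a whole row of symbols.
fn encode(symbols: &[u16], use_utf16: bool) -> Vec<u8> {
    let mut out = Vec::new();
    for &sym in symbols {
        write_char(sym, use_utf16, &mut out);
    }
    out
}
```

With `use_utf16 == false`, an ASCII `A` (0x0041) is written as a single byte, so no NULL byte precedes it.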
When converting CEA-708 decoder settings from C to Rust via from_ctype(),
a null timing pointer would cause the entire conversion to fail and return
None. This triggered the unwrap_or(default()) fallback, resetting critical
settings like `enabled` and `services_enabled` to false/0.
This caused CEA-708 captions to not be extracted (exit code 10) even when
--service was specified, because the decoder's is_active flag was reset
to 0 during demuxer initialization.
The fix handles null timing pointer gracefully by using a default
CommonTimingCtx instead of propagating None, preserving the other
decoder settings.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
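The pattern in the commit above (fall back to a default for one missing field instead of discarding the whole conversion) might look roughly like this. All struct names and fields here are hypothetical stand-ins, not the actual CCExtractor types.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
struct TimingCtx {
    fts_now: i64,
}

#[repr(C)]
struct RawTiming {
    fts_now: i64,
}

#[repr(C)]
struct RawSettings {
    enabled: i32,
    services_enabled: i32,
    timing: *const RawTiming, // may be NULL on the C side
}

#[derive(Debug, Default, PartialEq)]
struct Settings {
    enabled: bool,
    services_enabled: bool,
    timing: TimingCtx,
}

/// Converts the C-side settings. A NULL `timing` pointer yields a default
/// TimingCtx instead of failing the whole conversion.
fn from_ctype(raw: &RawSettings) -> Settings {
    // SAFETY: assumes a non-null `timing` points to a valid RawTiming.
    let timing = unsafe { raw.timing.as_ref() }
        .map(|t| TimingCtx { fts_now: t.fts_now })
        .unwrap_or_default();
    Settings {
        enabled: raw.enabled != 0,
        services_enabled: raw.services_enabled != 0,
        timing,
    }
}
```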
This commit adds detection and basic handling of DVB teletext streams
in WTV (Windows TV) files. Previously, teletext streams were silently
ignored.
Changes:
- Add WTV_STREAM_TELETEXT GUID to wtv_constants.h
- Detect teletext streams by examining the format GUID at offset 0x4C
in MSTVCAPTION stream metadata
- Initialize teletext decoder when teletext stream is found
- Add timing support for teletext streams
- Wrap teletext data in PES headers for the teletext decoder
Limitation: WTV files store teletext in Microsoft's VBI sample format,
which differs from standard DVB teletext data units. The decoder will
process the data but may not extract subtitles from all WTV files.
This is noted in a warning message shown when teletext is detected.
Even FFmpeg's libzvbi fails to decode this format in the test sample.
Addresses: #1391
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes #1608 - Update bindgen to enable Fedora Linux packaging.
- Upgrade bindgen from 0.64.0 to 0.72.1
- Fix deprecated CargoCallbacks API
- Replace (?i) regex flags with character classes for compatibility
The inline case-insensitivity flag (?i) causes bindgen 0.72.1 to
silently produce empty bindings. This fix uses [Dd][Tt][Vv][Cc][Cc]
character classes to match both lowercase (dtvcc_*) and uppercase
(DTVCC_*) type/function names.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
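The character-class workaround from the commit above can be generated mechanically. The helper below is a hypothetical sketch, not code from build.rs. It only handles ASCII letters and copies every other character unchanged, so regex metacharacters in the input would still need escaping.

```rust
/// Turns a literal such as "dtvcc" into "[Dd][Tt][Vv][Cc][Cc]" so that a
/// pattern matches both cases without the inline `(?i)` flag.
fn case_insensitive_pattern(literal: &str) -> String {
    let mut out = String::new();
    for c in literal.chars() {
        if c.is_ascii_alphabetic() {
            out.push('[');
            out.push(c.to_ascii_uppercase());
            out.push(c.to_ascii_lowercase());
            out.push(']');
        } else {
            // Non-letters are copied as-is (no escaping in this sketch).
            out.push(c);
        }
    }
    out
}
```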
On Windows, when processing MP4/MOV files with CEA-708 captions, the
output file was being truncated to only the last subtitle. This occurred
because:
1. C code opened the file using open() and stored the fd in writer->fd
2. At end of processing, Rust's ccxr_flush_decoder was called
3. Rust checked writer->fhandle (a separate Windows-specific field)
4. Since fhandle was null (C only set fd), Rust called File::create()
5. File::create() truncates existing files, losing all previous content
The fix checks if fd is already valid before creating a new file. If fd
is valid, it converts it to a Windows handle using _get_osfhandle(),
avoiding the file truncation.
Fixes #1449
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Redirect stderr to /dev/null for the GPAC source file search to avoid
showing "No such file or directory" error when GPAC is not installed.
The build continues to work correctly in both cases.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes two buffer overflow vulnerabilities reported in issues #1427 and #1428:
- #1428 (Global buffer overflow in slice_header): The slice_type value
read from H.264 exp-golomb data was used to index slice_types[] array
without bounds checking. Valid values are 0-9 per H.264 spec Table 7-6.
Now validates slice_type < 10 before use.
- #1427 (Heap buffer overflow in parse_PMT): ES_info_length from PMT
descriptor data was trusted without validation against buffer bounds.
Malformed PMT with excessive ES_info_length could read past buffer end.
Now validates ES_info_length and descriptor lengths against buffer.
Both issues were discovered using AddressSanitizer with crafted TS files.
Fixes #1427
Fixes #1428
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
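For comparison, the slice_type bounds check from the commit above is easy to express on the Rust side with a checked lookup. This is a sketch only: the actual fix is in C, and the table contents below follow the slice_type names from H.264 Table 7-6 rather than the project's own array.

```rust
/// Names for slice_type values 0-9 (H.264 Table 7-6).
const SLICE_TYPES: [&str; 10] = ["P", "B", "I", "SP", "SI", "P", "B", "I", "SP", "SI"];

/// Returns the slice type name, or None when the exp-golomb value read
/// from the stream is outside the valid 0-9 range.
fn slice_type_name(slice_type: u32) -> Option<&'static str> {
    SLICE_TYPES.get(slice_type as usize).copied()
}
```

A caller would treat `None` as a malformed slice header and skip it, instead of indexing past the end of the table.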
Homebrew installs leptonica as 'libleptonica.dylib', not 'liblept.dylib'.
Changed AC_CHECK_LIB from [lept] to [leptonica] to match the actual
library name on macOS.
The AC_CHECK_LIB checks in configure.ac need LDFLAGS and CPPFLAGS
to find libraries installed via Homebrew (in /opt/homebrew on Apple
Silicon or /usr/local on Intel Macs).
Fixes #1173 - Error in ./configure enabling hardsubx on Mac
Fixes #1306 - Add HARDSUBX compilation docs for macOS
The configure.ac script failed on macOS with "binary operator expected"
because pkg-config output was unquoted. When pkg-config returns multiple
libraries (e.g., "-ltesseract -lcurl"), the unquoted expansion caused
`test ! -z` to receive multiple arguments instead of a single string.
Changes:
- Quote pkg-config output in TESSERACT_PRESENT conditional (mac & linux)
- Add macOS section to docs/HARDSUBX.txt with all build methods
- Add GitHub Actions jobs to test HARDSUBX builds on macOS:
- build_shell_hardsubx: Tests ./build.command -hardsubx
- build_autoconf_hardsubx: Tests ./configure --enable-hardsubx
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The include "../lib_hash/sha2.h" in params.c requires an include path
that makes "../lib_hash" resolve to "thirdparty/lib_hash".
Changed -I../src/lib_hash (which doesn't exist) to
-I../src/thirdparty/lib_hash. With this path, the compiler searches
for "../lib_hash/sha2.h" as:
../src/thirdparty/lib_hash/../lib_hash/sha2.h
= ../src/thirdparty/lib_hash/sha2.h ✓
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests all three Dockerfile build types in parallel:
- minimal: Basic CCExtractor without OCR
- ocr: CCExtractor with Tesseract OCR support
- hardsubx: CCExtractor with burned-in subtitle extraction
Each job builds from local source and verifies the image works
by running --version. Uses GitHub Actions cache for faster rebuilds.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a new `-system-libs` flag to mac/build.command that uses
system-installed libraries via pkg-config instead of bundled ones.
This enables Homebrew formula compatibility while preserving the
default standalone build behavior.
When `-system-libs` is passed:
- Uses pkg-config for: freetype2, gpac, libpng, libprotobuf-c,
libutf8proc, zlib
- Does not compile bundled thirdparty sources
- Links against system libraries
Default behavior (no flag):
- Compiles bundled libraries as before
- No change to existing builds
Also adds a CI job `build_shell_system_libs` to test the new flag.
Refs #1580, #1534
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, configure would succeed even without GPAC installed,
leading to a confusing compile-time error:
"gpac/isomedia.h: No such file or directory"
Now configure checks for GPAC via pkg-config and fails early with
a helpful error message listing the package names for common distros:
- gpac-devel (Fedora/RHEL)
- libgpac-dev (Debian/Ubuntu)
- gpac (Arch)
Fixes #1584
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes #1550 - Docker builds were broken after PR #1535 switched from
vendored GPAC to system GPAC.
Changes:
- Switch from Alpine to Debian Bookworm (Alpine's musl libc has issues
with Rust bindgen's libclang dynamic loading)
- Support three build variants via BUILD_TYPE argument:
- minimal: No OCR support
- ocr (default): Tesseract OCR for bitmap subtitles
- hardsubx: OCR + FFmpeg for burned-in subtitle extraction
- Support dual source modes via USE_LOCAL_SOURCE argument:
- 0 (default): Clone from GitHub (standalone Dockerfile)
- 1: Use local source (faster for developers)
- Add .dockerignore to exclude build artifacts (~2.7GB -> ~900KB context)
- Update README.md with comprehensive build instructions
Tested all three variants successfully:
- minimal: ~130MB image
- ocr: ~215MB image
- hardsubx: ~610MB image
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, when using -out=mcc with raw input files (-in=raw),
CCExtractor would print "Output format not supported" and produce
no output. This was because the raw file processing path decoded
CEA-608 data to text, but MCC format requires raw cc_data bytes.
The fix adds a new code path that bypasses the 608 decoder when
MCC output is requested:
- Added process_raw_for_mcc() helper function that:
- Converts 2-byte raw pairs to 3-byte cc_data format
- Wraps each CC pair in CDP format via mcc_encode_cc_data()
- Maintains proper timing at 29.97fps
- Modified raw_loop() to detect MCC output and use the new path
Test results with McPoodle raw files:
- Before: "Output format not supported" (exit code 10)
- After: Valid MCC file with proper timing and CDP-wrapped data
Fixes #1542
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
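The 2-byte to 3-byte conversion and the 29.97fps timing from the commit above can be sketched as follows. The function names are hypothetical (the real helper is the C function process_raw_for_mcc), and the sketch assumes one field-1 caption pair per video frame.

```rust
/// cc_data header byte for a valid CEA-608 field-1 pair:
/// five marker bits (11111), cc_valid = 1, cc_type = 00.
const CC_VALID_FIELD1: u8 = 0xFC;

/// Wraps one raw 2-byte caption pair into the 3-byte cc_data form.
fn raw_pair_to_cc_data(pair: [u8; 2]) -> [u8; 3] {
    [CC_VALID_FIELD1, pair[0], pair[1]]
}

/// Timestamp in milliseconds of frame `frame` at 30000/1001 fps
/// (assumption: one caption pair per frame).
fn frame_time_ms(frame: u64) -> u64 {
    frame * 1001 / 30
}
```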
Fixes #1455
When read_video_pes_header() encounters a malformed or truncated PES
packet (returns -1), copy_capbuf_demux_data() previously returned
CCX_EOF which terminated the entire file processing. This was overly
aggressive - a single broken PES packet should be skipped, not
terminate the entire file.
UK Freeview DVB recordings from September 2022 onwards contain some
malformed PES packets in the DVB subtitle stream that triggered this
condition, causing ccextractor to stop at 0% with "Processing ended
prematurely" error even though VLC could display the subtitles.
The fix changes the error handling to skip the broken packet and
continue processing:
- Before: return CCX_EOF (terminates file)
- After: return CCX_OK (skips packet, continues)
Test results with UK Freeview sample:
- Before: 0% processed, 0 subtitles extracted
- After: 100% processed, 10 subtitles extracted correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The HEVC NAL type constants are defined for completeness and reference,
but not all are currently used in the codebase.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
HEVC uses B-frames extensively, causing CC data to arrive in decode
order instead of presentation order. This was causing character pairs
to be scrambled (e.g., "MEDIOCRE" became "MIOEDCRE").
Changes:
- Implement PTS-based sequence numbering for HEVC CC data (similar to H.264)
- Change flush logic to only trigger on IDR frames (not every VCL NAL)
- Add HEVC fallback detection for streams without PAT/PMT
Fixes #1639 (ATSC 3.0 HEVC caption extraction)
Tested with issue_1639_sample.ts and caption_test_1690.ts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
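The reordering idea in the commit above (buffer caption pairs as they arrive in decode order, then emit them sorted by PTS when a flush is triggered) can be illustrated with a small sketch. The function name and the tuple layout are assumptions, not the project's data structures.

```rust
/// Drains `buffer` (PTS, caption pair) and returns the caption bytes in
/// presentation order. Intended to be called at a flush point such as an
/// IDR frame.
fn flush_in_presentation_order(buffer: &mut Vec<(i64, [u8; 2])>) -> Vec<u8> {
    // Stable sort: pairs sharing a PTS keep their arrival order.
    buffer.sort_by_key(|&(pts, _)| pts);
    let out: Vec<u8> = buffer.iter().flat_map(|&(_, pair)| pair).collect();
    buffer.clear();
    out
}
```

Flushing on every frame would sort a buffer of one element each time, which is why the flush trigger matters as much as the sort.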
Fixes #1690 - Captions fail to extract on HEVC video stream
HEVC video streams with embedded EIA-608/708 captions weren't being
extracted, even though VLC/MPV could display them.
Root causes fixed:
1. HEVC stream type (0x24) wasn't recognized for CC extraction
2. HEVC NAL parsing used H.264 format (1-byte) instead of HEVC (2-byte)
3. HEVC SEI types (39/40) weren't handled (only H.264 SEI type 6)
4. CC data accumulation across SEIs caused u8 overflow/garbled output
Changes:
- C code: Add HEVC stream detection, CCX_HEVC buffer type, is_hevc flag
- Rust code: HEVC NAL header parsing (2-byte, type=(byte[0]>>1)&0x3F),
HEVC SEI handling (PREFIX_SEI=39, SUFFIX_SEI=40), immediate CC flush
Thanks to @trufio465-bot for the initial research in PR #1735.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
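The difference between the two NAL header layouts mentioned in the commit above fits in a few lines. The function and constant names are illustrative, not the ones used in the Rust module.

```rust
/// H.264: 1-byte NAL header, type in the low 5 bits.
fn h264_nal_type(first_byte: u8) -> u8 {
    first_byte & 0x1F
}

/// HEVC: 2-byte NAL header, type in bits 1-6 of the first byte.
fn hevc_nal_type(first_byte: u8) -> u8 {
    (first_byte >> 1) & 0x3F
}

const H264_SEI: u8 = 6;
const HEVC_PREFIX_SEI: u8 = 39;
const HEVC_SUFFIX_SEI: u8 = 40;
```

Parsing an HEVC header byte with the H.264 mask gives an unrelated type number, so SEI units carrying captions are never recognized.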
The AVC parser would fail with "Leading bytes are non-zero" error when
processing HLS/Twitch stream segments that start mid-stream without
proper NAL unit headers at the beginning.
Root cause: When process_avc encountered non-zero leading bytes, it
returned an error with 0 bytes processed. The C code would not remove
any bytes from the buffer, causing subsequent data to accumulate with
the corrupt beginning, leading to infinite errors.
Fix:
- Add find_nal_start_code() to search for valid NAL start codes
- If buffer doesn't start with 0x00 0x00, search for first NAL start
- Skip garbage data before first valid NAL unit
- Return full buffer length when no NAL found (clears the buffer)
- Change forbidden_zero_bit error from fatal to skip-and-continue
Tested with 6 Twitch HLS sample files - all now process correctly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
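A minimal sketch of the start-code search described in the commit above. It looks for the 3-byte start code 0x00 0x00 0x01; the real find_nal_start_code() may differ in details such as how a 4-byte start code is reported. `bytes_to_skip` is a hypothetical helper.

```rust
/// Returns the offset of the first 0x00 0x00 0x01 sequence, if any.
fn find_nal_start_code(buf: &[u8]) -> Option<usize> {
    buf.windows(3).position(|w| w == [0x00u8, 0x00, 0x01])
}

/// Number of leading bytes to discard: everything before the first start
/// code, or the whole buffer when no start code is present.
fn bytes_to_skip(buf: &[u8]) -> usize {
    find_nal_start_code(buf).unwrap_or(buf.len())
}
```

Reporting the whole buffer as consumed when nothing is found is what stops the garbage from accumulating across calls.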
Old versions of ccextractor accepted single-dash long options like
-quiet, -stdout, -autoprogram. The new Rust-based argument parser
(clap) only accepts double-dash options (--quiet, --stdout, etc.).
When users ran scripts with -quiet, clap parsed it as individual
short options -q -u -i -e -t and failed with exit code 7. Users
with stderr redirected never saw the error, causing silent failures
with zero-length output files.
This adds a normalize_legacy_option() function that pre-processes
arguments before passing them to clap:
- Single-dash long options (e.g., -quiet) convert to --quiet
- Double-dash options remain unchanged
- Short options like -o remain unchanged
- Numeric options like -1, -12 remain unchanged
Includes 6 unit tests for the new function.
Fixes #1576
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
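The four rules listed in the commit above could be implemented roughly as follows. This is a sketch written from the description only; the actual normalize_legacy_option() may handle more cases (for example options with attached values).

```rust
/// Rewrites legacy single-dash long options (-quiet) to the double-dash
/// form (--quiet). Everything else is returned unchanged.
fn normalize_legacy_option(arg: &str) -> String {
    let rest = match arg.strip_prefix('-') {
        Some(rest) => rest,
        None => return arg.to_string(), // not an option (e.g. a file name)
    };
    let is_double_dash = rest.starts_with('-');
    let is_short = rest.chars().count() <= 1; // "-o", or a bare "-"
    let is_numeric = !rest.is_empty() && rest.chars().all(|c| c.is_ascii_digit());
    if is_double_dash || is_short || is_numeric {
        arg.to_string()
    } else {
        format!("--{}", rest)
    }
}
```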
Fix raw caption file processing that would stop at exactly 9:43:00 (2MB).
Root causes and fixes:
1. Premature EOF: After processing first chunk (BUFSIZE ~2MB), data->len
was never reset. On next iteration, general_get_more_data() calculated
want = BUFSIZE - len = 0 and returned EOF immediately.
Fix: Reset data->len = 0 after each chunk and change loop condition.
2. 32-bit integer overflow: The calculation cb_field1 * 1001 / 30 * 90
overflowed for large cb_field1 values (>1M). For example,
34,989,487 * 90 = 3,149,053,830 exceeds 32-bit signed max.
Fix: Cast cb_field1 to LLONG before multiplication.
3. Timing initialization: Raw mode needs min_pts=0, sync_pts=0, and
pts_set=MinPtsSet for correct fts_now calculation.
Tested with sample files from issue #1565:
- DTV3.raw: Now processes to 17:59:56 (was stopping at 9:43)
- DTV4.raw: Now processes to 14:00:00 (was stopping at 9:43)
- DTV5.raw: Now processes to 13:19:59 (was stopping at 9:43)
Closes #1565
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
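The 32-bit overflow in root cause 2 of the commit above can be checked with Rust's checked arithmetic. The function names are hypothetical; the real fix is a cast to LLONG in the C code.

```rust
/// The original expression evaluated in 32-bit signed arithmetic.
/// Returns None as soon as any intermediate result overflows.
fn pts_32bit(cb_field1: i32) -> Option<i32> {
    cb_field1.checked_mul(1001)?.checked_div(30)?.checked_mul(90)
}

/// The same expression after widening the operand to 64 bits first.
fn pts_64bit(cb_field1: i32) -> i64 {
    i64::from(cb_field1) * 1001 / 30 * 90
}
```

For cb_field1 just above one million the first multiplication still fits in 32 bits, and it is the final `* 90` that overflows.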
Add a new --list-tracks (-L) option that lists all tracks found in
media files without processing them. This is useful for exploring
media files before caption extraction.
Supports:
- Matroska (MKV/WebM) files
- MP4/MOV files
- MPEG Transport Stream files
The feature is implemented entirely in Rust with native parsers for
each format, avoiding dependency on external libraries.
Closes #1669
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous WTV timing fix (commit 300f8ca6) set min_pts and pts_set=2
(MinPtsSet) but didn't set sync_pts. This caused the Rust timing code
to detect a massive PTS jump when processing WTV files with large
initial timestamps (e.g., files recorded at 18:38:23).
The PTS jump detection computes (current_pts - sync_pts), and with
sync_pts=0 but current_pts=6039323550 (18:38:23 in PTS units), the
difference exceeded MAX_DIF and triggered the jump handling, resulting
in empty output.
This fix sets sync_pts to the same value as min_pts when first
initializing timing, preventing the false PTS jump detection.
Test results:
- Before: WTV files with large initial PTS produced empty output
- After: Timestamps match expected ground truth exactly
(e.g., 00:00:00,601 --> 00:00:02,801 for first caption)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
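The false jump detection from the commit above comes down to one subtraction. In this sketch the threshold value is a made-up placeholder (the message does not state what MAX_DIF is), and `is_pts_jump` is a hypothetical name.

```rust
const PTS_HZ: i64 = 90_000;

/// Hypothetical threshold: 5 seconds of PTS ticks. The real MAX_DIF value
/// is not given in the commit message.
const MAX_DIF_TICKS: i64 = 5 * PTS_HZ;

/// Reports a jump when the distance from the sync point exceeds the threshold.
fn is_pts_jump(current_pts: i64, sync_pts: i64) -> bool {
    (current_pts - sync_pts).abs() > MAX_DIF_TICKS
}
```

With sync_pts left at 0, the first packet of a recording that starts at 18:38:23 already looks like a jump; initializing sync_pts to the same value as min_pts makes the difference zero.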
When using --udp or --tcp options, ccxr_demuxer_open() was called with
a NULL file pointer, causing a crash in CStr::from_ptr().
The fix checks if the file pointer is NULL before dereferencing it,
and uses an empty string for network input modes.
Fixes #1846
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
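A minimal sketch of the NULL check described in the commit above. `c_str_or_empty` is a hypothetical helper name; ccxr_demuxer_open() itself is not reproduced here.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Converts a C string pointer to an owned String, using an empty string
/// when the pointer is NULL (as happens for network input modes).
fn c_str_or_empty(ptr: *const c_char) -> String {
    if ptr.is_null() {
        return String::new();
    }
    // SAFETY: assumes a non-null `ptr` points to a valid NUL-terminated string.
    unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned()
}
```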
WTV timing fix:
- Set min_pts on first valid timestamp to enable fts_now calculation
- Set pts_set = 2 (MinPtsSet) instead of 1 (Received)
- This fixes WTV files where all timestamps were clustered around 1 second
instead of being spread across the actual video duration
Latin-1 encoding fix:
- Change music note substitution from pilcrow (0xB6) to '#' (0x23)
- Pilcrow caused grep to treat output files as binary
- '#' is a more recognizable substitute for the musical note character
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The rcwt_loop() function set min_pts = 0 for RCWT files but did not
set pts_set = 2 (MinPtsSet). This caused the Rust timing code to skip
the fts_now calculation (which checks pts_set == MinPtsSet), resulting
in all captions having timestamps compressed near 0 instead of their
correct times spread across the file duration.
The fix adds pts_set = 2 after setting min_pts, which tells the timing
system that min_pts is valid and fts_now can be calculated properly.
Fixes Test 217 timing issue where:
- Before: 00:00:00,001 --> 00:00:00,091 (wrong)
- After: 00:00:02,402 --> 00:00:04,536 (correct)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This PR triggers a fresh CI run to verify the combined effect of:
- PR #1847: Hardsubx crash fix, memory leak fixes, rcwt exit code fix
- PR #1848: XDS empty content entries fix
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When outputting US TV Parental Guidelines ContentAdvisory XDS data,
the code was always calling xdsprint() for both the age rating and
the content flags (violence, language, etc). However, if there are
no content flags (e.g., for TV-G which has no additional advisories),
the content string is empty.
This caused duplicate XDS entries in the output - one with the age
rating and one with an empty string. The fix only outputs the content
string if it is not empty.
Fixes regression test 113 output mismatch.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The rcwt_loop function was returning exit code 10 (no captions) even
when CEA-608 captions were successfully extracted from RCWT/BIN format
files. This happened because CEA-608 decoding writes directly to the
encoder via printdata() without setting dec_sub->got_output.
Add a check after the main loop (similar to general_loop) that also
considers enc_ctx->srt_counter, enc_ctx->cea_708_counter, and
dec_ctx->saw_caption_block to properly detect when captions were found.
Fixes regression test 217 which was failing with exit code 10.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The hardsubx code was using C's free() on strings allocated by Rust's
CString::into_raw(). Since Rust and C use different memory allocators,
this caused heap corruption that manifested as garbage OCR output after
processing ~27 subtitle frames.
Changes:
- Export free_rust_c_string() from Rust as extern "C" function
- Declare free_rust_c_string() in hardsubx.h for C code
- Replace free(subtitle_text) with free_rust_c_string(subtitle_text)
in hardsubx_decoder.c for Rust-allocated strings
- Fix memory leaks in process_hardsubx_linear_frames_and_normal_subs()
where subtitle_text_hard and prev_subtitle_text_hard were not freed
- Remove dummy CI trigger file (no longer needed)
Testing:
- AddressSanitizer: No memory errors detected
- Valgrind: 0 bytes definitely lost, 0 bytes indirectly lost
- Manual testing: OCR output now correct for entire video duration
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
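The ownership rule behind the commit above is that a pointer produced by CString::into_raw() must be handed back to Rust and rebuilt with CString::from_raw() to be freed. A sketch of such a pair of functions follows. `into_c_string` is a hypothetical helper, and the real export would additionally need a no_mangle attribute so that C can link against it.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hands a Rust string to C. Returns NULL if `text` contains an interior NUL.
fn into_c_string(text: &str) -> *mut c_char {
    CString::new(text)
        .map(CString::into_raw)
        .unwrap_or(std::ptr::null_mut())
}

/// Frees a string previously returned by `into_c_string`.
///
/// # Safety
/// `ptr` must be NULL or a pointer obtained from `CString::into_raw` that
/// has not been freed yet.
pub unsafe extern "C" fn free_rust_c_string(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Rebuild the CString so Rust's allocator releases the memory.
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```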
- Free basefilename in _dinit_hardsubx (allocated by get_basename)
- Free subtitle_text after each frame processing iteration
- Free prev_subtitle_text when replaced and at end of function
- Free sws_ctx with sws_freeContext (was never freed)
Reduces memory leaks from 63,926 bytes to 0 bytes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1. Remove invalid free(tessdata_path) - probe_tessdata_location() returns
a pointer to static strings or getenv() result, not heap memory.
2. Fix alloc-dealloc mismatch in OCR text handling:
- TessBaseAPIGetUTF8Text() allocates with C++ operator new[]
- The code was freeing with C free() causing allocator mismatch
- Now properly copy string and use TessDeleteText() before returning
- Unified all OCR text return paths to use Rust-allocated strings
3. Previous fix: freep(&lctx->dec_sub) instead of freep(lctx->dec_sub)
These fixes resolve Test 241 (Hardsubx) crash on Sample Platform.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The freep() function expects a pointer-to-pointer (void**) so it can
dereference, free, and NULL-out the pointer. The code was passing
lctx->dec_sub directly instead of &lctx->dec_sub.
This caused freep to interpret the first 8 bytes of the cc_subtitle
struct as a pointer and attempt to free() it, resulting in a crash
(SIGABRT/exit code 134) in the memory allocator.
Fixes Test 241 (Hardsubx) crash on Sample Platform.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This PR triggers a fresh CI run to analyze all failing regression tests
and determine whether each needs a ground truth update or a code fix.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add changelog entries for recent merged PRs:
- Fix: Garbled captions from HDHomeRun and I/P-only H.264 streams (#1109)
- Fix: Enable stdout output for CEA-708 captions on Windows (#1693)
- Fix: McPoodle DVD raw format read/write (#1524)
- Fix: Variable shadowing in general_loop
- Fix: Double-free crash in teletext cleanup
- Fix: Uninitialized memory and memory leaks (Valgrind)
- Fix: Dangling pointers in Rust FFI
- New: Teletext subtitle pages in -out=report (#1034)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix comment spacing (single space before //)
- Mark is_two_byte_loop_marker as #[cfg(test)] since it's only used in tests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reading:
- Migrate DVD raw parser from C to Rust (src/rust/src/demuxer/dvdraw.rs)
- Add FFI exports: ccxr_process_dvdraw(), ccxr_is_dvdraw_header()
- Handle both McPoodle's single-byte and legacy 2-byte loop markers
- Add 15 unit tests covering all edge cases
Writing:
- Fix LC3/LC4 constants from 2-byte to 1-byte to match McPoodle's format
- Output files now have identical size to McPoodle's original
Fixes #1524
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix clippy warning: variable does not need to be mutable.
The current_index variable is only assigned once during initialization
and never modified afterward.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
For I/P-only streams (like HDHomeRun recordings), the caption buffer was
being flushed on every reference frame (I and P). Since ALL frames in these
streams are reference frames, this defeated the caption reordering mechanism,
causing garbled output.
The fix:
- Only flush the buffer and reset reference PTS on IDR frames (NAL type 5),
not on P-frames
- Initialize currefpts on first frame to avoid huge indices at stream start
- Properly flush buffer and reset reference when large PTS gaps are detected
This allows P-frames to accumulate in the buffer and be sorted by their
PTS-based indices before output.
Fixes #1109
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit fixes two issues uncovered during Sample Platform testing:
1. Variable shadowing in general_loop() (general_loop.c):
- The inner `int ret = process_non_multiprogram_general_loop(...)`
was shadowing the outer `ret` variable
- This caused the return value to always be 0, making ccextractor
report "No captions found" even when captions were extracted
- Also added `ret = 1` when captions are detected via counters,
needed for CEA-708 which writes directly via Rust
2. Missing private_data refresh in update_decoder_list_cinfo (lib_ccx.c):
- After PAT changes, dinit_cap() frees the teletext context and
NULLs dec_ctx->private_data
- But update_decoder_list_cinfo() returned existing decoder without
refreshing private_data from the new cap_info
- This caused all subsequent teletext processing to be skipped
- Fixed by updating dec_ctx->private_data when returning existing decoder
These fixes resolve Sample Platform test failures in CEA-708 and Teletext
categories where tests returned exit code 10 (no captions) unexpectedly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- demux.rs: Update dummy_demuxer() to explicitly initialize all fields
instead of using ..Default::default(), which is not allowed when the
struct implements Drop
- common.rs, demuxer.rs: Apply cargo fmt formatting fixes
This fixes the Rust test compilation error:
"cannot move out of type CcxDemuxer which implements the Drop trait"
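A reduced example of the restriction, assuming a hypothetical two-field `Demuxer` in place of the real `CcxDemuxer`: struct update syntax has to move the remaining fields out of the `Default::default()` temporary, and that is rejected once the type implements `Drop` and has non-`Copy` fields.

```rust
/// Hypothetical, heavily reduced stand-in for CcxDemuxer.
#[derive(Default)]
pub struct Demuxer {
    pub infd: i32,
    pub pids: Vec<u16>,
}

impl Drop for Demuxer {
    fn drop(&mut self) {
        // Resource cleanup would go here.
    }
}

pub fn dummy_demuxer() -> Demuxer {
    // Rejected by the compiler (E0509), because `pids` would have to be
    // moved out of a temporary whose type implements Drop:
    //     Demuxer { infd: -1, ..Default::default() }
    //
    // Listing every field explicitly avoids the partial move.
    Demuxer {
        infd: -1,
        pids: Vec::new(),
    }
}
```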
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add proper cleanup of xds_ctx in rcwt_loop() for --in=bin and --in=raw
formats. The general_loop() path already frees xds_ctx, but rcwt_loop()
was missing this cleanup, causing an 880-byte leak.
This fixes Valgrind tests 217 (--in=bin) and 218 (--in=raw).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- XDS encoder leak: Free xds_str when skipping subtitles with invalid timestamps
- XDS decoder cleanup: Add proper cleanup for leftover XDS strings in dinit_cc_decode()
- Remove incorrect free(p) after write_xds_string() - the pointer is stored
for later use by the encoder and must not be freed immediately
- Remove xds_ctx free from dinit_cc_decode() to avoid double-free
These fixes address the 100-byte XDS leak found in Valgrind test 114.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The embedded dec_sub struct in lib_cc_decode had its data field
allocated by write_cc_buffer() but never freed during cleanup.
Added cleanup in dinit_cc_decode() to:
- Free DVB bitmap data (data0/data1) if present
- Free the dec_sub.data field itself
This fixes ~1.7MB to ~2.6MB leaks seen in tests 89, 93, and 96.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit fixes several Valgrind-detected memory issues:
1. Use-after-free in Teletext during PAT changes:
- When parse_PAT() calls dinit_cap() to reinitialize stream info,
it freed the Teletext context but dec_ctx->private_data still
pointed to the freed memory
- Fixed by NULLing out dec_ctx->private_data in dinit_cap() when
freeing shared codec private data
- Also added NULL check in process_data() before calling teletext
functions to gracefully handle freed contexts
2. Uninitialized variables in general_loop():
- stream_mode, get_more_data, ret, and program_iter were declared
without initialization
- While logically set before use, Valgrind tracked them as
potentially uninitialized through complex control flow
- Fixed by initializing all variables at declaration
These fixes eliminate millions of Valgrind errors in teletext tests
(tests 78, 80) and uninitialized value warnings (tests 67, 84, 86).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit fixes several significant memory leaks found by Valgrind testing:
1. Dtvcc::new encoder leak (decoder/mod.rs):
- Previously always allocated a new encoder_ctx even when ctx.encoder
was not null, then threw away the allocation
- Fix: Only allocate when ctx.encoder is null
- Impact: Eliminated 55MB-331MB leaks per video processing run
2. ccxr_demuxer_isopen optimization (demuxer.rs):
- Previously copied entire demuxer structure just to check infd
- Fix: Directly check (*ctx).infd != -1
- Impact: Eliminated repeated allocations during file processing
3. ccxr_demuxer_close optimization (demuxer.rs):
- Previously did full copy roundtrip (C->Rust->C) to close a file
- Fix: Work directly on C struct, call close() and activity callback
- Impact: Eliminated copy-related allocations and leaks
4. CcxDemuxer Drop implementation (common_types.rs):
- pid_buffers and pids_programs contain raw pointers from Box::into_raw
- These were never freed when CcxDemuxer was dropped
- Fix: Implement Drop to free all non-null Box pointers
- Impact: Eliminates remaining FFI-related leaks
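The idea behind item 4 can be sketched like this. `PidTable` and its methods are hypothetical; the real `CcxDemuxer` fields and their element types are not reproduced here. The sketch only assumes that every non-null pointer in the table came from `Box::into_raw`.

```rust
use std::ptr;

/// Hypothetical table of raw pointers handed out across an FFI boundary.
pub struct PidTable {
    /// Each entry is either null or a pointer produced by `Box::into_raw`.
    pub buffers: Vec<*mut Vec<u8>>,
}

impl PidTable {
    pub fn new(slots: usize) -> Self {
        PidTable { buffers: vec![ptr::null_mut(); slots] }
    }

    /// Store a new buffer, releasing whatever the slot held before.
    pub fn set(&mut self, slot: usize, data: Vec<u8>) {
        self.release(slot);
        self.buffers[slot] = Box::into_raw(Box::new(data));
    }

    fn release(&mut self, slot: usize) {
        let p = std::mem::replace(&mut self.buffers[slot], ptr::null_mut());
        if !p.is_null() {
            // SAFETY: non-null entries only ever come from Box::into_raw in
            // `set`, and the slot is nulled before the box is reclaimed.
            unsafe { drop(Box::from_raw(p)) };
        }
    }
}

impl Drop for PidTable {
    fn drop(&mut self) {
        // Without this, every remaining non-null entry would leak.
        for slot in 0..self.buffers.len() {
            self.release(slot);
        }
    }
}
```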
Test results show dramatic improvement:
- Test 24: 55MB leak -> 0 bytes (PERFECT)
- Test 26: 9.75MB leak -> 0 bytes (PERFECT)
- Test 27: 237MB leak -> 0 bytes (PERFECT)
- Test 28: 331MB leak -> 0 bytes (PERFECT)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit fixes critical memory issues found during comprehensive
Valgrind testing:
1. **Use-after-free in inputfile array** (common.rs):
- Problem: `copy_from_rust` was called multiple times (parse_parameters,
demuxer_open, demuxer_close), and each call freed and reallocated the
inputfile array. C code holding references to the old array would then
access freed memory.
- Fix: Only set inputfile on the first call (when inputfile is null).
Subsequent calls skip modifying inputfile since it shouldn't change
during processing.
2. **Memory leak in enc_cfg strings** (common.rs):
- Problem: Each call to `copy_from_rust` allocated new encoder config
strings without freeing the old ones, causing 1,536 bytes leaked per
demuxer open/close cycle.
- Fix: Only set enc_cfg on the first call (when output_filename is null).
Encoder config is static and doesn't need to be re-synced.
3. **Uninitialized memory in telxcc_init** (telxcc.c):
- Problem: `malloc` was used to allocate TeletextCtx but not all fields
were explicitly initialized, causing Valgrind to report 400+ errors
about conditional jumps on uninitialized values.
- Fix: Changed to `calloc` to zero-initialize all fields.
**Valgrind results improvement (Test 3):**
- Errors: 458 → 21 (95% reduction)
- Definitely lost: 2,304 → 768 bytes (67% reduction)
- Use-after-free bugs: Eliminated
- Double-free bugs: Eliminated
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Addresses memory issues identified during Phase 5 (Runtime Analysis) of
the bug analysis plan using Valgrind memory checking.
## Changes
### C Code (Uninitialized Memory)
- ccx_demuxer.c: Use calloc() instead of malloc() in init_demuxer() to
ensure all struct fields are zero-initialized before use
- lib_ccx.c: Use calloc() instead of malloc() in init_decoder_setting()
for consistent initialization
### Rust FFI Code (Memory Leaks)
- utils.rs: Add helper functions for proper FFI string memory management:
- free_rust_c_string(): Free a Rust-allocated CString
- replace_rust_c_string(): Free old string before allocating new one
- free_rust_c_string_array(): Free an array of Rust-allocated CStrings
- common.rs: Update copy_from_rust() to properly manage string memory:
- Free old strings before allocating new ones for all string fields
- Add free_encoder_cfg_strings() to clean up encoder config strings
- Free old inputfile array before allocating new one
## Valgrind Results Comparison
| Metric               | Before  | After   | Improvement   |
|----------------------|---------|---------|---------------|
| Definitely lost      | 2,371 B | 1,536 B | 35% reduction |
| Indirectly lost      | 212 B   | 0 B     | 100% fixed    |
| Uninitialized errors | 131,095 | 0       | 100% fixed    |
The remaining 1,536 bytes are from services_charsets array in
EncoderConfig (low priority, rare use case).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The `to_ctype()` implementations for `DecoderDtvccSettings` and
`Decoder608Settings` were creating temporaries on the stack and
returning pointers to them. These pointers became dangling after
the function returned, causing memory corruption when
`copy_from_rust()` was called.
This fix:
- Preserves the original C-managed `report` and `timing` pointers
in `copy_from_rust()` instead of overwriting them with dangling
pointers to temporaries
- Adds explicit `settings_dtvcc.timing = NULL` initialization in
`init_options()` for completeness
Before this fix, valgrind reported:
- "Invalid write of size 4" in `dtvcc_init` (4016 bytes below stack
pointer)
- "Invalid read" errors in `copy_to_rust` / `DecoderDtvccSettings::
from_ctype`
After this fix, these critical memory corruption errors are resolved.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This fixes a double-free bug that caused CCExtractor to crash with
exit code 134 (SIGABRT) when processing teletext streams.
## Root Cause
The teletext context (TeletextCtx) pointer was shared between two
structures:
- `dec_ctx->private_data` (decoder context)
- `cinfo->codec_private_data` (capture info in cinfo_tree)
When `general_loop()` ended, it called `telxcc_close()` which freed
the TeletextCtx and NULLed `dec_ctx->private_data`. However, the
shared pointer in `cinfo->codec_private_data` was NOT NULLed.
Later, during cleanup in `dinit_cap()`, the code would find the
non-NULL `cinfo->codec_private_data` and attempt to free it again,
causing a double-free crash.
## The Fix
After `telxcc_close()` frees the teletext context in `general_loop()`,
iterate through all cinfo entries and NULL out any that shared the
same pointer. This prevents `dinit_cap()` from attempting to free
already-freed memory.
## Regression
This bug was exposed by commit 7e1a01447 which added cleanup code
to `dinit_cap()` to free `codec_private_data`. The `telxcc_close()`
call in `general_loop()` has existed since 2015, but the double-free
only became possible after the new cleanup code was added.
## Testing
Validated fix against all 27 teletext-related CI tests that were
failing with exit code 134:
Teletext section (21 tests): 63-83 - all PASS
DVB section: 18, 19 - all PASS
Other teletext tests: 224, 234, 235, 236 - all PASS
Verified with valgrind that no "Invalid free" or "double free"
errors occur after the fix.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit addresses Issue #243 where DVB subtitles from Spanish
broadcasts were producing corrupt/garbled OCR output like
"alajentiegaranual dep jemios" instead of "a la entrega anual de premios".
Root cause analysis:
1. Image preprocessing was degrading quality - pixContrastNorm was
causing issues for some DVB sources
2. Default quantization mode (ocr_quantmode=1) was too aggressive,
reducing images to just 3 colors which lost important detail
Changes:
- Remove pixContrastNorm calls from ocr.c (both main OCR and color
detection passes) - these were causing more harm than good
- Change default ocr_quantmode from 1 to 0 (no quantization) in both
C code (ccx_common_option.c) and Rust code (options.rs)
- Add NULL checks in dvbsub_close_decoder() and telxcc_close() for
safety
- Add proper cleanup of codec_private_data pointers in lib_ccx.c and
ts_info.c to prevent double-free crashes
Testing performed:
- Test 21 (English DVB): Completes in ~1 second with good OCR quality
- Test 239 (DVB timing): All 8 subtitles have correct timing
- Spanish DVB (Issue #243): Now produces readable text like
"¡Bienvenidos a la entrega anual de premios" instead of garbage
Users can still use --quant 1 to restore the old quantization behavior
if needed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This is the autoconf equivalent of the CMake fix in PR #1760.
When building with HARDSUBX enabled but OCR disabled, the autoconf
build system was missing explicit tesseract/leptonica linking in the
HARDSUBX block. While configure.ac sets OCR_IS_ENABLED when HARDSUBX
is enabled (so it would work via the OCR block), this change makes
the dependency explicit and consistent with the CMake fix.
Related: PR #1760, Issue #1719
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- ccx_encoders_ssa.c: Fix combined malloc check pattern
- Check each allocation separately
- Free first allocation if second fails before calling fatal
- ccx_encoders_webvtt.c: Fix 2 combined check patterns
- write_stringz_as_webvtt: Separate checks with proper cleanup
- write_cc_bitmap_as_webvtt: Separate calloc checks with cleanup
- ccx_encoders_smptett.c: Fix combined malloc check pattern
- Check each allocation separately
- Free first allocation if second fails before calling fatal
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- asf_functions.c: Fix 2 unsafe realloc patterns
- Use temporary pointer to preserve original buffer reference
- Free original buffer before calling fatal on allocation failure
- telxcc.c: Fix 2 unsafe realloc patterns in teletext buffer functions
- page_buffer_add_string: Use safe realloc pattern with temp pointer
- ucs2_buffer_add_char: Use safe realloc pattern with temp pointer
- ccx_encoders_srt.c: Fix potential memory leak in write_stringz_as_srt
- Check each allocation separately
- Free successful allocation before fatal if second allocation fails
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- lib_ccx.c: Fix memory leaks in init_libraries error paths
- Add proper cleanup for report_608, EPG buffers, and ctx when
init_decoder_setting fails
- Add comprehensive cleanup at the `end:` label for when init_ctx_outbase fails
- utility.c: Fix unsafe realloc in str_reallocncat
- Preserve original pointer and free it on realloc failure
- Prevents memory leak when realloc returns NULL
- avc_functions.c: Fix unsafe realloc patterns in user_data_registered_itu_t_t35
- Use temporary pointer for realloc result
- Free original buffer before calling fatal on allocation failure
- Fixes two instances of unsafe realloc pattern
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Batch 2.2 memory fixes:
dvb_subtitle_decoder.c:
- Fix memory leak in write_dvb_sub: free rect->data1 and rect before fatal
when data0 allocation fails
general_loop.c:
- Fix unsafe realloc in rcwt_loop: use temp variable to preserve original
parsebuf pointer on failure
- Fix memory leak: free parsebuf on early return in rcwt_loop
ts_functions.c:
- Fix unsafe realloc in copy_payload_to_capbuf: use temp variable to
preserve original cinfo->capbuf on failure
- Fix unsafe realloc in hauppauge buffer handling: free original buffer
before fatal on failure
ccx_decoders_608.c:
- Fix two unsafe realloc patterns in write_cc_buffer_as_transcript and
write_cc_buffer_to_gui: use temp variable to preserve original sub->data
on failure
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(dvb): Multiple fixes for DVB subtitle extraction from Chinese broadcasts (#224)
This commit addresses multiple issues with DVB subtitle extraction reported in #224:
1. **PMT parsing crash fix** (ts_tables.c):
- Added minimum length check (16 bytes) to prevent out-of-bounds access
- Added bounds check before memcpy to prevent buffer overflow when section > 1021 bytes
2. **Negative subtitle timing fix** (general_loop.c):
- For DVB subtitle streams, properly initialize min_pts from audio/subtitle PTS
- This fixes the issue where all timestamps were negative (~95000 seconds off)
3. **OCR improvements** (ocr.c):
- Fixed ignore_alpha_at_edge() which could create invalid crop windows
- Added image inversion for DVB subtitles (light text on dark background)
to improve Tesseract OCR accuracy
- Added contrast normalization to further improve character recognition
- Fixed nofontcolor check to respect --no-fontcolor parameter
- Added iteration safety limit in color detection loop
4. **--ocrlang parameter fix** (Rust files):
- Changed ocrlang from Language enum to String to accept Tesseract language
names directly (e.g., "chi_tra", "chi_sim", "eng")
- Added case-insensitive matching for --dvblang parameter
- Added better error messages for invalid language codes
Tested with 12GB Chinese DVB broadcast file:
- Timing: All timestamps now positive (0.235s, 2.594s, etc.)
- OCR: ~80-90% accuracy with chi_tra traineddata (improved from ~70%)
- No crashes during full file processing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ocr): Fix crashes in DVB subtitle color detection
Two issues fixed in the OCR color detection code:
1. Tesseract crash during iteration:
- The color detection pass used raw color images without preprocessing
- Tesseract expects dark text on light background, but DVB subtitles
have light text on dark background
- Added grayscale conversion, inversion, and contrast enhancement
(same preprocessing as the main OCR pass)
2. Heap corruption in histogram calculation:
- The histogram loop had no bounds checking on array accesses
- Tesseract could return invalid bounding boxes causing buffer overflows
- Added validation of bounding box coordinates before processing
- Added safe index checking for copy->data and histogram arrays
Also added skip_color_detection label for clean error handling and
proper cleanup of the preprocessed image.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(dvb): Fix zero-duration subtitles and overlaps during PTS jumps
Add start_pts field to cc_subtitle struct to track raw PTS values
independent of FTS timeline resets. Modify end_time calculation in
dvbsub_handle_display_segment() to cap duration at 4 seconds when
PTS jumps cause timeline discontinuities, preventing zero-duration
and overlapping subtitles.
Also update .gitignore to exclude plans/ directory and temp files.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Apply clang-format to all C/H files in src/
- Apply cargo fmt to Rust code
- Update Cargo.lock with latest compatible dependency versions
- Add 24 new entries to CHANGES.TXT for recent fixes and features
Changes in CHANGES.TXT cover:
- CEA-708 bounds checks and UTF-16BE encoding fixes
- New --ttxtforcelatin option for Teletext
- TS files without PAT/PMT fallback support
- Timing accuracy improvements across MP4/MPEG/TS
- Memory safety improvements (null checks, buffer overruns)
- Multi-file processing fixes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(rust): Add bounds checks to prevent panic on malformed CEA-708 data
Fixes #1616 - Segmentation fault when extracting from MP4 remuxed from HLS
The CEA-708 decoder could panic when processing truncated or malformed
caption data blocks:
1. Fixed EXT1 command handling in process_service_block():
- Changed &block[1..] to &block[(i+1)..] for correct slice offset
- Added bounds check before accessing the next byte after EXT1
2. Added bounds checks in handle_extended_char():
- Check for empty block before accessing block[0]
- Check block.len() >= 2 before accessing block[1] for C3 commands
3. Removed unnecessary `as i64` cast in es/pic.rs to fix clippy warning
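The bounds checks in items 1 and 2 follow a common pattern: use `get` instead of indexing so that truncated input yields `None` or an empty slice rather than a panic. The helpers below are a hypothetical sketch of that pattern, not the functions in the decoder; only the EXT1 code value (0x10) is taken from CEA-708.

```rust
/// CEA-708 C0 code that introduces an extended character or command.
pub const EXT1: u8 = 0x10;

/// Returns the byte that follows an EXT1 code at index `i`, or `None`
/// when `i` is not an EXT1 or the block ends right after it.
pub fn ext1_argument(block: &[u8], i: usize) -> Option<u8> {
    if block.get(i) != Some(&EXT1) {
        return None;
    }
    block.get(i + 1).copied()
}

/// Returns everything after position `i` (the `&block[(i+1)..]` slice),
/// or an empty slice when `i + 1` is past the end of the block.
pub fn after_ext1(block: &[u8], i: usize) -> &[u8] {
    match i.checked_add(1) {
        Some(start) => block.get(start..).unwrap_or(&[]),
        None => &[],
    }
}
```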
Added 4 unit tests to verify the bounds checking:
- test_handle_extended_char_empty_block
- test_handle_extended_char_c3_insufficient_bytes
- test_process_service_block_ext1_at_end
- test_process_service_block_ext1_with_truncated_c3
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(rust): cast c_long to i64 in pic.rs for Windows compatibility
On Windows, c_long is i32 (32-bit), while on 64-bit Linux it's i64 (64-bit).
The addition of fts_at_gop_start + frame_offset_ms was failing on Windows
because fts_at_gop_start (c_long = i32) couldn't be added to frame_offset_ms (i64).
Added explicit cast to i64 with #[allow(clippy::unnecessary_cast)] since
the cast is necessary for Windows even though it's redundant on Linux.
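A minimal sketch of the cast, using the variable names from the description above. The function name `frame_fts` is hypothetical, and the surrounding pic.rs code is not reproduced.

```rust
use std::os::raw::c_long;

/// Adds a millisecond offset to a GOP start time that arrives as `c_long`.
pub fn frame_fts(fts_at_gop_start: c_long, frame_offset_ms: i64) -> i64 {
    // c_long is i32 on Windows and i64 on 64-bit Linux, so the cast is a
    // no-op on Linux (hence the lint) but required for Windows builds.
    #[allow(clippy::unnecessary_cast)]
    let base = fts_at_gop_start as i64;
    base + frame_offset_ms
}
```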
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Fixes #1701
The `is_decoder_processed_enough()` function had a bug where it would always
return FALSE in multiprogram mode due to the condition:
`dec_ctx->processed_enough == CCX_TRUE && ctx->multiprogram == CCX_FALSE`
This caused the "Error in switch_to_next_file()" warning to trigger incorrectly
for files without captions or in multiprogram mode.
Changes:
- Fix `is_decoder_processed_enough()` in C and Rust:
- In single-program mode: return TRUE if ANY decoder has processed enough
- In multiprogram mode: return TRUE only if ALL decoders have processed enough
- Add check for empty decoder list in `switch_to_next_file()`:
- If no decoders exist (no captions found), suppress the premature ending warning
- This is a normal condition, not an error
- Update Rust tests to verify the new behavior
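The any/all rule can be sketched as below. This is an illustration only: the real function walks a decoder list rather than a slice of booleans, and returning `false` for an empty list in multiprogram mode is an assumption made here, not something stated above.

```rust
/// `processed[i]` is true when decoder `i` has processed enough.
pub fn is_decoder_processed_enough(processed: &[bool], multiprogram: bool) -> bool {
    if multiprogram {
        // All decoders must be done. An empty list counts as "not done"
        // (assumption; `all` alone would return true for an empty list).
        !processed.is_empty() && processed.iter().all(|&p| p)
    } else {
        // Single-program mode: one finished decoder is sufficient.
        processed.iter().any(|&p| p)
    }
}
```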
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(708): Write consistent 2-byte UTF-16BE encoding for CEA-708 captions
Previously, the write_utf16_char (C) and write_char (Rust) functions
wrote 1 byte for ASCII characters (high byte = 0) and 2 bytes for
non-ASCII characters. This created an invalid mix of 8-bit and 16-bit
values that iconv/encoding_rs couldn't convert properly when UTF-16BE
encoding was specified.
The fix always writes 2 bytes per character, ensuring consistent
UTF-16BE encoding. This allows iconv to properly convert the data to
UTF-8, fixing garbled output for Japanese and Chinese captions.
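A small sketch of the "always two bytes" rule. The function names are hypothetical and do not match write_utf16_char or write_char; the point is only that ASCII code units keep their zero high byte.

```rust
/// Appends one UTF-16 code unit as two big-endian bytes, including the
/// 0x00 high byte for ASCII.
pub fn write_utf16be_unit(out: &mut Vec<u8>, unit: u16) {
    out.extend_from_slice(&unit.to_be_bytes());
}

/// Encodes a whole string as UTF-16BE.
pub fn encode_utf16be(text: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(text.len() * 2);
    for unit in text.encode_utf16() {
        write_utf16be_unit(&mut out, unit);
    }
    out
}
```

With this rule "A" becomes `00 41` rather than a lone `41`, so every character occupies a whole 16-bit unit and a converter never reads across a character boundary.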
Before fix (garbled):
人々が私を知‰挰弰栰䴰Ź섰漠時間管理につい‰晦<U+F830>䐰昰䐰縰
After fix (correct):
人々が私を知 ったとき、私は 時間管理につい て書いています
Fixes #1451
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* test(708): Update write_char test to expect 2-byte UTF-16BE output
The test was checking for the old (incorrect) behavior where ASCII
characters were written as 1 byte. The fix for issue #1451 correctly
changed write_char to always write 2 bytes for proper UTF-16BE encoding.
Updated the test to match this correct behavior.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Some broadcast streams incorrectly signal Cyrillic character set (via
X/28 or M/29 packets) when the actual content is Latin text. This causes
garbled output where Latin text like "No. Not back then, anyway." appears
as Cyrillic "Но. Нот бацк тхен, анiваi."
This fix adds a new --ttxtforcelatin option that forces the teletext G0
character set to Latin, ignoring any Cyrillic designation in the stream.
Root cause: The broadcast contained triplet 0x1290 which has bits 10-13
set to 0x1 (Cyrillic family) and bits 7-9 set to 0x5 (Ukrainian option),
causing CCExtractor to use CYRILLIC3 charset instead of Latin.
Usage: ccextractor input.ts --ttxtforcelatin -o output.srt
Before fix (without option):
Subtitle 3: Но. Нот бацк тхен, анiваi.
After fix (with --ttxtforcelatin):
Subtitle 3: No. Not back then, anyway.
Fixes #1395
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Some DVR recordings (e.g., Channel Master DVR+) create transport stream
files that contain valid video and audio data but lack PAT (Program
Association Table) and PMT (Program Map Table). Without these tables,
CCExtractor couldn't identify which PIDs contain video streams with
embedded captions.
This change adds a fallback mechanism that:
1. Enables packet analysis mode when no PAT is found after reading ~1000
TS packets (188KB)
2. Detects video streams by analyzing PES headers (stream_id 0xE0-0xEF)
3. Identifies stream type (MPEG-2 vs H.264) from elementary stream data
4. Registers detected video streams for caption extraction
5. Also detects GA94 caption markers to identify caption-carrying PIDs
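Step 2 can be illustrated with a short check on the start of a PES packet. The helper name `is_video_pes` is hypothetical and the sketch ignores TS packet framing; it only tests the 0x000001 start code prefix and the video stream_id range named above.

```rust
/// Returns true when `payload` begins with a PES start code prefix
/// (00 00 01) followed by a video stream_id in 0xE0..=0xEF.
pub fn is_video_pes(payload: &[u8]) -> bool {
    matches!(payload, [0x00, 0x00, 0x01, id, ..] if (0xE0..=0xEF).contains(id))
}
```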
The fix allows CCExtractor to extract CEA-608/708 captions from TS files
without PAT/PMT, matching the behavior when FFmpeg is enabled.
Fixes #805
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
When a PTS discontinuity (jump) is detected, the code updates fts_offset
and min_pts to establish a new timeline. However, it was not setting
pts_set back to MinPtsSet, which meant fts_now calculation (which only
runs when pts_set == MinPtsSet) would stop working. This caused all
timestamps after the PTS jump to be stuck.
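A simplified model of the state involved. The struct layout, the `Received` variant, and the exact fts_now formula (90 kHz ticks to milliseconds plus fts_offset) are assumptions for illustration; only the names pts_set, MinPtsSet, min_pts, fts_offset, and fts_now come from the description above.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum PtsSet {
    No,
    Received,
    MinPtsSet,
}

pub struct Timing {
    pub pts_set: PtsSet,
    pub min_pts: i64,
    pub fts_offset: i64,
    pub fts_now: i64,
}

impl Timing {
    /// Start a new timeline at `new_pts`, continuing from the last good time.
    pub fn on_pts_jump(&mut self, new_pts: i64) {
        self.fts_offset = self.fts_now;
        self.min_pts = new_pts;
        // The step described as missing: without it `update` stops advancing.
        self.pts_set = PtsSet::MinPtsSet;
    }

    /// fts_now is only recomputed while pts_set == MinPtsSet.
    pub fn update(&mut self, current_pts: i64) {
        if self.pts_set == PtsSet::MinPtsSet {
            self.fts_now = (current_pts - self.min_pts) / 90 + self.fts_offset;
        }
    }
}
```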
This fixes issue #1277 where DVD VOB files with PTS discontinuities
(common at chapter boundaries) would stop extracting captions after
about 6 minutes. Version 0.84 worked correctly, but 0.85+ had this
regression.
Closes #1277
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
Follow-up to PR #1769 - use the defined enum constant for HEVC stream
type (0x24) instead of magic numbers for better code maintainability.
Also simplifies the case statement in get_printable_stream_type() by
removing redundant assignment since the enum value passes through
unchanged.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- ccxr_process_cc_data: Add null pointer checks for dec_ctx, data, and
dec_ctx.dtvcc before dereferencing. Also check cc_count > 0.
- ccxr_parse_parameters: Add null check for argv pointer and use
to_string_lossy() instead of expect() to handle invalid UTF-8
gracefully without panicking.
These changes prevent potential crashes when FFI functions are called
with invalid arguments from C code.
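The argv handling can be sketched as follows. `collect_args` is a hypothetical helper, not the body of ccxr_parse_parameters; skipping null entries is an assumption made for the sketch.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Converts a C argv array into owned Rust strings without panicking.
///
/// # Safety
/// `argv` must be null or point to `argc` pointers, each of which is
/// null or points to a NUL-terminated string.
pub unsafe fn collect_args(argc: usize, argv: *const *const c_char) -> Vec<String> {
    if argv.is_null() {
        return Vec::new();
    }
    let mut args = Vec::with_capacity(argc);
    for i in 0..argc {
        // SAFETY: the caller guarantees `argv` has `argc` readable entries.
        let p = unsafe { *argv.add(i) };
        if p.is_null() {
            continue; // assumption: null entries are skipped
        }
        // SAFETY: the caller guarantees non-null entries are NUL-terminated.
        let s = unsafe { CStr::from_ptr(p) };
        // Invalid UTF-8 becomes U+FFFD instead of triggering a panic.
        args.push(s.to_string_lossy().into_owned());
    }
    args
}
```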
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Fixes #1693 - ccextractorwinfull.exe can't print captions to stdout
The CEA-708 decoder crashed on Windows when using --stdout because the
dtvcc_writer was not properly initialized for stdout output:
1. Fixed Windows stdout handle initialization in ccx_encoders_common.c:
- Use GetStdHandle(STD_OUTPUT_HANDLE) instead of NULL for fhandle
- This allows the Rust writer to detect stdout mode properly
2. Changed env_logger target from Stdout to Stderr in lib.rs:
- Debug messages no longer pollute stdout when using --stdout
- This prevents mixing debug output with subtitle content
3. Removed redundant debug statement in service_decoder.rs:
- The bare `debug!("{}", self.current_window)` was noisy and
duplicated by a more detailed debug statement below it
Added tests:
- test_writer_output_with_valid_fd: Verifies stdout mode works
- test_writer_output_missing_filename_and_fd: Verifies proper error handling
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
On Windows, c_long is i32, while on 64-bit Linux it's i64. This causes
a type mismatch when adding fts_at_gop_start (c_long) to
frame_offset_ms (i64). Fix by explicitly casting to i64.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace `.unwrap()` and `.expect()` calls with safe alternatives to prevent
Rust panics when processing multiple files with different characteristics
(e.g., DVD-type followed by HDTV-type).
Changes:
- Use `unwrap_or(0)` for all type conversions that could fail
- Handle RwLock poisoning gracefully in apply_timing_info/write_back_from_timing_info
- Add fps validation and millis capping in GopTimeCode::new()
- Add fallback calculation in ccxr_calculate_ms_gop_time when GopTimeCode
creation fails
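Two of the patterns above in miniature. Both helpers are hypothetical; in particular, recovering the guard from a poisoned lock with `into_inner` is one possible reading of "gracefully", not necessarily what the timing code does.

```rust
use std::convert::TryFrom;
use std::sync::RwLock;

/// Fallible conversion that falls back to 0 instead of panicking.
pub fn to_u32_or_zero(value: i64) -> u32 {
    u32::try_from(value).unwrap_or(0)
}

/// Reads a value even if another thread panicked while holding the lock.
pub fn read_fts(lock: &RwLock<i64>) -> i64 {
    // A poisoned lock still holds data; take the guard rather than unwrap().
    *lock.read().unwrap_or_else(|poisoned| poisoned.into_inner())
}
```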
Fixes #1377
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix two bugs that prevented multi-file processing from working:
1. In common.rs: `options.inputfile.iter()` was iterating over the
Option itself (yielding 0 or 1 items) instead of the Vec contents,
causing num_input_files to always be 1.
2. In parser.rs: append_file_to_queue() was using vec.len() as the
index for new files after resizing with empty strings, causing
files to be placed at positions 0, 10, 20... instead of 0, 1, 2...
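The first bug is easy to reproduce in isolation. The sketch assumes `inputfile` is an `Option<Vec<String>>`, as the description implies; the function names are hypothetical.

```rust
/// Buggy count: `Option::iter` yields the Vec itself at most once,
/// so the result is 0 or 1 regardless of how many files there are.
pub fn count_inputs_buggy(inputfile: &Option<Vec<String>>) -> usize {
    inputfile.iter().count()
}

/// Fixed count: look inside the Option and count the Vec's elements.
pub fn count_inputs_fixed(inputfile: &Option<Vec<String>>) -> usize {
    inputfile.as_ref().map_or(0, |files| files.len())
}
```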
Fixes #1810
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Document Fix 7: MP4 c608 track timing and garbage frame detection
- Mark all regressions as fixed or documented as known limitations
- Update status to "Ready for Merge"
- MPEG-PS 66ms offset documented as known limitation (FFmpeg uses
different timing reference for MPEG-PS vs TS containers)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix MP4 c608/c708 caption tracks by setting frame type to I-frame
before calling set_fts(). Without video frames, frame type would stay
Unknown and min_pts would never be set, causing broken timestamps.
- Fix premature pts_set = MinPtsSet assignment. Now only set after
min_pts is actually set, preventing fts_now calculation with
uninitialized min_pts (0x01FFFFFFFF) which caused negative timestamps.
- Add garbage frame detection threshold (100ms). When an I-frame arrives:
- If gap between pending_min_pts and I-frame PTS > 100ms: use I-frame
PTS (garbage leading frames from truncated GOP)
- If gap <= 100ms: use pending_min_pts (valid B-frames)
- Track pending_min_pts for all frames (not just unknown type) to enable
proper garbage vs valid B-frame detection.
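The threshold decision can be sketched as a single function. `choose_min_pts` is a hypothetical name, and the sketch assumes PTS values in 90 kHz ticks with integer truncation when converting the gap to milliseconds.

```rust
/// Gap above which leading frames are treated as garbage from a truncated GOP.
pub const GARBAGE_GAP_MS: i64 = 100;

/// Picks the timing reference when the first I-frame arrives.
pub fn choose_min_pts(pending_min_pts: i64, iframe_pts: i64) -> i64 {
    let gap_ms = (iframe_pts - pending_min_pts) / 90;
    if gap_ms > GARBAGE_GAP_MS {
        // Leading frames are too far ahead of the I-frame: ignore them.
        iframe_pts
    } else {
        // Small gap: the earlier frames are valid B-frames, keep their PTS.
        pending_min_pts
    }
}
```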
Results:
- 5df914ce...mp4: 666ms -> 0ms (FIXED)
- c032183e...ts: 284ms -> 0ms (FIXED)
- addf5e2f...ts: 68ms -> ~1ms (FIXED)
- 80848c45...mpg: remains 66ms (FFmpeg uses different reference for MPEG-PS)
- da904de3...mpg: remains 66ms (FFmpeg uses different reference for MPEG-PS)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
For elementary streams with GOP timing (use_gop_as_pts=1), fts_now was
only updated when a GOP header was parsed, not for each frame. This
caused all frames within a GOP to have the same timestamp, resulting
in broken caption timing (1ms, 9ms, 17ms instead of proper times).
The fix calculates fts_now for each frame based on:
fts_at_gop_start + (frames_since_last_gop * 1000 / fps)
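The formula above, written out as a function. The name `frame_fts_now`, the use of `f64` for fps, truncation toward zero, and the guard against a non-positive fps are assumptions of this sketch.

```rust
/// Per-frame timestamp in milliseconds for GOP-timed elementary streams.
pub fn frame_fts_now(fts_at_gop_start: i64, frames_since_last_gop: i64, fps: f64) -> i64 {
    if fps <= 0.0 {
        // Without a usable frame rate, fall back to the GOP start time.
        return fts_at_gop_start;
    }
    fts_at_gop_start + (frames_since_last_gop as f64 * 1000.0 / fps) as i64
}
```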
Test results for dc7169d7...h264 (raw MPEG-2 elementary stream):
- Before: 1ms, 9ms, 17ms, 25ms (broken)
- After: 2867ms, 4634ms, 6368ms (correct range)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When transitioning from pop-on to roll-up mode, the first CR command
(with only 1 line visible, changes=0) was resetting ts_start_of_current_line
to -1. This caused the next caption's start time to be set when characters
were typed (~133ms later), not when the CR command was received.
The fix preserves the CR time when rollup_from_popon=1 and changes=0,
ensuring the caption start time matches when the display state changed.
Test results:
- c83f765c...ts: 134ms offset → 1ms (fixed)
- 725a49f8...mpg: 133ms offset → 0ms (fixed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When transitioning from pop-on to roll-up mode, CCExtractor was setting
the caption start time when the first character was typed. FFmpeg uses
the time when the display state changed to show multiple lines. This
caused the first roll-up caption after a mode switch to be timestamped
too early.
Changes:
- Add rollup_from_popon flag to track mode transitions
- Reset ts_start_of_current_line on mode switch
- Defer start time until CR causes scrolling in transition mode
- Use ts_start_of_current_line when buffer scrolls during transition
Test results for 725a49f8...mpg:
- Before: 484ms early
- After: 133ms late (~4 frames, acceptable)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous timing fixes were being bypassed because set_fts() is called
multiple times per frame - first from the PES/TS layer (with unknown frame
type) and later from the ES parsing layer (with known frame type). The first
call was setting min_pts before we knew whether it was an I-frame.
Changes:
- When frame type is unknown, track PTS in pending_min_pts but DON'T set min_pts
- Only set min_pts when frame type is known AND it's an I-frame
- Added unknown_frame_count for fallback handling of H.264 streams
- After 100+ calls with unknown frame type, use pending_min_pts as fallback
Test results:
- 8e8229b88bc6...mpg: 101ms -> 1ms offset ✓
- c032183ef018...ts: 284ms -> 0ms offset ✓
- add511677cc42...vob: 366ms -> 34ms offset ✓
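The deferred min_pts selection described in the changes above might look roughly like this. It is a sketch under stated assumptions: the struct, enum, and `NO_PTS` marker are hypothetical, "100+ calls" is read as "at least 100", and keeping the smallest pending PTS (rather than the first) is a guess at the intent.

```c
#include <stdint.h>

#define NO_PTS INT64_MIN           /* hypothetical "not set yet" marker */
#define UNKNOWN_FRAME_FALLBACK 100 /* threshold named in the commit message */

enum frame_type { FRAME_UNKNOWN, FRAME_I, FRAME_P, FRAME_B };

struct timing_state {
    int64_t min_pts;              /* timing reference, set once */
    int64_t pending_min_pts;      /* candidate seen before the type was known */
    unsigned unknown_frame_count; /* calls made with FRAME_UNKNOWN */
};

static void timing_init(struct timing_state *t)
{
    t->min_pts = NO_PTS;
    t->pending_min_pts = NO_PTS;
    t->unknown_frame_count = 0;
}

static void update_min_pts(struct timing_state *t, int64_t pts,
                           enum frame_type type)
{
    if (t->min_pts != NO_PTS)
        return; /* reference already chosen */

    if (type == FRAME_UNKNOWN) {
        /* Remember the candidate but do not commit to it yet. */
        if (t->pending_min_pts == NO_PTS || pts < t->pending_min_pts)
            t->pending_min_pts = pts;
        /* Fallback for streams whose frame type never becomes known. */
        if (++t->unknown_frame_count >= UNKNOWN_FRAME_FALLBACK)
            t->min_pts = t->pending_min_pts;
        return;
    }

    if (type == FRAME_I)
        t->min_pts = pts; /* only an I-frame may set the reference */
}
```

The early call from the PES/TS layer then only records a candidate, and the later call from the ES layer decides whether the frame qualifies.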
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add seen_known_frame_type and pending_min_pts fields to track frame
types during initial stream parsing. This infrastructure supports
distinguishing between MPEG-2 streams (where frame types are set) and
H.264 in MPEG-PS (where frame types remain unknown).
Current behavior maintains compatibility by allowing min_pts to be set
from any frame type, which correctly handles both stream types and
matches FFmpeg timing output.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Streams recorded mid-broadcast often start with trailing B/P frames from
a previous GOP. These frames have earlier PTS values than the first
decodable I-frame.
Previously, CCExtractor set min_pts from the first PES packet with a PTS,
which could be an undecodable B/P frame. FFmpeg's cc_dec uses the first
decoded frame (necessarily an I-frame) as its timing reference.
This caused consistent timing offsets. For example, c032183ef01...ts had
a 284ms offset because:
- First PES packet PTS: 2508198438
- First I-frame PTS: 2508223963
- Difference: 25525 ticks = 284ms
Changes:
- timing.rs: Only set min_pts when current_picture_coding_type == IFrame
- ccx_decoders_common.c: Don't increment cb_field counters for container
formats (CCX_H264, CCX_PES) since frame PTS is already correct
- sequencing.c: Include CCX_PES in reset_cb logic alongside CCX_H264
Test results for c032183ef01...ts:
- Before: CCExtractor 1,836ms vs FFmpeg 1,552ms = 284ms offset
- After: CCExtractor 1,552ms vs FFmpeg 1,552ms = 0ms offset
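The 284ms figure follows from the 90 kHz MPEG system clock used for PTS values. A small sketch of the conversion, assuming round-to-nearest and non-negative tick counts (the helper name is hypothetical):

```c
#include <stdint.h>

#define MPEG_CLOCK_FREQ 90000 /* PTS ticks per second */

/* Convert a non-negative PTS difference in ticks to milliseconds,
 * rounding to the nearest millisecond. */
static int64_t pts_ticks_to_ms(int64_t ticks)
{
    return (ticks * 1000 + MPEG_CLOCK_FREQ / 2) / MPEG_CLOCK_FREQ;
}
```

For the sample above, `pts_ticks_to_ms(2508223963LL - 2508198438LL)` gives 284, matching the observed offset.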
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- EPG_output_live: add NULL checks for filename/finalfilename malloc,
add fopen failure check
- EPG_DVB_decode_string: add NULL checks for decode_buffer and out
malloc
- EPG_decode_content_descriptor: add NULL check for categories malloc
- EPG_decode_parental_rating_descriptor: add NULL check for ratings
malloc
- EPG_decode_extended_event_descriptor: add NULL checks for net and
extended_text malloc
- EPG_ATSC_decode_multiple_string: add NULL checks for event_name and
text malloc
- parse_EPG_packet: add NULL check for buffer malloc, fix unsafe
realloc that lost original pointer on failure
- EPG_decode_short_event_descriptor: fix memory leak - free event_name
on early return
- EPG_DVB_decode_EIT: fix memory leak - call EPG_free_event on early
return
All OOM conditions now use fatal(EXIT_NOT_ENOUGH_MEMORY, ...) following
the project's coding patterns.
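The "unsafe realloc" fix refers to the common mistake of assigning the result of realloc() straight back to the only copy of the pointer. A minimal sketch of the safer shape, with a hypothetical helper name; the project reportedly calls fatal(EXIT_NOT_ENOUGH_MEMORY, ...) on failure, whereas this sketch returns an error code so it stays self-contained:

```c
#include <stdlib.h>

/* Grow *buf to new_size bytes.
 * Returns 0 on success and -1 on failure; on failure *buf is left
 * untouched, so the caller can still use or free the old block. */
static int grow_buffer(unsigned char **buf, size_t new_size)
{
    unsigned char *tmp;

    if (new_size == 0)
        return -1; /* realloc(ptr, 0) is not portable, so reject it */

    tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1; /* original pointer is still valid here */

    *buf = tmp;
    return 0;
}
```

Writing `buf = realloc(buf, n)` instead would overwrite the pointer with NULL on failure and leak the original allocation.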
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix macro formatting to have 'do' and '{' on separate lines and
align backslashes consistently, as required by clang-format.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace all sprintf calls with snprintf to prevent potential buffer
overflows in CEA-708 output functions. Key changes:
- dtvcc_change_pen_colors: add bounds checking for font color tags
- dtvcc_change_pen_attribs: add bounds checking for italic/underline tags
- dtvcc_write_srt: track buffer length with snprintf
- dtvcc_write_transcript: add bounds checking for CC/mode labels
- dtvcc_write_sami_header: use snprintf macro for all SAMI tags
- dtvcc_write_sami_footer: use snprintf with length check
- dtvcc_write_sami: add bounds checking for sync tags
- dtvcc_write_scc_header: use snprintf for SCC header
- add_needed_scc_labels: add buffer size parameter for safe writes
- dtvcc_write_scc: use snprintf macro for all SCC formatting
- dtvcc_writer_init: use snprintf for filename suffix
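As an illustration of "track buffer length with snprintf", here is a minimal sketch of appending to a fixed buffer while checking for truncation. The function `append_tag` and its tag format are hypothetical and are not taken from the dtvcc writer code.

```c
#include <stdio.h>
#include <stddef.h>

/* Append "<tag>" to buf at offset *len.
 * Returns 0 on success, -1 if the text would not fit; on failure the
 * buffer keeps its previous, NUL-terminated contents. */
static int append_tag(char *buf, size_t cap, size_t *len, const char *tag)
{
    int n;

    if (*len >= cap)
        return -1;

    n = snprintf(buf + *len, cap - *len, "<%s>", tag);
    if (n < 0 || (size_t)n >= cap - *len) {
        buf[*len] = '\0'; /* drop the partial write */
        return -1;
    }

    *len += (size_t)n;
    return 0;
}
```

The key point is that snprintf() returns the length it wanted to write, so comparing that value with the remaining space detects truncation.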
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add NULL checks after malloc calls for compressed_data_buffer and buff_ptr
- Replace sprintf with snprintf for all string formatting operations
- Replace strcat with bounds-checked direct character assignment
- Replace vsprintf with vsnprintf in debug_log function
- Replace sprintf loop in random_chars with direct character lookup table
- Increase buffer sizes for date_str (50->64), time_str (30->32), tcr_str (25->32)
- Initialize tcr_str in default case to prevent uninitialized use
- Add lib_ccx.h include for fatal() function declaration
Functions modified:
- mcc_encode_cc_data: OOM check + sprintf -> snprintf + strcat -> direct assignment
- generate_mcc_header: sprintf -> snprintf for uuid_str, date_str, time_str, tcr_str
- add_boilerplate: OOM check for buff_ptr
- random_chars: sprintf -> direct character lookup (more efficient)
- debug_log: vsprintf -> vsnprintf + safer strlen check
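The lookup-table approach mentioned for random_chars can be sketched like this. The function name, signature, and use of rand() are assumptions made for illustration; the real function may differ.

```c
#include <stdlib.h>
#include <stddef.h>

/* Fill out[0..count-1] with random uppercase hex digits and terminate
 * the string. The caller must provide at least count + 1 bytes. */
static void fill_random_hex(char *out, size_t count)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;

    for (i = 0; i < count; i++)
        out[i] = digits[rand() % 16]; /* one table lookup per character */
    out[count] = '\0';
}
```

Indexing a table avoids a formatted-print call per character and cannot write more than one byte per iteration.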
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- search_language_pack: add NULL check after strdup(), fix unsafe
realloc() that lost original pointer on failure
- init_ocr: fix memory leak where ctx wasn't freed on early return
when tessdata not found, add NULL checks for strdup() calls
- ocr_bitmap: fix memory leak when pixCreate partially fails, add
missing boxDestroy for crop_points on early return, add NULL checks
for histogram/iot/mcit allocations, fix unsafe realloc() calls,
add NULL check for text_out strdup
- ocr_rect: add NULL check for copy allocation, initialize copy->data
to NULL to prevent freep on uninitialized pointer, add NULL check
for copy->data allocation
- paraof_ocrtext: use fatal() on malloc failure for consistent OOM
handling
All OOM conditions now use fatal(EXIT_NOT_ENOUGH_MEMORY, ...) following
the project's coding patterns.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit addresses multiple memory safety issues in ccx_encoders_spupng.c:
**NULL pointer dereference fixes (crash prevention):**
1. write_cc_bitmap_as_spupng() line 440: Added NULL check after malloc
for pbuf - previously would crash on memset if allocation failed.
2. write_image() line 541: Added NULL check after malloc for row buffer
with proper cleanup via goto finalise.
3. center_justify() line 611: Added NULL check after malloc for
temp_buffer - previously would crash immediately on use.
4. utf8_to_utf32() line 718: Added NULL check after calloc for
string_utf32 - previously would crash on use by iconv.
5. spupng_export_string2png() line 780: Fixed existing NULL check that
printed error but did not return/exit - code would continue to
memset(NULL, ...) causing a crash.
**Memory leak fixes:**
6. spupng_export_string2png() line 789: Fixed leak where buffer was not
freed when strdup(str) failed and function returned early.
7. spupng_export_string2png() line 901: Fixed leak on realloc failure
where buffer, tmp, and string_utf32 were leaked. Now properly frees
all three before calling fatal().
All fatal() calls include diagnostic information (function name and
bytes requested where applicable) to aid debugging OOM conditions.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The get_visible_start() and get_visible_end() functions were adding a
cb_field offset (cb_field * 1001/30 ms) to caption timestamps. This
offset was designed for broadcast MPEG-TS streams where caption data
arrives continuously at field rate (59.94 fields/sec).
However, for container formats like MP4, all caption data for a video
frame is bundled together and should use the frame's PTS directly. The
offset was causing caption start times to be ~300ms (9 frames) later
than the actual video frame timestamp.
Root cause analysis:
1. Previous caption ends → get_visible_end() returns inflated time
due to cb_field offset → minimum_fts set to this inflated value
2. New caption starts → get_visible_start() constrained by
minimum_fts + 1 → start time incorrectly pushed forward
Fix:
- Add new Rust FFI functions ccxr_get_visible_start() and
ccxr_get_visible_end() that return base FTS (fts_now + fts_global)
without the cb_field offset
- Update C wrappers to call the new Rust functions
- Update Rust decoder timing to use base FTS
Verification against ffmpeg:
- Before fix: 00:16:06,799 (300ms late)
- After fix: 00:16:06,499 (matches ffmpeg exactly)
- ffmpeg ref: 00:16:06,499
The get_fts() function is unchanged - it still returns the
offset-adjusted time for use cases that need it (like extraction
time boundary checking).
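The arithmetic behind the 300ms discrepancy can be sketched as below. The function names are hypothetical, and a cb_field count of 9 is inferred from the "9 frames" remark above rather than taken from the code.

```c
#include <stdint.h>

/* Base time as described for the new functions: fts_now + fts_global. */
static int64_t base_fts_ms(int64_t fts_now, int64_t fts_global)
{
    return fts_now + fts_global;
}

/* Offset-adjusted time: adds cb_field * 1001/30 ms on top of the base. */
static int64_t offset_fts_ms(int64_t fts_now, int64_t fts_global,
                             int cb_field)
{
    return fts_now + fts_global + (int64_t)cb_field * 1001 / 30;
}
```

For a frame at 00:16:06,499 (966499 ms), `offset_fts_ms(966499, 0, 9)` yields 966799 ms, that is 00:16:06,799, while `base_fts_ms(966499, 0)` stays at 966499 ms.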
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace sprintf/strcpy with snprintf/memcpy in LOW priority files:
- general_loop.c: proper buffer allocation with OOM check, snprintf
- ccx_encoders_g608.c: snprintf with sizeof for timeline buffer
- lib_ccx.c: fix buffer size calculation, add missing null check, snprintf
- ccx_common_timing.c: snprintf with documented max size for time functions
- ts_functions.c: snprintf with sizeof in debug code
- matroska.c: bounded memcpy to prevent overflow from malformed language codes
- output.c: snprintf with known allocated size
This completes Phase 3.1 of the buffer safety audit.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add NULL checks after malloc calls for temp_encoder, current_name, and newname
- Replace sprintf with snprintf for safe string formatting
- Replace strcpy/strcat with strncpy and snprintf to prevent buffer overflows
- Increase buffer sizes from 6/10/15 to 16 chars to safely hold extension numbers
- Use proper size tracking with filename_len and buffer size variables
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace sprintf with snprintf for all string formatting operations
- Replace strcpy/strcat chains with snprintf for bounds-safe concatenation
- Replace strcpy with strncpy + null terminator for fixed-size buffers
- Fix bug in xds_do_private_data: sprintf in loop was overwriting instead
of appending hex bytes to output string
Functions modified:
- xds_do_copy_generation_management_system: 3 sprintf -> snprintf
- xds_do_content_advisory: 5 sprintf -> snprintf, strcpy/strcat chain fixed
- xds_do_current_and_future: strcpy -> strncpy for program description
- xds_do_channel: strcpy -> strncpy for network name
- xds_do_private_data: fixed loop to properly append hex bytes
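The overwrite bug fixed in xds_do_private_data is a common one: formatting into the start of the buffer on every iteration leaves only the last byte in the output. A minimal sketch of an appending loop, with a hypothetical function name and an assumed "XX " output format:

```c
#include <stdio.h>
#include <stddef.h>

/* Write data[0..n-1] as "XX " groups into out, never exceeding cap.
 * Returns the number of characters written, excluding the terminator. */
static size_t bytes_to_hex(char *out, size_t cap,
                           const unsigned char *data, size_t n)
{
    size_t len = 0;
    size_t i;

    if (cap == 0)
        return 0;
    out[0] = '\0';

    for (i = 0; i < n; i++) {
        int w;

        if (cap - len < 4)
            break; /* need room for "XX " plus the terminator */
        /* Write at out + len, not at out, so earlier bytes survive. */
        w = snprintf(out + len, cap - len, "%02X ", data[i]);
        if (w < 0)
            break;
        len += (size_t)w;
    }
    return len;
}
```

Advancing the write offset by the return value of snprintf() is what turns the loop from overwriting into appending.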
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
The Rust parser was incorrectly setting date_format to HHMMSS (no
milliseconds) instead of HHMMSSFFF (with milliseconds) for --out=ttxt.
This bug was introduced in PR #1619 when porting the parser to Rust.
The original C code correctly used ODF_HHMMSSMS which includes
milliseconds in the timestamp format (HH:MM:SS,mmm).
Before: 10:25:16 (missing milliseconds)
After: 10:25:16,000 (correct format matching original C behavior)
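For reference, the HH:MM:SS,mmm layout can be produced as in the sketch below. The function is hypothetical and only illustrates the output format; it is not the Rust parser or the original C formatter.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Format a millisecond count as HH:MM:SS,mmm.
 * Returns 0 on success, -1 if out is too small. */
static int format_hhmmssms(uint64_t ms, char *out, size_t cap)
{
    unsigned long long total_s = ms / 1000;
    unsigned long long h = total_s / 3600;
    unsigned long long m = (total_s / 60) % 60;
    unsigned long long s = total_s % 60;
    int n = snprintf(out, cap, "%02llu:%02llu:%02llu,%03llu",
                     h, m, s, (unsigned long long)(ms % 1000));

    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Dropping the final `,%03llu` field gives the shorter HH:MM:SS form that the bug produced.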
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace sprintf, strcpy, and strcat calls with snprintf and bounds-checked
operations to prevent potential buffer overflows. Key changes:
- write_stringz_as_smptett: use snprintf for timestamp formatting
- write_cc_bitmap_as_smptett: use snprintf with INITIAL_ENC_BUFFER_CAPACITY
- write_cc_buffer_as_smptett:
- Add NULL checks for malloc allocations
- Track buffer size and use snprintf throughout
- Replace strcpy/strcat chains with bounds-checked memcpy/snprintf
- Use snprintf for style tag and color code formatting
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(scc): Always emit position codes at start of caption (fixes #1776)
The SCC encoder was initializing current_row=14 and current_column=0,
which caused the first position code (PAC) to be skipped when caption
content started at row 14 (the last row), column 0. This happened because
the condition checking if row/column changed would be false.
For example, a caption starting at row 15 (1-indexed), column 0 should
output the PAC code 9470/{1500} but this was being omitted.
Fix by initializing current_row and current_column to UINT8_MAX, which
is an impossible value that will never match any valid row (0-14) or
column (0-31), ensuring the position code is always written for the
first character of each caption.
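The sentinel idea can be shown in isolation. This sketch uses a hypothetical `scc_cursor` type rather than the encoder's real state, and only models the "did the position change?" check:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical cursor state tracked between characters. */
struct scc_cursor {
    uint8_t row;    /* valid rows are 0-14 */
    uint8_t column; /* valid columns are 0-31 */
};

/* Start each caption with values no real position can match. */
static void cursor_reset(struct scc_cursor *c)
{
    c->row = UINT8_MAX;
    c->column = UINT8_MAX;
}

/* Move the cursor; returns true when a position code must be emitted. */
static bool cursor_move(struct scc_cursor *c, uint8_t row, uint8_t column)
{
    bool changed = (c->row != row) || (c->column != column);

    c->row = row;
    c->column = column;
    return changed;
}
```

With the old initial values (row 14, column 0), the first `cursor_move(&c, 14, 0)` would report no change and the position code would be skipped; after `cursor_reset` it always reports a change.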
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(rust): Remove unused assignments to fix clippy warnings
Remove unnecessary `time_show.time_in_ms += 1000 / 29.97` operations
that were restoring values that were never read afterwards.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Add NULL checks after malloc calls in copy_encoder_context(),
copy_decoder_context(), copy_subtitle(), and init_cc_decode()
- Fix buffer overflows in copy_encoder_context() where string
allocations were missing +1 for null terminator
- Call fatal(EXIT_NOT_ENOUGH_MEMORY, ...) on allocation failure
following the pattern used in matroska.c
- Initialize pointers to NULL after memcpy to prevent use of
stale pointers from the copied structure
- Prevent null pointer dereference in init_cc_decode() when dtvcc_init
returns NULL
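The missing "+1" is the classic off-by-one when duplicating a C string: strlen() does not count the terminator. A minimal sketch with a hypothetical helper; the project reportedly calls fatal(EXIT_NOT_ENOUGH_MEMORY, ...) on failure, while this version returns NULL to stay self-contained:

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of src, or NULL if src is NULL or malloc fails.
 * The caller owns the result and must free() it. */
static char *copy_string(const char *src)
{
    size_t len;
    char *dst;

    if (src == NULL)
        return NULL;

    len = strlen(src);
    dst = malloc(len + 1); /* +1 for the terminating '\0' */
    if (dst == NULL)
        return NULL;

    memcpy(dst, src, len + 1); /* copies the terminator as well */
    return dst;
}
```

Allocating only `len` bytes and then copying the string writes one byte past the end of the block.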
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Remove three unused assignments to `time_show.time_in_ms` that were
flagged by Clippy as "value assigned is never read".
The pattern was: subtract frame delay, use the value, then restore it.
However, since `time_show` is not used after the match statement, the
restoration assignments were unnecessary dead code.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(matroska): add memory safety checks and fix memory leaks
This commit addresses multiple memory safety issues in the Matroska
parser identified through static analysis (cppcheck).
## Null pointer dereference after malloc (15 fixes)
Added null checks after all malloc/calloc calls to prevent crashes
when memory allocation fails:
- read_byte_block(): line 28
- read_bytes_signed(): line 38
- generate_timestamp_ass_ssa(): line 267
- parse_segment_cluster_block_group_block(): lines 306, 361
- parse_segment_cluster_block_group_block_additions(): line 405
- parse_segment_cluster_block_group(): line 476
- parse_segment_track_entry(): lines 958, 973
- parse_private_codec_data(): line 1019
- generate_filename_from_track(): line 1167
- ass_ssa_sentence_erase_read_order(): line 1191
- save_sub_track(): lines 1264, 1271, 1303, 1310
- matroska_loop(): lines 1496, 1505
## Buffer overflow fixes (3 fixes)
- generate_timestamp_ass_ssa(): Increased buffer from 15 to 32 bytes,
changed sprintf to snprintf. GCC warned output could be 11-23 bytes.
- save_sub_track(): Increased number[] buffer from 9 to 16 bytes,
changed sprintf to snprintf.
- generate_filename_from_track(): Now calculates required buffer size
dynamically instead of using fixed 200 bytes.
## Memory leak fixes (7 fixes)
- parse_ebml(): Fixed leak of read_vint_block_string() return value
- parse_segment_info(): Fixed 4 leaks of read_vint_block_string()
returns (filename, title, muxing_app, writing_app)
- parse_segment_track_entry(): Added free(lang) before reassignment
- save_sub_track(): Fixed leak where text pointer was advanced,
losing original allocation
## Realloc error handling (3 fixes)
Fixed realloc calls to use temporary variable, preventing loss of
original pointer if realloc fails:
- parse_segment_cluster_block_group_block(): line 366
- parse_segment_cluster_block_group(): line 475
- parse_segment_track_entry(): line 973
## Use-after-free fix (1 fix)
- matroska_loop(): Saved avc_track_number and dec_sub.got_output
before calling matroska_free_all(), then used saved values
## Missing free fixes (2 fixes)
- free_sub_track(): Added free(track->sentences) for the array itself
- matroska_free_all(): Added free(mkv_ctx->sub_tracks) for the array
## Other improvements
- Initialized sub_track->sentences to NULL in parse_segment_track_entry()
to ensure safe NULL check in free_sub_track()
All changes use EXIT_NOT_ENOUGH_MEMORY (exit code 500) for
out-of-memory conditions, consistent with the rest of the codebase.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(dvb_subtitle_decoder): add NULL checks after malloc calls
Add missing NULL checks for 9 malloc() calls in the DVB subtitle decoder
that could cause crashes or undefined behavior if memory allocation fails.
All checks use fatal(EXIT_NOT_ENOUGH_MEMORY, ...) to terminate gracefully
with an appropriate error message, consistent with the approach used in
matroska.c and other parts of the codebase.
Affected functions and allocations:
- dvbsub_init_decoder(): DVBSubContext allocation
- dvbsub_parse_clut_segment(): DVBSubCLUT allocation
- dvbsub_parse_region_segment(): DVBSubRegion, pbuf, DVBSubObject,
and DVBSubObjectDisplay allocations
- dvbsub_parse_page_segment(): DVBSubRegionDisplay allocation
- write_dvb_sub(): cc_bitmap (rect), data1, and data0 allocations
- dvbsub_handle_display_segment(): private_data allocation
This also fixes a potential memory leak in write_dvb_sub() where rect
and rect->data1 would be leaked if the rect->data0 allocation failed
(previously returned -1 without cleanup, now terminates via fatal()).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: added demuxer module
* Cargo Lock Update
* Completed file_functions and demuxer
* Completed file_functions and demuxer
* written extern functions for demuxer
* Removed libc completely, added tests for gxf and ported gxf to C
* Hardsubx error fixed
* Fixing format issues
* clippy errors fixed
* fixing format issues
* fixing format issues
* Windows failing tests
* Windows failing tests
* demuxer: added demuxer data transfer functions and removed some structs
* made Demuxer and File Functions
* Minor formatting changes
* Minor Rebasing changes
* demuxer: format rust and unit test rust checks
* C formatting
* Windows Failing test
* Windows Failing test
* Update CHANGES.TXT
* Update CHANGES.TXT
* Windows Failing Tests
* Windows Failing Tests
* Problem in Copy to Rust and some typos that copilot review suggested
* Minor Formatting Error
* Windows Failing Regressions
* Windows Failing Regressions
* Minor Comment Change
* Data transfer module for DemuxerData added and more rustlike syntax to ctorust.rs
* Minor Formatting Changes
* demuxer: Rebase and a few tweaks to file_functions
* demuxer: Minor Formatting Error
* [FIX] 134 Codes in XDS and General Tests (#1708)
* Made pointers valid in Unit Tests of Decoder
* fix: test_do_cb
* Copilot Suggestions
* Suggestions about Redundancy
* Suggestions about Redundancy
* [FEAT] Add `bitstream` module in `lib_ccxr` (#1649)
* feat: Add bitstream module
* run code formatters
* Run cargo clippy --fix
* Run cargo fmt --all
* refactor: remove rust pointer from C struct
* Added Bitstream to libccxr_exports
* Minor Formatting Issue
* Bitstream: Removed redundant CType
* bitstream: recommended changes for is_byte_aligned
* bitstream: recommended changes for long comments
* bitstream: comment fix
* bitstream: removed redundant comparison comments
---------
Co-authored-by: Deepnarayan Sett <depnra1@gmail.com>
Co-authored-by: Deepnarayan Sett <71217129+steel-bucket@users.noreply.github.com>
* demuxer: minor formatting changes
* Demuxer: Changes to mistakes in CHANGES.txt
* Demuxer: Removed extra newline in ccextractor.c
* Demuxer: Changes to Encoding resolved
* Demuxer: Moved CCX_NOPTS to common structs and some changes to Demuxer Data regd. MPEG_CLOCK_FREQ
* some refactoring to CCX_NOPTS
* Demuxer: Minor Mistake regarding CHANGES.txt
* Demuxer: Unit test rust failing because of CCX_NOPTS
* Demuxer: changed common_structs to common_types
* Demuxer: Removed redundant libraries from Cargo.toml and moved tempfile to dev-dependencies
* Demuxer: Removed to_vec function and renamed PSIBuffer/PMTEntry from_ctype functions
* Demuxer: Renamed Stream_Type, improved Time complexity of the default() function and removed redundant comments
* Demuxer: Removed two repeated code blocks and removed redundant comments
* Demuxer: Removed two code blocks
* Demuxer: Review Changes
* Demuxer: Removed redundant tests
* Update src/rust/src/demuxer/demux.rs
Co-authored-by: Prateek Sunal <prtksunal@gmail.com>
* Demuxer: Errors due to Rebase
* Demuxer: Removed get_stream_mode
* Demuxer: Errors due to rebasing and removing redundant CType Functions
* Demuxer: Failing ES regressions
* Demuxer: MythTV failing regression
* Demuxer: Removed redundant comments
* Demuxer: Unplugged ES for now
* Demuxer: Replugged in ES
* Demuxer: Formatting error
* Demuxer: Windows failing CI
* Demuxer: Windows failing CI
* Demuxer: Windows failing Regressions
* Demuxer: Formatting
* Demuxer: Minor Cargo Clippy change
* Demuxer: running regressions again
* Demuxer: Cargo Lockfile Change
* Demuxer: running regressions again
* Demuxer: running regressions again
---------
Co-authored-by: Swastik Patel <swastikpatel29@gmail.com>
Co-authored-by: Prateek Sunal <prtksunal@gmail.com>
Original description:
Pull Requests Description :
Added logic to detect and replace any occurrence of "--" in comments with a single "-" to ensure valid XML.
Used a bulk write ('fwrite') to efficiently handle portions of the string that don't contain invalid sequences.
Ensured that comments are written correctly without altering the original structure of the code.
Updated function 'write_spucomment' to handle the sanitization process efficiently.
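The sanitization described above might look roughly like this. It is a sketch under assumptions: `write_sanitized_comment` is a hypothetical stand-in for the relevant part of `write_spucomment`, and collapsing longer runs of dashes (such as "---") to a single "-" is an assumption that goes slightly beyond the description.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the sanitizing step: writes `text` to `out`,
 * collapsing every run of two or more dashes into a single dash. */
void write_sanitized_comment(FILE *out, const char *text)
{
    const char *start = text;
    const char *hit;

    while ((hit = strstr(start, "--")) != NULL)
    {
        /* Bulk-write the clean run before the pair, plus one dash. */
        fwrite(start, 1, (size_t)(hit - start) + 1, out);
        start = hit + 2;       /* skip both dashes of the pair */
        while (*start == '-')  /* assumption: also drop any further dashes */
            start++;
    }
    fwrite(start, 1, strlen(start), out); /* remaining clean tail */
}
```

Using `strstr` to locate each offending pair keeps the common case, text with no "--" at all, to a single `fwrite` call.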
* [FIX] Update vcpkg baseline and use forked rsmpeg for FFmpeg 7
Update vcpkg baseline from Feb 2024 to Dec 2025 to resolve libxml2
hash mismatch. GitLab regenerates archives dynamically, causing
SHA512 verification failures with old baselines.
Switch to CCExtractor's forked rsmpeg (github.com/CCExtractor/rsmpeg)
which pins rusty_ffmpeg to 0.16.4 for FFmpeg 7.1 compatibility.
This provides consistent FFmpeg 7 support across all platforms.
Changes:
- Update vcpkg baseline in workflow and vcpkg.json
- Use forked rsmpeg from git for all platforms
- Use ffmpeg7_1 feature instead of ffmpeg6/ffmpeg8
- Use link_vcpkg_ffmpeg for Windows
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Enable use_prebuilt_binding feature for rsmpeg
This ensures consistent FFmpeg 7 API signatures across all platforms,
regardless of the system FFmpeg version installed. Ubuntu's FFmpeg 6
has different function signatures than FFmpeg 7.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Standardize on FFmpeg 6.1.1 across all platforms
Use FFmpeg 6 consistently:
- Linux: uses apt packages (libavcodec-dev, etc.) which provide FFmpeg 6
- Windows: vcpkg baseline pinned to FFmpeg 6.1.1 (commit 5a58e645)
- macOS: uses system FFmpeg 6
This ensures consistent behavior and API compatibility across all platforms.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Use platform-appropriate FFmpeg versions
- Linux: FFmpeg 6 (from Ubuntu apt packages)
- Windows: FFmpeg 7 (from vcpkg with recent baseline)
- macOS: FFmpeg 7 (from Homebrew)
This fixes the Windows build which was failing due to vcpkg
baseline hash mismatch for libxml2 in older baselines.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Use FFmpeg 7 with prebuilt bindings for Linux
Use ffmpeg7 feature everywhere and use_prebuilt_binding for Linux
to ensure FFmpeg 7 API signatures regardless of system FFmpeg version.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Fix library names for Windows build with updated vcpkg
- Update leptonica library name from 1.83.1 to 1.85.0
- Update tesseract library name from tesseract53 to tesseract55 (v5.5.1)
- Update libiconv library names: charset.lib -> libcharset.lib, iconv.lib -> libiconv.lib
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Fix iconv library name for vcpkg static build
vcpkg libiconv for x64-windows-static produces only iconv.lib
with charset functionality bundled in, not separate libcharset.lib
and libiconv.lib files.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Fix iconv library names: use charset.lib and iconv.lib
Restores the correct vcpkg libiconv library names:
- charset.lib (libcharset library)
- iconv.lib (libiconv library)
These are the original names from vcpkg libiconv package for x64-windows-static.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* try: New Hash
Updated the builtin baseline hash for ccextractor.
* Remove charset.lib and iconv.lib from dependencies
The project has its own win_iconv.c implementation in src/thirdparty/win_iconv/
which provides iconv functionality. With the updated vcpkg baseline (ab2977be),
the libiconv library doesn't produce charset.lib or libcharset.lib files.
FFmpeg is also built with --disable-iconv in this vcpkg configuration, so
the external iconv libraries are not needed by any of the vcpkg dependencies.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Deepnarayan Sett <71217129+steel-bucket@users.noreply.github.com>
- Update all build configuration files to require Rust 1.87.0+
- Add clippy.toml with MSRV configuration as requested
- Maintain modern Rust features like is_multiple_of()
- Fixes build compatibility issue #1765
- Reverts is_multiple_of(2) to stable % 2 == 0 check to maintain
compatibility with Rust 1.54.0 (project MSRV)
- Adds clippy.toml with msrv = '1.54.0' to prevent Clippy from
suggesting APIs that aren't available in the MSRV
Fixes: #1765
Fixes #1719 - build was failing with --enable-hardsubx due to missing
tesseract library linking. Added pkg_check_modules for tesseract and
leptonica in the HARDSUBX section of CMakeLists.txt.
Tested with: cmake -DWITH_HARDSUBX=ON -DWITH_OCR=ON -DWITH_FFMPEG=ON
GPAC renamed its libraries to `libgpac.so.13` causing image build to fail:
```
Error: building at STEP "COPY --from=builder /usr/local/lib/libgpac.so.12 /usr/local/lib/": checking on sources under "/home/pszemus/.local/share/containers/storage/overlay/faa4f2b5c39251a5cf42a97234d2d5652336a2388c96a64d85fc1922c4c43a71/merged": copier: stat: "/usr/local/lib/libgpac.so.12": no such file or directory
```
so let's pin the GPAC version to the latest release (2.4.0)
* chore(cargo): Add dependencies
* feat: Create new module `net` in lib_ccxr
* feat: Add block related functionality in `block.rs`
* feat: Add `target.rs` module for sending data blocks related functions
* feat(modules): Add all necessary modules
* feat: Add `source.rs` module for reading data blocks related functions from source
* feat: Add C equivalent functions in rust
* feat(module): Add `net` module in `libccxr_export`
* chore(cargo): Update Cargo.lock
* feat: Add C equivalent code in `libccxr_exports` & use in `networking.c`
* chore: Remove unused imports
* chore(clippy): Fix clippy warnings
* Net Module: Fixes in parser.rs - removed an extra check
* Net Module: Fixes in block.rs - fixed formatting issues
* Net Module: Fixes in source.rs - rewrote UDP implementation and a few other fixes
* Net Module: Fixes in target.rs - fixed formatting issues
* Net module: Rebasing and formatting changes
* Net module: Clashing names after rebase
* Net module: Clippy errors
---------
Co-authored-by: IshanGrover2004 <groverishan2004@gmail.com>
* Ported ES Module to Rust
* Windows Failing CI
* ES module: Clippy changes
* ES module: Cmake failing CI
* ES module: Cmake failing CI
* ES Module: Fixed mistake in read_gop_info
* ES Module: Minor mistakes in pic.rs and seq.rs
* ES Module: Goptime regression failing
* ES Module: Windows failing CI
* ES Module: ASCII value change in userdata.rs
* ES Module: Formatting issues
* Removal: Removed redundant C code already ported to Rust
* Removal: C formatting
* Removal: More Removal and CI issues in Mac
* Removal: CI issues in Mac
* Removal: Changes due to Rebase
* Removal: Failing CI on mac
* Removal: Failing regression test on dvdraw
* Fix hardsubx_decoder.c compilation with ENABLE_FFMPEG
Fix unresolved function reference when compiling with ENABLE_FFMPEG
* Fix regression compilation ffmpeg_intgr.c to support ffmpeg 5
Fix regression bug for compiling with ENABLE_FFMPEG and ffmpeg 5, introduced in https://github.com/CCExtractor/ccextractor/issues/1418
* Update CHANGES.TXT
* Update ffmpeg_intgr.c
Update for changes to FFMPEG 5 API
* [FIX] Corrected bitness check for 64-bit systems
* Improve Dockerfile: cleanup, parallel build, and remove redundancies
- Replaced cd with WORKDIR for clarity and Docker best practices.
- Removed unused LIB_CLANG_PATH export, as it only affected a single build layer; the library is automatically detected during build.
- Parallelized the GPAC build using make -j$(nproc).
- Removed redundant CMD instruction, as ENTRYPOINT already defines the container's execution command.
* [DOCS] Update CHANGES.TXT for Dockerfile improvements
---------
Co-authored-by: AhmedYasserrr <ahmdyasrj@gamil.com>
* Update url crate
* Fix vulnerability discovered with `cargo-audit` by upgrading `url` crate to version `2.5.4`
* Update url crate in lib_ccxr submodule
* Fix vulnerability discovered with `cargo-audit` by upgrading `url` crate to version `2.5.4`
* Update Cargo.toml
* Update Cargo.toml with latest compatible version of every crate
* Fix implicit declaration error on some systems.
This commit fixes a compile-time error regarding an implicit declaration
of mapclut_paletee() on some compilers and compiler versions. Notably,
Arch Linux and Ubuntu 24.10 seem to be affected.
The error resolved is:
```
../src/lib_ccx/ocr.c: In function 'ocr_rect':
../src/lib_ccx/ocr.c:922:9: error: implicit declaration of function 'mapclut_paletee' [-Wimplicit-function-declaration]
922 | mapclut_paletee(palette, alpha, (uint32_t *)rect->data1, rect->nb_colors);
| ^~~~~~~~~~~~~~~
```
This was resolved by `#include`-ing "ccx_encoders_spupng.h" in the file
src/lib_ccx/ocr.c. Thanks to GitHub user @steel-bucket for sharing the
fix in this issue's comments.
Fixes: #1646
* Update CHANGES.TXT.
Mention the fix for #1646.
Fixes: #1646
* feat: Add new module for timings functionality
* feat: Add timing functionality in `timing.rs` module
* feat: List all module & function conversion
* chore: Clippy fixes
* feat: Equivalent `ccx_common_timing.h` functions in rust module
* feat: Add static constants & include struct in `build.rs`
* feat: Add extern C functions
* feat: Include & use rust extern functions in C
* fix: Windows build
* fix: Windows build
---------
Co-authored-by: Prateek Sunal <prtksunal@gmail.com>
* Add flag for Page Segmentation Modes control
I added a flag --psm for controlling PSM (Page Segmentation Modes) in Tesseract. The default option (3) gives me quite bad results. When I use 6, 11, or 12 for Bulgarian, it gives me much better OCR results. I haven't tested other languages yet, but I expect improvements there as well if another mode is used.
* feat: add psm for rust parser
* fix: add psm to options
* fix: add default value of psm to 3
* fix: correct type of ocr oem
* fix(rust): use fatal! instead of exit
---------
Co-authored-by: Prateek Sunal <prtksunal@gmail.com>
* fix: add ucla checks for millis_separator
* fix: reassign back profane and capitalization lists to c
* fix: C formatting
* fix(rust): clippy warnings
* feat: Add new function to allocate any object on the heap, zero-initialized
* feat: Add unit tests for `decoder/commands.rs`
* docs: Mention about PR in changelogs
* feat: Add unit tests for `decoder/windows.rs`
Refactor the code and use Default where needed
Implement `PartialEq` also
* fix: Initialise tmp extern C values for easy mocking
* feat: Add unit tests for `decoder/timing.rs`
* feat: Add unit tests for `decoder/output.rs`
* feat: Add unit tests for `decoder/mod.rs`
* feat: Add unit tests for `decoder/tv_screen.rs`
* feat: Add unit tests for `lib.rs`
* fix: Failing test
* feat: [WIP] Add unit tests for `decoder/service_decoder.rs`
* feat: Add unit tests for `decoder/service_decoder.rs`
* feat: Add unit tests for `hardsubx/imgops.rs`
* feat: Add unit tests for `hardsubx/utility.rs`
* fix: cargo clippy
* fix: doctest for `lib_ccxr` module
* feat: Add test `lib_ccxr/util/mod.rs`
* feat: Add test `lib_ccxr/util/levenshtein.rs`
* feat: Add test `lib_ccxr/util/bits.rs`
* feat: Add test `lib_ccxr/time/units.rs`
* chore: Change function name
* fix: Failing of missing values `tlt_config`
* ci: Run unit test cases in `lib_ccxr` module also
* ci: Run clippy & fmt in `lib_ccxr` module also
* chore(clippy): Fix clippy warnings
* feat: Add new module `encoding`
* feat: Add code for `encoding.rs`
A module for working with different kinds of text encoding formats
* feat: Add code for function `line21_to_utf8`
* feat: Add code for remaining todos function
* feat: unpack gpac
* fix: linux ci
* fix: mac build
* fix: remove unused [no ci]
* fix: ignore config.h [no ci]
* temp commit, will drop this soon
* fix: install gpac
* fix: gpac
* fix: formatting
* fix: preprocessor directive
* fix: comment display version for now
* fix: display dlls code
* fix: bundle vcruntime in hardsubx windows
* fix: again
* fix: errors in ci
* fix: ci
* fix: add vcruntime in additional dependencies
* fix: try to copy vcruntime after build
* fix: space in runtime library
* fix: remove for now [no ci]
* fix: things in vcxproj
* fix: ci for leptonica sys
* fix: docs
* fix: copy dlls on post build event
* fix: copy vcruntime after build
* feat: add arguments through clap
* fix: type of some arguments
* fix: "-" and "--" in comments
* fix: format files
* fix: add argument parsing till mkvlang
* fix: one todo item
* chore: lint fixes
* fix: nocodec value
* fix: for nocodec
* fix: add cfg feature for hardsubx
* feat: complete till startcreditstext
* fix: add more notes, args: option affect processed
* feat: port all till network stuff
* fix: complete almost all argument parsing
* fix: error free code
* fix: complete params port
* fix: hardsubx errors
* feat: clean up main function
* fix: pr reviews
* fix: make input,output function better
* fix: variant not used warning
* fix: warnings
* fix: all clippy warnings
* feat: add tests
* feat: add tests
* chore: lint fixes
* fix: move unit tests to correct folder
* fix: remove unnecessary files
* fix: make function for parse_args
* fix: review changes
* fix: Impl CcxOptions whenever I could
* fix: try to convert rust to c
* chore: push c code
* fix: add more rust to c conversions
* fix: use set methods for bitfield
* fix: errors
* fix: arguments parsing
* fix: all issues
* fix: many errors
* chore: lint fix
* fix: err
* fix: unsafe function error
* fix: unsafe warning
* fix: safety lint
* chore: add docs
* fix: windows build
* fix: function
* fix: dependencies
* fix: set_binary_mode
* chore: lint fix
* fix: set_binary_mode for windows
* fix: error
* fix: undefined reference error
* chore: remove comment
* fix: output field
* chore: fix lint
* fix: ru1, ru2, ru3
* fix: undef before
* fix: parameter and update deps
* chore: update vcpkg
* feat: add release-with-debug profile
* fix: uncomment code
* fix: update visual studio to 2022
* chore: update docs
* fix: use default vcpkg
* fix: caching logic on release ci
* fix: vcpkg caching
* fix: add setup vcpkg
* chore: remove unnecessary formatting
* fix: Always write 2 bytes for UTF-16BE
* fix: formatting
* feat: add rest of the notes to bring continuity
* fix: remove extra line
* fix: add hardsubx note
* fix: source code format error
* chore: lint fixes acc to rustfmt
* feat: add unit test ci
* fix: conversion of strings, add file queue handling
* fix: decoder cfg
* fix: update dependencies
* chore: lint fix
* chore: add safety doc
* fix: default value for CcxOptions
* fix(rust): default value for teletext
* fix: leptonica version for windows
* fix: format errors
* fix: workflow
* Revert "fix: leptonica version for windows"
This reverts commit 461ef55e7b.
* fix: pin ffmpeg to 6 for mac
* fix(parser): default values and unwrap's
* fix(parser): hardsubx fixes
* chore(parse): lint fixes
* fix(windows): switch back to sdk 2019
* fix(workflow): windows workflow revert
* fix(windows): revert to old files which were working before
* fix(workflow): pin vcpkg packages
* chore(rust): downgrade leptonica
* fix(windows): move vcpkg.json to correct place
* fix(windows): improve vcxproj
* fix(windows): workflow
* fix(windows): workflow
* fix(windows): workflow clone from vcpkg every time
* fix(workflow): error
* fix(workflow): don't skip building vcpkg
* fix: remove depth from vcpkg
* temporary commit
* fix(windows): pin gpac and use local vcpkg manifest properly
* fix(windows): install vcpkg dependencies manually
* fix(windows): update dll names
* fix(windows): dependencies copy
* fix(windows): don't continue on error for release
* fix(macos): build ffmpeg for mac workflow
* fix: move ffmpeg to current workspace
* fix: re-add profile for windows
* fix: pkg config for mac
* fix(mac): use ffmpeg@6 from brew
* fix(macos): there is no ffmpeg_prebuilt
* fix(macos): specify ffmpeg pkg config
* fix(macos): globally define pkg config
* fix(macos): add ffmpeg include and libs dir
* fix(macos): include ffmpeg headers in makefile
* fix: include ffmpeg libraries and include directories
* fix: try to manually specify ffmpeg header in rust
* fix: also include leptonica headers
* fix: leptonica name
* fix: test
* fix: string null when output_filename is empty
* fix: error
* fix: remove cflags
* fix(mac): disable cmake ocr hardsubx
* chore: update gitignore
* fix: null if string is empty
* fix: allow --in
* chore: bump version to 1.0 in rust
* chore: add space to trigger sp
* fix: don't panic with rust
* fix: add double dashes to indicate parameters
* chore: update CHANGES.txt
* fix: test
* fix(workflow): update workflow name
* fix(rust): linux output_filename in sampleplatform
* fix(rust): parser default values
* fix(rust): exit with MalformedParameter instead of panic
* fix(decoder): revert always write 2 bytes
* chore(rust): format
* chore: update lock file
* fix(test): test lib_ccxr and rename to test
* fix(mac): remove failing cmake_ocr test
* fix: ci errors
* fix: feature related changes
* fix: trim down default features
* fix: don't check clippy for all features
* chore: Add cargo dependencies
* feat: Make time module in `lib_ccxr`
* feat: Add conversion guide in `time/mod.rs` module & Create `units` module
* feat: Add time units code
* feat: Make time module in `lib_ccxr/util` & Add helper function
* feat: Add utils time related functions
* feat: Add extern functions in `libccxr_exports`
* feat: Add extern functions in C and use in proper place
* docs: Mention in Changelogs
* feat: Add common module
common module is made for all `ccx_common_*` files
* feat: Add constants module within common module
Used to have all constants enums listed in ccx_common_constants C file
* feat: Add all constants, enums in rust equivalent to `ccx_common_constants` C file
* docs: Mention in Changelogs
* docs: Add more conversion data
* chore: Add bitflags crate as dependency
* feat: Add function to initialize Rust logger using options in C
* feat: Add new module `log`
* refactor: Add ccx_s_option into list of bindgen struct
* feat: Add Initialize logger function
* feat: All logging functions & macros
* chore: Fix clippy
* docs: Mention in Changelogs
* chore: format issue fix
* fix: Remove activity_header from rust & use initially to print in C
* refactor: Remove debugging statements
* fix: Add `\n` in info!
* create lib_ccxr and libccxr_exports
* chore: Fix bindgen crate version
* chore: Fix rsmpeg crate version
* docs: Add PR info in Changelogs
---------
Co-authored-by: Elbert Ronnie <elbert.ronniep@gmail.com>
* feat: Add `decoder/encoding` new module
This `decoder/encoding.rs` file will contain the content of
`lib_ccx/ccx_708_decoder_encoding.c` file
* feat: Add encoding functions
* feat: Add conditional compilation to include Rust functions
* fix: conditional compilation logic
* refactor: Use of match statement instead of if-else
* fix: Calling C function for rust
* feat: Enable `derive_default` feature
* feat!: Add script for building AppImage
* chore(delete): Remove `build-static.sh` file
* refactor: Add link for logo photo
* chore: Replace dead link
* feat: Add timing functions for SCC format in C & Rust
* feat: Add SCC support to Rust 708 decoder
* feat: Add SCC support to C 708 decoder
* docs: fix symbol in scc_time format
* chore: clippy fixes
* docs: Add new feature in Changelog
* fix: update SCC timing functions according to need
* feat: Add new member(old caption end time) for overlapping situations
* fix: update SCC timing functions according to need
* feat: Add support for overlapping captions situations
* fix: frame formula for timings
* feat: Add support for orientation of subtitles in C
by adding necessary labels needed for it
* feat: Add support for orientation of subtitles in Rust
by adding necessary labels needed for it
* docs: Add info for scc labels
* chore: clippy fixes
* docs: Add what `add_needed_scc_labels` do and correct parameters name
* feat: breaking all parameters
* fix: some parameters
* fix: many things
* fix: error
* fix: -h
* fix: more parameters
* fix: add dash to help commands
* fix: help for output-field
* fix: single dash
* fix: --out and --in
* fix: move notes to the end of help menu
* fix: final changes to notes
* fix: extra spacing
* fix: wrong formatting of parenthesis
* Update stream_functions.c: fix MP4 file type detector
On bad inputs containing, for example, the byte sequence "ff ff ff ff 6d 65 74 61" within the first 1 MiB, `detect_stream_type` entered an infinite loop. The bytes "ff ff ff ff" were interpreted as the length of a candidate "meta" MP4 box, which caused a size_t overflow inside `isValidMP4Box`; `nextBoxLocation` then pointed to the previous byte, so the same "meta" candidate was processed again.
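One way to guard against this kind of wrap-around is sketched below. This is an illustration only, not the project's `isValidMP4Box`: `next_box_offset` and its parameters are hypothetical, and the sketch deliberately rejects the special MP4 sizes 0 and 1 as well as any box that claims more bytes than the buffer holds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute where the box following the one at `pos`
 * would start. Returns 1 and sets *next on success; returns 0 if the
 * candidate must be treated as invalid. */
int next_box_offset(const unsigned char *buf, size_t buf_len,
                    size_t pos, size_t *next)
{
    if (pos > buf_len || buf_len - pos < 8)
        return 0; /* not enough bytes for the 4-byte size and 4-byte type */

    /* Big-endian 32-bit box size. */
    uint32_t size = ((uint32_t)buf[pos] << 24) | ((uint32_t)buf[pos + 1] << 16) |
                    ((uint32_t)buf[pos + 2] << 8) | (uint32_t)buf[pos + 3];

    if (size < 8)
        return 0; /* smaller than its own header: the scan would not advance */
    if (size > buf_len - pos)
        return 0; /* compared without computing pos + size, so nothing can wrap */

    *next = pos + size;
    return 1;
}
```

Comparing `size` against `buf_len - pos` rather than computing `pos + size` first is what keeps the arithmetic from overflowing on a size such as 0xffffffff.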
* Update CHANGES.TXT
* Treat a candidate MP4 box as invalid instead of bailing out
* Fix stuck mp4 processing in `process_avc_sample`
On corrupted inputs it could read data past the sample end and also get stuck in an infinite loop.
* Fix the stats code to not count zero-sized NALs and avoid dereferencing memory past the NAL end
* Add comment.
* Format changes
* [FIX] Added a note for Ubuntu 23.10
libgpac-dev isn't available on Ubuntu 23.10 (Mantic), so a note was added instructing users to build it from source instead.
* [FIX] Added build instructions for Ubuntu 23.10 and later
libgpac-dev isn't available in Ubuntu 23.10 and later, which causes the build to fail. Added instructions to build it from source.
* feat: unpack gpac
* fix: linux ci
* fix: mac build
* fix: remove unused [no ci]
* fix: ignore config.h [no ci]
* temp commit, will drop this soon
* fix: install gpac
* fix: gpac
* fix: formatting
* fix: preprocessor directive
* fix: comment display version for now
* fix: display dlls code
* fix: bundle vcruntime in hardsubx windows
* fix: again
* fix: errors in ci
* fix: ci
* fix: add vcruntime in additional dependencies
* fix: try to copy vcruntime after build
* fix: space in runtime library
* fix: remove for now [no ci]
* fix: things in vcxproj
* fix: ci for leptonica sys
* fix: docs
* fix: copy dlls on post build event
* fix: copy vcruntime after build
* feat: mac ci
* fix: ci dependencies
* fix: more dependencies
* fix: libavcodec not found
* fix: include directories in mac
* fix: error in endif()
* feat: unpack gpac
* fix: linux ci
* fix: mac build
* fix: remove unused [no ci]
* fix: ignore config.h [no ci]
* temp commit, will drop this soon
* fix: install gpac
* fix: gpac
* fix: formatting
* fix: preprocessor directive
* fix: comment display version for now
* fix: display dlls code
* fix: bundle vcruntime in hardsubx windows
* fix: again
* fix: errors in ci
* fix: ci
* fix: add vcruntime in additional dependencies
* fix: try to copy vcruntime after build
* fix: space in runtime library
* fix: remove for now [no ci]
* fix: things in vcxproj
* fix: ci for leptonica sys
* fix: docs
* fix: copy dlls on post build event
* fix: copy vcruntime after build
* Delete (probably) wrongly committed vs config file
* Remove Nuklear GUI
* Clean up SLN configs (Reduce to 64 bit full debug & release)
* Sync bat scripts, prepare to move
* Build rust in release when release
* Update changelog
* Delete rustx86.bat
The broadcast raw format *must* contain data from only one field, or
neither `ccextractor` nor McPoodle's tools can actually read it. Since
we don't actually get XDS data from `writeraw`, there's no reason to
keep the call for field 2.
Fixes #1503.
With tesseract-ocr's stock pkg-config file, the check failed with an error due to
unquoted whitespace:
$ test ! -z `pkg-config --libs-only-l --silence-errors tesseract`
bash: test: syntax error: `-larchive' unexpected
* linux/configure.ac: Use a positive test, and double-quote the $() command
substitution.
Co-authored-by: Carlos Fernandez Sanz <carlos@ccextractor.org>
This header is generated by the pre-build.sh script. The compilation
fails if it is missing.
* linux/Makefile.am (ccextractor_SOURCES): Add
../src/lib_ccx/compile_info_real.h.
* fix: bump leptonica-sys to 0.4.3 and update Cargo.lock
* fix: bump rust version to 1.57.0 and build vcpkg for window hardsubx builds
* fix: add Bcrypt dependency
* fix: switch to rust stable
* chore: bump package versions
* fix: try to remove i686 to fix error
* fix: install tesseract and lint fixes
* fix: try using ffmpeg the third
* fix: include headers
* fix: add rsmpeg
* fix: switch default triplet to static md
* fix: import errors
* fix: directory path
* fix: pre build commands
* fix: update vcxproj
* fix: linux ci
* fix: ci fixes
* chore: lint fixes
* fix: error
* fix: copy include files
* fix: ci error
* fix: link swresample lib
* fix: some errors
* fix: include directory path and include all libraries
* fix: try to add library directories
* fix: fixes in libraries
* fix: formatting ci
* fix: mflat errors
* fix: libcurl
* fix: preprocessor definitions
* fix: add libcrypto
* fix: remove lib_hash to fix conflicts (we have libcrypto already)
* fix: add avcodec and avformat dependencies on windows
* fix: add remaining deps that may fix the build
* fix: add crypt dependency
* fix: rename conflicting names
* Revert "fix: remove lib_hash to fix conflicts (we have libcrypto already)"
This reverts commit f57ff716ed.
* fix: prefix with CC_
* fix: post build actions
* fix: ocr error
* Revert "fix: ocr error"
This reverts commit 92599454b6.
* fix: xcopy error
* fix: generated file name for x64
* fix: ocr error
* fix: add item group at top to see if it works
* fix: remove unwanted headers, removed \\ from VCPKG_ROOT, remove unwanted includes in vcxproj
* fix: add libpng for non hardsubx, comment the broken ocr code again
* fix: libpng path
* feat: add lib png headers in ClCompile
* fix: png.h not found
* fix: last try for ocr fix
* fix: libpng not found
* fix: cl compile headers
* fix: libpng and ocr
* fix: libpng error
* fix: redefinition error
* fix: zlib for non hardsubx
* fix: lib names
* fix: zlib.h not found
* Respect `-stdout` if multiple CC tracks are found
When passed the `-stdout` flag, CCExtractor should write the
subtitles to standard output, instead of an output file.
However, as noted in Issue #1453, CCExtractor doesn't
respect the `-stdout` flag when multiple CC tracks are present in
a Matroska input file (usually .mkv).
This commit ensures that output is written to standard output if
`-stdout` is present even if the input file is a Matroska container
with multiple CC tracks.
Signed-off-by: Abhishek Kumar <abhi.kr.2100@gmail.com>
* Mention fixing of issue #1453 in changelog
Signed-off-by: Abhishek Kumar <abhi.kr.2100@gmail.com>
* Correctly spell Matroska
Signed-off-by: Abhishek Kumar <abhi.kr.2100@gmail.com>
Signed-off-by: Abhishek Kumar <abhi.kr.2100@gmail.com>
* [FIX] WebVTT X-TIMESTAMP-MAP header placement (#1463)
* Fixed --no-timestamp-map flag
* Disable X-TIMESTAMP-MAP by default
* X-TIMESTAMP-MAP is only part of the HLS spec, and is not valid WebVTT, so it should be disabled by default.
* Write second WebVTT newline when timing info is missing
* add tesseract-sys in dependencies of rust modules
* add appropriate feature flags and required packages to cargo toml
* expose classifier
* Redefine structs that are required for hardsubx
Note: rust-bindgen isn't being used directly for this because it would also redefine structures of leptonica, tesseract, and ffmpeg, and we don't want that.
We want to use the struct definitions from the Rust interface libraries we are importing
* write code to generate bindings for mprint
* - write a function to convert rust strings to c strings
- write a memory safe wrapper to mprint that uses above function
* - add helper function to deal with tess strings in a memory safe manner
- port get_ocr_text_simple
- port get_ocr_text_wordwise
* improve conversion of C string to Rust string by using built-in functions
* replace mprint usage with warn!
* port get_ocr_text_letterwise
* remove redundant mprint function
* improve readability of _tess_string_helper by using more general variable names inside
* make get_ocr_text_simple call get_ocr_text_simple_threshold to remove redundant code; fix bugs
* remove manual definition of cc_subtitle and use bindgen bindings
* style changes to rust hardsubx classifier
* add get_ocr_text_letterwise_threshold and make get_ocr_text_letterwise call it appropriately
* move hardsubx context struct to mod.rs
* add get_ocr_text_wordwise_threshold and make get_ocr_text_wordwise call it
* use the ffmpeg-sys definition of Pix
* hide ported functions under macros
* use the AVPacket from bindings and not ffmpeg to make compatibility work for now.
TODO: rewrite init_hardsubx and also deal with the ffmpeg stuff when that is done
* improve _tess_string_helper by using appropriate built-in functions
* linter recommended changes
* clang style change
* fix loop bug where the loop condition was not re-evaluated after a continue statement
* start porting of decoder with the _process_frame_color_basic function and related code
* hide the C version of _process_frame_color_basic behind an #ifdef
* add _process_frame_tickertext
* hide the C version of _process_frame_tickertext behind ifdef and add #[no_mangle] to the rust version
* check if word is empty as soon as word is detected
* port _process_frame_white_basic
* hide the C version _process_frame_white_basic behind compiler macros
* stylistic changes
* safety docs for hardsubx classifier
* safety docs for decoder as of now
* safety docs for utils.rs
* style changes
* format and style changes
* modify safety docs
* formatting fix
* set up bindings conversion of hardsubx utility functions (and structs) and set up the module
* add low level ffmpeg rust binding
* Methods ported:
- convert_pts_to_ns
- convert_pts_to_ms
- convert_pts_to_s
A pure Rust method called _edit_distance_rec was added that implements Levenshtein distance calculation using recursion and dynamic programming
The port of edit_distance_rec is simply a wrapper that calls the above function.
This redundancy won't be necessary as more downstream modules are ported to Rust
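To make "recursion and dynamic programming" concrete, here is a minimal sketch of a memoized recursive Levenshtein distance. It is written in C rather than Rust, and the names `lev_rec` and `levenshtein` are hypothetical; the actual `_edit_distance_rec` may be structured differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: distance between the first i chars of a and the
   first j chars of b, memoized in a (n+1) x (m+1) table. */
static int lev_rec(const char *a, size_t i, const char *b, size_t j,
                   int *memo, size_t cols)
{
    if (i == 0)
        return (int)j; /* insert the j remaining characters */
    if (j == 0)
        return (int)i; /* delete the i remaining characters */

    int *slot = &memo[i * cols + j];
    if (*slot >= 0)
        return *slot; /* already computed */

    int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
    int best = lev_rec(a, i - 1, b, j - 1, memo, cols) + cost; /* substitute */
    int del = lev_rec(a, i - 1, b, j, memo, cols) + 1;         /* delete */
    int ins = lev_rec(a, i, b, j - 1, memo, cols) + 1;         /* insert */
    if (del < best)
        best = del;
    if (ins < best)
        best = ins;
    return *slot = best;
}

/* Returns the edit distance, or -1 if the memo table cannot be allocated. */
int levenshtein(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t n = strlen(a), m = strlen(b);
    size_t cells = (n + 1) * (m + 1);
    int *memo = malloc(cells * sizeof *memo);
    if (!memo)
        return -1;
    for (size_t k = 0; k < cells; k++)
        memo[k] = -1; /* -1 marks "not computed yet" */
    int d = lev_rec(a, n, b, m, memo, m + 1);
    free(memo);
    return d;
}
```

The memo table is what turns the exponential recursion into O(n*m) work: each (i, j) pair is computed once.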
* put C code of hardsubx_utility under define rust flag
* run formatter
* make compilation of hardsubx rust modules conditional on the HARDSUBX and the OCR flags. Make ffmpeg a conditional dependency based on those flags
* remove namespaced dependency in cargo because that is a nightly feature
* add conditional compilation of ffmpeg related bindings in build.rs
* make clang argument of -DENABLE_HARDSUBX conditional on cargo feature of hardsubx_ocr
* enable specific relevant features for ffmpeg-sys-next
* enable hardsubx_ocr feature in windows build
* add build feature in ffmpeg-sys-next
* ffmpeg build feature is conditional on platform
* Revert "ffmpeg build feature is conditional on platform"
This reverts commit e456fee942.
This is because conditional features do not work in cargo toml
* install yasm in the linux build github action for ocr and hardsubx enabled cmake
* turn globals to locals to reduce code
* remove redundant attributes
* style changes
* make import of ffmpeg-sys-next conditional on hardsubx_ocr flag
* add --all-features flag in clippy for github workflow
* run formatter
* fix clippy command
* install yasm as part of rust format build check
* install libtesseract-dev etc. for clippy build test
* readability change
* declare the function edit_distance as unsafe
* remove commented code
* formatting changes
* combine declaration and assignment
* add build command for building hardsubx rust
context to issue: #1445
* make hardsubx rust work with autoconf build. For issue: #1445
* update autoconf for mac for issue #1445
* add hardsubx rust module and expose it
* port rgb_to_hsv to rust
* add dependency fast-math and extern it
* port rgb_to_lab to rust
also make preprocessor to not allow compilation of hardsubx_imgops
if WITHOUT_RUST is OFF
* improve if-else constructs for readability
* unroll macros that were only used once and remove their definition
* Improve readability of rgb_to_lab function (and fixes)
The function in Rust behaves slightly differently than its C counterpart
* remove fast math library, use palette library and rewrite imgops using it
* run formatter
* replace destructuring assignment statement with normal assignment statements because of build rust compiler issues
* run formatter on C code for imgops
* remove extern for modules because it is not required
* improve comment placement in rust imgops
Co-authored-by: Punit Lodha <48253287+PunitLodha@users.noreply.github.com>
* [NEW] add functionality to allow extraction of cc and burnt-in subs in the same pass
- add flag under hardsubx called -hcc that calls this method
- minor refactoring of moving some code from general_loop to a new function
- appropriate addition to the header files to expose certain methods
* add change log
* run clang formatter
Most of the users use Ubuntu 18.04 and later, so added the `libtesseract-dev` rather than `tesseract-ocr-dev` in the bash command so new people don't run into any errors as the NOTE was written after the command.
* Fix Mac Build processes
For all:
Add Neon files to libpng for Apple Silicon
Update compilation.md documentation
For autoconf:
Make Linux and Mac Makefile.am and configure.ac identical
Fix wrong location for zvbi/bcd.h in both Mac/Linux
For cmake:
Include GPAC config for Darwin in Mac version
For mac/build.command:
Update for new zvbi location
* Update CHANGES.TXT for Mac Build commit
* Use rust by default and add -WITHOUT_RUST flag
* Fix for shell and autoconf builds
* change directory for version check
* change to staticlib
* Update windows to build rust
* fix formatting
* add information about 708 decoder in version flag
* revert file mode to 644
* Use x86 for OCR releases
* fix flushing bug
* fix formatting
* update lib names
* remove bazel
* update changelog
* Add rust lib
* add steps for building rust lib
* use rust lib
* add conditional flag for rust
* use cargo config.toml
* add decoder module and update bindings
* use match instead of if else
* add target directory flag
* add env_logger
* use env_logger
* Process data first and then pass to safe function
* Attempt to fix long-running regression in TeleText
Regression test 78 (https://sampleplatform.ccextractor.org/regression/test/78/view)
has been broken since #614 was merged to fix other issues.
It's been traced back to be caused by not setting t0 at the correct time
(setting it using a calculated PTS time rather than taking it from the video frame),
and this commit attempts to fix that.
* Add changes
* Clang-format changes
* Improved fix
This uses the current_pts rather than the min_pts because the value
of the delta should be relative to when the packet was received.
If min_pts wasn't set yet, it'll be retrieved and set as current_pts
* Fixup
* [FIX] Must have two newlines after WEBVTT header
Bug introduced in #1092
* [FIX] segfault with multitrack reports
* [FIX] segfault with unsupported file reports
* [FIX] Write subtitle header to multitrack outputs
* [FIX] Write multitrack files to the output file directory
* Add update_gpac.py
Add a Python script that partially automates updating GPAC to a newer
version.
* Update GPAC to version 1.0.1
Update the vendored version of GPAC to version 1.0.1.
* Add necessary GPAC header files
Add some GPAC header files that GPAC needs to compile.
* Define _GF_CONFIG_H_ to fix Linux build failing
gpac/configuration.h has a series of default configuration options for
various platforms, but it doesn't have a case for Linux and it results
in a compilation error if it encounters an unknown platform.
The settings in configuration.h don't appear to try to set any defaults
for Linux anyway, so we can disable all use of those configuration.h
settings by defining _GF_CONFIG_H_.
* Add some more necessary GPAC header files
Add a few more header files necessary to get GPAC to compile.
* Fix renamed and removed media types
Some mp4 media types ("clcp", "c608") were renamed by GPAC. "c708"
appears to have been removed, so we can just add the definition of that
to the top of mp4.c.
* Remove Remotery from updated GPAC
Remotery appears to be some code for profiling GPAC which we aren't
using, and including Remotery.c and Remotery.h ends up pulling in a lot
of files, so it's easier to just remove the include of Remotery.h and
the single use of it in os_divers.c
* Remove unused box definitions
Remove box definitions that we don't use from box_funcs.c in order to
avoid adding too many files from GPAC.
* Replace alloc function declarations with defines
Replace the GPAC wrappers around the malloc-style functions (gf_malloc,
gf_free, etc.) with defines that use the standard C versions of these
functions so that we can avoid including GPAC's alloc.c
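A minimal sketch of what such defines could look like. Only `gf_malloc` and `gf_free` are named in the message above; the other two mappings are assumptions about what "etc." covers, and `make_counter` is a hypothetical caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Map the GPAC allocator names straight onto the C standard library. */
#define gf_malloc malloc
#define gf_calloc calloc   /* assumed to be part of "etc." */
#define gf_realloc realloc /* assumed to be part of "etc." */
#define gf_free free

/* Hypothetical caller that only ever sees the gf_* names. */
int *make_counter(int start)
{
    int *p = gf_malloc(sizeof *p);
    if (p)
        *p = start;
    return p;
}
```

Because the macros expand to the standard functions, no extra translation unit such as alloc.c has to be compiled or linked.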
* Remove WebVTT handling code in gf_isom_dump_srt_track
Remove the code that handles WebVTT in gf_isom_dump_srt_track to avoid
needing to pull in a lot of other files from GPAC.
gf_isom_dump_srt_track doesn't appear to be used by ccextractor directly
or indirectly (it's only called in gf_isom_text_dump which doesn't
appear to be called anywhere else) so it should be fine removing it.
* Disable use of Remotery and gzip on Linux
Use GPAC_DISABLE_REMOTERY and NO_GZIP to disable Remotery because we
aren't interested in profiling (see
5c0c9cf71e for more info) and gzip
compression through gzio.c respectively.
* Fix compilation errors in GPAC on linux
GPAC on linux after the update requires some threading functions and
dynamic loading functions in pthread and dl respectively.
* Add necessary files for GPAC to compile
Add several C and header files that GPAC needs to compile
* Disable Remotery and Gzip in all build systems
Disable Remotery and gzip (using the same method as
f49dc371b5) for:
- The linux build script (linux/build)
- The mac build script (mac/build.command)
- The mac makefile
- cmake
- bazel
- Visual Studio
* Add extra GPAC files to several build systems
Add the names of several GPAC files that were added in the update to the
linux and mac Makefiles and to the Windows Visual Studio project.
Adding these filenames isn't necessary for CMake, Bazel, or the linux or
mac build scripts because all of them compile all C files recursively in
the src/thirdparty/gpacmp4 directory instead of having an explicit list
of files to compile.
* Change NO_GZIP to GPAC_DISABLE_ZLIB in VS project
Instead of defining NO_GZIP to disable gzip support, define
GPAC_DISABLE_ZLIB, which does the same thing but also prevents the
compiler from trying to use zlib.
* Avoid using GPAC's configuration.h completely
GPAC's configuration.h has a few problems with the defaults that it
sets:
- It defines GPAC_MEMORY_TRACKING on Windows, which switches to an
alternate implementation of malloc, meaning that we would have to pull
in alloc.c
- It causes compilation errors on Linux (see 9164c08979)
This disables using configuration.h by:
- Defining GPAC_HAVE_CONFIG_H to make GPAC use a separate config.h file
instead of the default configuration.h file
- Making an essentially empty config.h file to make attempts to include
it not fail
This commit also removes configuration.h from the repo to make sure we
don't accidentally include it, and removes the _GF_CONFIG_H_ hack from
the previously mentioned commit because we don't need it anymore (its
sole purpose was avoiding using configuration.h).
* Link pthread and dl on Mac and Linux
Add -lpthread and -ldl to link pthread and dl respectively on Mac and
Linux. Needed because the update to GPAC 1.0.1 introduced os_thread.c
(which uses pthread) and os_module.c (which uses dlsym and related
functions).
* Remove unused Remotery.h header file
5c0c9cf71e removed the only use of
Remotery.h in the GPAC files that we pulled in, so there's no need to
keep it around.
* Add GPAC update to changelog
* Fix cmake build error
Building with CMake currently fails because it can't find functions from
dl (dlopen, dlsym, etc.)
* Fix bazel build error
Bazel currently doesn't find the header files in gpac/modules/ when
building gpac, most likely because it isn't searching all directories in
gpac/ recursively for header files
* Define GPAC_HAVE_CONFIG_H in lib_ccx BUILD file
lib_ccx indirectly includes gpac/tools.h, which tries to include
gpac/configuration.h, which was removed in
b46c4e8a2d. This just copies the solution
from that commit to the bazel BUILD file (defining GPAC_HAVE_CONFIG_H so
GPAC uses gpac/config.h instead).
* Link to dl and pthread in bazel GPAC BUILD file
The updated GPAC version requires functions from dl and pthread, which
weren't linked to previously when building with bazel.
* Fix 708 timing issue
Process packet as soon as the packet len is equal to the specified len
* check if cc_valid
* fix formatting
* Check if header is parsed before parsing pkt data
* Fix segfault on Windows
Using the format specifier %d to print out size is technically undefined
behavior, as size is defined as a u64, while %d is meant to print out
ints, which seems to be defined as 32 bits on most machines, and using a
format specifier with the wrong size is undefined behavior. This causes
a segfault on Windows as this apparently causes the wrong pointer to be
passed in for the filename.
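A sketch of the portable way to print a 64-bit size, assuming a message that prints a size followed by a filename. The function `format_report` and its message text are hypothetical; the actual log call is not shown in this commit message.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats "<size> bytes in <filename>" into out; returns snprintf's result. */
int format_report(char *out, size_t out_len, uint64_t size, const char *filename)
{
    /* PRIu64 expands to the right conversion for uint64_t on this platform.
       With "%d", printf would read an int where a 64-bit value was passed and
       could then pick up the wrong bytes for the following %s argument. */
    return snprintf(out, out_len, "%" PRIu64 " bytes in %s", size, filename);
}
```

Casting the value to `unsigned long long` and using `%llu` is an equally valid alternative.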
* Add change to changelog
* Fix -Wunused-result warnings
* Wrap checked writes into a function
* In write_wrapped, continue writing in case of partial write
If a partial write occurs, it doesn't necessarily mean that something
failed, according to write(2). If this is the case, then the following
write will return -1.
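The retry-on-partial-write idea can be sketched as follows for POSIX systems. The name `write_all` and its signature are hypothetical; the real `write_wrapped` may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Writes all `count` bytes of `buf` to `fd`. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf; /* char pointer: arithmetic on void* is not standard C */
    while (count > 0)
    {
        ssize_t written = write(fd, p, count);
        if (written < 0)
        {
            if (errno == EINTR)
                continue; /* interrupted before writing anything; retry */
            return -1;    /* a real failure surfaces here */
        }
        /* A short count is not an error: advance and write the rest. */
        p += written;
        count -= (size_t)written;
    }
    return 0;
}
```

Using a `char` pointer for the arithmetic also avoids MSVC error C2036 ("unknown size"), which is raised for arithmetic on `void *`.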
* Fix build on MSVC
https://stackoverflow.com/questions/37460579/error-c2036-void-unknown-size
Expands the Windows build steps to include DLLs in the artifact, making out-of-the-box use of those artifacts easier. The new artifacts will allow running ccextractor (not the GUI yet) directly.
Improves the build for 32 bit variants.
Contains fixes:
- `/SAFESEH:NO`: needed for linking precompiled ffmpeg-lib libraries
- add paths from $(ProjectDir)libs\lib\ffmpeg-lib and avcodec.lib; avformat.lib; avutil.lib; swscale.lib
- add extra post-build actions to copy libraries
- add $(vcpkg) paths
Adds a (likely non-working) build stage for building with OCR to the Windows GitHub actions, so we can assure that Windows keeps building with OCR just fine.
Makes a small update to the ISSUE_TEMPLATE to clarify instructions for sending samples that cannot be made public.
Co-authored-by: Willem <github@canihavesome.coffee>
* [ISSUE_TEMPLATE.md] Comment out instructions
* [PULL_REQUEST_TEMPLATE.md] Comment out instructions
* Mention in ISSUE_TEMPLATE.md that only useful arguments should be put
* Follow feedback
This was caused by 19241744d7, moving from
`unsigned char` to `enums` for colors and fonts. The problem with this is
that each colour isn't one byte next to each other so memcpy and memset
didn't work anymore.
The problem:
```patch
6812,6813c6812,6813
< EDITION OF AMERICA'S NEXT TOP
< <i> MODEL</i> ON WEDNESDAYS.<i> </i>
---
> EDITION OF<i> AMERICA'S NEXT TOP</i>
> <i> MODEL</i> ON WEDNESDAYS.
6817c6817
< EDITION OF AMERICA'S NEXT TOP
---
> EDITION OF<i> AMERICA'S NEXT TOP</i>
6819c6819
< >><i> THE VAMPIRE DIARIES </i>
---
> >><i> THE VAMPIRE DIARIES</i>
6824,6825c6824,6825
< >><i> THE VA</i>MPIRE DIARIES
< AND<i> THE SECRET CIRCLE </i>
---
> >><i> THE VAMPIRE DIARIES</i>
> AND<i> THE SECRET CIRCLE</i>
6829,6831c6829,6831
< >><i> THE VA</i>MPIRE DIARIES
< AND<i> THE S</i>ECRET CIRCLE
< ON THURSDAYS.<i> </i>
---
> >><i> THE VAMPIRE DIARIES</i>
> AND<i> THE SECRET CIRCLE</i>
> ON THURSDAYS.
6835c6835
< AND<i> THE S</i>ECRET CIRCLE
---
> AND<i> THE SECRET CIRCLE</i>
```
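Why `memset` and `memcpy` stopped working after the switch to enums can be sketched as follows. An enum element is typically wider than one byte, so a byte-wise fill produces the wrong values and a byte count equal to the element count covers only part of the array. The enum and function names here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a color enum; the real names may differ. */
enum color { COL_WHITE = 0, COL_GREEN = 1, COL_BLUE = 2 };

/* Fills a row element by element; correct for any element size.
   memset(row, COL_GREEN, n) would instead set each *byte* to 1. */
void fill_colors(enum color *row, size_t n, enum color value)
{
    for (size_t i = 0; i < n; i++)
        row[i] = value;
}

/* Copies n colors; the byte count is n * sizeof(element), not n. */
void copy_colors(enum color *dst, const enum color *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
}
```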
* file_buffer: Fix uninitialized variable usage warning
Clang warns:
In file included from src/lib_ccx/asf_functions.c:5:
src/lib_ccx/file_buffer.h:76:7: warning: variable 'result' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if (buffer)
^~~~~~
src/lib_ccx/file_buffer.h:86:9: note: uninitialized use occurs here
return result;
^~~~~~
src/lib_ccx/file_buffer.h:76:3: note: remove the 'if' if its condition is always true
if (buffer)
^~~~~~~~~~~
src/lib_ccx/file_buffer.h:73:15: note: initialize the variable 'result' to silence this warning
size_t result;
^
= 0
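The pattern Clang suggests can be sketched like this. `read_into` is a hypothetical function shaped like the code in the warning, not the actual contents of file_buffer.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader: copies `len` bytes from `src` when `buffer` is set. */
size_t read_into(char *buffer, const char *src, size_t len)
{
    size_t result = 0; /* defined on every path, including buffer == NULL */
    if (buffer)
    {
        memcpy(buffer, src, len);
        result = len;
    }
    return result;
}
```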
* common_timing: Fix uninitialized variable usage warning
The vast majority of the code is already using fatal(), so I don't see
why this should be an exception.
Clang warns:
src/lib_ccx/ccx_common_timing.c:274:3: warning: variable 'fts' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
default:
^~~~~~~
src/lib_ccx/ccx_common_timing.c:280:9: note: uninitialized use occurs here
return fts;
^~~
src/lib_ccx/ccx_common_timing.c:261:11: note: initialize the variable 'fts' to silence this warning
LLONG fts;
^
= 0
* encoders: Fix handling of multibyte characters in UTF-8 converter
This is actually incorrect because characters longer than 1 byte will be
butchered.
Clang warns:
src/lib_ccx/ccx_encoders_common.c:178:12: warning: result of comparison of constant 256 with expression of
type 'unsigned char' is always true [-Wtautological-constant-out-of-range-compare]
if (c < 256)
~ ^ ~~~
src/lib_ccx/ccx_encoders_common.c:193:12: warning: result of comparison of constant 256 with expression of
type 'unsigned char' is always true [-Wtautological-constant-out-of-range-compare]
if (c < 256)
~ ^ ~~~
src/lib_ccx/ccx_encoders_common.c:209:12: warning: result of comparison of constant 256 with expression of
type 'unsigned char' is always true [-Wtautological-constant-out-of-range-compare]
if (c < 256)
~ ^ ~~~
src/lib_ccx/ccx_encoders_common.c:229:12: warning: result of comparison of constant 256 with expression of type 'unsigned char' is always true [-Wtautological-constant-out-of-range-compare]
if (c < 256)
~ ^ ~~~
* gxf: Fix tautological comparison warnings
Clang warns:
src/lib_ccx/ccx_gxf.c:425:17: warning: result of comparison of constant 256 with expression of type 'unsigned char' is always false [-Wtautological-constant-out-of-range-compare]
if (tag_len > STR_LEN)
~~~~~~~ ^ ~~~~~~~
src/lib_ccx/ccx_gxf.c:542:17: warning: result of comparison of constant 256 with expression of type 'unsigned char' is always false [-Wtautological-constant-out-of-range-compare]
if (tag_len > STR_LEN)
~~~~~~~ ^ ~~~~~~~
src/lib_ccx/ccx_gxf.c:617:17: warning: result of comparison of constant 256 with expression of type 'unsigned char' is always false [-Wtautological-constant-out-of-range-compare]
if (tag_len > STR_LEN)
~~~~~~~ ^ ~~~~~~~
* gxf: Fix uninitialized variable usage warnings
Clang warns:
src/lib_ccx/ccx_gxf.c:1449:8: warning: variable 'first_field_nb' is used uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
case TRACK_TYPE_MPEG1_525:
^~~~~~~~~~~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:1475:35: note: uninitialized use occurs here
debug("first field number %d\n", first_field_nb);
^~~~~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:28:115: note: expanded from macro 'debug'
^~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:1450:8: warning: variable 'first_field_nb' is used uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
case TRACK_TYPE_MPEG2_525:
^~~~~~~~~~~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:1475:35: note: uninitialized use occurs here
debug("first field number %d\n", first_field_nb);
^~~~~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:28:115: note: expanded from macro 'debug'
^~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:1456:3: warning: variable 'first_field_nb' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
default:
^~~~~~~
src/lib_ccx/ccx_gxf.c:1475:35: note: uninitialized use occurs here
debug("first field number %d\n", first_field_nb);
^~~~~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:28:115: note: expanded from macro 'debug'
^~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:1410:30: note: initialize the variable 'first_field_nb' to silence this warning
unsigned char first_field_nb;
^
= '\0'
src/lib_ccx/ccx_gxf.c:1449:8: warning: variable 'last_field_nb' is used uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
case TRACK_TYPE_MPEG1_525:
^~~~~~~~~~~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:1476:34: note: uninitialized use occurs here
debug("last field number %d\n", last_field_nb);
^~~~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:28:115: note: expanded from macro 'debug'
^~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:1450:8: warning: variable 'last_field_nb' is used uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
case TRACK_TYPE_MPEG2_525:
^~~~~~~~~~~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:1476:34: note: uninitialized use occurs here
debug("last field number %d\n", last_field_nb);
^~~~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:28:115: note: expanded from macro 'debug'
^~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:1456:3: warning: variable 'last_field_nb' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
default:
^~~~~~~
src/lib_ccx/ccx_gxf.c:1476:34: note: uninitialized use occurs here
debug("last field number %d\n", last_field_nb);
^~~~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:28:115: note: expanded from macro 'debug'
^~~~~~~~~~~
src/lib_ccx/ccx_gxf.c:1411:29: note: initialize the variable 'last_field_nb' to silence this warning
unsigned char last_field_nb;
^
= '\0'
* ts_functions: Fix incorrect enumeration type in get_buffer_type
Clang warns:
src/lib_ccx/ts_functions.c:127:10: warning: implicit conversion from enumeration type 'enum ccx_bufferdata_type' to different enumeration type 'enum ccx_stream_type' [-Wenum-conversion]
return CCX_PES;
~~~~~~ ^~~~~~~
src/lib_ccx/ts_functions.c:131:10: warning: implicit conversion from enumeration type 'enum ccx_bufferdata_type' to different enumeration type 'enum ccx_stream_type' [-Wenum-conversion]
return CCX_H264;
~~~~~~ ^~~~~~~~
src/lib_ccx/ts_functions.c:135:10: warning: implicit conversion from enumeration type 'enum ccx_bufferdata_type' to different enumeration type 'enum ccx_stream_type' [-Wenum-conversion]
return CCX_DVB_SUBTITLE;
~~~~~~ ^~~~~~~~~~~~~~~~
src/lib_ccx/ts_functions.c:139:10: warning: implicit conversion from enumeration type 'enum ccx_bufferdata_type' to different enumeration type 'enum ccx_stream_type' [-Wenum-conversion]
return CCX_ISDB_SUBTITLE;
~~~~~~ ^~~~~~~~~~~~~~~~~
src/lib_ccx/ts_functions.c:143:10: warning: implicit conversion from enumeration type 'enum ccx_bufferdata_type' to different enumeration type 'enum ccx_stream_type' [-Wenum-conversion]
return CCX_HAUPPAGE;
~~~~~~ ^~~~~~~~~~~~
src/lib_ccx/ts_functions.c:147:10: warning: implicit conversion from enumeration type 'enum ccx_bufferdata_type' to different enumeration type 'enum ccx_stream_type' [-Wenum-conversion]
return CCX_TELETEXT;
~~~~~~ ^~~~~~~~~~~~
src/lib_ccx/ts_functions.c:151:10: warning: implicit conversion from enumeration type 'enum ccx_bufferdata_type' to different enumeration type 'enum ccx_stream_type' [-Wenum-conversion]
return CCX_PRIVATE_MPEG2_CC;
~~~~~~ ^~~~~~~~~~~~~~~~~~~~
src/lib_ccx/ts_functions.c:155:10: warning: implicit conversion from enumeration type 'enum ccx_bufferdata_type' to different enumeration type 'enum ccx_stream_type' [-Wenum-conversion]
return CCX_PES;
~~~~~~ ^~~~~~~
src/lib_ccx/ts_functions.c:491:24: warning: implicit conversion from enumeration type 'enum ccx_stream_type' to different enumeration type 'enum ccx_bufferdata_type' [-Wenum-conversion]
ptr->bufferdatatype = get_buffer_type(cinfo);
~ ^~~~~~~~~~~~~~~~~~~~~~
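A sketch of the kind of fix these warnings point to: declare the function with the enum type whose constants it actually returns. The two enums below are small hypothetical stand-ins; the real `ccx_stream_type` and `ccx_bufferdata_type` have more members.

```c
#include <assert.h>

/* Hypothetical enums standing in for the stream and buffer data types. */
enum stream_kind { STREAM_MPEG2_VIDEO, STREAM_H264_VIDEO };
enum buffer_kind { BUFFER_UNKNOWN, BUFFER_PES, BUFFER_H264 };

/* The return type names the enum whose constants are returned, so neither
   the return statements nor the caller's assignment convert between enums. */
enum buffer_kind buffer_kind_for(enum stream_kind s)
{
    switch (s)
    {
        case STREAM_MPEG2_VIDEO:
            return BUFFER_PES;
        case STREAM_H264_VIDEO:
            return BUFFER_H264;
    }
    return BUFFER_UNKNOWN;
}
```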
* utility: Fix tautological comparison warnings
Clang warns:
src/lib_ccx/utility.c:605:24: warning: result of comparison of constant 65536 with expression of type 'unsigned short' is always true [-Wtautological-constant-out-of-range-compare]
} else if (utf16_char < 0x010000) {
~~~~~~~~~~ ^ ~~~~~~~~
src/lib_ccx/utility.c:610:24: warning: result of comparison of constant 1114112 with expression of type 'unsigned short' is always true [-Wtautological-constant-out-of-range-compare]
} else if (utf16_char < 0x110000) {
~~~~~~~~~~ ^ ~~~~~~~~
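These comparisons are always true because an `unsigned short` cannot hold a value of 0x10000 or more. One way to make the range checks meaningful is to encode from a full code point held in a wider type. The sketch below is an assumption about a reasonable shape, not the code in utility.c; `utf8_encode` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into out (at least 4 bytes).
   Returns the number of bytes written, or 0 for an invalid code point. */
int utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80)
    {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000)
    {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0; /* beyond U+10FFFF */
}
```

With a 32-bit input, every branch is reachable, so the compiler has nothing tautological to warn about.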
* ocr: Fix floating point -> integer abs() warning
Clang warns:
src/lib_ccx/ocr.c:529:8: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
if(abs(h-h0)>50) // Color has changed
^
src/lib_ccx/ocr.c:529:8: note: use function 'fabsf' instead
if(abs(h-h0)>50) // Color has changed
^~~
fabsf
src/lib_ccx/ocr.c:529:8: note: include the header <math.h> or explicitly provide a declaration for 'fabsf'
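The difference matters near the threshold: `abs()` converts its argument to `int` first, so a hue difference of 50.5 becomes 50 and no longer exceeds 50. A minimal sketch, with `hue_changed` as a hypothetical helper:

```c
#include <assert.h>
#include <math.h>

/* Returns 1 when two hue values differ by more than `threshold`. */
int hue_changed(float h, float h0, float threshold)
{
    /* fabsf keeps the fractional part; abs() would truncate it away. */
    return fabsf(h - h0) > threshold;
}
```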
* encoders: Fix incorrect string types when EIA-608 is in use
Clang warns:
src/lib_ccx/ccx_encoders_helpers.c: In function ‘clever_capitalize’:
src/lib_ccx/ccx_encoders_helpers.c:186:4: warning: case label value exceeds maximum value for type
186 | case 0x89: // This is a transparent space
| ^~~~
* ocr: Fix implicit struct declaration warning
Clang warns:
In file included from src/lib_ccx/dvd_subtitle_decoder.c:10:
src/lib_ccx/ocr.h:18:54: warning: ‘struct encoder_ctx’ declared inside parameter list will not be visible outside of this definition or declaration
18 | char *paraof_ocrtext(struct cc_subtitle *sub, struct encoder_ctx *context);
| ^~~~~~~~~~~
All the SCC and CCD examples I can find have CRLF line endings. VLC and
libavformat (used by MPV) don't care, so just go with the popular
convention and switch to CRLF. There's no reason a user would want to
choose their line endings in this scenario.
It now returns a value like the rest of the printf family. It doesn't
brute force the amount of memory that needs to be allocated.
It also removes a warning.
I do not believe there should be any performance concerns with this
implementation as it is what `glibc` does:
https://code.woboq.org/userspace/glibc/libio/iovdprintf.c.html
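The measure-then-allocate approach can be sketched as follows for POSIX systems. The function name `fd_printf` is hypothetical, since the message does not name the function; the sketch only illustrates sizing the buffer with a first `vsnprintf` pass instead of guessing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical fd-based printf: returns the number of bytes written, or -1 on error. */
int fd_printf(int fd, const char *fmt, ...)
{
    va_list ap, ap_copy;
    va_start(ap, fmt);
    va_copy(ap_copy, ap); /* the argument list is consumed by each pass */

    /* First pass with a NULL buffer only measures the formatted length. */
    int len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0)
    {
        va_end(ap_copy);
        return -1;
    }

    char *buf = malloc((size_t)len + 1);
    if (!buf)
    {
        va_end(ap_copy);
        return -1;
    }
    vsnprintf(buf, (size_t)len + 1, fmt, ap_copy);
    va_end(ap_copy);

    /* A fuller version would loop on partial writes; kept simple here. */
    ssize_t written = write(fd, buf, (size_t)len);
    free(buf);
    return written == (ssize_t)len ? len : -1;
}
```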
* cea708: Fix missing new line in log message
* subtype: Remove unused CC_708 type
CEA-708 inputs are coerced to CC_608 before hitting encode_sub.
GCC warns:
src/lib_ccx/ccx_encoders_common.c: In function ‘encode_sub’:
src/lib_ccx/ccx_encoders_common.c:1119:2: warning: enumeration value ‘CC_708’ not handled in switch [-Wswitch]
1119 | switch (sub->type)
| ^~~~~~
* build: Disable pointer-sign warning
This warning triggers all over the codebase due to the widespread use of
unsigned char arrays for parsed subtitle strings and them being passed
to string functions that expect signed ones. Since this won't actually
cause issues, silence the warning across the entire codebase.
* splitbysentence: Fix warnings
GCC warns:
src/lib_ccx/ccx_encoders_splitbysentence.c: In function ‘sbs_is_pointer_on_sentence_breaker’:
src/lib_ccx/ccx_encoders_splitbysentence.c:170:7: warning: variable ‘p’ set but not used [-Wunused-but-set-variable]
170 | char p = *(current - 1);
| ^
src/lib_ccx/ccx_encoders_splitbysentence.c: In function ‘sbs_find_insert_point_partial’:
src/lib_ccx/ccx_encoders_splitbysentence.c:231:1: warning: multi-line comment [-Wcomment]
231 | // sprintf(fmtbuf, "SBS: sbs_find_insert_point_partial: compare\n\
| ^
src/lib_ccx/ccx_encoders_splitbysentence.c:263:1: warning: multi-line comment [-Wcomment]
263 | // LOG_DEBUG("SBS: sbs_find_insert_point_partial: LEFT CHANGED,\n\tbuf:[%s]\n\tstr:[%s]\n\
| ^
src/lib_ccx/ccx_encoders_splitbysentence.c:297:1: warning: multi-line comment [-Wcomment]
297 | // sprintf(fmtbuf, "SBS: sbs_find_insert_point_partial: REPLACE ENTIRE TAIL !!\n\
| ^
src/lib_ccx/ccx_encoders_splitbysentence.c:222:6: warning: unused variable ‘i’ [-Wunused-variable]
222 | int i; // top level indexer for strings
| ^
src/lib_ccx/ccx_encoders_splitbysentence.c: In function ‘reformat_cc_bitmap_through_sentence_buffer’:
src/lib_ccx/ccx_encoders_splitbysentence.c:730:8: warning: unused variable ‘str’ [-Wunused-variable]
730 | char *str;
| ^~~
src/lib_ccx/ccx_encoders_splitbysentence.c:729:6: warning: unused variable ‘i’ [-Wunused-variable]
729 | int i = 0;
| ^
src/lib_ccx/ccx_encoders_splitbysentence.c:728:6: warning: unused variable ‘used’ [-Wunused-variable]
728 | int used;
| ^~~~
src/lib_ccx/ccx_encoders_splitbysentence.c:727:18: warning: unused variable ‘ms_end’ [-Wunused-variable]
727 | LLONG ms_start, ms_end;
| ^~~~~~
src/lib_ccx/ccx_encoders_splitbysentence.c:727:8: warning: unused variable ‘ms_start’ [-Wunused-variable]
727 | LLONG ms_start, ms_end;
| ^~~~~~~~
src/lib_ccx/ccx_encoders_splitbysentence.c:726:20: warning: unused variable ‘rect’ [-Wunused-variable]
726 | struct cc_bitmap* rect;
| ^~~~
* spupng: Fix warnings
GCC warns:
src/lib_ccx/ccx_encoders_spupng.c: In function ‘init_face’:
src/lib_ccx/ccx_encoders_spupng.c:644:6: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
644 | if (error = FT_New_Face(ft_library, font, 0, face))
| ^~~~~
src/lib_ccx/ccx_encoders_spupng.c:651:6: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
651 | if (error = FT_Set_Pixel_Sizes(*face, 0, FONT_SIZE))
| ^~~~~
src/lib_ccx/ccx_encoders_spupng.c: In function ‘spupng_export_string2png’:
src/lib_ccx/ccx_encoders_spupng.c:698:7: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
698 | if (error = FT_Init_FreeType(&ft_library))
| ^~~~~
src/lib_ccx/ccx_encoders_spupng.c:706:6: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
706 | if (error = init_face(&face_regular, ccx_options.enc_cfg.render_font))
| ^~~~~
src/lib_ccx/ccx_encoders_spupng.c:708:6: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
708 | if (error = init_face(&face_italics, ccx_options.enc_cfg.render_font_italics))
| ^~~~~
src/lib_ccx/ccx_encoders_spupng.c:850:9: warning: unused variable ‘height’ [-Wunused-variable]
850 | int height = slot->bitmap.rows;
| ^~~~~~
src/lib_ccx/ccx_encoders_spupng.c:849:9: warning: unused variable ‘width’ [-Wunused-variable]
849 | int width = slot->bitmap.width;
| ^~~~~
src/lib_ccx/ccx_encoders_webvtt.c: In function ‘write_webvtt_header’:
src/lib_ccx/ccx_encoders_webvtt.c:263:1: warning: control reaches end of non-void function [-Wreturn-type]
263 | }
| ^
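The usual way to address -Wparentheses for an intentional assignment inside a condition is an extra pair of parentheses, optionally with an explicit comparison. A minimal sketch with hypothetical functions (`init_step` stands in for calls such as FT_New_Face):

```c
#include <assert.h>

/* Hypothetical initializer: returns 0 on success, a nonzero code on failure. */
static int init_step(int should_fail)
{
    return should_fail ? 7 : 0;
}

int run_init(int should_fail)
{
    int error;
    /* The inner parentheses and the "!= 0" mark the assignment as intended. */
    if ((error = init_step(should_fail)) != 0)
        return error;
    return 0;
}
```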
* webvtt: Fix missing return warning
The return value of this function is never used, so just drop the
values.
GCC warns:
src/lib_ccx/ccx_encoders_webvtt.c: In function ‘write_webvtt_header’:
src/lib_ccx/ccx_encoders_webvtt.c:263:1: warning: control reaches end of non-void function [-Wreturn-type]
263 | }
| ^
* gxf: Fix MIN macro redefinition warning
GCC warns:
src/lib_ccx/ccx_gxf.c:23: warning: "MIN" redefined
23 | #define MIN(a, b) ( (a < b) ? a : b)
|
In file included from src/lib_ccx/ccx_demuxer.h:8,
from src/lib_ccx/ccx_gxf.h:4,
from src/lib_ccx/ccx_gxf.c:13:
src/lib_ccx/utility.h:8: note: this is the location of the previous definition
8 | #define MIN(X, Y) (((X) < (Y)) ? (X) : (Y))
|
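Besides the redefinition itself, the local definition does not parenthesize its arguments, so `MIN(x & 1, y)` would parse as `x & (1 < y)`. The message does not say how the commit resolved this; one option, sketched here, is to guard the definition and keep the fully parenthesized form. `clamp_len` is a hypothetical caller.

```c
#include <assert.h>

/* Define MIN only if no earlier header did, and parenthesize every argument use. */
#ifndef MIN
#define MIN(X, Y) (((X) < (Y)) ? (X) : (Y))
#endif

/* Clamps a requested length to the space available. */
int clamp_len(int requested, int available)
{
    return MIN(requested, available);
}
```

Simply deleting the local definition and relying on the one from utility.h would work as well.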
* dvd: Fix unused variable warnings
GCC warns:
src/lib_ccx/dvd_subtitle_decoder.c: In function ‘get_bitmap’:
src/lib_ccx/dvd_subtitle_decoder.c:133:9: warning: unused variable ‘discard’ [-Wunused-variable]
133 | int discard = get_bits(ctx, &nextbyte, &pos, &m);
| ^~~~~~~
src/lib_ccx/dvd_subtitle_decoder.c:172:9: warning: unused variable ‘discard’ [-Wunused-variable]
172 | int discard = get_bits(ctx, &nextbyte, &pos, &m);
| ^~~~~~~
src/lib_ccx/dvd_subtitle_decoder.c: In function ‘write_dvd_sub’:
src/lib_ccx/dvd_subtitle_decoder.c:320:6: warning: unused variable ‘ret’ [-Wunused-variable]
320 | int ret =0;
| ^~~
* es_functions: Fix unused variable warning
This also removes the stale commented code that used this variable.
GCC warns:
src/lib_ccx/es_functions.c: In function ‘read_pic_info’:
src/lib_ccx/es_functions.c:682:7: warning: unused variable ‘frame_type_to_char’ [-Wunused-variable]
682 | char frame_type_to_char[] = { '?', 'I', 'P','B', 'D', '?', '?','?' };
| ^~~~~~~~~~~~~~~~~~
* dvb: Fix unused variable warning when OCR is disabled
GCC warns:
src/lib_ccx/dvb_subtitle_decoder.c: In function ‘write_dvb_sub’:
src/lib_ccx/dvb_subtitle_decoder.c:1509:6: warning: unused variable ‘ret’ [-Wunused-variable]
1509 | int ret = 0;
| ^~~
* general_loop: Fix warnings
GCC warns:
src/lib_ccx/general_loop.c: In function ‘general_loop’:
src/lib_ccx/general_loop.c:1113:15: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
1113 | (enc_ctx && (enc_ctx->srt_counter || enc_ctx->cea_708_counter) ||
| ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
At top level:
src/lib_ccx/general_loop.c:25:28: warning: ‘DO_NOTHING’ defined but not used [-Wunused-const-variable=]
25 | const static unsigned char DO_NOTHING[] = {0x80, 0x80};
| ^~~~~~~~~~
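For the first warning, `&&` already binds tighter than `||`, so adding parentheses does not change behavior; it only makes the grouping explicit. A sketch with a hypothetical struct standing in for the encoder context:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters standing in for the encoder context fields. */
struct counters
{
    int srt;
    int cea708;
};

/* Returns 1 when the encoder produced output or a fallback flag is set. */
int produced_output(const struct counters *enc, int fallback)
{
    /* Explicit grouping: (A && B) || C. */
    return (enc && (enc->srt || enc->cea708)) || fallback;
}
```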
* networking: Fix unknown pragma warning for non-MSVC compilers
GCC warns:
src/lib_ccx/networking.c:22: warning: ignoring #pragma warning [-Wunknown-pragmas]
22 | #pragma warning( suppress : 4005)
|
* networking: Fix unused variable warnings on non-Windows platforms
GCC warns:
src/lib_ccx/networking.c: In function ‘net_udp_read’:
src/lib_ccx/networking.c:342:12: warning: variable ‘addr’ set but not used [-Wunused-but-set-variable]
342 | in_addr_t addr;
| ^~~~
src/lib_ccx/networking.c:340:12: warning: unused variable ‘len’ [-Wunused-variable]
340 | socklen_t len = sizeof(source_addr);
| ^~~
src/lib_ccx/networking.c:338:7: warning: unused variable ‘ip’ [-Wunused-variable]
338 | char ip[INET_ADDRSTRLEN];
| ^~
* params: Fix unused variable warning when OCR is disabled
GCC warns:
src/lib_ccx/params.c: In function ‘version’:
src/lib_ccx/params.c:1015:8: warning: unused variable ‘leptversion’ [-Wunused-variable]
1015 | char *leptversion;
| ^~~~~~~~~~~
* params_dump: Fix empty encoding when ASCII is used
GCC warns:
src/lib_ccx/params_dump.c: In function ‘params_dump’:
src/lib_ccx/params_dump.c:110:2: warning: enumeration value ‘CCX_ENC_ASCII’ not handled in switch [-Wswitch]
110 | switch (ccx_options.enc_cfg.encoding)
| ^~~~~~
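A `-Wswitch` warning like the one above usually goes away once every enumerator has its own `case`. The sketch below uses a hypothetical `enum text_encoding` and `encoding_name` helper rather than the real `ccx_options` types, and the label strings are illustrative only.

```c
/* Hypothetical enum mirroring the idea of an encoding setting. */
enum text_encoding { ENC_UNICODE, ENC_LATIN_1, ENC_UTF_8, ENC_ASCII };

/* Every enumerator is handled, so -Wswitch stays quiet and the ASCII
   case no longer yields an empty label. No `default:` is used on purpose:
   the compiler can then flag any enumerator added later. */
const char *encoding_name(enum text_encoding e)
{
    switch (e) {
    case ENC_UNICODE: return "Unicode";
    case ENC_LATIN_1: return "Latin-1";
    case ENC_UTF_8:   return "UTF-8";
    case ENC_ASCII:   return "ASCII";
    }
    return "unknown"; /* only reached for out-of-range values */
}
```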
* params_dump: Fix comparison between mismatching enums
GCC warns:
src/lib_ccx/params_dump.c: In function ‘print_file_report’:
src/lib_ccx/params_dump.c:402:18: warning: comparison between ‘enum ccx_stream_type’ and ‘enum ccx_stream_mode_enum’ [-Wenum-compare]
402 | (info->stream == CCX_SM_TRANSPORT ||
| ^~
src/lib_ccx/params_dump.c:403:18: warning: comparison between ‘enum ccx_stream_type’ and ‘enum ccx_stream_mode_enum’ [-Wenum-compare]
403 | info->stream == CCX_SM_PROGRAM ||
| ^~
src/lib_ccx/params_dump.c:404:18: warning: comparison between ‘enum ccx_stream_type’ and ‘enum ccx_stream_mode_enum’ [-Wenum-compare]
404 | info->stream == CCX_SM_ASF ||
| ^~
src/lib_ccx/params_dump.c:405:18: warning: comparison between ‘enum ccx_stream_type’ and ‘enum ccx_stream_mode_enum’ [-Wenum-compare]
405 | info->stream == CCX_SM_WTV))
| ^~
* telxcc: Fix unused variable warning
GCC warns:
src/lib_ccx/telxcc.c: In function ‘process_telx_packet’:
src/lib_ccx/telxcc.c:928:10: warning: unused variable ‘flag_subtitle’ [-Wunused-variable]
928 | uint8_t flag_subtitle;
| ^~~~~~~~~~~~~
* ts_functions: Fix unused variable warnings
GCC warns:
src/lib_ccx/ts_functions.c: In function ‘get_pts’:
src/lib_ccx/ts_functions.c:642:11: warning: variable ‘pes_packet_length’ set but not used [-Wunused-but-set-variable]
642 | uint16_t pes_packet_length;
| ^~~~~~~~~~~~~~~~~
src/lib_ccx/ts_functions.c:641:10: warning: variable ‘pes_stream_id’ set but not used [-Wunused-but-set-variable]
641 | uint8_t pes_stream_id;
| ^~~~~~~~~~~~~
* ts_tables_epg: Fix warnings
GCC warns:
src/lib_ccx/ts_tables_epg.c: In function ‘EPG_add_event’:
src/lib_ccx/ts_tables_epg.c:380:6: warning: unused variable ‘isnew’ [-Wunused-variable]
380 | int isnew=true, j;
| ^~~~~
src/lib_ccx/ts_tables_epg.c: In function ‘EPG_DVB_decode_string’:
src/lib_ccx/ts_tables_epg.c:469:6: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable]
469 | int ret=-1;
| ^~~
src/lib_ccx/ts_tables_epg.c: In function ‘EPG_ATSC_decode_EIT’:
src/lib_ccx/ts_tables_epg.c:802:25: warning: variable ‘emt_location’ set but not used [-Wunused-but-set-variable]
802 | uint8_t title_length, emt_location;
| ^~~~~~~~~~~~
src/lib_ccx/ts_tables_epg.c:764:10: warning: variable ‘table_id’ set but not used [-Wunused-but-set-variable]
764 | uint8_t table_id;
| ^~~~~~~~
src/lib_ccx/ts_tables_epg.c: In function ‘EPG_ATSC_decode_VCT’:
src/lib_ccx/ts_tables_epg.c:837:10: warning: variable ‘table_id’ set but not used [-Wunused-but-set-variable]
837 | uint8_t table_id;
| ^~~~~~~~
src/lib_ccx/ts_tables_epg.c: In function ‘EPG_DVB_decode_EIT’:
src/lib_ccx/ts_tables_epg.c:883:10: warning: variable ‘segment_last_section_number’ set but not used [-Wunused-but-set-variable]
883 | uint8_t segment_last_section_number;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
src/lib_ccx/ts_tables_epg.c:882:10: warning: variable ‘last_section_number’ set but not used [-Wunused-but-set-variable]
882 | uint8_t last_section_number;
| ^~~~~~~~~~~~~~~~~~~
src/lib_ccx/ts_tables_epg.c: In function ‘parse_EPG_packet’:
src/lib_ccx/ts_tables_epg.c:1041:11: warning: unused variable ‘transport_error_indicator’ [-Wunused-variable]
1041 | unsigned transport_error_indicator = (tspacket[1]&0x80)>>7;
| ^~~~~~~~~~~~~~~~~~~~~~~~~
* matroska: Fix unused variable warning
The call is left alone since it might create a decoder context.
GCC warns:
src/lib_ccx/matroska.c: In function ‘matroska_save_all’:
src/lib_ccx/matroska.c:1182:27: warning: unused variable ‘dec_ctx’ [-Wunused-variable]
1182 | struct lib_cc_decode *dec_ctx = update_decoder_list(mkv_ctx->ctx);
| ^~~~~~~
* utility: Only define MIN when necessary
GCC warns:
In file included from src/lib_ccx/ccx_demuxer.h:8,
from src/lib_ccx/lib_ccx.h:15,
from src/gpacmp4/mp4.c:6:
src/lib_ccx/utility.h:8: warning: "MIN" redefined
8 | #define MIN(X, Y) (((X) < (Y)) ? (X) : (Y))
|
In file included from src/gpacmp4/gpac/tools.h:33,
from src/gpacmp4/gpac/isomedia.h:50,
from src/gpacmp4/mp4.c:5:
src/gpacmp4/gpac/setup.h:324: note: this is the location of the previous definition
324 | #define MIN(X, Y) ((X)<(Y)?(X):(Y))
|
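One common way to avoid this kind of redefinition is an `#ifndef` guard, so the macro is only defined when no earlier header has provided it. This is a sketch of that pattern; `clamp_len` is a hypothetical helper added only to show the macro in use.

```c
/* Define MIN only if an earlier header (GPAC's setup.h in the warning
   above) has not already provided it. */
#ifndef MIN
#define MIN(X, Y) (((X) < (Y)) ? (X) : (Y))
#endif

/* Hypothetical caller: never copy more than the space available. */
int clamp_len(int wanted, int available)
{
    return MIN(wanted, available);
}
```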
* Implement subtitle modification for all 608 encoders
This is done by modifying the subtitles in `ccx_encoders_common.c`
rather than per encoder.
* Use `char *` instead of subtitle data to capitalize
* Implement subtitle modification for OCR encoders
* Remove signedness warnings
* Remove two-word profanity
They do not work for the moment
* Deal with different encoding
* Mention in changelog
* scc: Reformat control code list
- Separate sections with a blank line
- Align with 4-wide tabs rather than spaces
- Rewrite some comments
* scc: Revamp control code handling
This can be made much more readable by adding a small info struct that
contains all the information about a control code (first byte odd &
even, second byte, and assembly). Information is stored in and retrieved
from an array, created using an array initializer with the enum values
as indices.
This allows us to remove the massive switch-case blocks, leading to much
cleaner and more streamlined code.
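A minimal sketch of the table-driven approach described above, not the actual CCExtractor code. The enum subset, the field names, the `"{RCL}"`-style assembly strings and the reading of "odd & even" as the caption channel number are all assumptions; the byte values are the standard CEA-608 field 1 codes without parity.

```c
/* Hypothetical subset of control codes. */
enum control_code { RCL, EDM, EOC, CONTROL_CODE_MAX };

struct control_code_info {
    unsigned char byte1_odd;  /* first byte on odd channels (assumed) */
    unsigned char byte1_even; /* first byte on even channels (assumed) */
    unsigned char byte2;      /* second byte, same for both */
    const char *assembly;     /* textual form, illustrative only */
};

/* Designated initializers keep each entry next to its enum value,
   which replaces a switch-case block per lookup. */
const struct control_code_info control_codes[CONTROL_CODE_MAX] = {
    [RCL] = {0x14, 0x1c, 0x20, "{RCL}"},
    [EDM] = {0x14, 0x1c, 0x2c, "{EDM}"},
    [EOC] = {0x14, 0x1c, 0x2f, "{EOC}"},
};

unsigned char control_first_byte(enum control_code code, int channel)
{
    const struct control_code_info *info = &control_codes[code];
    return (channel % 2 == 1) ? info->byte1_odd : info->byte1_even;
}

unsigned char control_second_byte(enum control_code code)
{
    return control_codes[code].byte2;
}
```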
* scc: Fix character pair writing
The space was being inserted in the wrong position, so the first
character of each caption was being cut off. The last character was also
cut off in captions with even lengths.
Reported-By: Nils ANDRÉ-CHANG <nils@nilsand.re>
* scc: Apply pair writing to control codes
The same mandatory pair logic applies here.
* scc: Fix timing and lingering captions
- Write EDM codes at end times to clear them from the screen as intended
by the captioners
- Show captions at the correct times:
- EOC+ENM *shows* the caption. It doesn't clear it -- that's EDM's job.
- The caption is *not* shown immediately after loading. EOC (End Of
Caption) is required for it to actually show.
Old behavior:
Start time: Load caption
End time: Show loaded caption
New behavior:
Start time: Load and show caption
End time: Clear displayed caption
These changes fix the issue where captions were always one line off --
that is, caption 1 would show when caption 2 was supposed to show.
* scc: Calculate frame number using a more precise frame rate
* scc: Fix timecode format specifiers
These ints are unsigned.
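The last two scc items can be illustrated together. This sketch assumes the "more precise frame rate" is NTSC 29.97 fps expressed as 30000/1001, which the commit message does not state, and it ignores drop-frame numbering; `frame_in_second` and `format_timecode` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Frame index within the current second for a millisecond timestamp,
   assuming 30000/1001 fps rather than a rounded 29.97. */
unsigned frame_in_second(unsigned long long ms)
{
    unsigned long long ms_part = ms % 1000ULL;
    return (unsigned)(ms_part * 30000ULL / (1001ULL * 1000ULL));
}

/* Writes HH:MM:SS:FF. All fields are unsigned, so %02u is the matching
   specifier (not %02d). Returns snprintf's result. */
int format_timecode(char *buf, size_t size, unsigned long long ms)
{
    unsigned h = (unsigned)(ms / 3600000ULL);
    unsigned m = (unsigned)(ms / 60000ULL % 60ULL);
    unsigned s = (unsigned)(ms / 1000ULL % 60ULL);
    return snprintf(buf, size, "%02u:%02u:%02u:%02u",
                    h, m, s, frame_in_second(ms));
}
```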
* ocr: Fix minor memory leak
Detected by Valgrind:
==1203168== 2,880 bytes in 57 blocks are definitely lost in loss record 3 of 4
==1203168== at 0x483877F: malloc (vg_replace_malloc.c:309)
==1203168== by 0x51ADBEE: strdup (in /usr/lib/libc-2.30.so)
==1203168== by 0x24D1F8: ocr_bitmap (ocr.c:569)
==1203168== by 0x24E25B: ocr_rect (ocr.c:907)
==1203168== by 0x284832: write_dvb_sub (dvb_subtitle_decoder.c:1665)
==1203168== by 0x284B7A: dvbsub_handle_display_segment (dvb_subtitle_decoder.c:1720)
==1203168== by 0x285024: dvbsub_decode (dvb_subtitle_decoder.c:1828)
==1203168== by 0x2406AF: process_data (general_loop.c:648)
==1203168== by 0x2416D0: general_loop (general_loop.c:1025)
==1203168== by 0x1AC89A: api_start (ccextractor.c:214)
==1203168== by 0x16EC03: main (ccextractor.c:536)
* changes: Document OCR memory leak fix
* eia608: Re-use constant rather than hard-coding length in arrays
Hard-coding them is less clear and more prone to breakage.
* eia608: Add and use constant for max number of rows
Hard-coding it everywhere is unclear and prone to breakage.
* eia608: Initialize colors and fonts properly with a loop
memset fills memory one byte at a time; an enum is typically the size of
an int, so using memset to fill an array of enum values is incorrect.
Fix it by using a simple loop to fill the elements, as there is no
memset-like function for arbitrary item lengths in C.
GCC warns:
src/lib_ccx/ccx_decoders_608.c: In function ‘clear_eia608_cc_buffer’:
src/lib_ccx/ccx_decoders_608.c:111:3: warning: ‘memset’ used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
111 | memset(data->colors[i], context->settings->default_color, CCX_DECODER_608_SCREEN_WIDTH + 1);
| ^~~~~~
src/lib_ccx/ccx_decoders_608.c:112:3: warning: ‘memset’ used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
112 | memset(data->fonts[i], FONT_REGULAR, CCX_DECODER_608_SCREEN_WIDTH + 1);
| ^~~~~~
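A minimal sketch of the loop-based fill described above. The enum, the width constant and `fill_colors` are hypothetical stand-ins for the real 608 decoder types.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the decoder's colour enum and screen width. */
enum sketch_color { SKETCH_WHITE, SKETCH_GREEN, SKETCH_BLUE, SKETCH_CYAN };
#define SKETCH_SCREEN_WIDTH 32

/* memset would write `value` into every byte, which only works when the
   element is one byte wide. Assigning element by element is correct for
   any element size. */
void fill_colors(enum sketch_color *row, size_t count, enum sketch_color value)
{
    for (size_t i = 0; i < count; i++)
        row[i] = value;
}
```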
Adds a GitHub Action that will build CCExtractor for Windows with msbuild. It will build in Release mode and Debug mode, without OCR or other features enabled.
* Fix free segfault
I restricted the scope and used free because the features of freep
aren't needed here.
Restricting the scope makes it clear when freeing the variable should be
done.
* Mention that freeing should be done
* Fix indentation, use switch instead of if
* Remove confusing comment
Enums are abstractions and should be used as such. They shouldn't be
used like integers.
* Return a const char* instead of char * allocated on heap
* Test return value inline
* Add SCC output
* Add CCD format
* Add channel header to CCD
* Return const pointer
* Revert formatting change
* Colour -> Color
* Fix formatting
* Move comment to relevant place
* Improve readability
* Fix formatting
* Fix erroneous comment
* Use different parity function not requiring GNU extension
* Use enum instead of int
* Fix bug
* Implement channel functionality
* Fix CI errors
* Fix CI build
* Add options to help menu
* Mention change in changelog
* Add file to build systems
* Remove unneeded link against zlib
* Remove the use of <stdbool.h> and use const char
* Rewrite SCC formatter
* Use fdprintf
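For the item above about a parity function that does not require a GNU extension, one portable option is to fold the bits with XOR instead of calling `__builtin_parity`. This is a sketch under that assumption, not the function the commit actually added; both function names are hypothetical.

```c
/* Returns 1 when b has an odd number of set bits, using only standard C. */
int has_odd_parity(unsigned char b)
{
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1;
}

/* CEA-608 bytes carry odd parity in bit 7: set it when the low seven bits
   contain an even number of ones. */
unsigned char add_odd_parity(unsigned char b)
{
    b &= 0x7f;
    return has_odd_parity(b) ? b : (unsigned char)(b | 0x80);
}
```

With these helpers the pair 0x14 0x20 becomes 0x94 0x20, which matches the familiar `9420` seen in SCC files.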
* Added the option to disable timestamps for WebVTT
* Mentioned in changelog
* Added the option to params.c
* Encoder checks its context now
* Encoder checks its context
* Fix indentation
* Calculate subs_delay in encode_sub rather than in the individual encoders
Fix #1103
* Use precalculated times when sub->type == CC_TEXT
* Use calculate delay in encode_sub when sub->type == CC_608
* Added underline support
* Added changes to CHANGES.TXT
* Delete CHANGES.TXT~
* Delete .CHANGES.TXT.un~
* Update CHANGES.TXT
* Changed strncpy to memcpy when the size of the data being transferred is known
* Add declaration of struct image_copy before function
* Used strdup for duplicating strings
* Added error checking for strdup
* Added support for <i> and <b> tags
* Deleted code support bold
* Added -italics flag to specify italics font
* Added function for initializing freetype font face objects
* Added support for color
* Make header respect `-lf`
* [ccx_encoders_webvtt.c] Use the ternary operator to select line endings
* Use sprintf for choosing line ending and use ternary operator
* Revert
For people new to the software it can be a challenge to use it for the first time. By adding this to the README they can see the file formats supported and how the software works without having to search for their own file. This will be especially helpful to the many new GCI students who likely don't have much experience in the TV industry but want to learn how the software works.
- Improve the structure of package installation command to make it easy to copy and paste
- Improve the formatting of code blocks by mentioning language as specified by MD
* no styling unless in full mode
* part 1 of moving style to here
* no style header unless requested with webvtt-full
* only one new line to support x-timestamp-map
* move x-timestamp-map up to abide by specifications
and support ffmpeg and brightcove
* remove stray new line, crlfs are added upstream
297 seems to contain a null bug
* don't write null characters to sub file
* needed space after -full mode style
* typo
* Fix macOS travis build and remove linux builds.
* Add Apple logo for macOS build badge.
* Link the apple logo to travis build.
* Correct redundant compiler type.
* [FIX] Fix incorrect comparison of strings for AVC codec id in .mkv
* Initial work on adding DVB support to .mkv
* [REQUEST] Finished adding support for DVB inside MKV (#1000)
* Update CHANGES.TXT
At least in Ubuntu 18.04 (possibly the related Debian version and newer Ubuntus) the package `tesseract-ocr-dev` does not exist anymore. It was replaced by `libtesseract-dev`.
* Fixed the icon file not found error for windows and linux.
* Optimized distribution of icons and removed CSW dependency while running GUI
* Font and icons are now loaded directly from memory
* Added source icons and icons to C array convertor
* Destroy pix after use and release memory
* Free the frame and any dynamically allocated objects in it
* Fix typographical error
* Free the packet that was allocated by av_read_frame
* Add missing declarations
* update for windows priority
* Update ccx_decoders_708.h
* to solve timing issue bugs
one of many instances where data is received without any window defined in decoder.
* Update ccx_decoders_708.c
Currently the instructions to install with hardsubx are vague and a new method was added in PR #966
This method makes installation with HARDSUBX easy and hence has been added to the documentation.
There are no parameters as o1 or o2 in ccextractor but they have been mentioned in the help.
This commit removes such instances from file(s).
Change severity: trivial
* Add Updated GPAC
File changes have been directly inserted from libGPAC master into ccextractor's libGPAC.
This has resulted in the removal of multiple custom functions and in minor changes. These will be rectified in the next step of the update.
Change severity: Very High
* Update libGPAC dependency
We use libGPAC for all our MP4 operations, and this commit updates it to the latest version.
All previous changes to the original library were restored after the straight file update, and bugs have been removed.
change severity: very high
* Add Guide To Updating Dependencies
A small textual guide on how to update dependencies easily and efficiently.
strdup will give a segmentation fault if the argument passed to it is
NULL. TessResultIteratorGetUTF8Text returns a char* which can be NULL
and we should not call strdup directly over it. Once we check if the
value returned is not NULL, then we can call strdup.
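The check described above can be sketched as a small wrapper. `dup_or_null` is a hypothetical helper; it uses `malloc` plus `memcpy` in place of `strdup` so that the example compiles as strict ISO C, and the caller passes whatever pointer the OCR library returned.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicates text only when it is non-NULL. Returns NULL for a NULL input
   or when allocation fails; the caller frees the result with free(). */
char *dup_or_null(const char *text)
{
    if (text == NULL)
        return NULL;

    size_t len = strlen(text) + 1; /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}
```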
* [IMPROVEMENT] Add LICENSE File
We should be adding a LICENSE File to the root of the project. We do mention that we follow GPL v2 and hence can include its declaration file.
* Rename LICENSE to LICENSE.txt
The start and end timestamps of extracted burned in captions are flawed
and off by a large difference. Also, the start time of the first burned
in caption extracted is always zero, which is not always the case. And
the extracted captions always appear in continuous timestamps.
This commit improves the start and end timestamps of the extracted
burned in captions and reduces the error significantly, bringing the
timestamps fairly close to the actual timings as they appear in the
media file.
Add instructions to make the installation systemwide (on Linux) which can allow CCExtractor to be used from anywhere with just the below command in terminal:
`ccextractor [videofile]`
This commit adds some checks to avoid segmentation faults.
* In `add_cc_sub_text()`, strdup will cause a segfault if we duplicate a
NULL string.
* In `init_encoder()`, initialize pointer fields to NULL to avoid random
addressing so we can avoid illegal memory accessing and segfaults in
other places.
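A minimal sketch of the second point, assuming a simplified context: `struct encoder_sketch` and its fields are hypothetical and much smaller than the real encoder context. Setting every pointer field to NULL at creation means later cleanup code can call `free` unconditionally.

```c
#include <stdlib.h>

/* Hypothetical, reduced encoder context. */
struct encoder_sketch {
    char *buffer;
    char *first_input_file;
    unsigned capacity;
};

struct encoder_sketch *encoder_sketch_new(void)
{
    struct encoder_sketch *enc = malloc(sizeof *enc);
    if (enc == NULL)
        return NULL;
    /* malloc leaves the memory indeterminate, so set each field. */
    enc->buffer = NULL;
    enc->first_input_file = NULL;
    enc->capacity = 0;
    return enc;
}

void encoder_sketch_free(struct encoder_sketch *enc)
{
    if (enc == NULL)
        return;
    free(enc->buffer);           /* free(NULL) is a no-op */
    free(enc->first_input_file);
    free(enc);
}
```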
To build ccextractor with hardsubx support on linux, we need to configure
ccextractor with the `-enable-hardsubx` switch along with the
`ENABLE_HARDSUBX` flag passed during compilation with make. This commit
adds the missing configure instruction.
* Added tags file and removed the previously wrongly written file
* Added .vscode to visual code section
* Added the .tags in .gitignore
* Changed *.tags to *.tags*
write_dvb_sub(): Test for out of bounds and report details when this happens. Still doesn't fix the underlying issue but will help figure it out.
ocr.c: Solve malloc()/delete[] combinations that happened when operating on tesseract output. Now a malloc()'ed copy is immediately made, tesseract's results are unallocated using tesseract's delete function, and we continue using our own copy which is later free()'d.
ccx_encoders_srt.c: Made sure a pointer is non-NULL before dereferencing.
dvb_subtitle_decoder.c: Initialize pointer members to NULL when creating a structure.
lib_ccx.c: Initialize (memset 0) structure cc_subtitle after memory allocation.
README.TXT: Removed reference to sourceforge.
* Implementation of text renderer
* Fix some characters being cut
* Fix encoding and other bugs
* Add black background & fix bugs
* Fix more bugs
* Change to relative path
* Add a font option & Default font for MacOS & Fix anti-aliasing
* Document -font & enlarge default canvas
* Function headers in the "get_more_data" functions were standardized.
* Unnecessary stream_mode check inside the while loop was removed.
* terminate_asap if condition was moved to while condition.
* Unnecessary condition was removed.
compilation warnings before:
../src/gpacmp4/avilib.c:35:0: warning: "PACKAGE" redefined
#define PACKAGE "GPAC/avilib"
<command-line>:0:0: note: this is the location of the previous definition
../src/gpacmp4/avilib.c:36:0: warning: "VERSION" redefined
#define VERSION GPAC_FULL_VERSION
* Update README.md
* Delete README.MAC.TXT
No longer accurate given work done to integrate Mac into build processes.
* Change to use project's PNG/ZLIB libraries
* Fix Mac build command
Makes OCR an optional parameter
Adds python API file to build
* Update README.md
When the -out=raw option is used, the ccextractor jumped to spupng output
format, generating broken files in spupng format without CC data.
With this fix, now it generates CC data in McPoodle's Broadcast format.
* Removed all extractors except the grid extractor.
Removed the call to transcript extractor in ccx_encoders_transcript.c
* Removed unnecessary array appending statements in python_grid_extractor.
WIP: switch in extractor.
* Added switch in g608 grid extractor.
* Deleted comments from wrappers.
* Refactored code in ccextractor.c and .h files.
Removed all the commented part.
Made proper changes according to the coding conventions.
* Removed calls to extractor from all the encoders.
The only call made to extractor is from ccx_encoders_python.c.
* Removed a comment from wrapper.c.
In init_write function of output.c added a call to free the output string returned by asprintf in case of
sending filename to callback function.
* Added calls to free the char* which is malloced by asprintf in
extractor.c
WIP: Free the global variable elements.
* Sample testing correctly for italics tag.
Also added a hack to print only 32 characters when unicode fails.
WIP: Font tag.
* Added support for handling font and italics in Python SRT generator.
* modified the font generator.
Also, added count method for checking blank strings in
python_srt_generator.
* Added free statements for avoiding memory leaks.
* added return code for failure of asprintf calls.
* Removing unnecessary code from api_testing.py
* Made modifications to Makefile and build script.
* Added recursive_tester.py
Autoconf builds successfully.
* BUG: Made change to get_line_encoded to encode the last \0 character in a
line. Otherwise the EOL character is absent, causing garbage values to be
present in the SRT.
* Exporting the encoding of the captions from CCExtractor to Python so
that the python SRT generator can generate proper SRT files.
* Modified the include statement in extractor.h
* added python_extract to encoders_srt and the captions are being
extracted in needed format. Search for an alternative to asprintf
* Checking if the alternative to asprintf generate proper srts
* CC captions accessible via python script
* Removing python caption code from __wrap_write function
* removing old cc_to_python functions
* Removing python_subs structure and all the changes done for that struct
* Removing filename functions from ccextractor.*
* Renaming make_message to time_wrapper
* Applying to python_extract codebase: SSA format
* Added python_extract_time_based and done validation for ssa
* Applying python_extract_time_based: Done validation for srt and webvtt
* led attempt for SAMI support of python_extract. Code is commented
* Applying python_extract_time_based: validate support for SMPTETT
* Added python_extract_transcript and made changes for time printing.
* added show_extracted_captions_with_timings function
* Added show_extracted_captions_with_timings to python script for testing
purpose.
* refactored extractors to api directory. commented out show captions in main()
* build and build library working for the extractors.
* made caption generator work with a 0.1 time sleep. Start refactoring
* added asprintf for windows.
* file being written in the running directory
* Auto-deletion of python temporary file
* Python captions printing status set to proper.
* termination of tail successful
* Writing successful for the sample
* Generating unalternating output
* adding api_support.py
* Adding bld_flags in build_api
* Added to build_library
* Auto deletion of temporary file on SIGINT
* Discussing Seg fault with Izaron
* working for python and linux with samples. testing -out=pythonapi with stream
* Done adding bitmap support
* added -out=pythonapi support for bitmap
* Setting the messages_target to 0 for output = pythonapi
* Added wrapper for setting -out=pythonapi. Checking if -stdout value can be used in python.
* adding the cc_to_stdout=1 value for -out=pythonapi. Thus generation of output file has been avoided. May be needed to change in future.
* added extractor for g608 grid. removed sami extractor. need to work on overlap of -out=pythonapi and -out=g608
* Removed overlap of -out=pythonapi by adding -pythonapi and
signal_python_api global variable.
* added support for separate c608 grid catching. Need to test the output
via python.
* added support for separate printing of text font and color in CE608.
Need to make sure that the function is inbuilt.
* ADDED ce608 GRID SUPPORT FROM PYTHON
need to discuss whether to keep the print_cc_grid function specific to
the module or make it user accessible.
Mostly it would be better to make it user accessible.
* made changes in the call_from_python_api function such that only
api_options is needed to be passed.
An if statement before the call to g608_extractor has also been added.
Waiting for Carlos to comment on the output generated till this stage.
* added a signal_python_api check before calling every write function.
Thus basic writing output can be avoided.
* Commented all calls to python_extract_time_based.
making changes to python_extract_g608 to be called only from the point
when a g608 caption is detected.
* Added pass_cc_buffer_to_python in encoders_common.c temporarily
redefined get_*_encoded from static to normal
included the above functions in encoders_common.h
* Added if-else statement for switch in encode_sub function.
This is done mainly for making sure no output is generated in the api
call.
* Added ccx_encoders_python.c
Defined pass_cc_buffer_to_python in ccx_encoders_python.c
added if else statement in encode_sub's switch to make sure that the output is not generated in case of -pythonapi call
* Removed __wrap_write from the entire code base.
Its declaration and definition are only present in CCExtractor.*
* Commented out the /dev/null part in ccx_encoders_common.c.
Proceeding further on checking for file generation.
* Added output_filename in array global variable and is generated in
init_write function.
included ccextractor.h in output.c to access global variable
signal_python_api for avoiding output generation in init_write and
invalid free in dinit_write.
* Modified the definition of init_write function for accessing
signal_python_api.
* Deleted the commented part of /dev/null in ccx_encoders_common.c.
* Added target_message=0 in -pythonapi param parsing in param.c to avoid
the API from printing to STDOUT.
Deleted the commented part of -out=pythonapi.
Thinking of adding a different param for silencing the output when the
call is made from python api.
* Removed __wrap_write from ccextractor.c and ccextractor.h.
* Added ccx_to_python_g608 and modified api_support.py file.
added documentation in ccextractor.c.
* added the generate srt script. However, some random characters are
coming in first line. Need to talk about this.
* Added SRT generator for python.
Using string to remove the garbage value.
Add code for srt counter and also the start_time and end_time
conversion.
* removed the trash characters and added code to print the timings.
However, the last blank frame also results in a print. Need to take care
of this.
* rectified the mistake of writing only timings and not captions.
now next step is to just make the timings print properly
* some minor changes before diving into extracting srt_counter from the made codebase
* Added extraction of srt_counter in python_extract via fflush
srt_counter-value.
Need to modify the processing in python.
* Added the entire method to extract captions and generate srt files. The next step would be to define a concise function for writing the srt
* Processing into a srt working properly.
Next step is to add the information of font into the caption text.
* the data is getting generated for proper SRT counters.
* A turning point to the approach.
Added END OF FRAME line for printing the data for every particular
srt_counter.
Proceeding further with the generation of srt by data manipulation.
* some minor bugs but the output srt is being generated correctly. However, the font and colour encoding needs to be done.
* Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing.
* Added fflush and cleaned up the python code of srt generation
* Added <i> tag for italics.
Proceeding further with other types.
* Added the code to check for underline.
However, need to check how CCExtractor generates srt when both italics
and underline are present. For now a new line is added if both are
present.
* Shifting for making changes in the I/O work.
* Stable output for samples with italics is being generated.
* Added the PYTHONAPI macro definition and testing for its existence in the set_python_api function.
* build script for linux is working correctly.
Build_library is showing error of invalid def of set_pythonapi.
Moreover, extractor has some memory seg fault.
* Added mod to set a MACRO as my_python_api to set the callback function.
Till now all calls to the reporter are commented.
Working on getting the reporter to print the lines.
* Changes have been implemented to bring reporter in working state.
For now a constant string is passed from extractor. Need to make the
proper parsing possible.
* Changed the code in extractor such that entire grid is returned to the
callback function.
Need to provide this grid to the write function and also cleanup the
codebase.
* Writing the outputted srt in a file called "temp.srt".
Need to modify init_write to push filename that is to be created in
python using callback.
* Added code to get start and end time simultaneously.
entire SRT is getting generated.
* removed ccx_python_encoders.c
* Compiling and executing on Windows
* Moved definitions get_line_encoded, get_color_encoded, get_font_encoded from ccx_encoders_g608.c to ccx_encoders_common.c.
Also, deleted the static definition of get_font_encoded from
ccx_encoders_webvtt.c
* added a write statement in write_cc_bitmap_as_srt
* Rectified transfer of get_line_encoded, get_color_encoded and
get_font_encoded from ccx_decoders_common.c to ccx_encoders_common.c.
Segmenting now doesn't destroy the whole encoding context, just closes and reopens the output file
Correct a wrong function prototype for process_hex()
OCR: Attempt to correctly deal with TessBaseAPIRecognize returning an error
Changed output for parse PMT to CCX_DMT_PMT instead of CCX_DMT_VERBOSE
- TS: If we don't have pinfo don't pay attention to the current_next_indicator bit.
(fixes problem with The Lion Guard_20170321_09301000.ts). Not sure this fix is the correct one but that's what VLC does.
If no -o is supplied with stdin/network etc, the output name generated
was NULL, leading to creation of files like `.srt` which were in
category of hidden files.
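A sketch of one way to avoid a bare-extension file name such as `.srt`: fall back to a default base name when none can be derived from the input. The helper name and the `"untitled"` default are assumptions for illustration; the commit message does not say which name the actual fix uses.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds "<base><ext>" into out. When base is NULL or empty (stdin or
   network input without -o), a hypothetical default base is used so the
   result never starts with the extension's dot.
   Returns 0 on success, -1 if the name does not fit. */
int build_output_name(char *out, size_t size, const char *base, const char *ext)
{
    if (base == NULL || base[0] == '\0')
        base = "untitled"; /* assumed default, not taken from the source */

    int n = snprintf(out, size, "%s%s", base, ext);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```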
Please read and understand the contribution guide before creating an issue or pull request. We would like to thank [Nishad TR](https://github.com/nishad) for their contributor's guide, upon which we based ours.
## Etiquette
This project is open source, and as such, we (the maintainers) give our **free time** to build, maintain and **provide user support** for the CCExtractor program. We make the code freely available in the hope that it will be of use to other developers and users. It would be extremely unfair for us to suffer abuse or anger for our hard work.
Please be considerate towards the developers and other users when raising issues or presenting pull requests.
It's the duty of the maintainer to ensure that all submissions to the project are of sufficient quality to benefit the project. Many developers have different skillsets, strengths, and weaknesses. Respect the decision of the maintainers, and do not be upset or abusive if your submission is not used.
## Viability
When requesting or submitting new features, first consider whether it might be useful to others. Open source projects are used by many developers, who may have entirely different needs to your own. Think about whether or not your feature is likely to be used by other users of the project.
## Procedure
**Before filing an issue**:
- Attempt to replicate the problem, to ensure that it wasn't a coincidental incident.
- Check to make sure your feature suggestion isn't already present within the project.
- Check the pull requests tab to ensure that the bug doesn't have a fix in progress.
- Check the pull requests tab to ensure that the feature isn't already in progress.
**Before submitting a pull request**:
- Ensure that your submission is [viable](#viability) for the project.
- Check the codebase to ensure that your feature doesn't already exist.
- Check the pull requests to ensure that another person hasn't already submitted the feature or fix.
## Technical requirements
- Before Submitting your Pull Request, merge `master` with your new branch and fix any conflicts. (Make sure you don't break anything in development!)
- Commit Unix line endings.
- Make sure to reasonably test your code. We have a sample platform that runs a test-suite for you, but it only covers a general set of tests.
Please prefix your issue with one of the following: [BUG], [PROPOSAL], [QUESTION].
To get the version of CCExtractor, you can use `--version`.
If this issue is related to the flutter GUI, please make the issue on the GUI repo [here](https://github.com/CCExtractor/ccextractorfluttergui/issues/new)
Please check all that apply and **remove the ones that do not**.
In the necessary information section, if this is a regression (something that used to work does not work anymore), make sure to specify the last known working version.
Only specify the minimum number of arguments needed to reproduce the issue.
In the additional information section, describe your problem.
Please make the affected input file available for us (no screenshots, those don't help!). Public links to Dropbox, Google Drive, etc, are all fine. If it is not possible to make it available publicly, send us a private invitation (both Dropbox and Google Drive allow that). In this case we will download the file and upload it to the private developer repository. Methods to send the private invitation to us can be found [here](https://ccextractor.org/public:general:support#email).
Do **not** upload your file to any location that will require us to sign up or endure a wait list, slow downloads, etc. If your upload expires make sure you keep it active somehow (replace links if needed). Keep in mind that while we go over all tickets some may take a few days, and it's important we have the file available when we actually need it.
Make sure to enable notifications in GitHub so you get notifications about your ticket. We may need to ask questions and we do everything inside GitHub's system.
Once you have read all of the instructions **delete all the text from here to the top**.
CCExtractor version: {replace with the version}
# In raising this issue, I confirm the following:
- [ ] I have read and understood the [contributors guide](https://github.com/CCExtractor/ccextractor/blob/master/.github/CONTRIBUTING.md).
- [ ] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
- [ ] I have checked that the issue I'm posting isn't already reported.
- [ ] I have checked that the issue I'm posting isn't already solved and no duplicates exist in [closed issues](https://github.com/CCExtractor/ccextractor/issues?q=is%3Aissue+is%3Aclosed) and in [open issues](https://github.com/CCExtractor/ccextractor/issues)
- [ ] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
- [ ] I have used the latest available version of CCExtractor to verify this issue exists.
- [ ] I have ticked all the boxes in this section and to prove it I'm deleting the section completely to remove boilerplate text.
# Necessary information
- Is this a regression (i.e. did it work before)? {YES/NO}
- What platform did you use? {Windows/Linux/Mac}
- What were the used arguments? `{replace with the arguments}`
# Video links
* {Replace with a link to a video file}
# Additional information
{issue content here, replace this line with your issue content}
CCExtractor is a tool that produces subtitles from TV use. Global accessibility (all users, all content, all countries) is the goal. With so many different formats, this is a constantly moving target, but we intend to keep up with all sources and formats.
[](https://sampleplatform.ccextractor.org/test/master/windows)
[](https://sampleplatform.ccextractor.org/test/master/linux)
[](https://github.com/CCExtractor/ccextractor/releases/latest)
Carlos' version (mainstream) is the most stable branch.
CCExtractor is a tool used to produce subtitles for TV recordings from almost anywhere in the world. We intend to keep up with all sources and formats.
Subtitles are important for many people. If you're learning a new language, subtitles are a great way to learn it from movies or TV shows. If you are hard of hearing, subtitles can help you better understand what's happening on the screen. We aim to make it easy to generate subtitles by using the command line tool or Windows GUI.
The official repository is [CCExtractor/ccextractor](https://github.com/CCExtractor/ccextractor), with `master` being the most stable branch.
### **Features**
- Extract subtitles in real-time
- Translate subtitles
- Extract closed captions from DVDs
- Convert closed captions to subtitles
### Programming Languages & Technologies
The core functionality is written in C. Other languages used include C++ and Python.
## Installation and Usage
Downloads for precompiled binaries and source code can be found [on our website](http://www.ccextractor.org/doku.php?id=public:general:downloads).
Downloads for precompiled binaries and source code can be found [on our website](https://ccextractor.org/public/general/downloads/).
### Windows Package Managers
**WinGet:**
```powershell
winget install CCExtractor.CCExtractor
```
**Chocolatey:**
```powershell
choco install ccextractor
```
**Scoop:**
```powershell
scoop bucket add extras
scoop install ccextractor
```
Extracting subtitles is relatively simple. Just run the following command:
```ccextractor <input>```
`ccextractor <input>`
This will extract the subtitles.
More usage information can be found on our website:
- [Using the command line tool](http://www.ccextractor.org/doku.php?id=public:general:command_line_usage)
- [Using the Windows GUI](http://www.ccextractor.org/doku.php?id=public:general:win_gui_usage)
- [Using the command line tool](https://ccextractor.org/public/general/command_line_usage/)
- [Using the Flutter GUI](https://ccextractor.org/public/general/flutter_gui/)
You can also find the list of parameters and their brief description by running `ccextractor` without any arguments.
## Compiling
You can find sample files on [our website](https://ccextractor.org/public/general/tvsamples/) to test the software.
### Debian/Ubuntu
### Building from Source
Install these packages in the terminal
- [Building on Windows using WSL](docs/build-wsl.md)
sudo apt-get install -y gcc
sudo apt-get install -y libcurl4-gnutls-dev
sudo apt-get install -y tesseract-ocr
sudo apt-get install -y tesseract-ocr-dev
sudo apt-get install -y libleptonica-dev
Then run script linux/build or linux/builddebug.
#### Linux (Autotools) build notes
### Windows
CCExtractor also supports an autotools-based build system under the `linux/`
directory.
Open the `windows/ccextractor.sln` file with Visual Studio (2015 at least) and build it. The "(Debug|Release)-Full" configurations include the dependent libraries that are used for OCR.
Important notes:
- The autotools workflow lives inside `linux/`. The `configure` script is
generated there and should be run from that directory.
- Typical build steps are:
```
cd linux
./autogen.sh
./configure
make
```
- Rust support is enabled automatically if `cargo` and `rustc` are available
on the system. In that case, Rust components are built and linked during
`make`.
- If you encounter unexpected build or linking issues, a clean rebuild
(`make clean` or a fresh clone) is recommended, especially when Rust is
involved.
This build flow has been tested on Linux and WSL.
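Before running `./configure`, it can help to check whether the Rust toolchain is visible on your `PATH`. The snippet below is a minimal sketch of such a check; `have_tools` is a hypothetical helper written for this illustration, and the real `configure` script may detect `cargo` and `rustc` differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CCExtractor): succeeds only if every
# named tool can be found on PATH.
have_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || return 1
    done
    return 0
}

# Report whether both tools needed for the Rust components are present.
if have_tools cargo rustc; then
    echo "cargo and rustc found on PATH"
else
    echo "cargo or rustc not found on PATH"
fi
```

If the toolchain was installed or removed since the last build, a `make clean` before rebuilding avoids mixing stale objects, as noted above.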
## Compiling CCExtractor
To learn more about how to compile and build CCExtractor for your platform check the [compilation guide](https://github.com/CCExtractor/ccextractor/blob/master/docs/COMPILATION.MD).
## Support
By far the best way to get support is by opening a support ticket at our [issue tracker](https://github.com/CCExtractor/ccextractor/issues).
By far the best way to get support is by opening an issue at our [issue tracker](https://github.com/CCExtractor/ccextractor/issues).
When creating a ticket:
When you create a new issue, please fill in the needed details in the provided template. That makes it easier for us to help you more efficiently.
- Make sure you are using the latest CCExtractor version.
- If it's a new issue (for example a video file that a previous CCExtractor version processed fine but now causes a crash), mention the last version you know was working.
- If the issue is about a specific file, make that file available for us. Don't just send us the output from CCExtractor, as we can't do anything about a screenshot that shows a crash. We need the input that actually causes it. You can upload the file to Dropbox, Google Drive, etc, and make it public so you get a download link to add to your ticket.
- If you cannot make the file public for any (reasonable) reason you can send us a private invitation (both Dropbox and Google Drive allow that). In this case we will download the file and upload it to the private developer repository.
- Do not upload your file to any location that will require us to sign up or endure a wait list, slow downloads, etc.
- If your upload expires make sure you keep it active somehow (replace links if needed). Keep in mind that while we go over all tickets some may take a few days, and it's important we have the file available when we actually need it.
- Make sure you set an alert in GitHub so you get notifications about your ticket. We may need to ask questions and we do everything inside GitHub's system.
- Please use English.
- It goes without saying, we like polite people.
If you have a question or a problem you can also [contact us by email or chat with the team in Slack](https://ccextractor.org/public/general/support/).
If you want to contribute to CCExtractor but can't submit some code patches or issues or video samples, you can also [donate to us](https://sourceforge.net/donate/index.php?group_id=190832)
You can also [contact us by email or chat with the team in Slack](http://www.ccextractor.org/doku.php?id=public:general:support).
## Contributing
You can contribute to the project by forking it, modifying the code, and making a pull request to the repository.
You can contribute to the project by reporting issues, forking it, modifying the code and making a pull request to the repository. We have some rules, outlined in the [contributor's guide](.github/CONTRIBUTING.md).
## News & Other Information
News about releases and modifications to the code can be found in the `CHANGES.TXT` file.
News about releases and modifications to the code can be found in the [CHANGES.TXT](docs/CHANGES.TXT) file.
For more information visit the CCExtractor website: [http://www.ccextractor.org](http://www.ccextractor.org)
For more information visit the CCExtractor website: [https://www.ccextractor.org](https://www.ccextractor.org)
The easiest way to install CCExtractor for Mac and Linux is through Homebrew:
```bash
brew install ccextractor
```
Note: If you don't have Homebrew installed, see [brew.sh](https://brew.sh/)
for installation instructions.
---
# Compiling CCExtractor
You may compile CCExtractor on all major platforms using the `CMakeLists.txt` stored under the `ccextractor/src/` directory. Autoconf and custom build scripts are also available. See the platform-specific instructions in the sections below.
Downloads for precompiled binaries and source code can be found [on our website](https://www.ccextractor.org?id=public:general:downloads).
### Hardsubx (Burned-in Subtitles) and FFmpeg Versions
CCExtractor's hardsubx feature extracts burned-in subtitles from videos using OCR. It requires FFmpeg libraries. The build system automatically selects appropriate FFmpeg versions for each platform:
- **Linux**: FFmpeg 6.x (default)
- **Windows**: FFmpeg 6.x (default)
- **macOS**: FFmpeg 8.x (default)
You can override the default by setting the `FFMPEG_VERSION` environment variable to `ffmpeg6`, `ffmpeg7`, or `ffmpeg8` before building. This flexibility ensures compatibility with different FFmpeg installations across platforms.
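The selection rule described above can be sketched in a few lines of shell. This is only an illustration of the documented defaults, not the build system's actual code: `default_ffmpeg_version` is a hypothetical helper, and mapping platforms via `uname -s` is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented defaults; the real build
# scripts may implement the choice differently.
default_ffmpeg_version() {
    # $1: kernel name as printed by `uname -s` (defaults to this system)
    local os="${1:-$(uname -s)}"
    # An explicit FFMPEG_VERSION always wins over the platform default.
    if [ -n "${FFMPEG_VERSION:-}" ]; then
        echo "$FFMPEG_VERSION"
        return 0
    fi
    case "$os" in
        Darwin) echo "ffmpeg8" ;;   # macOS default
        *)      echo "ffmpeg6" ;;   # Linux and Windows default
    esac
}

echo "Selected FFmpeg: $(default_ffmpeg_version)"
```

Calling `default_ffmpeg_version Darwin` with `FFMPEG_VERSION` unset shows the macOS default, while exporting `FFMPEG_VERSION=ffmpeg7` first overrides it on any platform.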
## Docker
You can now use a Docker image to build the latest source of CCExtractor without any environment hassle. Follow these [instructions](https://github.com/CCExtractor/ccextractor/tree/master/docker/README.md) to build and use the Docker image.
Rust 1.54 or above is also required. [Install Rust](https://www.rust-lang.org/tools/install). See the specific compilation methods below for how to compile without Rust.
**Note:** On Ubuntu Version 23.10 (Mantic) and later, `libgpac-dev` isn't available. You should build GPAC from source by following the build instructions [here](https://github.com/gpac/gpac/wiki/GPAC-Build-Guide-for-Linux).
**Note:** On Ubuntu Version 18.04 (Bionic) and later, `libtesseract-dev` is installed rather than `tesseract-ocr-dev`, which does not exist anymore.
**Note:** On Ubuntu Version 14.04 (Trusty) and earlier, you should build leptonica and tesseract from source
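The three notes above amount to a small lookup by Ubuntu release. The sketch below encodes only those notes; the function names are hypothetical, and it assumes the release is given in the usual `NN.NN` form (for example the `VERSION_ID` value from `/etc/os-release`).

```bash
#!/usr/bin/env bash
# Turn "23.10" into the integer 2310 so releases can be compared numerically.
# 10# forces base 10, so a minor such as "04" is not parsed as octal.
ubuntu_ver_num() {
    local major="${1%%.*}" rest="${1#*.}"
    local minor="${rest%%.*}"
    echo $(( 10#$major * 100 + 10#$minor ))
}

# Hypothetical helper: where the Tesseract development files come from.
tesseract_dev_source() {
    local v
    v="$(ubuntu_ver_num "$1")"
    if   [ "$v" -le 1404 ]; then echo "build-from-source"
    elif [ "$v" -ge 1804 ]; then echo "libtesseract-dev"
    else                         echo "tesseract-ocr-dev"
    fi
}

# Hypothetical helper: where the GPAC development files come from.
gpac_dev_source() {
    local v
    v="$(ubuntu_ver_num "$1")"
    if [ "$v" -ge 2310 ]; then echo "build-from-source"; else echo "libgpac-dev"; fi
}
```

For example, `tesseract_dev_source 24.04` prints the package name to pass to `apt-get install`, while `gpac_dev_source 24.04` indicates that GPAC has to be built from source.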
2. Compiling
### Using the build script
By default, the build script does not include debugging information, so you cannot debug the resulting executable (i.e. `./ccextractor`) in a debugger. To include debugging information, use the `builddebug` script.
```bash
# navigate to linux directory and call the build script
cd ccextractor/linux
# compile without debug flags
./build
# compile with debug info
./build -debug # same as ./builddebug
# compile with hardsubx (burned-in subtitle extraction)
# Hardsubx requires FFmpeg libraries. Different FFmpeg versions are used by default:
# - Linux: FFmpeg 6.x (automatic)
# - Windows: FFmpeg 6.x (automatic)
# - macOS: FFmpeg 8.x (automatic)
./build -hardsubx # uses platform-specific FFmpeg version
# To override the default FFmpeg version, set FFMPEG_VERSION:
FFMPEG_VERSION=ffmpeg8 ./build -hardsubx # force FFmpeg 8 on any platform
FFMPEG_VERSION=ffmpeg6 ./build -hardsubx # force FFmpeg 6 on any platform
FFMPEG_VERSION=ffmpeg7 ./build -hardsubx # force FFmpeg 7 on any platform
# [Optional] For custom FFmpeg installations, export these environment variables before running ./build:
export FFMPEG_INCLUDE_DIR=/usr/include
export FFMPEG_PKG_CONFIG_PATH=/usr/lib/pkgconfig
# test your build
./ccextractor
```
### Standard Linux compilation through Autoconf scripts
```bash
sudo apt-get install autoconf # dependency to generate configuration script
cd ccextractor/linux
./autogen.sh
./configure
make
# test your build
./ccextractor
# make build systemwide
sudo make install
```
### Using CMake
```bash
# create and navigate to directory where you want to store built files
cd ccextractor/
mkdir build
cd build
# generate makefile using cmake and then compile
cmake ../src/ # options here
make
# test your build
./ccextractor
# make build systemwide
sudo make install
```
`cmake` also accepts the options:
- `-DWITH_OCR=ON` to enable OCR
- `-DWITH_HARDSUBX=ON` to enable burned-in subtitles (requires FFmpeg)

For hardsubx with specific FFmpeg versions:
- Set `FFMPEG_VERSION=ffmpeg6` for FFmpeg 6.x (default on Linux and Windows)

[Optional] For custom FFmpeg installations, set these environment variables:
- `FFMPEG_INCLUDE_DIR=/usr/include`
- `FFMPEG_PKG_CONFIG_PATH=/usr/lib/pkgconfig`
### Compiling with GUI
The GUI for CCExtractor has been moved to a separate repository ([https://github.com/CCExtractor/ccextractorfluttergui](https://github.com/CCExtractor/ccextractorfluttergui)).
## macOS
1. Make sure all the dependencies are met. Decide if you want OCR; if so, you'll need to install tesseract and leptonica.
Dependencies can be installed via Homebrew as:
```bash
brew install pkg-config
brew install autoconf automake libtool
brew install cmake gpac
# optional if you want OCR:
brew install tesseract
brew install leptonica
# optional if you want hardsubx (burned-in subtitle extraction):
brew install ffmpeg
```
If configuring OCR, use pkg-config to verify tesseract and leptonica dependencies, e.g.
```bash
pkg-config --exists --print-errors tesseract
pkg-config --exists --print-errors lept
```
### Compiling
#### Using build.command script:
```bash
cd ccextractor/mac
./build.command # basic build
./build.command -ocr # build with OCR support
./build.command -hardsubx # build with hardsubx (uses FFmpeg 8 by default on macOS)
# Override FFmpeg version if needed:
FFMPEG_VERSION=ffmpeg7 ./build.command -hardsubx
# test your build
./ccextractor
```
#### Using CMake
```bash
# create and navigate to directory where you want to store built files
cd ccextractor/
mkdir build
cd build
# generate makefile using cmake and then compile
cmake ../src/ # options here
make
# test your build
./ccextractor
```
`cmake` also accepts the options:
- `-DWITH_OCR=ON` to enable OCR
- `-DWITH_HARDSUBX=ON` to enable burned-in subtitles
#### Standard compilation through Autoconf scripts:
```bash
cd ccextractor/mac
./autogen.sh
./configure
make
# test your build
./ccextractor
```
#### Compiling with GUI:
The GUI for CCExtractor has been moved to a separate repository ([https://github.com/CCExtractor/ccextractorfluttergui](https://github.com/CCExtractor/ccextractorfluttergui)).
## Windows
Dependencies are Clang and Rust. To enable OCR, the Rust `x86_64-pc-windows-msvc` or `i686-pc-windows-msvc` target should be installed.
GPAC is also required; you can install it through Chocolatey:
```
choco install gpac
```
The other dependencies are installed through vcpkg. Follow the steps below:
1. Download vcpkg (prefer version `2023.02.24` as it is supported)
2. Integrate vcpkg into your system by running the command below in the downloaded vcpkg folder:
```
vcpkg integrate install
```
3. Set the environment variable for the vcpkg triplet. You can choose between x86 and x64 based on your system.
```
setx VCPKG_DEFAULT_TRIPLET "x64-windows-static"
setx RUSTFLAGS "-Ctarget-feature=+crt-static"
```
4. Install dependencies from vcpkg
In this step we are using `x64-windows-static` triplet, but you will have to use the triplet you set in Step 3
Note: The following screenshots and steps are based on Visual Studio 2017, but they should be more or less the same for other versions.
1. Open the `windows/` directory to locate `ccextractor.vcxproj` and `ccextractor.sln` (red arrow).

2. Accept the security prompt (if any) to proceed with compilation.

3. Using Visual Studio (2015 or above), open `ccextractor.sln`. This will build both CCExtractor and its GUI. To build them separately, open the respective `.vcxproj` file.
4. In Solution Explorer, you'll see two projects with the VS version and Windows release version in parentheses. Change them to the values that apply to your setup by right-clicking the project and selecting Properties.


5. Right-click and select `build` to compile the project and generate the executable file.

6. Find the executable file in the `Debug` or `Release` folder, depending on the selected configuration.
- This layer will have function names the same as defined in C but with the prefix `ccxr_`. These are the functions defined in the `lib_ccx` crate under appropriate modules. And these functions will be provided to the C library.
A guide to how dependencies should be updated in CCExtractor.
Author: thealphadollar
======================
CCExtractor depends on multiple dependencies and they are updated from time to time. On every major revision of the dependencies, the changes need to be incorporated into our repository.
It is not straightforward since we make minor (or sometimes major) changes into the library to use it and these changes are lost in case of direct file replacement. To overcome this issue, we should follow the below pathway.
*) Create a duplicate copy of the CCExtractor's folder of the library, to be updated (we will be calling this folder lib(copy) in steps and original one as lib).
*) Download the latest files of the library from official source (the folder is called as lib(orig) in further steps).
*) Look for files with the same name in lib and lib(orig). It can be done manually in case of small libraries (libpng), otherwise a script can be written utilising the grep command to find out files from the library which we use.
*) In lib, replace all the files (found in previous step) with their updated versions from lib(orig). A copy command can be used in the script written for the previous step to accomplish this step.
Now, the files in our repository have been updated. In steps to follow, we will try to grab lost changes using lib(copy).
*) Run diff command between lib(copy) and lib for all files and store the output in a text document. Here files from lib(copy) should be given as first argument to notice deletions clearly.
*) Look for deletions in an updated file and manually inspect (or ask mentor) whether that part is to be restored or not. In most cases, it is to be restored but it's better to ask than to break.
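The copy, replace, and diff steps above can be sketched as a small shell function. This is an illustration under stated assumptions, not an official script: `update_vendored_lib` is a hypothetical helper, it assumes upstream files sit at the same relative paths as ours (the guide only requires the same file name), and it assumes the lib(copy) directory does not exist yet. Downloading lib(orig) is left to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper; directory roles follow the guide: lib, lib(orig), lib(copy).
# usage: update_vendored_lib <lib_dir> <orig_dir> <copy_dir> <diff_output_file>
update_vendored_lib() {
    local lib="$1" orig="$2" copy="$3" out="$4" rel
    # Keep a duplicate of our current, locally patched library.
    cp -R "$lib" "$copy"
    # For every file we ship, take the upstream version if upstream has it.
    # Files that exist only upstream are deliberately not copied in.
    (cd "$lib" && find . -type f) | while IFS= read -r rel; do
        if [ -f "$orig/$rel" ]; then
            cp "$orig/$rel" "$lib/$rel"
        fi
    done
    # Old copy first, so our lost local changes show up as deletions.
    # diff exits non-zero when files differ, which is expected here.
    diff -ru "$copy" "$lib" > "$out" || true
}
```

The resulting diff file is what you inspect for deletions in the next step; lines starting with `-` are the local changes that the update removed.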
Once the changes have been restored, try to compile CCExtractor. It is very likely that the compilation will fail. The most probable reason for this is the inclusion of unnecessary lines of code and their accompanying dependencies.
e.g. "X is not defined" can be an error when we neither include the file in which X is defined nor remove the unnecessary line using X.
CCExtractor doesn't use a library fully; we use only the code and files necessary. This requires manual removal of extra lines and dependencies.
*) Output the compilation errors to a text document while compiling.
*) Use inspection and comparison with lib(copy) to decide whether the line causing error is to be removed.
Compile again, debug and push the change for the Continuous Integration tests on samples.
CCExtractor supports extracting VOBSUB (S_VOBSUB) subtitles from Matroska (MKV) containers. VOBSUB is an image-based subtitle format originally from DVD video.
## Overview
VOBSUB subtitles consist of two files:
- `.idx` - Index file containing metadata, palette, and timestamp/position entries
- `.sub` - Binary file containing the actual subtitle bitmap data in MPEG Program Stream format
## Basic Usage
```bash
ccextractor movie.mkv
```
This will extract all VOBSUB tracks and create paired `.idx` and `.sub` files:
- `movie_eng.idx` + `movie_eng.sub` (first English track)
- `movie_eng_1.idx` + `movie_eng_1.sub` (second English track, if present)
- etc.
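Because every track is written as an `.idx`/`.sub` pair, a quick sanity check after extraction is to look for any `.idx` file whose `.sub` partner is missing. The following is a minimal sketch; `missing_sub_files` is a hypothetical helper and not part of CCExtractor.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each .idx file in a directory that has no
# matching .sub file next to it. Prints nothing when all pairs are complete.
missing_sub_files() {
    local dir="${1:-.}" idx
    for idx in "$dir"/*.idx; do
        [ -e "$idx" ] || continue              # directory has no .idx files
        [ -f "${idx%.idx}.sub" ] || echo "$idx"
    done
}
```

Running `missing_sub_files .` in the output directory should print nothing after a successful extraction.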
## Converting VOBSUB to SRT (Text)
Since VOBSUB subtitles are images, you need OCR (Optical Character Recognition) to convert them to text-based formats like SRT.
### Using subtile-ocr (Recommended)
[subtile-ocr](https://github.com/gwen-lg/subtile-ocr) is an actively maintained Rust tool that provides accurate OCR conversion.
#### Option 1: Docker (Easiest)
We provide a Dockerfile that builds subtile-ocr with all dependencies:
```bash
# Build the Docker image (one-time)
cd tools/vobsubocr
docker build -t subtile-ocr .
# Extract VOBSUB from MKV
ccextractor movie.mkv
# Convert to SRT using OCR
docker run --rm -v $(pwd):/data subtile-ocr -l eng -o /data/movie_eng.srt /data/movie_eng.idx
```
#### Option 2: Install subtile-ocr Natively
If you have Rust and Tesseract development libraries installed: