Document all changes since 0.96.2 including:
- VOBSUB subtitle extraction for MP4 and MKV files
- Native SCC input file support
- SCC output improvements (frame rate, styled PAC codes)
- Various bug fixes for timing, builds, and OCR
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add docs/VOBSUB.md explaining the VOBSUB extraction workflow
- Add tools/vobsubocr/Dockerfile for building subtile-ocr OCR tool
- Document how to convert VOBSUB (.idx/.sub) to SRT using OCR
The Dockerfile uses subtile-ocr (https://github.com/gwen-lg/subtile-ocr),
an actively maintained fork of vobsubocr with better accuracy.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Start new changelog section for unreleased changes. First entry is
the multi-page teletext extraction feature (#665) which allows
extracting multiple teletext pages simultaneously with separate
output files.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move all changes made after the 0.95 version bump (commit ee232b5)
to a new 0.96 section marked as "Unreleased".
This separates the released 0.95 content from ongoing development
work that will be included in the next release.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes #1173 - Error in ./configure enabling hardsubx on Mac
Fixes #1306 - Add HARDSUBX compilation docs for macOS
The configure.ac script failed on macOS with "binary operator expected"
because pkg-config output was unquoted. When pkg-config returns multiple
libraries (e.g., "-ltesseract -lcurl"), the unquoted expansion caused
`test ! -z` to receive multiple arguments instead of a single string.
Changes:
- Quote pkg-config output in TESSERACT_PRESENT conditional (mac & linux)
- Add macOS section to docs/HARDSUBX.txt with all build methods
- Add GitHub Actions jobs to test HARDSUBX builds on macOS:
  - build_shell_hardsubx: Tests ./build.command -hardsubx
  - build_autoconf_hardsubx: Tests ./configure --enable-hardsubx
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a new --list-tracks (-L) option that lists all tracks found in
media files without processing them. This is useful for exploring
media files before caption extraction.
Supports:
- Matroska (MKV/WebM) files
- MP4/MOV files
- MPEG Transport Stream files
The feature is implemented entirely in Rust with native parsers for
each format, avoiding dependency on external libraries.
Closes #1669
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add changelog entries for recent merged PRs:
- Fix: Garbled captions from HDHomeRun and I/P-only H.264 streams (#1109)
- Fix: Enable stdout output for CEA-708 captions on Windows (#1693)
- Fix: McPoodle DVD raw format read/write (#1524)
- Fix: Variable shadowing in general_loop
- Fix: Double-free crash in teletext cleanup
- Fix: Uninitialized memory and memory leaks (Valgrind)
- Fix: Dangling pointers in Rust FFI
- New: Teletext subtitle pages in -out=report (#1034)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(dvb): Multiple fixes for DVB subtitle extraction from Chinese broadcasts (#224)
This commit addresses multiple issues with DVB subtitle extraction reported in #224:
1. **PMT parsing crash fix** (ts_tables.c):
   - Added minimum length check (16 bytes) to prevent out-of-bounds access
   - Added bounds check before memcpy to prevent buffer overflow when section > 1021 bytes
2. **Negative subtitle timing fix** (general_loop.c):
   - For DVB subtitle streams, properly initialize min_pts from audio/subtitle PTS
   - This fixes the issue where all timestamps were negative (~95000 seconds off)
3. **OCR improvements** (ocr.c):
   - Fixed ignore_alpha_at_edge() which could create invalid crop windows
   - Added image inversion for DVB subtitles (light text on dark background)
     to improve Tesseract OCR accuracy
   - Added contrast normalization to further improve character recognition
   - Fixed nofontcolor check to respect --no-fontcolor parameter
   - Added iteration safety limit in color detection loop
4. **--ocrlang parameter fix** (Rust files):
   - Changed ocrlang from Language enum to String to accept Tesseract language
     names directly (e.g., "chi_tra", "chi_sim", "eng")
   - Added case-insensitive matching for --dvblang parameter
   - Added better error messages for invalid language codes
Tested with 12GB Chinese DVB broadcast file:
- Timing: All timestamps now positive (0.235s, 2.594s, etc.)
- OCR: ~80-90% accuracy with chi_tra traineddata (improved from ~70%)
- No crashes during full file processing
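The PMT length checks in item 1 can be illustrated with a minimal sketch. It is not the code from ts_tables.c: the function name, the constant names, and the choice to reject (rather than truncate) an oversized section are assumptions made for this example; only the 16-byte minimum and the 1021-byte limit come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; the values are the ones quoted in the commit message. */
#define PMT_MIN_LEN 16
#define PMT_MAX_SECTION 1021

/* Copies a PMT section into dst only when its length is plausible.
 * Returns the number of bytes copied, or 0 if the section is rejected. */
size_t copy_pmt_section(uint8_t *dst, size_t dst_cap,
                        const uint8_t *src, size_t src_len)
{
    if (dst == NULL || src == NULL)
        return 0;
    if (src_len < PMT_MIN_LEN)     /* too short to read the fixed fields */
        return 0;
    if (src_len > PMT_MAX_SECTION) /* longer than the section limit */
        return 0;
    if (src_len > dst_cap)         /* would overflow the destination buffer */
        return 0;
    memcpy(dst, src, src_len);
    return src_len;
}
```

Both checks run before memcpy, so a malformed section is dropped instead of being read or written out of bounds.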
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ocr): Fix crashes in DVB subtitle color detection
Two issues fixed in the OCR color detection code:
1. Tesseract crash during iteration:
   - The color detection pass used raw color images without preprocessing
   - Tesseract expects dark text on light background, but DVB subtitles
     have light text on dark background
   - Added grayscale conversion, inversion, and contrast enhancement
     (same preprocessing as the main OCR pass)
2. Heap corruption in histogram calculation:
   - The histogram loop had no bounds checking on array accesses
   - Tesseract could return invalid bounding boxes causing buffer overflows
   - Added validation of bounding box coordinates before processing
   - Added safe index checking for copy->data and histogram arrays
Also added skip_color_detection label for clean error handling and
proper cleanup of the preprocessed image.
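A minimal sketch of the bounding-box validation and index checking described in item 2 follows. It is not the code from ocr.c: both function names, the row-major one-byte-per-pixel layout, and the exclusive right/bottom edges are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true only if the box (x1,y1)-(x2,y2), with exclusive
 * right/bottom edges, is non-empty and lies fully inside a w x h image. */
bool bbox_is_valid(int x1, int y1, int x2, int y2, int w, int h)
{
    if (w <= 0 || h <= 0)
        return false;
    if (x1 < 0 || y1 < 0)
        return false;
    if (x2 <= x1 || y2 <= y1)
        return false;
    if (x2 > w || y2 > h)
        return false;
    return true;
}

/* Tallies palette indices inside the box into hist[0..nb_colors-1].
 * Returns the number of pixels tallied, or -1 if the inputs are rejected.
 * Pixels whose palette index is out of range are skipped, not counted. */
int histogram_in_box(const unsigned char *pixels, int w, int h,
                     int x1, int y1, int x2, int y2,
                     unsigned int *hist, int nb_colors)
{
    if (pixels == NULL || hist == NULL || nb_colors <= 0)
        return -1;
    if (!bbox_is_valid(x1, y1, x2, y2, w, h))
        return -1;

    int counted = 0;
    for (int y = y1; y < y2; y++) {
        for (int x = x1; x < x2; x++) {
            unsigned char idx = pixels[(size_t)y * (size_t)w + (size_t)x];
            if (idx < nb_colors) { /* guard the histogram write */
                hist[idx]++;
                counted++;
            }
        }
    }
    return counted;
}
```

Validating the box once up front keeps the inner loop simple, while the per-pixel index check protects the histogram array from out-of-range palette values.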
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(dvb): Fix zero-duration subtitles and overlaps during PTS jumps
Add start_pts field to cc_subtitle struct to track raw PTS values
independent of FTS timeline resets. Modify end_time calculation in
dvbsub_handle_display_segment() to cap duration at 4 seconds when
PTS jumps cause timeline discontinuities, preventing zero-duration
and overlapping subtitles.
Also update .gitignore to exclude plans/ directory and temp files.
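The duration cap described above can be sketched as a small helper. This is not the code in dvbsub_handle_display_segment(): the function name, the millisecond units, and the decision to treat a non-positive duration as a discontinuity are assumptions made for this example; only the 4-second cap comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* 4-second cap from the commit message; the macro name is hypothetical. */
#define MAX_SUB_DURATION_MS 4000

/* Returns the end time to use for a subtitle, in milliseconds.
 * If the candidate end is not after the start (zero or negative duration)
 * or is more than the cap after it, the end is clamped to start + cap.
 * Otherwise the candidate end is kept unchanged. */
int64_t capped_end_time(int64_t start_ms, int64_t candidate_end_ms)
{
    int64_t duration = candidate_end_ms - start_ms;
    if (duration <= 0 || duration > MAX_SUB_DURATION_MS)
        return start_ms + MAX_SUB_DURATION_MS;
    return candidate_end_ms;
}
```

With this rule a subtitle that starts at 1000 ms and would otherwise end at 1000 ms or at 90000 ms ends at 5000 ms instead, while one ending at 2500 ms is left alone.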
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Apply clang-format to all C/H files in src/
- Apply cargo fmt to Rust code
- Update Cargo.lock with latest compatible dependency versions
- Add 24 new entries to CHANGES.TXT for recent fixes and features
Changes in CHANGES.TXT cover:
- CEA-708 bounds checks and UTF-16BE encoding fixes
- New --ttxtforcelatin option for Teletext
- TS files without PAT/PMT fallback support
- Timing accuracy improvements across MP4/MPEG/TS
- Memory safety improvements (null checks, buffer overruns)
- Multi-file processing fixes
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: added demuxer module
* Cargo Lock Update
* Completed file_functions and demuxer
* Completed file_functions and demuxer
* written extern functions for demuxer
* Removed libc completely, added tests for gxf and ported gxf to C
* Hardsubx error fixed
* Fixing format issues
* clippy errors fixed
* fixing format issues
* fixing format issues
* Windows failing tests
* Windows failing tests
* demuxer: added demuxer data transfer functions and removed some structs
* made Demuxer and File Functions
* Minor formatting changes
* Minor Rebasing changes
* demuxer: format rust and unit test rust checks
* C formatting
* Windows Failing test
* Windows Failing test
* Update CHANGES.TXT
* Update CHANGES.TXT
* Windows Failing Tests
* Windows Failing Tests
* Problem in Copy to Rust and some typos that copilot review suggested
* Minor Formatting Error
* Windows Failing Regressions
* Windows Failing Regressions
* Minor Comment Change
* Data transfer module for DemuxerData added and more rustlike syntax to ctorust.rs
* Minor Formatting Changes
* demuxer: Rebase and a few tweaks to file_functions
* demuxer: Minor Formatting Error
* [FIX] 134 Codes in XDS and General Tests (#1708)
* Made pointers valid in Unit Tests of Decoder
* fix: test_do_cb
* Copilot Suggestions
* Suggestions about Redundancy
* Suggestions about Redundancy
* [FEAT] Add `bitstream` module in `lib_ccxr` (#1649)
* feat: Add bitstream module
* run code formatters
* Run cargo clippy --fix
* Run cargo fmt --all
* refactor: remove rust pointer from C struct
* feat: Add bitstream module
* run code formatters
* Run cargo clippy --fix
* Run cargo fmt --all
* refactor: remove rust pointer from C struct
* Added Bitstream to libccxr_exports
* Minor Formatting Issue
* Bitstream: Removed redundant CType
* bitstream: recommended changes for is_byte_aligned
* bitstream: recommended changes for long comments
* bitstream: comment fix
* bitstream: removed redundant comparison comments
---------
Co-authored-by: Deepnarayan Sett <depnra1@gmail.com>
Co-authored-by: Deepnarayan Sett <71217129+steel-bucket@users.noreply.github.com>
* demuxer: minor formatting changes
* Demuxer: Changes to mistakes in CHANGES.txt
* Demuxer: Removed extra newline in ccextractor.c
* Demuxer: Changes to Encoding resolved
* Demuxer: Moved CCX_NOPTS to common structs and some changes to Demuxer Data regd. MPEG_CLOCK_FREQ
* some refactoring to CCX_NOPTS
* Demuxer: Minor Mistake regarding CHANGES.txt
* Demuxer: Unit test rust failing because of CCX_NOPTS
* Demuxer: changed common_structs to common_types
* Demuxer: Removed redundant libraries from Cargo.toml and moved tempfile to dev-dependencies
* Demuxer: Removed to_vec function and renamed PSIBuffer/PMTEntry from_ctype functions
* Demuxer: Renamed Stream_Type, improved Time complexity of the default() function and removed redundant comments
* Demuxer: Removed two repeated code blocks and removed redundant comments
* Demuxer: Removed two code blocks
* Demuxer: Review Changes
* Demuxer: Removed redundant tests
* Update src/rust/src/demuxer/demux.rs
Co-authored-by: Prateek Sunal <prtksunal@gmail.com>
* Demuxer: Errors due to Rebase
* Demuxer: Removed get_stream_mode
* Demuxer: Errors due to rebasing and removing redundant CType Functions
* Demuxer: Failing ES regressions
* Demuxer: MythTV failing regression
* Demuxer: Removed redundant comments
* Demuxer: Unplugged ES for now
* Demuxer: Replugged in ES
* Demuxer: Formatting error
* Demuxer: Windows failing CI
* Demuxer: Windows failing CI
* Demuxer: Windows failing Regressions
* Demuxer: Formatting
* Demuxer: Minor Cargo Clippy change
* Demuxer: running regressions again
* Demuxer: Cargo Lockfile Change
* Demuxer: running regressions again
* Demuxer: running regressions again
---------
Co-authored-by: Swastik Patel <swastikpatel29@gmail.com>
Co-authored-by: Prateek Sunal <prtksunal@gmail.com>
* Removal: Removed redundant C code already ported to Rust
* Removal: C formatting
* Removal: More Removal and CI issues in Mac
* Removal: CI issues in Mac
* Removal: Changes due to Rebase
* Removal: Failing CI on mac
* Removal: Failing regression test on dvdraw
* Fix hardsubx_decoder.c compilation with ENABLE_FFMPEG
Fix unresolved function reference when compiling with ENABLE_FFMPEG
* Fix regression compilation ffmpeg_intgr.c to support ffmpeg 5
Fix regression bug for compiling with ENABLE_FFMPEG and ffmpeg 5, introduced in https://github.com/CCExtractor/ccextractor/issues/1418
* Update CHANGES.TXT
* Update ffmpeg_intgr.c
Update for changes to FFMPEG 5 API
* [FIX] Corrected bitness check for 64-bit systems
* Improve Dockerfile: cleanup, parallel build, and remove redundancies
- Replaced cd with WORKDIR for clarity and Docker best practices.
- Removed unused LIB_CLANG_PATH export, as it only affected a single build layer; the library is automatically detected during build.
- Parallelized the GPAC build using make -j$(nproc).
- Removed redundant CMD instruction, as ENTRYPOINT already defines the container's execution command.
* [DOCS] Update CHANGES.TXT for Dockerfile improvements
---------
Co-authored-by: AhmedYasserrr <ahmdyasrj@gamil.com>
* Fix implicit declaration error on some systems.
This commit fixes a compile-time error regarding an implicit declaration
of mapclut_paletee() on some compilers and compiler versions. Notably,
Arch Linux and Ubuntu 24.10 seem to be affected.
The error resolved is:
```
../src/lib_ccx/ocr.c: In function 'ocr_rect':
../src/lib_ccx/ocr.c:922:9: error: implicit declaration of function 'mapclut_paletee' [-Wimplicit-function-declaration]
922 | mapclut_paletee(palette, alpha, (uint32_t *)rect->data1, rect->nb_colors);
| ^~~~~~~~~~~~~~~
```
This was resolved by `#include`-ing "ccx_encoders_spupng.h" in the file
src/lib_ccx/ocr.c. Thanks to GitHub user @steel-bucket for sharing the
fix in this issue's comments.
Fixes: #1646
* Update CHANGES.TXT.
Mention the fix for #1646.
Fixes: #1646
* Add flag for Page Segmentation Modes control
I added a flag, --psm, for controlling the PSM (Page Segmentation Mode) in Tesseract. The default mode (3) gives me quite bad results. When I use 6, 11, or 12 for Bulgarian, I get much better OCR results. I haven't tested other languages yet, but I expect improvements there as well when another mode is used.
* feat: add psm for rust parser
* fix: add psm to options
* fix: add default value of psm to 3
* fix: correct type of ocr oem
* fix(rust): use fatal! instead of exit
---------
Co-authored-by: Prateek Sunal <prtksunal@gmail.com>
* feat: Add new function to allocate any object on the heap, zero-initialized
* feat: Add unit tests for `decoder/commands.rs`
* docs: Mention the PR in the changelog
* feat: Add unit tests for `decoder/windows.rs`
Refactor the code and use Default where needed
Implement `PartialEq` also
* fix: Initialise tmp extern C values for easy mocking
* feat: Add unit tests for `decoder/timing.rs`
* feat: Add unit tests for `decoder/output.rs`
* feat: Add unit tests for `decoder/mod.rs`
* feat: Add unit tests for `decoder/tv_screen.rs`
* feat: Add unit tests for `lib.rs`
* fix: Failing test
* feat: [WIP] Add unit tests for `decoder/service_decoder.rs`
* feat: Add unit tests for `decoder/service_decoder.rs`
* feat: Add unit tests for `hardsubx/imgops.rs`
* feat: Add unit tests for `hardsubx/utility.rs`
* fix: cargo clippy
* fix: doctest for `lib_ccxr` module
* feat: Add test `lib_ccxr/util/mod.rs`
* feat: Add test `lib_ccxr/util/levenshtein.rs`
* feat: Add test `lib_ccxr/util/bits.rs`
* feat: Add test `lib_ccxr/time/units.rs`
* chore: Change function name
* fix: Failure caused by missing values in `tlt_config`
* ci: Run unit test cases in `lib_ccxr` module also
* ci: Run clippy & fmt in `lib_ccxr` module also
* chore(clippy): Fix clippy warnings