mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-04-05 21:51:23 +00:00
1383 lines
60 KiB
Plaintext
0.96.7 (unreleased)
-------------------
- New: Allow output of \0-terminated frames via --null-terminated
- New: Added ASS/SSA \pos-based positioning for CEA-608 captions when layout is simple (1–2 rows) (#1726)
- Fix: Remove strdup() memory leaks in WebVTT styling encoder, fix invalid CSS rgba(0,256,0) green value, fix missing free(unescaped) on write-error path (#2154)
- Fix: Prevent crash in Rust timing module when logging out-of-range PTS/FTS timestamps from malformed streams.
- Fix: Resolve Windows MSVC debug build crash caused by cross-CRT invalid free on Rust-allocated output_filename (#2126)
- Fix: Use dynamic current_fps instead of hardcoded 29.97 in CEA-708 SCC frame delay calculations (#2172)
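
Note on the current_fps fix above: a delay given in milliseconds corresponds to a different number of frames at each frame rate, so a hardcoded 29.97 gives the wrong count for other rates. The sketch below only illustrates that arithmetic. The helper name ms_to_frames and its round-to-nearest rule are assumptions made for this example and are not taken from CCExtractor's source.

```python
def ms_to_frames(delay_ms: int, fps: float) -> int:
    """Convert a delay in milliseconds to a whole number of frames.

    Hypothetical helper: rounds to the nearest frame.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(delay_ms * fps / 1000.0)


# The same two-second delay at three different frame rates.
for fps in (29.97, 25.0, 23.976):
    print(fps, ms_to_frames(2000, fps))
```

With these inputs the 25 fps stream needs 50 frames, while the 29.97 value would give 60.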

0.96.6 (2026-02-19)
-------------------
- Fix: Prevent startup OOM on 32-bit (x86) Windows by lazily allocating CEA-708 service decoders and adding proper alloc error handling.
- New: Add 23.98fps as valid --scc-framerate value; clarify help text covers input and output (#2147)
- Fix: SPUPNG: start numbering at sub0000, avoid index advance on empty EOD, normalize header filename
- New: 32-bit (x86) Windows build and installer (#2116)
- New: Add optional machine-readable JSON output for -out=report via --report-format json
- New: Add Snap packaging support with Snapcraft configuration and GitHub Actions CI workflow
- New: Implement dictionary-based capitalization and censorship for transcripts
- Fix: Massive OCR memory leak: the Tesseract instance was recreated and leaked on every DVB subtitle
  frame, causing ~28 GB peak memory on a 2-hour film (#2114)
- Fix: SPUPNG subtitle offset calculation for EIA-608/teletext (#893)
- Fix: Empty WebVTT files now include required header for HLS compatibility (#1743)
- Fix: macOS hardsubx (burned-in subtitle extraction) on Apple Silicon
- Fix: Teletext decoder crash on malformed BCD data (#1990)
- Fix: Crash in report-only mode (-out=report) with AVC streams
- Fix: Incorrect strlen argument when writing end timestamps in MKV subtitle extraction (WebVTT, SRT, ASS/SSA)
- Fix: File descriptor leak and missing open() error check in MKV subtitle track saving
- Fix: DVB EIT start time BCD decoding in XMLTV output causing invalid timestamps (#1835)
- Fix: DVB subtitle duration capped to prevent 65-second page timeout display bug
- Fix: Clear status line output on Linux/WSL to prevent text artifacts (#2017)
- Fix: Prevent infinite loop on truncated MKV files
- Fix: Correct progress time display for multi-program Transport Streams
- Fix: Delete empty output files instead of leaving 0-byte files (#1282)
- Fix: --mkvlang now supports BCP 47 language tags (e.g., en-US, zh-Hans-CN) and multiple codes
- Fix: Segmentation fault when using --multiprogram
- Fix: Dangling pointers in decoder context copy causing potential crashes on cleanup
- Fix: 16 MB memory leak per file in Rust/C FFI demuxer layer
- Fix: Spurious numbers printed to console during processing
- Fix: Heap overflow in Transport Stream PAT/PMT parsing (security fix)
- Fix: Various memory safety and stability fixes in demuxers (MP4, PS, MKV, DVB)
- Fix: Configuration file parser bugs: heap buffer overflow on long lines, broken EOF
  detection due to incorrect fgetc() return type, and last line dropped if file lacks
  trailing newline
- Fix: Typos in configuration keys: FIX_PADDINDG → FIX_PADDING,
  INVASTIGATE_PACKET → INVESTIGATE_PACKET
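
Note on the BCD fixes above: in DVB service information the time-of-day part of an EIT start time is carried as three binary-coded-decimal bytes (hours, minutes, seconds), each nibble holding one decimal digit. The sketch below shows one way to decode such bytes and reject malformed input instead of producing invalid timestamps. The function names are hypothetical and this is not CCExtractor's implementation.

```python
from typing import Tuple


def bcd_byte(value: int) -> int:
    """Decode one BCD byte (two decimal digits); reject nibbles above 9."""
    hi, lo = value >> 4, value & 0x0F
    if hi > 9 or lo > 9:
        raise ValueError(f"invalid BCD byte: 0x{value:02X}")
    return hi * 10 + lo


def decode_bcd_time(data: bytes) -> Tuple[int, int, int]:
    """Decode a 3-byte BCD hh:mm:ss field into (hours, minutes, seconds)."""
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    hours, minutes, seconds = (bcd_byte(b) for b in data)
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError("time field out of range")
    return hours, minutes, seconds


print(decode_bcd_time(bytes([0x12, 0x45, 0x00])))  # (12, 45, 0)
```

Raising on an invalid nibble lets the caller skip the malformed event instead of writing a wrong time.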

0.96.5 (2026-01-05)
-------------------
- New: CCExtractor is available again via Homebrew on macOS and Linux.
- New: Add support for raw CDP (Caption Distribution Packet) files (#1406)
- New: Add --scc-accurate-timing option for bandwidth-aware SCC output (#1120)
- Fix: MXF files containing CEA-708 captions not being detected/extracted (#1647)
- Docs: Add Windows WSL build instructions
- Fix: Security fixes (out-of-bounds read/write) in a few places in the legacy C code.

0.96.4 (2026-01-01)
-------------------
- New: Persistent CEA-708 decoder context - maintains state across multiple calls for proper subtitle continuity
- New: OCR character blacklist options (--ocr-blacklist, --ocr-blacklist-file) for improved accuracy
- New: OCR line-split option (--ocr-splitontimechange) for better subtitle segmentation
- Fix: 32-bit build failures on i686 and armv7l architectures
- Fix: Legacy command-line argument compatibility (-1, -2, -12, --sc, --svc)
- Fix: Prevent heap buffer overflow in Teletext processing (security fix)
- Fix: Prevent integer overflow leading to heap buffer overflow in Transport Stream handling (security fix)
- Fix: Lazy OCR initialization - only initialize when first DVB subtitle is encountered
- Build: Optimized Windows CI workflow for faster builds
- Fix: Updated GUI with version 0.7.1. A blind attempt to fix a hang on start on some Windows systems.

0.96.3 (2025-12-29)
-------------------
- New: VOBSUB subtitle extraction with OCR support for MP4 files
- New: VOBSUB subtitle extraction support for MKV/Matroska files
- New: Native SCC (Scenarist Closed Caption) input file support - CCExtractor can now read SCC files
- New: Configurable frame rate (--scc-framerate) and styled PAC codes for SCC output
- Fix: Apply --delay option to DVB/bitmap subtitles (previously only worked with text-based subtitles)
- Fix: 200ms timing offset in MOV/MP4 caption extraction
- Fix: utf8proc include path for system library builds
- Fix: Use fixed-width integer types in MP4 bswap functions for better portability
- Fix: Guard ocr_text access with ENABLE_OCR preprocessor check
- Fix: Preserve FFmpeg libs when building with -system-libs -hardsubx
- Build: Add vobsub_decoder to Windows and autoconf build systems
- Build: Add winget and Chocolatey packaging workflows for Windows distribution
- Docs: Add VOBSUB extraction documentation and subtile-ocr Dockerfile

0.96.2 (2025-12-26)
-------------------
- Fix: Resolve utf8proc header include path when building against system libraries on Linux.
- Rebundle Windows version to include required runtime files to process hardcoded subtitles
  (hardsubx mode).
- New: Add optional -system-libs flag to Linux build script for package manager compatibility

0.96.1 (2025-12-25)
-------------------
- Rebundle Windows version to include an updated GUI. No changes in CCExtractor itself.

0.96 (2025-12-23)
-----------------
- New: Multi-page teletext extraction support (#665)
  - Extract multiple teletext pages simultaneously with separate output files
  - Use --tpage multiple times (e.g., --tpage 100 --tpage 200)
  - Output files are named with page suffix (e.g., output_p100.srt, output_p200.srt)
- Fix: SPUPNG subtitle offset calculation to center based on actual image dimensions
- New: Added --list-tracks (-L) option to list all tracks in media files without processing
- New: Chinese, Korean, Japanese support - proper encoding and OCR.
- New: Correct McPoodle DVD raw format support
- Fix: Timing is now frame perfect (using FFmpeg timing dump as reference) in all formats.
- Fix: Solved garbling in all the pending issues we had on GitHub.
- Fix: All causes of "premature end of file" messages due to bugs and not actual file cuts.
- Fix: All memory leaks, double frees and usual C nastiness that valgrind could find.
- Fix: Include ATSC VCT virtual channel numbers and call signs in XMLTV output
- Fix: Restore ATSC XMLTV generation with ETT parsing for extended descriptions, multi-segment handling, extended table IDs (EIT/VCT), corrected <programme> XMLTV formatting, buffer bounds fixes
- Fix: Add HEVC/H.265 stream type recognition to prevent crashes on ATSC 3.0 streams.
- Fix: Tolerance to damaged streams - recover where possible instead of terminating.
- Issues closed: Over 40! Too many to list here, but each of them was either a bug squashed or a feature implemented.
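
Note on the multi-page teletext naming in this release: each requested page gets its own output file, with the page number appended to the base name as in output_p100.srt. The sketch below reproduces that pattern for illustration only. The helper page_output_name is hypothetical, and how CCExtractor itself derives the base name is not shown here.

```python
import os


def page_output_name(base_name: str, page: int) -> str:
    """Insert a _p<page> suffix before the file extension."""
    stem, ext = os.path.splitext(base_name)
    return f"{stem}_p{page}{ext}"


# Equivalent of passing --tpage 100 --tpage 200 with an output named output.srt
for page in (100, 200):
    print(page_output_name("output.srt", page))
```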

0.95 (2025-09-15 - never formally packaged)
-----------------
- New: Create a Docker image to simplify CCExtractor usage without any environment hassle (#1611)
- New: Add SCC support for CEA-708 decoder (#1595)
- Refactor: Lots of code ported to Rust.
- Fix: Improved handling of IETF language tags in Matroska files (#1665)
- Breaking: Major argument flags revamp for CCExtractor (#1564 & #1619)
- Fix: Segmentation fault when using hardsubx
- Fix: WebVTT X-TIMESTAMP-MAP placement (#1463)
- Fix: ffmpeg 5.0, tesseract 5.0 compatibility and remove deprecated methods
- Fix: tesseract 5.x traineddata location in ocr
- Improvement: Ignore MXF Caption Essence Container version byte to enhance SRT subtitle extraction compatibility
- New: Add tesseract page segmentation modes control with `--psm` flag
- Fix: Support for MINGW-w64 cross compiling

0.94 (2021-12-14)
-----------------
- BOM is no longer enabled by default on Windows platforms
- CEA-708: Rust decoder is now default instead of C decoder
- CEA-708 subs are now extracted by default
- New: Add check for Minimum supported rust version (MSRV) (#1387)
- Fix: Fix CEA-708 Carriage Return command implementation
- Fix: Fix bug with startat/endat parameter (#1396)
- Fix: Mac Build processes (#1390)
- Fix: Fix bug with negative delay parameter (#1365)

0.93 (2021-08-16)
-----------------
- Minor Rust updates (format, typos, docs)
- Updated GUI

0.92 (2021-08-10)
-----------------
- Rust updates: Added srt writer
- Rust updates: Added writers for transcripts and SAMI
- Added missing DLL to Windows installer
- Updated Windows GUI

0.91 (2021-07-26)
-----------------
- More Rust in the 708 decoder (Add Pen Presets and timing functions)
- Updated GUI

0.90 (2021-07-14)
-----------------
- New installer (WiX based)
- New GUI (flutter based)
- More Rust (the 708 decoder is being rewritten)

0.89 (2021-06-13)
-----------------
- Fix: Fix broken links in README
- Fix: Timing in DVB, sub duration check for timeout.
- New: Added support for SCC and CCD encoder formats
- New: Added support to output captions to MCC file (#733).
- New: Add support for censoring words ("Kid Friendly") (#1139)
- New: Extend support of capitalization for all BITMAP and 608 subtitles (#1214)
- New: Added an option to disable timestamps for WebVTT (In response to issue #1127)
- Fix: Change inet_ntop to inet_ntoa for Windows XP compatibility
- Fix: Added italics, underline, and color rendering support for -out=spupng with EIA608/teletext
- Fix: ccx_demuxer_mxf.c: Parse framerate from MXF captions to fix caption timings.
- Fix: hardsubx_decoder.c: Fix memory leaks using Leptonica API.
- Fix: linux/Makefile.am: added some sources to enable rpms to be created.
- Fix: Crash when using -sc (sentence case) option (#1115)
- Fix: Segmentation fault on VOB #1128
- Fix: Hang while processing video #1121
- Fix: lib_ccx.c: Initialize fatal error logging function before first usage in init_libraries
- Fix: A few (minor) memory leaks around the code.
- Fix: General code clean up / reformatting
- Fix: Fix multiple definitions with new -fno-common default in GCC 10
- Fix: Mac now builds reproducibly again without errors on the date command (#1230)
- Fix: Allow all oem modes with tesseract v4 (#1264)
- Doc: Updated ccextractor.cnf.sample.
- Update: Updated LibPNG to 1.6.37
- Remove: Python API (since no one cares about it and it's unmaintained)
- Remove: -cf. Just use FFmpeg if you want an ES from a TS or PS; CCExtractor is a bad tool
  for this.
- Fix: Segmentation fault on Windows
- Update: Updated libGPAC to 1.0.1
- Fix: Segmentation fault with unsupported and multitrack file reports
- Fix: Write subtitle header to multitrack outputs
- Fix: Write multitrack files to the output file directory
- Fix: Correct frame number calculation in SCC (#1340)
- Fix: Regression on Teletext that caused dates to be wrong (RT 78 on the sample platform)
- Fix: CEA-708: Better timing, fixes for missing subtitles
- Fix: timing for direct rollup
- Fix: timing for VOB files with multiple chapters

0.88 (2019-05-21)
-----------------
- New: More tapping points for debug image in ccextractor.
- New: Add support for tesseract 4.0
- Optimize: Remove multiple RGB to grey conversion in OCR.
- Fix: Update UTF8Proc to 2.2.0
- Fix: Update LibPNG to 1.6.35
- Fix: Update Protobuf-c to 1.3.1
- Fix: Warn instead of fatal when a 0xFF marker is missing
- Fix: Segfault in general_loop.c due to null pointer dereference (case of no encoder)
- Fix: Enable printing hdtv stats to console.
- Fix: Many typos in comments and output messages
- Fix: Ignore Visual Studio temporary project files
- New: Add support for non-Latin characters in stdout
- Fix: Check whether stream is empty
- New: Add support for EIA-608 inside .mkv
- New: Add support for DVB inside .mkv
- Fix: Added -latrusmap: map Latin symbols to Cyrillic ones in special cases
  of Russian Teletext files (issue #1086)
- Fix: Several OCR crashes

0.87 (2018-10-23)
-----------------
- New: Upgrade libGPAC to 0.7.1.
- New: mp4 tx3g & multitrack subtitles.
- New: Guide to update dependencies (docs/Updating_Dependencies.txt).
- New: Add LICENSE File (#959).
- New: Display quantisation mode in info box (#954).
- New: Add instruction required to build ccextractor with HARDSUBX support (#946).
- New: Added version no. of libraries to --version.
- New: Added -quant (OCR quantization function).
- New: Python API now compatible with Python 3.
- Fix: linux/builddebug: Added non-local directories to the include search path so we don't
  require a locally compiled tesseract or leptonica.
- Fix: Correct -HARDSUBX Bug In CMake, allow build with hardsubx using cmake (#966).
- Fix: possible segfaults in hardsubx_classifier.c due to strdup (#963).
- Fix: Improve the start and end timestamps of extracted burned in captions (#962).
- Fix: Update COMPILATION.md (#960).
- Fix: Fixed crash with "-out=report" and "-out=null".
- Fix: -nocf not working with OCR'ing (#958).
- Fix: segfault in add_cc_sub_text and initialize to NULL in init_encoder (#950).
- Fix: ccx_decoders_common.c: Copy data type when creating a copy of the subtitle structure.
- Fix: Implicit declaration of these functions throws warning during build (#948).
- Fix: ccx_decoders_common.c: Properly release allocated resources on free_subtitle().
- Fix: Added a datatype member to struct cc_subtitle - needed so we can properly free all
  memory when void *data points to a structure that has its own pointers.
- Fix: dvb_subtitle_decoder.c: When combining image regions verify that the offset is
  never negative.
- Fix: Updated .travis.yml to fix osx build (#947).
- Fix: Add utf8proc src file to cmake, updated header file (#944).
- Fix: Added required pointers on freep() calls.
- Fix: Removed dvb_debug_traces_to_stdout and used the usual dbg_print instead.
- Fix: Additional debug traces for DVB.
- Fix: Fix minor memory leak in ocr.c.
- Fix: Fix issue with displaying utf8proc version.
- Fix: Fix failing cmake due to liblept/tesseract header files.
- Fix: Added missing \n in params.c.
- Fix: builddebug: Use -fsanitize=address -fno-omit-frame-pointer.
- Fix: ccx_decoders_common.c: Removed trivial memory leak.
- Fix: ccx_encoders_srt.c: Made sure a pointer is non-NULL before dereferencing.
- Fix: dvb_subtitle_decoder.c: Initialize pointer members to NULL when creating a structure.
- Fix: lib_ccx.c: Initialize (memset 0) structure cc_subtitle after memory allocation.
- Fix: Added verboseness to error/warnings in dvb_subtitle_decoder.c.
- Fix: dvb_subtitle_decoder.c: Work on passing invalid streams errors upstream (plus some
  warning messages) so we can eventually recover from this situation instead of crashing.
- Fix: telxcc.c: Currently setting a colour doesn't necessarily add a space even though the
  specifications mandate it. (#930).
- Fix: dvb_subtitle_decoder.c: Fix null pointer dereference when region==NULL in write_dvb_sub.
- Fix: DVB Teletext subtitle incomplete.
- Fix: replace all 0xA characters within startbox with 0x20.
- Fix: DVB Teletext subtitle incomplete (#922).
- Fix: Add missing return value to one of the returns in process_tx3g().
- Fix: Typos and other minor bugs.
- Fix: Tidy CMakeLists & vcxproj (#920).
- Fix: Added m2ts and -mxf to help screen.
- Fix: Added MKV to demuxer_print_cfg.
- Fix: Added MXF to demuxer_print_cfg.
- Fix: "Out of order packets" error had wrong print() parameters.
- Fix: Updated Python documentation.
- Fix: Fix incorrect path in XML (#904).
- Fix: linux build script (non-debug): Don't hide warnings from compiler.
- Fix: linux build script (debug): Display which step of the build script we're in.
- Fix: Make the build reproducible (#976).
- Fix: Remove instance of o1 and o2 from help.
- Fix: Colors of DVB subtitles with depth 2 broken due to a missing break.
- Fix: CEA-708: Caption loss due to CW command (#991).
- Fix: CEA-708: Update patch for windows priority with functions (#990).

0.86 (2018-01-09)
-----------------
- New: Preliminary MXF support
- New: Added a histogram in one-minute increments of the number of lines in a subtitle.
- New: Added Autoconf build scripts for CCExtractor to generate makefiles (mac).
- New: Added Autoconf build scripts for CCExtractor to generate makefiles (linux).
- New: Added .rpm package generation script.
- New: Added build/installation script for .pkg.tar.xz (Arch Linux).
- New: Added tarball generation script.
- New: Added --analyzevideo. If present the video stream will be processed even if the
  subtitles are in a different stream. This is useful when we want video information
  (resolution, frame type, etc). -vides now implies this option too.
  [Note: Tentative - some possibly breaking changes were made for this, so if you
  use it validate results]
- New: Added a GUI in the main CCExtractor binary (separate from the external GUIs
  such as CCExtractorGUI).
- New: A Python binding extension so it's possible to use CCExtractor's tools from
  Python.
- New: Added -nospupngocr (don't OCR bitmaps when generating spupng, faster)
- New: Add support for file split on keyframe (-segmentonkeyonly)
- New: Added WebVTT output from Matroska.
- New: Support for source-specific multicast.
- New: FreeType-based text renderer (-out=spupng with teletext/EIA608).
- New: Upgrade library UTF8proc
- New: Upgrade library win_iconv
- New: Upgrade library zlib
- New: Upgrade library LibPNG
- New: Support for Source-Specific Multicast
- New: Added Travis CI support
- New: Made error messages clearer, less ambiguous
- Fix: Prevent the OCR being initialized more than once (happened on multiprogram and
  PAT changes)
- Fix: Makefiles, build scripts, etc... everything updated and corrected for all
  platforms.
- Fix: Proper line ending for .srt files from bitmaps.
- Fix: OCR corrections using grayscale before extracting texts.
- Fix: End timestamps in transcripts from DVB.
- Fix: Forcing -noru to cause deduplication in ISDB
- Fix: TS: Skip NULL packets
- Fix: When NAL decoding fails, don't dump the whole decoded thing, limit to 160 bytes.
- Fix: Modify Autoconf scripts to generate tarball for mac from `/package_creators/tarball.sh`
  and include GUI files in tarball
- Fix: Started work on libGPAC upgrade.
- Fix: DVB subtitle not extracted if there's no display segment
- Fix: Heap corruption in add_ocrtext2str
- Fix: Bug that sometimes caused -out=spupng to crash
- Fix: Checks for text before newlines on DVB subtitles
- Fix: OCR issue caused by separated dvb subtitle regions
- Fix: DVB crash on specific condition (!rect->ocr_text)
- Fix: DVB bug (Multiple-line subtitle; Missing last line)
- Fix: --sentencecap for teletext samples
- Fix: Crash when image passed into OCR is empty
- Fix: Temporarily wrapped the Python API, not production ready yet
- Fix: -delay option in DVB

0.85b (2017-01-26)
------------------
- Fix: Base Windows binary (without OCR) compiled without DLL dependencies.

0.85 (2017-01-23)
-----------------
- New: Added FFMPEG 3.0 to Windows build - last one that is XP compatible.
- New: Major improvements in CEA-608 to WebVTT (styles, etc).
- New: Return a non-zero return code if no subtitles are found.
- New: Windows build files updated to Visual Studio 2015, new target platform is 140_xp.
- New: Added basic support of Tesseract 4.0.0.
- New: Added build script for .deb.
- New: Updated -debugdvbsub parameter to get the most relevant DVB traces for debugging.
- New: SMPTE-TT files are now compatible with Adobe Premiere.
- New: Updated libpng.
- New: Added 3rd party (Tracy from archive.org) static linux build script.
- New: Add chapter extraction for MP4 files.
- New: Return code 10 if no captions are found at all.
- Fix: Teletext duplicate lines in certain cases.
- Fix: Improved teletext timing.
- Fix: DVB timing is finally good.
- Fix: A few minor memory leaks.
- Fix: tesseract library file included in mac build command.
- Fix: Bad WTV timings in some cases.
- Fix: Mac build script.
- Fix: Memory optimization in HARDSUBX edit_distance.
- Fix: SubStation Alpha subtitles in bitmap.
- Fix: lept msg severity in linux.
- Fix: SSA, SPUPNG and VTT timing and skipping of subtitles for SAMI and TTML.
- Fix: SMPTE-TT: Added support for font color.
- Fix: SAMI unnecessary empty subtitle when extracting DVB subs.
- Fix: Skip the packet if the adaptation field length is broken.
- Fix: 708 - lots of work done in the decoder. Implementation of more commands. Better timing.

0.84 (2016-12-16)
-----------------
- New: In Windows, both with and without-OCR binaries are bundled, since the OCR one causes problems due to
  dependencies in some systems. So unless you need the OCR just use the non-OCR version.
- New: Added -sbs (sentence by sentence) for DVB output. Each frame in the output file contains a complete
  sentence (experimental).
- New: Added -curlposturl. If used each output frame will be sent with libcurl by doing a POST to that URL.
- Fix: More code consistency checking in function names.
- Fix: linux build script now tries to verify dependencies.
- Fix: Mac build script was missing a directory.

0.83 (2016-12-13)
-----------------
- Fix: Duplicate lines in mp4 (specifically affects itunes).
- Fix: Timing in .mp4, timing now calculated for each CC pair instead of per atom.
- Fix: Typos everywhere in the documentation and source code.
- Fix: CMakeLists for build in cmake.
- Fix: -unixts option.
- Fix: FPS switching messages.
- Fix: Removed ugly debug statement with local path in HardsubX.
- Fix: Changed platform target to v120_xp in Visual Studio (so XP is supported again).
- Fix: Added detail in many error messages.
- Fix: Memory leaks in videos with XDS.
- Fix: Makefile compatibility issues with Raspberry pi.
- Fix: missing separation between WebVTT header and body.
- Fix: Stupid bug in M2TS that prevented it from working.
- Fix: OCR libraries dependencies for the release version in Windows.
- Fix: non-buffered reading from pipes.
- Fix: --stream option with stdin.
- New: terminate_asap to buffered_read_opt
- New: Added some TV-show specific spelling dictionaries.
- New: Updated GPAC library.
- New: ASS/SSA.
- New: Capture sigterm to do some clean up before terminating.
- New: Work on 708: Changed DefineWindow behavior, only clear text of an existing window if style has changed.

0.82 (2016-08-15)
-----------------
- New: HardsubX - Burned in subtitle extraction subsystem.
- New: Color Detection in DVB Subtitles
- Fix: Corrected sentence capitalization
- Fix: Skipping redundant bytes at the end of tx3g atom in MP4
- Fix: Illegal SRT files being created from DVB subtitles
- Fix: Incorrect Progress Display

0.81 (2016-06-13)
-----------------
- New: --version parameter for extensive version information (version number, compile date, executable hash, git commit (if appropriate))
- New: Add -sem (semaphore) to create a .sem file when an output file is open and delete it when it's closed.
- New: Add --append parameter. This will prevent overwriting of existing files.
- New: File Rotation support added. The user has to send a USR1 signal to rotate.
- Fix: Issues with files <1 Mb
- Fix: Preview of generated transcript.
- Fix: Statistics were not generated anymore.
- Fix: Correcting display of sub mode and info in transcripts.
- Fix: Teletext page number displayed in -UCLA.
- Fix: Removal of excessive XDS notices about aspect ratio info.
- Fix: Force Flushing of file buffers works for all files now.
- Fix: mp4 void atoms that were causing some .mp4 files to fail.
- Fix: Memory usage caused by EPG processing was high due to many non-dynamic buffers.
- Fix: Project files for Visual Studio now include OCR support in Windows.

0.80 (2016-04-24)
-----------------
- Fix: "Premature end of file" (one of the scenarios)
- Fix: XDS data is always parsed again (needed to extract information such as program name)
- Fix: Teletext parsing: @ was incorrectly exported as * - X/26 packet specifications in ETS 300 706 v1.2.1 now better followed
- Fix: Teletext parsing: Latin G2 subsets and accented characters not displaying properly
- Fix: Timing in -ucla
- Fix: Timing in ISDB (some instances)
- Fix: "mfra" mp4 box weight changed to 1 (this helps with correct file format detection)
- Fix: Fix for TARGET File is null.
- Fix: Fixed SegFaults while parsing parameters (if mandatory parameter is not present in -outinterval, -codec or -nocodec)
- Fix: Crash when input is too small
- Fix: Update some URLs in code (references to docs)
- Fix: -delay now updates final timestamp in ISDB, too
- Fix: Removed minor compiler warnings
- Fix: Visual Studio solution files working again
- Fix: ffmpeg integration working again
- New: Added --forceflush (-ff). If used, output file descriptors will be flushed immediately after being written to
- New: Hexdump XDS packets that we cannot parse (shouldn't be many of those anyway)
- New: If input file cannot be opened, provide a decent human readable explanation
- New: GXF support

0.79 (2016-01-09)
-----------------
- Support for Grid Format (g608)
- Show correct number of teletext packets processed
- Removed Segfault on incorrect mp4 detection
- Remove xml header from transcript format
- Help message updated for Teletext
- Added --help and -h for help message
- Added --nohtmlescape option
- Added --noscte20 option

0.78 (2015-12-12)
-----------------
- Support to extract Closed Caption from MultiProgram at once.
- CEA-708: exporting to SAMI (.smi), Transcript (.txt), Timed Transcript (ttxt) and SubRip (.srt).
- CEA-708: 16 bit charset support (tested on Korean).
- CEA-708: Roll Up captions handling.
- Changed TCP connection protocol (BIN data is now wrapped in packets, added EPG support and keep-alive packets).
- TCP connection password prompt is removed. To set connection password use -tcppassword argument instead.
- Support ISDB Closed Caption.
- Added a new output format, simplexml (used internally by a CCExtractor user, may or may not be useful for
  anyone else).

0.77 (2015-06-20)
-----------------
- Fixed bug in capitalization code ('I' was not being capitalized).
- GUI should now run in Windows 8 (using the included .NET runtime, since
  3.5 cannot be installed in Windows 8 apparently).
- Fixed Mac build script, binary is now compiled with support for
  files over 2 GB.
- Fixed bug in PMT code, damaged PMT sections could make CCExtractor
  crash.

0.76 (2015-03-28)
-----------------
- Added basic M2TS support
- Added EPG support - you can now export the Program Guide to XML
- Some bug fixes

0.75 (2015-01-15)
-----------------
- Fixed issue with teletext to formats other than srt.
- CCExtractor can be used as library if compiled using cmake
- By default the Windows version adds BOM to generated UTF files (this is
  because it's needed to open the files correctly) while all other
  builds don't add it (because it messes with text processing tools).
  You can use -bom and -nobom to change the behaviour.
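
Note on the BOM behaviour above: for UTF-8 the byte order mark is the three bytes EF BB BF written at the very start of the file. The sketch below shows the difference between the two modes at the byte level. The function encode_utf8 and its with_bom parameter are hypothetical stand-ins for the effect of -bom and -nobom, not CCExtractor code.

```python
import codecs


def encode_utf8(text: str, with_bom: bool) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the UTF-8 BOM."""
    data = text.encode("utf-8")
    return codecs.BOM_UTF8 + data if with_bom else data


print(encode_utf8("Hi", with_bom=True))   # b'\xef\xbb\xbfHi'
print(encode_utf8("Hi", with_bom=False))  # b'Hi'
```

Tools that compare or concatenate raw text see the three extra bytes as content, which is why the default differs per platform.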

0.74 (2014-09-24)
-----------------
- Fixed issue with -o1 -o2 and -12 parameters (where it would write output only in the o2 file)
- Fixed UCLA parameter issue. Now the UCLA parameter settings can't be overwritten anymore by later parameters that affect the custom transcript
- Switched order around for TLT and TT page number in custom transcript to match UCLA settings
- Added nobom parameter, for when files are processed by tools that can't handle the BOM. If using this, files might not be readable under Windows.
- Segfault fix when no input files were given
- No more bin output when sending to server + possibility to send TT to server for processing
- Windows: Added the Microsoft redistributable MSVCR120.DLL to both the installation package and the application zip.

0.73 - GSOC (2014-08-19)
------------------------
- Added support of BIN format for Teletext
- Added start of librarization. This will allow in the future for other programs to use encoder/decoder functions and more.

0.72 - GSOC (2014-08-12)
------------------------
- Fix for WTV files with incorrect timing
- Added support for fps change using data from AVC video track in a H264 TS file.
- Added FFmpeg support to enable all encapsulators and decoders provided by FFmpeg

0.71 - GSOC (2014-07-31)
------------------------
- Added feature to receive captions in BIN format according to CCExtractor's own
|
||
protocol over TCP (-tcp port [-tcppassword password])
|
||
- Added ability to send captions to the server described above or to the
|
||
online repository (-sendto host[:port])
|
||
- Added -stdin parameter for reading input stream from standard input
|
||
- Compilation in Cygwin using linux/Makefile
|
||
- Fix for .bin files when not using latin1 charset
|
||
- Correction of mp4 timing, when one timestamp points timing of two atom
|
||
|
||
0.70 - GSOC (2014-07-06)
------------------------
This is the first release that is part of Google's Summer of Code.
Anshul, Ruslan and Willem joined CCExtractor to work on a number of things
over the summer, and their work is already reaching the mainstream
version of CCExtractor.

- Added a huge dictionary submitted by Matt Stockard.
- Added DVB subtitles decoder, spupng in output
- Added support for cdt2 media atoms in QT video files. Now multiple atoms in
  a single sample sequence are supported.
- Changed Makefile.
- Fixed some bugs.
- Added feature to print info about file's subtitles and streams (-out=report).
- Support Long PMT.
- Support Configuration file.
  - There is a sample configuration file in the doc/ folder named
    ccextractor.cnf.sample
  - For now, only a file named ccextractor.cnf placed next to the
    ccextractor executable is supported
  - For details of which options can be set using the configuration file,
    please look at the sample file.

- Added options for custom transcript output:
  new parameter (-customtxt format), where the format must be like this: 1100100 (7 digits).
  These indicate whether the following things should be displayed or not in the (timed) transcript:
  - Display start time
  - Display end time
  - Display caption mode
  - Display caption channel
  - Use a relative timestamp (relative to the sample)
  - Display XDS info
  - Use colors
  Examples:
  0000101 is the default setting for transcripts
  1110101 is the default for timed transcripts
  1111001 is the default setting for -ucla
  Make sure you use this parameter after others that might affect these
  settings (-out, -ucla, -xds, -txt, -ttxt, ...)
- Fixed Negative timing Bug

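The 7-digit -customtxt format from the 0.70 entry reads naturally as one on/off flag per listed item. A minimal Python sketch, assuming the digits follow the order of the list in that entry (the field names below are invented for illustration and are not CCExtractor identifiers):

```python
from typing import Dict

# Assumed order: same as the bullet list in the 0.70 changelog entry.
CUSTOMTXT_FIELDS = (
    "start_time",
    "end_time",
    "caption_mode",
    "caption_channel",
    "relative_timestamp",
    "xds_info",
    "colors",
)


def parse_customtxt(fmt: str) -> Dict[str, bool]:
    """Map a 7-character string of 0/1 digits to named on/off flags."""
    if len(fmt) != len(CUSTOMTXT_FIELDS) or any(c not in "01" for c in fmt):
        raise ValueError("format must be exactly 7 digits, each 0 or 1")
    return {name: digit == "1" for name, digit in zip(CUSTOMTXT_FIELDS, fmt)}
```

Under that assumption, parse_customtxt("1110101") turns on start time, end time, caption mode, relative timestamps and colors, which matches the description of the timed transcript default.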
0.69 (2014-04-05)
-----------------
- A few patches from Christopher Small, including proper support
  for multiple multicast clients listening on the same port.
- GUI: Fixed teletext preview.
- GUI: Added a small indicator of data being received when reading from
  UDP.
- GUI: Added UTF-8 support to preview Window (used for teletext).
- Fixes in Makefile and build script, compilation in Linux and OSX failed
  if another libpng was found in the system.
- WTV support directly in CCExtractor (no need for wtvccdump any more).
- Started refactoring and clean-up.
- Fix: MPEG clock rollover (happens roughly every 26 hours) caused a time
  discontinuity.
- Windows GUI: Started work on HDHomeRun support. For now it just looks
  for HDHomeRun devices. Lots of other things will arrive in the next
  versions.
- Windows GUI: Some code refactoring, since the HDHomeRun support makes
  the code large enough to require more than one source file :-)

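The "26 hours" in the 0.69 rollover fix follows from the MPEG timestamp layout: PTS values are 33 bits wide and tick at 90 kHz. The Python sketch below works out the period and shows one common way to unwrap a rolled-over value. The unwrap strategy is a generic technique chosen for illustration, not necessarily what CCExtractor does.

```python
PTS_BITS = 33
MPEG_CLOCK_HZ = 90_000
ROLLOVER_TICKS = 1 << PTS_BITS  # timestamps wrap back to 0 after this many ticks


def rollover_hours() -> float:
    """Hours of stream time before a 33-bit, 90 kHz timestamp wraps."""
    return ROLLOVER_TICKS / MPEG_CLOCK_HZ / 3600


def unwrap_pts(prev_unwrapped: int, raw_pts: int) -> int:
    """Extend raw_pts beyond 33 bits so it stays close to prev_unwrapped."""
    base = prev_unwrapped - (prev_unwrapped % ROLLOVER_TICKS)
    candidate = base + raw_pts
    half = ROLLOVER_TICKS // 2
    if candidate - prev_unwrapped > half:
        candidate -= ROLLOVER_TICKS  # raw value belongs to the previous cycle
    elif prev_unwrapped - candidate > half:
        candidate += ROLLOVER_TICKS  # raw value wrapped into the next cycle
    return candidate
```

rollover_hours() evaluates to about 26.5, so a recording longer than that sees the raw timestamp jump back to zero unless it is unwrapped.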
0.68 (2013-12-24)
-----------------
- A couple of shared variables between 608 decoders were causing
  problems when both fields were processed at the same time with
  -12, fixed.
- Added BOM for UTF-8 files.
- Corrected a few extended characters in the UTF-8 encoding,
  probably never used in real world captioning but since we got
  a good test sample file...
- Color and fonts in PAC commands were ignored, fixed (Helen Buus).
- Added a new output format, spupng. It consists of one .png file
  for each subtitle frame and one .xml with all the timing
  (Heleen Buus).
- Some fixes (Chris Small).

0.67 (2013-10-09)
-----------------
- Padding bytes were being discarded early in the process in 0.66,
  which is convenient for debugging, but it messes with timing in
  .raw, which depends on padding. Fixed.
- MythTV's branch had a fixed size buffer that was sometimes not
  large enough. Made dynamic.
- Better support for PAT changing mid-stream.
- Removed quotes in Start in .smi (format fix).
- Added multicast support (Chris Small)
- Added ability to select IP address to bind in UDP (Chris Small)
- Fixes in -unixts and -delay for teletext.
- Added -autodash : When two people are talking, add a dash as
  needed (this is based on subtitle position). Only in .srt and
  with -trim. Quite experimental, feedback appreciated.
- Added -latin1 to select Latin 1 as encoding. Default is now
  UTF-8 (-utf8 still exists but it's not needed).
- Added -ru1, which emulates a (non-existing in real life) 1 line
  roll-up mode.

0.66 (2013-07-01)
-----------------
- Fixed bug in auto detection code that triggered a message
  about file being out of sync.
- Added -investigate_packets
  The PMT is used to select the most promising elementary stream
  to get captions from. Sometimes captions are where you least
  expect them, so -datapid allows you to select an elementary stream
  manually, in case the CC location is not obvious from the PMT
  contents. To assist looking for the right stream, the parameter
  "-investigate_packets" will have CCExtractor look inside each
  stream, looking for CC markers, and report the streams that
  are likely to contain CC data even if it can't be determined from
  their PMT entry.
- Added -datastreamtype to manually select a stream based on
  its type instead of its PID. Useful if your recording program
  always hides the captions under the same stream type.
- Added -streamtype so if an elementary stream is selected manually
  for processing, the streamtype can be selected too. This can be
  needed if you process, for example, a stream that is declared as
  "private MPEG" in the PMT, so CCExtractor can't tell what it is.
  Usually you'll want -streamtype 2 (MPEG video) or -streamtype 6
  (MPEG private data).
- PMT content listing improved, it now shows the stream type for
  more types.
- Fixes in roll-up, cursor was being moved to column 1 if a
  RU2, RU3 or RU4 was received even if already in roll-up mode.
- Added -autoprogram. If a multiprogram TS is processed and
  -autoprogram is used, CCExtractor will analyze all PMTs and use
  the first program that has a suitable data stream.
- Timed transcript (ttxt) now also exports the caption mode
  (roll-up, paint-on, etc.) next to each line, as it's useful to
  detect things like commercials.
- Content Advisory information from XDS is now decoded if it's
  transmitted in "US TV parental guidelines" or "MPA".
  Other encodings such as Canada's are not supported yet due
  to lack of samples.
- Copy Management information from XDS is now decoded.
- Added -xds. If present and export format is timed transcript
  (only), XDS information will be saved to file (same file as the
  transcript, with XDS being clearly marked). Note that for now
  all XDS data is exported even if it doesn't change, so the
  transcript file will be significantly larger.
- Added some PaintOn support, at least enough to prevent it
  from breaking things when the other modes are used.
- Removed afd_data() warning. AFD doesn't carry any caption related
  data. AFD still detected in code in case we want to do something
  with it later anyway.
- Ported last changes from Petr Kutalek's telxcc. Current version
  is 2.4.4.
- In teletext mode when exporting to transcript (not .srt), an effort
  is made to detect and merge line duplicates. This is done by using
  the Levenshtein distance, which is the number of changes required
  to convert one string to another. To simplify things, strings are
  compared up to the length of the shortest one.
  There are 3 parameters that can be used to tweak the thresholds:
    -deblev: Enable debug so the calculated distance for each two
    strings is displayed. The output includes both strings, the
    calculated distance, the maximum allowed distance, and whether
    the strings are ultimately considered equivalent or not, i.e.
    the calculated distance is less than or equal to the max allowed.
    -levdistmincnt value: Minimum distance we always allow
    regardless of the length of the strings. Default 2. This means
    that if the calculated distance is 0, 1 or 2, we consider the
    strings to be equivalent.
    -levdistmaxpct value: Maximum distance we allow, as a
    percentage of the shortest string length. Default 10%. For
    example, consider a comparison of one string of 30 characters
    and one of 60 characters. We want to determine whether the
    first 30 characters of the longer string are more or less the
    same as the shortest string, i.e. whether the longest string
    is the shortest one plus new characters and maybe some
    corrections. Since the shortest string is 30 characters and
    the default percentage is 10%, we would allow a distance of
    up to 3 between the first 30 characters.
- Added -lf : Use UNIX line terminator (LF) instead of Windows (CRLF).
- Added -noautotimeref: Prevent UTC reference from being auto set from
  the stream data.

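The teletext duplicate-line rule from the 0.66 entry (-levdistmincnt and -levdistmaxpct) can be sketched in a few lines of Python. This is an illustration, not the CCExtractor implementation. It assumes the allowed distance is whichever of the two thresholds is larger and that the percentage is rounded down; the function names are invented for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def max_allowed(shortest_len: int, min_cnt: int = 2, max_pct: int = 10) -> int:
    # Assumption: the minimum count acts as a floor under the percentage rule.
    return max(min_cnt, shortest_len * max_pct // 100)


def lines_equivalent(a: str, b: str, min_cnt: int = 2, max_pct: int = 10) -> bool:
    """Compare both strings up to the length of the shorter one."""
    n = min(len(a), len(b))
    return levenshtein(a[:n], b[:n]) <= max_allowed(n, min_cnt, max_pct)
```

With the defaults, a 30-character line gets an allowance of 3, as in the example in the entry, while a 10-character line still gets the minimum of 2.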
0.65 (2013-03-14)
-----------------
- Minor GUI changes for teletext
- Added end timestamps in timed transcripts
- Added support for SMPTE (patch by John Kemp)
- Initial support for MPEG2 video tracks inside MP4 files (thanks a
  lot to GPAC's Jean who assisted in analyzing the sample and
  doing the required changes in GPAC).
- Improved MP4 auto detection
- Support for PCR if PTS is not available (needed for some teletext
  samples, and probably useful for everything else).
- Support for UDP streaming - finally. Use "-udp $port" to have
  CCExtractor listen for a stream. I've only been able to test it
  with a European HDHomeRun, but it should work fine with any other
  tuner.
- Refactored PMT / PAT processing in transport streams. Their contents
  can now be displayed (-parsePAT and -parsePMT), which makes
  troubleshooting easier.

0.64 (2012-10-29)
-----------------
- Changed Windows GUI size (larger).
- Added Teletext options to GUI.
- Added -teletext to force teletext mode even if not detected
- Added -noteletext to disable teletext detection. This can be needed
  for streams that have both 608 data and teletext packets if you
  need to process the 608 data (if teletext is detected it will
  take precedence otherwise).
- Added -datapid to force a specific elementary stream to be used for
  data (bypassing detections).
- Added -ru2 and -ru3 to limit the number of visible lines in roll-up
  captions (bypassing whatever the broadcast says).
- Added support for a .hex (hexadecimal) dump of data.
- Added support for wtv in Windows. This is done by using a new program
  (wtvccdump.exe) and a new DirectShow filter (CCExtractorDump.dll) that
  process the .wtv using DirectShow's filters and export the line 21 data
  to a .hex file. The GUI calls wtvccdump.exe as needed.
- Added --nogoptime to force PTS timing even when CCExtractor would
  use GOP timing otherwise.

0.63 (2012-08-17)
-----------------
- Teletext support added, by integrating Petr Kutalek's telxcc. Integration is
  still quite basic (there's equivalent code from both CCExtractor and
  telxcc) and some clean up is needed, but it works. Petr has announced that
  he's abandoning telxcc so further development will happen directly in
  CCExtractor.
- Some bug fixes, as usual.

0.62 (2012-05-23)
-----------------
- Corrected Mac build "script" (needed to add GPAC includes). Thanks to the
  Mac users that sent this.
- Hauppauge mode now uses PES timing, needed for files that don't have
  caption data during all the video (such as in commercial breaks).
- Added -mp4 and -in:mp4 to force the input to be processed as MP4.
- CC608 data embedded in a separate stream (as opposed to in the video
  stream itself) in MP4 files is now supported (not heavily tested).
  This should be rather useful since closed captioned files from iTunes
  use this format.
- More CEA-708 work. The debugger is now able to dump the "TV" contents for
  the first time. Also, a .srt can be generated, however timing is not quite
  good yet (still need to figure out why).
- Added -svc (or --service) to select the CEA-708 services to be processed.
  For example, -svc 1,2 will process the primary and secondary language
  services. Valid values are 1-63, where 1 is the primary language, 2 is
  the secondary language (this is part of the specification) and 3-63 are
  provider defined.
- Rajesh Hingorani sent a fix for the MPEG decoder that fixes garbled output
  for certain samples (we had none like this in our test collection). Thanks,
  Rajesh.

0.61 (2012-03-08)
-----------------
- Fix: GCC 3.4.4 can now build CCExtractor.
- Fix: Damaged TS packets (those that come with 'error in transport' bit
  on) are now skipped.
- Fix: Part of the changes for MP4 support (CC packets buffering in
  particular) broke some stuff for other files, causing at least very
  annoying character duplication. We hope we've fixed it without breaking
  anything, but please report if you find any problem.
- Some non-interesting cleanup.

0.60 (unreleased)
-----------------
- Add: MP4 support, using GPAC (a media library). Integration is currently
  "enough so it works", but needs some more work. There's some duplicate
  code, the stream must be a file (no streaming), etc.
- Fix: The Windows version was writing text files with double \r.
- Fix: Closed captions blocks with no data could cause a crash.
- Fix: -noru (to generate files without duplicate lines in
  roll-up) was broken, with complete lines being missing.
- Fix: bin format not working as input.

0.59 (2011-10-07)
-----------------
- More AVC/H.264 work. pic_order_cnt_type != 0 will be processed now.
- Fix: Roll-up captions with interruptions for Text (with ResumeTextDisplay
  in the middle of the caption data) were missing complete lines.
- Added a timed text transcript output format, probably only useful for
  roll-up captions. Use --timedtranscript or -ttxt. Output is like this:

      00:01:25,485 | HOST: LAST NIGHT THE REPUBLICAN
      00:01:29,522 | HOPEFULS INTRODUCE THEMSELVES TO
      00:01:30,623 | PRIMARY VOTERS.

- XDS parser. Not complete (no point in dealing with V-Chip stuff for
  example), but enough to extract program and station information.
- Input streams can now come from standard input using - (just a hyphen)
  as parameter.
- Added a new output format called 'null' (use -null or -out=null). This
  format means "Don't produce any file", and is useful to have CCExtractor
  process the stream (for XDS messages, debugging, etc) without actually
  generating anything.
- Updated Windows GUI.
- Added -quiet => If used, CCExtractor will not write any message.
- Added -stdout => If used, the captions will be sent to stdout (console)
  instead of file. Combined with -, CCExtractor can work as a filter in
  a larger process, receiving the stream from stdin and sending the
  captions to stdout.
- Some code clean up, minor refactoring.
- Teletext detection (not yet processing).

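The timed transcript lines shown in the 0.59 entry are a timestamp, a " | " separator and the caption text. A small Python sketch of that layout, assuming the timestamp is derived from a millisecond count (the function names are made up for the example and the real encoder may differ):

```python
def format_timestamp(ms: int) -> str:
    """Render milliseconds as HH:MM:SS,mmm, the layout used in the sample."""
    if ms < 0:
        raise ValueError("timestamp must not be negative")
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def ttxt_line(ms: int, text: str) -> str:
    """Build one timed transcript line: timestamp, separator, caption text."""
    return f"{format_timestamp(ms)} | {text}"
```

For instance, 85485 ms corresponds to the 00:01:25,485 stamp on the first sample line.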
0.58 (2011-08-21)
-----------------
- Implemented new PTS based mode to order the caption information
  of AVC/H.264 data streams. The old pic_order_cnt_lsb based method
  is still available via the -poc or --usepicorder command switches.
- Removed a couple of those annoying "Impossible!" error messages
  that appear when processing some (possibly broken, unsure) files.
- Added -nots --notypesettings to prevent italics and underline
  codes from being displayed.
- Note to those not liking the paragraph symbol being used for the
  music note: Submit a VALID replacement in latin-1.
- Added preliminary support for multiple program TS files. The
  parameter --program-number (or -pn) will let you choose which
  program number to process. If no number is passed and the TS
  file contains more than one, CCExtractor will display a list of
  found programs and terminate.
- Added support (basic, because I only received one sample) for some
  Hauppauge cards that save CC data in their own format. Use the
  parameter -haup to enable it (CCExtractor will display a notice
  if it thinks that it's processing a Hauppauge capture anyway).
- Fixed bug in roll-up.
- More AVC work, now TS files from echostar that provided garbled
  output are processed OK.
- Updated Windows GUI.

0.57 (2010-12-16)
-----------------
- Bug fixes in the Windows version. Some debug code was unintentionally
  left in the released version.

0.56 (2010-12-09)
-----------------
- H264 support
- Other minor changes a lot less important

0.55 (2009-08-09)
-----------------
- Replace pattern matching code with improved parser for MPEG-2 elementary
  streams.
- Fix parsing of ReplayTV 5000 captions.
- Add ability to decode SCTE 20 encoded captions.
- Make decoding of TS files more error tolerant.
- Start implementation of EIA-708 decoding (not active yet).
- Add -gt / --goptime switch to use GOP timing instead of PTS timing.
- Start implementation of AVC/H.264 decoding (not active yet).
- Fixed: The basic problem is that when 24fps movie film gets converted to 30fps NTSC
  they repeat every 4th frame. Some pics have 3 fields of CC data, with the field 3 CC data
  belonging to the same channel as field 1. The following pics have the fields reversed
  because of the odd number of fields. I used top_field_first to tell when the channels
  are reversed. See Table 6-1 of SCTE 20 [Paul Fernquist]

0.54 (2009-04-16)
-----------------
- Add -nosync and -fullbin switches for debugging purposes.
- Remove -lg (--largegops) switch.
- Improve synchronization of captions for source files with
  jumps in their time information or gaps in the caption
  information.
- [R. Abarca] Changed Mac script, it now compiles/links
  everything from the /src directory.
- It's now possible to have CCExtractor add credits
  automatically.
- Added a feature to add start and end messages (for credits).
  See help screen for details.

0.53 (2009-02-24)
-----------------
- Force generated RCWT files to have the same length as source file.
- Fix documentation for -startat / -endat switches.
- Make -startat / -endat work with all output formats.
- Fix sync check for raw/rcwt files.
- Improve timing of dvr-ms NTSC captions.
- Add -in=bin switch to read CCExtractor's own binary format.
- Fix problem with short input files (smaller than 1MB).
- Clean up regular and debug output.
- Add -out=bin switch to write RCWT data.
- Remove -bo/--bufferoutput switch and functionality.
- [Volker] Added new generic binary format (RCWT
  for Raw Captions With Time). This new format
  allows one file to contain all the available
  closed caption data instead of just one stream.
- Added --no_progress_bar to disable status
  information (mostly used when debugging, as the
  progress information is annoying in the middle
  of debug logs).
- The Windows GUI was reported to freeze in some
  conditions. Fixed.
- The Windows GUI is now targeted for .NET 2.0
  instead of 3.5. This allows Windows 2000 to run
  it (there's no .NET 3.5 for Windows 2000), as
  requested by a couple of key users.

0.51 (unreleased)
-----------------
- Removed -autopad and -goppad, no longer needed.
- In preparation for a new binary format we have
  renamed the current .bin to .raw. Raw files
  have only CC data (with no header, timing, etc.).
- The input file format (when forced) is now
  specified with
      -in=format
  such as -in=ts, -in=raw, -in=ps ...
  The old switches (-ts, -ps, etc.) still work.
  The only exception is -bin which has been removed
  (reserved for the new binary format). Use
  -in=raw to process a raw file.
- Removed -d, which, when producing a raw file, used
  a DVD format. This has been merged into a new
  output type "dvdraw". So now instead of using
  -raw -d as before, use -out=dvdraw if you need
  this.
- Removed --noff
- Added gui_mode_reports for frontend communications,
  see related file.
- Windows GUI rewritten. Source code now included,
  too.
- [Volker] Dish Network clean-up

0.50 (2008-12-12)
-----------------
- [Volker] Fix in DVR-MS NTSC timing
- [Volker] More clean-up
- Minor fixes

0.49 (2008-12-10)
-----------------
- [Volker] Major MPEG parser rework. Code much
  cleaner now.
- Some stations transmit broken roll-up captions,
  and for some reason don't send CRs but RUs...
  Added work-around code to make captions readable.
- Started work on EIA-708 (DTV). Right now you can
  add -debug-708 to get a dump of the 708 data.
  An actually useful decoder will come soon.
- Some of the changes MIGHT HAVE BROKEN MythTV's
  code. I don't use MythTV myself so I rely on
  other people's samples and reports. If MythTV
  is broken please let me know.
- Added new debug options.
- [Volker] Added support for DVR-MS NTSC files.
- Other minor bug fixes and changes.

0.46 (2008-11-24)
-----------------
- Added support for live streaming, CCExtractor
  can now process files that are being recorded
  at the same time.

- [Volker] Added a new DVR-MS loop - this is
  completely new, DVR-MS specific code, so we no
  longer use the generic MPEG code for DVR-MS.
  DVR-MS should (or will be eventually at least)
  be as reliable as TS.
  Note: For now, it's only ATSC recordings, not
  NTSC (analog) recordings.

0.45 (2008-11-14)
-----------------
- Added auto-detection of DVR-MS files.
- Added -asf to force DVR-MS mode.
- Added some specific support for DVR-MS
  files. This format used to work
  correctly in 0.34 (pure luck) but the
  MPEG code rework broke it. It should
  work as it used to.
- Updated Windows GUI to support the
  new options.
- Added -lg --largegops
  From the help screen:
  Each Group-of-Picture comes with timing
  information. When this info is too separate
  (for example because there are a lot of
  frames in a GOP) ccextractor may prefer not
  to use GOP timing. Use this option if you
  need ccextractor to use GOP timing in large
  GOPs.

0.44 (2008-09-10)
-----------------
- Added an option to the GUI to process
  individual files in batch, i.e. call
  ccextractor once per file. Use it if you
  want to process several unrelated files
  in one go.
- Added an option to prevent duplicate
  lines in roll-up captions.
- Several minor bug fixes.
- Updated the GUI to add the new options.

0.43 (2008-06-20)
-----------------
- Fixed a bug in the read loop (no less)
  that caused some files to fail when
  reading without buffering (which is
  the default in the Linux build).
- Several improvements in the GUI, such as
  saving current options as default.

0.42 (2008-06-17)
-----------------
- The option switch "-transcript" has been
  changed to "--transcript". Also, "-txt"
  has been added as the short alias.
- Windows GUI
- Updated help screen

0.41 (2008-06-15)
-----------------
- Default output is now .srt instead of .bin,
  use -raw if you need the data dump instead of
  .srt.
- Added -trim, which removes blank spaces at
  the left and right of each line in .srt.
  Note that those spaces are there to help
  deaf people know if the person talking is
  at the left or the right of the screen, i.e.
  they aren't useless. But if they annoy
  you, go ahead...

0.40 (2008-05-20)
-----------------
- Fixed a bug in the sanity check function
  that caused the Myth branch to abort.
- Fixed the OSX build script, it needed a
  new #define to work.

0.39 (2008-05-11)
-----------------
- Added -transcript. If used, the output will
  have no time information. Also, if in roll-up
  mode there will be no repeated lines.
- Lots of changes in the MPEG parser, most of
  them submitted by Volker Quetschke.
- Fixed a bug in the CC decoder that could cause
  the first line not to be cleared in roll-up
  mode.
- CCExtractor can now follow number sequences in
  file names, by suffixing the name with +.
  For example,

      DVD0001.VOB+

  means DVD0001.VOB, DVD0002.VOB, etc. This works
  for all files, so part001.ts+ does what you
  would expect.
- Added -90090 which changes the clock frequency
  from the MPEG standard 90000 to 90090. It
  *could* (remains to be seen) help if there are
  timing issues.
- Better support for Tivo files.
- By default ccextractor now considers the whole
  input file list as one large file, instead of
  several, independent, video files. This has
  been changed because most programs (for example
  DVDDecrypt) just cut the files by size.
  If you need the old behaviour (because you
  actually edited the video files and want to
  join the subs), use -ve.

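The "+" suffix from the 0.39 entry (DVD0001.VOB+ meaning DVD0001.VOB, DVD0002.VOB, ...) can be approximated as follows. This Python sketch assumes the counter is the last run of digits before the extension and that its zero padding is preserved; the entry does not spell out the exact matching rule, and both function names are hypothetical.

```python
import re
from typing import List, Optional


def next_in_sequence(name: str) -> Optional[str]:
    """Bump the trailing digits of the file stem, keeping their width."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension: treat the whole name as the stem
        stem, ext = name, ""
    match = re.search(r"\d+$", stem)
    if match is None:
        return None  # nothing to increment
    digits = match.group(0)
    bumped = str(int(digits) + 1).zfill(len(digits))
    return stem[:match.start()] + bumped + dot + ext


def sequence(arg: str, count: int) -> List[str]:
    """Expand 'NAME+' into up to count names; other arguments pass through."""
    if not arg.endswith("+"):
        return [arg]
    names = [arg[:-1]]
    while len(names) < count:
        following = next_in_sequence(names[-1])
        if following is None:
            break
        names.append(following)
    return names
```

A real implementation would stop at the first file that does not exist rather than at a fixed count.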
0.36 (unreleased)
-----------------
- Fixed bug in SMI, nbsp was missing a ;.
- Footer for SAMI files was incorrect (<body> and
  <sami> tags were being opened again instead of
  being closed).
- Displayed memory is now written to disk at end
  of stream even if there is no command requesting
  so (may prevent losing the last screen-full).
- Important change that could break scripts, but
  that has been made because the old behaviour was
  annoying to most people: _1 and _2 at the end
  of the output file names is now added ONLY if
  -12 is used (i.e. when there are two output
  files to produce). So

      ccextractor -srt sopranos.mpg

  now produces sopranos.srt instead of sopranos_1.srt.
  If you use -12, i.e.

      ccextractor -srt -12 sopranos.mpg

  You get

      sopranos_1.srt and
      sopranos_2.srt

  as usual.

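The output naming rule from the 0.36 entry (the _1 and _2 suffixes appear only with -12) fits in a few lines. A Python sketch under the assumption that the output name is the input name with its extension swapped; output_names and its parameters are illustrative, not CCExtractor's API.

```python
import os
from typing import List


def output_names(input_path: str, extension: str = "srt",
                 both_fields: bool = False) -> List[str]:
    """Return the output file names for one input file.

    both_fields stands in for the -12 switch: two outputs, suffixed _1 and _2.
    """
    base, _ = os.path.splitext(input_path)
    if both_fields:
        return [f"{base}_1.{extension}", f"{base}_2.{extension}"]
    return [f"{base}.{extension}"]
```

This reproduces the sopranos.mpg examples from the entry for both the plain and the -12 case.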
0.35 (unreleased)
-----------------
- Added --defaultcolor to the help screen. Code
  was already in 0.34 but the documentation wasn't
  updated.
- Buffer is larger now, since I've found a sample
  where 256 Kb isn't enough for a PES (go figure).
- At the end of the process, a ratio between
  video length and time to process is displayed.

0.34 (2007-06-03)
-----------------
- Added some basic letter case and capitalization
  support. For captions that broadcast in ALL
  UPPERCASE (most of them), ccextractor can now
  do the first part of the job.

  --sentencecap or -sc will tell ccextractor to
  follow the typical capitalization rules, such
  as capitalizing months, days of the week, etc.

  So from

      YOU BETTER RESPECT
      THIS ROBE, ALAN

  You get

      You better respect
      this robe, alan.

  --capfile or -caf also enables the case
  processing part and adds an extra list of
  words in the specified file, for example:

      --capfile names.txt

  where names.txt is just a plain text file
  with the proper spelling for some words,
  such as

      Alan
      Tony

  So you get

      You better respect
      this robe, Alan.

  Which is the correct spelling. You can
  have a different spelling file per TV
  show, or a large file with a lot of
  words, etc.
- ccextractor has been reported to
  compile and run on Mac with a minor
  change in the build script, so I've
  created a mac directory with the
  modified script. I haven't tested it
  myself.
- Windows build comes with a File Version
  Number (0.0.0.34 in this version) in case
  you want to check for version info.

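The two capitalization steps from the 0.34 entry (--sentencecap, then the word list from --capfile) can be sketched in Python. This is a simplified illustration: it only capitalizes the first letter of the text and of each sentence, and it skips the built-in month and weekday rules the entry mentions. The function name is made up.

```python
import re
from typing import Iterable


def sentence_case(text: str, proper_words: Iterable[str] = ()) -> str:
    """Lower-case ALL CAPS text, then restore sentence starts and listed words."""
    lowered = text.lower()
    # Upper-case the first letter of the text and the first letter after . ? !
    result = re.sub(r"(^|[.?!]\s+)([a-z])",
                    lambda m: m.group(1) + m.group(2).upper(),
                    lowered)
    # Apply the spelling given in the word list (the role of the --capfile file).
    for word in proper_words:
        result = re.sub(rf"\b{re.escape(word)}\b",
                        lambda m, w=word: w,
                        result,
                        flags=re.IGNORECASE)
    return result
```

Passing ["Alan", "Tony"] as proper_words plays the part of names.txt in the example from the entry.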
0.33 (unreleased)
-----------------
- Added -scr or --screenfuls, to select the
  number of screenfuls ccextractor should
  write before exiting. A screenful is
  a change of screen contents caused by
  a CC command (not new characters). In
  practice, this means that for .srt each
  group of lines is a screenful, except when
  using -dru (which produces a lot of
  groups of lines because each new character
  produces a new group).
- Completed tables for all encodings.
- Fixed bug in .srt related to milliseconds
  in time lines.
- Font colors are back for .srt (apparently
  some programs do support them after all).
  Use -nofc or --nofontcolor if you don't
  want these tags.

0.32 (unreleased)
-----------------
- Added -delay ms, which adds (or subtracts)
  a number of milliseconds to all times in
  .srt/.sami files. For example,

      -delay 400

  causes all subtitles to appear 400 ms later
  than they would normally do, and

      -delay -400

  causes all subtitles to appear 400 ms before
  they would normally do.
- Added -startat and -endat which let you
  select just a portion of data to be processed,
  such as from minute 3 to minute 5. Check
  help screen for exact syntax.

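The effect of -delay from the 0.32 entry can be reproduced on an existing .srt file with a short Python sketch. CCExtractor applies the delay while generating the file; this example instead post-processes the text, and clamping negative results to zero is an assumption of the sketch, not documented behaviour.

```python
import re

_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _shift(match: "re.Match[str]", delay_ms: int) -> str:
    h, m, s, ms = map(int, match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + delay_ms
    total = max(total, 0)  # assumption: never emit a negative timestamp
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt_times(srt_text: str, delay_ms: int) -> str:
    """Add delay_ms (may be negative) to every time on the '-->' lines."""
    out = []
    for line in srt_text.splitlines(keepends=True):
        if "-->" in line:  # only timing lines, never the caption text
            line = _TIME.sub(lambda mt: _shift(mt, delay_ms), line)
        out.append(line)
    return "".join(out)
```

Calling shift_srt_times(text, 400) corresponds to -delay 400, and a negative value to -delay -400.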
0.31 (unreleased)
-----------------
- Added -dru (direct rollup), which causes
  roll-up captions to be written as
  they would on TV instead of line by line.
  This makes .srt/.sami files a lot longer,
  and ugly too (each line is written many
  times, two characters at a time).

0.30 (2007-05-24)
-----------------
- Fix in extended char decoding, I wasn't
  replacing the previous char.
- When a sequence code was found before
  having a PTS, reported time was
  undefined.

0.29 (unreleased)
-----------------
- Minor bug fix.

0.28 (unreleased)
-----------------
- Fixed a buffering related issue. Short version,
  the first 2 Mb in non-TS mode were being
  discarded.
- .srt no longer has <font> tags. No player
  seems to process them so my guess is that
  they are not part of the .srt "standard"
  even if McPoodle adds them.

0.27 (unreleased)
-----------------
- Modified sanitizing code, it's less aggressive
  now. Ideally it should mean that characters
  won't be missed anymore. We'll see.

0.26 (unreleased)
-----------------
- Added -gp (or -goppad) to make ccextractor use
  GOP timing. Try it for non TS files where
  subs start OK but desync as the video advances.

0.25 (unreleased)
-----------------
- Format detection is not perfect yet. I've added
  -nomyth to prevent the MythTV code path from
  being called. I've seen apparently correct files that
  make MythTV's MPEG decoder choke. So, if it
  doesn't work correctly automatically, try
  -nomyth and -myth. Hopefully one of the two
  options will work.

0.24 (unreleased)
-----------------
- Fixed a bug that caused dvr-ms (Windows Media Center)
  files to be incorrectly processed (letters out of
  order all the time).
- Reworked input buffer code, faster now.
- Completed MythTV's MPEG decoder for Program Streams,
  which results in better processing of some specific
  files.
- Automatic file format detection for all kinds of
  files and closed caption storage methods. No need to
  tell ccextractor anything about your file (but you
  still can).

0.22 (2007-05-15)
-----------------
- Added text mode handling into decoder, which gets rid
  of junk when text mode data is present.
- Added support for certain (possibly not standard-
  compliant) DVDs that add more caption blocks in a
  user data block than they should (such as Red October).
- Fix in roll-up init code that caused the previous pop-up
  captions not to be written to disk.
- Other minor bug fixes.

0.20 (2007-05-07)
-----------------
- Unicode should be decent now.
- Added support for Hauppauge PVR 250 cards, and (possibly)
  many others (bttv) with the same closed caption recording
  format.
  This is the result of hacking MythTV's MPEG parser into
  CCExtractor. Integration is not very good (to put it
  mildly) but it seems to work. Depending on the feedback I
  may continue working on this or just leave it 'as is'
  (good enough).
  If you want to process a file generated by one of these
  analog cards, use -myth. This is essential as it will
  make the program take a totally different code path.
- Added .SAMI generation. I'm sure this can be improved,
  though. If you have a good CSS for .SAMI files let me
  know.

0.19 (2007-05-03)
-----------------
- Work on Dish Network streams: timing was completely broken.
  It's fixed now, at least for the samples I have; if it's not
  completely fixed let me know. Credit for this goes to
  Jack Ha, who sent me a couple of samples and a first
  implementation of a semi-working fix.
- Added support for several input files (see help screen for
  details).
- Added Unicode and Latin-1 encoding.

0.17 (2007-04-29)
-----------------
- Extraction to .srt is almost complete: it works correctly for
  pop-up and roll-up captions, possibly not yet for paint-on
  (mostly because I don't have any sample with paint-on captions
  so I can't test).
- Minor bug fixes.
- Automatic TS/non-TS mode detection.

0.14 (2007-04-25)
-----------------
- Work on handling special cases related to the MPEG reference
  clock: Roll over, jumps, etc.
- Modified padding code a bit: In particular, padding occurs
  on B-Frames now.
- Started work on CC data parsing (use -608 to see output).
- Added built-in input buffering.
- Added a decent progress indicator.
- Major code reorganization.
- Added TS header synchronization (so the input file no longer
  needs to start with a TS header).
- Minor bug fixes.

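The TS header synchronization item above can be pictured with a short Python
sketch. MPEG transport stream packets are 188 bytes long and each starts with
the sync byte 0x47, so a reader can skip leading junk by looking for an offset
where that byte repeats at 188-byte intervals. This is a common approach, not
necessarily what CCExtractor does; the function name find_ts_sync and the
default of three confirming packets are assumptions made for this example.

```python
TS_PACKET_SIZE = 188  # fixed MPEG-TS packet length in bytes
TS_SYNC_BYTE = 0x47   # first byte of every TS packet


def find_ts_sync(data, confirm=3):
    """Return the offset of the first byte that starts `confirm` consecutive
    TS packets, or -1 if no such offset exists in `data`."""
    # Last offset at which all `confirm` sync positions are still inside data.
    limit = len(data) - TS_PACKET_SIZE * (confirm - 1)
    for offset in range(max(limit, 0)):
        if all(
            data[offset + i * TS_PACKET_SIZE] == TS_SYNC_BYTE
            for i in range(confirm)
        ):
            return offset
    return -1
```

Requiring several sync bytes in a row guards against a stray 0x47 inside the
leading junk or inside a packet payload being mistaken for a packet start.
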
0.07 (2007-04-19)
-----------------
- Added MPEG reference clock parsing.
- Added auto padding in TS. Does miracles with timing.
- Added video information (as extracted from sequence header).
- Some code clean-up.
- FF sanity check enabled by default.