Mirror of https://github.com/CCExtractor/ccextractor.git, synced 2026-04-17 03:27:40 +00:00
[PR #1820] [MERGED] fix(708): Write consistent 2-byte UTF-16BE encoding for CEA-708 captions #2566
📋 Pull Request Information
Original PR: https://github.com/CCExtractor/ccextractor/pull/1820
Author: @cfsmp3
Created: 12/14/2025
Status: ✅ Merged
Merged: 12/14/2025
Merged by: @cfsmp3
Base: master ← Head: fix/issue-1451-utf16-encoding
📝 Commits (2)
9e665a1 fix(708): Write consistent 2-byte UTF-16BE encoding for CEA-708 captions
238f411 test(708): Update write_char test to expect 2-byte UTF-16BE output
📊 Changes
2 files changed (+19 additions, -22 deletions)
📝 src/lib_ccx/ccx_decoders_708_output.c (+7 -11)
📝 src/rust/src/decoder/output.rs (+12 -11)
📄 Description
Summary
- The write_utf16_char function in C (ccx_decoders_708_output.c) now always writes 2 bytes.
- The write_char function in Rust (decoder/output.rs) now always writes 2 bytes.
Problem
When extracting CEA-708 captions containing Japanese or Chinese characters with --service all[UTF-16BE], the output was garbled. The root cause was that both the C and Rust implementations wrote a variable number of bytes per character: some characters as a single byte, others as two.
This created an invalid mix of 8-bit and 16-bit values that could not be converted correctly.
Solution
Always write 2 bytes per character, which guarantees valid UTF-16BE output.
Test plan
Fixes #1451
🤖 Generated with Claude Code
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.