mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-04-20 21:20:28 +00:00
[PR #1871] [MERGED] fix(708): Support Korean EUC-KR encoding in CEA-708 decoder #2644
📋 Pull Request Information
Original PR: https://github.com/CCExtractor/ccextractor/pull/1871
Author: @cfsmp3
Created: 12/21/2025
Status: ✅ Merged
Merged: 12/21/2025
Merged by: @cfsmp3
Base: master ← Head: fix/korean-euc-kr-support
📝 Commits (3)
- da3dc52 fix(708): Support Korean EUC-KR encoding in CEA-708 decoder
- d0caf23 fix(timing): Use i64 instead of c_long for Windows compatibility
- 73cd19f fix(rust): Use i64 instead of c_long for Windows compatibility
📊 Changes
6 files changed (+125 additions, -41 deletions)
- src/rust/src/avc/nal.rs (+2 -2)
- src/rust/src/decoder/output.rs (+86 -15)
- src/rust/src/decoder/tv_screen.rs (+19 -2)
- src/rust/src/es/pic.rs (+1 -5)
- src/rust/src/lib.rs (+3 -3)
- src/rust/src/libccxr_exports/time.rs (+14 -14)
📄 Description
Summary
Korean broadcasts use EUC-KR, a variable-width encoding, in CEA-708 captions: ASCII characters take 1 byte and Korean characters take 2 bytes. The decoder always wrote 2 bytes per character (UTF-16BE style), which inserted a NULL byte (0x00) before every ASCII character, including spaces and punctuation.
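To make the symptom concrete, here is a minimal sketch of a fixed-width writer. It is not the decoder's code: `write_fixed_width` and its `&[u16]` input are hypothetical names chosen for illustration, assuming each caption symbol arrives as a 16-bit value.

```rust
/// Hypothetical helper (not from the PR): always emits two bytes per
/// 16-bit symbol in big-endian order, mimicking the behaviour the
/// summary describes.
fn write_fixed_width(symbols: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(symbols.len() * 2);
    for &sym in symbols {
        // An ASCII symbol such as 0x0041 becomes [0x00, 0x41]:
        // the leading 0x00 is the stray NULL byte.
        out.extend_from_slice(&sym.to_be_bytes());
    }
    out
}
```

Calling `write_fixed_width(&[0x0041, 0xB0A1, 0x0020])` (the letter "A", the EUC-KR code for "가", and a space) yields `00 41 B0 A1 00 20`. A reader expecting EUC-KR sees an extra 0x00 before the "A" and before the space.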
Changes
- `is_utf16_charset()` function to detect fixed-width 16-bit encodings (UTF-16BE, UCS-2)
- `write_char()` to accept `use_utf16` flag:
  - `true`: Always 2 bytes (UTF-16BE for Japanese/Chinese, maintains fix for #1451)
  - `false`: 1 byte for ASCII, 2 bytes for extended chars (EUC-KR for Korean)
- `write_row()` before building output buffer
Before fix
After fix
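As a rough illustration of the change list above, the following sketch shows how the three functions could fit together. The function names come from the PR description, but the signatures, the charset label strings, and the "zero high byte means single byte" rule are assumptions made for this example, not the code merged in `output.rs`.

```rust
/// Assumed check: only UTF-16BE and UCS-2 count as fixed-width 16-bit
/// charsets. The label strings are illustrative.
fn is_utf16_charset(charset: &str) -> bool {
    let name = charset.to_ascii_uppercase();
    matches!(name.as_str(), "UTF-16BE" | "UCS-2")
}

/// Appends one 16-bit symbol to `buf`.
/// - `use_utf16 == true`: always two bytes, big-endian.
/// - `use_utf16 == false`: the high byte is skipped when it is zero,
///   so ASCII takes one byte and two-byte codes keep both bytes.
fn write_char(sym: u16, buf: &mut Vec<u8>, use_utf16: bool) {
    let [hi, lo] = sym.to_be_bytes();
    if use_utf16 || hi != 0 {
        buf.push(hi);
    }
    buf.push(lo);
}

/// Decides the width mode once per row, then encodes every symbol.
fn write_row(symbols: &[u16], charset: &str) -> Vec<u8> {
    let use_utf16 = is_utf16_charset(charset);
    let mut buf = Vec::with_capacity(symbols.len() * 2);
    for &sym in symbols {
        write_char(sym, &mut buf, use_utf16);
    }
    buf
}
```

With this sketch, `write_row(&[0x0041, 0xB0A1, 0x0020], "EUC-KR")` produces `41 B0 A1 20`, while the same call with `"UTF-16BE"` keeps two bytes per symbol. Computing the flag once in `write_row` avoids repeating the charset comparison for every character.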
Test plan
- … mbc.ts) - drama dialog extracted correctly
- … 0623_215529_CH9-1_KBS.mpg) - news broadcast extracted correctly
- … --service "1[EUC-KR]"

Closes #1065
🤖 Generated with Claude Code
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.