mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-03 21:23:48 +00:00
[PR #1821] fix(teletext): Add --ttxtforcelatin option to force Latin G0 charset #2573
Original Pull Request: https://github.com/CCExtractor/ccextractor/pull/1821
State: closed
Merged: Yes
Summary
--ttxtforcelatin command-line option
Problem
Some broadcast streams (e.g., UK Freesat recordings) incorrectly signal a Cyrillic character set via X/28 or M/29 teletext packets when the actual content is English text in the Latin alphabet.
Before (garbled Cyrillic):
Expected (correct Latin):
Root Cause
The broadcast stream contains triplet value 0x1290, which has:
This causes CCExtractor to select the CYRILLIC3 (Ukrainian) charset instead of Latin.
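As a rough illustration of why 0x1290 ends up as CYRILLIC3, the sketch below decodes the character set designation bits of a triplet. It is not the code from telxcc.c: the bit masks reflect one reading of ETS 300 706 Table 32, and the names select_g0_charset, g0_charset_t, force_latin and the G0_* macros are hypothetical, chosen only for this example. The force_latin parameter stands in for the kind of override this PR introduces.

```c
#include <stdint.h>

/* G0 character sets relevant to this sketch (hypothetical enum). */
typedef enum { LATIN, CYRILLIC1, CYRILLIC2, CYRILLIC3 } g0_charset_t;

/* Assumed layout: the upper four bits of the 7-bit designation pick a
 * group, the lower three bits pick a set within that group. */
#define G0_GROUP_MASK  0x3c00u
#define G0_GROUP_0100  0x1000u /* the group that contains the Cyrillic sets */
#define G0_SUBSET_MASK 0x0380u

/* Returns the G0 charset signalled by the triplet, or LATIN whenever
 * force_latin is non-zero (the override bypasses detection entirely). */
g0_charset_t select_g0_charset(uint32_t triplet, int force_latin)
{
    if (force_latin)
        return LATIN;

    if ((triplet & G0_GROUP_MASK) != G0_GROUP_0100)
        return LATIN;

    switch (triplet & G0_SUBSET_MASK) {
    case 0x0000u: return CYRILLIC1;
    case 0x0200u: return CYRILLIC2;
    case 0x0280u: return CYRILLIC3;
    default:      return LATIN; /* other sets in this group are treated as Latin here */
    }
}
```

Under these assumptions, 0x1290 & 0x3c00 is 0x1000 and 0x1290 & 0x0380 is 0x0280, so select_g0_charset(0x1290, 0) yields CYRILLIC3, while a non-zero force_latin yields LATIN for the same triplet.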
Solution
Added --ttxtforcelatin option that bypasses the Cyrillic character set detection and always uses the Latin G0 charset.
Changes
src/lib_ccx/lib_ccx.h: Added forceg0latin field to teletext config
src/lib_ccx/telxcc.c: Modified set_g0_charset() to respect forceg0latin option
src/rust/src/args.rs: Added --ttxtforcelatin CLI argument
src/rust/src/parser.rs: Added argument handling
src/rust/src/common.rs: Added field to struct conversion
src/rust/lib_ccxr/src/teletext.rs: Added forceg0latin to TeletextConfig
Test plan
--ttxtforcelatin produces correct Latin output
Usage
Fixes #1395
🤖 Generated with Claude Code