[PR #1821] fix(teletext): Add --ttxtforcelatin option to force Latin G0 charset #2573

claunia · 2026-01-29T17:22:50Z

Original Pull Request: https://github.com/CCExtractor/ccextractor/pull/1821

State: closed
Merged: Yes

Summary

Added new --ttxtforcelatin command-line option
Forces teletext G0 character set to Latin, ignoring Cyrillic designations in the stream
Fixes garbled output where Latin text appears as Cyrillic characters

Problem

Some broadcast streams (e.g., UK Freesat recordings) incorrectly signal a Cyrillic character set via X/28 or M/29 teletext packets when the actual content is English text in the Latin alphabet.

Before (garbled Cyrillic):

Но. Нот бацк тхен, анiваi.

Expected (correct Latin):

No. Not back then, anyway.

Root Cause

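The broadcast stream contains the triplet value 0x1290, in which (counting bit 0 as the least significant bit):

Bits 10-13 = 0x4, i.e. 0x1290 & 0x3C00 = 0x1000 (Cyrillic character set family per ETS 300 706 Table 32)
Bits 7-9 = 0x5, i.e. 0x1290 & 0x0380 = 0x0280 (Ukrainian option)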

This causes CCExtractor to select CYRILLIC3 (Ukrainian) charset instead of Latin.
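
For anyone who wants to check the bit arithmetic, here is a minimal sketch of how the two fields can be pulled out of a triplet. It is not taken from the CCExtractor source: the helper names are hypothetical, and the masks simply spell out "bits 10-13" and "bits 7-9" with bit 0 as the least significant bit.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only (not CCExtractor functions). */

/* Bits 10-13, left in place (masked, not shifted down). */
static uint32_t ttx_family_masked(uint32_t triplet)
{
    return triplet & 0x3C00u;
}

/* Bits 10-13 shifted down so the field starts at bit 0. */
static uint32_t ttx_family_field(uint32_t triplet)
{
    return (triplet >> 10) & 0xFu;
}

/* Bits 7-9 shifted down so the field starts at bit 0. */
static uint32_t ttx_option_field(uint32_t triplet)
{
    return (triplet >> 7) & 0x7u;
}
```

For 0x1290, the masked family value is 0x1000 (0x4 once shifted down to bit 0) and the option field is 0x5, which is the combination that leads to CYRILLIC3.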

Solution

Added --ttxtforcelatin option that bypasses the Cyrillic character set detection and always uses Latin G0 charset.
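
The idea can be sketched roughly as follows. This is an illustrative stand-in, not the actual patch to set_g0_charset(): the enum, the function name select_g0_charset, and the force_latin parameter (mirroring the forceg0latin field) are hypothetical, and the function returns the charset instead of updating decoder state. Only the 0x5 option mapping to CYRILLIC3 is stated in this PR; the CYRILLIC1 and CYRILLIC2 mappings are an assumption based on ETS 300 706 Table 32.

```c
#include <stdint.h>

/* Illustrative charset identifiers (assumed names, not the real enum). */
typedef enum {
    G0_LATIN,
    G0_CYRILLIC1,
    G0_CYRILLIC2,
    G0_CYRILLIC3
} g0_charset_t;

/* Hypothetical sketch: pick a G0 charset from a triplet, unless the
 * caller asks for Latin unconditionally. */
static g0_charset_t select_g0_charset(uint32_t triplet, int force_latin)
{
    if (force_latin)
        return G0_LATIN; /* ignore whatever the stream designates */

    if ((triplet & 0x3C00u) == 0x1000u) /* bits 10-13: Cyrillic family */
    {
        switch (triplet & 0x0380u) /* bits 7-9: option within the family */
        {
            case 0x0000u: return G0_CYRILLIC1; /* assumed mapping */
            case 0x0200u: return G0_CYRILLIC2; /* assumed mapping */
            case 0x0280u: return G0_CYRILLIC3; /* option 0x5, Ukrainian */
            default:      return G0_LATIN;
        }
    }
    return G0_LATIN;
}
```

With the triplet from the sample, select_g0_charset(0x1290, 0) picks the Ukrainian Cyrillic set, while passing a non-zero force_latin short-circuits to Latin before the triplet is inspected at all.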

Changes

src/lib_ccx/lib_ccx.h: Added forceg0latin field to teletext config
src/lib_ccx/telxcc.c: Modified set_g0_charset() to respect forceg0latin option
src/rust/src/args.rs: Added --ttxtforcelatin CLI argument
src/rust/src/parser.rs: Added argument handling
src/rust/src/common.rs: Added field to struct conversion
src/rust/lib_ccxr/src/teletext.rs: Added forceg0latin to TeletextConfig

Test plan

Downloaded sample from issue #1395
Reproduced the Cyrillic output issue
Verified --ttxtforcelatin produces correct Latin output
Build succeeds for both C and Rust components

Usage

ccextractor input.ts --ttxtforcelatin -o output.srt

Fixes #1395

🤖 Generated with Claude Code

claunia added the pull-request label 2026-01-29 17:22:50 +00:00

Reference: starred/ccextractor#2573