[PR #1821] fix(teletext): Add --ttxtforcelatin option to force Latin G0 charset #2573

Open
opened 2026-01-29 17:22:50 +00:00 by claunia · 0 comments

Original Pull Request: https://github.com/CCExtractor/ccextractor/pull/1821

State: closed
Merged: Yes


Summary

  • Added new --ttxtforcelatin command-line option
  • Forces teletext G0 character set to Latin, ignoring Cyrillic designations in the stream
  • Fixes garbled output where Latin text appears as Cyrillic characters

Problem

Some broadcast streams (e.g., UK Freesat recordings) incorrectly signal a Cyrillic character set via X/28 or M/29 teletext packets even though the actual content is English text in the Latin alphabet.

Before (garbled Cyrillic):

Но. Нот бацк тхен, анiваi.

Expected (correct Latin):

No. Not back then, anyway.

Root Cause

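The broadcast stream contains the triplet value 0x1290 which, counting bit 0 as the least significant bit, has:

  • Bits 10-13 = 0x4 (binary 0100, i.e. 0x1000 when masked in place; Cyrillic character set family per ETS 300 706 Table 32)
  • Bits 7-9 = 0x5 (binary 101, i.e. 0x0280 when masked in place; Ukrainian option)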

This causes CCExtractor to select the CYRILLIC3 (Ukrainian) charset instead of Latin.
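
To make the bit arithmetic concrete, here is a minimal sketch in C that pulls the two fields out of a triplet by masking them in place. The helper names and mask macros are illustrative assumptions and are not taken from telxcc.c; only the triplet value 0x1290 and its Cyrillic/Ukrainian interpretation come from the description above.

```c
#include <stdint.h>

/* Illustrative masks (assumed names): bits 10-13 and bits 7-9 of the triplet. */
#define G0_FAMILY_MASK 0x3C00u
#define G0_OPTION_MASK 0x0380u

/* Returns the character set family bits, left in place (not shifted). */
uint32_t g0_family_bits(uint32_t triplet)
{
    return triplet & G0_FAMILY_MASK;
}

/* Returns the national option as a small number (bits 7-9 shifted down). */
unsigned g0_option_value(uint32_t triplet)
{
    return (unsigned)((triplet & G0_OPTION_MASK) >> 7);
}

/* Nonzero when the triplet carries the Cyrillic family with the Ukrainian
 * option, which is the combination reported for the problem stream. */
int selects_cyrillic3(uint32_t triplet)
{
    return g0_family_bits(triplet) == 0x1000u && g0_option_value(triplet) == 0x5u;
}
```

For the triplet 0x1290, the family mask leaves 0x1000 and the option value is 5, so `selects_cyrillic3(0x1290)` is nonzero, matching the CYRILLIC3 selection described above.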

Solution

Added --ttxtforcelatin option that bypasses the Cyrillic character set detection and always uses Latin G0 charset.
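
The idea of the override can be sketched as follows. This is a hypothetical, simplified illustration and not the actual `set_g0_charset()` code: the struct, enum, and function names are assumptions, only the `forceg0latin` field name comes from this PR, and only the Ukrainian case from the root-cause analysis is modelled.

```c
#include <stdint.h>

/* Simplified charset enum for this sketch (assumed, not the project's type). */
typedef enum { G0_LATIN, G0_CYRILLIC3 } g0_charset_t;

/* Hypothetical config holder; only the field name is taken from the PR. */
struct ttx_config_sketch {
    int forceg0latin; /* nonzero when --ttxtforcelatin was given */
};

/* Picks the G0 charset for a designation triplet. */
g0_charset_t pick_g0_charset(const struct ttx_config_sketch *cfg, uint32_t triplet)
{
    /* The override is checked first, so the stream's designation is ignored. */
    if (cfg->forceg0latin)
        return G0_LATIN;

    /* Cyrillic family (bits 10-13) with the Ukrainian option (bits 7-9). */
    if ((triplet & 0x3C00u) == 0x1000u && (triplet & 0x0380u) == 0x0280u)
        return G0_CYRILLIC3;

    /* Everything else falls back to Latin in this sketch. */
    return G0_LATIN;
}
```

With `forceg0latin` unset, the triplet 0x1290 still yields `G0_CYRILLIC3`; with it set, the same triplet yields `G0_LATIN`.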

Changes

  • src/lib_ccx/lib_ccx.h: Added forceg0latin field to teletext config
  • src/lib_ccx/telxcc.c: Modified set_g0_charset() to respect forceg0latin option
  • src/rust/src/args.rs: Added --ttxtforcelatin CLI argument
  • src/rust/src/parser.rs: Added argument handling
  • src/rust/src/common.rs: Added field to struct conversion
  • src/rust/lib_ccxr/src/teletext.rs: Added forceg0latin to TeletextConfig

Test plan

  • Downloaded sample from issue #1395
  • Reproduced the Cyrillic output issue
  • Verified --ttxtforcelatin produces correct Latin output
  • Build succeeds for both C and Rust components

Usage

ccextractor input.ts --ttxtforcelatin -o output.srt

Fixes #1395

🤖 Generated with Claude Code

claunia added the pull-request label 2026-01-29 17:22:50 +00:00
Reference: starred/ccextractor#2573