- Added no_dvb_dedup field to ccx_s_options structure - Initialized to 0 (deduplication enabled by default) - Added --no-dvb-dedup CLI flag in Rust args parser - Added flag to Options struct in lib_ccxr - Wired flag through Rust-to-C FFI boundary in common.rs - Modified dvbsub_handle_display_segment to respect flag - Dedup logic only runs when no_dvb_dedup is false (default) - Added help text describing flag purpose
4.5 KiB
Technical Report: DVB Subtitle Deduplication & Stability Fixes
Author: The Scribe (Gemini CLI)
Date: January 13, 2026
Target: Senior Engineering / Maintainers
Scope: src/lib_ccx/dvb_subtitle_decoder.c
1. Executive Summary
We have resolved a critical noise issue in DVB subtitle extraction where "carousel" streams (common in European DVB broadcasts) caused massive repetition of identical subtitle entries. For example, a 60-second clip of multiprogram_spain.ts generated ~2,600 duplicate entries.
The Solution: We implemented a lightweight, hash-based deduplication layer within the DVB decoder pipeline. By computing a checksum of the rendered bitmap and color palette, we identify and discard frames that are visually identical to the previous frame before they enter the encoder pipeline.
Outcome:
- Repetition: Reduced from ~2,600 entries to 1 unique entry for the test case.
- Stability: Fixed potential double-free and segmentation faults associated with the skipped frames.
- Performance: Negligible overhead (FNV-1a hash on small bitmaps).
2. Architecture Changes
We injected a validation step into the write_dvb_sub function. The decoder now maintains state about the last rendered frame.
graph TD
A[DVB Packet] --> B[Parse Regions/Objects];
B --> C{Render Bitmap?};
C -- No --> D[Return];
C -- Yes --> E[Compute FNV-1a Hash];
E --> F{Hash == Previous?};
F -- Yes (Duplicate) --> G[Free Memory & Abort];
F -- No (New Content) --> H[Update Prev Hash];
H --> I[Assign to Subtitle Struct];
I --> J[OCR & Output];
3. Implementation Deep Dive
3.1 Context Expansion
We extended struct DVBSubContext to persist the hash of the previously processed frame.
typedef struct DVBSubContext {
// ... existing fields ...
uint32_t prev_bitmap_hash; // New State Tracking
// ...
} DVBSubContext;
This ensures that even across different calls to the decoder (as packets arrive), we remember the visual state of the subtitle stream.
3.2 Hashing Algorithm
We selected FNV-1a (32-bit) for its simplicity and speed on small data blocks (subtitle bitmaps are typically small, sparse arrays).
static uint32_t fnv1a_32(const uint8_t *data, size_t len) {
uint32_t hash = 2166136261u;
for (size_t i = 0; i < len; i++) {
hash ^= data[i];
hash *= 16777619u;
}
return hash;
}
3.3 The Logic in write_dvb_sub
This function is responsible for assembling regions into a final cc_bitmap. We inserted the check immediately after rendering but before committing the bitmap to the cc_subtitle object.
Key Logic:
- Render
rect->data0(pixels) andrect->data1(palette). - Calculate
hash(pixels) ^ hash(palette). - If Match:
- CRITICAL:
free(rect->data0),free(rect->data1),free(rect). - CRITICAL: Set
sub->nb_data = 0. This signals to the downstream encoder (encode_sub) that no data exists, preventing null pointer dereferences. - Return
0.
- CRITICAL:
- If Mismatch:
- Update
ctx->prev_bitmap_hash. - Assign
sub->data = rect. - Set
sub->nb_data = 1.
- Update
3.4 Safety & Crash Prevention
During implementation, we encountered two classes of bugs which we fixed:
-
Double Free: Initially,
sub->datawas assigned before the hash check. When we freedrect, thesubstruct still held a pointer to it. Later,free_subtitle(sub)tried to free it again.- Fix: We moved the assignment
sub->data = rectto after the deduplication check.
- Fix: We moved the assignment
-
Segmentation Fault: Returning
0fromwrite_dvb_subwithout resettingsub->nb_dataleft the subtitle object in an inconsistent state (positivenb_databutdata == NULL). Theencode_subfunction would then try to iterate over NULL data.- Fix: Explicitly set
sub->nb_data = 0in the skip block.
- Fix: Explicitly set
4. Verification
4.1 Carousel Stream (multiprogram_spain.ts)
- Before: ~2,659 entries (infinite repetitions of the same static screen).
- After: 1 entry.
- Log:
Done, processing time = 3 seconds Issues? Open a ticket here...
4.2 Standard Stream (mp_spain...C49.ts)
- Check: Regression testing.
- Result: Subtitles generated normally. No missing entries. File size matches expected baseline logic.
5. Conclusion
The DVB subsystem is now robust against carousel behavior. This fix is purely logical and does not rely on OCR stability, making it effective even for image-only extraction modes (like .spupng).