[PR #1925] feat(ocr): Add character blacklist and line-split options for better accuracy #2729

Closed
opened 2026-01-29 17:23:37 +00:00 by claunia · 0 comments

Original Pull Request: https://github.com/CCExtractor/ccextractor/pull/1925

State: closed
Merged: Yes


Summary

  • Add OCR character blacklist (enabled by default) to prevent common misrecognition errors like `I` → `|`
  • Add optional --ocr-line-split mode for multi-line subtitle images
  • Inspired by subtile-ocr's proven approach
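
The PR text does not say how `--ocr-line-split` finds line boundaries. One common approach is a horizontal projection: scan the bitmap row by row and treat each run of rows containing ink as one text line. The sketch below assumes a binarized image given as rows of 0/1 pixels; `split_lines` and `min_ink` are hypothetical names for illustration, not CCExtractor functions.

```python
from typing import List, Tuple


def split_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom_exclusive) row ranges, one per detected text line.

    `image` is a binarized bitmap: each row is a list of 0 (background)
    or 1 (ink). A row counts as part of a text line when it contains at
    least `min_ink` ink pixels. This is an illustrative assumption, not
    the algorithm used in the PR.
    """
    bands: List[Tuple[int, int]] = []
    start = None  # first row of the band currently being scanned
    for y, row in enumerate(image):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                    # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y))     # the band ended on the previous row
            start = None
    if start is not None:                # band touching the bottom edge
        bands.append((start, len(image)))
    return bands


# Tiny hypothetical bitmap: two lines of "text" separated by a blank row.
bitmap = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(split_lines(bitmap))  # [(1, 3), (4, 5)]
```

Each returned range could then be cropped and recognized on its own, which is what makes a single-line page segmentation mode applicable.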

New Options

| Option | Default | Description |
|--------|---------|-------------|
| `--no-ocr-blacklist` | Blacklist ON | Disable the character blacklist (`\|`, `\`, `` ` ``, `_`, `~`) |
| `--ocr-line-split` | OFF | Split images into lines, use PSM 7 for each |
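
For readers unfamiliar with the underlying Tesseract settings, the sketch below shows how the two options in the table above could map onto a standalone `tesseract` command line. It assumes the blacklist corresponds to Tesseract's `tessedit_char_blacklist` variable and uses the `--psm 7` mode ("treat the image as a single text line"). The PR text does not show how CCExtractor applies these settings internally, so `tesseract_args` and its parameters are hypothetical.

```python
from typing import List

# The five characters listed in the options table: | \ ` _ ~
OCR_BLACKLIST = "|\\`_~"


def tesseract_args(image_path: str,
                   use_blacklist: bool = True,
                   single_line: bool = False) -> List[str]:
    """Build an argument list for the `tesseract` CLI (illustrative only).

    use_blacklist mirrors the default-on blacklist (off with --no-ocr-blacklist).
    single_line mirrors --ocr-line-split, where each cropped line uses PSM 7.
    """
    args = ["tesseract", image_path, "stdout"]
    if single_line:
        args += ["--psm", "7"]  # PSM 7: treat the image as a single text line
    if use_blacklist:
        # -c VAR=VALUE sets a Tesseract config variable for this run
        args += ["-c", f"tessedit_char_blacklist={OCR_BLACKLIST}"]
    return args


print(tesseract_args("line_0.png", single_line=True))
```

Passing the list to `subprocess.run` without a shell avoids having to quote the backslash and backtick in the blacklist.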

Test Results (VOBSUB MKV sample)

| Metric | Before | After (blacklist) |
|--------|--------|-------------------|
| Pipe `\|` errors | 14 | **0** |

On this sample, the blacklist eliminates pipe character misrecognition, matching subtile-ocr's accuracy.
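
The PR does not describe how the pipe errors were counted. A minimal way to reproduce such a metric, assuming every `|` in the extracted subtitle text is a misread character, is to count its occurrences in the output before and after enabling the blacklist. The sample strings below are made up for illustration and are not the PR's test data.

```python
def count_pipes(subtitle_text: str) -> int:
    """Count '|' characters, assuming none of them are legitimate."""
    return subtitle_text.count("|")


# Hypothetical OCR output without and with the blacklist.
before = "I don't know.\n|t was him.\nWhat did | say?"
after = "I don't know.\nIt was him.\nWhat did I say?"

print(count_pipes(before), count_pipes(after))  # 2 0
```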

Test plan

  • [x] Build with OCR support
  • [x] Test VOBSUB extraction with blacklist (default)
  • [x] Test with --no-ocr-blacklist to verify it can be disabled
  • [x] Test --ocr-line-split option
  • [x] Verify help text displays correctly

🤖 Generated with Claude Code

claunia added the pull-request label 2026-01-29 17:23:37 +00:00
Reference: starred/ccextractor#2729