[PR #1920] [MERGED] feat(mp4): Add VOBSUB subtitle extraction with OCR for MP4 files #2716

Closed
opened 2026-01-29 17:23:34 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/CCExtractor/ccextractor/pull/1920
Author: @cfsmp3
Created: 12/28/2025
Status: Merged
Merged: 12/29/2025
Merged by: @cfsmp3

Base: master ← Head: fix/issue-1371-mkv-vobsub-support


📝 Commits (6)

  • 2930c61 feat(mp4): Add VOBSUB subtitle extraction with OCR for MP4 files
  • 6fe612d fix: Guard ocr_text access with ENABLE_OCR preprocessor check
  • 635a305 build: Add vobsub_decoder to autoconf build system
  • ba2833b style: Fix clang-format indentation in vobsub_decoder.c
  • 463a4a8 build(windows): Add vobsub_decoder to Windows build
  • 8f64eeb ci: Trigger CI tests

📊 Changes

8 files changed (+884 additions, -7 deletions)

📝 linux/Makefile.am (+2 -0)
📝 mac/Makefile.am (+2 -0)
📝 src/lib_ccx/matroska.c (+129 -4)
📝 src/lib_ccx/mp4.c (+173 -3)
➕ src/lib_ccx/vobsub_decoder.c (+517 -0)
➕ src/lib_ccx/vobsub_decoder.h (+53 -0)
📝 windows/ccextractor.vcxproj (+2 -0)
📝 windows/ccextractor.vcxproj.filters (+6 -0)

📄 Description

Summary

  • Adds support for extracting VOBSUB (bitmap) subtitles from MP4 files with OCR conversion to text formats
  • Creates shared vobsub_decoder module for SPU parsing and OCR integration
  • Detects subp:MPEG tracks in MP4 container and processes them through OCR pipeline
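
As a rough illustration of the detection step, the sketch below compares a track's media type and subtype against the four-character codes `subp` and `MPEG` that appear in the test log. The `FOURCC` macro, the `track_info` struct, and the `is_vobsub_track` helper are hypothetical names invented for this example; they are not the identifiers used in mp4.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four ASCII characters into one 32-bit code, most significant first. */
#define FOURCC(a, b, c, d)                                   \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |         \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, simplified view of one MP4 track (assumption for this sketch). */
typedef struct {
    uint32_t media_type;    /* handler type, e.g. 'subp' */
    uint32_t media_subtype; /* sample entry type, e.g. 'MPEG' */
} track_info;

/* Returns 1 when the track looks like a VOBSUB subpicture track, else 0. */
static int is_vobsub_track(const track_info *t)
{
    assert(t != NULL);
    return t->media_type == FOURCC('s', 'u', 'b', 'p') &&
           t->media_subtype == FOURCC('M', 'P', 'E', 'G');
}
```

A caller would run this check once per track while counting track kinds, which is one way to arrive at a tally like the "1 vobsub" shown in the test output.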

Changes

New Files

  • src/lib_ccx/vobsub_decoder.c - VOBSUB decoder with SPU parsing and OCR
  • src/lib_ccx/vobsub_decoder.h - Public API header

Modified Files

  • src/lib_ccx/mp4.c - Add VOBSUB track detection and processing
  • src/lib_ccx/matroska.c - Integrate shared VOBSUB decoder for MKV OCR support

Features

The VOBSUB decoder module provides:

  • SPU control sequence parsing (timing, colors, coordinates)
  • RLE-encoded bitmap decoding (interlaced format)
  • Palette parsing from idx header format
  • Integration with Tesseract OCR via ocr_rect()
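
To make the RLE step more concrete, here is a minimal sketch of decoding one line of a DVD-style subpicture bitmap. It assumes the commonly documented nibble-based scheme: codes are 1 to 4 nibbles long, the low two bits select one of four colors, the remaining bits give the run length, and a run length of zero means "fill to the end of the line". Each line ends on a byte boundary. In the interlaced layout, even and odd lines are stored as two separate streams, so a full decoder would call a routine like this alternately on each stream. The `nibble_reader` type and the function names are hypothetical and are not taken from vobsub_decoder.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over a byte buffer, addressed in 4-bit units. */
typedef struct {
    const uint8_t *data;
    size_t size;   /* buffer length in bytes */
    size_t nibble; /* current position in nibbles */
} nibble_reader;

/* Returns the next nibble (high half of each byte first), or -1 at end of data. */
static int read_nibble(nibble_reader *r)
{
    if (r->nibble / 2 >= r->size)
        return -1;
    uint8_t b = r->data[r->nibble / 2];
    int v = (r->nibble & 1) ? (b & 0x0F) : (b >> 4);
    r->nibble++;
    return v;
}

/* Decode one line of `width` pixels into `out` (values 0..3).
 * Returns 0 on success, -1 if the data ends early. */
static int decode_rle_line(nibble_reader *r, uint8_t *out, int width)
{
    assert(r != NULL && out != NULL && width > 0);
    int x = 0;
    while (x < width) {
        unsigned v = 0;
        /* Keep appending nibbles until the code is long enough (max 4). */
        for (unsigned t = 1; v < t && t <= 0x40; t <<= 2) {
            int n = read_nibble(r);
            if (n < 0)
                return -1;
            v = (v << 4) | (unsigned)n;
        }
        uint8_t color = (uint8_t)(v & 3);
        /* v < 4 means run length 0: fill the rest of the line. */
        int run = (v < 4) ? width - x : (int)(v >> 2);
        if (run > width - x)
            run = width - x; /* clamp runs that overshoot the line */
        for (int i = 0; i < run; i++)
            out[x++] = color;
    }
    if (r->nibble & 1)
        r->nibble++; /* each line starts on a byte boundary */
    return 0;
}
```

The decoded values are palette indices rather than final colors; mapping them through the palette and alpha tables from the control sequence is a separate step.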

Test Results

Tested with sample from issue #1349:

Track 3, type=subp subtype=MPEG
MP4: found 4 tracks: 1 avc, 0 hevc, 1 cc, 1 vobsub
Processing VOBSUB track (128 samples)
VOBSUB processing complete

Successfully extracted 61 subtitles with accurate OCR output:

1
00:22:31,000 --> 00:22:32,874
I have a message from the Shield of Light

2
00:22:33,333 --> 00:22:35,040
Remember the old ways
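
For readers unfamiliar with the timestamp layout in the sample above, the following sketch formats a millisecond count in the `HH:MM:SS,mmm` form used by SRT. The `format_srt_time` helper is a hypothetical name for this example and is not how CCExtractor itself writes its output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write `ms` as an SRT timestamp (HH:MM:SS,mmm) into buf.
 * Returns the number of characters snprintf would have written. */
static int format_srt_time(long long ms, char *buf, size_t len)
{
    assert(buf != NULL && ms >= 0);
    long long hours = ms / 3600000;
    int minutes = (int)(ms / 60000 % 60);
    int seconds = (int)(ms / 1000 % 60);
    int millis = (int)(ms % 1000);
    return snprintf(buf, len, "%02lld:%02d:%02d,%03d",
                    hours, minutes, seconds, millis);
}
```

With this helper, 1352874 ms formats as `00:22:32,874`, matching the end time of the first cue.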

Test plan

  • [x] Compiles with OCR support (-DWITH_OCR=ON)
  • [x] Compiles without OCR support
  • [x] Extracts VOBSUB from MP4 with accurate OCR text
  • [x] Proper error message when OCR not available
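
The second commit guards `ocr_text` access behind the `ENABLE_OCR` preprocessor check. The sketch below shows one way such a guard can look so that the code builds in both configurations and reports an error when OCR is unavailable. The `sub_bitmap` struct, the `recognize_subtitle` function, and the error text are hypothetical and only illustrate the pattern; the actual code in the PR may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subtitle record; the text field exists only in OCR builds. */
typedef struct {
    const unsigned char *bitmap;
    int width;
    int height;
#ifdef ENABLE_OCR
    char *ocr_text;
#endif
} sub_bitmap;

/* Returns 0 when the OCR path is compiled in, -1 otherwise. */
static int recognize_subtitle(sub_bitmap *sub)
{
    assert(sub != NULL);
#ifdef ENABLE_OCR
    /* A real build would call the OCR engine here; this sketch only
     * touches the guarded field to show that it is safe to access. */
    sub->ocr_text = NULL;
    return 0;
#else
    /* Without OCR the field does not exist, so it must not be referenced. */
    fprintf(stderr, "VOBSUB subtitles require a build with OCR support\n");
    return -1;
#endif
}
```

Keeping every reference to the guarded field inside the same `#ifdef` is what lets the "compiles without OCR support" item pass.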

Fixes #1349

🤖 Generated with Claude Code


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 17:23:34 +00:00
Reference: starred/ccextractor#2716