[PR #1919] [MERGED] feat(matroska): Add VOBSUB subtitle extraction support for MKV files #2711

opened 2026-01-29 17:23:33 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/CCExtractor/ccextractor/pull/1919
Author: @cfsmp3
Created: 12/28/2025
Status: Merged
Merged: 12/28/2025
Merged by: @cfsmp3

Base: master
Head: fix/issue-1371-mkv-vobsub-support


📝 Commits (3)

  • 1fccb78 feat(matroska): Add VOBSUB subtitle extraction support for MKV files
  • 6f2a73d docs: Add VOBSUB extraction documentation and subtile-ocr Dockerfile
  • 9d14766 fix: Use #define instead of const int for VOBSUB_BLOCK_SIZE

📊 Changes

4 files changed (+401 additions, -9 deletions)

docs/VOBSUB.md (+129 -0)
📝 src/lib_ccx/matroska.c (+232 -9)
📝 src/lib_ccx/matroska.h (+5 -0)
tools/vobsubocr/Dockerfile (+35 -0)

📄 Description

Summary

  • Adds full VOBSUB (S_VOBSUB) subtitle extraction support for Matroska files
  • Previously, CCExtractor only printed "Error: VOBSUB not supported" and produced empty files
  • Now generates proper .idx and .sub file pairs that work with VLC, FFmpeg, and other players
  • Includes documentation and Docker tooling for OCR conversion to SRT
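
To make the .idx/.sub pairing concrete: each subtitle in a .idx file is located by a timestamp plus a hexadecimal byte offset into the .sub file. The sketch below formats one such index line. The function name `format_idx_entry` is hypothetical, the input is assumed to be in milliseconds, and the field widths follow the commonly seen VobSub index layout rather than anything quoted in this PR, so treat it as an illustration and not as the code in `matroska.c`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one VobSub index line into buf.
 * ms      - subtitle start time in milliseconds (assumed unit)
 * filepos - byte offset of the subtitle packet inside the .sub file
 * Returns the snprintf() result (characters that would be written). */
int format_idx_entry(char *buf, size_t size, uint64_t ms, uint64_t filepos)
{
    assert(buf != NULL);

    unsigned hours   = (unsigned)(ms / 3600000);
    unsigned minutes = (unsigned)(ms / 60000 % 60);
    unsigned seconds = (unsigned)(ms / 1000 % 60);
    unsigned millis  = (unsigned)(ms % 1000);

    /* PRIx64 prints the offset in lowercase hex portably, which is the
     * same problem the PR's LLX_M format specifier addresses. */
    return snprintf(buf, size,
                    "timestamp: %02u:%02u:%02u:%03u, filepos: %09" PRIx64 "\n",
                    hours, minutes, seconds, millis, filepos);
}
```

Calling `format_idx_entry(buf, sizeof buf, 3723004, 0x800)` yields `timestamp: 01:02:03:004, filepos: 000000800`, that is, a subtitle at 1 h 2 min 3.004 s stored at offset 0x800 (the second 2048-byte block).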

Changes

Core VOBSUB extraction (src/lib_ccx/matroska.c):

  • Added PS Pack header generation with SCR derived from timestamps
  • Added PES header generation with PTS for subtitle timing
  • Added save_vobsub_track() function to write both .idx and .sub files
  • Added hex format specifier (LLX_M) for cross-platform file position formatting
  • Added standard 2048-byte block alignment, as the VOBSUB format requires
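
The PTS and block-alignment items above can be sketched as follows. This is a minimal illustration of the standard MPEG PES timestamp layout and of padding to a block boundary, not the code added to `matroska.c`. The function names (`ms_to_pts`, `encode_pts`, `padding_to_block`) are hypothetical, and the input timestamp is assumed to be in milliseconds. Only `VOBSUB_BLOCK_SIZE` and its value of 2048 come from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VOBSUB_BLOCK_SIZE 2048 /* macro name from the commit list, value from the description */

/* Convert milliseconds (assumed input unit) to ticks of the 90 kHz PTS clock. */
uint64_t ms_to_pts(uint64_t ms)
{
    return ms * 90;
}

/* Encode a PTS into the 5-byte PES header field used when only a PTS
 * is present: a '0010' prefix, then the 33 timestamp bits split 3/15/15,
 * each group followed by a marker bit set to 1. */
void encode_pts(uint8_t out[5], uint64_t pts)
{
    assert(out != NULL);
    pts &= 0x1FFFFFFFFULL; /* PTS is a 33-bit counter and wraps around */

    out[0] = (uint8_t)(0x20 | ((pts >> 29) & 0x0E) | 0x01); /* bits 32..30 */
    out[1] = (uint8_t)((pts >> 22) & 0xFF);                 /* bits 29..22 */
    out[2] = (uint8_t)(((pts >> 14) & 0xFE) | 0x01);        /* bits 21..15 */
    out[3] = (uint8_t)((pts >> 7) & 0xFF);                  /* bits 14..7  */
    out[4] = (uint8_t)(((pts << 1) & 0xFE) | 0x01);         /* bits 6..0   */
}

/* Number of padding bytes needed so that the next packet starts on a
 * VOBSUB_BLOCK_SIZE boundary; 0 if pos is already aligned. */
size_t padding_to_block(uint64_t pos)
{
    return (size_t)((VOBSUB_BLOCK_SIZE - pos % VOBSUB_BLOCK_SIZE) % VOBSUB_BLOCK_SIZE);
}
```

For example, a subtitle starting at 1000 ms has a PTS of 90000 ticks, which `encode_pts` writes as the bytes `21 00 05 BF 21`. A write position of 2049 needs 2047 bytes of padding before the next block starts.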

Documentation and tooling:

  • Added docs/VOBSUB.md - comprehensive guide for VOBSUB extraction and OCR conversion
  • Added tools/vobsubocr/Dockerfile - Docker image for subtile-ocr (VOBSUB to SRT conversion)

Usage

# Extract VOBSUB from MKV
ccextractor movie.mkv
# Output: movie_eng.idx + movie_eng.sub

# Convert to SRT (using Docker)
docker build -t subtile-ocr tools/vobsubocr/
docker run --rm -v "$(pwd)":/data subtile-ocr -l eng -o /data/movie.srt /data/movie_eng.idx

Test plan

  • Tested with sample #178 from issue (bugs.mkv with VOBSUB tracks)
  • Output validates with FFprobe as dvd_subtitle codec
  • Timestamps match reference extraction (mkvextract)
  • File positions match reference extraction
  • SPU data is identical to reference extraction
  • OCR with subtile-ocr produces identical SRT to mkvextract reference

Fixes #1371

🤖 Generated with Claude Code


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 17:23:33 +00:00
Reference: starred/ccextractor#2711