ccextractor/docker/README.md
Carlos Fernandez 3f45a4e136 fix(docker): Rewrite Dockerfile to fix broken builds
Fixes #1550 - Docker builds were broken after PR #1535 switched from
vendored GPAC to system GPAC.

Changes:
- Switch from Alpine to Debian Bookworm (Alpine's musl libc has issues
  with Rust bindgen's libclang dynamic loading)
- Support three build variants via BUILD_TYPE argument:
  - minimal: No OCR support
  - ocr (default): Tesseract OCR for bitmap subtitles
  - hardsubx: OCR + FFmpeg for burned-in subtitle extraction
- Support dual source modes via USE_LOCAL_SOURCE argument:
  - 0 (default): Clone from GitHub (standalone Dockerfile)
  - 1: Use local source (faster for developers)
- Add .dockerignore to exclude build artifacts (~2.7GB -> ~900KB context)
- Update README.md with comprehensive build instructions

Tested all three variants successfully:
- minimal: ~130MB image
- ocr: ~215MB image
- hardsubx: ~610MB image

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 17:27:42 +01:00

CCExtractor Docker Image

This Dockerfile builds CCExtractor in one of three variants, selected with the BUILD_TYPE build argument.

Build Variants

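| Variant  | Description                        | Features                             |
|----------|------------------------------------|--------------------------------------|
| minimal  | Basic CCExtractor                  | No OCR support                       |
| ocr      | With OCR support (default)         | Tesseract OCR for bitmap subtitles   |
| hardsubx | With burned-in subtitle extraction | OCR + FFmpeg for hardcoded subtitles |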

Building

Standalone Build (from Dockerfile only)

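You can build CCExtractor from the Dockerfile alone; the build clones the source from GitHub. The commands below pass the docker/ directory, which contains the Dockerfile, as the build context: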

# Default build (OCR enabled)
docker build -t ccextractor docker/

# Minimal build (no OCR)
docker build --build-arg BUILD_TYPE=minimal -t ccextractor docker/

# HardSubX build (OCR + FFmpeg for burned-in subtitles)
docker build --build-arg BUILD_TYPE=hardsubx -t ccextractor docker/

Build from Cloned Repository (faster)

If you have already cloned the repository, you can build from the local source instead, which is faster than cloning during the build:

git clone https://github.com/CCExtractor/ccextractor.git
cd ccextractor

# Default build (OCR enabled)
docker build --build-arg USE_LOCAL_SOURCE=1 -f docker/Dockerfile -t ccextractor .

# Minimal build
docker build --build-arg USE_LOCAL_SOURCE=1 --build-arg BUILD_TYPE=minimal -f docker/Dockerfile -t ccextractor .

# HardSubX build
docker build --build-arg USE_LOCAL_SOURCE=1 --build-arg BUILD_TYPE=hardsubx -f docker/Dockerfile -t ccextractor .

Build Arguments

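| Argument         | Default       | Description                                      |
|------------------|---------------|--------------------------------------------------|
| BUILD_TYPE       | ocr           | Build variant: minimal, ocr, or hardsubx         |
| USE_LOCAL_SOURCE | 0             | Set to 1 to use local source instead of cloning  |
| DEBIAN_VERSION   | bookworm-slim | Debian version to use as the base image          |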
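Every build example above tags its image as `ccextractor`, so building a second variant replaces the first, and nothing checks the argument values before the build starts. The Bash function below is a minimal sketch, not part of CCExtractor, that accepts only the values listed in the table, picks the matching build context, and tags each variant separately. The function name `ccx_build`, the `ccextractor:<variant>` tag scheme, and the `DOCKER` override (set `DOCKER=echo` for a dry run) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `docker build` that validates the documented
# build arguments before starting a build.
# Usage: ccx_build [BUILD_TYPE] [USE_LOCAL_SOURCE]
# Set DOCKER=echo to print the command instead of running it.
ccx_build() {
  local build_type="${1:-ocr}"   # documented default
  local use_local="${2:-0}"      # documented default

  case "$build_type" in
    minimal|ocr|hardsubx) ;;
    *)
      echo "invalid BUILD_TYPE: $build_type (expected minimal, ocr or hardsubx)" >&2
      return 2
      ;;
  esac

  if [ "$use_local" = 1 ]; then
    # Local source: run from the repository root; the whole repo is the context.
    "${DOCKER:-docker}" build \
      --build-arg USE_LOCAL_SOURCE=1 \
      --build-arg BUILD_TYPE="$build_type" \
      -f docker/Dockerfile -t "ccextractor:$build_type" .
  elif [ "$use_local" = 0 ]; then
    # Standalone: only the docker/ directory is sent as the build context.
    "${DOCKER:-docker}" build \
      --build-arg BUILD_TYPE="$build_type" \
      -t "ccextractor:$build_type" docker/
  else
    echo "invalid USE_LOCAL_SOURCE: $use_local (expected 0 or 1)" >&2
    return 2
  fi
}
```

For example, `ccx_build hardsubx 1` builds the HardSubX variant from a local checkout, and `for v in minimal ocr hardsubx; do ccx_build "$v"; done` builds all three standalone variants side by side under separate tags.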

Usage

Basic Usage

# Show version
docker run --rm ccextractor --version

# Show help
docker run --rm ccextractor --help

Processing Local Files

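Mount your current directory into the container at the same path and make it the working directory there, so relative file paths resolve the same way inside the container: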

# Process a video file with output file
docker run --rm -v "$(pwd)":"$(pwd)" -w "$(pwd)" ccextractor input.mp4 -o output.srt

# Process using stdout
docker run --rm -v "$(pwd)":"$(pwd)" -w "$(pwd)" ccextractor input.mp4 --stdout > output.srt
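Typing the mount options for every file gets repetitive. The Bash sketch below wraps the same docker run pattern in a function and extends it to every .mp4 file in the current directory, writing one .srt next to each input. The function names `ccx` and `ccx_all_mp4`, the `IMAGE` and `DOCKER` overrides, and the input-to-output naming are assumptions for illustration, not part of the image.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the ccextractor image against files in $PWD.
# The current directory is mounted at the same path and used as the
# working directory, so relative paths behave as they do on the host.
# IMAGE selects the image (default "ccextractor");
# DOCKER=echo prints the command instead of running it.
ccx() {
  "${DOCKER:-docker}" run --rm \
    -v "$PWD":"$PWD" -w "$PWD" \
    "${IMAGE:-ccextractor}" "$@"
}

# Extract subtitles from every .mp4 in the current directory,
# writing input.mp4 -> input.srt alongside it.
ccx_all_mp4() {
  local f
  for f in *.mp4; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, skip it
    ccx "$f" -o "${f%.mp4}.srt" || return 1
  done
}
```

After sourcing the functions, `ccx input.mp4 -o output.srt` is equivalent to the first command above, and `IMAGE` lets you point the wrapper at a differently tagged build.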

Interactive Mode

docker run --rm -it --entrypoint=/bin/bash ccextractor

Image Size

The multi-stage build produces runtime images of approximately these sizes:

  • minimal: ~130MB
  • ocr: ~215MB (includes Tesseract)
  • hardsubx: ~610MB (includes Tesseract + FFmpeg)