[PR #759] [MERGED] [IMPROVEMENT] Adding grayscale conversion for better OCR #1572

Closed
opened 2026-01-29 17:17:16 +00:00 by claunia · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/CCExtractor/ccextractor/pull/759
Author: @Abhinav95
Created: 7/21/2017
Status: Merged
Merged: 7/21/2017
Merged by: @cfsmp3

Base: master ← Head: master


📝 Commits (1)

  • b1cc95d Adding grayscale conversion for better OCR

📊 Changes

1 file changed (+8 additions, -3 deletions)

📝 src/lib_ccx/ocr.c (+8 -3)

📄 Description

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that another pull request for this purpose does not exist.
  • [x] I have considered, and confirmed that this submission will be valuable to others.
  • [x] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • [x] I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [ ] I absolutely love CCExtractor, but have not contributed previously.
  • [x] I am an active contributor to CCExtractor.

When DVB subtitle bitmaps have transparent backgrounds, Tesseract OCR sometimes fails to recognize the text accurately and produces nonsensical output. Converting the Leptonica pix passed to Tesseract to grayscale first avoids the problems caused by the transparent elements.

This is a well-documented problem in the Tesseract repository.
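
For readers unfamiliar with the idea, here is a minimal, library-free sketch of what flattening a transparent RGBA bitmap to a single gray channel can look like. It is not the diff from this PR: the function names `rgba_to_gray` and `luma_over_background`, the Rec. 601 luma weights, and the choice to blend transparent pixels onto a caller-supplied background value are all assumptions made for illustration. In a real build the conversion would be done on the Leptonica pix (Leptonica offers `pixConvertRGBToGray` for this kind of conversion; the description above does not say which call the PR uses).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of CCExtractor or Leptonica):
 * turn one RGBA pixel into an 8-bit gray value blended over `bg`. */
static uint8_t luma_over_background(uint8_t r, uint8_t g, uint8_t b,
                                    uint8_t a, uint8_t bg)
{
    /* Rec. 601 luma weights (0.299, 0.587, 0.114) scaled by 1000,
     * with +500 for rounding. This weighting is an assumption. */
    uint32_t luma = (299u * r + 587u * g + 114u * b + 500u) / 1000u;

    /* Alpha blend: a == 255 keeps the luma, a == 0 yields the background. */
    return (uint8_t)((luma * a + (uint32_t)bg * (255u - a) + 127u) / 255u);
}

/* Convert `npixels` interleaved RGBA pixels into one gray byte per pixel.
 * `gray` must have room for `npixels` bytes. */
void rgba_to_gray(const uint8_t *rgba, uint8_t *gray, size_t npixels,
                  uint8_t bg)
{
    assert(rgba != NULL && gray != NULL);

    for (size_t i = 0; i < npixels; i++) {
        const uint8_t *p = rgba + 4 * i;
        gray[i] = luma_over_background(p[0], p[1], p[2], p[3], bg);
    }
}
```

With a black background (`bg = 0`), an opaque white glyph pixel stays at full brightness while a fully transparent pixel collapses to the background regardless of the color stored in it, so the OCR engine sees a clean single-channel image with no alpha left to interpret.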


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 17:17:16 +00:00
Reference: starred/ccextractor#1572