[PR #1544] [CLOSED] [FEAT] Add flag for Page Segmentation Modes control #2258

Open
opened 2026-01-29 17:21:12 +00:00 by claunia · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/CCExtractor/ccextractor/pull/1544
Author: @Neo2SHYAlien
Created: 6/29/2023
Status: Closed

Base: masterHead: master


📄 Description

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

I added an flag -psm for controlling PSM (Page Segmentation Modes) in Tesseract. The default option (3) gives me quite bad results. When I use 6, 11, or 12 for Bulgarian, it gives me much better OCR results. I haven't tested other languages yet, but I expect improvements as well if other mode is used.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/CCExtractor/ccextractor/pull/1544 **Author:** [@Neo2SHYAlien](https://github.com/Neo2SHYAlien) **Created:** 6/29/2023 **Status:** ❌ Closed **Base:** `master` ← **Head:** `master` --- ### 📄 Description <!-- Please prefix your pull request with one of the following: **[FEATURE]** **[FIX]** **[IMPROVEMENT]**. --> **In raising this pull request, I confirm the following (please check boxes):** - [X] I have read and understood the [contributors guide](https://github.com/CCExtractor/ccextractor/blob/master/.github/CONTRIBUTING.md). - [X] I have checked that another pull request for this purpose does not exist. - [X] I have considered, and confirmed that this submission will be valuable to others. - [X] I accept that this submission may not be used, and the pull request closed at the will of the maintainer. - [X] I give this submission freely, and claim no ownership to its content. - [X] **I have mentioned this change in the [changelog](https://github.com/CCExtractor/ccextractor/blob/master/docs/CHANGES.TXT).** **My familiarity with the project is as follows (check one):** - [ ] I have never used CCExtractor. - [ ] I have used CCExtractor just a couple of times. - [X] I absolutely love CCExtractor, but have not contributed previously. - [ ] I am an active contributor to CCExtractor. --- I added an flag `-psm` for controlling PSM ([Page Segmentation Modes](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method)) in Tesseract. The default option (3) gives me quite bad results. When I use 6, 11, or 12 for Bulgarian, it gives me much better OCR results. I haven't tested other languages yet, but I expect improvements as well if other mode is used. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
claunia added the pull-request label 2026-01-29 17:21:12 +00:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/ccextractor#2258