[PR #2059] Fix/dvb subtitle ocr and spupng #2871

Closed
opened 2026-01-29 17:24:20 +00:00 by claunia · 0 comments
Owner

Original Pull Request: https://github.com/CCExtractor/ccextractor/pull/2059

State: closed
Merged: No


In raising this pull request, I confirm the following (please check boxes):

  • [x] I have read and understood the contributors guide (https://github.com/CCExtractor/ccextractor/blob/master/.github/CONTRIBUTING.md).
  • [x] I have checked that another pull request for this purpose does not exist.
  • [x] I have considered, and confirmed that this submission will be valuable to others.
  • [x] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • [x] I give this submission freely, and claim no ownership to its content.
  • [x] I have mentioned this change in the changelog (https://github.com/CCExtractor/ccextractor/blob/master/docs/CHANGES.TXT).

My familiarity with the project is as follows (check one):

  • [ ] I have never used CCExtractor.
  • [x] I have used CCExtractor just a couple of times.
  • [ ] I absolutely love CCExtractor, but have not contributed previously.
  • [ ] I am an active contributor to CCExtractor.

DVB subtitle OCR extraction was failing in the options test on Linux due to three bugs:

The write_dvb_sub function used an undefined region variable when calling ocr_rect, causing crashes or incorrect OCR results.

The SPUPNG encoder wrote the closing tags immediately after the opening tags in write_spumux_header, so the output file had no subtitle content between the subpictures tags.

DVB subtitle regions were not marked as processed after OCR extraction, causing them to be processed multiple times and creating duplicate subtitle entries.

Solution
Fixed the undefined region variable by finding the first valid region from the display list and using that for the bgcolor parameter in ocr_rect.
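
The lookup described here can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the struct layouts, the names `sketch_region`, `sketch_display`, `sketch_find_region` and `sketch_first_bgcolor`, and the fallback value are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the decoder's region structures. */
struct sketch_region {
    int id;
    int bgcolor;
};

struct sketch_display {
    int region_id;
    struct sketch_display *next;
};

/* Look up a region by id; returns NULL when no region has that id. */
const struct sketch_region *sketch_find_region(const struct sketch_region *regions,
                                               size_t count, int id)
{
    for (size_t i = 0; i < count; i++) {
        if (regions[i].id == id)
            return &regions[i];
    }
    return NULL;
}

/* Walk the display list and return the bgcolor of the first entry that
 * resolves to an existing region. If none resolves, return the fallback
 * instead of reading an unset variable. */
int sketch_first_bgcolor(const struct sketch_display *list,
                         const struct sketch_region *regions,
                         size_t count, int fallback)
{
    for (const struct sketch_display *d = list; d != NULL; d = d->next) {
        const struct sketch_region *r = sketch_find_region(regions, count, d->region_id);
        if (r != NULL)
            return r->bgcolor;
    }
    return fallback;
}
```

The value returned would then be what gets passed as the background colour argument; the explicit fallback keeps the call well defined even when the display list is empty.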

Removed the code that prematurely wrote the footer in write_spumux_header. The footer now writes during normal cleanup in write_spumux_footer.
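
To illustrate the intended ordering, here is a small sketch that builds the XML in an in-memory buffer so it stays self-contained. The `sketch_*` names, the buffer type, and the exact tag and attribute text are assumptions for illustration only; they are not the encoder's real functions or output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory output so the sketch does not need a file. */
struct sketch_out {
    char buf[512];
    size_t len;
};

/* Append a string, silently dropping it if it would overflow the buffer. */
void sketch_append(struct sketch_out *o, const char *s)
{
    size_t n = strlen(s);
    if (o->len + n < sizeof o->buf) {
        memcpy(o->buf + o->len, s, n + 1);
        o->len += n;
    }
}

/* Header: opening tags only. The bug described above corresponds to
 * also emitting the closing tags at this point. */
void sketch_write_header(struct sketch_out *o)
{
    sketch_append(o, "<subpictures>\n<stream>\n");
}

/* One subtitle entry, written between header and footer. */
void sketch_write_entry(struct sketch_out *o, const char *png_name)
{
    char line[128];
    snprintf(line, sizeof line, "<spu image=\"%s\"/>\n", png_name);
    sketch_append(o, line);
}

/* Footer: closing tags, written once during cleanup. */
void sketch_write_footer(struct sketch_out *o)
{
    sketch_append(o, "</stream>\n</subpictures>\n");
}
```

Calling header, then one or more entries, then footer yields entries nested inside the wrapper tags, which is the structure the fix aims to restore.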

Added a loop at the end of write_dvb_sub to clear the dirty flag for all processed regions, preventing duplicate processing.
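
A rough sketch of the idea, assuming a linked list of regions with a `dirty` flag. The struct and function names are hypothetical, and the counter merely stands in for the real OCR and output work.

```c
#include <stddef.h>

/* Hypothetical region node; only the fields needed for the sketch. */
struct sketch_dirty_region {
    int id;
    int dirty;
    struct sketch_dirty_region *next;
};

/* Process every dirty region, then clear the flags so that a second call
 * on the same list does not emit the same subtitles again.
 * Returns the number of regions processed in this call. */
int sketch_process_regions(struct sketch_dirty_region *list)
{
    int processed = 0;

    for (struct sketch_dirty_region *r = list; r != NULL; r = r->next) {
        if (r->dirty)
            processed++; /* stand-in for OCR and writing the entry */
    }

    /* Final loop: mark everything as processed. */
    for (struct sketch_dirty_region *r = list; r != NULL; r = r->next)
        r->dirty = 0;

    return processed;
}
```

With two dirty regions, the first call processes both and a second call on the unchanged list processes none, which is the duplicate-suppression behaviour described above.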

Added safety code for builds without OCR support to set ocr_text pointers to NULL, preventing use-after-free errors.
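
One way such a guard could look is sketched below. The macro `SKETCH_ENABLE_OCR`, the struct, and the function name are placeholders invented for this example; the project's actual build flag and data structures may differ.

```c
#include <stddef.h>

/* Hypothetical rectangle holding a pointer to recognised text. */
struct sketch_rect {
    char *ocr_text;
};

/* In a build without OCR, nothing ever fills ocr_text, so reset it to
 * NULL. Later cleanup that checks or frees the pointer then sees NULL
 * rather than a stale or uninitialised value. */
void sketch_finalize_ocr_text(struct sketch_rect *rects, size_t count)
{
#ifdef SKETCH_ENABLE_OCR
    /* OCR build: the OCR step is assumed to own and set ocr_text. */
    (void)rects;
    (void)count;
#else
    for (size_t i = 0; i < count; i++)
        rects[i].ocr_text = NULL;
#endif
}
```

Because `free(NULL)` is a no-op in standard C, resetting the pointers this way keeps shared cleanup code safe in both build configurations.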

Testing
Tested against the failing Linux run on the sample platform (https://sampleplatform.ccextractor.org/test/7992#). The SPUPNG output now has the proper XML structure, with the subpictures wrapper tags enclosing all subtitle entries and their OCR comments. PNG files are generated correctly.

As far as I know, a PR has to be raised for the updated code to be tested. I will promptly close this PR if my changes prove unhelpful :)

claunia added the pull-request label 2026-01-29 17:24:20 +00:00

Reference: starred/ccextractor#2871