[PR #1586] [MERGED] ocr_bitmap can run out of buffer memory copying the "last font tag" #2297

Closed
opened 2026-01-29 17:21:23 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/CCExtractor/ccextractor/pull/1586
Author: @jstrot
Created: 1/7/2024
Status: Merged
Merged: 3/11/2025
Merged by: @prateekmedia

Base: master ← Head: qualip-ocr_bitmap-last_font_tag


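📝 Commits (10+)

  • c51f367 ASAN: process_spu copies overlapping buffers
  • e9cc967 ocr_bitmap: Make sure there is enough room for the last_font_tag
  • fc8dfa6 Update CHANGES.TXT
  • 69a115e Merge branch 'master' of https://github.com/CCExtractor/ccextractor into qualip-ocr_bitmap-last_font_tag
  • 81550c3 Merge branch 'master' of https://github.com/CCExtractor/ccextractor into qualip-ocr_bitmap-last_font_tag
  • de88c63 Baseline formatting fixes
  • eb058ea fixup! Baseline formatting fixes
  • 6f406ed fixup! fixup! Baseline formatting fixes
  • 4a25a04 Fix rust comment formatting
  • 6264d5b cxx_options.copy_from_rust: Avoid "mutable reference to mutable static" warning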

📊 Changes

3 files changed (+3 additions, -1 deletions)


📝 docs/CHANGES.TXT (+1 -0)
📝 src/lib_ccx/dvd_subtitle_decoder.c (+1 -1)
📝 src/lib_ccx/ocr.c (+1 -0)

📄 Description

In raising this pull request, I confirm the following (please check boxes):

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that another pull request for this purpose does not exist.
  • [x] I have considered, and confirmed that this submission will be valuable to others.
  • [x] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • [x] I give this submission freely, and claim no ownership to its content.
  • [x] I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [x] I absolutely love CCExtractor, but have not contributed previously.
  • [ ] I am an active contributor to CCExtractor.

Version: 0.94

During OCR of a VOB program stream (PS), ccextractor can run out of buffer space when it has to copy all the text since the last font tag (which may also be the beginning of the input):

$ ./ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt
...
Error: In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.

I believe the bug has existed since that piece of code was introduced back in 2017 (#844).

The fix makes sure the allocated buffer is large enough to hold this extra string.

Example crash under gdb:

$ gdb --args ./ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt
(gdb) run                     
Starting program: /home/jst/tools/src/ccextractor/linux/ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt  
[Thread debugging using libthread_db enabled]  
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".  
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.  
Teletext portions taken from Petr Kutalek's telxcc                                                               
--------------------------------------------------------------------------  
Input: test.vob                                                             
[Extract: 1] [Stream mode: Autodetect]                      
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]  
[Timing mode: Auto] [Debug: No] [Buffer input: No]                          
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]  
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]  
[Add font color data: Yes] [Add font typesetting: Yes]         
[Convert case: No][Filter profanity: No] [Video-edit join: No]  
[Extraction start time: not set (from start)]                        
[Extraction end time: not set (to end)]                              
[Live stream: No] [Clock frequency: 90000]              
[Teletext page: Autodetect]                                     
[Start credits text: None]                       
[Quantisation-mode: CCExtractor's internal function]  
                                                 
-----------------------------------------------------------------  
Opening file: test.vob                           
File seems to be a program stream, enabling PS mode   
Analyzing data in general mode                   
                                                                   
                                                 
New video information found                          
[720 * 480] [AR: 02 - 4:3] [FR: 04 - 29.97] [progressive: no]  
   
  0%  |  00:00                                   
...                          
Skip forward to the next Sequence or GOP start.  
 95%  |  19:38  
Skip forward to the next Sequence or GOP start.  
  
Skip forward to the next Sequence or GOP start.  
  
Thread 1 "ccextractor" hit Breakpoint 1, fatal (exit_code=1000, fmt=0x555555ee8da0 "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n") at ../src/lib_ccx/utility.c:272  
272             va_start(args, fmt);  
(gdb) up  
#1  0x00005555557976ed in ocr_bitmap (arg=0x602000008250, palette=0x602000b1c390, alpha=0x602000b1c3b0 "", indata=0x62a000726200 "", w=556, h=42, copy=0x60400003c210) at ../src/lib_ccx/ocr.c:638  
638                                                             fatal(CCX_COMMON_EXIT_BUG_BUG, "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n", errno);  
(gdb) list  
633                                             {  
634                                                     if ((new_text_out_iter - new_text_out) +  
635                                                             (last_font_tag_end - last_font_tag) >  
636                                                         length)  
637                                                     {  
638                                                             fatal(CCX_COMMON_EXIT_BUG_BUG, "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n", errno);  
639                                                     }  
640                                                     memcpy(new_text_out_iter, last_font_tag, last_font_tag_end - last_font_tag);  
641                                                     new_text_out_iter += last_font_tag_end - last_font_tag;  
642                                             }  
(gdb) p new_text_out_iter - new_text_out  
$1 = 96  
(gdb) p last_font_tag_end - last_font_tag  
$2 = 76  
(gdb) p length  
$3 = 158  
(gdb) p new_text_out_iter - new_text_out + last_font_tag_end - last_font_tag  
$4 = 172

Before reaching this point, I also had to fix an ASAN error in process_spu, which calls memcpy on overlapping buffers. I don't fully understand why the buffers overlap, but using memmove at least fixes the error.

==611746==ERROR: AddressSanitizer: memcpy-param-overlap: memory ranges [0x7fffdf1eae84,0x7fffdf1eb528) and [0x7fffdf1ea800, 0x7fffdf1eaea4) overlap
    #0 0x7ffff786db25 in __interceptor_memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:899
    #1 0x5555556c2302 in process_spu ../src/lib_ccx/dvd_subtitle_decoder.c:387
    #2 0x5555556fe994 in process_data ../src/lib_ccx/general_loop.c:662
    #3 0x555555701650 in process_non_multiprogram_general_loop ../src/lib_ccx/general_loop.c:968
    #4 0x555555702248 in general_loop ../src/lib_ccx/general_loop.c:1062
    #5 0x5555556738ee in api_start ../src/ccextractor.c:204
    #6 0x555555675c39 in main ../src/ccextractor.c:465
    #7 0x7ffff64456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #8 0x7ffff6445784 in __libc_start_main_impl ../csu/libc-start.c:360
    #9 0x555555672c50 in _start (/home/jst/tools/src/ccextractor/linux/ccextractor+0x11ec50) (BuildId: 466667d3e95ff9aa8e7b1165aeac946dcfc18371)

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 17:21:23 +00:00
Reference: starred/ccextractor#2297