[BUG] ocr.c writing outside allocated memory #570

Open
opened 2026-01-29 16:48:00 +00:00 by claunia · 0 comments

Originally created by @dcjm on GitHub (Apr 6, 2020).

I am raising this as an issue, but I do have a fix for the problem, so it could be a pull request instead. What follows is a full explanation of the problem in case you want to try a different approach.

When running ccextractor on some ts files I found two cases where malloc was reporting memory corruption. This was tested first with the 0.88 release and then with git master. I ran valgrind and got the following:

==32409== Invalid write of size 1
==32409== at 0x483CA14: memmove (vg_replace_strmem.c:1270)
==32409== by 0x13F423: ocr_bitmap (ocr.c:671)
==32409== by 0x13FEEC: ocr_rect (ocr.c:915)
==32409== by 0x17B828: write_dvb_sub (dvb_subtitle_decoder.c:1663)
==32409== by 0x17BBD9: dvbsub_handle_display_segment (dvb_subtitle_decoder.c:1726)
==32409== by 0x17C077: dvbsub_decode (dvb_subtitle_decoder.c:1832)
==32409== by 0x149BEA: process_data (general_loop.c:644)
==32409== by 0x14AB6E: general_loop (general_loop.c:1019)
==32409== by 0x137058: api_start (ccextractor.c:213)
==32409== by 0x137D7F: main (ccextractor.c:534)
==32409== Address 0x9b75967 is 0 bytes after a block of size 135 alloc'd
==32409== at 0x483577F: malloc (vg_replace_malloc.c:299)
==32409== by 0x13F07E: ocr_bitmap (ocr.c:595)
==32409== by 0x13FEEC: ocr_rect (ocr.c:915)
==32409== by 0x17B828: write_dvb_sub (dvb_subtitle_decoder.c:1663)
==32409== by 0x17BBD9: dvbsub_handle_display_segment (dvb_subtitle_decoder.c:1726)
==32409== by 0x17C077: dvbsub_decode (dvb_subtitle_decoder.c:1832)
==32409== by 0x149BEA: process_data (general_loop.c:644)
==32409== by 0x14AB6E: general_loop (general_loop.c:1019)
==32409== by 0x137058: api_start (ccextractor.c:213)
==32409== by 0x137D7F: main (ccextractor.c:534)
==32409==
==32409== Invalid write of size 1
==32409== at 0x13F466: ocr_bitmap (ocr.c:679)
==32409== by 0x13FEEC: ocr_rect (ocr.c:915)
==32409== by 0x17B828: write_dvb_sub (dvb_subtitle_decoder.c:1663)
==32409== by 0x17BBD9: dvbsub_handle_display_segment (dvb_subtitle_decoder.c:1726)
==32409== by 0x17C077: dvbsub_decode (dvb_subtitle_decoder.c:1832)
==32409== by 0x149BEA: process_data (general_loop.c:644)
==32409== by 0x14AB6E: general_loop (general_loop.c:1019)
==32409== by 0x137058: api_start (ccextractor.c:213)
==32409== by 0x137D7F: main (ccextractor.c:534)
==32409== Address 0x9b7596a is 3 bytes after a block of size 135 alloc'd
==32409== at 0x483577F: malloc (vg_replace_malloc.c:299)
==32409== by 0x13F07E: ocr_bitmap (ocr.c:595)
==32409== by 0x13FEEC: ocr_rect (ocr.c:915)
==32409== by 0x17B828: write_dvb_sub (dvb_subtitle_decoder.c:1663)
==32409== by 0x17BBD9: dvbsub_handle_display_segment (dvb_subtitle_decoder.c:1726)
==32409== by 0x17C077: dvbsub_decode (dvb_subtitle_decoder.c:1832)
==32409== by 0x149BEA: process_data (general_loop.c:644)
==32409== by 0x14AB6E: general_loop (general_loop.c:1019)
==32409== by 0x137058: api_start (ccextractor.c:213)
==32409== by 0x137D7F: main (ccextractor.c:534)

Adding assertions and then looking at the values with gdb showed that the assumption in the comment on line 635 is incorrect:

(gdb) print text_out
$1 = 0x5555570b91f0 "Of course. I'll do\nwhatever | can to h<font color=\"#ffff00\">elp.\n"

(gdb) print text_out
$1 = 0x555556caa9a0 "Ford resented him.\nSo he cooked up <font color=\"#ffff00\">a pack of lies\n"

In both cases the first line of text_out contains no font tag and the tag appears only on the second line. The overflow occurs because the loop at line 650 leaves last_font_tag pointing to the start of the line, but the call to strstr on line 658 searches past the end of that line and sets last_font_tag_end to the closing > on the subsequent line. The fix is to add
if (last_font_tag_end > line_end) last_font_tag_end = NULL;
after line 658.
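
To illustrate the guard in isolation, here is a minimal sketch. It is not the code from ocr.c: the helper name find_tag_end_in_line and its parameters are hypothetical, and it only assumes what is described above, namely that strstr keeps searching until the terminating NUL and can therefore return a > that belongs to a later line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of ocr.c): find the closing '>' of the tag
 * that starts at tag_start, but only accept it if it lies on the current
 * line. line_end points at the '\n' ending the line, or is NULL if the
 * string has no further newline. */
static const char *find_tag_end_in_line(const char *tag_start,
                                        const char *line_end)
{
    /* strstr searches up to the terminating NUL, so it may cross line_end. */
    const char *tag_end = strstr(tag_start, ">");

    /* Same check as the proposed fix: a match beyond the end of the
     * current line must not be treated as this line's tag end. */
    if (tag_end != NULL && line_end != NULL && tag_end > line_end)
        tag_end = NULL;

    return tag_end;
}
```

With the first text_out value shown above, calling this on the first line returns NULL, while calling it on the font tag in the second line returns a pointer to that tag's >. An alternative would be to bound the search itself, for example with memchr over the length of the current line, so that the pointer never leaves the line in the first place.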

If you want to investigate this yourselves I can put the subtitle streams on a public server. There may be a better way to fix this.

claunia added the difficulty: easy, OCR, Hacktoberfest labels 2026-01-29 16:48:00 +00:00

Reference: starred/ccextractor#570