mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-03 21:23:48 +00:00
[BUG] Duplicating subtitles with & without font color tags, a lot of warnings and errors, empty images with spupng. #403
Originally created by @tsmarinov on GitHub (Mar 6, 2018).
Please prefix your issue with one of the following: [BUG], [PROPOSAL], [QUESTION].
CCExtractor version (using the --version parameter preferably) : X.X
In raising this issue, I confirm the following (please check boxes, eg [X] - and delete unchecked ones):
My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):
Necessary information
-autoprogram
./ccextractor -quant 0 -nofc -in=ts -datapid 0x1de7 -out=srt -stdout -nobom -trim -noteletext -codec dvbsub -dvblang bul -ocrlang bul cinemax.ts
Video links (replace text below with your links)
https://goo.gl/DmLjji
Please make the affected input file available for us (no screenshots, those don't help!). Public links to Dropbox, Google Drive, etc, are all fine. If it is not possible to make it available publicly, send us a private invitation (both Dropbox and Google Drive allow that). In this case we will download the file and upload it to the private developer repository.
Do not upload your file to any location that will require us to sign up or endure a wait list, slow downloads, etc. If your upload expires make sure you keep it active somehow (replace links if needed). Keep in mind that while we go over all tickets some may take a few days, and it's important we have the file available when we actually need it.
Additional information
{issue content here, replace this line with your issue content}
PS: Make sure you set an alert in GitHub so you get notifications about your ticket. We may need to ask questions and we do everything inside GitHub's system.
This is what I get when I use:
./ccextractor -quant 0 -nofc -in=ts -datapid 0x1de7 -out=srt -stdout -nobom -trim -noteletext -codec dvbsub -dvblang bul -ocrlang bul cinemax.ts
Duplicated subtitles: the first copy has font tags and the second has none. Also, when generating spupng, the subtitles without color come out as empty (transparent) PNGs.
The full log and the TS file are here: https://goo.gl/DmLjji
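To confirm the duplication in a larger SRT file, one option is to compare consecutive caption lines after removing the font tags. The following is only a rough sketch: it assumes the tags look like <font ...> and </font>, and the function names are made up for illustration; nothing here comes from CCExtractor itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, dropping <font ...> and </font>
 * tags. `out` must have room for at least strlen(in) + 1 bytes. */
void strip_font_tags(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in) {
        if (strncmp(in, "<font", 5) == 0 || strncmp(in, "</font", 6) == 0) {
            const char *end = strchr(in, '>');
            if (end) {          /* skip the whole tag */
                in = end + 1;
                continue;
            }
        }
        *out++ = *in++;         /* ordinary character: keep it */
    }
    *out = '\0';
}

/* Hypothetical helper: returns 1 if two caption lines carry the same text
 * once font tags are removed, 0 otherwise. */
int same_text_ignoring_font_tags(const char *a, const char *b)
{
    char sa[1024], sb[1024];
    if (strlen(a) >= sizeof sa || strlen(b) >= sizeof sb)
        return 0;               /* too long for this sketch; treat as different */
    strip_font_tags(a, sa);
    strip_font_tags(b, sb);
    return strcmp(sa, sb) == 0;
}
```

Two consecutive captions for which same_text_ignoring_font_tags returns 1 would match the pattern described above (one tagged copy followed by one untagged copy).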
@thealphadollar commented on GitHub (Mar 6, 2018):
I would like to look into this bug at some point, but if someone wants to try before I do, the link below could be a good starting point.
https://github.com/tesseract-ocr/tesseract/issues/427
@thealphadollar commented on GitHub (Mar 6, 2018):
The same issue can be reproduced in this sample from our own sample files.
The issue is, most probably, concerned with Tesseract.
@cfsmp3 commented on GitHub (Mar 8, 2018):
I can't reproduce the crash with current master.
@thealphadollar commented on GitHub (Mar 8, 2018):
@cfsmp3 I tried it on my system (Linux) and I would say the issue is real. I used the latest commit, fetched just now :)
The git log for the version used:
The arguments used are exactly the same as given in the issue:
-quant 0 -nofc -in=ts -datapid 0x1de7 -out=srt -stdout -nobom -trim -noteletext -codec dvbsub -dvblang bul -ocrlang bul cinemax.ts
I'm using tesseract version 3.04.01-6, which is the latest at the moment.
The chances are slim, but a different version of Tesseract may be making the difference, since I think this error is related to OCR.
@cfsmp3 commented on GitHub (Mar 8, 2018):
I do see those messages Pix... as well, but not a segfault.
Does it segfault for you too?
@thealphadollar commented on GitHub (Mar 8, 2018):
With the --nofontcolor parameter it gives me no segfault but produces the error shown in my last comment. When I don't use the "-nofc" parameter, it gives me a segfault.
Arguments used:
ccextractor -quant 0 -in=ts -datapid 0x1de7 -out=srt -stdout -nobom -trim -noteletext -codec dvbsub -dvblang bul -ocrlang bul cinemax.ts
@cfsmp3 commented on GitHub (Apr 9, 2018):
This trace gives a clue:
=================================================================
==323==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x610000003bf4 at pc 0x7fbda8ecb904 bp 0x7ffcae4e6ec0 sp 0x7ffcae4e6668
WRITE of size 65 at 0x610000003bf4 thread T0
#0 0x7fbda8ecb903 in __asan_memcpy (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x8c903)
#1 0x42b7a3 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 0x42b7a3 in ocr_bitmap ../src/lib_ccx/ocr.c:571
#3 0x42ca2e in ocr_rect ../src/lib_ccx/ocr.c:815
#4 0x4665b5 in write_dvb_sub ../src/lib_ccx/dvb_subtitle_decoder.c:1664
#5 0x4665b5 in dvbsub_handle_display_segment ../src/lib_ccx/dvb_subtitle_decoder.c:1713
#6 0x468b8e in dvbsub_decode ../src/lib_ccx/dvb_subtitle_decoder.c:1821
#7 0x446886 in process_data ../src/lib_ccx/general_loop.c:651
#8 0x44839c in general_loop ../src/lib_ccx/general_loop.c:1027
#9 0x40815a in api_start ../src/ccextractor.c:209
#10 0x409744 in main ../src/ccextractor.c:532
#11 0x7fbda7bc782f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#12 0x4079f8 in _start (/usr/local/src/ccextractor/linux/ccextractor+0x4079f8)
0x610000003bf4 is located 0 bytes to the right of 180-byte region [0x610000003b40,0x610000003bf4)
allocated by thread T0 here:
#0 0x7fbda8ed7602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
#1 0x42b4fc in ocr_bitmap ../src/lib_ccx/ocr.c:513
#2 0x42ca2e in ocr_rect ../src/lib_ccx/ocr.c:815
#3 0x4665b5 in write_dvb_sub ../src/lib_ccx/dvb_subtitle_decoder.c:1664
#4 0x4665b5 in dvbsub_handle_display_segment ../src/lib_ccx/dvb_subtitle_decoder.c:1713
#5 0x468b8e in dvbsub_decode ../src/lib_ccx/dvb_subtitle_decoder.c:1821
#6 0x446886 in process_data ../src/lib_ccx/general_loop.c:651
#7 0x44839c in general_loop ../src/lib_ccx/general_loop.c:1027
#8 0x40815a in api_start ../src/ccextractor.c:209
#9 0x409744 in main ../src/ccextractor.c:532
#10 0x7fbda7bc782f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
SUMMARY: AddressSanitizer: heap-buffer-overflow ??:0 __asan_memcpy
Shadow bytes around the buggy address:
0x0c207fff8720: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c207fff8730: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
0x0c207fff8740: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c207fff8750: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c207fff8760: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
=>0x0c207fff8770: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[04]fa
0x0c207fff8780: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c207fff8790: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
0x0c207fff87a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c207fff87b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c207fff87c0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
==323==ABORTING
However, even though we eventually crash, there are lots of issues before that point, so this issue is definitely a must-solve for the student in charge of the OCR improvements during GSoC.
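For whoever picks this up: the trace above shows a memcpy in ocr_bitmap writing past the end of a 180-byte malloc'd block. The sketch below is not the code from ocr.c. It only illustrates the kind of bounds check that prevents this class of overflow. The helper name bounded_append and all of its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from ocr.c: copy `len` bytes from `src`
 * into `dst` at offset `used`, but only if the write stays inside a buffer
 * of `cap` bytes. Returns the new fill level, or (size_t)-1 if the copy
 * would run past the end of the buffer. */
size_t bounded_append(char *dst, size_t cap, size_t used,
                      const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (used > cap || len > cap - used)
        return (size_t)-1;      /* would overflow: refuse instead of writing */
    memcpy(dst + used, src, len);
    return used + len;
}
```

With a 180-byte buffer, a 65-byte copy starting at offset 115 fits exactly, while the same copy starting at offset 120 is rejected. A caller that gets (size_t)-1 back could grow the buffer and retry, or truncate, depending on what the surrounding code needs.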
@cfsmp3 commented on GitHub (Apr 9, 2018):
valgrind output:
==791== Invalid write of size 1
==791== at 0x4C3275B: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==791== by 0x41C95A: ocr_bitmap (ocr.c:571)
==791== by 0x41D4BF: ocr_rect (ocr.c:815)
==791== by 0x43DE06: write_dvb_sub (dvb_subtitle_decoder.c:1664)
==791== by 0x43E104: dvbsub_handle_display_segment (dvb_subtitle_decoder.c:1713)
==791== by 0x43E583: dvbsub_decode (dvb_subtitle_decoder.c:1821)
==791== by 0x42ABCE: process_data (general_loop.c:651)
==791== by 0x42BB7F: general_loop (general_loop.c:1027)
==791== by 0x4072C3: api_start (ccextractor.c:209)
==791== by 0x407FDF: main (ccextractor.c:532)
==791== Address 0x155f8a84 is 0 bytes after a block of size 180 alloc'd
==791== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==791== by 0x41C616: ocr_bitmap (ocr.c:513)
==791== by 0x41D4BF: ocr_rect (ocr.c:815)
==791== by 0x43DE06: write_dvb_sub (dvb_subtitle_decoder.c:1664)
==791== by 0x43E104: dvbsub_handle_display_segment (dvb_subtitle_decoder.c:1713)
==791== by 0x43E583: dvbsub_decode (dvb_subtitle_decoder.c:1821)
==791== by 0x42ABCE: process_data (general_loop.c:651)
==791== by 0x42BB7F: general_loop (general_loop.c:1027)
==791== by 0x4072C3: api_start (ccextractor.c:209)
==791== by 0x407FDF: main (ccextractor.c:532)
==791==
==791== Invalid write of size 1
==791== at 0x4C3275B: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==791== by 0x41C9D5: ocr_bitmap (ocr.c:578)
==791== by 0x41D4BF: ocr_rect (ocr.c:815)
==791== by 0x43DE06: write_dvb_sub (dvb_subtitle_decoder.c:1664)
==791== by 0x43E104: dvbsub_handle_display_segment (dvb_subtitle_decoder.c:1713)
==791== by 0x43E583: dvbsub_decode (dvb_subtitle_decoder.c:1821)
==791== by 0x42ABCE: process_data (general_loop.c:651)
==791== by 0x42BB7F: general_loop (general_loop.c:1027)
==791== by 0x4072C3: api_start (ccextractor.c:209)
==791== by 0x407FDF: main (ccextractor.c:532)
==791== Address 0x155f8aa0 is 16 bytes after a block of size 192 in arena "client"
==791==
==791== Invalid write of size 1
==791== at 0x41CA18: ocr_bitmap (ocr.c:586)
==791== by 0x41D4BF: ocr_rect (ocr.c:815)
==791== by 0x43DE06: write_dvb_sub (dvb_subtitle_decoder.c:1664)
==791== by 0x43E104: dvbsub_handle_display_segment (dvb_subtitle_decoder.c:1713)
==791== by 0x43E583: dvbsub_decode (dvb_subtitle_decoder.c:1821)
==791== by 0x42ABCE: process_data (general_loop.c:651)
==791== by 0x42BB7F: general_loop (general_loop.c:1027)
==791== by 0x4072C3: api_start (ccextractor.c:209)
==791== by 0x407FDF: main (ccextractor.c:532)
==791== Address 0x155f8aa7 is 23 bytes after a block of size 192 in arena "client"
[etc]
@cfsmp3 commented on GitHub (Jan 25, 2020):
@tsmarinov Is this still happening?
@cfsmp3 commented on GitHub (Nov 21, 2021):
Closing due to original poster not responding.