mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-03 21:23:48 +00:00
Regression - Latest builds hardsubx is broken and no longer detect burned in subtitles #837
Originally created by @rboy1 on GitHub (Aug 5, 2025).
When using build 0.93 and tessdata 3.04 trained files, it's able to complete OCR and extract burned-in subtitles.
Here's the sample file: https://www.dropbox.com/scl/fi/o01wgvzhxj7rcjrcfbdet/burnedinsubs.mp4?rlkey=cuek4cwqx1h7fx6vqbalm4ijj&st=xitc3we3&dl=0
However, when using the latest Windows build and the same tessdata files, it fails to detect any burned-in subtitles (with the default option and also with the OEM 0 option).
@rboy1 commented on GitHub (Aug 5, 2025):
It's almost like it's crashing, but using the -debug flag also yields no additional information
Question - will building it without RUST revert the functionality back to 0.93 with the latest builds? Is that even possible when building for Windows via Visual Studio?
@rboy1 commented on GitHub (Aug 18, 2025):
I can confirm that building ccextractor without RUST works fine, this issue is specific to the RUST implementation
@hrideshmg commented on GitHub (Aug 30, 2025):
There have been changes to the C code since 0.93 and some of these do include bug fixes. You can compile for C only on Visual Studio by going to Properties -> C/C++ -> Preprocessor and adding the DISABLE_RUST preprocessor directive.
@rboy1 commented on GitHub (Aug 30, 2025):
I needed a single static executable so I used MinGW-w64 + MXE to cross compile and build it using the WITHOUT_RUST=ON CMakefile directive.
Can the Visual Studio version be configured to build a single static executable?
@hrideshmg commented on GitHub (Aug 30, 2025):
I don't think that is possible currently. While the binary is largely self-contained, the GPAC DLLs are externally linked, so they must be present alongside the exe. I believe this is because the dependency manager that we use for Windows (VCPKG) doesn't have GPAC in its library.
@hrideshmg commented on GitHub (Sep 2, 2025):
Hey @rboy1 could you just check if the issue is now fixed for you on Windows?
@rboy1 commented on GitHub (Sep 2, 2025):
@hrideshmg running some tests but I'm also already seeing some difference between the RUST build and the WITHOUT_RUST build.
When using OCR with DVB subtitle extraction I'm seeing differences in the output timestamps of the SRT files. Attaching 3 SRT files to compare: one using the last 0.94 release, one where ccextractor was built without RUST, and one from the latest build taken from here: https://github.com/CCExtractor/ccextractor/actions/runs/17349672445/job/49254107929?pr=1741
As you can see, the 0.94 release and the latest WITHOUT_RUST ccextractor builds create identical outputs; however, the latest build with RUST generates timestamps which are offset by -9.999 seconds.
without_rust.srt.txt
rust.srt.txt
0.94_release.srt.txt
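For anyone wanting to check an offset like this on their own files, here is a minimal Python sketch that measures the typical start-time difference between two SRT files. It is not part of ccextractor. The function names `start_times_ms` and `median_offset_ms` are made up for illustration, and it assumes both files contain the same cues in the same order, so cues can be paired by position.

```python
import re
from statistics import median

# Matches an SRT timing line such as "00:01:02,345 --> 00:01:04,000".
# Only the start time (the first four groups) is captured.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*\d+:\d{2}:\d{2}[,.]\d{3}"
)


def start_times_ms(srt_text):
    """Return the start time of every cue in milliseconds, in file order."""
    starts = []
    for match in TIMING.finditer(srt_text):
        hours, minutes, seconds, millis = (int(g) for g in match.groups())
        starts.append(((hours * 60 + minutes) * 60 + seconds) * 1000 + millis)
    return starts


def median_offset_ms(reference_text, candidate_text):
    """Median of (candidate - reference) start times, pairing cues by position.

    Assumption: both files hold the same cues in the same order. If one file
    has extra or missing cues, the pairing drifts and the result is unreliable.
    """
    pairs = list(zip(start_times_ms(reference_text), start_times_ms(candidate_text)))
    if not pairs:
        raise ValueError("no cues found in at least one input")
    return median(cand - ref for ref, cand in pairs)
```

Reading `without_rust.srt.txt` as the reference and `rust.srt.txt` as the candidate and passing both texts to `median_offset_ms` should return a value near -9999 if the offset described above is constant across the file.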
@rboy1 commented on GitHub (Sep 2, 2025):
Ok, here are the results against a test file with just the --hardsubx option and Tessdata 3.04.
Is it running now? Yes. Is it working as expected? No.
Running the latest build WITHOUT_RUST and with RUST on the same burned-in subtitles generates different SRT files. The one with RUST is generating a lot of errors while doing OCR. Attaching both files for reference.
Something in the RUST code isn't a perfect replica of the C code (unless that was the intention) and it's generating less usable output compared to the C code.
rust.srt.txt
without_rust.srt.txt
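To put a rough number on how far two OCR outputs diverge, the cue text of both files can be compared while ignoring cue numbers and timestamps. The following is a minimal Python sketch using the standard `difflib` module. The helper names `cue_texts` and `similarity` are hypothetical, and the score says nothing about which output is closer to the real subtitles; that would need a hand-checked reference.

```python
import difflib
import re

# A line that starts like "00:00:01,000 -->" is treated as an SRT timing line.
TIMING_LINE = re.compile(r"^\d+:\d{2}:\d{2}[,.]\d{3}\s*-->")


def cue_texts(srt_text):
    """Return the text of each cue, dropping index and timing lines.

    Limitation: a subtitle line made only of digits is mistaken for a cue
    index and dropped.
    """
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        text = [ln for ln in lines if not ln.isdigit() and not TIMING_LINE.match(ln)]
        if text:
            cues.append(" ".join(text))
    return cues


def similarity(a_text, b_text):
    """Ratio in [0, 1]: 1.0 means both files contain the same cue texts in order."""
    matcher = difflib.SequenceMatcher(
        None, cue_texts(a_text), cue_texts(b_text), autojunk=False
    )
    return matcher.ratio()
```

Cues are compared as whole strings, so a single misread character makes the whole cue count as different. That is deliberately strict; a character-level comparison would be more forgiving of small OCR errors.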
@hrideshmg commented on GitHub (Sep 3, 2025):
It's not meant to be a perfect replica of the C code, yes. I'm not sure what is causing the discrepancies though. I'm personally seeing mixed results with my files: some files do better on Rust but others do better on C.
@rboy1 commented on GitHub (Sep 3, 2025):
Hmm, if the code is being ported as is, then is there any reason why it shouldn't be a perfect replica? Are any thresholds or parameters being changed during the porting?
Maybe have 2 OCR detection options as a CLI parameter: one which uses the C thresholds / values and one which uses the new RUST thresholds / values. That way users can pick which one works better for them. Then we get the best of both worlds!