mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-15 05:26:07 +00:00
[BUG] A mix of 8-bit/16-bit chars sent to iconv #714
Originally created by @erankor on GitHub (Aug 24, 2022).
Necessary information
./ccextractor test.ts -svc all[UTF-16BE] -nofc -12
Video links
http://cdnapi.kaltura.com/p/2035982/playManifest/entryId/1_frxnu0yr/flavorId/1_tr3kiz6l/format/download/a.ts
Additional information
Hi all,
I have a TS file with 708 subtitles in Japanese & Chinese that fail to decode properly.
After some debugging, I found that if I patch the function write_utf16_char here - https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708_output.c#L113 to always output 2-byte chars (I changed the if to if (1)), and I specify an encoding of UTF-16BE, it decodes properly.
This code looks off to me, as it creates a mix of 8-bit & 16-bit chars with no clear encoding (it's not UTF-8 and it's not UTF-16...).
Maybe when iconv is used, the function should always output 2 byte chars?
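To make the idea concrete, here is a minimal sketch of a fixed-width writer. It is not the actual CCExtractor code: put_utf16be_unit and encode_utf16be are hypothetical names, and I am assuming each 708 character is already available as a 16-bit code unit. Every unit becomes exactly two bytes, high byte first, so the whole buffer can be handed to iconv as UTF-16BE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the CCExtractor function): always emits exactly
 * two bytes per 16-bit code unit, big-endian, so the buffer is uniformly
 * UTF-16BE. Returns the number of bytes written. */
static size_t put_utf16be_unit(uint16_t unit, unsigned char *out)
{
	assert(out != NULL);
	out[0] = (unsigned char)(unit >> 8);   /* high byte first */
	out[1] = (unsigned char)(unit & 0xFF); /* low byte second */
	return 2;
}

/* Encodes n code units into out, which must hold at least 2 * n bytes.
 * Returns the total number of bytes written. */
static size_t encode_utf16be(const uint16_t *units, size_t n, unsigned char *out)
{
	size_t len = 0;
	for (size_t i = 0; i < n; i++)
		len += put_utf16be_unit(units[i], out + len);
	return len;
}
```

With this, 'A' (0x0041) followed by a Hiragana char (0x3042) comes out as the four bytes 00 41 30 42, instead of a 1-byte char followed by a 2-byte char.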
Or, alternatively, if it used 2 bytes for ALL chars whenever ANY char doesn't fit in 1 byte, that would also be OK (but this sounds more complex to do...).
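A rough sketch of this second option, again with hypothetical names (needs_two_bytes, encode_uniform) and not taken from the CCExtractor source: scan the string once, then pick a single width for every char. I am assuming the values are Unicode code units, in which case the all-1-byte output is plain ISO-8859-1 and the 2-byte output is UTF-16BE; the caller would have to pass the matching source encoding to iconv.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if any code unit does not fit in a single byte, else 0. */
static int needs_two_bytes(const uint16_t *units, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (units[i] > 0xFF)
			return 1;
	return 0;
}

/* Hypothetical two-pass encoder: one width for the whole string.
 * out must hold at least 2 * n bytes. *wide is set to 1 when the output
 * is 2 bytes per char (big-endian) and to 0 when it is 1 byte per char.
 * Returns the number of bytes written. */
static size_t encode_uniform(const uint16_t *units, size_t n,
                             unsigned char *out, int *wide)
{
	size_t len = 0;

	assert(units != NULL && out != NULL && wide != NULL);
	*wide = needs_two_bytes(units, n);
	for (size_t i = 0; i < n; i++) {
		if (*wide)
			out[len++] = (unsigned char)(units[i] >> 8); /* high byte */
		out[len++] = (unsigned char)(units[i] & 0xFF);       /* low byte */
	}
	return len;
}
```

The extra pass and the extra flag are the added complexity I mentioned, which is why always writing 2 bytes looks like the simpler fix to me.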
Btw, VLC decodes the Japanese & Chinese properly, after changing the 'preferred closed captions decoder' setting from 608 to 708.
Thanks!
Eran