mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-14 13:35:43 +00:00
[BUG] ISDB subtitles issues with encoding of special characters #438
Originally created by @jakubvojacek on GitHub (Aug 26, 2018).
CCExtractor version (using the --version parameter preferably) : e9d2a89768f10e6d269dcd0b9245895f3899a72d
In raising this issue, I confirm the following (please check boxes, eg [X] - and delete unchecked ones):
My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):
Necessary information
ccextractor -datapid 0x116 -o test.vtt nsc.mp4
Video links (replace text below with your links)
nsc.mp4 - https://goo.gl/iiKTAQ
Additional information
Hello,
the issue is with Portuguese accented characters such as á, ã, ê, and so on. Instead of these characters, ccextractor outputs ?. Is this an encoding problem? Am I doing something wrong, or is this a bug in ccextractor? Please find below samples from the generated test.vtt file and a manually fixed version for comparison
vs
Thank you
Jakub
@anshul1912 commented on GitHub (Aug 29, 2018):
I checked the SRT file of the same video in the vim editor and was able to see the letters correctly.
@anshul1912 commented on GitHub (Aug 29, 2018):
In Notepad++, if you set the encoding to ANSI, you can see those characters correctly.
@jakubvojacek commented on GitHub (Aug 29, 2018):
@anshul1912 ccextractor should be using utf8 by default, therefore no encoding change should be required I believe. Also, extracting these special characters from other subtitles sources such as dvb subtitles or CEA608 is working perfectly, the issue is only with ISDBT source.
I only have access to Linux (we're using Debian) and Mac, so I cannot try Notepad++. I did try vim and some other editors and changed the encoding there, but that did not help.
Is it possible that ccextractor works differently on Windows than on Linux, since you can see the characters properly and I cannot?
Thank you
@anshul1912 commented on GitHub (Aug 29, 2018):
Hi
I was using vim on Ubuntu and Notepad++ on Windows.
The problem comes from the source: ISDB expects UTF-8 in its data, but ANSI-encoded text is actually present.
Because of this, editors do not interpret the characters correctly.
You can also use iconv to convert the file encoding.
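For anyone who prefers a script, here is a minimal sketch of the same conversion in Python. It assumes "ANSI" here means Windows-1252 (`cp1252`); that is a guess, since the thread does not name the exact code page, and ISO-8859-1 (`latin-1`) may be the right choice for some streams. The function and file names are illustrative only and are not part of ccextractor.

```python
from pathlib import Path


def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return `raw` as UTF-8 bytes.

    Assumption: bytes that are not valid UTF-8 are in `source_encoding`
    (cp1252 by default, which is only a guess at what "ANSI" means here).
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave it untouched
    except UnicodeDecodeError:
        pass
    # Decode from the single-byte code page, then re-encode as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Read `src`, convert its contents to UTF-8, and write them to `dst`."""
    data = Path(src).read_bytes()
    Path(dst).write_bytes(reencode_to_utf8(data, source_encoding))
```

Calling `convert_file("test.vtt", "test.utf8.vtt")` would write a UTF-8 copy of the subtitle file. The UTF-8 check comes first so that running the conversion twice does not double-encode a file that is already correct.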
Thanks
Anshul
@jakubvojacek commented on GitHub (Aug 29, 2018):
Thank you, using iconv I was able to fix the encoding.