Mirror of https://github.com/SabreTools/MPF.git (synced 2026-02-03 21:29:27 +00:00)
[Problem] Shift-JIS in cuesheet incorrectly interpreted as UTF-8, resulting in unreadable Japanese text #813
Originally created by @Daja177 on GitHub (Feb 27, 2025).
Originally assigned to: @mnadareski on GitHub.
Version
What version are you using?
Build
What runtime version are you using?
Describe the issue
When dumping an audio CD using Redumper with cuesheet metadata that includes Japanese text encoded in Shift-JIS, the cuesheet data is copied into the Redumper log as raw bytes. MPF then takes this data and interprets it as UTF-8. The result is mangled, unreadable text in the generated !submissionInfo.txt file, regardless of whether you open it as UTF-8 or Shift-JIS.
To Reproduce
Steps to reproduce the behavior:
!submissionInfo.txt file and note that it is encoded as UTF-8, so the Japanese text appears mangled and unreadable, even if you attempt to open it as Shift-JIS.
Expected behavior
MPF should detect that the cuesheet metadata is encoded in Shift-JIS and convert it properly to UTF-8 so that the Japanese text remains readable in the generated !submissionInfo.txt file.
Screenshots
Cuesheet opened as Shift-JIS: [screenshot]
MPF-generated !submissionInfo.txt: [screenshot]
MPF-generated !submissionInfo.txt opened as Shift-JIS: [screenshot]
Additional context
This issue appears challenging to fix due to the difficulty in reliably detecting Shift-JIS encoding. Since audio CDs with such metadata are rare, the bug might only affect a small subset of users, but resolving it would improve the accuracy of MPF's submissions.
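To illustrate why the output stays unreadable even when reopened as Shift-JIS, here is a minimal Python sketch using a made-up title string (not the cuesheet from this dump). It assumes the decoder replaces invalid UTF-8 bytes with U+FFFD, a common default. Once that replacement happens, the original bytes are gone, so no later choice of encoding can recover them:

```python
# Hypothetical title text; not taken from the cuesheet in this issue.
original = "日本語"

# The cuesheet stores the title as raw Shift-JIS bytes.
sjis_bytes = original.encode("shift_jis")

# The bug: treat those bytes as UTF-8. With errors="replace", every byte
# sequence that is not valid UTF-8 becomes U+FFFD, discarding the original bytes.
mangled = sjis_bytes.decode("utf-8", errors="replace")

# The output file is then written as UTF-8. Reopening it as Shift-JIS cannot
# restore the title, because U+FFFD carries no trace of the bytes it replaced.
reopened_as_sjis = mangled.encode("utf-8").decode("shift_jis", errors="replace")

print(repr(mangled))
print(reopened_as_sjis == original)  # False
```

The damage happens at decode time, which is why any fix has to pick the right encoding before the bytes are turned into text.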
Logs from this dump
@mnadareski commented on GitHub (May 1, 2025):
Mixed encodings are an absolute pain to deal with and this has, unfortunately, come up quite a few times. One of the only ways around this would be to pass along the cuesheet data without trying to read it. Unfortunately, a few things rely on being able to read the cuesheet in some ways.
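Passing cuesheet data along unread and still parsing the parts that matter are not necessarily in conflict. The following hypothetical Python sketch (the `parse_cue_bytes` name and its return shape are made up for illustration, not MPF code) splits a cuesheet into command/argument pairs on raw bytes. Cue keywords such as TRACK and INDEX are plain ASCII, and Shift-JIS trail bytes are always 0x40 or above, so splitting on line breaks and spaces never cuts a double-byte character in half. Free-text arguments like TITLE stay as undecoded bytes:

```python
from typing import List, Tuple


def parse_cue_bytes(data: bytes) -> List[Tuple[str, bytes]]:
    """Split raw cuesheet bytes into (COMMAND, argument-bytes) pairs.

    Hypothetical helper: only the ASCII command keyword is decoded; the
    argument is returned as raw bytes so its encoding never has to be guessed.
    """
    entries = []
    for raw_line in data.splitlines():  # splits on CR, LF, CRLF
        line = raw_line.strip()         # strips ASCII whitespace only
        if not line:
            continue
        command, _, argument = line.partition(b" ")
        keyword = command.decode("ascii", errors="replace").upper()
        entries.append((keyword, argument.strip()))
    return entries


# Usage with a made-up cuesheet whose title is Shift-JIS encoded.
sample = (
    b'TITLE "' + "日本語".encode("shift_jis") + b'"\r\n'
    + b"TRACK 01 AUDIO\r\n"
    + b"  INDEX 01 00:00:00\r\n"
)
entries = parse_cue_bytes(sample)
print(entries)
```

Code that only needs track and index data never has to guess an encoding this way; the TITLE bytes can be copied through verbatim or decoded later once the encoding is known.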
I don't anticipate that we will have an easy fix for this, nor one that will come quickly.
@mnadareski commented on GitHub (May 1, 2025):
To anyone who is reading this closed issue:
The fix was finding that someone had already figured out how to detect Shift-JIS within a byte array representing text. This allowed a couple of internal iterations on integrating that approach. It did mean converting all full-file reads into binary-first reads, but this does not seem to have any negative impact on performance or output.
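The comment does not name the detection technique that MPF adopted, so the following is only a hypothetical Python sketch of a binary-first read. The file is read as bytes, accepted as UTF-8 if it decodes strictly, and otherwise checked against the Shift-JIS lead/trail byte ranges (as used by CP932) before being decoded as CP932. The function names, the CP932 choice, and the Latin-1 fallback are all assumptions:

```python
def looks_like_shift_jis(data: bytes) -> bool:
    """Heuristic: True if data follows Shift-JIS (CP932) byte structure
    and contains at least one double-byte character."""
    i = 0
    saw_double_byte = False
    while i < len(data):
        b = data[i]
        if b <= 0x7F or 0xA1 <= b <= 0xDF:
            # ASCII or half-width katakana: a single-byte character.
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            # Lead byte: must be followed by a trail byte in 0x40-0x7E or 0x80-0xFC.
            if i + 1 >= len(data):
                return False
            t = data[i + 1]
            if not (0x40 <= t <= 0x7E or 0x80 <= t <= 0xFC):
                return False
            saw_double_byte = True
            i += 2
        else:
            # 0x80, 0xA0 and 0xFD-0xFF cannot start a Shift-JIS character.
            return False
    return saw_double_byte


def decode_text(data: bytes) -> str:
    """Decode bytes, preferring strict UTF-8 (BOM stripped if present)."""
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        pass
    if looks_like_shift_jis(data):
        return data.decode("cp932", errors="replace")
    # Last resort: Latin-1 never fails and maps each byte to one character.
    return data.decode("latin-1")


def read_text_file(path: str) -> str:
    """Binary-first read: load raw bytes, then choose an encoding."""
    with open(path, "rb") as f:
        return decode_text(f.read())
```

A structural check like this is a heuristic. Some byte strings satisfy both the UTF-8 and the Shift-JIS rules, so trying strict UTF-8 first and requiring at least one double-byte pair before choosing Shift-JIS reduce false positives rather than eliminate them.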