[Problem] Shift-JIS in cuesheet incorrectly interpreted as UTF-8, resulting in unreadable Japanese text #813

Closed
opened 2026-01-29 16:22:38 +00:00 by claunia · 2 comments
Owner

Originally created by @Daja177 on GitHub (Feb 27, 2025).

Originally assigned to: @mnadareski on GitHub.

Version
What version are you using?

  • Stable release (3.3.0-679b40de7c089c3705ef008fbb720f6a4eabc6f1)

Build
What runtime version are you using?

  • .NET 9.0 running on Windows 11

Describe the issue
When dumping an audio CD using Redumper with cuesheet metadata that includes Japanese text encoded in Shift-JIS, the cuesheet data is copied into the Redumper log as raw bytes. MPF then takes this data and interprets it as UTF-8, resulting in mangled, unreadable text in the generated !submissionInfo.txt file—regardless of whether you attempt to open it as UTF-8 or Shift-JIS.
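For readers who want to see why the text cannot be recovered afterwards, here is a minimal Python sketch of the failure mode described above. The sample title is made up (it is not from the disc in this report), and the use of replacement-character decoding is an assumption about how the UTF-8 read behaves, not a statement about MPF's actual code.

```python
# Hypothetical sample title; not taken from the disc in this report.
title = "サウンドトラック"

# Bytes as they would sit in a log that copied Shift-JIS metadata verbatim.
raw = title.encode("shift_jis")

# Assumption: the reader decodes as UTF-8 and substitutes U+FFFD for
# every byte sequence that is not valid UTF-8.
mangled = raw.decode("utf-8", errors="replace")

# Writing the mangled string back out as UTF-8 makes the damage permanent:
# the original lead bytes are gone, replaced by the bytes of U+FFFD.
written = mangled.encode("utf-8")

# Re-opening the written file as Shift-JIS cannot bring the text back.
reopened = written.decode("shift_jis", errors="replace")

print(repr(mangled))
print(reopened == title)
```

Once the replacement characters have been written out, the output file no longer contains the original bytes, which matches the observation that neither UTF-8 nor Shift-JIS shows readable text.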

To Reproduce
Steps to reproduce the behavior:

  1. Dump an audio CD with MPF using Redumper as the dumping program.
  2. Open the resulting Redumper log file as Shift-JIS to confirm that the Japanese text in the cuesheet is displayed correctly.
  3. Open the generated !submissionInfo.txt file and note that it is encoded as UTF-8, resulting in the Japanese text appearing mangled and unreadable—even if you attempt to open it as Shift-JIS.

Expected behavior
MPF should detect that the cuesheet metadata is encoded in Shift-JIS and convert it properly to UTF-8 so that the Japanese text remains readable in the generated !submissionInfo.txt file.

Screenshots
Cuesheet opened as Shift-JIS:
Image

MPF generated !submissionInfo.txt:
Image

MPF generated !submissionInfo.txt opened as Shift-JIS:
Image

Additional context
This issue appears challenging to fix due to the difficulty in reliably detecting Shift-JIS encoding. Since audio CDs with such metadata are rare, the bug might only affect a small subset of users, but resolving it would improve the accuracy of MPF's submissions.
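To illustrate why detection is unreliable, the sketch below checks which of a few candidate encodings can decode the same bytes without raising an error. The sample strings and the candidate list are illustrative choices, not anything MPF or Redumper uses.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode under the encoding with no error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical samples: one Shift-JIS string, one UTF-8 string.
sjis_bytes = "テスト".encode("shift_jis")
utf8_bytes = "café".encode("utf-8")

candidates = ["utf-8", "shift_jis", "cp1252", "latin-1"]

# Encodings that accept each byte string without complaint.
sjis_ok = [enc for enc in candidates if decodes_cleanly(sjis_bytes, enc)]
utf8_ok = [enc for enc in candidates if decodes_cleanly(utf8_bytes, enc)]

print("Shift-JIS bytes decode cleanly as:", sjis_ok)
print("UTF-8 bytes decode cleanly as:", utf8_ok)
```

"Decodes without error" is therefore not proof of the right encoding: the Shift-JIS bytes are also accepted by single-byte Western encodings, and the UTF-8 bytes are accepted by Shift-JIS because they happen to land in its half-width katakana range. Pure ASCII text is valid under all of them.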

Logs from this dump: https://github.com/user-attachments/files/19012552/Kandagawa.Jet.Girls.DX.Jet.Pack.Sound.Track.Disc1_logs.zip
claunia added the help wanted and bug labels 2026-01-29 16:22:38 +00:00
Author
Owner

@mnadareski commented on GitHub (May 1, 2025):

Mixed encodings are an absolute pain to deal with and this has, unfortunately, come up quite a few times. One of the only ways around this would be to pass along the cuesheet data without trying to read it. Unfortunately, a few things rely on being able to read the cuesheet in some ways.

I don't anticipate that we will have an easy fix for this, nor one that will come quickly.

Author
Owner

@mnadareski commented on GitHub (May 1, 2025):

To anyone who is reading this closed issue:

The fix came from finding that someone had already figured out how to detect Shift-JIS within a byte array representing text. That approach was integrated over a couple of internal iterations. As a side effect, all full-file reads were converted to binary-first reads (the raw bytes are read first and decoded afterwards), but this does not seem to have any negative impact on performance or output.
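For anyone curious what a binary-first read looks like in general, here is a rough Python sketch. It is not MPF's implementation (MPF is a .NET project and uses a dedicated Shift-JIS detection routine that is not shown in this thread); the function names and the simple "strict UTF-8 first, then Shift-JIS" fallback are assumptions made for illustration only.

```python
from pathlib import Path

def decode_text(data: bytes) -> tuple[str, str]:
    """Return (text, encoding_used) for raw bytes of unknown encoding.

    Hypothetical helper: tries strict UTF-8 (with or without a BOM) first,
    because multi-byte Shift-JIS text is rarely valid UTF-8, then falls
    back to strict Shift-JIS.
    """
    for encoding in ("utf-8-sig", "shift_jis"):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be read and mark the rest with U+FFFD.
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"

def read_text_binary_first(path: Path) -> str:
    """Read the whole file as bytes, then decide how to decode it."""
    text, _ = decode_text(path.read_bytes())
    return text

# Usage with a made-up cuesheet line encoded as Shift-JIS.
sample = 'TITLE "テスト"'.encode("shift_jis")
text, used = decode_text(sample)
print(used, text)
```

The point of reading bytes first is that the decoding decision is made once, on the raw data, before any lossy conversion has happened. A real detector needs to be stricter than this fallback, since some byte sequences decode cleanly under more than one encoding.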


Reference: SabreTools/MPF#813