[PR #2035] fix mkvlang_params_check: prevent panic on multi-byte characters #2842

Closed
opened 2026-01-29 17:24:11 +00:00 by claunia · 0 comments
Owner

Original Pull Request: https://github.com/CCExtractor/ccextractor/pull/2035

State: closed
Merged: Yes


In raising this pull request, I confirm the following (please check boxes):

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that another pull request for this purpose does not exist.
  • [x] I have considered, and confirmed that this submission will be valuable to others.
  • [x] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • [x] I give this submission freely, and claim no ownership to its content.
  • [x] I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [ ] I absolutely love CCExtractor, but have not contributed previously.
  • [x] I am an active contributor to CCExtractor.

Description

The mkvlang_params_check function validates language codes in MKV files. The previous implementation assumed one byte per character and used string indices derived from to_lowercase(). This caused panics on multi-byte characters (like ç): Rust string slices are indexed by byte, and slicing panics when a bound is past the end of the string or falls inside a multi-byte character.

Problem

  • Multi-byte characters break the original index-based logic.
  • Inputs like "ç,eng" would trigger a runtime panic.
  • Standard ISO codes (like "chi,eng") worked, but any non-ASCII input was unsafe.
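
To make the failure mode concrete, here is a minimal sketch of how byte length and character count diverge for non-ASCII input. The helper names `byte_and_char_len` and `first_bytes` are hypothetical and are not taken from the CCExtractor source; the sketch uses only standard `str` methods and does not reproduce the original code.

```rust
/// Returns (byte length, character count) for a string slice.
/// Hypothetical helper, for illustration only.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Takes the first `n` bytes of `s` without panicking.
/// Returns `None` when `n` is past the end of the string or falls inside
/// a multi-byte character. The indexing form `&s[..n]` would panic in
/// exactly those cases.
fn first_bytes(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}
```

For the input "ç,eng", `byte_and_char_len` reports 6 bytes but only 5 characters, because ç occupies two bytes in UTF-8. `first_bytes` returns `None` for `n = 1`, which is where an indexing slice would have panicked.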

Fix

  • Split the input string on commas using lang.split(',').
  • Count characters with chars().count() instead of relying on byte indices.
  • Enforce the same rules as before: each code must be 3–6 characters, and 6-character codes must include a hyphen (-).
  • Handle non-ASCII input safely instead of panicking.
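
The steps above can be sketched roughly as follows. This is not the merged code: the function name `check_mkvlang` and the `LangError` type are hypothetical, and the sketch returns a `Result` where the description says the real function raises a controlled fatal error. It implements only the rules as stated in this description.

```rust
/// Hypothetical error type for this sketch; the real function reports a
/// fatal error instead of returning a value.
#[derive(Debug, PartialEq)]
enum LangError {
    /// The code is shorter than 3 or longer than 6 characters.
    BadLength(String),
    /// The code is 6 characters long but contains no hyphen.
    MissingHyphen(String),
}

/// Validates a comma-separated list of language codes.
/// Lengths are counted in characters, never in bytes, so multi-byte
/// input cannot cause an out-of-bounds or mid-character slice.
fn check_mkvlang(lang: &str) -> Result<(), LangError> {
    for code in lang.split(',') {
        let n = code.chars().count();
        if !(3..=6).contains(&n) {
            return Err(LangError::BadLength(code.to_string()));
        }
        if n == 6 && !code.contains('-') {
            return Err(LangError::MissingHyphen(code.to_string()));
        }
    }
    Ok(())
}
```

Under these assumptions, `check_mkvlang("chi,eng")` returns `Ok(())`, and so does a 6-character hyphenated code such as "fre-ca" (an illustrative value, not taken from the description). The input "ç,eng" yields `Err(LangError::BadLength(..))` because "ç" is a single character. No string is ever sliced, so there is nothing to panic on.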

Example after fix

  • "chi,eng" → passes validation (3 chars each)
  • "chi-tra,eng" → passes validation (hyphenated code; note that "chi-tra" is 7 characters, not 6)
  • "ç,eng" → triggers a controlled fatal error instead of a panic ("ç" is 1 character, below the 3-character minimum)
claunia added the pull-request label 2026-01-29 17:24:11 +00:00
Reference: starred/ccextractor#2842