[PR #1733] [MERGED] fix: unicode encoding regression #2452

opened 2026-01-29 17:22:13 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/CCExtractor/ccextractor/pull/1733
Author: @hrideshmg
Created: 8/19/2025
Status: Merged
Merged: 8/24/2025
Merged by: @prateekmedia

Base: master ← Head: unicode_fix


📝 Commits (1)

  • 1ce23ab fix: unicode encoding regression

📊 Changes

1 file changed (+2 additions, -2 deletions)

📝 src/rust/lib_ccxr/src/util/encoding.rs (+2 -2)

📄 Description

In raising this pull request, I confirm the following (please check boxes):

  • [ ] I have read and understood the contributors guide.
  • [ ] I have checked that another pull request for this purpose does not exist.
  • [ ] I have considered, and confirmed that this submission will be valuable to others.
  • [ ] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • [ ] I give this submission freely, and claim no ownership to its content.
  • [x] I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [ ] I absolutely love CCExtractor, but have not contributed previously.
  • [x] I am an active contributor to CCExtractor.

CCExtractor currently produces incorrect output when subtitles are encoded as Unicode (by passing --unicode). This is a regression currently reported by the sample platform under the options category.

The problem seems to be caused by an ordering difference between the C and Rust enums. The C enum (https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_common_constants.h#L222) has the Unicode entry at position 0, but the Rust enum (https://github.com/CCExtractor/ccextractor/blob/master/src/rust/lib_ccxr/src/util/encoding.rs#L43) has it at position 3.

I'm not exactly sure why this change fixes the issue: we use explicit match-by-value statements when converting between Rust and C, so the ordering shouldn't make a difference, but it does yield the correct output nonetheless. I'm guessing it's some bindgen quirk.
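
One general way a declaration-order difference can matter, even when explicit matches exist, is if a raw discriminant crosses the FFI boundary somewhere without going through the match. The sketch below illustrates that failure mode only. It is not taken from the CCExtractor sources: every type and function name is hypothetical, the non-Unicode variants are placeholders, and the only detail drawn from the description is that the Unicode entry sits at position 0 on the C side and position 3 on the Rust side.

```rust
// Hypothetical stand-ins for illustration; not the actual CCExtractor definitions.

/// Stand-in for the C enum: Unicode is the first entry, so its value is 0.
#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CEncoding {
    Unicode = 0,
    Latin1 = 1,
    Utf8 = 2,
    Ascii = 3,
}

/// Stand-in for a Rust enum declared in a different order.
/// Without explicit values, discriminants follow declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RustEncoding {
    Ascii,   // discriminant 0
    Latin1,  // discriminant 1
    Utf8,    // discriminant 2
    Unicode, // discriminant 3
}

/// Explicit match by value: the result does not depend on declaration order.
fn to_c_by_match(e: RustEncoding) -> CEncoding {
    match e {
        RustEncoding::Ascii => CEncoding::Ascii,
        RustEncoding::Latin1 => CEncoding::Latin1,
        RustEncoding::Utf8 => CEncoding::Utf8,
        RustEncoding::Unicode => CEncoding::Unicode,
    }
}

/// The integer that would be seen if the Rust value were passed on as-is.
fn raw_value(e: RustEncoding) -> u32 {
    e as u32
}

/// How the C side would interpret a raw integer.
fn c_from_raw(v: u32) -> Option<CEncoding> {
    match v {
        0 => Some(CEncoding::Unicode),
        1 => Some(CEncoding::Latin1),
        2 => Some(CEncoding::Utf8),
        3 => Some(CEncoding::Ascii),
        _ => None,
    }
}
```

With these definitions, `to_c_by_match(RustEncoding::Unicode)` gives `CEncoding::Unicode`, while `c_from_raw(raw_value(RustEncoding::Unicode))` gives `Some(CEncoding::Ascii)`, because the raw value 3 means something different on the C side. Declaring the Rust variants in the C order would make both paths agree. Whether any such raw path actually exists in CCExtractor is not established here.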


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 17:22:13 +00:00

Reference: starred/ccextractor#2452