mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-03 21:23:48 +00:00
[PR #1671] [MERGED] [FIX] Issue#1665 Enhanced Matroska Language Tag Handling #2376
📋 Pull Request Information
Original PR: https://github.com/CCExtractor/ccextractor/pull/1671
Author: @RemZapCypher
Created: 3/1/2025
Status: ✅ Merged
Merged: 3/23/2025
Merged by: @cfsmp3
Base: master ← Head: mkv-IETF-tag-fix
📝 Commits (3)
dbc6a6d fix unknown element for IETF tag
e74e00a added documentation changes
54e07c4 added formatting for clang-format
📊 Changes
3 files changed (+95 additions, -27 deletions)
📝 docs/CHANGES.TXT (+1 -0)
📝 src/lib_ccx/matroska.c (+92 -27)
📝 src/lib_ccx/matroska.h (+2 -0)
📄 Description
In raising this pull request, I confirm the following:
My familiarity with the project is as follows:
Description
This PR improves the handling of language tags in the Matroska parser. It addresses issue #1665, in which IETF BCP47 language tags (e.g., "en-US") were not processed correctly, leading to potential segmentation faults and inaccurate subtitle extraction.
The Initial Problem: Modern MKV Files and IETF Language Tags
Modern Matroska (MKV) files increasingly use IETF BCP47 language tags to identify subtitle tracks. These tags offer more precision than the traditional 3-letter ISO 639-2 codes, allowing regional variations, scripts, and other linguistic details to be specified (e.g., en-GB for British English, es-MX for Mexican Spanish).
The existing parser was designed primarily for the older 3-letter codes and did not fully account for IETF tags. As a result, it failed to identify and use them correctly, leading to issues such as the potential segmentation faults and inaccurate subtitle extraction reported in issue #1665.
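To illustrate why a parser built around 3-letter codes does not handle these tags automatically, here is a minimal sketch in C. It is not taken from the PR: primary_subtag() is a hypothetical helper that extracts the part of a BCP47 tag before the first hyphen. For "en-GB" that part is "en", which is not the 3-letter code "eng", so a plain string comparison against the older codes will not match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of CCExtractor): copy the primary language
 * subtag of a BCP47 tag into `out`, e.g. "en-GB" -> "en".
 * Returns the length of the primary subtag, or 0 if it does not fit in `out`.
 */
static size_t primary_subtag(const char *tag, char *out, size_t out_size)
{
	assert(tag != NULL && out != NULL && out_size > 0);

	/* The primary subtag ends at the first '-' or at the end of the string. */
	size_t len = strcspn(tag, "-");
	if (len >= out_size) {
		out[0] = '\0';
		return 0;
	}
	memcpy(out, tag, len);
	out[len] = '\0';
	return len;
}
```

Calling primary_subtag("es-MX", buf, sizeof buf) with an 8-byte buffer leaves "es" in buf. A tag with no hyphen, such as "eng", is copied unchanged.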
Summary of Changes
The assignment sub_track->lang_ietf = lang_ietf; is performed during subtitle track creation to ensure IETF language tags are properly stored in the matroska_sub_track structure.
generate_filename_from_track() now prioritizes IETF language tags when available, creating more descriptive and accurate filenames.
matroska_save_all() now first attempts to match against IETF language tags before falling back to 3-letter ISO 639 codes, improving language selection accuracy.
Memory management of the lang_ietf field was fixed to prevent memory leaks and segmentation faults.
This enhancement is crucial for:
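The "IETF tag first, 3-letter code as fallback" order described in the summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from matroska.c: struct sketch_track is a simplified stand-in for matroska_sub_track, both function names are hypothetical, and the exact, case-sensitive strcmp() comparison is an assumption about how matching might be done.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for matroska_sub_track; only the two language fields. */
struct sketch_track {
	const char *lang;      /* 3-letter ISO 639 code, e.g. "eng" */
	const char *lang_ietf; /* IETF BCP47 tag, e.g. "en-US"; NULL if absent */
};

/* Language used in output filenames: the IETF tag when present, else the ISO code. */
static const char *sketch_filename_lang(const struct sketch_track *t)
{
	assert(t != NULL);
	return (t->lang_ietf != NULL && t->lang_ietf[0] != '\0') ? t->lang_ietf : t->lang;
}

/* Returns 1 if the track matches the requested language, trying the IETF tag first. */
static int sketch_track_matches(const struct sketch_track *t, const char *wanted)
{
	assert(t != NULL && wanted != NULL);
	if (t->lang_ietf != NULL && strcmp(t->lang_ietf, wanted) == 0)
		return 1;
	/* Fall back to the 3-letter ISO 639 code. */
	return t->lang != NULL && strcmp(t->lang, wanted) == 0;
}
```

With a track whose fields are {"eng", "en-US"}, both "en-US" and "eng" select the track, and the filename language is "en-US". A track with no IETF tag, such as {"spa", NULL}, behaves as before and is selected only by "spa". The NULL checks reflect the point about segmentation faults: the IETF field is optional and must not be dereferenced unconditionally.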
How Has This Been Tested?
Thank you,
Tank0nf.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.