[PR #2008] fix(matroska): abort parsing on invalid EBML ID to prevent infinite loop #2811

Closed
opened 2026-01-29 17:24:01 +00:00 by claunia · 0 comments

Original Pull Request: https://github.com/CCExtractor/ccextractor/pull/2008

State: closed
Merged: No


[FIX] matroska: Abort parsing on invalid element ID (0xFFFFFFFF) to prevent infinite loops

In raising this pull request, I confirm the following (please check boxes):

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that another pull request for this purpose does not exist.
  • [x] I have considered, and confirmed that this submission will be valuable to others.
  • [x] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • [x] I give this submission freely, and claim no ownership to its content.
  • [ ] I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [x] I absolutely love CCExtractor, but have not contributed previously.
  • [ ] I am an active contributor to CCExtractor.

Summary

This PR fixes a critical regression where the legacy C Matroska parser (matroska.c) enters an infinite loop when encountering invalid EBML IDs (specifically 0xFFFFFFFF) or EOF conditions inside segment/cluster loops.

The Issue

In parse_segment, parse_segment_cluster, and related functions, the loop structure was:

  1. Read the EBML ID.
  2. If the ID is unknown, log a warning and call skip_bytes.
  3. If the ID is 0xFFFFFFFF (EOF/error), skip_bytes skips 0 bytes, so the read position does not advance.
  4. The loop repeats indefinitely, consuming 100% CPU and flooding the logs.
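
The stall described in the steps above can be modelled with a small, self-contained sketch. It is not the code from matroska.c: the reader struct, the `sketch_` helper names, the one-byte IDs, and the iteration cap are all assumptions made for illustration, and only the 0xFFFFFFFF sentinel comes from this PR.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID_ID 0xFFFFFFFFu /* sentinel value named in this PR */

/* Hypothetical in-memory reader; stands in for the real file handling. */
struct sketch_reader {
    const uint8_t *data;
    size_t len;
    size_t pos;
};

/* Returns a one-byte "ID", or the sentinel once the input is exhausted. */
static uint32_t sketch_read_id(struct sketch_reader *r)
{
    if (r->pos >= r->len)
        return SKETCH_INVALID_ID;
    return r->data[r->pos++];
}

/* Skips up to n bytes; at end of input there is nothing left to skip. */
static void sketch_skip_bytes(struct sketch_reader *r, size_t n)
{
    size_t left = r->len - r->pos;
    r->pos += (n < left) ? n : left;
}

/*
 * Unguarded loop shape, capped at max_iter so the sketch terminates.
 * Returns how many iterations left the read position unchanged.
 */
static size_t sketch_count_stalled_iterations(struct sketch_reader *r,
                                              size_t max_iter)
{
    size_t stalled = 0;
    for (size_t i = 0; i < max_iter; i++) {
        size_t before = r->pos;
        uint32_t id = sketch_read_id(r);
        (void)id;                /* every ID counts as "unknown" here */
        sketch_skip_bytes(r, 1); /* pretend each payload is one byte */
        if (r->pos == before)
            stalled++;           /* no progress: sentinel + 0-byte skip */
    }
    return stalled;
}
```

Once the input is exhausted, every further iteration reads the sentinel and skips zero bytes, so without the cap the loop would never exit.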

The Fix

This PR adds an explicit check for code == 0xFFFFFFFF in the parsing loops. When the check fires, the parser treats the ID as invalid, logs a specific error message, and breaks out of the loop gracefully instead of retrying.
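
A minimal sketch of the guarded loop shape follows. It is not the actual diff: the `demo_` names, the in-memory input struct, the fixed four-byte IDs, and the log text are hypothetical, and real EBML elements also carry a size and payload that this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_INVALID_ID 0xFFFFFFFFu /* sentinel value named in this PR */

/* Hypothetical in-memory input; stands in for the real file handling. */
struct demo_input {
    const uint8_t *data;
    size_t len;
    size_t pos;
};

/*
 * Reads a 4-byte big-endian ID. Returns the sentinel, without advancing,
 * if fewer than 4 bytes remain (the truncated-file case).
 */
static uint32_t demo_read_id4(struct demo_input *in)
{
    if (in->len - in->pos < 4)
        return DEMO_INVALID_ID;
    const uint8_t *d = in->data + in->pos;
    in->pos += 4;
    return ((uint32_t)d[0] << 24) | ((uint32_t)d[1] << 16) |
           ((uint32_t)d[2] << 8) | (uint32_t)d[3];
}

/* Returns the number of IDs consumed, or -1 if parsing was aborted. */
static int demo_parse_guarded(struct demo_input *in)
{
    int elements = 0;
    while (in->pos < in->len) {
        uint32_t code = demo_read_id4(in);
        if (code == DEMO_INVALID_ID) {
            /* Placeholder message; the PR's exact wording is not shown. */
            fprintf(stderr, "invalid EBML ID, aborting parse (sketch)\n");
            return -1;
        }
        elements++; /* a real parser would dispatch on the ID here */
    }
    return elements;
}
```

Without the sentinel check, a truncated input would leave `pos < len` true forever, because the failed read never advances the position.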

Verification

Tested against corrupted/truncated MKV samples that previously caused hangs.

  • Before: Infinite loop; the process hangs. (Screenshot attached to the original PR.)

  • After: Logs "Invalid EBML ID... Aborting segment parsing" and exits successfully. (Two screenshots attached to the original PR.)

claunia added the pull-request label 2026-01-29 17:24:01 +00:00
Reference: starred/ccextractor#2811