[PR #56] [MERGED] Extract Ms-cabs while reading instead of loading all datablocks into memory #91

claunia · 2026-01-29T21:16:58Z

📋 Pull Request Information

Original PR: https://github.com/SabreTools/SabreTools.Serialization/pull/56
Author: @HeroponRikiBestest
Created: 1/15/2026
Status: ✅ Merged
Merged: 1/24/2026
Merged by: @mnadareski

Base: main ← Head: mscab-read-extract

📝 Commits (10+)

481e85d WIP
962167a WIP 2
25c29eb Todo: you're missing a read somehow and getting misaligned by two bytes? maybe properly implementing the buffer will magically fix it
91e2c48 continued blocks my behated
7e66a9f Pre-major-testing
ebfa57d Forgot to add summaries
15d9c27 Attempt to properly roll back. The state i wanted to roll back to wasn't in a commit before.
b8be815 Figured out the issue with the rolled back commit, this has to be a while loop because of 0 byte files. Reimplemented clean code.
631b46b Comment so I don't forget why it's like this
b2302c1 Skip unsupported compression types before opening filestream.

📊 Changes

3 files changed (+348 additions, -318 deletions)

📝 SabreTools.Serialization/Readers/MicrosoftCabinet.cs (+4 -4)
📝 SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs (+324 -137)
📝 SabreTools.Serialization/Wrappers/MicrosoftCabinet.cs (+20 -177)

📄 Description

Also fixes https://github.com/SabreTools/SabreTools.Serialization/issues/44 somehow.

While this is marked as draft, it's basically done. The remaining work is to add failsafes so that as much extraction as possible continues when errors occur, fix some edge cases (at present I know of only one regression, a single file in one cab set; the other issues seem to have already been present), and convert most of my TODOs into PR comments. I mentioned before that I wanted to split up the main extraction loop, but trying to do so resulted in some really ugly code because of how much state needs to be shared, so I went back on that. I'm perfectly fine with splitting things up if that's still requested, though.
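For readers unfamiliar with the idea in the title, here is a minimal sketch of extracting while reading rather than loading every data block first. It is not the code from this PR: it is written in Python rather than C#, the function names (`iter_blocks`, `extract_streaming`, `make_stream`) are hypothetical, and the block layout (a 2-byte little-endian length followed by the payload) is a simplification that stands in for the real cabinet data block header. Decompression is left out entirely.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_blocks(src: BinaryIO) -> Iterator[bytes]:
    """Yield one payload at a time from a length-prefixed stream.

    Hypothetical layout (NOT the real cabinet data block header): each
    block is a 2-byte little-endian length followed by that many bytes.
    """
    while True:
        header = src.read(2)
        if len(header) == 0:
            return  # clean end of stream
        if len(header) < 2:
            raise ValueError("truncated block header")
        (size,) = struct.unpack("<H", header)
        payload = src.read(size)
        if len(payload) < size:
            raise ValueError("truncated block payload")
        yield payload


def extract_streaming(src: BinaryIO, dst: BinaryIO) -> int:
    """Write each payload to dst as soon as it is read; return bytes written."""
    written = 0
    for payload in iter_blocks(src):
        dst.write(payload)  # only the current block is held in memory
        written += len(payload)
    return written


def make_stream(payloads) -> io.BytesIO:
    """Build an in-memory stream in the hypothetical layout, for testing."""
    return io.BytesIO(b"".join(struct.pack("<H", len(p)) + p for p in payloads))
```

Because `iter_blocks` is a generator, peak memory is bounded by the largest single block rather than by the total size of all blocks, which is the trade-off the title describes.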

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 21:16:58 +00:00

Reference: SabreTools/SabreTools.Serialization#91