Archive compatibility and performance getting worse with releases #472

Closed
opened 2026-01-29 22:12:35 +00:00 by claunia · 3 comments
Owner

Originally created by @schouffy on GitHub (Sep 11, 2021).

First I'd like to thank Adam and all the contributors for the library, I've been using it for years and it's been a big factor in the success of the comic reader I'm developing.

Recently I tried to support RAR5 files in the app, and it proved trickier than expected. The library has supported RAR5 since 0.21, but I was stuck at 0.12.4 for several reasons.

I have a set of comic book files that I know can be tricky for libraries to open, so I always test library updates against this set.
I have a testing tool that loops through the files, finds the images within them, extracts them and displays them in a test UI. It keeps track of which files opened and which errored, and also measures performance.

When trying to update from 0.12.4, I've come across the following issues:

  • 0.13.0 broke the support for some files which worked in 0.12.4, and also killed performance (up to 5 or 6 times slower for some of the archives)
  • 0.21.0 added support for RAR5, but broke support for more of my test archives (still working in 0.20.0), now indicating "file crc mismatch"

If I didn't specify any other version (I've tried up to 0.28.3), it's because they don't improve or break anything regarding file support, and performance doesn't change noticeably.

I should mention that all of the archives seem clean, and don't give any error when tested with WinRAR 6.0.2 or 7-Zip 19.

I tried to step through the source code, but I hope someone can help before I dive in more because frankly, that may be too much of a challenge for me.

I can put together a repro project and all of the archives, if I get a sign of interest here.

Thanks again

claunia added the question label 2026-01-29 22:12:35 +00:00

@adamhathcock commented on GitHub (Sep 12, 2021):

Having tests and examples will be useful.

Frankly, this project's scope is huge mainly due to the fact that file formats are hard. Zip is a minefield but straightforward. Tar has so much legacy I don't know where to begin. Rar is okay but hard to track down. 7Zip is just gross.

Then there are the compression algorithms themselves, which I know nothing about and don't even really try. Encryption and CRC are really afterthoughts for me. I've attempted to make everything async in a branch (and also converted to Span/Memory usage), and maybe I was too ambitious with the changes, because half the tests don't currently pass.

I'd be interested in seeing why 0.13 broke some stuff as well as slowed down.

My biggest issue is time between work and a young family. I also don't have the passion I used to. Though, I still play around with Blazor based comic readers when I get a chance.

I'm willing to help with questions via issues or even schedule a Google Meet or something if people want. I'd like this project to be more than just myself, but after more than a decade, help has been sporadic and little. That's fine, it's the nature of open source. I just hoped to have a few maintainers by now.


@schouffy commented on GitHub (Sep 15, 2021):

Man... Do I feel stupid...

When I migrated my code after the refactoring of 0.13, I opened my archives through ReaderFactory.Open (as the same code is used to open zip/rar/tar/7z/cbr/cbz/cbt/cb7 in my app).
When trying to put together a repro project to send here, I used RarArchive.Open "just to try it out", and everything works flawlessly using RarArchive. All the files pass and the performance is roughly the same as 0.12.4.

I'm really sorry for the noise. Isn't it expected that performance and compatibility should be the same using the ReaderFactory though?

Thanks for taking the time to reply, and again many thanks for your hard work. I hope you didn't take offense at my initial message, and that I didn't appear rude, as I really do appreciate the value of what you do here.


@adamhathcock commented on GitHub (Sep 20, 2021):

No offense taken at all. People are coming from all types of backgrounds and you get a thick skin after a while in issues. I just try to assume people are usually nice and genuine.

On ReaderFactory vs ArchiveFactory: readers work assuming that the stream is forward-only, so there is no random access. This means all bytes have to be read from the stream, since it might be a network stream. Archives assume there is random access (e.g. a file), so you can skip all over the stream with no issue. Usually people want ArchiveFactory unless they're doing stuff over a network or with a very, very large file.
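
For anyone landing here later, the forward-only vs random-access difference can be illustrated outside SharpCompress. The sketch below is only an analogy using Python's standard `tarfile` module, not SharpCompress code: mode `"r|"` treats the source as a forward-only stream (roughly the reader model), while `"r:"` assumes a seekable source (roughly the archive model). The names `CountingBytesIO`, `build_tar` and `read_entry` are made up for this illustration.

```python
import io
import tarfile

PAYLOAD_SIZE = 100_000  # size of each fake "page" in the test archive


class CountingBytesIO(io.BytesIO):
    """In-memory file that counts how many bytes are actually read from it."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_tar(names):
    """Build an uncompressed tar in memory with one fixed-size entry per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = PAYLOAD_SIZE
            tar.addfile(info, io.BytesIO(b"x" * PAYLOAD_SIZE))
    return buf.getvalue()


def read_entry(data, target, mode):
    """Extract one entry and report how many source bytes had to be read.

    mode "r|" = forward-only stream, mode "r:" = seekable (random access).
    """
    source = CountingBytesIO(data)
    payload = None
    with tarfile.open(fileobj=source, mode=mode) as tar:
        for member in tar:
            if member.name == target:
                payload = tar.extractfile(member).read()
                break
    return payload, source.bytes_read


if __name__ == "__main__":
    data = build_tar(["page1.jpg", "page2.jpg", "page3.jpg"])
    _, streamed = read_entry(data, "page3.jpg", "r|")
    _, seeked = read_entry(data, "page3.jpg", "r:")
    print(f"forward-only read {streamed} bytes, random access read {seeked} bytes")
```

In forward-only mode, every byte before the wanted entry has to pass through the reader. In seekable mode, only the headers and the wanted entry are read. The slowdown reported above has the same shape, although the actual cost inside SharpCompress depends on the format and the code path.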

Reference: starred/sharpcompress#472