mirror of
https://github.com/adamhathcock/sharpcompress.git
synced 2026-02-07 13:44:36 +00:00
Archive compatibility and performance getting worse with releases #472
Originally created by @schouffy on GitHub (Sep 11, 2021).
First I'd like to thank Adam and all the contributors for the library, I've been using it for years and it's been a big factor in the success of the comic reader I'm developing.
Recently I tried to support RAR5 files in the app, and it is proving trickier than expected. The library has supported RAR5 since 0.21, but I was stuck at 0.12.4 for several reasons.
I have a set of files that I know can be tricky for libraries to open, so I always test library updates against this set of comic books.
I have a testing tool that loops through the files, finds the images within them, extracts them and displays them in a test UI. It keeps track of which files opened and which errored, and it also measures performance.
When trying to update from 0.12.4, I've come across the following issues:
If I didn't mention a version (I've tried up to 0.28.3), it's because it neither improves nor breaks anything regarding file support, and performance doesn't change noticeably.
I should mention that all of the archives seem clean, and don't give any error when tested by winrar 6.0.2 or 7-zip 19.
I tried to step through the source code, but I hope someone can help before I dive in more because frankly, that may be too much of a challenge for me.
I can put together a repro project and all of the archives, if I get a sign of interest here.
Thanks again
@adamhathcock commented on GitHub (Sep 12, 2021):
Having tests and examples will be useful.
Frankly, this project's scope is huge mainly due to the fact that file formats are hard. Zip is a minefield but straightforward. Tar has so much legacy I don't know where to begin. Rar is okay but hard to track down. 7Zip is just gross.
Then there are the compression algorithms themselves, which I know nothing about and don't even really try. Encryption and CRC are really afterthoughts for me. I've attempted to make everything async in a branch and maybe I was too ambitious with the changes (also converted to Span/Memory usage) but half the tests don't currently pass.
I'd be interested in seeing why 0.13 broke some stuff as well as slowed down.
My biggest issue is time between work and a young family. I also don't have the passion I used to. Though, I still play around with Blazor based comic readers when I get a chance.
I'm willing to help with questions via issues or even schedule a Google Meet or something if people want. I'd like this project to be more than just myself but after more than a decade, help has been sporadic and little. That's fine, it's the nature of open source. I just hoped to have a few maintainers by now.
@schouffy commented on GitHub (Sep 15, 2021):
Man... Do I feel stupid...
When I migrated my code after the refactoring of 0.13, I opened my archives through ReaderFactory.Open (as the same code is used to open zip/rar/tar/7z/cbr/cbz/cbt/cb7 in my app).
When trying to put together a repro project to send it here, I used RarArchive.Open "just to try out". And everything works flawlessly using RarArchive. All the files pass and the performance is roughly the same as 0.12.4.
I'm really sorry for the noise. Isn't it expected that performance and compatibility should be the same using the ReaderFactory though?
Thanks for taking the time to reply, and again many thanks for your hard work. I hope you didn't take offense at my initial message, and that I didn't appear rude, as I really do appreciate the value of what you do here.
@adamhathcock commented on GitHub (Sep 20, 2021):
No offense taken at all. People are coming from all types of backgrounds and you get a thick skin after a while in issues. I just try to assume people are usually nice and genuine.
On ReaderFactory vs ArchiveFactory: readers work assuming that the stream is forward-only, so there is no random access. This means all bytes have to be read from the stream, since it might be a network stream. Archives assume there is random access (e.g. a file), so you can skip all over the stream with no issue. Usually people want ArchiveFactory unless they're working over a network or with a very large file.
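The forward-only versus random-access distinction can be sketched outside of SharpCompress. The following is only an analogy using Python's standard `tarfile` module, not SharpCompress code: the stream mode `"r|"` stands in for a reader and the seekable mode `"r:"` stands in for an archive. The names `ForwardOnlyStream`, `CountingFile`, `build_tar` and the `page001.jpg`/`page002.jpg` entries are made up for illustration. It counts how many bytes each approach pulls from the underlying stream to fetch one small entry that sits behind a large one.

```python
import io
import tarfile


class ForwardOnlyStream:
    """Hypothetical stand-in for a network stream: read() only, no seek/tell."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self._inner.read(size)
        self.bytes_read += len(chunk)
        return chunk


class CountingFile(io.BytesIO):
    """Seekable in-memory file that counts the bytes handed out by read()."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_tar(files):
    """Build an uncompressed tar archive in memory from a name -> bytes mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_entry_forward_only(data, wanted):
    """Reader-style access: walk the entries in order over a non-seekable stream."""
    stream = ForwardOnlyStream(data)
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            if member.name == wanted:
                return tar.extractfile(member).read(), stream.bytes_read
    raise KeyError(wanted)


def read_entry_random_access(data, wanted):
    """Archive-style access: seek past the payloads that are not needed."""
    f = CountingFile(data)
    with tarfile.open(fileobj=f, mode="r:") as tar:
        payload = tar.extractfile(tar.getmember(wanted)).read()
    return payload, f.bytes_read


# A large first entry followed by the small entry we actually want.
data = build_tar({"page001.jpg": b"\0" * 1_000_000, "page002.jpg": b"small page"})
forward_payload, forward_bytes = read_entry_forward_only(data, "page002.jpg")
random_payload, random_bytes = read_entry_random_access(data, "page002.jpg")
print("forward-only bytes read: ", forward_bytes)
print("random-access bytes read:", random_bytes)
```

Both calls return the same payload, but the forward-only path has to consume the large first entry to reach the second, while the seekable path only reads the headers and the requested data. That is the same trade-off described above, and a likely reason a reader-based approach feels slower on local files.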