mirror of
https://github.com/adamhathcock/sharpcompress.git
synced 2026-02-03 21:23:38 +00:00
GZipped Excel xlsx files choke #232
Originally created by @mklaber on GitHub (Aug 28, 2017).
Excel's xlsx format is really just a zipped XML file. If such files are gzipped, the ReaderFactory seems to try to un-gzip and then un-zip the content. This leads to an IEntry.Key of the first parts of the file rather than the name of the file.
To reproduce, run
gzip -k Book1.xlsx
and then read Book1.xlsx.gz with ReaderFactory.
The Entry.Key value that is dumped is
PK ! A7��n [Content_Types].xml �(�
I'd expect the Key to be the file name Book1.xlsx (or at least not the first lines of the file).
Open to other suggestions on how it should work, but as it stands I'd have to special-case *.xlsx.gz files, which seems to defeat the purpose of a general ReaderFactory that can handle any of the supported formats you throw at it.
Attachments:
Book1.xlsx
Book1.xlsx.gz
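For anyone who wants to see the two layers without the attachments, here is a minimal Python sketch (not SharpCompress code). It assumes a tiny in-memory zip as a stand-in for Book1.xlsx; the helper names make_fake_xlsx, gzip_with_name and gzip_original_name are made up for illustration. It shows that the gzip payload starts with the zip signature "PK", which matches the dumped Key above, and that the original file name can be read from the optional FNAME field of the gzip header (RFC 1952), which gzip writes by default.

```python
import gzip
import io
import zipfile


def make_fake_xlsx() -> bytes:
    """Build a tiny zip in memory as a stand-in for Book1.xlsx (assumption)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
    return buf.getvalue()


def gzip_with_name(payload: bytes, name: str) -> bytes:
    """Gzip the payload and record `name` in the header's FNAME field."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(payload)
    return buf.getvalue()


def gzip_original_name(data: bytes):
    """Return the FNAME field of a gzip header, or None if it is absent."""
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = data[3]
    pos = 10  # fixed-size part of the header
    if flags & 0x04:  # FEXTRA: 2-byte little-endian length, then that many bytes
        pos += 2 + int.from_bytes(data[pos:pos + 2], "little")
    if not flags & 0x08:  # FNAME flag not set
        return None
    end = data.index(b"\x00", pos)  # the name is zero-terminated
    return data[pos:end].decode("latin-1")


gz_bytes = gzip_with_name(make_fake_xlsx(), "Book1.xlsx")
inner = gzip.decompress(gz_bytes)

print(gzip_original_name(gz_bytes))  # Book1.xlsx
print(inner[:4])                     # b'PK\x03\x04'
```

The FNAME field is optional, so a reader that wanted to use it as the entry key would still need a fallback (for example the .gz file name without its extension) when it is missing.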
Update: it looks like the underlying issue is that ReaderFactory's call to TarArchive.IsTarFile returns true for *.xlsx files: https://github.com/adamhathcock/sharpcompress/blob/master/src/SharpCompress/Readers/ReaderFactory.cs#L48
@Kim-SSi commented on GitHub (May 25, 2019):
@adamhathcock
I have been unable to figure out a good way to fix the TarArchive.IsTarFile detection. As part of the proposed fix I moved IsTarFile to the end of the Open.
The TarHeader will sometimes accept a file inside a compressed stream (gz, bz2, etc.) as a Tar even when it is not.
What are your thoughts on adding an option to ReaderOptions, like TryOpenArchiveInStream?
Then make the Open call recursive on compressed streams.
If this is an acceptable solution I am quite happy to create a PR.
https://github.com/adamhathcock/sharpcompress/blob/master/src/SharpCompress/Readers/ReaderFactory.cs#L29-L105
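To make the idea concrete, here is a rough Python sketch of recursive detection. It is not how SharpCompress implements ReaderFactory or IsTarFile; looks_like_tar and sniff are hypothetical names, and the whole payload is decompressed in memory for simplicity. The assumptions are: unambiguous magic bytes (gzip, bzip2, zip) are checked before tar, and a block only counts as tar if the checksum stored in its 512-byte header matches the computed one.

```python
import bz2
import gzip
import io
import tarfile
import zipfile


def looks_like_tar(data: bytes) -> bool:
    """Strict check: the checksum stored in the first tar header must match."""
    if len(data) < 512:
        return False
    header = data[:512]
    field = header[148:156].rstrip(b" \x00").decode("ascii", "replace")
    try:
        stored = int(field, 8)  # checksum is stored as octal text
    except ValueError:
        return False
    # The checksum is computed with its own 8-byte field treated as spaces.
    computed = sum(header[:148]) + 8 * 0x20 + sum(header[156:])
    return stored == computed


def sniff(data: bytes, depth: int = 0, max_depth: int = 4) -> list:
    """Return the chain of detected formats, outermost first."""
    if depth >= max_depth:
        return []
    if data[:2] == b"\x1f\x8b":
        return ["gzip"] + sniff(gzip.decompress(data), depth + 1, max_depth)
    if data[:3] == b"BZh":
        return ["bzip2"] + sniff(bz2.decompress(data), depth + 1, max_depth)
    if data[:4] == b"PK\x03\x04":  # zip local file header (non-empty archive)
        return ["zip"]
    if looks_like_tar(data):  # tar has no leading magic, so it is tried last
        return ["tar"]
    return []


# Stand-in for a gzipped xlsx: a small zip wrapped in gzip.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
gzipped_zip = gzip.compress(zip_buf.getvalue())

# A real tar wrapped in gzip, to show that tar.gz is still recognised.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    info = tarfile.TarInfo("a.txt")
    info.size = 5
    tf.addfile(info, io.BytesIO(b"hello"))
gzipped_tar = gzip.compress(tar_buf.getvalue())

print(sniff(gzipped_zip))  # ['gzip', 'zip']
print(sniff(gzipped_tar))  # ['gzip', 'tar']
```

With this ordering a gzipped xlsx is reported as gzip wrapping a zip instead of being mistaken for a tar. Whether the reader should then descend into the inner zip or expose it as a single entry would be the job of an option like the TryOpenArchiveInStream proposed above.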
@MartinDemberger commented on GitHub (Oct 14, 2025):
I have a ZIP-File which worked with version 0.40 but with 0.41 it isn't working.
I'm sorry but the file contains customer data so I can't upload it. I tried to create my own file which reproduced this problem but wasn't able to. Something must be special in this file.
If you can give me a hint how to extract some helpful information from the zip file I can add them to this ticket.
In the meantime I'm stuck on version 0.40.
@mklaber commented on GitHub (Oct 14, 2025):
@MartinDemberger is it a gzipped Excel workbook? If not, start a new issue. Note that in this issue there is no "the zip file" involved. There's just a *.xlsx (which itself is a zipfile) that is gzipped (not zipped).
@MartinDemberger commented on GitHub (Oct 14, 2025):
Sorry, you are right. It's a zip file with two Excel files inside.
@adamhathcock commented on GitHub (Oct 15, 2025):
There is a known issue with nested zips (which xlsx files are) and Reader. However, something being broken between versions is new.
I will take a look now at the root issue of this issue, but nested archives with Reader are usually a problem as I find headers I don't expect.
@adamhathcock commented on GitHub (Oct 15, 2025):
Ironically, I think 0.41.0 fixes the first issue