GZipped Excel xlsx files choke #232

Open
opened 2026-01-29 22:08:44 +00:00 by claunia · 6 comments
Owner

Originally created by @mklaber on GitHub (Aug 28, 2017).

Excel's xlsx format is really just a zipped collection of XML files. If such files are gzipped, the ReaderFactory seems to try to un-gzip and then un-zip the content. This leads to an IEntry.Key containing the first bytes of the file rather than the name of the file.

To reproduce:

  1. Create an Excel Workbook file (*.xlsx)
  2. Gzip it: gzip -k Book1.xlsx
  3. Read it with ReaderFactory:
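	// In a verbatim string (@"...") backslashes are literal, so they are not doubled.
	using (var stream = File.OpenRead(@"C:\tmp\Book1.xlsx.gz"))
	{
		using (var reader = ReaderFactory.Open(stream))
		{
			while (reader.MoveToNextEntry())
			{
				// Dump() is the LINQPad extension method that prints a value.
				reader.Entry.Key.Dump();
			}
		}
	}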

The Entry.Key value that is dumped is PK   ! A7��n   [Content_Types].xml �(� 

I'd expect the Key to be the file name Book1.xlsx (or at least not the first lines of the file).
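
For context on where a name like Book1.xlsx could come from: the gzip format (RFC 1952) has an optional FNAME header field, and `gzip` typically stores the original file name there. The sketch below reads that field with plain BCL stream calls. It is not SharpCompress code; `ReadGZipOriginalName` is a hypothetical helper, and the sample bytes are a hand-built header rather than a real Book1.xlsx.gz.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: returns the name stored in the gzip FNAME field,
// or null when the data is not gzip or no name is stored.
string? ReadGZipOriginalName(Stream stream)
{
    // Fixed 10-byte header: ID1 ID2 CM FLG MTIME(4) XFL OS.
    var header = new byte[10];
    for (int i = 0; i < header.Length; i++)
    {
        int value = stream.ReadByte();
        if (value < 0) return null;
        header[i] = (byte)value;
    }
    if (header[0] != 0x1F || header[1] != 0x8B) return null;

    int flags = header[3];
    if ((flags & 0x04) != 0) // FEXTRA: 2-byte little-endian length, then data
    {
        int low = stream.ReadByte();
        int high = stream.ReadByte();
        if (low < 0 || high < 0) return null;
        int extraLength = low | (high << 8);
        for (int i = 0; i < extraLength; i++)
        {
            if (stream.ReadByte() < 0) return null;
        }
    }
    if ((flags & 0x08) == 0) return null; // FNAME flag not set

    // FNAME is a zero-terminated Latin-1 string.
    var name = new StringBuilder();
    int b;
    while ((b = stream.ReadByte()) > 0)
    {
        name.Append((char)b);
    }
    return b == 0 ? name.ToString() : null;
}

// Hand-built gzip header with the FNAME flag (0x08) set, followed by the name.
var sample = new MemoryStream();
sample.Write(new byte[] { 0x1F, 0x8B, 0x08, 0x08, 0, 0, 0, 0, 0, 0x03 }, 0, 10);
byte[] nameBytes = Encoding.ASCII.GetBytes("Book1.xlsx\0");
sample.Write(nameBytes, 0, nameBytes.Length);
sample.Position = 0;

Console.WriteLine(ReadGZipOriginalName(sample)); // Book1.xlsx
```

The field is optional, so a reader that wants a stable key would still need a fallback when it is absent.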

Open to other suggestions on how it should work but as it stands I'd have to special case for *.xlsx.gz files which seems to defeat the purpose of a general ReaderFactory that can handle any of the supported formats you throw at it.
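
For illustration, the special-casing mentioned above could look roughly like the following. This is a minimal sketch that uses only `System.IO.Compression` from the BCL instead of SharpCompress, and it builds a tiny stand-in archive in memory rather than reading the attached Book1.xlsx.gz. `ListEntriesOfGzippedZip` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: lists the entry names of a zip archive wrapped in gzip.
List<string> ListEntriesOfGzippedZip(Stream gzipped)
{
    // Step 1: strip the gzip layer into a seekable buffer. The zip central
    // directory sits at the end of the data, so the reader has to seek.
    var inner = new MemoryStream();
    using (var gzipIn = new GZipStream(gzipped, CompressionMode.Decompress, leaveOpen: true))
    {
        gzipIn.CopyTo(inner);
    }
    inner.Position = 0;

    // Step 2: open the decompressed bytes as an ordinary zip archive.
    var names = new List<string>();
    using (var zipIn = new ZipArchive(inner, ZipArchiveMode.Read))
    {
        foreach (ZipArchiveEntry entry in zipIn.Entries)
        {
            names.Add(entry.FullName);
        }
    }
    return names;
}

// Demo data: a one-entry zip, standing in for an xlsx file.
var zipBytes = new MemoryStream();
using (var zipOut = new ZipArchive(zipBytes, ZipArchiveMode.Create, leaveOpen: true))
{
    using var writer = new StreamWriter(zipOut.CreateEntry("[Content_Types].xml").Open());
    writer.Write("<Types/>");
}

// Wrap the zip in a gzip layer, as `gzip -k` would.
var gzBytes = new MemoryStream();
using (var gzipOut = new GZipStream(gzBytes, CompressionMode.Compress, leaveOpen: true))
{
    zipBytes.Position = 0;
    zipBytes.CopyTo(gzipOut);
}
gzBytes.Position = 0;

foreach (string entryName in ListEntriesOfGzippedZip(gzBytes))
{
    Console.WriteLine(entryName); // [Content_Types].xml
}
```

This buffers the whole decompressed file in memory, which is fine for a workbook but is exactly the kind of per-format handling a general ReaderFactory is meant to avoid.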

Book1.xlsx
Book1.xlsx.gz

Update: it looks like the underlying issue is that ReaderFactory's call to TarArchive.IsTarFile returns true for *.xlsx files: https://github.com/adamhathcock/sharpcompress/blob/master/src/SharpCompress/Readers/ReaderFactory.cs#L48
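
Some background that would be consistent with the dumped key: a tar header has no signature at offset 0, and its first 100 bytes are the entry name. Zip data read as a tar header would therefore yield a name starting with "PK", the zip local file header signature. A tar header does carry a checksum at offset 148, so a stricter check is possible in principle. The sketch below is not SharpCompress's detection logic; `HasValidTarChecksum` is a hypothetical helper and both sample blocks are hand-built.

```csharp
using System;
using System.Text;

// Hypothetical helper: true when a 512-byte block has a valid tar header checksum.
bool HasValidTarChecksum(byte[] block)
{
    if (block.Length < 512) return false;

    // Stored checksum: octal digits in bytes 148..155, optionally space-padded
    // in front and terminated by NUL or space.
    int stored = 0;
    bool sawDigit = false;
    for (int i = 148; i < 156; i++)
    {
        byte value = block[i];
        if (value >= (byte)'0' && value <= (byte)'7')
        {
            stored = stored * 8 + (value - '0');
            sawDigit = true;
        }
        else if (sawDigit)
        {
            break; // terminator after the digits
        }
        else if (value != (byte)' ')
        {
            return false; // neither padding nor an octal digit
        }
    }
    if (!sawDigit) return false;

    // Computed checksum: sum of all header bytes, with the checksum field
    // itself counted as eight spaces.
    int computed = 0;
    for (int i = 0; i < 512; i++)
    {
        computed += (i >= 148 && i < 156) ? 0x20 : block[i];
    }
    return computed == stored;
}

// A block that starts like a zip local file header ("PK\x03\x04").
byte[] zipLike = new byte[512];
zipLike[0] = (byte)'P';
zipLike[1] = (byte)'K';
zipLike[2] = 3;
zipLike[3] = 4;

// A minimal tar-like header: just a name and a matching checksum.
byte[] tarLike = new byte[512];
Encoding.ASCII.GetBytes("Book1.xlsx").CopyTo(tarLike, 0);
int sum = 8 * 0x20; // the checksum field counts as spaces
foreach (byte headerByte in tarLike) sum += headerByte;
Encoding.ASCII.GetBytes(Convert.ToString(sum, 8).PadLeft(6, '0')).CopyTo(tarLike, 148);
tarLike[155] = (byte)' ';

Console.WriteLine(HasValidTarChecksum(zipLike)); // False
Console.WriteLine(HasValidTarChecksum(tarLike)); // True
```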
claunia added the bug and up for grabs labels 2026-01-29 22:08:45 +00:00

@Kim-SSi commented on GitHub (May 25, 2019):

@adamhathcock
I have been unable to figure out a good way to fix the TarArchive.IsTarFile detection. As part of the proposed fix I moved IsTarFile to the end of Open.
The TarHeader will sometimes accept a file as a Tar in a compressed stream (gz, bz2, etc.) even when it is not.
What are your thoughts on adding an option to ReaderOptions, like TryOpenArchiveInStream?
Then make the Open call recursive on compressed streams.
If this is an acceptable solution I am quite happy to create a PR.

https://github.com/adamhathcock/sharpcompress/blob/master/src/SharpCompress/Readers/ReaderFactory.cs#L29-L105

public static IReader Open(Stream stream, ReaderOptions options = null)
{
    stream.CheckNotNull("stream");
    options = options ?? new ReaderOptions()
    {
        LeaveStreamOpen = false
    };
    RewindableStream rewindableStream = new RewindableStream(stream);
    rewindableStream.StartRecording();
    if (ZipArchive.IsZipFile(rewindableStream, options.Password))
    {
        rewindableStream.Rewind(true);
        return ZipReader.Open(rewindableStream, options);
    }

    rewindableStream.Rewind(false);
    if (GZipArchive.IsGZipFile(rewindableStream))
    {
        rewindableStream.Rewind(false);
        GZipStream decompressedStream = new GZipStream(rewindableStream, CompressionMode.Decompress);
        if (options.TryOpenArchiveInStream)
        {
            try { return Open(decompressedStream, options); }
            catch (InvalidOperationException) { }
        }
        rewindableStream.Rewind(true);
        return GZipReader.Open(rewindableStream, options);
    }

    rewindableStream.Rewind(false);
    if (BZip2Stream.IsBZip2(rewindableStream))
    {
        rewindableStream.Rewind(false);
        BZip2Stream decompressedStream = new BZip2Stream(new NonDisposingStream(rewindableStream), CompressionMode.Decompress, false);
        if (options.TryOpenArchiveInStream)
        {
            try { return Open(decompressedStream, options); }
            catch (InvalidOperationException) { }
        }
    }

    rewindableStream.Rewind(false);
    if (LZipStream.IsLZipFile(rewindableStream))
    {
        rewindableStream.Rewind(false);
        LZipStream decompressedStream = new LZipStream(new NonDisposingStream(rewindableStream), CompressionMode.Decompress);
        if (options.TryOpenArchiveInStream)
        {
            try { return Open(decompressedStream, options); }
            catch (InvalidOperationException) { }
        }
    }

    rewindableStream.Rewind(false);
    if (RarArchive.IsRarFile(rewindableStream, options))
    {
        rewindableStream.Rewind(true);
        return RarReader.Open(rewindableStream, options);
    }

    rewindableStream.Rewind(false);
    if (XZStream.IsXZStream(rewindableStream))
    {
        rewindableStream.Rewind(true);
        XZStream decompressedStream = new XZStream(rewindableStream);
        if (options.TryOpenArchiveInStream)
        {
            try { return Open(decompressedStream, options); }
            catch (InvalidOperationException) { }
        }
    }

    rewindableStream.Rewind(false);
    if (TarArchive.IsTarFile(rewindableStream))
    {
        rewindableStream.Rewind(true);
        return TarReader.Open(rewindableStream, options);
    }
    throw new InvalidOperationException("Cannot determine compressed stream type.  Supported Reader Formats: Zip, GZip, BZip2, Tar, Rar, LZip, XZ");
}
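
The recursive idea can be tried outside SharpCompress as well. The sketch below is an assumption-laden illustration, not the proposed patch: it sniffs well-known magic bytes and unwraps gzip layers with the BCL `GZipStream` until something else is found. `StartsWith`, `SniffFormat` and `DescribeLayers` are hypothetical names, only gzip is unwrapped because the BCL has no bzip2, lzip or xz decoder, and tar is left out because it has no signature at offset 0.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

bool StartsWith(byte[] data, params byte[] magic)
{
    if (data.Length < magic.Length) return false;
    for (int i = 0; i < magic.Length; i++)
    {
        if (data[i] != magic[i]) return false;
    }
    return true;
}

// Identifies a format by its leading signature bytes.
string SniffFormat(byte[] head)
{
    if (StartsWith(head, 0x50, 0x4B, 0x03, 0x04)) return "zip";             // "PK\x03\x04"
    if (StartsWith(head, 0x1F, 0x8B)) return "gzip";
    if (StartsWith(head, 0x42, 0x5A, 0x68)) return "bzip2";                 // "BZh"
    if (StartsWith(head, 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00)) return "xz";
    if (StartsWith(head, 0x4C, 0x5A, 0x49, 0x50)) return "lzip";            // "LZIP"
    if (StartsWith(head, 0x52, 0x61, 0x72, 0x21, 0x1A, 0x07)) return "rar"; // "Rar!\x1A\x07"
    return "unknown";
}

// Peels gzip layers one at a time and records what each layer looks like.
List<string> DescribeLayers(Stream source)
{
    var layers = new List<string>();
    var current = new MemoryStream();
    source.CopyTo(current);
    while (true)
    {
        byte[] bytes = current.ToArray();
        string format = SniffFormat(bytes);
        layers.Add(format);
        if (format != "gzip") break; // stop at the first non-gzip layer

        var next = new MemoryStream();
        using (var gzipIn = new GZipStream(new MemoryStream(bytes), CompressionMode.Decompress))
        {
            gzipIn.CopyTo(next);
        }
        current = next;
    }
    return layers;
}

// Demo: a one-entry zip wrapped in gzip, standing in for Book1.xlsx.gz.
var zipData = new MemoryStream();
using (var zipOut = new ZipArchive(zipData, ZipArchiveMode.Create, leaveOpen: true))
{
    using var entryStream = zipOut.CreateEntry("[Content_Types].xml").Open();
    entryStream.WriteByte((byte)'<');
}
var wrapped = new MemoryStream();
using (var gzipOut = new GZipStream(wrapped, CompressionMode.Compress, leaveOpen: true))
{
    zipData.Position = 0;
    zipData.CopyTo(gzipOut);
}
wrapped.Position = 0;

Console.WriteLine(string.Join(" > ", DescribeLayers(wrapped))); // gzip > zip
```

Buffering every layer in memory keeps the sketch short. A streaming reader would need something like the RewindableStream used above.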

@MartinDemberger commented on GitHub (Oct 14, 2025):

I have a ZIP file which worked with version 0.40, but with 0.41 it isn't working.

I'm sorry, but the file contains customer data so I can't upload it. I tried to create my own file that reproduces this problem but wasn't able to. Something must be special about this file.
If you can give me a hint on how to extract some helpful information from the zip file, I can add it to this ticket.

In the meantime I'm stuck on version 0.40.


@mklaber commented on GitHub (Oct 14, 2025):

@MartinDemberger is it a gzipped Excel workbook? If not, start a new issue. Note that in this issue there is no "the zip file" involved. There's just a *.xlsx (which itself is a zip file) that is gzipped (not zipped).


@MartinDemberger commented on GitHub (Oct 14, 2025):

Sorry, you are right. It's a zip file with two Excel files inside.


@adamhathcock commented on GitHub (Oct 15, 2025):

There is a known issue with nested zips (which xlsx files are) and Reader. However, something breaking between versions is new.

I will take a look now at the root cause of this issue, but nested archives with Reader are usually a problem as I find headers I don't expect.


@adamhathcock commented on GitHub (Oct 15, 2025):

Ironically, I think 0.41.0 fixes the first issue.


Reference: starred/sharpcompress#232