Malformed zip file generated #521

Closed
opened 2026-01-29 22:13:14 +00:00 by claunia · 13 comments

Originally created by @aboryczko on GitHub (May 26, 2022).

Hi,

I'm using the salvois/LargeXlsx library, which uses this library, to generate Excel files (which are essentially zip files).
When I try to open a generated file in Excel, it says that the file is corrupt. If I try to open the file with any unzip tool, there are no errors. When I use the built-in ZipArchive class from .NET Core, everything works as expected.
I dug a little deeper into the files generated by this library and by the .NET one, and I see that every 16 kB I get 5 bytes inserted into the output stream.
I don't know if this is some special marker, but Excel doesn't recognize it and it messes up the content, i.e. instead of 's="0"' I have 's="0 ý€"'.

A simple way to reproduce it:

var ms = new MemoryStream();
var zw = new ZipWriter(ms, new ZipWriterOptions(compressionType: CompressionType.Deflate) { DeflateCompressionLevel = CompressionLevel.None });
var payload = new string('\n', 100000);
using (var stream = zw.WriteToStream("test.txt", new ZipWriterEntryOptions()))
using (var streamWriter = new StreamWriter(stream: stream))
{
    streamWriter.Write(value: payload);
}

The sequences are at offsets: 38, 32811, 65584, 98357

Linking salvois/LargeXlsx#9
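
The spacing of the reported offsets can be checked with a few lines of arithmetic. The sketch below only uses the four offsets quoted above; the class and method names are made up for illustration. Each gap comes out as 32773 bytes, i.e. 32768 + 5, which would be consistent with a 5-byte sequence followed by 32 KiB of data. The first offset, 38, would likewise be consistent with a 30-byte zip local file header followed by the 8-byte name "test.txt", although that reading is an assumption and is not confirmed in the thread.

```csharp
using System;
using System.Linq;

public static class MarkerSpacing
{
    // Distance between each pair of consecutive offsets.
    public static int[] Gaps(int[] offsets) =>
        offsets.Zip(offsets.Skip(1), (a, b) => b - a).ToArray();

    public static void Main()
    {
        // Offsets copied from the report above.
        int[] offsets = { 38, 32811, 65584, 98357 };

        // Prints "32773 = 32768 + 5" three times.
        foreach (int gap in Gaps(offsets))
            Console.WriteLine($"{gap} = {gap - 5} + 5");
    }
}
```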

claunia added the bug and up for grabs labels 2026-01-29 22:13:14 +00:00

@Erior commented on GitHub (Jun 16, 2022):

I think you are confusing CompressionType.None and CompressionType.Deflate. With CompressionType.Deflate you will have extra information: it is an instruction for inflate to "copy the next 7fff bytes as is" and, as said, it repeats every 0x7fff bytes.
However, it is very uncommon to use DeflateCompressionLevel.None; many libraries limit you to at least Level1 when you choose Deflate.
If you really do not want to compress, you usually choose CompressionType.None; if you want to use Deflate, it is better to choose Level1 as a minimum.
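
The "copy the next N bytes as is" instruction corresponds to a stored block in the Deflate format (RFC 1951, section 3.2.4): one byte carrying the BFINAL and BTYPE bits, then a 2-byte LEN and its 2-byte one's complement NLEN, followed by the raw data. That makes 5 bytes of header per block. The following is a minimal sketch, not SharpCompress code: `StoredBlocks`, `Wrap` and `Inflate` are hypothetical names, and the 0x8000 block size is an assumption chosen to match the 32773-byte spacing of the reported offsets. It builds such a stream by hand and lets .NET's own `DeflateStream` inflate it, showing that the headers are valid Deflate data rather than corruption.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class StoredBlocks
{
    // Wraps data in raw Deflate "stored" (uncompressed) blocks.
    public static byte[] Wrap(byte[] data, int blockSize)
    {
        if (blockSize < 1 || blockSize > 0xFFFF)
            throw new ArgumentOutOfRangeException(nameof(blockSize));

        var output = new MemoryStream();
        int offset = 0;
        do
        {
            int len = Math.Min(blockSize, data.Length - offset);
            bool isFinal = offset + len == data.Length;

            output.WriteByte((byte)(isFinal ? 1 : 0));    // BFINAL bit, BTYPE = 00 (stored)
            output.WriteByte((byte)(len & 0xFF));         // LEN, low byte first
            output.WriteByte((byte)(len >> 8));
            output.WriteByte((byte)(~len & 0xFF));        // NLEN = one's complement of LEN
            output.WriteByte((byte)((~len >> 8) & 0xFF));
            output.Write(data, offset, len);              // payload copied as is

            offset += len;
        } while (offset < data.Length);

        return output.ToArray();
    }

    // Inflates a raw Deflate stream with the built-in .NET decompressor.
    public static byte[] Inflate(byte[] deflated)
    {
        using var input = new DeflateStream(new MemoryStream(deflated), CompressionMode.Decompress);
        using var result = new MemoryStream();
        input.CopyTo(result);
        return result.ToArray();
    }

    public static void Main()
    {
        // Same payload as the reproduction above: 100000 newline characters.
        byte[] payload = new byte[100000];
        Array.Fill(payload, (byte)'\n');

        byte[] wrapped = Wrap(payload, 0x8000);
        Console.WriteLine(wrapped.Length - payload.Length);                   // prints 20 (4 blocks x 5 bytes)
        Console.WriteLine(Inflate(wrapped).AsSpan().SequenceEqual(payload));  // prints True
    }
}
```

A tool that treats the entry as Deflate data consumes these headers transparently, which fits the observation that ordinary unzip tools open the file without errors.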


@aboryczko commented on GitHub (Jun 17, 2022):

It doesn't matter which level I choose; the issue is the same: Excel recognizes the file as corrupt. I used None in the examples because it makes it easier to find which characters are causing the issue.


@Erior commented on GitHub (Jun 17, 2022):

Are there any tests in LargeXlsx that show the problem when the output is dumped to disk? That could help me understand the problem/corruption; I guess right now I'm totally missing it.


@aboryczko commented on GitHub (Jun 17, 2022):

Something along the lines of:

static void Main()
{
    using var file = File.Create(@"C:\Temp\test.xlsx");
    using var xlsxWriter = new LargeXlsx.XlsxWriter(new NotSeekableWrapper(file));
    xlsxWriter.BeginWorksheet("Sheet 1");
    for (int i = 0; i < 1000; i++)
    {
        xlsxWriter.BeginRow();
        xlsxWriter.Write(i);
        xlsxWriter.Write($"Value 0{i}");
        xlsxWriter.Write($"Value 1{i}");
        xlsxWriter.Write($"Value 2{i}");
        xlsxWriter.Write($"Value 3{i}");
    }    
}

public class NotSeekableWrapper : Stream
{
    private readonly Stream _stream;

    public NotSeekableWrapper(Stream stream)
    {
        _stream = stream;
    }

    public override bool CanRead => _stream.CanRead;

    public override bool CanSeek => false;

    public override bool CanWrite => _stream.CanWrite;

    public override long Length => throw new NotSupportedException();

    private long _position;
    public override long Position { get => _position; set => throw new NotSupportedException(); }

    public override void Flush()
    {
        _stream.Flush();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var read = _stream.Read(buffer, offset, count);
        _position += read;
        return read;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _stream.Write(buffer, offset, count);
        _position += count;
    }
}

produces the malformed file.


@Erior commented on GitHub (Jun 17, 2022):

Thank you, that does make Excel unhappy. I will investigate.


@Erior commented on GitHub (Jun 17, 2022):

I did not find any corruption. However, we do explicitly mark the files as "Label"s in this scenario, which is obviously not right; removing that made Excel happy on my side. It would be great if you could test it out.


@aboryczko commented on GitHub (Jun 18, 2022):

@Erior sorry, I don't follow. Could you explain a bit more how I can do it?


@Erior commented on GitHub (Jun 18, 2022):

Off the top of my head, you could build with the pull request changes and use that DLL, or wait for @adamhathcock to accept the change and publish a new release. I'm not sure whether you can get binaries out of pull request builds, even though they are tested automatically.
Someone could build it for you and send you the DLLs; that is, however, a matter of trust/security.
Perhaps Mr Hathcock knows of other options; I participate somewhat infrequently on GitHub.


@adamhathcock commented on GitHub (Jun 20, 2022):

Thanks for the fix, @Erior.

I'll push out the change soon, as it was creating bad files.


@adamhathcock commented on GitHub (Jun 20, 2022):

https://www.nuget.org/packages/SharpCompress/0.32.1


@salvois commented on GitHub (Jul 11, 2022):

Hi @aboryczko , @Erior , @adamhathcock ,
SharpCompress 0.32.1 appears to fix the offending code posted above, as per salvois/LargeXlsx#9 (https://github.com/salvois/LargeXlsx/issues/9).
Many thanks for reporting and fixing this!
Salvo


@adamhathcock commented on GitHub (Jul 12, 2022):

🥳


@Nanook commented on GitHub (Jul 15, 2022):

I'm closing this issue after confirming it's now working.


Reference: starred/sharpcompress#521