Subfiles of root folder in ZIP archive are ignored #611


claunia · 2026-01-29T22:14:38Z

claunia commented

2026-01-29 22:14:38 +00:00

Originally created by @ghost on GitHub (Jan 27, 2024).

When I extract a ZIP file, only the root folder appears in the output directory; the files inside it are ignored.

SharpCompress 0.36.0

Archive structure

\setons_reworked.v0010\
\setons_reworked.v0010\setons_reworked_script.lua
\setons_reworked.v0010\setons_reworked_scenario.lua
\setons_reworked.v0010\setons_reworked_save.lua
\setons_reworked.v0010\setons_reworked.scmap

Output folder

\setons_reworked.v0010

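using System.IO;
using SharpCompress.Common;
using SharpCompress.Readers;

// archivePath, outputDirectory and ext (the file extension) are defined elsewhere.
var extractOptions = new ExtractionOptions
{
    Overwrite = true,
    ExtractFullPath = true,
};
using var stream = File.OpenRead(archivePath);
// Both branches return a forward-only reader that never consults the central directory.
using IReader archive = ext switch
{
    ".zip" => SharpCompress.Readers.Zip.ZipReader.Open(stream),
    _ => ReaderFactory.Open(stream)
};

while (archive.MoveToNextEntry())
{
    archive.WriteEntryToDirectory(outputDirectory, extractOptions);
}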

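Attached file: setons_reworked.v0010.zip (https://github.com/adamhathcock/sharpcompress/files/14071576/setons_reworked.v0010.zip)

Note: the .NET ZIP implementation works fine: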

System.IO.Compression.ZipFile.ExtractToDirectory(archivePath, outputDirectory, true);


claunia commented

2026-01-29 22:14:39 +00:00

@adamhathcock commented on GitHub (Jan 29, 2024):

Doing a brief test, I saw that the zip was different in that there's a zero byte header somewhere.

Using the ArchiveFactory I get all 5 entries because it uses the zip dictionary. ZipFile does the same.

ReaderFactory doesn't use the zip dictionary so it has to be a certain structure. Right now, I don't know what that zero byte header is.
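As a rough illustration of the central-directory route mentioned above, the sketch below extracts through `ArchiveFactory` and prints simple per-file progress. The class name, method name, and progress output are assumptions made for this example, not part of the original report; it also skips directory entries, so empty directories would not be recreated.

```csharp
using System;
using System.Linq;
using SharpCompress.Archives;
using SharpCompress.Common;

public static class CentralDirectoryExtract
{
    // Hypothetical helper: extracts every file entry and returns how many were written.
    public static int Extract(string archivePath, string outputDirectory)
    {
        var options = new ExtractionOptions { Overwrite = true, ExtractFullPath = true };

        // ArchiveFactory opens the file with random access, so the entry list
        // comes from the central directory rather than from a forward scan.
        using IArchive archive = ArchiveFactory.Open(archivePath);

        var files = archive.Entries.Where(e => !e.IsDirectory).ToList();
        int done = 0;
        foreach (var entry in files)
        {
            entry.WriteToDirectory(outputDirectory, options);
            done++;
            Console.WriteLine($"{done}/{files.Count}: {entry.Key}");
        }
        return done;
    }
}
```

Because the entry count is known up front, progress can be reported per file without a forward-only reader.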


claunia commented

2026-01-29 22:14:39 +00:00

@DannyBoyk commented on GitHub (Jan 29, 2024):

There are two extra bytes at the end of the first Local Entry header, 0x03 0x00, that occur before the Post Data Descriptor header. The ZipReader reads the next 4-byte header as 0x4B500003 instead of reading the Post Data Descriptor header of 0x08074B50. Because it doesn't recognize that value as a header, it just says it's done and doesn't give any more entries.

The only question I have at the moment is where the 0x03 0x00 comes from and why 7-zip isn't mad about it. The extra field length in the local header says it's zero bytes, so there shouldn't be anything after the file name field, which is 22 bytes. There are 24 bytes of data with that 0x03 0x00. Because ZipReader only reads forward and doesn't have the Central Directory header to rely on, it gives up when it can't find the next header. I feel like this zip file is malformed, but 7-zip isn't reporting any errors...
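The misread signature described above can be reproduced with a few bytes. This is only a sketch of the arithmetic, not SharpCompress code; the class and method names are made up for the example. ZIP stores multi-byte fields little-endian, so starting two bytes early turns the descriptor signature into an unknown value.

```csharp
using System;
using System.Buffers.Binary;

public static class SignatureDemo
{
    public const uint DataDescriptorSignature = 0x08074B50;

    // Reads a 4-byte little-endian value at the given offset, as a ZIP parser would.
    public static uint ReadSignature(ReadOnlySpan<byte> data, int offset) =>
        BinaryPrimitives.ReadUInt32LittleEndian(data.Slice(offset, 4));

    public static void Main()
    {
        // The two unexpected bytes followed by the data descriptor signature.
        byte[] bytes = { 0x03, 0x00, 0x50, 0x4B, 0x07, 0x08 };

        Console.WriteLine($"0x{ReadSignature(bytes, 0):X8}"); // 0x4B500003 (not a known header)
        Console.WriteLine($"0x{ReadSignature(bytes, 2):X8}"); // 0x08074B50 (data descriptor)
    }
}
```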


claunia commented

2026-01-29 22:14:39 +00:00

@adamhathcock commented on GitHub (Jan 29, 2024):

I imagine 7-Zip is also using the central directory. I haven't used it in years, so I don't know if it would report something malformed.

I wonder what made that zip file.


claunia commented

2026-01-29 22:14:39 +00:00

@DannyBoyk commented on GitHub (Jan 29, 2024):

I agree that 7zip is probably just using the CD.

However, if they were streaming it in for some reason, it appears 7zip reads the local header and, then, if the item has a data descriptor, it "looks around" for it. So, it can skip over any errant bytes that might be there. Guessing they've seen something like this before? Don't think there is any way for SharpCompress to handle this unless it did the same.

      ReadLocalItem(item);
      item.FromLocal = true;
      bool isFinished = false;

      if (item.HasDescriptor())
      {
        RINOK(FindDescriptor(item, items.Size()))
        isFinished = !item.DescriptorWasRead;
      }
      else
      {
        if (item.PackSize >= ((UInt64)1 << 62))
          throw CUnexpectEnd();
        RINOK(IncreaseRealPosition(item.PackSize, isFinished))
      }

I think, @Eternal-ll, you'll have to avoid a Reader if you want to read these zip files and use one of the other options @adamhathcock mentioned that use the Central Directory headers.
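To make the "looks around" idea concrete, here is a minimal sketch of scanning forward for the data descriptor signature in a buffer. It is not how 7-Zip or SharpCompress implement it; `DescriptorScan` and `FindDescriptor` here are hypothetical names, and a real implementation would also have to validate the sizes that follow, since compressed data can contain the same four bytes by chance.

```csharp
using System;

public static class DescriptorScan
{
    // Data descriptor signature 0x08074B50 as it appears on disk (little-endian).
    private static ReadOnlySpan<byte> Signature => new byte[] { 0x50, 0x4B, 0x07, 0x08 };

    // Returns the offset of the first signature at or after 'start', or -1 if none is found.
    public static int FindDescriptor(ReadOnlySpan<byte> buffer, int start)
    {
        int index = buffer.Slice(start).IndexOf(Signature);
        return index < 0 ? -1 : start + index;
    }

    public static void Main()
    {
        // Two stray bytes, then the descriptor signature, then placeholder bytes.
        byte[] data = { 0x03, 0x00, 0x50, 0x4B, 0x07, 0x08, 0x00, 0x00 };
        Console.WriteLine(FindDescriptor(data, 0)); // 2
    }
}
```

A forward-only reader would need to buffer and inspect data like this for every entry with a descriptor, which is part of why the thread treats it as a poor fit for Readers.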


claunia commented

2026-01-29 22:14:39 +00:00

@adamhathcock commented on GitHub (Jan 29, 2024):

Reader being forward-only has a lot of restrictions that most libs don't have. There's not really a good way to work around this for Readers without losing what makes a Reader useful.


claunia commented

2026-01-29 22:14:40 +00:00

@ghost commented on GitHub (Jan 30, 2024):

> I think, @Eternal-ll , you'll have to avoid a Reader if you want to read these zip files and use one of the other options, @adamhathcock mentioned that uses the Central Directory headers.

Sure, I used ReaderFactory only to expose some internal extraction progress.

> I wonder what made that zip file.

It was probably made by a Java desktop client that uses the implementation below.
https://github.com/FAForever/faf-java-commons/blob/1013d75fc97698a581792aace1e2b5580bc69906/faf-commons-data/src/main/java/com/faforever/commons/io/Zipper.java#L16


claunia commented

2026-01-29 22:14:40 +00:00

@adamhathcock commented on GitHub (Jan 30, 2024):

I guess that's the Apache implementation putting in something extra that we don't expect or that goes against the spec. Of course, the zip spec is fluid at best.


claunia commented

2026-01-29 22:14:41 +00:00

@DannyBoyk commented on GitHub (Jan 30, 2024):

Ah, I think I see what it is. For some reason, the Apache library is writing out 2 bytes for a directory. No idea what those represent; still puzzling through where ZipArchiveOutputStream and ZipArchiveEntry are getting that in the Apache library.

The PostDataDescriptor gives it away because it says the entry is 2 bytes compressed and 0 bytes uncompressed. So, the Apache implementation compressed 0 bytes into 2 bytes. :-P I guess SharpCompress could try to handle arbitrary data for a directory and just discard it? What do you think, @adamhathcock ?

EDIT: I think this is actually a bug in the Apache zip library for directories. They flush their deflater to the output stream even if nothing was written. That's why the uncompressed size is 0; they didn't read anything to send to the deflater.
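The two bytes `0x03 0x00` are consistent with that reading: they form a complete raw DEFLATE stream consisting of a single final block with no data in it. The sketch below, which uses only the .NET standard library and a made-up class name, inflates those two bytes and shows that nothing comes out, matching the descriptor's "2 bytes compressed, 0 bytes uncompressed".

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class EmptyDeflateDemo
{
    // Inflates a raw DEFLATE stream and returns the number of bytes produced.
    public static long InflatedLength(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var inflater = new DeflateStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        inflater.CopyTo(output);
        return output.Length;
    }

    public static void Main()
    {
        // The payload found after the directory's local header in the attached archive.
        byte[] directoryPayload = { 0x03, 0x00 };
        Console.WriteLine(InflatedLength(directoryPayload)); // 0
    }
}
```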


claunia commented

2026-01-29 22:14:41 +00:00

@adamhathcock commented on GitHub (Jan 31, 2024):

I mean, we can add an explicit boolean property to handle this but I wouldn't want to add it generically.

It does sound like a bug to me.


claunia commented

2026-01-29 22:14:41 +00:00

@ghost commented on GitHub (Jan 31, 2024):

Thanks for investigating the issue. I think we can leave things as they are. It is not much of a problem for me, as we have alternative solutions.

I will contact the devs of the Java client; maybe they will look into it.


claunia referenced this issue

2026-01-29 22:19:06 +00:00

[PR #611] [MERGED] Allowing to seek empty zip files #1124
