Mirror of https://github.com/adamhathcock/sharpcompress.git (synced 2026-02-03 21:23:38 +00:00)
TarArchive fails to find subsequent entries if you open an entry stream during entry iteration #405
Originally created by @DannyBoyk on GitHub (Jul 1, 2020).
We usually use TarReader, not TarArchive, so we hadn't seen this before. When using TarArchive (directly or from ArchiveFactory), if you open an entry stream while iterating over the entries, TarArchive fails to find the next header and silently indicates there are no more entries. The follow code will demonstrate this with any TAR archive of more than one file.
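As a format-level illustration of the symptom (a Python sketch using only the stdlib `tarfile` module; this is not SharpCompress code), a reader that stops right after the last payload byte of an entry is left in the middle of a 512-byte block, where no valid header magic can be found:

```python
import io
import tarfile

# Build a two-file TAR archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    for name, payload in [("a.txt", b"hello"), ("b.txt", b"world!")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
raw = buf.getvalue()

BLOCK = 512
size = 5                  # a.txt's payload is 5 bytes
data_end = BLOCK + size   # one header block, then the payload; position 517

# Stopping right after the last payload byte: the "ustar" magic
# (offset 257 within a header block) is nowhere to be found here.
assert raw[data_end + 257:data_end + 262] != b"ustar"

# Skipping the zero padding up to the next 512-byte boundary lands
# exactly on b.txt's header, where the magic is present.
padding = (BLOCK - size % BLOCK) % BLOCK
next_header = data_end + padding   # 1024
assert raw[next_header + 257:next_header + 262] == b"ustar"
```

A header parser scanning from position 517 sees only zero bytes, which matches the "no more entries" behaviour described here.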
I stepped through the source, and `TarHeaderFactory.ReadHeader` fails silently for the second entry in the archive. Without debugging further, it appears the archive is somehow losing its position as a result of opening the entry stream. If you don't open the stream, all the entries are iterated just fine.

@DannyBoyk commented on GitHub (Jul 16, 2020):
Stepping through further, it appears that the `EntryStream` (which simply wraps the base stream of the whole file) is left positioned right after the last valid byte of the entry, but the padding bytes are not consumed. So when `TarHeader.Read` goes to read the next "block", it is not actually at the next block and concludes that no valid header was found. The fundamental issue here seems to be that `OpenEntryStream` doesn't fully read the last block of the file.

@DannyBoyk commented on GitHub (Jul 16, 2020):
More information. In `TarHeaderFactory.ReadHeader`, you set the base stream's position to advance past the entry to the next header. But when someone uses `OpenEntryStream`, that changes `BaseStream.Position`, since the entry stream and the base stream are one and the same.

When using `TarReader`, it creates a standalone `TarReadOnlySubStream` and sets `Header.PackedStream`, which is why `TarReader` works and `TarArchive` does not.

@DannyBoyk commented on GitHub (Jul 16, 2020):
Actually, the final piece of the puzzle is that `TarReadOnlySubStream` advances the position in its `Dispose` method. It also still points at the one and only `BaseStream`, but it takes care of the padding bytes. The path taken by `TarArchive` does not, since it never creates a sub-stream that would skip the padding bytes.

@adamhathcock commented on GitHub (Jul 17, 2020):
Thanks for tracking this down. Any chance of a PR and test? I'm surprised my own don't cover it.
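The mechanism traced above can be sketched like this (a Python analogue of the behaviour the thread attributes to `TarReadOnlySubStream.Dispose`; the names and structure here are illustrative, not SharpCompress code): assuming the entry's data bytes have been fully consumed, disposal advances the shared stream past the zero padding to the next 512-byte boundary.

```python
import io

BLOCK = 512  # TAR stores both headers and data in 512-byte blocks

def skip_entry_padding(stream, entry_size):
    # Assuming the entry's data bytes were already consumed, advance the
    # shared stream past the zero padding so it lands on the next header
    # block (analogue of what TarReadOnlySubStream.Dispose is described
    # as doing).
    padding = (BLOCK - entry_size % BLOCK) % BLOCK
    stream.seek(padding, io.SEEK_CUR)

# A 5-byte entry is followed by 507 zero bytes of padding.
base = io.BytesIO(b"hello" + b"\x00" * 507 + b"next header block...")
base.read(5)                 # entry fully read; position is 5
skip_entry_padding(base, 5)
assert base.tell() == 512    # now exactly at the next block boundary
```

Without that final seek, the shared stream sits at position 5, which is exactly the state `TarArchive` leaves the base stream in.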
@DannyBoyk commented on GitHub (Jul 17, 2020):
@adamhathcock, I think I found a quick and easy solution, but I was also thinking about a different solution that might be better.

The quick and easy solution is to change `TarFilePart.GetCompressedStream` to return a `TarReadOnlySubStream` instead of a `ReadOnlySubStream`. That fixed the problem, but I have the concerns outlined below.

My proposed, slightly more complicated solution would be to track `BaseStream.Position` separately from the sub-stream position. I did a thought experiment and wondered what would happen with `TarReadOnlySubStream` if the caller didn't read all the blocks of the entry, e.g. they open the entry `Stream` but only read a few bytes for some reason. The `Dispose` method only advances to the next block, so the stream might not yet be at the next header.

I think `BaseStream.Position` needs to be managed separately from the sub-stream position, somehow.

@DannyBoyk commented on GitHub (Jul 17, 2020):
I just realized `TarReadOnlySubStream` has `CanSeek` set to `false`. I'm not sure my solution is OK, since the change falls under the condition that `_seekableStream` is not `null`.

@DannyBoyk commented on GitHub (Jul 17, 2020):
Oh, `ReadOnlySubStream` also has `CanSeek` set to `false`. Never mind then; my solution should be valid so long as the user reads up to the last block of the entry. Otherwise, `BaseStream.Position` will still be in the wrong place.

@DannyBoyk commented on GitHub (Jul 20, 2020):
I think I have a decent fix in place. I'll put together a PR and a new test case. The old test cases were not failing because all of the `TarArchive` tests ensure all the entries have been read before attempting to read the streams; this occurs in `IArchiveEntryExtensions.WriteTo`, if not earlier. I wasn't even aware of all these helper methods for writing files out. The consequence of using them, though, is either seeking within the file quite a bit or loading many things into memory.

@adamhathcock commented on GitHub (Jul 26, 2020):
Released https://www.nuget.org/packages/sharpcompress/0.26.0
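The behaviour the fix needs can be sketched as follows (a hypothetical Python analogue, not the actual SharpCompress patch): a bounded sub-stream over the shared base stream that tracks how much of the entry was consumed and, on close, drains any unread entry bytes plus the zero padding, so the base stream lands on the next 512-byte header block even after a partial read.

```python
import io

BLOCK = 512  # TAR block size

class BoundedTarStream(io.RawIOBase):
    """Hypothetical sketch (not the actual SharpCompress patch): a read-only
    view over the shared base stream that tracks how much of the entry was
    consumed and, on close, drains the unread bytes plus the zero padding,
    so the base stream always lands on the next header block."""

    def __init__(self, base, size):
        self._base = base
        self._size = size      # entry size declared in the TAR header
        self._consumed = 0

    def read(self, n=-1):
        remaining = self._size - self._consumed
        if n < 0 or n > remaining:
            n = remaining
        data = self._base.read(n)
        self._consumed += len(data)
        return data

    def close(self):
        if not self.closed:
            leftover = self._size - self._consumed
            padding = (BLOCK - self._size % BLOCK) % BLOCK
            # Even after a partial read, skip to the next block boundary.
            self._base.seek(leftover + padding, io.SEEK_CUR)
        super().close()

# A partial read still leaves the base stream at the next header block.
base = io.BytesIO(b"hello" + b"\x00" * 507 + b"next header block...")
with BoundedTarStream(base, size=5) as s:
    s.read(2)              # caller reads only part of the entry
assert base.tell() == 512
```

Draining `leftover` in `close` is what addresses the partial-read concern raised in the thread: advancing by a fixed block count alone would leave the position wrong whenever the caller stops early.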