Read ArchiveComment #231

Open
opened 2026-01-29 22:08:44 +00:00 by claunia · 5 comments
Owner

Originally created by @dos-ise on GitHub (Aug 22, 2017).

Is there a way to read an archive comment?
I found a way to write it, but not to read it.

claunia added the enhancement and up for grabs labels 2026-01-29 22:08:44 +00:00

@dos-ise commented on GitHub (Jan 18, 2018):

I added a pull request for this feature.

https://github.com/adamhathcock/sharpcompress/pull/341

At the moment I am using the following extension.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;

using SharpCompress.Archives.Zip;

namespace Knx.Ets.Osprey
{
    public static class ZipArchiveExtension
    {
        // Caps the backward scan, so comments longer than about 4 KB are not found.
        private const int MAX_ITERATIONS_FOR_DIRECTORY_HEADER = 4096;

        // Signature of the ZIP end-of-central-directory record.
        private const uint DIRECTORY_END_HEADER_BYTES = 0x06054b50;

        public static string Comment(this ZipArchive archive)
        {
            var onlyVolume = archive.Volumes.Single();

            // Fetch the volume's underlying stream via reflection.
            var stream = onlyVolume.GetType()
                .GetProperty("Stream", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public)
                ?.GetValue(onlyVolume) as Stream;
            if (stream == null)
            {
                throw new InvalidOperationException("Could not access the volume stream.");
            }

            // Not disposed on purpose: disposing the reader would close the archive's stream.
            BinaryReader reader = new BinaryReader(stream);
            SeekBackToHeader(stream, reader, DIRECTORY_END_HEADER_BYTES);

            // The fixed fields must be consumed in order to reach the comment.
            var VolumeNumber = reader.ReadUInt16();
            var FirstVolumeWithDirectory = reader.ReadUInt16();
            var TotalNumberOfEntriesInDisk = reader.ReadUInt16();
            var TotalNumberOfEntries = reader.ReadUInt16();
            var DirectorySize = reader.ReadUInt32();
            var DirectoryStartOffsetRelativeToDisk = reader.ReadUInt32();
            var CommentLength = reader.ReadUInt16();
            var comment = reader.ReadBytes(CommentLength);

            var comm = Encoding.UTF8.GetString(comment, 0, comment.Length);
            return comm;
        }

        // Steps back from the end of the stream one byte at a time until the
        // signature is read; the reader is then positioned just after it.
        private static void SeekBackToHeader(Stream stream, BinaryReader reader, uint headerSignature)
        {
            long offset = 0;
            uint signature;
            int iterationCount = 0;
            do
            {
                if ((stream.Length + offset) - 4 < 0)
                {
                    throw new Exception("Failed to locate the Zip Header");
                }
                stream.Seek(offset - 4, SeekOrigin.End);
                signature = reader.ReadUInt32();
                offset--;
                iterationCount++;
                if (iterationCount > MAX_ITERATIONS_FOR_DIRECTORY_HEADER)
                {
                    throw new Exception("Could not find Zip file Directory at the end of the file.  File may be corrupted.");
                }
            }
            while (signature != headerSignature);
        }
    }
}
```
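For comparison, here is a minimal sketch of the same lookup against a plain seekable `Stream`, with no reflection and no SharpCompress dependency. It is not the code from the pull request: `ZipCommentReader` and `ReadComment` are hypothetical names, UTF-8 is only an assumed default encoding, and ZIP64 and multi-volume archives are ignored. It reads the tail of the file once and scans it in memory, covering the full 65,535-byte maximum comment length rather than stopping at 4096 iterations.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of SharpCompress.
public static class ZipCommentReader
{
    private const uint EndOfCentralDirectorySignature = 0x06054b50;
    private const int FixedRecordSize = 22;              // signature through the comment-length field
    private const int MaxCommentLength = ushort.MaxValue;

    public static string ReadComment(Stream stream, Encoding encoding = null)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new ArgumentException("A seekable stream is required.", nameof(stream));
        if (stream.Length < FixedRecordSize) throw new InvalidDataException("Too short to be a ZIP archive.");

        // Read the tail once instead of seeking back one byte at a time.
        int tailLength = (int)Math.Min(stream.Length, FixedRecordSize + MaxCommentLength);
        byte[] tail = new byte[tailLength];
        long originalPosition = stream.Position;
        try
        {
            stream.Seek(-tailLength, SeekOrigin.End);
            int read = 0;
            while (read < tailLength)
            {
                int n = stream.Read(tail, read, tailLength - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
        }
        finally
        {
            stream.Position = originalPosition;          // leave the caller's position untouched
        }

        // Scan backwards; the record cannot start later than tailLength - 22.
        for (int i = tailLength - FixedRecordSize; i >= 0; i--)
        {
            if (ReadUInt32LE(tail, i) != EndOfCentralDirectorySignature) continue;
            int commentLength = tail[i + 20] | tail[i + 21] << 8;
            if (i + FixedRecordSize + commentLength > tailLength) continue;   // signature bytes inside other data
            return (encoding ?? Encoding.UTF8).GetString(tail, i + FixedRecordSize, commentLength);
        }
        throw new InvalidDataException("End of central directory record not found.");
    }

    // ZIP fields are little-endian regardless of the host platform.
    private static uint ReadUInt32LE(byte[] b, int i) =>
        (uint)(b[i] | b[i + 1] << 8 | b[i + 2] << 16 | b[i + 3] << 24);
}
```

Usage would be along the lines of `ZipCommentReader.ReadComment(File.OpenRead(path))`. An archive without a comment yields an empty string.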


@Numpsy commented on GitHub (Dec 14, 2018):

Hi,

I thought this would be a useful feature to have, so I had a look at the code and noticed that ZipArchive.LoadEntries contains code to populate the volume comment from the DirectoryEndHeader, but that didn't seem to be getting called.

I was wondering if doing something like https://github.com/Numpsy/sharpcompress/commit/12a6d3977e548c3ccdf280ec6a88215552819e25 to return the DirectoryEndHeader from ReadSeekableHeader in order to populate the volume comment would be a reasonable place to start? (You do have to have loaded the entries before reading the comment from the volume, though.)


@adamhathcock commented on GitHub (Dec 19, 2018):

I'm pretty sure the Archive API uses the directory at the end of the file to find entries and seek to them.

Maybe I'm misunderstanding the proposal.


@Numpsy commented on GitHub (Dec 19, 2018):

The ZipArchive.LoadEntries function contains the code:

            foreach (ZipHeader h in headerFactory.ReadSeekableHeader(stream))
            {
                if (h != null)
                {
                    switch (h.ZipHeaderType)
                    {
                        case ZipHeaderType.DirectoryEntry:
                            {
                                yield return new ZipArchiveEntry(this,
                                                                 new SeekableZipFilePart(headerFactory,
                                                                                         h as DirectoryEntryHeader,
                                                                                         stream));
                            }
                            break;
                        case ZipHeaderType.DirectoryEnd:
                            {
                                byte[] bytes = (h as DirectoryEndHeader).Comment;
                                volume.Comment = ReaderOptions.ArchiveEncoding.Decode(bytes);
                                yield break;
                            }
                    }
                }
            }

Where the ZipHeaderType.DirectoryEnd case populates the zip volume comment from the comment field in the DirectoryEndHeader.
However, the ReadSeekableHeader function that it calls only seems to return the DirectoryEntryHeader instances, so that case wasn't getting hit.

My thought was that changing ReadSeekableHeader to return the DirectoryEndHeader as well as the DirectoryEntryHeaders would cause the volume comment to be populated (that's what my linked changeset does), which is a means of getting at the comment data at least, even though it's only done when the entries are loaded.

Given that SeekableZipHeaderFactory.ReadSeekableHeader does actually create and populate an instance of DirectoryEndHeader up front (including reading the comment) in order to do the rest of the work, I imagine that you could make use of that directly rather than just returning it at the end of the entries collection? (It would be nice to efficiently get the comment out of the file without having to parse the entries, if possible.) I am, however, working out how this stuff works as I go along, so I'm not sure of the best approach to that.
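The shape of that proposal can be sketched with stand-in types. Nothing below is SharpCompress code: `Header`, `EntryHeader`, `EndHeader` and `HeaderWalk` are hypothetical names that only mirror the pattern described above, where the header sequence ends with the directory-end header so that the consumer's second `case` is finally reached.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the library's header types.
public abstract class Header { }
public sealed class EntryHeader : Header { public string Name = ""; }
public sealed class EndHeader : Header { public string Comment = ""; }

public static class HeaderWalk
{
    // Producer: yields every entry header, then the end header as the last item.
    public static IEnumerable<Header> ReadHeaders(IEnumerable<string> names, string comment)
    {
        foreach (var name in names)
        {
            yield return new EntryHeader { Name = name };
        }
        yield return new EndHeader { Comment = comment };
    }

    // Consumer: the comment only becomes known once the sequence has been walked to the end.
    public static (List<string> Entries, string Comment) Load(IEnumerable<Header> headers)
    {
        var entries = new List<string>();
        string comment = null;
        foreach (var h in headers)
        {
            switch (h)
            {
                case EntryHeader e:
                    entries.Add(e.Name);
                    break;
                case EndHeader end:
                    comment = end.Comment;
                    break;
            }
        }
        return (entries, comment);
    }
}
```

The sketch also shows the limitation mentioned above: because the producer is lazy, the comment stays unset until every entry has been enumerated, which is why exposing the already-parsed end header directly would be cheaper.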


@adamhathcock commented on GitHub (Dec 21, 2018):

I would definitely like to fix the ZipArchive not using the directory if that's true. I'm not sure when I'll be able to get around to it over the holiday season and coming months.

I'll have to refresh myself with the zip archive format, but if the archive comment has the directory offset then that would definitely be more efficient than backwards scanning, which I think was my previous method.
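On the format question: in the ZIP layout the directory offset is a field of the end-of-central-directory record, the same record that ends with the archive comment, rather than part of the comment itself. Since the comment has variable length, finding that record generally still takes a backward scan from the end of the file. A minimal sketch of decoding the fixed 22-byte part of the record follows; `EndOfCentralDirectory` and `Parse` are hypothetical names, not SharpCompress types, and ZIP64 is not handled.

```csharp
using System;

// Hypothetical type illustrating the fixed part of the end-of-central-directory record.
public sealed class EndOfCentralDirectory
{
    public ushort TotalEntries { get; private set; }
    public uint DirectorySize { get; private set; }
    public uint DirectoryOffset { get; private set; }
    public ushort CommentLength { get; private set; }

    public static EndOfCentralDirectory Parse(byte[] record)
    {
        if (record == null || record.Length < 22)
            throw new ArgumentException("Need at least the 22 fixed bytes of the record.", nameof(record));
        if (U32(record, 0) != 0x06054b50)
            throw new FormatException("Missing end-of-central-directory signature.");

        return new EndOfCentralDirectory
        {
            // Bytes 4..9 hold the disk numbers and the per-disk entry count (skipped here).
            TotalEntries = U16(record, 10),
            DirectorySize = U32(record, 12),
            DirectoryOffset = U32(record, 16),   // where the central directory starts
            CommentLength = U16(record, 20)      // the comment bytes follow immediately
        };
    }

    // All multi-byte fields are little-endian.
    private static ushort U16(byte[] b, int i) => (ushort)(b[i] | b[i + 1] << 8);

    private static uint U32(byte[] b, int i) =>
        (uint)(b[i] | b[i + 1] << 8 | b[i + 2] << 16 | b[i + 3] << 24);
}
```

With the record parsed, a reader can seek straight to `DirectoryOffset` and read `TotalEntries` directory entries, and the comment is available without touching the entries at all.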


Reference: starred/sharpcompress#231