Read ArchiveComment #231

claunia · 2026-01-29T22:08:44Z

Originally created by @dos-ise on GitHub (Aug 22, 2017).

Is there a way to read an archive comment?
I found a way to write one, but not to read it.

claunia added the "enhancement" and "up for grabs" labels 2026-01-29 22:08:44 +00:00

@dos-ise commented on GitHub (Jan 18, 2018):

I added a pull request for this feature:

https://github.com/adamhathcock/sharpcompress/pull/341

At the moment I am using the following extension.

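```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;

using SharpCompress.Archives.Zip;

namespace Knx.Ets.Osprey
{
    public static class ZipArchiveExtension
    {
        // Upper bound on how many positions are probed, starting from the end of the stream.
        // A zip comment can be up to 65535 bytes long, so comments longer than this
        // limit will not be found.
        private const int MAX_ITERATIONS_FOR_DIRECTORY_HEADER = 4096;

        // Signature of the end-of-central-directory record ("PK\x05\x06", little-endian).
        private const uint DIRECTORY_END_HEADER_BYTES = 0x06054b50;

        public static string Comment(this ZipArchive archive)
        {
            var onlyVolume = archive.Volumes.Single();

            // The volume's stream is not exposed publicly, so it is reached via reflection.
            var streamProperty = onlyVolume.GetType().GetProperty(
                "Stream", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);
            var stream = streamProperty?.GetValue(onlyVolume) as Stream;
            if (stream == null)
            {
                throw new InvalidOperationException("Could not access the underlying stream of the zip volume.");
            }

            // Remember the position so the archive's stream is left where it was found.
            long originalPosition = stream.Position;
            try
            {
                var reader = new BinaryReader(stream);
                SeekBackToHeader(stream, reader, DIRECTORY_END_HEADER_BYTES);

                // Fixed fields that follow the signature; only the comment length is needed here.
                reader.ReadUInt16(); // volume number
                reader.ReadUInt16(); // first volume with directory
                reader.ReadUInt16(); // total number of entries on this disk
                reader.ReadUInt16(); // total number of entries
                reader.ReadUInt32(); // directory size
                reader.ReadUInt32(); // directory start offset relative to disk
                var commentLength = reader.ReadUInt16();
                var comment = reader.ReadBytes(commentLength);

                return Encoding.UTF8.GetString(comment, 0, comment.Length);
            }
            finally
            {
                stream.Position = originalPosition;
            }
        }

        // Walks backwards from the end of the stream, one byte at a time, until the
        // four bytes just read match the given header signature.
        private static void SeekBackToHeader(Stream stream, BinaryReader reader, uint headerSignature)
        {
            long offset = 0;
            uint signature;
            int iterationCount = 0;
            do
            {
                if ((stream.Length + offset) - 4 < 0)
                {
                    throw new Exception("Failed to locate the Zip Header");
                }
                stream.Seek(offset - 4, SeekOrigin.End);
                signature = reader.ReadUInt32();
                offset--;
                iterationCount++;
                if (iterationCount > MAX_ITERATIONS_FOR_DIRECTORY_HEADER)
                {
                    throw new Exception("Could not find Zip file Directory at the end of the file.  File may be corrupted.");
                }
            }
            while (signature != headerSignature);
        }
    }
}
```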

@Numpsy commented on GitHub (Dec 14, 2018):

Hi,

I thought this would be a useful feature to have, so I had a look at the code and noticed that ZipArchive.LoadEntries contains code to populate the volume comment from the DirectoryEndHeader, but that code didn't seem to be getting called.

Would something like https://github.com/Numpsy/sharpcompress/commit/12a6d3977e548c3ccdf280ec6a88215552819e25, which returns the DirectoryEndHeader from ReadSeekableHeader in order to populate the volume comment, be a reasonable place to start? (You do have to load the entries before reading the comment from the volume, though.)

@adamhathcock commented on GitHub (Dec 19, 2018):

I'm pretty sure the Archive API uses the central directory at the end of the file to find entries and seek to them.

Maybe I'm misunderstanding the proposal.

@Numpsy commented on GitHub (Dec 19, 2018):

The ZipArchive.LoadEntries function contains the code:

```csharp
foreach (ZipHeader h in headerFactory.ReadSeekableHeader(stream))
{
    if (h != null)
    {
        switch (h.ZipHeaderType)
        {
            case ZipHeaderType.DirectoryEntry:
                {
                    yield return new ZipArchiveEntry(this,
                                                     new SeekableZipFilePart(headerFactory,
                                                                             h as DirectoryEntryHeader,
                                                                             stream));
                }
                break;
            case ZipHeaderType.DirectoryEnd:
                {
                    byte[] bytes = (h as DirectoryEndHeader).Comment;
                    volume.Comment = ReaderOptions.ArchiveEncoding.Decode(bytes);
                    yield break;
                }
        }
    }
}
```

Here the ZipHeaderType.DirectoryEnd case populates the zip volume comment from the comment field in the DirectoryEndHeader.
However, the ReadSeekableHeader function that it calls only seems to return the DirectoryEntryHeader instances, so that case is never hit.

My thought was that changing ReadSeekableHeader to return the DirectoryEndHeader as well as the DirectoryEntryHeaders would cause the volume comment to be populated (that's what my linked changeset does). That is at least a means of getting at the comment data, even though it only happens when the entries are loaded.

SeekableZipHeaderFactory.ReadSeekableHeader does actually create and populate an instance of DirectoryEndHeader up front (including reading the comment) in order to do the rest of its work. So I imagine you could make use of that instance directly, rather than just returning it at the end of the entries collection? It would be nice to get the comment out of the file efficiently, without having to parse the entries, if possible. I am, however, working out how this stuff works as I go along, so I'm not sure of the best approach.
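
As a rough illustration of reading the comment without parsing any entries, here is a minimal sketch that works on a plain seekable stream and does not touch SharpCompress at all. The class and method names (`ZipCommentSketch`, `ReadComment`) are hypothetical, and decoding the comment as UTF-8 by default is an assumption. It reads the last 22 + 65535 bytes at most (the fixed record size plus the largest possible comment) and scans that buffer backwards for the end-of-central-directory signature.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of SharpCompress.
public static class ZipCommentSketch
{
    private const int FixedRecordSize = 22;               // record size without the comment
    private const int MaxCommentLength = ushort.MaxValue; // comment length is a 16-bit field

    // Returns the archive comment, or null when no end-of-central-directory record is found.
    public static string ReadComment(Stream stream, Encoding encoding = null)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new NotSupportedException("A seekable stream is required.");
        encoding = encoding ?? Encoding.UTF8; // assumption: the comment is UTF-8 text

        // Read the only part of the file that can contain the record.
        int tailLength = (int)Math.Min(stream.Length, FixedRecordSize + MaxCommentLength);
        if (tailLength < FixedRecordSize) return null;

        var tail = new byte[tailLength];
        stream.Seek(-tailLength, SeekOrigin.End);
        int read = 0;
        while (read < tailLength)
        {
            int n = stream.Read(tail, read, tailLength - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        // Scan backwards in memory for the signature bytes "PK\x05\x06".
        for (int i = tailLength - FixedRecordSize; i >= 0; i--)
        {
            if (tail[i] != 0x50 || tail[i + 1] != 0x4B || tail[i + 2] != 0x05 || tail[i + 3] != 0x06)
            {
                continue;
            }

            // The little-endian comment length sits in the last two bytes of the fixed part.
            int commentLength = tail[i + 20] | (tail[i + 21] << 8);
            if (i + FixedRecordSize + commentLength > tailLength)
            {
                continue; // the signature bytes were a coincidence, keep looking
            }

            return encoding.GetString(tail, i + FixedRecordSize, commentLength);
        }

        return null;
    }
}
```

Calling `ZipCommentSketch.ReadComment(File.OpenRead(path))` would be one way to try it. Reading the tail once and scanning in memory avoids one seek per probed byte, at the cost of a buffer of up to about 64 KiB.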

@adamhathcock commented on GitHub (Dec 21, 2018):

I would definitely like to fix ZipArchive not using the central directory, if that's true. I'm not sure when I'll be able to get around to it over the holiday season and the coming months.

I'll have to refresh my memory of the zip archive format, but if the archive comment has the directory offset then that would definitely be more efficient than backwards scanning, which I think was my previous method.
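
For reference, in the ZIP format the directory offset is stored in the fixed part of the end-of-central-directory record, just before the comment, rather than in the comment text itself. Because the comment is variable-length and comes last, locating that record generally still involves a short backward scan for its signature. The sketch below decodes the fixed fields once the record's position is known. The type and member names are hypothetical, and ZIP64 archives (where these 32-bit fields can hold 0xFFFFFFFF placeholders) are not handled.

```csharp
using System;
using System.IO;

// Hypothetical type for illustration; SharpCompress has its own DirectoryEndHeader.
public sealed class EndOfCentralDirectorySketch
{
    public const int FixedSize = 22;

    public ushort TotalEntries { get; private set; }
    public uint DirectorySize { get; private set; }
    public uint DirectoryOffset { get; private set; }
    public ushort CommentLength { get; private set; }

    // Decodes the fixed part of the record that starts at buffer[start].
    public static EndOfCentralDirectorySketch Parse(byte[] buffer, int start)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (start < 0 || buffer.Length - start < FixedSize)
        {
            throw new ArgumentException("Not enough bytes for the fixed part of the record.");
        }
        if (ReadUInt32(buffer, start) != 0x06054b50)
        {
            throw new InvalidDataException("Signature not found at the given position.");
        }

        return new EndOfCentralDirectorySketch
        {
            TotalEntries = ReadUInt16(buffer, start + 10),    // entries across all disks
            DirectorySize = ReadUInt32(buffer, start + 12),   // size of the central directory
            DirectoryOffset = ReadUInt32(buffer, start + 16), // where the central directory starts
            CommentLength = ReadUInt16(buffer, start + 20),   // comment bytes follow the record
        };
    }

    // All multi-byte fields in the record are little-endian.
    private static ushort ReadUInt16(byte[] b, int o) =>
        (ushort)(b[o] | (b[o + 1] << 8));

    private static uint ReadUInt32(byte[] b, int o) =>
        (uint)b[o] | ((uint)b[o + 1] << 8) | ((uint)b[o + 2] << 16) | ((uint)b[o + 3] << 24);
}
```

With `DirectoryOffset` and `DirectorySize` in hand, a reader can seek straight to the central directory and walk its entries forward, so the backward scan only ever covers the tail of the file.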

claunia referenced this issue

2026-01-29 22:18:12 +00:00

[PR #231] Vs2017 #919

Reference: starred/sharpcompress#231