Question: How to get XZ uncompressed size #455

Open
opened 2026-01-29 22:12:21 +00:00 by claunia · 5 comments

Originally created by @x1unix on GitHub (Apr 26, 2021).

Hello. As far as I know, the XZ format has an index section that contains archive metadata, most notably the uncompressed size.

I've skimmed through the XZ implementation in this package, and it looks like SharpCompress can read the XZ index, but it's impossible to get XZBlock information without reading and decompressing the whole archive.

How can I get the XZ index information using this library without extracting the archive contents?

It would be nice to have the uncompressed stream size populated in the `Length` property.
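
For context, here is a minimal, dependency-free sketch of how the index can be located from the end of a file, based on the stream footer layout in the XZ file format specification (4-byte CRC32, 4-byte Backward Size, 2-byte Stream Flags, magic bytes `YZ`). The class and method names (`XzFooterSketch`, `LocateIndex`) are hypothetical and not part of SharpCompress. The sketch assumes a single XZ stream with no trailing stream padding and does not verify the footer CRC32.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper for illustration; not a SharpCompress API.
public static class XzFooterSketch
{
    // CRC32 (4) + Backward Size (4) + Stream Flags (2) + magic "YZ" (2)
    private const int FooterSize = 12;

    // Returns the absolute offset of the Index field and reports its size in bytes.
    // Assumes one XZ stream with no stream padding; the footer CRC32 is not checked.
    public static long LocateIndex(Stream stream, out long indexSize)
    {
        if (!stream.CanSeek)
            throw new NotSupportedException("A seekable stream is required.");
        if (stream.Length < 2 * FooterSize)
            throw new InvalidDataException("Too short to be an XZ stream.");

        // Read the last 12 bytes of the stream.
        var footer = new byte[FooterSize];
        stream.Seek(-FooterSize, SeekOrigin.End);
        int read = 0;
        while (read < FooterSize)
        {
            int n = stream.Read(footer, read, FooterSize - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        if (footer[10] != (byte)'Y' || footer[11] != (byte)'Z')
            throw new InvalidDataException("XZ footer magic bytes not found.");

        // Backward Size is stored little-endian as (real size / 4) - 1.
        uint stored = BinaryPrimitives.ReadUInt32LittleEndian(footer.AsSpan(4, 4));
        indexSize = ((long)stored + 1) * 4;

        // The index sits directly before the footer and after the 12-byte header.
        long offset = stream.Length - FooterSize - indexSize;
        if (offset < FooterSize)
            throw new InvalidDataException("Backward Size points outside the stream.");
        return offset;
    }
}
```

Calling `XzFooterSketch.LocateIndex(stream, out var indexSize)` on a seekable stream gives the position to seek to before parsing the index, so no block data has to be decompressed.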

claunia added the enhancement and up for grabs labels 2026-01-29 22:12:21 +00:00

@adamhathcock commented on GitHub (Apr 26, 2021):

If it's in the metadata, then it's something that's just been overlooked for whatever reason. Should be a relatively quick thing to do.


@x1unix commented on GitHub (Apr 26, 2021):

@adamhathcock as far as I understand, the uncompressed size can be calculated by reading `XZIndex`, but currently there is no way to read only the archive structure without decompressing the XZ contents (`XZStream` returns the decompressed data).

`XZIndex` becomes available only after the whole archive has been read:

XzStream.cs

```cs
public override int Read(byte[] buffer, int offset, int count)
{
    int bytesRead = 0;
    if (_endOfStream)
    {
        return bytesRead;
    }

    if (!HeaderIsRead)
    {
        ReadHeader();
    }

    bytesRead = ReadBlocks(buffer, offset, count);
    if (bytesRead < count)
    {
        _endOfStream = true;
        ReadIndex();
        ReadFooter();
    }
    return bytesRead;
}
```

@x1unix commented on GitHub (Apr 26, 2021):

There is a similar issue in a related LZMA project: https://github.com/addaleax/lzma-native/issues/15

Might be useful for implementation.


@adamhathcock commented on GitHub (Jun 4, 2021):

Zip has the same issue with streamed files where you don't know the size before compression.

We should be able to implement this size for XZ when using the Archive strategy, but not the Reader strategy.


@x1unix commented on GitHub (Jul 13, 2021):

@adamhathcock here is a simple snippet that calculates the uncompressed size of XZ contents. Hope it helps.

It works only with seekable streams. For non-seekable streams, the whole file has to be read first.

```csharp
using System.Diagnostics;
using System.IO;
using System.Linq;
using SharpCompress.Compressors.Xz; // XZFooter, XZIndex

public class XzFileInfo
{
    // The XZ stream footer is 12 bytes long according to the spec.
    private const int XzFooterSize = 12;

    public static ulong GetUncompressedSize(string filePath)
    {
        // Read-only access is enough; the stream must be seekable.
        using var file = File.OpenRead(filePath);

        // Read the footer from the end of the file.
        file.Seek(-XzFooterSize, SeekOrigin.End);
        var footer = XZFooter.FromStream(file);
        Debug.WriteLine($"BackwardSize: {footer.BackwardSize}");

        // Derive the XZ index offset from BackwardSize and seek to it.
        file.Seek(-(XzFooterSize + footer.BackwardSize), SeekOrigin.End);
        var index = XZIndex.FromStream(file, false);
        Debug.WriteLine($"Index: number of records - {index.NumberOfRecords}");

        // Sum the uncompressed size of every block; the 0UL seed keeps an empty index from throwing.
        var size = index.Records.Select(r => r.UncompressedSize).Aggregate(0UL, (acc, x) => acc + x);
        Debug.WriteLine($"Total size of uncompressed archive: {size} bytes");
        return size;
    }
}
```
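
To illustrate what the index parsing step involves, here is a rough sketch that decodes the Index field by hand, following the XZ file format specification: an indicator byte `0x00`, the number of records, then one (Unpadded Size, Uncompressed Size) pair per record, all encoded as variable-length "multibyte integers". The names `XzIndexSketch`, `ReadMultibyteInteger`, and `SumUncompressedSizes` are hypothetical and do not reflect how SharpCompress implements `XZIndex`. The sketch skips the index padding and CRC32 check and does not reject non-minimal integer encodings.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not a SharpCompress API.
public static class XzIndexSketch
{
    // Decodes one XZ multibyte integer: 7 data bits per byte, least significant
    // group first, high bit set on every byte except the last (at most 9 bytes).
    private static ulong ReadMultibyteInteger(ReadOnlySpan<byte> data, ref int pos)
    {
        ulong value = 0;
        for (int i = 0; i < 9; i++)
        {
            if (pos >= data.Length)
                throw new InvalidDataException("Truncated multibyte integer.");
            byte b = data[pos++];
            value |= (ulong)(b & 0x7F) << (7 * i);
            if ((b & 0x80) == 0)
                return value;
        }
        throw new InvalidDataException("Multibyte integer is longer than 9 bytes.");
    }

    // Sums the Uncompressed Size of every record in a raw Index field.
    // Trailing index padding and the CRC32 are ignored in this sketch.
    public static ulong SumUncompressedSizes(ReadOnlySpan<byte> index)
    {
        int pos = 0;
        if (index.Length == 0 || index[pos++] != 0x00)
            throw new InvalidDataException("Missing Index Indicator byte.");

        ulong records = ReadMultibyteInteger(index, ref pos);
        ulong total = 0;
        for (ulong i = 0; i < records; i++)
        {
            ReadMultibyteInteger(index, ref pos); // Unpadded Size, not needed here
            total = checked(total + ReadMultibyteInteger(index, ref pos));
        }
        return total;
    }
}
```

The input is the raw bytes of the Index field, read from the offset that the footer's Backward Size points to. Because only the index and footer are read, the cost depends on the number of blocks rather than on the size of the compressed data.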

Reference: starred/sharpcompress#455