ArchiveFactory fails with this "tar.gz" file #610

Closed
opened 2026-01-29 22:14:34 +00:00 by claunia · 5 comments

Originally created by @jbrockerville on GitHub (Jan 3, 2024).

I made "archive-linux-tar.tar.gz" file with `tar -czf archive.tar.gz tar.gz.txt` (attachment renamed) and the `ArchiveFactory` contains an entry with a null `Key`. I think it might be using `GZipArchive` instead of `TarArchive`?

I then made "archive-linux-tar-gzip.tar.gz" with `tar -cf archive.tar tar.gz.txt` and `gzip -5 archive.tar` (attachment renamed) and that was fine.

NGL, I'm not a huge Linux guy, so I'm assuming both methods of creating a "tar.gz" are valid.

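[archive-linux-tar.tar.gz](https://github.com/adamhathcock/sharpcompress/files/13821750/archive-linux-tar.tar.gz)
[archive-linux-tar-gzip.tar.gz](https://github.com/adamhathcock/sharpcompress/files/13821792/archive-linux-tar-gzip.tar.gz)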


@Erior commented on GitHub (Jan 4, 2024):

Tar files are forward-reading archives (think old tape backup). Try out the `ReaderFactory`; I think it will work much better for you.
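To illustrate what "forward reading" means in practice, here is a minimal sketch using Python's standard `tarfile` module rather than SharpCompress. It is only an analogy for the reader-style approach: the helper names `build_tar_gz` and `list_entries_streaming` are hypothetical, and the sample archive is built in memory.

```python
import io
import tarfile


def build_tar_gz(files):
    """Build an in-memory .tar.gz from a {name: bytes} mapping (sample data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_entries_streaming(raw):
    """Visit entries strictly front to back, as a forward-only reader would."""
    found = []
    # The "r|gz" mode treats the input as a non-seekable stream of tar blocks.
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                # Each entry's data has to be consumed before moving to the next one.
                found.append((member.name, tar.extractfile(member).read()))
    return found


sample = build_tar_gz({"tar.gz.txt": b"hello"})
entries = list_entries_streaming(sample)
```

In stream mode each entry can only be read once, in order, which is the same constraint a tape-style format imposes.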

I do see that detection of `file.tar.<your fav compression>` is not really well supported by the `ArchiveFactory`; this could be improved. I'm not sure what `GZipArchive` is supposed to do, since gzip is not really an archive format as such.
Perhaps @adamhathcock can give advice on what is expected.

gzip as such may not contain a name for the file you compressed. Check the `-N, --name` option; you can run `man gzip` or search the web for more info on that.
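Whether a given gzip stream carries a name can be checked directly from its header. The following is a minimal Python sketch based on the gzip header layout in RFC 1952 (the `FNAME` flag is bit `0x08` of the flags byte); the helper names `gzip_original_name` and `gzip_bytes` are hypothetical, and the two sample streams are generated in memory instead of using the attached files.

```python
import gzip
import io

FEXTRA = 0x04  # header flag: an "extra" field precedes the name
FNAME = 0x08   # header flag: a zero-terminated original file name is present


def gzip_original_name(raw):
    """Return the file name stored in the gzip header, or None if there is none."""
    if raw[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = raw[3]
    pos = 10  # fixed part: magic(2) CM(1) FLG(1) MTIME(4) XFL(1) OS(1)
    if flags & FEXTRA:
        xlen = int.from_bytes(raw[pos:pos + 2], "little")
        pos += 2 + xlen
    if not flags & FNAME:
        return None
    end = raw.index(b"\x00", pos)
    return raw[pos:end].decode("latin-1")


def gzip_bytes(data, filename):
    """Compress data in memory; an empty filename leaves the name field out."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buf) as gz:
        gz.write(data)
    return buf.getvalue()


named = gzip_bytes(b"payload", "archive.tar")
unnamed = gzip_bytes(b"payload", "")
```

Here `gzip_original_name(named)` yields the stored name, while `gzip_original_name(unnamed)` yields `None`, which corresponds to the "no name" case discussed in this thread.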


@jbrockerville commented on GitHub (Jan 5, 2024):

Thanks for replying @Erior. The `ReaderFactory` works for all the different tar-gz files I made. However, I'm not using the `ReaderFactory` because using the `ArchiveFactory` gets me 7Zip support. The `TarArchive` class *should* handle the file I made, according to [FORMATS.md](https://github.com/adamhathcock/sharpcompress/blob/master/FORMATS.md) anyways.

> gzip as such may not contain a name for the file you compressed

Did you maybe get that backwards? The one-step file made with only `tar` has the null `Key`. The two-step file made with `tar` and then `gzip` is handled just fine.


@Erior commented on GitHub (Jan 5, 2024):

The problem is that it is not detected as a tar archive; if you skip the name and just open the internal entry stream again you would get the tar archive.

For the second part, if you did "gzip -5n" you would get the same "no name" scenario for both streams.
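The "open the internal entry stream again" idea can be sketched as a two-step read: peel off the gzip layer, then parse the decompressed bytes as tar. This is a Python standard-library illustration of the concept, not SharpCompress code; `tar_names_via_inner_stream` is a hypothetical helper, and the sample is a nameless gzip stream built in memory.

```python
import gzip
import io
import tarfile


def tar_names_via_inner_stream(raw_tar_gz):
    """Decompress the gzip layer, then reopen the payload as a tar archive."""
    # Step 1: read the file as a plain gzip stream; any header name is ignored.
    inner = gzip.GzipFile(fileobj=io.BytesIO(raw_tar_gz)).read()
    # Step 2: parse the decompressed bytes as an uncompressed tar.
    with tarfile.open(fileobj=io.BytesIO(inner), mode="r:") as tar:
        return tar.getnames()


# Build a sample: a tar with one entry, wrapped in gzip with no stored name.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tar:
    data = b"hello"
    info = tarfile.TarInfo(name="tar.gz.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

gz_buf = io.BytesIO()
with gzip.GzipFile(filename="", mode="wb", fileobj=gz_buf) as gz:
    gz.write(tar_buf.getvalue())

names = tar_names_via_inner_stream(gz_buf.getvalue())
```

The entry names come from the tar headers inside the payload, so they are available even when the gzip header stores no name.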


@jbrockerville commented on GitHub (Jan 5, 2024):

> if you skip the name and just open the internal entry stream again you would get the tar archive.

Yeah, but I don't want to do that. But that's just my preference in this specific scenario--I want all the entries to have names. But a `null` `Key` is prolly the same design decision I'd make here. Oh well. I'll work around it.

> If you did "gzip -5n" you would get the same "no name" scenario for both streams.

Hmm... I guess that's what `tar` is doing with the `-z` flag. Prolly the other compression options, too. Seems valid then.

Given all that, I guess this isn't a bug. Closing. Thanks for the discussion.


@adamhathcock commented on GitHub (Jan 8, 2024):

gzip is a compression layer around tar, which is just a file format. You can't really have random access into a `tar.gz`.

The gzip header may contain the filename of the tar, but it's not required... I haven't looked at the gzip file format for years.

Hope this helps

Reference: starred/sharpcompress#610