Handling concatenated GZip streams #312

Open
opened 2026-01-29 22:09:54 +00:00 by claunia · 0 comments
Owner

Originally created by @arvindshmicrosoft on GitHub (Jul 2, 2018).

Is there a way to correctly handle concatenated GZip streams such as the one here: https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2017-47/segments/1510934803848.60/wat/CC-MAIN-20171117170336-20171117190336-00002.warc.wat.gz?

There is a thread on this on StackOverflow (https://stackoverflow.com/questions/47743788/gzipstream-from-memorystream-only-returns-a-few-hundred-bytes) which in turn spawned the .NET Core issue (https://github.com/dotnet/corefx/issues/27279). In my basic test with SharpCompress, I am unable to figure out how to handle these types of files correctly. Any ideas would be welcome.
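
To make the problem concrete, here is a minimal sketch of member-by-member decoding. It is written in Python with the standard `zlib` module, not SharpCompress, and only illustrates the general technique: a concatenated GZip file is a series of independent members, and a decoder that stops after the first member returns only the first few hundred bytes. The helper name `decompress_members` and the sample data are made up for illustration.

```python
import gzip
import zlib


def decompress_members(data):
    """Decode every gzip member in `data`; return one bytes object per member.

    Hypothetical helper for illustration only, not part of any library.
    """
    members = []
    while data:
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        members.append(decoder.decompress(data) + decoder.flush())
        # Anything left after the end of this member belongs to the next one.
        data = decoder.unused_data
    return members


# Two independently compressed payloads, joined back to back,
# similar in structure to a WARC/WAT file with one member per record.
sample = gzip.compress(b"first record\n") + gzip.compress(b"second record\n")
parts = decompress_members(sample)
```

A reader that handles such files correctly would need an equivalent loop: when one member ends and input remains, start a fresh decoder on the leftover bytes instead of reporting end of stream.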

Reference: starred/sharpcompress#312