Tar archiving of huge data #261

Open
opened 2026-01-29 22:09:12 +00:00 by claunia · 8 comments

Originally created by @4ybaka on GitHub (Dec 1, 2017).

I want to use sharpcompress to create huge tar archives (TBs) without compression, but I can't for the following reasons:

  1. There is no async interface for TarWriter. Is this by design?
  2. There is no interface for writing a buffer directly. Yes, I can create another stream class to gather the data, but there is overhead in converting a buffer (that I already have) into a stream just to unwrap it back into a buffer inside sharpcompress (see the sketch below).
  3. Actually, the TarHeader class would be enough for me, but for some reason it is internal. Why?

Is there any chance that some of these issues will be resolved?
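
For illustration, a minimal sketch of what point 2 forces today: a buffer the caller already owns must be wrapped in a stream just so the writer can copy it back out internally. The file name and buffer size are made up, and the TarWriterOptions shape assumes the options present in current sharpcompress:

```csharp
using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Writers.Tar;

class BufferWrappingOverhead
{
    static void Main()
    {
        using var output = File.Create("huge.tar");
        using var writer = new TarWriter(
            output, new TarWriterOptions(CompressionType.None, finalizeArchiveOnClose: true));

        // The caller already owns this buffer (e.g. from a pool)...
        byte[] buffer = new byte[4 * 1024 * 1024];

        // ...but TarWriter only accepts a Stream, so the buffer must be
        // wrapped in a MemoryStream just to be unwrapped again internally.
        using var wrapper = new MemoryStream(buffer, 0, buffer.Length, writable: false);
        writer.Write("data.bin", wrapper, DateTime.UtcNow);
    }
}
```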


@turbolocust commented on GitHub (Dec 2, 2017):

You could capture the task and thus solve your first problem. See my solution here:
https://github.com/turbolocust/SimpleZIP/blob/master/SimpleZIP_UI/Application/Util/WriterUtils.cs

It can also be done without a child or nested task, but the immediate cancellation of the whole operation isn't as reliable then.
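
A rough sketch of that idea, assuming nothing about the linked file beyond its description: the synchronous writer loop runs inside a task the caller can await, with cooperative cancellation between entries:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using SharpCompress.Writers;

static class WriterTaskSketch
{
    // Runs the synchronous writer loop on the thread pool so the caller can
    // await it; cancellation is checked cooperatively between entries.
    public static Task WriteAllAsync(
        IWriter writer, IEnumerable<FileInfo> files, CancellationToken token)
    {
        return Task.Run(() =>
        {
            foreach (var file in files)
            {
                token.ThrowIfCancellationRequested();
                using var source = file.OpenRead();
                writer.Write(file.Name, source, file.LastWriteTimeUtc);
            }
        }, token);
    }
}
```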


@adamhathcock commented on GitHub (Dec 2, 2017):

  1. Making everything async is kind of just a matter of search/replace. I'm not 100% sure there's a benefit, but I love async/await, so hey.
  2. I'm not 100% sure what you're asking for here.
  3. Not sure how this helps.

I don’t honestly see what’s blocking the creation of large tar files. Maybe a code sample will help.


@4ybaka commented on GitHub (Dec 2, 2017):

One of the issues with archiving huge amounts of data is resuming the process (after an instance reboot, an IO failure, etc.). At the moment I have the following issues with sharpcompress:

  1. On Dispose, TarWriter will "close" the archive with a double call to PadTo512(0, true);
  2. Providing a sync stream to TarWriter means that I have to block on all async operations in the stream.
  3. In my API I have a buffer pool of 4 MB arrays, so transferring 1 GB needs 262 buffers. But in real life only about 10-30 buffers are used (they are simply reused once no longer needed). sharpcompress, however, will additionally allocate 13K buffers of 80 KB each.

If the TarHeader class were available outside of the library, it would be pretty easy to implement resume logic: if the written data length is more than the header size, just skip the header and the already-written part of the content; otherwise, serialize the header and skip the already-written part of it.
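
A sketch of that resume logic, assuming a hypothetical public header serializer (TarHeader itself is internal, so serializeHeader below is a stand-in, and alreadyWritten is whatever byte count was persisted before the interruption):

```csharp
using System;
using System.IO;

static class TarResumeSketch
{
    const int HeaderSize = 512; // a tar header is one 512-byte block

    // Resumes a single interrupted entry: 'archive' is positioned where the
    // interrupted write stopped, 'content' at the start of the entry's data.
    public static void Resume(
        Stream archive, Stream content, long alreadyWritten, Action<byte[]> serializeHeader)
    {
        if (alreadyWritten >= HeaderSize)
        {
            // Header is fully on disk: skip it and the content written so far.
            content.Seek(alreadyWritten - HeaderSize, SeekOrigin.Begin);
        }
        else
        {
            // Header was cut off: serialize it again and emit only the tail
            // that never made it to disk, then write the content from scratch.
            var header = new byte[HeaderSize];
            serializeHeader(header);
            archive.Write(header, (int)alreadyWritten, HeaderSize - (int)alreadyWritten);
        }
        content.CopyTo(archive);
    }
}
```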


@adamhathcock commented on GitHub (Dec 7, 2017):

Now using ArrayPool for Skip/Transfer: https://github.com/adamhathcock/sharpcompress/pull/326

This should help with 3.
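
For context, the rent/return pattern that change applies looks roughly like this (the 80 KB size matches the buffers mentioned above; the method shape is illustrative, not the PR's exact code):

```csharp
using System;
using System.Buffers;
using System.IO;

static class TransferSketch
{
    // Copies 'count' bytes using a pooled buffer, so repeated transfers reuse
    // arrays instead of allocating a fresh 80 KB buffer per call.
    public static void Transfer(Stream source, Stream destination, long count)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(81920); // ~80 KB
        try
        {
            while (count > 0)
            {
                int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
                if (read == 0) break; // source ended early
                destination.Write(buffer, 0, read);
                count -= read;
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```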


@adamhathcock commented on GitHub (Dec 7, 2017):

Fix for 1: https://github.com/adamhathcock/sharpcompress/pull/327

I feel like I did it that way for a reason, though.


@adamhathcock commented on GitHub (Dec 7, 2017):

I would like to make it async all the way down, but that's a bigger PR.


@4ybaka commented on GitHub (Jan 11, 2018):

@adamhathcock do you have any thoughts regarding PR?


@4ybaka commented on GitHub (Jan 16, 2018):

@adamhathcock when do you plan to create a new release? I want to use a new version with the new writer options.
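
The writer options in question presumably allow opting out of archive finalization on dispose, which is what makes resuming feasible. A sketch assuming the TarWriterOptions shape in current sharpcompress:

```csharp
using System.IO;
using SharpCompress.Common;
using SharpCompress.Writers.Tar;

class ResumableTarOpen
{
    static TarWriter OpenForAppend(string path)
    {
        var output = new FileStream(path, FileMode.Append, FileAccess.Write);
        // finalizeArchiveOnClose: false skips writing the closing 512-byte
        // padding blocks on Dispose, so the archive can be appended to later.
        return new TarWriter(
            output, new TarWriterOptions(CompressionType.None, finalizeArchiveOnClose: false));
    }
}
```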
