Add support for NetApp WAFL filesystem #84

Open
opened 2026-01-29 15:08:54 +00:00 by claunia · 4 comments

Originally created by @darkstar on GitHub (Aug 26, 2016).

NetApp WAFL is an advanced filesystem that works similarly to TUX3 and Btrfs. It includes a RAID subsystem with one or two parity disks (like btrfs) and uses a "walking journal" style write strategy. It has a custom partition scheme, and each disk is self-describing with respect to its position in the RAID tree.

You cannot find any documentation outside of patents and the binaries of their OS (OnTap). If you want to start on this, I have some material that can help:

  • A full-featured simulator (running on Linux) which creates small (a few GB each) disk images that are used internally just like real physical disks.
  • A binary from the "real" systems, which run on either the x86 or MIPS architecture, for disassembly (they even included a *.map file for convenience ;-))... I prefer working with the x86 binary.
  • Some documentation collected from various patents and publicly available sources on the NetApp support site.
  • Some (sample) code to parse the disk headers and RAID tree data from either virtual disks from the emulator or "real" disk images from a physical system, plus some sample disk images.

Interesting details:

  • It uses 4 KiB blocks everywhere.
  • It uses dedicated parity disks, which allows growing RAID groups without recomputing parity.
  • Some disks use 520 bytes/sector, with 8 additional bytes per sector (64 per block) of checksum data (probably T10 PI).
  • SATA disks use 512-byte sectors but reserve every ninth sector for (probably) the same parity information, so you have to do some funny calculations to arrive at the actual sectors for each block.
  • Physical "partitions" are called "aggregates", and they contain logical "volumes" that can be resized arbitrarily. Aggregates can only be grown by adding entire disks.
  • The filesystem contains forward and reverse mappings for each block (similar to what XFS has been doing recently), used for features like compression and deduplication, plus "shortcut mappings" to speed up looking up a physical block given only the "virtual" block number in the volume (a volume block number must first be translated to an aggregate block number before the block can be read from disk).
  • The RAID tree information is versioned, so different versions of the description of each "level" in the tree can coexist (if newer RAID versions introduce new features, for example).
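
The "funny calculations" for SATA disks boil down to grouping sectors in nines: eight 512-byte data sectors make up one 4 KiB block, and the ninth sector carries the checksum. A minimal sketch, assuming the checksum sector trails each group of eight data sectors (the exact on-disk layout is a guess):

```python
BLOCK_SIZE = 4096                          # WAFL uses 4 KiB blocks everywhere
SECTOR_SIZE = 512                          # SATA disks expose 512-byte sectors
DATA_SECTORS = BLOCK_SIZE // SECTOR_SIZE   # 8 data sectors per block
GROUP = DATA_SECTORS + 1                   # plus 1 checksum sector = 9

def block_to_sectors(block: int) -> tuple[list[int], int]:
    """Return the data sector numbers of a 4 KiB block and the sector
    holding its checksum, assuming a trailing checksum sector per group
    of eight data sectors (layout assumed, not confirmed)."""
    first = block * GROUP
    data = list(range(first, first + DATA_SECTORS))
    checksum = first + DATA_SECTORS
    return data, checksum
```

Under that assumption, block 2 would occupy sectors 18-25 with its checksum in sector 26.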

If you're interested I could send you the code, although I'm kinda reluctant to put it up on GitHub directly since it has not been clean-room reverse engineered and I don't want to anger NetApp's lawyers ;-)

claunia added the feature request and filesystem labels 2026-01-29 15:08:54 +00:00

@darkstar commented on GitHub (Aug 26, 2016):

Oh, btw: this could be tricky with the current implementation, since you'd need to open multiple files concurrently. But the same is true for full btrfs support (since it does its own RAID), for some VMDK files (those that are split at 2 GB boundaries), and for things like BIN/CUE, etc.

Probably, opening one of these files should either make the program scan the containing directory for "matching" files by itself (using a fixed name; using a fixed extension with a wildcard name; or taking the extension from the first file and scanning for all other files with that extension), or it should prompt the user to select/find the "missing" files.
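
The third strategy can be sketched like this: take the extension of the opened file and collect every sibling that shares it. The function name and policy are hypothetical; a real implementation would fall back to prompting the user for pieces it cannot find.

```python
from pathlib import Path

def find_companions(opened: str) -> list[Path]:
    """Collect every file in the same directory that shares the opened
    file's extension (case-insensitive), the opened file included, in
    sorted order so split pieces line up."""
    p = Path(opened)
    ext = p.suffix.lower()
    return sorted(q for q in p.parent.iterdir()
                  if q.is_file() and q.suffix.lower() == ext)
```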


@claunia commented on GitHub (Aug 26, 2016):

Multiple-file disc images (e.g. Cuesheet.cue, Track1.bin, Track2.bin) are easily supported. Multiple-disk volumes are not; an API would need to be designed for that.

About "clean-room reverse engineering": I'm not a lawyer, but if I understood mine well when he explained it, in the European Union, when it is done for interoperability (DiscImageChef can fall into this category), you can disassemble the code and look at how it works (but not look at confidential source code), as long as you write the interoperable application yourself and do not publish the direct findings or disassembled code.

In a nutshell: as long as you have not seen confidential information, you must code it yourself; you cannot send me the information for me to code it.

If the information was obtained just from guessing (no disassembly involved, just by looking at the on-disk structures), you can publish it.

Also, currently the priority for any filesystem is to identify it and get information about it (implementing only the Identify() and GetInformation() methods). Full read-only support is the lowest priority.

And per your description, read-only support will require parity and compression support. Getting a dynamic Reed-Solomon (one with changeable parameters) and a compression API are high priorities right now.
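
A Reed-Solomon "with changeable parameters" can be sketched as a systematic encoder over GF(2^8) where the number of parity symbols is a runtime argument. The field polynomial 0x11D and this API are illustrative choices, not DiscImageChef's actual design:

```python
PRIM = 0x11D  # a common GF(2^8) field polynomial (illustrative choice)

# Log/antilog tables for fast multiplication in GF(2^8).
EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):        # doubled table avoids a modulo in gf_mul
    EXP[_i] = EXP[_i - 255]

def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_poly_mul(p: list[int], q: list[int]) -> list[int]:
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gf_mul(a, b)
    return r

def rs_generator(nsym: int) -> list[int]:
    """Generator polynomial prod_{i<nsym} (x + alpha^i), highest degree first."""
    g = [1]
    for i in range(nsym):
        g = gf_poly_mul(g, [1, EXP[i]])
    return g

def rs_encode(msg: list[int], nsym: int) -> list[int]:
    """Systematic encoding: append the remainder of msg(x)*x^nsym / g(x)."""
    gen = rs_generator(nsym)
    buf = list(msg) + [0] * nsym
    for i in range(len(msg)):          # polynomial long division over GF(2^8)
        coef = buf[i]
        if coef:
            for j in range(1, len(gen)):
                buf[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + buf[len(msg):]
```

Because the parity count is just an argument, the same encoder serves different media by swapping parameters; a valid codeword evaluates to zero at every root alpha^i of the generator.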

I'm going to add the compression API as soon as I solve all current issues (except variable track sizes), as several disc image formats depend on it.


@darkstar commented on GitHub (Aug 26, 2016):

Okay, I think I understand what you're saying about the reverse-engineering issue...

Compression is not necessarily required, as it is an optional feature in later versions of the OS. RS decoders are also not required, since OnTap uses only XOR calculations in its RAID implementation.
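
For reference, XOR-only parity amounts to RAID-4-style dedicated parity: the parity block is the byte-wise XOR of the data blocks in a stripe, and because XOR is its own inverse, the same operation rebuilds a lost block. A minimal sketch:

```python
def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks. Computes the parity block of
    a stripe, and also reconstructs any single missing block when called
    with the parity plus the surviving data blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)
```

This is also why growing a RAID group needs no parity recomputation: a freshly zeroed disk XORs in as all zeroes, leaving the existing parity valid.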

However, I think a CRC API would be good, since lots of disk image formats do some kind of checksum to verify data. I have a very flexible CRC implementation with lots of different CRC parameters/polynomials already in place that I can offer. It could easily be extended to support other checksums as well.

You can find it here: http://pastebin.com/6fYUYNPA
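
For context, a CRC "with lots of different parameters" usually means the Rocksoft model: width, polynomial, initial value, input/output reflection, and final XOR. A bitwise sketch of that model, assuming widths that are multiples of 8 (the linked pastebin code may differ):

```python
def crc(data: bytes, width: int, poly: int, init: int,
        refin: bool, refout: bool, xorout: int) -> int:
    """Parameterised bitwise CRC in the Rocksoft model.
    Assumes width is a multiple of 8 and at least 8."""
    mask = (1 << width) - 1
    reg = init & mask
    for byte in data:
        if refin:                              # reflect each input byte
            byte = int(f"{byte:08b}"[::-1], 2)
        reg ^= byte << (width - 8)
        for _ in range(8):                     # shift out one byte, MSB first
            if reg & (1 << (width - 1)):
                reg = ((reg << 1) ^ poly) & mask
            else:
                reg = (reg << 1) & mask
    if refout:                                 # reflect the final register
        reg = int(f"{reg:0{width}b}"[::-1], 2)
    return reg ^ xorout
```

With this model, CRC-16/ARC (the "ANSI" CRC-16) is `crc(data, 16, 0x8005, 0, True, True, 0)`, and ISO CRC-32 is `crc(data, 32, 0x04C11DB7, 0xFFFFFFFF, True, True, 0xFFFFFFFF)`.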


@claunia commented on GitHub (Aug 27, 2016):

There is already a CRC API in https://github.com/claunia/DiscImageChef/tree/master/DiscImageChef.Checksums that includes ANSI CRC-16, ISO CRC-32 and ECMA CRC-64. The API also supports Adler-32, MD5, RIPEMD-160, SHA1, SHA2 and SpamSum.

Teledisk and CopyQM are using their own CRC implementations because they're the only ones using them at all, but the existing implementations could be easily made to work with variable polys and parameters.

RS is nonetheless required for CDs, DVDs, Blu-rays, magneto-opticals, SCSI/ATA hard disks, etc., so having an implementation with variable parameters is really a priority. Currently there is one with the CD main-channel parameters hard-coded (it does not work with the CD subchannel, let alone other media).

Reference: aaru-dps/Aaru-aaru-dps#84