From 259acd0694f48b69b3eb932d79821a4496c3556a Mon Sep 17 00:00:00 2001
From: Adam Hathcock
Date: Sun, 29 Apr 2018 15:09:26 +0100
Subject: [PATCH] misc additions

---
 README.md                            |    2 +-
 build.sh                             |    2 +-
 reference/RAR 5.0 archive format.htm | 1197 ++++++++++++++++++++++++++
 3 files changed, 1199 insertions(+), 2 deletions(-)
 create mode 100644 reference/RAR 5.0 archive format.htm

diff --git a/README.md b/README.md
index b3f04c30..47fce793 100644
--- a/README.md
+++ b/README.md
@@ -38,7 +38,7 @@ I'm always looking for help or ideas. Please submit code or email with ideas. Un
 
 ## TODOs (always lots)
 
-* RAR 5 support
+* RAR 5 decryption support
 * 7Zip writing
 * Zip64 (Need writing and extend Reading)
 * Multi-volume Zip support.
diff --git a/build.sh b/build.sh
index 71dd07dc..d478e0a7 100755
--- a/build.sh
+++ b/build.sh
@@ -8,7 +8,7 @@
 # Define directories.
 SCRIPT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
 TOOLS_DIR=$SCRIPT_DIR/tools
-CAKE_VERSION=0.26.0
+CAKE_VERSION=0.27.1
 CAKE_DLL=$TOOLS_DIR/Cake.CoreCLR.$CAKE_VERSION/Cake.dll
 
 # Make sure the tools folder exist.
diff --git a/reference/RAR 5.0 archive format.htm b/reference/RAR 5.0 archive format.htm
new file mode 100644
index 00000000..ecaae5e2
--- /dev/null
+++ b/reference/RAR 5.0 archive format.htm
@@ -0,0 +1,1197 @@

RAR 5.0 archive format

Here we describe the basic data structures of the archive format introduced in RAR 5.0. If you need information about algorithms or more detailed information on data structures, please use the UnRAR source code.

Contents

Data types

vint

Variable length integer. Can include one or more bytes, where the lower 7 bits of every byte contain integer data and the highest bit of every byte is the continuation flag. If the highest bit is 0, this is the last byte in the sequence. So the first byte contains the 7 least significant bits of the integer and the continuation flag. The second byte, if present, contains the next 7 bits, and so on.

Currently the RAR format uses vint to store up to 64 bit integers, resulting in a 10 byte maximum. This value may be increased in the future if necessary.

Sometimes RAR needs to pre-allocate space for a vint before knowing its exact value. In such a situation it can allocate more space than really necessary and then fill several leading bytes with 0x80 hexadecimal, which means 0 with the continuation flag set.
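
As an illustration of the encoding described above, here is a minimal Python sketch of a vint decoder and encoder. The names `decode_vint` and `encode_vint` are made up for this example and are not taken from UnRAR; the 10 byte limit mirrors the current maximum stated above.

```python
from typing import Tuple


def decode_vint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one vint from buf starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated vint")
        byte = buf[pos]
        pos += 1
        # The lower 7 bits carry data, least significant group first.
        value |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of the sequence.
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7
        if shift > 63:
            raise ValueError("vint longer than 10 bytes")


def encode_vint(value: int) -> bytes:
    """Encode a non-negative integer using the shortest vint form."""
    if value < 0:
        raise ValueError("vint stores unsigned integers only")
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)  # last byte, continuation flag clear
            return bytes(out)
```

Because unused high groups are simply zero, an over-long sequence such as 0x85 0x80 0x00 still decodes to 5 under these rules.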

byte, uint16, uint32, uint64

Byte, 16-, 32- and 64-bit unsigned integers in little endian format.

Variable length data

We use an ellipsis (...) to denote variable length data areas.

Hexadecimal values

We use the 0x prefix to denote hexadecimal values, such as 0xf000.

General archive structure

General archive block format

Field   Size   Description
Header CRC32   uint32   CRC32 of header data starting from the Header size field and up to and including the optional extra area.
Header size   vint   Size of header data starting from the Header type field and up to and including the optional extra area. This field must not be longer than 3 bytes in the current implementation, resulting in a 2 MB maximum header size.
Header type   vint   Type of archive header. Possible values are:
  1   Main archive header.
  2   File header.
  3   Service header.
  4   Archive encryption header.
  5   End of archive header.
Header flags   vint   Flags common for all headers:
  0x0001   Extra area is present at the end of the header.
  0x0002   Data area is present at the end of the header.
  0x0004   Blocks with unknown type and this flag must be skipped when updating an archive.
  0x0008   Data area is continuing from the previous volume.
  0x0010   Data area is continuing in the next volume.
  0x0020   Block depends on the preceding file block.
  0x0040   Preserve a child block if the host block is modified.
Extra area size   vint   Size of extra area. Optional field, present only if the 0x0001 header flag is set.
Data size   vint   Size of data area. Optional field, present only if the 0x0002 header flag is set.
...   ...   Fields specific to the current block type. See the descriptions of concrete block types for details.
Extra area   ...   Optional area containing additional header fields, present only if the 0x0001 header flag is set.
Data area   ...   Optional data area, present only if the 0x0002 header flag is set. Used to store large amounts of data, such as compressed file data. Not counted in the Header CRC and Header size fields.
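
The field order in the table above can be sketched as a small Python parser. This is only an illustration under stated assumptions: `parse_block_header` and `read_vint` are hypothetical helper names, and the CRC32 is assumed to be the common zlib variant, which the text above does not spell out.

```python
import struct
import zlib


def read_vint(buf, pos):
    """Decode a vint at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_block_header(buf, pos=0):
    """Parse the fields shared by all blocks, starting at offset pos."""
    (stored_crc,) = struct.unpack_from("<I", buf, pos)
    size_pos = pos + 4
    header_size, body_pos = read_vint(buf, size_pos)
    if body_pos - size_pos > 3:
        raise ValueError("Header size field is longer than 3 bytes")
    end = body_pos + header_size  # first byte after the extra area
    if end > len(buf):
        raise ValueError("truncated header")
    # The CRC covers everything from Header size through the extra area.
    # Assumption: zlib-style CRC32.
    crc_ok = (zlib.crc32(buf[size_pos:end]) & 0xFFFFFFFF) == stored_crc
    header_type, p = read_vint(buf, body_pos)
    flags, p = read_vint(buf, p)
    extra_size = data_size = 0
    if flags & 0x0001:
        extra_size, p = read_vint(buf, p)
    if flags & 0x0002:
        data_size, p = read_vint(buf, p)
    return {
        "crc_ok": crc_ok,
        "type": header_type,
        "flags": flags,
        "specific_start": p,  # block type specific fields begin here
        "extra_start": end - extra_size,
        "data_start": end,
        "next_block": end + data_size,  # data area is outside Header size
    }
```

A reader can walk an archive by repeatedly parsing at `next_block`, which skips blocks of unknown type without needing to understand them.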

General extra area format

Extra area can include one or more records having the following format:

Size   vint   Size of record data starting from Type.
Type   vint   Record type. Different archive blocks have different associated extra area record types. Read the concrete archive block description for details. New record types can be added in the future, so unknown record types need to be skipped without interrupting an operation.
Data   ...   Record dependent data. May be missing if the record consists only of size and type.
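
The record layout above lends itself to a simple loop. The following Python sketch splits an extra area into (type, data) pairs; `parse_extra_records` and `read_vint` are hypothetical names chosen for this example. Leaving unknown types to the caller keeps them skippable, as the text requires.

```python
def read_vint(buf, pos):
    """Decode a vint at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_extra_records(area):
    """Split an extra area into a list of (record_type, record_data)."""
    records = []
    pos = 0
    while pos < len(area):
        size, pos = read_vint(area, pos)
        end = pos + size  # Size counts bytes starting from Type
        if size == 0 or end > len(area):
            raise ValueError("malformed extra area record")
        rec_type, data_pos = read_vint(area, pos)
        if data_pos > end:
            raise ValueError("record type overruns record size")
        # Data may be empty when the record has only size and type.
        records.append((rec_type, bytes(area[data_pos:end])))
        pos = end
    return records
```

A caller would typically dispatch on the record type and ignore any type it does not recognize.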

General archive layout

    Self-extracting module (optional)
    RAR 5.0 signature
    Archive encryption header (optional)
    Main archive header
    Archive comment service header (optional)

    File header 1
    Service headers (NTFS ACL, streams, etc.) for preceding file (optional).
    ...
    File header N
    Service headers (NTFS ACL, streams, etc.) for preceding file (optional).

    Recovery record (optional).
    End of archive header.

Archive blocks

Self-extracting module (SFX)

Any data preceding the archive signature. Self-extracting module size and contents are not defined. At the time of writing this documentation RAR assumes that the maximum SFX module size does not exceed 1 MB, but this value can be increased in the future.

RAR 5.0 signature

The RAR 5.0 signature consists of 8 bytes: 0x52 0x61 0x72 0x21 0x1A 0x07 0x01 0x00. You need to search for this signature in the supposed archive from the beginning and up to the maximum SFX module size. For comparison, this is the 7 byte RAR 4.x signature: 0x52 0x61 0x72 0x21 0x1A 0x07 0x00.
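
The search described above can be sketched in a few lines of Python. The function name `find_rar5_signature` and the `max_sfx_size` parameter are assumptions for this example; the 1 MB default follows the SFX limit mentioned earlier, which the text says may grow.

```python
RAR5_SIGNATURE = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x01, 0x00])
RAR4_SIGNATURE = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00])
MAX_SFX_SIZE = 1 << 20  # 1 MB, per the text; may be increased in future


def find_rar5_signature(data, max_sfx_size=MAX_SFX_SIZE):
    """Return the offset of the RAR 5.0 signature, or None if absent.

    Only offsets up to max_sfx_size are considered as signature starts.
    """
    window = data[: max_sfx_size + len(RAR5_SIGNATURE)]
    index = window.find(RAR5_SIGNATURE)
    return index if index >= 0 else None
```

The first archive block then starts at the returned offset plus 8. Note that the RAR 4.x signature is not a prefix match for this search, since its seventh byte is 0x00 rather than 0x01.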

Archive encryption header

Header CRC32   uint32
Header size   vint
Header type   vint   4
Header flags   vint   Flags common for all headers
Encryption version   vint   Version of encryption algorithm. Currently only version 0 (AES-256) is supported.
Encryption flags   vint
  0x0001   Password check data is present.
KDF count   1 byte   Binary logarithm of the iteration count for the PBKDF2 function. RAR can refuse to process a KDF count exceeding some threshold. The concrete threshold value is version dependent.
Salt   16 bytes   Salt value used globally for all encrypted archive headers.
Check value   12 bytes   Value used to verify password validity. Present only if the 0x0001 encryption flag is set. The first 8 bytes are calculated using additional PBKDF2 rounds, the last 4 bytes are an additional checksum. Together with the standard header CRC32 we have a 64 bit checksum to reliably verify this field's integrity and to distinguish an invalid password from damaged data. Further details can be found in the UnRAR source code.

This header is present only in archives with encrypted headers. Every header after this one starts with a 16 byte AES-256 initialization vector followed by encrypted header data. The size of the encrypted header data block is aligned to a 16 byte boundary.
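
To make the KDF count and alignment rules concrete, here is a hedged Python sketch that reads the encryption specific fields and derives the iteration count. The names `parse_encryption_fields` and `aligned_size` are invented for this example, and the `max_kdf_count` default of 24 is an assumed threshold, since the text only says the limit is version dependent. Key derivation and the check value computation are not shown.

```python
def read_vint(buf, pos):
    """Decode a vint at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_encryption_fields(buf, pos=0, max_kdf_count=24):
    """Parse the fields that follow Header flags; return (fields, next_pos)."""
    version, pos = read_vint(buf, pos)
    flags, pos = read_vint(buf, pos)
    needed = 1 + 16 + (12 if flags & 0x0001 else 0)
    if pos + needed > len(buf):
        raise ValueError("truncated encryption header")
    kdf_count = buf[pos]
    pos += 1
    if kdf_count > max_kdf_count:  # assumed threshold, see lead-in
        raise ValueError("KDF count exceeds the accepted threshold")
    salt = bytes(buf[pos:pos + 16])
    pos += 16
    check_value = None
    if flags & 0x0001:
        check_value = bytes(buf[pos:pos + 12])
        pos += 12
    fields = {
        "version": version,
        "flags": flags,
        # KDF count is a binary logarithm, so iterations = 2 ** kdf_count.
        "iterations": 1 << kdf_count,
        "salt": salt,
        "check_value": check_value,
    }
    return fields, pos


def aligned_size(size):
    """Round an encrypted header size up to the 16 byte AES block boundary."""
    return (size + 15) & ~15
```

For example, a KDF count of 15 corresponds to 32768 iterations, and a 17 byte header occupies 32 bytes once encrypted.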

Main archive header

Header CRC32   uint32
Header size   vint
Header type   vint   1
Header flags   vint   Flags common for all headers
Extra area size   vint   Size of extra area. Optional field, present only if the 0x0001 header flag is set.
Archive flags   vint
  0x0001   Volume. Archive is a part of a multivolume set.
  0x0002   Volume number field is present. This flag is present in all volumes except the first.
  0x0004   Solid archive.
  0x0008   Recovery record is present.
  0x0010   Locked archive.
Volume number   vint   Optional field, present only if the 0x0002 archive flag is set. Not present for the first volume, 1 for the second volume, 2 for the third and so on.
Extra area   ...   Optional area containing additional header fields, present only if the 0x0001 header flag is set.

Extra area of the main archive header can contain the following record types:

Type   Name   Description
0x01   Locator   Contains positions of different service blocks, so they can be accessed quickly, without scanning the entire archive. This record is optional. If it is missing, it is still necessary to scan the entire archive to verify presence of service blocks.

Locator record

Size   vint
Type   vint   1
Flags   vint
  0x0001   Quick open record offset is present.
  0x0002   Recovery record offset is present.
Quick open offset   vint   Distance from the beginning of the quick open service block to the beginning of the main archive header. Present only if the 0x0001 flag is set. If equal to 0, should be ignored. It can be set to zero if the preallocated space was not enough to store the resulting offset.
Recovery record offset   vint   Distance from the beginning of the recovery record service block to the beginning of the main archive header. Present only if the 0x0002 flag is set. If equal to 0, should be ignored. It can be set to zero if the preallocated space was not enough to store the resulting offset.

File header and service header

These two header types use a similar data structure, so we describe them both here.

Header CRC32   uint32
Header size   vint
Header type   vint   2 for file header, 3 for service header
Header flags   vint   Flags common for all headers
Extra area size   vint   Size of extra area. Optional field, present only if the 0x0001 header flag is set.
Data size   vint   Size of data area. Optional field, present only if the 0x0002 header flag is set. For a file header this field contains the packed file size.
File flags   vint   Flags specific to these header types:
  0x0001   Directory file system object (file header only).
  0x0002   Time field in Unix format is present.
  0x0004   CRC32 field is present.
  0x0008   Unpacked size is unknown.

  If flag 0x0008 is set, the unpacked size field is still present, but must be ignored, and extraction must be performed until reaching the end of the compression stream. This flag can be set if the actual file size is larger than reported by the OS or if the file size is unknown, such as for all volumes except the last when archiving from stdin to a multivolume archive.

Unpacked size   vint   Unpacked file or service data size.
Attributes   vint   Operating system specific file attributes in case of a file header. For a service header it might be either used for data specific needs or just reserved and set to 0.
mtime   uint32   File modification time in Unix time format. Optional, present if the 0x0002 file flag is set.
Data CRC32   uint32   CRC32 of unpacked file or service data. For files split between volumes it contains the CRC32 of the packed file data contained in the current volume, for all file parts except the last. Optional, present if the 0x0004 file flag is set.
Compression information   vint

  Lower 6 bits (0x003f mask) contain the version of the compression algorithm, resulting in possible 0 - 63 values. Current version is 0.

  7th bit (0x0040) defines the solid flag. If it is set, RAR continues to use the compression dictionary left after processing preceding files. It can be set only for file headers and is never set for service headers.

  Bits 8 - 10 (0x0380 mask) define the compression method. Currently only values 0 - 5 are used. 0 means no compression.

  Bits 11 - 14 (0x3c00 mask) define the minimum dictionary size required to extract data. Value 0 means 128 KB, 1 - 256 KB, ..., 14 - 2048 MB, 15 - 4096 MB.

Host OS   vint   Type of operating system used to create the archive.
  0x0000   Windows.
  0x0001   Unix.
Name length   vint   File or service header name length.
Name   ? bytes

  Variable length field containing Name length bytes in UTF-8 format without trailing zero.

  For a file header this is the name of the archived file. The forward slash character is used as the path separator both for Unix and Windows names. Backslashes are treated as a part of the name for Unix names and as an invalid character for Windows file names. Type of name is defined by the Host OS field.

  If a Unix file name contains any high ASCII characters which cannot be correctly converted to Unicode and UTF-8, we map such characters to the 0xE080 - 0xE0FF private use Unicode area and insert the 0xFFFE Unicode non-character into the resulting string to indicate that it contains mapped characters, which need to be converted back when extracting. The position of 0xFFFE is not defined, so we need to search the entire string for it. Such mapped names are not portable and can be correctly unpacked only on the same system where they were created.

  For a service header this field contains the name of the service header. Currently the following names are used:
    CMT   Archive comment
    QO   Archive quick open data
    ACL   NTFS file permissions
    STM   NTFS alternate data stream
    RR   Recovery record

Extra area   ...   Optional area containing additional header fields, present only if the 0x0001 header flag is set.
Data area   ...   Optional data area, present only if the 0x0002 header flag is set. Stores file data in case of a file header or service data for a service header. Depending on the compression method value in Compression information, the data can be either uncompressed (compression method 0) or compressed.
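
The bit layout of the Compression information field in the table above can be unpacked with plain masks and shifts. The Python sketch below uses the masks given in the text; the function name `decode_compression_info` and the dictionary keys are assumptions for this example.

```python
def decode_compression_info(info):
    """Split the Compression information vint into its documented parts."""
    dict_code = (info & 0x3C00) >> 10
    return {
        "version": info & 0x003F,        # lower 6 bits
        "solid": bool(info & 0x0040),    # 7th bit
        "method": (info & 0x0380) >> 7,  # bits 8 - 10, 0 means stored
        # Code 0 is 128 KB and each step doubles it, up to 4096 MB at 15.
        "dictionary_size": (128 * 1024) << dict_code,
    }
```

For instance, a value with the solid bit set, method 3 and dictionary code 5 describes a solid file that needs a 4 MB dictionary to extract.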

File and service headers use the same types of extra area records:

Type   Name   Description
0x01   File encryption   File encryption information.
0x02   File hash   File data hash.
0x03   File time   High precision file time.
0x04   File version   File version number.
0x05   Redirection   File system redirection.
0x06   Unix owner   Unix owner and group information.
0x07   Service data   Service header data array.

File encryption record

+ +

This record is present if file data is encrypted.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Sizevint
Typevint0x01
Versionvint +Version of encryption algorithm. Now only 0 version (AES-256) +is supported.
Flagsvint +   0x0001   Password check data is present.
+   0x0002   Use tweaked checksums instead of plain checksums.
+ +

If flag 0x0002 is present, RAR transforms the checksum preserving + file or service data integrity, so it becomes dependent on encryption key. + It makes guessing file contents based on checksum impossible. + It affects both data CRC32 in file header and checksums + in file hash record in extra area.

+
KDF count1 byte +Binary logarithm of iteration number for PBKDF2 function. RAR can refuse +to process KDF count exceeding some threshold. Concrete value of threshold +is version dependent. +
Salt16 bytes +Salt value to set the decryption key for encrypted file. +
IV16 bytes +AES-256 initialization vector. +
Check value12 bytes +Value used to verify the password validity. Present only if 0x0001 +encryption flag is set. First 8 bytes are calculated using additional +PBKDF2 rounds, 4 last bytes is the additional checksum. Together with +the standard header CRC32 we have 64 bit checksum to reliably verify +this field integrity and distinguish invalid password and damaged data. +Further details can be found in UnRAR source code. +

File hash record

Only the standard CRC32 checksum can be stored directly in the file header. If another hash is used, it is stored in this extra area record:

Size   vint
Type   vint   0x02
Hash type   vint
  0x00   BLAKE2sp hash function.
Hash data   ? bytes   32 bytes of BLAKE2sp for the 0x00 hash type.

For files split between volumes it contains a hash of the packed file data contained in the current volume, for all file parts except the last. For files not split between volumes and for the last parts of split files it contains a hash of the unpacked data.

File time record

This record is used if it is necessary to store creation and last access time or if the 1 second precision of the Unix mtime stored in the file header is not enough:

Size   vint
Type   vint   0x03
Flags   vint
  0x0001   Time is stored in Unix time format if this flag is set and in Windows FILETIME format otherwise
  0x0002   Modification time is present
  0x0004   Creation time is present
  0x0008   Last access time is present
mtime   uint32 or uint64   Modification time. Present if the 0x0002 flag is set. Depending on the 0x0001 flag can be in Unix time or Windows FILETIME format.
ctime   uint32 or uint64   Creation time. Present if the 0x0004 flag is set. Depending on the 0x0001 flag can be in Unix time or Windows FILETIME format.
atime   uint32 or uint64   Last access time. Present if the 0x0008 flag is set. Depending on the 0x0001 flag can be in Unix time or Windows FILETIME format.
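
As a sketch of how the flags above select the field widths, the following Python function parses the record data that follows the Type field. The name `parse_file_time_data` is hypothetical. It assumes that Unix times are uint32 and FILETIME values are uint64, and it relies on the standard FILETIME definition (100 nanosecond ticks since 1601-01-01 UTC), which is not restated in this document. Python datetimes hold microseconds, so the last FILETIME digit is dropped.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_vint(buf, pos):
    """Decode a vint at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_file_time_data(data):
    """Parse Flags and the time fields; return a dict of UTC datetimes."""
    flags, pos = read_vint(data, 0)
    is_unix = bool(flags & 0x0001)
    times = {}
    # Fields appear in this fixed order, each only if its flag is set.
    for mask, name in ((0x0002, "mtime"), (0x0004, "ctime"), (0x0008, "atime")):
        if not flags & mask:
            continue
        if is_unix:
            (raw,) = struct.unpack_from("<I", data, pos)
            pos += 4
            times[name] = UNIX_EPOCH + timedelta(seconds=raw)
        else:
            (raw,) = struct.unpack_from("<Q", data, pos)
            pos += 8
            # 10 FILETIME ticks make one microsecond.
            times[name] = FILETIME_EPOCH + timedelta(microseconds=raw // 10)
    return times
```

A record with flags 0x03 therefore carries a single 4 byte Unix mtime, while flags 0x0A carry an 8 byte FILETIME mtime followed by an 8 byte FILETIME atime.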

File version record

This record is used in archives created with the -ver switch.

Size   vint
Type   vint   0x04
Flags   vint   No file version flags are defined yet, so it is set to 0.
Version number   vint   File version number.

File system redirection record

Size   vint
Type   vint   0x05
Redirection type   vint
  0x0001   Unix symlink
  0x0002   Windows symlink
  0x0003   Windows junction
  0x0004   Hard link
  0x0005   File copy
Flags   vint
  0x0001   Link target is directory
Name length   vint   Length of link target name
Name   ? bytes   Link target name in UTF-8 format without trailing zero

Unix owner record

Size   vint
Type   vint   0x06
Flags   vint
  0x0001   User name string is present
  0x0002   Group name string is present
  0x0004   Numeric user ID is present
  0x0008   Numeric group ID is present
User name length   vint   Length of owner user name. Present if the 0x0001 flag is set.
User name   ? bytes   Owner user name in native encoding. Not zero terminated. Present if the 0x0001 flag is set.
Group name length   vint   Length of owner group name. Present if the 0x0002 flag is set.
Group name   ? bytes   Owner group name in native encoding. Not zero terminated. Present if the 0x0002 flag is set.
User ID   vint   Numeric owner user ID. Present if the 0x0004 flag is set.
Group ID   vint   Numeric owner group ID. Present if the 0x0008 flag is set.

Service data record

This record is used only by service headers to store additional parameters.

Size   vint
Type   vint   0x07
Data   ? bytes   Concrete contents of service data depend on the service header type.

End of archive header

End of archive marker. RAR does not read anything after this header, allowing third party tools to add extra information, such as a digital signature, to the archive.

Header CRC32   uint32
Header size   vint
Header type   vint   5
Header flags   vint   Flags common for all headers
End of archive flags   vint
  0x0001   Archive is a volume and it is not the last volume in the set

Service headers

RAR uses service headers based on the file header data structure to store different supplementary information.

Archive comment header

Optional header storing the main archive comment. Contains the CMT identifier in the file name field. Placed before any file headers and after the main archive header. Comment data is stored in UTF-8 immediately after the archive comment header. Currently RAR does not use compression for archive comments, so packed and unpacked data sizes in the header are equal and they both define the comment data size. Compression method in the header is set to 0.

Quick open header

Optional header storing the quick open record. Contains the QO identifier in the file name field. Placed after all file headers, but before the recovery record and end of archive header. It is possible to locate the quick open header with the locator record in the main archive header.

Quick open record data is stored immediately after the quick open header. RAR does not use compression for quick open data, so packed and unpacked data sizes in the header are equal and they both define the quick open data size. Compression method in the header is set to 0.

Quick open data is an array of data cache structures. Every data cache structure stores a portion of archived data and has the following format:

Field   Size   Description
Structure CRC32   uint32   CRC32 of structure data starting from the Structure size field.
Structure size   vint   Size of structure data starting from the Flags field. This field must not be longer than 3 bytes in the current implementation, resulting in a 2 MB maximum size.
Flags   vint   Currently set to 0.
Offset   vint   Offset from the beginning of the quick open header to the beginning of the archive data cached in the current structure. We can use this value to calculate the absolute position of archived data stored in the current structure. It is guaranteed that absolute archive positions referred to by data cache structures are always growing when going from the beginning of the structure array to the end.
Data size   vint   Size of archive data stored in the current structure.
Data   ? bytes   Archive data stored in the current structure.
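
The data cache structure above can be walked with a loop like the following Python sketch. Several points are assumptions rather than statements from this document: the names `parse_quick_open` and `read_vint` are invented, the CRC32 is taken to be the zlib variant, the absolute position is computed as the quick open header position minus Offset (the quick open header follows the file headers it caches), and the array is assumed to fill the data area with no padding.

```python
import struct
import zlib


def read_vint(buf, pos):
    """Decode a vint at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_quick_open(data, qo_header_pos):
    """Return the list of data cache structures found in quick open data."""
    entries = []
    pos = 0
    while pos < len(data):
        (stored_crc,) = struct.unpack_from("<I", data, pos)
        size_pos = pos + 4
        struct_size, body_pos = read_vint(data, size_pos)
        end = body_pos + struct_size  # Structure size counts from Flags
        if end > len(data):
            raise ValueError("truncated data cache structure")
        # Assumption: zlib-style CRC32 over Structure size and what follows.
        crc_ok = (zlib.crc32(data[size_pos:end]) & 0xFFFFFFFF) == stored_crc
        flags, p = read_vint(data, body_pos)
        offset, p = read_vint(data, p)
        data_size, p = read_vint(data, p)
        if p + data_size > end:
            raise ValueError("cached data overruns its structure")
        entries.append({
            "crc_ok": crc_ok,
            "flags": flags,
            # Assumption: Offset points backwards from the quick open header.
            "archive_pos": qo_header_pos - offset,
            "data": bytes(data[p:p + data_size]),
        })
        pos = end
    return entries
```

An entry whose `crc_ok` is false should be ignored, with the header read from its original archive position instead.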

Normally RAR uses the quick open data to store copies of file and service headers. It can store either all headers or only a part of them. If a required header is missing in the quick open data or if the structure CRC32 is invalid, the data is read from its original archive position.

Using the quick open data is optional. You can skip it completely and read only standard archive headers. But it is important to use the same access pattern when reading file names to display them to the user and when extracting files. Otherwise it would be possible to see one file name and extract another if the quick open data and the real archive data were intentionally made different, which could introduce a security threat. So if you use the quick open data when displaying the archive contents, use it when extracting. If you do not use it when displaying the archive contents, do not use it when extracting.

\ No newline at end of file