mirror of
https://github.com/aaru-dps/libaaruformat.git
synced 2025-12-16 19:24:40 +00:00
280 lines
9.2 KiB
Plaintext
=== Deduplication Table (`DDT2`)

The deduplication table is a multi-level table of pointers to LBAs contained in the image.
It starts with the following header.

[source,c]
typedef struct DdtHeader2
{
    uint32_t identifier;          ///< Block identifier, must be BlockType::DeDuplicationTable2.
    uint16_t type;                ///< Data classification (\ref DataType) for sectors referenced by this table.
    uint16_t compression;         ///< Compression algorithm for this table body (\ref CompressionType).
    uint8_t  levels;              ///< Total number of hierarchy levels (root depth); > 0.
    uint8_t  tableLevel;          ///< Zero-based level index of this table (0 = root, increases downward).
    uint64_t previousLevelOffset; ///< Absolute byte offset of the parent (previous) level table; 0 if root.
    uint16_t negative;            ///< Leading negative LBA count; added to external L to build internal index.
    uint64_t blocks;              ///< Total internal span (negative + usable + overflow) in logical sectors.
    uint16_t overflow;            ///< Trailing dumped sectors beyond user area (overflow range), still mapped with entries.
    uint64_t start;               ///< Base internal index covered by this table (used for secondary tables; currently informational).
    uint8_t  blockAlignmentShift; ///< 2^blockAlignmentShift = block alignment boundary in bytes.
    uint8_t  dataShift;           ///< 2^dataShift = sectors represented per increment in blockIndex field.
    uint8_t  tableShift;          ///< 2^tableShift = number of logical sectors per primary entry (multi-level only; 0 for
                                  ///< single-level or secondary tables).
    uint64_t entries;             ///< Number of entries contained in (uncompressed) table payload.
    uint64_t cmpLength;           ///< Compressed payload size in bytes.
    uint64_t length;              ///< Uncompressed payload size in bytes.
    uint64_t cmpCrc64;            ///< CRC64-ECMA of compressed table payload.
    uint64_t crc64;               ///< CRC64-ECMA of uncompressed table payload.
} DdtHeader2;

==== Field Descriptions

[cols="2,2,2,6",options="header"]
|===
|Type
|Size
|Name
|Description

|uint32_t
|4 bytes
|identifier
|The deduplication table identifier, always `DDT2` or `DDTS`. The first level of a table is always `DDT2` and its presence is mandatory. Subtables use `DDTS`.

|uint16_t
|2 bytes
|type
|The data type pointed to by this table. See Annex B.

|uint16_t
|2 bytes
|compression
|The compression algorithm used in the table. See Annex C.

|uint8_t
|1 byte
|levels
|Total number of table levels. 1 means this is the only level.

|uint8_t
|1 byte
|tableLevel
|The zero-based level this table corresponds to (0 is the root).

|uint64_t
|8 bytes
|previousLevelOffset
|Absolute byte offset in the image file where the previous (parent) table level resides; 0 for the root table.

|uint16_t
|2 bytes
|negative
|The negative displacement of LBA numbers. For media that can have negative LBAs, this is the number to subtract from the table entry number to obtain the LBA.

|uint64_t
|8 bytes
|blocks
|The number of blocks in the media. This includes all blocks, including the ones covered by the negative displacement and by the overflow displacement.

|uint16_t
|2 bytes
|overflow
|The positive overflow displacement of LBA numbers. For media that can have blocks beyond the end of the user area, this is the number of such blocks in the image.

|uint64_t
|8 bytes
|start
|The first LBA contained in this table. It must be 0 for `DDT2` blocks and can be another number for `DDTS` subtables.

|uint8_t
|1 byte
|blockAlignmentShift
|Determines the block alignment boundary in bytes using the formula `1 << blockAlignmentShift`.

|uint8_t
|1 byte
|dataShift
|Determines the maximum number of data items in a block using the formula `1 << dataShift`.

|uint8_t
|1 byte
|tableShift
|Shift used to calculate the number of sectors in a deduplication table entry, using the formula `1 << tableShift`.

|uint64_t
|8 bytes
|entries
|How many pointers follow this header.

|uint64_t
|8 bytes
|cmpLength
|The size in bytes of the compressed table that follows this header.

|uint64_t
|8 bytes
|length
|The size in bytes of the table block when decompressed.

|uint64_t
|8 bytes
|cmpCrc64
|The CRC64-ECMA checksum of the compressed table that follows this header.

|uint64_t
|8 bytes
|crc64
|The CRC64-ECMA checksum of the decompressed table.
|===

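As a rough cross-check of the layout, the following sketch redeclares the header with byte packing and verifies that its size equals the sum of the field sizes in the `DdtHeader2` declaration (81 bytes). Two things here are assumptions rather than statements of this specification: that the header is stored on disk without padding, and the name `DdtHeader2Packed`, which is purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the on-disk header has no padding between fields.
 * `#pragma pack` is accepted by GCC, Clang and MSVC. */
#pragma pack(push, 1)
typedef struct DdtHeader2Packed /* hypothetical name, mirrors DdtHeader2 */
{
    uint32_t identifier;
    uint16_t type;
    uint16_t compression;
    uint8_t  levels;
    uint8_t  tableLevel;
    uint64_t previousLevelOffset;
    uint16_t negative;
    uint64_t blocks;
    uint16_t overflow;
    uint64_t start;
    uint8_t  blockAlignmentShift;
    uint8_t  dataShift;
    uint8_t  tableShift;
    uint64_t entries;
    uint64_t cmpLength;
    uint64_t length;
    uint64_t cmpCrc64;
    uint64_t crc64;
} DdtHeader2Packed;
#pragma pack(pop)

/* 4+2+2+1+1+8+2+8+2+8+1+1+1+8+8+8+8+8 = 81 bytes when packed. */
enum { DDT_HEADER2_PACKED_SIZE = 81 };

_Static_assert(sizeof(DdtHeader2Packed) == DDT_HEADER2_PACKED_SIZE,
               "unexpected padding in DdtHeader2Packed");
```

Without the packing pragma, most compilers would insert padding before the 64-bit members, so the in-memory size of the plain declaration may differ from the packed size.
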
==== Interpretation of Deduplication Table Entries

Decoding deduplication tables may seem complex at first, but the logic is structured and manageable.
Three parameters are critical for interpreting deduplication table entries:

- *block_alignment_shift*
- *table_shift*
- *data_shift*

These parameters are stored in both the master header and each deduplication table header to support reliable decoding.
In the `DdtHeader2` structure they appear as `blockAlignmentShift`, `tableShift`, and `dataShift`.

===== Block Alignment

Each block in the image is aligned to a boundary of `1 << block_alignment_shift` bytes.
Offsets stored in table entries are expressed in units of this boundary, so every referenced block must start on it.

===== Table Shift

The `table_shift` parameter defines how many blocks (or sectors) are represented by each entry, based on the deduplication table level.
In multi-level tables, each level down divides the span of an entry by `1 << table_shift`.

For example, in a three-level table with `table_shift = 9`:

[cols="1,2",options="header"]
|===
| Level
| Sectors per Entry

| 1
| (1 << table_shift)^2 = 262144

| 2
| 1 << table_shift = 512

| 3
| 1
|===

Tables with more than two levels are rare, but implementations should handle deeper hierarchies gracefully.

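The per-level span can be sketched as a small helper. This is a minimal illustration that generalizes the example table above, assuming each level down divides the span by `1 << table_shift` and that levels are counted from zero at the root as in the `tableLevel` header field. The function name `ddt_sectors_per_entry` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: number of sectors covered by one entry at a level.
 *   levels      - total number of levels (header field `levels`, > 0)
 *   table_level - zero-based level index (0 = root)
 *   table_shift - header field `tableShift`
 * Returns 0 if the arguments are inconsistent or the span does not fit
 * in 64 bits. */
static uint64_t ddt_sectors_per_entry(uint8_t levels, uint8_t table_level,
                                      uint8_t table_shift)
{
    if(levels == 0 || table_level >= levels) return 0;

    /* One factor of (1 << table_shift) for every level below this one. */
    unsigned total_shift = (unsigned)table_shift * (unsigned)(levels - 1 - table_level);
    if(total_shift >= 64) return 0;

    return UINT64_C(1) << total_shift;
}
```

With three levels and a shift of 9, the root level (index 0) yields 262144, the middle level 512, and the last level 1.
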
===== Entry Format Across Levels

In non-terminal levels (i.e., all except the last), each entry contains:

- Relevant metadata flags for its sector range
- An offset pointing to the next deduplication level

To obtain the byte offset in the image file, multiply this offset by `1 << block_alignment_shift`.

In the last level, `data_shift` splits the entry into a block offset and the index of the specific item within that data block:

.Example calculation
[source]
----
Given:
- Entry value = 0x35006
- data_shift = 5
- block_alignment_shift = 9

Step 1: Shift out the item index to get the block offset
0x35006 >> 5 = 0x1A80

Step 2: Compute byte offset
0x1A80 * (1 << 9) = 0x350000

Step 3: Determine item index
0x35006 & 0x1F = 6

Result:
Sector is stored in the data block at byte offset 0x350000, as item number 6.
----

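The same decoding can be sketched in C. This is an illustrative sketch, not the library's implementation: it takes the alignment boundary as `1 << blockAlignmentShift`, following the header field description, and it assumes the entry value passed in has already had any status bits removed, since the position of those bits is not described here. The names `DdtLocation`, `ddt_decode_last_level`, and `ddt_next_level_offset` are hypothetical, and both shifts are assumed to be below 64.

```c
#include <stdint.h>

/* Hypothetical result type: where a sector lives in the image. */
typedef struct DdtLocation
{
    uint64_t byte_offset; /* absolute byte offset of the data block */
    uint64_t item_index;  /* index of the sector inside that block  */
} DdtLocation;

/* Decode a last-level entry (status bits assumed already stripped). */
static DdtLocation ddt_decode_last_level(uint64_t entry, uint8_t data_shift,
                                         uint8_t block_alignment_shift)
{
    DdtLocation loc;
    uint64_t    item_mask = (UINT64_C(1) << data_shift) - 1;

    loc.item_index  = entry & item_mask;
    loc.byte_offset = (entry >> data_shift) << block_alignment_shift;
    return loc;
}

/* Non-terminal levels: the stored offset is in alignment units. */
static uint64_t ddt_next_level_offset(uint64_t entry_offset,
                                      uint8_t block_alignment_shift)
{
    return entry_offset << block_alignment_shift;
}
```

Shifting left by `block_alignment_shift` is equivalent to multiplying by the alignment boundary, which keeps the arithmetic in integers.
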
===== Deduplication Table Status

Each deduplication table entry includes a status nibble that conveys the current condition or interpretation of the corresponding sector(s).
The following flags are defined:

[cols="2,1,6",options="header"]
|===
|Status
|Value
|Description

|NotDumped
|`0x0`
|Sector(s) have not yet been acquired during image dumping.

|Dumped
|`0x1`
|Sector(s) have been successfully dumped without error.

|Errored
|`0x2`
|Sector(s) encountered an error during dumping and may be incomplete or corrupt.

|Mode1Correct
|`0x3`
|Sector contains valid MODE 1 data with regenerable suffix or prefix. This status is applicable only to deduplication tables of type `CdSectorPrefix` or `CdSectorSuffix`.

|Mode2Form1Ok
|`0x4`
|Sector suffix is verified and regenerable, corresponding to MODE 2 Form 1. This status is valid only for tables of type `CdSectorSuffix`.

|Mode2Form2Ok
|`0x5`
|Sector suffix matches MODE 2 Form 2 format with a valid CRC. Valid only for `CdSectorSuffix`.

|Mode2Form2NoCrc
|`0x6`
|Sector suffix matches MODE 2 Form 2 format but contains an empty or missing CRC. Applicable only to `CdSectorSuffix`.

|Twin
|`0x7`
|Pointer references a twin sector table (see below).

|Unrecorded
|`0x8`
|Sector is physically unrecorded; repeated reads return non-deterministic or random data.

|Encrypted
|`0x9`
|Sector content is encrypted and stored in its original encrypted form within the image.

|Unencrypted
|`0xA`
|Sector content was originally encrypted on media but is stored decrypted in the image.
|===

[NOTE]
====
When status values are set in a deduplication table entry that references a subordinate level, the status applies collectively to all sectors represented by that sublevel, unless the specified status explicitly overrides or alters this behavior for individual sectors.
====

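For reference, the status values in the table above could be mirrored in code roughly as follows. The enumerator and function names are illustrative choices, not identifiers defined by this specification. How the nibble is extracted from an entry is not covered, so the helper takes an already extracted status value.

```c
#include <stddef.h>

/* Status values as listed in the table; names are hypothetical. */
typedef enum DdtSectorStatus
{
    DDT_STATUS_NOT_DUMPED         = 0x0,
    DDT_STATUS_DUMPED             = 0x1,
    DDT_STATUS_ERRORED            = 0x2,
    DDT_STATUS_MODE1_CORRECT      = 0x3,
    DDT_STATUS_MODE2_FORM1_OK     = 0x4,
    DDT_STATUS_MODE2_FORM2_OK     = 0x5,
    DDT_STATUS_MODE2_FORM2_NO_CRC = 0x6,
    DDT_STATUS_TWIN               = 0x7,
    DDT_STATUS_UNRECORDED         = 0x8,
    DDT_STATUS_ENCRYPTED          = 0x9,
    DDT_STATUS_UNENCRYPTED        = 0xA
} DdtSectorStatus;

/* Returns the table name for a status value, or NULL if it is not defined. */
static const char *ddt_status_name(unsigned status)
{
    static const char *const names[] = {
        "NotDumped",    "Dumped",       "Errored",         "Mode1Correct",
        "Mode2Form1Ok", "Mode2Form2Ok", "Mode2Form2NoCrc", "Twin",
        "Unrecorded",   "Encrypted",    "Unencrypted"};

    if(status >= sizeof(names) / sizeof(names[0])) return NULL;
    return names[status];
}
```

Returning `NULL` for values `0xB` through `0xF` lets a reader treat undefined statuses explicitly instead of mapping them to a defined one.
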
===== Negative and Overflow Sectors

In most storage media, the accessible range of blocks or sectors, referred to as the *user area*, is the logical region intended for data read and write operations.

However, certain media types contain additional blocks outside this user area that are accessible through alternate means.
These blocks often hold metadata or structural information with significant preservation value.
To ensure such data is retained, these sectors must be representable within the deduplication table.

Blocks located before the start of the user area are classified as *negative sectors*.
Common examples include the first track pregap or Lead-In areas found on Compact Discs.

Conversely, sectors found beyond the end of the user area are categorized as *overflow sectors*.
Examples include replication metadata on floppy disks, typically recorded in track 81, and the Lead-Out area of Compact Discs.

To calculate the number of user area sectors represented in the deduplication table, subtract both the negative and the overflow sector counts from the total number of blocks: `blocks - negative - overflow`.
The table itself still maps the negative and overflow sectors; the subtraction only recovers the size of the standard user-accessible region.
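The arithmetic above, together with the header comment stating that `negative` is added to the external LBA to build the internal index, can be sketched as follows. Both function names are hypothetical, and the range checks are a defensive choice of this sketch rather than behavior required by the specification.

```c
#include <stdint.h>

/* User area size: total blocks minus negative and overflow sectors.
 * Returns 0 if the header values are inconsistent. */
static uint64_t ddt_user_area_sectors(uint64_t blocks, uint16_t negative,
                                      uint16_t overflow)
{
    uint64_t outside = (uint64_t)negative + (uint64_t)overflow;
    return blocks >= outside ? blocks - outside : 0;
}

/* Map an external LBA (which may be negative) to an internal table index.
 * Returns 0 on success and stores the index, or -1 if the LBA falls
 * outside the span described by the header. */
static int ddt_internal_index(int64_t lba, uint64_t blocks, uint16_t negative,
                              uint64_t *index)
{
    if(lba < -(int64_t)negative) return -1;

    uint64_t idx = (uint64_t)(lba + (int64_t)negative);
    if(idx >= blocks) return -1;

    *index = idx;
    return 0;
}
```

For a header with `blocks = 1000`, `negative = 150`, and `overflow = 50`, the user area holds 800 sectors, LBA -150 maps to internal index 0, and LBA 0 maps to internal index 150.
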