mirror of
https://github.com/aaru-dps/docs.git
synced 2025-12-16 11:14:37 +00:00
120 lines
6.6 KiB
Plaintext
120 lines
6.6 KiB
Plaintext
|
||
*** LHA, LZH, LZS (LHArc compressed files)
|
||
*** Document revision: 1.3
|
||
*** Last updated: March 11, 2004
|
||
*** Compiler/Editor: Peter Schepers
|
||
*** Contributors/samples: Joe Forster/STA, net documents
|
||
|
||
These files are created with LHA on the C64 (or C128), and can present
|
||
special problems to the typical PC user. The compression used is LH1, an
|
||
old method used on LZH 1.xx (pre-version 2), so any version of LHA on the
|
||
PC can uncompress them. However, LHA allows filenames of up to 18
|
||
characters long, and DOS doesn't know how to handle them (Windows 95 unLHA
|
||
utilities will extract the full filename). Usually, some of the files
|
||
already uncompressed will be overwritten by other files just being
|
||
uncompressed because the name seems the same to DOS. To LHA however, the
|
||
filenames are quite different.
|
||
|
||
LHA archives always have a string two bytes into the file ("-L??-") which
|
||
describe the type of compression used. Over the development life of LHA
|
||
there have been several different compression algorithms used. The "??" in
|
||
the "-L??-" can be one of several possibilites, but on the C64 it is likely
|
||
limited to "H0" (no compression) and "H1". Newer versions of LHA/LZH use
|
||
other combinations like "H2", "H3", "H4", "H5", "ZS", "Z5", and "Z4". The
|
||
letters typically used in the compression string come from a combination of
|
||
the creators initials of the LZ algorithm, Lempel/Ziv, and the author of
|
||
the LHA program, Haruyasu Yoshizaki.
|
||
|
||
The following is a sample of an LHA header. Note the string to search for
|
||
at byte $0002:
|
||
|
||
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F ASCII
|
||
----------------------------------------------- ----------------
|
||
0000: 24 93 2D 6C 68 31 2D 39 02 00 00 16 04 00 00 00 ..-lh1-.........
|
||
0010: 08 4C 14 00 00 0E 73 79 73 2E 48 6F 75 73 65 20 ................
|
||
0020: 4D 34 00 53 DE 06 11 1C 12 C4 C8 FA 3A 5B DC CE ................
|
||
0030: B2 FA 38 1E 46 B0 B6 9E 9B 75 7A 49 71 72 B3 53 ................
|
||
0040: 6E 4E B4 A0 BF 5E 95 B3 05 8A 75 D5 6C E3 03 4A ................
|
||
0050: 2C 54 F4 AF 05 18 59 E2 F4 34 4A 0A 28 D4 33 E2 ................
|
||
0060: C4 9D 04 D7 C7 8B 91 66 0E E5 DE 98 3C 92 CC B5 ................
|
||
|
||
The header layout is fairly basic. The header for each file starts *two*
|
||
bytes before the "-lh?-" string. The above example has already been trimmed
|
||
down to start at these two bytes. Each header has the same layout, only the
|
||
length varies due to the length of the filename. Here is a breakdown of the
|
||
above example.
|
||
|
||
Bytes: $0000: 24 - Length of header (known as "LEN", not including this
|
||
and the next byte). If it is zero, we are at the end
|
||
of the file.
|
||
0001: 93 - Header checksum
|
||
0002: 2D 6C 68 31 2D - LHA compression type "-LH1-"
|
||
0007: 39 02 00 00 - Compressed file size ($00000239)
|
||
000B: 16 04 00 00 - Uncompressed file size ($00000416)
|
||
000F: 00 08 4C 14 - Time/date stamp
|
||
0013: 00 - File attribute
|
||
0014: 00 - Header level
|
||
00 = non-extended header
|
||
01, 02 = extended header
|
||
0015: 0E - Length of the following filename
|
||
0016: 73 79 73 2E 48 6F 75 - Filename, with a zero and filetype
|
||
73 65 20 4D 34 00 53 appended ("SYS.HOUSE M4<4D>S"). The
|
||
name can be up to 18 characters in
|
||
length. Note the length *includes*
|
||
the zero and filetype, making the
|
||
actual filename length 2 bytes
|
||
shorter.
|
||
0024: DE 06 - File data checksum (starts at LEN)
|
||
0026: 11 1C 12 C4 C8 FA... - File data (starts at LEN+2)
|
||
|
||
The header checksum at byte $0001 is calculated by adding the bytes in
|
||
the header from $0002 (LHA compression type) to LEN+1 (File data checksum),
|
||
without carry.
|
||
|
||
The time/date stamp (bytes $000F-$0012), is broken down as follows:
|
||
|
||
Bytes:$000F-0010: Time of last modification:
|
||
BITS 0- 4: Seconds divided by 2
|
||
(0-58, only even numbers)
|
||
BITS 5-10: Minutes (0-59)
|
||
BITS 11-15: Hours (0-23, no AM or PM)
|
||
Bytes:$0011-0012: Date of last modification:
|
||
BITS 0- 4: Day (1-31)
|
||
BITS 5- 9: Month (1-12)
|
||
BITS 10-15: Year minus 1980
|
||
|
||
The format of the compressed data is much too complex to get into here.
|
||
Understanding the layout would require knowledge of Huffman coding and
|
||
sliding dictionaries, and is nowhere near as simple as ZipCode! The
|
||
description given in the LHA source code for the different compression
|
||
modes are as follows:
|
||
|
||
-lh0- no compression, file stored
|
||
-lh1- 4k sliding dictionary (max 60 bytes) + dynamic Huffman + fixed
|
||
encoding of position
|
||
-lh2- 8k sliding dictionary (max 256 bytes) + dynamic Huffman
|
||
-lh3- 8k sliding dictionary (max 256 bytes) + static Huffman
|
||
-lh4- 4k sliding dictionary (max 256 bytes) + static Huffman +
|
||
improved encoding of position and trees
|
||
-lh5- 8k sliding dictionary (max 256 bytes) + static Huffman +
|
||
improved encoding of position and trees
|
||
-lzs- 2k sliding dictionary (max 17 bytes)
|
||
-lz4- no compression, file stored
|
||
-lz5- 4k sliding dictionary (max 17 bytes)
|
||
|
||
There are several utilities that you can use to decompress these files,
|
||
like the already-mentioned LHA on the PC, or Star LHA, one of the many
|
||
excellent utilities contained in the Star Commander distribution package.
|
||
If you use Star LHA, keep in mind it needs the program LHA v2.14 (or newer)
|
||
to extract. If an older version of LHA is used (such as the common version
|
||
2.13), then the files being extracted will be corrupt. It will extract the
|
||
files directly into a D64 image, so the long C64 filenames will not be
|
||
lost.
|
||
|
||
To an emulator user there is no use to these files, as their only real
|
||
usage on a C64 was for storage and transmission benefits. The standard
|
||
compression program on the PC is PKZIP (or ZIP compatibles), so unless you
|
||
have some need to send *compressed* files back the C64, there is no use in
|
||
using LHA.
|
||
|