mirror of
https://github.com/aaru-dps/docs.git
synced 2025-12-16 19:24:38 +00:00
Added information about commodore formats
This commit is contained in:
119
Commodore/LHA.TXT
Normal file
@@ -0,0 +1,119 @@
*** LHA, LZH, LZS (LHArc compressed files)
*** Document revision: 1.3
*** Last updated: March 11, 2004
*** Compiler/Editor: Peter Schepers
*** Contributors/samples: Joe Forster/STA, net documents

  These files are created with LHA on the C64 (or C128), and can present
special problems to the typical PC user. The compression used is LH1, an
old method used by LZH 1.xx (pre-version 2), so any version of LHA on the
PC can uncompress them. However, LHA allows filenames up to 18 characters
long, and DOS doesn't know how to handle them (Windows 95 unLHA utilities
will extract the full filename). Usually, files that have already been
uncompressed are overwritten by files extracted later, because the names
look the same to DOS. To LHA, however, the filenames are quite different.

  LHA archives always have a string two bytes into the file ("-L??-") which
describes the type of compression used. Over the development life of LHA
several different compression algorithms have been used. The "??" in the
"-L??-" can be one of several possibilities, but on the C64 it is likely
limited to "H0" (no compression) and "H1". Newer versions of LHA/LZH use
other combinations like "H2", "H3", "H4", "H5", "ZS", "Z5", and "Z4". The
letters typically used in the compression string come from a combination of
the initials of the creators of the LZ algorithm, Lempel and Ziv, and of
the author of the LHA program, Haruyasu Yoshizaki.

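  The check for the compression string can be sketched in a few lines of
Python. This is only an illustration: the function name and the pattern are
assumptions built from the method IDs listed in this document, and the match
is case-insensitive because the text writes "-LH1-" while the bytes in the
file spell "-lh1-".

```python
import re

# "-l", then "h" or "z", then a digit or "s", then "-".
# Assumed pattern covering lh0-lh5, lzs, lz4 and lz5 as named in the text.
METHOD_RE = re.compile(rb"-l[hz][0-9s]-", re.IGNORECASE)

def lha_method(header: bytes):
    """Return the method ID found two bytes into the header, or None."""
    match = METHOD_RE.fullmatch(header[2:7])
    return match.group().decode("ascii").lower() if match else None

# First bytes of an LHA header: LEN, checksum, then the method string.
sample = bytes.fromhex("24 93 2D 6C 68 31 2D 39 02 00 00")
print(lha_method(sample))
```

A return value of None only means the string was not found at offset 2; it
says nothing else about what the file might be.
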
  The following is a sample of an LHA header. Note the string to search for
at byte $0002:

      00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F        ASCII
      -----------------------------------------------   ----------------
0000: 24 93 2D 6C 68 31 2D 39 02 00 00 16 04 00 00 00   ..-lh1-.........
0010: 08 4C 14 00 00 0E 73 79 73 2E 48 6F 75 73 65 20   ................
0020: 4D 34 00 53 DE 06 11 1C 12 C4 C8 FA 3A 5B DC CE   ................
0030: B2 FA 38 1E 46 B0 B6 9E 9B 75 7A 49 71 72 B3 53   ................
0040: 6E 4E B4 A0 BF 5E 95 B3 05 8A 75 D5 6C E3 03 4A   ................
0050: 2C 54 F4 AF 05 18 59 E2 F4 34 4A 0A 28 D4 33 E2   ................
0060: C4 9D 04 D7 C7 8B 91 66 0E E5 DE 98 3C 92 CC B5   ................

  The header layout is fairly basic. The header for each file starts *two*
bytes before the "-lh?-" string. The above example has already been trimmed
down to start at these two bytes. Each header has the same layout; only its
length varies, depending on the length of the filename. Here is a breakdown
of the above example.

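  Since every file in the archive carries its own header, the archive can
be walked from one header to the next. The sketch below assumes
non-extended (level 0) headers only, and assumes that each file's data
follows its header directly and that the next header follows the data. It
relies on two fields from the breakdown in this document: the LEN byte and
the 4-byte compressed size at offset $0007. The function name is made up
for the example.

```python
import struct

def iter_members(data: bytes):
    """Yield (header_offset, LEN, compressed_size) for each archive member."""
    pos = 0
    while pos + 11 <= len(data):
        length = data[pos]        # LEN; a zero marks the end of the file
        if length == 0:
            break
        # Compressed size: 32 bits, low byte first, at header offset 7.
        (packed,) = struct.unpack_from("<I", data, pos + 7)
        yield pos, length, packed
        # Skip the LEN byte, the checksum byte, LEN header bytes and the data.
        pos += 2 + length + packed
```

For the sample header, this would yield (0, 0x24, 0x239) and then look for
the next header at offset $0026 + $0239.
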
  Bytes: $0000: 24 - Length of header (known as "LEN", not including this
                     and the next byte). If it is zero, we are at the end
                     of the file.
          0001: 93 - Header checksum
          0002: 2D 6C 68 31 2D - LHA compression type "-LH1-"
          0007: 39 02 00 00 - Compressed file size ($00000239)
          000B: 16 04 00 00 - Uncompressed file size ($00000416)
          000F: 00 08 4C 14 - Time/date stamp
          0013: 00 - File attribute
          0014: 00 - Header level
                      00 = non-extended header
                      01, 02 = extended header
          0015: 0E - Length of the following filename
          0016: 73 79 73 2E 48 6F 75 - Filename, with a zero and filetype
                73 65 20 4D 34 00 53   appended ("SYS.HOUSE M4<00>S"). The
                                       name can be up to 18 characters in
                                       length. Note the length *includes*
                                       the zero and filetype, making the
                                       actual filename length 2 bytes
                                       shorter.
          0024: DE 06 - File data checksum (starts at LEN)
          0026: 11 1C 12 C4 C8 FA... - File data (starts at LEN+2)

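  The breakdown above maps directly onto a fixed-layout unpack. The
following Python sketch reads a non-extended (level 0) header only; the
class and function names are invented for the example, and multi-byte
fields are read low byte first, as the two size fields in the sample show.

```python
import struct
from typing import NamedTuple

class Level0Header(NamedTuple):   # hypothetical name, not from any LHA tool
    length: int          # LEN, byte $0000
    checksum: int        # header checksum, byte $0001
    method: bytes        # 5 bytes at $0002, e.g. b"-lh1-"
    packed_size: int     # compressed file size
    original_size: int   # uncompressed file size
    dos_time: int        # raw 16-bit time word
    dos_date: int        # raw 16-bit date word
    attribute: int
    level: int
    name: bytes          # filename without the zero/filetype suffix
    filetype: bytes      # single filetype byte, or b"" if absent
    data_checksum: int   # 16-bit value stored at offset LEN
    data_offset: int     # LEN + 2

def parse_level0_header(buf: bytes) -> Level0Header:
    # Fixed part of the header: 22 bytes, offsets $0000-$0015.
    (length, checksum, method, packed, original,
     dos_time, dos_date, attribute, level, name_len) = struct.unpack_from(
        "<BB5sIIHHBBB", buf, 0)
    raw_name = buf[0x16:0x16 + name_len]
    # C64 archives append a zero byte and a filetype letter to the name.
    if len(raw_name) >= 2 and raw_name[-2] == 0:
        name, filetype = raw_name[:-2], raw_name[-1:]
    else:
        name, filetype = raw_name, b""
    (data_checksum,) = struct.unpack_from("<H", buf, length)
    return Level0Header(length, checksum, method, packed, original,
                        dos_time, dos_date, attribute, level,
                        name, filetype, data_checksum, length + 2)

SAMPLE = bytes.fromhex(
    "24 93 2D 6C 68 31 2D 39 02 00 00 16 04 00 00 00"
    "08 4C 14 00 00 0E 73 79 73 2E 48 6F 75 73 65 20"
    "4D 34 00 53 DE 06")
hdr = parse_level0_header(SAMPLE)
print(hdr.method, hex(hdr.packed_size), hdr.name, hdr.filetype)
```

The sketch does not verify the header checksum and does not handle the
extended (level 1 and 2) headers.
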
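  The header checksum at byte $0001 is calculated by adding the bytes in
the header from $0002 (LHA compression type) to LEN+1 (the last byte of the
file data checksum), without carry, so only the low 8 bits of the sum are
kept. In the sample above, the bytes from $0002 to $0025 add up to $0793,
which gives the checksum $93.
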
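  As a quick check of the rule above, here is the calculation in Python,
applied to the sample header from this document. The function name is
arbitrary.

```python
def header_checksum(buf: bytes) -> int:
    """Sum header bytes $0002 through LEN+1, keeping the low 8 bits."""
    length = buf[0]                      # LEN, the byte at $0000
    return sum(buf[2:2 + length]) & 0xFF

SAMPLE = bytes.fromhex(
    "24 93 2D 6C 68 31 2D 39 02 00 00 16 04 00 00 00"
    "08 4C 14 00 00 0E 73 79 73 2E 48 6F 75 73 65 20"
    "4D 34 00 53 DE 06")

# Compare the computed value with the stored checksum byte at $0001.
print(hex(header_checksum(SAMPLE)), hex(SAMPLE[1]))
```

Comparing the result against the byte at $0001 is a cheap way to reject a
false match on the "-l??-" string.
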
  The time/date stamp (bytes $000F-$0012) consists of two 16-bit values,
each stored low byte first, and is broken down as follows:

  Bytes:$000F-0010: Time of last modification:
                      BITS  0- 4: Seconds divided by 2
                                  (0-58, only even numbers)
                      BITS  5-10: Minutes (0-59)
                      BITS 11-15: Hours (0-23, no AM or PM)
  Bytes:$0011-0012: Date of last modification:
                      BITS  0- 4: Day (1-31)
                      BITS  5- 8: Month (1-12)
                      BITS  9-15: Year minus 1980

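  The bit fields can be unpacked with shifts and masks. This Python sketch
assumes the standard MS-DOS packing, with the month in bits 5-8 and the
year in bits 9-15 of the date word, and both words stored low byte first.
The function name is made up for the example.

```python
from datetime import datetime

def decode_dos_stamp(stamp: bytes) -> datetime:
    """Decode the 4-byte time/date stamp found at $000F-$0012."""
    dos_time = int.from_bytes(stamp[0:2], "little")
    dos_date = int.from_bytes(stamp[2:4], "little")
    second = (dos_time & 0x1F) * 2         # bits 0-4, stored halved
    minute = (dos_time >> 5) & 0x3F        # bits 5-10
    hour = (dos_time >> 11) & 0x1F         # bits 11-15
    day = dos_date & 0x1F                  # bits 0-4
    month = (dos_date >> 5) & 0x0F         # bits 5-8 (assumed MS-DOS layout)
    year = 1980 + (dos_date >> 9)          # bits 9-15 (assumed MS-DOS layout)
    return datetime(year, month, day, hour, minute, second)

# The stamp from the sample header: 00 08 4C 14
print(decode_dos_stamp(bytes.fromhex("00 08 4C 14")))
```

Note that datetime raises ValueError for impossible values such as a month
of zero, so a real tool would want to catch that for damaged headers.
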
  The format of the compressed data is much too complex to get into here.
Understanding the layout requires knowledge of Huffman coding and sliding
dictionaries, and it is nowhere near as simple as ZipCode! The descriptions
given in the LHA source code for the different compression modes are as
follows:

  -lh0-  no compression, file stored
  -lh1-  4k sliding dictionary (max 60 bytes) + dynamic Huffman + fixed
         encoding of position
  -lh2-  8k sliding dictionary (max 256 bytes) + dynamic Huffman
  -lh3-  8k sliding dictionary (max 256 bytes) + static Huffman
  -lh4-  4k sliding dictionary (max 256 bytes) + static Huffman +
         improved encoding of position and trees
  -lh5-  8k sliding dictionary (max 256 bytes) + static Huffman +
         improved encoding of position and trees
  -lzs-  2k sliding dictionary (max 17 bytes)
  -lz4-  no compression, file stored
  -lz5-  4k sliding dictionary (max 17 bytes)

  There are several utilities that you can use to decompress these files,
like the already-mentioned LHA on the PC, or Star LHA, one of the many
excellent utilities contained in the Star Commander distribution package.
If you use Star LHA, keep in mind that it needs the program LHA v2.14 (or
newer) to extract. If an older version of LHA is used (such as the common
version 2.13), the extracted files will be corrupt. Star LHA extracts the
files directly into a D64 image, so the long C64 filenames are not lost.

  To an emulator user there is no use for these files, as their only real
purpose on a C64 was to save storage space and transmission time. The
standard compression program on the PC is PKZIP (or ZIP compatibles), so
unless you have some need to send *compressed* files back to the C64, there
is no use in using LHA.