Difference between revisions of "LZ77"
(Local code) |
|||
(6 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | Many files on the Wii are compressed using the '''LZ77''' compression algorithm. This | + | Many files on the Wii are compressed using the '''LZ77''' compression algorithm. This page has documentation on how the algorithm works, and example code. |
+ | |||
+ | ==LZ77 Algorithm== | ||
+ | ===Header=== | ||
+ | In LZ77-compressed Wii files, there is first a simple 8-byte header: | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! '''Offset''' | ||
+ | ! '''Size''' | ||
+ | ! '''Description''' | ||
+ | |- | ||
+ | | 0 bytes | ||
+ | | 4 bytes | ||
+ | | Magic number - always 0x4C5A3737 ("LZ77" in ASCII) | ||
+ | |- | ||
+ | | 4 bytes | ||
+ | | 4 bytes | ||
+ | | Data about the compression - the first three bytes are the size of the uncompressed data and fourth is the compression method, which should always be 0x10 - note this value is stored little endian. | ||
+ | |} | ||
+ | |||
+ | ===Data=== | ||
+ | Immediately after the header comes the actual compressed data. The algorithm is very simple: the data is divided in chunks, and each chunk starts with a flag byte. The flag byte indicates whether an entry in the chunk is literal data or a reference to earlier in the data. A chunk has 8 entries, and thus each of the bits in the flag byte gives this indication: if it's 0 the entry is a single byte of literal data. If it's 1 the entry is a reference to earlier in the data. For example, if the flag byte of a chunk is 0xA7 (10100111 in binary), the first entry is a reference, the second a byte of literal data, the third another reference, and so on. Note that an entry of literal data is always a single byte, whereas a reference is two bytes long. | ||
+ | |||
+ | ===References=== | ||
+ | A reference is a very simple two-byte structure: | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! '''Offset''' | ||
+ | ! '''Size''' | ||
+ | ! '''Description''' | ||
+ | |- | ||
+ | | 0 bytes | ||
+ | | 4 bits | ||
+ | | A single hexadecimal digit - the length of the data referred to by the reference subtract 3 | ||
+ | |- | ||
+ | | 4 bits | ||
+ | | 12 bits | ||
+ | | Use to calculate offset within uncompressed data referred to by the reference. | ||
+ | |} | ||
+ | |||
+ | Offset is calculated by the current "place" you're writing to in the uncompressed data subtract the value in the reference subtract 1. For example, if your current "place" is 0x9EB3 a reference 0x24AD would refer to 2 + 3 = '''5''' bytes of data at offset 0x9EB3 - 0x4AD - 1 = '''0x9A05'''. | ||
+ | |||
+ | ==Example Code== | ||
+ | This python code by Marcan will decompress a LZ77-compressed file: | ||
+ | |||
<source lang="python"> | <source lang="python"> | ||
import sys, struct | import sys, struct | ||
Line 60: | Line 104: | ||
f2.close() | f2.close() | ||
</source> | </source> | ||
+ | |||
+ | ==Tools== | ||
+ | [https://github.com/JGShew/LZ77-Decompressor LZ77 Decompressor] by JGSHEW can be used to decompress LZ77-compressed files. | ||
+ | |||
[[Category:Local code]] | [[Category:Local code]] | ||
+ | [[Category:File formats]] |
Revision as of 19:39, 16 May 2020
Many files on the Wii are compressed using the LZ77 compression algorithm. This page has documentation on how the algorithm works, and example code.
LZ77 Algorithm
Header
In LZ77-compressed Wii files, there is first a simple 8-byte header:
Offset | Size | Description |
---|---|---|
0 bytes | 4 bytes | Magic number - always 0x4C5A3737 ("LZ77" in ASCII) |
4 bytes | 4 bytes | Data about the compression - the first three bytes are the size of the uncompressed data and fourth is the compression method, which should always be 0x10 - note this value is stored little endian. |
Data
Immediately after the header comes the actual compressed data. The algorithm is very simple: the data is divided in chunks, and each chunk starts with a flag byte. The flag byte indicates whether an entry in the chunk is literal data or a reference to earlier in the data. A chunk has 8 entries, and thus each of the bits in the flag byte gives this indication: if it's 0 the entry is a single byte of literal data. If it's 1 the entry is a reference to earlier in the data. For example, if the flag byte of a chunk is 0xA7 (10100111 in binary), the first entry is a reference, the second a byte of literal data, the third another reference, and so on. Note that an entry of literal data is always a single byte, whereas a reference is two bytes long.
References
A reference is a very simple two-byte structure:
Offset | Size | Description |
---|---|---|
0 bytes | 4 bits | A single hexadecimal digit - the length of the data referred to by the reference subtract 3 |
4 bits | 12 bits | Use to calculate offset within uncompressed data referred to by the reference. |
Offset is calculated by the current "place" you're writing to in the uncompressed data subtract the value in the reference subtract 1. For example, if your current "place" is 0x9EB3 a reference 0x24AD would refer to 2 + 3 = 5 bytes of data at offset 0x9EB3 - 0x4AD - 1 = 0x9A05.
Example Code
This python code by Marcan will decompress a LZ77-compressed file:
import sys, struct
class WiiLZ77:
TYPE_LZ77 = 1
def __init__(self, file, offset):
self.file = file
self.offset = offset
self.file.seek(self.offset)
hdr = struct.unpack("<I",self.file.read(4))[0]
self.uncompressed_length = hdr>>8
self.compression_type = hdr>>4 & 0xF
if self.compression_type != self.TYPE_LZ77:
raise ValueError("Unsupported compression method %d"%self.compression_type)
def uncompress(self):
dout = ""
self.file.seek(self.offset + 0x4)
while len(dout) < self.uncompressed_length:
flags = struct.unpack("<B",self.file.read(1))[0]
for i in range(8):
if flags & 0x80:
info = struct.unpack(">H",self.file.read(2))[0]
num = 3 + ((info>>12)&0xF)
disp = info & 0xFFF
ptr = len(dout) - (info & 0xFFF) - 1
for i in range(num):
dout += dout[ptr]
ptr+=1
if len(dout) >= self.uncompressed_length:
break
else:
dout += self.file.read(1)
flags <<= 1
if len(dout) >= self.uncompressed_length:
break
self.data = dout
return self.data
f = open(sys.argv[1])
hdr = f.read(4)
if hdr != "LZ77":
f.seek(0)
unc = WiiLZ77(f, f.tell())
du = unc.uncompress()
f2 = open(sys.argv[2],"w")
f2.write(du)
f2.close()
Tools
LZ77 Decompressor by JGSHEW can be used to decompress LZ77-compressed files.