LZ77
Many files on the Wii are compressed using the LZ77 compression algorithm. This page has documentation on how the algorithm works, and example code.
LZ77 Algorithm
Header
In LZ77-compressed Wii files, there is first a simple 8-byte header:
Offset | Size | Description |
---|---|---|
0 bytes | 4 bytes | Magic number - always 0x4C5A3737 ("LZ77" in ASCII) |
4 bytes | 4 bytes | Data about the compression - the first three bytes are the size of the uncompressed data and fourth is the compression method, which should always be 0x10 - note this value is stored little endian. |
Data
Immediately after the header comes the actual compressed data. The algorithm is very simple: the data is divided in chunks, and each chunk starts with a flag byte. The flag byte indicates whether an entry in the chunk is literal data or a reference to earlier in the data. A chunk has 8 entries, and thus each of the bits in the flag byte gives this indication: if it's 0 the entry is a single byte of literal data. If it's 1 the entry is a reference to earlier in the data. For example, if the flag byte of a chunk is 0xA7 (10100111 in binary), the first entry is a reference, the second a byte of literal data, the third another reference, and so on. Note that an entry of literal data is always a single byte, whereas a reference is two bytes long.
References
A reference is a very simple two-byte structure:
Offset | Size | Description |
---|---|---|
0 bytes | 4 bits | A single hexadecimal digit - the length of the data referred to by the reference subtract 3 |
4 bits | 12 bits | Use to calculate offset within uncompressed data referred to by the reference. |
Offset is calculated by the current "place" you're writing to in the uncompressed data subtract the value in the reference subtract 1. For example, if your current "place" is 0x9EB3 a reference 0x24AD would refer to 2 + 3 = 5 bytes of data at offset 0x9EB3 - 0x4AD - 1 = 0x9A05.
LZ77 compression https://www.allaboutcircuits.com/ip-cores/uncategorized/deflatecore/
Example Code
This python code by Marcan will decompress a LZ77-compressed file:
import sys, struct
class WiiLZ77:
TYPE_LZ77 = 1
def __init__(self, file, offset):
self.file = file
self.offset = offset
self.file.seek(self.offset)
hdr = struct.unpack("<I",self.file.read(4))[0]
self.uncompressed_length = hdr>>8
self.compression_type = hdr>>4 & 0xF
if self.compression_type != self.TYPE_LZ77:
raise ValueError("Unsupported compression method %d"%self.compression_type)
def uncompress(self):
dout = ""
self.file.seek(self.offset + 0x4)
while len(dout) < self.uncompressed_length:
flags = struct.unpack("<B",self.file.read(1))[0]
for i in range(8):
if flags & 0x80:
info = struct.unpack(">H",self.file.read(2))[0]
num = 3 + ((info>>12)&0xF)
disp = info & 0xFFF
ptr = len(dout) - (info & 0xFFF) - 1
for i in range(num):
dout += dout[ptr]
ptr+=1
if len(dout) >= self.uncompressed_length:
break
else:
dout += self.file.read(1)
flags <<= 1
if len(dout) >= self.uncompressed_length:
break
self.data = dout
return self.data
f = open(sys.argv[1])
hdr = f.read(4)
if hdr != "LZ77":
f.seek(0)
unc = WiiLZ77(f, f.tell())
du = unc.uncompress()
f2 = open(sys.argv[2],"w")
f2.write(du)
f2.close()
Tools
[1] by JGSHEW can be used to decompress LZ77-compressed files.