Difference between revisions of "LZ77"
(Make this not wrong)
|Line 16:||Line 16:|
| 4 bytes
| 4 bytes
| 4 bytes
| 4 bytes
| Data about the compression - the first three bytes are the size of the uncompressed data and fourth is the compression method, which should always be 0x10 - note this value is stored
| Data about the compression - the first three bytes are the size of the uncompressed data and fourth is the compression method, which should always be 0x10 - note this value is stored endian.
Revision as of 12:28, 16 May 2020
Many files on the Wii are compressed using the LZ77 compression algorithm. This page has documentation on how the algorithm works, and example code.
In LZ77-compressed Wii files, there is first a simple 8-byte header:
|0 bytes||4 bytes||Magic number - always 0x4C5A3737 ("LZ77" in ASCII)|
|4 bytes||4 bytes||Data about the compression - the first three bytes are the size of the uncompressed data and fourth is the compression method, which should always be 0x10 - note this value is stored little endian.|
Immediately after the header comes the actual compressed data. The algorithm is very simple: the data is divided in chunks, and each chunk starts with a flag byte. The flag byte indicates whether an entry in the chunk is literal data or a reference to earlier in the data. A chunk has 8 entries, and thus each of the bits in the flag byte gives this indication: if it's 0 the entry is a single byte of literal data. If it's 1 the entry is a reference to earlier in the data. For example, if the flag byte of a chunk is 0xA7 (10100111 in binary), the first entry is a reference, the second a byte of literal data, the third another reference, and so on. Note that an entry of literal data is always a single byte, whereas a reference is two bytes long.
A reference is a very simple two-byte structure:
|0 bytes||4 bits||A single hexadecimal digit - the length of the data referred to by the reference subtract 3|
|4 bits||12 bits||Use to calculate offset within uncompressed data referred to by the reference.|
Offset is calculated by the current "place" you're writing to in the uncompressed data subtract the value in the reference subtract 1. For example, if your current "place" is 0x9EB3 a reference 0x24AD would refer to 2 + 3 = 5 bytes of data at offset 0x9EB3 - 0x4AD - 1 = 0x9A05.
This python code by Marcan will decompress a LZ77-compressed file:
import sys, struct class WiiLZ77: TYPE_LZ77 = 1 def __init__(self, file, offset): self.file = file self.offset = offset self.file.seek(self.offset) hdr = struct.unpack("<I",self.file.read(4)) self.uncompressed_length = hdr>>8 self.compression_type = hdr>>4 & 0xF if self.compression_type != self.TYPE_LZ77: raise ValueError("Unsupported compression method %d"%self.compression_type) def uncompress(self): dout = "" self.file.seek(self.offset + 0x4) while len(dout) < self.uncompressed_length: flags = struct.unpack("<B",self.file.read(1)) for i in range(8): if flags & 0x80: info = struct.unpack(">H",self.file.read(2)) num = 3 + ((info>>12)&0xF) disp = info & 0xFFF ptr = len(dout) - (info & 0xFFF) - 1 for i in range(num): dout += dout[ptr] ptr+=1 if len(dout) >= self.uncompressed_length: break else: dout += self.file.read(1) flags <<= 1 if len(dout) >= self.uncompressed_length: break self.data = dout return self.data f = open(sys.argv) hdr = f.read(4) if hdr != "LZ77": f.seek(0) unc = WiiLZ77(f, f.tell()) du = unc.uncompress() f2 = open(sys.argv,"w") f2.write(du) f2.close()