LZ77

From WiiBrew
Jump to navigation Jump to search

Many files on the Wii are compressed using the LZ77 compression algorithm. This page has documentation on how the algorithm works, and example code.

LZ77 Algorithm

Header

In LZ77-compressed Wii files, there is first a simple 8-byte header:

Offset Size Description
0 bytes 4 bytes Magic number - always 0x4C5A3737 ("LZ77" in ASCII)
4 bytes 3 bytes Size of uncompressed data
7 bytes 1 byte Compression method - should always be 0x01

Data

Immediately after the header comes the actual compressed data. The algorithm is very simple: the data is divided in chunks, and each chunk starts with a flag byte. The flag byte indicates whether an entry in the chunk is literal data or a reference to earlier in the data. A chunk has 8 entries, and thus each of the bits in the flag byte gives this indication: if it's 0 the entry is a single byte of literal data. If it's 1 the entry is a reference to earlier in the data. For example, if the flag byte of a chunk is 0xA7 (10100111 in binary), the first entry is a reference, the second a byte of literal data, the third another reference, and so on. Note that an entry of literal data is always a single byte, whereas a reference is two bytes long.

References

A reference is a very simple two-byte structure:

Offset Size Description
0 bytes 4 bits A single hexadecimal digit - the length of the data referred to by the reference subtract 3
4 bits 12 bits Offset - the offset within the decompressed data of the data referred to by the reference

For example, a reference 0x24AD would refer to 2 + 3 = 5 bytes of data at offset 0x4AD in the uncompressed data.

Example Code

This code by Marcan will decompress a LZ77-compressed file:

import sys, struct
 
class WiiLZ77:
    TYPE_LZ77 = 1
    def __init__(self, file, offset):
        self.file = file
        self.offset = offset

        self.file.seek(self.offset)

        hdr = struct.unpack("<I",self.file.read(4))[0]
        self.uncompressed_length = hdr>>8
        self.compression_type = hdr>>4 & 0xF

        if self.compression_type != self.TYPE_LZ77:
            raise ValueError("Unsupported compression method %d"%self.compression_type)

    def uncompress(self):
        dout = ""

        self.file.seek(self.offset + 0x4)

        while len(dout) < self.uncompressed_length:
            flags = struct.unpack("<B",self.file.read(1))[0]

            for i in range(8):
                if flags & 0x80:
                    info = struct.unpack(">H",self.file.read(2))[0]
                    num = 3 + ((info>>12)&0xF)
                    disp = info & 0xFFF
                    ptr = len(dout) - (info & 0xFFF) - 1
                    for i in range(num):
                        dout += dout[ptr]
                        ptr+=1
                        if len(dout) >= self.uncompressed_length:
                            break
                else:
                    dout += self.file.read(1)
                flags <<= 1
                if len(dout) >= self.uncompressed_length:
                    break

        self.data = dout
        return self.data
 
f = open(sys.argv[1])
 
hdr = f.read(4)
if hdr != "LZ77":
    f.seek(0)
 
unc = WiiLZ77(f, f.tell())
 
du = unc.uncompress()
 
f2 = open(sys.argv[2],"w")
f2.write(du)
f2.close()