LZ77

From WiiBrew
Jump to navigation Jump to search

Many files on the Wii are compressed using the LZ77 compression algorithm. This page has documentation on how the algorithm works, and example code.

LZ77 Algorithm

Header

In LZ77-compressed Wii files, there is first a simple 8-byte header:

Offset Size Description
0 bytes 4 bytes Magic number - always 0x4C5A3737 ("LZ77" in ASCII)
4 bytes 4 bytes Data about the compression - the first three bytes are the size of the uncompressed data and fourth is the compression method, which should always be 0x10 - note this value is stored little endian.

Data

Immediately after the header comes the actual compressed data. The algorithm is very simple: the data is divided in chunks, and each chunk starts with a flag byte. The flag byte indicates whether an entry in the chunk is literal data or a reference to earlier in the data. A chunk has 8 entries, and thus each of the bits in the flag byte gives this indication: if it's 0 the entry is a single byte of literal data. If it's 1 the entry is a reference to earlier in the data. For example, if the flag byte of a chunk is 0xA7 (10100111 in binary), the first entry is a reference, the second a byte of literal data, the third another reference, and so on. Note that an entry of literal data is always a single byte, whereas a reference is two bytes long.

References

A reference is a very simple two-byte structure:

Offset Size Description
0 bytes 4 bits A single hexadecimal digit - the length of the data referred to by the reference subtract 3
4 bits 12 bits Use to calculate offset within uncompressed data referred to by the reference.

Offset is calculated by the current "place" you're writing to in the uncompressed data subtract the value in the reference subtract 1. For example, if your current "place" is 0x9EB3 a reference 0x24AD would refer to 2 + 3 = 5 bytes of data at offset 0x9EB3 - 0x4AD - 1 = 0x9A05.

LZ77 compression https://www.allaboutcircuits.com/ip-cores/uncategorized/deflatecore/

Example Code

This python code by Marcan will decompress a LZ77-compressed file:

import sys, struct
 
class WiiLZ77:
    TYPE_LZ77 = 1
    def __init__(self, file, offset):
        self.file = file
        self.offset = offset

        self.file.seek(self.offset)

        hdr = struct.unpack("<I",self.file.read(4))[0]
        self.uncompressed_length = hdr>>8
        self.compression_type = hdr>>4 & 0xF

        if self.compression_type != self.TYPE_LZ77:
            raise ValueError("Unsupported compression method %d"%self.compression_type)

    def uncompress(self):
        dout = ""

        self.file.seek(self.offset + 0x4)

        while len(dout) < self.uncompressed_length:
            flags = struct.unpack("<B",self.file.read(1))[0]

            for i in range(8):
                if flags & 0x80:
                    info = struct.unpack(">H",self.file.read(2))[0]
                    num = 3 + ((info>>12)&0xF)
                    disp = info & 0xFFF
                    ptr = len(dout) - (info & 0xFFF) - 1
                    for i in range(num):
                        dout += dout[ptr]
                        ptr+=1
                        if len(dout) >= self.uncompressed_length:
                            break
                else:
                    dout += self.file.read(1)
                flags <<= 1
                if len(dout) >= self.uncompressed_length:
                    break

        self.data = dout
        return self.data
 
f = open(sys.argv[1])
 
hdr = f.read(4)
if hdr != "LZ77":
    f.seek(0)
 
unc = WiiLZ77(f, f.tell())
 
du = unc.uncompress()
 
f2 = open(sys.argv[2],"w")
f2.write(du)
f2.close()

Tools

[1] by JGSHEW can be used to decompress LZ77-compressed files.