Difference between revisions of "LZ77"

From WiiBrew
Jump to navigation Jump to search
(Make this not wrong)
Line 15: Line 15:
 
|-
 
|-
 
| 4 bytes
 
| 4 bytes
| 3 bytes
+
| 4 bytes
| Size of uncompressed data
+
| Data about the compression - the first three bytes are the size of the uncompressed data and fourth is the compression method, which should always be 0x10 - note this value is stored big endian.
|-
 
| 7 bytes
 
| 1 byte
 
| Compression method - should always be 0x01
 
 
|}
 
|}
  
Line 40: Line 36:
 
| 4 bits
 
| 4 bits
 
| 12 bits
 
| 12 bits
| Offset - the offset within the decompressed data of the data referred to by the reference
+
| Use to calculate offset within uncompressed data referred to by the reference.
 
|}
 
|}
  
For example, a reference 0x24AD would refer to 2 + 3 = '''5''' bytes of data at offset 0x4AD in the uncompressed data.
+
Offset is calculated by the current "place" you're writing to in the uncompressed data subtract the value in the reference subtract 1. For example, if your current "place" is 0x9EB3 a reference 0x24AD would refer to 2 + 3 = '''5''' bytes of data at offset 0x9EB3 - 0x4AD - 1 = '''0x9A05'''.
  
 
==Example Code==
 
==Example Code==

Revision as of 13:27, 16 May 2020

Many files on the Wii are compressed using the LZ77 compression algorithm. This page has documentation on how the algorithm works, and example code.

LZ77 Algorithm

Header

In LZ77-compressed Wii files, there is first a simple 8-byte header:

Offset Size Description
0 bytes 4 bytes Magic number - always 0x4C5A3737 ("LZ77" in ASCII)
4 bytes 4 bytes Data about the compression - the first three bytes are the size of the uncompressed data and fourth is the compression method, which should always be 0x10 - note this value is stored big endian.

Data

Immediately after the header comes the actual compressed data. The algorithm is very simple: the data is divided in chunks, and each chunk starts with a flag byte. The flag byte indicates whether an entry in the chunk is literal data or a reference to earlier in the data. A chunk has 8 entries, and thus each of the bits in the flag byte gives this indication: if it's 0 the entry is a single byte of literal data. If it's 1 the entry is a reference to earlier in the data. For example, if the flag byte of a chunk is 0xA7 (10100111 in binary), the first entry is a reference, the second a byte of literal data, the third another reference, and so on. Note that an entry of literal data is always a single byte, whereas a reference is two bytes long.

References

A reference is a very simple two-byte structure:

Offset Size Description
0 bytes 4 bits A single hexadecimal digit - the length of the data referred to by the reference subtract 3
4 bits 12 bits Use to calculate offset within uncompressed data referred to by the reference.

Offset is calculated by the current "place" you're writing to in the uncompressed data subtract the value in the reference subtract 1. For example, if your current "place" is 0x9EB3 a reference 0x24AD would refer to 2 + 3 = 5 bytes of data at offset 0x9EB3 - 0x4AD - 1 = 0x9A05.

Example Code

This python code by Marcan will decompress a LZ77-compressed file:

import sys, struct
 
class WiiLZ77:
    TYPE_LZ77 = 1
    def __init__(self, file, offset):
        self.file = file
        self.offset = offset

        self.file.seek(self.offset)

        hdr = struct.unpack("<I",self.file.read(4))[0]
        self.uncompressed_length = hdr>>8
        self.compression_type = hdr>>4 & 0xF

        if self.compression_type != self.TYPE_LZ77:
            raise ValueError("Unsupported compression method %d"%self.compression_type)

    def uncompress(self):
        dout = ""

        self.file.seek(self.offset + 0x4)

        while len(dout) < self.uncompressed_length:
            flags = struct.unpack("<B",self.file.read(1))[0]

            for i in range(8):
                if flags & 0x80:
                    info = struct.unpack(">H",self.file.read(2))[0]
                    num = 3 + ((info>>12)&0xF)
                    disp = info & 0xFFF
                    ptr = len(dout) - (info & 0xFFF) - 1
                    for i in range(num):
                        dout += dout[ptr]
                        ptr+=1
                        if len(dout) >= self.uncompressed_length:
                            break
                else:
                    dout += self.file.read(1)
                flags <<= 1
                if len(dout) >= self.uncompressed_length:
                    break

        self.data = dout
        return self.data
 
f = open(sys.argv[1])
 
hdr = f.read(4)
if hdr != "LZ77":
    f.seek(0)
 
unc = WiiLZ77(f, f.tell())
 
du = unc.uncompress()
 
f2 = open(sys.argv[2],"w")
f2.write(du)
f2.close()