Paired single

Paired singles are a feature unique to the Gekko and Broadway processors used in the GameCube and Wii. They provide fast vector math by keeping two single-precision floating-point numbers, called PS0 and PS1, in a single floating-point register, with instructions that operate on both values at once. This page demonstrates how to use these instructions.

Quantization and Dequantization

Values are converted as they move between memory and the paired-single registers: loading dequantizes, and storing quantizes. To allow greater flexibility when converting from non-float types, the conversion includes a scaling step. All quantization is controlled by the GQRs (Graphics Quantization Registers). The GQRs are 32-bit registers containing the conversion type and scaling factor for loading and, separately, for storing.

GQR
Bits	Access	Field
31-30	U	-
29-24	R/W	L_Scale
23-19	U	-
18-16	R/W	L_Type
15-14	U	-
13-8	R/W	S_Scale
7-3	U	-
2-0	R/W	S_Type
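
The bit layout above can be sketched in Python as a pair of helpers that unpack and pack a GQR value. This is only an illustration: the names decode_gqr and encode_gqr are hypothetical, bit 0 is assumed to be the least significant bit, and the 6-bit scale fields are assumed to be two's complement.

```python
def decode_gqr(gqr):
    """Split a 32-bit GQR value into its four fields (bit 0 = least significant)."""
    def signed6(v):
        # Assumption: the 6-bit scale is two's complement,
        # so 0..31 stay as-is and 32..63 map to -32..-1.
        return v - 64 if v & 0x20 else v

    return {
        "L_Scale": signed6((gqr >> 24) & 0x3F),  # bits 29-24
        "L_Type": (gqr >> 16) & 0x7,             # bits 18-16
        "S_Scale": signed6((gqr >> 8) & 0x3F),   # bits 13-8
        "S_Type": gqr & 0x7,                     # bits 2-0
    }


def encode_gqr(l_scale, l_type, s_scale, s_type):
    """Build a GQR value from its fields; the unused bits are left as 0."""
    return (((l_scale & 0x3F) << 24) | ((l_type & 0x7) << 16)
            | ((s_scale & 0x3F) << 8) | (s_type & 0x7))
```

For instance, encode_gqr(8, 7, 8, 7) describes signed 16-bit values with a scale of 8 for both loading and storing.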

Field	Description
L_*	Values for dequantization.
S_*	Values for quantization.
Scale	Signed. During dequantization divide the number by (2^scale). During quantization, multiply the number by (2^scale).
Type	0: Float (this does no scaling during de/quantization), 4: Unsigned 8bit, 5: Unsigned 16bit, 6: Signed 8bit, 7: Signed 16bit.
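
As a rough model of the Scale and Type rules above, the following Python sketch converts a single value in each direction. The function names are hypothetical. Truncating toward zero and clamping to the range of the target type on quantization are assumptions of this sketch, not something stated on this page.

```python
# Integer range for each non-float Type value.
TYPE_RANGES = {4: (0, 255), 5: (0, 65535), 6: (-128, 127), 7: (-32768, 32767)}


def dequantize(raw, l_type, l_scale):
    """Memory value -> float, as done on a load."""
    if l_type == 0:
        return float(raw)          # Type 0 (float): no scaling
    return raw / (2.0 ** l_scale)  # divide by 2^scale


def quantize(value, s_type, s_scale):
    """Float -> memory value, as done on a store."""
    if s_type == 0:
        return value               # Type 0 (float): no scaling
    lo, hi = TYPE_RANGES[s_type]
    scaled = int(value * (2.0 ** s_scale))  # multiply by 2^scale; int() truncates (assumption)
    return max(lo, min(hi, scaled))         # clamp to the target type (assumption)
```

With a scale of 8 and a signed 16-bit type, the stored integer 256 loads as 1.0, and 1.0 stores back as 256. A negative scale works the other way: dequantize(3, 5, -1) gives 6.0.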

Loading and Storing

To load and store paired singles, use the psq_l and psq_st instructions respectively, or one of their variants.

psq_l

psq_l frD, d(rA), W, I

This instruction dequantizes two values from memory at the address d+(rA|0) and puts them into PS0 and PS1 of frD. (rA|0) is the contents of rA, or 0 if rA is register 0. If W is 1, it dequantizes only one value, places it in PS0, and sets PS1 to 1.0. I specifies the GQR to use for the dequantization parameters. The two values are read from consecutive memory locations, whatever their size: for example, if the GQR specifies a u16 load, d+(rA|0) should point to a two-element array of u16s.
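
The behaviour of psq_l can be modeled in Python as below. This is a simulation for illustration, not real hardware access: memory is a bytes object, ea stands for the already computed address d+(rA|0), and the function name and parameters are hypothetical. Big-endian byte order is assumed, as is usual on PowerPC.

```python
import struct

# struct format and element size for each GQR type (">" = big-endian).
FORMATS = {0: (">f", 4), 4: (">B", 1), 5: (">H", 2), 6: (">b", 1), 7: (">h", 2)}


def psq_l(memory, ea, w, l_type, l_scale):
    """Return (PS0, PS1) as loaded from memory at effective address ea."""
    fmt, size = FORMATS[l_type]

    def load(addr):
        (raw,) = struct.unpack_from(fmt, memory, addr)
        # Type 0 is a plain float; other types are divided by 2^scale.
        return float(raw) if l_type == 0 else raw / (2.0 ** l_scale)

    ps0 = load(ea)
    # W = 1: only one element is read and PS1 is set to 1.0.
    ps1 = 1.0 if w else load(ea + size)
    return ps0, ps1
```

Given a two-element u16 array holding 512 and 128 and a scale of 8, the model returns (2.0, 0.5) with W = 0 and (2.0, 1.0) with W = 1.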

psq_lx

psq_lx frD, rA, rB, W, I

This instruction acts exactly like psq_l, except instead of (rA) being offset by d, it is offset by (rB).

psq_lu

psq_lu frD, d(rA), W, I

This instruction acts exactly like psq_l, except rA cannot be 0, and d+(rA) is placed back into rA.

psq_lux

psq_lux frD, rA, rB, W, I

This instruction acts exactly like psq_lx, except rA cannot be 0, and (rA)+(rB) is placed back into rA.
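
The four load forms differ only in how the address is computed and whether rA is updated. A small Python sketch of that calculation follows; the function name and the list-of-registers representation are hypothetical. In this model the offset is d for psq_l and psq_lu, and the contents of rB for psq_lx and psq_lux, so the indexed update form writes (rA)+(rB) back to rA.

```python
def effective_address(regs, ra, offset, update=False):
    """Compute the address used by a paired-single load.

    regs   -- list of 32 general-purpose register values
    ra     -- register number used as the base
    offset -- d (psq_l, psq_lu) or the contents of rB (psq_lx, psq_lux)
    update -- True for the update forms (psq_lu, psq_lux)
    """
    if update and ra == 0:
        raise ValueError("update forms require rA != 0")
    base = 0 if ra == 0 else regs[ra]   # (rA|0): register number 0 means a literal 0
    ea = (base + offset) & 0xFFFFFFFF   # addresses are 32 bits wide
    if update:
        regs[ra] = ea                   # the computed address is written back to rA
    return ea
```

The update forms are convenient in loops, since each load leaves rA pointing at the data that was just read.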

psq_st

psq_st frD, d(rA), W, I

This instruction quantizes the paired singles in frD and places them in memory at the address d+(rA|0). If W is 1, it quantizes and stores only PS0. I specifies the GQR to use for the quantization parameters. The two values are written to consecutive memory locations, whatever their size: for example, if the GQR specifies a u16 store, d+(rA|0) is treated as a two-element array of u16s.
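
A matching Python model of psq_st is sketched below. As before, this is a simulation: memory is a bytearray, ea stands for the already computed address, and all names are hypothetical. Big-endian byte order, truncation toward zero, and clamping to the range of the target type are assumptions of this sketch.

```python
import struct

# struct format and element size for each GQR type (">" = big-endian).
FORMATS = {0: (">f", 4), 4: (">B", 1), 5: (">H", 2), 6: (">b", 1), 7: (">h", 2)}
# Integer range for each non-float type.
RANGES = {4: (0, 255), 5: (0, 65535), 6: (-128, 127), 7: (-32768, 32767)}


def psq_st(memory, ea, ps0, ps1, w, s_type, s_scale):
    """Quantize PS0 (and PS1 unless W = 1) into memory at effective address ea."""
    fmt, size = FORMATS[s_type]

    def store(addr, value):
        if s_type != 0:
            lo, hi = RANGES[s_type]
            # Multiply by 2^scale, truncate, then clamp (both assumptions).
            value = max(lo, min(hi, int(value * (2.0 ** s_scale))))
        struct.pack_into(fmt, memory, addr, value)

    store(ea, ps0)
    if not w:
        store(ea + size, ps1)  # second element directly follows the first
```

Storing the pair (2.0, 0.5) as u16 with a scale of 8 writes 512 and 128; with W = 1 only the 512 is written and the following bytes are left untouched.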

psq_stx

psq_stx frD, rA, rB, W, I

This instruction acts exactly like psq_st, except instead of (rA) being offset by d, it is offset by (rB).

psq_stu

psq_stu frD, d(rA), W, I

This instruction acts exactly like psq_st, except rA cannot be 0, and d+(rA) is placed back into rA.

psq_stux

psq_stux frD, rA, rB, W, I

This instruction acts exactly like psq_stx, except rA cannot be 0, and (rA)+(rB) is placed back into rA.