Difference between revisions of "Paired single"

Revision as of 22:52, 10 July 2010

Paired singles are a feature unique to the Gekko/Broadway processors used in the GameCube and Wii. They provide fast vector math by keeping two single-precision floating-point numbers (PS0 and PS1) in a single floating-point register and operating on both of them with one instruction. This page describes how these instructions work.

Quantization and Dequantization

Values are dequantized (converted to floats) when they are loaded into paired singles, and quantized when they are stored back to memory. To allow more flexibility when converting from non-float types, the conversion also applies a scaling factor. All of this is controlled by the GQRs (Graphics Quantization Registers). The GQRs are 32-bit registers containing the conversion types and scaling factors for loading (L_*) and storing (S_*).

GQR
Bits	Access	Field
31-30	U	-
29-24	R/W	L_Scale
23-19	U	-
18-16	R/W	L_Type
15-14	U	-
13-8	R/W	S_Scale
7-3	U	-
2-0	R/W	S_Type

Field	Description
L_*	Values for dequantization.
S_*	Values for quantization.
Scale	Signed. During dequantization divide the number by (2^scale). During quantization, multiply the number by (2^scale).
Type	0: Float (this does no scaling during de/quantization), 4: Unsigned 8bit, 5: Unsigned 16bit, 6: Signed 8bit, 7: Signed 16bit.
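
The field layout and scaling rule above can be sketched in Python. This is only an illustration: the function names are hypothetical, bit 0 is assumed to be the least significant bit, and the 6-bit scale is assumed to be stored in two's complement (the table only says "signed").

```python
# Hypothetical helper names; the field positions follow the GQR table above.
TYPE_NAMES = {0: "float", 4: "u8", 5: "u16", 6: "s8", 7: "s16"}

def sign_extend6(value):
    """Interpret a 6-bit field as a signed number (-32..31), assuming two's complement."""
    return value - 64 if value & 0x20 else value

def decode_gqr(gqr):
    """Split a 32-bit GQR value into its load (L_*) and store (S_*) fields."""
    return {
        "l_scale": sign_extend6((gqr >> 24) & 0x3F),
        "l_type": (gqr >> 16) & 0x7,
        "s_scale": sign_extend6((gqr >> 8) & 0x3F),
        "s_type": gqr & 0x7,
    }

def dequantize(raw, scale):
    """Loading: divide the stored integer by 2**scale."""
    return raw / (2.0 ** scale)

def quantize_scaled(value, scale):
    """Storing: multiply by 2**scale (rounding and range limits are not modeled here)."""
    return value * (2.0 ** scale)

fields = decode_gqr(0x08070807)
print(fields, TYPE_NAMES[fields["l_type"]])
```

With a scale of 8, the stored integer 256 dequantizes to 1.0, and 1.5 scales to 384 before being written out.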

Loading and Storing

To load and store Paired-singles, one must use the psq_l and psq_st instructions respectively, or one of their variants.

psq_l

psq_l      frD, d(rA), W, I

This instruction dequantizes values from the memory address d+(rA|0) and puts them into PS0 and PS1 of frD. Here (rA|0) means the contents of rA, or the value 0 if rA is register 0. If W is 1, it dequantizes only one number and places it in PS0; PS1 is then always loaded with 1.0. I selects the GQR that supplies the dequantization parameters. The two numbers are read from consecutive memory locations, whatever their size (for example, if the GQR specifies a u16 load, d+(rA|0) should point to a two-element array of u16s).
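
As a rough model of the load, the following Python sketch reads one or two values from a bytes object. It is a simplification: the function takes the L_Type and L_Scale fields directly instead of a GQR index, memory is a plain bytes object, and the helper names are made up for illustration.

```python
import struct

# struct format codes for each GQR type; Gekko/Broadway are big-endian.
LOAD_FORMATS = {0: ">f", 4: ">B", 5: ">H", 6: ">b", 7: ">h"}

def psq_l(memory, address, w, l_type, l_scale):
    """Return (ps0, ps1) roughly as psq_l would load them."""
    fmt = LOAD_FORMATS[l_type]
    size = struct.calcsize(fmt)

    def load_one(addr):
        (raw,) = struct.unpack_from(fmt, memory, addr)
        # Type 0 (float) is never scaled.
        return raw if l_type == 0 else raw / (2.0 ** l_scale)

    ps0 = load_one(address)
    # W = 1 loads a single value and forces PS1 to 1.0.
    ps1 = 1.0 if w == 1 else load_one(address + size)
    return ps0, ps1

memory = struct.pack(">HH", 512, 768)  # two-element array of u16
print(psq_l(memory, 0, 0, 5, 8))  # (2.0, 3.0)
print(psq_l(memory, 0, 1, 5, 8))  # (2.0, 1.0)
```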

psq_lx

psq_lx     frD, rA, rB, W, I

This instruction acts exactly like psq_l, except instead of (rA) being offset by d, it is offset by (rB).

psq_lu

psq_lu     frD, d(rA), W, I

This instruction acts exactly like psq_l, except rA cannot be 0, and d+(rA) is placed back into rA.
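
The addressing of psq_l, psq_lx and psq_lu can be compared with a small Python sketch. The register file is modeled as a list of 32 integers and the function names are hypothetical; only the address calculation is shown, not the load itself.

```python
def effective_address(regs, ra, offset):
    """(rA|0) + offset: register number 0 means the literal value 0, not r0."""
    base = 0 if ra == 0 else regs[ra]
    return (base + offset) & 0xFFFFFFFF

def ea_psq_l(regs, ra, d):
    """psq_l: offset is the immediate d."""
    return effective_address(regs, ra, d)

def ea_psq_lx(regs, ra, rb):
    """psq_lx: offset is the contents of rB."""
    return effective_address(regs, ra, regs[rb])

def ea_psq_lu(regs, ra, d):
    """psq_lu: like psq_l, but the address is written back into rA."""
    if ra == 0:
        raise ValueError("psq_lu requires rA != 0")
    ea = effective_address(regs, ra, d)
    regs[ra] = ea
    return ea

regs = [0] * 32
regs[0] = 0x1234      # ignored when rA is given as 0
regs[3] = 0x80001000
regs[4] = 0x20
```

The update form is convenient in loops, since each load leaves rA pointing at the element just read.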

psq_lux

psq_lux    frD, rA, rB, W, I

This instruction acts exactly like psq_lx, except rA cannot be 0, and (rA)+(rB) is placed back into rA.

psq_st

psq_st     frD, d(rA), W, I

This instruction quantizes the paired singles in frD and writes them to the memory address d+(rA|0). If W is 1, it quantizes and stores only PS0. I selects the GQR that supplies the quantization parameters. The two numbers are written to consecutive memory locations, whatever their size (for example, if the GQR specifies a u16 store, d+(rA|0) is treated as a two-element array of u16s).
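
A store can be modeled in Python in the same spirit. This sketch takes the S_Type and S_Scale fields directly rather than a GQR index. The rounding and range handling are assumptions, not something this page specifies: values are truncated toward zero and clamped to the range of the target type.

```python
import struct

# (struct format, minimum, maximum) per integer GQR type.
STORE_TYPES = {4: (">B", 0, 255), 5: (">H", 0, 65535),
               6: (">b", -128, 127), 7: (">h", -32768, 32767)}

def psq_st(memory, address, ps, w, s_type, s_scale):
    """Quantize ps = (ps0, ps1) into a bytearray, roughly as psq_st would."""
    values = ps[:1] if w == 1 else ps  # W = 1 stores PS0 only
    for value in values:
        if s_type == 0:
            fmt, raw = ">f", value  # floats are stored unscaled
        else:
            fmt, low, high = STORE_TYPES[s_type]
            scaled = value * (2.0 ** s_scale)
            # Assumption: truncate toward zero, then clamp to the type's range.
            raw = max(low, min(high, int(scaled)))
        struct.pack_into(fmt, memory, address, raw)
        address += struct.calcsize(fmt)

memory = bytearray(4)
psq_st(memory, 0, (1.5, -2.0), 0, 7, 8)
print(struct.unpack(">hh", memory))  # (384, -512)
```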

psq_stx

psq_stx    frD, rA, rB, W, I

This instruction acts exactly like psq_st, except instead of (rA) being offset by d, it is offset by (rB).

psq_stu

psq_stu    frD, d(rA), W, I

This instruction acts exactly like psq_st, except rA cannot be 0, and d+(rA) is placed back into rA.

psq_stux

psq_stux   frD, rA, rB, W, I

This instruction acts exactly like psq_stx, except rA cannot be 0, and (rA)+(rB) is placed back into rA.

Single Parameter Operations

These instructions take one source FPR (frB) and write the result to frD.

ps_abs

ps_abs     frD, frB

frD(ps0) = abs(frB(ps0))
frD(ps1) = abs(frB(ps1))

ps_mr

ps_mr      frD, frB

frD(ps0) = frB(ps0)
frD(ps1) = frB(ps1)

ps_nabs

ps_nabs    frD, frB

frD(ps0) = -abs(frB(ps0))
frD(ps1) = -abs(frB(ps1))

ps_neg

ps_neg     frD, frB

frD(ps0) = -frB(ps0)
frD(ps1) = -frB(ps1)

ps_res

ps_res     frD, frB

frD(ps0) = 1/frB(ps0)
frD(ps1) = 1/frB(ps1)

Accurate to a precision of 1/4096.

ps_rsqrte

ps_rsqrte  frD, frB

frD(ps0) = 1/sqrt(frB(ps0))
frD(ps1) = 1/sqrt(frB(ps1))

Accurate to a precision of 1/4096.
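
Because ps_res and ps_rsqrte only return estimates, code that needs more precision commonly refines them with a Newton-Raphson step. The Python sketch below illustrates that general technique; it is not taken from this page, and the rough estimate is simulated by applying a relative error of 1/4096 to the exact value.

```python
def refine_reciprocal(a, estimate):
    """One Newton-Raphson step toward 1/a."""
    return estimate * (2.0 - a * estimate)

def refine_rsqrt(a, estimate):
    """One Newton-Raphson step toward 1/sqrt(a)."""
    return estimate * (1.5 - 0.5 * a * estimate * estimate)

a = 3.0
rough = 1.0 + 1.0 / 4096.0  # stand-in for the estimate's relative error
res = refine_reciprocal(a, (1.0 / a) * rough)
rsq = refine_rsqrt(a, (a ** -0.5) * rough)

# Relative error after one step, roughly the square of the starting error.
print(abs(res * a - 1.0), abs(rsq * a ** 0.5 - 1.0))
```

The term 2 - a*estimate has the shape B - A*C, which is what ps_nmsub computes, so one refinement step can be written with ps_nmsub followed by ps_mul.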

Basic Math

Simple everyday math.

ps_add

ps_add     frD, frA, frB

frD(ps0) = frA(ps0) + frB(ps0)
frD(ps1) = frA(ps1) + frB(ps1)

ps_div

ps_div     frD, frA, frB

frD(ps0) = frA(ps0) / frB(ps0)
frD(ps1) = frA(ps1) / frB(ps1)

ps_mul

ps_mul     frD, frA, frC

frD(ps0) = frA(ps0) * frC(ps0)
frD(ps1) = frA(ps1) * frC(ps1)

ps_sub

ps_sub     frD, frA, frB

frD(ps0) = frA(ps0) - frB(ps0)
frD(ps1) = frA(ps1) - frB(ps1)
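
The four operations can be modeled in Python by treating a register as a (ps0, ps1) tuple. The sketch rounds each result to single precision with the struct module, because Python floats are doubles. The names are illustrative, and hardware behavior for special cases such as division by zero is not modeled (Python raises an exception instead).

```python
import operator
import struct

def to_single(x):
    """Round a Python float (a double) to single precision."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def ps_op(op, a, b):
    """Apply op to ps0 and ps1 independently, rounding each result to single."""
    return (to_single(op(a[0], b[0])), to_single(op(a[1], b[1])))

def ps_add(a, b):
    return ps_op(operator.add, a, b)

def ps_sub(a, b):
    return ps_op(operator.sub, a, b)

def ps_mul(a, c):
    return ps_op(operator.mul, a, c)

def ps_div(a, b):
    return ps_op(operator.truediv, a, b)

print(ps_add((1.0, 2.0), (3.0, 4.0)))  # (4.0, 6.0)
print(ps_div((1.0, 9.0), (4.0, 3.0)))  # (0.25, 3.0)
```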

Comparison

ps_cmpo0

ps_cmpo0   crfD, frA, frB
ps_cmpu0   crfD, frA, frB

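crfD = frA(ps0) compare frB(ps0)

ps_cmpo0 is the ordered form and ps_cmpu0 the unordered form. Both produce the same result in crfD; they differ only in which FPSCR exception flags are raised when an operand is a NaN.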

ps_cmpo1

ps_cmpo1   crfD, frA, frB
ps_cmpu1   crfD, frA, frB

crfD = frA(ps1) compare frB(ps1)
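
The result of "compare" is a 4-bit condition register field. The Python sketch below assumes the standard PowerPC floating-point compare layout (LT, GT, EQ, UN from most to least significant bit). It models only the CR result, not the FPSCR side effects, and the function names are illustrative.

```python
import math

def ps_cmp(a, b):
    """Return the 4-bit CR field (LT, GT, EQ, UN) for one slot comparison."""
    if math.isnan(a) or math.isnan(b):
        return 0b0001  # UN: unordered, at least one operand is a NaN
    if a < b:
        return 0b1000  # LT
    if a > b:
        return 0b0100  # GT
    return 0b0010      # EQ

def ps_cmpu0(fra, frb):
    """Compare the ps0 slots of two (ps0, ps1) tuples."""
    return ps_cmp(fra[0], frb[0])

def ps_cmpu1(fra, frb):
    """Compare the ps1 slots of two (ps0, ps1) tuples."""
    return ps_cmp(fra[1], frb[1])

print(bin(ps_cmpu0((1.0, 5.0), (2.0, 5.0))))  # 0b1000
print(bin(ps_cmpu1((1.0, 5.0), (2.0, 5.0))))  # 0b10
```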

Complex Multiply

These instructions combine a multiply with an add or subtract, or multiply both slots of one register by a single slot of another.

ps_madd

ps_madd    frD, frA, frC, frB

frD(ps0) = frA(ps0) * frC(ps0) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) + frB(ps1)

ps_madds0

ps_madds0  frD, frA, frC, frB

frD(ps0) = frA(ps0) * frC(ps0) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps0) + frB(ps1)

ps_madds1

ps_madds1  frD, frA, frC, frB

frD(ps0) = frA(ps0) * frC(ps1) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) + frB(ps1)

ps_msub

ps_msub    frD, frA, frC, frB

frD(ps0) = frA(ps0) * frC(ps0) - frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) - frB(ps1)

ps_muls0

ps_muls0   frD, frA, frC

frD(ps0) = frA(ps0) * frC(ps0)
frD(ps1) = frA(ps1) * frC(ps0)

ps_muls1

ps_muls1   frD, frA, frC

frD(ps0) = frA(ps0) * frC(ps1)
frD(ps1) = frA(ps1) * frC(ps1)

ps_nmadd

ps_nmadd   frD, frA, frC, frB

frD(ps0) = -(frA(ps0) * frC(ps0) + frB(ps0))
frD(ps1) = -(frA(ps1) * frC(ps1) + frB(ps1))

ps_nmsub

ps_nmsub   frD, frA, frC, frB

frD(ps0) = -(frA(ps0) * frC(ps0) - frB(ps0))
frD(ps1) = -(frA(ps1) * frC(ps1) - frB(ps1))
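
As an illustration of how the scalar forms combine, the Python sketch below multiplies a 2x2 matrix by a 2-element vector using the semantics of ps_muls0 and ps_madds1 given above. Storing the matrix as two column pairs is an assumption made for this example, and the Python arithmetic rounds after each operation in double precision, which may differ from the hardware.

```python
def ps_muls0(a, c):
    """Multiply both slots of a by ps0 of c."""
    return (a[0] * c[0], a[1] * c[0])

def ps_madds1(a, c, b):
    """Multiply both slots of a by ps1 of c, then add b."""
    return (a[0] * c[1] + b[0], a[1] * c[1] + b[1])

def mat2_mul_vec2(col0, col1, v):
    """M * v for a 2x2 matrix stored as two column pairs."""
    partial = ps_muls0(col0, v)         # col0 * v.x
    return ps_madds1(col1, v, partial)  # col1 * v.y + partial

# 90-degree rotation: columns (0, 1) and (-1, 0).
print(mat2_mul_vec2((0.0, 1.0), (-1.0, 0.0), (3.0, 4.0)))  # (-4.0, 3.0)
```

Two instructions produce both components of the result, where scalar code would need two multiplies and two multiply-adds.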

Miscellaneous

Whatever doesn't fit into the other categories.

ps_merge00

ps_merge00 frD, frA, frB

frD(ps0) = frA(ps0)
frD(ps1) = frB(ps0)

ps_merge01

ps_merge01 frD, frA, frB

frD(ps0) = frA(ps0)
frD(ps1) = frB(ps1)

ps_merge10

ps_merge10 frD, frA, frB

frD(ps0) = frA(ps1)
frD(ps1) = frB(ps0)

ps_merge11

ps_merge11 frD, frA, frB

frD(ps0) = frA(ps1)
frD(ps1) = frB(ps1)
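
The merge instructions are mostly used to rearrange slots. The Python sketch below models the four variants on (ps0, ps1) tuples and shows two typical uses, swapping the slots of one register and transposing a 2x2 matrix held in two registers. The choice of examples is illustrative.

```python
def ps_merge00(a, b):
    return (a[0], b[0])

def ps_merge01(a, b):
    return (a[0], b[1])

def ps_merge10(a, b):
    return (a[1], b[0])

def ps_merge11(a, b):
    return (a[1], b[1])

def swap(x):
    """Exchange ps0 and ps1 by merging a register with itself."""
    return ps_merge10(x, x)

def transpose2x2(row0, row1):
    """Turn two row pairs into two column pairs."""
    return ps_merge00(row0, row1), ps_merge11(row0, row1)

print(swap((1.0, 2.0)))                      # (2.0, 1.0)
print(transpose2x2((1.0, 2.0), (3.0, 4.0)))  # ((1.0, 3.0), (2.0, 4.0))
```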

ps_sel

ps_sel     frD, frA, frC, frB

if(frA(ps0) >= 0)
        frD(ps0) = frC(ps0)
else
        frD(ps0) = frB(ps0)
if(frA(ps1) >= 0)
        frD(ps1) = frC(ps1)
else
        frD(ps1) = frB(ps1)
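
ps_sel makes branch-free selection possible. One common pattern, sketched here in Python with illustrative names, is a per-slot maximum or minimum: subtract the two inputs and let the sign of the difference choose between them.

```python
def ps_sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def ps_sel(a, c, b):
    """Per slot: pick c where a >= 0, otherwise b."""
    return (c[0] if a[0] >= 0 else b[0],
            c[1] if a[1] >= 0 else b[1])

def ps_max(x, y):
    """x - y >= 0 means x is the larger value in that slot."""
    return ps_sel(ps_sub(x, y), x, y)

def ps_min(x, y):
    """Same test, with the selected operands exchanged."""
    return ps_sel(ps_sub(x, y), y, x)

print(ps_max((1.0, 5.0), (3.0, 2.0)))  # (3.0, 5.0)
print(ps_min((1.0, 5.0), (3.0, 2.0)))  # (1.0, 2.0)
```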

ps_sum0

ps_sum0    frD, frA, frC, frB

frD(ps0) = frA(ps0) + frB(ps1)
frD(ps1) = frC(ps1)

ps_sum1

ps_sum1    frD, frA, frC, frB

frD(ps0) = frC(ps0)
frD(ps1) = frA(ps0) + frB(ps1)
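
The sum instructions add across slots, which makes them useful for horizontal reductions. As an illustration, the Python sketch below computes a 2-element dot product from ps_mul followed by ps_sum0; the dot2 helper is hypothetical.

```python
def ps_mul(a, c):
    return (a[0] * c[0], a[1] * c[1])

def ps_sum0(a, c, b):
    """ps0 = a.ps0 + b.ps1, ps1 is copied from c."""
    return (a[0] + b[1], c[1])

def ps_sum1(a, c, b):
    """ps1 = a.ps0 + b.ps1, ps0 is copied from c."""
    return (c[0], a[0] + b[1])

def dot2(u, v):
    """u.x * v.x + u.y * v.y, left in ps0 of the result."""
    prod = ps_mul(u, v)
    return ps_sum0(prod, prod, prod)[0]

print(dot2((1.0, 2.0), (3.0, 4.0)))  # 11.0
```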
