Difference between revisions of "Paired single"
Latest revision as of 10:18, 23 July 2024
Paired singles are a unique part of the Gekko/Broadway processors used in the GameCube and Wii. They provide fast vector math by keeping two single-precision floating-point numbers in a single floating-point register and operating on both at once. This page demonstrates how these instructions work.
Quantization and Dequantization
All numbers must be quantized before being put into Paired Singles. For conversion from non-floats, a form of scaling is implemented to allow for greater flexibility. All quantization is controlled by the GQRs (Graphics Quantization Registers). The GQRs are 32-bit registers containing the conversion types and scaling factors for storing and loading. (Loading dequantizes; storing quantizes.)
GQR
Bits | Access | Field |
31-30 | U | - |
29-24 | R/W | L_Scale |
23-19 | U | - |
18-16 | R/W | L_Type |
15-14 | U | - |
13-8 | R/W | S_Scale |
7-3 | U | - |
2-0 | R/W | S_Type |
Field | Description |
L_* | Values for dequantization. |
S_* | Values for quantization. |
Scale | Signed. During dequantization divide the number by (2^scale). During quantization, multiply the number by (2^scale). |
Type | 0: Float (this does no scaling during de/quantization), 4: Unsigned 8bit, 5: Unsigned 16bit, 6: Signed 8bit, 7: Signed 16bit. |
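The field layout and scaling rules can be sketched in C. This is only an illustrative model, not hardware-verified code: the names gqr_fields, gqr_decode, pow2_scale, dequantize_scale and quantize_scale are made up for this example, and it assumes that L_Scale sits in bits 29-24, L_Type in bits 18-16, S_Scale in bits 13-8 and S_Type in bits 2-0, with each scale stored as a 6-bit two's complement value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four GQR fields. */
typedef struct {
    int      l_scale;  /* signed scale used when loading (dequantizing) */
    unsigned l_type;   /* conversion type used when loading */
    int      s_scale;  /* signed scale used when storing (quantizing) */
    unsigned s_type;   /* conversion type used when storing */
} gqr_fields;

/* Assumption: each scale is a 6-bit two's complement value. */
static int sign_extend6(uint32_t v) {
    v &= 0x3Fu;
    return (v & 0x20u) ? (int)v - 64 : (int)v;
}

/* Assumed bit positions: L_Scale 29-24, L_Type 18-16, S_Scale 13-8, S_Type 2-0. */
static gqr_fields gqr_decode(uint32_t gqr) {
    gqr_fields f;
    f.l_scale = sign_extend6(gqr >> 24);
    f.l_type  = (gqr >> 16) & 0x7u;
    f.s_scale = sign_extend6(gqr >> 8);
    f.s_type  = gqr & 0x7u;
    return f;
}

/* Returns 2^scale as a float for the small scales a 6-bit field can hold. */
static float pow2_scale(int scale) {
    float f = 1.0f;
    int n = scale < 0 ? -scale : scale;
    assert(n <= 32);
    while (n-- > 0) f *= 2.0f;
    return scale < 0 ? 1.0f / f : f;
}

/* Dequantization divides by 2^scale; quantization multiplies by 2^scale. */
static float dequantize_scale(float raw, int l_scale) { return raw / pow2_scale(l_scale); }
static float quantize_scale(float value, int s_scale) { return value * pow2_scale(s_scale); }
```

With a load scale of 8, for example, a raw 16-bit value of 256 dequantizes to 1.0, which is how fixed-point data with 8 fractional bits would be read.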
Loading and Storing
To load and store paired singles, use the psq_l and psq_st instructions respectively, or one of their variants.
psq_l
psq_l frD, d(rA), W, I
This instruction dequantizes values from the memory address d+(rA|0) and puts them into PS0 and PS1 in frD. If W is 1, however, it dequantizes only one number and places it into PS0; PS1 is then always loaded with 1.0. I specifies the GQR to use for dequantization parameters. The two numbers read from memory are directly adjacent, regardless of size (for example, if the GQR specifies loading as u16, d+(rA|0) should point to a two-element array of u16s).
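The load behaviour described above can be modelled roughly in C as follows. This is a sketch under stated assumptions rather than an exact emulation: ps_pair, read_element and psq_l_model are hypothetical names, memory is treated as a big-endian byte array, the type and scale are passed in directly instead of being read from a GQR, and the host is assumed to use IEEE 754 single-precision floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { float ps0, ps1; } ps_pair;  /* hypothetical model of one FPR */

/* Returns 2^scale as a float for small scales. */
static float pow2_scale(int scale) {
    float f = 1.0f;
    int n = scale < 0 ? -scale : scale;
    while (n-- > 0) f *= 2.0f;
    return scale < 0 ? 1.0f / f : f;
}

/* Reads one big-endian element of the given type; returns its size in bytes. */
static size_t read_element(const uint8_t *p, unsigned type, float *out) {
    uint32_t bits;
    switch (type) {
    case 4: *out = (float)p[0]; return 1;                                  /* u8  */
    case 5: *out = (float)(uint16_t)((p[0] << 8) | p[1]); return 2;        /* u16 */
    case 6: *out = (float)(int8_t)p[0]; return 1;                          /* s8  */
    case 7: *out = (float)(int16_t)((p[0] << 8) | p[1]); return 2;         /* s16 */
    default:
        assert(type == 0);                                                 /* float */
        bits = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        memcpy(out, &bits, sizeof bits);
        return 4;
    }
}

/* Model of psq_l: ea stands for the address d+(rA|0). */
static ps_pair psq_l_model(const uint8_t *ea, unsigned type, int scale, int w) {
    ps_pair r;
    /* Type 0 (float) is not scaled. */
    float divisor = (type == 0) ? 1.0f : pow2_scale(scale);
    size_t n = read_element(ea, type, &r.ps0);
    r.ps0 /= divisor;
    if (w) {
        r.ps1 = 1.0f;                 /* W = 1: only PS0 is loaded */
    } else {
        read_element(ea + n, type, &r.ps1);
        r.ps1 /= divisor;
    }
    return r;
}
```

Loading the bytes 01 00 FF 00 as s16 with a scale of 8 would give PS0 = 1.0 and PS1 = -1.0 in this model; with W set, PS1 would be 1.0 instead.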
psq_lx
psq_lx frD, rA, rB, W, I
This instruction acts exactly like psq_l, except instead of (rA) being offset by d, it is offset by (rB).
psq_lu
psq_lu frD, d(rA), W, I
This instruction acts exactly like psq_l, except rA cannot be 0, and d+(rA) is placed back into rA.
psq_lux
psq_lux frD, rA, rB, W, I
This instruction acts exactly like psq_lx, except rA cannot be 0, and rB+(rA) is placed back into rA.
psq_st
psq_st frD, d(rA), W, I
This instruction quantizes values from the Paired Singles in frD and places them at the memory address d+(rA|0). If W is 1, however, it quantizes and stores only PS0. I specifies the GQR to use for quantization parameters. The two numbers written to memory are directly adjacent, regardless of size (for example, if the GQR specifies storing as u16, d+(rA|0) is treated as a two-element array of u16s).
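A matching sketch for the store direction is shown below. It covers only the integer types (4 to 7) and makes two assumptions that the text above does not state: out-of-range results are clamped to the target type's range, and the fractional part is truncated toward zero. The names ps_pair, quantize_model, write_element and psq_st_model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef struct { float ps0, ps1; } ps_pair;  /* hypothetical model of one FPR */

/* Returns 2^scale as a float for small scales. */
static float pow2_scale(int scale) {
    float f = 1.0f;
    int n = scale < 0 ? -scale : scale;
    while (n-- > 0) f *= 2.0f;
    return scale < 0 ? 1.0f / f : f;
}

/* Multiplies by 2^scale, then clamps and truncates (both are assumptions). */
static int quantize_model(float value, unsigned type, int scale) {
    float lo, hi, scaled;
    assert(type >= 4 && type <= 7);          /* float stores are not modelled */
    switch (type) {
    case 4:  lo = 0.0f;      hi = 255.0f;   break;  /* u8  */
    case 5:  lo = 0.0f;      hi = 65535.0f; break;  /* u16 */
    case 6:  lo = -128.0f;   hi = 127.0f;   break;  /* s8  */
    default: lo = -32768.0f; hi = 32767.0f; break;  /* s16 */
    }
    scaled = value * pow2_scale(scale);
    if (scaled != scaled) return 0;          /* NaN: arbitrary choice for this sketch */
    if (scaled < lo) scaled = lo;
    if (scaled > hi) scaled = hi;
    return (int)scaled;
}

/* Writes one big-endian element; returns its size in bytes. */
static size_t write_element(uint8_t *p, unsigned type, int q) {
    if (type == 4 || type == 6) { p[0] = (uint8_t)q; return 1; }
    p[0] = (uint8_t)((unsigned)q >> 8);
    p[1] = (uint8_t)q;
    return 2;
}

/* Model of psq_st: ea stands for the address d+(rA|0). */
static void psq_st_model(uint8_t *ea, ps_pair src, unsigned type, int scale, int w) {
    size_t n = write_element(ea, type, quantize_model(src.ps0, type, scale));
    if (!w) write_element(ea + n, type, quantize_model(src.ps1, type, scale));
}
```

Storing the pair (1.0, -1.0) as s16 with a scale of 8 would write the bytes 01 00 FF 00 in this model, the inverse of the load with the same type and scale.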
psq_stx
psq_stx frD, rA, rB, W, I
This instruction acts exactly like psq_st, except instead of (rA) being offset by d, it is offset by (rB).
psq_stu
psq_stu frD, d(rA), W, I
This instruction acts exactly like psq_st, except rA cannot be 0, and d+(rA) is placed back into rA.
psq_stux
psq_stux frD, rA, rB, W, I
This instruction acts exactly like psq_stx, except rA cannot be 0, and rB+(rA) is placed back into rA.
Single Parameter Operations
These functions operate on one FPR.
ps_abs
Single floating-point absolute value on both ps0 and ps1.
ps_abs frD, frB
frD(ps0) = abs(frB(ps0))
frD(ps1) = abs(frB(ps1))
ps_mr
Move both ps0 and ps1 from one fpr to another.
ps_mr frD, frB
frD(ps0) = frB(ps0)
frD(ps1) = frB(ps1)
ps_nabs
Single floating-point negative absolute value on both ps0 and ps1.
ps_nabs frD, frB
frD(ps0) = -abs(frB(ps0))
frD(ps1) = -abs(frB(ps1))
ps_neg
Single floating-point negate on both ps0 and ps1.
ps_neg frD, frB
frD(ps0) = -frB(ps0)
frD(ps1) = -frB(ps1)
ps_res
Reciprocal estimate of ps0 and ps1.
ps_res frD, frB
frD(ps0) = 1/frB(ps0)
frD(ps1) = 1/frB(ps1)
Accurate to a precision of 1/4096.
ps_rsqrte
Single floating-point reciprocal square root estimate on both ps0 and ps1.
ps_rsqrte frD, frB
frD(ps0) = 1/sqrt(frB(ps0))
frD(ps1) = 1/sqrt(frB(ps1))
Accurate to a precision of 1/4096.
Basic Math
Simple everyday math.
ps_add
Single floating-point add on both ps0 and ps1.
ps_add frD, frA, frB
frD(ps0) = frA(ps0) + frB(ps0)
frD(ps1) = frA(ps1) + frB(ps1)
ps_sub
Single floating-point subtract on both ps0 and ps1.
ps_sub frD, frA, frB
frD(ps0) = frA(ps0) - frB(ps0)
frD(ps1) = frA(ps1) - frB(ps1)
ps_mul
Single floating-point multiply on both ps0 and ps1.
ps_mul frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps0)
frD(ps1) = frA(ps1) * frC(ps1)
ps_div
Single floating-point divide on both ps0 and ps1.
ps_div frD, frA, frB
frD(ps0) = frA(ps0) / frB(ps0)
frD(ps1) = frA(ps1) / frB(ps1)
Comparison
ps_cmpo0
Ordered (ps_cmpo0) or unordered (ps_cmpu0) compare of ps0 values.
ps_cmpo0 crfD, frA, frB
ps_cmpu0 crfD, frA, frB
crfD = frA(ps0) compare frB(ps0)
ps_cmpo1
Ordered (ps_cmpo1) or unordered (ps_cmpu1) compare of ps1 values.
ps_cmpo1 crfD, frA, frB
ps_cmpu1 crfD, frA, frB
crfD = frA(ps1) compare frB(ps1)
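What "compare" writes into the condition register field can be sketched as below. The sketch assumes the usual PowerPC floating-point compare encoding of a 4-bit CR field (less than, greater than, equal, unordered, from most to least significant bit); the names ps_cmp_model and the CR_* constants are made up for illustration. It does not model the exception behaviour, which is where the ordered and unordered variants differ.

```c
#include <assert.h>

/* Assumed 4-bit CR field encoding, most significant bit first. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_UN = 1 };

/* Compares one lane (ps0 or ps1) of frA against the same lane of frB. */
static unsigned ps_cmp_model(float a, float b) {
    if (a != a || b != b) return CR_UN;  /* either operand is NaN */
    if (a < b) return CR_LT;
    if (a > b) return CR_GT;
    return CR_EQ;
}

/* Small self-check of the three ordered outcomes. */
static void ps_cmp_self_check(void) {
    assert(ps_cmp_model(1.0f, 2.0f) == CR_LT);
    assert(ps_cmp_model(2.0f, 1.0f) == CR_GT);
    assert(ps_cmp_model(1.0f, 1.0f) == CR_EQ);
}
```

A conditional branch on crfD can then test whichever of the four bits it is interested in.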
Complex Multiply
These instructions combine a multiply with an add, a subtract, or a scalar operand.
ps_madd
Single floating-point multiply-add on both ps0 and ps1.
ps_madd frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) + frB(ps1)
ps_madds0
Scalar-vector multiply-add using ps0 for scalar.
ps_madds0 frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps0) + frB(ps1)
ps_madds1
Scalar-vector multiply-add using ps1 for scalar.
ps_madds1 frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps1) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) + frB(ps1)
ps_msub
Single floating-point multiply-subtract on both ps0 and ps1.
ps_msub frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) - frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) - frB(ps1)
ps_muls0
Scalar-vector multiply using ps0 for scalar.
ps_muls0 frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps0)
frD(ps1) = frA(ps1) * frC(ps0)
ps_muls1
Scalar-vector multiply using ps1 for scalar.
ps_muls1 frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps1)
frD(ps1) = frA(ps1) * frC(ps1)
ps_nmadd
Single floating-point negative multiply-add on both ps0 and ps1.
ps_nmadd frD, frA, frC, frB
frD(ps0) = -(frA(ps0) * frC(ps0) + frB(ps0))
frD(ps1) = -(frA(ps1) * frC(ps1) + frB(ps1))
ps_nmsub
Single floating-point negative multiply-subtract on both ps0 and ps1.
ps_nmsub frD, frA, frC, frB
frD(ps0) = -(frA(ps0) * frC(ps0) - frB(ps0))
frD(ps1) = -(frA(ps1) * frC(ps1) - frB(ps1))
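As an illustration of why the scalar forms are useful, the sketch below models ps_muls0 and ps_madds1 in C and combines them into a 2x2 matrix times vector product, with the matrix held as two column registers. The names ps_pair, ps_muls0_model, ps_madds1_model and mat2_mul_vec2 are hypothetical, and the model computes the multiply and the add as separate C operations, so rounding may differ from the hardware in the last bit.

```c
#include <assert.h>

typedef struct { float ps0, ps1; } ps_pair;  /* hypothetical model of one FPR */

/* ps_muls0 frD, frA, frC: both lanes of frA scaled by frC(ps0). */
static ps_pair ps_muls0_model(ps_pair a, ps_pair c) {
    ps_pair d = { a.ps0 * c.ps0, a.ps1 * c.ps0 };
    return d;
}

/* ps_madds1 frD, frA, frC, frB: both lanes of frA scaled by frC(ps1), plus frB. */
static ps_pair ps_madds1_model(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d = { a.ps0 * c.ps1 + b.ps0, a.ps1 * c.ps1 + b.ps1 };
    return d;
}

/* result = col0 * v.x + col1 * v.y, using two paired-single operations. */
static ps_pair mat2_mul_vec2(ps_pair col0, ps_pair col1, ps_pair v) {
    ps_pair t = ps_muls0_model(col0, v);   /* col0 * v(ps0) */
    return ps_madds1_model(col1, v, t);    /* col1 * v(ps1) + t */
}

/* Self-check: [[1, 2], [3, 4]] * (5, 6) = (17, 39). */
static void mat2_self_check(void) {
    ps_pair col0 = { 1.0f, 3.0f };
    ps_pair col1 = { 2.0f, 4.0f };
    ps_pair v    = { 5.0f, 6.0f };
    ps_pair r    = mat2_mul_vec2(col0, col1, v);
    assert(r.ps0 == 17.0f && r.ps1 == 39.0f);
}
```

Storing the matrix by columns is a choice made for this example; it lets each instruction scale a whole column by one component of the vector.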
Miscellaneous
Whatever doesn't fit into the other categories.
ps_merge00
Register move allowing swap/merge of ps0 values.
ps_merge00 frD, frA, frB
frD(ps0) = frA(ps0)
frD(ps1) = frB(ps0)
ps_merge01
Register move allowing swap/merge of ps0 and ps1 values.
ps_merge01 frD, frA, frB
frD(ps0) = frA(ps0)
frD(ps1) = frB(ps1)
ps_merge10
Register move allowing swap/merge of ps1 and ps0 values.
ps_merge10 frD, frA, frB
frD(ps0) = frA(ps1)
frD(ps1) = frB(ps0)
ps_merge11
Register move allowing swap/merge of ps1 values.
ps_merge11 frD, frA, frB
frD(ps0) = frA(ps1)
frD(ps1) = frB(ps1)
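The four merge variants differ only in which lane they take from each source. A minimal C model makes this easy to see; ps_pair and ps_merge_model are hypothetical names, and the two digits of the instruction name are passed as the lane selectors.

```c
#include <assert.h>

typedef struct { float ps0, ps1; } ps_pair;  /* hypothetical model of one FPR */

/* Models ps_mergeXY: frD(ps0) comes from lane X of frA, frD(ps1) from lane Y of frB. */
static ps_pair ps_merge_model(ps_pair a, ps_pair b, int lane_a, int lane_b) {
    ps_pair d;
    assert((lane_a == 0 || lane_a == 1) && (lane_b == 0 || lane_b == 1));
    d.ps0 = lane_a ? a.ps1 : a.ps0;
    d.ps1 = lane_b ? b.ps1 : b.ps0;
    return d;
}

/* Swapping the two lanes of one register corresponds to ps_merge10 frD, frA, frA. */
static ps_pair ps_swap_model(ps_pair a) {
    return ps_merge_model(a, a, 1, 0);
}
```

Passing the same register as both sources to ps_merge00 or ps_merge11 broadcasts one lane into both halves of the destination.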
ps_sel
Single floating-point select on both ps0 and ps1.
ps_sel frD, frA, frC, frB
if(frA(ps0) >= 0) frD(ps0) = frC(ps0)
else frD(ps0) = frB(ps0)
if(frA(ps1) >= 0) frD(ps1) = frC(ps1)
else frD(ps1) = frB(ps1)
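ps_sel allows branch-free selection per lane. The C sketch below models it and uses it to clamp negative lanes to zero; ps_pair, ps_sel_model and ps_clamp_negative_to_zero are names invented for this example. In this model a NaN in frA fails the >= test and therefore selects frB.

```c
#include <assert.h>

typedef struct { float ps0, ps1; } ps_pair;  /* hypothetical model of one FPR */

/* ps_sel frD, frA, frC, frB: per lane, pick frC where frA >= 0, else frB. */
static ps_pair ps_sel_model(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d;
    d.ps0 = (a.ps0 >= 0.0f) ? c.ps0 : b.ps0;
    d.ps1 = (a.ps1 >= 0.0f) ? c.ps1 : b.ps1;
    return d;
}

/* Example use: max(x, 0) on both lanes without a branch. */
static ps_pair ps_clamp_negative_to_zero(ps_pair x) {
    ps_pair zero = { 0.0f, 0.0f };
    return ps_sel_model(x, x, zero);
}

/* Self-check: (-1, 2) becomes (0, 2). */
static void ps_sel_self_check(void) {
    ps_pair x = { -1.0f, 2.0f };
    ps_pair r = ps_clamp_negative_to_zero(x);
    assert(r.ps0 == 0.0f && r.ps1 == 2.0f);
}
```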
ps_sum0
Add a ps0 value to a ps1 value, result in ps0.
ps_sum0 frD, frA, frC, frB
frD(ps0) = frA(ps0) + frB(ps1)
frD(ps1) = frC(ps1)
ps_sum1
Add a ps0 value to a ps1 value, result in ps1.
ps_sum1 frD, frA, frC, frB
frD(ps0) = frC(ps0)
frD(ps1) = frA(ps0) + frB(ps1)
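The sum instructions are the way to add across the two lanes of a register. As a rough illustration, the sketch below models ps_mul and ps_sum0 in C and combines them into a two-element dot product; ps_pair, ps_mul_model, ps_sum0_model and dot2 are hypothetical names for this example.

```c
#include <assert.h>

typedef struct { float ps0, ps1; } ps_pair;  /* hypothetical model of one FPR */

/* ps_mul frD, frA, frC: lane-wise multiply. */
static ps_pair ps_mul_model(ps_pair a, ps_pair c) {
    ps_pair d = { a.ps0 * c.ps0, a.ps1 * c.ps1 };
    return d;
}

/* ps_sum0 frD, frA, frC, frB: ps0 = frA(ps0) + frB(ps1), ps1 = frC(ps1). */
static ps_pair ps_sum0_model(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d = { a.ps0 + b.ps1, c.ps1 };
    return d;
}

/* Dot product of two 2-element vectors; the result lands in ps0. */
static float dot2(ps_pair x, ps_pair y) {
    ps_pair t = ps_mul_model(x, y);          /* (x0*y0, x1*y1) */
    return ps_sum0_model(t, t, t).ps0;       /* x0*y0 + x1*y1 */
}

/* Self-check: (1, 2) . (3, 4) = 11. */
static void dot2_self_check(void) {
    ps_pair x = { 1.0f, 2.0f };
    ps_pair y = { 3.0f, 4.0f };
    assert(dot2(x, y) == 11.0f);
}
```

Using the same register for all three sources of ps_sum0 is just the simplest case; frC exists so that the untouched lane of the result can be taken from a different register.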