Paired single
Revision as of 22:42, 10 July 2010
Paired singles are a unique part of the Gekko/Broadway processors used in the GameCube and Wii. They provide fast vector math by keeping two single-precision floating-point numbers in a single floating-point register and operating on both at once. This page demonstrates how these instructions are used.
Quantization and Dequantization
All numbers must be quantized when they are stored from paired singles and dequantized when they are loaded into them. For conversion to and from non-float types, a scaling factor is applied to allow greater flexibility. All quantization is controlled by the GQRs (Graphics Quantization Registers). The GQRs are 32-bit registers containing the conversion types and scaling factors for loading and storing. (Loading dequantizes; storing quantizes.)
GQR
Bits | Access | Field |
31-30 | U | - |
29-24 | R/W | L_Scale |
23-19 | U | - |
18-16 | R/W | L_Type |
15-14 | U | - |
13-8 | R/W | S_Scale |
7-3 | U | - |
2-0 | R/W | S_Type |
Field | Description |
L_* | Values for dequantization. |
S_* | Values for quantization. |
Scale | Signed. During dequantization, divide the number by 2^scale. During quantization, multiply the number by 2^scale. |
Type | 0: Float (no scaling is applied during de/quantization), 4: Unsigned 8-bit, 5: Unsigned 16-bit, 6: Signed 8-bit, 7: Signed 16-bit. |
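The field descriptions above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the bit positions (L_Scale in bits 29-24, L_Type in 18-16, S_Scale in 13-8, S_Type in 2-0, with bit 0 as the least significant bit), the 6-bit two's-complement scale, and the truncate-then-clamp behaviour on quantization are assumptions of this example, and the function names are hypothetical.

```python
# Integer ranges for the non-float type codes (4: u8, 5: u16, 6: s8, 7: s16).
INT_RANGES = {4: (0, 255), 5: (0, 65535), 6: (-128, 127), 7: (-32768, 32767)}


def decode_gqr(gqr):
    """Split a 32-bit GQR value into its four fields (assumed bit layout)."""
    def signed6(v):
        # Assumption: the scale is a 6-bit two's-complement number.
        return v - 64 if v & 0x20 else v

    return {
        "l_scale": signed6((gqr >> 24) & 0x3F),
        "l_type": (gqr >> 16) & 0x7,
        "s_scale": signed6((gqr >> 8) & 0x3F),
        "s_type": gqr & 0x7,
    }


def dequantize(raw, type_code, scale):
    """Loading: divide an integer by 2^scale; floats pass through unscaled."""
    if type_code == 0:
        return float(raw)
    return raw / (2.0 ** scale)


def quantize(value, type_code, scale):
    """Storing: multiply by 2^scale, then fit the result into the target type."""
    if type_code == 0:
        return float(value)
    lo, hi = INT_RANGES[type_code]
    scaled = int(value * (2.0 ** scale))  # assumption: truncate toward zero
    return max(lo, min(hi, scaled))       # assumption: clamp to the type range


fields = decode_gqr(0x08050807)
print(fields)
print(dequantize(256, fields["l_type"], fields["l_scale"]))
```

With a scale of 8, the integer 256 corresponds to 1.0, so a u16 array can hold fixed-point values with 8 fractional bits.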
Loading and Storing
To load and store paired singles, use the psq_l and psq_st instructions respectively, or one of their variants.
psq_l
psq_l frD, d(rA), W, I
This instruction dequantizes values from memory at the address d+(rA|0) and places them in PS0 and PS1 of frD. If W is 1, however, it dequantizes only one number and places it in PS0; PS1 is then always loaded with 1.0. I specifies the GQR to use for dequantization parameters. The two numbers are read from consecutive memory locations, regardless of size (for example, if the specified GQR loads as u16, d+(rA|0) should point to a two-element array of u16s).
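The load behaviour described above can be modelled in Python as follows. This is a rough sketch, not a description of the hardware: the function name mirrors the instruction, but its parameters are hypothetical (the type and scale are passed in directly instead of being read from a GQR), memory is a plain bytes object, and big-endian byte order is assumed.

```python
import struct

# Type code -> (struct format, element size in bytes); big-endian assumed.
FORMATS = {0: (">f", 4), 4: (">B", 1), 5: (">H", 2), 6: (">b", 1), 7: (">h", 2)}


def psq_l(memory, address, w, type_code, scale):
    """Return (ps0, ps1) as loaded from memory starting at address."""
    fmt, size = FORMATS[type_code]

    def load(addr):
        (raw,) = struct.unpack_from(fmt, memory, addr)
        # Floats are not scaled; integers are divided by 2^scale.
        return float(raw) if type_code == 0 else raw / (2.0 ** scale)

    ps0 = load(address)
    # With W = 1 only one element is read and PS1 becomes 1.0.
    ps1 = 1.0 if w == 1 else load(address + size)
    return ps0, ps1


memory = struct.pack(">HH", 256, 512)  # a two-element u16 array
print(psq_l(memory, 0, 0, 5, 8))       # both elements
print(psq_l(memory, 0, 1, 5, 8))       # W = 1
```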
psq_lx
psq_lx frD, rA, rB, W, I
This instruction acts exactly like psq_l, except instead of (rA) being offset by d, it is offset by (rB).
psq_lu
psq_lu frD, d(rA), W, I
This instruction acts exactly like psq_l, except rA cannot be 0, and d+(rA) is placed back into rA.
psq_lux
psq_lux frD, rA, rB, W, I
This instruction acts exactly like psq_lx, except rA cannot be 0, and (rA)+(rB) is placed back into rA.
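The addressing rules shared by psq_l and its variants can be summarized in a short Python sketch. The helper name and the list-of-registers representation are hypothetical; the offset argument stands for d in the d(rA) forms and for the contents of rB in the indexed forms, and 32-bit wraparound of the address is an assumption.

```python
def effective_address(gpr, ra, offset, update=False):
    """Compute the effective address; update forms also write it back to rA."""
    if update:
        if ra == 0:
            raise ValueError("update forms require rA != 0")
        ea = (gpr[ra] + offset) & 0xFFFFFFFF
        gpr[ra] = ea  # psq_lu / psq_lux place the address back into rA
        return ea
    # Non-update forms use (rA|0): register number 0 means the constant 0.
    base = 0 if ra == 0 else gpr[ra]
    return (base + offset) & 0xFFFFFFFF


gpr = [0] * 32
gpr[3] = 0x1000
gpr[4] = 0x20
print(hex(effective_address(gpr, 3, 8)))                    # psq_l style
print(hex(effective_address(gpr, 3, gpr[4], update=True)))  # psq_lux style
print(hex(gpr[3]))
```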
psq_st
psq_st frD, d(rA), W, I
This instruction quantizes the paired singles in frD and stores them at the memory address d+(rA|0). If W is 1, however, it quantizes and stores only PS0. I specifies the GQR to use for quantization parameters. The two numbers are written to consecutive memory locations, regardless of size (for example, if the specified GQR stores as u16, d+(rA|0) is treated as a two-element array of u16s).
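A store can be modelled in Python in the same spirit. As before, this is a sketch with hypothetical parameters: the type and scale are passed in directly instead of coming from a GQR, memory is a bytearray, byte order is assumed to be big-endian, and truncating and clamping out-of-range values is an assumption of the example.

```python
import struct

# Type code -> (struct format, element size in bytes); big-endian assumed.
FORMATS = {0: (">f", 4), 4: (">B", 1), 5: (">H", 2), 6: (">b", 1), 7: (">h", 2)}
INT_RANGES = {4: (0, 255), 5: (0, 65535), 6: (-128, 127), 7: (-32768, 32767)}


def psq_st(memory, address, ps, w, type_code, scale):
    """Quantize ps = (ps0, ps1) and write it into the bytearray memory."""
    fmt, size = FORMATS[type_code]
    count = 1 if w == 1 else 2  # with W = 1 only PS0 is stored
    for i in range(count):
        value = ps[i]
        if type_code != 0:
            lo, hi = INT_RANGES[type_code]
            # Multiply by 2^scale, truncate, and clamp (assumed behaviour).
            value = max(lo, min(hi, int(value * (2.0 ** scale))))
        struct.pack_into(fmt, memory, address + i * size, value)


memory = bytearray(4)
psq_st(memory, 0, (1.0, 2.0), 0, 5, 8)
print(memory.hex())
```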
psq_stx
psq_stx frD, rA, rB, W, I
This instruction acts exactly like psq_st, except instead of (rA) being offset by d, it is offset by (rB).
psq_stu
psq_stu frD, d(rA), W, I
This instruction acts exactly like psq_st, except rA cannot be 0, and d+(rA) is placed back into rA.
psq_stux
psq_stux frD, rA, rB, W, I
This instruction acts exactly like psq_stx, except rA cannot be 0, and (rA)+(rB) is placed back into rA.
Single Parameter Operations
These instructions operate on a single source FPR.
ps_abs
ps_abs frD, frB
frD(ps0) = abs(frB(ps0))
frD(ps1) = abs(frB(ps1))
ps_mr
ps_mr frD, frB
frD(ps0) = frB(ps0)
frD(ps1) = frB(ps1)
ps_nabs
ps_nabs frD, frB
frD(ps0) = -abs(frB(ps0))
frD(ps1) = -abs(frB(ps1))
ps_neg
ps_neg frD, frB
frD(ps0) = -frB(ps0)
frD(ps1) = -frB(ps1)
ps_res
ps_res frD, frB
frD(ps0) = 1/frB(ps0)
frD(ps1) = 1/frB(ps1)
The result is an estimate, accurate to a precision of 1/4096.
ps_rsqrte
ps_rsqrte frD, frB
frD(ps0) = 1/sqrt(frB(ps0))
frD(ps1) = 1/sqrt(frB(ps1))
The result is an estimate, accurate to a precision of 1/4096.
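As a rough illustration, the single-parameter operations can be modelled in Python by applying one function to both slots of a (ps0, ps1) tuple. The helper lanewise is hypothetical. Note that ps_res and ps_rsqrte are modelled here as exact positive reciprocals, whereas the hardware returns only an estimate.

```python
import math


def lanewise(fn):
    """Build a paired-single operation that applies fn to ps0 and ps1."""
    return lambda b: (fn(b[0]), fn(b[1]))


ps_abs = lanewise(abs)
ps_mr = lanewise(lambda x: x)
ps_nabs = lanewise(lambda x: -abs(x))
ps_neg = lanewise(lambda x: -x)
# Exact values stand in for the hardware estimates in this sketch.
ps_res = lanewise(lambda x: 1.0 / x)
ps_rsqrte = lanewise(lambda x: 1.0 / math.sqrt(x))

fr1 = (1.5, -2.0)
print(ps_abs(fr1), ps_nabs(fr1), ps_neg(fr1))
print(ps_res((2.0, 4.0)), ps_rsqrte((4.0, 16.0)))
```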
Basic Math
Simple everyday math.
ps_add
ps_add frD, frA, frB
frD(ps0) = frA(ps0) + frB(ps0)
frD(ps1) = frA(ps1) + frB(ps1)
ps_div
ps_div frD, frA, frB
frD(ps0) = frA(ps0) / frB(ps0)
frD(ps1) = frA(ps1) / frB(ps1)
ps_mul
ps_mul frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps0)
frD(ps1) = frA(ps1) * frC(ps1)
ps_sub
ps_sub frD, frA, frB
frD(ps0) = frA(ps0) - frB(ps0)
frD(ps1) = frA(ps1) - frB(ps1)
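Because each slot holds a single-precision value, results differ slightly from what double-precision arithmetic would give. The following Python sketch models the four basic operations and rounds every result to single precision; the helper names round_single and lanewise2 are hypothetical, and exceptional cases are not modelled (Python raises on division by zero instead of producing an infinity, and struct raises OverflowError for results too large for a float).

```python
import operator
import struct


def round_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def lanewise2(op):
    """Build a two-operand paired-single operation from a scalar operator."""
    return lambda a, b: (round_single(op(a[0], b[0])),
                         round_single(op(a[1], b[1])))


ps_add = lanewise2(operator.add)
ps_sub = lanewise2(operator.sub)
ps_mul = lanewise2(operator.mul)  # the second operand is frC for ps_mul
ps_div = lanewise2(operator.truediv)

# Midpoint of two 2D points, each held in one paired-single register.
p, q = (1.0, 2.0), (3.0, 6.0)
print(ps_mul(ps_add(p, q), (0.5, 0.5)))
```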
Comparison
ps_cmpo0
ps_cmpo0 crfD, frA, frB
ps_cmpu0 crfD, frA, frB
crfD = frA(ps0) compare frB(ps0)
ps_cmpo1
ps_cmpo1 crfD, frA, frB
ps_cmpu1 crfD, frA, frB
crfD = frA(ps1) compare frB(ps1)
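The page does not spell out what the comparison writes to the condition register field. The sketch below assumes the usual PowerPC floating-point compare encoding of four bits (less than, greater than, equal, unordered) and does not model exception flags, so it makes no distinction between the ordered (ps_cmpo) and unordered (ps_cmpu) forms. The function name and the slot parameter are hypothetical.

```python
import math

# Assumed CR field encoding: LT, GT, EQ, UN from most to least significant bit.
LT, GT, EQ, UN = 0b1000, 0b0100, 0b0010, 0b0001


def ps_cmp(fra, frb, slot):
    """Compare one slot (0 for ps0, 1 for ps1) of two paired singles."""
    x, y = fra[slot], frb[slot]
    if math.isnan(x) or math.isnan(y):
        return UN  # a NaN operand makes the comparison unordered
    if x < y:
        return LT
    if x > y:
        return GT
    return EQ


fra, frb = (1.0, 5.0), (2.0, 5.0)
print(bin(ps_cmp(fra, frb, 0)))  # compares the ps0 slots
print(bin(ps_cmp(fra, frb, 1)))  # compares the ps1 slots
```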
Complex Multiply
These instructions combine a multiplication with an addition or subtraction, or multiply both slots of frA by a single slot of frC.
ps_madd
ps_madd frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) + frB(ps1)
ps_madds0
ps_madds0 frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps0) + frB(ps1)
ps_madds1
ps_madds1 frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps1) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) + frB(ps1)
ps_msub
ps_msub frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) - frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) - frB(ps1)
ps_muls0
ps_muls0 frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps0)
frD(ps1) = frA(ps1) * frC(ps0)
ps_muls1
ps_muls1 frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps1)
frD(ps1) = frA(ps1) * frC(ps1)
ps_nmadd
ps_nmadd frD, frA, frC, frB
frD(ps0) = -(frA(ps0) * frC(ps0) + frB(ps0))
frD(ps1) = -(frA(ps1) * frC(ps1) + frB(ps1))
ps_nmsub
ps_nmsub frD, frA, frC, frB
frD(ps0) = -(frA(ps0) * frC(ps0) - frB(ps0))
frD(ps1) = -(frA(ps1) * frC(ps1) - frB(ps1))
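One plausible use of the scalar-slot forms is multiplying a 2x2 matrix by a 2D vector, with each matrix column held in one register. The Python sketch below models ps_muls0 and ps_madds1 according to the formulas above and combines them; the function transform and the column-per-register layout are assumptions of this example, and the model does not reproduce single-precision rounding.

```python
def ps_muls0(fra, frc):
    """Multiply both slots of frA by ps0 of frC."""
    return (fra[0] * frc[0], fra[1] * frc[0])


def ps_madds1(fra, frc, frb):
    """Multiply both slots of frA by ps1 of frC, then add frB slot by slot."""
    return (fra[0] * frc[1] + frb[0], fra[1] * frc[1] + frb[1])


def transform(col0, col1, v):
    """Compute M * v where col0 and col1 are the columns of the 2x2 matrix M."""
    t = ps_muls0(col0, v)         # col0 * v.x
    return ps_madds1(col1, v, t)  # col1 * v.y + col0 * v.x


# M = [[1, 2], [3, 4]] stored by columns, v = (5, 6).
print(transform((1.0, 3.0), (2.0, 4.0), (5.0, 6.0)))
```

Two instructions cover the whole product because each one broadcasts a single component of the vector across both rows of the matrix.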