9.7.2. Extended-Precision Integer Arithmetic Instructions
The instructions add.cc, addc, sub.cc, subc, mad.cc, and madc reference an implicitly specified condition code register (CC) with a single carry flag bit (CC.CF) that holds carry-in/carry-out or borrow-in/borrow-out. Together, these instructions support extended-precision integer addition, subtraction, and multiplication. No other instructions access the condition code, and there is no support for setting, clearing, or testing it. The condition code register is not preserved across calls and is mainly intended for use in straight-line code sequences.
The extended-precision arithmetic instructions are:
- add.cc, addc
- sub.cc, subc
- mad.cc, madc
9.7.2.1. Extended-Precision Arithmetic Instructions: add.cc
add.cc
Add two values with carry-out.
Syntax
add.cc.type d, a, b;
.type = { .u32, .s32, .u64, .s64 };
Description
Performs integer addition and writes the carry-out value into the condition code register.
Semantics
d = a + b;
carry-out written to CC.CF
Notes
No integer rounding modifiers.
No saturation.
Behavior is the same for unsigned and signed integers.
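As a rough host-side illustration of the semantics above, the following Python sketch models add.cc on register bit patterns. The names add_cc, as_bits, and the bits parameter are hypothetical and not part of PTX; the model assumes that CC.CF receives the bit carried out of the most significant position.

```python
def add_cc(a, b, bits=32):
    """Hypothetical model of add.cc: returns (d, CC.CF)."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)      # exact sum; Python ints do not wrap
    return total & mask, total >> bits   # low bits go to d, the extra bit to CC.CF

def as_bits(value, bits=32):
    """Two's-complement bit pattern of a (possibly negative) integer."""
    return value & ((1 << bits) - 1)

# The same bit patterns give the same result whether read as .u32 or .s32:
# 0xFFFFFFFF + 1 (unsigned) and -1 + 1 (signed) are the same operation.
unsigned_result = add_cc(0xFFFFFFFF, 1)
signed_result = add_cc(as_bits(-1), as_bits(1))
```

Because the addition works on bit patterns only, one model covers both the signed and unsigned type suffixes; passing bits=64 models the 64-bit forms.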
PTX ISA Notes
32-bit add.cc introduced in PTX ISA version 1.2.
64-bit add.cc introduced in PTX ISA version 4.3.
Target ISA Notes
32-bit add.cc is supported on all target architectures.
64-bit add.cc requires sm_20 or higher.
Examples
@p add.cc.u32 x1,y1,z1; // extended-precision addition of
@p addc.cc.u32 x2,y2,z2; // two 128-bit values
@p addc.cc.u32 x3,y3,z3;
@p addc.u32 x4,y4,z4;
9.7.2.2. Extended-Precision Arithmetic Instructions: addc
addc
Add two values with carry-in and optional carry-out.
Syntax
addc{.cc}.type d, a, b;
.type = { .u32, .s32, .u64, .s64 };
Description
Performs integer addition with carry-in and optionally writes the carry-out value into the condition code register.
Semantics
d = a + b + CC.CF;
if .cc specified, carry-out written to CC.CF
Notes
No integer rounding modifiers.
No saturation.
Behavior is the same for unsigned and signed integers.
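To make the carry chain concrete, here is a minimal Python sketch of a 128-bit addition built from four 32-bit steps, in the pattern add.cc, addc.cc, addc.cc, addc. The helper names (addc, to_limbs, from_limbs, add_128) are illustrative assumptions, and the carry flag is modeled as an ordinary local variable rather than a hardware register.

```python
MASK32 = 0xFFFFFFFF

def addc(a, b, carry_in):
    """Hypothetical model of addc.cc.u32: returns (d, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def to_limbs(value, count=4):
    """Split an integer into `count` 32-bit limbs, least significant first."""
    return [(value >> (32 * i)) & MASK32 for i in range(count)]

def from_limbs(limbs):
    """Reassemble an integer from 32-bit limbs, least significant first."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))

def add_128(y, z):
    """Model of add.cc / addc.cc / addc.cc / addc over four limbs each."""
    x, cf = [], 0   # add.cc has no carry-in, so the chain starts from 0
    for yi, zi in zip(to_limbs(y), to_limbs(z)):
        d, cf = addc(yi, zi, cf)
        x.append(d)
    # The final addc has no .cc, so the last carry-out is simply dropped.
    return from_limbs(x)
```

The result wraps modulo 2^128, which matches a chain whose last instruction omits .cc.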
PTX ISA Notes
32-bit addc introduced in PTX ISA version 1.2.
64-bit addc introduced in PTX ISA version 4.3.
Target ISA Notes
32-bit addc is supported on all target architectures.
64-bit addc requires sm_20 or higher.
Examples
@p add.cc.u32 x1,y1,z1; // extended-precision addition of
@p addc.cc.u32 x2,y2,z2; // two 128-bit values
@p addc.cc.u32 x3,y3,z3;
@p addc.u32 x4,y4,z4;
9.7.2.3. Extended-Precision Arithmetic Instructions: sub.cc
sub.cc
Subtract one value from another, with borrow-out.
Syntax
sub.cc.type d, a, b;
.type = { .u32, .s32, .u64, .s64 };
Description
Performs integer subtraction and writes the borrow-out value into the condition code register.
Semantics
d = a - b;
borrow-out written to CC.CF
Notes
No integer rounding modifiers.
No saturation.
Behavior is the same for unsigned and signed integers.
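The borrow-out can be illustrated with a small host-side model in Python. The function name sub_cc is hypothetical, and the sketch assumes that CC.CF = 1 denotes that a borrow occurred, which is the reading consistent with the subc semantics d = a - (b + CC.CF) given in this section.

```python
def sub_cc(a, b, bits=32):
    """Hypothetical model of sub.cc: returns (d, CC.CF) with CF = borrow-out."""
    mask = (1 << bits) - 1
    a, b = a & mask, b & mask
    borrow = 1 if a < b else 0      # a borrow is needed when a < b as bit patterns
    return (a - b) & mask, borrow   # d wraps modulo 2**bits
```

For example, sub_cc(0, 1) wraps to the all-ones pattern and reports a borrow, while sub_cc(5, 3) does not borrow.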
PTX ISA Notes
32-bit sub.cc introduced in PTX ISA version 1.2.
64-bit sub.cc introduced in PTX ISA version 4.3.
Target ISA Notes
32-bit sub.cc is supported on all target architectures.
64-bit sub.cc requires sm_20 or higher.
Examples
@p sub.cc.u32 x1,y1,z1; // extended-precision subtraction
@p subc.cc.u32 x2,y2,z2; // of two 128-bit values
@p subc.cc.u32 x3,y3,z3;
@p subc.u32 x4,y4,z4;
9.7.2.4. Extended-Precision Arithmetic Instructions: subc
subc
Subtract one value from another, with borrow-in and optional borrow-out.
Syntax
subc{.cc}.type d, a, b;
.type = { .u32, .s32, .u64, .s64 };
Description
Performs integer subtraction with borrow-in and optionally writes the borrow-out value into the condition code register.
Semantics
d = a - (b + CC.CF);
if .cc specified, borrow-out written to CC.CF
Notes
No integer rounding modifiers.
No saturation.
Behavior is the same for unsigned and signed integers.
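A borrow chain for 128-bit subtraction can be sketched in Python along the same lines, following the pattern sub.cc, subc.cc, subc.cc, subc. The helper names (subc, to_limbs, sub_128) are assumptions made for illustration, and the borrow flag is modeled as a plain local variable.

```python
MASK32 = 0xFFFFFFFF

def subc(a, b, borrow_in):
    """Hypothetical model of subc.cc.u32: d = a - (b + CC.CF), returns (d, borrow_out)."""
    diff = (a & MASK32) - ((b & MASK32) + borrow_in)
    return diff & MASK32, 1 if diff < 0 else 0

def to_limbs(value, count=4):
    """Split an integer into `count` 32-bit limbs, least significant first."""
    return [(value >> (32 * i)) & MASK32 for i in range(count)]

def sub_128(y, z):
    """Model of sub.cc / subc.cc / subc.cc / subc over four limbs each."""
    result, cf = 0, 0   # sub.cc has no borrow-in, so the chain starts from 0
    for i, (yi, zi) in enumerate(zip(to_limbs(y), to_limbs(z))):
        d, cf = subc(yi, zi, cf)
        result |= d << (32 * i)
    # The final subc has no .cc, so the last borrow-out is dropped.
    return result
```

As with the addition chain, the result wraps modulo 2^128 because the final borrow is not recorded.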
PTX ISA Notes
32-bit subc introduced in PTX ISA version 1.2.
64-bit subc introduced in PTX ISA version 4.3.
Target ISA Notes
32-bit subc is supported on all target architectures.
64-bit subc requires sm_20 or higher.
Examples
@p sub.cc.u32 x1,y1,z1; // extended-precision subtraction
@p subc.cc.u32 x2,y2,z2; // of two 128-bit values
@p subc.cc.u32 x3,y3,z3;
@p subc.u32 x4,y4,z4;
9.7.2.5. Extended-Precision Arithmetic Instructions: mad.cc
mad.cc
Multiply two values, extract high or low half of result, and add a third value with carry-out.
Syntax
mad{.hi,.lo}.cc.type d, a, b, c;
.type = { .u32, .s32, .u64, .s64 };
Description
Multiplies two values, extracts either the high or low part of the result, and adds a third value. Writes the result to the destination register and the carry-out from the addition into the condition code register.
Semantics
t = a * b;
d = t<63..32> + c; // for .hi variant
d = t<31..0> + c; // for .lo variant
carry-out from addition is written to CC.CF
Notes
Generally used in combination with madc and addc to implement extended-precision multi-word multiplication. See madc for an example.
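The following Python sketch models the .u32 form of mad.cc only, to show that the carry-out comes from the final addition and not from the multiplication. The function name mad_cc_u32 and its half parameter are hypothetical; signed and 64-bit variants are deliberately left out of this model.

```python
MASK32 = 0xFFFFFFFF

def mad_cc_u32(a, b, c, half="lo"):
    """Hypothetical model of mad{.hi,.lo}.cc.u32: returns (d, CC.CF)."""
    t = (a & MASK32) * (b & MASK32)                     # full 64-bit product
    part = (t >> 32) if half == "hi" else (t & MASK32)  # t<63..32> or t<31..0>
    total = part + (c & MASK32)                         # only this addition can carry
    return total & MASK32, total >> 32
```

With a = 0xFFFFFFFF, b = 2, and c = 5, the product is 0x1FFFFFFFE: the .lo form adds 5 to 0xFFFFFFFE and carries out, while the .hi form adds 5 to 1 and does not.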
PTX ISA Notes
32-bit mad.cc introduced in PTX ISA version 3.0.
64-bit mad.cc introduced in PTX ISA version 4.3.
Target ISA Notes
Requires target sm_20 or higher.
Examples
@p mad.lo.cc.u32 d,a,b,c;
mad.lo.cc.u32 r,p,q,r;
9.7.2.6. Extended-Precision Arithmetic Instructions: madc
madc
Multiply two values, extract high or low half of result, and add a third value with carry-in and optional carry-out.
Syntax
madc{.hi,.lo}{.cc}.type d, a, b, c;
.type = { .u32, .s32, .u64, .s64 };
Description
Multiplies two values, extracts either the high or low part of the result, and adds a third value along with carry-in. Writes the result to the destination register and optionally writes the carry-out from the addition into the condition code register.
Semantics
t = a * b;
d = t<63..32> + c + CC.CF; // for .hi variant
d = t<31..0> + c + CC.CF; // for .lo variant
if .cc specified, carry-out from addition is written to CC.CF
Notes
Generally used in combination with mad.cc and addc to implement extended-precision multi-word multiplication. See example below.
PTX ISA Notes
32-bit madc introduced in PTX ISA version 3.0.
64-bit madc introduced in PTX ISA version 4.3.
Target ISA Notes
Requires target sm_20 or higher.
Examples
// extended-precision multiply: [r3,r2,r1,r0] = [r5,r4] * [r7,r6]
mul.lo.u32 r0,r4,r6; // r0=(r4*r6).[31:0], no carry-out
mul.hi.u32 r1,r4,r6; // r1=(r4*r6).[63:32], no carry-out
mad.lo.cc.u32 r1,r5,r6,r1; // r1+=(r5*r6).[31:0], may carry-out
madc.hi.u32 r2,r5,r6,0; // r2 =(r5*r6).[63:32]+carry-in,
// no carry-out
mad.lo.cc.u32 r1,r4,r7,r1; // r1+=(r4*r7).[31:0], may carry-out
madc.hi.cc.u32 r2,r4,r7,r2; // r2+=(r4*r7).[63:32]+carry-in,
// may carry-out
addc.u32 r3,0,0; // r3 = carry-in, no carry-out
mad.lo.cc.u32 r2,r5,r7,r2; // r2+=(r5*r7).[31:0], may carry-out
madc.hi.u32 r3,r5,r7,r3; // r3+=(r5*r7).[63:32]+carry-in
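The sequence above can be checked on the host with a small Python model that mirrors it instruction by instruction. The helper names (lo, hi, add3, mul_64x64) are illustrative assumptions; the carry flag is modeled as the local variable cf, and instructions without .cc simply discard the carry they would produce.

```python
MASK32 = 0xFFFFFFFF

def lo(a, b):
    """Low 32 bits of the 64-bit product a * b."""
    return (a * b) & MASK32

def hi(a, b):
    """High 32 bits of the 64-bit product a * b."""
    return (a * b) >> 32

def add3(x, y, cf):
    """Add two 32-bit values plus a carry-in; returns (sum, carry_out)."""
    t = x + y + cf
    return t & MASK32, t >> 32

def mul_64x64(r4, r5, r6, r7):
    """Model of [r3,r2,r1,r0] = [r5,r4] * [r7,r6]; returns (r0, r1, r2, r3)."""
    r0 = lo(r4, r6)                     # mul.lo.u32      r0,r4,r6
    r1 = hi(r4, r6)                     # mul.hi.u32      r1,r4,r6
    r1, cf = add3(lo(r5, r6), r1, 0)    # mad.lo.cc.u32   r1,r5,r6,r1
    r2, _ = add3(hi(r5, r6), 0, cf)     # madc.hi.u32     r2,r5,r6,0
    r1, cf = add3(lo(r4, r7), r1, 0)    # mad.lo.cc.u32   r1,r4,r7,r1
    r2, cf = add3(hi(r4, r7), r2, cf)   # madc.hi.cc.u32  r2,r4,r7,r2
    r3, _ = add3(0, 0, cf)              # addc.u32        r3,0,0
    r2, cf = add3(lo(r5, r7), r2, 0)    # mad.lo.cc.u32   r2,r5,r7,r2
    r3, _ = add3(hi(r5, r7), r3, cf)    # madc.hi.u32     r3,r5,r7,r3
    return r0, r1, r2, r3

def check(a, b):
    """Compare the limb-wise model against Python's exact integer product."""
    r0, r1, r2, r3 = mul_64x64(a & MASK32, a >> 32, b & MASK32, b >> 32)
    return (r3 << 96 | r2 << 64 | r1 << 32 | r0) == a * b
```

Calling check with two 64-bit operands, including the extreme case where both are 2^64 - 1, is a quick way to confirm that every carry in the sequence is accounted for.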