9.7.7. Half Precision Comparison Instructions
The comparison instructions are:
- set
- setp
9.7.7.1. Half Precision Comparison Instructions: set
set
Compare two numeric values with a relational operator, and optionally combine this result with a predicate value by applying a Boolean operator.
Syntax
set.CmpOp{.ftz}.f16.stype d, a, b;
set.CmpOp.BoolOp{.ftz}.f16.stype d, a, b, {!}c;
set.CmpOp.bf16.stype d, a, b;
set.CmpOp.BoolOp.bf16.stype d, a, b, {!}c;
set.CmpOp{.ftz}.dtype.f16 d, a, b;
set.CmpOp.BoolOp{.ftz}.dtype.f16 d, a, b, {!}c;
.dtype = { .u16, .s16, .u32, .s32}
set.CmpOp.dtype.bf16 d, a, b;
set.CmpOp.BoolOp.dtype.bf16 d, a, b, {!}c;
.dtype = { .u16, .s16, .u32, .s32}
set.CmpOp{.ftz}.dtype.f16x2 d, a, b;
set.CmpOp.BoolOp{.ftz}.dtype.f16x2 d, a, b, {!}c;
.dtype = { .f16x2, .u32, .s32}
set.CmpOp.dtype.bf16x2 d, a, b;
set.CmpOp.BoolOp.dtype.bf16x2 d, a, b, {!}c;
.dtype = { .bf16x2, .u32, .s32}
.CmpOp = { eq, ne, lt, le, gt, ge,
equ, neu, ltu, leu, gtu, geu, num, nan };
.BoolOp = { and, or, xor };
.stype = { .b16, .b32, .b64,
.u16, .u32, .u64,
.s16, .s32, .s64,
.f16, .f32, .f64};Description
Compares two numeric values and optionally combines the result with another predicate value by applying a Boolean operator.
Result of this computation is written in destination register in the following way:
If result is True,
0xffffffffis written for destination types.u32/.s32.0xffffis written for destination types.u16/.s16.1.0in target precision floating point format is written for destination type.f16,.bf16.
If result is False,
0x0is written for all integer destination types.0.0in target precision floating point format is written for destination type.f16,.bf16.
If the source type is .f16x2 or .bf16x2 then result of individual operations are packed in the 32-bit destination operand.
Operand c has type .pred.
Semantics
if (stype == .f16x2 || stype == .bf16x2) {
fA[0] = a[0:15];
fA[1] = a[16:31];
fB[0] = b[0:15];
fB[1] = b[16:31];
t[0] = (fA[0] CmpOp fB[0]) ? 1 : 0;
t[1] = (fA[1] CmpOp fB[1]) ? 1 : 0;
if (dtype == .f16x2 || stype == .bf16x2) {
for (i = 0; i < 2; i++) {
d[i] = BoolOp(t[i], c) ? 1.0 : 0.0;
}
} else {
for (i = 0; i < 2; i++) {
d[i] = BoolOp(t[i], c) ? 0xffff : 0;
}
}
} else if (dtype == .f16 || stype == .bf16) {
t = (a CmpOp b) ? 1 : 0;
d = BoolOp(t, c) ? 1.0 : 0.0;
} else { // Integer destination type
trueVal = (isU16(dtype) || isS16(dtype)) ? 0xffff : 0xffffffff;
t = (a CmpOp b) ? 1 : 0;
d = BoolOp(t, c) ? trueVal : 0;
}Floating Point Notes
The ordered comparisons are eq, ne, lt, le, gt, ge. If either operand is NaN, the result is False.
To aid comparison operations in the presence of NaN values, unordered versions are included: equ, neu, ltu, leu, gtu, geu. If both operands are numeric values (not NaN), then these comparisons have the same result as their ordered counterparts. If either operand is NaN, then the result of these comparisons is True.
num returns True if both operands are numeric values (not NaN), and nan returns True if either operand is NaN.
Subnormal numbers: By default, subnormal numbers are supported.
When .ftz modifier is specified then subnormal inputs and results are flushed to sign preserving zero.
PTX ISA Notes
Introduced in PTX ISA version 4.2.
set.{u16, u32, s16, s32}.f16 and set.{u32, s32}.f16x2 are introduced in PTX ISA version 6.5.
set.{u16, u32, s16, s32}.bf16, set.{u32, s32, bf16x2}.bf16x2, set.bf16.{s16,u16,f16,b16,s32,u32,f32,b32,s64,u64,f64,b64} are introduced in PTX ISA version 7.8.
Target ISA Notes
Requires sm_53 or higher.
set.{u16, u32, s16, s32}.bf16, set.{u32, s32, bf16x2}.bf16x2, set.bf16.{s16,u16,f16,b16,s32,u32,f32,b32,s64,u64,f64,b64} require sm_90 or higher.
Examples
set.lt.and.f16.f16 d,a,b,r;
set.eq.f16x2.f16x2 d,i,n;
set.eq.u32.f16x2 d,i,n;
set.lt.and.u16.f16 d,a,b,r;
set.ltu.or.bf16.f16 d,u,v,s;
set.equ.bf16x2.bf16x2 d,j,m;
set.geu.s32.bf16x2 d,j,m;
set.num.xor.s32.bf16 d,u,v,s;9.7.7.2. Half Precision Comparison Instructions: setp
setp
Compare two numeric values with a relational operator, and optionally combine this result with a predicate value by applying a Boolean operator.
Syntax
setp.CmpOp{.ftz}.f16 p, a, b;
setp.CmpOp.BoolOp{.ftz}.f16 p, a, b, {!}c;
setp.CmpOp{.ftz}.f16x2 p|q, a, b;
setp.CmpOp.BoolOp{.ftz}.f16x2 p|q, a, b, {!}c;
setp.CmpOp.bf16 p, a, b;
setp.CmpOp.BoolOp.bf16 p, a, b, {!}c;
setp.CmpOp.bf16x2 p|q, a, b;
setp.CmpOp.BoolOp.bf16x2 p|q, a, b, {!}c;
.CmpOp = { eq, ne, lt, le, gt, ge,
equ, neu, ltu, leu, gtu, geu, num, nan };
.BoolOp = { and, or, xor };Description
Compares two values and combines the result with another predicate value by applying a Boolean operator. This result is written to the destination operand.
Operand c, p and q has type .pred.
For instruction type .f16, operands a and b have type .b16 or .f16.
For instruction type .f16x2, operands a and b have type .b32.
For instruction type .bf16, operands a and b have type .b16.
For instruction type .bf16x2, operands a and b have type .b32.
Semantics
if (type == .f16 || type == .bf16) {
t = (a CmpOp b) ? 1 : 0;
p = BoolOp(t, c);
} else if (type == .f16x2 || type == .bf16x2) {
fA[0] = a[0:15];
fA[1] = a[16:31];
fB[0] = b[0:15];
fB[1] = b[16:31];
t[0] = (fA[0] CmpOp fB[0]) ? 1 : 0;
t[1] = (fA[1] CmpOp fB[1]) ? 1 : 0;
p = BoolOp(t[0], c);
q = BoolOp(t[1], c);
}Floating Point Notes
The ordered comparisons are eq, ne, lt, le, gt, ge. If either operand is NaN, the result is False.
To aid comparison operations in the presence of NaN values, unordered versions are included: equ, neu, ltu, leu, gtu, geu. If both operands are numeric values (not NaN), then these comparisons have the same result as their ordered counterparts. If either operand is NaN, then the result of these comparisons is True.
num returns True if both operands are numeric values (not NaN), and nan returns True if either operand is NaN.
Subnormal numbers: By default, subnormal numbers are supported.
setp.ftz.{f16,f16x2} flushes subnormal inputs to sign-preserving zero.
PTX ISA Notes
Introduced in PTX ISA version 4.2.
setp.{bf16/bf16x2} introduced in PTX ISA version 7.8.
Target ISA Notes
Requires sm_53 or higher.
setp.{bf16/bf16x2} requires sm_90 or higher.
Examples
setp.lt.and.f16x2 p|q,a,b,r;
@q setp.eq.f16 p,i,n;
setp.gt.or.bf16x2 u|v,c,d,s;
@q setp.eq.bf16 u,j,m;