PTX ISA v9.2

9.7.14.5. Matrix Fragments for mma.m8n8k16

A warp executing mma.m8n8k16 computes an MMA operation of shape .m8n8k16: matrix A is 8x16, matrix B is 16x8, and the accumulator matrices C and D are 8x8.

Elements of the matrix are distributed across the threads in a warp so each thread of the warp holds a fragment of the matrix.

Multiplicand A:

.atype | Fragment | Elements (low to high)
.s8 / .u8 | A vector expression containing a single .b32 register, containing four .s8 or .u8 elements from the matrix A. | a0, a1, a2, a3
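The packing described in the table can be sketched on the host side. This is a minimal illustration, not part of the PTX specification: it reads "low to high" as a0 occupying the least significant byte of the .b32 register, and the helper names `pack_s8x4` and `unpack_s8` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: pack four 8-bit elements into one 32-bit register
// image, with a0 in the lowest byte and a3 in the highest ("low to high").
inline std::uint32_t pack_s8x4(std::int8_t a0, std::int8_t a1,
                               std::int8_t a2, std::int8_t a3) {
    return  static_cast<std::uint32_t>(static_cast<std::uint8_t>(a0))
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(a1)) << 8)
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(a2)) << 16)
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(a3)) << 24);
}

// Hypothetical helper: extract element i (0..3) from a packed register image.
inline std::int8_t unpack_s8(std::uint32_t reg, int i) {
    return static_cast<std::int8_t>(static_cast<std::uint8_t>(reg >> (8 * i)));
}
```

For .u8 operands the same byte positions would apply; only the interpretation of each byte changes.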

The layout of the fragments held by different threads is shown in Figure 56.

Figure 56 MMA .m8n8k16 fragment layout for matrix A with .u8/.s8 type

The row and column of a matrix fragment can be computed as:

groupID           = %laneid >> 2
threadID_in_group = %laneid % 4

row = groupID

col = (threadID_in_group * 4) + i         for ai    where i = {0,..,3}
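As a quick check of the formulas above, the following host-side sketch maps each (lane, i) pair to its position in A and counts how often every element of the 8x16 matrix is hit. The names `ACoord`, `a_fragment_coord`, and `a_fragment_coverage` are illustrative, and `lane` stands in for %laneid.

```cpp
#include <array>
#include <cassert>

struct ACoord { int row; int col; };

// Position in the 8x16 matrix A of element a_i (i = 0..3) held by `lane`.
inline ACoord a_fragment_coord(int lane, int i) {
    assert(lane >= 0 && lane < 32 && i >= 0 && i < 4);
    const int groupID = lane >> 2;
    const int threadID_in_group = lane % 4;
    return { groupID, threadID_in_group * 4 + i };
}

// Count how many (lane, i) pairs map to each element of A.
inline std::array<std::array<int, 16>, 8> a_fragment_coverage() {
    std::array<std::array<int, 16>, 8> hits{};
    for (int lane = 0; lane < 32; ++lane) {
        for (int i = 0; i < 4; ++i) {
            const ACoord p = a_fragment_coord(lane, i);
            ++hits[p.row][p.col];
        }
    }
    return hits;
}
```

Every count comes out as 1, so the 32 lanes with four elements each tile the 128 elements of A exactly once. For example, lane 5 has groupID 1 and threadID_in_group 1, so its a2 sits at row 1, column 6.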

Multiplicand B:

.btype | Fragment | Elements (low to high)
.s8 / .u8 | A vector expression containing a single .b32 register, containing four .s8 or .u8 elements from the matrix B. | b0, b1, b2, b3

The layout of the fragments held by different threads is shown in Figure 57.

Figure 57 MMA .m8n8k16 fragment layout for matrix B with .u8/.s8 type

The row and column of a matrix fragment can be computed as:

groupID           = %laneid >> 2
threadID_in_group = %laneid % 4

row = (threadID_in_group * 4) + i         for bi    where i = {0,..,3}

col = groupID
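The mapping for B can be illustrated with a host-side gather. The sketch below assumes, purely for illustration, that the 16x8 matrix B is stored column-major in a flat buffer, so element (row, col) lives at index col * 16 + row; `BCoord`, `b_fragment_coord`, and `gather_b_fragment` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct BCoord { int row; int col; };

// Position in the 16x8 matrix B of element b_i (i = 0..3) held by `lane`.
inline BCoord b_fragment_coord(int lane, int i) {
    assert(lane >= 0 && lane < 32 && i >= 0 && i < 4);
    const int groupID = lane >> 2;
    const int threadID_in_group = lane % 4;
    return { threadID_in_group * 4 + i, groupID };
}

// Gather b0..b3 for `lane` from B stored column-major (an assumption of
// this sketch): element (row, col) is at index col * 16 + row.
inline std::array<std::int8_t, 4> gather_b_fragment(
        const std::array<std::int8_t, 128>& b_col_major, int lane) {
    std::array<std::int8_t, 4> frag{};
    for (int i = 0; i < 4; ++i) {
        const BCoord p = b_fragment_coord(lane, i);
        frag[i] = b_col_major[p.col * 16 + p.row];
    }
    return frag;
}
```

Under that storage assumption the index simplifies to groupID * 16 + threadID_in_group * 4 + i = 4 * lane + i, so each lane's fragment is four consecutive bytes of the buffer.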

Accumulators (C or D):

.ctype / .dtype | Fragment | Elements (low to high)
.s32 | A vector expression containing two .s32 registers. | c0, c1

The layout of the fragments held by different threads is shown in Figure 58.

Figure 58 MMA .m8n8k16 fragment layout for accumulator matrix C/D with .s32 type

The row and column of a matrix fragment can be computed as:

groupID           = %laneid >> 2
threadID_in_group = %laneid % 4

row = groupID

col = (threadID_in_group * 2) + i         for ci    where i = {0, 1}
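To see how the three fragment layouts fit together, the following host-side reference rebuilds A, B, and C from per-lane registers, computes D = A * B + C, and returns each lane's two result elements in the accumulator layout. It is a sketch under stated assumptions, not the hardware behaviour: both operands are treated as .s8, integer overflow and saturation are not modelled, and `LaneRegs`, `CCoord`, `c_fragment_coord`, `WarpD`, and `emulate_mma_m8n8k16` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWarpSize = 32;

// Registers one lane contributes, following the fragment tables.
struct LaneRegs {
    std::array<std::int8_t, 4> a;   // a0..a3
    std::array<std::int8_t, 4> b;   // b0..b3
    std::array<std::int32_t, 2> c;  // c0, c1
};

struct CCoord { int row; int col; };

// Position in the 8x8 accumulator of element c_i (i = 0 or 1) held by `lane`.
inline CCoord c_fragment_coord(int lane, int i) {
    assert(lane >= 0 && lane < kWarpSize && i >= 0 && i < 2);
    const int groupID = lane >> 2;
    const int threadID_in_group = lane % 4;
    return { groupID, threadID_in_group * 2 + i };
}

using WarpD = std::array<std::array<std::int32_t, 2>, kWarpSize>;

// Reference computation of D = A * B + C from per-lane fragments.
// Assumes .s8 operands; overflow and saturation are not modelled.
inline WarpD emulate_mma_m8n8k16(const std::array<LaneRegs, kWarpSize>& warp) {
    std::int32_t A[8][16] = {};
    std::int32_t B[16][8] = {};
    std::int32_t C[8][8] = {};

    // Scatter every lane's registers into full matrices.
    for (int lane = 0; lane < kWarpSize; ++lane) {
        const int groupID = lane >> 2;
        const int threadID_in_group = lane % 4;
        for (int i = 0; i < 4; ++i) {
            A[groupID][threadID_in_group * 4 + i] = warp[lane].a[i];
            B[threadID_in_group * 4 + i][groupID] = warp[lane].b[i];
        }
        for (int i = 0; i < 2; ++i) {
            const CCoord p = c_fragment_coord(lane, i);
            C[p.row][p.col] = warp[lane].c[i];
        }
    }

    // Each lane receives the two D elements at its accumulator positions.
    WarpD d{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        for (int i = 0; i < 2; ++i) {
            const CCoord p = c_fragment_coord(lane, i);
            std::int32_t acc = C[p.row][p.col];
            for (int k = 0; k < 16; ++k) acc += A[p.row][k] * B[k][p.col];
            d[lane][i] = acc;
        }
    }
    return d;
}
```

A reference of this kind is mainly useful for checking a kernel's register packing: fill the fragments with a recognisable pattern, run both versions, and compare lane by lane.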