9.7.16.2.1. Matrix Shape

The matrix multiply and accumulate operations support a limited set of shapes for the operand matrices A, B and D. The shapes of all three matrix operands are collectively described by the tuple MxNxK, where A is an MxK matrix, B is a KxN matrix, and D is an MxN matrix.
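
As an illustration of how the MxNxK tuple maps to the three operands, the following is a minimal sketch in Python. The helper name `operand_shapes` is hypothetical and is not part of PTX or any toolkit; it only restates the definition above.

```python
def operand_shapes(m, n, k):
    """Return the (rows, columns) of A, B and D for an MxNxK shape tuple.

    Hypothetical helper for illustration only.
    """
    return {
        "A": (m, k),  # A is an MxK matrix
        "B": (k, n),  # B is a KxN matrix
        "D": (m, n),  # D is an MxN matrix
    }


shapes = operand_shapes(128, 256, 16)
# The inner dimension K is shared by A (columns) and B (rows).
assert shapes["A"][1] == shapes["B"][0]
print(shapes)
```

For the tuple 128x256x16, this gives a 128x16 matrix A, a 16x256 matrix B, and a 128x256 matrix D.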

Table 39 shows matrix shapes that are supported for the specified types for the tcgen05.mma operation.

Table 39 Various combinations of .kind and shapes

`.kind::*`	Has `.ws`	CTA Group	Sparsity	dtype	atype/btype	Shapes Supported
`.kind::f16`	No `.ws`	1	Dense	`.f16`	`.f16`	64xNxK, 128xNxK; N = {8, 16, 24, … 256} steps of 8; K = 16
			Dense	`.f32`	`.f16`, `.bf16`	64xNxK, 128xNxK; N = {8, 16, 24, … 256} steps of 8; K = 16
			Sparse	`.f16`	`.f16`	K = 32
			Sparse	`.f32`	`.f16`, `.bf16`	K = 32
		2	Dense	`.f16`	`.f16`	128xNxK, 256xNxK; N = {16, 32, … 256} steps of 16; K = 16
			Dense	`.f32`	`.f16`, `.bf16`	128xNxK, 256xNxK; N = {16, 32, … 256} steps of 16; K = 16
			Sparse	`.f16`	`.f16`	K = 32
			Sparse	`.f32`	`.f16`, `.bf16`	K = 32
	`.ws`	1	Dense	`.f16`	`.f16`	32xNxK, 64xNxK, 128xNxK; N = {64, 128, 256}; K = 16
			Dense	`.f32`	`.f16`, `.bf16`	32xNxK, 64xNxK, 128xNxK; N = {64, 128, 256}; K = 16
			Sparse	`.f16`	`.f16`	N = {64, 128}; K = 32
			Sparse	`.f32`	`.f16`, `.bf16`	N = {64, 128}; K = 32
		2	Either	`.f16`	`.f16`	Invalid
			Either	`.f32`	`.f16`, `.bf16`	Invalid
`.kind::tf32`	No `.ws`	1	Dense	`.f32`	`.tf32`	64xNxK, 128xNxK; N = {8, 16, 24, … 256} steps of 8; K = 8
			Sparse			K = 16
		2	Dense	`.f32`	`.tf32`	128xNxK, 256xNxK; N = {16, 32, … 256} steps of 16; K = 8
			Sparse			K = 16
	`.ws`	1	Dense	`.f32`	`.tf32`	32xNxK, 64xNxK, 128xNxK; N = {64, 128, 256}; K = 8
			Sparse			N = {64, 128}; K = 16
		2	Dense			Invalid
			Sparse			Invalid
`.kind::f8f6f4`	No `.ws`	1	Dense	`.f32`	`.f16`, `.e4m3`, `.e5m2`, `.e2m3`, `.e3m2`, `.e2m1`	64xNxK, 128xNxK; N = {8, 16, … 256} steps of 8; K = 32
			Sparse			K = 64
		2	Dense	`.f32`	`.f16`, `.e4m3`, `.e5m2`, `.e2m3`, `.e3m2`, `.e2m1`	128xNxK, 256xNxK; N = {16, 32, … 256} steps of 16; K = 32
			Sparse			K = 64
	`.ws`	1	Dense	`.f32`	`.f16`, `.e4m3`, `.e5m2`, `.e2m3`, `.e3m2`, `.e2m1`	32xNxK, 64xNxK, 128xNxK; N = {64, 128, 256}; K = 32
			Sparse			N = {64, 128}; K = 64
		2	Dense			Invalid
			Sparse			Invalid
`.kind::mxf8f6f4`	No `.ws`	1	Dense	`.f32`	`.e4m3`, `.e5m2`, `.e2m3`, `.e3m2`, `.e2m1` x (Scale) `.ue8m0`	128xNxK; N = {8, 16, … 256} steps of 8; K = 32
			Sparse			K = 64
		2	Dense	`.f32`	`.e4m3`, `.e5m2`, `.e2m3`, `.e3m2`, `.e2m1` x (Scale) `.ue8m0`	128xNxK, 256xNxK; N = {16, 32, … 256} steps of 16; K = 32
			Sparse			256xNxK; K = 64
	`.ws`	1	Dense			Invalid
			Sparse			Invalid
		2	Dense			Invalid
			Sparse			Invalid
`.kind::i8`	No `.ws`	1	Dense	`.s32`	`.s8`, `.u8`	64xNxK, 128xNxK; N = {8, 16, 24, 32, 48, … 256} steps of 16 after N > 32; K = 32
			Sparse			K = 64
		2	Dense	`.s32`	`.s8`, `.u8`	128xNxK, 256xNxK; N = {32, 64, … 256} steps of 32; K = 32
			Sparse			K = 64
	`.ws`	1	Dense	`.s32`	`.s8`, `.u8`	32xNxK, 64xNxK, 128xNxK; N = {64, 128, 256}; K = 32
			Sparse			N = {64, 128}; K = 64
		2	Dense			Invalid
			Sparse			Invalid
`.kind::mxf4`	No `.ws`	1	Dense	`.f32`	`.e2m1` x (Scale) `.ue8m0`	128xNxK; N = {8, 16, … 256} steps of 8; K = 64
			Sparse			K = 128
		2	Dense	`.f32`	`.e2m1` x (Scale) `.ue8m0`	128xNxK, 256xNxK, 256xNxK1; N = {16, 32, … 256} steps of 16; K = 64, K1 = 96
			Sparse			256xNxK; K = 128
	`.ws`	1 / 2	Either			Invalid
`.kind::mxf4nvf4`	No `.ws`	1	Dense	`.f32`	`.e2m1` x (Scale) `.ue8m0`, `.ue4m3`	128xNxK; N = {8, 16, … 256} steps of 8; K = 64
			Sparse			K = 128
		2	Dense	`.f32`	`.e2m1` x (Scale) `.ue8m0`, `.ue4m3`	128xNxK, 256xNxK, 256xNxK1; N = {16, 32, … 256} steps of 16; K = 64, K1 = 96
			Sparse			256xNxK; K = 128
	`.ws`	1 / 2	Either			Invalid
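
The abbreviated N sets in Table 39 can be expanded mechanically. The sketch below does so in Python for the dense rows without `.ws` only; the function names `dense_m_values` and `dense_n_values` are hypothetical, and the `.ws` and sparse rows are deliberately not modeled. It is a reading aid for the table, not a validator for the instruction.

```python
# Kinds listed in Table 39, written without the ".kind::" prefix.
KINDS = ("f16", "tf32", "f8f6f4", "mxf8f6f4", "i8", "mxf4", "mxf4nvf4")
# Kinds whose CTA group 1 dense row lists only 128xNxK.
MX_KINDS = ("mxf8f6f4", "mxf4", "mxf4nvf4")


def _check(kind, cta_group):
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    if cta_group not in (1, 2):
        raise ValueError(f"unknown CTA group: {cta_group}")


def dense_m_values(kind, cta_group):
    """M values for dense rows without .ws (hypothetical helper)."""
    _check(kind, cta_group)
    if cta_group == 2:
        return [128, 256]
    return [128] if kind in MX_KINDS else [64, 128]


def dense_n_values(kind, cta_group):
    """N values for dense rows without .ws (hypothetical helper)."""
    _check(kind, cta_group)
    if kind == "i8":
        if cta_group == 1:
            # {8, 16, 24, 32, 48, ... 256}: steps of 16 after N > 32
            return [8, 16, 24] + list(range(32, 257, 16))
        # {32, 64, ... 256}: steps of 32
        return list(range(32, 257, 32))
    # Steps of 8 for CTA group 1, steps of 16 for CTA group 2.
    step = 8 if cta_group == 1 else 16
    return list(range(step, 257, step))


print(dense_n_values("i8", 1))
print(dense_m_values("mxf4", 1))
```

The `.kind::i8` rows are the only ones where the step size of N is not uniform, which is why they are handled separately.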

9.7.16.2.1.1. Target ISA Note

K = 96 is supported only on the following architecture-specific targets:

sm_103a.

9.7.16.2.2. Specifying Matrix Shape

M and N can be specified in the Instruction descriptor.

K can be specified explicitly if multiple values of K are supported for a given MMA variant. Otherwise, if K can be uniquely determined from Table 39, K cannot be explicitly specified.
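
The rule for K can be sketched as a lookup over Table 39. The Python below is a rough illustration for rows without `.ws`; the names `DENSE_K`, `supported_k` and `k_may_be_explicit` are hypothetical. It also assumes that an "MMA variant" is identified by kind, CTA group, sparsity and M, and that the sm_103a restriction on K = 96 participates in deciding whether K is unique. Neither assumption is stated in the text above.

```python
# Dense K per kind for rows without .ws, taken from Table 39.
# In every such row the sparse K is twice the dense K.
DENSE_K = {
    "f16": 16, "tf32": 8, "f8f6f4": 32, "mxf8f6f4": 32,
    "i8": 32, "mxf4": 64, "mxf4nvf4": 64,
}


def supported_k(kind, cta_group, m, sparse=False, target=None):
    """List the K values for one variant (hypothetical helper).

    `target` is an architecture name such as "sm_103a"; treating it
    as part of the variant is an assumption of this sketch.
    """
    base = DENSE_K[kind]
    if sparse:
        return [2 * base]
    ks = [base]
    # 256xNxK1 with K1 = 96 is listed only for dense, CTA group 2,
    # .kind::mxf4 and .kind::mxf4nvf4, and only on sm_103a.
    if (kind in ("mxf4", "mxf4nvf4") and cta_group == 2
            and m == 256 and target == "sm_103a"):
        ks.append(96)
    return ks


def k_may_be_explicit(kind, cta_group, m, sparse=False, target=None):
    """True when more than one K is supported, so K can be specified."""
    return len(supported_k(kind, cta_group, m, sparse, target)) > 1


print(supported_k("mxf4", 2, 256, target="sm_103a"))
print(k_may_be_explicit("f16", 1, 128))
```

Under these assumptions, only the dense 256xNxK shapes of `.kind::mxf4` and `.kind::mxf4nvf4` with CTA group 2 on sm_103a have more than one K, so they are the only variants in this sketch for which K could be given explicitly.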