9.7.16.2.1. Matrix Shape
The matrix multiply and accumulate operations support a limited set of shapes for the operand matrices A, B and D. The shapes of all three matrix operands are collectively described by the tuple MxNxK, where A is an MxK matrix, B is a KxN matrix, and D is an MxN matrix.
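As a small illustration of how the MxNxK tuple maps to the logical operand dimensions, the sketch below computes the (rows, columns) of A, B and D. The function name `operand_shapes` is hypothetical and is not part of PTX; Python is used only for illustration.

```python
# Illustrative helper (hypothetical name, not part of PTX): derive the logical
# dimensions of the operand matrices from the MxNxK shape tuple.

def operand_shapes(m, n, k):
    """Return (rows, cols) for A (MxK), B (KxN) and D (MxN)."""
    return {"A": (m, k), "B": (k, n), "D": (m, n)}

# Example: the shape 128x256x16
shapes = operand_shapes(128, 256, 16)
```

Here `shapes["A"]` is `(128, 16)`, `shapes["B"]` is `(16, 256)`, and `shapes["D"]` is `(128, 256)`.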
Table 39 shows matrix shapes that are supported for the specified types for the tcgen05.mma operation.
Table 39 Various combinations of .kind and shapes
| .kind::* | Has .ws | CTA Group | Sparsity | dtype | atype/btype | Shapes Supported |
|---|---|---|---|---|---|---|
| .kind::f16 | No .ws | 1 | Dense | .f16 | .f16 | 64xNxK, 128xNxK; N = {8, 16, 24, … 256} steps of 8; K = 16 |
| .kind::f16 | No .ws | 1 | Dense | .f32 | .f16, .bf16 | 64xNxK, 128xNxK; N = {8, 16, 24, … 256} steps of 8; K = 16 |
| .kind::f16 | No .ws | 1 | Sparse | .f16 | .f16 | K = 32 |
| .kind::f16 | No .ws | 1 | Sparse | .f32 | .f16, .bf16 | K = 32 |
| .kind::f16 | No .ws | 2 | Dense | .f16 | .f16 | 128xNxK, 256xNxK; N = {16, 32, … 256} steps of 16; K = 16 |
| .kind::f16 | No .ws | 2 | Dense | .f32 | .f16, .bf16 | 128xNxK, 256xNxK; N = {16, 32, … 256} steps of 16; K = 16 |
| .kind::f16 | No .ws | 2 | Sparse | .f16 | .f16 | K = 32 |
| .kind::f16 | No .ws | 2 | Sparse | .f32 | .f16, .bf16 | K = 32 |
| .kind::f16 | .ws | 1 | Dense | .f16 | .f16 | 32xNxK, 64xNxK, 128xNxK; N = {64, 128, 256}; K = 16 |
| .kind::f16 | .ws | 1 | Dense | .f32 | .f16, .bf16 | 32xNxK, 64xNxK, 128xNxK; N = {64, 128, 256}; K = 16 |
| .kind::f16 | .ws | 1 | Sparse | .f16 | .f16 | N = {64, 128}; K = 32 |
| .kind::f16 | .ws | 1 | Sparse | .f32 | .f16, .bf16 | N = {64, 128}; K = 32 |
| .kind::f16 | .ws | 2 | Either | .f16 | .f16 | Invalid |
| .kind::f16 | .ws | 2 | Either | .f32 | .f16, .bf16 | Invalid |
| .kind::tf32 | No .ws | 1 | Dense | .f32 | .tf32 | 64xNxK, 128xNxK; N = {8, 16, 24, … 256} steps of 8; K = 8 |
| .kind::tf32 | No .ws | 1 | Sparse | .f32 | .tf32 | K = 16 |
| .kind::tf32 | No .ws | 2 | Dense | .f32 | .tf32 | 128xNxK, 256xNxK; N = {16, 32, … 256} steps of 16; K = 8 |
| .kind::tf32 | No .ws | 2 | Sparse | .f32 | .tf32 | K = 16 |
| .kind::tf32 | .ws | 1 | Dense | .f32 | .tf32 | 32xNxK, 64xNxK, 128xNxK; N = {64, 128, 256}; K = 8 |
| .kind::tf32 | .ws | 1 | Sparse | .f32 | .tf32 | N = {64, 128}; K = 16 |
| .kind::tf32 | .ws | 2 | Dense | - | - | Invalid |
| .kind::tf32 | .ws | 2 | Sparse | - | - | Invalid |
| .kind::f8f6f4 | No .ws | 1 | Dense | .f32 | .f16, .e4m3, .e5m2, .e2m3, .e3m2, .e2m1 | 64xNxK, 128xNxK; N = {8, 16, … 256} steps of 8; K = 32 |
| .kind::f8f6f4 | No .ws | 1 | Sparse | .f32 | .f16, .e4m3, .e5m2, .e2m3, .e3m2, .e2m1 | K = 64 |
| .kind::f8f6f4 | No .ws | 2 | Dense | .f32 | .f16, .e4m3, .e5m2, .e2m3, .e3m2, .e2m1 | 128xNxK, 256xNxK; N = {16, 32, … 256} steps of 16; K = 32 |
| .kind::f8f6f4 | No .ws | 2 | Sparse | .f32 | .f16, .e4m3, .e5m2, .e2m3, .e3m2, .e2m1 | K = 64 |
| .kind::f8f6f4 | .ws | 1 | Dense | .f32 | .f16, .e4m3, .e5m2, .e2m3, .e3m2, .e2m1 | 32xNxK, 64xNxK, 128xNxK; N = {64, 128, 256}; K = 32 |
| .kind::f8f6f4 | .ws | 1 | Sparse | .f32 | .f16, .e4m3, .e5m2, .e2m3, .e3m2, .e2m1 | N = {64, 128}; K = 64 |
| .kind::f8f6f4 | .ws | 2 | Dense | - | - | Invalid |
| .kind::f8f6f4 | .ws | 2 | Sparse | - | - | Invalid |
| .kind::mxf8f6f4 | No .ws | 1 | Dense | .f32 | .e4m3, .e5m2, .e2m3, .e3m2, .e2m1 x (Scale) .ue8m0 | 128xNxK; N = {8, 16, … 256} steps of 8; K = 32 |
| .kind::mxf8f6f4 | No .ws | 1 | Sparse | .f32 | .e4m3, .e5m2, .e2m3, .e3m2, .e2m1 x (Scale) .ue8m0 | K = 64 |
| .kind::mxf8f6f4 | No .ws | 2 | Dense | .f32 | .e4m3, .e5m2, .e2m3, .e3m2, .e2m1 x (Scale) .ue8m0 | 128xNxK, 256xNxK; N = {16, 32, … 256} steps of 16; K = 32 |
| .kind::mxf8f6f4 | No .ws | 2 | Sparse | .f32 | .e4m3, .e5m2, .e2m3, .e3m2, .e2m1 x (Scale) .ue8m0 | 256xNxK; K = 64 |
| .kind::mxf8f6f4 | .ws | 1 | Dense | - | - | Invalid |
| .kind::mxf8f6f4 | .ws | 1 | Sparse | - | - | Invalid |
| .kind::mxf8f6f4 | .ws | 2 | Dense | - | - | Invalid |
| .kind::mxf8f6f4 | .ws | 2 | Sparse | - | - | Invalid |
| .kind::i8 | No .ws | 1 | Dense | .s32 | .s8, .u8 | 64xNxK, 128xNxK; N = {8, 16, 24, 32, 48, … 256} steps of 16 after N > 32; K = 32 |
| .kind::i8 | No .ws | 1 | Sparse | .s32 | .s8, .u8 | K = 64 |
| .kind::i8 | No .ws | 2 | Dense | .s32 | .s8, .u8 | 128xNxK, 256xNxK; N = {32, 64, … 256} steps of 32; K = 32 |
| .kind::i8 | No .ws | 2 | Sparse | .s32 | .s8, .u8 | K = 64 |
| .kind::i8 | .ws | 1 | Dense | .s32 | .s8, .u8 | 32xNxK, 64xNxK, 128xNxK; N = {64, 128, 256}; K = 32 |
| .kind::i8 | .ws | 1 | Sparse | .s32 | .s8, .u8 | N = {64, 128}; K = 64 |
| .kind::i8 | .ws | 2 | Dense | - | - | Invalid |
| .kind::i8 | .ws | 2 | Sparse | - | - | Invalid |
| .kind::mxf4 | No .ws | 1 | Dense | .f32 | .e2m1 x (Scale) .ue8m0 | 128xNxK; N = {8, 16, … 256} steps of 8; K = 64 |
| .kind::mxf4 | No .ws | 1 | Sparse | .f32 | .e2m1 x (Scale) .ue8m0 | K = 128 |
| .kind::mxf4 | No .ws | 2 | Dense | .f32 | .e2m1 x (Scale) .ue8m0 | 128xNxK, 256xNxK, 256xNxK1; N = {16, 32, … 256} steps of 16; K = 64, K1 = 96 |
| .kind::mxf4 | No .ws | 2 | Sparse | .f32 | .e2m1 x (Scale) .ue8m0 | 256xNxK; K = 128 |
| .kind::mxf4 | .ws | 1 / 2 | Either | - | - | Invalid |
| .kind::mxf4nvf4 | No .ws | 1 | Dense | .f32 | .e2m1 x (Scale) .ue8m0, .ue4m3 | 128xNxK; N = {8, 16, … 256} steps of 8; K = 64 |
| .kind::mxf4nvf4 | No .ws | 1 | Sparse | .f32 | .e2m1 x (Scale) .ue8m0, .ue4m3 | K = 128 |
| .kind::mxf4nvf4 | No .ws | 2 | Dense | .f32 | .e2m1 x (Scale) .ue8m0, .ue4m3 | 128xNxK, 256xNxK, 256xNxK1; N = {16, 32, … 256} steps of 16; K = 64, K1 = 96 |
| .kind::mxf4nvf4 | No .ws | 2 | Sparse | .f32 | .e2m1 x (Scale) .ue8m0, .ue4m3 | 256xNxK; K = 128 |
| .kind::mxf4nvf4 | .ws | 1 / 2 | Either | - | - | Invalid |

Where a Sparse row lists only some of the dimensions (for example, only K), the remaining dimensions are those of the Dense row for the same .kind, .ws setting and CTA group. A "-" marks a cell that does not apply because the combination is invalid.
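The N ranges in Table 39 can be easier to check when written out programmatically. The following is a minimal sketch that encodes only the dense, non-.ws rows of .kind::f16; the names `n_values`, `F16_DENSE_NO_WS` and `is_valid_f16_dense` are hypothetical helpers for illustration and are not part of PTX or any NVIDIA API.

```python
# Illustrative only: encodes the dense, non-.ws .kind::f16 rows of Table 39.
# All names are hypothetical helpers, not part of PTX.

def n_values(start, stop, step):
    """Return the inclusive sequence start, start + step, ..., stop."""
    return list(range(start, stop + 1, step))

# cta_group -> (allowed M values, allowed N values, required K)
F16_DENSE_NO_WS = {
    1: ({64, 128}, n_values(8, 256, 8), 16),     # N = {8, 16, 24, ... 256}
    2: ({128, 256}, n_values(16, 256, 16), 16),  # N = {16, 32, ... 256}
}

def is_valid_f16_dense(cta_group, m, n, k):
    """Check an MxNxK shape against the rows encoded above."""
    if cta_group not in F16_DENSE_NO_WS:
        return False
    allowed_m, allowed_n, required_k = F16_DENSE_NO_WS[cta_group]
    return m in allowed_m and n in allowed_n and k == required_k
```

Under these assumptions, `is_valid_f16_dense(1, 64, 24, 16)` is true, while `is_valid_f16_dense(2, 128, 24, 16)` is false because N must be a multiple of 16 for CTA group 2.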
9.7.16.2.1.1. Target ISA Note
K = 96 is supported only on the following architecture-specific targets:
- sm_103a.
9.7.16.2.2. Specifying Matrix Shape
M and N can be specified in the Instruction descriptor.
K can be specified explicitly only when a given MMA variant supports multiple values of K. Otherwise, when K is uniquely determined by Table 39, K cannot be specified explicitly.
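The rule for K can be sketched as a simple check on the set of K values that Table 39 lists for a variant. The function name `k_may_be_explicit` is hypothetical, and the sets passed to it are read off Table 39 by hand; this is an illustration of the rule, not a description of how any assembler implements it.

```python
# Illustrative only (hypothetical helper): K may be given explicitly
# only when the variant supports more than one value of K.

def k_may_be_explicit(supported_k):
    """Return True if more than one K value is supported for the variant."""
    return len(set(supported_k)) > 1

# .kind::f16, dense, no .ws: Table 39 lists only K = 16.
f16_dense = k_may_be_explicit({16})

# .kind::mxf4, dense, no .ws, CTA group 2: Table 39 lists K = 64 and K1 = 96.
mxf4_dense_cta2 = k_may_be_explicit({64, 96})
```

Here `f16_dense` is `False` and `mxf4_dense_cta2` is `True`. Note that K = 96 is subject to the target restriction in the Target ISA Note above, so on other targets the second set would reduce to a single value.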