9.7.16.4.3. Zero-Column Mask Descriptor
The zero-column mask descriptor is used to generate a mask that specifies which columns of B matrix will have zero value for the MMA operation regardless of the values present in the shared memory. The total size of the generated mask is N-bits.
A 0-bit in the mask specifies that values of the corresponding column in matrix B should be used for the MMA operation. A 1-bit in the mask specifies 0s must be used for the entire column for the MMA operation.
The zero-column mask descriptor is a 64-bit value in registers with the following layout:
Table 45 Zero-Column Mask descriptor layout
| Bits | Size (bits) | Field Name | Description |
|---|---|---|---|
| 0–7 | 8 | Start Count 0 (sc0) | Specifies the LSBs that must be skipped for sub-mask mask-i |
| 8–15 | 8 | Start Count 1 (sc1) | Specifies the LSBs that must be skipped for sub-mask mask-i |
| 16–23 | 8 | Start Count 2 (sc2) | Specifies the LSBs that must be skipped for sub-mask mask-i |
| 24–31 | 8 | Start Count 3 (sc3) | Specifies the LSBs that must be skipped for sub-mask mask-i |
| 32 | 1 | First Span 0 (fs0) | Specifies the starting value for sub-mask mask-i |
| 33 | 1 | First Span 1 (fs1) | Specifies the starting value for sub-mask mask-i |
| 34 | 1 | First Span 2 (fs2) | Specifies the starting value for sub-mask mask-i |
| 35 | 1 | First Span 3 (fs3) | Specifies the starting value for sub-mask mask-i |
| 36–38 | 3 | Reserved | |
| 39 | 1 | Non-Zero Mask | Value 0 indicates generated mask will have all 0s; Value 1 indicates the mask has to be generated |
| 40–47 | 8 | Skip Span | (Count of consecutive columns where B matrix is used) - 1 |
| 48–55 | 8 | Use Span | (Count of consecutive columns where 0s are used) - 1 |
| 56–61 | 6 | Column Shift | Shifts column by specified amount. Thus allows MMA on non-0 starting column. Max shift amount = 16 for M=32; Max shift amount = 32 otherwise |
The zero-column mask is made up of one or more sub-mask depending on M, as shown in the table:
| M | Zero-Column Mask breakup | Sub-masks | First Span used | Start Column used |
|---|---|---|---|---|
| 128 | Single sub-mask of size N-bits | mask0 | fs0 | sc0 |
| 64 | Two sub-masks, each with size of N/2 bits | mask0, mask1 | fs0, fs1 | sc0, sc1 |
| 32 | Four sub-masks, each with size of N/4 bits | mask0, mask1, mask2, mask3 | fs0, fs1, fs2, fs3 | sc0, sc1, sc2, sc3 |
The following table shows the coverage of the sub-masks across N-dimension:
| Sub-mask | M=128 | M=64 | M=32 |
|---|---|---|---|
| mask0 | Columns [0, N-1] | Columns [0, N/2-1] | Columns [0, N/4-1] |
| mask1 | – | Columns [N/2, N-1] | Columns [N/4, N/2-1] |
| mask2 | – | – | Columns [N/2, (N/4\*3)-1] |
| mask3 | – | – | Columns [(N/4\*3), N-1] |
The following examples shows zero-column mask descriptor and their corresponding mask generated:
Example 1: M = 128
Input zero-column mask descriptor:
| Start count | First span | Non-Zero Mask | Skip Span | Use Span | Shift |
|---|---|---|---|---|---|
| {0, 0, 0, 0} | {0, 0, 0, 0} | 0 | 4 | 3 | 0 |
Output zero-column mask: 0x0.
As Non-Zero Mask field is 0, the mask is 0x0. All the columns of the matrix B will be used for the MMA operation.
Example 2: M = 128
Input zero-column mask descriptor:
| Start count | First span | Non-Zero Mask | Skip Span | Use Span | Shift |
|---|---|---|---|---|---|
| {-, -, -, 0} | {-, -, -, 0} | 1 | 2 | 3 | 0 |
Output mask0: 0b … 111 0000 111 0000 (size = N)
Example 3: M = 64
Input zero-column mask descriptor:
| Start count {.., sc1, sc0} | First span {.., fs1, fs0} | Non-Zero Mask | Skip Span | Use Span | Shift |
|---|---|---|---|---|---|
| {-, -, 0, 0} | {-, -, 0, 1} | 1 | 2 | 3 | 0 |
Output mask0: 0b … 111 0000 111 0000 111
Output mask1: 0b … 0000 111 0000 111 0000
Example 4: M = 32
Input zero-column mask descriptor:
| Start count {sc3, sc2, sc1, sc0} | First span {fs3, fs2, fs1, fs0} | Non-Zero Mask | Skip Span | Use Span | Shift |
|---|---|---|---|---|---|
| {1, 2, 1, 0} | {0, 0, 1, 1} | 1 | 2 | 3 | 2 |
Output mask0: 0b … 0000 111 0000 111
Output mask1: 0b … 0000 111 0000 11
Output mask2: 0b … 111 0000 111 00
Output mask3: 0b … 111 0000 111 000
If N = 128 then B Matrix with columns from 2 to 129 will be used for the MMA operation, due to the shift of 2.