9.7.11. Surface Instructions

This section describes PTX instructions for accessing surfaces. PTX supports the following operations on surface descriptors:

Static initialization of surface descriptors.

Module-scope and per-entry scope definitions of surface descriptors.

Ability to query fields within surface descriptors.

These instructions provide access to surface memory.

  • suld
  • sust
  • sured
  • suq

9.7.11.1. Surface Instructions: suld

suld

Load from surface memory.

Syntax

suld.b.geom{.cop}.vec.dtype.clamp  d, [a, b];  // unformatted

.geom  = { .1d, .2d, .3d, .a1d, .a2d };
.cop   = { .ca, .cg, .cs, .cv };               // cache operation
.vec   = { none, .v2, .v4 };
.dtype = { .b8 , .b16, .b32, .b64 };
.clamp = { .trap, .clamp, .zero };

Description

suld.b.{1d,2d,3d}

Load from surface memory using a surface coordinate vector. The instruction loads data from the surface named by operand a at coordinates given by operand b into destination d. Operand a is a .surfref variable or .u64 register. Operand b is a scalar or singleton tuple for 1d surfaces; is a two-element vector for 2d surfaces; and is a four-element vector for 3d surfaces, where the fourth element is ignored. Coordinate elements are of type .s32.

suld.b performs an unformatted load of binary data. The lowest dimension coordinate represents a byte offset into the surface and is not scaled, and the size of the data transfer matches the size of destination operand d.

suld.b.{a1d,a2d}

Surface layer selection, followed by a load from the selected surface. The instruction first selects a surface layer from the surface array named by operand a using the index given by the first element of the array coordinate vector b. The instruction then loads data from the selected surface at coordinates given by the remaining elements of operand b into destination d. Operand a is a .surfref variable or .u64 register. Operand b is a bit-size type vector or tuple containing an index into the array of surfaces followed by coordinates within the selected surface, as follows:

For 1d surface arrays, operand b has type .v2.b32. The first element is interpreted as an unsigned integer index (.u32) into the surface array, and the second element is interpreted as a 1d surface coordinate of type .s32.

For 2d surface arrays, operand b has type .v4.b32. The first element is interpreted as an unsigned integer index (.u32) into the surface array, and the next two elements are interpreted as 2d surface coordinates of type .s32. The fourth element is ignored.

A surface base address is assumed to be aligned to a 16 byte boundary, and the address given by the coordinate vector must be naturally aligned to a multiple of the access size. If an address is not properly aligned, the resulting behavior is undefined; i.e., the access may proceed by silently masking off low-order address bits to achieve proper rounding, or the instruction may fault.

The .clamp field specifies how to handle out-of-bounds addresses:

.trap: causes an execution trap on out-of-bounds addresses

.clamp: loads data at the nearest surface location (sized appropriately)

.zero: loads zero for out-of-bounds addresses

Indirect surface access

Beginning with PTX ISA version 3.1, indirect surface access is supported for target architecture sm_20 or higher. In indirect access, operand a is a .u64 register holding the address of a .surfref variable.

PTX ISA Notes

suld.b.trap introduced in PTX ISA version 1.5.

Additional clamp modifiers and cache operations introduced in PTX ISA version 2.0.

suld.b.3d and suld.b.{a1d,a2d} introduced in PTX ISA version 3.0.

Indirect surface access introduced in PTX ISA version 3.1.

Target ISA Notes

suld.b supported on all target architectures.

sm_1x targets support only the .trap clamping modifier.

suld.3d and suld.{a1d,a2d} require sm_20 or higher.

Indirect surface access requires sm_20 or higher.

Cache operations require sm_20 or higher.

Examples

ptx
suld.b.1d.v4.b32.trap  {s1,s2,s3,s4}, [surf_B, {x}];
suld.b.3d.v2.b64.trap  {r1,r2}, [surf_A, {x,y,z,w}];
suld.b.a1d.v2.b32      {r0,r1}, [surf_C, {idx,x}];
suld.b.a2d.b32         r0, [surf_D, {idx,x,y,z}];  // z ignored

9.7.11.2. Surface Instructions: sust

sust

Store to surface memory.

Syntax

sust.b.{1d,2d,3d}{.cop}.vec.ctype.clamp  [a, b], c;  // unformatted
sust.p.{1d,2d,3d}.vec.b32.clamp          [a, b], c;  // formatted

sust.b.{a1d,a2d}{.cop}.vec.ctype.clamp   [a, b], c;  // unformatted

.cop   = { .wb, .cg, .cs, .wt };                     // cache operation
.vec   = { none, .v2, .v4 };
.ctype = { .b8 , .b16, .b32, .b64 };
.clamp = { .trap, .clamp, .zero };

Description

sust.{1d,2d,3d}

Store to surface memory using a surface coordinate vector. The instruction stores data from operand c to the surface named by operand a at coordinates given by operand b. Operand a is a .surfref variable or .u64 register. Operand b is a scalar or singleton tuple for 1d surfaces; is a two-element vector for 2d surfaces; and is a four-element vector for 3d surfaces, where the fourth element is ignored. Coordinate elements are of type .s32.

sust.b performs an unformatted store of binary data. The lowest dimension coordinate represents a byte offset into the surface and is not scaled. The size of the data transfer matches the size of source operand c.

sust.p performs a formatted store of a vector of 32-bit data values to a surface sample. The source vector elements are interpreted left-to-right as R, G, B, and A surface components. These elements are written to the corresponding surface sample components. Source elements that do not occur in the surface sample are ignored. Surface sample components that do not occur in the source vector will be written with an unpredictable value. The lowest dimension coordinate represents a sample offset rather than a byte offset.

The source data interpretation is based on the surface sample format as follows: If the surface format contains UNORM, SNORM, or FLOAT data, then .f32 is assumed; if the surface format contains UINT data, then .u32 is assumed; if the surface format contains SINT data, then .s32 is assumed. The source data is then converted from this type to the surface sample format.

sust.b.{a1d,a2d}

Surface layer selection, followed by an unformatted store to the selected surface. The instruction first selects a surface layer from the surface array named by operand a using the index given by the first element of the array coordinate vector b. The instruction then stores the data in operand c to the selected surface at coordinates given by the remaining elements of operand b. Operand a is a .surfref variable or .u64 register. Operand b is a bit-size type vector or tuple containing an index into the array of surfaces followed by coordinates within the selected surface, as follows:

For 1d surface arrays, operand b has type .v2.b32. The first element is interpreted as an unsigned integer index (.u32) into the surface array, and the second element is interpreted as a 1d surface coordinate of type .s32.

For 2d surface arrays, operand b has type .v4.b32. The first element is interpreted as an unsigned integer index (.u32) into the surface array, and the next two elements are interpreted as 2d surface coordinates of type .s32. The fourth element is ignored.

A surface base address is assumed to be aligned to a 16 byte boundary, and the address given by the coordinate vector must be naturally aligned to a multiple of the access size. If an address is not properly aligned, the resulting behavior is undefined; i.e., the access may proceed by silently masking off low-order address bits to achieve proper rounding, or the instruction may fault.

The .clamp field specifies how to handle out-of-bounds addresses:

.trap: causes an execution trap on out-of-bounds addresses

.clamp: stores data at the nearest surface location (sized appropriately)

.zero: drops stores to out-of-bounds addresses

Indirect surface access

Beginning with PTX ISA version 3.1, indirect surface access is supported for target architecture sm_20 or higher. In indirect access, operand a is a .u64 register holding the address of a .surfref variable.

PTX ISA Notes

sust.b.trap introduced in PTX ISA version 1.5. sust.p, additional clamp modifiers, and cache operations introduced in PTX ISA version 2.0.

sust.b.3d and sust.b.{a1d,a2d} introduced in PTX ISA version 3.0.

Indirect surface access introduced in PTX ISA version 3.1.

Target ISA Notes

sust.b supported on all target architectures.

sm_1x targets support only the .trap clamping modifier.

sust.3d and sust.{a1d,a2d} require sm_20 or higher.

sust.p requires sm_20 or higher.

Indirect surface access requires sm_20 or higher.

Cache operations require sm_20 or higher.

Examples

ptx
sust.p.1d.v4.b32.trap  [surf_B, {x}], {f1,f2,f3,f4};
sust.b.3d.v2.b64.trap  [surf_A, {x,y,z,w}], {r1,r2};
sust.b.a1d.v2.b64      [surf_C, {idx,x}], {r1,r2};
sust.b.a2d.b32         [surf_D, {idx,x,y,z}], r0;  // z ignored

9.7.11.3. Surface Instructions: sured

sured

Reduce surface memory.

Syntax

sured.b.op.geom.ctype.clamp  [a,b],c; // byte addressing
sured.p.op.geom.ctype.clamp  [a,b],c; // sample addressing

.op    = { .add, .min, .max, .and, .or };
.geom  = { .1d, .2d, .3d };
.ctype = { .u32, .u64, .s32, .b32, .s64 };  // for sured.b
.ctype = { .b32, .b64 };                    // for sured.p
.clamp = { .trap, .clamp, .zero };

Description

Reduction to surface memory using a surface coordinate vector. The instruction performs a reduction operation with data from operand c to the surface named by operand a at coordinates given by operand b. Operand a is a .surfref variable or .u64 register. Operand b is a scalar or singleton tuple for 1d surfaces; is a two-element vector for 2d surfaces; and is a four-element vector for 3d surfaces, where the fourth element is ignored. Coordinate elements are of type .s32.

sured.b performs an unformatted reduction on .u32, .s32, .b32, .u64, or .s64 data. The lowest dimension coordinate represents a byte offset into the surface and is not scaled. Operation add applies to .u32, .u64, and .s32 types; min and max apply to .u32, .s32, .u64 and .s64 types; operations and and or apply to .b32 type.

sured.p performs an unformatted reduction on sample-addressed data. The lowest dimension coordinate represents a sample offset rather than a byte offset and is scaled by the data-size given by .ctype. The instruction type .b64 is restricted to min and max operations. For type .b32, the data is interpreted as .u32 or .s32 based on the surface sample format as follows: if the surface format contains UINT data, then .u32 is assumed; if the surface format contains SINT data, then .s32 is assumed. For type .b64, if the surface format contains UINT data, then .u64 is assumed; if the surface format contains SINT data, then .s64 is assumed.

A surface base address is assumed to be aligned to a 16 byte boundary, and the address given by the coordinate vector must be naturally aligned to a multiple of the access size. If an address is not properly aligned, the resulting behavior is undefined; i.e., the access may proceed by silently masking off low-order address bits to achieve proper rounding, or the instruction may fault.

The .clamp field specifies how to handle out-of-bounds addresses:

.trap: causes an execution trap on out-of-bounds addresses

.clamp: stores data at the nearest surface location (sized appropriately)

.zero: drops stores to out-of-bounds addresses

Indirect surface access

Beginning with PTX ISA version 3.1, indirect surface access is supported for target architecture sm_20 or higher. In indirect access, operand a is a .u64 register holding the address of a .surfref variable.

PTX ISA Notes

Introduced in PTX ISA version 2.0.

Indirect surface access introduced in PTX ISA version 3.1.

.u64/.s64/.b64 types with .min/.max operations introduced in PTX ISA version 8.1.

Target ISA Notes

sured requires sm_20 or higher.

Indirect surface access requires sm_20 or higher.

.u64/.s64/.b64 types with .min/.max operations requires sm_50 or higher.

Examples

ptx
sured.b.add.2d.u32.trap  [surf_A, {x,y}], r1;
sured.p.min.1d.u32.trap  [surf_B, {x}], r1;
sured.b.max.1d.u64.trap  [surf_C, {x}], r1;
sured.p.min.1d.b64.trap  [surf_D, {x}], r1;

9.7.11.4. Surface Instructions: suq

suq

Query a surface attribute.

Syntax

suq.query.b32   d, [a];

.query = { .width, .height, .depth,
           .channel_data_type, .channel_order,
           .array_size, .memory_layout };

Description

Query an attribute of a surface. Operand a is a .surfref variable or a .u64 register.

Query Returns
.width, .height, .depth value in elements
.channel_data_type Unsigned integer corresponding to source language's channel data type enumeration. If the source language combines channel data type and channel order into a single enumeration type, that value is returned for both channel_data_type and channel_order queries.
.channel_order Unsigned integer corresponding to source language's channel order enumeration. If the source language combines channel data type and channel order into a single enumeration type, that value is returned for both channel_data_type and channel_order queries.
.array_size For a surface array, number of surfaces in array, 0 otherwise.
.memory_layout 1 for surface with linear memory layout; 0 otherwise.

Indirect surface access

Beginning with PTX ISA version 3.1, indirect surface access is supported for target architecture sm_20 or higher. In indirect access, operand a is a .u64 register holding the address of a .surfref variable.

PTX ISA Notes

Introduced in PTX ISA version 1.5.

Channel data type and channel order queries added in PTX ISA version 2.1.

Indirect surface access introduced in PTX ISA version 3.1.

The .array_size query was added in PTX ISA version 4.1.

The .memory_layout query was added in PTX ISA version 4.2.

Target ISA Notes

Supported on all target architectures.

Indirect surface access requires sm_20 or higher.

Examples

ptx
suq.width.b32       %r1, [surf_A];