PTX ISA v9.2
Parallel Thread Execution Instruction Set Architecture
Community Markdown reference — 92 pages covering the full PTX instruction set including warp-level MMA, wgmma (Hopper), and TensorCore Gen5 tcgen05 (Blackwell).
MCP Quickstart
Add the hosted MCP server to your AI client to let it query PTX ISA docs directly.
~/.claude/claude_desktop_config.json
{
  "mcpServers": {
    "ptx-isa": {
      "type": "http",
      "url": "https://ptx.poole.ai/mcp"
    }
  }
}
ptx_list: List all pages, optionally filtered by keyword
ptx_read: Read a page's full Markdown by slug
ptx_search: Full-text search across all pages
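For clients without built-in MCP support, the same tools can be reached over plain JSON-RPC. A minimal Python sketch, assuming the server speaks the standard MCP Streamable HTTP transport; the tool argument names (`query`, `slug`) and the example slug are guesses based on the descriptions above, not confirmed by the server:

```python
import json
import urllib.request

PTX_MCP_URL = "https://ptx.poole.ai/mcp"  # hosted endpoint from the config above

def jsonrpc_call(tool: str, arguments: dict, req_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for one MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

def send(payload: dict) -> dict:
    """POST the request to the MCP endpoint (requires network access)."""
    req = urllib.request.Request(
        PTX_MCP_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer as JSON or as an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example payloads; argument names and the slug are assumptions:
search_req = jsonrpc_call("ptx_search", {"query": "ldmatrix"})
read_req = jsonrpc_call("ptx_read", {"slug": "9-7-14"}, req_id=2)
```

Depending on the client, an MCP session may require an `initialize` handshake before `tools/call`; the payload builder above is independent of that.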
Core Chapters
Chapter 9 — Instruction Set

Warp-Level MMA §9.7.14

§9.7.14.3 Block Scaling for mma.sync
§9.7.14.2 Matrix Data-types
§9.7.14.1 Matrix Shape
§9.7.14.5.15 Warp-level matrix load instruction: ldmatrix
§9.7.14.5 Matrix multiply-accumulate operation using mma instruction
§9.7.14.5.5 Matrix Fragments for mma.m8n8k128
§9.7.14.5.7 Matrix Fragments for mma.m16n8k8
§9.7.14.5.9 Matrix Fragments for mma.m16n8k16 with integer type
§9.7.14.5.12 Matrix Fragments for mma.m16n8k128: Examples of integer type
§9.7.14.5.14 Multiply-and-Accumulate Instruction: mma
§9.7.14.5.17 Warp-level matrix transpose instruction: movmatrix
§9.7.14.5.16 Warp-level matrix store instruction: stmatrix
§9.7.14.6.3 Multiply-and-Accumulate Instruction: mma.sp / mma.sp::ordered_metadata
§9.7.14.6.2 Matrix fragments for multiply-accumulate operation with sparse matrix A
§9.7.14.6.2.8 Matrix Fragments for sparse mma.m16n8k128 with .u4 / .s4 integer type
§9.7.14.6 Matrix multiply-accumulate operation using mma.sp instruction with sparse matrix A
§9.7.14.4.1 Matrix Fragments for WMMA
§9.7.14.4.3 Warp-level Matrix Load Instruction: wmma.load
§9.7.14.4.5 Warp-level Matrix Multiply-and-Accumulate Instruction: wmma.mma
§9.7.14.4.2 Matrix Storage for WMMA
§9.7.14.4.4 Warp-level Matrix Store Instruction: wmma.store
wgmma §9.7.15

§9.7.15.4 Async Proxy
§9.7.15.3 Matrix Data-types
§9.7.15.2 Matrix Shape
§9.7.15.5.1.2.2 Matrix Descriptor Format
§9.7.15.5.1 Register Fragments and Shared Memory Matrix Layouts
§9.7.15.5.1.2 Shared Memory Matrix Layout
§9.7.15.6 Asynchronous Warpgroup Level Multiply-and-Accumulate Operation using wgmma.mma_async.sp instruction
§9.7.15.5.2 Asynchronous Multiply-and-Accumulate Instruction: wgmma.mma_async
tcgen05 §9.7.16

§9.7.16.2.3 Data Movement Shape
§9.7.16.4.2 Instruction Descriptor
§9.7.16.5 Issue Granularity
§9.7.16.2.1 Matrix Shape
§9.7.16.6 Memory Consistency Model for 5th generation of TensorCore operations
§9.7.16.4.1 Shared Memory Descriptor
§9.7.16.3 Major-ness supported by Strides
§9.7.16.4.3 Zero-Column Mask Descriptor