PTX ISA v9.2
Parallel Thread Execution Instruction Set Architecture
Community Markdown reference — 92 pages covering the full PTX instruction set including warp-level MMA, wgmma (Hopper), and TensorCore Gen5 tcgen05 (Blackwell).
MCP Quickstart
Add the hosted MCP server to your AI client to let it query PTX ISA docs directly.
~/.claude/claude_desktop_config.json
{
  "mcpServers": {
    "ptx-isa": {
      "type": "http",
      "url": "https://ptx.poole.ai/mcp"
    }
  }
}
ptx_list: List all pages, optionally filtered by keyword
ptx_read: Read a page's full Markdown by slug
ptx_search: Full-text search across all pages
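For clients without built-in MCP support, the same tools can be reached over plain JSON-RPC. A minimal Python sketch, assuming the server speaks the standard MCP Streamable HTTP transport; the tool argument names (`query`, `slug`) and the example slug are guesses based on the descriptions above, not confirmed by the server:

```python
import json
import urllib.request

PTX_MCP_URL = "https://ptx.poole.ai/mcp"  # hosted endpoint from the config above

def jsonrpc_call(tool: str, arguments: dict, req_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for one MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

def send(payload: dict) -> dict:
    """POST the request to the MCP endpoint (requires network access)."""
    req = urllib.request.Request(
        PTX_MCP_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer as JSON or as an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example payloads; argument names and the slug are assumptions:
search_req = jsonrpc_call("ptx_search", {"query": "ldmatrix"})
read_req = jsonrpc_call("ptx_read", {"slug": "9-7-14"}, req_id=2)
```

Depending on the client, an MCP session may require an `initialize` handshake before `tools/call`; the payload builder above is independent of that.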
Core Chapters
Chapter 9 — Instruction Set

Warp-Level MMA §9.7.14

§9.7.14.3 Block Scaling for mma.sync
§9.7.14.2 Matrix Data-types
§9.7.14.1 Matrix Shape
§9.7.14.5.15 Warp-level matrix load instruction: ldmatrix
§9.7.14.5 Matrix multiply-accumulate operation using mma instruction
§9.7.14.5.5 Matrix Fragments for mma.m8n8k128
§9.7.14.5.7 Matrix Fragments for mma.m16n8k8
§9.7.14.5.9 Matrix Fragments for mma.m16n8k16 with integer type
§9.7.14.5.12 Matrix Fragments for mma.m16n8k128: Examples of integer type
§9.7.14.5.14 Multiply-and-Accumulate Instruction: mma
§9.7.14.5.17 Warp-level matrix transpose instruction: movmatrix
§9.7.14.5.16 Warp-level matrix store instruction: stmatrix
§9.7.14.6.3 Multiply-and-Accumulate Instruction: mma.sp / mma.sp::ordered_metadata
§9.7.14.6.2 Matrix fragments for multiply-accumulate operation with sparse matrix A
§9.7.14.6.2.8 Matrix Fragments for sparse mma.m16n8k128 with .u4 / .s4 integer type
§9.7.14.6 Matrix multiply-accumulate operation using mma.sp instruction with sparse matrix A
§9.7.14.4.1 Matrix Fragments for WMMA
§9.7.14.4.3 Warp-level Matrix Load Instruction: wmma.load
§9.7.14.4.5 Warp-level Matrix Multiply-and-Accumulate Instruction: wmma.mma
§9.7.14.4.2 Matrix Storage for WMMA
§9.7.14.4.4 Warp-level Matrix Store Instruction: wmma.store
wgmma §9.7.15

§9.7.15.4 Async Proxy
§9.7.15.3 Matrix Data-types
§9.7.15.2 Matrix Shape
§9.7.15.5.1.2.2 Matrix Descriptor Format
§9.7.15.5.1 Register Fragments and Shared Memory Matrix Layouts
§9.7.15.5.1.2 Shared Memory Matrix Layout
§9.7.15.6 Asynchronous Warpgroup Level Multiply-and-Accumulate Operation using wgmma.mma_async.sp instruction
§9.7.15.5.2 Asynchronous Multiply-and-Accumulate Instruction: wgmma.mma_async
tcgen05 §9.7.16

§9.7.16.2.3 Data Movement Shape
§9.7.16.4.2 Instruction Descriptor
§9.7.16.5 Issue Granularity
§9.7.16.2.1 Matrix Shape
§9.7.16.6 Memory Consistency Model for 5th generation of TensorCore operations
§9.7.16.4.1 Shared Memory Descriptor
§9.7.16.3 Major-ness supported by Strides
§9.7.16.4.3 Zero-Column Mask Descriptor