NVIDIA / PTX ISA
PTX ISA v9.2
Parallel Thread Execution Instruction Set Architecture
Community Markdown reference — 98 pages covering the full PTX instruction set including warp-level MMA, wgmma (Hopper), and TensorCore Gen5 tcgen05 (Blackwell).
MCP Quickstart
Add the hosted MCP server to your AI client to let it query PTX ISA docs directly.
~/.claude/claude_desktop_config.json
{
  "mcpServers": {
    "ptx-isa": {
      "type": "http",
      "url": "https://ptx.poole.ai/mcp"
    }
  }
}
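If the config file already lists other servers, the entry should be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge, not an official installer. The helper name `add_ptx_isa_server` and the placeholder `other-server` entry are assumptions made for illustration; only the `ptx-isa` entry itself comes from the snippet above.

```python
import json

# Server entry copied from the quickstart snippet.
PTX_ISA_SERVER = {"type": "http", "url": "https://ptx.poole.ai/mcp"}


def add_ptx_isa_server(config: dict) -> dict:
    """Return a copy of `config` with the ptx-isa server registered.

    Hypothetical helper: existing "mcpServers" entries and any other
    top-level keys are preserved; the input dict is not mutated.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["ptx-isa"] = dict(PTX_ISA_SERVER)
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Placeholder for whatever the client config already contains.
    existing = {
        "mcpServers": {
            "other-server": {"type": "http", "url": "https://example.invalid/mcp"}
        }
    }
    print(json.dumps(add_ptx_isa_server(existing), indent=2))
```

Reading the real file with `json.load`, passing the result through the helper, and writing it back with `json.dump` would complete the round trip; the sketch leaves file I/O out so it can be checked without touching a live config.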
list_pages: List all pages, optionally filtered by keyword.
read_page: Read a page's full Markdown by slug.
search: Relevance-ranked full-text search with contextual snippets.
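For readers curious what a client sends when it invokes these tools, here is a rough sketch of the JSON-RPC 2.0 request bodies that MCP uses for tool calls (`tools/call`). It only builds the payloads and does not contact the server; a real session also needs the MCP initialization handshake first. The argument keys `keyword`, `query`, and `slug` are guesses based on the tool descriptions, not confirmed parameter names; the actual schemas can be discovered with a `tools/list` request.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument keys below are hypothetical; check `tools/list` for the real schema.
examples = [
    tool_call("list_pages", {"keyword": "wgmma"}),
    tool_call("search", {"query": "ldmatrix"}),
    tool_call("read_page", {"slug": "<slug returned by list_pages>"}),
]

if __name__ == "__main__":
    for request in examples:
        print(json.dumps(request))
```

In practice an AI client builds and sends these requests itself once the server is registered, so this is mainly useful for debugging or for scripting against the endpoint.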
Core Chapters (12)
Chapter 9: Instruction Set (6)
Warp-Level MMA §9.7.14 (28)
9.7.14.3 Block Scaling for mma.sync
9.7.14.2 Matrix Data-types
9.7.14.1 Matrix Shape
9.7.14.5.15 Warp-level matrix load instruction: ldmatrix
9.7.14.5 Matrix Fragments for mma.m16n8k128
9.7.14.5 Matrix Fragments for mma.m16n8k16
9.7.14.5 Matrix Fragments for mma.m16n8k256
9.7.14.5 Matrix Fragments for mma.m16n8k32
9.7.14.5 Matrix Fragments for mma.m16n8k4
9.7.14.5 Matrix Fragments for mma.m16n8k64
9.7.14.5 Matrix Fragments for mma.m16n8k8
9.7.14.5 Matrix Fragments for mma.m8n8k128
9.7.14.5 Matrix Fragments for mma.m8n8k16
9.7.14.5 Matrix Fragments for mma.m8n8k32
9.7.14.5 Matrix Fragments for mma.m8n8k4
Examples of integer type
9.7.14.5.14 Multiply-and-Accumulate Instruction: mma
9.7.14.5.17 Warp-level matrix transpose instruction: movmatrix
9.7.14.5.16 Warp-level matrix store instruction: stmatrix
9.7.14.6.3 Multiply-and-Accumulate Instruction: mma.sp / mma.sp::ordered_metadata
9.7.14.6.2 Matrix fragments for multiply-accumulate operation with sparse matrix A
9.7.14.6.2.8 Matrix Fragments for sparse mma.m16n8k128 with .u4 / .s4 integer type
9.7.14.6 Matrix multiply-accumulate operation using mma.sp instruction with sparse matrix A
9.7.14.4.1 Matrix Fragments for WMMA
9.7.14.4.3 Warp-level Matrix Load Instruction: wmma.load
9.7.14.4.5 Warp-level Matrix Multiply-and-Accumulate Instruction: wmma.mma
9.7.14.4.2 Matrix Storage for WMMA
9.7.14.4.4 Warp-level Matrix Store Instruction: wmma.store
wgmma §9.7.15 (8)
9.7.15.4 Async Proxy
9.7.15.3 Matrix Data-types
9.7.15.2 Matrix Shape
9.7.15.5.1.2.2 Matrix Descriptor Format
9.7.15.5.1 Register Fragments and Shared Memory Matrix Layouts
9.7.15.5.1.2 Shared Memory Matrix Layout
9.7.15.6 Asynchronous Warpgroup Level Multiply-and-Accumulate Operation using wgmma.mma_async.sp instruction
9.7.15.5.2 Asynchronous Multiply-and-Accumulate Instruction: wgmma.mma_async
tcgen05 §9.7.16 (8)
9.7.16.2.3 Data Movement Shape
9.7.16.4.2 Instruction Descriptor
9.7.16.5 Issue Granularity
9.7.16.2.1 Matrix Shape
9.7.16.6 Memory Consistency Model for 5th generation of TensorCore operations
9.7.16.4.1 Shared Memory Descriptor
9.7.16.3 Major-ness supported by Strides
9.7.16.4.3 Zero-Column Mask Descriptor