"Just enough CUDA to be dangerous"
Episode Synopsis
Ever wanted to learn about CUDA but weren't sure where to start? In this sixteen-minute episode I try to jam in as much CUDA knowledge as could reasonably be expected in a podcast. You won't know how to write a kernel after this episode, but you'll know what a GPU is, what the general CUDA programming model is, why asynchronous execution makes everything complicated, and some general principles PyTorch abides by when designing CUDA kernels.

Further reading:
PyTorch docs on CUDA semantics: https://pytorch.org/docs/stable/notes/cuda.html
The book I was recommended for learning CUDA when I first showed up at PyTorch: Programming Massively Parallel Processors https://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0128119861
The environment variable that makes CUDA synchronous is CUDA_LAUNCH_BLOCKING=1. cuda-memcheck is also useful for debugging CUDA problems: https://docs.nvidia.com/cuda/cuda-memcheck/index.html
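As a quick illustration of the debugging tip above: because CUDA kernel launches are asynchronous, an error inside a kernel often surfaces much later, at some unrelated synchronization point. Setting CUDA_LAUNCH_BLOCKING=1 forces launches to be synchronous so the Python traceback points at the actual offending line. A sketch of how you might invoke these tools (train.py is a hypothetical script name):

```shell
# Re-run a crashing PyTorch script with synchronous kernel launches, so
# the error is raised at the line that actually launched the bad kernel:
CUDA_LAUNCH_BLOCKING=1 python train.py

# cuda-memcheck instruments kernels to catch problems like illegal
# (out-of-bounds) memory accesses inside the kernels themselves:
cuda-memcheck python train.py
```

Note that synchronous launches are much slower, so this is a debugging mode, not something to leave on in production runs.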