Shared memory
Episode Synopsis
What is shared memory? How is it used in your operating system? How is it used in PyTorch? What's shared memory good for in deep learning? Why use multiple processes rather than one process on a single node? What's the point of PyTorch's shared memory manager? How are allocators for shared memory implemented? How does CUDA shared memory work? What is the difference between CUDA shared memory and CPU shared memory? How did we implement safer CUDA shared memory?

Further reading:
- Implementations of the vanilla shared memory allocator https://github.com/pytorch/pytorch/blob/master/aten/src/TH/THAllocator.cpp and the fancy managed allocator https://github.com/pytorch/pytorch/blob/master/torch/lib/libshm/libshm.h
- Multiprocessing best practices describes some things one should be careful about when working with shared memory https://pytorch.org/docs/stable/notes/multiprocessing.html
- More details on how CUDA shared memory works https://pytorch.org/docs/stable/multiprocessing.html#multiprocessing-cuda-sharing-details
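As a minimal illustration of the OS-level mechanism the episode discusses: CPU shared memory lets multiple processes map the same named segment of physical memory, which is the primitive PyTorch's CPU tensor sharing is built on. The sketch below uses Python's stdlib `multiprocessing.shared_memory` wrapper rather than PyTorch itself, so it shows the general idea, not PyTorch's actual allocator code.

```python
# Sketch of OS-level shared memory via Python's stdlib wrapper.
# PyTorch's CPU sharing sits on the same POSIX/Windows primitives,
# but this example deliberately does not depend on torch.
from multiprocessing import shared_memory

# Create a named shared-memory segment of 16 bytes.
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    # Write through one handle; any process knowing the name sees it.
    shm.buf[0:5] = b"hello"

    # Simulate a second process attaching to the same segment by name.
    other = shared_memory.SharedMemory(name=shm.name)
    assert bytes(other.buf[0:5]) == b"hello"  # same underlying bytes
    other.close()
finally:
    shm.close()
    shm.unlink()  # release the segment once all users have detached
```

Note the explicit `unlink()`: named segments outlive the creating process unless someone frees them, which is exactly the leak problem PyTorch's shared memory manager exists to guard against.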
More episodes of the podcast PyTorch Developer Podcast
Compiler collectives (04/08/2024)
TORCH_TRACE and tlparse (29/04/2024)
Higher order operators (21/04/2024)
Inductor - Post-grad FX passes (12/04/2024)
CUDA graph trees (24/03/2024)
Min-cut partitioner (17/03/2024)
AOTInductor (02/03/2024)
Tensor subclasses and PT2 (24/02/2024)
Compiled autograd (19/02/2024)
PT2 extension points (05/02/2024)