Listen "How new operators are authored"
Episode Synopsis
What's the general process by which a new operator is added to PyTorch? Why is this actually something of a rare occurrence? How do you integrate an operator with the rest of PyTorch's system so it can be run end-to-end? What should I expect if I'm writing a CPU and CUDA kernel? What tools are available to me to make the job easier? How can I debug my kernels? How do I test them?

Further reading:
- The README for the native/ directory, where all kernels get put: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md
- A high-level overview of how TensorIterator works: https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/
- Where OpInfos live: https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py
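To make the moving pieces concrete, here is a minimal sketch of what a TensorIterator-based CPU kernel can look like. The operator my_relu, its native_functions.yaml entry, and the function names are invented for illustration; in the real codebase the vectorized inner loop also lives under aten/src/ATen/native/cpu/ and is registered through a dispatch stub so it can be compiled per CPU capability (see the native/ README above for the authoritative process).

```cpp
// Hypothetical native_functions.yaml entry (my_relu does not exist
// in PyTorch; it is invented for this sketch):
//
//   - func: my_relu(Tensor self) -> Tensor
//     dispatch:
//       CPU: my_relu_cpu
//       CUDA: my_relu_cuda

#include <ATen/ATen.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cpu/Loops.h>

namespace at { namespace native {

// CPU entry point. TensorIterator handles broadcasting, dtype
// checks, memory-overlap checks, and strided iteration; the kernel
// author only supplies the per-element computation.
Tensor my_relu_cpu(const Tensor& self) {
  Tensor out = at::empty_like(self);
  auto iter = TensorIteratorConfig()
      .add_output(out)
      .add_input(self)
      .build();
  AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "my_relu_cpu", [&] {
    // cpu_kernel iterates (and vectorizes) this scalar lambda
    // over every element the iterator visits.
    cpu_kernel(iter, [](scalar_t x) -> scalar_t {
      return x > scalar_t(0) ? x : scalar_t(0);
    });
  });
  return out;
}

}} // namespace at::native
```

A CUDA counterpart is typically near-identical, swapping cpu_kernel for gpu_kernel from ATen/native/cuda/Loops.cuh, which is a large part of TensorIterator's appeal. Testing then hinges on adding an OpInfo entry in common_methods_invocations.py (linked above) so the shared test suite exercises the new operator.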
More episodes of the podcast PyTorch Developer Podcast:
- Compiler collectives (04/08/2024)
- TORCH_TRACE and tlparse (29/04/2024)
- Higher order operators (21/04/2024)
- Inductor - Post-grad FX passes (12/04/2024)
- CUDA graph trees (24/03/2024)
- Min-cut partitioner (17/03/2024)
- AOTInductor (02/03/2024)
- Tensor subclasses and PT2 (24/02/2024)
- Compiled autograd (19/02/2024)
- PT2 extension points (05/02/2024)