Listen "torch.nn"
Episode Synopsis
What goes into the implementation of torch.nn? Why do NN modules exist in the first place? What's the function of Parameter? How do modules actually track all the parameters in question? What is all of the goop in the top-level NN module class? What are some new developments in torch.nn modules? What are some open problems with our modules?

Further reading:

- Implementation of nn.Module: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py
- nn.Module is complicated, and that means it's sometimes a bit slow. Some analysis at https://dev-discuss.pytorch.org/t/overhead-in-nn-module-causing-massive-slowdowns-compared-to-raw-cublas-or-torchscript/110
- Lazy modules PR https://github.com/pytorch/pytorch/pull/44538 and factory kwargs PR https://github.com/pytorch/pytorch/pull/54508

Liner notes:

- Python for hackability (the C++ API is a reimplementation)
- parameters: parameter collection (for optimization)
- buffers: not considered optimizable
- modules: functorial operation (_apply)
- jit script: staged computation (__init__ is not scripted)
- __call__ to forward (extra instrumentation)
- serialization / state_dict
- new stuff: device kwarg (Joel Schlosser)
- new stuff: lazy modules (emcastillo)
- open problems: parameter initialization

Short code sketches for several of these liner notes follow.
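On parameter collection: a minimal sketch of how assigning an nn.Parameter to a module attribute gets it tracked, and how parameters() is the iterator the optimizer consumes. The Scale class is hypothetical, invented for illustration.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        # Assigning an nn.Parameter goes through nn.Module.__setattr__,
        # which records it in self._parameters instead of plain __dict__.
        self.weight = nn.Parameter(torch.ones(3))
        # A plain tensor attribute is not tracked as a parameter.
        self.scratch = torch.zeros(3)

    def forward(self, x):
        return self.weight * x

m = Scale()
print([name for name, _ in m.named_parameters()])  # ['weight']

# Parameter collection is what makes optimization work: the optimizer
# just consumes this iterator.
opt = torch.optim.SGD(m.parameters(), lr=0.1)
```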
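On buffers: state that travels with the module (state_dict, device moves) but that parameters() never yields, so the optimizer does not touch it. RunningMean is again a made-up name.

```python
import torch
import torch.nn as nn

class RunningMean(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        # Buffers move with the module (.to(device)) and appear in
        # state_dict, but are not considered optimizable.
        self.register_buffer("mean", torch.zeros(3))

m = RunningMean()
print(list(m.state_dict()))   # ['mean']
print(list(m.parameters()))   # []
```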
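On the functorial operation: a simplified sketch of the recursion that _apply performs over the module tree. The real implementation in module.py updates tensors in place where possible and also handles .grad, so treat apply_sketch as a conceptual model, not the actual code.

```python
import torch
import torch.nn as nn

def apply_sketch(module, fn):
    # Recurse into children first: modules form a tree.
    for child in module.children():
        apply_sketch(child, fn)
    # Then transform this module's own parameters and buffers.
    for key, param in list(module._parameters.items()):
        if param is not None:
            module._parameters[key] = nn.Parameter(
                fn(param.data), requires_grad=param.requires_grad)
    for key, buf in list(module._buffers.items()):
        if buf is not None:
            module._buffers[key] = fn(buf)
    return module

net = nn.Sequential(nn.Linear(2, 2), nn.ReLU())
apply_sketch(net, lambda t: t.double())  # roughly what net.double() does
print(next(net.parameters()).dtype)      # torch.float64
```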
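On __call__ to forward: __call__ is where nn.Module layers in extra instrumentation (hooks, among other things) before delegating to forward, which is why module(x) and module.forward(x) are not equivalent. The log_shapes hook below is hypothetical.

```python
import torch
import torch.nn as nn

lin = nn.Linear(2, 2)

def log_shapes(module, inputs, output):
    # A hypothetical hook that just reports shapes.
    print(module.__class__.__name__, inputs[0].shape, "->", output.shape)

handle = lin.register_forward_hook(log_shapes)
lin(torch.randn(4, 2))          # goes through __call__: hook fires
lin.forward(torch.randn(4, 2))  # bypasses __call__: hook does not fire
handle.remove()
```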
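On serialization: state_dict flattens parameters and buffers into a dict of tensors keyed by dotted path, and saving that dict (rather than pickling the whole module) is the usual pattern. A minimal round trip:

```python
import torch
import torch.nn as nn

net = nn.Linear(2, 2)
print(list(net.state_dict()))  # ['weight', 'bias']

# Save and restore tensors by dotted-path key, not the pickled module.
torch.save(net.state_dict(), "net.pt")

net2 = nn.Linear(2, 2)
net2.load_state_dict(torch.load("net.pt"))
```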
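On the new stuff: the device/dtype factory kwargs from the PR above let parameters be constructed directly where you want them, and lazy modules defer parameter shapes until the first forward.

```python
import torch
import torch.nn as nn

# Factory kwargs: build the parameters directly with the requested
# device and dtype instead of allocating on CPU and moving afterwards.
lin = nn.Linear(4, 8, device="cpu", dtype=torch.float64)

# Lazy modules: in_features is inferred from the first input.
lazy = nn.LazyLinear(8)
lazy(torch.randn(2, 4))   # first forward materializes the weight
print(lazy.weight.shape)  # torch.Size([8, 4])
```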
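On the initialization open problem: every module initializes its own parameters in __init__, so the default init runs even when you immediately overwrite it. The meta-device workaround shown here is one mitigation the factory kwargs enable, not something the episode itself prescribes.

```python
import torch
import torch.nn as nn

# The default init runs inside __init__ and is immediately discarded:
lin = nn.Linear(4, 4)
with torch.no_grad():
    lin.weight.copy_(torch.eye(4))  # overwrites the default init

# One mitigation enabled by the factory kwargs: construct on the meta
# device, where tensors carry shape/dtype but no data to initialize.
meta_lin = nn.Linear(4, 4, device="meta")
print(meta_lin.weight.device)  # meta
```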