Code generation

04/06/2021 16 min Episode 23

Episode Synopsis

Why does PyTorch use code generation as part of its build process? Why doesn't it use C++ templates? What things is code generation used for? What are the pros/cons of using code generation? What are some other ways to do the same things we currently do with code generation?

Further reading:

- Top level file for the new code generation pipeline: https://github.com/pytorch/pytorch/blob/master/tools/codegen/gen.py
- Out of tree external backend code generation from Brian Hirsh: https://github.com/pytorch/xla/issues/2871
- Documentation for native_functions.yaml: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md (have you seen this README before? Yes you've seen this README before. Imma post it again.)

Outline:

- High level: reduce the amount of code in PyTorch, easier to develop
- Strongly typed Python
- Stuff we're using codegen for
  - Meta point: stuff C++ metaprogramming can't do
  - C++ APIs (functions, methods on classes)
    - Especially for forwarding (operator dot doko)
  - Prototypes for C++ to implement
  - YAML files used by external frameworks for binding (accidental)
  - Python arg parsing
  - pyi generation
  - Autograd classes for saving saved data
  - Otherwise complicated constexpr computation (e.g., parsing JIT schema)
- Pros
  - Better surface syntax (native_functions.yaml, JIT schema, derivatives.yaml)
  - Better error messages (template error messages are famously bad)
  - Easier to organize complicated code, especially with a nontrivial input data structure
  - Easier to debug by looking at the generated code
- Cons
  - Not as portable (a template can be used by anyone)
  - Less good modeling for C++ type-based metaprogramming (we've replicated a crappy version of the C++ type system in our codegen)
- Counterpoints in the design space
  - C++ templates: just as efficient
  - Boxed fallback: simpler, less efficient
  - Open question: can you have the best of both worlds, e.g., with partially evaluated interpreters?
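To make the "one declaration, many outputs" idea concrete, here is a toy sketch of schema-driven code generation: a single function declaration is parsed once and then emitted both as a C++ prototype (for a kernel author to implement) and as a .pyi type stub. Everything in it is a hypothetical simplification: the schema syntax is a reduced form of the JIT schema style (no overload names, no keyword-only `*`, no optional or list types), and the names `parse_schema`, `cpp_prototype`, `pyi_stub` and the type tables are invented for illustration. It is not how tools/codegen/gen.py is actually structured.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, heavily simplified type tables; the real codegen handles far more.
CPP_TYPES = {"Tensor": "const at::Tensor&", "Scalar": "const at::Scalar&",
             "int": "int64_t", "bool": "bool"}
CPP_RETURNS = {"Tensor": "at::Tensor"}
CPP_DEFAULTS = {"True": "true", "False": "false"}  # schema literal -> C++ literal
PY_TYPES = {"Tensor": "Tensor", "Scalar": "Number", "int": "int", "bool": "bool"}
PY_RETURNS = {"Tensor": "Tensor"}


@dataclass
class Arg:
    type: str
    name: str
    default: Optional[str]


@dataclass
class Schema:
    name: str
    args: List[Arg]
    ret: str


# Matches e.g. "add(Tensor self, Tensor other, Scalar alpha=1) -> Tensor"
SCHEMA_RE = re.compile(r"^(\w+)\((.*)\)\s*->\s*(\w+)$")


def parse_schema(text: str) -> Schema:
    """Parse the simplified schema string into a structured Schema."""
    m = SCHEMA_RE.match(text.strip())
    if m is None:
        raise ValueError(f"cannot parse schema: {text!r}")
    name, arg_str, ret = m.groups()
    args = []
    for piece in filter(None, (p.strip() for p in arg_str.split(","))):
        decl, _, default = piece.partition("=")
        typ, arg_name = decl.split()
        args.append(Arg(typ, arg_name, default.strip() or None))
    return Schema(name, args, ret)


def cpp_prototype(s: Schema) -> str:
    """Emit a C++ declaration from the parsed schema."""
    params = []
    for a in s.args:
        p = f"{CPP_TYPES[a.type]} {a.name}"
        if a.default is not None:
            p += f"={CPP_DEFAULTS.get(a.default, a.default)}"
        params.append(p)
    return f"{CPP_RETURNS[s.ret]} {s.name}({', '.join(params)});"


def pyi_stub(s: Schema) -> str:
    """Emit a Python type stub line from the same parsed schema."""
    params = []
    for a in s.args:
        p = f"{a.name}: {PY_TYPES[a.type]}"
        if a.default is not None:
            p += f" = {a.default}"
        params.append(p)
    return f"def {s.name}({', '.join(params)}) -> {PY_RETURNS[s.ret]}: ..."


if __name__ == "__main__":
    schema = parse_schema("add(Tensor self, Tensor other, Scalar alpha=1) -> Tensor")
    print(cpp_prototype(schema))
    print(pyi_stub(schema))
```

The parsed `Schema` is the single source of truth; adding another emitter (say, for Python argument parsing or autograd classes) means writing one more function over the same data structure rather than repeating the signature by hand. That is the "easier to organize complicated code" argument in miniature, and also why a bad schema produces one readable `ValueError` instead of a wall of template instantiation errors.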