Los Alamos: overcoming the memory wall fighting sparse memory access

21/08/2025 29 min

Episode Synopsis

We review Los Alamos National Laboratory's advances in managing indirect memory accesses in high-performance computing and their relationship to overcoming the memory wall. The first goal of DoE’s next-generation supercomputer, ATS-5, is “Overcoming the memory wall: continued memory bandwidth performance improvements for tri-lab applications.”

"DX100" introduces a programmable data access accelerator designed to improve memory bandwidth utilization for irregular applications by reordering, coalescing, and interleaving memory requests. The accelerator offloads bulk indirect memory operations from the CPU cores, reducing both instruction count and cache misses.

Complementing this, "A Workflow for the Synthesis of Irregular Memory Access Microbenchmarks" presents GS Patterns, a novel tool workflow that analyzes and synthesizes memory access patterns from complex applications. The workflow generates compact representations of sparse memory access patterns that can be used in benchmarking and hardware design to evaluate performance on various CPU and GPU architectures, particularly for the gather and scatter operations that often bottleneck application performance.

Sources:
1) https://www.osti.gov/servlets/purl/2332770 - Codesign for memory intensive applications
2) https://dl.acm.org/doi/pdf/10.1145/3695053.3731015 - DX100: Programmable Data Access Accelerator for Indirection
3) https://dl.acm.org/doi/pdf/10.1145/3695794.3695816 - A Workflow for the Synthesis of Irregular Memory Access Microbenchmarks
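For listeners unfamiliar with the gather and scatter operations discussed in the episode, here is a minimal Python sketch of what they do and why the order of indirect accesses matters. It is not taken from the DX100 or GS Patterns papers. The function names (`gather`, `scatter`, `line_fetches`), the 8-elements-per-line figure, and the one-line-buffer cost model are all illustrative assumptions, far simpler than real cache or DRAM behavior.

```python
def gather(src, idx):
    """Indirect read: out[i] = src[idx[i]]."""
    return [src[j] for j in idx]


def scatter(dst, idx, vals):
    """Indirect write: dst[idx[i]] = vals[i]; on duplicate indices the later write wins."""
    for j, v in zip(idx, vals):
        dst[j] = v
    return dst


def line_fetches(idx, elems_per_line=8):
    """Count memory-line fetches under a toy model (assumption, not real hardware):
    only the most recently fetched line is kept, so every change of line costs one fetch."""
    fetches = 0
    last_line = None
    for j in idx:
        line = j // elems_per_line
        if line != last_line:
            fetches += 1
            last_line = line
    return fetches


src = [10 * i for i in range(32)]
idx = [17, 2, 16, 3, 18, 1]          # sparse, irregular index stream

gathered = gather(src, idx)           # [170, 20, 160, 30, 180, 10]
unordered = line_fetches(idx)         # 6: every access lands on a different line than the last
reordered = line_fetches(sorted(idx)) # 2: same accesses, grouped by line

print(gathered, unordered, reordered)
```

In this toy model, grouping the same six accesses by line cuts the number of line fetches from 6 to 2. That is the general intuition behind reordering and coalescing indirect requests, although a real accelerator must also track the original order so that results are returned in the right positions.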