Parsing Millions of URLs Per Second

23/12/2024 26 min

Episode Synopsis

This research article describes the development and benchmarking of a high-performance URL parser that complies with the WHATWG URL standard. The authors wrote a C++ implementation that uses vectorisation techniques, and they report that it is significantly faster than existing parsers such as those in curl and rust-url. The parser was integrated into Node.js, where it substantially improved URL-processing performance. Benchmarks across several datasets and platforms showed it to be faster and more efficient than the alternatives tested. The authors also make their code and datasets available as open source.