Listen "Were RNNs All We Needed? (Feng et al., 2024)"
Episode Synopsis
Welcome to Revise and Resubmit, the place where we explore research breakthroughs, challenge assumptions, and ask the questions that keep science alive. Today, we dive into a fascinating paper titled "Were RNNs All We Needed?", authored by Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, and Hossein Hajimirsadeghi. Posted on October 2, 2024, this preprint is hosted on arXiv, courtesy of Cornell University.
For years, Transformers have reigned supreme, revolutionizing natural language processing and sequential data tasks. But Transformers come with a cost: their attention computation scales quadratically with sequence length, which raises the question of whether we abandoned recurrent neural networks (RNNs) too soon. This paper suggests we may have. It takes us back to the classic models, LSTMs from 1997 and GRUs from 2014, and shows that with a little clever tweaking, these older architectures can still shine.
Imagine RNNs that no longer need to backpropagate through time (BPTT). The authors remove the hidden-state dependencies from the input, forget, and update gates, yielding minLSTMs and minGRUs: leaner, faster, and fully parallelizable during training. They train roughly 175x faster on sequences of length 512 and match recent sequence models in empirical performance, suggesting that sometimes innovation lies in refining the old, not just chasing the new.
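To make that gating change concrete, here is a minimal NumPy sketch of a minGRU-style recurrence, based only on the description above. The function names, the weight matrices W_z and W_h, and the omission of biases, batching, and the paper's log-space parallel scan are our simplifications for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def min_gru_sequential(x, W_z, W_h, h0):
    """Step-by-step minGRU-style recurrence (illustrative sketch, not the authors' code).

    The update gate z_t and candidate state h_tilde_t depend only on the
    current input x_t, never on the previous hidden state h_{t-1}:
        z_t       = sigmoid(x_t @ W_z)
        h_tilde_t = x_t @ W_h
        h_t       = (1 - z_t) * h_{t-1} + z_t * h_tilde_t
    """
    h, states = h0, []
    for x_t in x:                        # x: (T, d_in)
        z = sigmoid(x_t @ W_z)           # gate computed from the input alone
        h_tilde = x_t @ W_h              # candidate computed from the input alone
        h = (1.0 - z) * h + z * h_tilde  # recurrence is linear in h
        states.append(h)
    return np.stack(states)              # (T, d_hidden)

def min_gru_scan_form(x, W_z, W_h, h0):
    """Same recurrence written as h_t = a_t * h_{t-1} + b_t.

    Every a_t and b_t can be computed up front, so the loop below could be
    replaced by a parallel prefix scan at training time; it is kept serial
    here only for readability.
    """
    z = sigmoid(x @ W_z)                 # all gates at once: (T, d_hidden)
    h_tilde = x @ W_h                    # all candidates at once
    a, b = 1.0 - z, z * h_tilde
    h, states = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        states.append(h)
    return np.stack(states)

# Quick check that both formulations agree on random data.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))              # T=8 steps, 4 input features
W_z, W_h = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
h0 = np.zeros(3)
assert np.allclose(min_gru_sequential(x, W_z, W_h, h0),
                   min_gru_scan_form(x, W_z, W_h, h0))
```

The second formulation is the key: once every a_t and b_t is known, the hidden states can be produced by a parallel prefix scan rather than a step-by-step loop, which is where the reported training speedups come from.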
But here’s the million-dollar question: Did we jump to Transformers too soon? Could the future of deep learning lie in revisiting old ideas with fresh eyes?
Thank you to the authors for their brilliant work and to Cornell University for making this research openly accessible through arXiv.
Reference
Feng, L., Tung, F., Ahmed, M. O., Bengio, Y., & Hajimirsadeghi, H. (2024). Were RNNs All We Needed? arXiv preprint arXiv:2410.01201. https://doi.org/10.48550/arXiv.2410.01201