Listen "Were RNNs All We Needed?"
Episode Synopsis
This research paper revisits traditional recurrent neural networks (RNNs), specifically LSTMs and GRUs, and shows how to adapt them for modern parallel training. The authors demonstrate that by removing the gates' dependence on the previous hidden state, these models can be trained with the parallel scan algorithm, making them significantly faster to train than their traditional counterparts. The paper then compares these simplified variants, dubbed minLSTM and minGRU, against recent state-of-the-art sequence models on several tasks, including Selective Copying, reinforcement learning, and language modeling. The results show that minLSTM and minGRU achieve comparable or better performance than other models while being far more efficient, suggesting that RNNs may still be a viable option in the era of Transformers.
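
For listeners who want a concrete picture of the idea discussed in the episode: once the gate and candidate state depend only on the current input, the hidden-state update becomes an affine recurrence h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t, which can be evaluated for a whole sequence at once with an associative (parallel) scan. Below is a minimal sketch in JAX; the parameter names (W_z, W_h), the zero initial state, and the toy shapes are illustrative assumptions, not the paper's exact implementation.

    import jax

    def parallel_affine_scan(a, b):
        # Solves h_t = a_t * h_{t-1} + b_t for every t in parallel
        # (assumes h_0 = 0). a and b have shape (seq_len, hidden_dim).
        def combine(earlier, later):
            a_x, b_x = earlier
            a_y, b_y = later
            # Composing two affine updates: h -> a_y * (a_x * h + b_x) + b_y
            return a_x * a_y, a_y * b_x + b_y
        _, h = jax.lax.associative_scan(combine, (a, b))
        return h

    def min_gru(params, x):
        # Hypothetical minGRU-style layer: the gate and candidate state
        # depend only on the current input x_t, never on h_{t-1}.
        z = jax.nn.sigmoid(x @ params["W_z"])   # update gate
        h_tilde = x @ params["W_h"]             # candidate hidden state
        # h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t  (affine recurrence)
        return parallel_affine_scan(1.0 - z, z * h_tilde)

    # Toy usage (illustrative shapes only)
    key = jax.random.PRNGKey(0)
    seq_len, d_in, d_hidden = 16, 8, 32
    params = {
        "W_z": 0.1 * jax.random.normal(key, (d_in, d_hidden)),
        "W_h": 0.1 * jax.random.normal(key, (d_in, d_hidden)),
    }
    x = jax.random.normal(key, (seq_len, d_in))
    h = min_gru(params, x)   # shape (seq_len, d_hidden)

Because the combine step is associative, the scan runs in a logarithmic number of parallel steps rather than one step per token, which is the training speedup the episode highlights.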
More episodes of the podcast Artificial Discourse
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
19/11/2024
A Survey of Small Language Models
12/11/2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
11/11/2024
The Llama 3 Herd of Models
10/11/2024
Kolmogorov-Arnold Network (KAN)
09/11/2024