Contextual Blocks: Implicit Weight Updates and Federated Learning

08/10/2025 13 min

Listen "Contextual Blocks: Implicit Weight Updates and Federated Learning"

Episode Synopsis

We compare and contrast the math behind two recent research papers, each of which we have covered individually before on this podcast:

- July 2025: *Learning without training: The implicit dynamics of in-context learning* (https://arxiv.org/pdf/2507.16003)
- September 2025: *Federated Learning with Ad-hoc Adapter Insertions: The Case of Soft-Embeddings for Training Classifier-as-Retriever* (https://arxiv.org/pdf/2509.16508)

The first source explores **in-context learning (ICL)** in neural networks, proposing that the effect of context on a token's output is equivalent to an **implicit weight update** in the network, specifically in the MLP layer, and generalizing the transformer block into the notion of a **contextual block**. The paper gives an explicit low-rank formula for this implicit weight modification and shows mathematically that consuming context tokens corresponds to an implicit **gradient descent** learning dynamics on the network weights (a toy numerical check of the underlying rank-1 identity is sketched below).

The second source introduces a novel **retrieval-augmented generation (RAG)** architecture called **Classifier-as-Retriever (CaR)** for memory-constrained edge devices. It pairs a frozen Small Language Model (SLM) with a small trainable **adapter network** that produces "soft embeddings" and a trainable **classifier head** that replaces conventional similarity search (see the sketches below). Crucially, the architecture is designed for distributed training with **Federated Learning (FL)**, incorporating **Differential Privacy (DP)** techniques to protect client-side data, and it demonstrates significant speedup advantages over centralized training.
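As a rough illustration of the first paper's central claim (a sketch, not its exact derivation), the rank-1 identity behind the implicit weight update can be checked numerically. Here `a_x` and `a_ctx` stand in for the attention output of the query token without and with the context, and `W` for the first linear layer of the MLP; all values are random placeholders.

```python
# Toy numerical check of the rank-1 "implicit weight update" idea: if attention
# maps the query token to a_x without context and to a_ctx with context, then
# feeding a_ctx through an MLP layer W is equivalent to feeding a_x through a
# rank-1-updated layer W + dW.
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))   # first linear layer of the MLP
a_x = rng.normal(size=d)      # attention output for the query alone
a_ctx = rng.normal(size=d)    # attention output with the context included

# Rank-1 update built from the context-induced shift in the attention output.
dW = np.outer(W @ (a_ctx - a_x), a_x) / np.dot(a_x, a_x)

# The updated weights acting on the context-free activation reproduce the
# context-conditioned computation exactly.
print(np.allclose((W + dW) @ a_x, W @ a_ctx))  # True
```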
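For the second paper, here is a minimal sketch of how an adapter-plus-classifier retrieval head might look, assuming the adapter is a small MLP over frozen SLM hidden states and the classifier maps the resulting soft embedding to a document index. Module sizes, the pooling step, and the `CaRHead` name are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CaRHead(nn.Module):
    def __init__(self, slm_hidden: int, soft_dim: int, num_documents: int):
        super().__init__()
        # Trainable adapter producing "soft embeddings" from frozen SLM features.
        self.adapter = nn.Sequential(
            nn.Linear(slm_hidden, soft_dim),
            nn.ReLU(),
            nn.Linear(soft_dim, soft_dim),
        )
        # Trainable classifier head replacing a similarity search over a vector DB.
        self.classifier = nn.Linear(soft_dim, num_documents)

    def forward(self, slm_hidden_states: torch.Tensor) -> torch.Tensor:
        # slm_hidden_states: (batch, seq_len, slm_hidden), from the frozen SLM.
        pooled = slm_hidden_states.mean(dim=1)   # simple mean pooling (assumption)
        soft_embedding = self.adapter(pooled)
        return self.classifier(soft_embedding)   # logits over candidate documents

# Only the adapter and classifier are trained; the SLM stays frozen, which keeps
# the on-device trainable footprint small.
head = CaRHead(slm_hidden=768, soft_dim=128, num_documents=1000)
logits = head(torch.randn(2, 16, 768))
print(logits.shape)  # torch.Size([2, 1000])
```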
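On the distributed-training side, a sketch of one aggregation round, assuming plain FedAvg with client-update clipping and Gaussian noise as a stand-in for the paper's DP mechanism; `private_fedavg_round`, the clipping norm, and the noise scale are hypothetical and do not reproduce the paper's privacy accounting.

```python
import torch

def private_fedavg_round(global_state, client_states, clip_norm=1.0, noise_std=0.01):
    """Aggregate clipped, noised client updates into new global parameters."""
    new_state = {}
    for name, param in global_state.items():
        agg = torch.zeros_like(param)
        for client in client_states:
            delta = client[name] - param
            # Clip each client's contribution to bound its influence.
            scale = torch.clamp(clip_norm / (delta.norm() + 1e-12), max=1.0)
            agg += delta * scale
        agg /= len(client_states)
        # Add Gaussian noise to the averaged update for differential privacy.
        agg += noise_std * torch.randn_like(param)
        new_state[name] = param + agg
    return new_state
```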
