Latent Constituency in Humans and LLMs

26/10/2025 19 min


Episode Synopsis

This episode covers the academic paper **"Active Use of Latent Constituency Representation in both Humans and Large Language Models,"** which explores how sentences are internally represented in both the human brain and large language models (**LLMs**) such as ChatGPT. The authors introduce a novel **one-shot learning word deletion task** in which participants infer a deletion rule from a single example. They find that both humans and LLMs tend to delete a **complete linguistic constituent** rather than a nonconstituent word string, suggesting that latent, hierarchical linguistic structures emerge in both. The study further demonstrates that this **deletion behavior** can be used to reconstruct a **constituency tree representation** that is structurally consistent with linguistically defined trees. Finally, the research investigates how **language-dependent rules** are inferred and finds that, in this task, native speakers rely primarily on **syntactic structure** rather than semantic plausibility.

Source: https://arxiv.org/pdf/2405.18241