28 - Data Programming: Creating Large Training Sets, Quickly
Episode Synopsis
NIPS 2016 paper by Alexander Ratner and coauthors in Chris Ré's group at Stanford, presented by Waleed.
The paper presents a method for generating labels for an unlabeled dataset by combining many weak labelers. This shifts the annotation effort from labeling individual examples to writing a large number of noisy labeling heuristics (the paper calls them labeling functions), a process the authors call "data programming". A model then learns how to aggregate the outputs of the weak labelers, in effect estimating how much to trust each one, and uses them to produce a weighted, noisily "supervised" training set. We talk about this method, how it works, how it relates to ideas like co-training, and when you might want to use it.
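To make the idea concrete, here is a minimal sketch, not the procedure from the paper. It assumes binary labels in {-1, +1} with 0 meaning "abstain", labeling functions that are conditionally independent given the true label, a uniform class prior, and abstentions that carry no information about the label. The aggregation step is plain EM over one accuracy parameter per labeling function (a Dawid-Skene-style simplification swapped in for illustration). The labeling functions, example texts, and helper names (`fit_accuracies`, `noise_aware_logistic_loss`, etc.) are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Label convention assumed for this sketch: +1 and -1 are class votes, 0 = abstain.
POS, NEG, ABSTAIN = 1, -1, 0

# Hypothetical labeling functions for a toy spam (+1) vs. ham (-1) task.
def lf_contains_free(text: str) -> int:
    return POS if "free" in text.lower() else ABSTAIN

def lf_contains_winner(text: str) -> int:
    return POS if "winner" in text.lower() else ABSTAIN

def lf_short_message(text: str) -> int:
    return NEG if len(text.split()) <= 3 else ABSTAIN

def lf_mentions_meeting(text: str) -> int:
    return NEG if "meeting" in text.lower() else ABSTAIN


def apply_lfs(lfs: Sequence[Callable[[str], int]], data: Sequence[str]) -> List[List[int]]:
    """Label matrix: one row per example, one column per labeling function."""
    return [[lf(x) for lf in lfs] for x in data]


def posterior(row: Sequence[int], acc: Sequence[float]) -> float:
    """P(y = +1 | votes) under the independence assumptions stated above."""
    log_odds = 0.0
    for vote, a in zip(row, acc):
        if vote != ABSTAIN:
            log_odds += vote * math.log(a / (1.0 - a))
    return 1.0 / (1.0 + math.exp(-log_odds))


def fit_accuracies(L: List[List[int]], n_iter: int = 50,
                   init_acc: float = 0.7) -> Tuple[List[float], List[float]]:
    """EM estimate of each LF's accuracy; returns (accuracies, soft labels)."""
    m = len(L[0])
    acc = [init_acc] * m  # start above 0.5: LFs assumed better than random
    for _ in range(n_iter):
        # E-step: soft label for every example under the current accuracies.
        probs = [posterior(row, acc) for row in L]
        # M-step: expected fraction of each LF's non-abstaining votes that are correct.
        for j in range(m):
            correct, votes = 0.0, 0
            for row, p in zip(L, probs):
                if row[j] == POS:
                    correct, votes = correct + p, votes + 1
                elif row[j] == NEG:
                    correct, votes = correct + (1.0 - p), votes + 1
            if votes:
                acc[j] = min(max(correct / votes, 0.01), 0.99)  # keep log-odds finite
    return acc, [posterior(row, acc) for row in L]


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def noise_aware_logistic_loss(score: float, p: float) -> float:
    """Expected logistic loss of a classifier score when y = +1 with probability p."""
    return p * softplus(-score) + (1.0 - p) * softplus(score)


lfs = [lf_contains_free, lf_contains_winner, lf_short_message, lf_mentions_meeting]
data = [
    "You are a winner, claim your free prize",
    "Free entry for every winner today",
    "winner winner free cash now",
    "meeting at noon",
    "meeting moved again",
    "ok see you",
    "notes from the meeting attached",
    "meeting room free?",
]
L = apply_lfs(lfs, data)
acc, probs = fit_accuracies(L)
for text, p in zip(data, probs):
    print(f"P(spam) = {p:.3f}  {text}")
```

The soft labels in `probs` play the role of the weighted training set: instead of fitting hard labels, a downstream classifier could minimize `noise_aware_logistic_loss` summed over examples. On this toy data, EM ends up trusting the two ham heuristics more than the "free" heuristic, so the conflicting example "meeting room free?" is labeled ham.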
https://www.semanticscholar.org/paper/Data-Programming-Creating-Large-Training-Sets-Quic-Ratner-Sa/37acbbbcfe9d8eb89e5b01da28dac6d44c3903ee
From the podcast NLP Highlights.