28 - Data Programming: Creating Large Training Sets, Quickly

11/07/2017 25 min



Episode Synopsis

NIPS 2016 paper by Alexander Ratner and coauthors in Chris Ré's group at Stanford, presented by Waleed.

The paper presents a method for generating labels for an unlabeled dataset by combining a number of weak labelers. This shifts the annotation effort from labeling individual examples to writing a large number of noisy labeling heuristics, a task the authors call "data programming". You then learn a model that estimates how reliable each weak labeler is and uses those estimates to aggregate their outputs into a weighted "supervised" training set. We talk about this method, how it works, how it's related to ideas like co-training, and when you might want to use it.
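To make the aggregation idea concrete, here is a minimal Python sketch, not the paper's actual model. The spam-detection task, the three labeling heuristics, and the `ASSUMED_ACCURACY` values are all hypothetical and chosen only for illustration. In particular, the accuracies are fixed by hand here, whereas the paper learns them from how the labelers agree and disagree on unlabeled data. The sketch combines votes under a naive assumption that labelers err independently, which gives each example a soft label rather than a hard one.

```python
import math

ABSTAIN = 0  # each labeler votes +1 (spam), -1 (not spam), or abstains with 0


# Hypothetical noisy heuristics, invented for this illustration.
def lf_contains_free(text):
    return 1 if "free" in text.lower() else ABSTAIN


def lf_has_greeting(text):
    return -1 if text.lower().startswith(("hi", "hello")) else ABSTAIN


def lf_many_exclamations(text):
    return 1 if text.count("!") >= 3 else ABSTAIN


LABELERS = [lf_contains_free, lf_has_greeting, lf_many_exclamations]

# Assumption: hand-picked accuracies. The paper estimates these from
# unlabeled data instead of fixing them in advance.
ASSUMED_ACCURACY = [0.8, 0.7, 0.6]


def soft_label(text, labelers=LABELERS, accuracies=ASSUMED_ACCURACY):
    """Return P(label = +1), treating labelers as independent voters."""
    score = 0.0
    for labeler, acc in zip(labelers, accuracies):
        vote = labeler(text)
        # A more accurate labeler contributes larger log-odds; abstains add 0.
        score += vote * math.log(acc / (1.0 - acc))
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    for example in ["FREE prize!!!", "Hello, see you at lunch", "nothing here"]:
        print(f"{soft_label(example):.3f}  {example}")
```

The resulting probabilities can serve as per-example weights when training a downstream classifier, which is the sense in which the training set is weighted and "supervised" in quotes.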

https://www.semanticscholar.org/paper/Data-Programming-Creating-Large-Training-Sets-Quic-Ratner-Sa/37acbbbcfe9d8eb89e5b01da28dac6d44c3903ee

More episodes of the podcast NLP Highlights

Are LLMs safe? 29/02/2024

"Imaginative AI" with Mohamed Elhoseiny 08/01/2024

142 - Science Of Science, with Kyle Lo 28/12/2023

141 - Building an open source LM, with Iz Beltagy and Dirk Groeneveld 29/06/2023

140 - Generative AI and Copyright, with Chris Callison-Burch 06/06/2023

139 - Coherent Long Story Generation, with Kevin Yang 24/03/2023

138 - Compositional Generalization in Neural Networks, with Najoung Kim 20/01/2023

137 - Nearest Neighbor Language Modeling and Machine Translation, with Urvashi Khandelwal 13/01/2023

136 - Including Signed Languages in NLP, with Kayo Yin and Malihe Alikhani 19/05/2022

135 - PhD Application Series: After Submitting Applications 02/03/2022

