Microsoft's Data Scandal 💻 // UK's AI Principles for Transparency 🇬🇧 // Efficient Language Models 🚀

20/09/2023 · 13 min

Listen "Microsoft's Data Scandal 💻 // UK's AI Principles for Transparency 🇬🇧 // Efficient Language Models 🚀"

Episode Synopsis

Microsoft's AI research team accidentally exposed 38 terabytes of private data while publishing open-source training data on GitHub, posing a significant security risk. The UK's new AI principles focus on accountability and transparency, with regulators seeking views from leading AI developers and governments to ensure that the development and use of foundation models evolve in a way that promotes competition and protects consumers. Two new papers explore ways to improve the efficiency and quality of large language models: a lossless inference scheme called self-speculative decoding, and evidence that pretraining data can be pruned at scale while retaining performance. A third paper introduces "Chain of Density" (CoD) prompting, which generates increasingly dense summaries without increasing their length, yielding more abstractive summaries that human readers prefer.
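The self-speculative decoding paper's core idea is to draft tokens cheaply by running the model with some of its own layers skipped, then verify the draft with a full forward pass so the output matches ordinary decoding exactly. A minimal sketch of the greedy variant, assuming hypothetical `draft_next`/`full_next` callables rather than the paper's actual API:

```python
from typing import Callable, List

def self_speculative_decode(
    draft_next: Callable[[List[int]], int],  # cheap pass that skips some layers
    full_next: Callable[[List[int]], int],   # full forward pass (exact model)
    prompt: List[int],
    max_new_tokens: int = 32,
    draft_len: int = 4,
) -> List[int]:
    """Greedy speculative decoding where the 'draft model' is the full model
    with intermediate layers skipped, so no second network is needed."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft a short continuation with the cheap, layer-skipping pass.
        draft = [draft_next(tokens)]
        for _ in range(draft_len - 1):
            draft.append(draft_next(tokens + draft))
        # 2. Verify: accept the longest prefix the full model agrees with.
        #    (A real implementation scores all draft tokens in one batched
        #    forward pass; here we call full_next per position for clarity.)
        accepted: List[int] = []
        for tok in draft:
            expected = full_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # fall back to the exact token
                break
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens]
```

Because any rejected draft token is replaced by the full model's own choice, the result is identical to plain greedy decoding with the full model, which is what makes the scheme "lossless".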
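On data pruning: one of the quality metrics the paper studies is perplexity, i.e. score each pretraining document with a reference model and keep only a fraction of the corpus. A rough sketch, where `nll` is a hypothetical function returning a document's mean per-token negative log-likelihood:

```python
import math
from typing import Callable, List

def prune_by_perplexity(
    docs: List[str],
    nll: Callable[[str], float],  # mean per-token negative log-likelihood
    keep_fraction: float = 0.5,
) -> List[str]:
    # Perplexity = exp(mean NLL); rank documents and keep a slice of the
    # distribution (which slice works best is an empirical question).
    ranked = sorted(docs, key=lambda d: math.exp(nll(d)))
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]  # e.g. keep the lowest-perplexity half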
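And Chain of Density is, at heart, a single prompt that asks the model to rewrite its summary several times, each pass folding in missing entities without growing the length. A paraphrased sketch (the paper's exact wording differs; `call_llm` stands in for any text-in/text-out LLM client):

```python
# Paraphrased Chain-of-Density-style prompt; the paper's exact wording differs.
COD_PROMPT = """Article: {article}

You will generate increasingly concise, entity-dense summaries of the
article above. Repeat the following two steps 5 times:
Step 1: Identify 1-3 informative entities from the article that are
missing from the previously generated summary.
Step 2: Write a new, denser summary of identical length that covers every
entity from the previous summary plus the missing ones. Make space by
fusing phrases and removing filler, never by dropping entities.
Answer in JSON: a list of dicts with keys "Missing_Entities" and
"Denser_Summary"."""

def chain_of_density(article: str, call_llm) -> str:
    """call_llm is a placeholder for any chat-completion API."""
    return call_llm(COD_PROMPT.format(article=article))
```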
Contact: [email protected]
Timestamps:
00:34 Introduction
01:45 38TB of data accidentally exposed by Microsoft AI researchers
03:05 UK focuses on transparency and access with new AI principles
04:40 Jason Wei Tweet on the role of task-specific LLMs
06:11 Fake sponsor
07:43 Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
09:20 When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
10:55 From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
12:37 Outro
