Devin: First AI Software Engineer 🤖 // Google Gemini AI on Elections 🚫 // WorkArena Benchmark 💼
Episode Synopsis
Introducing Devin, the first AI software engineer that can plan and execute complex engineering tasks requiring thousands of decisions.
Google's AI chatbot won't answer questions about upcoming elections to prevent inaccurate or misleading responses.
WorkArena is a benchmark that measures how well large language model-based agents perform tasks from the daily work of knowledge workers using enterprise software systems.
Synth$^2$ is an approach that uses Large Language Models (LLMs) and image generation models to create synthetic image-text pairs for efficient Visual-Language Model (VLM) training.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:59 Introducing Devin, the first AI software engineer
03:48 Google won’t let you use its Gemini AI to answer questions about an upcoming election in your country
05:35 AI Datacenter Energy Dilemma - Race for AI Datacenter Space
06:45 Fake sponsor
09:08 Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
10:26 WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
11:52 Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
13:38 Outro