Listen "Pixtral-12B Multimodal Model | Mistral AI"
Episode Synopsis
Pixtral 12B is a 12-billion-parameter multimodal language model trained to understand both images and text. It uses a novel vision encoder, trained from scratch, that allows it to process images at their native resolution and aspect ratio. Pixtral outperforms comparable open-source models on multimodal benchmarks, including a new benchmark called MM-MT-Bench. The episode also discusses the importance of standardised evaluation protocols for multimodal language models: the Pixtral paper's authors highlight the problems with existing benchmarks and metrics, and propose solutions to improve how these models are evaluated.
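As a rough illustration of what native-resolution processing implies, the sketch below (plain Python, not Mistral's code) counts the image tokens a Pixtral-style encoder would produce for an arbitrary image size. The 16x16 patch size and the per-row break tokens follow the Pixtral paper; the function names are our own.

from math import ceil

PATCH = 16  # patch edge length in pixels, per the Pixtral paper

def patch_grid(width: int, height: int) -> tuple[int, int]:
    # Patches along each axis for an image at its native resolution.
    return ceil(width / PATCH), ceil(height / PATCH)

def num_image_tokens(width: int, height: int) -> int:
    # One token per patch, plus one token per patch row: Pixtral inserts
    # [IMG_BREAK] between rows and [IMG_END] after the last row.
    cols, rows = patch_grid(width, height)
    return rows * cols + rows

# Example: a 1024x768 image gives a 64x48 patch grid and 3120 image tokens;
# a fixed-resolution encoder would instead resize (and distort) the image.
print(patch_grid(1024, 768))        # (64, 48)
print(num_image_tokens(1024, 768))  # 3120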
More episodes of the podcast AI Talks
Byte Latent Transformer | Meta AI
16/12/2024
Reshaping Product Management | Generative AI
04/10/2024
Movie Gen | Meta AI
04/10/2024
Gemini Multimodal LLM | Google DeepMind
03/10/2024
Qwen2-VL | Alibaba Group
03/10/2024
Segment Anything 2 (SAM 2) | Meta AI
03/10/2024
Llama 3 Large Language Model (LLM) | Meta AI
03/10/2024