Listen "Pixtral-12B Multimodal Model | Mistral AI"
Episode Synopsis
Pixtral 12B is a 12-billion-parameter multimodal language model trained to understand both images and text. It uses a novel vision encoder, trained from scratch, that allows it to process images at their native resolution and aspect ratio. Pixtral outperforms comparable open-source models on multimodal benchmarks, including a new benchmark called MM-MT-Bench. The episode also discusses the importance of standardised evaluation protocols for multimodal language models: the Pixtral paper's authors highlight the problems with existing benchmarks and metrics, and propose solutions to improve how these models are evaluated.
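As a rough illustration of what native-resolution processing implies, the sketch below (plain Python, not Mistral's code) counts the image tokens a Pixtral-style encoder would produce for an arbitrary image size. The 16x16 patch size and the per-row break tokens follow the Pixtral paper; the function names are our own.

from math import ceil

PATCH = 16  # patch edge length in pixels, per the Pixtral paper

def patch_grid(width: int, height: int) -> tuple[int, int]:
    # Patches along each axis for an image at its native resolution.
    return ceil(width / PATCH), ceil(height / PATCH)

def num_image_tokens(width: int, height: int) -> int:
    # One token per patch, plus one token per patch row: Pixtral inserts
    # [IMG_BREAK] between rows and [IMG_END] after the last row.
    cols, rows = patch_grid(width, height)
    return rows * cols + rows

# Example: a 1024x768 image gives a 64x48 patch grid and 3120 image tokens;
# a fixed-resolution encoder would instead resize (and distort) the image.
print(patch_grid(1024, 768))        # (64, 48)
print(num_image_tokens(1024, 768))  # 3120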
More episodes of the podcast AI Talks
Byte Latent Transformer | Meta AI
16/12/2024
Reshaping Product Management | Generative AI
04/10/2024
Movie Gen | Meta AI
04/10/2024
Gemini Multimodal LLM | Google DeepMind
03/10/2024
Qwen2-VL | Alibaba Group
03/10/2024
Segment Anything 2 (SAM 2) | Meta AI
03/10/2024
Llama 3 Large Language Model (LLM) | Meta AI
03/10/2024