Multimodal Benchmarks, Visual Task Transfer, and 3D Object Generation

08/08/2024 14 min Episode 68

Episode Synopsis


MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

LLaVA-OneVision: Easy Visual Task Transfer

An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Diffusion Models as Data Mining Tools
