FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

11/12/2024 19 min Episode 193

Episode Synopsis

🤗 Upvotes: 19 | cs.CV

Authors:
Tong Wu, Yinghao Xu, Ryan Po, Mengchen Zhang, Guandao Yang, Jiaqi Wang, Ziwei Liu, Dahua Lin, Gordon Wetzstein

Title:
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Arxiv:
http://arxiv.org/abs/2412.07674v1

Abstract:
Recent advances in text-to-image generation have enabled the creation of high-quality images with diverse applications. However, accurately describing desired visual attributes can be challenging, especially for non-experts in art and photography. An intuitive solution involves adopting favorable attributes from the source images. Current methods attempt to distill identity and style from source images. However, "style" is a broad concept that includes texture, color, and artistic elements, but does not cover other important attributes such as lighting and dynamics. Additionally, a simplified "style" adaptation prevents combining multiple attributes from different sources into one generated image. In this work, we formulate a more effective approach to decompose the aesthetics of a picture into specific visual attributes, allowing users to apply characteristics such as lighting, texture, and dynamics from different images. To achieve this goal, we constructed the first fine-grained visual attributes dataset (FiVA) to the best of our knowledge. This FiVA dataset features a well-organized taxonomy for visual attributes and includes around 1 M high-quality generated images with visual attribute annotations. Leveraging this dataset, we propose a fine-grained visual attribute adaptation framework (FiVA-Adapter), which decouples and adapts visual attributes from one or more source images into a generated one. This approach enhances user-friendly customization, allowing users to selectively apply desired attributes to create images that meet their unique preferences and specific content requirements.
