Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025
Tell What You Hear From What You See -- Video to Audio Generation Through Text Paper • 2411.05679 • Published Nov 8, 2024
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering Paper • 2310.06238 • Published Oct 10, 2023
SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing Paper • 2506.05414 • Published Jun 4, 2025