Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding • Paper • 2512.17532 • Published 7 days ago • 62
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding • Paper • 2512.19693 • Published 4 days ago • 60
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation • Paper • 2511.14993 • Published Nov 19 • 226
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm • Paper • 2511.04570 • Published Nov 6 • 210
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models • Paper • 2511.15098 • Published Nov 19
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling • Paper • 2511.20785 • Published about 1 month ago • 154 • 7
AI Paper of the Day • Collection • A collection of papers that I think are interesting, one added each day • 550 items • Updated about 18 hours ago • 73