Seeing Fast and Slow: Learning the Flow of Time in Videos Paper • 2604.21931 • Published 1 day ago • 13
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System Paper • 2604.14125 • Published 10 days ago • 20
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 17 days ago • 185
WorldAgents: Can Foundation Image Models be Agents for 3D World Models? Paper • 2603.19708 • Published Mar 20 • 13
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model Paper • 2603.18524 • Published Mar 19 • 58
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models Paper • 2603.15618 • Published Mar 16 • 21
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos Paper • 2510.19488 • Published Oct 22, 2025 • 21
RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies Paper • 2603.04639 • Published Mar 4 • 29