Learn2Fold: Structured Origami Generation with World Model Planning Paper • 2603.29585 • Published Feb 2 • 15
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation Paper • 2603.26661 • Published 9 days ago • 21
MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation Paper • 2603.29029 • Published 6 days ago • 13
EgoSim: Egocentric World Simulator for Embodied Interaction Generation Paper • 2604.01001 • Published 5 days ago • 34
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis Paper • 2603.29620 • Published 5 days ago • 45
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification Paper • 2603.26648 • Published 9 days ago • 40
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners? Paper • 2603.25823 • Published 10 days ago • 42
CutClaw: Agentic Hours-Long Video Editing via Music Synchronization Paper • 2603.29664 • Published 5 days ago • 44
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward Paper • 2603.26599 • Published 9 days ago • 58
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome Paper • 2603.28407 • Published 6 days ago • 62
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published 4 days ago • 82
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development Paper • 2603.27460 • Published 8 days ago • 64
Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells Paper • 2603.25240 • Published 10 days ago • 75
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published 4 days ago • 123