Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 125
CommonForms: A Large, Diverse Dataset for Form Field Detection Paper • 2509.16506 • Published Sep 20, 2025 • 19
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10, 2025 • 50
view article Article BigCodeArena: Judging code generations end to end with code executions Oct 7, 2025 • 19
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play Paper • 2509.25541 • Published Sep 29, 2025 • 140
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 184
⚛️ Liquid Nanos Collection Library of task-specific models: https://www.liquid.ai/blog/introducing-liquid-nanos-frontier-grade-performance-on-everyday-devices • 26 items • Updated 3 days ago • 105
LLM-I: LLMs are Naturally Interleaved Multimodal Creators Paper • 2509.13642 • Published Sep 17, 2025 • 9
Holo1.5 Collection Holo1.5 - Open Foundation Models for Computer Use Agents • 5 items • Updated Sep 15, 2025 • 34
PP-OCRv5 Collection PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15, 2025 • 50
Seedance 1.0: Exploring the Boundaries of Video Generation Models Paper • 2506.09113 • Published Jun 10, 2025 • 105
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published Jun 5, 2025 • 82
Lingshu MLLMs Collection Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning • 4 items • Updated Oct 9, 2025 • 21
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Paper • 2506.03143 • Published Jun 3, 2025 • 53