Small Vision-Language Models are Smart Compressors for Long Video Understanding Paper • 2604.08120 • Published 4 days ago • 15
view article Article Multimodal Embedding & Reranker Models with Sentence Transformers 4 days ago • 36
view article Article How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs 6 days ago • 41
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery Paper • 2604.01658 • Published 11 days ago • 54
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills Paper • 2603.25158 • Published 18 days ago • 50
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 19 days ago • 96
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 18 days ago • 28
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 18 days ago • 128
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens Paper • 2603.23516 • Published Mar 6 • 48
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents Paper • 2603.22386 • Published 20 days ago • 55
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought Paper • 2603.22847 • Published 20 days ago • 25
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 21 days ago • 123
WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing Paper • 2603.11593 • Published Mar 12 • 25
One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers Paper • 2603.12245 • Published Mar 12 • 18