Article OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments +3 16 days ago • 30
Article IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST 10 days ago • 16
Nemotron-Terminal Collection We are releasing Nemotron-Terminal models and training datasets. • 7 items • Updated 3 days ago • 21
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 8 days ago • 469
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published Dec 14, 2025 • 48
When Models Manipulate Manifolds: The Geometry of a Counting Task Paper • 2601.04480 • Published Jan 8 • 4
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published 17 days ago • 185
SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia Paper • 2602.01618 • Published 26 days ago • 2
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published 26 days ago • 60
Article The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+ 25 days ago • 47
Article Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek Jan 27 • 45