ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research Paper • 2606.07591 • Published 15 days ago • 85
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation Paper • 2605.31264 • Published 14 days ago • 111
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration Paper • 2605.20025 • Published 24 days ago • 187
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning Paper • 2605.06326 • Published May 7 • 26
Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning Paper • 2510.01833 • Published Oct 2, 2025
QCBench: Evaluating Large Language Models on Domain-Specific Quantitative Chemistry Paper • 2508.01670 • Published Aug 3, 2025
δ-mem: Efficient Online Memory for Large Language Models Paper • 2605.12357 • Published about 1 month ago • 128
PRBench: End-to-end Paper Reproduction in Physics Research Paper • 2603.27646 • Published Mar 29 • 29
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published Mar 26 • 133
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published Mar 22 • 78
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data Paper • 2603.15594 • Published Mar 16 • 150