ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published Nov 26, 2025 • 111
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models Paper • 2511.18890 • Published Nov 24, 2025 • 32
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models Paper • 2511.16668 • Published Nov 20, 2025 • 54
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning Paper • 2510.15110 • Published Oct 16, 2025 • 15
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving Paper • 2510.11769 • Published Oct 13, 2025 • 25
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning Paper • 2510.12693 • Published Oct 14, 2025 • 27
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 134
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3, 2025 • 22
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models Paper • 2506.18945 • Published Jun 23, 2025 • 40
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Paper • 2506.06444 • Published Jun 6, 2025 • 73
MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning Paper • 2505.24846 • Published May 30, 2025 • 15
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30, 2025 • 143
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17, 2025 • 93
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published Mar 2, 2025 • 56
Self-rewarding correction for mathematical reasoning Paper • 2502.19613 • Published Feb 26, 2025 • 82
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training Paper • 2501.06842 • Published Jan 12, 2025 • 16
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Paper • 2412.13795 • Published Dec 18, 2024 • 20