M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark • Paper 2511.17729 • Published Nov 21
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning • Paper 2509.22576 • Published Sep 26
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning • Paper 2504.09772 • Published Apr 14