Towards General-Purpose Model-Free Reinforcement Learning
Paper
•
2501.16142
•
Published
•
30
RL + Transformer = A General-Purpose Problem Solver
Paper
•
2501.14176
•
Published
•
28
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
•
2501.17161
•
Published
•
123
Process-Supervised Reinforcement Learning for Code Generation
Paper
•
2502.01715
•
Published
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Paper
•
2504.06958
•
Published
•
13
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Paper
•
2505.04588
•
Published
•
65
Improving Editability in Image Generation with Layer-wise Memory
Paper
•
2505.01079
•
Published
•
29
RLVR-World: Training World Models with Reinforcement Learning
Paper
•
2505.13934
•
Published
•
16
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper
•
2505.14146
•
Published
•
19
Robust Reward Modeling via Causal Rubrics
Paper
•
2506.16507
•
Published
•
9
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
Paper
•
2506.18945
•
Published
•
40
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper
•
2508.01191
•
Published
•
238
UI-Venus Technical Report: Building High-performance UI Agents with RFT
Paper
•
2508.10833
•
Published
•
44