RL - a galois77 Collection

galois77 's Collections

Thousand brains theory

THE ORB

energy based models

OCR

Poetry

Agentic

Videos

ahan

Image generation

Training optimization

RL

Benchmarks and challenges

RL

updated Sep 2, 2025

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27, 2025 • 30
RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24, 2025 • 28
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 123
Process-Supervised Reinforcement Learning for Code Generation

Paper • 2502.01715 • Published Feb 3, 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Paper • 2504.06958 • Published Apr 9, 2025 • 13
ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Paper • 2505.04588 • Published May 7, 2025 • 65
Improving Editability in Image Generation with Layer-wise Memory

Paper • 2505.01079 • Published May 2, 2025 • 29
RLVR-World: Training World Models with Reinforcement Learning

Paper • 2505.13934 • Published May 20, 2025 • 16
s3: You Don't Need That Much Data to Train a Search Agent via RL

Paper • 2505.14146 • Published May 20, 2025 • 19
Robust Reward Modeling via Causal Rubrics

Paper • 2506.16507 • Published Jun 19, 2025 • 9
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models

Paper • 2506.18945 • Published Jun 23, 2025 • 40
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2, 2025 • 238
UI-Venus Technical Report: Building High-performance UI Agents with RFT

Paper • 2508.10833 • Published Aug 14, 2025 • 44