Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training Paper • 2602.01511 • Published 6 days ago
OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment Paper • 2510.07743 • Published Oct 9, 2025