Google ❤️ Open Source AI
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling