Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2512.20848 • Published 5 days ago • 28
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published 11 days ago • 74
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning Paper • 2510.27623 • Published Oct 31 • 12
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance Paper • 2502.12459 • Published Feb 18 • 2
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16 • 39
AEPO Collection The official datasets and model checkpoints of AEPO • 5 items • Updated 8 days ago • 4
view article Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Feb 11 • 94
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 273
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning Paper • 2505.16933 • Published May 22 • 34
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published May 22 • 58
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization Paper • 2406.11431 • Published Jun 17, 2024 • 4