
Wenkai Yang

Keven16

https://keven980716.github.io/

keven980716

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted 3 papers 4 days ago

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2512.20848 • Published 5 days ago • 28

NVIDIA Nemotron 3: Efficient and Open Intelligence

Paper • 2512.20856 • Published 5 days ago • 23

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Paper • 2512.16093 • Published 11 days ago • 74

upvoted a paper 26 days ago

Mixture of Horizons in Action Chunking

Paper • 2511.19433 • Published Nov 24 • 17

upvoted a paper about 2 months ago

Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning

Paper • 2510.27623 • Published Oct 31 • 12

upvoted 2 papers 2 months ago

Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance

Paper • 2502.12459 • Published Feb 18 • 2

LaSeR: Reinforcement Learning with Last-Token Self-Rewarding

Paper • 2510.14943 • Published Oct 16 • 39

upvoted a collection 2 months ago

AEPO

Collection

The official datasets and model checkpoints of AEPO • 5 items • Updated 8 days ago • 4

upvoted an article 4 months ago

Article

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Feb 11


upvoted a paper 5 months ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158

upvoted a paper 6 months ago

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 273

upvoted 3 papers 7 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 263

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

Paper • 2505.16933 • Published May 22 • 34

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Paper • 2505.16410 • Published May 22 • 58

upvoted a paper 8 months ago

DeepCritic: Deliberate Critique with Large Language Models

Paper • 2505.00662 • Published May 1 • 53

upvoted a paper over 1 year ago

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

Paper • 2406.11431 • Published Jun 17, 2024 • 4
