arxiv:2511.06411
Zhi Zheng
zz1358m
AI & ML interests
LLM reasoning, trustworthy LLMs, LLM applications, neural combinatorial optimization.
Recent Activity
liked a model about 1 month ago: zz1358m/SofT-GRPO-master
updated a model about 2 months ago: zz1358m/SofT-GRPO-master
authored a paper about 2 months ago: SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization