Tianya Liang
tl569
ARC
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
  Paper • 2501.11110 • Published • 4
- R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
  Paper • 2505.21668 • Published • 2
- Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
  Paper • 2404.02575 • Published • 50
IIB
- Safety in Large Reasoning Models: A Survey
  Paper • 2504.17704 • Published
- Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute
  Paper • 2503.23803 • Published • 8
- A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
  Paper • 2508.18106 • Published • 346
- Where LLM Agents Fail and How They can Learn From Failures
  Paper • 2509.25370 • Published • 11