
Junxiao Yang

yangjunxiao2021

https://yangjunx21.github.io/

yangjunx21

AI & ML interests

Alignment/AI safety

Recent Activity

authored a paper 1 day ago

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

upvoted a paper 2 days ago

SynCred-Bench: Benchmarking Synthetic Credibility in AI-Generated Visual Misinformation

submitted a paper 2 days ago

SynCred-Bench: Benchmarking Synthetic Credibility in AI-Generated Visual Misinformation



upvoted a paper 2 days ago

SynCred-Bench: Benchmarking Synthetic Credibility in AI-Generated Visual Misinformation

Paper • 2606.03348 • Published 4 days ago • 2

upvoted a paper 8 days ago

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Paper • 2605.29801 • Published 9 days ago • 142

upvoted a paper 21 days ago

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Paper • 2605.13301 • Published 24 days ago • 159

upvoted a paper about 2 months ago

LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety

Paper • 2604.12710 • Published Apr 13 • 5

upvoted an article 2 months ago

Uncensor any LLM with abliteration

Article • mlabonne • Jun 13, 2024 • 863

upvoted a paper 3 months ago

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Paper • 2602.24286 • Published Feb 27 • 99

upvoted 4 papers 7 months ago

HaluMem: Evaluating Hallucinations in Memory Systems of Agents

Paper • 2511.03506 • Published Nov 5, 2025 • 95

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29, 2025 • 47

DeepAgent: A General Reasoning Agent with Scalable Toolsets

Paper • 2510.21618 • Published Oct 24, 2025 • 103

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Paper • 2509.09677 • Published Sep 11, 2025 • 37

upvoted an article 7 months ago

DABStep: Data Agent Benchmark for Multi-step Reasoning

Article • eggie5, martinigoyanes, frisokingma, andreumora, lvwerra, thomwolf, m-ric • Feb 4, 2025 • 131

upvoted a paper 8 months ago

It Takes Two: Your GRPO Is Secretly DPO

Paper • 2510.00977 • Published Oct 1, 2025 • 32

upvoted a collection 8 months ago

Agent & RL

Collection • 55 items • Updated Nov 27, 2025 • 21

upvoted 7 papers 8 months ago

Glyph: Scaling Context Windows via Visual-Text Compression

Paper • 2510.17800 • Published Oct 20, 2025 • 69

A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning

Paper • 2510.15444 • Published Oct 17, 2025 • 151

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Paper • 2508.07976 • Published Aug 11, 2025 • 53

In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7, 2025 • 112

Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs

Paper • 2509.24107 • Published Sep 28, 2025 • 80

BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

Paper • 2504.12516 • Published Apr 16, 2025 • 2

BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese

Paper • 2504.19314 • Published Apr 27, 2025 • 8
