Code Agent - a btjhjeon Collection

updated Feb 7

CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging

Paper • 2502.05664 • Published Feb 8, 2025 • 24
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

Paper • 2312.13010 • Published Dec 20, 2023 • 6
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

Paper • 2409.16299 • Published Sep 9, 2024 • 11
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI

Paper • 2505.19443 • Published May 26, 2025 • 15
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents

Paper • 2505.23671 • Published May 29, 2025 • 3
SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner

Paper • 2506.09003 • Published Jun 10, 2025 • 17
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs

Paper • 2506.19290 • Published Jun 24, 2025 • 53
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Paper • 2507.12415 • Published Jul 16, 2025 • 43
GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities

Paper • 2507.12367 • Published Jul 16, 2025 • 7
SWE-Exp: Experience-Driven Software Issue Resolution

Paper • 2507.23361 • Published Jul 31, 2025 • 14
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Paper • 2507.23348 • Published Jul 31, 2025 • 12
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8, 2025 • 212
CoAct-1: Computer-using Agents with Coding as Actions

Paper • 2508.03923 • Published Aug 5, 2025 • 13
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Paper • 2508.03501 • Published Aug 5, 2025 • 59
Agentic Software Engineering: Foundational Pillars and a Research Roadmap

Paper • 2509.06216 • Published Sep 7, 2025 • 8
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

Paper • 2509.16941 • Published Sep 21, 2025 • 21
Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?

Paper • 2511.13646 • Published Nov 17, 2025 • 10
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 306
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Paper • 2512.12730 • Published Dec 14, 2025 • 52
SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

Paper • 2512.17419 • Published Dec 19, 2025 • 10
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

Paper • 2512.18470 • Published Dec 20, 2025 • 12
SWE-RM: Execution-free Feedback For Software Engineering Agents

Paper • 2512.21919 • Published Dec 26, 2025 • 10
Agentic Rubrics as Contextual Verifiers for SWE Agents

Paper • 2601.04171 • Published Jan 7 • 13
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

Paper • 2601.11077 • Published Jan 16 • 67
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents

Paper • 2601.16746 • Published Jan 23 • 92
CooperBench: Why Coding Agents Cannot be Your Teammates Yet

Paper • 2601.13295 • Published Jan 19 • 5
TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance

Paper • 2601.18241 • Published Jan 26 • 9
SWE-Universe: Scale Real-World Verifiable Environments to Millions

Paper • 2602.02361 • Published Feb 2 • 61
MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering

Paper • 2601.22859 • Published Jan 30 • 18