Ayhan Sebin PRO

ayhansebin

AI & ML interests

None yet

Recent Activity

upvoted a paper 3 days ago

MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

Paper • 2605.09131 • Published 8 days ago • 38

upvoted an article about 1 month ago

Article

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

ibm-research • Apr 15 • 28

updated a collection about 1 month ago

Enterprise Agents and Benchmarks

Enterprise agent ecosystem featuring AssetOpsBench (industrial) and ITBench (SRE, FinOps, CISO), CUGA to accelerate AI Automation • 18 items • Updated 3 days ago • 16

liked a Space about 1 month ago

ScarfBench

Java framework migration

liked a dataset about 1 month ago

ibm-research/ScarfBench

Updated Apr 9 • 524 • 6

upvoted an article about 1 month ago

Article

ALTK‑Evolve: On‑the‑Job Learning for AI Agents

ibm-research • Apr 8 • 27

upvoted a paper about 2 months ago

From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

Paper • 2603.22386 • Published Mar 23 • 57

liked a dataset about 2 months ago

ibm-research/VAKRA

Viewer • Updated Mar 31 • 1.33k • 1.94k • 43

liked a Space about 2 months ago

VAKRA Leaderboard

Evaluate AI agents on multi‑hop, multi‑source enterprise tasks

updated a collection about 2 months ago

Enterprise Agents and Benchmarks

Enterprise agent ecosystem featuring AssetOpsBench (industrial) and ITBench (SRE, FinOps, CISO), CUGA to accelerate AI Automation • 18 items • Updated 3 days ago • 16

upvoted a collection 2 months ago

Time Series Models

A collection of time series models trained by IBM • 4 items • Updated Feb 25 • 1

upvoted a collection 3 months ago

Granite Time Series

Time series models for forecasting, anomaly detection, classification, and more. • 9 items • Updated 17 days ago • 51

liked a Space 3 months ago

BlueBench Leaderboard

An open-source benchmark for enterprise use cases.

upvoted an article 3 months ago

Article

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

ibm-research • Feb 18 • 19

published an article 3 months ago

Article

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

ibm-research • Feb 18 • 19

liked a dataset 3 months ago

mcemri/MAST-Data

Preview • Updated Jul 21, 2025 • 363 • 15

liked a Space 3 months ago

fev-bench

Forecast evaluation benchmark

upvoted a paper 3 months ago

ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks

Paper • 2502.05352 • Published Feb 7, 2025 • 2