Papers
arxiv:2605.22681

Forecasting Scientific Progress with Artificial Intelligence

Published on May 21 · Submitted by Sean Wu on May 22

Abstract

AI-generated summary: Current AI systems demonstrate limited capability in predicting scientific progress, showing inconsistent performance across domains and systematic overconfidence in forecasts.

Artificial intelligence (AI) is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. To study this question, we introduce a temporally grounded evaluation framework for forecasting scientific progress under controlled knowledge constraints. We present CUSP (Cutoff-conditioned Unseen Scientific Progress), a multi-disciplinary and event-level benchmark that evaluates scientific forecasting in AI systems through feasibility assessment, mechanistic reasoning, generative solution design, and temporal prediction. Across 4,760 scientific events, we observe systematic and domain-dependent limitations in current frontier models. While models can identify plausible research directions from competing candidates, they fail to reliably predict whether scientific advances will be realized and systematically misestimate when they will occur. Performance is highly heterogeneous across domains, with the timing of AI progress more predictable than advances in biology, chemistry, and physics. Performance is largely insensitive to whether events occur before or after the training cutoff, suggesting these limitations cannot be explained solely by knowledge exposure in training data. Under controlled information access, additional pre-cutoff knowledge improves performance but does not close the gap to full-information settings, which becomes more pronounced for high-citation advances. Models also exhibit systematic overconfidence and strong response biases, indicating unreliable uncertainty estimation. Taken together, current AI systems fall short as predictive tools for scientific progress. Access to prior knowledge does not translate into reliable forecasting, and performance benefits more from post-event information than from forward-looking prediction.
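The two failure modes named in the abstract, overconfidence and mistimed predictions, can each be summarized with a simple statistic. The sketch below is illustrative only and is not taken from the paper: the function names, the choice of "mean confidence minus accuracy" as an overconfidence measure, and the month-based timing error are all assumptions, and any inputs passed to it would be hypothetical.

```python
from statistics import mean


def overconfidence_gap(confidences, correct):
    """Mean stated confidence minus observed accuracy.

    confidences: probabilities in [0, 1] that the model assigned to its own answers.
    correct: booleans, True where the corresponding answer turned out to be right.
    A positive result means the model was, on average, more confident than accurate.
    (Hypothetical metric for illustration; the paper may measure calibration differently.)
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    accuracy = mean(1.0 if c else 0.0 for c in correct)
    return mean(confidences) - accuracy


def mean_signed_timing_error(predicted_months, actual_months):
    """Average of (predicted - actual), in months.

    Both inputs are offsets in months from a shared reference date.
    A positive result means predictions land later than the real events on average.
    """
    if len(predicted_months) != len(actual_months) or not predicted_months:
        raise ValueError("need equal-length, non-empty inputs")
    return mean(p - a for p, a in zip(predicted_months, actual_months))
```

A signed error is used for timing rather than an absolute one because the sign shows the direction of a systematic bias, which an absolute error would hide.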

Community

Paper author Paper submitter

Can AI forecast scientific discovery, or just recognize past science? We built CUSP to test this rigorously.
4,760 real breakthroughs across biology, chemistry, physics, medicine, and AI, with verified knowledge cutoffs. Four forecasting tasks per event: feasibility, mechanism, solution design, and timing.

Across 6 frontier models, the pattern is consistent: strong recognition (GPT-5.4 at 82% on mechanism), but near-chance forecasting, with timing predictions overshooting by ~14 months. The gap doesn't close with web search and grows for high-impact discoveries.

[Figure: CUSP_Figure_1]

