V4FinBench TabPFN Checkpoints

Finetuned TabPFN v2 checkpoints for corporate financial distress prediction on V4FinBench. We release one checkpoint per prediction horizon (h = 0, …, 5), each trained with the prototype-undersampling context-construction strategy that performs best in our experiments.

What's in this repo

Six TabPFN v2 checkpoints, one per V4FinBench horizon task:

| File | Horizon | Description |
|---|---|---|
| tabpfn_h0.ckpt | h = 0 | Current-year financial distress |
| tabpfn_h1.ckpt | h = 1 | One-year-ahead distress |
| tabpfn_h2.ckpt | h = 2 | Two-year-ahead distress |
| tabpfn_h3.ckpt | h = 3 | Three-year-ahead distress |
| tabpfn_h4.ckpt | h = 4 | Four-year-ahead distress |
| tabpfn_h5.ckpt | h = 5 | Five-year-ahead distress |

Each checkpoint is finetuned from the pretrained TabPFN v2 base with imbalance-aware in-context construction (clustered majority prototypes paired with all minority examples). For each horizon, the released checkpoint is the cross-validation fold with the highest $F_1$ score on its validation split.
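The selection rule can be illustrated with a minimal sketch. The helpers `f1_score` and `best_fold` are hypothetical names introduced here, not part of the V4FinBench code, and the confusion counts in the usage lines are made up for illustration only.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def best_fold(val_counts):
    """Return (fold_id, scores) for the fold with the highest validation F1.

    val_counts maps a fold id to a (tp, fp, fn) tuple measured on that
    fold's validation split.
    """
    scores = {fold: f1_score(*counts) for fold, counts in val_counts.items()}
    return max(scores, key=scores.get), scores


# Illustrative (invented) validation counts for two folds of one horizon.
example = {0: (40, 20, 30), 1: (45, 25, 20)}
fold, scores = best_fold(example)
print(fold, scores)
```

Ties are broken by dictionary order in this sketch; the paper's protocol may handle them differently.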

Loading checkpoints

See the V4FinBench code repository for loading and inference: https://github.com/genwro-ai/V4FinBench
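The actual loader lives in the code repository. As a small convenience, the file naming used in this repo can be resolved from a horizon value; the sketch below assumes only the `tabpfn_h{h}.ckpt` pattern listed above, and `checkpoint_filename` is a hypothetical helper, not a V4FinBench function.

```python
VALID_HORIZONS = range(6)  # h = 0, ..., 5, as released in this repo


def checkpoint_filename(horizon):
    """Map a prediction horizon to the checkpoint file name in this repo."""
    if horizon not in VALID_HORIZONS:
        raise ValueError(f"horizon must be in 0..5, got {horizon!r}")
    return f"tabpfn_h{horizon}.ckpt"


# List every released checkpoint file.
for h in VALID_HORIZONS:
    print(h, checkpoint_filename(h))
```

Pass the resulting file name to the loading utilities in the GitHub repository rather than to an ad hoc loader, since the checkpoint format is defined there.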

Training details

  • Base model: TabPFN v2
  • Finetuning: Adam, 10 epochs, learning rate 5e-6, batch size 1024, inference context size 10,000.
  • Context construction: prototype undersampling. For each context, all minority-class examples are retained; majority-class examples are clustered with MiniBatchKMeans and the real observation closest to each centroid is kept, so that the context has a minority-to-majority ratio of $N_{\text{min}} / N_{\text{maj}} = 0.3$.
  • Hardware: single NVIDIA A100 GPU per run.
  • Folds: 5-fold company-grouped, country-stratified cross-validation. The released checkpoint per horizon is the best-performing fold by validation $F_1$.
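The prototype-undersampling step listed above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `prototype_undersample`, the Euclidean nearest-observation rule, the rounding of the cluster count, and the `n_init` and seed values are all choices made here.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import pairwise_distances_argmin


def prototype_undersample(X, y, ratio=0.3, minority_label=1, seed=0):
    """Keep all minority rows plus one real majority row per k-means centroid."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    if len(min_idx) == 0 or len(maj_idx) == 0:
        raise ValueError("both classes must be present")

    # Number of prototypes so that N_min / N_maj is approximately `ratio`.
    n_proto = max(1, min(len(maj_idx), int(round(len(min_idx) / ratio))))

    km = MiniBatchKMeans(n_clusters=n_proto, random_state=seed, n_init=3)
    km.fit(X[maj_idx])

    # For each centroid, index of the closest real majority observation.
    nearest = pairwise_distances_argmin(km.cluster_centers_, X[maj_idx])
    # Two centroids can share a nearest row, so deduplicate; the final
    # majority count may therefore be slightly below n_proto.
    proto_idx = maj_idx[np.unique(nearest)]

    keep = np.concatenate([min_idx, proto_idx])
    return X[keep], y[keep]
```

Because prototypes are real observations rather than centroids, every row in the resulting context is an actual company-year record.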

Full configuration, evaluation protocol, and per-fold results are documented in the paper and in the GitHub code repository.

Scope of this release

This repository contains a curated subset (six checkpoints) of those produced in the paper. The full reference experiments produce 90 checkpoints in total (6 horizons × 5 folds × 3 context-construction strategies); all of them can be reproduced with the released folds and training code in the GitHub repository.

Intended use

These checkpoints are intended for:

  • Benchmarking corporate financial-distress prediction methods on V4FinBench under the released evaluation protocol.
  • Research on tabular foundation models, in-context learning under severe class imbalance, and cross-dataset transfer in financial prediction (e.g., the American Bankruptcy Dataset transfer experiment in the paper).

Limitations and responsible use

  • These checkpoints are research artifacts, not production risk-scoring models. They are not intended for, and should not be used to make, individual credit decisions, supplier-risk decisions, or any other automated decisions about specific companies without additional jurisdiction-specific validation, fairness analysis, and human oversight.
  • V4FinBench labels capture composite financial distress (joint deterioration in solvency, profitability, and liquidity), not formal legal bankruptcy filings. Predictions reflect the operational distress definition in the paper and may not generalize to other distress or bankruptcy definitions.
  • Training data covers four Central European economies (Poland, Hungary, Czech Republic, Slovakia) over 2006–2021. Performance on other economies, accounting standards, or time periods is not guaranteed; see the cross-dataset transfer analysis in the paper for one external evaluation point.

License

These checkpoints are derivatives of TabPFN v2 and are released under the TabPFN license included in this repository (LICENSE). Please review the license terms before use.

Citation

A citation entry will be added once the preprint is available.
