Amsi-fin-o1: Financial Thinking Vision-Language Model
A fine-tuned Qwen3-VL 4B model specialized for financial document understanding, chart analysis, and chain-of-thought reasoning.
Model Description
Amsi-fin-o1 is a vision-language model fine-tuned for financial applications. It combines:
- Visual understanding of charts, tables, and financial documents
- Chain-of-thought reasoning for complex financial analysis
- OCR capabilities for extracting text from financial images
Base Model
huihui-ai/Huihui-Qwen3-VL-4B-Thinking-abliterated (itself derived from Qwen/Qwen3-VL-4B-Thinking)
Training Details
Hardware
- GPU: NVIDIA A100 80GB PCIe
- Precision: BF16 (Brain Floating Point 16)
Training Stages
The model was trained in 4 progressive stages:
| Stage | Focus | Steps | Learning Rate | Datasets |
|---|---|---|---|---|
| B1 | Financial Text | 1,200 | 8e-6 | FinTrain (70%), FinTrain-Math (15%), OCR (10%), ChartQA (5%) |
| B2 | Visual OCR | 1,500 | 8e-6 | MultiFinBen-OCR (50%), SecureFinAI-OCR (20%), ChartQA (20%), NuminaMath (10%) |
| B3 | Chain-of-Thought | 2,000 | 6e-6 | CoTA (50%), Program-CoTA (50%) |
| B4 | Mixed Fine-tuning | 1,200 | 6e-6 | All datasets combined |
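The stage table can be read as a simple curriculum: a step budget and learning rate per stage, plus a dataset mix. The sketch below transcribes the table and shows one way such a mix could be sampled. It assumes the percentages are per-example sampling probabilities, which the card does not state. The `sample_sources` helper is hypothetical and is not the training code used for this model. B4 has no entry in `MIXES` because the card gives no ratios for it.

```python
import random

# Step budgets and learning rates transcribed from the stage table.
STAGES = {
    "B1": {"steps": 1200, "lr": 8e-6},
    "B2": {"steps": 1500, "lr": 8e-6},
    "B3": {"steps": 2000, "lr": 6e-6},
    "B4": {"steps": 1200, "lr": 6e-6},
}

# Dataset mixes transcribed from the table. B4 ("All datasets combined")
# is left out because no ratios are given for it.
MIXES = {
    "B1": {"FinTrain": 0.70, "FinTrain-Math": 0.15, "OCR": 0.10, "ChartQA": 0.05},
    "B2": {"MultiFinBen-OCR": 0.50, "SecureFinAI-OCR": 0.20,
           "ChartQA": 0.20, "NuminaMath": 0.10},
    "B3": {"CoTA": 0.50, "Program-CoTA": 0.50},
}


def sample_sources(stage, n, seed=0):
    """Draw n dataset names for a stage.

    Assumption: the table percentages act as sampling probabilities.
    """
    mix = MIXES[stage]
    rng = random.Random(seed)
    return rng.choices(list(mix), weights=list(mix.values()), k=n)


total_steps = sum(stage["steps"] for stage in STAGES.values())
print(total_steps)  # 5900
print(sample_sources("B2", 5))
```

Summing the table gives 5,900 optimizer steps across the four stages.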
Training Configuration
bf16: true
full_finetune: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
max_seq_length: 2048
target_context_length: 131072
rope_scaling_type: dynamic
rope_scaling_factor: 64.0
warmup_ratio: 0.03
weight_decay: 0.01
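A few of these values are related by simple arithmetic, which the sketch below checks. It assumes a single GPU (the Hardware section lists one A100), that `target_context_length` is meant to equal `max_seq_length` times `rope_scaling_factor`, and that warmup is applied per stage. None of this is confirmed by the card, and the `warmup_steps` helper is hypothetical.

```python
# Values copied from the training configuration above.
config = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "max_seq_length": 2048,
    "target_context_length": 131072,
    "rope_scaling_factor": 64.0,
    "warmup_ratio": 0.03,
}

# Sequences contributing to each optimizer step (assumes one GPU).
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)

# Context length reached when the training length is stretched by the RoPE factor.
scaled_context = int(config["max_seq_length"] * config["rope_scaling_factor"])


def warmup_steps(total_steps, ratio=config["warmup_ratio"]):
    """Approximate warmup length for a stage; trainers may round differently."""
    return round(total_steps * ratio)


print(effective_batch)   # 8
print(scaled_context)    # 131072
print(scaled_context == config["target_context_length"])  # True
```

Under these assumptions, training sequences are capped at 2,048 tokens and the 131,072-token target relies on RoPE scaling rather than on long training examples.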
Datasets Used
Financial Data
- Salesforce/FinTrain - Financial training data with various supervised tasks
- TheFinAI/FinCoT - Financial Chain-of-Thought reasoning
Visual/OCR Data
- TheFinAI/MultiFinBen-EnglishOCR - Financial document OCR
- TheFinAI/SecureFinAI_Contest_2025-Task_3_EnglishOCR - English financial OCR data from Task 3 of the SecureFinAI Contest 2025
- HuggingFaceM4/ChartQA - Chart understanding and QA
Reasoning Data
- Salesforce/cota-llava - Chain-of-Thought visual reasoning
- Salesforce/program-cota-llava - Programmatic CoT reasoning
- AI-MO/NuminaMath-CoT - Mathematical reasoning
Usage
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq
import torch

model_id = "AITRADER/Amsi-fin-o1"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Option 1: text-only query
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Analyze the following financial statement..."}
    ]}
]

# Option 2: image + text query (overwrites the text-only messages above;
# keep whichever option you need)
image = Image.open("financial_chart.png")
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "What trends do you see in this chart?"}
    ]}
]

# Tokenize the chat and return a dict of tensors that generate() can unpack
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = processor.decode(new_tokens, skip_special_tokens=True)
print(response)
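Because the base model is a Thinking variant, the decoded response may contain the model's reasoning followed by a `</think>` marker and then the final answer. The helper below is a minimal sketch for separating the two. It assumes this model keeps the Qwen3-style `</think>` delimiter, which the card does not confirm. `split_thinking` is a hypothetical name, not part of `transformers`.

```python
def split_thinking(text, end_tag="</think>"):
    """Split decoded output into (reasoning, answer).

    Assumption: reasoning ends with `end_tag`. If the tag is absent,
    the whole text is treated as the answer.
    """
    head, sep, tail = text.rpartition(end_tag)
    if not sep:
        return "", text.strip()
    # The opening tag may or may not be present, depending on the chat template.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


example = "<think>\nRevenue rose each quarter.\n</think>\n\nThe chart shows steady growth."
reasoning, answer = split_thinking(example)
print(answer)  # The chart shows steady growth.
```

If the output is cut off by `max_new_tokens` before the closing tag appears, the helper returns the whole text as the answer, so a larger token budget may be needed for long reasoning chains.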
Intended Use
- Financial document analysis
- Chart and graph interpretation
- Financial OCR and text extraction
- Chain-of-thought financial reasoning
- Investment research assistance
Limitations
- Optimized for English financial content
- May not generalize well to non-financial domains
- Should not be used as the sole source for investment decisions
License
Apache 2.0
Citation
@misc{amsi-fin-o1,
author = {AITRADER},
title = {Amsi-fin-o1: Financial Thinking Vision-Language Model},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/AITRADER/Amsi-fin-o1}
}