SAM-1-Base
SAM-1-Base (Shopping Agent Model) is a 7.6B parameter language model fine-tuned for commerce reasoning tasks, developed by SnapCart AI. Built on Qwen2.5-7B-Instruct via LoRA adaptation, SAM-1-Base achieves 90.55/100 on SAM-Bench, a comprehensive benchmark spanning 8 shopping assistant task types.
This is the fully merged model (LoRA weights fused into the base) — ready for direct inference with no adapter loading required.
Links: GitHub | SAM-Bench Paper | LoRA Adapter | SAM SDK
Model Specifications
| Spec | Value |
|---|---|
| Architecture | Qwen2 (Transformer decoder, GQA, SwiGLU, RoPE, RMSNorm) |
| Parameters | 7.62B total |
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Fine-tuning | LoRA (rank 16, merged into base weights) |
| Precision | bfloat16 |
| Context length | 32,768 tokens |
| Vocabulary | 152,064 tokens |
| Chat format | ChatML (<|im_start|> / <|im_end|>) |
| Model size | ~14.2 GB (3 SafeTensors shards) |
| License | Apache 2.0 |
Architecture details
| Parameter | Value |
|---|---|
| Hidden size | 3,584 |
| Intermediate size (FFN) | 18,944 |
| Attention heads | 28 |
| KV heads (GQA) | 4 (7:1 ratio) |
| Head dimension | 128 |
| Layers | 28 |
| RoPE theta | 1,000,000 |
| Norm | RMSNorm (eps=1e-6) |
| Activation | SiLU (SwiGLU gating) |
| Position encoding | RoPE |
| Attention bias | Yes (on Q/K/V projections) |
| Tied embeddings | No |
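As a sanity check, the 7.62B parameter count can be reproduced from the table above with a back-of-the-envelope calculation (small terms such as RMSNorm weights and attention biases are ignored):

```python
# Rough parameter count for the architecture described in the table above.
vocab, hidden, ffn = 152_064, 3_584, 18_944
layers, heads, kv_heads, head_dim = 28, 28, 4, 128

embed = vocab * hidden                 # input embeddings
lm_head = vocab * hidden               # output projection (embeddings are not tied)

q = hidden * heads * head_dim          # query projection
kv = 2 * hidden * kv_heads * head_dim  # key + value projections (GQA: only 4 KV heads)
o = heads * head_dim * hidden          # attention output projection
mlp = 3 * hidden * ffn                 # gate, up, and down projections (SwiGLU)

total = embed + lm_head + layers * (q + kv + o + mlp)
print(f"{total / 1e9:.2f}B")  # 7.62B
```

The GQA term is where the 7:1 head ratio shows up: the K and V projections are 7x smaller than Q.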
Requirements
pip install "mlx-lm>=0.12"
For transformers-based inference (non-Apple Silicon):
pip install "transformers>=4.37.0" torch accelerate
Quickstart
MLX (Apple Silicon)
from mlx_lm import load, generate

model, tokenizer = load("snapcart-ai/sam-1-base")

messages = [
    {"role": "system", "content": "You are SAM, an expert AI shopping assistant."},
    {"role": "user", "content": "I need wireless headphones under $100 for running. I care most about battery life and water resistance."},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, temp=0.7, top_p=0.8)
print(response)
Transformers (GPU)
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "snapcart-ai/sam-1-base",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("snapcart-ai/sam-1-base")

messages = [
    {"role": "system", "content": "You are SAM, an expert AI shopping assistant."},
    {"role": "user", "content": "Compare the Sony WH-1000XM5 and Bose QuietComfort Ultra. Which should I buy if I care about noise cancellation and comfort for long flights?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.8, top_k=20)
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
Chat Template
SAM-1-Base uses the ChatML format inherited from Qwen2.5:
<|im_start|>system
You are SAM, an expert AI shopping assistant.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
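When a tokenizer is not at hand, the same prompt can be rendered by hand; a minimal sketch of the ChatML layout shown above (for real use, prefer `tokenizer.apply_chat_template`, which is the authoritative template):

```python
def chatml_prompt(messages, add_generation_prompt=True):
    """Render a message list in the ChatML layout shown above."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to start its turn
    return "".join(parts)

prompt = chatml_prompt([
    {"role": "system", "content": "You are SAM, an expert AI shopping assistant."},
    {"role": "user", "content": "Best budget mechanical keyboard?"},
])
print(prompt)
```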
Evaluation
SAM-1-Base is evaluated on SAM-Bench v0.1.0 — a benchmark of 1,200 programmatically generated tasks across 8 shopping assistant capabilities and 3 difficulty levels. Evaluation uses the 60% test split (719 tasks) with difficulty-weighted composite scoring (easy=1.0x, medium=1.5x, hard=2.0x).
Overall Results
| Model | SAM-Bench Score | Tasks Evaluated |
|---|---|---|
| SAM-1-Base | 90.55 | 719 |
| Heuristic Baseline | 84.27 | 200 |
| Random Baseline | 29.82 | 200 |
Performance by Task Type
| Task Type | Description | Metrics | SAM-1-Base |
|---|---|---|---|
| Query Understanding | Parse intent and filters from shopping queries | Intent Accuracy, Filter F1 | 98.37 |
| Attribute Extraction | Extract structured attributes from product text | Precision, Recall, F1 | 97.57 |
| Product Comparison | Compare products, select the best option | Winner Accuracy, Attr. Coverage | 94.88 |
| Purchase Decision | Decide which product to buy given constraints | Decision Accuracy, Budget Fit | 94.11 |
| Review Synthesis | Summarize reviews with sentiment analysis | ROUGE-L, Sentiment Accuracy | 92.45 |
| Price Analysis | Analyze price trends, recommend buy/wait | Trend Accuracy, Rec. Accuracy | 89.39 |
| Product Recommendation | Rank products by relevance to user needs | NDCG@5, Relevance F1 | 77.76 |
| Personalization | Rank products for a specific user profile | NDCG@5, Top-Pick Accuracy | 77.59 |
Performance by Difficulty
| Difficulty | SAM-1-Base | Heuristic Baseline | Random Baseline |
|---|---|---|---|
| Easy (404 tasks, 1.0x weight) | 96.89 | 75.00 | 29.82 |
| Medium (399 tasks, 1.5x weight) | 91.17 | 91.67 | 25.19 |
| Hard (397 tasks, 2.0x weight) | 86.84 | 80.00 | 17.01 |
Scoring Methodology
Each task is scored using task-type-specific metrics:
- Ranking tasks (Recommendation, Personalization): NDCG@5 + Relevance F1
- Classification tasks (Price Analysis, Purchase Decision): Accuracy metrics
- Generation tasks (Review Synthesis): ROUGE-L + Sentiment Accuracy
- Extraction tasks (Attribute Extraction, Query Understanding): Precision, Recall, F1
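For the ranking metrics, a minimal NDCG@5 implementation using the standard formulation (the exact gain/discount variant SAM-Bench uses is not specified here):

```python
import math

def ndcg_at_k(relevances, k=5):
    """NDCG@k for a ranked list of graded relevance scores."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# A perfect ranking scores 1.0; ranking relevant items lower reduces the score.
print(ndcg_at_k([3, 2, 1, 0, 0]))  # 1.0
print(ndcg_at_k([1, 2, 3, 0, 0]))  # < 1.0
```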
The composite task score is the arithmetic mean of all applicable metrics, scaled to [0, 100]. The overall benchmark score is the difficulty-weighted average across all tasks.
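Applying that weighting to the per-difficulty numbers reported in the table above reproduces the headline score to within rounding:

```python
# (task count, difficulty weight, mean score) per bucket,
# taken from the "Performance by Difficulty" table above.
buckets = [
    (404, 1.0, 96.89),  # easy
    (399, 1.5, 91.17),  # medium
    (397, 2.0, 86.84),  # hard
]

weighted = sum(n * w * s for n, w, s in buckets)
norm = sum(n * w for n, w, _ in buckets)
print(f"{weighted / norm:.2f}")  # ~90.5, matching the reported 90.55 to within rounding
```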
Domain Coverage
SAM-1-Base is designed for 8 core shopping assistant capabilities:
| Capability | What it does | Example |
|---|---|---|
| Product Recommendation | Rank products by user preferences and constraints | "Find me a laptop under $800 for video editing" |
| Product Comparison | Compare products across attributes, pick a winner | "Sony WH-1000XM5 vs Bose QC Ultra — which is better?" |
| Review Synthesis | Summarize product reviews with sentiment detection | "What do people say about the Galaxy S24 camera?" |
| Price Analysis | Analyze price trends and recommend buy/wait timing | "Is this a good time to buy a 4K TV?" |
| Purchase Decision | Make purchase recommendations given budget and needs | "I have $500 — should I get the iPad Air or Galaxy Tab?" |
| Attribute Extraction | Extract structured product attributes from descriptions | "Samsung 65-inch 4K QLED..." → {brand, size, resolution, ...} |
| Query Understanding | Parse shopping queries into intent and structured filters | "red Nike running shoes size 10 under $80" → {intent, filters} |
| Personalization | Personalized ranking based on user profile and history | Rank products for a user who prefers premium electronics |
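To make the Query Understanding row concrete, this is the general shape of a parse for the example query; the field names here are hypothetical illustrations, not the model's actual output schema:

```python
import json

# Hypothetical structured parse of "red Nike running shoes size 10 under $80".
# SAM-1-Base's real output schema may differ; this only illustrates the
# intent + filters decomposition described in the table above.
parse = {
    "intent": "product_search",
    "filters": {
        "color": "red",
        "brand": "Nike",
        "category": "running shoes",
        "size": "10",
        "max_price": 80.0,
    },
}
print(json.dumps(parse, indent=2))
```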
Intended Use
Intended for:
- Research evaluation and comparison of shopping assistant models
- Prototyping and development of e-commerce AI features
- Academic study of domain-specific LLM fine-tuning
Not intended for:
- Production deployment without additional safety testing and evaluation on real-world data
- Financial advice or automated purchasing decisions without human oversight
- Use cases outside the e-commerce/shopping domain
Limitations
- Synthetic evaluation only. SAM-Bench uses programmatically generated tasks with synthetic products, reviews, and user profiles. Performance on real-world shopping data may differ.
- English only. The model has been fine-tuned and evaluated exclusively on English-language tasks.
- Text only. No multimodal capabilities — cannot process product images, size charts, or videos.
- Ranking tasks are weakest. Product recommendation (77.76) and personalization (77.59) lag behind other categories, indicating room for improvement in preference modeling.
- No real-time data. The model has no access to current prices, inventory, or product availability.
- Inherited base model limitations. As a fine-tune of Qwen2.5-7B-Instruct, SAM-1-Base inherits its base model's biases and knowledge cutoff.
Model Variants
| Variant | Description | Size | Link |
|---|---|---|---|
| SAM-1-Base (this model) | Merged model, ready for inference | 14.2 GB | snapcart-ai/sam-1-base |
| SAM-1-Base LoRA | Adapter weights only (requires base model) | 88 MB | snapcart-ai/sam-1-base-lora |
Generation Parameters
The recommended generation parameters (from the model's generation_config.json):
| Parameter | Value |
|---|---|
| temperature | 0.7 |
| top_p | 0.8 |
| top_k | 20 |
| repetition_penalty | 1.05 |
Citation
@misc{sam-bench-2026,
  title={SAM-Bench: A Comprehensive Benchmark for Evaluating AI Shopping Assistants},
  author={SnapCart AI Research Team},
  year={2026},
  url={https://github.com/snapcart-ai/sam-bench}
}
License
This model is released under the Apache 2.0 License, consistent with the Qwen2.5-7B-Instruct license.