SAM-1-Base
SAM-1-Base (Shopping Agent Model) is a 7.6B parameter language model fine-tuned for commerce reasoning tasks, developed by SnapCart AI. Built on Qwen2.5-7B-Instruct via LoRA adaptation, SAM-1-Base achieves 90.55/100 on SAM-Bench, a comprehensive benchmark spanning 8 shopping assistant task types.
This is the fully merged model (LoRA weights fused into the base) — ready for direct inference with no adapter loading required.
Links: GitHub | SAM-Bench Paper | LoRA Adapter | SAM SDK
Model Specifications
| Spec | Value |
|---|---|
| Architecture | Qwen2 (Transformer decoder, GQA, SwiGLU, RoPE, RMSNorm) |
| Parameters | 7.62B total |
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Fine-tuning | LoRA (rank 16, merged into base weights) |
| Precision | bfloat16 |
| Context length | 32,768 tokens |
| Vocabulary | 152,064 tokens |
| Chat format | ChatML (<|im_start|> / <|im_end|>) |
| Model size | ~14.2 GB (3 SafeTensors shards) |
| License | Apache 2.0 |
Architecture details
| Parameter | Value |
|---|---|
| Hidden size | 3,584 |
| Intermediate size (FFN) | 18,944 |
| Attention heads | 28 |
| KV heads (GQA) | 4 (7:1 ratio) |
| Head dimension | 128 |
| Layers | 28 |
| RoPE theta | 1,000,000 |
| Norm | RMSNorm (eps=1e-6) |
| Activation | SiLU (SwiGLU gating) |
| Position encoding | RoPE |
| Attention bias | Yes (on Q/K/V projections) |
| Tied embeddings | No |
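As a sanity check, the 7.62B parameter count can be reproduced from the table above with a back-of-the-envelope calculation (small terms such as RMSNorm weights and attention biases are ignored):

```python
# Rough parameter count for the architecture described in the table above.
vocab, hidden, ffn = 152_064, 3_584, 18_944
layers, heads, kv_heads, head_dim = 28, 28, 4, 128

embed = vocab * hidden                 # input embeddings
lm_head = vocab * hidden               # output projection (embeddings are not tied)

q = hidden * heads * head_dim          # query projection
kv = 2 * hidden * kv_heads * head_dim  # key + value projections (GQA: only 4 KV heads)
o = heads * head_dim * hidden          # attention output projection
mlp = 3 * hidden * ffn                 # gate, up, and down projections (SwiGLU)

total = embed + lm_head + layers * (q + kv + o + mlp)
print(f"{total / 1e9:.2f}B")  # 7.62B
```

The GQA term is where the 7:1 head ratio shows up: the K and V projections are 7x smaller than Q.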
Requirements
pip install "mlx-lm>=0.12"
For transformers-based inference (non-Apple Silicon):
pip install "transformers>=4.37.0" torch accelerate
Quickstart
MLX (Apple Silicon)
from mlx_lm import load, generate

model, tokenizer = load("snapcart-ai/sam-1-base")

messages = [
    {"role": "system", "content": "You are SAM, an expert AI shopping assistant."},
    {"role": "user", "content": "I need wireless headphones under $100 for running. I care most about battery life and water resistance."},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, temp=0.7, top_p=0.8)
print(response)
Transformers (GPU)
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "snapcart-ai/sam-1-base",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("snapcart-ai/sam-1-base")

messages = [
    {"role": "system", "content": "You are SAM, an expert AI shopping assistant."},
    {"role": "user", "content": "Compare the Sony WH-1000XM5 and Bose QuietComfort Ultra. Which should I buy if I care about noise cancellation and comfort for long flights?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.8, top_k=20)
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
Chat Template
SAM-1-Base uses the ChatML format inherited from Qwen2.5:
<|im_start|>system
You are SAM, an expert AI shopping assistant.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
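When a tokenizer is not at hand, the same prompt can be rendered by hand; a minimal sketch of the ChatML layout shown above (for real use, prefer `tokenizer.apply_chat_template`, which is the authoritative template):

```python
def chatml_prompt(messages, add_generation_prompt=True):
    """Render a message list in the ChatML layout shown above."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to start its turn
    return "".join(parts)

prompt = chatml_prompt([
    {"role": "system", "content": "You are SAM, an expert AI shopping assistant."},
    {"role": "user", "content": "Best budget mechanical keyboard?"},
])
print(prompt)
```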
Evaluation
SAM-1-Base is evaluated on SAM-Bench v0.1.0 — a benchmark of 1,200 programmatically generated tasks across 8 shopping assistant capabilities and 3 difficulty levels. Evaluation uses the 60% test split (719 tasks) with difficulty-weighted composite scoring (easy=1.0x, medium=1.5x, hard=2.0x).
Overall Results
| Model | SAM-Bench Score | Tasks Evaluated |
|---|---|---|
| SAM-1-Base | 90.55 | 719 |
| Heuristic Baseline | 84.27 | 200 |
| Random Baseline | 29.82 | 200 |
Performance by Task Type
| Task Type | Description | Metrics | SAM-1-Base |
|---|---|---|---|
| Query Understanding | Parse intent and filters from shopping queries | Intent Accuracy, Filter F1 | 98.37 |
| Attribute Extraction | Extract structured attributes from product text | Precision, Recall, F1 | 97.57 |
| Product Comparison | Compare products, select the best option | Winner Accuracy, Attr. Coverage | 94.88 |
| Purchase Decision | Decide which product to buy given constraints | Decision Accuracy, Budget Fit | 94.11 |
| Review Synthesis | Summarize reviews with sentiment analysis | ROUGE-L, Sentiment Accuracy | 92.45 |
| Price Analysis | Analyze price trends, recommend buy/wait | Trend Accuracy, Rec. Accuracy | 89.39 |
| Product Recommendation | Rank products by relevance to user needs | NDCG@5, Relevance F1 | 77.76 |
| Personalization | Rank products for a specific user profile | NDCG@5, Top-Pick Accuracy | 77.59 |
Performance by Difficulty
| Difficulty | SAM-1-Base | Heuristic Baseline | Random Baseline |
|---|---|---|---|
| Easy (404 tasks, 1.0x weight) | 96.89 | 75.00 | 29.82 |
| Medium (399 tasks, 1.5x weight) | 91.17 | 91.67 | 25.19 |
| Hard (397 tasks, 2.0x weight) | 86.84 | 80.00 | 17.01 |
Scoring Methodology
Each task is scored using task-type-specific metrics:
- Ranking tasks (Recommendation, Personalization): NDCG@5 + Relevance F1
- Classification tasks (Price Analysis, Purchase Decision): Accuracy metrics
- Generation tasks (Review Synthesis): ROUGE-L + Sentiment Accuracy
- Extraction tasks (Attribute Extraction, Query Understanding): Precision, Recall, F1
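For the ranking metrics, a minimal NDCG@5 implementation using the standard formulation (the exact gain/discount variant SAM-Bench uses is not specified here):

```python
import math

def ndcg_at_k(relevances, k=5):
    """NDCG@k for a ranked list of graded relevance scores."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# A perfect ranking scores 1.0; ranking relevant items lower reduces the score.
print(ndcg_at_k([3, 2, 1, 0, 0]))  # 1.0
print(ndcg_at_k([1, 2, 3, 0, 0]))  # < 1.0
```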
The composite task score is the arithmetic mean of all applicable metrics, scaled to [0, 100]. The overall benchmark score is the difficulty-weighted average across all tasks.
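Applying that weighting to the per-difficulty numbers reported in the table above reproduces the headline score to within rounding:

```python
# (task count, difficulty weight, mean score) per bucket,
# taken from the "Performance by Difficulty" table above.
buckets = [
    (404, 1.0, 96.89),  # easy
    (399, 1.5, 91.17),  # medium
    (397, 2.0, 86.84),  # hard
]

weighted = sum(n * w * s for n, w, s in buckets)
norm = sum(n * w for n, w, _ in buckets)
print(f"{weighted / norm:.2f}")  # ~90.5, matching the reported 90.55 to within rounding
```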
Domain Coverage
SAM-1-Base is designed for 8 core shopping assistant capabilities:
| Capability | What it does | Example |
|---|---|---|
| Product Recommendation | Rank products by user preferences and constraints | "Find me a laptop under $800 for video editing" |
| Product Comparison | Compare products across attributes, pick a winner | "Sony WH-1000XM5 vs Bose QC Ultra — which is better?" |
| Review Synthesis | Summarize product reviews with sentiment detection | "What do people say about the Galaxy S24 camera?" |
| Price Analysis | Analyze price trends and recommend buy/wait timing | "Is this a good time to buy a 4K TV?" |
| Purchase Decision | Make purchase recommendations given budget and needs | "I have $500 — should I get the iPad Air or Galaxy Tab?" |
| Attribute Extraction | Extract structured product attributes from descriptions | "Samsung 65-inch 4K QLED..." → {brand, size, resolution, ...} |
| Query Understanding | Parse shopping queries into intent and structured filters | "red Nike running shoes size 10 under $80" → {intent, filters} |
| Personalization | Personalized ranking based on user profile and history | Rank products for a user who prefers premium electronics |
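To make the Query Understanding row concrete, this is the general shape of a parse for the example query; the field names here are hypothetical illustrations, not the model's actual output schema:

```python
import json

# Hypothetical structured parse of "red Nike running shoes size 10 under $80".
# SAM-1-Base's real output schema may differ; this only illustrates the
# intent + filters decomposition described in the table above.
parse = {
    "intent": "product_search",
    "filters": {
        "color": "red",
        "brand": "Nike",
        "category": "running shoes",
        "size": "10",
        "max_price": 80.0,
    },
}
print(json.dumps(parse, indent=2))
```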
Intended Use
Intended for:
- Research evaluation and comparison of shopping assistant models
- Prototyping and development of e-commerce AI features
- Academic study of domain-specific LLM fine-tuning
Not intended for:
- Production deployment without additional safety testing and evaluation on real-world data
- Financial advice or automated purchasing decisions without human oversight
- Use cases outside the e-commerce/shopping domain
Limitations
- Synthetic evaluation only. SAM-Bench uses programmatically generated tasks with synthetic products, reviews, and user profiles. Performance on real-world shopping data may differ.
- English only. The model has been fine-tuned and evaluated exclusively on English-language tasks.
- Text only. No multimodal capabilities — cannot process product images, size charts, or videos.
- Ranking tasks are weakest. Product recommendation (77.76) and personalization (77.59) lag behind other categories, indicating room for improvement in preference modeling.
- No real-time data. The model has no access to current prices, inventory, or product availability.
- Inherited base model limitations. As a fine-tune of Qwen2.5-7B-Instruct, SAM-1-Base inherits its base model's biases and knowledge cutoff.
Model Variants
| Variant | Description | Size | Link |
|---|---|---|---|
| SAM-1-Base (this model) | Merged model, ready for inference | 14.2 GB | snapcart-ai/sam-1-base |
| SAM-1-Base LoRA | Adapter weights only (requires base model) | 88 MB | snapcart-ai/sam-1-base-lora |
Generation Parameters
The recommended generation parameters (from the model's generation_config.json):
| Parameter | Value |
|---|---|
| temperature | 0.7 |
| top_p | 0.8 |
| top_k | 20 |
| repetition_penalty | 1.05 |
Citation
@misc{sam-bench-2026,
  title={SAM-Bench: A Comprehensive Benchmark for Evaluating AI Shopping Assistants},
  author={SnapCart AI Research Team},
  year={2026},
  url={https://github.com/snapcart-ai/sam-bench}
}
License
This model is released under the Apache 2.0 License, consistent with the Qwen2.5-7B-Instruct license.