# SAM-1-Base

SAM-1-Base (Shopping Agent Model) is a 7.6B parameter language model fine-tuned for commerce reasoning tasks, developed by SnapCart AI. Built on Qwen2.5-7B-Instruct via LoRA adaptation, SAM-1-Base achieves 90.55/100 on SAM-Bench, a comprehensive benchmark spanning 8 shopping assistant task types.

This is the fully merged model (LoRA weights fused into the base) — ready for direct inference with no adapter loading required.

Links: GitHub | SAM-Bench Paper | LoRA Adapter | SAM SDK

## Model Specifications

| Specification | Value |
|---|---|
| Architecture | Qwen2 (Transformer decoder, GQA, SwiGLU, RoPE, RMSNorm) |
| Parameters | 7.62B total |
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Fine-tuning | LoRA (rank 16, merged into base weights) |
| Precision | bfloat16 |
| Context length | 32,768 tokens |
| Vocabulary | 152,064 tokens |
| Chat format | ChatML (`<|im_start|>` / `<|im_end|>`) |
| Model size | ~14.2 GB (3 SafeTensors shards) |
| License | Apache 2.0 |
### Architecture details

| Parameter | Value |
|---|---|
| Hidden size | 3,584 |
| Intermediate size (FFN) | 18,944 |
| Attention heads | 28 |
| KV heads (GQA) | 4 (7:1 ratio) |
| Head dimension | 128 |
| Layers | 28 |
| RoPE theta | 1,000,000 |
| Norm | RMSNorm (eps=1e-6) |
| Activation | SiLU (SwiGLU gating) |
| Position encoding | RoPE |
| Attention bias | Yes (on Q/K/V projections) |
| Tied embeddings | No |

## Requirements

```bash
pip install "mlx-lm>=0.12"
```

For Transformers-based inference (non-Apple Silicon):

```bash
pip install "transformers>=4.37.0" torch accelerate
```

(Quoting the version specifier prevents the shell from interpreting `>=` as a redirection.)

## Quickstart

### MLX (Apple Silicon)

```python
from mlx_lm import load, generate

model, tokenizer = load("snapcart-ai/sam-1-base")

messages = [
    {"role": "system", "content": "You are SAM, an expert AI shopping assistant."},
    {"role": "user", "content": "I need wireless headphones under $100 for running. I care most about battery life and water resistance."}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, temp=0.7, top_p=0.8)
print(response)
```

### Transformers (GPU)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "snapcart-ai/sam-1-base",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("snapcart-ai/sam-1-base")

messages = [
    {"role": "system", "content": "You are SAM, an expert AI shopping assistant."},
    {"role": "user", "content": "Compare the Sony WH-1000XM5 and Bose QuietComfort Ultra. Which should I buy if I care about noise cancellation and comfort for long flights?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.8, top_k=20)
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

## Chat Template

SAM-1-Base uses the ChatML format inherited from Qwen2.5:

```
<|im_start|>system
You are SAM, an expert AI shopping assistant.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```
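For illustration, the template can be assembled by hand. This sketch mirrors what `tokenizer.apply_chat_template(..., add_generation_prompt=True)` produces for Qwen2.5-style models; the helper name is ours, and in practice you should prefer `apply_chat_template`, which also handles default system prompts and special-token details:

```python
# Manually assemble a ChatML prompt (illustrative helper, not part of the SDK).
def build_chatml_prompt(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # open the assistant turn for generation
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are SAM, an expert AI shopping assistant."},
    {"role": "user", "content": "Find me trail running shoes under $120."},
])
print(prompt)
```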

## Evaluation

SAM-1-Base is evaluated on SAM-Bench v0.1.0 — a benchmark of 1,200 programmatically generated tasks across 8 shopping assistant capabilities and 3 difficulty levels. Evaluation uses the 60% test split (719 tasks) with difficulty-weighted composite scoring (easy=1.0x, medium=1.5x, hard=2.0x).

### Overall Results

| Model | SAM-Bench Score | Tasks Evaluated |
|---|---|---|
| SAM-1-Base | 90.55 | 719 |
| Heuristic Baseline | 84.27 | 200 |
| Random Baseline | 29.82 | 200 |

### Performance by Task Type

| Task Type | Description | Metrics | SAM-1-Base |
|---|---|---|---|
| Query Understanding | Parse intent and filters from shopping queries | Intent Accuracy, Filter F1 | 98.37 |
| Attribute Extraction | Extract structured attributes from product text | Precision, Recall, F1 | 97.57 |
| Product Comparison | Compare products, select the best option | Winner Accuracy, Attr. Coverage | 94.88 |
| Purchase Decision | Decide which product to buy given constraints | Decision Accuracy, Budget Fit | 94.11 |
| Review Synthesis | Summarize reviews with sentiment analysis | ROUGE-L, Sentiment Accuracy | 92.45 |
| Price Analysis | Analyze price trends, recommend buy/wait | Trend Accuracy, Rec. Accuracy | 89.39 |
| Product Recommendation | Rank products by relevance to user needs | NDCG@5, Relevance F1 | 77.76 |
| Personalization | Rank products for a specific user profile | NDCG@5, Top-Pick Accuracy | 77.59 |

### Performance by Difficulty

| Difficulty | SAM-1-Base | Heuristic Baseline | Random Baseline |
|---|---|---|---|
| Easy (404 tasks, 1.0x weight) | 96.89 | 75.00 | 29.82 |
| Medium (399 tasks, 1.5x weight) | 91.17 | 91.67 | 25.19 |
| Hard (397 tasks, 2.0x weight) | 86.84 | 80.00 | 17.01 |

### Scoring Methodology

Each task is scored using task-type-specific metrics:

- **Ranking tasks** (Recommendation, Personalization): NDCG@5 + Relevance F1
- **Classification tasks** (Price Analysis, Purchase Decision): Accuracy metrics
- **Generation tasks** (Review Synthesis): ROUGE-L + Sentiment Accuracy
- **Extraction tasks** (Attribute Extraction, Query Understanding): Precision, Recall, F1

The composite task score is the arithmetic mean of all applicable metrics, scaled to [0, 100]. The overall benchmark score is the difficulty-weighted average across all tasks.
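The difficulty-weighted average described above can be sketched as follows (weights are from the benchmark description; the function name is illustrative):

```python
# Difficulty-weighted benchmark score: a weighted mean of per-task
# composite scores, with weights easy=1.0, medium=1.5, hard=2.0.
WEIGHTS = {"easy": 1.0, "medium": 1.5, "hard": 2.0}

def benchmark_score(tasks):
    """tasks: list of (difficulty, composite_score) pairs, scores in [0, 100]."""
    weighted = sum(WEIGHTS[d] * s for d, s in tasks)
    total_weight = sum(WEIGHTS[d] for d, _ in tasks)
    return weighted / total_weight

# Three tasks, one per difficulty: (96 + 1.5*90 + 2*84) / 4.5
tasks = [("easy", 96.0), ("medium", 90.0), ("hard", 84.0)]
print(round(benchmark_score(tasks), 2))  # 88.67
```

Because hard tasks carry twice the weight of easy ones, the 86.84 hard-tier score pulls the overall number below a naive per-task mean.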

## Domain Coverage

SAM-1-Base is designed for 8 core shopping assistant capabilities:

| Capability | What it does | Example |
|---|---|---|
| Product Recommendation | Rank products by user preferences and constraints | "Find me a laptop under $800 for video editing" |
| Product Comparison | Compare products across attributes, pick a winner | "Sony WH-1000XM5 vs Bose QC Ultra — which is better?" |
| Review Synthesis | Summarize product reviews with sentiment detection | "What do people say about the Galaxy S24 camera?" |
| Price Analysis | Analyze price trends and recommend buy/wait timing | "Is this a good time to buy a 4K TV?" |
| Purchase Decision | Make purchase recommendations given budget and needs | "I have $500 — should I get the iPad Air or Galaxy Tab?" |
| Attribute Extraction | Extract structured product attributes from descriptions | "Samsung 65-inch 4K QLED..." → {brand, size, resolution, ...} |
| Query Understanding | Parse shopping queries into intent and structured filters | "red Nike running shoes size 10 under $80" → {intent, filters} |
| Personalization | Personalized ranking based on user profile and history | Rank products for a user who prefers premium electronics |
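As a concrete illustration of the Query Understanding capability, a structured parse of the example query might look like this. The field names below are illustrative, not the benchmark's canonical schema:

```python
import json

query = "red Nike running shoes size 10 under $80"

# Hypothetical intent + filters output for the query above;
# the schema is an assumption for illustration only.
parsed = {
    "intent": "product_search",
    "filters": {
        "color": "red",
        "brand": "Nike",
        "category": "running shoes",
        "size": "10",
        "max_price": 80.0,
    },
}
print(json.dumps(parsed, indent=2))
```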

## Intended Use

Intended for:

- Research evaluation and comparison of shopping assistant models
- Prototyping and development of e-commerce AI features
- Academic study of domain-specific LLM fine-tuning

Not intended for:

- Production deployment without additional safety testing and evaluation on real-world data
- Financial advice or automated purchasing decisions without human oversight
- Use cases outside the e-commerce/shopping domain

## Limitations

- **Synthetic evaluation only.** SAM-Bench uses programmatically generated tasks with synthetic products, reviews, and user profiles. Performance on real-world shopping data may differ.
- **English only.** The model has been fine-tuned and evaluated exclusively on English-language tasks.
- **Text only.** No multimodal capabilities — cannot process product images, size charts, or videos.
- **Ranking tasks are weakest.** Product recommendation (77.76) and personalization (77.59) lag behind other categories, indicating room for improvement in preference modeling.
- **No real-time data.** The model has no access to current prices, inventory, or product availability.
- **Inherited base model limitations.** As a fine-tune of Qwen2.5-7B-Instruct, SAM-1-Base inherits its base model's biases and knowledge cutoff.

## Model Variants

| Variant | Description | Size | Link |
|---|---|---|---|
| SAM-1-Base (this model) | Merged model, ready for inference | 14.2 GB | snapcart-ai/sam-1-base |
| SAM-1-Base LoRA | Adapter weights only (requires base model) | 88 MB | snapcart-ai/sam-1-base-lora |

## Generation Parameters

Recommended generation parameters (from the model's `generation_config.json`):

| Parameter | Value |
|---|---|
| temperature | 0.7 |
| top_p | 0.8 |
| top_k | 20 |
| repetition_penalty | 1.05 |
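These defaults can be collected into keyword arguments for `model.generate()` (a minimal sketch; `max_new_tokens` is taken from the quickstart rather than the config, and `do_sample=True` is needed for temperature/top_p to take effect):

```python
# Recommended sampling settings, bundled as generate() kwargs.
GEN_KWARGS = dict(
    do_sample=True,          # enable sampling so temperature/top_p apply
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.05,
    max_new_tokens=512,      # from the quickstart, not generation_config.json
)
# Usage: output = model.generate(**inputs, **GEN_KWARGS)
```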

## Citation

```bibtex
@misc{sam-bench-2026,
  title={SAM-Bench: A Comprehensive Benchmark for Evaluating AI Shopping Assistants},
  author={SnapCart AI Research Team},
  year={2026},
  url={https://github.com/snapcart-ai/sam-bench}
}
```

## License

This model is released under the Apache 2.0 License, consistent with the Qwen2.5-7B-Instruct license.
