Instructions to use Gensyn/Qwen2.5-1.5B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Gensyn/Qwen2.5-1.5B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Gensyn/Qwen2.5-1.5B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Gensyn/Qwen2.5-1.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Gensyn/Qwen2.5-1.5B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Inference
Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Gensyn/Qwen2.5-1.5B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Gensyn/Qwen2.5-1.5B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Gensyn/Qwen2.5-1.5B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Gensyn/Qwen2.5-1.5B-Instruct

SGLang

How to use Gensyn/Qwen2.5-1.5B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Gensyn/Qwen2.5-1.5B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Gensyn/Qwen2.5-1.5B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Gensyn/Qwen2.5-1.5B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Gensyn/Qwen2.5-1.5B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Gensyn/Qwen2.5-1.5B-Instruct with Docker Model Runner:
```
docker model run hf.co/Gensyn/Qwen2.5-1.5B-Instruct
```

H100 configuration

by wyceee - opened Apr 21, 2025

Discussion

wyceee

Apr 21, 2025

Hey,

Have you guys tested the 1.5b model with the H100? If so, what are the best configurations for the H100 running the 1.5b model? I'm still running it with the basic config.

Also, can we expect a bigger model soon?

# Model arguments
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2
bf16: true
tf32: true

# Dataset arguments
dataset_id_or_path: 'openai/gsm8k'

# Lora Arguments
# No LoRA is used here

# Training arguments
max_steps: 150 # Original 450
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 5.0e-7 # 1.0e-6 as in the deepseek math paper 5-e7 from https://hijkzzz.notion.site/unraveling-rlhf-and-its-variants-engineering-insights#147d9a33ecc9806090f3d5c749d31f05
lr_scheduler_type: cosine
warmup_ratio: 0.03
# GRPO specific parameters
beta: 0.001 # 0.04 as in the deepseek math paper 0.001 from https://hijkzzz.notion.site/unraveling-rlhf-and-its-variants-engineering-insights#147d9a33ecc9806090f3d5c749d31f05
max_prompt_length: 256
max_completion_length: 1024
num_generations: 8
use_vllm: true
# vllm_device: "cuda:3"
vllm_gpu_memory_utilization: 0.2

# Logging arguments
logging_strategy: steps
logging_steps: 2
report_to:
- tensorboard
save_strategy: "steps"
save_steps: 25
seed: 42

# Hugging Face Hub
# push_to_hub: false
# hub_strategy: every_save

# Script arguments
public_maddr: "/ip4/38.101.215.12/tcp/30002"
host_maddr: "/ip4/0.0.0.0/tcp/38331"
max_rounds: 10000

benfielding

Gensyn org Apr 23, 2025

Stay tuned on this, some updates coming soon..

andriuusa

Apr 23, 2025

I'm interested too. Running this right now:

Training arguments

max_steps: 100
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 1.0e-6
max_prompt_length: 384
max_completion_length: 1024
num_generations: 4
vllm_gpu_memory_utilization: 0.3

Will probably increase the steps further if it runs stably.

jpdreamthug

May 12, 2025

any updates with h100 configs and models to train?

nkxilx

Dec 15, 2025

any updates?

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment