H100 configuration
Hey,
Have you guys tested the 1.5B model on the H100? If so, what are the best configurations for running it on the H100? I'm still running it with the basic config.
Also, can we expect a bigger model soon?
# Model arguments
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2
bf16: true
tf32: true
# Dataset arguments
dataset_id_or_path: 'openai/gsm8k'
# Lora Arguments
# No LoRA is used here
# Training arguments
max_steps: 150 # Original 450
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 5.0e-7 # the DeepSeekMath paper uses 1.0e-6; 5.0e-7 is from https://hijkzzz.notion.site/unraveling-rlhf-and-its-variants-engineering-insights#147d9a33ecc9806090f3d5c749d31f05
lr_scheduler_type: cosine
warmup_ratio: 0.03
# GRPO specific parameters
beta: 0.001 # the DeepSeekMath paper uses 0.04; 0.001 is from https://hijkzzz.notion.site/unraveling-rlhf-and-its-variants-engineering-insights#147d9a33ecc9806090f3d5c749d31f05
max_prompt_length: 256
max_completion_length: 1024
num_generations: 8
use_vllm: true
# vllm_device: "cuda:3"
vllm_gpu_memory_utilization: 0.2
# Logging arguments
logging_strategy: steps
logging_steps: 2
report_to:
- tensorboard
save_strategy: "steps"
save_steps: 25
seed: 42
# Hugging Face Hub
# push_to_hub: false
# hub_strategy: every_save
# Script arguments
public_maddr: "/ip4/38.101.215.12/tcp/30002"
host_maddr: "/ip4/0.0.0.0/tcp/38331"
max_rounds: 10000
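For reference, here is a rough sketch of what the scheduler settings above work out to. It assumes the trainer follows the usual Transformers behavior: linear warmup over ceil(warmup_ratio * max_steps) steps, then a half-cosine decay to zero. The helper name `lr_at` is made up for illustration; it is plain Python, not a library call, and the real trainer may differ in details.

```python
import math

# Values copied from the config above.
MAX_STEPS = 150
LEARNING_RATE = 5.0e-7
WARMUP_RATIO = 0.03

# Assumption: the number of warmup steps is rounded up.
WARMUP_STEPS = math.ceil(MAX_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the configured learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup schedule that has elapsed (0.0 to 1.0).
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the configured learning rate down to 0.
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(f"warmup steps: {WARMUP_STEPS}")
    for step in (0, WARMUP_STEPS, MAX_STEPS // 2, MAX_STEPS):
        print(f"step {step:3d}: lr = {lr_at(step):.3e}")
```

With only 150 steps, the warmup is just a handful of steps, so nearly the whole run sits on the cosine decay.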
Stay tuned on this, some updates are coming soon.
I'm interested too. Running this right now:
# Training arguments
max_steps: 100
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 1.0e-6
max_prompt_length: 384
max_completion_length: 1024
num_generations: 4
vllm_gpu_memory_utilization: 0.3
Will probably increase the steps further if it runs stably.
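To compare the two setups in this thread on paper, here is a small sketch of the per-device arithmetic. It assumes `per_device_train_batch_size` counts sequences per forward/backward pass on one GPU; how the GRPO trainer splits that between prompts and generations depends on the TRL version, so treat the numbers as rough upper bounds and not as measurements. The `RunConfig` class and its field names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the settings quoted in this thread."""
    per_device_train_batch_size: int
    gradient_accumulation_steps: int
    max_prompt_length: int
    max_completion_length: int

    @property
    def sequences_per_optimizer_step(self) -> int:
        # Sequences processed on one device before each weight update.
        return self.per_device_train_batch_size * self.gradient_accumulation_steps

    @property
    def max_tokens_per_sequence(self) -> int:
        # Longest possible prompt plus longest possible completion.
        return self.max_prompt_length + self.max_completion_length

    @property
    def max_tokens_per_optimizer_step(self) -> int:
        # Upper bound only: real sequences are usually shorter than the maximum.
        return self.sequences_per_optimizer_step * self.max_tokens_per_sequence


basic = RunConfig(1, 8, 256, 1024)   # the basic config from the first post
tweaked = RunConfig(2, 8, 384, 1024)  # the config from this reply

if __name__ == "__main__":
    for name, cfg in (("basic", basic), ("tweaked", tweaked)):
        print(
            f"{name}: {cfg.sequences_per_optimizer_step} sequences/step, "
            f"up to {cfg.max_tokens_per_optimizer_step} tokens/step per device"
        )
```

On these assumptions the second config pushes roughly twice as many tokens through each optimizer step, which is why the higher `vllm_gpu_memory_utilization` and the stability check are worth watching.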
Any updates on H100 configs and models to train?
Any updates?