Instructions to use google/gemma-4-31B-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-4-31B-it with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-4-31B-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("google/gemma-4-31B-it")
model = AutoModelForImageTextToText.from_pretrained("google/gemma-4-31B-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-4-31B-it with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-4-31B-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-4-31B-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/google/gemma-4-31B-it
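The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server from the previous step is listening on `http://localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field. The helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

MODEL_ID = "google/gemma-4-31B-it"


def build_chat_request(prompt, image_url, model=MODEL_ID):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def extract_reply(body):
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return body["choices"][0]["message"]["content"]


def chat(base_url, payload, timeout=120):
    """POST the payload to <base_url>/v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))


# Example call (requires a running server, so it is left commented out):
# payload = build_chat_request(
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(chat("http://localhost:8000", payload))
```

Because only the base URL changes, the same sketch should work against any other server that exposes the same OpenAI-compatible route, for example by passing a different host and port to `chat`.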
- SGLang
How to use google/gemma-4-31B-it with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/gemma-4-31B-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-4-31B-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/gemma-4-31B-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-4-31B-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
- Docker Model Runner
How to use google/gemma-4-31B-it with Docker Model Runner:
docker model run hf.co/google/gemma-4-31B-it
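The vLLM and SGLang requests above pass the image as a public URL. For an image stored on disk, one common approach with OpenAI-style APIs is to embed the file as a base64 `data:` URL in the same `image_url` field. The sketch below shows that encoding step. Whether a particular server build accepts data URLs is an assumption to verify, and the helper names `image_to_data_url` and `image_part` are hypothetical.

```python
import base64
import mimetypes


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL (data:<mime>;base64,<payload>)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for: {path}")
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_part(path):
    """Build an image content part shaped like the one in the curl examples."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

The returned dictionary can replace the `image_url` entry in the `content` list of the request body. Base64 inflates the payload by roughly a third, so large images may be better served from a URL.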
- Tokenizer problems, or just quants? (#105 opened 7 days ago by Nesy1)
- Drops in performance randomly VLLM on H200 and B200 with 2 GPUs (#104 opened 7 days ago by jtvino)
- gemma4-31b-it with MTP enabled works on dual DGX SPARKs now (#103 opened 9 days ago by kenleo; comments: 1)
- Gemma4 is super slow on vllm docker, max to max 15 token per second only (4 X L4 GPU) (#102 opened 9 days ago by vi-ajayb; reactions: 👍 1; comments: 1)
- This model is so good, but the KV cache ruins it. (#100 opened 13 days ago by UniversalLove333; comments: 1)
- Update README.md (#99 opened 14 days ago by hectorruiz9)
- o-O too good to be true (#98 opened 15 days ago by mekasu)
- could ya' add gpt j 6b to google ai studio? completion mode (#97 opened 15 days ago by Tralalabs)
- Weird token after underscore `_//` (#96 opened 16 days ago by yamikumods; comments: 3)
- CodeGemma 4 collection (#95 opened 17 days ago by cbrain; reactions: ➕ 2; comments: 1)
- [Feature request] Eliminate pre-attention RMSNorm in Gemma 4 via scale invariance + weight folding (#93 opened 18 days ago by graefics)
- Reviews of Gemma 4 (#92 opened 19 days ago by Juanoto2012; comments: 1)
- Update chat_template.jinja to address JSON Schema shapes that do not expose their meaning through a direct top-level type (#91 opened 20 days ago by sigjhl; reactions: 👍 6)
- Possible chat_template.jinja issue: nullable $ref tool schemas are rendered as empty types (#87 opened 20 days ago by sigjhl; comments: 1)
- When training models from the gemma4 series using GRPO, an abnormally high grad norm was observed (#84 opened 21 days ago by mamazi00)
- add newlines and thinking tokens to template to avoid having to compute 3 extra tokens per generation in chat completion+reasoning (#83 opened 21 days ago by quasar-of-mikus; reactions: 👍 2; comments: 2)
- Update README.md (#81 opened 23 days ago by hectorruiz9)
- Gemma 4 models are way too paranoid about dates, any tips? (#80 opened 24 days ago by Ahugm; reactions: 🔥 3; comments: 3)
- Incorrect output in Gemma 4: seeking a solution to the problem ( la la la ) (#79 opened 24 days ago by Lintrarius; comments: 7)
- Fix chat_template: emit empty <|channel>thought\n<channel|> wrapper for existing asst turns (#78 opened 24 days ago by flotherxi; comments: 1)
- [Bug] chat_template: missing <|channel>thought\n<channel|> wrapper for non-thinking SFT / multi-turn (#77 opened 24 days ago by flotherxi; reactions: 👍 1; comments: 2)
- Thinking erratic at 30000+ context (#76 opened 25 days ago by JeslynMcKenzie; comments: 1)
- Multilingual Support List (#75 opened 25 days ago by abcdvzz; reactions: ➕ 1)
- Will there be a small model like gemma-3-270m? (#74 opened 26 days ago by ymcki; reactions: 🔥 1)
- Unexpected loss spikes and performance degradation when fine-tuning Gemma 4 (google/gemma-4-31B-it) (#73 opened 26 days ago by rstaruch; comments: 1)
- Add ParseBench evaluation results (#72 opened about 1 month ago by boyang-runllama; comments: 4)
- Will there be a small model for speculative decoding? (#71 opened about 1 month ago by Regrin; comments: 3)
- Imagen 1 (2022) Should Be Open Sourced (#70 opened about 1 month ago by Tralalabs; reactions: 👍 5)
- Question about tool-calling order in chat_template.jinja (#67 opened about 1 month ago by json0; comments: 1)
- gemma-4-31b-it unable to execute tool calling (#66 opened about 1 month ago by Naman2302; comments: 3)
- Do Gemma 4 models work well? (#65 opened about 1 month ago by Regrin; comments: 3)
- fix: embed chat_template in tokenizer_config.json (#64 opened about 1 month ago by NERDDISCO)
- Infinite loop is not fixed even with Google API (#63 opened about 1 month ago by alexcardo; reactions: 👀 1; comments: 2)
- Chat Template has a bug. (#62 opened about 1 month ago by Reithan; reactions: 🤗 2; comments: 5)
- why print rightarrow (#61 opened about 1 month ago by wangtf-Kevin; reactions: 👀❤️ 3; comments: 5)
- Can anyone improve the model using the Rys methodology, by duplicating a block of layers? (#60 opened about 1 month ago by Regrin; comments: 11)
- Strange behaviour of the tokenizer (#58 opened about 1 month ago by andercorral; comments: 2)
- Good Workflow (#57 opened about 1 month ago by anthoekfj; comments: 2)
- fix: function calling formatting in chat template (#55 opened about 1 month ago by RyanMullins; reactions: ❤️ 2; comments: 1)
- Chat template is too complicated that even Gemma 4 itself has no idea how to parse it (#53 opened about 1 month ago by alexcardo; comments: 1)
- Hardware requirement (#52 opened about 1 month ago by Charan01; reactions: 👍👀 3; comments: 13)
- Tokens per Image Parameter? (#51 opened about 1 month ago by buckeye17; comments: 2)
- Guys please add the MTP to this model (#50 opened about 1 month ago by Narutoouz; reactions: 🔥 5; comments: 5)
- Will there be QAT models? (#49 opened about 1 month ago by Regrin; reactions: 🤝👍 12; comments: 3)
- Gemma 4 E4B will be as encyclopedically well-read as the 12b model? (#48 opened about 1 month ago by Regrin; comments: 3)
- Create BTS (#47 opened about 1 month ago by deleted)
- brokersponsor (#46 opened about 1 month ago by Brokersponsor; comments: 1)
- Update README.md (#45 opened about 1 month ago by Brokersponsor)
- Question about math_vision and mmmu_pro evaluation result (#44 opened about 1 month ago by JjjjjZzz; comments: 1)