iLLaDA-8B-Instruct
iLLaDA is an 8B fully bidirectional masked diffusion language model trained from scratch on 12T pre-training tokens. It supports an 8192-token context length and variable-length generation, and uses confidence-based scoring for multiple-choice evaluation.
For more details, please refer to the paper: Improved Large Language Diffusion Models.
Inference and evaluation code can be found in the LLaDA GitHub repository.
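As a rough illustration of what "masked diffusion" means in the description above, the sketch below shows a generic forward masking step, in which each token is independently replaced by a mask token with probability `t`. This is a toy example and not the iLLaDA training or sampling code; the `MASK` placeholder, the `forward_mask` helper, and the word-level tokens are all hypothetical and chosen only for readability.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder, not the model's actual mask token


def forward_mask(tokens, t, rng):
    """Independently replace each token with MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)
tokens = ["The", "capital", "of", "France", "is", "Paris"]

# t = 0 leaves the sequence intact, t = 1 masks everything,
# and intermediate values mask a random subset of positions.
for t in (0.0, 0.5, 1.0):
    print(t, forward_mask(tokens, t, rng))
```

A masked diffusion model is trained to predict the original tokens at the masked positions, and because it sees context on both sides of each mask it can attend bidirectionally.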
How to Use
You can load the model and tokenizer using the transformers library:
import torch
from transformers import AutoModel, AutoTokenizer
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('GSAI-ML/iLLaDA-8B-Instruct', trust_remote_code=True)
model = AutoModel.from_pretrained('GSAI-ML/iLLaDA-8B-Instruct', trust_remote_code=True, torch_dtype=torch.bfloat16)
For customized generation and evaluation scripts (such as generate.py and chat.py), please visit the official GitHub repository.
Architecture
| | iLLaDA 8B | LLaDA 8B |
|---|---|---|
| Layers | 32 | 32 |
| Model dimension | 4096 | 4096 |
| Attention heads | 32 | 32 |
| Key/Value heads | 8 | 32 |
| FFN dimension | 14,336 | 12,288 |
| Vocabulary size | 155,136 | 126,464 |
| Maximum sequence length | 8192 | 4096 |
| Embedding and LM-head | Tied | Untied |
| Total parameters | 7.62B | 8.02B |
| Non-embedding parameters | 6.98B | 6.98B |
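The parameter counts in the table can be approximately reproduced from the other rows. The sketch below is a back-of-the-envelope check, not an official accounting: it assumes a gated feed-forward block with three projection matrices, no bias terms, a head dimension equal to the model dimension divided by the number of attention heads, and it ignores normalization parameters. The `count_params` helper is hypothetical.

```python
def count_params(layers, d_model, n_heads, n_kv_heads, d_ffn, vocab, tied):
    """Return (non_embedding, total) parameter counts under the stated assumptions."""
    head_dim = d_model // n_heads
    q_and_out = 2 * d_model * n_heads * head_dim    # query and output projections
    k_and_v = 2 * d_model * n_kv_heads * head_dim   # key and value projections
    ffn = 3 * d_model * d_ffn                       # assumed gated FFN: gate, up, down
    non_embedding = layers * (q_and_out + k_and_v + ffn)
    # Tied weights share one matrix between the embedding and the LM head.
    embedding = vocab * d_model * (1 if tied else 2)
    return non_embedding, non_embedding + embedding


illada = count_params(32, 4096, 32, 8, 14336, 155136, tied=True)
llada = count_params(32, 4096, 32, 32, 12288, 126464, tied=False)

for name, (non_emb, total) in (("iLLaDA 8B", illada), ("LLaDA 8B", llada)):
    print(f"{name}: non-embedding {non_emb / 1e9:.2f}B, total {total / 1e9:.2f}B")
```

Under these assumptions both models come to about 6.98B non-embedding parameters: the smaller key/value projections in iLLaDA offset its wider FFN. The totals come to about 7.61B and 8.02B; the small gap to the table's 7.62B may come from rounding or from parameters this sketch leaves out.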
Benchmark Results of Instruct Models
| | iLLaDA 8B | LLaDA 8B | Dream 7B | Qwen2.5 7B |
|---|---|---|---|---|
| Model type | Diffusion | Diffusion | Diffusion | Autoregressive (AR) |
| MMLU | 71.6 | 65.5 | 67.0 | 76.6 |
| MMLU-Pro | 52.3 | 37.0 | 43.3 | 56.3 |
| MMLU-Redux | 76.4 | 68.9 | 76.3 | 75.7 |
| GSM8K | 89.0 | 77.5 | 81.0 | 91.6 |
| MATH | 56.7 | 42.2 | 39.2 | 75.5 |
| HumanEval | 65.9 | 49.4 | 55.5 | 84.8 |
| MBPP | 58.0 | 41.0 | 58.8 | 79.2 |
| Average | 67.1 | 54.5 | 60.2 | 77.1 |
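The Average row appears to be the unweighted mean of the seven benchmark scores in each column. The short check below recomputes it from the values in the table; the unweighted mean is an assumption, and the `scores` dictionary simply transcribes the rows above in order (MMLU, MMLU-Pro, MMLU-Redux, GSM8K, MATH, HumanEval, MBPP).

```python
# Scores transcribed from the table, in row order.
scores = {
    "iLLaDA 8B": [71.6, 52.3, 76.4, 89.0, 56.7, 65.9, 58.0],
    "LLaDA 8B": [65.5, 37.0, 68.9, 77.5, 42.2, 49.4, 41.0],
    "Dream 7B": [67.0, 43.3, 76.3, 81.0, 39.2, 55.5, 58.8],
    "Qwen2.5 7B": [76.6, 56.3, 75.7, 91.6, 75.5, 84.8, 79.2],
}

# Unweighted mean per model, rounded to one decimal place like the table.
averages = {name: round(sum(vals) / len(vals), 1) for name, vals in scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg}")
```

The recomputed means match the Average row for all four models.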
Citation
@article{nie2025large,
title={Large Language Diffusion Models},
author={Nie, Shen and Zhu, Fengqi and You, Zebin and Zhang, Xiaolu and Ou, Jingyang and Hu, Jun and Zhou, Jun and Lin, Yankai and Wen, Ji-Rong and Li, Chongxuan},
journal={arXiv preprint arXiv:2502.09992},
year={2025}
}