Instructions for using GSAI-ML/iLLaDA-8B-Instruct with libraries and local apps.

Libraries

How to use GSAI-ML/iLLaDA-8B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="GSAI-ML/iLLaDA-8B-Instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("GSAI-ML/iLLaDA-8B-Instruct", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use GSAI-ML/iLLaDA-8B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "GSAI-ML/iLLaDA-8B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GSAI-ML/iLLaDA-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
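
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and it assumes a server is already listening on the default address from the command above (pass `base_url="http://localhost:30000"` for the SGLang server described later).

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="GSAI-ML/iLLaDA-8B-Instruct"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (needs a running server, so it is left commented out):
# print(ask("What is the capital of France?"))
```

Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent.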

Use Docker

docker model run hf.co/GSAI-ML/iLLaDA-8B-Instruct

SGLang

How to use GSAI-ML/iLLaDA-8B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "GSAI-ML/iLLaDA-8B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GSAI-ML/iLLaDA-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "GSAI-ML/iLLaDA-8B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GSAI-ML/iLLaDA-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use GSAI-ML/iLLaDA-8B-Instruct with Docker Model Runner:
```
docker model run hf.co/GSAI-ML/iLLaDA-8B-Instruct
```

iLLaDA-8B-Instruct

iLLaDA is an 8B-parameter, fully bidirectional masked diffusion language model trained from scratch on 12T pre-training tokens. It supports an 8192-token context length and variable-length generation, and uses confidence-based scoring for multiple-choice evaluation.

For more details, please refer to the paper: Improved Large Language Diffusion Models.

Inference and evaluation code can be found in the LLaDA GitHub Repository.

How to Use

You can load the model and tokenizer using the transformers library:

import torch
from transformers import AutoModel, AutoTokenizer

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('GSAI-ML/iLLaDA-8B-Instruct', trust_remote_code=True)
model = AutoModel.from_pretrained('GSAI-ML/iLLaDA-8B-Instruct', trust_remote_code=True, torch_dtype=torch.bfloat16)

For customized generation and evaluation scripts (such as generate.py and chat.py), please visit the official GitHub repository.

Architecture

| | iLLaDA 8B | LLaDA 8B |
|---|---|---|
| Layers | 32 | 32 |
| Model dimension | 4096 | 4096 |
| Attention heads | 32 | 32 |
| Key/Value heads | 8 | 32 |
| FFN dimension | 14,336 | 12,288 |
| Vocabulary size | 155,136 | 126,464 |
| Maximum sequence length | 8192 | 4096 |
| Embedding and LM-head | Tied | Untied |
| Total parameters | 7.62B | 8.02B |
| Non-embedding parameters | 6.98B | 6.98B |
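
The parameter counts in the table can be roughly reproduced from the other rows. The sketch below is a back-of-the-envelope check, not the authors' accounting: it assumes bias-free attention projections, a gated FFN with three weight matrices, and it ignores normalization parameters. The function name `count_params` is hypothetical.

```python
def count_params(layers, d_model, n_heads, n_kv_heads, d_ffn, vocab, tied):
    """Return (non_embedding, total) parameter counts under the stated assumptions."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim
    # Q and O projections are d_model x d_model; K and V are d_model x kv_dim.
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim
    # Assumed gated FFN: gate, up, and down projections.
    ffn = 3 * d_model * d_ffn
    non_embedding = layers * (attn + ffn)
    # Tied weights share one matrix between the embedding and the LM head.
    embedding = vocab * d_model * (1 if tied else 2)
    return non_embedding, non_embedding + embedding


illada = count_params(32, 4096, 32, 8, 14336, 155136, tied=True)
llada = count_params(32, 4096, 32, 32, 12288, 126464, tied=False)

for name, (non_emb, total) in [("iLLaDA 8B", illada), ("LLaDA 8B", llada)]:
    print(f"{name}: non-embedding {non_emb / 1e9:.3f}B, total {total / 1e9:.3f}B")
```

Under these assumptions both models come out at the same non-embedding count, since the parameters saved by using 8 key/value heads are exactly offset by the larger FFN dimension. The totals agree with the table to within about 0.01B.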

Benchmark Results of Instruct Models

| | iLLaDA 8B | LLaDA 8B | Dream 7B | Qwen2.5 7B |
|---|---|---|---|---|
| Model type | Diffusion | Diffusion | Diffusion | AR |
| MMLU | 71.6 | 65.5 | 67.0 | 76.6 |
| MMLU-Pro | 52.3 | 37.0 | 43.3 | 56.3 |
| MMLU-Redux | 76.4 | 68.9 | 76.3 | 75.7 |
| GSM8K | 89.0 | 77.5 | 81.0 | 91.6 |
| MATH | 56.7 | 42.2 | 39.2 | 75.5 |
| HumanEval | 65.9 | 49.4 | 55.5 | 84.8 |
| MBPP | 58.0 | 41.0 | 58.8 | 79.2 |
| Average | 67.1 | 54.5 | 60.2 | 77.1 |
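
The Average row appears to be the unweighted mean of the seven benchmark scores. A minimal sketch that recomputes it from the values in the table, assuming equal weighting and rounding to one decimal place:

```python
# Scores in table order: MMLU, MMLU-Pro, MMLU-Redux, GSM8K, MATH, HumanEval, MBPP.
scores = {
    "iLLaDA 8B": [71.6, 52.3, 76.4, 89.0, 56.7, 65.9, 58.0],
    "LLaDA 8B": [65.5, 37.0, 68.9, 77.5, 42.2, 49.4, 41.0],
    "Dream 7B": [67.0, 43.3, 76.3, 81.0, 39.2, 55.5, 58.8],
    "Qwen2.5 7B": [76.6, 56.3, 75.7, 91.6, 75.5, 84.8, 79.2],
}

# Unweighted mean per model, rounded the same way as the table.
averages = {name: round(sum(vals) / len(vals), 1) for name, vals in scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg}")
```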

Citation

@article{nie2025large,
  title={Large Language Diffusion Models},
  author={Nie, Shen and Zhu, Fengqi and You, Zebin and Zhang, Xiaolu and Ou, Jingyang and Hu, Jun and Zhou, Jun and Lin, Yankai and Wen, Ji-Rong and Li, Chongxuan},
  journal={arXiv preprint arXiv:2502.09992},
  year={2025}
}

Safetensors model size: 8B params. Tensor type: BF16.

Papers for GSAI-ML/iLLaDA-8B-Instruct

Improved Large Language Diffusion Models (Paper 2606.25331)

Large Language Diffusion Models (Paper 2502.09992, published Feb 14, 2025)