iLLaDA-8B-Instruct

iLLaDA is an 8B-parameter, fully bidirectional masked diffusion language model trained from scratch on 12T pre-training tokens. It supports an 8192-token context length and variable-length generation, and it uses confidence-based scoring for multiple-choice evaluation.

For more details, please refer to the paper: Improved Large Language Diffusion Models.

Inference and evaluation code can be found in the LLaDA GitHub Repository.

How to Use

You can load the model and tokenizer using the transformers library:

import torch
from transformers import AutoModel, AutoTokenizer

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('GSAI-ML/iLLaDA-8B-Instruct', trust_remote_code=True)
model = AutoModel.from_pretrained('GSAI-ML/iLLaDA-8B-Instruct', trust_remote_code=True, torch_dtype=torch.bfloat16)

For customized generation and evaluation scripts (such as generate.py and chat.py), please visit the official GitHub repository.

Architecture

| | iLLaDA 8B | LLaDA 8B |
|---|---|---|
| Layers | 32 | 32 |
| Model dimension | 4096 | 4096 |
| Attention heads | 32 | 32 |
| Key/Value heads | 8 | 32 |
| FFN dimension | 14,336 | 12,288 |
| Vocabulary size | 155,136 | 126,464 |
| Maximum sequence length | 8192 | 4096 |
| Embedding and LM-head | Tied | Untied |
| Total parameters | 7.62B | 8.02B |
| Non-embedding parameters | 6.98B | 6.98B |
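
The parameter counts in the architecture table can be roughly reproduced from the other rows. The sketch below is a back-of-the-envelope check, not the authors' accounting. It assumes bias-free linear layers, a gated FFN with three weight matrices, two normalization weight vectors per layer plus one final norm, and a head dimension equal to the model dimension divided by the number of attention heads. None of these details are stated in this model card, and the names `Config` and `count_params` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Config:
    layers: int
    d_model: int
    n_heads: int
    n_kv_heads: int
    d_ffn: int
    vocab: int
    tied: bool  # True if the embedding and LM head share one matrix


def count_params(cfg: Config) -> tuple[int, int]:
    """Return (non_embedding, total) parameter counts under the stated assumptions."""
    head_dim = cfg.d_model // cfg.n_heads       # assumed: 4096 / 32 = 128
    kv_dim = cfg.n_kv_heads * head_dim          # width of the K and V projections
    # Attention: Q and O are d_model x d_model, K and V are d_model x kv_dim.
    attn = 2 * cfg.d_model * cfg.d_model + 2 * cfg.d_model * kv_dim
    # Assumed gated FFN: gate, up, and down projections.
    ffn = 3 * cfg.d_model * cfg.d_ffn
    # Assumed two norm weight vectors per layer, plus one final norm.
    norms = 2 * cfg.d_model
    non_embedding = cfg.layers * (attn + ffn + norms) + cfg.d_model
    # A tied embedding/LM head is counted once, an untied pair twice.
    embedding = cfg.vocab * cfg.d_model * (1 if cfg.tied else 2)
    return non_embedding, non_embedding + embedding


ILLADA_8B = Config(32, 4096, 32, 8, 14336, 155136, tied=True)
LLADA_8B = Config(32, 4096, 32, 32, 12288, 126464, tied=False)

for name, cfg in [("iLLaDA 8B", ILLADA_8B), ("LLaDA 8B", LLADA_8B)]:
    non_emb, total = count_params(cfg)
    print(f"{name}: non-embedding {non_emb / 1e9:.2f}B, total {total / 1e9:.2f}B")
```

Under these assumptions, the fewer key/value heads in iLLaDA free up exactly the parameters spent on its wider FFN, which is consistent with both models reporting the same non-embedding count. The difference in total parameters then comes from the vocabulary size and from tying the embedding with the LM head.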

Benchmark Results of Instruct Models

| | iLLaDA 8B | LLaDA 8B | Dream 7B | Qwen2.5 7B |
|---|---|---|---|---|
| Model type | Diffusion | Diffusion | Diffusion | AR |
| MMLU | 71.6 | 65.5 | 67.0 | 76.6 |
| MMLU-Pro | 52.3 | 37.0 | 43.3 | 56.3 |
| MMLU-Redux | 76.4 | 68.9 | 76.3 | 75.7 |
| GSM8K | 89.0 | 77.5 | 81.0 | 91.6 |
| MATH | 56.7 | 42.2 | 39.2 | 75.5 |
| HumanEval | 65.9 | 49.4 | 55.5 | 84.8 |
| MBPP | 58.0 | 41.0 | 58.8 | 79.2 |
| Average | 67.1 | 54.5 | 60.2 | 77.1 |
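
The Average row appears to be the unweighted mean of the seven benchmark scores, rounded to one decimal place. The short check below recomputes it from the values in the table. The variable names are illustrative only, and the averaging rule is inferred from the numbers rather than stated in this model card.

```python
from statistics import mean

# Scores in table order: MMLU, MMLU-Pro, MMLU-Redux, GSM8K, MATH, HumanEval, MBPP.
scores = {
    "iLLaDA 8B": [71.6, 52.3, 76.4, 89.0, 56.7, 65.9, 58.0],
    "LLaDA 8B": [65.5, 37.0, 68.9, 77.5, 42.2, 49.4, 41.0],
    "Dream 7B": [67.0, 43.3, 76.3, 81.0, 39.2, 55.5, 58.8],
    "Qwen2.5 7B": [76.6, 56.3, 75.7, 91.6, 75.5, 84.8, 79.2],
}

# Unweighted mean per model, rounded to one decimal as in the table.
averages = {model: round(mean(vals), 1) for model, vals in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg}")
```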

Citation

@article{nie2025large,
  title={Large Language Diffusion Models},
  author={Nie, Shen and Zhu, Fengqi and You, Zebin and Zhang, Xiaolu and Ou, Jingyang and Hu, Jun and Zhou, Jun and Lin, Yankai and Wen, Ji-Rong and Li, Chongxuan},
  journal={arXiv preprint arXiv:2502.09992},
  year={2025}
}