miniLLM-0.1B

A small (~109M parameters) causal language model pretrained from scratch on OpenWebText.

The generation script for this model is available at https://github.com/Cerynitius/llmTrain/raw/refs/heads/main/generate.py
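
Independently of that script, the model can also be loaded through the Transformers library. The snippet below is a minimal generation sketch, assuming the Hub repo id `Hippocrene/MiniLLM-0.1B` shown on this page; the prompt and sampling settings are illustrative choices, not values taken from generate.py.

```python
# Minimal generation sketch with Transformers (repo id and sampling
# parameters are assumptions, not taken from the repository's generate.py).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Hippocrene/MiniLLM-0.1B"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

prompt = "The history of the printing press"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; a 109M pretrained-only model tends to drift,
# so keeping generations short usually gives more coherent output.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```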

Final training loss: 3.4

Model Details

| Attribute | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | ~109M |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Layers | 10 |
| Intermediate Size | 2048 |
| Max Sequence Length | 1024 |
| Vocabulary Size | 50257 |
| Tokenizer | GPT-2 (BPE) |
| Positional Encoding | RoPE (θ=10000) |
| Activation | SiLU |
| Tie Word Embeddings | Yes |
| Precision (training) | bfloat16 |
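
For reference, the snippet below is a minimal sketch of how the values in this table map onto a Transformers LlamaConfig; any field not listed in the table is left at its library default, which is an assumption about the actual training setup.

```python
# Sketch of a LlamaConfig built from the table above; unlisted fields keep
# their Transformers defaults (an assumption, not confirmed by the card).
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=50257,             # GPT-2 BPE vocabulary
    hidden_size=768,
    intermediate_size=2048,
    num_hidden_layers=10,
    num_attention_heads=12,
    max_position_embeddings=1024,
    rope_theta=10000.0,
    hidden_act="silu",
    tie_word_embeddings=True,
)

model = LlamaForCausalLM(config)
# With tied input/output embeddings this lands at roughly 109M parameters,
# consistent with the figure in the table.
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```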

Limitations

This is a small-scale pretrained model intended for research and educational purposes.

The training script is available at https://github.com/Cerynitius/llmTrain

It is not suitable for production use.

Outputs may be incoherent, biased, or factually incorrect.
