Dataset: roneneldan/TinyStories
A 5.0M-parameter language model designed for 2-CPU/5GB RAM environments. Trained for 2 hours on a free-tier cloud CPU. No GPU is used for training or for inference.
Embedding (4K × 256, float, weight-tied)
  ↓ 6 × NovaBlock:
      LayerNorm → MultiHeadAttention (RoPE) + residual
      LayerNorm → FFN (GELU, 256→512→256) + residual
  ↓ LayerNorm → Output Head (tied to embedding)
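The attention layer above uses RoPE (rotary position embeddings). As a rough illustration of the idea, here is a minimal pure-Python sketch. It assumes the common formulation with base 10000 and consecutive dimension pairs; the function names `rope` and `dot` are hypothetical, and the actual implementation in model.py may differ.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `pos`.

    Assumption: standard RoPE frequencies base ** (-i / d) for even index i.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # 2-D rotation of the pair (x, y) by angle theta
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]
# Both pairs of positions are 2 apart, so the two scores should agree.
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 12), rope(k, 10))
print(abs(score_near - score_far) < 1e-9)
```

The property being checked is that the score between a rotated query and a rotated key depends only on their relative offset, which is why RoPE needs no separate learned position table.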
| Model | Params | BPC | PPL | Hardware |
|---|---|---|---|---|
| FlashLM v5.2 | 5.0M | 0.78 | 10.56 | 2-thread CPU |
| FlashLM v4 "Bolt" | 4.3M | 0.88 | 15.05 | 2-thread CPU |
| TinyStories-1M | 3.7M | 0.62 | 6.72 | V100 GPU |
v5.2 improves on v4 by about 11% relative in BPC (0.88 → 0.78) with the same training time (2 hours).
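The relative figure can be checked directly from the table. The sketch below also converts perplexity to bits per token and derives the characters-per-token ratio implied by each row. That last step assumes BPC was computed as bits per token divided by average characters per token, which the card does not state; the helper names are hypothetical.

```python
import math

def relative_improvement(old: float, new: float) -> float:
    """Fractional reduction going from `old` to `new` (lower is better)."""
    return (old - new) / old

def bits_per_token(ppl: float) -> float:
    """Per-token perplexity expressed as bits per token."""
    return math.log2(ppl)

def implied_chars_per_token(ppl: float, bpc: float) -> float:
    """Assumption: BPC = bits per token / average characters per token."""
    return bits_per_token(ppl) / bpc

# (BPC, PPL) values copied from the table above
rows = {
    "FlashLM v5.2": (0.78, 10.56),
    "FlashLM v4": (0.88, 15.05),
}

print(f"relative BPC gain: {relative_improvement(0.88, 0.78):.1%}")
for name, (bpc, ppl) in rows.items():
    print(name, round(implied_chars_per_token(ppl, bpc), 2), "chars/token (implied)")
```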
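import torch
import torch.nn as nn
import torch.nn.functional as F
from tokenizers import Tokenizer

# Assumption: model.py from the repository is on the import path
# and defines NovaIgnitionLM.
from model import NovaIgnitionLM

# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Load model (requires the architecture definition, see model.py)
model = NovaIgnitionLM(vocab=4096, d_model=256, n_layers=6,
                       n_heads=4, d_head=64, d_ffn=512)
model.load_state_dict(torch.load("best.pt", map_location="cpu", weights_only=True))
model.eval()  # inference mode: disables dropout if the model uses it

# Generate
prompt = "Once upon a time"
ids = tokenizer.encode(prompt).ids
x = torch.tensor([ids])  # shape: (1, sequence_length)
with torch.no_grad():  # no gradients needed for generation
    out = model.generate(x, max_new_tokens=80, temperature=0.8, top_k=40)
text = tokenizer.decode(out[0].tolist())
print(text)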
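The `temperature` and `top_k` arguments in the call above control how the next token is drawn. The following is a minimal pure-Python sketch of how such sampling is commonly done. It is not taken from model.py, and `sample_top_k` is a hypothetical helper, so treat the exact behaviour of `model.generate` as an assumption.

```python
import math
import random

def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Draw one token id from raw logits using temperature and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the ids of the k highest logits.
    k = min(top_k, len(logits))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = [logits[i] / temperature for i in top_ids]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalises the weights internally.
    return rng.choices(top_ids, weights=weights, k=1)[0]

rng = random.Random(0)
toy_logits = [0.0, 5.0, 1.0, -2.0]
print(sample_top_k(toy_logits, top_k=1, rng=rng))  # top_k=1 always picks the argmax
```

With `top_k=1` the procedure reduces to greedy decoding; larger values of `top_k` and `temperature` trade determinism for variety in the generated story.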
best.pt - Best model checkpoint
latest.pt - Latest checkpoint
config.json - Training configuration

License: MIT
@misc{flashlm-v52,
author = {Chang Cheng},
title = {FlashLM v5.2 Nova-Ignition},
year = {2026},
url = {https://github.com/changcheng967/FlashLM}
}