HGRN-1.3B Dense Baseline

HGRN-1.3B dense baseline trained on 100B tokens, used as the reference for post-training sparsity experiments in:

When Does One-Shot Pruning Beat Iterative Optimisation? Second-Order Correction for Sparse LLMs on Neuromorphic Hardware
Kimia Gholami et al., NeurIPS 2026 submission

Sparse variants (OBS-cancel-block, SparseGPT, Wanda, RIA, AWP) are available in the same organisation.

Model details

Property	Value
Architecture	HGRN (Hierarchical Gated Recurrent Network)
Parameters	~1.3B
Training tokens	100B
Precision	bfloat16
Dense WikiText-2 PPL	14.18
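Perplexity is exp(mean next-token negative log-likelihood). The evaluation protocol behind the 14.18 figure is not specified here, so this sketch assumes one common choice. The WikiText-2 test split is tokenized as a single stream and scored in non-overlapping windows of `seq_len` tokens. The default of 2048 is also an assumption. `windowed_perplexity` is a hypothetical helper, not part of FLA or transformers, and its output need not reproduce the table value exactly.

```python
import math

import torch
import torch.nn.functional as F


@torch.no_grad()
def windowed_perplexity(model, input_ids: torch.Tensor, seq_len: int = 2048) -> float:
    """Hypothetical helper: perplexity of a 1-D token stream scored in
    non-overlapping windows of `seq_len` tokens (assumed protocol).

    `model(ids)` must return an object with `.logits` of shape (1, T, vocab),
    as a transformers causal LM does; `input_ids` must be on the model's device.
    """
    if input_ids.numel() < 2:
        raise ValueError("need at least two tokens to score a prediction")
    total_nll, total_tokens = 0.0, 0
    # Stop before the last token so every window contains at least one target.
    for start in range(0, input_ids.numel() - 1, seq_len):
        window = input_ids[start : start + seq_len].unsqueeze(0)  # (1, T)
        logits = model(window).logits.float()  # (1, T, vocab)
        # Position t predicts token t + 1: drop the last logit and the first label.
        nll = F.cross_entropy(logits[0, :-1], window[0, 1:], reduction="sum")
        total_nll += nll.item()
        total_tokens += window.size(1) - 1
    return math.exp(total_nll / total_tokens)
```

With `model` and `tokenizer` loaded as in the Loading section and `text` holding the concatenated test split, a call would look like `windowed_perplexity(model, tokenizer(text, return_tensors="pt").input_ids[0].to(model.device))`. Tokens at window boundaries lose their preceding context, which is the usual trade-off of non-overlapping windows versus a sliding window.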

HGRN is an attention-free, gated linear recurrent model, closely related to state-space models (SSMs). Its modelling code and kernels come from the flash-linear-attention (FLA) library.
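Without attention, each HGRN layer mixes tokens through an elementwise gated linear recurrence. This is a simplified, real-valued sketch of that idea. It assumes the form h_t = f_t * h_{t-1} + (1 - f_t) * x_t, where the forget gate f_t is squeezed above a per-layer lower bound. In HGRN that bound grows with depth, which is the "hierarchical" part. This is a slow reference loop for intuition, not FLA's fused kernel. It omits the input/output projections and the output gate. `gated_recurrence` and `lower_bound` are illustrative names.

```python
import torch


def gated_recurrence(x: torch.Tensor, gate_logits: torch.Tensor, lower_bound: float = 0.0) -> torch.Tensor:
    """Reference (slow) elementwise gated linear recurrence.

    x, gate_logits: (batch, seq_len, dim). Returns hidden states of the same shape,
    h_t = f_t * h_{t-1} + (1 - f_t) * x_t with f_t in (lower_bound, 1).
    """
    # Forget gate confined to (lower_bound, 1): a larger bound keeps memory longer.
    f = lower_bound + (1.0 - lower_bound) * torch.sigmoid(gate_logits)
    h = torch.zeros_like(x[:, 0])  # initial state, (batch, dim)
    outputs = []
    for t in range(x.size(1)):
        # No token-to-token attention: step t only sees the carried state h.
        h = f[:, t] * h + (1.0 - f[:, t]) * x[:, t]
        outputs.append(h)
    return torch.stack(outputs, dim=1)
```

Because the state update is linear in h, this recurrence can be evaluated with a parallel scan during training and as a constant-memory loop during generation. That property is what lets such models drop attention entirely.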

Loading

Install FLA first:

```bash
pip install flash-linear-attention
```

Then register HGRN with the transformers Auto classes and load the checkpoint:

```python
import torch
import fla  # flash-linear-attention (FLA)
from fla.models.hgrn import HGRNConfig, HGRNForCausalLM
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Map the "hgrn" model type to FLA's classes; exist_ok=True makes re-registration harmless.
AutoConfig.register("hgrn", HGRNConfig, exist_ok=True)
AutoModelForCausalLM.register(HGRNConfig, HGRNForCausalLM, exist_ok=True)

model_id = "ikimyaii/hgrn-1.3B-dense-baseline"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in bfloat16
    trust_remote_code=True,
).cuda()  # move the model to the GPU
```

Sparsity structure

All sparse variants use semi-structured (row-wise uniform) sparsity. At sparsity ratio s, exactly k = floor(d_in * s) weights are zeroed in every output row, where d_in is the layer's input dimension. This maps directly onto Intel Loihi, where zero weights generate no spike events.
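The per-row constraint can be illustrated with a small PyTorch sketch. `rowwise_prune_mask` zeros exactly k = floor(d_in * s) entries in each output row of an `nn.Linear` weight, which is stored as (d_out, d_in). For simplicity it scores weights by magnitude |w|. That is plain magnitude pruning, a stand-in and not the OBS-cancel-block, SparseGPT, Wanda, RIA or AWP criteria used for the released variants. `check_rowwise_sparsity` is a hypothetical helper for checking that a loaded model satisfies the constraint. It assumes the relevant projections are `nn.Linear` modules and that `lm_head` is left dense. Both assumptions are illustrative, not documented properties of the variants.

```python
import math

import torch
import torch.nn as nn


def rowwise_prune_mask(weight: torch.Tensor, s: float) -> torch.Tensor:
    """Boolean keep-mask with exactly k = floor(d_in * s) False entries per row.

    weight: (d_out, d_in), the nn.Linear layout. Magnitude scoring is used only
    as a stand-in for the paper's pruning criteria.
    """
    d_in = weight.shape[1]
    k = math.floor(d_in * s)
    mask = torch.ones_like(weight, dtype=torch.bool)
    if k > 0:
        # Column indices of the k smallest-|w| entries in each output row.
        idx = torch.topk(weight.abs(), k, dim=1, largest=False).indices
        mask.scatter_(1, idx, torch.zeros_like(idx, dtype=torch.bool))
    return mask


def check_rowwise_sparsity(model: nn.Module, s: float, skip=("lm_head",)) -> bool:
    """Hypothetical checker: every nn.Linear whose name is not in `skip` has
    exactly floor(d_in * s) zero weights in each output row.

    Skipping lm_head is an assumption, not a documented property of the variants.
    """
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear) or name.split(".")[-1] in skip:
            continue
        k = math.floor(module.weight.shape[1] * s)
        zeros_per_row = (module.weight == 0).sum(dim=1)
        if not bool((zeros_per_row == k).all()):
            return False
    return True
```

Usage on a toy layer:

```python
layer = nn.Linear(8, 4)
with torch.no_grad():
    layer.weight.mul_(rowwise_prune_mask(layer.weight, 0.5))
print(check_rowwise_sparsity(layer, 0.5))  # True
```

Fixing k per row, rather than pruning to a global threshold, gives every output neuron the same fan-in. This is the property the Loihi mapping relies on.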
