Part of the GSA collection: models and datasets for the paper "Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention".
This model was obtained by continued pretraining of Qwen/Qwen2-7B-Instruct using Gist Sparse Attention (GSA) with the chunk16 chunk-size setting.
GSA: Gist Sparse Attention via Learnable Compression and Selective Unfolding
| Field | Value |
|---|---|
| Base model | Qwen/Qwen2-7B-Instruct |
| Training type | Continued Pretraining |
| Chunk size | chunk16 |
| Architecture | Qwen2-7B |
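
To make the chunk-size field more concrete, here is a minimal sketch of how a token sequence could be partitioned into fixed-size chunks. It assumes that chunk16 means 16 tokens per chunk, which this card does not state explicitly. The names `CHUNK_SIZE` and `split_into_chunks` are hypothetical and are not part of the GSA code, and the sketch does not reproduce the compression or unfolding steps of the method.

```python
from typing import List, Sequence

# Assumption: "chunk16" denotes 16 tokens per chunk (not confirmed by this card).
CHUNK_SIZE = 16


def split_into_chunks(token_ids: Sequence[int], chunk_size: int = CHUNK_SIZE) -> List[List[int]]:
    """Partition token_ids into consecutive chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(token_ids[i:i + chunk_size]) for i in range(0, len(token_ids), chunk_size)]


# Stand-in for real token ids produced by a tokenizer.
tokens = list(range(40))
chunks = split_into_chunks(tokens)
print(len(chunks), [len(c) for c in chunks])  # 3 [16, 16, 8]
```

Under this assumption, a 40-token input yields two full chunks and one partial chunk of 8 tokens.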