GSA collection: models and datasets for the paper [Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention]
This model is fine-tuned from gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk8-chunk4 using Gist Sparse Attention (GSA) with the chunk8-chunk4 chunk configuration.
GSA: Gist Sparse Attention via Learnable Compression and Selective Unfolding
| Field | Value |
|---|---|
| Base model | gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk8-chunk4 |
| Training type | Supervised Fine-Tuning |
| Chunk size | chunk8-chunk4 |
| Architecture | Qwen2-7B |
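
The fields in the table can also be read off the base checkpoint's repository name. The sketch below is one way to split such a name into its parts. It is an illustration, not part of the GSA release: the helper `parse_gsa_repo_name`, the `GsaRepoName` class, and the reading of `PT` as a stage tag and `chunk8-chunk4` as two chunk sizes (8 and 4) are all assumptions inferred from the name alone.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GsaRepoName:
    """Hypothetical container for the parts of a GSA repository name."""
    organization: str
    stage: str                     # assumed stage tag, e.g. "PT"
    backbone: str                  # e.g. "Qwen2-7B-Instruct"
    chunk_sizes: Tuple[int, ...]   # assumed meaning: one size per "chunkN" part


# Assumed naming pattern: <org>/GSA-<stage>-<backbone>-chunkA[-chunkB...]
_REPO_PATTERN = re.compile(
    r"^(?P<org>[^/]+)/GSA-(?P<stage>[A-Za-z]+)-(?P<backbone>.+?)"
    r"-(?P<chunks>chunk\d+(?:-chunk\d+)*)$"
)


def parse_gsa_repo_name(repo_id: str) -> GsaRepoName:
    """Split a repository id into organization, stage, backbone, and chunk sizes."""
    match = _REPO_PATTERN.match(repo_id)
    if match is None:
        raise ValueError(f"Not a recognized GSA repository name: {repo_id!r}")
    # Pull the integer out of every "chunkN" component, in order.
    sizes = tuple(int(n) for n in re.findall(r"chunk(\d+)", match.group("chunks")))
    return GsaRepoName(
        organization=match.group("org"),
        stage=match.group("stage"),
        backbone=match.group("backbone"),
        chunk_sizes=sizes,
    )


base = parse_gsa_repo_name(
    "gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk8-chunk4"
)
print(base.backbone)     # Qwen2-7B-Instruct
print(base.chunk_sizes)  # (8, 4)
```

Names that do not follow this pattern raise a `ValueError` rather than being parsed by guesswork.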
Base model: Qwen/Qwen2-7B