GSA-PT-Qwen2-7B-Instruct-chunk16

This model was produced by continued pretraining of Qwen/Qwen2-7B-Instruct with Gist Sparse Attention (GSA), using a chunk size of 16 (chunk16).

Paper

GSA: Gist Sparse Attention via Learnable Compression and Selective Unfolding

Model Details

Field	Value
Base model	Qwen/Qwen2-7B-Instruct
Training type	Continued Pretraining
Chunk size	chunk16
Architecture	Qwen2-7B
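
The fields above map directly onto the repository name, so a loading helper can be sketched from them. This is a minimal sketch, not the authors' documented usage: `chunk_size_from_repo_id` and `load_model` are hypothetical helpers, reading "chunk16" as 16 tokens per chunk is an assumption, and whether this checkpoint loads through the stock `transformers` Auto classes (with or without `trust_remote_code`) is not confirmed by this card.

```python
import re

# Repo id taken from this card's title.
REPO_ID = "gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk16"


def chunk_size_from_repo_id(repo_id: str) -> int:
    """Read the trailing 'chunkN' suffix (assumed to be the GSA chunk size)."""
    match = re.search(r"chunk(\d+)$", repo_id)
    if match is None:
        raise ValueError(f"no chunk suffix in {repo_id!r}")
    return int(match.group(1))


def load_model(repo_id: str = REPO_ID):
    """Hypothetical loader; it downloads weights, so nothing calls it on import."""
    # Imported lazily so the helper above works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,  # the card lists BF16 tensors
        trust_remote_code=True,  # assumption: GSA may ship custom modeling code
    )
    return tokenizer, model
```

Calling `load_model()` would return a tokenizer and model pair if the assumptions hold; if the repository needs the paper's own code to apply GSA, that code should be used instead.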

Safetensors

Model size: 333k params
Tensor type: BF16

Collection

GSA: models and datasets for the paper "Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention" (30 items, updated Apr 22).