GSA-PT-Qwen2-7B-Instruct-chunk16

This model was obtained by continued pretraining of Qwen/Qwen2-7B-Instruct with Gist Sparse Attention (GSA), using a chunk size of 16 (chunk16).
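The card does not describe how the chunk size is applied. As a rough illustration only, the sketch below assumes that chunk16 means the token sequence is partitioned into contiguous, fixed-length groups of 16 tokens, with a shorter final group when the length is not a multiple of 16. The helper name chunk_spans is hypothetical and is not part of the GSA code or of any library.

```python
from typing import List, Tuple


def chunk_spans(num_tokens: int, chunk_size: int = 16) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index spans that cover num_tokens tokens.

    Hypothetical helper: assumes chunks are contiguous and fixed-length,
    which the model card does not confirm.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the sequence in strides of chunk_size; the last span is
    # clipped to num_tokens so a trailing partial chunk is kept.
    return [
        (start, min(start + chunk_size, num_tokens))
        for start in range(0, num_tokens, chunk_size)
    ]


spans = chunk_spans(40)
print(spans)  # [(0, 16), (16, 32), (32, 40)]
```

Under this assumption, a 40-token input yields two full chunks and one partial chunk of 8 tokens. How GSA compresses each chunk and decides which chunks to unfold is covered in the paper, not here.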

Paper

GSA: Gist Sparse Attention via Learnable Compression and Selective Unfolding

Model Details

Field          Value
Base model     Qwen/Qwen2-7B-Instruct
Training type  Continued Pretraining
Chunk size     chunk16
Architecture   Qwen2-7B