GSA-FT-Qwen2-7B-Instruct-chunk8-chunk4

This model is a supervised fine-tune of gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk8-chunk4, trained with Gist Sparse Attention (GSA) in the chunk8-chunk4 chunk-size configuration.

Paper

GSA: Gist Sparse Attention via Learnable Compression and Selective Unfolding

Model Details

| Field | Value |
| --- | --- |
| Base model | gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk8-chunk4 |
| Training type | Supervised Fine-Tuning |
| Chunk size | chunk8-chunk4 |
| Architecture | Qwen2-7B |
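
This card does not document a loading procedure, so the following is only a minimal sketch of how the checkpoint files might be fetched for inspection. It assumes the `huggingface_hub` package is installed and that the repository is publicly readable. The helper name `fetch_checkpoint` is hypothetical and not part of any GSA release. Whether the weights load directly into a stock Qwen2 model class, or need GSA-specific modeling code, is not stated here and should be checked against the repository contents.

```python
# Repository IDs copied from this model card.
REPO_ID = "gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk8-chunk4"
BASE_REPO_ID = "gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk8-chunk4"


def fetch_checkpoint(repo_id: str = REPO_ID) -> str:
    """Download every file in the repository and return the local directory path.

    Hypothetical helper for illustration; it only fetches files and does not
    build a model.
    """
    # Imported lazily so that defining the helper does not require the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)


if __name__ == "__main__":
    local_dir = fetch_checkpoint()
    print("Checkpoint files downloaded to:", local_dir)
```

Listing the downloaded directory (config, safetensors files, any custom modeling code) is a reasonable first step before deciding how to instantiate the model.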