sunweiwei/AirRep-Flan-Small

@@ -1,22 +1,51 @@
 # AirRep-Flan
 AirRep is an embedding model designed for computing training data influence on test examples.
 ## Model Description
 This model is based on gte-small config with an additional projection layer
-## Usage
-https://github.com/sunnweiwei/AirRep
-## Training Data
-This model was trained on the FLAN dataset with data influence optimization.
 ## Citation
@@ -31,8 +60,4 @@ If you use this model, please cite:
   year={2025},
   url={https://arxiv.org/abs/2505.18513}
 }
-```
-## License
-This model is released under the Apache 2.0 License.

+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: feature-extraction
+---
 # AirRep-Flan
+This repository contains the AirRep model presented in [Enhancing Training Data Attribution with Representational Optimization](https://huggingface.co/papers/2505.18513).
 AirRep is an embedding model designed for computing training data influence on test examples.
+Code: https://github.com/sunnweiwei/airrep
 ## Model Description
 This model is based on gte-small config with an additional projection layer
+## Sample Usage
+You can use the FLAN-trained model to encode training and test data and compute similarity scores.
+```python
+from airrep import AirRep
+model = AirRep.from_pretrained("sunweiwei/AirRep-Flan-Small")
+train_texts = [
+    "Question: Classify the sentiment of 'The movie was wonderful and heartwarming.'\nAnswer: positive",
+    "Question: Does the hypothesis entail the premise? Premise: 'A man is playing a guitar on stage.' Hypothesis: 'Someone is performing music.'\nAnswer: entailment",
+]
+query_texts = [
+    "Question: Classify the sentiment of 'The service was awful and I won't return.'\nAnswer: negative"
+]
+# Embeddings and influence-like similarity score
+train_emb = model.encode(train_texts, batch_size=128)
+query_emb = model.encode(query_texts)
+score = model.similarity(query_emb, train_emb, softmax=True)
+print("Similarity score:", score)
+```
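To make the scoring step in the sample above more concrete, here is a minimal pure-Python sketch of one plausible reading of a "similarity with softmax" computation: dot products between each query embedding and every training embedding, optionally normalized with a softmax across the training examples. This is an assumption made for illustration only. The helper names (`dot`, `softmax`, `similarity_scores`) and the toy 2-d vectors are hypothetical, and the actual `AirRep.similarity` implementation may differ.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_scores(query_emb, train_emb, use_softmax=False):
    """Hypothetical helper: one row of scores per query, one column per training example."""
    rows = []
    for q in query_emb:
        row = [dot(q, t) for t in train_emb]
        # Assumption: the softmax normalizes across training examples for each query.
        rows.append(softmax(row) if use_softmax else row)
    return rows


# Toy embeddings, made up for illustration (not produced by AirRep).
train_emb = [[1.0, 0.0], [0.0, 1.0]]
query_emb = [[2.0, 0.0]]

raw = similarity_scores(query_emb, train_emb)
weights = similarity_scores(query_emb, train_emb, use_softmax=True)
print("Raw scores:", raw)
print("Softmax-normalized scores:", weights)
```

Under this reading, the raw scores rank training examples by alignment with the query, while the softmax variant turns each row into weights that sum to 1, so the scores can be compared as relative contributions within a single query.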
+## Training Data
+This model was trained on the FLAN dataset with data influence optimization.
 ## Citation
   year={2025},
   url={https://arxiv.org/abs/2505.18513}
 }
+```