Improve model card: Add metadata, paper link, code link, and sample usage

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +34 -9
README.md CHANGED
@@ -1,22 +1,51 @@
  # AirRep-Flan

  AirRep is an embedding model designed for computing training data influence on test examples.

  ## Model Description

  This model is based on gte-small config with an additional projection layer


- ## Usage

- https://github.com/sunnweiwei/AirRep


- ## Training Data

- This model was trained on the FLAN dataset with data influence optimization.



  ## Citation

@@ -31,8 +60,4 @@ If you use this model, please cite:
  year={2025},
  url={https://arxiv.org/abs/2505.18513}
  }
- ```
-
- ## License
-
- This model is released under the Apache 2.0 License.
 
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: feature-extraction
+ ---
+
  # AirRep-Flan

+ This repository contains the AirRep model presented in [Enhancing Training Data Attribution with Representational Optimization](https://huggingface.co/papers/2505.18513).
+
  AirRep is an embedding model designed for computing training data influence on test examples.

+ Code: https://github.com/sunnweiwei/airrep
+
  ## Model Description

  This model is based on gte-small config with an additional projection layer

+ ## Sample Usage

+ You can use the FLAN-trained model to encode training and test data and compute similarity scores.

+ ```python
+ from airrep import AirRep

+ model = AirRep.from_pretrained("sunweiwei/AirRep-Flan-Small")

+ train_texts = [
+     "Question: Classify the sentiment of 'The movie was wonderful and heartwarming.'\n"
+     "Answer: positive",
+     "Question: Does the hypothesis entail the premise? Premise: 'A man is playing a guitar on stage.' Hypothesis: 'Someone is performing music.'\n"
+     "Answer: entailment",
+ ]
+ query_texts = [
+     "Question: Classify the sentiment of 'The service was awful and I won't return.'\n"
+     "Answer: negative"
+ ]

+ # Embeddings and influence-like similarity score
+ train_emb = model.encode(train_texts, batch_size=128)
+ query_emb = model.encode(query_texts)
+ score = model.similarity(query_emb, train_emb, softmax=True)
+ print("Similarity score:", score)
+ ```

+ ## Training Data

+ This model was trained on the FLAN dataset with data influence optimization.

  ## Citation

 
  year={2025},
  url={https://arxiv.org/abs/2505.18513}
  }
+ ```
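
Note on the sample usage: the diff calls `model.similarity(query_emb, train_emb, softmax=True)` without saying what that call computes. The sketch below is only an illustration of one plausible reading: dot-product scores between each query embedding and each training embedding, normalized with a softmax over the training examples. The helper names (`dot`, `softmax`, `similarity_sketch`) and the toy 2-d vectors are hypothetical, and the real `airrep` implementation may behave differently.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_sketch(query_emb, train_emb, use_softmax=True):
    """Hypothetical stand-in for a query-vs-train similarity call.

    Returns one row per query, with one score per training example.
    This is an assumption for illustration, not the airrep API.
    """
    scores = [[dot(q, t) for t in train_emb] for q in query_emb]
    if use_softmax:
        # Normalize each query's scores so they sum to 1 across training examples.
        scores = [softmax(row) for row in scores]
    return scores


# Toy embeddings (made up for the example, not model outputs).
train_emb = [[1.0, 0.0], [0.0, 1.0]]
query_emb = [[0.9, 0.1]]
scores = similarity_sketch(query_emb, train_emb)
print("Similarity score:", scores)
```

Under this reading, each row is a distribution over training examples, so a higher value marks a training example as relatively more similar to that query.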