nielsr HF Staff committed on
Commit cb1fc4e · verified · 1 Parent(s): ee215dd

Add comprehensive model card for Functional Dual Anchors


This PR adds a comprehensive model card for the Functional Dual Anchors project, presented in the paper "[Model Merging with Functional Dual Anchors](https://huggingface.co/papers/2510.21223)".

It includes:
- Essential metadata (`license`, `library_name`, `pipeline_tag`) for better discoverability.
- Links to the paper, the official project page (`https://spherelab.ai/fda/`), and the GitHub repository (`https://github.com/Sphere-AI-Lab/fda/tree/main`).
- The paper abstract and introduction for a clear understanding of the methodology.
- A detailed "Quick Start" section, with information on checkpoints, environment setup, and code snippets for adaptation and construction, directly sourced from the GitHub README to aid users in reproducing results.
- The BibTeX citation for proper referencing.

Please review and merge if these improvements align with your expectations.

Files changed (1)

README.md ADDED (+97 -0)
@@ -0,0 +1,97 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
---

# Model Merging with Functional Dual Anchors

This repository is the official PyTorch implementation of “[Model Merging with Functional Dual Anchors](https://huggingface.co/papers/2510.21223)” by Kexuan Shi, Yandong Wen, and Weiyang Liu.

**Paper**: [Model Merging with Functional Dual Anchors](https://huggingface.co/papers/2510.21223)
**Project Page**: https://spherelab.ai/fda/
**Code**: https://github.com/Sphere-AI-Lab/fda/tree/main

<p align="center">
  <img src="https://github.com/Sphere-AI-Lab/fda/raw/main/docs/assets/framework_trajectory.png" width="90%" />
</p>

## Abstract

Model merging is an efficient post-training strategy for integrating knowledge from multiple finetuned checkpoints of a shared foundation model. Existing methods operate in the parameter space, combining task vectors to mitigate conflicts, but remain constrained by parameter inconsistencies. We propose Functional Dual Anchors (FDAs), a framework that instead models the input-representation space. FDAs are synthetic inputs whose induced gradients align with task vectors, capturing task-specific functional shifts relative to the pretrained model. This perspective bridges joint multi-task training and post-hoc merging, offering both robustness and flexibility. We further introduce a principled initialization scheme and show that FDAs are complementary to parameter-space model merging. Comprehensive experiments demonstrate the effectiveness of FDAs in model merging.

## Introduction

***Model Merging*** has been an intriguing post-training strategy for integrating knowledge from multiple finetuned checkpoints of a shared foundation model. Existing methods focus on operations in the parameter space, i.e., combining task vectors to mitigate knowledge conflicts, and thereby remain constrained by the complexity of the parameter space. In this work, we propose ***Functional Dual Anchors (FDAs)***, a framework (Figure 1(a)) that instead models the knowledge in the input-representation space. Specifically, FDAs are synthetic inputs whose induced gradients align with task vectors, capturing task-specific functional shifts relative to the pretrained model. We then use the FDAs to adapt the pretrained model. Compared with task vectors, FDAs provide a more robust and flexible trajectory for model merging, as shown in Figure 1(b).

FDAs provide an alternative perspective on model merging by extending input-space modeling to this setting, bridging joint multi-task training and post-hoc merging.

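The core idea can be sketched on a toy linear model. The shapes, names, and the squared-error alignment loss below are illustrative assumptions for intuition only, not the paper's exact objective or the repository's code: synthetic inputs are optimized so that the gradient they induce at the pretrained weights reproduces the task vector.

```python
import torch

# Toy sketch of the FDA construction idea (illustrative, not the official code):
# optimize synthetic inputs X so that the gradient they induce at the
# *pretrained* weights matches the task vector (finetuned - pretrained).
torch.manual_seed(0)
d_in, d_out, n_anchors = 8, 4, 16

W_pre = torch.randn(d_out, d_in)               # pretrained weights
W_ft = W_pre + 0.1 * torch.randn(d_out, d_in)  # a finetuned checkpoint
task_vector = W_ft - W_pre                     # parameter-space task vector

X = torch.randn(n_anchors, d_in, requires_grad=True)  # the synthetic anchors
opt = torch.optim.Adam([X], lr=1e-2)

history = []
for _ in range(300):
    opt.zero_grad()
    W = W_pre.clone().requires_grad_(True)
    # functional loss at the pretrained model: match finetuned outputs on X
    loss = ((X @ W.t() - X @ W_ft.t()) ** 2).mean()
    (g,) = torch.autograd.grad(loss, W, create_graph=True)
    # a descent step -g should move W_pre toward W_ft, i.e. g should
    # approximate -task_vector; penalize the mismatch
    align = ((g + task_vector) ** 2).sum()
    align.backward()
    opt.step()
    history.append(float(align))
```

Under these assumptions the anchors carry the task's functional shift: taking gradient steps on the anchor-induced loss moves the pretrained model along (approximately) the same direction as the task vector, which is what makes the anchors usable for merging later.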
## 🚀 Quick Start

### Checkpoints and Corresponding FDAs

To help you get started quickly with **Functional Dual Anchors (FDAs)**, we provide download links for the checkpoints used in the paper, along with the corresponding FDAs.

#### Vision Tasks

We directly use the checkpoints provided at the following link: [Vision Checkpoints (Google Drive)](https://drive.google.com/drive/folders/1u_Tva6x0p6oxu5Eo0ZZsf-520Cc_3MKw). For convenience, you can download all vision models used in our experiments from our [Hugging Face page](https://huggingface.co/SphereLab/vision_models_in_FDA).

#### NLP Tasks

We adopt the pretrained **RoBERTa-base** and **RoBERTa-large** models from [Hugging Face – RoBERTa Large](https://huggingface.co/FacebookAI/roberta-large). We then use the finetuning scripts from [DARE](https://github.com/yule-BUAA/MergeLM) to obtain checkpoints on **eight GLUE benchmarks**. For convenience, you can download all NLP models used in our experiments from our [Hugging Face page](https://huggingface.co/SphereLab/nlu_models_in_FDA).

#### NLG Tasks

- **Base Model:** [Llama-2-13B (Meta)](https://huggingface.co/meta-llama/Llama-2-13b-hf)
- **Expert Models:** [WizardMath-13B-V1.0](https://huggingface.co/vanillaOVO/WizardMath-13B-V1.0); [Llama-2-13B-Code-Alpaca](https://huggingface.co/layoric/llama-2-13b-code-alpaca)

#### FDAs

The FDAs corresponding to the above checkpoints can be downloaded from [fda_for_vision](https://huggingface.co/datasets/SphereLab/FDA_for_Vision) and [fda_for_nlu](https://huggingface.co/datasets/SphereLab/FDA_for_NLU/tree/main).

Please follow the path comments in the code and replace them with your **local paths** for checkpoints and FDAs, then run the provided commands to reproduce the **FDA adaptation results**.

---


### Environment

For the vision and NLU tasks, we use the same environment. It can be installed by:

```bash
cd FDA/Vision  # or: cd FDA/NLU
# Create the conda environment
conda env create -f environment.yaml
# Activate it
conda activate fda
```

For NLG tasks, please use ```NLG/environment.yaml```.


---

### Adapt by FDAs

Please follow the path comments in the code file ```adapt.py```, replace them with the paths to your local checkpoints and FDAs, and then run the following commands to reproduce the FDA adaptation results:

```bash
cd FDA/Vision  # or: cd FDA/NLU, cd FDA/NLG
sh adapt.sh
```
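Conceptually, adaptation descends the functional loss that the anchors induce at the pretrained weights, pulling one model toward every task at once. The toy linear sketch below is illustrative only (the names, shapes, and loss are assumptions, not the repository's `adapt.py`):

```python
import torch

# Toy sketch of adaptation by FDAs (illustrative, not the repo's adapt.py):
# starting from the pretrained weights, descend the functional loss induced
# by each task's anchors, moving toward a single merged model.
torch.manual_seed(0)
d_in, d_out = 8, 4
W_pre = torch.randn(d_out, d_in)
W_a = W_pre + 0.1 * torch.randn(d_out, d_in)  # finetuned checkpoint, task A
W_b = W_pre + 0.1 * torch.randn(d_out, d_in)  # finetuned checkpoint, task B
X_a = torch.randn(32, d_in)  # stand-in for task A's constructed anchors
X_b = torch.randn(32, d_in)  # stand-in for task B's constructed anchors

W = W_pre.clone().requires_grad_(True)  # the model being merged
opt = torch.optim.SGD([W], lr=0.05)
losses = []
for _ in range(200):
    opt.zero_grad()
    # each term pulls the merged model toward one task's behavior on its anchors
    loss = ((X_a @ W.t() - X_a @ W_a.t()) ** 2).mean() \
         + ((X_b @ W.t() - X_b @ W_b.t()) ** 2).mean()
    loss.backward()
    opt.step()
    losses.append(float(loss))
```

Because every task contributes a loss term at the same set of weights, this behaves like a lightweight joint multi-task objective over synthetic data, which is the sense in which FDAs bridge joint training and post-hoc merging.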

For models in NLG tasks, please split the model first:

```bash
cd FDA/NLG
python split_model.py
```

---

### Construct FDAs

If you want to construct FDAs for your own finetuned checkpoint, please follow the path comments in the code file ```construct_fda.py``` and replace them with the paths to your finetuned checkpoints. Then run:

```bash
sh construct.sh
```

## Acknowledgement

This repository uses code and resources from [Task Arithmetic](https://github.com/mlfoundations/task_vectors?tab=readme-ov-file), [DARE](https://github.com/yule-BUAA/MergeLM), [TSVM](https://github.com/AntoAndGar/task_singular_vectors), [WUDI](https://github.com/nathanielyvo/WUDI-Merging), and [Prodistill](https://github.com/JingXuTHU/Scalable_Model_Merging_with_Progressive_Layerwise_Distillation).

## Citation

If you find this work useful, please consider citing:

```bibtex
@article{shi2025modelmergingfunctionaldual,
  title         = {Model Merging with Functional Dual Anchors},
  author        = {Shi, Kexuan and Wen, Yandong and Liu, Weiyang},
  year          = {2025},
  journal       = {arXiv preprint arXiv:2510.21223},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2510.21223}
}
```