Model Card for TWIN-Qwen2.5-VL-3B

This is the Qwen2.5-VL-3B-Instruct model post-trained on the TWIN dataset from the paper "Same or Not? Enhancing Visual Perception in Vision-Language Models".

For further information, please refer to the project webpage, paper, and repository.
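Below is a minimal inference sketch for pairwise visual comparison with this model, in the spirit of the "same or not?" task. It assumes a recent `transformers` release with Qwen2.5-VL support plus the `qwen-vl-utils` helper package, and uses placeholder image paths; adapt the prompt and inputs to your use case.

```python
"""Hedged usage sketch for TWIN-Qwen2.5-VL-3B (assumes transformers with
Qwen2.5-VL support and qwen-vl-utils; image paths are placeholders)."""

MODEL_ID = "glab-caltech/TWIN-Qwen2.5-VL-3B"


def build_messages(image_a: str, image_b: str, question: str):
    # Qwen2.5-VL chat format: a user turn with interleaved image and text items.
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_a},
                {"type": "image", "image": image_b},
                {"type": "text", "text": question},
            ],
        }
    ]


if __name__ == "__main__":
    import torch
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    # Hypothetical example: compare two product photos.
    messages = build_messages(
        "image_a.jpg", "image_b.jpg",
        "Are these two images of the same object?",
    )
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(messages)
    inputs = processor(
        text=[text], images=image_inputs, return_tensors="pt"
    ).to(model.device)

    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens.
    answer = processor.batch_decode(
        out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0]
    print(answer)
```

The heavy model load is kept behind the `__main__` guard so the message-building helper can be reused or tested without downloading weights.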

Citation

If you use TWIN in your research, please consider citing our work:

BibTeX:

@misc{marsili2025notenhancingvisualperception,
      title={Same or Not? Enhancing Visual Perception in Vision-Language Models}, 
      author={Damiano Marsili and Aditya Mehta and Ryan Y. Lin and Georgia Gkioxari},
      year={2025},
      eprint={2512.23592},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.23592}, 
}

License

The dataset is derived from the UCSD Amazon Reviews’23 dataset. Use is permitted for research and educational purposes only. By using this dataset, you agree to respect the rights of original content owners and comply with applicable terms of service.
