Model Card for TWIN-Qwen2.5-VL-3B

This is the Qwen2.5-VL-3B-Instruct model post-trained on the TWIN dataset from the paper "Same or Not? Enhancing Visual Perception in Vision-Language Models".

For further information, please refer to the project webpage, paper, and repository.
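Below is a minimal inference sketch for pairwise visual comparison with this model, in the spirit of the "same or not?" task. It assumes a recent `transformers` release with Qwen2.5-VL support plus the `qwen-vl-utils` helper package, and uses placeholder image paths; adapt the prompt and inputs to your use case.

```python
"""Hedged usage sketch for TWIN-Qwen2.5-VL-3B (assumes transformers with
Qwen2.5-VL support and qwen-vl-utils; image paths are placeholders)."""

MODEL_ID = "glab-caltech/TWIN-Qwen2.5-VL-3B"


def build_messages(image_a: str, image_b: str, question: str):
    # Qwen2.5-VL chat format: a user turn with interleaved image and text items.
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_a},
                {"type": "image", "image": image_b},
                {"type": "text", "text": question},
            ],
        }
    ]


if __name__ == "__main__":
    import torch
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    # Hypothetical example: compare two product photos.
    messages = build_messages(
        "image_a.jpg", "image_b.jpg",
        "Are these two images of the same object?",
    )
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(messages)
    inputs = processor(
        text=[text], images=image_inputs, return_tensors="pt"
    ).to(model.device)

    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens.
    answer = processor.batch_decode(
        out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0]
    print(answer)
```

The heavy model load is kept behind the `__main__` guard so the message-building helper can be reused or tested without downloading weights.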

Citation

If you use TWIN in your research, please consider citing our work:

BibTeX:

@misc{marsili2025notenhancingvisualperception,
      title={Same or Not? Enhancing Visual Perception in Vision-Language Models}, 
      author={Damiano Marsili and Aditya Mehta and Ryan Y. Lin and Georgia Gkioxari},
      year={2025},
      eprint={2512.23592},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.23592}, 
}

License

The dataset is derived from the UCSD Amazon Reviews’23 dataset. Use is permitted for research and educational purposes only. By using this dataset, you agree to respect the rights of original content owners and comply with applicable terms of service.
