arXiv:2509.05895

BTCChat: Advancing Remote Sensing Bi-temporal Change Captioning with Multimodal Large Language Model

Published on Sep 7, 2025

Abstract

Bi-temporal satellite imagery supports critical applications such as urbanization monitoring and disaster assessment. Although powerful multimodal large language models (MLLMs) have been applied to bi-temporal change analysis, previous methods process image pairs through direct concatenation, inadequately modeling temporal correlations and spatial semantic changes. This deficiency hampers visual-semantic alignment in change understanding, thereby constraining the overall effectiveness of current approaches. To address this gap, we propose BTCChat, a multi-temporal MLLM with advanced bi-temporal change understanding capability. BTCChat supports bi-temporal change captioning and retains single-image interpretation capability. To better capture temporal features and spatial semantic changes in image pairs, we design a Change Extraction module. Moreover, to enhance the model's attention to spatial details, we introduce a Prompt Augmentation mechanism, which incorporates contextual clues into the prompt to improve performance. Experimental results demonstrate that BTCChat achieves state-of-the-art performance on change captioning and visual question answering tasks. The code is available at https://github.com/IntelliSensing/BTCChat.

AI-generated summary

BTCChat is a multi-temporal multimodal large language model that improves bi-temporal change analysis through specialized modules for change extraction and prompt augmentation.
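The abstract describes a Change Extraction module for capturing temporal correlations between the two acquisitions and a Prompt Augmentation mechanism that injects contextual clues into the prompt, but does not give implementation details here. The sketch below is only an illustration of what such components could look like; the class `ChangeExtraction`, the function `augment_prompt`, and all dimensions are assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn


class ChangeExtraction(nn.Module):
    """Illustrative bi-temporal change-extraction block (assumed design, not BTCChat's).

    Takes per-image patch features from a shared vision encoder and produces
    change-aware tokens by combining an explicit feature difference with
    cross-attention between the two time steps.
    """

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, feat_t1: torch.Tensor, feat_t2: torch.Tensor) -> torch.Tensor:
        # feat_t1, feat_t2: (batch, num_patches, dim) features of the two acquisitions.
        diff = feat_t2 - feat_t1                                   # explicit change signal
        attn_out, _ = self.cross_attn(feat_t2, feat_t1, feat_t1)   # temporal correlation
        return self.fuse(diff + attn_out)                          # change-aware visual tokens


def augment_prompt(user_prompt: str, context_clues: list[str]) -> str:
    """Toy prompt augmentation: prepend contextual clues to the user instruction."""
    clues = "; ".join(context_clues)
    return f"Context: {clues}\nInstruction: {user_prompt}"


if __name__ == "__main__":
    b, n, d = 2, 196, 768
    feat_t1, feat_t2 = torch.randn(b, n, d), torch.randn(b, n, d)
    change_tokens = ChangeExtraction(dim=d)(feat_t1, feat_t2)
    print(change_tokens.shape)  # torch.Size([2, 196, 768])
    print(augment_prompt("Describe the changes between the two images.",
                         ["bi-temporal satellite imagery", "focus on new buildings"]))
```

In this sketch the change-aware tokens would be fed to the language model alongside the augmented prompt; for the actual architecture and training details, see the paper and the linked repository.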
