Afaq and El Koufi (2026) ViTs-based Dual Metric Deep Learning Technique for change detection from high-resolution satellite images

Identification

Journal: Remote Sensing Applications: Society and Environment
Year: 2026
Date: 2026-01-01
Authors: Yasir Afaq, Nouhaila El Koufi
DOI: 10.1016/j.rsase.2026.101956

Research Groups

Department of Computer Science and Engineering, SRM University-AP, Amaravati, Andhra Pradesh, India
Laboratory of Mathematical Modeling and Economic Calculations, FEG, Morocco

Short Summary

This paper proposes ViT-DMDLT, a deep learning framework that combines Vision Transformers and Convolutional Neural Networks to detect small-scale land-use and land-cover changes in super-resolution satellite imagery, reporting overall accuracies between 90.77% and 98.12% on three public datasets.

Objective

To develop a robust deep learning framework capable of effectively detecting small-scale and complex land-use and land-cover changes from super-resolution satellite images, overcoming limitations in capturing both spatial and temporal variations.

Study Configuration

Spatial Scale: High-resolution and super-resolution satellite images, focusing on small-scale and complex land-use and land-cover changes.
Temporal Scale: Change detection between images of the same area acquired at different times.

Methodology and Data

Models used: Vision Transformer-based Dual Metric Deep Learning Technique (ViT-DMDLT), which integrates Vision Transformers (ViTs), Convolutional Neural Networks (CNNs), and a Dual Metric Network (DMN).
Data sources: High-resolution and low-resolution satellite images (integrated to obtain super-resolution images). Validated on publicly available datasets: SYSU, Cropland, and LEVIR-CD.

Main Results

The ViT-DMDLT framework outperformed the state-of-the-art models it was compared against on the public datasets tested.
Overall accuracy of 96.78% on the SYSU dataset.
Overall accuracy of 90.77% on the Cropland dataset.
Overall accuracy of 98.12% on the LEVIR-CD dataset.
The framework proved more robust than other state-of-the-art models, detecting complex and small land-cover changes in super-resolution images with high accuracy.
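
The overall accuracy figures above are per-dataset summary scores. This summary does not state how the paper computes the metric, so the sketch below assumes the standard definition: the fraction of pixels whose predicted change label matches the reference label. The function name and the toy masks are illustrative only and are not taken from the paper.

```python
def overall_accuracy(predicted, reference):
    """Fraction of pixels whose predicted label equals the reference label.

    Both arguments are 2D lists of 0/1 labels (1 = changed, 0 = unchanged).
    Assumed standard definition; the paper's exact protocol is not given here.
    """
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of rows")
    correct = 0
    total = 0
    for pred_row, ref_row in zip(predicted, reference):
        if len(pred_row) != len(ref_row):
            raise ValueError("mask rows must have the same length")
        for p, r in zip(pred_row, ref_row):
            correct += int(p == r)
            total += 1
    if total == 0:
        raise ValueError("masks must contain at least one pixel")
    return correct / total


# Hypothetical 3x4 change masks: two of the twelve pixels disagree.
predicted = [[0, 0, 1, 1],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
reference = [[0, 0, 1, 1],
             [0, 0, 1, 0],
             [0, 0, 0, 1]]

oa = overall_accuracy(predicted, reference)
print(f"Overall accuracy: {oa:.2%}")
```

Because unchanged pixels usually dominate change-detection datasets, overall accuracy is typically read alongside class-sensitive metrics such as precision, recall, or F1.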

Contributions

Introduction of ViT-DMDLT, a novel deep learning framework for change detection in super-resolution satellite images, combining the strengths of Vision Transformers, Convolutional Neural Networks, and a Dual Metric Network.
Addresses the significant challenge of monitoring small-scale and complex land-use and land-cover changes, which is crucial for various remote sensing applications.
Integrates low-resolution and high-resolution satellite images to generate super-resolution imagery, enhancing the input quality for change detection.
Uses ViT encoders to capture global spatial dependencies and a dual metric to learn robust features by reducing intra-class variation and increasing inter-class separability between temporal images.
Achieves superior and robust performance compared to existing state-of-the-art models across multiple public datasets, demonstrating its effectiveness in complex scenarios.
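
The metric-learning idea in the contributions above (pull features of unchanged locations together, push features of changed locations apart) can be illustrated with a small sketch. This summary does not specify the paper's dual metric formulation, so the code below substitutes the classic contrastive loss as a stand-in. The function name, the `margin` value, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def contrastive_loss(feats_t1, feats_t2, changed, margin=2.0):
    """Classic contrastive loss over paired feature vectors.

    feats_t1, feats_t2: lists of feature vectors for the same locations
                        at two acquisition times.
    changed:            list of 0/1 labels (1 = changed, 0 = unchanged).
    Stand-in for illustration; not the paper's dual metric.
    """
    if not (len(feats_t1) == len(feats_t2) == len(changed)):
        raise ValueError("inputs must have the same length")
    if not changed:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for f1, f2, y in zip(feats_t1, feats_t2, changed):
        d = math.dist(f1, f2)  # Euclidean distance between the two features
        if y == 0:
            # Unchanged pair: penalise any distance between the features.
            total += d ** 2
        else:
            # Changed pair: penalise only distances smaller than the margin.
            total += max(0.0, margin - d) ** 2
    return total / len(changed)


labels = [0, 0, 1]
feats_t1 = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]

# Well-separated features: unchanged pairs coincide, changed pair is far apart.
good_t2 = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
# Poorly separated features: an unchanged pair drifts, the changed pair coincides.
poor_t2 = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.0]]

good_loss = contrastive_loss(feats_t1, good_t2, labels)
poor_loss = contrastive_loss(feats_t1, poor_t2, labels)
print(good_loss, poor_loss)
```

The loss is zero when unchanged pairs coincide and changed pairs sit at least `margin` apart, and it grows as either condition is violated.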

Funding

Not specified in the provided text.

Citation

@article{Afaq2026ViTsbased,
  author = {Afaq, Yasir and El Koufi, Nouhaila},
  title = {ViTs-based Dual Metric Deep Learning Technique for change detection from high-resolution satellite images},
  journal = {Remote Sensing Applications: Society and Environment},
  year = {2026},
  doi = {10.1016/j.rsase.2026.101956},
  url = {https://doi.org/10.1016/j.rsase.2026.101956}
}

Original Source: https://doi.org/10.1016/j.rsase.2026.101956