Ali et al. (2025) A hybrid 3D CNN-LSTM model with soft spatial attention mechanism for accurate hyperspectral image classification

Identification

Journal: Remote Sensing Applications: Society and Environment
Year: 2025
Date: 2025-10-28
Authors: Mohamed Sultan Mohamed Ali, Md. Sakib Bin Islam, Molla Ehsanul Majid, Saad Bin Abul Kashem, Amith Khandakar, Muhammad E. H. Chowdhury
DOI: 10.1016/j.rsase.2025.101779

Research Groups

Department of Electrical Engineering, College of Engineering, Qatar University, Doha, Qatar
Computer Applications Department, Academic Bridge Program, Qatar Foundation, Doha, Qatar
Department of Computing Science, AFG College with the University of Aberdeen, Doha, Qatar

Short Summary

This study introduces a hybrid deep learning model that combines 3D Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) networks, enhanced by residual connections and a soft spatial attention mechanism, to improve hyperspectral image classification accuracy. The proposed model achieved overall accuracies of 99.66 % on the Indian Pines dataset and 99.58 % on the Salinas dataset, which the authors report as outperforming current leading methods.

Objective

To create and evaluate a customized deep learning model that integrates 3D CNNs for spatial-spectral feature extraction with LSTM layers to capture sequential dependencies, augmenting the model’s capacity to discern intricate patterns in hyperspectral images.
To integrate residual connections into the CNN design to mitigate vanishing gradients and facilitate smoother convergence during training.
To incorporate a soft spatial attention mechanism with convolutional layers to emphasize significant regions of the hyperspectral image while suppressing less relevant areas.
To utilize Incremental Principal Component Analysis (IPCA) to reduce the dimensionality of hyperspectral data while preserving critical spectral information, improving computational efficiency.
To extract spatial patches from hyperspectral images to provide localized data segments, improving training efficiency and classification accuracy.
To evaluate the model on an external dataset (Salinas) in addition to Indian Pines, to assess its generalization to other data sources.
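The residual and soft-attention objectives can be illustrated with a minimal pure-Python sketch. It assumes a common formulation of soft spatial attention (pool the feature maps across channels, pass each location through a sigmoid, and rescale every channel by the resulting mask); the scalar `weight` and `bias` are hypothetical stand-ins for the learned convolution, and the summary does not specify the authors' exact layer configuration. The `residual` helper shows the identity shortcut y = x + F(x), which lets gradients bypass F during backpropagation.

```python
import math
from typing import Callable, List, Tuple

FeatureMaps = List[List[List[float]]]  # indexed as [channel][row][col]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_spatial_attention(fmap: FeatureMaps, weight: float = 1.0,
                           bias: float = 0.0) -> Tuple[FeatureMaps, List[List[float]]]:
    """Reweight every channel by one shared H x W soft mask.

    ASSUMPTION: `weight`/`bias` are hypothetical scalars replacing the
    learned convolution a real attention layer would apply to the pooled map.
    """
    n_ch = len(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    # Average over channels: one descriptor per spatial location.
    pooled = [[sum(fmap[c][i][j] for c in range(n_ch)) / n_ch for j in range(w)]
              for i in range(h)]
    # Sigmoid gives weights in (0, 1): locations are attenuated, never hard-masked.
    mask = [[sigmoid(weight * pooled[i][j] + bias) for j in range(w)] for i in range(h)]
    # Broadcast the same spatial mask across all channels.
    attended = [[[fmap[c][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
                for c in range(n_ch)]
    return attended, mask


def residual(x: FeatureMaps, block: Callable[[FeatureMaps], FeatureMaps]) -> FeatureMaps:
    """Identity shortcut: element-wise x + block(x) (shapes must match)."""
    fx = block(x)
    return [[[x[c][i][j] + fx[c][i][j] for j in range(len(x[c][i]))]
             for i in range(len(x[c]))] for c in range(len(x))]
```

Because the mask is a sigmoid rather than a hard threshold, low-scoring locations are down-weighted rather than zeroed, which is what makes the attention "soft". Wrapping a block as `residual(x, lambda t: soft_spatial_attention(t)[0])` keeps the original signal available even if the attended branch learns little.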

Study Configuration

Spatial Scale: Hyperspectral images with pixel dimensions of 145 × 145 (Indian Pines) and 512 × 217 (Salinas), with the Salinas dataset having a high spatial resolution of 3.7 meters per pixel. The model processes 5 × 5 pixel spatial patches.
Temporal Scale: Not applicable; the model classifies single-acquisition hyperspectral scenes. The LSTM component captures sequential dependencies across spectral bands, not temporal changes in a scene.
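Treating spectral bands as a sequence can be sketched with a single-unit LSTM cell scanned over one pixel's spectrum, one band per time step, using the standard input, forget, output, and candidate gates. This is an illustrative assumption: in the hybrid model the LSTM most likely consumes feature vectors produced by the 3D CNN rather than raw reflectances, and the names in `PARAM_NAMES` are hypothetical scalars standing in for learned weight matrices.

```python
import math
from typing import Dict, List

# Hypothetical scalar parameters: input weight (w*), recurrent weight (u*), bias (b*)
# for the input (i), forget (f), output (o) gates and the candidate (g).
PARAM_NAMES = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_over_bands(spectrum: List[float], p: Dict[str, float]) -> List[float]:
    """Scan a 1-unit LSTM over the bands of one pixel; return the hidden state per band."""
    h, c = 0.0, 0.0
    hidden_states = []
    for x in spectrum:  # each band is one "time step"
        i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])    # input gate
        f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])    # forget gate
        o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])    # output gate
        g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate cell value
        c = f * c + i * g          # cell state carries information from earlier bands
        h = o * math.tanh(c)       # hidden state is bounded in (-1, 1)
        hidden_states.append(h)
    return hidden_states
```

Because the cell state is updated recursively, the hidden state at band k depends on the whole spectral prefix up to k, which is the kind of inter-band dependency the summary attributes to the LSTM component.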

Methodology and Data

Models used:
- Proposed: Hybrid deep learning model combining 3D Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) networks, incorporating residual connections and a soft spatial attention mechanism.
- Preprocessing: Incremental Principal Component Analysis (IPCA) for dimensionality reduction, ZeroPad function for image padding, and a sliding window technique for 5 × 5 pixel patch extraction.
- Baseline models for comparison: VGG16, ResNet50, InceptionV3, Xception, DenseNet121, MobileNetV2.
Data sources:
- Indian Pines dataset: Acquired by the AVIRIS sensor, 145 × 145 pixels, 200 spectral bands (after preprocessing), 16 land cover classes.
- Salinas dataset: Acquired by the AVIRIS sensor, 512 lines by 217 samples, 3.7-meter pixels, 204 spectral bands (after preprocessing), 16 agricultural land cover categories.
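The zero-padding and 5 × 5 sliding-window steps can be sketched in plain Python. The sketch assumes the cube is stored as `[row][col][band]` (after IPCA, `band` would index the retained components), that the hypothetical helpers `zero_pad` and `extract_patches` pad with a margin of `window // 2`, and that unlabeled pixels (ground-truth value 0 in Indian Pines and Salinas) are skipped; whether the authors drop unlabeled pixels is not stated in this summary.

```python
from typing import List, Tuple

Cube = List[List[List[float]]]  # indexed as [row][col][band]


def zero_pad(cube: Cube, margin: int) -> Cube:
    """Surround the cube with `margin` rows/cols of all-zero spectra."""
    h, w, b = len(cube), len(cube[0]), len(cube[0][0])
    padded = [[[0.0] * b for _ in range(w + 2 * margin)] for _ in range(h + 2 * margin)]
    for r in range(h):
        for col in range(w):
            padded[r + margin][col + margin] = list(cube[r][col])
    return padded


def extract_patches(cube: Cube, labels: List[List[int]],
                    window: int = 5) -> Tuple[List[Cube], List[int]]:
    """Return one window x window x bands patch per labeled pixel, centered on it."""
    if window % 2 == 0:
        raise ValueError("window must be odd so each patch has a center pixel")
    margin = window // 2
    padded = zero_pad(cube, margin)
    patches, targets = [], []
    for r in range(len(cube)):
        for col in range(len(cube[0])):
            if labels[r][col] == 0:  # ASSUMPTION: label 0 = unlabeled background
                continue
            # Pixel (r, col) sits at (r + margin, col + margin) in the padded cube,
            # so its window spans padded rows r .. r + window - 1 (same for columns).
            patch = [row[col:col + window] for row in padded[r:r + window]]
            patches.append(patch)
            targets.append(labels[r][col] - 1)  # shift classes to start at 0
    return patches, targets
```

With a 5 × 5 window the margin is 2, so the 145 × 145 Indian Pines scene becomes 149 × 149 after padding, and every original pixel, including those on the border, gets a full patch centered on it.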

Main Results

The proposed hybrid model achieved an overall accuracy (OA) of 99.66 % on the Indian Pines dataset, with an average accuracy (AA) of 99.71 % and a Cohen’s Kappa score of 0.99.
For external validation on the Salinas dataset, the model achieved an overall (test) accuracy of 99.58 %.
The model demonstrated high precision, recall, and F1-scores (often 1.00) across most land cover classes in both datasets, indicating strong classification performance.
Training the custom LSTM model took 110.17 seconds on Indian Pines and 98.1 seconds on Salinas, and prediction took 1.76 seconds.
Ablation studies confirmed that residual connections improve training stability and that the soft spatial attention mechanism mitigates redundant features. One anomaly was observed: removing the residual connections yielded slightly higher accuracy, which the authors hypothesize is due to reduced model complexity on smaller datasets.
The model generalized well across both datasets, with only a few misclassifications, even among spectrally similar classes.
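The three headline metrics can be computed from a confusion matrix. OA is the fraction of correctly classified test pixels, AA is the unweighted mean of per-class accuracies (recall), and Cohen's κ corrects OA for chance agreement, κ = (p_o - p_e) / (1 - p_e), where p_o is OA and p_e is derived from the row and column totals. The function name `classification_metrics` and the toy matrices are illustrative, not taken from the paper.

```python
from typing import List, Tuple


def classification_metrics(conf: List[List[int]]) -> Tuple[float, float, float]:
    """Return (OA, AA, kappa); conf[i][j] = test pixels of true class i predicted as j."""
    k = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(k))
    oa = correct / total  # overall accuracy (observed agreement p_o)

    # Average accuracy: mean per-class recall, each class weighted equally.
    row_tot = [sum(conf[i]) for i in range(k)]
    recalls = [conf[i][i] / row_tot[i] for i in range(k) if row_tot[i] > 0]
    aa = sum(recalls) / len(recalls)

    # Chance agreement p_e from the true-class (row) and predicted-class (column) totals.
    col_tot = [sum(conf[i][j] for i in range(k)) for j in range(k)]
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / (total * total)
    # Kappa is undefined when p_e == 1 (all mass in a single class).
    kappa = float("nan") if pe == 1.0 else (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```

On an imbalanced toy matrix such as [[90, 10], [5, 5]], OA is about 0.864 while AA is 0.70 and κ is about 0.327, which illustrates why OA alone can overstate performance on class-imbalanced scenes such as Indian Pines.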

Contributions

Introduced a novel hybrid deep learning architecture that synergistically combines 3D CNNs for joint spatial-spectral feature extraction with LSTMs for capturing sequential dependencies across spectral bands.
Integrated residual connections to effectively mitigate the vanishing gradient problem, enhancing training stability and enabling deeper network architectures.
Incorporated a soft spatial attention mechanism to dynamically emphasize discriminative spatial regions and reduce the impact of redundant features, leading to improved classification accuracy.
Achieved state-of-the-art classification performance on benchmark hyperspectral datasets (Indian Pines and Salinas), demonstrating superior accuracy compared to existing leading methods.
Provided comprehensive ablation studies that quantitatively validated the individual contributions of residual connections and the soft spatial attention mechanism to the model's overall efficacy.

Funding

Not applicable.

Citation

@article{Ali2025hybrid,
  author = {Ali, Mohamed Sultan Mohamed and Islam, Md. Sakib Bin and Majid, Molla Ehsanul and Kashem, Saad Bin Abul and Khandakar, Amith and Chowdhury, Muhammad E. H.},
  title = {A hybrid 3D CNN-LSTM model with soft spatial attention mechanism for accurate hyperspectral image classification},
  journal = {Remote Sensing Applications: Society and Environment},
  year = {2025},
  doi = {10.1016/j.rsase.2025.101779},
  url = {https://doi.org/10.1016/j.rsase.2025.101779}
}

Original Source: https://doi.org/10.1016/j.rsase.2025.101779