Zheng et al. (2025) Attention mechanism-based multi-scale spatiotemporal fusion for precipitation nowcasting
Identification
- Journal: Theoretical and Applied Climatology
- Year: 2025
- Date: 2025-10-09
- Authors: Xiangming Zheng, Huawang Qin, C.Q. Yin, Weixi Wang, Piao Shi, Yawen Zhu, F. Hu
- DOI: 10.1007/s00704-025-05816-1
Research Groups
- School of Electronic and Information Engineering, Nanjing University of Information Science & Technology, Nanjing, Jiangsu, China
- School of Automation, Nanjing University of Information Science & Technology, Nanjing, Jiangsu, China
- School of Computer and Information, Hefei University of Technology, Hefei, Anhui, China
Short Summary
This study proposes STAt-Former, a novel deep learning model integrating multi-scale spatiotemporal channel attention and Transformer architecture, to enhance precipitation nowcasting accuracy by effectively capturing both local and long-range spatial dependencies from radar echo images. The model demonstrates superior performance over baseline methods for forecasts up to 120 minutes using a Netherlands radar dataset.
Objective
- To develop a novel precipitation nowcasting model that integrates multi-scale spatiotemporal channel attention and Transformer architecture to overcome the limitations of traditional convolutional neural networks (CNNs) in capturing long-range spatial dependencies.
- To achieve high-resolution precipitation forecasts with precise intensity estimates for future time intervals of 30, 60, 90, and 120 minutes.
Study Configuration
- Spatial Scale: Radar images covering the Netherlands and surrounding regions, with each pixel corresponding to 1 square kilometer. Input images are 288x288 pixels.
- Temporal Scale: Input consists of 12 consecutive radar images (60 minutes total, at 5-minute intervals). Forecasts are generated for lead times of 30, 60, 90, and 120 minutes.
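The windowing arithmetic implied by this setup (5-minute frames, a 12-frame input history, leads of 30 to 120 minutes) can be sketched as follows; the function name and structure are illustrative, not taken from the paper's code:

```python
# Illustrative sketch of the input/output windowing described above:
# radar frames arrive every 5 minutes, the model sees 12 frames (60 min),
# and forecasts target 30/60/90/120-minute lead times.

FRAME_INTERVAL_MIN = 5
INPUT_FRAMES = 12  # 12 x 5 min = 60 minutes of history

def lead_to_frame_offset(lead_min: int) -> int:
    """Convert a forecast lead time (minutes) into a frame offset
    counted from the last input frame."""
    if lead_min % FRAME_INTERVAL_MIN != 0:
        raise ValueError("lead time must be a multiple of the frame interval")
    return lead_min // FRAME_INTERVAL_MIN

# The four lead times evaluated in the paper:
offsets = {lead: lead_to_frame_offset(lead) for lead in (30, 60, 90, 120)}
```

So a 30-minute forecast targets the frame 6 steps past the input window, and a 120-minute forecast the frame 24 steps past it.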
Methodology and Data
- Models used:
  - Proposed Model: STAt-Former, a dual-encoder network integrating:
    - a Local Feature Extraction Block (LFEB) using Depthwise Separable Convolution (DSC) and the Convolutional Block Attention Module (CBAM);
    - a Global Feature Extraction Block (GFEB) using Swin-Transformer and non-local attention mechanisms;
    - a Spatiotemporal Feature Fusion Block (STFB) combining a feature fusion component with a ConvLSTM block;
    - a residual network for performance enhancement.
  - Baseline Models for Comparison: SmaAt-UNet, SAR-UNet, and the Persistence method.
- Data sources: Publicly available radar precipitation dataset from the Royal Netherlands Meteorological Institute (KNMI), collected from two C-band Doppler weather radar stations from 2016 to 2019 (approximately 420,000 rainfall maps). The dataset was filtered into two subsets: NL-20 (images with at least 20% rainy pixels) and NL-50 (images with at least 50% rainy pixels).
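To make the LFEB's efficiency argument concrete, here is a minimal numpy sketch of a Depthwise Separable Convolution (stride 1, "same" zero padding): one spatial kernel per channel, followed by a 1x1 pointwise mixing. This is a generic illustration of DSC, not the paper's implementation, and omits CBAM, bias terms, and activations:

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise separable convolution sketch.

    x          : input feature map, shape (C, H, W)
    dw_kernels : one k x k spatial kernel per channel, shape (C, k, k)
    pw_weights : 1x1 pointwise channel mixing, shape (C_out, C)
    """
    C, H, W = x.shape
    k = dw_kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))

    # Depthwise stage: each channel is filtered independently.
    dw = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                dw[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * dw_kernels[c])

    # Pointwise stage: a 1x1 convolution mixes channels.
    return np.einsum('oc,chw->ohw', pw_weights, dw)
```

The factorization is what makes DSC cheap: a standard convolution needs C_out * C * k^2 weights, while the depthwise plus pointwise pair needs only C * k^2 + C_out * C.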
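The NL-20/NL-50 split amounts to thresholding each map by its fraction of rainy pixels. A minimal sketch of that filtering, with an assumed rain threshold of zero (the paper's exact pixel criterion is not restated here):

```python
import numpy as np

def rainy_fraction(radar_map, threshold=0.0):
    """Fraction of pixels whose value exceeds the rain threshold
    (threshold=0.0 is an assumption for illustration)."""
    return float(np.mean(radar_map > threshold))

def filter_dataset(maps, min_fraction):
    """Keep only maps whose rainy-pixel fraction meets the cutoff,
    mirroring the NL-20 (>= 20%) and NL-50 (>= 50%) subsets."""
    return [m for m in maps if rainy_fraction(m) >= min_fraction]
```

Applying `filter_dataset(maps, 0.50)` thus yields a smaller, rainier subset than `filter_dataset(maps, 0.20)`, which is why NL-50 is the harder, more precipitation-dense benchmark.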
Main Results
- STAt-Former successfully generated precipitation forecasts for 30, 60, 90, and 120-minute lead times.
- On the NL-50 dataset, STAt-Former consistently outperformed baseline models (Persistence, SmaAt-UNet, SAR-UNet) across different time scales in terms of Mean Squared Error (MSE), Precision, and Accuracy.
- Visualizations demonstrated STAt-Former's superior ability to accurately predict precipitation distribution, texture details, and the location and intensity of heavy rain masses compared to SmaAt-UNet and SAR-UNet.
- The model exhibited good generalization ability on the NL-20 dataset, particularly for 30 and 60-minute forecasts, showing significant reductions in MSE (e.g., 55.5% for 30 minutes) compared to baselines.
- Ablation experiments confirmed that the integration of both local (LFEB) and global (GFEB) feature extraction pathways, along with the Spatiotemporal Feature Fusion Block (STFB), significantly contributes to the model's enhanced performance.
- Model training analysis showed that a batch size of 6 and an initial learning rate of 0.001 yielded optimal performance, and smaller convolution kernels were more suitable for precipitation forecasting.
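The three evaluation metrics above can be sketched as follows: MSE is computed on raw intensities, while precision and accuracy require binarizing predicted and observed maps into rain/no-rain masks. The binarization threshold here is an assumption for illustration; the paper's exact threshold is not restated:

```python
import numpy as np

def nowcast_metrics(pred, truth, rain_threshold=0.5):
    """MSE on raw intensities, plus precision and accuracy on
    binarized rain masks (rain_threshold is illustrative)."""
    mse = float(np.mean((pred - truth) ** 2))
    p = pred >= rain_threshold   # predicted rain mask
    t = truth >= rain_threshold  # observed rain mask
    tp = np.sum(p & t)           # predicted rain, observed rain
    fp = np.sum(p & ~t)          # predicted rain, observed dry
    precision = float(tp / (tp + fp)) if (tp + fp) else 0.0
    accuracy = float(np.mean(p == t))
    return mse, precision, accuracy
```

A reported MSE reduction such as the 55.5% figure above is then simply `(mse_baseline - mse_model) / mse_baseline`.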
Contributions
- Proposed a novel precipitation prediction method that integrates local and global features, based on a UNet architecture incorporating a Swin-Transformer network to expand the CNN's receptive field and capture long-range dependencies within radar echo sequences.
- Designed the STAt-Former model, which integrates multiple attention mechanisms for interaction between local and global features and employs a ConvLSTM module to capture temporal dynamics, facilitating hierarchical feature extraction and fine-grained spatiotemporal dependency capture.
- Constructed a network architecture capable of adaptively fusing both local and global features from radar precipitation images for multi-step ahead precipitation forecasting (30, 60, 90, and 120 minutes), validated on an open-access precipitation dataset from the Netherlands.
Funding
- NSFC-CMA Joint Research Grant U2342222
Citation
@article{Zheng2025Attention,
  author  = {Zheng, Xiangming and Qin, Huawang and Yin, C.Q. and Wang, Weixi and Shi, Piao and Zhu, Yawen and Hu, F.},
  title   = {Attention mechanism-based multi-scale spatiotemporal fusion for precipitation nowcasting},
  journal = {Theoretical and Applied Climatology},
  year    = {2025},
  doi     = {10.1007/s00704-025-05816-1},
  url     = {https://doi.org/10.1007/s00704-025-05816-1}
}
Original Source: https://doi.org/10.1007/s00704-025-05816-1