Zhang et al. (2026) Attention-driven and multi-scale feature integrated approach for earth surface temperature data reconstruction

Identification

Journal: Geoscientific Model Development
Year: 2026
Date: 2026-01-06
Authors: Minghui Zhang, Yunjie Chen, Fan Yang, Zhengkun Qin
DOI: 10.5194/gmd-19-73-2026

Research Groups

School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China
School of Atmospheric Sciences, Nanjing University of Information Science and Technology, Nanjing 210044, China

Short Summary

This paper introduces ESTD-Net, a deep learning model for Earth Surface Temperature (EST) data inpainting. It reconstructs missing data in two stages: an initial reconstruction using enhanced multi-head context attention, followed by refinement with a convolutional U-Net. The model outperforms the compared methods (IDW, PConv U-Net, Palette, MAT) in both pixel-level accuracy and perceptual quality.

Objective

To develop a robust deep learning model (ESTD-Net) for high-resolution Earth Surface Temperature (EST) data reconstruction, specifically addressing extensive missing data caused by satellite orbital gaps, cloud cover, and sensor errors.

Study Configuration

Spatial Scale: Global coverage, with a gridded resolution of 0.5° × 0.5°. Specific analyses focused on 256 × 256 pixel regions, and case studies included the eastern coastline of North America (25° N to 60° N, 279° E to 300° E) and the eastern Pacific (10° N to 10° S, 282° E to 312° E).
Temporal Scale: Full calendar year of 2023, with twice-daily global gridded surface temperature products centered at 06:00 UTC and 18:00 UTC.

Methodology and Data

Models used:
- Proposed: ESTD-Net (two-stage architecture: convolutional module and masked transformer module for initial reconstruction, followed by a convolutional U-Net for autoregressive refinement).
- Comparison: Inverse Distance Weighting (IDW), Partial Convolutions U-Net (PConv U-Net), Palette (diffusion-based), MAT (Mask-Aware Transformer).
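For orientation on the simplest comparison method, the following is a minimal sketch of Inverse Distance Weighting on a gridded field. It is an illustration under stated assumptions, not the configuration used in the paper: the function name `idw_fill`, the `power=2.0` exponent, the use of all valid pixels as neighbours, and distances measured in pixel indices (rather than great-circle distance) are all hypothetical choices.

```python
import numpy as np

def idw_fill(field, power=2.0):
    """Fill NaN pixels with an inverse-distance-weighted mean of all valid pixels.

    Assumptions (not from the paper): distance is Euclidean in pixel-index
    space, every valid pixel contributes, and the exponent defaults to 2.
    """
    field = np.asarray(field, dtype=float)
    valid = ~np.isnan(field)
    if not valid.any():
        raise ValueError("field contains no valid pixels")
    rows, cols = np.indices(field.shape)
    known_r = rows[valid].astype(float)
    known_c = cols[valid].astype(float)
    known_val = field[valid]
    filled = field.copy()
    for r, c in zip(rows[~valid], cols[~valid]):
        dist = np.hypot(known_r - r, known_c - c)
        # dist is strictly positive because (r, c) is never a known pixel
        weights = 1.0 / dist ** power
        filled[r, c] = np.sum(weights * known_val) / np.sum(weights)
    return filled

# Toy example with hypothetical values: the gap sits midway between 1 and 3.
demo = np.array([[1.0, np.nan, 3.0]])
filled = idw_fill(demo)
```

Because each filled value is a weighted average of its neighbours, a scheme like this cannot create new extrema, which is one plausible reason for it to blur sharp gradients such as land-sea boundaries.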
Data sources:
- Primary: Microwave Radiation Imager (MWRI) aboard the FengYun-3D (FY-3D) satellite.
- Reference/Ground Truth: ERA5 reanalysis surface temperature data (hourly, interpolated to 0.5° × 0.5°).
- Benchmark Dataset: Simulated "MWRI-like" inputs with gaps created by applying actual MWRI coverage masks to full ERA5 fields.
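The benchmark construction described above (applying a coverage mask to a complete field) can be sketched as follows. This is a minimal illustration, assuming the mask is a boolean array with True where the sensor observed the pixel and that gaps are encoded as NaN; the function name `apply_coverage_mask` and the toy values are hypothetical and do not come from the paper or from real ERA5/MWRI files.

```python
import numpy as np

def apply_coverage_mask(full_field, observed):
    """Return a copy of full_field with unobserved pixels set to NaN.

    full_field: 2-D array of temperatures (stand-in for a complete ERA5 field).
    observed:   2-D boolean array, True where the swath covers the pixel
                (stand-in for an actual MWRI coverage mask).
    """
    full_field = np.asarray(full_field, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    if full_field.shape != observed.shape:
        raise ValueError("field and mask must have the same shape")
    return np.where(observed, full_field, np.nan)

# Toy 4x4 example with hypothetical values (kelvin-like numbers, not real data).
field = np.arange(16, dtype=float).reshape(4, 4) + 280.0
mask = np.ones((4, 4), dtype=bool)
mask[:, 1:3] = False                     # simulate an orbital gap two columns wide
gappy = apply_coverage_mask(field, mask)
missing_rate = np.isnan(gappy).mean()    # fraction of pixels removed by the mask
```

Since the complete field is kept alongside the masked input, the reconstruction error can be measured inside the gaps, which is not possible with real observations alone.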

Main Results

ESTD-Net outperformed IDW, PConv U-Net, Palette, and MAT, with the lowest Mean Absolute Error (MAE, 0.0522) and Root Mean Square Error (RMSE, 0.2000) and the highest Peak Signal-to-Noise Ratio (PSNR, 56.9911) and Structural Similarity Index (SSIM, 0.9985).
Adding the Weighted Reconstruction Loss reduced MAE by 17.3% and RMSE by 18.1% compared to a baseline model (Ladv + LP).
Gradient Consistency Regularization further improved accuracy, leading to overall reductions of 22.0% in MAE and 24.7% in RMSE relative to the baseline, enhancing physical consistency and preserving temperature gradients.
Ablation studies confirmed the critical contributions of the Context Attention Module and the Stage II Conv-U-Net for reconstruction quality, with their removal leading to substantial performance degradation.
Qualitatively, ESTD-Net demonstrated superior structural continuity, smoother boundaries, and accurate reconstruction of internal spatial patterns, particularly in high-gradient regions like land-sea interfaces.
The model exhibited strong temporal stability, consistently aligning with ground-truth temperatures throughout 2023, even during an El Niño event and under varying missing-data rates.
ESTD-Net showed improved training efficiency and lower model complexity (95.8 million parameters) compared to a conventional ViT-baseline (102 million parameters).
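The pixel-level metrics reported above follow standard definitions, sketched here for reference. This summary does not state the units, normalization, or dynamic range used in the paper, so the `data_range` argument of `psnr` and the helper `relative_reduction` are assumptions for illustration. SSIM is omitted because it needs a windowed implementation (for example `skimage.metrics.structural_similarity` in scikit-image).

```python
import numpy as np

def mae(pred, truth):
    """Mean absolute error over all pixels."""
    return float(np.mean(np.abs(pred - truth)))

def rmse(pred, truth):
    """Root mean square error over all pixels."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def psnr(pred, truth, data_range):
    """Peak signal-to-noise ratio in dB; data_range is an assumed dynamic range."""
    mse = np.mean((pred - truth) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

def relative_reduction(baseline, improved):
    """Percentage by which an error metric drops relative to a baseline."""
    return 100.0 * (baseline - improved) / baseline

# Toy example with hypothetical values: one pixel out of four is off by 2.
truth = np.zeros((2, 2))
pred = np.array([[0.0, 0.0], [0.0, 2.0]])
scores = (mae(pred, truth), rmse(pred, truth), psnr(pred, truth, data_range=10.0))
```

In the toy case the MAE is 0.5, the RMSE is 1.0, and the PSNR is 20 dB, showing how RMSE penalizes a single large error more heavily than MAE does.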

Contributions

Proposed a gradient consistency regularization framework that enforces physical consistency in inpainted regions by minimizing the L1-norm of gradient discrepancies.
Designed an adaptive weighted reconstruction loss that dynamically prioritizes missing regions during optimization, improving data recovery precision.
Developed a boundary-aware transformer module with reinforced attention mechanisms for edge preservation and subpixel-level accuracy in transition zones.
Integrated a lightweight CNN-based U-Net for autoregressive refinement to enhance local texture continuity and suppress local artifacts.
Curated a temporally diagnostic dataset of surface temperatures at 06:00 UTC and 18:00 UTC, providing critical baselines for studying climate dynamics.
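The two loss terms named above can be illustrated with a simplified NumPy sketch. It is not the paper's formulation: the paper describes an adaptive weighting, whereas this sketch uses fixed weights (`w_missing`, `w_valid`, both hypothetical), and the gradient term uses plain forward differences, which is only one possible discretization. The function names are also assumptions.

```python
import numpy as np

def weighted_l1(pred, truth, missing, w_missing=5.0, w_valid=1.0):
    """Weighted mean absolute error that emphasises originally missing pixels.

    missing: boolean array, True where the input had no observation.
    The fixed weights are illustrative stand-ins for an adaptive scheme.
    """
    weights = np.where(missing, w_missing, w_valid)
    return float(np.sum(weights * np.abs(pred - truth)) / np.sum(weights))

def gradient_l1(pred, truth):
    """Mean L1 discrepancy between forward-difference gradients of pred and truth."""
    dy = np.abs(np.diff(pred, axis=0) - np.diff(truth, axis=0))
    dx = np.abs(np.diff(pred, axis=1) - np.diff(truth, axis=1))
    return float(dy.mean() + dx.mean())

# Toy example with hypothetical values: a single spurious spike in the gap.
truth = np.zeros((3, 3))
pred = truth.copy()
pred[1, 1] = 1.0
missing = np.zeros((3, 3), dtype=bool)
missing[1, 1] = True

recon_term = weighted_l1(pred, truth, missing)   # 5/13: the spike is up-weighted
grad_term = gradient_l1(pred, truth)             # 2/3: the spike breaks local gradients
offset_term = gradient_l1(truth + 1.0, truth)    # 0: a uniform bias has no gradient error
```

The last line shows why the two terms are complementary: a gradient penalty alone ignores a uniform offset, so it has to be paired with a reconstruction term that constrains absolute values.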

Funding

National Social Science Foundation of China (grant no. 42375004).

Citation

@article{Zhang2026Attentiondriven,
  author = {Zhang, Minghui and Chen, Yunjie and Yang, Fan and Qin, Zhengkun},
  title = {Attention-driven and multi-scale feature integrated approach for earth surface temperature data reconstruction},
  journal = {Geoscientific Model Development},
  year = {2026},
  doi = {10.5194/gmd-19-73-2026},
  url = {https://doi.org/10.5194/gmd-19-73-2026}
}

Original Source: https://doi.org/10.5194/gmd-19-73-2026