Zhang et al. (2026) Attention-driven and multi-scale feature integrated approach for earth surface temperature data reconstruction
Identification
- Journal: Geoscientific Model Development
- Year: 2026
- Date: 2026-01-06
- Authors: Minghui Zhang, Yunjie Chen, Fan Yang, Zhengkun Qin
- DOI: 10.5194/gmd-19-73-2026
Research Groups
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China
- School of Atmospheric Sciences, Nanjing University of Information Science and Technology, Nanjing 210044, China
Short Summary
This paper introduces ESTD-Net, a novel deep learning model for Earth Surface Temperature (EST) data inpainting, which effectively reconstructs missing data by integrating enhanced multi-head context attention and a convolutional U-Net for refinement. The model significantly outperforms existing methods in both pixel-level accuracy and perceptual quality, offering a robust solution for restoring EST data.
Objective
- To develop a robust deep learning model (ESTD-Net) for high-resolution Earth Surface Temperature (EST) data reconstruction, specifically addressing extensive missing data caused by satellite orbital gaps, cloud cover, and sensor errors.
Study Configuration
- Spatial Scale: Global coverage, with a gridded resolution of 0.5° × 0.5°. Specific analyses focused on 256 × 256 pixel regions, and case studies included the eastern coastline of North America (25° N to 60° N, 279° E to 300° E) and the eastern Pacific (10° N to 10° S, 282° E to 312° E).
- Temporal Scale: Full calendar year of 2023, with twice-daily global gridded surface temperature products centered at 06:00 UTC and 18:00 UTC.
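The sketch below shows how the two case-study boxes map onto array index ranges of the 0.5° grid. It assumes a hypothetical global layout with 361 latitude rows running from 90° N to 90° S and 720 longitude columns running eastward from 0° E, similar to ERA5's ordering. The helper names (lat_to_row, lon_to_col, box_slices, box_shape) are illustrative and do not come from the paper.

```python
RES = 0.5  # grid spacing in degrees, as stated in the summary

# Assumed global grid: rows from 90° N to 90° S inclusive, columns from 0° E to 359.5° E.
GRID_SHAPE = (int(180 / RES) + 1, int(360 / RES))  # (361, 720)


def lat_to_row(lat: float, res: float = RES) -> int:
    """Row index for a latitude, counting southward from 90° N (assumed ordering)."""
    return round((90.0 - lat) / res)


def lon_to_col(lon_east: float, res: float = RES) -> int:
    """Column index for a longitude in degrees east, counting from 0° E (assumed ordering)."""
    return round((lon_east % 360.0) / res)


def box_slices(lat_north: float, lat_south: float, lon_west: float, lon_east: float,
               res: float = RES) -> tuple[slice, slice]:
    """Row and column slices that include both edges of a lat/lon box."""
    rows = slice(lat_to_row(lat_north, res), lat_to_row(lat_south, res) + 1)
    cols = slice(lon_to_col(lon_west, res), lon_to_col(lon_east, res) + 1)
    return rows, cols


def box_shape(rows: slice, cols: slice) -> tuple[int, int]:
    """Number of grid cells (rows, columns) covered by the slices."""
    return rows.stop - rows.start, cols.stop - cols.start


# Case-study boxes from the summary.
na_rows, na_cols = box_slices(60, 25, 279, 300)   # eastern coastline of North America
ep_rows, ep_cols = box_slices(10, -10, 282, 312)  # eastern Pacific box
print(box_shape(na_rows, na_cols))  # (71, 43)
print(box_shape(ep_rows, ep_cols))  # (41, 61)
```

If the actual product stores latitude from south to north, or longitude on a -180° to 180° range, you only need to change lat_to_row and lon_to_col.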
Methodology and Data
- Models used:
  - Proposed: ESTD-Net, a two-stage architecture. In Stage I, a convolutional module and a masked transformer module produce an initial reconstruction. In Stage II, a convolutional U-Net refines it autoregressively.
  - Comparison: Inverse Distance Weighting (IDW), Partial Convolutions U-Net (PConv U-Net), Palette (diffusion-based), and MAT (Mask-Aware Transformer).
- Data sources:
  - Primary: Microwave Radiation Imager (MWRI) aboard the FengYun-3D (FY-3D) satellite.
  - Reference/ground truth: ERA5 reanalysis surface temperature data (hourly, interpolated to 0.5° × 0.5°).
  - Benchmark dataset: simulated "MWRI-like" inputs, whose gaps are created by applying actual MWRI coverage masks to complete ERA5 fields.
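The benchmark construction and the simplest comparison method can be illustrated together. The sketch below masks a complete field with a coverage mask (1 = observed, 0 = gap), mimicking how MWRI-like inputs are derived from ERA5. It then fills the gaps with Inverse Distance Weighting. This is a toy under stated assumptions:

- grids are plain nested lists;
- distances are measured in grid cells rather than great-circle distance;
- every observed cell contributes (no search radius);
- the power of 2 is a common default, not a value taken from the paper.

The function names apply_coverage_mask and idw_fill are hypothetical.

```python
import math
from typing import List, Optional

Field = List[List[float]]
MaskedField = List[List[Optional[float]]]


def apply_coverage_mask(field: Field, mask: List[List[int]]) -> MaskedField:
    """Copy `field`, replacing cells where mask == 0 (no satellite coverage) with None."""
    return [[value if keep else None for value, keep in zip(frow, mrow)]
            for frow, mrow in zip(field, mask)]


def idw_fill(grid: MaskedField, power: float = 2.0) -> Field:
    """Fill None cells with an inverse-distance-weighted mean of all observed cells."""
    observed = [(i, j, v) for i, row in enumerate(grid)
                for j, v in enumerate(row) if v is not None]
    if not observed:
        raise ValueError("no observed cells to interpolate from")
    filled = []
    for i, row in enumerate(grid):
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)  # observed cells are kept unchanged
                continue
            num = den = 0.0
            for oi, oj, ov in observed:
                d = math.hypot(i - oi, j - oj)  # > 0, since (i, j) itself is a gap
                w = 1.0 / d ** power            # weight decays with distance
                num += w * ov
                den += w
            new_row.append(num / den)
        filled.append(new_row)
    return filled


# Toy example: one gap in a 2 x 3 field of temperatures (K).
truth = [[280.0, 281.0, 282.0],
         [283.0, 284.0, 285.0]]
coverage = [[1, 0, 1],
            [1, 1, 1]]
masked = apply_coverage_mask(truth, coverage)
print(masked[0])                          # [280.0, None, 282.0]
print(round(idw_fill(masked)[0][1], 3))   # 282.5
```

The filled value (282.5 K) differs from the true 281.0 K because IDW only averages nearby observations. Such errors are largest across sharp gradients like land-sea boundaries, which is where the learned methods in the study are compared.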
Main Results
- ESTD-Net outperformed IDW, PConv U-Net, Palette, and MAT on all four metrics. It achieved the lowest Mean Absolute Error (MAE, 0.0522) and Root Mean Square Error (RMSE, 0.2000), and the highest Peak Signal-to-Noise Ratio (PSNR, 56.9911) and Structural Similarity Index (SSIM, 0.9985).
- Adding the weighted reconstruction loss reduced MAE by 17.3% and RMSE by 18.1% relative to a baseline model trained with L_adv + L_P.
- Adding gradient consistency regularization improved accuracy further. It brought the overall reductions relative to the same baseline to 22.0% in MAE and 24.7% in RMSE, while enhancing physical consistency and preserving temperature gradients.
- Ablation studies confirmed the critical contributions of the Context Attention Module and the Stage II Conv-U-Net for reconstruction quality, with their removal leading to substantial performance degradation.
- Qualitatively, ESTD-Net demonstrated superior structural continuity, smoother boundaries, and accurate reconstruction of internal spatial patterns, particularly in high-gradient regions like land-sea interfaces.
- The model exhibited strong temporal stability. It tracked ground-truth temperatures consistently throughout 2023, including during an El Niño event and across varying missing-data rates.
- ESTD-Net trained more efficiently and had fewer parameters (95.8 million) than a conventional ViT baseline (102 million).
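The pixel-level metrics in the results can be computed as below. This is a minimal sketch with three assumptions:

- grids are nested lists;
- an optional mask (1 = pixel to score) can restrict scoring to reconstructed pixels, since the summary does not say whether the reported scores cover gaps only or whole fields;
- PSNR uses a user-supplied data_range, defined here as 20·log10(data_range / RMSE) in dB.

SSIM is left out because it requires a windowed implementation. The relative_reduction helper expresses the percentage changes quoted in the ablation results. All function names are illustrative.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[float]]
Mask = Optional[List[List[int]]]  # 1 = score this pixel (e.g. a gap that was filled)


def _pairs(pred: Grid, truth: Grid, mask: Mask = None) -> List[Tuple[float, float]]:
    """(prediction, truth) pairs, restricted to cells where mask is nonzero if a mask is given."""
    pairs = [(p, t)
             for i, (prow, trow) in enumerate(zip(pred, truth))
             for j, (p, t) in enumerate(zip(prow, trow))
             if mask is None or mask[i][j]]
    if not pairs:
        raise ValueError("no pixels selected for scoring")
    return pairs


def mae(pred: Grid, truth: Grid, mask: Mask = None) -> float:
    """Mean absolute error over the selected pixels."""
    pairs = _pairs(pred, truth, mask)
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


def rmse(pred: Grid, truth: Grid, mask: Mask = None) -> float:
    """Root mean square error over the selected pixels."""
    pairs = _pairs(pred, truth, mask)
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


def psnr(pred: Grid, truth: Grid, data_range: float, mask: Mask = None) -> float:
    """PSNR in dB, 20 * log10(data_range / RMSE); data_range must be chosen by the user."""
    err = rmse(pred, truth, mask)
    return math.inf if err == 0 else 20.0 * math.log10(data_range / err)


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of an error metric relative to a baseline value."""
    return (baseline - new) / baseline


pred = [[1.0, 2.0], [3.0, 4.0]]
truth = [[1.0, 2.0], [3.0, 6.0]]
print(mae(pred, truth), rmse(pred, truth))  # 0.5 1.0
print(psnr(pred, truth, data_range=10.0))   # 20.0
```

The same relative-change arithmetic can be applied to the reported figures. Assume the 17.3% and 22.0% MAE reductions are both measured against the same baseline. Then the gradient consistency term on its own lowers MAE by about 5.7% (1 - 0.780/0.827) relative to the model that already includes the weighted reconstruction loss. For RMSE, the corresponding figure is about 8.1% (1 - 0.753/0.819). Likewise, 95.8 million versus 102 million parameters is roughly a 6% reduction.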
Contributions
- Proposed a gradient consistency regularization framework that enforces physical consistency in inpainted regions by minimizing the L1-norm of gradient discrepancies.
- Designed an adaptive weighted reconstruction loss that dynamically prioritizes missing regions during optimization, improving data recovery precision.
- Developed a boundary-aware transformer module with reinforced attention mechanisms for edge preservation and subpixel-level accuracy in transition zones.
- Integrated a lightweight CNN-based U-Net for autoregressive refinement to enhance local texture continuity and suppress local artifacts.
- Curated a temporally diagnostic dataset of surface temperatures at 06:00 UTC and 18:00 UTC, providing critical baselines for studying climate dynamics.
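The two loss terms named in the contributions can be sketched as follows. The weighted reconstruction loss is written as an L1 error with a larger weight on gap pixels. The gradient consistency term is the mean L1 difference between forward-difference gradients of the prediction and the reference. Both describe form only: the summary does not give the paper's adaptive weighting scheme, gradient operator, or loss coefficients. The values of w_hole, w_valid, and lam_grad are therefore hypothetical, and the adversarial and other loss terms are omitted.

```python
from typing import List, Tuple

Grid = List[List[float]]


def weighted_reconstruction_loss(pred: Grid, truth: Grid, mask: List[List[int]],
                                 w_hole: float = 6.0, w_valid: float = 1.0) -> float:
    """Mean per-pixel weighted L1 error; mask == 1 marks observed pixels, 0 marks gaps.
    Fixed (hypothetical) weights stand in for the paper's adaptive scheme."""
    total, n = 0.0, 0
    for prow, trow, mrow in zip(pred, truth, mask):
        for p, t, m in zip(prow, trow, mrow):
            total += (w_valid if m else w_hole) * abs(p - t)  # gaps count more
            n += 1
    return total / n


def _forward_diffs(g: Grid) -> Tuple[Grid, Grid]:
    """Horizontal (along-row) and vertical (along-column) forward differences."""
    dx = [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in g]
    dy = [[g[i + 1][j] - g[i][j] for j in range(len(g[0]))] for i in range(len(g) - 1)]
    return dx, dy


def gradient_consistency_loss(pred: Grid, truth: Grid) -> float:
    """Mean L1 norm of the difference between predicted and reference gradients.
    Assumes a rectangular grid of at least 2 x 2 cells."""
    pdx, pdy = _forward_diffs(pred)
    tdx, tdy = _forward_diffs(truth)
    diffs = [abs(a - b) for ra, rb in zip(pdx + pdy, tdx + tdy) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def total_loss(pred: Grid, truth: Grid, mask: List[List[int]], lam_grad: float = 1.0) -> float:
    """Reconstruction + gradient terms only; lam_grad is a hypothetical coefficient."""
    return (weighted_reconstruction_loss(pred, truth, mask)
            + lam_grad * gradient_consistency_loss(pred, truth))


pred = [[1.0, 2.0], [3.0, 4.0]]
truth = [[1.0, 2.0], [3.0, 6.0]]
mask = [[1, 1], [1, 0]]  # bottom-right pixel is a gap
print(weighted_reconstruction_loss(pred, truth, mask))  # 3.0
print(gradient_consistency_loss(pred, truth))           # 1.0
```

A prediction offset from the truth by a constant has zero gradient loss. This is why a gradient term of this kind complements the pixel-wise L1 term rather than replacing it: one constrains absolute values, the other constrains spatial structure.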
Funding
- National Social Science Foundation of China (grant no. 42375004).
Citation
@article{Zhang2026Attentiondriven,
author = {Zhang, Minghui and Chen, Yunjie and Yang, Fan and Qin, Zhengkun},
title = {Attention-driven and multi-scale feature integrated approach for earth surface temperature data reconstruction},
journal = {Geoscientific Model Development},
year = {2026},
doi = {10.5194/gmd-19-73-2026},
url = {https://doi.org/10.5194/gmd-19-73-2026}
}
Original Source: https://doi.org/10.5194/gmd-19-73-2026