Yan et al. (2026) Comparative Evaluation of Multi-Source Geospatial Data and Machine Learning Models for Hourly Near-Surface Air Temperature Mapping
Identification
- Journal: Atmosphere
- Year: 2026
- Date: 2026-01-09
- Authors: Zexiang Yan, Y Chen, Ruoxue Li, Meiling Gao
- DOI: 10.3390/atmos17010071
Research Groups
- State Key Laboratory of Loess Science, Chang’an University, Xi’an, China
- College of Geological Engineering and Geomatics, Chang’an University, Xi’an, China
- Big Data Center for Geosciences and Satellites (BDCGS), Xi’an, China
- Key Laboratory of Ecological Geology and Disaster Prevention, Ministry of Natural Resources, Xi’an, China
Short Summary
This study systematically evaluates multi-source land surface temperature (LST) datasets and machine learning models for hourly near-surface air temperature (NSAT) mapping across two contrasting regions in Shaanxi, China. It finds that single-source LST inputs (MODIS in mountainous regions, CGLS in urban areas) outperform multi-source stacking, and the Geospatial-Temporal Neural Network Weighted Regression (GTNNWR) model consistently achieves the highest accuracy.
Objective
- To systematically evaluate the performance of four widely used land surface temperature (LST) datasets (MODIS, ERA5-Land, FY-2F, CGLS) and five machine learning models (RF, MDN, DNN, XGBoost, GTNNWR) for hourly near-surface air temperature (NSAT) estimation.
- To identify effective data-model configurations for generating high-resolution (1 km, hourly) NSAT products and provide methodological insights for climate and environmental applications in regions with complex terrain or strong urban heterogeneity.
Study Configuration
- Spatial Scale: Two representative regions in Shaanxi Province, China: a complex-terrain mountainous region in southwestern Shaanxi and the urban area of Xi’an. All data were preprocessed to a uniform spatial resolution of 1 km.
- Temporal Scale: Hourly resolution. The study period covered two windows: 05:00 UTC on 3 August to 11:00 UTC on 5 August, and 13:00 UTC on 7 August to 23:00 UTC on 10 August.
Methodology and Data
- Models used:
  - Random Forest (RF)
  - Mixture Density Network (MDN)
  - Deep Neural Network (DNN)
  - Extreme Gradient Boosting (XGBoost)
  - Geospatial-Temporal Neural Network Weighted Regression (GTNNWR)
- Data sources:
  - Land Surface Temperature (LST) Data:
    - MODIS LST (MOD11A1 daily product, interpolated to hourly).
    - ERA5-Land LST (hourly, 0.1° spatial resolution).
    - FY-2F LST (hourly, approximately 5 km spatial resolution).
    - CGLS LST (hourly, approximately 5 km spatial resolution, multi-satellite composite).
  - Auxiliary Variables:
    - Meteorological: Downward longwave radiation, downward shortwave radiation, 10 m wind speed, precipitation rate, specific humidity (from ERA5-Land).
    - Land Surface Features: Enhanced Vegetation Index (EVI) (MOD13A2, 1 km, 16-day, interpolated to daily), elevation and slope (SRTM digital elevation model), population density (GPWv4.11).
    - Geographical: Longitude, latitude.
    - Temporal: Year, month, day, hour.
  - Ground Truth Data: Hourly near-surface air temperature observations from China Meteorological Administration meteorological stations.
- Platform: Google Earth Engine (GEE) cloud platform and Python 3.9.
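The data-source list notes that the daily MOD11A1 LST was interpolated to hourly and the 16-day EVI to daily, but this summary does not name the interpolation method. As a minimal sketch, assuming simple linear interpolation in time (an assumption, not necessarily the authors' approach), the resampling step could look like the following. The function name `interpolate_linear`, the dates, and the sample values are all hypothetical.

```python
from datetime import datetime, timedelta


def interpolate_linear(samples, step):
    """Linearly interpolate (time, value) samples onto a regular time grid.

    samples: list of (datetime, float) pairs, sorted by time.
    step:    timedelta giving the spacing of the output grid.
    Returns a list of (datetime, float) pairs spanning first to last sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    out = []
    t = samples[0][0]
    i = 0  # index of the left-hand sample of the current interval
    while t <= samples[-1][0]:
        # Advance to the interval [t0, t1] that contains t.
        while i < len(samples) - 2 and t > samples[i + 1][0]:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        w = (t - t0) / (t1 - t0)  # fractional position inside the interval
        out.append((t, v0 + w * (v1 - v0)))
        t += step
    return out


# Hypothetical example: two daily values (degrees C) on arbitrary dates.
daily = [(datetime(2000, 1, 1), 20.0), (datetime(2000, 1, 2), 26.0)]
hourly = interpolate_linear(daily, timedelta(hours=1))
print(len(hourly), hourly[12])
```

Linear interpolation ignores the diurnal cycle of surface temperature, so a real workflow would likely need a more physically informed scheme; the sketch only illustrates the resampling mechanics. The same function would cover the 16-day-to-daily EVI case by passing `timedelta(days=1)` as the step.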
Main Results
- LST Data Source Performance: Single-source LST inputs consistently outperformed multi-source LST stacking. Multi-source combinations generally degraded model performance due to compounded systematic biases among heterogeneous datasets (pairwise RMSE between LST products ranged from 4.79 °C to 8.64 °C).
- Optimal LST Source by Region:
  - In the mountainous southwestern Shaanxi region, MODIS LST provided the best performance (RMSE: 0.9166 °C, MAE: 0.6700 °C, R²: 0.9579).
  - In urban Xi’an, CGLS LST performed best (RMSE: 0.7880 °C, MAE: 0.5756 °C, R²: 0.9748).
- Machine Learning Model Performance:
  - GTNNWR consistently achieved the highest accuracy in both study areas, significantly outperforming other models.
    - In southwestern Shaanxi (using MODIS LST), GTNNWR achieved RMSE: 0.3861 °C, MAE: 0.2477 °C, R²: 0.9927.
    - In Xi’an City (using CGLS LST), GTNNWR achieved RMSE: 0.3588 °C, MAE: 0.2032 °C, R²: 0.9949.
    - GTNNWR reduced RMSE by 44.8% and 44.2% relative to the second-best model (XGBoost) in the two study areas, respectively.
  - The overall ranking of models by predictive accuracy was: GTNNWR > XGBoost > DNN > RF > MDN.
  - All models achieved relatively high performance, with mean absolute errors for hourly air temperature predictions remaining below 1 °C and coefficients of determination exceeding 0.9.
- Spatial Generalization: GTNNWR demonstrated superior spatial generalization ability and stability, maintaining high estimation accuracy across regions with different terrain and climatic conditions.
- LST Data Source Impact on Models: MDN and RF models were more sensitive to LST data quality and spatial resolution, while XGBoost showed strong adaptability to variations in input features. Models in the urban region (Xi’an) exhibited higher sensitivity to data source selection compared to the mountainous region.
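The results above are reported as RMSE, MAE, R², pairwise RMSE between LST products, and relative RMSE reduction against a baseline model. A minimal sketch of how these quantities are conventionally computed is given below, assuming R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the summary does not spell out the paper's exact formulas. All function names and the toy numbers are hypothetical and are not values from the study.

```python
import math
from itertools import combinations


def rmse(obs, pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse_reduction_pct(rmse_model, rmse_baseline):
    """Percentage by which rmse_model is lower than rmse_baseline."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


def pairwise_rmse(products):
    """RMSE for every pair of co-located series in a {name: values} dict."""
    return {
        (a, b): rmse(products[a], products[b])
        for a, b in combinations(sorted(products), 2)
    }


# Toy station observations and predictions (degrees C), for illustration only.
obs = [20.0, 22.0, 24.0, 26.0]
pred = [20.5, 21.5, 24.5, 25.5]
print(rmse(obs, pred), mae(obs, pred), r2(obs, pred))

# Toy co-located LST series from three hypothetical products.
toy_products = {"A": [1.0, 2.0], "B": [2.0, 3.0], "C": [1.0, 2.0]}
print(pairwise_rmse(toy_products))
```

With these helpers, a statement such as "model X reduced RMSE by some percentage relative to model Y" corresponds to `rmse_reduction_pct(rmse_x, rmse_y)`, evaluated on the same validation samples for both models.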
Contributions
- Provides a systematic and comprehensive comparative evaluation of multi-source LST datasets and machine learning models for hourly near-surface air temperature estimation.
- Identifies the regional dependence of optimal LST data source selection, recommending MODIS for complex terrain and CGLS for urban environments.
- Demonstrates that combining multiple LST sources can degrade model performance due to conflicting systematic biases, advocating for the selection of a single high-quality regional source.
- Highlights GTNNWR as a superior model for hourly NSAT estimation, attributed to its ability to capture spatiotemporal non-stationarity, significantly improving accuracy over other state-of-the-art machine learning models.
- Offers practical guidance for selecting suitable LST sources and models, supporting the production and application of high spatiotemporal resolution temperature datasets.
Funding
- National Natural Science Foundation of China (No. 42471392)
- Fundamental and Interdisciplinary Disciplines Breakthrough Plan of the Ministry of Education of China (No. JYB2025XDXM10)
- Fundamental Research Funds for the Central Universities, CHD (No. 300102265203)
Citation
@article{Yan2026Comparative,
author = {Yan, Zexiang and Chen, Y and Li, Ruoxue and Gao, Meiling},
title = {Comparative Evaluation of Multi-Source Geospatial Data and Machine Learning Models for Hourly Near-Surface Air Temperature Mapping},
journal = {Atmosphere},
year = {2026},
doi = {10.3390/atmos17010071},
url = {https://doi.org/10.3390/atmos17010071}
}
Original Source: https://doi.org/10.3390/atmos17010071