Goudarzi et al. (2026) A machine learning-based backward extension of IMERG daily precipitation over the Greater Alpine Region
Identification
- Journal: Atmospheric Research
- Year: 2026
- Date: 2026-01-13
- Authors: Iman Goudarzi, Davide Fazzini, Claudia Pasquero, Agostino N. Meroni, Matteo Borgnino
- DOI: 10.1016/j.atmosres.2026.108763
Research Groups
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Department of Physics, University of Milano-Bicocca, Milan, Italy
Short Summary
This study develops a machine learning-based approach to extend the high-resolution IMERG satellite precipitation product backward in time over the Greater Alpine Region, using ERA5 reanalysis data as predictors. The resulting ML-IMEX-GAR dataset for 1960–2000 significantly reduces biases compared to ERA5 and outperforms dynamical downscaling, providing a valuable resource for climate and hydrological studies.
Objective
- To develop a machine learning-based approach to enhance ERA5 reanalysis precipitation estimates using the satellite-derived IMERG product as a reference.
- To create a new daily rainfall dataset, ML-IMEX-GAR, at IMERG’s spatial resolution (0.1°) for the historical period 1960–2000 over the Greater Alpine Region, closely mirroring IMERG’s spatial and temporal characteristics to enable improved detection of climate change signals.
Study Configuration
- Spatial Scale: Greater Alpine Region (GAR), covering longitudes 4.05° E to 18.95° E and latitudes 43.05° N to 48.95° N, at a 0.1° spatial resolution.
- Temporal Scale: Daily precipitation data for the period 1960–2000 (ML-IMEX-GAR dataset). The machine learning model was trained and validated using data from 2001–2020.
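The configuration above implies a fixed grid size and record length, which can be checked with a minimal sketch. It assumes the stated longitude and latitude bounds are grid-cell centres (the summary does not say whether they are centres or edges); the helper names `axis` and `n_days` are hypothetical.

```python
from datetime import date

# Work in hundredths of a degree so that the 0.1° steps stay exact.
STEP = 10  # 0.1° expressed in hundredths of a degree


def axis(start_hundredths, stop_hundredths, step=STEP):
    """Return coordinates in degrees from start to stop, both inclusive."""
    return [v / 100 for v in range(start_hundredths, stop_hundredths + 1, step)]


# Assumption: the bounds quoted in the summary are cell centres.
lons = axis(405, 1895)    # 4.05° E to 18.95° E
lats = axis(4305, 4895)   # 43.05° N to 48.95° N


def n_days(first_year, last_year):
    """Number of calendar days from 1 Jan first_year to 31 Dec last_year."""
    return (date(last_year + 1, 1, 1) - date(first_year, 1, 1)).days


n_recon_days = n_days(1960, 2000)   # reconstructed period (ML-IMEX-GAR)
n_train_days = n_days(2001, 2020)   # training and validation period

print(len(lons), len(lats), len(lons) * len(lats))
print(n_recon_days, n_train_days)
```

Under this assumption the grid has 150 × 60 cells, and the reconstructed period is roughly twice as long as the period available for training and validation.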
Methodology and Data
- Models used:
  - Extreme Gradient Boosting (XGBoost) for machine learning.
  - Shapley Additive Explanations (SHAP) for feature selection.
  - Weather Research and Forecasting (WRF) model (used for the CHAPTER dataset, which served as a comparison).
- Data sources:
  - Target/Reference: IMERG Final Run Version 07 (Integrated Multi-satellite Retrievals for GPM) daily precipitation (2001–2020).
  - Input/Predictors: ERA5 reanalysis variables (24 standard variables + 4 calculated variables: specific humidity at 2 m (q2m), bulk shear (bs), column-average mixing ratio (camr), water vapor transport (wvt)).
  - Topography: ETOPO 2022 v1 dataset.
  - Validation/Benchmarking:
    - HISTALP (Historical Instrumental Climatological Surface Time Series of the Greater Alpine Region) - monthly in-situ precipitation records.
    - EEAR-Clim (Extended European Alpine Region Climate) - daily in-situ precipitation records.
    - CHAPTER (Computational Hydrometeorology – with Advanced Processing Tools to Enhanced Realism) - high-resolution hindcast of ERA5 using the WRF model.
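Two of the four calculated predictors listed above, q2m and bs, can be illustrated with standard textbook formulas. The summary does not state how the authors computed them, so the sketch below is an assumption throughout: it derives specific humidity from 2 m dewpoint and surface pressure using the Bolton (1980) saturation vapour pressure approximation, and takes bulk shear as the magnitude of the wind vector difference between two levels whose choice is left to the caller. The function names are hypothetical. camr and wvt are omitted because their definitions are not given here.

```python
import math

EPSILON = 0.622  # ratio of the gas constants of dry air and water vapour


def saturation_vapor_pressure(t_kelvin):
    """Saturation vapour pressure in Pa (Bolton 1980 approximation)."""
    t_c = t_kelvin - 273.15
    return 611.2 * math.exp(17.67 * t_c / (t_c + 243.5))


def specific_humidity_2m(d2m_kelvin, sp_pa):
    """Specific humidity (kg/kg) from 2 m dewpoint and surface pressure.

    Assumed recipe, not necessarily the one used in the paper.
    """
    e = saturation_vapor_pressure(d2m_kelvin)  # actual vapour pressure
    return EPSILON * e / (sp_pa - (1.0 - EPSILON) * e)


def bulk_shear(u_low, v_low, u_high, v_high):
    """Magnitude (m/s) of the wind vector difference between two levels.

    Which two levels to use is an assumption left to the caller.
    """
    return math.hypot(u_high - u_low, v_high - v_low)


q = specific_humidity_2m(273.15, 100000.0)
shear = bulk_shear(0.0, 0.0, 3.0, 4.0)
print(q, shear)
```

Both functions operate on scalars for clarity; applied to gridded ERA5 fields, the same expressions would be evaluated cell by cell.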
Main Results
- The ML-IMEX-GAR dataset, a backward temporal extension of IMERG, was successfully generated for the Greater Alpine Region from 1960 to 2000 at a 0.1° daily resolution.
- SHAP analysis identified total column ice water (tciw), ERA5 precipitation (ppt-era5), and column-average mixing ratio (camr) as the three most influential predictors for fine-scale precipitation reconstruction.
- The ML-IMEX-GAR dataset reduced the spatiotemporal Root Mean Square Deviation (RMSD) against IMERG by approximately 14% compared to ERA5 (from 5.13 mm/d for ERA5 to 4.39 mm/d for ML-IMEX-GAR).
- When compared to in-situ monthly HISTALP data, ML-IMEX-GAR achieved a significantly higher coefficient of determination (R²) of 0.87 and lower RMSE than ERA5 (R²=0.44) and CHAPTER (R²=0.67).
- For daily comparisons with EEAR-Clim data, ML-IMEX-GAR showed the highest R² (0.74) and lowest RMSE, outperforming both original and downscaled ERA5 (R²=0.14 and 0.68, respectively) and CHAPTER (R²=0.73) in most elevation classes.
- Performance was strongest at low and mid-elevations. RMSD generally increased with precipitation intensity and elevation, but ML-IMEX-GAR still outperformed ERA5.
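The scores quoted above rest on two common metrics, and the 14% figure follows directly from the two RMSD values. The sketch below uses the usual definitions, which are assumed rather than taken from the paper: RMSD as the root of the mean squared difference, and R² as one minus the ratio of residual to total sum of squares (the paper may instead report a squared correlation coefficient). The function names are hypothetical.

```python
import math


def rmsd(pred, ref):
    """Root mean square deviation between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def r_squared(pred, obs):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# RMSD against IMERG reported in the summary: ERA5 5.13 mm/d, ML-IMEX-GAR 4.39 mm/d.
reduction = relative_reduction(5.13, 4.39)
print(round(reduction, 1))
```

The computed reduction of about 14.4% is consistent with the "approximately 14%" stated in the results.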
Contributions
- Development of a novel machine learning-based algorithm (ML-IMEX) for backward temporal extension of satellite-derived precipitation products, specifically IMERG.
- Creation of ML-IMEX-GAR, a unique high-resolution (0.1°), daily precipitation dataset for the Greater Alpine Region covering 1960–2000, which mirrors IMERG characteristics and significantly extends its temporal coverage.
- Demonstration of the effectiveness of machine learning (XGBoost with SHAP) in correcting reanalysis biases and improving historical precipitation reconstructions in complex terrains.
- Provision of a valuable, validated dataset for climate studies, hydrological modeling, and climate change research in data-scarce, complex orography regions.
- A computationally efficient alternative to dynamical downscaling that outperforms the more resource-intensive CHAPTER product in most validation metrics.
Funding
- MAXM project, funded by the Bicocca Starting Grant 2023 program of the University of Milano-Bicocca, Italy.
- Agostino Niyonkuru Meroni was partially supported by OGS and CINECA under the HPC-TRES program, award number 2023–04, CUP H45E23000410001.
- Matteo Borgnino was supported by the European Union - Next Generation EU, Mission 4 Component 1 CUP H53D23011300001, project LocCLIMA.
Citation
@article{Goudarzi2026machine,
author = {Goudarzi, Iman and Fazzini, Davide and Pasquero, Claudia and Meroni, Agostino N. and Borgnino, Matteo},
title = {A machine learning-based backward extension of IMERG daily precipitation over the Greater Alpine Region},
journal = {Atmospheric Research},
year = {2026},
doi = {10.1016/j.atmosres.2026.108763},
url = {https://doi.org/10.1016/j.atmosres.2026.108763}
}
Original Source: https://doi.org/10.1016/j.atmosres.2026.108763