Pinheiro et al. (2025) Enhancing machine learning-based seasonal precipitation forecasting using CMIP6 simulations
Identification
- Journal: Atmospheric Research
- Year: 2025
- Date: 2025-09-09
- Authors: Enzo Pinheiro, Taha B. M. J. Ouarda
- DOI: 10.1016/j.atmosres.2025.108463
Research Groups
- Institut National de la Recherche Scientifique, Centre Eau-Terre-Environnement, Québec, Canada
Short Summary
This study demonstrates that training machine learning (ML) models for seasonal precipitation forecasting on a larger number of individual simulations from CMIP6 models significantly enhances their generalization ability and improves forecasts over South America. These CMIP6-trained ML models consistently outperform those trained on limited reanalysis data (ERA5) and match or outperform most state-of-the-art dynamical models.
Objective
- To investigate the advantages and limitations of using CMIP6 data to train machine learning-based seasonal forecasting (MLSF) models for seasonal precipitation forecasting.
- To assess how the number of CMIP6 models used during training affects the ML model's generalization ability.
- To compare the performance of MLSF models trained on individual CMIP6 model outputs versus their ensemble mean.
- To quantify the added value of CMIP6 simulations by comparing their performance against an MLSF model trained with ERA5 data.
- To compare the performance of the CMIP6-based ML model with state-of-the-art dynamical models.
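The comparison between individual CMIP6 outputs and their ensemble mean comes down to how the training set is assembled. A minimal sketch on synthetic NumPy arrays, assuming one predictor vector per model and year; the array layout, the helper names `pooled_training_set` and `ensemble_mean_training_set`, and the choice of four predictors are hypothetical and not taken from the paper:

```python
import numpy as np

def pooled_training_set(sims):
    """Stack individual simulations along the sample axis.

    sims: (n_models, n_years, n_features), one predictor vector per
    (hypothetical) CMIP6 model and training year.
    Returns (n_models * n_years, n_features); model 0's years come first.
    """
    n_models, n_years, n_features = sims.shape
    return sims.reshape(n_models * n_years, n_features)

def ensemble_mean_training_set(sims):
    """Average across models, keeping one sample per year: (n_years, n_features)."""
    return sims.mean(axis=0)

# Synthetic stand-in data: 9 models, 131 years (1851-1981), 4 predictors.
rng = np.random.default_rng(0)
sims = rng.standard_normal((9, 131, 4))

pooled = pooled_training_set(sims)
ens_mean = ensemble_mean_training_set(sims)
print(pooled.shape)    # (1179, 4)
print(ens_mean.shape)  # (131, 4)
```

Pooling keeps 9 × 131 samples and each model's own internal variability, whereas averaging leaves one sample per year and damps variability that is uncorrelated across models; for independent unit-variance members, the variance of the mean drops to roughly 1/9.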
Study Configuration
- Spatial Scale: South America, with data bilinearly interpolated to a common 1° × 1° grid. Original data resolutions include 0.25° for ERA5 and 2° for ERSSTv5.
- Temporal Scale:
  - CMIP6 historical simulations: 1850–2014.
  - ERA5 reanalysis: 1940 to present.
  - ERSSTv5: 1854 to present.
  - Training period: 1851–1981.
  - Validation period: 1982–2002.
  - Test period: Bootstrapped years from 2003 to 2023.
  - Forecasts: Seasonal precipitation (3-month periods) at various initialization months (February, May, August, November) and lead times (up to three leads).
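The chronological training/validation/test split, with test years resampled by bootstrap, might be set up as below. This is a sketch only: the summary does not state how many bootstrap resamples were drawn or how they were used, so `n_resamples=1000` and the helper name `bootstrap_test_years` are assumptions.

```python
import numpy as np

years = np.arange(1851, 2024)  # 1851-2023 inclusive

# Non-overlapping chronological periods from the study configuration.
train_years = years[(years >= 1851) & (years <= 1981)]
val_years = years[(years >= 1982) & (years <= 2002)]
test_pool = years[(years >= 2003) & (years <= 2023)]

def bootstrap_test_years(pool, n_resamples, seed=0):
    """Draw n_resamples bootstrap samples of test years (with replacement),
    each the same length as the original test period."""
    rng = np.random.default_rng(seed)
    return rng.choice(pool, size=(n_resamples, pool.size), replace=True)

samples = bootstrap_test_years(test_pool, n_resamples=1000)
print(len(train_years), len(val_years), len(test_pool))  # 131 21 21
print(samples.shape)                                     # (1000, 21)
```

A common use of such resamples is to recompute a skill score on each row and read a confidence interval off the resulting distribution; whether the authors did exactly this is not stated in the summary.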
Methodology and Data
- Models used:
  - TelNet: A sequence-to-sequence machine learning model designed for seasonal climate forecasting.
  - CMIP6 models: Historical simulations from 18 individual models (e.g., CanESM5-CanOE, MPI-ESM1-2-HR, ACCESS-CM2).
  - SEAS5: ECMWF seasonal forecasting system.
  - NMME4: North American Multi-Model Ensemble project.
- Data sources:
  - Monthly total precipitation, sea surface temperature (SST), and 10-meter wind components from CMIP6 historical simulations.
  - Monthly atmospheric variables and total precipitation from ERA5 reanalysis.
  - Extended Reconstructed SST version 5 (ERSSTv5).
  - Seasonal precipitation forecasts from Copernicus Climate Change Service (C3S) for SEAS5 and the North American Multi-Model Ensemble (NMME) project for NMME4.
  - Climate indices (e.g., ONI, ATN, IOBW) derived from SST and atmospheric variables.
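Climate indices such as the ONI are derived from gridded SST. NOAA defines the ONI as the 3-month running mean of ERSSTv5 SST anomalies averaged over the Niño-3.4 region (5°N–5°S, 170°W–120°W). The sketch below follows that recipe on synthetic data but simplifies it: it uses one fixed monthly climatology instead of NOAA's updating 30-year base periods, and the helper names are hypothetical. The summary does not say how the authors computed their indices.

```python
import numpy as np

def nino34_mean(sst, lat, lon):
    """Area-weighted (cos latitude) mean SST over the Nino-3.4 box.

    sst: (n_months, n_lat, n_lon); lon in degrees east on a 0-360 grid.
    """
    lat_mask = (lat >= -5) & (lat <= 5)
    lon_mask = (lon >= 190) & (lon <= 240)  # 170W-120W in the 0-360 convention
    box = sst[:, lat_mask][:, :, lon_mask]
    w = np.cos(np.deg2rad(lat[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    return (box * w).sum(axis=(1, 2)) / w.sum()

def oni_like_index(box_mean):
    """Anomalies vs. a fixed monthly climatology, then a centered 3-month
    running mean. Assumes the series starts in January with whole years."""
    monthly = box_mean.reshape(-1, 12)
    anomalies = (monthly - monthly.mean(axis=0)).ravel()
    smoothed = np.convolve(anomalies, np.ones(3) / 3, mode="same")
    smoothed[0] = smoothed[-1] = np.nan  # edge months lack a full 3-month window
    return smoothed

# Synthetic 2-degree tropical grid: 10 years of a pure seasonal cycle
# plus one artificially warm year (year index 5).
lat = np.arange(-20.0, 21.0, 2.0)
lon = np.arange(0.0, 360.0, 2.0)
n_years = 10
seasonal = 27.0 + np.cos(2 * np.pi * np.arange(12) / 12)
series = np.tile(seasonal, n_years)
series[60:72] += 1.0
sst = series[:, None, None] * np.ones((1, lat.size, lon.size))

index = oni_like_index(nino34_mean(sst, lat, lon))
print(round(index[65], 3))  # 0.9
```

The warm year shows up as +0.9 rather than +1.0 because the fixed climatology itself absorbs a tenth of the anomaly, one reason operational indices use base periods that exclude or dilute individual events.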
Main Results
- Machine learning models trained with only a few CMIP6 simulations perform worse than those trained with ERA5; the authors attribute this to instability during ML model tuning and reduced generalization ability.
- As the number of CMIP6 models used for training increases, the performance of the ML models improves, surpassing both ERA5-based ML models and those trained with the CMIP6 ensemble mean.
- Models trained with 9 or 18 CMIP6 simulations consistently outperform ERA5-TelNet across all seasons, with statistically significant improvements.
- Performance gains show diminishing returns beyond nine CMIP6 models, as 9-TelNet and 18-TelNet exhibit nearly identical performance.
- Reliability and sharpness diagrams indicate that ML models trained with more CMIP6 simulations produce more confident and better-calibrated forecasts.
- CMIP6-based TelNet (e.g., 9-TelNet) consistently matches or outperforms most state-of-the-art dynamical models (SEAS5, NMME4) across initialization months and lead times, particularly for December–January–February (DJF) forecasts initialized in November.
- ML models incorporating the Oceanic Niño Index (ONI) as a covariate show better performance in the Amazon region, while those with tropical Atlantic indices (ATN-, ATS-, ATL-SST) perform better in northeastern Brazil.
- All ML models generally assign probabilities between 20 % and 60 % to each forecast category, indicating low confidence, although confidence increases slightly in regions of high predictability (e.g., the Amazon basin).
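Reliability and sharpness diagrams for one forecast category can be built by binning the forecast probabilities. A generic sketch, assuming 10 equal-width probability bins (the summary does not give the authors' binning) and a hypothetical helper name `reliability_curve`; the returned counts are what a sharpness diagram plots:

```python
import numpy as np

def reliability_curve(prob, occurred, n_bins=10):
    """Reliability and sharpness statistics for one forecast category.

    prob: forecast probabilities of the category, shape (n,)
    occurred: 1.0 where the category was observed, else 0.0, shape (n,)
    Returns per-bin mean forecast probability, observed relative frequency,
    and forecast count (the sharpness histogram). Empty bins give NaN.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # digitize gives 1..n_bins for values in [0, 1); clip puts p == 1.0 in the last bin
    idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sum_p = np.bincount(idx, weights=prob, minlength=n_bins)
    sum_o = np.bincount(idx, weights=occurred, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_p = sum_p / counts
        obs_freq = sum_o / counts
    return mean_p, obs_freq, counts

# Synthetic, perfectly reliable forecasts confined to the 20-60 % range.
rng = np.random.default_rng(42)
prob = rng.uniform(0.2, 0.6, size=100_000)
occurred = (rng.random(prob.size) < prob).astype(float)
mean_p, obs_freq, counts = reliability_curve(prob, occurred)
print(counts)  # only the bins covering 0.2-0.6 are populated
```

For a reliable system the populated points lie near the diagonal (`obs_freq` close to `mean_p`); a sharpness histogram concentrated in a few middle bins, as here, corresponds to the low-confidence behavior described above.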
Contributions
- Demonstrates that leveraging a larger number of individual multi-model dynamical simulations from CMIP6 can significantly enhance the generalization ability and robustness of machine learning-based seasonal precipitation forecasting models.
- Quantifies the impact of the number of CMIP6 models on ML model performance, identifying a threshold for diminishing returns in forecast skill improvement.
- Provides a comprehensive comparison of CMIP6-trained ML models against both ERA5-trained ML models and current state-of-the-art dynamical seasonal forecasting systems.
- Highlights the critical role of robust model and predictor selection when training data from CMIP6 models are limited.
- Utilizes TelNet, a recently developed interpretable ML model, to conduct this assessment in a region with high seasonal predictability (South America).
Funding
- Natural Sciences and Engineering Research Council of Canada (NSERC)
- Canada Research Chairs Program
- Canadian Research Knowledge Network (CRKN)
- Copernicus Climate Change Service (C3S) (for ERA5 and SEAS5 data)
- NOAA, NSF, NASA, and DOE (for supporting the NMME project)
- Digital Research Alliance of Canada (for computational resources)
Citation
@article{Pinheiro2025Enhancing,
author = {Pinheiro, Enzo and Ouarda, Taha B. M. J.},
title = {Enhancing machine learning-based seasonal precipitation forecasting using CMIP6 simulations},
journal = {Atmospheric Research},
year = {2025},
doi = {10.1016/j.atmosres.2025.108463},
url = {https://doi.org/10.1016/j.atmosres.2025.108463}
}
Original Source: https://doi.org/10.1016/j.atmosres.2025.108463