McAdam et al. (2025) Feature selection for data-driven seasonal forecasts of European heatwaves
Identification
- Journal: Communications Earth & Environment
- Year: 2025
- Date: 2025-11-04
- Authors: Ronan McAdam, Jorge Pérez-Aracil, Antonello Squintu, César Peláez-Rodríguez, Felicitas Hansen, Verónica Torralba, Harilaos Loukos, Eduardo Zorita, Matteo Giuliani, Leone Cavicchia, Sancho Salcedo‐Sanz, Enrico Scoccimarro
- DOI: 10.1038/s43247-025-02863-4
Research Groups
- CMCC Foundation - Euro-Mediterranean Center on Climate Change, Bologna, Italy
- Department of Signal Processing and Communications, Universidad de Alcalá, Madrid, Spain
- Programa de doctorado en Computación Avanzada, Energía y Plasmas, Universidad de Córdoba, Córdoba, Spain
- Helmholtz-Zentrum Hereon, Geesthacht, Germany
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- The Climate Data Factory (TCDF), Paris, France
- Department of Electronics, Information and Bioengineering, Politecnico di Milano (POLIMI), Milan, Italy
Short Summary
This study develops an inexpensive, purely data-driven machine learning approach for seasonal forecasting of European summer heatwaves, demonstrating skill comparable to, and in some regions outperforming, state-of-the-art dynamical multi-model products, while also identifying key predictors and their optimal time-lags.
Objective
- Develop an inexpensive, purely data-driven approach using an optimisation-based feature selection framework to skilfully forecast summer heatwaves over Europe, identify key predictors and their time-lags, and compare its performance against state-of-the-art dynamical forecasting systems.
Study Configuration
- Spatial Scale: European domain, including central Europe, Mediterranean Basin, Black Sea basin, Scandinavia, northern central Europe, western Russia, Barents Sea, northern Italy. Grid resolution of 1° x 1°.
- Temporal Scale: Seasonal forecasts for May-June-July (MJJ) heatwave indicators (NDQ90). Predictors used from November 1st to April 30th, with time-lags up to 28 weeks prior to May 1st. Training period: 0-1850 (paleo-simulation). Test/validation period: 1993-2016 (modern era).
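The time-lags above are counted back from the May 1st forecast start. As a minimal sketch of that bookkeeping, the snippet below converts a lag in weeks into a calendar date. The function name `lag_to_date` and the convention that a lag of k weeks means exactly 7k days before May 1st are assumptions made for illustration; the paper may index its weekly windows differently.

```python
from datetime import date, timedelta


def lag_to_date(year: int, lag_weeks: int) -> date:
    """Return the date `lag_weeks` weeks before 1 May of `year`.

    Hypothetical helper: assumes a lag of k weeks means exactly 7*k days
    before the 1 May forecast start.
    """
    return date(year, 5, 1) - timedelta(weeks=lag_weeks)


# Example: where do lags of 4 and 7 weeks fall for the 2003 forecast?
for lag in (4, 7):
    print(lag, lag_to_date(2003, lag).isoformat())
```

Under this convention, lags of 4 and 7 weeks fall in early April and mid-March respectively.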
Methodology and Data
- Models used:
  - Optimisation-based feature selection framework: Spatio-Temporal Cluster-Optimized Feature Selection (STCO-FS) using Probabilistic Coral Reef Optimization algorithm with Substrate Layers (PCRO-SL).
  - Prediction models: Multiple Linear Regression (for optimisation), Random Forest, Light Gradient Boost, AdaBoost, Multi-Layer Perceptron (for real-world forecasts).
  - Clustering: Enhanced k-means clustering.
- Data sources:
  - Paleoclimate simulation: MPI-ESM1.2-LR "past2k" (0-1850), with atmospheric component ECHAM6.3 (1.875° horizontal resolution) and ocean component MPIOM1.63 (1.5° horizontal resolution).
  - Reanalysis: ERA5 (1993-2016) for target and predictor data, regridded from 0.25° to 1° resolution. Variables: mean sea level pressure (SLP), volumetric soil moisture content (SM), sea ice concentration (SIC), sea surface temperature (SST), geopotential height at 500 hPa (z500), outgoing longwave radiation (OLR), daily maximum 2 m temperature (TMX).
  - Dynamical seasonal forecast systems (for comparison): Copernicus Climate Change Service (C3S) operational systems: CMCC-35 (40 ensemble members), DWD-21 (30 ensemble members), ECMWF-51 (25 ensemble members), MF-8 (25 ensemble members). All at 1° horizontal resolution, with burst-mode initialisation.
- Heatwave index: Number of days from May 1st to July 31st on which TMX exceeds the climatological 90th percentile (NDQ90).
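The NDQ90 definition above can be illustrated with a short sketch for a single grid cell. It assumes the input is already arranged as one row per year and one column per day from May 1st to July 31st (92 days), and that the climatological 90th percentile is taken per calendar day across all years without any smoothing window. Both the layout and the threshold construction are assumptions for illustration; the paper's exact climatology may differ.

```python
import numpy as np


def ndq90(tmx, q=90.0):
    """Count, per year, the days on which TMX exceeds the climatological percentile.

    tmx: array of shape (n_years, n_days), daily maximum 2 m temperature
         for 1 May to 31 July (n_days would be 92).
    Assumption: the threshold is the q-th percentile across years,
    computed separately for each calendar day.
    """
    tmx = np.asarray(tmx, dtype=float)
    threshold = np.percentile(tmx, q, axis=0)  # shape (n_days,)
    return (tmx > threshold).sum(axis=1)       # shape (n_years,)


# Synthetic example: 24 years of 92 days of made-up temperatures.
rng = np.random.default_rng(0)
fake_tmx = 25.0 + 4.0 * rng.standard_normal((24, 92))
print(ndq90(fake_tmx))
```

The result is one integer per year between 0 and the number of days in the season, which is the quantity the forecasts target.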
Main Results
- The purely data-driven forecasts match, and in some regions outperform, the skill of state-of-the-art dynamical multi-model products for European summer heatwaves.
- The data-driven approach improves forecast skill over Scandinavia and northern central Europe, a region where dynamical systems typically exhibit low skill.
- Data-driven re-forecasts display statistically significant correlation skill scores over 56% of the European domain, comparable to the C3S multi-model product (58%).
- The greatest contribution to forecast skill comes from predictors at time-lags of 4-7 weeks (e.g., mid-March).
- Key predictors identified include European soil moisture, temperature, and geopotential height (z500) clusters (local), as well as sea surface temperature over the equatorial Pacific and outgoing longwave radiation over the tropical Atlantic (distant precursors).
- The computational cost of the data-driven approach is very low: approximately 1 CPU-hour per grid cell for optimisation (about 1000 CPU-hours in total for Europe) and minutes for forecasts.
- The system successfully forecasts exceptional heatwave events, such as the 2003 and 2015 heatwaves in northern Italy.
- Increasing the training period from 50 to 1850 years using paleo-simulation data significantly improves forecast skill, with a plateau observed after approximately 1000 years of training data.
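The share of the domain with statistically significant correlation skill (the 56% and 58% figures above) can be sketched as follows. The code computes the Pearson correlation over years at each grid cell and then tests significance with a one-sided permutation test. The permutation test, the 5% level, and the function names are assumptions chosen to keep the example dependency-free; the source does not state which significance test the authors used.

```python
import numpy as np


def correlation_skill(forecast, observed):
    """Pearson correlation over time at each grid cell.

    Both inputs have shape (n_years, n_cells).
    """
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    f = f - f.mean(axis=0)
    o = o - o.mean(axis=0)
    return (f * o).sum(axis=0) / np.sqrt((f**2).sum(axis=0) * (o**2).sum(axis=0))


def significant_fraction(forecast, observed, n_perm=1000, alpha=0.05, seed=0):
    """Fraction of cells whose correlation is significant at level alpha.

    Assumption: one-sided permutation test that shuffles forecast years.
    """
    forecast = np.asarray(forecast, dtype=float)
    rng = np.random.default_rng(seed)
    r = correlation_skill(forecast, observed)
    exceed = np.zeros(r.shape)
    for _ in range(n_perm):
        shuffled = forecast[rng.permutation(forecast.shape[0])]
        exceed += correlation_skill(shuffled, observed) >= r
    p_values = (exceed + 1) / (n_perm + 1)
    return float((p_values < alpha).mean())


# Synthetic example: 24 years, 50 grid cells, forecasts that partly track observations.
rng = np.random.default_rng(1)
obs = rng.standard_normal((24, 50))
fc = 0.6 * obs + 0.8 * rng.standard_normal((24, 50))
print(significant_fraction(fc, obs, n_perm=500))
```

With only 24 verification years (1993-2016), the correlation needed for significance is fairly high, which is why the fraction of significant cells is a more informative summary than the mean correlation alone.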
Contributions
- Development of a computationally inexpensive, purely data-driven seasonal forecasting system for European summer heatwaves that achieves skill comparable to or exceeding state-of-the-art dynamical multi-model products.
- Improvement of forecast skill in traditionally challenging regions for dynamical systems, such as Scandinavia and northern central Europe.
- Identification of optimal predictor variables, spatial domains, and time-lags (e.g., 4-7 weeks prior to initialisation) for European heatwave forecasting, offering insights into underlying physical mechanisms.
- Demonstration of successful transfer of learning from a multi-millennial paleoclimate simulation to accurate real-world modern-era forecasts.
- Introduction of an optimisation-based feature selection method that autonomously identifies key predictors without relying on pre-selected or known drivers, thereby reducing human bias and prior knowledge limitations.
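To make the idea of optimisation-based feature selection concrete, the sketch below searches over binary feature masks and scores each mask by the held-out correlation of a multiple linear regression. It is a deliberately simplified stand-in: plain random search replaces PCRO-SL, and the search covers only which columns to keep, not the cluster choice or time-lags that STCO-FS also optimises. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_score(X, y, mask, n_train):
    """Fit least squares on the first n_train rows using the masked columns,
    then return the correlation between prediction and target on the rest."""
    if not mask.any():
        return -np.inf
    A = np.column_stack([np.ones(len(y)), X[:, mask]])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    pred = A[n_train:] @ coef
    return float(np.corrcoef(pred, y[n_train:])[0, 1])


def random_search_selection(X, y, n_train, n_iter=500, seed=0):
    """Random search over feature masks (a stand-in for PCRO-SL, not PCRO-SL itself)."""
    rng = np.random.default_rng(seed)
    best_mask, best_score = None, -np.inf
    for _ in range(n_iter):
        mask = rng.random(X.shape[1]) < 0.5  # keep each feature with probability 0.5
        score = fit_score(X, y, mask, n_train)
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score


# Synthetic example: only columns 0 and 3 actually drive the target.
rng = np.random.default_rng(3)
X_demo = rng.standard_normal((300, 6))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 3] + 0.1 * rng.standard_normal(300)
mask_demo, score_demo = random_search_selection(X_demo, y_demo, n_train=200)
print(mask_demo, round(score_demo, 3))
```

Scoring on held-out rows rather than the training rows is what lets the search discard uninformative candidates, which is the property the contribution above relies on.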
Funding
- EU-funded Climate Intelligence (CLINT) project, Grant Agreement 101003876 (doi: 10.3030/101003876).
- "Agencia Estatal de Investigación (España)", Spanish Ministry of Science, Innovation and Universities NEXO Project, grant ref.: PID2023-150663NB-C21.
- Beatriu de Pinós program (2022 BP 00227) and the Ministry of Research and Universities of the Government of Catalonia (for Verónica Torralba).
Citation
@article{McAdam2025Feature,
author = {McAdam, Ronan and Pérez-Aracil, Jorge and Squintu, Antonello and Peláez-Rodríguez, César and Hansen, Felicitas and Torralba, Verónica and Loukos, Harilaos and Zorita, Eduardo and Giuliani, Matteo and Cavicchia, Leone and Salcedo‐Sanz, Sancho and Scoccimarro, Enrico},
title = {Feature selection for data-driven seasonal forecasts of European heatwaves},
journal = {Communications Earth \& Environment},
year = {2025},
doi = {10.1038/s43247-025-02863-4},
url = {https://doi.org/10.1038/s43247-025-02863-4}
}
Original Source: https://doi.org/10.1038/s43247-025-02863-4