Min et al. (2025) Hydrological drought prediction and its influencing features analysis based on a machine learning model
Identification
- Journal: Natural Hazards and Earth System Sciences
- Year: 2025
- Date: 2025-11-04
- Authors: Li Min, Yuhang Yao, Zilong Feng, Ming Ou
- DOI: 10.5194/nhess-25-4299-2025
Research Groups
- College of Hydraulic Science and Engineering, Yangzhou University, Yangzhou, China
- State Key Laboratory of Water Disaster Prevention, Nanjing, China
- JiLin Province Water Resource and Hydropower Consultative Company of P.R CHINA, Changchun, China
Short Summary
This study develops an interpretable machine learning framework using XGBoost and SHAP to predict hydrological drought in the Huaihe River Basin, China, achieving 79.9% overall accuracy and identifying the Standardized Precipitation Index (SPI) as the most influential feature.
Objective
- To predict hydrological drought in the Huaihe River Basin with an XGBoost model that incorporates 26 monthly and 18 seasonal features, and to evaluate its performance using precision and recall.
- To analyze the influence of different drought variables on the model's predictive results and gain insights into its outputs using various SHAP plots.
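As a rough illustration of what a SHAP-style attribution measures, the sketch below computes exact Shapley values by enumerating feature coalitions for a tiny, hypothetical three-feature model. The function names, the toy model, the zero baseline, and the sample values are all assumptions made for illustration. The paper applies the SHAP library to a trained XGBoost model, which uses a much faster tree-specific algorithm than this brute-force version.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value. This is an
    illustrative convention, not the algorithm used by the SHAP library.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def mean_abs_shap(predict, samples, baseline):
    """Global importance: mean absolute Shapley value per feature."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    return [t / len(samples) for t in totals]


def toy_model(features):
    """Hypothetical stand-in for a trained regressor (not the paper's model)."""
    spi, amo, soil_moisture = features
    return 0.6 * spi + 0.1 * amo + 0.2 * soil_moisture + 0.1 * spi * soil_moisture


baseline = [0.0, 0.0, 0.0]
samples = [[-1.5, 0.2, -0.8], [0.5, -0.1, 0.3], [-0.3, 0.4, -0.2]]  # made-up inputs
importance = mean_abs_shap(toy_model, samples, baseline)
```

For each sample, the attributions sum to the difference between the model output and the baseline output, which is the additivity property that lets per-feature contributions be compared directly.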
Study Configuration
- Spatial Scale: Huaihe River Basin, China (111°55′–121°25′ E, 30°55′–36°36′ N), covering approximately 270,000 km², divided into 28 grid regions at a 1° latitude × 1° longitude resolution.
- Temporal Scale: Data from 1960 to 2014; training period from 1960 to 2003, prediction period from 2004 to 2014. Predictions were made with a 1-month lead time, using sliding windows of 12 months for monthly data and 3 months for seasonal data.
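One plausible reading of the windowing scheme above is sketched here: each sample stacks the feature values of the previous 12 months and is paired with the target one month ahead, and samples are then split by the year of the target month. The helper name `make_windows`, the flattening order, and the synthetic placeholder data are assumptions; the summary does not specify how the authors assembled their inputs.

```python
def make_windows(features, target, window=12, lead=1):
    """Build (sample, label) pairs from monthly series.

    features: list of per-month feature lists
    target:   list of per-month target values (e.g. SRI)
    Each sample flattens `window` consecutive months of features; its label is
    the target `lead` months after the last month in the window.
    """
    samples, labels, target_idx = [], [], []
    for end in range(window - 1, len(target) - lead):
        block = features[end - window + 1 : end + 1]
        samples.append([value for month in block for value in month])
        labels.append(target[end + lead])
        target_idx.append(end + lead)
    return samples, labels, target_idx


# Monthly calendar for 1960-2014, matching the study period
months = [(year, month) for year in range(1960, 2015) for month in range(1, 13)]

# Synthetic placeholder data: two hypothetical features per month
features = [[float(i), float(i % 12)] for i in range(len(months))]
target = [float(i) for i in range(len(months))]

X, y, idx = make_windows(features, target, window=12, lead=1)

# Split by the year of the predicted month: 1960-2003 train, 2004-2014 test
train = [(x, t) for x, t, i in zip(X, y, idx) if months[i][0] <= 2003]
test = [(x, t) for x, t, i in zip(X, y, idx) if months[i][0] >= 2004]
```

Setting `window=3` gives the analogous construction for the seasonal data, under the same assumptions.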
Methodology and Data
- Models used:
  - Extreme Gradient Boosting (XGBoost) for multi-input single-output regression prediction.
  - Shapley Additive Explanations (SHAP) for model interpretability and variable importance analysis.
  - Standardized Precipitation Index (SPI) to characterize meteorological drought.
  - Standardized Runoff Index (SRI) to characterize hydrological drought.
- Data sources:
  - GLDAS_NOAH10_M_2.0: Monthly average precipitation, wind speed, temperature, evapotranspiration, runoff, 0–10 cm soil moisture, and 100–200 cm soil moisture.
  - ERA5-Land reanalysis dataset: Monthly average 2 m dewpoint temperature, surface net solar radiation, surface net thermal radiation, surface pressure, and leaf area index.
  - NOAA climate database: Large-scale climate indices (Nino3.4, AMO, TPI, PDO, AO, TNI, NP).
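Standardized indices such as SPI and SRI map an accumulated precipitation or runoff series onto a standard normal scale, so that negative values indicate drier-than-usual conditions. The sketch below is a simplified nonparametric variant: it ranks the values, converts ranks to empirical probabilities, and applies the inverse normal CDF. This is an assumption made for illustration. Operational SPI/SRI calculations usually fit a parametric distribution (often gamma) separately for each calendar month, and the summary does not state which procedure the authors used. The runoff values are made up.

```python
from statistics import NormalDist


def standardized_index(values, scale=1):
    """Nonparametric standardized index (simplified SPI/SRI-style transform).

    values: series for one calendar month across years, or any homogeneous series
    scale:  number of consecutive values to accumulate (e.g. 3 for a 3-step index)
    """
    # Accumulate over `scale` consecutive steps
    accumulated = [sum(values[i - scale + 1 : i + 1])
                   for i in range(scale - 1, len(values))]
    n = len(accumulated)
    order = sorted(range(n), key=lambda i: accumulated[i])
    index = [0.0] * n
    standard_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        # Weibull plotting position keeps the probability strictly inside (0, 1);
        # tied values receive consecutive ranks in this simple version
        probability = rank / (n + 1)
        index[i] = standard_normal.inv_cdf(probability)
    return index


# Hypothetical runoff totals for one calendar month over seven years
runoff = [30.0, 12.0, 45.0, 8.0, 22.0, 60.0, 15.0]
sri = standardized_index(runoff)
```

The driest year receives the most negative index value and the median year maps to zero, which is the behavior drought category thresholds rely on.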
Main Results
- The XGBoost model achieved an overall precision of 79.9% in classifying hydrological drought categories.
- The model performed best for the "Normal" (ND) and "Mild drought" (D1) categories, with precision of 88% and 74% and recall of 91% and 78%, respectively. The Heidke Skill Score (HSS) for ND was 0.77.
- Prediction performance for the "Moderate drought" (D2) and "Severe/Extreme drought" (D3) categories was weaker. D3 recall did not exceed 50%, indicating limited sensitivity to severe events despite high precision (86%).
- The model captured the spatial and temporal drought patterns of 2011, though it sometimes underestimated drought severity and did not always separate adjacent drought categories clearly.
- SHAP analysis identified the Standardized Precipitation Index (SPI) as the dominant influencing feature for hydrological drought across the Huaihe River Basin, with mean absolute SHAP values of at least 0.147 for monthly predictions.
- Large-scale climate features (especially AMO) were the second most important influence, particularly in the north-central basin, while soil moisture content and evapotranspiration also played significant roles.
- Seasonally, SPI-3 remained the most influential feature (mean absolute SHAP values: 0.360 in spring, 0.261 in summer, 0.169 in autumn, 0.247 in winter). Soil moisture content and evapotranspiration were significant in spring and autumn, whereas large-scale climatic features and air temperature were more critical in summer and winter. Radiative fluxes (surface net thermal and solar radiation) gained importance in winter.
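The per-category scores reported above can be reproduced from observed and predicted category labels. The sketch below treats each category one-vs-rest and computes precision, recall, and the two-category Heidke Skill Score, HSS = 2(ad - bc) / [(a + c)(c + d) + (a + b)(b + d)], where a, b, c, d are hits, false alarms, misses, and correct negatives. Whether the authors computed HSS in exactly this one-vs-rest form is an assumption, and the label sequences are made up for illustration.

```python
def one_vs_rest_scores(observed, predicted, category):
    """Precision, recall, and 2x2 Heidke Skill Score for one category."""
    pairs = list(zip(observed, predicted))
    hits = sum(o == category and p == category for o, p in pairs)
    false_alarms = sum(o != category and p == category for o, p in pairs)
    misses = sum(o == category and p != category for o, p in pairs)
    correct_neg = sum(o != category and p != category for o, p in pairs)

    nan = float("nan")
    precision = hits / (hits + false_alarms) if hits + false_alarms else nan
    recall = hits / (hits + misses) if hits + misses else nan

    denominator = ((hits + misses) * (misses + correct_neg)
                   + (hits + false_alarms) * (false_alarms + correct_neg))
    hss = (2 * (hits * correct_neg - false_alarms * misses) / denominator
           if denominator else nan)
    return precision, recall, hss


def overall_accuracy(observed, predicted):
    """Fraction of samples whose predicted category matches the observed one."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical labels: ND = normal, D1 = mild, D2 = moderate, D3 = severe/extreme
observed = ["ND", "ND", "ND", "ND", "D1", "D1", "D2", "D3", "D3", "ND"]
predicted = ["ND", "ND", "ND", "D1", "D1", "ND", "D2", "D3", "D2", "ND"]

nd_scores = one_vs_rest_scores(observed, predicted, "ND")
d3_scores = one_vs_rest_scores(observed, predicted, "D3")
accuracy = overall_accuracy(observed, predicted)
```

In this toy example D3 has perfect precision but only half of the observed D3 cases are recovered, which mirrors the high-precision, low-recall pattern described for severe events.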
Contributions
- Introduces a novel interpretable machine learning framework combining XGBoost and SHAP for hydrological drought prediction, enhancing both prediction accuracy and transparency regarding feature contributions.
- Disentangles the hierarchical influence of various features (including SPI, large-scale climate indices, and soil moisture) on hydrological drought, providing a comprehensive understanding beyond just precursor relationships.
- Offers valuable decision support for regional drought management and water resource allocation by enabling prioritization of high-weight features for real-time warnings and identification of early risk signals for long-term planning.
Funding
- Open Research Fund Program of National Key Laboratory of Water Disaster Prevention (grant no. 2024490711)
- Yangzhou University Graduate Student Research and Practice Innovation Program Funding projects (grant no. SJCX24_2250)
- Natural Science Foundation of Jiangsu Province (grant no. BK20250906)
Citation
@article{Min2025Hydrological,
author = {Min, Li and Yao, Yuhang and Feng, Zilong and Ou, Ming},
title = {Hydrological drought prediction and its influencing features analysis based on a machine learning model},
journal = {Natural Hazards and Earth System Sciences},
year = {2025},
doi = {10.5194/nhess-25-4299-2025},
url = {https://doi.org/10.5194/nhess-25-4299-2025}
}
Original Source: https://doi.org/10.5194/nhess-25-4299-2025