Bouregaa (2025) Comparative evaluation of machine learning models for regional agricultural drought prediction in Algeria using SHAP analysis
Identification
- Journal: Natural Hazards
- Year: 2025
- Date: 2025-11-17
- Authors: Tarek Bouregaa
- DOI: 10.1007/s11069-025-07719-w
Research Groups
- Plant and Animal Production Improvement Laboratory, Department of Agronomy, Faculty of Nature and Life Sciences, University Ferhat Abbas Setif1, Setif, Algeria
Short Summary
This study compares eight machine learning models for regional agricultural drought prediction in Algeria. It finds that the best-performing model depends strongly on region and timescale, that reduced feature sets can maintain accuracy, and that SHAP analysis identifies the key climate drivers.
Objective
- To systematically evaluate eight machine learning algorithms (LASSO, kNN, Decision Trees, Random Forest, Gradient Boosting Machines, Support Vector Machines, Adaptive Boosting, and Artificial Neural Networks) for predicting the Agricultural Standardized Precipitation Index (aSPI) at 3-, 6-, and 9-month timescales in Algeria’s four primary cereal-producing regions.
- To identify and rank the most influential climatic drivers of aSPI predictions using SHapley Additive exPlanations (SHAP) analysis.
- To assess the trade-off between model complexity and accuracy by examining the effect of feature reduction on prediction performance.
- Principal Hypothesis: Reduced feature sets, when combined with advanced machine learning algorithms, can predict the aSPI at 3-, 6-, and 9-month timescales as well as or better than models using the full set of climatic variables.
Study Configuration
- Spatial Scale: Four primary cereal-producing regions in the High Plateaus of northern Algeria: Oum El Bouaghi, Setif, Sidi Bel Abbes, and Tiaret.
- Temporal Scale: Climate data cover 1982 to 2021. aSPI is predicted for 3-month (October-December), 6-month (October-March), and 9-month (October-June) accumulation periods.
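The accumulation periods above can be pictured as sums of monthly values over 3, 6, or 9 consecutive months starting in October. The sketch below is only an illustration: the function names `window_months` and `accumulate_from_october`, the dictionary layout keyed by (year, month), and the toy values are assumptions, and the standardization step that turns accumulated totals into aSPI is not reproduced here.

```python
from typing import Dict, List, Tuple


def window_months(start_year: int, n_months: int) -> List[Tuple[int, int]]:
    """Return the (year, month) pairs of n_months consecutive months
    starting in October of start_year."""
    months = []
    for offset in range(n_months):
        month_index = 9 + offset  # October is index 9 when January is 0
        year = start_year + month_index // 12
        month = month_index % 12 + 1
        months.append((year, month))
    return months


def accumulate_from_october(
    monthly: Dict[Tuple[int, int], float], start_year: int, n_months: int
) -> float:
    """Sum monthly values over the window returned by window_months."""
    return sum(monthly[key] for key in window_months(start_year, n_months))


# Toy series (hypothetical values): 10 mm in every month of 1982 and 1983.
toy = {(y, m): 10.0 for y in (1982, 1983) for m in range(1, 13)}

# Accumulated totals for the 3-, 6-, and 9-month windows starting October 1982.
totals = {k: accumulate_from_october(toy, 1982, k) for k in (3, 6, 9)}
print(totals)
```

Defining each window by a start month and a length keeps the three timescales in one code path; a real pipeline would apply the same windows to every agricultural year from 1982 to 2021.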
Methodology and Data
- Models used: Least Absolute Shrinkage and Selection Operator (LASSO), k-Nearest Neighbor (kNN), Decision Trees (DT), Random Forest (RF), Gradient Boosting Machines (GBM), Support Vector Machines (SVM), Adaptive Boosting (AdaBoost), and Artificial Neural Networks (ANN). SHAP (SHapley Additive exPlanations) was used for model interpretability.
- Data sources: AgERA5 reanalysis dataset (refined ERA5 data from ECMWF) accessed via the FAO-AQUASTAT Climate Information Tool.
- Variables: Precipitation (P, in millimeters), minimum temperature (Tmin, in degrees Celsius), maximum temperature (Tmax, in degrees Celsius), mean temperature (Tmean, in degrees Celsius), relative humidity (RH, in percent), sunshine (S, in joules per square meter per day), wind speed at 2 meters (WS, in meters per second), and reference evapotranspiration (ETo, in millimeters).
- Spatial Resolution: 0.1° × 0.1° spatial grid.
- Feature Selection: RReliefF method was used to rank variables, creating three scenarios: all climate inputs, top 5 best-ranked variables, and top 3 best-ranked variables.
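The three input scenarios can be sketched as slices of a single feature ranking. The example below is a hedged illustration, not the study's procedure: it does not implement RReliefF but substitutes absolute Pearson correlation as a stand-in ranking score, and the data are synthetic. The names `rank_features`, `build_scenarios`, and the scenario keys are assumptions; only the variable abbreviations come from the list above.

```python
import random

# Variable abbreviations as listed in the summary.
FEATURES = ["P", "Tmin", "Tmax", "Tmean", "RH", "S", "WS", "ETo"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def rank_features(columns, target):
    """Rank features by |Pearson r| with the target.
    Stand-in for RReliefF, which is NOT implemented here."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def build_scenarios(ranking):
    """Scenario 1: all inputs; Scenario 2: top 5; Scenario 3: top 3."""
    return {
        "scenario_1": list(ranking),
        "scenario_2": list(ranking[:5]),
        "scenario_3": list(ranking[:3]),
    }


# Synthetic data (assumption): the target depends almost entirely on P.
rng = random.Random(0)
n_samples = 200
columns = {name: [rng.gauss(0, 1) for _ in range(n_samples)] for name in FEATURES}
target = [2.0 * p + 0.1 * rng.gauss(0, 1) for p in columns["P"]]

ranking = rank_features(columns, target)
scenarios = build_scenarios(ranking)
print(scenarios["scenario_3"])
```

Each scenario would then be passed to every model in turn, so that accuracy can be compared across feature sets for the same region and timescale.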
Main Results
- Optimal model performance was highly region- and timescale-specific: no single model was universally best.
- Artificial Neural Networks (ANN) performed strongly overall, particularly for aSPI9 prediction (R² > 0.96), while other models such as Gradient Boosting, SVM, and Random Forest were optimal for specific regions and forecasting horizons.
- Models using reduced input features (Scenario 3: top 3 variables) often retained or improved accuracy compared to using the full dataset (Scenario 1), highlighting the value of efficient feature selection for model parsimony and generalizability.
- SHAP analysis confirmed precipitation (P) as the primary drought driver across all models, regions, and timescales.
- SHAP also revealed model-specific sensitivities to secondary variables: Sunshine (S) and Relative Humidity (RH) were significant in more complex models (e.g., kNN, ensemble models), particularly during mispredictions, suggesting non-linear relationships. Temperature-related variables (Tmax, Tmin, Tmean) were important in Sidi Bel Abbes and Tiaret.
- Uncertainty analysis showed that kNN was often robust, while SVM and AdaBoost sometimes exhibited high variability. ANN showed exceptional stability for aSPI9 in Setif and Tiaret.
- Multi-index event analysis of prediction errors indicated kNN was consistently the most error-prone. Models frequently over-predicted drought during hydrological recovery phases, suggesting a lag in capturing rapid transitions from dry to wet states.
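The SHAP attributions reported above rest on Shapley values, which split a prediction's deviation from a baseline among the input features. The sketch below computes exact Shapley values by brute-force enumeration for a small toy model. It is an illustration under stated assumptions: `toy_model`, its coefficients, the input values, and the all-zero baseline are hypothetical. The SHAP library itself typically relies on approximations and a background dataset rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.
    Features outside a coalition are set to their baseline value."""
    names = list(x)
    n = len(names)

    def value(coalition):
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


def toy_model(v):
    """Hypothetical model with one interaction term (not from the study)."""
    return 0.8 * v["P"] - 0.3 * v["Tmax"] + 0.1 * v["P"] * v["RH"]


x = {"P": 1.0, "Tmax": 2.0, "RH": 0.5}
baseline = {"P": 0.0, "Tmax": 0.0, "RH": 0.0}

phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is shared equally between P and RH. Enumeration costs grow exponentially with the number of features, which is why practical tools approximate.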
Contributions
- First systematic comparison of eight diverse machine learning models for agricultural drought prediction across varying timescales, regions, and feature selection strategies in Algeria.
- Integration of SHapley Additive exPlanations (SHAP) for model interpretability, providing granular insights into the influence of climatic drivers on drought predictions, moving beyond "black-box" models.
- Demonstration that optimal model choice is fundamentally contingent on regional climatic characteristics and the specific forecasting objective, advocating for tailored rather than universal modeling approaches.
- Validation of the effectiveness of reduced feature sets in maintaining or improving predictive accuracy, leading to more computationally efficient and interpretable models.
- Provision of a valuable framework for interpretable and regionally calibrated drought forecasting, supporting the development of early warning systems and resource planning for sustainable agricultural water management in Algeria's semi-arid cereal-growing zones.
Funding
No funding was received for conducting this study.
Citation
@article{Bouregaa2025Comparative,
author = {Bouregaa, Tarek},
title = {Comparative evaluation of machine learning models for regional agricultural drought prediction in Algeria using SHAP analysis},
journal = {Natural Hazards},
year = {2025},
doi = {10.1007/s11069-025-07719-w},
url = {https://doi.org/10.1007/s11069-025-07719-w}
}
Original Source: https://doi.org/10.1007/s11069-025-07719-w