Bouaziz et al. (2025) A Century of Data: Machine Learning Approaches to Drought Prediction and Trend Analysis in Arid Regions
Identification
- Journal: Water
- Year: 2025
- Date: 2025-12-16
- Authors: Moncef Bouaziz, Mohamed Amine Abid, Emna Medhioub, André John
- DOI: 10.3390/w17243567
Research Groups
- Institute for Mine Surveying and Geodesy, Freiberg University of Technology, Germany (Moncef Bouaziz, André John)
- National School of Computer Science, University of Manouba, Tunisia (Mohamed Amine Abid)
- National School of Engineering of Sfax, University of Sfax, Tunisia (Emna Medhioub)
Short Summary
This study systematically evaluated four machine learning models for tracking and predicting the multi-scale Standardized Precipitation Index (SPI) in southeastern Tunisia using a century-long precipitation record. Support Vector Regression (SVR) performed best, and longer-term SPIs showed upward trends, indicating decreasing drought severity.
Objective
- To systematically evaluate the performance of four machine learning approaches (Support Vector Regression, Random Forest, K-Nearest Neighbor, and Linear Regression) for tracking and predicting the Standardized Precipitation Index (SPI) at multiple temporal scales (1, 3, 6, 9, 12, 18, and 24 months).
- To assess the presence of significant trends in the monthly SPI series using the Mann–Kendall trend test to provide insights for regional water resource management.
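As an illustration of the trend test named in the second objective, the following is a minimal sketch of the two-sided Mann–Kendall test using the normal approximation with a tie correction. It is not the authors' code: the function name `mann_kendall` and its return format are assumptions, and the sketch does not correct for serial correlation, which overlapping SPI accumulation windows can introduce.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test (normal approximation).

    Returns (S, Z, p_value). Hypothetical helper for illustration only.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")

    # S counts concordant minus discordant pairs over all i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis, corrected for tied groups.
    tie_sizes = Counter(values).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18.0

    # Continuity-corrected standard normal statistic.
    if var_s == 0 or s == 0:
        z = 0.0
    elif s > 0:
        z = (s - 1) / math.sqrt(var_s)
    else:
        z = (s + 1) / math.sqrt(var_s)

    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p_value
```

A positive Z with a p-value below the chosen significance level would correspond to an upward trend of the kind reported for the longer SPI timescales. The pairwise loop is quadratic in series length, which is acceptable for roughly 1,200 monthly values.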
Study Configuration
- Spatial Scale: Sfax region, southeastern Tunisia, focusing on the Ezzitouna Meteorological Station (34°43′ N, 10°45′ E). Temperature, potential evapotranspiration, and soil moisture data were aggregated to a 0.25° (~25 km) spatial resolution.
- Temporal Scale: 100 years (1917–2017) for precipitation data. Temperature and potential evapotranspiration data covered 30 years (1991–2020). Soil moisture data spanned 1981–2020. SPI was computed at 1, 3, 6, 9, 12, 18, and 24-month timescales.
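The summary does not describe how SPI was computed, so the following is only a sketch of the commonly used procedure: accumulate precipitation over k months, fit a gamma distribution per calendar month, and map the resulting cumulative probability to a standard normal quantile. The function names, Thom's approximation for the gamma fit, the probability clamp of 1e-6, and the assumption that the series starts in January are all illustrative choices, not details taken from the paper.

```python
import math
from statistics import NormalDist


def rolling_sum(precip, k):
    """k-month accumulated precipitation; the first k-1 entries stay None."""
    out = [None] * len(precip)
    for i in range(k - 1, len(precip)):
        out[i] = sum(precip[i - k + 1:i + 1])
    return out


def fit_gamma(values):
    """Thom's approximation to the gamma maximum-likelihood fit (values > 0)."""
    mean = sum(values) / len(values)
    a = math.log(mean) - sum(math.log(v) for v in values) / len(values)
    shape = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    return shape, mean / shape


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    t = x / scale
    term = total = 1.0 / shape
    a = shape
    for _ in range(10000):
        a += 1.0
        term *= t / a
        total += term
        if term < total * 1e-12:
            break
    log_prefactor = -t + shape * math.log(t) - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def spi(precip, k):
    """SPI-k for a monthly series that is assumed to start in January."""
    acc = rolling_sum(precip, k)
    out = [None] * len(precip)
    normal = NormalDist()
    for month in range(12):
        idx = [i for i in range(month, len(acc), 12) if acc[i] is not None]
        wet = [acc[i] for i in idx if acc[i] > 0]
        if len(set(wet)) < 2:
            continue  # too little information to fit a distribution
        shape, scale = fit_gamma(wet)
        p_zero = 1.0 - len(wet) / len(idx)  # probability of a dry window
        for i in idx:
            h = p_zero + (1.0 - p_zero) * gamma_cdf(acc[i], shape, scale)
            h = min(max(h, 1e-6), 1.0 - 1e-6)  # keep inv_cdf in its domain
            out[i] = normal.inv_cdf(h)
    return out
```

Calling `spi(monthly_precip, 12)` on a monthly record would give an SPI-12 series in this sketch, with `None` for the first 11 months where the accumulation window is incomplete. Production work would more likely rely on an established statistics library for the gamma fit.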
Methodology and Data
- Models used:
  - Machine Learning: Support Vector Regression (SVR), Random Forest (RF), K-Nearest Neighbor (kNN), Linear Regression (LR).
  - Statistical Test: Mann–Kendall trend test (for trend analysis).
  - Drought Index: Standardized Precipitation Index (SPI).
- Data sources:
  - Precipitation data: Ezzitouna Meteorological Station, Sfax, Tunisia (1917–2017), obtained from the Tunisian Ministry of Agriculture and Water Resources / Tunisian Directorate of Agricultural Engineering (DGACTA).
  - Monthly average temperature and potential evapotranspiration (PET) data: National Institute of Meteorology (INM) and FAO CLIMWAT database (1991–2020).
  - Soil moisture data: ESA Climate Change Initiative (CCI) Soil Moisture dataset (1981–2020).
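To make the forecasting setup concrete, here is a hedged sketch of how an SPI series could be turned into a supervised learning problem with lagged predictors, split chronologically, and fed to a simple k-nearest-neighbor regressor. The number of lags, the 80/20 split, and k = 5 are hypothetical values, since this summary does not report the authors' feature construction or hyperparameters. SVR or Random Forest from a machine learning library would take the place of `knn_predict` in the same pipeline.

```python
import math


def make_lagged(series, n_lags):
    """Build (X, y) pairs: the previous n_lags values predict the next one."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so the test set lies strictly in the future."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]


def knn_predict(X_train, y_train, x, k=5):
    """Average the targets of the k training rows closest to x (Euclidean)."""
    order = sorted(range(len(X_train)),
                   key=lambda i: math.dist(X_train[i], x))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)
```

Keeping the split chronological matters for time series: shuffling before splitting would let information from future months leak into training, particularly for longer SPI timescales whose accumulation windows overlap.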
Main Results
- Trend Analysis: The Mann–Kendall test revealed statistically significant upward trends in SPI-12, SPI-18, and SPI-24, indicating a gradual shift towards wetter conditions at longer timescales. For instance, SPI-24 increased by approximately 0.025 SPI units per decade. No statistically significant trends were found for SPI-1, SPI-3, SPI-6, or SPI-9.
- Model Performance: SVR, RF, and kNN all predicted SPI accurately. SVR consistently outperformed the other models, achieving the highest accuracy at both short- and long-term horizons. For SPI-12, SVR achieved a Root Mean Square Error (RMSE) of 0.37, a Mean Square Error (MSE) of 0.14, a correlation coefficient (r) of 0.93, and a coefficient of determination (R²) of 0.85.
- Predictor Variables: For a one-month forecasting lead time, the inclusion of additional hydroclimatic variables (temperature, potential evapotranspiration, soil moisture) did not significantly improve model performance compared to using only lagged precipitation, suggesting the memory effect within the precipitation series is dominant.
- Drought Patterns: Histograms of SPI values showed a concentration between -2 and 0, particularly for SPI-1 and SPI-3, indicating recurrent drier-than-normal conditions at shorter timescales. Longer SPI timescales (SPI-12 to SPI-24) provided a smoother, broader perspective on drought conditions.
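The four scores quoted under Model Performance can be computed as in the sketch below. The function name and the dictionary return format are assumptions, and R² is defined here as 1 − SS_res/SS_tot; this summary does not say which definition the paper used.

```python
import math


def regression_metrics(observed, predicted):
    """Return MSE, RMSE, Pearson r, and R² for paired observations."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Error-based scores.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mse = ss_res / n
    rmse = math.sqrt(mse)

    # Pearson correlation between observed and predicted values.
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)

    # Coefficient of determination relative to the observed mean.
    r2 = 1.0 - ss_res / var_o
    return {"mse": mse, "rmse": rmse, "r": r, "r2": r2}
```

The reported values are internally consistent: 0.37² ≈ 0.137, which rounds to the stated MSE of 0.14, and 0.93² ≈ 0.86, close to the stated R² of 0.85. With the definition used in this sketch, R² need not equal r² exactly, so a small gap between the two is expected.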
Contributions
- Conducted a comparative analysis of three machine learning algorithms (RF, SVR, kNN) for multi-scale SPI forecasting (1 to 24 months) in a semi-arid Mediterranean climate, addressing a research gap for this region.
- Utilized a unique century-long precipitation dataset, enhancing the robustness of drought trend analysis and forecasting efforts.
- Identified Support Vector Regression (SVR) as the superior model for drought prediction in the region, providing critical insights for regional water resource management and agricultural planning.
- Demonstrated that for short-term drought forecasting, the inherent memory of precipitation series is the primary predictive factor, with auxiliary hydroclimatic variables offering limited additional gain.
Funding
- This research received no external funding.
Citation
@article{Bouaziz2025Century,
author = {Bouaziz, Moncef and Abid, Mohamed Amine and Medhioub, Emna and John, André},
title = {A Century of Data: Machine Learning Approaches to Drought Prediction and Trend Analysis in Arid Regions},
journal = {Water},
year = {2025},
doi = {10.3390/w17243567},
url = {https://doi.org/10.3390/w17243567}
}
Original Source: https://doi.org/10.3390/w17243567