Talha et al. (2025) Robust Ensemble Machine Learning for Flash Flood Susceptibility Mapping Across Semiarid Regions
Identification
- Journal: Civil Engineering Journal
- Year: 2025
- Date: 2025-12-01
- Authors: Soukaina Talha, Ahmed Akhssas, Abdellatif Aarab, Ayoub Aabi, Badr Berkat, Said Amouch
- DOI: 10.28991/cej-2025-011-12-02
Research Groups
- Laboratory of Applied Geophysics, Geotechnics, Engineering Geology, and Environment, Mohammadia Engineering School, Mohammed V University in Rabat, Morocco.
- Laboratory for Water Analysis and Modeling of Natural Resources, Mohammadia Engineering School, Mohammed V University in Rabat, Morocco.
- Department of Earth Sciences, École Normale Supérieure, Mohammed V University in Rabat, Morocco.
Short Summary
This study maps flash flood susceptibility in Morocco's Assaka watershed using a voting ensemble of four machine learning models. The authors report that the integrated approach improves mapping accuracy and identifies key high-risk zones around Guelmim city and along major river infrastructure.
Objective
- To improve flash flood susceptibility mapping accuracy through an ensemble of machine learning (ML) models.
- To quantify the influence of various environmental factors (e.g., altitude, slope, land use, soil moisture, lithology) on flood occurrence.
- To pinpoint the most vulnerable areas within the Assaka watershed for targeted risk mitigation.
- Principal Hypothesis: The ensemble model will outperform individual classifiers in predictive capability, providing a more robust and trustworthy flash flood susceptibility map.
Study Configuration
- Spatial Scale: Assaka watershed, southwestern Morocco, covering approximately 6,862 square kilometers. The analysis was performed on a raster grid with a spatial resolution of 30 meters by 30 meters.
- Temporal Scale: Susceptibility represents long-term vulnerability rather than the timing of a specific event. The flood inventory, comprising over 1.5 million data points, was derived from a previous susceptibility map (Talha et al., 2019). Conditioning factors represent static or slowly changing environmental conditions (e.g., DEM, lithology) or snapshots from satellite imagery (e.g., Landsat-8 OLI for LST, SMI, LULC). Historical flood events (e.g., 1968, 1985, 1989, 2002, 2010, 2014) informed the context of the study area's vulnerability.
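As a rough sanity check on the scale figures above, the grid size implied by the watershed area and cell resolution can be worked out directly. This is only illustrative arithmetic: it assumes the raster covers exactly the reported area, and it uses the 1,514,434 flood-prone cells reported for the inventory.

```python
# Rough cell-count arithmetic for the study grid (illustrative only).
AREA_KM2 = 6_862            # approximate watershed area from the summary
CELL_SIZE_M = 30            # raster resolution (30 m x 30 m)
POSITIVE_CELLS = 1_514_434  # flood-prone cells reported for the inventory

cell_area_m2 = CELL_SIZE_M ** 2                    # 900 m^2 per cell
total_cells = AREA_KM2 * 1_000_000 / cell_area_m2  # roughly 7.6 million cells
positive_share = POSITIVE_CELLS / total_cells      # roughly one fifth of the grid

print(f"cells in grid: {total_cells:,.0f}")
print(f"positive share: {positive_share:.1%}")
```

Under these assumptions the flood-prone cells make up roughly a fifth of the grid, so the two classes are imbalanced, which is one reason a class-preserving train/test split matters.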
Methodology and Data
- Models used:
  - Individual Machine Learning Models: Logistic Regression (LR), Multivariate Discriminant Analysis (MDA), Naïve Bayes (NB), Multilayer Perceptron (MLP).
  - Ensemble Model: Voting Classifier, integrating the predictions of LR, MDA, NB, and MLP.
- Data sources:
  - Flood Inventory: Binary flood-inventory map (target variable) derived from a Fuzzy Analytical Hierarchy Process (FAHP) susceptibility map (Talha et al., 2019), yielding 1,514,434 positive (flood-prone) cells.
  - Conditioning Factors (14 environmental variables):
    - Altitude (Digital Elevation Model - DEM): Global Data Explorer, 30 m resolution.
    - Land Surface Temperature (LST): Satellite-derived; the specific source is not explicitly stated.
    - Soil Moisture Index (SMI): Landsat-8 OLI satellite data, 30 m resolution.
    - Soil type: Food and Agriculture Organization (FAO) data.
    - Slope: Derived from DEM, 30 m resolution.
    - Lithology: Geological maps (source not explicitly stated).
    - Topographic Wetness Index (TWI): Derived from DEM.
    - Aspect: Derived from DEM.
    - Land Use and Land Cover (LULC): Landsat-8 OLI satellite imagery, 30 m resolution.
    - Curvature: Derived from DEM.
    - Drainage Density (DD): Derived from hydrographic network using QGIS.
    - Topographic Position Index (TPI): Derived from DEM.
    - Flow Accumulation (FA): Derived from DEM.
    - Stream Power Index (SPI): Derived from DEM.
- Data Preprocessing: Raster images were processed in GIS and converted to numerical data using Python libraries (Pandas, NumPy). The dataset was randomly partitioned into 70% for training and 30% for testing, with class proportions preserved, and the data were standardized.
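The workflow described above (stratified 70/30 split, standardization, four base classifiers combined by voting) can be sketched with scikit-learn. This is a minimal sketch, not the authors' implementation. It makes several assumptions: synthetic data stands in for the 14 conditioning factors, `LinearDiscriminantAnalysis` stands in for MDA, soft voting is used because the summary does not state the voting scheme, and all hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 14 conditioning factors (NOT the study's data).
X, y = make_classification(
    n_samples=2000, n_features=14, n_informative=8, random_state=0
)

# 70/30 split; stratify=y preserves the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Assumptions: LDA as a stand-in for MDA, soft voting, placeholder settings.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("mda", LinearDiscriminantAnalysis()),
        ("nb", GaussianNB()),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    ],
    voting="soft",  # average the predicted class probabilities
)

# The scaler sits inside the pipeline so it is fitted on training data only.
model = make_pipeline(StandardScaler(), ensemble)
model.fit(X_train, y_train)

# Probability of the flood-prone class, usable as a susceptibility score.
susceptibility = model.predict_proba(X_test)[:, 1]
test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training. With real raster data, each row would be one grid cell and each column one conditioning factor.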
Main Results
- The Multilayer Perceptron (MLP) model achieved perfect predictive performance across all metrics (F1 Score = 1.0, Recall = 1.0, Accuracy = 1.0, Kappa = 1.0, AUC = 1.0, MSE = 0.0, Precision = 1.0).
- The Ensemble (Voting Classifier) model also scored highly (Precision = 0.9679, Recall = 0.9667, Accuracy = 0.9667, F1 Score = 0.9646, Kappa = 0.8387, AUC = 1.0, MSE = 0.0).
- Logistic Regression (LR) and Multivariate Discriminant Analysis (MDA) performed well (AUC of 0.9808 and 0.9615, respectively), while Naïve Bayes (NB) had the lowest performance (AUC of 0.8173).
- Sensitivity analysis (Jackknife test) identified altitude, derived from the Digital Elevation Model (DEM), as the most influential factor for the NB, LR, MLP, and Ensemble models. Only the MDA model ranked the Soil Moisture Index (SMI) as the most critical determinant.
- Other highly influential factors included Land Surface Temperature (LST), Topographic Position Index (TPI), and Slope.
- The ensemble susceptibility map highlighted densely populated areas near Guelmim city and infrastructure along major rivers (Wadi Essayed and Oum Laachar) as the most prone to flash flooding.
- Low-permeability lithologies (e.g., sandstone, massive quartzite, clay-rich formations) strongly correlated with very high flood susceptibility, particularly around Guelmim city. Conversely, high-permeability alluvium and recent reg deposits exhibited low susceptibility.
- The Ensemble model provided a balanced distribution of susceptibility classes: 66.34% low, 14.77% moderate, 8.40% high, and 10.50% very high susceptibility.
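To make the reported metrics easier to interpret, the sketch below computes accuracy, precision, recall, F1, Cohen's kappa, AUC, and a label-based MSE from first principles. The labels and scores are hypothetical toy values, and the 0.5 cut-off is an assumption. The example illustrates that AUC depends only on how scores rank the cells, so an AUC of 1.0 can coexist with an accuracy below 1.0. It also shows that MSE computed on hard 0/1 labels equals one minus accuracy; the summary does not state how the paper computed MSE.

```python
# Toy labels and scores (hypothetical, not taken from the study).
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off is an assumption

n = len(y_true)
pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)
tn = sum(t == 0 and p == 0 for t, p in pairs)
fp = sum(t == 0 and p == 1 for t, p in pairs)
fn = sum(t == 1 and p == 0 for t, p in pairs)

accuracy = (tp + tn) / n
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Cohen's kappa: observed agreement corrected for chance agreement.
p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
kappa = (accuracy - p_expected) / (1 - p_expected)

# AUC: share of (positive, negative) pairs ranked correctly; ties count half.
pos = [s for s, t in zip(scores, y_true) if t == 1]
neg = [s for s, t in zip(scores, y_true) if t == 0]
auc = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg) / (len(pos) * len(neg))

# MSE on hard labels is the misclassification rate, i.e. 1 - accuracy.
mse_labels = sum((t - p) ** 2 for t, p in pairs) / n

print(f"accuracy={accuracy:.4f} f1={f1:.4f} kappa={kappa:.4f} auc={auc:.4f}")
```

Here every positive cell scores above every negative cell, so the ranking is perfect, yet one positive falls below the cut-off and is misclassified.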
Contributions
- Developed and applied a novel ensemble machine learning framework for flash flood susceptibility mapping, integrating Logistic Regression, Multivariate Discriminant Analysis, Naïve Bayes, and Multilayer Perceptron models.
- Presented the first study in Morocco’s semi-arid regions to integrate multiple ML models into an ensemble for flood mapping, addressing a significant knowledge gap.
- Demonstrated that the ensemble approach significantly reduces uncertainty and provides a more robust and reliable tool for flash flood risk prediction compared to individual models.
- Provided precise, data-driven flash flood susceptibility maps for the Assaka watershed, offering practical insights for improved drainage, early warning systems, and better land-use planning in high-risk zones.
- Quantified the influence of 14 environmental factors on flash flood occurrence, identifying key drivers such as DEM, SMI, LST, and TPI for the study area.
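The factor ranking behind this last contribution comes from a jackknife test. The summary does not describe the exact procedure, so the following is only a sketch of one common leave-one-factor-out variant: retrain the model with each factor removed and record the drop in test AUC. The data are synthetic, the factor names are hypothetical, and logistic regression is used purely as a convenient example model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data with 6 placeholder factors (NOT the study's 14 variables).
X, y = make_classification(
    n_samples=1500, n_features=6, n_informative=3, n_redundant=0, random_state=1
)
factor_names = [f"factor_{i}" for i in range(X.shape[1])]  # hypothetical names

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1
)


def fit_auc(cols):
    """Train on the given column indices and return the test AUC."""
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te[:, cols])[:, 1])


all_cols = list(range(X.shape[1]))
baseline = fit_auc(all_cols)  # AUC with every factor included

# Leave one factor out at a time; a larger AUC drop suggests more influence.
drops = {}
for i, name in enumerate(factor_names):
    kept = [c for c in all_cols if c != i]
    drops[name] = baseline - fit_auc(kept)

ranking = sorted(drops, key=drops.get, reverse=True)
print("baseline AUC:", round(baseline, 4))
print("most influential first:", ranking)
```

Because correlated factors can compensate for one another, small or even negative drops are possible. Rankings from this kind of test are best read as relative indications rather than exact importances.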
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Citation
@article{Talha2025Robust,
author = {Talha, Soukaina and Akhssas, Ahmed and Aarab, Abdellatif and Aabi, Ayoub and Berkat, Badr and Amouch, Said},
title = {Robust Ensemble Machine Learning for Flash Flood Susceptibility Mapping Across Semiarid Regions},
journal = {Civil Engineering Journal},
year = {2025},
doi = {10.28991/cej-2025-011-12-02},
url = {https://doi.org/10.28991/cej-2025-011-12-02}
}
Original Source: https://doi.org/10.28991/cej-2025-011-12-02