Panagiotou et al. (2026) Investigating the mechanisms of flood susceptibility with the use of multi-basin machine learning models in data-scarce environments in Cyprus
Identification
- Journal: Journal of Hydrology: Regional Studies
- Year: 2026
- Date: 2026-01-06
- Authors: Constantinos F. Panagiotou, Giorgia Guerrisi, Davide De Santis, Fabio Del Frate, Marios Tzouvaras
- DOI: 10.1016/j.ejrh.2025.103075
Research Groups
- Eratosthenes Centre of Excellence, Limassol, Cyprus
- Department of Civil Engineering and Computer Science Engineering, Tor Vergata University of Rome, Rome, Italy
Short Summary
This study developed and compared multi-basin machine learning models (SVM, XGBoost, RF, MLP) for flood susceptibility assessment in data-scarce environments in Cyprus. It demonstrated that simplified Random Forest models, utilizing key topographical and land-use features, can provide rapid and accurate predictions for effective flood risk management.
Objective
- To develop multi-basin, within-region generalized machine learning models (MLMs) for flood susceptibility by merging data from eight watersheds and testing them on an "unseen" watershed in Cyprus.
- To comprehensively assess flood susceptibility by developing and comparing multiple MLMs and analyzing the influence of driving factors on flood emergence.
- To determine the minimum number of features required for developing simplified, cost-effective, and fast screening models suitable for data-scarce environments.
Study Configuration
- Spatial Scale: The island of Cyprus, focusing on nine small-scale watersheds (typically less than 100 square kilometers). Eight watersheds were used for training, and the Geroskipou watershed was used for testing.
- Temporal Scale: Flood susceptibility assessment based on physical characteristics. Input data included daily rainfall from 2011 to 2023/2024 and air temperature data from 2011 to 2024. Flood inventory data corresponded to a 500-year return period worst-case scenario.
Methodology and Data
- Models used: Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Multilayer Perceptron (MLP).
- Data sources:
- Flood-related factors (7 features):
- Rainfall Intensity (RI): Daily rainfall data from 90 ground-based stations (2011–2023), summarized as a modified Fournier index (dMFI).
- Euclidean distance from the drainage network (DD): Geoportal of Water Development Department.
- Terrain Elevation (TE): Digital Elevation Model (DEM) with 25 meter resolution, Department of Land and Surveys (version 2023).
- Terrain Slope (TS): Derived from DEM.
- Soil (texture and depth): Geological Survey Department.
- Land Use and Land Cover (LULC): CORINE Land Cover 2018 (version 2020_20u1).
- Flow Accumulation (FA): Derived from DEM.
- Flood Inventory Database: Geospatial information on high-risk flood areas (500-year return period) from the geoportal of the national water authorities (WDD, 2023), supplemented by an expected flood susceptibility map (Panagiotou, 2025a).
- Training Dataset: 46,299 data points (31,311 non-flooded, 14,988 flooded) from eight watersheds.
- Test Dataset: 898 data points (449 for each class) from the Geroskipou watershed.
- Validation: Leave-One-Cluster-Out (LOCO) cross-validation strategy for training, and an independent test dataset for external validation.
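The Leave-One-Cluster-Out strategy listed above can be illustrated with a minimal sketch. It assumes that each cluster corresponds to one training watershed, which the summary does not state explicitly; the function name `leave_one_cluster_out` and the watershed labels are hypothetical, and this is not the authors' implementation. scikit-learn's `LeaveOneGroupOut` splitter provides the same behaviour for real pipelines.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_cluster_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, validation_indices) per fold.

    Assumption: one cluster = one watershed label per data point.
    """
    # Collect the row indices that belong to each cluster.
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)

    # Each cluster is held out once; all other clusters form the training set.
    for held_out in sorted(by_group, key=str):
        validation = by_group[held_out]
        train = sorted(
            i for group, indices in by_group.items() if group != held_out for i in indices
        )
        yield held_out, train, validation


# Hypothetical watershed labels for six sample points (not the paper's data).
watersheds = ["A", "A", "B", "C", "B", "C"]
folds = list(leave_one_cluster_out(watersheds))
for held_out, train_idx, val_idx in folds:
    print(f"hold out {held_out}: train={train_idx} validate={val_idx}")
```

Because every point of a held-out cluster is excluded from training, each fold mimics prediction on a basin the model has never seen, which is the same situation the Geroskipou test set probes.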
Main Results
- All four MLMs achieved high performance, with Balanced Accuracy (BA) and F1 scores exceeding 0.95 for both training and test datasets.
- The SVM model exhibited the highest performance on the test dataset (BA = 0.995, F1 score = 0.996, Brier score = 0.003), closely followed by RF and MLP.
- Random Forest (RF) was selected for compiling multi-level susceptibility maps because of its robustness, its ability to capture complex nonlinear relationships, and an optimal probability threshold (0.48) that facilitates map interpretation.
- Feature importance analysis consistently identified Land Use and Land Cover (LULC), Terrain Slope (TS), Terrain Elevation (TE), and Flow Accumulation (FA) as the most significant flood-related factors across all MLMs. Rainfall Intensity (RI), Distance from Drainage Network (DD), and Soil properties showed minor influence.
- Approximately half (49.9%) of the Geroskipou watershed was classified as highly susceptible to flooding (36.9% "Very-High", 13.0% "High"), predominantly in urban and semi-urban coastal areas characterized by low terrain elevation and slope, and high LULC scores.
- Simplified RF models, built using only the four most important features (LULC, TS, TE, FA), achieved a Balanced Accuracy of 0.951 and a Brier score of 0.033 on the test dataset, demonstrating their effectiveness as computationally efficient flood screening tools.
- Partial dependence plots revealed critical threshold values: LULC scores greater than 0.8, terrain slopes less than 50%, flow accumulation values around 5.8, and terrain elevations below 350 meters above sea level were associated with increased flood susceptibility.
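For readers unfamiliar with the three scores reported above, the following sketch shows how Balanced Accuracy, the F1 score, and the Brier score are commonly computed from predicted flood probabilities. The labels and probabilities are made-up illustrative values, not the paper's data, and the helper names are hypothetical. Only the 0.48 threshold comes from the summary; treating probabilities at or above the threshold as "flooded" is an assumption.

```python
from typing import List, Sequence, Tuple


def binarize(probabilities: Sequence[float], threshold: float) -> List[int]:
    """Label a point as flooded (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the flooded class."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def brier_score(y_true: Sequence[int], probabilities: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probabilities)) / len(y_true)


# Illustrative values only (1 = flooded, 0 = non-flooded).
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]
y_pred = binarize(y_prob, threshold=0.48)  # threshold value taken from the summary

ba = balanced_accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
brier = brier_score(y_true, y_prob)
print(f"BA={ba:.3f} F1={f1:.3f} Brier={brier:.3f}")
```

Balanced Accuracy and F1 depend on the chosen threshold, whereas the Brier score is computed directly from the probabilities; lower Brier values indicate better-calibrated predictions.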
Contributions
- First study to apply multi-basin machine learning algorithms for flood susceptibility assessment in Cyprus, including explicit external-basin testing.
- Provides an operational and parsimonious methodology for flood susceptibility management in data-scarce environments.
- Implemented hyperparameter optimization for all MLMs to ensure a fair comparison, a step often omitted in similar studies.
- Assessed functional relationships between major input variables and flood susceptibility, identifying critical threshold values for key flood indicators.
- Demonstrated that shallow MLMs (e.g., RF, SVM) can serve as rapid, computationally efficient flood screening tools, offering a viable alternative to complex, data-intensive deep learning or physically-based models, especially in data-scarce regions.
- Offered a data-driven approach to feature importance analysis, reducing the influence of subjective expert opinions in decision-making.
Funding
- European Union’s Horizon Europe Framework Programme HORIZON-WIDERA-2021-ACCESS-03 (Grant Agreement No. 101079468) for the AI-OBSERVER project.
- European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 857510) for the EXCELSIOR H2020 Widespread Teaming project.
- Government of the Republic of Cyprus through the Directorate General for the European Programmes, Coordination and Development.
- Cyprus University of Technology.
Citation
@article{Panagiotou2026Investigating,
author = {Panagiotou, Constantinos F. and Guerrisi, Giorgia and De Santis, Davide and Del Frate, Fabio and Tzouvaras, Marios},
title = {Investigating the mechanisms of flood susceptibility with the use of multi-basin machine learning models in data-scarce environments in Cyprus},
journal = {Journal of Hydrology: Regional Studies},
year = {2026},
doi = {10.1016/j.ejrh.2025.103075},
url = {https://doi.org/10.1016/j.ejrh.2025.103075}
}
Original Source: https://doi.org/10.1016/j.ejrh.2025.103075