Akinsoji et al. (2025) Ensemble Machine Learning-Based Feature Selection for Flood Susceptibility Mapping Under Climate and Land Use Change Scenarios
Identification
- Journal: Water Resources Management
- Year: 2025
- Date: 2025-12-29
- Authors: Adisa Hammed Akinsoji, Bashir Adelodun, Qudus Adeyi, Rahmon Abiodun Salau, Kyung Sook CHOI
- DOI: 10.1007/s11269-025-04425-x
Research Groups
- Department of Agricultural Civil Engineering, Kyungpook National University, Daegu, Korea
- Arusha Climate and Environmental Research Centre, Aga Khan University, Arusha, Tanzania
- School of Resource and Environmental Management, Simon Fraser University, Burnaby, Canada
- Department of Agricultural and Biosystems Engineering, University of Ilorin, Ilorin, Nigeria
- Institute of Agricultural Science & Technology, Kyungpook National University, Daegu, Korea
Short Summary
This study compares feature selection techniques with ensemble machine learning algorithms for flood susceptibility mapping in South Korea, integrating historical data, future climate projections (CMIP5/CMIP6), and land use change scenarios. It found that the Variance Inflation Factor (VIF) combined with Gradient Boosting (GB) achieved the highest accuracy (ROC-AUC: 0.93) and predicted increased flood exposure in urbanized, low-lying areas under future conditions.
Objective
- To conduct a comparative analysis of feature selection techniques (PCA, VIF, correlation) and ensemble machine learning algorithms (Random Forest, Gradient Boosting, Extra Trees) for flood susceptibility mapping under future climate and land use change scenarios.
- To generate future flood susceptibility maps with enhanced accuracy and reliability, offering critical insights into flood risk mitigation, management, preparedness, and response.
Study Configuration
- Spatial Scale: North Gyeongsang Province, South Korea, including Daegu Metropolitan City (area: 20,071.61 km²).
- Temporal Scale:
  - Historical rainfall data: 1980–2023 (monthly monsoon), 1975–2017 (daily for bias correction).
  - Historical Land Use and Land Cover (LULC): 2001, 2023.
  - Future LULC simulations: 2045, 2067.
  - Climate projections: Near-future (2023–2060), Far-future (2061–2100) based on CMIP5 (RCP4.5, RCP8.5) and CMIP6 (SSP245, SSP585).
  - Historical flood traces: 2002–2023.
Methodology and Data
- Models used:
  - Ensemble Machine Learning: Random Forest (RF), Gradient Boosting (GB), Extra Trees (ET) for Flood Susceptibility Mapping (FSM).
  - LULC Classification: Support Vector Classifier (SVC).
  - Future LULC Simulation: Cellular Automata-Artificial Neural Network (CA-ANN) via MOLUSCE plugin.
  - Feature Selection: Variance Inflation Factor (VIF), Principal Component Analysis (PCA), Correlation analysis.
  - Spatial Interpolation: Ordinary Kriging for rainfall.
  - Bias Correction (GCMs): Inverse distance weighting, Quantile delta mapping.
- Data sources:
  - Satellite Imagery: Landsat 5 Thematic Mapper (TM) and Landsat 8 imagery from USGS Earth Explorer for LULC (2001, 2023). Derived indices: Normalized Difference Vegetation Index (NDVI), Normalized Difference Built-up Index (NDBI), and Normalized Difference Water Index (NDWI).
  - Digital Elevation Model (DEM): ASTER DEM (30 m resolution) from USGS Earth Explorer. Derived factors: aspect, curvature, flow accumulation, distance to river, elevation, hillshade, LS-factor, plan curvature, profile curvature, Stream Power Index (SPI), Topographic Position Index (TPI), Terrain Ruggedness Index (TRI), slope.
  - Soil Data: FAO Soils Portal.
  - Historical Rainfall: Korea Meteorological Administration (KMA) (monthly, 1980–2023; daily, 1975–2017; from 14 and 8 stations, respectively).
  - Flood Inventory Map: Historical flood traces (2002–2023) from Korea Safety Map (150 flood, 150 non-flood points).
  - Climate Projections: Daily rainfall data from 11 Global Climate Models (GCMs) under CMIP5 (RCP4.5, RCP8.5) and CMIP6 (SSP245, SSP585) scenarios.
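The VIF screening listed under Feature Selection can be illustrated with a minimal Python sketch. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The cutoff of 10, the drop-the-worst loop, the function names, and the toy values are all assumptions for illustration; this summary does not state the threshold or the exact procedure the authors used.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length, non-constant numeric columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)  # a constant column (s == 0) is not supported
        z.append([(v - m) / s for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("singular matrix: features are perfectly collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def vif(features):
    """Return {name: VIF}, where VIF_j = 1 / (1 - R_j^2)."""
    names = list(features)
    inv = invert(correlation_matrix([features[name] for name in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}


def drop_high_vif(features, threshold=10.0):
    """Drop the highest-VIF feature repeatedly until all VIFs fall below threshold.

    The threshold of 10 is a common rule of thumb, assumed here, not taken from the paper.
    """
    kept = dict(features)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del kept[worst]
    return kept


if __name__ == "__main__":
    # Hypothetical toy values: "tri" is nearly a copy of "elevation".
    toy = {
        "elevation": [10, 20, 30, 40, 50, 60],
        "tri": [11, 19, 31, 41, 49, 61],
        "rainfall": [5, 3, 8, 1, 9, 4],
    }
    print(vif(toy))
    print(sorted(drop_high_vif(toy)))
```

In this toy case the two nearly collinear terrain factors cannot both survive the screen, which is the kind of redundancy reduction the VIF step is meant to achieve before the ensemble models are trained.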
Main Results
- Feature Selection: The Variance Inflation Factor (VIF) method was most effective, retaining 12 essential flood-influencing factors (including LULC, soil, rainfall) by reducing redundancy, outperforming PCA and correlation. Elevation, rainfall, and urbanization were identified as key predictors.
- LULC Changes: From 2001 to 2023, urban areas increased from 24.68% to 27.09%, and water bodies increased from 0.75% to 1.93%, while vegetation declined from 67.59% to 64.11%. Future projections (2045, 2067) suggest continued urbanization and agricultural expansion.
- LULC Classification Accuracy: The Support Vector Classifier (SVC) achieved high accuracy for LULC mapping (93.75% for 2001, 95.18% for 2023), with Kappa coefficients of 91.66% and 95.28%, respectively.
- Future Rainfall Projections: Monsoon rainfall (June–August) is projected to increase substantially in the near-future (2024–2060) and far-future (2061–2100) periods relative to historical levels (1980–2023). Annual rainfall may double, with maximums reaching 173–257 mm and minimums 72–129 mm, indicating a high probability of more severe flood events. Mean anomalies ranged from 471.96 mm to 817.71 mm (p < 0.05).
- FSM Model Performance: Gradient Boosting (GB) achieved the highest accuracy (ROC-AUC: 0.93), followed by Random Forest (RF) (ROC-AUC: 0.875) and Extra Trees (ET) (ROC-AUC: 0.85). All algorithms showed high sensitivity (> 90%).
- Flood Susceptibility Distribution:
  - GB classified over 12% of the region as high flood risk, particularly in densely urbanized and low-lying areas, with over 62% at very low or no risk.
  - RF projected less than 3% high risk.
  - ET indicated that most of the region (90%) would fall within the low to high risk classes.
  - The southwestern region of North Gyeongsang Province is projected to have higher flood risk.
- Impact of Future Scenarios:
  - RF showed minimal deviation from baseline predictions under future climate and LULC changes, indicating stability under moderate changes.
  - ET was more sensitive, showing sustained growth in high-risk areas (up to 18.35%) under early anthropogenic pressures and expanding moderate/high-risk areas under prolonged stress.
  - GB maintained large low-risk zones (~62%) but increased very high-risk areas from 12.64% to over 13%, indicating sensitivity to extremes.
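The ROC-AUC and sensitivity figures reported under FSM Model Performance can be read through a small Python sketch. ROC-AUC is computed here in its rank-based form: the probability that a randomly chosen flood point receives a higher susceptibility score than a randomly chosen non-flood point, with ties counted as half. The labels, scores, the 0.5 decision threshold, and the function names are hypothetical and are not the study's data or code.

```python
def roc_auc(labels, scores):
    """Rank-based ROC-AUC over all (flood, non-flood) pairs; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one flood and one non-flood point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold=0.5):
    """Share of true flood points scored at or above the threshold (TP / (TP + FN))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    if not pos:
        raise ValueError("need at least one flood point")
    return sum(1 for s in pos if s >= threshold) / len(pos)


if __name__ == "__main__":
    # Hypothetical inventory: 1 = flood point, 0 = non-flood point.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
    print(roc_auc(labels, scores))
    print(sensitivity(labels, scores))
```

Because ROC-AUC depends only on how the scores rank flood against non-flood points, it is threshold-free, whereas sensitivity changes with the cutoff used to turn scores into classes. The two metrics can therefore order models differently.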
Contributions
- Introduces a novel approach by integrating heterogeneous data (multi-scenario climate projections from CMIP5/CMIP6 and simulated land-use changes) into ensemble machine learning for Flood Susceptibility Mapping (FSM).
- Provides a comprehensive comparative analysis of feature selection techniques (VIF, PCA, correlation) in conjunction with ensemble machine learning algorithms (RF, GB, ET) to identify the most effective combination for FSM.
- Generates future FSMs with enhanced accuracy and reliability under projected climate and LULC scenarios, offering critical insights for adaptive flood risk management, spatial planning, and informed decision-making.
- Highlights the superior performance of the VIF feature selection method and the Gradient Boosting algorithm for FSM, especially under extreme climate conditions, while also identifying RF for reliable medium-impact predictions.
- Emphasizes the critical role of local data collection for historical flood tracing over public satellite imagery due to temporal mismatches, underscoring the limitations of global satellites for specific hazard management.
Funding
- Korea Institute of Planning and Evaluation for Technology in Food, Agriculture and Forestry (IPET) through the Intelligent Agricultural Infra Management for Climate Change Development Program, funded by the Ministry of Agriculture, Food and Rural Affairs (MAFRA) (Reference Code: RS-2025-02303335).
Citation
@article{Akinsoji2025Ensemble,
author = {Akinsoji, Adisa Hammed and Adelodun, Bashir and Adeyi, Qudus and Salau, Rahmon Abiodun and CHOI, Kyung Sook},
title = {Ensemble Machine Learning-Based Feature Selection for Flood Susceptibility Mapping Under Climate and Land Use Change Scenarios},
journal = {Water Resources Management},
year = {2025},
doi = {10.1007/s11269-025-04425-x},
url = {https://doi.org/10.1007/s11269-025-04425-x}
}
Original Source: https://doi.org/10.1007/s11269-025-04425-x