Bhattarai et al. (2025) Ensemble learning for enhancing critical infrastructure resilience to urban flooding
Identification
- Journal: Scientific Reports
- Year: 2025
- Date: 2025-10-22
- Authors: Yogesh Bhattarai, Vijay Chaudhary, Curtis Walker, Rocky Talchabhadel, Sanjib Sharma
- DOI: 10.1038/s41598-025-20970-2
Research Groups
- Department of Civil and Environmental Engineering, Howard University, Washington, DC, USA
- Department of Computer Science and Electrical Engineering, Howard University, Washington, DC, USA
- National Center for Atmospheric Research, Boulder, CO, USA
- Department of Civil and Environmental Engineering, Jackson State University, Jackson, MS, USA
Short Summary
This study improves urban road-network flood prediction in Washington, D.C., using ensemble machine learning models trained on crowd-sourced flood datasets. Stacked super-ensemble learning achieves the highest prediction accuracy (0.84), and the resulting hazard maps identify critical infrastructure exposed to high flood likelihood zones.
Objective
- To enhance road-network flood prediction using ensemble machine learning models trained on crowd-sourced flood datasets.
- To evaluate flood hazard to road networks connecting critical urban facilities and interpret the influence of flood conditioning factors using SHapley Additive exPlanations (SHAP).
Study Configuration
- Spatial Scale: Washington, D.C. area, with road network segments as the prediction unit. Geospatial data resolutions range from 10 meters to 1 kilometer.
- Temporal Scale: Flood events from 2006 to 2019, covering 10 rainfall events, with rainfall estimated over 24-hour periods during each flooding event. Road networks were treated as non-disrupted if no flooding had been reported on them in the last five years.
Methodology and Data
- Models used:
  - Base Learners: Random Forest, Support Vector Machine (SVM), Bagging, Boosting (AdaBoost, Gradient Boosting, CatBoost).
  - Super Ensemble Learners: Voting (soft voting), Stacking (meta-learner: Logistic Regression).
  - Explainable AI: SHapley Additive exPlanations (SHAP) for feature importance, applied to the AdaBoost model.
- Data sources:
  - Crowd-sourced flood data: Compiled from local/regional news portals, archived reports, and X (formerly Twitter) for 150 disrupted road networks and 150 non-disrupted road networks in Washington, D.C.
  - Geospatial data: Open Data DC (elevation, distance to stream, distance to combined sewer outfall, slope), National Soil Layer Database (curve number), National Land Cover Database (road surface roughness).
  - Meteorological data: National Oceanic and Atmospheric Administration’s Multi-Radar/Multi-Sensor System (MRMS) for 24-hour rainfall estimates (1 km horizontal spacing, updated every 2 minutes).
  - Critical infrastructure data: DCgov Open Data DC for educational, emergency, health, energy, and transportation services.
  - Comparative data: Federal Emergency Management Agency (FEMA) Flood Insurance Rate Maps (100-year and 500-year floodplains).
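The model setup listed above (base learners combined by soft voting and by stacking with a Logistic Regression meta-learner) can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: the data are synthetic (300 samples and 7 features, chosen only to mirror the 150/150 sample and the seven conditioning factors named in this summary), the 70/30 split, the 5-fold stacking setting, and all hyperparameters are assumptions, and CatBoost is left out to avoid a dependency outside scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the flood dataset (assumption): 300 road segments,
# 7 conditioning factors, label 1 = disrupted, 0 = non-disrupted.
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def base_learners():
    """Return fresh, unfitted base learners (hyperparameters are illustrative)."""
    return [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        # probability=True lets the SVM contribute class probabilities to soft voting.
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
        ("bag", BaggingClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ]


ensembles = {
    # Soft voting averages the predicted class probabilities of the base learners.
    "voting": VotingClassifier(estimators=base_learners(), voting="soft"),
    # Stacking trains a Logistic Regression meta-learner on cross-validated
    # base-learner predictions.
    "stacking": StackingClassifier(
        estimators=base_learners(), final_estimator=LogisticRegression(), cv=5
    ),
}

results = {}
for name, model in ensembles.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "kappa": cohen_kappa_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

Because the data here are random, the printed scores say nothing about the paper's results; the sketch only shows how the two super-ensemble strategies are wired together.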
Main Results
- Stacked super-ensemble learning achieved the highest predictive performance with an accuracy of 0.84, precision of 0.82, and F1-score of 0.82.
- AdaBoost and Gradient Boosting demonstrated the highest recall (0.84) and ROC AUC (0.92).
- Relative to the Random Forest baseline, stacking improved accuracy by 6.33%, Kappa score by 19.30%, precision by 13.89%, and F1-score by 6.49%.
- Hazard mapping revealed that 16.4% of Washington, D.C. road networks are in high or very high flood likelihood zones (11.21% high, 5.19% very high).
- FEMA's 100-year and 500-year flood zones captured only 12.28% and 19.37%, respectively, of the road networks identified within the high flood likelihood zone by the model, indicating significant underestimation of pluvial flooding.
- Critical infrastructure exposure: 66.7% of energy services and 44.4% of emergency services are located along high-hazard road segments. Additionally, 7.2% of educational, 6.7% of health, and 20% of transportation services are exposed.
- SHAP analysis identified elevation as the most influential predictor of road flood likelihood, followed by distance to combined sewer outfall and curve number. Rainfall played a less prominent role compared to topographic and infrastructural features.
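The percentages reported above can be cross-checked with simple arithmetic. The sketch below assumes the stacking gains are relative improvements over Random Forest, so the implied baseline is the stacked score divided by (1 + gain/100). The baseline values it prints are back-calculated from this summary, not quoted from the paper, and the Kappa baseline cannot be recovered because the stacked Kappa score is not listed here.

```python
def implied_baseline(stacked_score: float, relative_gain_pct: float) -> float:
    """Back-calculate a baseline score from a stacked score and a relative gain in percent."""
    return stacked_score / (1.0 + relative_gain_pct / 100.0)


# Stacked scores and gains over Random Forest, as listed in the results above.
stacked = {"accuracy": 0.84, "precision": 0.82, "f1": 0.82}
gain_pct = {"accuracy": 6.33, "precision": 13.89, "f1": 6.49}

for metric, score in stacked.items():
    baseline = implied_baseline(score, gain_pct[metric])
    print(f"{metric}: implied Random Forest baseline ~ {baseline:.2f}")

# Share of road networks in high or very high flood likelihood zones.
high_pct, very_high_pct = 11.21, 5.19
combined_pct = round(high_pct + very_high_pct, 2)
print(f"high + very high = {combined_pct}%")
```

Under the relative-gain assumption, each implied baseline rounds cleanly to two decimals, and the two hazard classes sum to the 16.4% total stated above.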
Contributions
- Integrates pluvial (surface water) flooding into road flood likelihood estimation, complementing traditional riverine and coastal flood maps.
- Demonstrates the superior performance of ensemble machine learning, particularly stacking, for urban road network flood prediction using crowd-sourced datasets.
- Provides a comparative analysis of model-derived flood likelihood zones with existing FEMA floodplain maps, highlighting the limitations of traditional mapping for urban pluvial flooding.
- Quantifies the exposure of critical urban infrastructure (e.g., energy, emergency services) to high flood hazard road networks.
- Uses explainable artificial intelligence (SHAP) to interpret model predictions and identify the most influential flood conditioning factors, enhancing transparency and stakeholder confidence.
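To make the SHAP-based interpretation concrete, the sketch below computes exact Shapley values for a single prediction by enumerating all feature coalitions. It is an illustration under stated assumptions: `toy_flood_score`, its coefficients, and the feature values are hypothetical, only three features are used, and "absent" features are replaced by a single baseline value. The SHAP library relies on more efficient estimators and a background dataset, so this is not the procedure used in the paper.

```python
from itertools import combinations
from math import factorial

FEATURES = ["elevation_m", "dist_to_outfall_m", "curve_number"]


def toy_flood_score(x):
    """Hypothetical flood-likelihood score with one interaction term (not the paper's model)."""
    elevation, dist_outfall, curve_number = x
    return (
        0.9
        - 0.02 * elevation
        - 0.001 * dist_outfall
        + 0.004 * curve_number
        + 0.00002 * elevation * dist_outfall
    )


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


# Hypothetical road segment and reference segment (illustrative values only).
segment = [5.0, 100.0, 90.0]
reference = [20.0, 400.0, 75.0]

phi = shapley_values(toy_flood_score, segment, reference)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.4f}")
```

The attributions sum to the difference between the score for the segment and the score for the reference, which is the additivity property that lets SHAP values be read as per-feature contributions.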
Funding
- Amazon
- U.S. National Science Foundation National Center for Atmospheric Research (Cooperative Agreement No. 1852977)
- Hydrological Impacts Computing, Outreach, and Resiliency Partnership (HICORPS) Project, in collaboration with the U.S. Army Engineer Research and Development Center (ERDC) and WOOLPERT-Taylor.
Citation
@article{Bhattarai2025Ensemble,
author = {Bhattarai, Yogesh and Chaudhary, Vijay and Walker, Curtis and Talchabhadel, Rocky and Sharma, Sanjib},
title = {Ensemble learning for enhancing critical infrastructure resilience to urban flooding},
journal = {Scientific Reports},
year = {2025},
doi = {10.1038/s41598-025-20970-2},
url = {https://doi.org/10.1038/s41598-025-20970-2}
}
Original Source: https://doi.org/10.1038/s41598-025-20970-2