Bhattarai et al. (2025) Ensemble learning for enhancing critical infrastructure resilience to urban flooding

Identification

Journal: Scientific Reports
Year: 2025
Date: 2025-10-22
Authors: Yogesh Bhattarai, Vijay Chaudhary, Curtis Walker, Rocky Talchabhadel, Sanjib Sharma
DOI: 10.1038/s41598-025-20970-2

Research Groups

Department of Civil and Environmental Engineering, Howard University, Washington, DC, USA
Department of Computer Science and Electrical Engineering, Howard University, Washington, DC, USA
National Center for Atmospheric Research, Boulder, CO, USA
Department of Civil and Environmental Engineering, Jackson State University, Jackson, MS, USA

Short Summary

This study improves urban road-network flood prediction in Washington, D.C., using ensemble machine learning models trained on crowd-sourced flood datasets. Stacked super-ensemble learning achieved the highest prediction accuracy (0.84), and the resulting hazard map identifies critical infrastructure located along roads in high flood likelihood zones.

Objective

To enhance road-network flood prediction using ensemble machine learning models trained on crowd-sourced flood datasets.
To evaluate flood hazard to road networks connecting critical urban facilities and interpret the influence of flood conditioning factors using SHapley Additive exPlanations (SHAP).

Study Configuration

Spatial Scale: Washington, D.C. area, with road network segments as the prediction unit. Geospatial data resolutions range from 10 meters to 1 kilometer.
Temporal Scale: Flood events from 2006 to 2019, including 10 rainfall events, with rainfall estimated over the 24-hour period of each flooding event. Non-disrupted road networks were those with no reported flooding in the last five years.

Methodology and Data

Models used:
- Base Learners: Random Forest, Support Vector Machine (SVM), Bagging, Boosting (AdaBoost, Gradient Boosting, CatBoost).
- Super Ensemble Learners: Voting (soft voting), Stacking (meta-learner: Logistic Regression).
- Explainable AI: SHapley Additive exPlanations (SHAP) for feature importance, applied to the AdaBoost model.
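
SHAP attributes a model's output to its inputs using Shapley values from cooperative game theory: each feature's attribution is its marginal contribution, averaged over every order in which the features could be added. The following is a minimal brute-force sketch of that definition only. It is not the SHAP library's algorithm and not the authors' code. The names `shapley_values`, `toy_score`, and `WEIGHTS`, along with every numeric weight, are hypothetical; only the feature labels echo conditioning factors named in this summary.

```python
from itertools import permutations


def shapley_values(value, players):
    """Exact Shapley values: average each player's marginal
    contribution over all orderings of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            grown = coalition | {p}
            # Marginal contribution of p when it joins this coalition.
            totals[p] += value(grown) - value(coalition)
            coalition = grown
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical per-feature contributions to a toy flood score
# (invented numbers, not taken from the paper).
WEIGHTS = {"elevation": 0.5, "sewer_outfall_distance": 0.3, "rainfall": 0.1}


def toy_score(known_features):
    """Toy model output when only `known_features` are available."""
    score = sum(WEIGHTS[f] for f in known_features)
    # Invented interaction term: elevation and rainfall together add extra.
    if {"elevation", "rainfall"} <= known_features:
        score += 0.1
    return score


phi = shapley_values(toy_score, list(WEIGHTS))
for feature, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {contribution:.3f}")
```

The attributions sum to the difference between the score with all features and the score with none, and the interaction term is split evenly between the two features that produce it. Enumerating all orderings is only feasible for a handful of features, which is why practical tools rely on approximations or model-specific shortcuts.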
Data sources:
- Crowd-sourced flood data: Compiled from local/regional news portals, archived reports, and X (formerly Twitter) for 150 disrupted road networks and 150 non-disrupted road networks in Washington, D.C.
- Geospatial data: Open Data DC (elevation, distance to stream, distance to combined sewer outfall, slope), National Soil Layer Database (curve number), National Land Cover Database (road surface roughness).
- Meteorological data: National Oceanic and Atmospheric Administration’s Multi-Radar/Multi-Sensor System (MRMS) for 24-hour rainfall estimates (1 km horizontal spacing, updated every 2 minutes).
- Critical infrastructure data: DCgov Open Data DC for educational, emergency, health, energy, and transportation services.
- Comparative data: Federal Emergency Management Agency (FEMA) Flood Insurance Rate Maps (100-year and 500-year floodplains).
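
The model configuration above (base learners combined by soft voting and by stacking with a Logistic Regression meta-learner) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (300 samples and 7 features, standing in for the 150 + 150 road networks and the seven conditioning factors named in this summary), CatBoost is omitted because it lives in a separate package, and the train/test split, the scaling step for the SVM, and all hyperparameters are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): label 1 = disrupted, 0 = non-disrupted.
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=4, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base learners; probability=True lets the SVM take part in soft voting.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("bagging", BaggingClassifier(random_state=0)),
    ("adaboost", AdaBoostClassifier(random_state=0)),
    ("gboost", GradientBoostingClassifier(random_state=0)),
]

# Soft voting averages the predicted class probabilities of the base learners.
voting = VotingClassifier(estimators=base_learners, voting="soft")

# Stacking trains a Logistic Regression meta-learner on cross-validated
# base-learner predictions.
stacking = StackingClassifier(
    estimators=base_learners, final_estimator=LogisticRegression(), cv=5
)

voting.fit(X_train, y_train)
stacking.fit(X_train, y_train)

voting_acc = voting.score(X_test, y_test)
stacking_acc = stacking.score(X_test, y_test)
print(f"soft voting accuracy: {voting_acc:.2f}")
print(f"stacking accuracy:    {stacking_acc:.2f}")
```

Scores on synthetic data say nothing about the paper's results; the sketch only shows how the two super-ensemble strategies differ in how they combine the same base learners.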

Main Results

Stacked super-ensemble learning achieved the highest predictive performance with an accuracy of 0.84, precision of 0.82, and F1-score of 0.82.
AdaBoost and Gradient Boosting demonstrated the highest recall (0.84) and ROC AUC (0.92).
Compared to the Random Forest baseline, stacking improved accuracy by 6.33%, Kappa score by 19.30%, precision by 13.89%, and F1-score by 6.49%.
Hazard mapping revealed that 16.40% of Washington, D.C. road networks are in high or very high flood likelihood zones (11.21% high, 5.19% very high).
FEMA's 100-year and 500-year flood zones captured only 12.28% and 19.37%, respectively, of the road networks that the model placed in the high flood likelihood zone, indicating that the FEMA maps substantially underestimate pluvial flooding.
Critical infrastructure exposure: 66.7% of energy services and 44.4% of emergency services are located along high-hazard road segments. Additionally, 7.2% of educational, 6.7% of health, and 20% of transportation services are exposed.
SHAP analysis identified elevation as the most influential predictor of road flood likelihood, followed by distance to combined sewer outfall and curve number. Rainfall played a less prominent role compared to topographic and infrastructural features.
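
The classification scores reported above follow the standard confusion-matrix definitions, sketched below together with the arithmetic for a percentage improvement over a baseline. Several things here are assumptions rather than facts from the paper: the confusion-matrix counts are invented, the Kappa score is taken to be Cohen's kappa, and the percentage gains are read as relative changes (not percentage-point differences). The helper names `binary_metrics`, `relative_improvement`, and `implied_baseline` are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Chance agreement: probability that prediction and truth match at random,
    # given the marginal rates of each class.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


def relative_improvement(new, baseline):
    """Percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


def implied_baseline(new, improvement_pct):
    """Baseline score implied by a new score and a relative improvement in percent."""
    return new / (1.0 + improvement_pct / 100.0)


# Invented counts for illustration only (not from the paper).
scores = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print({name: round(value, 2) for name, value in scores.items()})

# Under the relative-change assumption, stacking accuracy 0.84 with a 6.33% gain
# implies a baseline accuracy of roughly 0.79.
baseline_accuracy = implied_baseline(0.84, 6.33)
print(round(baseline_accuracy, 2))
```

Kappa corrects accuracy for agreement expected by chance, so on a balanced dataset it moves faster than accuracy does, which would be consistent with the Kappa gain being larger than the accuracy gain.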

Contributions

Integrates pluvial (surface water) flooding into road flood likelihood estimation, complementing traditional riverine and coastal flood maps.
Demonstrates the superior performance of ensemble machine learning, particularly stacking, for urban road network flood prediction using crowd-sourced datasets.
Provides a comparative analysis of model-derived flood likelihood zones with existing FEMA floodplain maps, highlighting the limitations of traditional mapping for urban pluvial flooding.
Quantifies the exposure of critical urban infrastructure (e.g., energy, emergency services) to high flood hazard road networks.
Uses explainable artificial intelligence (SHAP) to interpret model predictions and identify the most influential flood conditioning factors, enhancing transparency and stakeholder confidence.

Funding

Amazon
U.S. National Science Foundation National Center for Atmospheric Research (Cooperative Agreement No. 1852977)
Hydrological Impacts Computing, Outreach, and Resiliency Partnership (HICORPS) Project, in collaboration with the U.S. Army Engineer Research and Development Center (ERDC) and WOOLPERT-Taylor

Citation

@article{Bhattarai2025Ensemble,
  author = {Bhattarai, Yogesh and Chaudhary, Vijay and Walker, Curtis and Talchabhadel, Rocky and Sharma, Sanjib},
  title = {Ensemble learning for enhancing critical infrastructure resilience to urban flooding},
  journal = {Scientific Reports},
  year = {2025},
  doi = {10.1038/s41598-025-20970-2},
  url = {https://doi.org/10.1038/s41598-025-20970-2}
}

Original Source: https://doi.org/10.1038/s41598-025-20970-2