Askari et al. (2025) A novel entropy-based machine learning framework for flood risk mapping in Pakistan
Identification
- Journal: Journal of Hydrology: Regional Studies
- Year: 2025
- Date: 2025-11-08
- Authors: Komelle Askari, Wende Zheng, Shangyu Shi, Jae-Woo Chu, Fei Wang
- DOI: 10.1016/j.ejrh.2025.102911
Research Groups
- State Key Laboratory of Soil and Water Conservation and Desertification Control, College of Soil and Water Conservation Science and Engineering, Northwest A&F University, Yangling, Shaanxi Province, China
- Institute of Soil and Water Conservation, Chinese Academy of Sciences and Ministry of Water Resources, Yangling, Shaanxi Province, China
- University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling, Shaanxi Province, China
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, Shaanxi Province, China
Short Summary
This study develops the District-level Flood Risk Assessment Model (D-FRAM), a three-tiered framework integrating satellite data, national surveys, and machine learning to map spatial and temporal flood risk across Pakistan. It identifies critical hotspots and provides monthly flood risk profiles, with XGBoost proving most effective for susceptibility prediction and flood frequency as the primary risk determinant.
Objective
- To develop and validate the District-level Flood Risk Assessment Model (D-FRAM), a novel framework integrating machine learning-based flood susceptibility, data-driven social vulnerability, and satellite-observed inundation dynamics.
- To generate detailed, spatially specific, and temporally adaptive monthly flood risk profiles for Pakistan to inform targeted flood risk management strategies.
Study Configuration
- Spatial Scale: National (Pakistan), disaggregated to district level, with spatial datasets standardized to 500-meter resolution.
- Temporal Scale: Flood inundation and precipitation data from 2010–2023; socioeconomic data from 2017 and 2019–2020; monthly flood risk estimation for monsoon season (June–October).
Methodology and Data
- Models used:
  - Supervised Machine Learning: Random Forest (RF), Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM) for Flood Susceptibility Index (FSI).
  - Unsupervised Machine Learning: Hybrid Self-Organizing Map (SOM) and Uniform Manifold Approximation and Projection (UMAP) for Flood Vulnerability Index (FVI) clustering.
  - Feature Selection: Mutual Information (MI).
  - Risk Integration: Entropy-based weighting method for Flood Risk Index (FRI) calculation.
- Data sources:
  - Satellite: Moderate Resolution Imaging Spectroradiometer (MODIS) MOD09A1 Version 6.1 (surface reflectance, Enhanced Vegetation Index, Flood Frequency, Maximum Flood extent), CHIRPS v2.0 (cumulative average maximum monsoonal precipitation), NASA’s Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) (altitude, slope, plan curvature, profile curvature, drainage density, distance to rivers, flow accumulation, Stream Power Index, Sediment Transport Index, Topographic Wetness Index).
  - Observation/Survey: Pakistan Social and Living Standards Measurement (PSLM) survey 2019–2020 (literacy, mobile phone ownership, internet usage, household technology access, electricity access, piped gas availability, safe drinking water sources, improved sanitation facilities, housing quality), Pakistan Bureau of Statistics (PBS) 2017 National Census (population density).
  - Flood Inventory: 600 labeled points (300 flood, 300 non-flood locations).
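The entropy-based weighting step listed above can be illustrated with the standard entropy weight method. This is a minimal sketch, not the paper's implementation: the exact normalization and indicator set used in D-FRAM are not given in this summary, so the function names (`entropy_weights`, `flood_risk_index`), the three indicator columns, and all sample values are hypothetical.

```python
import math

def entropy_weights(rows):
    """Return one weight per indicator column (standard entropy weight method).

    rows: list of districts, each a list of indicator values scaled to [0, 1].
    """
    n, m = len(rows), len(rows[0])
    k = 1.0 / math.log(n)  # scales entropy into [0, 1]; requires n >= 2
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total > 0:
            entropy = 0.0
            for value in column:
                p = value / total  # share of the column held by this district
                if p > 0:
                    entropy -= p * math.log(p)
            entropy *= k
        else:
            entropy = 1.0  # an all-zero column carries no information
        # Low entropy (uneven column) means high divergence and a larger weight.
        divergences.append(max(0.0, 1.0 - entropy))
    spread = sum(divergences)
    if spread == 0:
        return [1.0 / m] * m  # every column is uniform: fall back to equal weights
    return [d / spread for d in divergences]

def flood_risk_index(rows, weights):
    """Weighted sum of the scaled indicators for each district."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]

# Hypothetical districts; assumed columns: susceptibility, vulnerability, flood frequency.
sample = [
    [0.9, 0.7, 0.8],
    [0.4, 0.9, 0.1],
    [0.2, 0.3, 0.0],
    [0.7, 0.5, 0.6],
]
weights = entropy_weights(sample)
print(weights)
print(flood_risk_index(sample, weights))
```

Under this scheme, an indicator that varies strongly across districts receives a larger weight than one that is nearly uniform, without any expert-assigned weights entering the calculation.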
Main Results
- XGBoost outperformed other models for Flood Susceptibility Index (FSI) prediction, achieving an Area Under Curve (AUC) of 0.956 and an accuracy of 89.44 %.
- Districts in eastern Balochistan (Jaffarabad, Sohbatpur) and northern Sindh (Jacobabad, Larkana, Kashmore) were identified as most flood-prone, with over 90 % of their land classified as "Very High" FSI; Jaffarabad had the highest at 96.47 %. Sindh province showed the largest provincial "Very High" susceptibility at 50.30 %.
- The SOM-UMAP framework effectively identified socioeconomic vulnerabilities, with Sibi (1.0), Kech (0.9197), Mastung (0.8914), and Nushki (0.8680) in Balochistan, and Shahdad Kot (Sindh) exhibiting the highest Flood Vulnerability Index (FVI) scores. Sindh (0.5142) and Balochistan (0.5105) had the highest average provincial FVI.
- The Risk Typology Matrix revealed Sindh with the highest national share of Critical Hotspots (48.3 % of its districts), while Punjab had the highest proportion of Physical Risk districts (44.4 %). Balochistan showed significant Latent Risk (57.7 % of its districts).
- Temporal analysis of the Flood Risk Index (FRI) showed a peak national average in August (0.2418), coinciding with the monsoon peak. Flood Frequency (FF) consistently held the highest entropy weight (ranging from 0.4950 to 0.5848), indicating its primary role in risk determination.
- Sujjawal in Sindh was identified as the highest-risk district by FRI, exhibiting the highest FRI in three of five monsoon months, with a peak value of 0.8765 in October. Kashmore (Sindh) and Sheikhupura (Punjab) also showed high FRI in August (0.8603) and July (0.8084), respectively.
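The AUC and accuracy figures reported for the susceptibility models are standard binary-classification metrics computed on labeled flood and non-flood points. As a rough illustration of what they measure, the sketch below computes both from scratch. The function names, the 0.5 decision threshold, and the sample labels and scores are assumptions for illustration only, not values or code from the study.

```python
def auc_score(labels, scores):
    """AUC as the probability that a flood point outranks a non-flood point."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

def accuracy(labels, scores, threshold=0.5):
    """Share of points whose thresholded score matches the label."""
    correct = sum(1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1))
    return correct / len(labels)

# Hypothetical labels (1 = flood, 0 = non-flood) and predicted susceptibility scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc_score(labels, scores), accuracy(labels, scores))
```

AUC depends only on how the scores rank flood points against non-flood points, while accuracy also depends on the chosen threshold, which is why the two are usually reported together.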
Contributions
- Introduces D-FRAM, a novel, reproducible, and spatially disaggregated framework that integrates supervised and unsupervised machine learning with real-world survey data and satellite observations for comprehensive flood risk assessment.
- Provides a dynamic, monthly flood risk assessment, capturing temporal fluctuations and identifying critical hotspots and resilient areas across Pakistan, which is crucial for anticipatory actions.
- Utilizes an entropy-based weighting method for objective integration of susceptibility, vulnerability, and inundation dynamics, enhancing precision and reducing reliance on subjective expert opinions.
- Develops a Risk Typology Matrix that translates continuous risk indicators into actionable classifications, facilitating targeted preparedness planning and investment strategies.
- Highlights the critical role of socioeconomic vulnerability, often underexplored in flood risk studies, by integrating nationally representative survey data into the flood risk mapping process.
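One way to picture the Risk Typology Matrix is as a two-by-two cross of susceptibility and vulnerability. The sketch below assumes that mapping: the class names "Critical Hotspot", "Physical Risk", and "Latent Risk" come from the results above, while the "Resilient" label, the 0.5 cut-offs, the function names, and the district records are hypothetical. The paper may define the classes differently.

```python
from collections import Counter

def risk_type(fsi, fvi, fsi_cut=0.5, fvi_cut=0.5):
    """Classify a district from its susceptibility (FSI) and vulnerability (FVI).

    The cut-offs are illustrative assumptions, not thresholds from the study.
    """
    high_susceptibility = fsi >= fsi_cut
    high_vulnerability = fvi >= fvi_cut
    if high_susceptibility and high_vulnerability:
        return "Critical Hotspot"
    if high_susceptibility:
        return "Physical Risk"   # exposed to flooding, but socially less vulnerable
    if high_vulnerability:
        return "Latent Risk"     # vulnerable population, lower physical exposure
    return "Resilient"

def class_shares(districts):
    """Share of districts in each class, e.g. for one province."""
    counts = Counter(risk_type(d["fsi"], d["fvi"]) for d in districts)
    return {name: count / len(districts) for name, count in counts.items()}

# Hypothetical districts with made-up index values.
province = [
    {"name": "District A", "fsi": 0.82, "fvi": 0.74},
    {"name": "District B", "fsi": 0.71, "fvi": 0.31},
    {"name": "District C", "fsi": 0.22, "fvi": 0.66},
    {"name": "District D", "fsi": 0.15, "fvi": 0.20},
]
print(class_shares(province))
```

Provincial percentages such as those reported in the main results correspond to the kind of per-province share that `class_shares` returns.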
Funding
- International Partnership Program of the Chinese Academy of Sciences (Grant No. 16146kysb20200001)
- National Natural Science Foundation of China (NSFC) (Grant Nos. 42177344 and U2243213)
- 111 Project (Grant No. B20052)
- Natural Science Basic Research Program of Shaanxi (Program No. 2024JC-YBQN-0297)
Citation
@article{Askari2025novel,
author = {Askari, Komelle and Zheng, Wende and Shi, Shangyu and Chu, Jae-Woo and Wang, Fei},
title = {A novel entropy-based machine learning framework for flood risk mapping in Pakistan},
journal = {Journal of Hydrology: Regional Studies},
year = {2025},
doi = {10.1016/j.ejrh.2025.102911},
url = {https://doi.org/10.1016/j.ejrh.2025.102911}
}
Original Source: https://doi.org/10.1016/j.ejrh.2025.102911