Jain et al. (2025) Deriving hydrological inferences from a machine learning model to understand the physical drivers of flow duration curves
Identification
- Journal: Journal of Hydrology
- Year: 2025
- Date: 2025-11-30
- Authors: Shubham Jain, Dhruva Kathuria, Raghavan Srinivasan, Michael Schramm, Arun Bawa, Srinivasulu Ale, Jaehak Jeong, Michael J. White
- DOI: 10.1016/j.jhydrol.2025.134687
Research Groups
- Water Management and Hydrological Science, Texas A&M University, USA.
- Blackland Research and Extension Center, Texas A&M AgriLife Research, USA.
- NASA Goddard Space Flight Center, USA.
- GESTAR II, Morgan State University, USA.
- Texas Water Resources Institute, USA.
- Grassland Soil and Water Research Laboratory, USDA–ARS, USA.
Short Summary
This study uses Random Forest regression to predict Flow Duration Curves (FDCs) across 991 watersheds in the contiguous United States and SHapley Additive exPlanations (SHAP) to interpret the predictions. The research demonstrates that while climate attributes primarily determine the scale of FDCs, the baseflow index and geological features are the critical drivers of FDC shape and low-flow regimes.
Objective
- To develop an interpretable machine learning framework that predicts FDC quantiles and identifies the global, regional, and local physical drivers of streamflow variability.
Study Configuration
- Spatial Scale: Contiguous United States (CONUS), encompassing 991 least-disturbed watersheds across 19 USGS HUC-02 hydrologic regions.
- Temporal Scale: 30 years of daily streamflow data (1991–2020).
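For readers unfamiliar with FDCs, the following is a minimal sketch of how exceedance-percentile flows can be derived from a daily streamflow record. It is not the authors' code: the Weibull plotting position, the linear interpolation, and the function name `flow_duration_quantiles` are assumptions made for illustration, and the synthetic flows are not data from the study.

```python
import bisect


def flow_duration_quantiles(flows, exceedance_percentiles):
    """Return {p: Q_p}, where Q_p is the flow equalled or exceeded p% of the time.

    Assumes Weibull plotting positions p_i = 100 * i / (n + 1); other
    conventions exist and the paper's choice is not stated in this summary.
    """
    ranked = sorted(flows, reverse=True)  # largest flow first
    n = len(ranked)
    positions = [100.0 * (i + 1) / (n + 1) for i in range(n)]

    result = {}
    for p in exceedance_percentiles:
        if p <= positions[0]:
            # Rarer than the largest observed flow: clamp to the maximum.
            result[p] = ranked[0]
        elif p >= positions[-1]:
            # More frequent than the smallest observed flow: clamp to the minimum.
            result[p] = ranked[-1]
        else:
            # Linear interpolation between the two bracketing ranked flows.
            hi = bisect.bisect_left(positions, p)
            lo = hi - 1
            w = (p - positions[lo]) / (positions[hi] - positions[lo])
            result[p] = ranked[lo] + w * (ranked[hi] - ranked[lo])
    return result


# Synthetic example record (hypothetical values, arbitrary units).
example_flows = [float(v) for v in range(1, 100)]
example_fdc = flow_duration_quantiles(example_flows, [5, 50, 95])
```

With a 30-year daily record (roughly 11,000 values), extreme percentiles such as 0.01 and 99.99 fall outside the plotting-position range and are clamped to the observed maximum and minimum in this sketch.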
Methodology and Data
- Models used: Random Forest (RF) regression (18 separate models: one for each of 17 exceedance percentiles and one for the FDC slope) and TreeSHAP for post-hoc model interpretability.
- Data sources: Daily mean streamflow from the USGS National Water Information System (NWIS); 18 watershed attributes (climatic, geological, topographical, and land cover) from the USGS GAGES-II dataset.
- Evaluation Metrics: Nash-Sutcliffe efficiency (NSE), Kling-Gupta Efficiency (KGE), Volumetric efficiency (VE), and Root Mean Square Logarithmic Error (RMSLE).
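The four evaluation metrics listed above can be sketched in plain Python as follows. The formulas are the commonly used textbook forms and are assumptions here: the summary does not state which KGE variant (the original 2009 form is used below) or which logarithm offset for RMSLE (`log(1 + x)` is used below) the paper adopted.

```python
import math
from statistics import mean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches predicting the observed mean."""
    mo = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form): combines correlation, variability ratio, bias ratio."""
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    cov = mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    r = cov / (so * ss)   # linear correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def volumetric_efficiency(obs, sim):
    """Volumetric efficiency: 1 minus total absolute error relative to total observed volume."""
    return 1.0 - sum(abs(s - o) for o, s in zip(obs, sim)) / sum(obs)


def rmsle(obs, sim):
    """Root mean square logarithmic error, using log(1 + x) so zero flows are allowed."""
    sq = [(math.log1p(s) - math.log1p(o)) ** 2 for o, s in zip(obs, sim)]
    return math.sqrt(mean(sq))
```

In this setting `obs` and `sim` would hold the observed and predicted values of a single quantile (for example $Q_{50}$) across watersheds, so each of the 18 models gets its own set of scores.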
Main Results
- Predictive Performance: The RF model achieved high accuracy in mid-range flows ($Q_{5}$ to $Q_{50}$), with NSE and KGE values > 0.90. Performance was lower at the FDC tails ($Q_{0.01}$ NSE = 0.61; $Q_{99.99}$ NSE = 0.62).
- Primary Drivers: Climate attributes, specifically mean annual precipitation and the aridity index, were the most significant predictors across all quantiles, primarily influencing the FDC scale.
- Shape Controls: The Baseflow Index (BFI) and percentage of poorly drained soils were the dominant controls for the FDC slope and low-flow regimes ($Q_{80}$–$Q_{99.99}$).
- Regional Variability: SHAP values revealed distinct regional drivers, such as precipitation seasonality being a paramount factor in California, and snow percentage being critical in northern regions like New England and the Pacific Northwest.
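To illustrate what a SHAP value represents in results like these, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. This is not TreeSHAP and not the authors' workflow: TreeSHAP obtains attributions of this kind efficiently for tree ensembles, whereas enumeration is only practical for a handful of features. The toy model `toy_q50`, its coefficients, the attribute values, and the use of a single background point (instead of a background dataset) are all hypothetical simplifications.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for instance x relative to one background point.

    Features outside a coalition take their background value (a simplification
    of how SHAP averages over a background dataset).
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else background[k]) for k in names}
        return predict(mixed)

    phi = {}
    for k in names:
        others = [m for m in names if m != k]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature k.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {k}) - value(s))
        phi[k] = total
    return phi


def toy_q50(a):
    """Hypothetical stand-in for a fitted model; not taken from the paper."""
    return 0.8 * a["precip"] - 0.5 * a["aridity"] + 0.3 * a["bfi"] * a["precip"]


watershed = {"precip": 2.0, "aridity": 1.0, "bfi": 0.5}   # hypothetical attributes
reference = {"precip": 1.0, "aridity": 1.0, "bfi": 0.0}   # hypothetical background
phi = shapley_values(toy_q50, watershed, reference)
```

The attributions sum to the difference between the prediction for the watershed and the prediction for the reference point, which is the additivity property that lets SHAP values be aggregated locally, regionally, or globally. An attribute whose value matches the reference (aridity here) receives zero attribution, and the interaction term is split between the two attributes involved.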
Contributions
- Bridged the gap between "black-box" machine learning and hydrological theory by using SHAP to provide physically consistent explanations for model predictions.
- Provided a multi-scale (global, regional, and local) assessment of how watershed attributes interact to govern streamflow frequency distributions.
- Demonstrated that SHAP can serve as a "one-size-fits-all" metric for evaluating attribute importance and interactions in regionalization studies.
Funding
- Texas A&M AgriLife Research.
- Texas Water Resources Institute.
- Grassland Soil and Water Research Laboratory, USDA–ARS.
Citation
@article{Jain2025Deriving,
author = {Jain, Shubham and Kathuria, Dhruva and Srinivasan, Raghavan and Schramm, Michael and Bawa, Arun and Ale, Srinivasulu and Jeong, Jaehak and White, Michael J.},
title = {Deriving hydrological inferences from a machine learning model to understand the physical drivers of flow duration curves},
journal = {Journal of Hydrology},
year = {2025},
doi = {10.1016/j.jhydrol.2025.134687},
url = {https://doi.org/10.1016/j.jhydrol.2025.134687}
}
Original Source: https://doi.org/10.1016/j.jhydrol.2025.134687