Baishya et al. (2025) An Easy-to-Apply Machine Learning Framework for Hydrologic Evaluation of Ungauged Catchments
Identification
- Journal: Water Resources Management
- Year: 2025
- Date: 2025-12-29
- Authors: Bhaswatee Baishya, Arup Kumar Sarma
- DOI: 10.1007/s11269-025-04396-z
Research Groups
- Department of Civil Engineering, Indian Institute of Technology Guwahati, Guwahati, Assam, India
Short Summary
This study developed a novel machine learning framework for streamflow regionalization in ungauged catchments that integrates the Curve Number (CN) with specific discharge normalization, and it outperformed a conventional SWAT parameter-transfer method in Northeast India. The LightGBM model, using dynamic CN and specific discharge scaling, achieved higher accuracy and lower bias than SWAT in the assumed target basin used for validation (NSE = 0.804 vs. 0.613; PBIAS = -2.73% vs. +9.44%).
Objective
- To develop and evaluate an easy-to-apply machine learning framework for streamflow regionalization in ungauged catchments, integrating Curve Number (CN) and specific discharge normalization, and compare its performance against a conventional SWAT parameter-transfer method.
Study Configuration
- Spatial Scale: Catchment scale, focusing on the Deepor Beel Ramsar wetland in Northeast India. The study involved a donor basin (Basistha, 107.7 km²), an assumed target basin, which has streamflow records but was treated as ungauged for validation (Bharalu, 35.1 km²), and a target ungauged basin (Pamohi, 156.11 km²).
- Temporal Scale: Monthly streamflow data from 1990 to 2012. Training period: 1990–2004. Testing/Validation period: 2005–2012. Land Use Land Cover (LULC) maps from 1992, 2002, and 2011 were used to derive dynamic Curve Number (CN) values.
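The specific discharge normalization named in this summary can be illustrated with the basin areas listed above. The sketch below assumes the usual definition of specific discharge (flow divided by drainage area) and assumes that flow expressed per unit area in the donor basin is rescaled by the target basin area. The summary does not give the exact formula or the flow units, so the function names and the m³/s unit are illustrative assumptions, not the authors' implementation.

```python
# Basin areas (km^2) as reported in the study configuration.
AREA_KM2 = {"Basistha": 107.7, "Bharalu": 35.1, "Pamohi": 156.11}


def to_specific_discharge(flows, area_km2):
    """Convert flows (assumed m^3/s) to specific discharge (m^3/s per km^2)."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return [q / area_km2 for q in flows]


def from_specific_discharge(specific_flows, area_km2):
    """Rescale specific discharge back to flow for a basin of the given area."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return [q * area_km2 for q in specific_flows]


def transfer_flow(donor_flows, donor, target, areas=AREA_KM2):
    """Area-ratio transfer: normalize by donor area, rescale by target area.

    Hypothetical helper. In an ML workflow the model would be trained on
    specific discharge in the donor basin and its predictions rescaled here.
    """
    specific = to_specific_discharge(donor_flows, areas[donor])
    return from_specific_discharge(specific, areas[target])


if __name__ == "__main__":
    # Made-up donor flows, for illustration only.
    print(transfer_flow([10.77, 21.54], "Basistha", "Bharalu"))
```

Under these assumptions, a donor flow of 10.77 m³/s in Basistha corresponds to 0.1 m³/s per km², which rescales to about 3.51 m³/s for Bharalu.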
Methodology and Data
- Models used:
  - Machine Learning (ML) models: Support Vector Regression (SVR), Extreme Gradient Boosting (XGB), Random Forest (RF), LightGBM (LGBM), and Artificial Neural Networks (ANN).
  - Hydrological model: Soil and Water Assessment Tool (SWAT).
- Data sources:
  - Precipitation and Temperature: India Meteorological Department (IMD) Borjhar station and IMD gridded point data (26.00° N & 91.75° E).
  - Streamflow: Water Resources Department, Guwahati East division office (monthly data for Basistha and Bharalu basins).
  - Digital Elevation Model (DEM): 30-meter SRTM DEM from USGS Earth Explorer.
  - Land Use Land Cover (LULC): Landsat images from 1992, 2002, and 2011.
  - Soil data: FAO site.
  - Catchment features: Basin shapefile (area, perimeter).
  - Curve Number (CN) raster datasets derived from LULC and soil maps, with temporal interpolation.
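The temporal interpolation of CN between the LULC map years can be sketched as follows. This is a minimal illustration that assumes piecewise-linear interpolation of a basin-average CN between 1992, 2002, and 2011, with the value held constant outside that range. The CN values and the function name `interpolate_cn` are hypothetical placeholders, not values or code from the paper.

```python
# Hypothetical basin-average CN at each LULC map year (placeholder values).
CN_ANCHORS = {1992: 70.0, 2002: 75.0, 2011: 79.5}


def interpolate_cn(year, anchors=CN_ANCHORS):
    """Return a piecewise-linear CN estimate for `year`.

    Outside the anchor range the nearest anchor value is held constant,
    which is an assumption made for this sketch.
    """
    points = sorted(anchors.items())
    first_year, first_cn = points[0]
    last_year, last_cn = points[-1]
    if year <= first_year:
        return first_cn
    if year >= last_year:
        return last_cn
    # Find the pair of anchor years that brackets the requested year.
    for (y0, c0), (y1, c1) in zip(points, points[1:]):
        if y0 <= year <= y1:
            return c0 + (c1 - c0) * (year - y0) / (y1 - y0)
    raise RuntimeError("unreachable: year not bracketed by anchors")


if __name__ == "__main__":
    # Annual CN series covering the 1990 to 2012 study period.
    for yr in range(1990, 2013):
        print(yr, round(interpolate_cn(yr), 2))
```

A yearly series built this way could be joined to the monthly records by year, so that each month carries the CN estimated for its year.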
Main Results
- Recursive Feature Elimination with Cross-Validation (RFECV) showed that tree-based ML models (RF, XGB, LGBM) consistently selected Curve Number (CN) as a key predictor, alongside precipitation and temperature. SVR retained four hydrometeorological variables, while ANN retained only a single precipitation-gauge input.
- Sensitivity analysis of six CN estimation scenarios indicated that piece-wise linear interpolation from 1992 to 2002 yielded the best performance metrics (RMSE, R², NSE, PBIAS) in the donor basin.
- In the donor Basistha basin, the LightGBM (LGBM) model significantly outperformed other ML models and the calibrated SWAT model. LGBM achieved a testing R² of 0.847 and NSE of 0.847 with negligible bias (PBIAS = -0.23%). The SWAT model, despite meticulous calibration, underestimated peak flows (testing R² = 0.734, NSE = 0.713, PBIAS = -14.36%).
- Upon regionalization to the assumed target Bharalu basin, the ML-CN methodology (LGBM with specific discharge scaling) produced R² = 0.804, NSE = 0.804, and PBIAS = -2.73%. This significantly outperformed the conventional SWAT parameter-transfer method, which yielded R² = 0.708, NSE = 0.613, and PBIAS = +9.44%.
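The goodness-of-fit metrics quoted in these results can be computed with a few lines of code. The sketch below uses the standard definitions of RMSE and Nash-Sutcliffe efficiency (NSE). For PBIAS it assumes the sign convention in which negative values indicate underestimation, which matches how the results above describe SWAT, but the opposite convention is also common and the summary does not state which one the paper uses. Function names are illustrative.

```python
import math


def _check(obs, sim):
    """Validate that both series are non-empty and of equal length."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and equal in length")


def rmse(obs, sim):
    """Root mean square error between observed and simulated flows."""
    _check(obs, sim)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    _check(obs, sim)
    mean_obs = sum(obs) / len(obs)
    denom = sum((o - mean_obs) ** 2 for o in obs)
    if denom == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / denom


def pbias(obs, sim):
    """Percent bias; negative means underestimation (assumed sign convention)."""
    _check(obs, sim)
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("sum of observations is zero")
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / total_obs


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]   # made-up monthly flows
    simulated = [0.9, 1.8, 2.7, 3.6]  # uniformly 10% too low
    print(rmse(observed, simulated), nse(observed, simulated), pbias(observed, simulated))
```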
Contributions
- Developed a novel machine learning-based regionalization framework that integrates physically informed predictors (dynamic Curve Number) with specific discharge normalization for streamflow prediction in ungauged catchments.
- Demonstrated the critical role of Curve Number as a predictor in tree-based machine learning models for capturing catchment infiltration dynamics.
- Evaluated multiple temporal Curve Number estimation scenarios, providing insight into how land-use dynamics affect model performance.
- Provided empirical evidence that the proposed ML framework (specifically LightGBM) outperforms a conventional parameter-transfer approach based on a physically based model (SWAT) for regionalization in data-scarce regions.
- Offered a practical, straightforward, and robust tool for streamflow prediction that supports water resource management in ungauged and data-scarce basins.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Citation
@article{Baishya2025EasytoApply,
author = {Baishya, Bhaswatee and Sarma, Arup Kumar},
title = {An Easy-to-Apply Machine Learning Framework for Hydrologic Evaluation of Ungauged Catchments},
journal = {Water Resources Management},
year = {2025},
doi = {10.1007/s11269-025-04396-z},
url = {https://doi.org/10.1007/s11269-025-04396-z}
}
Original Source: https://doi.org/10.1007/s11269-025-04396-z