Reddy and Srivastav (2025) Exploring the Impact of Optimization Techniques on Streamflow Prediction Using XGBoost: A Comparative Analysis with Satellite and Reanalysis Precipitation Datasets
Identification
- Journal: Water Resources Management
- Year: 2025
- Date: 2025-12-26
- Authors: Nagireddy Masthan Reddy, Roshan Srivastav
- DOI: 10.1007/s11269-025-04417-x
Research Groups
- Department of Civil and Environmental Engineering, Indian Institute of Technology Tirupati, Tirupati, India
Short Summary
This study systematically compares the joint impact of eight precipitation datasets and five optimization techniques on the performance of an Extreme Gradient Boosting (XGBoost) model for one-day-ahead streamflow prediction in India's Godavari Basin. The research found that the combination of Simulated Annealing (SA) for hyperparameter tuning and the India Meteorological Department (IMD) precipitation dataset consistently yielded the most accurate and reliable streamflow forecasts.
Objective
- To compare the performance of multiple precipitation datasets and optimization methods for one-day-ahead streamflow forecasting using XGBoost.
- To identify robust dataset–algorithm pairings that enhance predictive accuracy.
- To evaluate the impact of these pairings on predictive accuracy during both training and testing phases in a complex hydrological region.
Study Configuration
- Spatial Scale: Wairagarh station, Pranhita subbasin of the Godavari River basin, Maharashtra, India. The drainage basin covers approximately 2,600 square kilometers, located between 80°05′ E and 80°40′ E longitude and 20°20′ N and 20°47′ N latitude, with elevations ranging from 208 meters to 660 meters.
- Temporal Scale: Daily data from 1998 to 2016 for one-day-ahead streamflow prediction.
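The one-day-ahead setup can be read as a supervised learning problem in which predictors observed up to day t are paired with streamflow on day t+1. The sketch below illustrates that framing only. The function name, the choice of predictors, and the `n_lags` window are assumptions made for illustration; this summary does not state which lags or inputs the authors actually used.

```python
from typing import List, Sequence, Tuple


def make_one_day_ahead_samples(
    precip: Sequence[float],
    tmax: Sequence[float],
    tmin: Sequence[float],
    flow: Sequence[float],
    n_lags: int = 2,  # assumed window length, not taken from the paper
) -> Tuple[List[List[float]], List[float]]:
    """Pair predictors from days t-n_lags+1..t with streamflow on day t+1."""
    n = len(flow)
    if not (len(precip) == len(tmax) == len(tmin) == n):
        raise ValueError("all series must have the same length")
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")

    features: List[List[float]] = []
    targets: List[float] = []
    # t is the last day whose observations are available to the model.
    for t in range(n_lags - 1, n - 1):
        row: List[float] = []
        for lag in range(n_lags):
            d = t - lag  # most recent day first
            row.extend([precip[d], tmax[d], tmin[d], flow[d]])
        features.append(row)
        targets.append(flow[t + 1])  # next-day streamflow is the target
    return features, targets
```

Because the series is daily and ordered in time, any train/test split built on these samples would normally be chronological rather than shuffled, so that the test period lies strictly after the training period.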
Methodology and Data
- Models used:
  - Machine Learning Model: Extreme Gradient Boosting (XGBoost).
  - Optimization Algorithms for hyperparameter tuning: Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), Grid Search (GS), Randomized Search (RS), and Simulated Annealing (SA).
- Data sources:
  - Precipitation: India Meteorological Department (IMD), Tropical Rainfall Measuring Mission (TRMM), Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), Climate Prediction Center MORPHing technique (CMORPH), Climate Prediction Center (CPC), Multi-Source Weighted-Ensemble Precipitation (MSWEP), Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), and Princeton Global Forcing (PGF).
  - Streamflow: Daily data for Wairagarh station from the India Water Resources Information System (India-WRIS).
  - Temperature: Daily maximum and minimum temperature observations (Tmax, Tmin) for Wairagarh station.
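To make the hyperparameter-tuning step concrete, here is a minimal, generic sketch of Simulated Annealing over a bounded search space. It is not the authors' implementation. The cooling schedule, step size, and bounds are assumed values, and `toy_validation_error` is a hypothetical stand-in for "train XGBoost with these settings and return the validation error". The three parameter names are real XGBoost hyperparameters, but this summary does not say which ones the study tuned.

```python
import math
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def simulated_annealing(
    objective: Callable[[Params], float],
    bounds: Bounds,
    n_iter: int = 500,
    t_start: float = 1.0,
    cooling: float = 0.99,   # geometric cooling factor (assumed)
    step_frac: float = 0.1,  # proposal std as a fraction of each range (assumed)
    seed: int = 0,
) -> Tuple[Params, float]:
    """Minimise `objective` over box-bounded parameters."""
    rng = random.Random(seed)
    current = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
    current_cost = objective(current)
    best, best_cost = dict(current), current_cost
    temp = t_start

    for _ in range(n_iter):
        # Propose a neighbour: Gaussian perturbation, clipped to the bounds.
        candidate = {}
        for k, (lo, hi) in bounds.items():
            step = rng.gauss(0.0, step_frac * (hi - lo))
            candidate[k] = min(hi, max(lo, current[k] + step))
        cand_cost = objective(candidate)
        delta = cand_cost - current_cost

        # Metropolis rule: always accept improvements, and accept worse
        # moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost

        temp *= cooling  # cool down so worse moves become rarer
    return best, best_cost


# Hypothetical search space; not the ranges used in the paper.
HYPOTHETICAL_BOUNDS: Bounds = {
    "learning_rate": (0.01, 0.3),
    "max_depth": (2.0, 10.0),  # would be rounded to int for a real model
    "subsample": (0.5, 1.0),
}


def toy_validation_error(p: Params) -> float:
    """Stand-in objective with a known minimum; replace with a real
    train-and-validate routine when tuning an actual model."""
    return (
        (p["learning_rate"] - 0.1) ** 2
        + ((p["max_depth"] - 6.0) / 10.0) ** 2
        + (p["subsample"] - 0.8) ** 2
    )
```

Unlike Grid Search or Randomized Search, which evaluate candidates independently, SA walks through the space and occasionally accepts a worse candidate early on, which lets it escape local minima before the temperature drops.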
Main Results
- The Simulated Annealing (SA) optimizer combined with the India Meteorological Department (IMD) precipitation dataset consistently delivered the best overall performance for one-day-ahead streamflow prediction. This combination achieved a Nash–Sutcliffe Efficiency (NSE) of 0.94 and a Root Mean Square Error (RMSE) of 32.8 cubic meters per second (m³/s) during training, and an NSE of 0.82 and RMSE of 53.8 m³/s during testing.
- The CMORPH–SA combination also showed strong performance, with an NSE of 0.88 (training) and 0.71 (testing).
- TRMM–SA showed good skill in training but generalized less well in testing. MSWEP–ACO and PERSIANN–SA provided moderate results.
- CHIRPS, CPC, and PGF datasets generally resulted in less accurate predictions, with PGF–SA showing the lowest testing accuracy.
- A non-parametric Friedman test ranked IMD as the most dependable precipitation input (lowest mean rank, 1.45) and SA as the best-performing optimizer (lowest mean rank, 1.60), supporting the superiority of the SA + IMD configuration. In a Friedman ranking, a lower mean rank indicates better performance.
- Taylor diagrams indicated that SA provided the best balance of high correlation, low centered Root Mean Square Difference (cRMSD), and realistic variability reproduction compared to other optimizers.
- A common limitation across most dataset–method pairings was the underestimation of peak streamflow magnitudes and the severity of streamflow fluctuations, with simulated standard deviations consistently smaller than observed values.
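The evaluation measures named above follow standard definitions, sketched here in plain Python for reference. NSE, RMSE, and the Taylor-diagram statistics (correlation, standard deviations, cRMSD) use their textbook forms. `mean_ranks` shows the ranking step underlying a Friedman test, under the assumption that each row holds one evaluation case and each column one candidate, with lower error being better. The function names and that table layout are illustrative choices, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def _mean(x: Sequence[float]) -> float:
    return sum(x) / len(x)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root Mean Square Error, in the units of the data (e.g. m^3/s)."""
    return math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def taylor_stats(obs: Sequence[float], sim: Sequence[float]) -> Dict[str, float]:
    """Correlation, population standard deviations, and centered RMSD."""
    mo, ms = _mean(obs), _mean(sim)
    std_obs = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    std_sim = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    cov = _mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    crmsd = math.sqrt(_mean([((s - ms) - (o - mo)) ** 2 for o, s in zip(obs, sim)]))
    return {
        "corr": cov / (std_obs * std_sim),
        "std_obs": std_obs,
        "std_sim": std_sim,
        "crmsd": crmsd,
    }


def mean_ranks(errors: Sequence[Sequence[float]]) -> List[float]:
    """Mean within-row rank per column (rank 1 = lowest error in the row).
    Tied values share the average of the ranks they occupy."""
    n_cols = len(errors[0])
    totals = [0.0] * n_cols
    for row in errors:
        order = sorted(range(n_cols), key=lambda j: row[j])
        i = 0
        while i < n_cols:
            j = i
            while j + 1 < n_cols and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
            for k in range(i, j + 1):
                totals[order[k]] += avg_rank
            i = j + 1
    return [t / len(errors) for t in totals]
```

Two properties are worth keeping in mind when reading the results. A simulated standard deviation below the observed one (`std_sim < std_obs`) is the signature of the damped peaks and fluctuations noted above. And because the mean ranks of k candidates always sum to k(k+1)/2, the best candidate is the one with the lowest mean rank.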
Contributions
- Provides the first systematic evaluation of XGBoost performance when integrated with multiple precipitation datasets and optimization techniques within the complex, monsoon-influenced Godavari Basin in India.
- Addresses a critical practical gap by demonstrating how the combined selection of precipitation input and hyperparameter tuning strategy significantly influences streamflow forecasting accuracy and reliability.
- Offers empirical evidence to guide the selection of reliable precipitation datasets and efficient optimization algorithms for operational water management in data-scarce or complex hydrological regions.
- Highlights the versatility and robustness of Simulated Annealing (SA) as an optimization algorithm for hydrological predictions, particularly when paired with high-quality local precipitation data such as IMD.
Funding
- Indian Institute of Technology Tirupati (IITT)
- GISE Hub IITB
- Project No.: RD/0121-DST0000-011 (for Postdoctoral Researcher Nagireddy Masthan Reddy)
Citation
@article{Reddy2025Exploring,
author = {Reddy, Nagireddy Masthan and Srivastav, Roshan},
title = {Exploring the Impact of Optimization Techniques on Streamflow Prediction Using XGBoost: A Comparative Analysis with Satellite and Reanalysis Precipitation Datasets},
journal = {Water Resources Management},
year = {2025},
doi = {10.1007/s11269-025-04417-x},
url = {https://doi.org/10.1007/s11269-025-04417-x}
}
Original Source: https://doi.org/10.1007/s11269-025-04417-x