Ye et al. (2025) Comparison of Process-Based and Machine Learning Models for Streamflow Simulation in Typical Basins in Northern and Southern China

Identification

Journal: Water
Year: 2025
Date: 2025-12-10
Authors: Rui Ye, Feng Zhang, Jiaxue Ren, Tao Wu, Haitao Chen
DOI: 10.3390/w17243498

Research Groups

Anhui Water Conservancy Technical College, China
College of Water Conservancy, Jiangxi University of Water Resources and Electric Power, China
Chinese Research Academy of Environmental Sciences, China
College of Environmental Science and Engineering, Nankai University, China

Short Summary

This study compared the performance of two process-based hydrological models (SWAT, GWLF) and a machine learning model (Random Forest) for monthly streamflow simulation in contrasting humid southern and semi-arid northern Chinese basins, concluding that optimal model selection depends on hydrological context, data availability, and the need for physical realism.

Objective

To systematically compare the applicability and performance of two process-based hydrological models (SWAT, GWLF) and a machine learning model (Random Forest) for monthly streamflow simulation in two hydroclimatically contrasting basins in China: the humid southern Shuai Shui Basin and the semi-arid northern Shahe River Basin.

Study Configuration

Spatial Scale: Two typical basins in China: the Shahe River Basin (SRB) in northern China (905.2 km²) and the Shuai Shui Basin (SSB) in southern China (880.4 km²).
Temporal Scale: Monthly streamflow simulations.
- SWAT & GWLF:
  - SSB: Calibration (2007–2013), Validation (2014–2016), with a one-year warm-up period.
  - SRB: Calibration (2001–2007), Validation (2008–2010), with a one-year warm-up period.
- RF: Dataset split into 75% for training and 25% for testing.
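The 75%/25% split for the RF model can be illustrated with a short sketch. The summary does not say whether the split was chronological or random; the version below assumes a chronological split (earlier months for training, later months for testing), and the function name and example period are hypothetical.

```python
def chronological_split(records, train_fraction=0.75):
    """Split an ordered sequence into a leading training part and a trailing test part.

    Assumption: the records are already sorted in time, so the test set
    contains only months that come after every training month.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


# Hypothetical example: ten years of monthly records labelled (year, month).
months = [(year, month) for year in range(2007, 2017) for month in range(1, 13)]
train, test = chronological_split(months)
```

A chronological split avoids leaking future information into training, which matters once lagged inputs are used; a random split would be a different design choice.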

Methodology and Data

Models used:
- Process-based: Soil and Water Assessment Tool (SWAT) and Generalized Watershed Loading Functions (GWLF), the latter specifically as the modified ReNuMa model.
- Machine Learning: Random Forest (RF) regression model.
Data sources:
- Meteorological station data (precipitation; maximum, minimum, and average temperature; relative humidity; wind speed; solar radiation) from the National Weather Science Data Center.
- Rainfall station data (precipitation) from local hydrological bureaus.
- Monthly evapotranspiration data from the Institute of Tibetan Plateau Research, Chinese Academy of Sciences.
- Streamflow data from hydrological stations (local hydrological bureaus).
- Spatial data: Digital Elevation Model (DEM) (30 m × 30 m resolution) from the Geospatial Data Cloud; Land use and soil maps (1:1,000,000 scale) from the Resource and Environmental Science and Data Center.

Main Results

All models performed markedly better in the humid southern basin (SSB), where Nash–Sutcliffe efficiency (NSE) and coefficient of determination (R²) values exceeded 0.85, than in the semi-arid northern basin (SRB).
The SWAT model consistently achieved the highest overall accuracy in both basins, particularly in simulating peak flow events; the study attributes this to its comprehensive physics-based representation of hydrological processes.
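For reference, the two metrics named above can be computed as in the following minimal sketch. It uses the standard NSE definition and assumes R² is the squared Pearson correlation between observed and simulated flows, a common convention in hydrology that the summary does not confirm. The function names and sample values are illustrative only.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def r_squared(observed, simulated):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov ** 2 / (var_obs * var_sim)


# Illustrative values, not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated but biased simulation
nse_biased = nse(obs, doubled)
r2_biased = r_squared(obs, doubled)
```

Under this definition R² is insensitive to systematic bias, while NSE penalizes it, which is why the two metrics are usually reported together.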
The GWLF model provided acceptable streamflow simulations with minimal data requirements, offering a practical alternative in data-limited regions like the SRB, though it showed limitations in capturing extreme flow events.
The Random Forest (RF) model performed well in the SSB under zero-lag conditions, indicating a rapid rainfall–runoff response. However, in the SRB, it required the incorporation of multi-day lags to account for delayed infiltration-excess runoff and consistently underestimated high-flow events due to its data-driven nature and lack of embedded physical mechanisms.
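The difference between zero-lag and lagged inputs can be sketched as a feature-construction step. This is an illustrative assumption about how lagged predictors might be built, not the authors' preprocessing; the function name, the use of precipitation as the only predictor, and the sample values are hypothetical.

```python
def make_lagged_features(precip, flow, max_lag):
    """Build (X, y) where each row of X is [P_t, P_(t-1), ..., P_(t-max_lag)].

    With max_lag=0 each row holds only the concurrent precipitation
    (the zero-lag case); larger values add antecedent precipitation.
    The first max_lag time steps are dropped because their lags are undefined.
    """
    if len(precip) != len(flow):
        raise ValueError("precip and flow must have the same length")
    if max_lag < 0:
        raise ValueError("max_lag must be non-negative")
    X, y = [], []
    for t in range(max_lag, len(precip)):
        X.append([precip[t - k] for k in range(max_lag + 1)])
        y.append(flow[t])
    return X, y


# Illustrative values, not data from the study.
precip = [10.0, 20.0, 30.0, 40.0, 50.0]
flow = [1.0, 2.0, 3.0, 4.0, 5.0]
X0, y0 = make_lagged_features(precip, flow, max_lag=0)
X2, y2 = make_lagged_features(precip, flow, max_lag=2)
```

The resulting `X` and `y` could then be passed to any random forest regressor, for example scikit-learn's `RandomForestRegressor`; the summary does not state which implementation the study used.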
Calibrated model parameters showed significant spatial variability, reflecting fundamental differences in hydrological processes between the two contrasting basins: for example, a higher curve number (CN2) in the SSB, indicating greater surface runoff potential, and a higher soil available water capacity (SOL_AWC) in the SRB.
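To show why a higher curve number implies greater surface runoff potential, the sketch below applies the standard SCS curve number relation in metric units, with the conventional initial abstraction ratio of 0.2 assumed. The curve numbers and rainfall depth are illustrative only and are not the calibrated values from the study.

```python
def scs_runoff_mm(precip_mm, cn, ia_ratio=0.2):
    """Surface runoff depth (mm) from the SCS curve number method.

    S = 25400 / CN - 254 is the potential maximum retention (mm),
    Ia = ia_ratio * S is the initial abstraction, and
    Q = (P - Ia)^2 / (P - Ia + S) when P > Ia, otherwise 0.
    """
    if not 0 < cn <= 100:
        raise ValueError("cn must be in the interval (0, 100]")
    retention = 25400.0 / cn - 254.0
    initial_abstraction = ia_ratio * retention
    if precip_mm <= initial_abstraction:
        return 0.0
    excess = precip_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


# Illustrative comparison for a 50 mm rainfall event.
runoff_high_cn = scs_runoff_mm(50.0, cn=85)
runoff_low_cn = scs_runoff_mm(50.0, cn=65)
```

For the same rainfall, the higher curve number yields less retention and therefore more runoff, which is the sense in which CN2 expresses surface runoff potential.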

Contributions

Provides a systematic and comparative evaluation of process-based (SWAT, GWLF) and data-driven (RF) hydrological models across hydroclimatically contrasting regions in China, addressing a gap in the literature.
Offers critical insights into the applicability, strengths, and limitations of different modeling approaches for streamflow simulation across differing hydrological contexts, levels of data availability, and requirements for physical realism.
Highlights the superior performance of process-based models in capturing extreme flow events, which is especially important in regions experiencing intensified rainfall extremes due to climate change.
Advocates for an integrated modeling strategy, combining process-based models for long-term planning and scenario analysis with machine learning for adaptive, data-driven refinement of forecasts and residual error correction, enhancing both predictive accuracy and scientific credibility for water resource management.

Funding

Natural Science Foundation of Anhui Province of China (Grant No. 2208085US06)
Jiangxi Province High-level and High-skilled Leading Talents Cultivation Project (2025)

Citation

@article{Ye2025Comparison,
  author = {Ye, Rui and Zhang, Feng and Ren, Jiaxue and Wu, Tao and Chen, Haitao},
  title = {Comparison of Process-Based and Machine Learning Models for Streamflow Simulation in Typical Basins in Northern and Southern China},
  journal = {Water},
  year = {2025},
  doi = {10.3390/w17243498},
  url = {https://doi.org/10.3390/w17243498}
}

Original Source: https://doi.org/10.3390/w17243498