Yang et al. (2025) Improving streamflow simulation through machine learning-powered data integration and its potential for forecasting in the Western U.S.
Identification
- Journal: Hydrology and Earth System Sciences
- Year: 2025
- Date: 2025-10-21
- Authors: Yuan Yang, Ming Pan, Dapeng Feng, Mu Xiao, T. A. Dixon, Robert Hartman, Chaopeng Shen, Yalan Song, Agniv Sengupta, Luca Delle Monache, F. Martin Ralph
- DOI: 10.5194/hess-29-5453-2025
Research Groups
- Center for Western Weather and Water Extremes, Scripps Institution of Oceanography, University of California San Diego, CA, USA
- Department of Earth System Science, Stanford University, Stanford, CA, USA
- Robert K. Hartman Consulting Services, Roseville, CA, USA
- Civil and Environmental Engineering, Pennsylvania State University, PA, USA
Short Summary
This study evaluated an LSTM-based data integration approach that incorporates lagged streamflow (Q) and snow water equivalent (SWE) observations to improve streamflow estimates across a range of lag times and timescales in the Western U.S. Integrating daily Q observations gave the largest improvement, raising the median Kling-Gupta Efficiency (KGE) from 0.80 to 0.96 with a 1-day lag.
Objective
- To evaluate an LSTM-based data integration approach that incorporates streamflow (Q) and snow water equivalent (SWE) observations to improve streamflow estimations across different lag times (1–10 days, 1–6 months) and timescales (daily and monthly) over hundreds of basins in the Western U.S.
- To provide insights into the effectiveness of LSTM-based data integration for streamflow forecasting and the differential influence of Q and SWE observations on forecast performance across varying timescales.
Study Configuration
- Spatial Scale: 646 basins across the Western U.S., with a subset of 429 identified as snow-dominated. Basin areas ranged from 50 km² to 5000 km².
- Temporal Scale:
  - Data period: 1983–2022.
  - Training period: 1983–2002.
  - Evaluation period: 2003–2022.
  - Lag times: 1–10 days for daily scale experiments; 1–6 months for monthly scale experiments.
  - Timescales: Daily and monthly.
  - Specific evaluation for monthly scale: April–July (spring-summer flow).
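The temporal setup above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: it assumes the periods are split by calendar year and that monthly values are simple means of daily values, neither of which is stated in this summary. All function and constant names are hypothetical.

```python
from datetime import date

# Periods and lags as listed above; calendar-year boundaries are an assumption.
TRAIN_YEARS = range(1983, 2003)      # 1983-2002
EVAL_YEARS = range(2003, 2023)       # 2003-2022
DAILY_LAGS = range(1, 11)            # 1-10 days
MONTHLY_LAGS = range(1, 7)           # 1-6 months
SPRING_SUMMER_MONTHS = (4, 5, 6, 7)  # April-July

def split_by_period(records):
    """Split a list of (date, value) pairs into training and evaluation sets."""
    train = [(d, v) for d, v in records if d.year in TRAIN_YEARS]
    evaluation = [(d, v) for d, v in records if d.year in EVAL_YEARS]
    return train, evaluation

def monthly_means(records):
    """Aggregate daily (date, value) pairs to {(year, month): mean value}.

    Averaging is an assumed aggregation; the study may aggregate differently.
    """
    sums = {}
    for d, v in records:
        total, count = sums.get((d.year, d.month), (0.0, 0))
        sums[(d.year, d.month)] = (total + v, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}

def april_july_only(monthly):
    """Keep only the April-July months used for the spring-summer evaluation."""
    return {k: v for k, v in monthly.items() if k[1] in SPRING_SUMMER_MONTHS}
```

Each daily experiment would then loop over `DAILY_LAGS` and each monthly experiment over `MONTHLY_LAGS`, with scores computed only on the evaluation split.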
Methodology and Data
- Models used: Long Short-Term Memory (LSTM) networks, including a standard LSTM as a baseline and a Data Integration LSTM (DI-LSTM) which incorporates lagged observations as additional inputs.
- Data sources:
  - Meteorological forcings: CW3E 1 km 1-hourly Meteorological Forcing on NWM Grid (daily precipitation in mm d⁻¹, daily maximum/minimum temperature in °C, daily mean surface downwelling shortwave radiation in W m⁻², daily mean 10 m wind speed in m s⁻¹).
  - Leaf Area Index (LAI): Monthly LAI climatology from PROBA-V.
  - Basin attributes: 10 static attributes (climate, topography, soil) including mean daily precipitation (mm d⁻¹), high precipitation duration (days), fraction of precipitation as snow, aridity, frozen days (days), basin area (km²), mean elevation (m above sea level), mean slope (°), geological permeability (m²), and soil sand content (%).
  - Streamflow (Q): Daily observations from the USGS National Water Information System (NWIS).
  - Snow Water Equivalent (SWE): Daily 4 km gridded data from the University of Arizona dataset.
  - Basin selection: USGS Geospatial Attributes of Gages for Evaluating Streamflow, version II (GAGES-II) database.
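The core idea of the DI-LSTM, feeding lagged observations to the network as additional inputs, can be illustrated with a small input-construction sketch. It is a sketch under stated assumptions rather than the authors' implementation: the function name `add_lagged_observation` is hypothetical, and marking unavailable lagged values as NaN is an assumption, since this summary does not say how missing observations are handled.

```python
import math

def add_lagged_observation(forcings, observations, lag):
    """Append the observation from `lag` steps earlier to each step's inputs.

    forcings: list of per-time-step feature lists (e.g. precipitation, temperature)
    observations: observed Q or SWE values, same length as forcings
    lag: positive number of time steps (days or months)
    """
    if lag < 1:
        raise ValueError("lag must be at least 1 time step")
    if len(forcings) != len(observations):
        raise ValueError("forcings and observations must have the same length")
    augmented = []
    for t, features in enumerate(forcings):
        # Assumption: steps with no observation `lag` steps back are marked NaN.
        lagged = observations[t - lag] if t >= lag else math.nan
        augmented.append(list(features) + [lagged])
    return augmented
```

With `lag=1` on daily data, the input at day t carries the streamflow observed on day t-1; the baseline LSTM corresponds to omitting the extra column entirely.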
Main Results
- The baseline LSTM model achieved a strong median Kling-Gupta Efficiency (KGE) of 0.80 at both daily and monthly scales across the Western U.S.
- Integrating lagged streamflow (Q) at the daily scale provided the most substantial improvements:
  - Median KGE for 646 basins increased from 0.80 to 0.96 with 1-day lagged Q.
  - Median KGE was still 0.89 with a 10-day lag.
  - Correlation coefficient (CC) improved to 0.98, relative variability (RV) to 0.96, and relative bias (RB) to 1.24% for 1-day lagged Q.
  - Improvements were most pronounced in mountainous regions (e.g., Rocky Mountains, Sierra Nevada) and smallest in hyper-arid, flash-flood-prone areas.
- Integrating lagged Q at the monthly scale also improved streamflow estimations, though to a lesser extent:
  - Median KGE increased from 0.80 to 0.86 with 1-month lagged Q.
  - Median KGE was still 0.83 with 6-month lagged Q.
  - For April–July flow, median KGE improved from 0.76 to 0.81 (1-month lag).
- Integrating lagged Snow Water Equivalent (SWE) at the daily scale did not improve KGE, but showed some improvements in CC and RV, while worsening RB.
- Integrating lagged SWE at the monthly scale led to better accuracy:
  - Median KGE increased from 0.80 to 0.82 with 1-month lagged SWE.
  - Benefits were more pronounced in snow-dominated basins during the snowmelt season (April–July).
- Overall, the benefits of data integration ranked as: daily DI(Q) > monthly DI(Q) > monthly DI(SWE) > daily DI(SWE).
- The LSTM's ability to capture long-term dependencies in the meteorological forcings implicitly accounts for snow dynamics, which explains why direct SWE integration provided less incremental value than direct streamflow observations.
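For readers unfamiliar with the scores reported above, the following sketch computes KGE and its three components for one basin and the median across basins. It assumes the standard Gupta et al. (2009) form of KGE, with RV taken as the ratio of simulated to observed standard deviation and RB as the fractional difference in means; the paper's exact definitions may differ, and the function names are hypothetical.

```python
import math
from statistics import mean, median, pstdev

def kge_components(sim, obs):
    """Return (CC, RV, RB) for paired simulated and observed series.

    Assumed definitions: CC is the Pearson correlation, RV is
    std(sim) / std(obs), and RB is (mean(sim) - mean(obs)) / mean(obs).
    """
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    cov = mean([(s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)])
    cc = cov / (sd_s * sd_o)
    rv = sd_s / sd_o
    rb = (mu_s - mu_o) / mu_o
    return cc, rv, rb

def kge(sim, obs):
    """Kling-Gupta Efficiency; 1.0 is a perfect score."""
    cc, rv, rb = kge_components(sim, obs)
    # rb equals mean(sim) / mean(obs) - 1, i.e. the bias term of KGE.
    return 1.0 - math.sqrt((cc - 1.0) ** 2 + (rv - 1.0) ** 2 + rb ** 2)

def median_kge(basin_pairs):
    """Median KGE over an iterable of (sim, obs) pairs, one pair per basin."""
    return median(kge(sim, obs) for sim, obs in basin_pairs)
```

Under these definitions the ideal values are CC = 1, RV = 1, and RB = 0, which is why an RV of 0.96 and an RB of 1.24% both indicate a close match to observations.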
Contributions
- Provided a comprehensive, large-sample evaluation of an LSTM-based data integration approach for streamflow simulation and forecasting in the Western U.S., incorporating both streamflow (Q) and snow water equivalent (SWE) observations across a wide range of lag times and temporal scales.
- Quantified the differential effectiveness of integrating Q versus SWE observations, highlighting that daily Q integration yields the most significant improvements, while monthly SWE integration is particularly beneficial for snow-dominated basins during snowmelt.
- Demonstrated the potential of this automated and flexible DI-LSTM framework for enhancing both short-term (1–10 days) and long-term (1–6 months) operational streamflow forecasts, offering a valuable alternative to traditional hydrological models.
Funding
- Cooperative Institute for Research to Operations in Hydrology (CIROH)
- NOAA Cooperative Institute Program (Award NA22NWS4320003)
Citation
@article{Yang2025Improving,
author = {Yang, Yuan and Pan, Ming and Feng, Dapeng and Xiao, Mu and Dixon, T. A. and Hartman, Robert and Shen, Chaopeng and Song, Yalan and Sengupta, Agniv and Delle Monache, Luca and Ralph, F. Martin},
title = {Improving streamflow simulation through machine learning-powered data integration and its potential for forecasting in the Western U.S.},
journal = {Hydrology and Earth System Sciences},
year = {2025},
doi = {10.5194/hess-29-5453-2025},
url = {https://doi.org/10.5194/hess-29-5453-2025}
}
Original Source: https://doi.org/10.5194/hess-29-5453-2025