Modi et al. (2025) Understanding the relationship between streamflow forecast skill and value across the western US

Identification

Journal: Hydrology and Earth System Sciences
Year: 2025
Date: 2025-10-22
Authors: Parthkumar Modi, Jared C. Carbone, Keith S. Jennings, Hannah Kamen, Joseph Kasprzyk, Bill Szafranski, Cameron Wobus, Ben Livneh
DOI: 10.5194/hess-29-5593-2025

Research Groups

Department of Civil, Environmental, and Architectural Engineering, University of Colorado Boulder, Boulder, CO, USA
Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado Boulder, Boulder, CO, USA
Economics and Business, Colorado School of Mines, Golden, CO, USA
Water Resources Institute, University of Vermont, Burlington, VT, USA
Lynker, Boulder, CO, USA
CK Blueshift LLC, University of Colorado Boulder, Boulder, CO, USA
Western Water Assessment, University of Colorado Boulder, Boulder, CO, USA

Short Summary

This study investigates the relationship between seasonal streamflow forecast skill and economic value in unmanaged, snow-dominated basins across the western US. Forecast skill is largely explained by errors in mean and variability, whereas forecast value is more strongly influenced by irregular error structures that affect categorical measures such as hit and false alarm rates. As a result, high skill does not always translate to high value.

Objective

To understand how errors in different forecasting systems affect forecast skill and decision-making value in unmanaged basins, and how these insights can guide improvements in forecast systems.
To explore how errors in probabilistic streamflow forecasts reduce their economic value, particularly during droughts when decision-making is critical.

Study Configuration

Spatial Scale: 76 unmanaged, snow-dominated drainage basins across the western US, primarily within the US Environmental Protection Agency's level III snow ecoregions (Cascades, Idaho Batholith, Intermountain West, Rockies, Sierra Nevada, Wasatch and Uinta Mountains). Basins have drainage areas smaller than 2500 km² and minimal anthropogenic influence.
Temporal Scale: Seasonal streamflow forecasts for the April–July (AMJJ) period, generated on 1 April. The analysis period for forecasts and observations is Water Year (WY) 2006–2022. Historical data for model training/calibration and validation span WY1983–WY2022, with a testing period of WY2001–2010 for historical model performance.

Methodology and Data

Models used:
- Synthetic forecasts: Generated by imposing systematic error structures (modifications to mean and standard deviation) on observed AMJJ streamflow volumes.
- WRF-Hydro (WRFH): A process-based hydrologic model, run on an hourly timescale and aggregated to AMJJ volumes, using an Ensemble Streamflow Prediction (ESP) framework. Calibrated with the dynamically dimensioned search (DDS) algorithm to optimize a weighted Nash–Sutcliffe efficiency (NSEwt) objective.
- Long Short-Term Memory network (LSTM): A deep-learning model, run on a daily timescale and aggregated to AMJJ volumes, using an ESP framework. Trained regionally using a basin-average Nash–Sutcliffe efficiency loss function.
- NRCS operational forecasts: Statistical forecasts based on a principal component regression model, using predictors like snow water equivalent (SWE), accumulated precipitation, and antecedent streamflow.
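The synthetic-forecast idea above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: the function name `synthetic_forecast`, the Gaussian ensemble, the member count, and the choice to scale the interannual standard deviation of the observations are all assumptions made here.

```python
import numpy as np


def synthetic_forecast(obs, mean_error=0.0, variability_change=0.0,
                       n_members=100, seed=0):
    """Return an (n_years, n_members) synthetic ensemble of AMJJ volumes.

    Assumption (not from the paper): each year's ensemble is Gaussian,
    centred on the observed volume scaled by (1 + mean_error), with a spread
    equal to the interannual standard deviation of `obs` scaled by
    (1 + variability_change). Errors are fractions, e.g. 0.2 for +20%.
    Members are not clipped, so very wide spreads can go negative.
    """
    obs = np.asarray(obs, dtype=float)
    rng = np.random.default_rng(seed)
    centre = obs * (1.0 + mean_error)
    spread = obs.std(ddof=1) * (1.0 + variability_change)
    noise = rng.standard_normal((obs.size, n_members))
    return centre[:, None] + spread * noise


# Hypothetical observed volumes; +20% error in mean, -100% change in variability
ensemble = synthetic_forecast([100.0, 200.0, 300.0],
                              mean_error=0.2, variability_change=-1.0,
                              n_members=5)
```

Under these assumptions a variability change of -100% collapses every ensemble to a single deterministic value, while 0% reproduces a climatological spread around the (possibly biased) centre.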
Data sources:
- Basin attributes: USGS Geospatial Attributes of Gages for Evaluating Streamflow (GAGES-II) dataset, Hydro-Climatic Data Network (HCDN), and Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset.
- Meteorological forcings: Analysis of Record for Calibration (AORC) for precipitation, average wind speed, 2 m average air temperature, incoming longwave and shortwave radiation, near-surface air pressure, and vapor pressure.
- Land surface parameters: Moderate Resolution Imaging Spectroradiometer (MODIS) for surface albedo, leaf area index, green fraction; United States Department of Agriculture National Agricultural Statistics Service (NASS) for land-use/land-cover; State Soil Geographic (STATSGO) for soil type; UCAR (WRF Preprocessing System data page) for maximum snow albedo and soil temperature.
- Snow data: University of Arizona (UA) gridded snow dataset for snow water equivalent (SWE). SNOTEL for SWE and accumulated precipitation (for NRCS model).
- Streamflow observations: USGS’s National Water Information System (NWIS) for daily streamflow estimates at basin outlets.
Forecast Skill Metric: Normalized Mean Quantile Loss (NMQloss), an average of quantile loss for quantiles {0.1, 0.5, 0.9}, normalized by the mean of observations.
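A minimal sketch of this skill metric, assuming the forecast quantiles are taken directly from ensemble members and that the pinball (quantile) loss is averaged over both years and quantiles before normalization. The function name `nmq_loss` and the array layout are illustrative choices, not taken from the paper.

```python
import numpy as np

QUANTILES = (0.1, 0.5, 0.9)


def nmq_loss(obs, ensemble, quantiles=QUANTILES):
    """Normalized mean quantile loss; 0 is a perfect forecast.

    obs:      observed volumes, shape (n_years,)
    ensemble: forecast members, shape (n_years, n_members)
    """
    obs = np.asarray(obs, dtype=float)
    ensemble = np.asarray(ensemble, dtype=float)
    losses = []
    for q in quantiles:
        forecast_q = np.quantile(ensemble, q, axis=1)
        diff = obs - forecast_q
        # Pinball loss: under- and over-prediction are weighted by q and 1 - q
        losses.append(np.maximum(q * diff, (q - 1.0) * diff))
    return float(np.mean(losses) / obs.mean())


# One year observed at 100, every member forecasting 80
example_loss = nmq_loss([100.0], [[80.0] * 5])
```

Because the three quantiles are weighted symmetrically, a deterministic forecast that is 20% too low scores the same as one that is 20% too high, which is consistent with the symmetric skill response reported for the synthetic forecasts.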
Forecast Value Metric: Area under the Potential Economic Value (PEVmax) curve (APEVmax), calculated using a cost–loss model for categorical decisions (drought event defined as AMJJ streamflow volume below the 25th percentile, P25, with additional thresholds at P35 and P15).
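The value metric can be sketched with the standard static cost–loss formulation, in which potential economic value depends on the hit rate, the false alarm rate, the climatological event frequency, and the user's cost–loss ratio. The grids of cost–loss ratios and probability thresholds below, the flooring of value at zero (a user can always fall back on climatology), and the function names `pev` and `apev_max` are assumptions for illustration; the paper's exact integration limits may differ.

```python
import numpy as np


def pev(hit_rate, false_alarm_rate, base_rate, alpha):
    """Potential economic value for cost-loss ratio alpha (0 < alpha < 1)."""
    climate = min(alpha, base_rate)  # expense of the best climatological action
    numer = (climate
             - false_alarm_rate * alpha * (1.0 - base_rate)
             + hit_rate * base_rate * (1.0 - alpha)
             - base_rate)
    denom = climate - base_rate * alpha
    return numer / denom


def apev_max(drought_prob, drought_obs, alphas=None, thresholds=None):
    """Area under the PEVmax curve; needs both event and non-event years."""
    drought_prob = np.asarray(drought_prob, dtype=float)
    drought_obs = np.asarray(drought_obs, dtype=bool)
    if alphas is None:
        alphas = np.linspace(0.01, 0.99, 99)      # assumed grid
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 21)    # assumed grid
    base_rate = drought_obs.mean()
    pev_max = np.zeros(len(alphas))               # 0 = no better than climatology
    for p in thresholds:
        warned = drought_prob >= p
        hit_rate = np.sum(warned & drought_obs) / np.sum(drought_obs)
        false_alarm_rate = np.sum(warned & ~drought_obs) / np.sum(~drought_obs)
        values = np.array([pev(hit_rate, false_alarm_rate, base_rate, a)
                           for a in alphas])
        pev_max = np.maximum(pev_max, values)
    # Trapezoidal area under the PEVmax curve
    return float(np.sum(0.5 * (pev_max[1:] + pev_max[:-1]) * np.diff(alphas)))


# Four hypothetical years, one drought, perfectly sharp drought probabilities
perfect_area = apev_max([1.0, 0.0, 0.0, 0.0], [True, False, False, False])
```

With this grid a perfect forecast attains an area equal to the width of the cost–loss interval, so the upper bound of the metric depends on the integration limits chosen.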

Main Results

The LSTM model consistently outperformed the WRF-Hydro model in historical streamflow simulations (WY2001–2010), with median daily Nash–Sutcliffe Efficiency (NSE) of 0.80 (LSTM) vs. 0.42 (WRFH), and normalized root mean square error (NRMSE) of 20% (LSTM) vs. 45% (WRFH).
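For reference, the two simulation metrics quoted above can be computed as in the sketch below. NSE follows its usual definition; the normalization of RMSE by the observed mean is an assumption here, since the summary does not state which normalization the paper uses. The function names are illustrative.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((sim - obs) ** 2)
                 / np.sum((obs - obs.mean()) ** 2))


def nrmse_percent(obs, sim):
    """RMSE as a percentage of the observed mean (assumed normalization)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    return float(100.0 * rmse / obs.mean())


# Hypothetical daily flows with a constant +1 bias in the simulation
obs_flow = [1.0, 2.0, 3.0, 4.0]
sim_flow = [2.0, 3.0, 4.0, 5.0]
bias_nse = nse(obs_flow, sim_flow)
bias_nrmse = nrmse_percent(obs_flow, sim_flow)
```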
For synthetic forecasts, optimal forecast skill (NMQloss close to 0) was observed with errors in the mean between -20% and 20% and changes in variability between -100% and -50%. Optimal forecast value (APEVmax close to 0.9) occurred with errors in the mean between -20% and 20% and changes in variability between -100% and 0%.
Synthetic forecast skill was symmetric around mean errors, but forecast value was asymmetric, primarily due to the interplay of categorical measures (hit and false alarm rates) and the P25 drought threshold.
True forecast systems consistently overpredicted the mean streamflow during drought years (median error: WRFH 55%, LSTM 30%, NRCS 14%) and showed lower standard deviation than interannual variability.
Forecast skill of true forecasts showed good correspondence with synthetic forecasts (median Relative Median Absolute Deviations (RMADs) from optimal: WRFH 30%, LSTM 23%, NRCS 20%), indicating skill is largely driven by error in mean and variability.
Forecast value of true forecasts showed poor correspondence with synthetic forecasts (median RMADs from optimal: WRFH 100%, LSTM 81%, NRCS 91%), demonstrating that error in mean and change in variability do not adequately explain variations in forecast value.
The relationship between forecast skill (NMQloss) and value (APEVmax) was strong for synthetic forecasts (correlation ≥ 0.65) but significantly weaker and more variable for true forecasts (correlation ≤ 0.38), indicating that high forecast skill does not always translate to high forecast value in real-world conditions.
The skill–value relationship changes monotonically with drought severity, with forecast value worsening as drought events become more severe (from P35 to P15).
Categorical error measures, specifically hit and false alarm rates, better explained the discrepancies in forecast value between synthetic and true forecast systems than the normalized mean quantile loss skill metric.
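The categorical measures referred to above can be illustrated with a small sketch that classifies each year as drought or non-drought against the P25 threshold and then tallies a contingency table. The 0.5 probability threshold for issuing a drought warning, the function name `drought_rates`, and the toy volumes are assumptions for illustration only.

```python
import numpy as np


def drought_rates(obs, ensemble, percentile=25.0, prob_threshold=0.5):
    """Return (hit rate, false alarm rate) for a below-percentile drought."""
    obs = np.asarray(obs, dtype=float)
    ensemble = np.asarray(ensemble, dtype=float)
    limit = np.percentile(obs, percentile)        # e.g. P25 of observations
    observed = obs < limit
    # Forecast drought probability = fraction of members below the threshold
    prob = (ensemble < limit).mean(axis=1)
    warned = prob >= prob_threshold
    hits = np.sum(warned & observed)
    misses = np.sum(~warned & observed)
    false_alarms = np.sum(warned & ~observed)
    correct_negatives = np.sum(~warned & ~observed)
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    return float(hit_rate), float(false_alarm_rate)


# Eight hypothetical years; single-member forecasts with opposite biases
volumes = np.arange(100.0, 900.0, 100.0)
wet_bias = drought_rates(volumes, (volumes * 1.5)[:, None])
dry_bias = drought_rates(volumes, (volumes * 0.5)[:, None])
```

In this toy case the two biases fail in different ways: overprediction misses drought years and lowers the hit rate, while underprediction raises the false alarm rate. Because these errors carry different weights in a cost–loss setting, biases of equal magnitude but opposite sign need not produce equal value.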

Contributions

Provides a systematic, quantitative assessment of the relationship between seasonal streamflow forecast skill and economic value in unmanaged, snow-dominated basins across the western US.
Introduces and utilizes synthetic forecasts with controlled error structures to diagnose and interpret the impact of irregular error structures in real-world (true) forecasts on both skill and value.
Demonstrates that while traditional skill metrics (e.g., quantile loss) are sensitive to errors in mean and variability, economic forecast value is disproportionately influenced by irregular error structures that impact categorical performance measures (hit and false alarm rates).
Highlights a critical disconnect: high forecast skill does not consistently translate to high economic value in true forecasting systems, particularly during droughts, due to complex error characteristics.
Reveals that the relationship between forecast skill and value shifts monotonically with drought severity, with models struggling more to provide value for increasingly severe drought events.
Advocates for a multi-faceted forecast evaluation framework that integrates both skill and economic value, moving beyond sole reliance on accuracy metrics to better inform water management decisions.

Funding

National Oceanic and Atmospheric Administration (Identifying Alternatives to Snow-based Streamflow Predictions to Advance Future Drought Predictability, grant no. NA20OAR4310420)
National Science Foundation (Water-Mediated Coupling of Natural-Human Systems: Drought and Water Allocation Across Spatial Scales, grant no. 2009922)
University of Colorado Boulder Libraries’ Open Access Fund

Citation

@article{Modi2025Understanding,
  author = {Modi, Parthkumar and Carbone, Jared C. and Jennings, Keith S. and Kamen, Hannah and Kasprzyk, Joseph and Szafranski, Bill and Wobus, Cameron and Livneh, Ben},
  title = {Understanding the relationship between streamflow forecast skill and value across the western US},
  journal = {Hydrology and Earth System Sciences},
  year = {2025},
  doi = {10.5194/hess-29-5593-2025},
  url = {https://doi.org/10.5194/hess-29-5593-2025}
}

Original Source: https://doi.org/10.5194/hess-29-5593-2025