Li et al. (2025) Ensembling differentiable process-based and data-driven models with diverse meteorological forcing datasets to advance streamflow simulation

Identification

Journal: Hydrology and Earth System Sciences
Year: 2025
Date: 2025-12-01
Authors: Peijun Li, Yalan Song, Ming Pan, Kathryn Lawson, Chaopeng Shen
DOI: 10.5194/hess-29-6829-2025

Research Groups

Civil and Environmental Engineering, The Pennsylvania State University, University Park, PA, USA
Center for Western Weather and Water Extremes, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA

Short Summary

This study systematically evaluates and utilizes ensembles of a data-driven Long Short-Term Memory (LSTM) network and a physics-informed differentiable HBV ($\delta$HBV) model with diverse meteorological forcing datasets to advance streamflow simulation. The research demonstrates that cross-model-type ensembles consistently outperform single-model approaches and set new accuracy benchmarks, particularly enhancing spatial generalization due to complementary error characteristics and the structural constraints of $\delta$HBV.

Objective

Will a cross-model-type ensemble of LSTM and $\delta$HBV improve deterministic streamflow prediction more than a within-class ensemble?
Is it better to use multiple forcings in one model or to ensemble multiple models, each with a different forcing input?
Do process-based equations bring unique value to an ensemble, especially in terms of spatial generalizability?

Study Configuration

Spatial Scale: 531 river basins across the conterminous United States, derived from the CAMELS dataset. Basin sizes range from 1 to 25,800 square kilometers (median: 335 square kilometers).
Temporal Scale: Daily resolution for both meteorological forcing data and streamflow observations. Models are evaluated over multi-year periods in three settings: a temporal test (training and testing on different periods), a prediction in ungauged basins (PUB) test, and a prediction in ungauged regions (PUR) test.
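The two spatial tests can be read as grouped cross-validation over basins: PUB holds out randomly chosen basins, while PUR holds out whole regions. The sketch below is illustrative only. The fold count, the random seed, the region labels, and the function names `pub_folds` and `pur_folds` are assumptions made here, not details taken from the paper.

```python
import numpy as np

def pub_folds(basin_ids, n_folds=10, seed=0):
    """Yield (train_ids, test_ids) with randomly held-out basins (PUB-style).

    n_folds and seed are assumed values for illustration.
    """
    ids = np.asarray(basin_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(ids)
    for test_ids in np.array_split(shuffled, n_folds):
        # Train on every basin that is not in the held-out fold.
        train_ids = np.setdiff1d(ids, test_ids)
        yield train_ids, test_ids

def pur_folds(basin_ids, region_labels):
    """Yield (train_ids, test_ids) with one whole region held out (PUR-style).

    region_labels is a hypothetical per-basin region assignment.
    """
    ids = np.asarray(basin_ids)
    regions = np.asarray(region_labels)
    for region in np.unique(regions):
        held_out = regions == region
        yield ids[~held_out], ids[held_out]
```

Holding out contiguous regions rather than scattered basins removes nearby training neighbours from the test set, which is why PUR is treated as spatial extrapolation and PUB as spatial interpolation.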

Methodology and Data

Models used:
- Long Short-Term Memory (LSTM) network (data-driven)
- Differentiable HBV ($\delta$HBV) model (physics-informed machine learning, specifically $\delta$HBV1.1p)
- Ensemble averaging of individual model outputs
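As a minimal sketch of the averaging step, the snippet below takes the arithmetic mean of member simulations at each time step. Equal weighting is an assumption made here for illustration, and the function name `ensemble_mean` and the example arrays are hypothetical.

```python
import numpy as np

def ensemble_mean(simulations):
    """Element-wise mean of member simulations (equal weights assumed).

    simulations: sequence of arrays with identical shape, for example
    (n_days,) for one basin or (n_basins, n_days) for many.
    """
    stacked = np.stack([np.asarray(s, dtype=float) for s in simulations], axis=0)
    return stacked.mean(axis=0)

# Hypothetical daily streamflow from two members over three days.
member_lstm = np.array([1.0, 2.0, 3.0])
member_hbv = np.array([3.0, 4.0, 5.0])
combined = ensemble_mean([member_lstm, member_hbv])
```

Members could be the same model type driven by different forcings, different model types driven by the same forcing, or both at once; the averaging step is identical in each case.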
Data sources:
- CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) dataset: Streamflow observations, static basin attributes (e.g., basin area, topography, climate, soil texture, land cover, geology).
- Meteorological forcing datasets: Daymet, North American Land Data Assimilation System (NLDAS), and Maurer (daily precipitation, temperature, vapor pressure, surface radiation).
- Potential evapotranspiration calculated using the Hargreaves method.
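For orientation, a temperature-based Hargreaves estimate can be sketched as follows. This summary does not state which variant the authors used, so the code assumes the common Hargreaves-Samani form, $PET = 0.0023 \, R_a \, (T_{mean} + 17.8) \sqrt{T_{max} - T_{min}}$, with extraterrestrial radiation $R_a$ computed from latitude and day of year using the FAO-56 equations. The function names are hypothetical.

```python
import numpy as np

def extraterrestrial_radiation(lat_deg, doy):
    """Daily extraterrestrial radiation Ra in MJ m^-2 day^-1 (FAO-56 equations)."""
    phi = np.deg2rad(lat_deg)
    dr = 1.0 + 0.033 * np.cos(2.0 * np.pi * doy / 365.0)       # inverse relative Earth-Sun distance
    delta = 0.409 * np.sin(2.0 * np.pi * doy / 365.0 - 1.39)   # solar declination (rad)
    # Clip keeps arccos defined at high latitudes (polar day or night).
    ws = np.arccos(np.clip(-np.tan(phi) * np.tan(delta), -1.0, 1.0))
    gsc = 0.0820  # solar constant in MJ m^-2 min^-1
    return (24.0 * 60.0 / np.pi) * gsc * dr * (
        ws * np.sin(phi) * np.sin(delta) + np.cos(phi) * np.cos(delta) * np.sin(ws)
    )

def hargreaves_pet(tmin, tmax, lat_deg, doy):
    """Potential evapotranspiration in mm/day; temperatures in degrees Celsius."""
    tmin = np.asarray(tmin, dtype=float)
    tmax = np.asarray(tmax, dtype=float)
    tmean = 0.5 * (tmin + tmax)
    trange = np.maximum(tmax - tmin, 0.0)  # guard against tmax < tmin in noisy data
    ra_mm = 0.408 * extraterrestrial_radiation(lat_deg, doy)  # MJ m^-2 day^-1 to mm/day
    return np.maximum(0.0023 * ra_mm * (tmean + 17.8) * np.sqrt(trange), 0.0)
```

Because the method needs only daily minimum and maximum temperature plus latitude, it can be applied consistently to each forcing dataset's own temperature fields.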

Main Results

Cross-model-type ensembles (LSTM + $\delta$HBV) consistently surpassed single-model approaches and within-class ensembles across all temporal and spatial generalization tests.
New benchmark records were established on the CAMELS dataset, achieving median Nash-Sutcliffe model efficiency coefficients (NSE) of approximately 0.83 for the temporal test, 0.79 for the ungauged basin test (PUB), and 0.70 for the ungauged region test (PUR).
Ensembling models trained on individual meteorological forcing datasets (e.g., LSTM$^{123}$) yielded higher performance (NSE of 0.8082) than feeding multiple forcing datasets simultaneously into a single LSTM model (LSTM$_{multi}$, NSE of 0.7974).
The $\delta$HBV model significantly improved ensemble performance for spatial interpolation (PUB) and, more prominently, for spatial extrapolation (PUR), demonstrating the value of its structural constraints.
LSTM and $\delta$HBV exhibited distinct error characteristics that complemented each other, leading to improved high-flow and low-flow metrics in ensembles.
The most substantial performance improvements from ensembling were observed in the Great Plains and midwestern US, regions historically challenging for hydrological models.
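The metric behind these results is the Nash-Sutcliffe efficiency, $\mathrm{NSE} = 1 - \sum_t (Q_{sim,t} - Q_{obs,t})^2 / \sum_t (Q_{obs,t} - \bar{Q}_{obs})^2$, where 1 is a perfect fit and 0 matches the skill of predicting the observed mean. The sketch below computes it and uses a toy example to show how averaging two members with opposite-signed errors can raise the score. The numbers are synthetic and chosen only for illustration; they are not results from the paper, and `member_a` and `member_b` are hypothetical stand-ins for any two ensemble members.

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency, ignoring time steps where either series is NaN."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = ~np.isnan(sim) & ~np.isnan(obs)
    sim, obs = sim[valid], obs[valid]
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Synthetic observations and two members with opposite-signed biases.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
member_a = obs + 0.5  # overestimates every day
member_b = obs - 0.3  # underestimates every day
ensemble = 0.5 * (member_a + member_b)  # equal-weight average (assumed)

scores = {
    "member_a": nse(member_a, obs),
    "member_b": nse(member_b, obs),
    "ensemble": nse(ensemble, obs),
}
```

In this toy case the ensemble scores higher than either member because the errors partly cancel. Averaging members whose errors share the same sign and size would bring no such gain, which is why complementary error characteristics matter.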

Contributions

Provides a systematic evaluation of ensembles of structurally distinct hydrological models (data-driven and physics-informed) under comprehensive spatiotemporal generalization tests.
Establishes new state-of-the-art performance benchmarks for streamflow simulation on the CAMELS dataset using cross-model-type ensembles.
Challenges the conventional approach of fusing multiple forcing datasets into a single data-driven model, demonstrating that ensembling models trained on separate forcings is more effective.
Highlights the critical role of physics-informed models like $\delta$HBV in providing valuable structural constraints that significantly enhance spatial generalization capabilities of ensembles, particularly in ungauged regions.
Advances the understanding of how to effectively leverage diverse model types and multi-source datasets to improve streamflow simulations across various hydrological scenarios.

Funding

Office of Biological and Environmental Research of the U.S. Department of Energy (contract no. DE-SC0016605)
California Department of Water Resources Atmospheric River Program Phase III (Grant 4600014294)
Cooperative Institute for Research to Operations in Hydrology (CIROH) through the National Oceanic and Atmospheric Administration (NOAA) Cooperative Agreement (Grant no. NA22NWS4320003)

Citation

@article{Li2025Ensembling,
  author = {Li, Peijun and Song, Yalan and Pan, Ming and Lawson, Kathryn and Shen, Chaopeng},
  title = {Ensembling differentiable process-based and data-driven models with diverse meteorological forcing datasets to advance streamflow simulation},
  journal = {Hydrology and Earth System Sciences},
  year = {2025},
  doi = {10.5194/hess-29-6829-2025},
  url = {https://doi.org/10.5194/hess-29-6829-2025}
}

Original Source: https://doi.org/10.5194/hess-29-6829-2025