Baste et al. (2025) Unveiling the limits of deep learning models in hydrological extrapolation tasks

Identification

Journal: Hydrology and Earth System Sciences
Year: 2025
Date: 2025-11-03
Authors: Sanika Baste, Daniel Klotz, Eduardo Acuña Espinoza, András Bárdossy, Ralf Loritz
DOI: 10.5194/hess-29-5871-2025

Research Groups

Institute of Water and Environment, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Interdisciplinary Transformation University Austria, Linz, Austria
Google Research, Vienna, Austria
Institut für Wasser- und Umweltsystemmodellierung, Universität Stuttgart, Stuttgart, Germany

Short Summary

This study investigates how well stand-alone Long Short-Term Memory (LSTM) networks extrapolate in hydrological rainfall-runoff modeling when forced with extreme, synthetic precipitation events. The LSTMs cannot predict discharge beyond a calculated theoretical limit and produce physically unrealistic concave runoff responses, whereas a hybrid model behaves more robustly.

Objective

Can LSTMs extrapolate to discharge values beyond the training distribution when forced with statistically derived design precipitation events?
Is the saturation of LSTM memory states the primary reason that limits their ability to extrapolate to extreme and unprecedented hydrological conditions?
How do the inherent assumptions and structural characteristics (inductive biases) of LSTMs influence their ability to simulate realistic hydrological responses under conditions that exceed observed training ranges?

Study Configuration

Spatial Scale: 196 river catchments in Switzerland (from the CAMELS-CH dataset), with a subset of 25 catchments selected for design precipitation experiments. Some models were also trained on a combined dataset including 531 catchments from CAMELS-US.
Temporal Scale: Daily hydrometeorological time series. Data span for CAMELS-CH: 1 January 1981 to 31 December 2020. Training period: 1 October 1995 to 30 September 2005. Test period: 1 October 2010 to 30 September 2015. Design precipitation events were applied over 1, 3, and 5 days with Annual Return Intervals (ARI) from 50 to 300 years.

Methodology and Data

Models used:
- Stand-alone Long Short-Term Memory (LSTM) networks (ensemble of five LSTMs with 1 layer, 64 nodes).
- Hybrid model combining a modified Hydrologiska Byråns Vattenbalansavdelning (HBV) conceptual model with an LSTM for differentiable parameter learning.
- Conceptual HBV model (locally calibrated for comparison).
Data sources:
- CAMELS-CH dataset (Höge et al., 2023): Daily hydrometeorological time series (precipitation, min/max temperature, relative sunshine duration, snow water equivalent) and 22 static catchment attributes (e.g., area, elevation, slope, soil properties, land cover, reservoir capacity) for Swiss catchments.
- CAMELS-US dataset (Addor et al., 2017; Newman et al., 2015): Daily meteorological forcing (Daymet) and streamflow data for 531 catchments in the United States, along with 12 static catchment attributes (used for combined training).
- MeteoSwiss (2022): Statistically derived design precipitation values (1 to 5 day durations, ARI 1 to 300 years) from over 300 meteorological observation stations in Switzerland.

Main Results

The stand-alone LSTM ensemble exhibited a theoretical prediction limit of 73 mm d⁻¹ and a practical "design limit" of 60 mm d⁻¹ for discharge, both significantly below the maximum observed training discharge of 183 mm d⁻¹.
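A hard upper bound of this kind follows from the LSTM structure itself: the hidden state is h = o * tanh(c), so every component lies in [-1, 1]. If discharge is read out through a linear layer (an assumption here, since this summary does not describe the paper's output head), the prediction cannot exceed b + sum(|w_i|) in normalized units. The sketch below computes that bound for hypothetical weights and hypothetical normalization statistics; none of the numbers are taken from the paper.

```python
import numpy as np

def linear_head_upper_bound(weights, bias):
    """Largest possible value of w.h + b when every h_i lies in [-1, 1]."""
    weights = np.asarray(weights, dtype=float)
    return float(bias + np.abs(weights).sum())

def denormalize(value, mean, std):
    """Map a standardized prediction back to physical units (e.g. mm/d)."""
    return value * std + mean

rng = np.random.default_rng(0)
w = rng.normal(scale=0.5, size=64)  # hypothetical head weights, 64 hidden states
b = 0.1                             # hypothetical head bias

bound_norm = linear_head_upper_bound(w, b)
bound_mm = denormalize(bound_norm, mean=2.0, std=3.0)  # hypothetical statistics

# Empirical check: hidden states drawn from tanh(...) never exceed the bound.
h = np.tanh(rng.normal(scale=5.0, size=(10_000, 64)))
preds = h @ w + b
bound_holds = bool(preds.max() <= bound_norm)
```

Under this assumption the bound depends only on the trained head weights, not on the forcing, which is why no precipitation input, however extreme, can push the prediction past it.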
Under extreme synthetic precipitation, the LSTM showed a hydrologically unrealistic concave runoff response, where runoff coefficients decreased with increasing precipitation intensity.
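To make the concave response concrete, the sketch below takes the runoff coefficient as event runoff divided by event precipitation (a common definition; the exact one used in the paper is not stated in this summary). The precipitation and runoff values are invented for illustration only.

```python
import numpy as np

def runoff_coefficients(precip_mm, runoff_mm):
    """Runoff coefficient per event: runoff depth divided by precipitation depth."""
    precip = np.asarray(precip_mm, dtype=float)
    runoff = np.asarray(runoff_mm, dtype=float)
    return runoff / precip

def coefficients_decrease(rc, tol=1e-9):
    """True if the runoff coefficient falls strictly from each event to the next."""
    return bool(np.all(np.diff(rc) < -tol))

precip = np.array([50.0, 100.0, 150.0, 200.0])   # invented event totals, mm
saturating = np.array([30.0, 50.0, 60.0, 64.0])  # invented concave response, mm
linear = 0.6 * precip                            # invented near-linear response, mm

rc_saturating = runoff_coefficients(precip, saturating)
rc_linear = runoff_coefficients(precip, linear)

concave_case = coefficients_decrease(rc_saturating)  # coefficient shrinks with intensity
linear_case = coefficients_decrease(rc_linear)       # coefficient stays constant
```

A falling coefficient means a shrinking share of rainfall becomes runoff as events get larger, which runs against the expectation that wet, saturated catchments convert more of the rainfall to discharge.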
The limited extrapolation was attributed primarily to the LSTM's gating mechanisms, which prevent new extreme precipitation information from entering the cell states, rather than to full saturation of the cell states, especially for 1-day events.
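A toy calculation illustrates how gating can cap the effect of a single extreme input. In the standard LSTM update c_t = f * c_{t-1} + i * g, the gates f and i lie in (0, 1) and the candidate g lies in (-1, 1), so one time step can add at most 1 to any cell state regardless of input magnitude. The scalar cell below uses hypothetical weights and omits the recurrent hidden-state terms for brevity; it is not the trained model from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cell_update(c_prev, x, w_f, w_i, w_g, b_f=0.0, b_i=0.0, b_g=0.0):
    """One scalar LSTM cell-state update (recurrent terms omitted for brevity)."""
    f = sigmoid(w_f * x + b_f)   # forget gate, in (0, 1)
    i = sigmoid(w_i * x + b_i)   # input gate, in (0, 1)
    g = np.tanh(w_g * x + b_g)   # candidate value, in (-1, 1)
    return f * c_prev + i * g

# Normalized inputs growing by a factor of five each time (invented values).
x = np.array([1.0, 5.0, 25.0, 125.0])
c_new = cell_update(0.0, x, w_f=0.5, w_i=0.5, w_g=0.5)  # hypothetical weights

# The increment flattens out: the last five-fold jump in input adds almost nothing.
last_gain = c_new[-1] - c_new[-2]
```

Starting from an empty cell state, the update approaches 1 and then stops responding, so the difference between a large and a very large 1-day event is barely registered.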
Increasing the number of LSTM hidden states (e.g., to 256) and/or training on larger, more diverse datasets (CAMELS-CH + CAMELS-US) raised the theoretical prediction limit (up to 194 mm d⁻¹) and the design limit (up to 110 mm d⁻¹), but the design limit still remained below the maximum observed training discharges.
The hybrid model, by incorporating conceptual hydrological structures, did not exhibit a theoretical prediction limit and demonstrated a more hydrologically plausible, almost linear, increase in discharge with increasing extreme precipitation.
In the in-sample, out-of-time test, the LSTM ensemble achieved a higher average median Nash–Sutcliffe Efficiency (NSE) of 0.84 (compared to 0.79 for the hybrid model and 0.64 for HBV), but it showed greater error in peak predictions (higher peak mean absolute percentage error).
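For reference, the two metrics can be sketched as follows. NSE uses its standard definition, one minus the ratio of squared simulation error to the variance of the observations around their mean. The peak error is computed here over observations above a quantile threshold, which is an assumption made for illustration; the paper's peak-selection procedure is not described in this summary.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def peak_mape(obs, sim, quantile=0.98):
    """Mean absolute percentage error over high-flow days.

    Peaks are taken as observations at or above the given quantile
    (a hypothetical selection rule, not necessarily the paper's).
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    mask = obs >= np.quantile(obs, quantile)
    return 100.0 * np.mean(np.abs(sim[mask] - obs[mask]) / obs[mask])

obs = np.array([1.0, 2.0, 3.0, 4.0, 10.0])  # invented discharge series, mm/d
sim = np.array([1.0, 2.0, 3.0, 4.0, 8.0])   # simulation that underestimates the peak

score = nse(obs, sim)
peak_error = peak_mape(obs, sim, quantile=0.9)
```

The example shows how a model can score well on NSE while still missing the largest flow by a wide margin, which is the pattern reported for the LSTM ensemble.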

Contributions

Provides a systematic investigation into the extrapolation capabilities and limitations of stand-alone LSTMs in hydrological modeling under extreme, synthetic precipitation events, an approach that differs from previous stress tests.
Quantifies the existence of a "theoretical prediction limit" and a practical "design limit" for LSTM discharge predictions, demonstrating that these limits can be significantly lower than observed training extremes.
Identifies and explains the physically counterintuitive concave runoff response of LSTMs to increasing extreme precipitation, highlighting the role of gating mechanisms in filtering or discarding extreme input signals.
Offers a direct comparison with a hybrid hydrological model, showcasing the benefits of incorporating structural priors for achieving more plausible extrapolation behavior under unprecedented conditions.
Suggests avenues for improvement, including architectural adjustments (e.g., more hidden states), larger and more diverse training datasets, and refined training strategies (e.g., weighted loss functions) to mitigate observed limitations.

Funding

Deutsche Forschungsgemeinschaft (grant no. 521210228)
Karlsruhe Institute of Technology (KIT) (for article processing charges)

Citation

@article{Baste2025Unveiling,
  author = {Baste, Sanika and Klotz, Daniel and Acuña Espinoza, Eduardo and Bárdossy, András and Loritz, Ralf},
  title = {Unveiling the limits of deep learning models in hydrological extrapolation tasks},
  journal = {Hydrology and Earth System Sciences},
  year = {2025},
  doi = {10.5194/hess-29-5871-2025},
  url = {https://doi.org/10.5194/hess-29-5871-2025}
}

Original Source: https://doi.org/10.5194/hess-29-5871-2025