Baste et al. (2025) Unveiling the limits of deep learning models in hydrological extrapolation tasks
Identification
- Journal: Hydrology and Earth System Sciences
- Year: 2025
- Date: 2025-11-03
- Authors: Sanika Baste, Daniel Klotz, Eduardo Acuña Espinoza, András Bárdossy, Ralf Loritz
- DOI: 10.5194/hess-29-5871-2025
Research Groups
- Institute of Water and Environment, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Interdisciplinary Transformation University Austria, Linz, Austria
- Google Research, Vienna, Austria
- Institut für Wasser- und Umweltsystemmodellierung, Universität Stuttgart, Stuttgart, Germany
Short Summary
This study investigates the extrapolation capabilities of stand-alone Long Short-Term Memory (LSTM) networks for rainfall-runoff modeling under extreme, synthetic precipitation events. The LSTMs could not predict discharge beyond a calculable theoretical limit and produced physically unrealistic concave runoff responses, whereas a hybrid model behaved more robustly.
Objective
- Can LSTMs extrapolate to discharge values beyond the training distribution when forced with statistically derived design precipitation events?
- Is saturation of the LSTM memory (cell) states the primary factor limiting their ability to extrapolate to extreme and unprecedented hydrological conditions?
- How do the inherent assumptions and structural characteristics (inductive biases) of LSTMs influence their ability to simulate realistic hydrological responses under conditions that exceed observed training ranges?
Study Configuration
- Spatial Scale: 196 river catchments in Switzerland (from the CAMELS-CH dataset), with a subset of 25 catchments selected for design precipitation experiments. Some models were also trained on a combined dataset including 531 catchments from CAMELS-US.
- Temporal Scale: Daily hydrometeorological time series. Data span for CAMELS-CH: 1 January 1981 to 31 December 2020. Training period: 1 October 1995 to 30 September 2005. Test period: 1 October 2010 to 30 September 2015. Design precipitation events were applied over 1, 3, and 5 days with Annual Return Intervals (ARI) from 50 to 300 years.
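To put the design-event return intervals in perspective, the snippet below converts an ARI into an annual exceedance probability and into the chance of at least one exceedance within a window as long as the 10-year training period. It assumes the common textbook convention that a T-year event has an annual exceedance probability of 1/T and that years are independent; neither assumption is taken from the paper.

```python
def annual_exceedance_probability(ari_years):
    """Probability that the T-year event is exceeded in a given year.

    Assumes the common convention AEP = 1 / T for a return interval T.
    """
    if ari_years <= 0:
        raise ValueError("return interval must be positive")
    return 1.0 / ari_years


def prob_at_least_one(ari_years, n_years):
    """Probability of at least one exceedance in n independent years."""
    p = annual_exceedance_probability(ari_years)
    return 1.0 - (1.0 - p) ** n_years


if __name__ == "__main__":
    # 1 Oct 1995 to 30 Sep 2005 spans 10 hydrological years.
    for ari in (50, 100, 300):
        print(f"ARI {ari:>3} yr: AEP = {annual_exceedance_probability(ari):.4f}, "
              f"P(at least one in 10 yr) = {prob_at_least_one(ari, 10):.3f}")
```

Under these assumptions, even a 50-year event has well under a one-in-five chance of appearing in any single 10-year window, which is why design events are a natural way to probe behavior outside the training distribution.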
Methodology and Data
- Models used:
  - Stand-alone Long Short-Term Memory (LSTM) networks (an ensemble of five LSTMs, each with a single layer of 64 hidden units).
  - Hybrid model combining a modified Hydrologiska Byråns Vattenbalansavdelning (HBV) conceptual model with an LSTM that learns the HBV parameters (differentiable parameter learning).
  - Conceptual HBV model (calibrated locally, for comparison).
- Data sources:
  - CAMELS-CH dataset (Höge et al., 2023): Daily hydrometeorological time series (precipitation, min/max temperature, relative sunshine duration, snow water equivalent) and 22 static catchment attributes (e.g., area, elevation, slope, soil properties, land cover, reservoir capacity) for Swiss catchments.
  - CAMELS-US dataset (Addor et al., 2017; Newman et al., 2015): Daily meteorological forcing (Daymet) and streamflow data for 531 catchments in the United States, along with 12 static catchment attributes (used for combined training).
  - MeteoSwiss (2022): Statistically derived design precipitation values (1 to 5 day durations, ARI 1 to 300 years) from over 300 meteorological observation stations in Switzerland.
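The hybrid model couples a data-driven component with a mass-balance structure. The sketch below is not the modified HBV model used in the paper; it is a hypothetical single linear reservoir whose rate constant `k` stands in for a parameter that, in a hybrid setup, an LSTM would predict. It illustrates why such a structure imposes no fixed ceiling on discharge: outflow is proportional to storage, so a larger precipitation pulse yields a proportionally larger peak.

```python
from dataclasses import dataclass


@dataclass
class BucketParams:
    """Hypothetical parameter set; in a hybrid model an LSTM would supply it."""
    k: float  # fraction of storage released per day, 0 < k <= 1


def simulate_linear_reservoir(precip_mm, params, storage0=0.0):
    """Toy single-bucket model: add P_t to storage, release Q_t = k * S_t.

    A stand-in for illustration only, not the paper's modified HBV model.
    """
    if not 0.0 < params.k <= 1.0:
        raise ValueError("k must lie in (0, 1]")
    storage = storage0
    discharge = []
    for p in precip_mm:
        storage += p               # precipitation enters the store
        q = params.k * storage     # outflow is proportional to storage
        storage -= q               # mass balance: outflow leaves the store
        discharge.append(q)
    return discharge


if __name__ == "__main__":
    params = BucketParams(k=0.3)   # illustrative value, not calibrated
    for pulse in (50.0, 100.0, 200.0):
        q = simulate_linear_reservoir([pulse, 0.0, 0.0, 0.0], params)
        print(f"1-day pulse {pulse:5.1f} mm -> peak {max(q):5.1f} mm/d")
```

Because the response is linear in the input, doubling the pulse doubles every discharge value, in contrast to an LSTM read-out that is capped by its bounded hidden state.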
Main Results
- The stand-alone LSTM ensemble had a theoretical prediction limit of 73 mm d⁻¹ and a practical "design limit" of 60 mm d⁻¹ for discharge, both far below the maximum discharge observed in training (183 mm d⁻¹).
- Under extreme synthetic precipitation, the LSTM showed a hydrologically unrealistic concave runoff response, where runoff coefficients decreased with increasing precipitation intensity.
- This limitation was attributed primarily to the LSTM's gating mechanisms, which prevent new extreme precipitation information from being incorporated, rather than to full saturation of the cell states; this was especially evident for 1-day events.
- Increasing the number of LSTM hidden states (e.g., to 256) and/or training on a larger, more diverse dataset (CAMELS-CH + CAMELS-US) raised the theoretical prediction limit (up to 194 mm d⁻¹) and the design limit (up to 110 mm d⁻¹), but both remained below the maximum discharges observed in training.
- The hybrid model, by incorporating conceptual hydrological structures, did not exhibit a theoretical prediction limit and demonstrated a more hydrologically plausible, almost linear, increase in discharge with increasing extreme precipitation.
- Although the LSTM ensemble achieved a higher average median Nash–Sutcliffe Efficiency (NSE) of 0.84 in in-sample, out-of-time testing (compared with 0.79 for the hybrid model and 0.64 for HBV), it produced larger errors at discharge peaks (a higher peak mean absolute percentage error).
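The theoretical prediction limit and the gating argument can be made concrete with the standard LSTM equations. The derivation below assumes a standard single-layer LSTM whose prediction is a linear read-out of the hidden state, with targets standardized by a mean $\mu_Q$ and a standard deviation $\sigma_Q > 0$; the exact read-out and normalization used in the paper may differ.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t) \\
\hat{y}_t &= \mathbf{w}^{\top} h_t + b
\end{aligned}

|h_{t,j}| < 1 \implies \hat{y}_t < \sum_j |w_j| + b
\implies \hat{Q}_t = \mu_Q + \sigma_Q \hat{y}_t < \mu_Q + \sigma_Q \Big( \sum_j |w_j| + b \Big)

|i_{t,j} \, g_{t,j}| < 1 \ \text{and} \ 0 < f_{t,j} < 1 \implies |c_{t,j}| < |c_{t-1,j}| + 1
```

Under these assumptions, the first implication gives a fixed ceiling on the predicted discharge, whatever the input. The second shows that a single time step can move each cell state by less than one unit however large the precipitation input $x_t$ is, and that an input gate $i_t$ close to zero discards the new input almost entirely, which is consistent with the gating explanation given for 1-day events.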
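Numerically, the ceiling is the sum of the absolute read-out weights plus the bias, mapped back to physical units. The sketch below assumes a linear read-out and z-score target normalization; the `members` layout for an ensemble is a hypothetical convenience, and the ensemble-mean prediction can be no larger than the mean of the member limits.

```python
from itertools import product


def theoretical_limit(head_weights, head_bias, target_mean=0.0, target_std=1.0):
    """Supremum of w.h + b over hidden states with |h_j| < 1, in target units.

    Assumes prediction = target_mean + target_std * (w.h + b).
    """
    if target_std <= 0:
        raise ValueError("target_std must be positive")
    sup_normalized = sum(abs(w) for w in head_weights) + head_bias
    return target_mean + target_std * sup_normalized


def corner_maximum(head_weights, head_bias):
    """Brute-force check: evaluate the read-out at every corner h in {-1, 1}^n."""
    return max(
        sum(w * h for w, h in zip(head_weights, corner)) + head_bias
        for corner in product((-1.0, 1.0), repeat=len(head_weights))
    )


def ensemble_limit(members, target_mean=0.0, target_std=1.0):
    """Upper bound on an ensemble-mean prediction: the mean of member limits.

    `members` is a list of (head_weights, head_bias) pairs (hypothetical layout).
    """
    limits = [theoretical_limit(w, b, target_mean, target_std) for w, b in members]
    return sum(limits) / len(limits)


if __name__ == "__main__":
    weights, bias = [0.5, -1.5, 2.0], 0.25  # toy read-out, not trained values
    print(theoretical_limit(weights, bias))  # 4.25
    print(corner_maximum(weights, bias))     # 4.25
```

The bound is a supremum that the network never quite reaches, because $|h_j|$ stays strictly below 1; this is one reason why a practical limit such as the summary's "design limit" can sit below the theoretical one.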
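A concave response shows up as event runoff coefficients that fall as event precipitation grows. The helper below computes coefficients from event totals and checks for that decline; how the paper delimits events and aggregates discharge is not specified here, so the event totals are assumed inputs, and the example numbers are synthetic.

```python
def runoff_coefficients(precip_mm, discharge_mm):
    """Event runoff coefficients C = Q / P, with event totals in mm."""
    if len(precip_mm) != len(discharge_mm):
        raise ValueError("precip and discharge must have equal length")
    if any(p <= 0 for p in precip_mm):
        raise ValueError("event precipitation must be positive")
    return [q / p for p, q in zip(precip_mm, discharge_mm)]


def coefficients_decline(precip_mm, discharge_mm):
    """True if C strictly decreases as event precipitation increases.

    For a response with Q(0) = 0, a concave Q(P) implies that Q / P does not
    increase, so a falling C is the expected symptom of concavity.
    """
    pairs = sorted(zip(precip_mm, discharge_mm))
    coeffs = runoff_coefficients([p for p, _ in pairs], [q for _, q in pairs])
    return all(later < earlier for earlier, later in zip(coeffs, coeffs[1:]))


if __name__ == "__main__":
    design_totals = [50.0, 100.0, 200.0]                     # synthetic events
    linear = [p / 2 for p in design_totals]                  # constant C = 0.5
    saturating = [60 * p / (p + 60) for p in design_totals]  # levels off
    print(coefficients_decline(design_totals, linear))       # False
    print(coefficients_decline(design_totals, saturating))   # True
```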
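For reference, the two evaluation metrics can be sketched as below. NSE follows its standard definition (1 is a perfect fit, 0 matches the observed mean). The peak metric uses a simplified, hypothetical peak rule (strict local maxima of the observed series), which may differ from the peak detection used in the paper; the median is taken across catchments.

```python
from statistics import median


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / total variance of observations."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_obs) ** 2 for o in obs)
    if var == 0:
        raise ValueError("observations have zero variance")
    return 1.0 - sse / var


def peak_mape(obs, sim):
    """Mean absolute percentage error at observed peaks (hypothetical peak rule)."""
    if len(obs) != len(sim):
        raise ValueError("series must have equal length")
    peaks = [t for t in range(1, len(obs) - 1)
             if obs[t - 1] < obs[t] > obs[t + 1] and obs[t] > 0]
    if not peaks:
        raise ValueError("no peaks found")
    return 100.0 * sum(abs(sim[t] - obs[t]) / obs[t] for t in peaks) / len(peaks)


def median_nse(catchments):
    """Median NSE across catchments; `catchments` is a list of (obs, sim) pairs."""
    return median(nse(o, s) for o, s in catchments)
```

A model can score a high NSE while still missing peaks, because NSE aggregates squared errors over the whole series; reporting a peak-focused metric alongside it exposes that trade-off.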
Contributions
- Provides a systematic investigation into the extrapolation capabilities and limitations of stand-alone LSTMs in hydrological modeling under extreme, synthetic precipitation events, a novel approach compared to previous stress tests.
- Demonstrates and quantifies a "theoretical prediction limit" and a practical "design limit" for LSTM discharge predictions, showing that both can lie well below the extremes observed in training.
- Identifies and explains the physically counterintuitive concave runoff response of LSTMs to increasing extreme precipitation, highlighting the role of gating mechanisms in filtering or discarding extreme input signals.
- Offers a direct comparison with a hybrid hydrological model, showcasing the benefits of incorporating structural priors for achieving more plausible extrapolation behavior under unprecedented conditions.
- Suggests avenues for improvement, including architectural adjustments (e.g., more hidden states), larger and more diverse training datasets, and refined training strategies (e.g., weighted loss functions) to mitigate observed limitations.
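One way to read the weighted-loss suggestion is a mean squared error that gives high flows more influence. The weighting below (linear in observed discharge, controlled by a hypothetical parameter `alpha`) is an illustrative assumption, not a loss function proposed or tested in the paper.

```python
def weighted_mse(obs, sim, alpha=1.0):
    """Weighted MSE that up-weights high observed flows.

    Hypothetical scheme: w_t = 1 + alpha * obs_t / max(obs); alpha = 0 gives
    the ordinary MSE.
    """
    if len(obs) != len(sim) or not obs:
        raise ValueError("need two non-empty series of equal length")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    q_max = max(obs)
    if q_max <= 0:
        raise ValueError("observed discharge must contain a positive value")
    weights = [1.0 + alpha * o / q_max for o in obs]
    weighted_sse = sum(w * (o - s) ** 2 for w, o, s in zip(weights, obs, sim))
    return weighted_sse / sum(weights)
```

Normalizing by the sum of weights keeps the loss on the same scale as the ordinary MSE, so `alpha` only shifts emphasis toward errors on high-flow days.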
Funding
- Deutsche Forschungsgemeinschaft (grant no. 521210228)
- Karlsruhe Institute of Technology (KIT) (for article processing charges)
Citation
@article{Baste2025Unveiling,
author = {Baste, Sanika and Klotz, Daniel and Acuña Espinoza, Eduardo and Bárdossy, András and Loritz, Ralf},
title = {Unveiling the limits of deep learning models in hydrological extrapolation tasks},
journal = {Hydrology and Earth System Sciences},
year = {2025},
doi = {10.5194/hess-29-5871-2025},
url = {https://doi.org/10.5194/hess-29-5871-2025}
}
Original Source: https://doi.org/10.5194/hess-29-5871-2025