Puche et al. (2025) Assessing temporal and spatial generalization of LSTMs for streamflow modeling in French watersheds with and without European training data

Identification

Journal: Journal of Hydrology: Regional Studies
Year: 2025
Date: 2025-12-01
Authors: Mathilde Puche, Magali Troin, Dennis Fox
DOI: 10.1016/j.ejrh.2025.103022

Research Groups

Université Côte d’Azur, UMR ESPACE CNRS, Nice, France
Hydroclimat, Aubagne, France

Short Summary

This study evaluates the temporal, spatial, and spatio-temporal generalization of Long Short-Term Memory (LSTM) networks for streamflow modeling across 310 French watersheds, and tests whether adding 501 European basins to the training data helps. LSTMs perform best in temporal generalization (median Kling-Gupta efficiency (KGE) = 0.78), and adding the European training data slightly decreases performance rather than improving it.
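Since KGE is the headline metric throughout this summary, a minimal sketch of how it can be computed may help. It assumes the standard formulation built from the Pearson correlation r, the variability ratio alpha, and the bias ratio beta; the paper may use a modified variant, and the function names and sample values here are illustrative only.

```python
import numpy as np

def kge_components(obs, sim):
    """Return (r, alpha, beta) for paired observed/simulated flows."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    # Drop time steps where either series is missing.
    mask = ~(np.isnan(obs) | np.isnan(sim))
    obs, sim = obs[mask], sim[mask]
    r = np.corrcoef(obs, sim)[0, 1]   # timing / linear agreement
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return r, alpha, beta

def kge(obs, sim):
    """Kling-Gupta efficiency; 1 is a perfect score."""
    r, alpha, beta = kge_components(obs, sim)
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

if __name__ == "__main__":
    obs = np.array([1.0, 2.0, 4.0, 3.0, 1.5])  # made-up flows in m3/s
    sim = 0.5 * obs                            # right timing, halved magnitude
    print(kge_components(obs, sim))
    print(kge(obs, sim))
```

In the toy case the simulation tracks the timing exactly (r close to 1) while alpha and beta both equal 0.5, so the score is pulled down by magnitude errors alone.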

Objective

Evaluate the performance of Long Short-Term Memory (LSTM) networks for temporal induction (TI), spatial induction (SI), and spatio-temporal induction (STI) streamflow simulation tasks across 310 French watersheds.
Assess the potential benefits of incorporating additional training data from 501 European watersheds to improve LSTM performance in these three tasks.
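The three tasks differ only in which basins and which years are held out. The sketch below assumes the usual reading of the terms: TI tests the training basins on unseen years, SI tests unseen basins over the training years, and STI tests unseen basins on unseen years. The basin IDs, the year boundaries, and the single hold-out split are hypothetical placeholders, not the paper's actual protocol.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

@dataclass(frozen=True)
class Subset:
    basins: Tuple[str, ...]
    years: Tuple[int, int]  # inclusive (start, end)

def make_tasks(
    basins: Sequence[str],
    train_years: Tuple[int, int],
    test_years: Tuple[int, int],
    n_train_basins: int,
) -> Dict[str, Tuple[Subset, Subset]]:
    """Map each task name to a (train, test) pair of subsets."""
    everything = tuple(basins)
    seen = tuple(basins[:n_train_basins])    # basins available for training
    unseen = tuple(basins[n_train_basins:])  # basins held out spatially
    return {
        "TI": (Subset(everything, train_years), Subset(everything, test_years)),
        "SI": (Subset(seen, train_years), Subset(unseen, train_years)),
        "STI": (Subset(seen, train_years), Subset(unseen, test_years)),
    }

if __name__ == "__main__":
    # Hypothetical basin IDs and an arbitrary split of the 1987-2012 record.
    tasks = make_tasks(["FR001", "FR002", "FR003", "FR004"],
                       (1987, 2002), (2003, 2012), n_train_basins=3)
    for name, (train, test) in tasks.items():
        print(name, "train:", train, "test:", test)
```

A cross-validated version would repeat the basin split over several folds so that every basin is held out once.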

Study Configuration

Spatial Scale: 310 French watersheds (drainage area between 50 km² and 5000 km²) and 501 additional European watersheds (for training).
Temporal Scale: 26-year period from 1987 to 2012 for hydro-meteorological data.

Methodology and Data

Models used: Long Short-Term Memory (LSTM) networks, implemented with the NeuralHydrology Python package (cudaLSTM) and trained with the Adam optimizer using mean squared error (MSE) as the loss function; a Nash-Sutcliffe efficiency-based loss (NSE*) was also tested.
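To illustrate why the choice between the two losses can matter when many basins are trained together, here is a rough NumPy sketch. It assumes NSE* denotes a basin-normalized squared error, in which each sample's error is divided by the streamflow variance of its basin, as is common in multi-basin LSTM training; the exact definition used in the paper, the `eps` value, and all names and numbers below are assumptions.

```python
import numpy as np

def mse_loss(obs, sim):
    """Plain mean squared error over all samples."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.mean((sim - obs) ** 2))

def basin_normalized_loss(obs, sim, basin_std, eps=0.1):
    """Squared error weighted by 1 / (basin_std + eps)^2.

    basin_std holds, for every sample, the streamflow standard
    deviation of the basin that sample comes from.
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    weights = 1.0 / (np.asarray(basin_std, float) + eps) ** 2
    return float(np.mean(weights * (sim - obs) ** 2))

if __name__ == "__main__":
    # One sample from a small basin and one from a large basin,
    # each simulated with an error of half its basin's std.
    obs = np.array([2.0, 200.0])
    sim = np.array([2.5, 250.0])
    std = np.array([1.0, 100.0])
    print(mse_loss(obs, sim))                    # dominated by the large basin
    print(basin_normalized_loss(obs, sim, std))  # both basins weigh similarly
```

With plain MSE the large basin contributes almost all of the loss, whereas the normalized form puts errors from both basins on a comparable scale.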
Data sources:
- Hydro-meteorological data:
  - Daily streamflow: HydroPortail (French watersheds), Global Runoff Data Centre (GRDC) (European watersheds).
  - Daily precipitation, minimum and maximum temperatures: ERA5-Land reanalysis (European Centre for Medium-Range Weather Forecasts).
- Static watershed attributes:
  - Land cover: 2018 Corine Land Cover (CLC) map.
  - Soil properties: Digital Soil Open Land Map (DSOLMap).
  - Climatic attributes (derived): ERA5-Land dataset.
  - Topographic attributes: Digital Elevation Model (DEM) from Shuttle Radar Topography Mission (SRTM).
  - Geospatial attributes (drainage area, coordinates): BNBV database (French watersheds), GRDC shapefiles (European basins).

Main Results

LSTM performs best in Temporal Induction (TI) for French watersheds (median KGE = 0.78), with 89% of watersheds achieving KGE ≥ 0.65.
Spatial Induction (SI) shows satisfactory but lower median performance (median KGE = 0.68), with 53% of stations categorized as good or excellent, but 22% showing poor KGE.
Spatio-Temporal Induction (STI) yields the lowest performance (median KGE = 0.63), with 46% of stations showing good or excellent KGE, and 25% showing poor KGE.
Across all tasks, larger basins (area > 1400 km²) with higher mean daily streamflow (> 12 m³/s) and greater streamflow variability (standard deviation > 18 m³/s) tend to be better simulated.
LSTMs struggle with low-flow and low-variability basins, as well as those influenced by regulatory factors (e.g., groundwater, human infrastructure).
Unexpectedly, expanding the training data with 501 European basins slightly decreased overall model performance (median KGE decreased by 0.02 across all tasks).
The Pearson correlation coefficient remained stable across tasks (0.83 to 0.88), indicating LSTMs effectively capture the timing of streamflow variations but struggle more with magnitudes in spatial generalization.
Physical consistency checks found gross violations to be rare: at most 3% of basins failed the extreme-value and water-balance checks in SI and STI, and none did in TI.
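The summary does not spell out how the physical consistency checks were defined, so the following is only a hypothetical illustration of what extreme-value and water-balance checks could look like for one basin. The three flags, the `peak_factor` threshold, and the function name are assumptions, not the paper's criteria.

```python
import numpy as np

def consistency_flags(sim_q, obs_q, precip_mm, area_km2, peak_factor=10.0):
    """Flag gross physical violations in a simulated daily hydrograph.

    sim_q, obs_q : daily streamflow in m3/s
    precip_mm    : daily basin-average precipitation in mm/day
    area_km2     : drainage area in km2
    """
    sim_q = np.asarray(sim_q, float)
    obs_q = np.asarray(obs_q, float)
    precip_mm = np.asarray(precip_mm, float)
    # m3/s -> mm/day over the basin: 86400 s/day * 1000 mm/m / (area * 1e6 m2)
    sim_mm = sim_q * 86.4 / area_km2
    return {
        "negative_flow": bool((sim_q < 0).any()),
        # Hypothetical extreme-value test: peak far above anything observed.
        "extreme_peak": bool(sim_q.max() > peak_factor * obs_q.max()),
        # Hypothetical water-balance test: more runoff than precipitation.
        "runoff_exceeds_precip": bool(sim_mm.sum() > precip_mm.sum()),
    }

if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 2.0]      # made-up flows, m3/s
    precip = [5.0, 5.0, 5.0, 5.0]   # made-up precipitation, mm/day
    print(consistency_flags([1.0, 2.0, 3.0, 2.0], obs, precip, area_km2=100.0))
    print(consistency_flags([50.0, 60.0, -1.0, 40.0], obs, precip, area_km2=100.0))
```

Counting the basins with at least one raised flag and dividing by the number of basins would give a violation share comparable in spirit to the percentages reported above.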

Contributions

Provides the first comprehensive evaluation of LSTM performance for temporal, spatial, and spatio-temporal streamflow simulation tasks across a large sample (310) of hydrologically diverse French watersheds.
Investigates the impact of expanding training data with additional European basins on LSTM generalization capabilities, revealing an unexpected slight decrease in performance.
Identifies specific watershed characteristics (size, mean flow, variability, land cover, regulatory factors) that influence LSTM performance across different generalization tasks in French and European contexts.
Demonstrates LSTM's robustness and generalizability for streamflow simulation in diverse Western European and Euro-Mediterranean basins, particularly for temporal induction.

Funding

SUD-PACA region (“Emplois Jeunes Doctorants” program)
Hydroclimat
National “France Relance” recovery plan of the CNRS (POP-RISK project)

Citation

@article{Puche2025Assessing,
  author = {Puche, Mathilde and Troin, Magali and Fox, Dennis},
  title = {Assessing temporal and spatial generalization of LSTMs for streamflow modeling in French watersheds with and without European training data},
  journal = {Journal of Hydrology: Regional Studies},
  year = {2025},
  doi = {10.1016/j.ejrh.2025.103022},
  url = {https://doi.org/10.1016/j.ejrh.2025.103022}
}

Original Source: https://doi.org/10.1016/j.ejrh.2025.103022