Kim et al. (2026) Evaluating rainfall-driven LSTM for statistical analysis of low-flow regimes in the Nakagawa basin, Japan

Identification

Journal: Journal of Hydrology: Regional Studies
Year: 2026
Date: 2026-03-31
Authors: Do-yup Kim, Naoki Shirakawa
DOI: 10.1016/j.ejrh.2026.103366

Research Groups

The Graduate School of Science and Technology, University of Tsukuba, Japan
Institute of Systems and Information Engineering, University of Tsukuba, Japan

Short Summary

This study evaluated a rainfall-driven Long Short-Term Memory (LSTM) model for low-flow regime analysis in the Nakagawa Basin, Japan, finding that while the model performed well globally, it exhibited substantial discrepancies and poor reproducibility in extreme low-flow conditions. The research highlights the critical need for regime-specific evaluation of deep-learning runoff predictions for drought and environmental-flow decision support.

Objective

To evaluate the ability of a rainfall-driven LSTM rainfall-runoff model to reproduce low-flow regimes (occupancy, event structure, seasonality, and recession timescales) in the Nakagawa Basin, Japan, and to quantify where discrepancies amplify under bootstrap-based uncertainty estimates.

Study Configuration

Spatial Scale: Nakagawa Basin, northern Kanto region, Honshu, Japan. The basin has an area of approximately 3270 square kilometers. The study focused on the Noguchi streamflow gauging station and used data from 16 rainfall stations within and around the basin.
Temporal Scale: Hourly discharge and rainfall data from 1 January 1994 to 31 December 2022. The model was trained on data from 1994–2014 and validated on 2014–2022.

Methodology and Data

Models used: Long Short-Term Memory (LSTM) rainfall-runoff model.
Data sources:
- Hourly discharge data from the Noguchi station, operated by the Ministry of Land, Infrastructure, Transport and Tourism (MLIT).
- Hourly rainfall data from 16 MLIT observation network stations, aggregated to basin-mean rainfall using the Thiessen polygon method.
- Rainfall-derived predictor variables: rainfall at time t, cumulative rainfall over 1, 2, 3, 6, 9, 12, 24, 48, 96, 180, 360, and 720 hours; two antecedent precipitation indices (API) with daily decay constants of 0.95 and 0.99; dry hours since rain; dry days since rain; and sine and cosine encodings of day of year.
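The rainfall-derived predictors listed above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the API recursion (API_t = k * API_{t-1} + P_t) is one common formulation, and converting the daily decay constant to an hourly one via k^(1/24) is an assumption the summary does not confirm.

```python
import math

def basin_mean_rainfall(station_rain, thiessen_weights):
    """Thiessen estimate: area-weighted mean of station rainfall (weights sum to 1)."""
    return sum(w * p for w, p in zip(thiessen_weights, station_rain))

def rolling_sum(rain, window):
    """Cumulative rainfall over the trailing `window` hours, including hour t."""
    out, total = [], 0.0
    for i, r in enumerate(rain):
        total += r
        if i >= window:
            total -= rain[i - window]  # drop the hour that left the window
        out.append(total)
    return out

def antecedent_precipitation_index(rain, daily_decay):
    """API_t = k * API_{t-1} + P_t on an hourly series."""
    k = daily_decay ** (1.0 / 24.0)  # assumption: daily constant rescaled to hourly
    out, api = [], 0.0
    for r in rain:
        api = k * api + r
        out.append(api)
    return out

def dry_hours_since_rain(rain, wet_threshold=0.0):
    """Hours elapsed since the last hour with rainfall above `wet_threshold`."""
    out, dry = [], 0
    for r in rain:
        dry = 0 if r > wet_threshold else dry + 1
        out.append(dry)
    return out

def seasonal_encoding(day_of_year, period=365.25):
    """Sine and cosine encoding of day of year."""
    angle = 2.0 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)
```

For example, rolling_sum(rain, 24) would give the 24-hour cumulative rainfall, and antecedent_precipitation_index(rain, 0.95) the faster-decaying of the two API series.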

Main Results

The LSTM model showed strong overall agreement between observed and simulated discharge during the validation period, with a Nash-Sutcliffe Efficiency (NSE) of 0.84, Kling-Gupta Efficiency (KGE) of 0.89, Pearson Correlation (CORR) of 0.92, and Percent Bias (PBIAS) of -0.77%.
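For reference, the four global metrics can be computed as in the sketch below. The formulas are the standard ones (KGE in its 2009 form with correlation, variability ratio, and mean ratio); the sign convention for PBIAS varies between studies, and the one used here (positive means simulated exceeds observed) is an assumption, not something the summary states.

```python
import math

def _mean(x):
    return sum(x) / len(x)

def _std(x):
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))

def pearson(obs, sim):
    """Pearson correlation between observed and simulated series."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    return cov / (_std(obs) * _std(sim))

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus squared error over observed variance."""
    mo = _mean(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den

def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation, variability ratio, and bias ratio."""
    r = pearson(obs, sim)
    alpha = _std(sim) / _std(obs)
    beta = _mean(sim) / _mean(obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def pbias(obs, sim):
    """Percent bias; assumed convention: positive when simulation is too high."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)
```

All four are aggregate scores over the whole series, so a short low-flow tail contributes little to them.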
A two-sample Kolmogorov-Smirnov test on daily mean discharge yielded a D statistic of 0.0445, indicating a measurable distributional discrepancy despite strong global performance.
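The D statistic is the largest vertical gap between the two empirical cumulative distribution functions. A minimal standard-library sketch follows; the function name is hypothetical, and it returns only D, not the p-value (SciPy's scipy.stats.ks_2samp returns both).

```python
from bisect import bisect_right

def ks_two_sample_d(a, b):
    """Two-sample Kolmogorov-Smirnov D: max |F_a(x) - F_b(x)| over all sample points."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        # Empirical CDF value = fraction of the sample that is <= x
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d
```

Applied to observed and simulated daily mean discharge, a value near 0 means the two distributions nearly coincide, while 1 means they do not overlap at all.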
Flow-duration curve (FDC) diagnostics revealed that bias was negligible for intermediate-to-high flows but became significantly positive in the low-flow tail, increasing monotonically from +4.44% at the 75% exceedance level (Q75) to +20.19% at the 99% exceedance level (Q99), indicating systematic overestimation of low flows.
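The FDC bias at a given exceedance level compares the simulated and observed flow quantiles at that level. The sketch below assumes linear interpolation between order statistics and a relative bias of 100 * (Q_sim - Q_obs) / Q_obs; the summary does not state which quantile estimator the study used, and the function names are hypothetical.

```python
def exceedance_quantile(flows, exceedance_pct):
    """Flow equalled or exceeded `exceedance_pct` percent of the time.

    Q75 is therefore the 25th percentile of the flow distribution.
    """
    ordered = sorted(flows)
    pos = (100.0 - exceedance_pct) / 100.0 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

def fdc_bias_pct(obs, sim, levels=(75, 90, 95, 99)):
    """Percent bias of simulated vs. observed flow at each exceedance level."""
    result = {}
    for p in levels:
        q_obs = exceedance_quantile(obs, p)
        q_sim = exceedance_quantile(sim, p)
        result[p] = 100.0 * (q_sim - q_obs) / q_obs
    return result
```

A positive value at Q99 means the simulated curve sits above the observed one in the driest part of the record, which is the overestimation pattern described above.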
Threshold-based low-flow diagnostics showed a progressive deterioration from moderate to extreme low-flow conditions:
- At Q75 (38.68 cubic meters per second), the dominant discrepancy was reduced low-flow occupancy (687 simulated days vs. 793 observed days, a difference of -106 days).
- At Q90 (30.45 cubic meters per second), the underestimation of low-flow occupancy was more pronounced in relative terms (235 simulated days vs. 317 observed days, a difference of -82 days), and the event count fell from 10 observed to 9 simulated.
- At Q95 (26.41 cubic meters per second), extreme-event detection deteriorated sharply (36 simulated days vs. 159 observed days, a difference of -123 days), and only one simulated event was detected compared to six observed events. Recession characteristics could not be quantitatively interpreted due to insufficient valid simulated events.
The study concluded that strong global model skill can coexist with poor reproducibility of extreme low-flow behavior, and that this deterioration intensifies non-linearly toward more extreme low-flow regimes.
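The threshold-based counts reported above (low-flow days and event numbers) can be reproduced in outline as follows. This is a sketch under stated assumptions: an event is taken to be a run of consecutive days strictly below the threshold, and min_duration is a hypothetical parameter, since the summary does not give the study's event definition or any pooling rule. The same observed-derived threshold is applied to both series.

```python
def low_flow_days(daily_flow, threshold):
    """Number of days with discharge strictly below the threshold."""
    return sum(1 for q in daily_flow if q < threshold)

def low_flow_events(daily_flow, threshold, min_duration=1):
    """Runs of consecutive below-threshold days, as (start_index, end_index) pairs."""
    events, start = [], None
    for i, q in enumerate(daily_flow):
        if q < threshold:
            if start is None:
                start = i  # a new event begins
        elif start is not None:
            if i - start >= min_duration:
                events.append((start, i - 1))
            start = None
    # Close an event that is still open at the end of the record
    if start is not None and len(daily_flow) - start >= min_duration:
        events.append((start, len(daily_flow) - 1))
    return events

# Illustrative, made-up daily flows (m^3/s); only the Q75 threshold is from the study
obs = [50, 30, 28, 45, 29, 27, 26, 60]
sim = [50, 40, 39, 45, 36, 39, 37, 60]
q75 = 38.68
day_difference = low_flow_days(sim, q75) - low_flow_days(obs, q75)
```

In this toy series the simulation hovers just above the threshold, so it logs fewer low-flow days and splits one observed event into two short ones, which illustrates how a modest positive bias in the tail can distort both occupancy and event structure.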

Contributions

Provides a comprehensive, uncertainty-aware diagnostic framework for evaluating rainfall-driven deep-learning runoff models, combining global performance metrics, distributional tests, flow-duration curve analysis, and threshold-based low-flow signatures with bootstrap uncertainty.
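One way to attach bootstrap uncertainty to a low-flow signature is a moving-block bootstrap, which resamples contiguous blocks of paired days so that short-range autocorrelation is preserved. The sketch below is illustrative only: the summary does not describe the study's resampling scheme, so the block length, number of replicates, and percentile interval are all assumptions, and the function name is hypothetical.

```python
import random

def block_bootstrap_ci(obs, sim, statistic, block_len=30, n_boot=500,
                       alpha=0.05, seed=0):
    """Percentile confidence interval for statistic(obs, sim).

    Observed and simulated values are resampled with the same indices,
    so each day's pair stays together. Requires len(obs) >= block_len.
    """
    rng = random.Random(seed)
    n = len(obs)
    n_blocks = -(-n // block_len)  # ceiling division
    stats = []
    for _ in range(n_boot):
        idx = []
        for _ in range(n_blocks):
            start = rng.randrange(0, n - block_len + 1)
            idx.extend(range(start, start + block_len))
        idx = idx[:n]  # trim to the original record length
        stats.append(statistic([obs[i] for i in idx], [sim[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper
```

The statistic argument can be any function of the paired series, for example the difference in low-flow days at a fixed threshold; an interval that excludes zero would indicate a discrepancy that is unlikely to be a sampling artifact under these assumptions.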
Demonstrates a critical decoupling between high global model skill and low-flow-tail reliability, emphasizing that global metrics alone are insufficient for assessing model suitability in low-flow applications.
Highlights the necessity for regime-specific verification of threshold crossing, event structure, and low-flow-tail bias when applying deep-learning models for drought monitoring, environmental-flow support, or threshold-based water management.
Suggests that rainfall-only deep-learning models may be insufficient to fully capture extreme low-flow behavior in complex basins like Nakagawa without incorporating additional process information such as evapotranspiration, groundwater, or operational water use.

Funding

Miyabi Supercomputer System (Supercomputer System for JCAHPC, the Joint Center for Advanced High Performance Computing)

Citation

@article{Kim2026Evaluating,
  author = {Kim, Do-yup and Shirakawa, Naoki},
  title = {Evaluating rainfall-driven LSTM for statistical analysis of low-flow regimes in the Nakagawa basin, Japan},
  journal = {Journal of Hydrology: Regional Studies},
  year = {2026},
  doi = {10.1016/j.ejrh.2026.103366},
  url = {https://doi.org/10.1016/j.ejrh.2026.103366}
}

Original Source: https://doi.org/10.1016/j.ejrh.2026.103366