Yang et al. (2026) Development of a Two-Stage LSTM for Multi-Step Runoff Forecasting Using a XAJ Model and EEMD

Identification

Journal: Water Resources Management
Year: 2026
Date: 2026-01-01
Authors: Zihao Yang, Qing Dong, Xu Zhang, Hongyu Zhu, Zhetao Cheng
DOI: 10.1007/s11269-025-04420-2

Research Groups

State Key Laboratory of Water Resources Engineering and Management, Wuhan University, Wuhan, China
Department of Geography, University of Hong Kong, Hong Kong SAR, China
Institute for Climate and Carbon Neutrality, University of Hong Kong, Hong Kong SAR, China

Short Summary

This study proposes a two-stage Long Short-Term Memory (LSTM) framework, integrating the Xinanjiang (XAJ) hydrological model with Ensemble Empirical Mode Decomposition (EEMD) for error correction, to improve multi-step runoff forecasting accuracy and interpretability in the middle and lower Yangtze River Basin.

Objective

To develop a reliable and interpretable two-stage LSTM approach for multi-step runoff forecasting that addresses compatibility issues between hydrological processes and signal decomposition, thereby improving predictive accuracy while retaining physical interpretability.

Study Configuration

Spatial Scale: Middle and lower Yangtze River Basin, China, specifically Datong Station. The watershed spans 1.8 million square kilometers.
Temporal Scale: Daily runoff and meteorological data from 1992–2020. The dataset was divided into a warm-up period (1992), a training period (1993–2015), and a testing period (2016–2020). Forecast lead times ranged from 1 day to 7 days.
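
The period boundaries and lead times above can be written down as a small helper. This is a minimal sketch, not the authors' code: the names `PERIODS`, `period_of`, and `target_dates` are hypothetical, and the summary does not say how samples whose forecast date crosses the train/test boundary were handled.

```python
from datetime import date, timedelta

# Period boundaries as stated in the summary (inclusive years).
PERIODS = {
    "warmup": (1992, 1992),
    "train": (1993, 2015),
    "test": (2016, 2020),
}
LEAD_TIMES = range(1, 8)  # 1- to 7-day-ahead forecasts


def period_of(day: date) -> str:
    """Return the name of the period that a calendar day falls into."""
    for name, (first_year, last_year) in PERIODS.items():
        if first_year <= day.year <= last_year:
            return name
    raise ValueError(f"{day} is outside the 1992-2020 record")


def target_dates(issue_day: date) -> dict:
    """Map each lead time (in days) to the date being forecast."""
    return {k: issue_day + timedelta(days=k) for k in LEAD_TIMES}
```

For example, a forecast issued on 30 December 2015 belongs to the training period by issue date, but its 7-day target falls on 6 January 2016, inside the testing period. Such boundary samples need an explicit rule in any reimplementation.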

Methodology and Data

Models used:
- Conceptual Hydrological Models: Xinanjiang (XAJ), GR4J, SAC-SMA (for comparison and initial prediction)
- Machine Learning: Long Short-Term Memory (LSTM) network
- Signal Decomposition: Ensemble Empirical Mode Decomposition (EEMD) for error sequence decomposition
- Interpretability: SHapley Additive exPlanations (SHAP) for feature importance analysis
- Optimization: Particle Swarm Optimization (PSO) for hydrological model parameter calibration
- Error Correction: Fuzzy entropy and Kalman filtering for noise suppression in the EEMD stage

Data sources:
- Daily runoff data (1992–2020) from the Yangtze River Hydrological Network.
- Daily meteorological data (1992–2020) from 39 stations, obtained via the U.S. National Oceanic and Atmospheric Administration (NOAA).
- Reference evapotranspiration (ET0) calculated using the Hargreaves equation from daily mean, maximum, and minimum temperatures, and extraterrestrial radiation.
- Input features for the LSTM models included hydrological model outputs (PRE), dew point temperature (DEWP), maximum temperature (MAX), mean temperature (TEMP), minimum temperature (MIN), and reference evapotranspiration (ET0).
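
The ET0 calculation listed above can be illustrated with the standard Hargreaves-Samani form of the equation. This is a sketch under assumptions: the summary does not give the exact coefficients or units the authors used, so the function name `hargreaves_et0`, the radiation unit (MJ m^-2 day^-1), and the 0.408 conversion factor to equivalent evaporation are assumptions taken from the commonly published form.

```python
import math


def hargreaves_et0(t_mean, t_max, t_min, ra_mj):
    """Reference evapotranspiration (mm/day), Hargreaves-Samani form.

    t_mean, t_max, t_min: daily air temperatures in degrees Celsius.
    ra_mj: extraterrestrial radiation in MJ m^-2 day^-1 (assumed unit).
    """
    if t_max < t_min:
        raise ValueError("t_max must not be below t_min")
    # Convert radiation to equivalent evaporation depth (mm/day).
    ra_mm = 0.408 * ra_mj
    return 0.0023 * (t_mean + 17.8) * math.sqrt(t_max - t_min) * ra_mm
```

With a mean temperature of 25 °C, a 20–30 °C daily range, and 32.2 MJ m^-2 day^-1 of extraterrestrial radiation, the function returns roughly 4.1 mm/day.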

Main Results

The proposed EC-X-LSTM framework consistently outperformed the standalone LSTM and the traditional hydrological models (XAJ, GR4J, SAC-SMA) in multi-step runoff forecasting at every lead time from 1 to 7 days.
For a 7-day lead time, EC-X-LSTM improved the testing-period metrics: correlation coefficient (r) by 4.63%, Nash–Sutcliffe Efficiency (NSE) by 9.80%, Kling–Gupta Efficiency (KGE) by 2.93%, and Willmott Index (WI) by 2.27%. It also reduced Root Relative Mean Square Error (RRMSE) by 54.26% and Mean Absolute Percentage Error (MAPE) by 52.41%.
During the testing period, EC-X-LSTM achieved r = 0.994, NSE = 0.988, KGE = 0.978, WI = 0.997, RRMSE = 0.051, and MAPE = 0.037 at a 1-day lead time. Even at the longest lead time (7 days), NSE and KGE remained above 0.95, indicating stable accuracy across the forecast horizon.
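
For readers who want to reproduce scores of this kind, the six metrics can be computed as sketched below. The definitions are the commonly used ones and are assumptions here, since the summary does not state the formulas: KGE follows the 2009 form (correlation, variability ratio, bias ratio), WI is Willmott's index of agreement, RRMSE is taken as RMSE divided by the observed mean, and MAPE is returned as a fraction, matching the scale of the reported 0.037. The function name `evaluate` is hypothetical.

```python
import math
from statistics import fmean


def evaluate(obs, sim):
    """Return r, NSE, KGE, WI, RRMSE and MAPE for paired series."""
    n = len(obs)
    mo, ms = fmean(obs), fmean(sim)
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)  # sum of squares, observed
    var_s = sum((s - ms) ** 2 for s in sim)  # sum of squares, simulated
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))

    r = cov / math.sqrt(var_o * var_s)   # Pearson correlation
    alpha = math.sqrt(var_s / var_o)     # variability ratio
    beta = ms / mo                       # bias ratio
    wi_den = sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))

    return {
        "r": r,
        "NSE": 1 - sq_err / var_o,
        "KGE": 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2),
        "WI": 1 - sq_err / wi_den,
        "RRMSE": math.sqrt(sq_err / n) / mo,
        "MAPE": fmean(abs(o - s) / abs(o) for o, s in zip(obs, sim)),
    }
```

The sketch assumes non-constant series and non-zero observations; a constant simulated series or a zero observed flow would cause a division by zero and needs separate handling.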
EEMD-based error modeling (EEMD-Pre) effectively improved error prediction accuracy, maintaining a correlation coefficient (r) of 0.66 even at a 7-day lead time, whereas direct error modeling (Pre) lost predictive capability (r = 0.02) at the same lead time.
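
The additive structure behind this kind of error correction can be sketched as follows: compute the residuals of the first-stage forecast, decompose them into components that sum back to the residual, predict each component, and add the summed prediction to the first-stage forecast. The code below is an illustration only, not the authors' implementation. It swaps in a moving-average split for EEMD and a persistence rule for the trained error model, and every name in it (`toy_decompose`, `persistence`, `corrected_forecast`) is hypothetical.

```python
from typing import List, Sequence

Series = List[float]


def moving_average(x: Sequence[float], window: int) -> Series:
    """Trailing moving average; early points average what is available."""
    out = []
    for i in range(len(x)):
        chunk = x[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def toy_decompose(residual: Sequence[float], window: int = 3) -> List[Series]:
    """Stand-in for EEMD: split into a rough part and a smooth part.

    As with a set of modes plus a trend, the parts sum back to the input.
    """
    smooth = moving_average(residual, window)
    rough = [r - s for r, s in zip(residual, smooth)]
    return [rough, smooth]


def persistence(component: Sequence[float]) -> float:
    """Stand-in for a trained error model: repeat the last value."""
    return component[-1]


def corrected_forecast(first_stage, past_obs, past_sim,
                       decompose=toy_decompose, predict=persistence):
    """Add the predicted error to the first-stage forecast."""
    residual = [o - s for o, s in zip(past_obs, past_sim)]
    components = decompose(residual)
    predicted_error = sum(predict(c) for c in components)
    return first_stage + predicted_error
```

With persistence applied to every component, the decomposition has no effect on the result, because the last values of the components sum to the last residual. The `decompose` and `predict` arguments are the places where a real decomposition and a model trained per component would be plugged in.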
SHAP analysis identified the hydrological model output (PRE), dew point temperature (DEWP), maximum temperature (MAX), mean temperature (TEMP), and reference evapotranspiration (ET0) as the top five predictors at Datong Station. PRE was the largest contributor at short lead times (1–5 days), while MAX became the dominant input at the 7-day lead time.

Contributions

Proposed a novel hydrological–machine learning hybrid framework (EC-X-LSTM) that significantly enhances the predictive accuracy and robustness of multi-step runoff forecasting while maintaining physical interpretability.
Successfully integrated the hydrological model with signal decomposition techniques by using EEMD for residual error decomposition and correction, offering a new perspective for applying signal decomposition methods in hydrological modeling.
Utilized SHAP analysis to provide clear insights into feature contributions, elucidating the key drivers of runoff variability and enhancing the interpretability of the forecast model.

Funding

National Natural Science Foundation of China (Nos. 52279024, 52261145744)
Fundamental Research Funds for the Central Universities (No. 2042025kf0079)

Citation

@article{Yang2026Development,
  author = {Yang, Zihao and Dong, Qing and Zhang, Xu and Zhu, Hongyu and Cheng, Zhetao},
  title = {Development of a Two-Stage LSTM for Multi-Step Runoff Forecasting Using a XAJ Model and EEMD},
  journal = {Water Resources Management},
  year = {2026},
  doi = {10.1007/s11269-025-04420-2},
  url = {https://doi.org/10.1007/s11269-025-04420-2}
}

Original Source: https://doi.org/10.1007/s11269-025-04420-2