Xie et al. (2026) Deep reinforcement learning for long-horizon reservoir operation: Temporal horizon, state representation, and hydrological data synthesis
Identification
- Journal: Journal of Hydrology
- Year: 2026
- Date: 2026-03-28
- Authors: Zaichao Xie, Minglei Ren, Wei Xu, Te Zhang, Bing Zhu, Dong Li, Shuncai Zhang, Junbo Wang
- DOI: 10.1016/j.jhydrol.2026.135421
Research Groups
- College of River and Ocean Engineering, Chongqing Jiaotong University, Chongqing, China
- China Institute of Water Resources and Hydropower Research, Beijing, China
- Three Gorges Digital Intelligence Institute, China Three Gorges University, Yichang, China
- Information Center (Hydrology Monitor and Forecast Center), Ministry of Water Resources, Beijing, China
- Chongqing Shipping Construction and Development Group Co., Ltd., Chongqing, China
Short Summary
This study develops a Deep Reinforcement Learning (DRL) framework for long-horizon reservoir operation, systematically evaluating the impact of episode length, state representation, and synthetic hydrological data. It finds that a 4-year episode length, two-dimensional periodic date encoding, and extreme-enhanced synthetic inflows significantly improve policy performance, stability, and robustness for the Three Gorges Reservoir.
Objective
- To develop a Deep Reinforcement Learning (DRL) environment tailored to long-horizon reservoir operation and systematically evaluate the optimal design for episode horizons, seasonality-aware state representations, and inflow data augmentation to improve policy performance and robustness under nonstationary hydroclimatic conditions.
Study Configuration
- Spatial Scale: Three Gorges Reservoir (TGR) on the Yangtze River, controlling a catchment area of approximately 1.0 million square kilometers.
- Temporal Scale: Daily operation (time step Δt = 86,400 s). Episode lengths of 1, 2, 4, 6, and 8 years were tested. The effective discount horizon was approximately 200 days.
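The reported ~200-day effective discount horizon can be related to the discount factor via the standard approximation horizon ≈ 1/(1 − γ); the paper's exact γ is not stated here, so the value below is an illustrative assumption (γ ≈ 0.995 for daily steps):

```python
# Illustrative sketch (not from the paper): for a daily time step, an
# effective discount horizon of ~200 days implies gamma ~ 0.995, since
# rewards beyond roughly 1 / (1 - gamma) steps are discounted to near zero.
def effective_horizon(gamma: float) -> float:
    """Approximate number of steps over which rewards still shape the value estimate."""
    return 1.0 / (1.0 - gamma)

print(round(effective_horizon(0.995)))  # 200 steps, i.e. ~200 days
```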
Methodology and Data
- Models used:
- Deep Reinforcement Learning (DRL) framework: Actor-Critic architecture with Proximal Policy Optimization (PPO) algorithm.
- Data synthesis: Four-stage pipeline integrating STL (seasonal-trend decomposition using Loess, yielding seasonal, trend, and remainder components), autoregressive (AR) modeling, Markov year-type transitions, and extreme event enhancement (power transformation for high flows, quantile mapping for low flows).
- Data sources:
- Historical daily inflow series (60 years, 1965–2024) from the Yangtze Water Resources Commission.
- Synthesized daily inflow data (200 years) using the proposed STL-AR-Markov scheme, with and without explicit extreme event enhancement.
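The decomposition-plus-AR flavor of the synthesis pipeline can be sketched in a few lines. This is a minimal NumPy-only illustration, not the paper's code: a day-of-year climatology stands in for the STL seasonal/trend components, an AR(1) model replaces the full AR scheme, and the Markov year-type transitions and extreme-event enhancement are omitted entirely. All names and parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_ar1(residuals):
    """Estimate the lag-1 coefficient and innovation scale of an AR(1) model."""
    r = residuals - residuals.mean()
    phi = np.dot(r[:-1], r[1:]) / np.dot(r[:-1], r[:-1])
    sigma = np.std(r[1:] - phi * r[:-1])
    return phi, sigma

def synthesize(historical, n_years, period=365):
    """Generate synthetic daily inflows: seasonal climatology + AR(1) remainder."""
    hist = historical[: (len(historical) // period) * period].reshape(-1, period)
    seasonal = hist.mean(axis=0)            # day-of-year climatology (stand-in for STL)
    remainder = (hist - seasonal).ravel()   # de-seasonalized residual series
    phi, sigma = fit_ar1(remainder)
    n = n_years * period
    noise = rng.normal(0.0, sigma, n)
    resid = np.zeros(n)
    for t in range(1, n):                   # AR(1) recursion on the remainder
        resid[t] = phi * resid[t - 1] + noise[t]
    return np.tile(seasonal, n_years) + resid

# Toy "historical" record: sinusoidal seasonality plus noise (units arbitrary)
days = np.arange(10 * 365)
inflow = 20000 + 15000 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 2000, days.size)
synthetic = synthesize(inflow, n_years=200)
print(synthetic.shape)  # (73000,) -- 200 synthetic years of daily inflow
```

The real pipeline additionally conditions each synthetic year on a Markov chain over wet/normal/dry year types and reshapes the flow extremes, which is what gives the trained policies their reported robustness under consecutive dry years.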
Main Results
- For the Three Gorges Reservoir, a 4-year episode length achieved the optimal balance between training efficiency and value-estimation stability, resulting in superior final reward, decision regularity, seasonal consistency, and training stability.
- Two-dimensional periodic date encoding (a sine-cosine pair) was the single most critical element of the state representation, yielding an 85.5% performance improvement over the baseline without date encoding by enabling the agent to learn and reproduce standard seasonal operation patterns.
- The proposed STL-autoregressive-Markov synthetic inflow scheme generated large-scale training data (200 years) that closely preserved key multi-scale hydrological statistics. Policies trained on extreme-enhanced synthetic data demonstrated superior risk resilience, maintaining higher year-end water levels (173.07 m versus 172.44 m for historical data) and increasing hydropower generation by 0.32% under consecutive dry years, thereby improving robustness under changing climate conditions.
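The sine-cosine date encoding above can be sketched directly: the day of year is mapped onto the unit circle so that day 365 and day 1 are neighbors, removing the artificial discontinuity a raw day index would present to the agent. The function name and 365-day period are illustrative assumptions.

```python
import math

def encode_date(day_of_year: int, period: int = 365) -> tuple:
    """Map a day of year onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * (day_of_year - 1) / period
    return math.sin(angle), math.cos(angle)

print(encode_date(1))    # (0.0, 1.0) -- start of year
print(encode_date(183))  # mid-year: sine near 0, cosine near -1
```

With this encoding, the Euclidean distance between the features for day 365 and day 1 is tiny (~0.017), whereas a raw day index would place them 364 units apart.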
Contributions
- Systematically examined the impact of temporal credit assignment by evaluating five episode-length configurations within a unified actor-critic framework, identifying an effective training horizon for multi-year optimization.
- Conducted a state-space ablation study across five information configurations to quantify the independent contribution of each state dimension to policy learning, convergence behavior, and attainable performance, highlighting the critical role of date encoding.
- Developed an STL-autoregressive-Markov synthetic inflow generation scheme with explicit extreme-event enhancement, which generates large-scale training data to improve policy generalization while preserving key multi-scale hydrological statistics.
Funding
- National Key R&D Program of China (2024YFC3212800, 2023YFC3006605)
- Chongqing Transportation Science and Technology Project (CQJT-CZKJ2025-08)
- Chongqing Jiaotong University Graduate Research Innovation Project (2025B0018)
Citation
@article{Xie2026Deep,
author = {Xie, Zaichao and Ren, Minglei and Xu, Wei and Zhang, Te and Zhu, Bing and Li, Dong and Zhang, Shuncai and Wang, Junbo},
title = {Deep reinforcement learning for long-horizon reservoir operation: Temporal horizon, state representation, and hydrological data synthesis},
journal = {Journal of Hydrology},
year = {2026},
doi = {10.1016/j.jhydrol.2026.135421},
url = {https://doi.org/10.1016/j.jhydrol.2026.135421}
}
Original Source: https://doi.org/10.1016/j.jhydrol.2026.135421