Zheng et al. (2025) GDROM v2: An Inventory of Operation Variables Time Series and Rules for 2,017 Large Reservoirs across the CONUS
Identification
- Journal: Scientific Data
- Year: 2025
- Date: 2025-11-28
- Authors: Zihan Zheng, Ximing Cai, L.-X. Zhang, James Li, Yanan Chen
- DOI: 10.1038/s41597-025-06162-7
Research Groups
- Department of Civil and Environmental Engineering, University of Illinois Urbana-Champaign, Urbana, IL, USA.
- School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China.
Short Summary
This paper introduces GDROM v2, a nationwide dataset for 2,017 large reservoirs across the Contiguous United States (CONUS), providing daily time series of inflow, release, and storage, along with operation rules derived through data fusion and transfer learning. The dataset addresses the scarcity of complete reservoir operation records, offering a comprehensive resource for hydrological modeling and water management studies.
Objective
- To address the lack of complete daily operation records and realistic operation rules for most reservoirs, which constrains their representation in large-scale hydrological models.
- To present GDROM v2, a nationwide dataset providing daily time series of inflow, release, and storage variables, and operation rules for 2,017 reservoirs across the Contiguous United States (CONUS).
Study Configuration
- Spatial Scale: Contiguous United States (CONUS), covering 2,017 large reservoirs (surface area greater than 1 square kilometer).
- Temporal Scale:
- Daily operation variables time series: 1980–2024 for inflow and release (1,002 reservoirs); 2016–2024 for storage (1,590 reservoirs).
- Operation rules: Derived for the period covered by the time series.
Methodology and Data
- Models used:
- GDROM (Generic Data-Driven Reservoir Operation Models) framework.
- Hidden Markov Decision Tree (HMDT) for release simulation modules.
- Classification and Regression Tree (CART) for module application conditions.
- Data fusion methods for compiling time series.
- Transfer learning strategy for data-limited reservoirs.
- Analogous Reservoir Identification Method (based on operation purpose, geographical distance, storage capacity).
- Data sources:
- Existing datasets: Original GDROM (452 reservoirs), ResOpsUS (679 reservoirs), USACE WM (488 reservoirs).
- Reconstruction data: U.S. Geological Survey (USGS) streamflow gage stations (daily discharge records from 16,902 stations, 1980–2024), SARAH-CONUS (satellite-derived surface area sub-week time series for 1,900 reservoirs, 2016–2023), GRDL (bathymetric profiles for 7,250 global reservoirs), GRanD v1.3 (reservoir attributes, georeferenced polygons), NID database (dam attributes), Lake-TopoCat (global lake drainage topology and catchment database), NHDPlusV2 (high-resolution river network geometry), GRS (global reservoir monthly storage estimates, 1999–2018), PDSI (monthly drought severity index, 1980–2024).
- Validation data: ResOpsUS observed records (excluding USGS-provided records).
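The data sources above pair satellite-derived surface area (SARAH-CONUS) with bathymetric profiles (GRDL), which suggests that storage can be estimated from area through an area-storage relationship. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the bathymetry is available as a monotonic area-storage table and uses plain linear interpolation. The function name, curve values, and units are hypothetical.

```python
import numpy as np

def storage_from_area(area_km2, curve_area_km2, curve_storage_mcm):
    """Map surface area to storage by linear interpolation on an area-storage curve."""
    curve_area = np.asarray(curve_area_km2, dtype=float)
    curve_storage = np.asarray(curve_storage_mcm, dtype=float)
    if np.any(np.diff(curve_area) <= 0):
        raise ValueError("curve areas must be strictly increasing")
    # np.interp clamps areas outside the curve to the curve's end points.
    return np.interp(np.asarray(area_km2, dtype=float), curve_area, curve_storage)

# Hypothetical area-storage curve (km^2 -> million m^3), for illustration only.
curve_area = [0.0, 2.0, 5.0, 10.0]
curve_storage = [0.0, 10.0, 40.0, 120.0]

# Hypothetical satellite-derived surface areas for four dates.
observed_area = [1.0, 5.0, 7.5, 12.0]
storage = storage_from_area(observed_area, curve_area, curve_storage)
```

Because the interpolation clamps at the ends of the curve, areas beyond the surveyed range (12.0 km^2 here) return the largest tabulated storage rather than an extrapolated value. A real workflow would need to decide how to handle such cases.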
Main Results
- GDROM v2 provides daily time series of inflow, release, and storage, and operation rules for 2,017 large reservoirs across the CONUS.
- Reconstructed variable time series (inflow, release, storage) reasonably agree with observations:
- Reconstructed release series demonstrate excellent performance: over 90% of Kling–Gupta Efficiency (KGE) values are greater than 0.75, with Pearson’s Correlation Coefficient (PCC) exceeding 0.9 and Nash–Sutcliffe Efficiency (NSE) values above 0.8.
- Inflow reconstruction shows satisfactory performance: approximately 80% of PCC values are greater than 0.8, and around 60% of KGE and NSE values exceed 0.5.
- Storage reconstruction demonstrates strong consistency in temporal trends (PCC generally exceeding 0.8) but greater uncertainty in reproducing the absolute magnitude of observed storage.
- Derived operation rules (GDROMs) outperform existing benchmark models (HANA and WISS) across diverse data availability conditions (data-rich, data-limited, data-missing).
- GDROMs for Data-Rich Reservoirs (Res-R) perform best, achieving low errors and tight distributions.
- GDROMs for Data-Limited Reservoirs (Res-L), fine-tuned with only two years of data, deliver consistently strong performance and generally outperform benchmark models.
- GDROMs for Data-Missing Reservoirs (Res-M), derived solely from reference reservoirs via transfer learning, achieve performance generally comparable to benchmark models trained on complete time series.
- The GDROM v2 dataset is publicly available on HydroShare (hydroshare.org).
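The results above are reported in terms of PCC, NSE, and KGE. As a reference for how these scores are commonly computed, here is a minimal sketch using the standard definitions. This summary does not state which KGE variant the paper uses, so the 2009 formulation (correlation, variability ratio, and bias ratio) is assumed. The sample series are made up.

```python
import numpy as np

def pcc(obs, sim):
    """Pearson's correlation coefficient between observed and simulated series."""
    return float(np.corrcoef(obs, sim)[0, 1])

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus squared error over observed variance."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def kge(obs, sim):
    """Kling-Gupta Efficiency (2009 form, assumed here)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = pcc(obs, sim)                # correlation component
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

# Made-up daily release values, for illustration only.
obs = np.array([10.0, 12.0, 15.0, 11.0, 9.0])
perfect = obs.copy()
biased = obs * 1.2   # perfectly correlated but 20% too high
```

The `biased` series illustrates why the metrics are reported together: its PCC is 1 because timing is perfect, yet its KGE drops because both the mean and the variability are overestimated. This mirrors the storage result above, where temporal trends agree better than absolute magnitudes.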
Contributions
- Provides GDROM v2, the largest and most comprehensive collection of reservoir operation variables time series and realistic operation rules for 2,017 reservoirs across the CONUS.
- Introduces a data fusion method that compiles daily inflow, storage, and release from multiple sources to overcome data scarcity.
- Develops and applies a transfer learning strategy to derive operation rules for data-limited and data-missing reservoirs, extending data-driven reservoir modeling to reservoirs without complete records.
- Offers a critical resource for improving streamflow simulation in large-scale hydrological and land surface models, enhancing water management studies, and enabling the development of more robust models.
- The dataset and associated scripts are openly shared under a Creative Commons Attribution (CC BY 4.0) license, promoting reproducibility and customization.
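The transfer learning strategy depends on choosing reference reservoirs, which this summary says are identified by operation purpose, geographical distance, and storage capacity. The sketch below shows one way those three criteria could be combined. It is not the paper's method. The filter-then-rank logic, the scoring formula, the `dist_scale_km` parameter, and all reservoir records are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Reservoir:
    name: str
    purpose: str
    lat: float
    lon: float
    capacity_mcm: float  # storage capacity in million m^3

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def find_analog(target, candidates, dist_scale_km=500.0):
    """Pick the same-purpose candidate closest in space and in capacity."""
    same_purpose = [c for c in candidates if c.purpose == target.purpose]
    if not same_purpose:
        return None

    def score(c):
        # Scaled distance plus absolute log capacity ratio (assumed weighting).
        dist = haversine_km(target.lat, target.lon, c.lat, c.lon) / dist_scale_km
        size = abs(math.log(c.capacity_mcm / target.capacity_mcm))
        return dist + size

    return min(same_purpose, key=score)

# Hypothetical reservoirs, for illustration only.
target = Reservoir("target", "flood control", 40.0, -89.0, 100.0)
candidates = [
    Reservoir("A", "irrigation", 40.1, -89.1, 100.0),
    Reservoir("B", "flood control", 41.0, -89.0, 120.0),
    Reservoir("C", "flood control", 45.0, -100.0, 100.0),
]
best = find_analog(target, candidates)
```

In this example reservoir A is the nearest but is excluded for having a different purpose, and B is preferred over C because it is much closer despite a slightly different capacity. The relative weight of distance and capacity is a design choice that this sketch fixes arbitrarily.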
Funding
- Cooperative Institute for Research to Operations in Hydrology (CIROH)
- NOAA Cooperative Institute Program (Award NA22NWS4320003)
Citation
@article{Zheng2025GDROM,
author = {Zheng, Zihan and Cai, Ximing and Zhang, L.-X. and Li, James and Chen, Yanan},
title = {GDROM v2: An Inventory of Operation Variables Time Series and Rules for 2,017 Large Reservoirs across the CONUS},
journal = {Scientific Data},
year = {2025},
doi = {10.1038/s41597-025-06162-7},
url = {https://doi.org/10.1038/s41597-025-06162-7}
}
Original Source: https://doi.org/10.1038/s41597-025-06162-7