Atreya et al. (2026) Towards an improved language for river data analysis: Demonstration for the highly-regulated Ohio River basin
Identification
- Journal: Environmental Modelling & Software
- Year: 2026
- Date: 2026-01-14
- Authors: Gaurav Atreya, Todd Steissberg, Drew C. McAvoy, Xi Chen, Patrick Ray
- DOI: 10.1016/j.envsoft.2026.106866
Research Groups
- Department of Chemical and Environmental Engineering, University of Cincinnati, Cincinnati, OH, United States of America
- U.S. Army Engineer Research and Development Center (ERDC), Davis, CA, United States of America
- Department of Geography, University of Cincinnati, Cincinnati, OH, United States of America
Short Summary
This paper introduces the Network Analysis and Data Integration (NADI) System, a software tool that combines a GIS component with a Domain Specific Language (DSL) for analyzing river data with upstream/downstream relationships. A case study of the Ohio River basin demonstrates NADI's utility for large-scale metadata analysis, revealing that only 35% of USGS streamflow gages in the basin are unaffected by upstream dams, and that these gages account for merely 1.2% of the measured streamflow.
Objective
- To present the Network Analysis and Data Integration (NADI) System for extracting, organizing, analyzing, and visualizing river data with upstream/downstream relationships.
- To demonstrate NADI's capabilities through a case study of the highly-regulated Ohio River basin, focusing on large-scale metadata analysis based on river connections.
Study Configuration
- Spatial Scale: Ohio River basin, United States. The analysis involved 4181 dams and 1806 USGS gages (5987 nodes combined) upstream of USGS station 03399800 (Ohio River at Smithland Lock and Dam).
- Temporal Scale: Historical streamflow records and dam construction years, with data from 1950 up to 2023 for active gages.
Methodology and Data
- Models used:
  - Network Analysis and Data Integration (NADI) System (Free and Open Source Software, FOSS)
  - NADI GIS tool (for network detection and file import/export)
  - NADI Domain Specific Language (DSL) (for network analysis)
  - R-Tree data structure (for storing stream points)
  - Geospatial Data Abstraction Library (GDAL)
- Data sources:
  - USGS National Hydrography Dataset (NHDPlus Streams)
  - USGS Gage Locations (indexed to NHDPlus streamlines)
  - USGS Gage Streamflow (Water Data for the Nation)
  - National Inventory of Dams (NID Dam Locations and Attributes)
Main Results
- The NADI System processed 4181 dams and 1806 USGS gages in the Ohio River basin, demonstrating its suitability for large-scale metadata analysis.
- Network detection connected the 1806 USGS gages with 40 errors and the 5987 combined gages and dams with 469 errors. With a 2.5% tolerance, 8 gage errors and 443 dam errors were identified, often related to basin area discrepancies (e.g., node 6 has a basin area of 3000.0 square kilometers, less than the 3825.4 square kilometers summed over its inputs).
- Metadata analysis revealed that over 46% of the 1242 USGS gages with streamflow records were installed after at least one dam was constructed upstream.
- Only 35% of USGS gages currently active in the Ohio River basin are registering "natural flow" (i.e., flow unaffected by upstream dams).
- These unaffected gages account for only 1.2% of the total measured streamflow in the Ohio River basin, indicating a significant scarcity of natural streamflow data, particularly in larger streams.
- Network-based data filling techniques (input ratio and output ratio methods) generally outperformed linear interpolation, forward fill, and backward fill for streamflow data imputation, especially when the input ratio (r_i) was between 1 and 1.2, or the output ratio (r_o) was greater than 0.6.
- Analysis considering only large dams (> 15 meters height, or 5–15 meters height and > 3 x 10^6 cubic meters storage) still showed that the majority of streamflow data is affected by upstream dams when weighted by long-term mean streamflow.
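The basin area discrepancy reported for node 6 can be illustrated with a minimal Python sketch. This is not NADI code or NADI DSL syntax. The function name, the toy network, and the exact form of the tolerance rule (flag a node when its inputs' summed area exceeds its own area by more than the tolerance) are assumptions made for illustration. Only node 6's area (3000.0) and its inputs' sum (3825.4) come from the summary; how that sum splits between inputs is hypothetical.

```python
from typing import Dict, List, Tuple


def find_area_errors(
    area_km2: Dict[int, float],
    inputs: Dict[int, List[int]],
    tolerance: float = 0.025,
) -> List[Tuple[int, float, float]]:
    """Return (node, own_area, inputs_sum) for nodes that fail the area check."""
    errors = []
    for node, upstream in inputs.items():
        if not upstream:
            continue  # headwater nodes have nothing to compare against
        inputs_sum = sum(area_km2[u] for u in upstream)
        # Assumed rule: a node's basin must be at least as large as the
        # combined basins of its inputs, within a relative tolerance.
        if inputs_sum > area_km2[node] * (1 + tolerance):
            errors.append((node, area_km2[node], inputs_sum))
    return errors


# Hypothetical toy network. Nodes 4 and 5 are invented inputs whose
# areas add up to the 3825.4 km^2 quoted for node 6's inputs.
area_km2 = {4: 2000.0, 5: 1825.4, 6: 3000.0, 7: 3100.0}
inputs = {4: [], 5: [], 6: [4, 5], 7: [6]}

errors = find_area_errors(area_km2, inputs)
for node, own, total in errors:
    print(f"node {node}: area {own:.1f} km^2 < inputs' sum {total:.1f} km^2")
```

In this toy network only node 6 is flagged; node 7 passes because its single input is smaller than its own basin.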
Contributions
- Development and presentation of the NADI System, a novel open-source software framework combining GIS utilities and a Domain Specific Language (DSL) specifically designed for efficient river network analysis, data integration, and visualization.
- NADI provides a concise, declarative syntax that mirrors hydrologic conceptual models, accelerating research cycles and improving reproducibility compared to general-purpose programming languages.
- It offers automated methods for network detection, metadata analysis (e.g., counting upstream dams/gages, determining dam-affected years), and timeseries data imputation using network connectivity, which traditionally require manual inspection.
- The system's extensibility through compiled and executable plugins allows for custom analysis and integration with other tools, enhancing its adaptability to diverse research requirements.
- The case study in the Ohio River basin provides a quantitative assessment of the impact of dams on streamflow gage records, revealing a significant scarcity of natural flow data (only 1.2% of total measured streamflow is unaffected by upstream dams).
- NADI contributes to addressing the challenge of repetitive data organization and error-correction tasks in hydrological modeling, enabling researchers to focus more on higher-level analysis and model development.
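The metadata analysis described above (finding upstream dams, then comparing the share of unaffected gages by count and by streamflow) can be sketched in plain Python. This is a hedged illustration, not the NADI DSL or its plugin API. The node names, dam year, and mean flows are hypothetical, and treating "total measured streamflow" as the sum of long-term mean flows over all gages is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical toy network: each node points to the node immediately downstream.
downstream: Dict[str, Optional[str]] = {"g1": "g3", "d1": "g2", "g2": "g3", "g3": None}
dam_year: Dict[str, int] = {"d1": 1965}  # construction year per dam (invented)
mean_flow: Dict[str, float] = {"g1": 5.0, "g2": 40.0, "g3": 55.0}  # per gage (invented)


def build_inputs(downstream: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert downstream pointers into a node -> direct inputs mapping."""
    inputs: Dict[str, List[str]] = {node: [] for node in downstream}
    for node, out in downstream.items():
        if out is not None:
            inputs[out].append(node)
    return inputs


def upstream_of(node: str, inputs: Dict[str, List[str]]) -> List[str]:
    """Collect every node that drains to `node` (depth-first traversal)."""
    found, stack = [], list(inputs[node])
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(inputs[current])
    return found


def first_upstream_dam_year(
    gage: str, inputs: Dict[str, List[str]], dam_year: Dict[str, int]
) -> Optional[int]:
    """Earliest construction year among upstream dams, or None if there are none."""
    years = [dam_year[n] for n in upstream_of(gage, inputs) if n in dam_year]
    return min(years) if years else None


inputs = build_inputs(downstream)
unaffected = [g for g in mean_flow if first_upstream_dam_year(g, inputs, dam_year) is None]
count_share = len(unaffected) / len(mean_flow)
flow_share = sum(mean_flow[g] for g in unaffected) / sum(mean_flow.values())
print(f"unaffected gages: {count_share:.0%} by count, {flow_share:.0%} by mean flow")
```

In this toy case one of three gages is unaffected (33% by count), yet it carries only 5% of the summed mean flow, the same kind of gap the case study reports between 35% of gages and 1.2% of measured streamflow.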
Funding
- US Department of Defense through the Intelligent Environmental Battlefield Awareness (IEBA) Advanced Applied Technology Program, Hydrology Mapping (PE 622182CX3 and 633042CX7)
- US Army Corps of Engineers (USACE) Engineer Research and Development Center (ERDC) Federal Award Number W912HZ-24-2-0049, entitled Network Analysis and Data Integration (NADI) System.
Citation
@article{Atreya2026Towards,
author = {Atreya, Gaurav and Steissberg, Todd and McAvoy, Drew C. and Chen, Xi and Ray, Patrick},
title = {Towards an improved language for river data analysis: Demonstration for the highly-regulated Ohio River basin},
journal = {Environmental Modelling \& Software},
year = {2026},
doi = {10.1016/j.envsoft.2026.106866},
url = {https://doi.org/10.1016/j.envsoft.2026.106866}
}
Original Source: https://doi.org/10.1016/j.envsoft.2026.106866