Gao et al. (2026) Reconstruction of global long-term daily streamflow dataset using machine learning models for revealing streamflow changes
Identification
- Journal: Journal of Hydrology: Regional Studies
- Year: 2026
- Date: 2026-01-19
- Authors: Yingying Gao, Zengliang Luo, Huan Liu, Lunche Wang, Xi Chen, Huan Li
- DOI: 10.1016/j.ejrh.2026.103148
Research Groups
- State Key Laboratory of Water Cycle and Water Security in River Basin, China Institute of Water Resources and Hydropower Research, Beijing, China
- Hubei Key Laboratory of Regional Ecology and Environmental Change, State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan, China
- State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
- HUN-REN Balaton Limnological Research Institute, Hungary
Short Summary
This study reconstructs a high-precision, long-term global daily streamflow dataset for 314 major watersheds (1980–2020) using an ensemble of machine learning models to address data gaps. The reconstructed data reveal diverse spatio-temporal streamflow trends, including significant increases in African basins and decreases in South America and Australia, and highlight ENSO's regulatory role.
Objective
- To evaluate the performance and limitations of four machine learning models (Random Forest, Gradient Boosting Decision Tree, XGBoost, and LightGBM) in reconstructing global streamflow data through cross-validation.
- To construct a long-term, high-precision daily-scale streamflow dataset for major global river basins by selecting the optimal single model or multi-model fusion result.
- To assess the changing characteristics and evolution trends of streamflow in major global basins, including long-term trends in streamflow and extreme flows, and the modulating effects of large-scale climate factors like ENSO.
Study Configuration
- Spatial Scale: 314 major global watersheds, focusing on streamflow reconstruction at basin outlets.
- Temporal Scale: Daily scale, from 1980 to 2020 (41 years).
Methodology and Data
- Models used: Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM). Multi-model ensemble using Inverse Error Variance Weighting (IEVW).
- Data sources:
  - Streamflow: Global Runoff Data Centre (GRDC) database (station data).
  - Meteorological: Global Surface Summary of the Day (GSOD) for daily precipitation (P, in mm/day), daily mean temperature (T, in °C), and daily mean wind speed (WS, in m/s). The Global Land Evaporation Amsterdam Model (GLEAM) for daily potential evapotranspiration (PET, in mm/day), daily surface soil moisture (SMs, in mm/day), and daily root-zone soil moisture (SMrz, in mm/day).
  - Environmental: GIMMS for Normalized Difference Vegetation Index (NDVI); ETOPO 2022 for Digital Elevation Model (DEM, in m).
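The multi-model fusion step named above, Inverse Error Variance Weighting, can be illustrated with a minimal sketch. It assumes the common formulation in which each model's weight is proportional to the inverse of the variance of its validation residuals; the summary does not give the paper's exact formula, so the function names `ievw_weights` and `ievw_fuse`, the variance floor `eps`, and the toy numbers are all hypothetical.

```python
import numpy as np


def ievw_weights(residuals, eps=1e-12):
    """Return one weight per model, proportional to 1 / error variance.

    residuals: array of shape (n_models, n_samples) holding
    simulated-minus-observed flow on a validation period.
    eps is an assumed floor that avoids division by zero.
    """
    residuals = np.asarray(residuals, dtype=float)
    variances = np.maximum(residuals.var(axis=1, ddof=1), eps)
    inverse = 1.0 / variances
    return inverse / inverse.sum()  # weights sum to 1


def ievw_fuse(predictions, weights):
    """Weighted average of model predictions, shape (n_models, n_samples)."""
    predictions = np.asarray(predictions, dtype=float)
    return np.asarray(weights, dtype=float) @ predictions


# Toy example with two hypothetical models (values are illustrative only).
observed = np.array([10.0, 12.0, 15.0, 11.0, 9.0])
model_a = observed + np.array([1.0, -1.0, 1.0, -1.0, 0.0])
model_b = observed + np.array([-2.0, 2.0, 2.0, -2.0, 0.0])

predictions = np.vstack([model_a, model_b])
weights = ievw_weights(predictions - observed)
fused = ievw_fuse(predictions, weights)
```

In this toy case the model with the smaller residual variance receives the larger weight. In practice the weights would be estimated on held-out data and then applied to the period being reconstructed.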
Main Results
- Model Performance: LightGBM demonstrated the optimal balance of prediction accuracy and computational efficiency, achieving median test set metrics of Spearman's rank correlation coefficient (ρ) = 0.829, Nash-Sutcliffe efficiency (NSE) = 0.666, Kling-Gupta efficiency (KGE) = 0.707, and percent bias (PBIAS) = -0.0378. GBDT achieved the best median KGE (0.7133). Models performed better in larger basins and in humid tropical, continental, and polar climate zones, with weaker performance in arid regions.
- Reconstruction Accuracy: The final reconstructed streamflow dataset, generated by selecting the best-performing single model or multi-model fusion for each basin, achieved high accuracy with median ρ = 0.9169, NSE = 0.8909, KGE = 0.8435, and PBIAS = 0.0001.
- Long-term Streamflow Trends (1980–2020): Globally, 52% of basins showed an upward trend, 37% a downward trend, and 11% no significant trend. African river basins exhibited the highest proportion of significant increases (75%), while South America and Australia showed significant declines in 58% and 59% of their basins, respectively.
- Extreme Streamflow Trends: For maximum 1-day and 3-day cumulative streamflow, 10% and 12% of basins showed increasing trends, respectively, while 22% and 24% showed decreasing trends. For minimum 7-day cumulative streamflow, 23% of basins showed an upward trend and 11% a downward trend. Overall, high-flow extremes tended to weaken globally, while low flows and mean flows increased in more basins than they declined.
- ENSO Influence: The El Niño-Southern Oscillation (ENSO) significantly modulates streamflow variability, particularly in tropical regions. El Niño events are associated with reduced streamflow in the Amazon basin, Southeast Asia, Australia, and southern Africa, and increased streamflow in southeastern North America, southeastern coastal South America, and equatorial East Africa. La Niña events lead to drier conditions in regions like the western United States and central Asia.
- Trend Reliability: A cross-validation with observed data showed an 87% consistency rate in long-term trend classification between reconstructed and observed streamflow for basins with sufficient data.
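For readers unfamiliar with the skill scores quoted in these results, the sketch below computes NSE, KGE, and PBIAS using their standard textbook definitions. Two points are assumptions, since the summary does not state them: KGE is taken in its original form with the Pearson correlation, and PBIAS uses the sign convention in which positive values mean overestimation. The function names and sample values are hypothetical.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed original form with Pearson r)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def pbias(obs, sim):
    """Percent bias; positive means overestimation (assumed convention)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 100.0 * np.sum(sim - obs) / np.sum(obs)


# Illustrative values only: a simulation that is uniformly 10% too high.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = 1.1 * obs
scores = {"NSE": nse(obs, sim), "KGE": kge(obs, sim), "PBIAS": pbias(obs, sim)}
```

Spearman's ρ, the fourth metric reported, is a rank correlation and is available as `scipy.stats.spearmanr` if SciPy is installed.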
Contributions
- Developed a novel, high-precision, long-term (1980–2020) global daily-scale streamflow reconstruction dataset for 314 major river basins, effectively addressing widespread data gaps.
- Systematically evaluated and optimized the application of four machine learning models (RF, GBDT, XGBoost, LightGBM) and a multi-model ensemble strategy for global streamflow reconstruction at basin outlets.
- Provided new hydrological insights into the spatio-temporal evolution patterns of global streamflow and extreme flows, including regional trends and the significant modulating effects of large-scale climate factors like ENSO.
- Established a data foundation for high-resolution hydrological process analysis, water resource management, flood and drought risk assessment, and climate change impact studies.
Funding
- National Natural Science Foundation of China (No. 52394233)
- National Key R&D Program (2021YFC3200203)
- Young Elite Scientists Sponsorship Program by CAST (2023QNRC001)
Citation
@article{Gao2026Reconstruction,
author = {Gao, Yingying and Luo, Zengliang and Liu, Huan and Wang, Lunche and Chen, Xi and Li, Huan},
title = {Reconstruction of global long-term daily streamflow dataset using machine learning models for revealing streamflow changes},
journal = {Journal of Hydrology: Regional Studies},
year = {2026},
doi = {10.1016/j.ejrh.2026.103148},
url = {https://doi.org/10.1016/j.ejrh.2026.103148}
}
Original Source: https://doi.org/10.1016/j.ejrh.2026.103148