Zha et al. (2026) Machine learning-based precipitation dataset for the Yarlung Zangbo River Basin: Generation, evaluation, and environmental factor analysis
Identification
- Journal: Journal of Hydrology: Regional Studies
- Year: 2026
- Date: 2026-04-03
- Authors: Hang Zha, F. Zhang, Xiaonan Shi, Yuxuan Xiang, Xuelong Chen, H. Zhang, Yang Zhao, Jun Zhu
- DOI: 10.1016/j.ejrh.2026.103387
Research Groups
- State Key Laboratory of Tibetan Plateau Earth System, Environment and Resources (TPESER), Institute of Tibetan Plateau Research, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Mountain Hazards and Engineering Resilience, Key Laboratory of Mountain Hazards and Earth Surface Process, Institute of Mountain Hazards and Environment, Chinese Academy of Sciences, Chengdu, China
- State Key Laboratory of Efficient Utilization of Agricultural Water Resources, China Agricultural University, Beijing, China
- Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, China
Short Summary
This study developed a machine learning framework that merges multiple precipitation products with environmental variables to generate a merged precipitation dataset (MMPD) for the Yarlung Zangbo River Basin. The MMPD improved accuracy over the individual input products at daily, monthly, and annual timescales, and a SHAP-based interpretability analysis identified the key environmental factors influencing precipitation intensity together with their nonlinear thresholds.
Objective
- To develop a machine learning-based framework that fuses multiple precipitation products and environmental variables to improve precipitation data accuracy in the Yarlung Zangbo River Basin (YZRB).
- To evaluate the newly generated precipitation product (MMPD) against existing products in terms of accuracy at multiple timescales and representation of extreme precipitation.
- To analyze the impact of environmental variables on precipitation intensity using the SHAP method, identifying key variables and their nonlinear threshold effects.
Study Configuration
- Spatial Scale: Yarlung Zangbo River Basin (YZRB), Southern Tibetan Plateau, covering approximately 240,000 km². All data were aggregated or interpolated to a uniform 0.25° grid.
- Temporal Scale: 1983 to 2020 (38 years) for model training and evaluation. Performance was assessed at daily, monthly, and annual scales.
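The summary says only that inputs were "aggregated or interpolated" to the common 0.25° grid, without naming the method. As one plausible reading for a product finer than the target grid, the sketch below averages non-overlapping blocks of cells (for example, 5 × 5 cells of a 0.05° field per 0.25° cell). The function name `block_mean`, the NaN handling, and the toy field are assumptions made for illustration, not the authors' procedure.

```python
import math


def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid.

    Missing cells (NaN) are skipped; a block with no valid cell yields NaN.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            # Collect the valid fine-resolution cells inside this block.
            cells = [
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
                if not math.isnan(grid[r][c])
            ]
            row.append(sum(cells) / len(cells) if cells else math.nan)
        coarse.append(row)
    return coarse


# Toy 10 x 10 field (hypothetical values) reduced to 2 x 2 with factor 5.
fine = [[float((r // 5) * 2 + c // 5) for c in range(10)] for r in range(10)]
coarse = block_mean(fine, 5)
```

Products coarser than or equal to the target grid would need interpolation or no change at all; that case is not covered here.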
Methodology and Data
- Models used:
  - Machine Learning for fusion: Random Forest (RF), Gradient Boosting Decision Trees (GBDT), and eXtreme Gradient Boosting (XGBoost) were evaluated for both precipitation classification (wet/dry days) and regression (precipitation intensity). RF was selected for classification, and GBDT for regression.
  - Interpretability: Shapley Additive Explanations (SHAP) framework was used to quantify the influence of environmental factors.
- Data sources:
  - Gauge Observations: Daily rainfall records from 22 rain gauges (15 from China Meteorological Administration (CMA) and 7 from the Yarlung Zangbo Grand Canyon (YZGC) network).
  - Multiple Precipitation Products (MPPs):
    - CN05.1 (Station interpolation product, 0.25° spatial resolution)
    - ERA5-Land (Reanalysis product, 0.1° spatial resolution)
    - TPMFD (Reanalysis product, 1/30° spatial resolution)
    - CHIRPS (Remote sensing satellite product, 0.05° spatial resolution)
  - Environmental Variables (Covariates):
    - Digital Elevation Model (DEM) (90 m resolution, from Shuttle Radar Topography Mission)
    - Longitude, Latitude
    - Slope (extracted from DEM)
    - Wind speed (from CN05.1, 0.25° spatial resolution, daily)
    - Relative humidity (from CN05.1, 0.25° spatial resolution, daily)
    - Cloud cover (from ERA5, 0.25° spatial resolution, hourly)
    - Soil moisture (from ERA5-Land, 0.1° spatial resolution, hourly)
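The classification-then-regression design can be read as a gate: a wet/dry classifier decides whether any precipitation is reported, and the intensity regressor is consulted only for days classified as wet. The sketch below shows that control flow only. The two stand-in functions, the 1.0 mm wet-day cutoff, and the feature keys are hypothetical placeholders, not the trained RF and GBDT models or the study's settings.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]

# Hypothetical feature keys for the four input products (mm/day).
PRODUCTS = ("cn05", "era5land", "tpmfd", "chirps")


def merge_two_stage(
    samples: List[Features],
    is_wet: Callable[[Features], bool],
    intensity: Callable[[Features], float],
) -> List[float]:
    """Return merged precipitation: 0 for dry days, regressed intensity otherwise."""
    merged = []
    for x in samples:
        if is_wet(x):
            # Clip at zero in case the regressor returns a negative value.
            merged.append(max(0.0, intensity(x)))
        else:
            merged.append(0.0)
    return merged


def toy_is_wet(x: Features) -> bool:
    # Stand-in classifier: wet if at least two products report >= 1.0 mm (assumed cutoff).
    return sum(x[p] >= 1.0 for p in PRODUCTS) >= 2


def toy_intensity(x: Features) -> float:
    # Stand-in regressor: plain average of the products.
    return sum(x[p] for p in PRODUCTS) / len(PRODUCTS)


samples = [
    {"cn05": 0.0, "era5land": 2.0, "tpmfd": 0.0, "chirps": 0.0},  # only one product wet
    {"cn05": 4.0, "era5land": 8.0, "tpmfd": 6.0, "chirps": 2.0},  # all products wet
]
merged = merge_two_stage(samples, toy_is_wet, toy_intensity)
```

In a real pipeline the two callables would wrap fitted models that also take the environmental covariates as inputs; separating the stages keeps the many dry days from pulling the intensity regression toward zero.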
Main Results
- The Multiple-Product Merged Precipitation Dataset (MMPD) significantly enhanced precipitation accuracy compared to individual products across all timescales. Relative improvements in Modified Kling-Gupta Efficiency (KGE) were 60% at the daily scale (compared to CN05.1), 23% at the monthly scale (compared to TPMFD), and 11% at the annual scale (compared to TPMFD).
- MMPD also demonstrated superior performance in capturing extreme precipitation events across nine established indices (PRCPTOT, SDII, RX1day, RX5day, R95pTOT, R99pTOT, R10, CWD, CDD), showing ratios to observed values close to 1 and lower spatial dispersion.
- SHAP analysis identified longitude, relative humidity, and cloud cover as the most influential environmental variables in predicting precipitation intensity, followed by latitude, soil moisture, slope, elevation, and wind speed.
- Key nonlinear thresholds associated with enhanced precipitation intensity were statistically inferred: longitude greater than 94.12°, relative humidity less than 71.34%, cloud cover less than 0.84, latitude greater than 29.62°, soil moisture less than 0.37 m³/m³, slope greater than 25.08°, elevation less than 4350.00 m, and wind speed less than 3.08 m/s.
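The reported gains are relative changes in the modified KGE. The summary does not write out the metric, so the sketch below assumes the commonly used modified form, KGE' = 1 - sqrt((r - 1)² + (β - 1)² + (γ - 1)²), where r is the linear correlation, β the ratio of means, and γ the ratio of coefficients of variation (simulated over observed). The function names and the example scores are hypothetical; the example scores are not values from the paper.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values, mu):
    # Population standard deviation.
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def modified_kge(sim, obs):
    """Modified Kling-Gupta efficiency of simulated vs. observed series."""
    mu_s, mu_o = _mean(sim), _mean(obs)
    sd_s, sd_o = _std(sim, mu_s), _std(obs, mu_o)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(obs)
    r = cov / (sd_s * sd_o)                 # correlation component
    beta = mu_s / mu_o                      # bias ratio
    gamma = (sd_s / mu_s) / (sd_o / mu_o)   # variability ratio (CV based)
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


def relative_improvement(new_score, base_score):
    """Percentage change of new_score relative to base_score."""
    return 100.0 * (new_score - base_score) / abs(base_score)


obs = [0.0, 1.5, 4.0, 0.5, 9.0]
perfect = modified_kge(obs, obs)                     # identical series
doubled = modified_kge([2 * v for v in obs], obs)    # bias ratio of 2, same r and CV

# Hypothetical scores: moving from 0.50 to 0.80 is a 60% relative gain.
gain = relative_improvement(0.80, 0.50)
```

Because the improvement is relative, the same percentage can correspond to very different absolute changes depending on how low the baseline score is.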
Contributions
- Developed an innovative interpretable machine learning framework that integrates geospatial feature engineering for fusing multiple precipitation datasets in complex, high-altitude terrain.
- Generated a high-accuracy precipitation dataset (MMPD) specifically for the Yarlung Zangbo River Basin, addressing limitations of existing products in this data-scarce region.
- Applied the SHAP method to quantitatively uncover and explain the nonlinear threshold effects of various environmental variables on precipitation intensity, providing mechanistic insights into regional precipitation patterns.
- Offered an interpretable paradigm for machine learning-driven meteorological attribution, providing practical and theoretical guidance for developing high-precision precipitation products in mountainous river basins globally.
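SHAP attributes a model prediction to its input features using Shapley values. For a handful of features these can be computed exactly by averaging each feature's marginal contribution over every ordering in which features are switched from a baseline to their actual values. The sketch below does this for a toy model; the model, its threshold at 70, the feature keys, and all values are invented for illustration and do not reproduce the study's fitted model or thresholds. Practical SHAP tooling uses faster approximations or tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import permutations


def exact_shapley(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features not yet "switched on" in an ordering keep their baseline value.
    """
    names = list(x)
    phi = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = x[name]
            new = model(current)
            phi[name] += new - prev  # marginal contribution in this ordering
            prev = new
    return {name: total / len(orders) for name, total in phi.items()}


def toy_model(f):
    # Invented response: a humidity switch that interacts with cloud cover,
    # plus a linear longitude term.
    humid = 1.0 if f["rh"] > 70.0 else 0.0
    return 2.0 * humid * f["cloud"] + 0.5 * (f["lon"] - 90.0)


x = {"lon": 94.0, "rh": 80.0, "cloud": 0.5}
baseline = {"lon": 90.0, "rh": 60.0, "cloud": 0.0}
phi = exact_shapley(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction at `x` and at `baseline`, and the interaction term is split evenly between the two features that jointly produce it. Plotting such attributions against feature values is what makes threshold-like behavior visible.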
Funding
- National Natural Science Foundation of China (Grant Nos. 42125104 and 42430506).
Citation
@article{Zha2026Machine,
author = {Zha, Hang and Zhang, F. and Shi, Xiaonan and Xiang, Yuxuan and Chen, Xuelong and Zhang, H. and Zhao, Yang and Zhu, Jun},
title = {Machine learning-based precipitation dataset for the Yarlung Zangbo River Basin: Generation, evaluation, and environmental factor analysis},
journal = {Journal of Hydrology: Regional Studies},
year = {2026},
doi = {10.1016/j.ejrh.2026.103387},
url = {https://doi.org/10.1016/j.ejrh.2026.103387}
}
Original Source: https://doi.org/10.1016/j.ejrh.2026.103387