Guan et al. (2026) Interpretable machine learning workflow for estimating reference crop evapotranspiration in China's five major dry-wet regions using limited meteorological data
Identification
- Journal: Agricultural Water Management
- Year: 2026
- Date: 2026-01-15
- Authors: Ziyu Guan, Changhai Qin, Yong Zhao, Junlin Qu, Rong Liu, Yuan Liu, Wenxin Che, Tao Wang
- DOI: 10.1016/j.agwat.2026.110143
Research Groups
- State Key Laboratory of Water Cycle and Water Security, Beijing 100038, China
- China Institute of Water Resources and Hydropower Research, Beijing 100038, China
Short Summary
This study developed an interpretable machine learning workflow to estimate reference crop evapotranspiration (ET0) accurately and transparently across China's diverse dry-wet regions using limited meteorological data. The optimized XGBoost model (GWO-XGB) achieved the highest accuracy and robust generalization, and SHAP analysis identified solar radiation and extreme temperatures as the primary ET0 predictors while revealing how their influence differs between climate zones.
Objective
- To construct machine learning models that combine feature selection with meta-heuristic optimization algorithms to achieve precise ET0 estimation across China's climate zones.
- To evaluate the ET0 estimation performance of the optimized interpretable machine learning models under limited meteorological data conditions.
- To use local interpretability analysis to identify the key climatic variables driving ET0 variations across different climate zones.
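The first objective relies on meta-heuristic optimization. The sketch that follows is a minimal Grey Wolf Optimizer, assuming the standard alpha/beta/delta position update with a coefficient `a` that decreases linearly from 2 to 0; the GWO settings, search bounds and the XGBoost hyperparameters tuned in the paper are not stated in this summary. In GWO-XGB the objective would presumably be a validation error of XGBoost as a function of its hyperparameters; here a toy quadratic with a known minimum stands in so the sketch runs without XGBoost. The function name `grey_wolf_optimizer` and its parameters are hypothetical.

```python
import numpy as np


def grey_wolf_optimizer(objective, lower, upper, n_wolves=20, n_iter=200, seed=0):
    """Minimise `objective` over the box [lower, upper] with a basic Grey Wolf Optimizer.

    Hypothetical helper: illustrates the GWO update rule, not the paper's implementation.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Random initial pack inside the search box
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    fitness = np.array([objective(w) for w in wolves])
    # Alpha, beta and delta: the three best positions seen so far
    order = np.argsort(fitness)[:3]
    leaders, leader_fit = wolves[order].copy(), fitness[order].copy()

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly: exploration -> exploitation
        for i in range(n_wolves):
            estimates = []
            for leader in leaders:
                A = 2.0 * a * rng.random(dim) - a
                C = 2.0 * rng.random(dim)
                D = np.abs(C * leader - wolves[i])
                estimates.append(leader - A * D)
            # New position: average of the moves suggested by the three leaders
            wolves[i] = np.clip(np.mean(estimates, axis=0), lower, upper)
        fitness = np.array([objective(w) for w in wolves])

        # Keep the three best positions among the previous leaders and the current pack
        pool = np.vstack([leaders, wolves])
        pool_fit = np.concatenate([leader_fit, fitness])
        order = np.argsort(pool_fit)[:3]
        leaders, leader_fit = pool[order].copy(), pool_fit[order].copy()

    return leaders[0], leader_fit[0]


# Toy stand-in for a validation-error surface with its minimum at (3, -1)
best, score = grey_wolf_optimizer(lambda x: (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2, [-5, -5], [5, 5])
print(best, score)
```

Tracking the leaders across iterations, rather than only within the current pack, means the returned solution can never get worse from one iteration to the next.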
Study Configuration
- Spatial Scale: 2382 meteorological stations across five major dry-wet climatic zones in China, covering approximately 9.6 million square kilometers.
- Temporal Scale: Daily meteorological data spanning 63 years, from January 1960 to January 2022.
Methodology and Data
- Models used:
  - Machine Learning (ML) algorithms: XGBoost, Random Forest (RF), Decision Tree (DT), LightGBM.
  - Meta-heuristic optimization algorithms: Grey Wolf Optimizer (GWO), Particle Swarm Optimization (PSO), Artificial Rabbits Optimization (ARO), RUNge Kutta optimizer (RUN).
  - Deep Learning models for comparison: Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), Bidirectional LSTM (BiLSTM), Bidirectional GRU (BiGRU).
  - Empirical ET0 models for comparison: 12 traditional empirical models (including FAO-56 Penman-Monteith).
  - Explainable AI (XAI) method: SHapley Additive exPlanations (SHAP).
  - Feature selection: Light Gradient Boosting Machine (LGBM) regressor.
- Data sources:
  - Daily meteorological data from 2382 stations across China, obtained from the National Meteorological Center (http://data.cma.cn).
  - Nine predictive variables: precipitation (PRE), atmospheric pressure (PRU), mean air temperature (TMEAN), maximum temperature (Tmax), minimum temperature (Tmin), wind speed (WIN), relative humidity (RHU), sunshine duration (SSD), and solar radiation (RA).
  - Target variable: Reference crop evapotranspiration (ET0) calculated using the Penman-Monteith (PM) equation.
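The target ET0 comes from the Penman-Monteith equation. A minimal sketch of the daily FAO-56 form is shown next, assuming net radiation `rn` is already available (deriving it from sunshine duration or solar radiation needs further FAO-56 steps not covered here), soil heat flux `g` is about zero at the daily scale, actual vapour pressure is approximated from mean relative humidity, and pressure is in kPa (station pressure reported in hPa would need dividing by 10). The function names are hypothetical; the exact implementation used in the paper is not described in this summary.

```python
import math


def saturation_vapour_pressure(t_c):
    """FAO-56 saturation vapour pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def wind_at_2m(u_z, z_m):
    """FAO-56 log-profile conversion of wind speed measured at height z_m (m) to 2 m."""
    return u_z * 4.87 / math.log(67.8 * z_m - 5.42)


def fao56_pm_et0(tmax, tmin, rh_mean, u2, rn, pressure_kpa, g=0.0):
    """Daily FAO-56 Penman-Monteith reference ET0 (mm/day).

    tmax, tmin   : daily max/min air temperature (deg C)
    rh_mean      : daily mean relative humidity (%)
    u2           : wind speed at 2 m (m/s)
    rn           : net radiation at the crop surface (MJ m-2 d-1), assumed given
    pressure_kpa : atmospheric pressure (kPa)
    g            : soil heat flux density (MJ m-2 d-1), ~0 for daily steps
    """
    tmean = (tmax + tmin) / 2.0
    es = (saturation_vapour_pressure(tmax) + saturation_vapour_pressure(tmin)) / 2.0
    ea = rh_mean / 100.0 * es  # mean-RH approximation of actual vapour pressure
    delta = 4098.0 * saturation_vapour_pressure(tmean) / (tmean + 237.3) ** 2
    gamma = 0.665e-3 * pressure_kpa  # psychrometric constant (kPa/deg C)
    num = 0.408 * delta * (rn - g) + gamma * 900.0 / (tmean + 273.0) * u2 * (es - ea)
    den = delta + gamma * (1.0 + 0.34 * u2)
    return num / den


# Illustrative mid-latitude summer day; wind measured at 10 m (10 km/h)
u2 = wind_at_2m(10.0 / 3.6, 10.0)
print(round(fao56_pm_et0(21.5, 12.3, 70.5, u2, rn=13.28, pressure_kpa=100.1), 2))
```

The radiation term (weighted by `delta`) and the aerodynamic term (driven by wind and vapour pressure deficit) enter separately, which is why radiation, temperature, humidity and wind all appear among the model inputs.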
Main Results
- The XGBoost model optimized with the Grey Wolf Optimizer (GWO-XGB) achieved the highest fitting accuracy, with a Mean Absolute Error (MAE) of 0.087 mm, Root Mean Square Error (RMSE) of 0.116 mm, Nash-Sutcliffe Efficiency (NSE) of 0.993, Coefficient of Determination (R2) of 0.993, and Global Performance Index (GPI) of 1.783.
- Cross-validation across basins demonstrated that GWO-XGB maintained an R2 above 0.96 on independent validation datasets, indicating robust stability and generalization.
- SHAP analysis revealed that solar radiation (RA) and maximum temperature (Tmax) were the primary predictors of ET0, while relative humidity (RHU) and wind speed (WIN) had smaller influences.
- The dominant driving factor of ET0 exhibited regional differentiation: maximum temperature (Tmax) dominated in arid (40.6% contribution), semi-humid (39.4%), and humid-semi-humid transition zones (29.5%), whereas solar radiation (RA) showed stronger explanatory power in humid (34.2%) and semi-arid zones (26.5%).
- SHAP values revealed critical threshold behaviour, including a strong positive linear relationship between temperature extremes and ET0 in arid and semi-arid regions (R2 > 0.95) and a consistent negative (inhibitory) effect of relative humidity across all climate zones (SHAP values from -1.5 to -0.4).
- Interaction effects showed a nonlinear synergistic enhancement between solar radiation and maximum temperature under high energy input conditions (RA > 30 MJ⋅m⁻²⋅d⁻¹, Tmax > 30 °C).
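The accuracy figures above use standard goodness-of-fit metrics. A minimal sketch of four of them follows, assuming R2 is the squared Pearson correlation between observed (PM) and estimated ET0, a common convention in hydrology that this summary does not confirm. GPI is omitted because its normalization is not specified here. The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(obs, sim):
    """MAE, RMSE, NSE and R2 (squared Pearson r) for observed vs simulated ET0."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # NSE compares the squared error with the variance of the observations
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    # R2 here measures linear association only, so it ignores bias and scale errors
    r2 = np.corrcoef(obs, sim)[0, 1] ** 2
    return {"MAE": mae, "RMSE": rmse, "NSE": nse, "R2": r2}


print(evaluate([2.0, 3.5, 4.1, 5.0], [2.1, 3.4, 4.3, 4.9]))
# A perfectly correlated but doubled estimate: R2 stays 1 while NSE collapses
print(evaluate([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

Reporting NSE alongside R2 matters because a systematically biased model can still reach R2 = 1; matching values (both 0.993 for GWO-XGB) indicate little bias.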
Contributions
- Developed an interpretable machine learning workflow that enhances the transparency of ET0 prediction, addressing the "black box" nature of traditional ML models.
- Demonstrated the superior accuracy and robust generalization capability of the GWO-XGB model for ET0 estimation across China's diverse climatic zones using limited meteorological data.
- Provided a systematic interpretability analysis using SHAP values, quantifying the relative contributions of meteorological factors and revealing their nonlinear interactions and regional differentiation in driving ET0.
- Identified climate-specific dominant drivers and threshold responses of ET0, offering valuable insights for scientific water resource management and precision irrigation in data-scarce regions.
- Released an open-source prediction application to facilitate the practical use and wider adoption of interpretable AI in agricultural hydrology and meteorology.
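The SHAP-based contributions rest on Shapley values. For tree ensembles such as XGBoost, SHAP tooling usually computes them with a tree-specific algorithm; the sketch that follows instead enumerates every feature coalition exactly, which is only feasible for a handful of features. It assumes a single background (baseline) point for absent features, a hypothetical three-feature surrogate `toy_et0_model` that is not an ET0 model from the paper, and that "percentage contribution" means each feature's share of mean |SHAP|; the summary does not state how the percentages were computed.

```python
import itertools
import math

import numpy as np


def toy_et0_model(x):
    """Hypothetical surrogate with an RA x Tmax interaction; x = [RA, Tmax, RHU]."""
    ra, tmax, rhu = x
    return 0.1 * ra + 0.05 * tmax - 0.02 * rhu + 0.002 * ra * tmax


def exact_shapley(model, x, baseline):
    """Exact Shapley values: features outside a coalition take their baseline values."""
    n = len(x)
    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                z = np.array(baseline, dtype=float)
                for k in subset:
                    z[k] = x[k]
                without_j = model(z)
                z[j] = x[j]
                with_j = model(z)
                phi[j] += weight * (with_j - without_j)
    return phi


def relative_contributions(phi_matrix):
    """Percentage share of mean |SHAP| per feature (assumed definition)."""
    mean_abs = np.mean(np.abs(phi_matrix), axis=0)
    return 100.0 * mean_abs / mean_abs.sum()


rng = np.random.default_rng(42)
samples = np.column_stack([rng.uniform(5, 35, 200), rng.uniform(0, 40, 200), rng.uniform(20, 95, 200)])
baseline = samples.mean(axis=0)
phi = np.array([exact_shapley(toy_et0_model, x, baseline) for x in samples])
for name, share in zip(["RA", "Tmax", "RHU"], relative_contributions(phi)):
    print(f"{name}: {share:.1f}%")
```

Each row of `phi` sums to the model output minus the baseline output, and the interaction term is split evenly between RA and Tmax, which is how SHAP attributes synergistic effects such as the RA and Tmax interaction reported above.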
Funding
- The National Natural Science Foundation of China (Grant Nos. 52239004, 52025093).
Citation
@article{Guan2026Interpretable,
author = {Guan, Ziyu and Qin, Changhai and Zhao, Yong and Qu, Junlin and Liu, Rong and Liu, Yuan and Che, Wenxin and Wang, Tao},
title = {Interpretable machine learning workflow for estimating reference crop evapotranspiration in China's five major dry-wet regions using limited meteorological data},
journal = {Agricultural Water Management},
year = {2026},
doi = {10.1016/j.agwat.2026.110143},
url = {https://doi.org/10.1016/j.agwat.2026.110143}
}
Original Source: https://doi.org/10.1016/j.agwat.2026.110143