Elbeltagi et al. (2025) An interpretable machine learning approach based on SHAP, Sobol and LIME values for precise estimation of daily soybean crop coefficients
Identification
- Journal: Scientific Reports
- Year: 2025
- Date: 2025-10-21
- Authors: Ahmed Elbeltagi, Aman Srivastava, Xinchun Cao, Ali El Bilali, Ali Raza, Leena Khadke, Ali Salem
- DOI: 10.1038/s41598-025-20386-y
Research Groups
- College of Agricultural Science and Engineering, Hohai University, Nanjing, China
- Agricultural Engineering Department, Faculty of Agriculture, Mansoura University, Mansoura, Egypt
- Centre for Technology Alternatives for Rural Areas (CTARA), Indian Institute of Technology (IIT) Bombay, Mumbai, India
- Department of Bioresource Engineering, Faculty of Agricultural & Environmental Sciences, McGill University, Québec, Canada
- Faculty of Sciences and Techniques of Mohammedia, Hassan II University of Casablanca, Casablanca, Morocco
- School of Geography, Nanjing Normal University, Nanjing, P.R. China
- Department of Civil Engineering, Indian Institute of Technology (IIT) Bombay, Mumbai, India
- Civil Engineering Department, Faculty of Engineering, Minia University, Minia, Egypt
- Structural Diagnostics and Analysis Research Group, Faculty of Engineering and Information Technology, University of Pecs, Pecs, Hungary
Short Summary
This study developed and evaluated interpretable machine learning models for estimating daily soybean crop coefficients (Kc) in Upper Egypt. The Extra Tree model achieved the highest accuracy (r = 0.96, NSE = 0.93, RMSE = 0.05, MAE = 0.02), and the interpretability analyses consistently identified antecedent Kc and solar radiation as the most influential variables. The research provides a transparent framework for enhancing irrigation scheduling and sustainable water management in arid regions.
Objective
- To accurately predict daily soybean crop coefficients (Kc) in Upper Egypt using interpretable machine learning models (XGBoost, Extra Tree, Random Forest, CatBoost).
- To compare the ML model predictions with reference Kc values from a locally calibrated CROPWAT model, and to evaluate model interpretability and consistency with physical processes using SHAP, Sobol, and LIME.
Study Configuration
- Spatial Scale: Suhaj Governorate, Upper Egypt.
- Temporal Scale: Daily data from 1979–2014 (36 years). Training period: 1979–2003. Validation period: 2004–2014.
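The chronological split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_by_year` is hypothetical, and the date array assumes a full daily calendar for 1979–2014, whereas the study's records may cover only the soybean growing season.

```python
import numpy as np

def split_by_year(dates, train_end_year=2003):
    """Return boolean masks (train, valid) for a chronological split.

    Days in or before `train_end_year` go to training; later days
    go to validation, so no future data leaks into the fit.
    """
    # datetime64[Y] counts whole years since 1970.
    years = dates.astype("datetime64[Y]").astype(int) + 1970
    train = years <= train_end_year
    return train, ~train

# Assumed calendar: every day from 1979-01-01 to 2014-12-31 inclusive.
dates = np.arange("1979-01-01", "2015-01-01", dtype="datetime64[D]")
train_mask, valid_mask = split_by_year(dates)

n_train = int(train_mask.sum())  # days in 1979-2003
n_valid = int(valid_mask.sum())  # days in 2004-2014
print(n_train, n_valid)
```

Under the full-calendar assumption this yields 9131 training days and 4018 validation days, roughly a 70/30 split by time.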
Methodology and Data
- Models used:
  - Machine Learning: Extreme Gradient Boosting (XGBoost), Extra Tree (ET), Random Forest (RF), CatBoost.
  - Interpretability: SHapley Additive exPlanations (SHAP), Sobol sensitivity analysis, Local Interpretable Model-agnostic Explanations (LIME).
  - Baseline/Reference: FAO-56 CROPWAT model (adjusted for local conditions).
- Data sources:
  - Meteorological data (minimum, maximum, and average temperatures, relative humidity, wind speed, solar radiation): National Centers for Environmental Prediction (NCEP) Climate Forecast System Reanalysis (CFSR).
  - Crop coefficient (Kc) values: Derived from the FAO CROPWAT model and calibrated against lysimeter-based evapotranspiration (ETc) data and local observations.
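As a rough illustration of the modelling setup listed above, the sketch below fits a scikit-learn `ExtraTreesRegressor` with the previous day's Kc as a predictor. Everything in it is an assumption made for illustration: the data are synthetic, the feature set is reduced to three columns, the hyperparameters are defaults, and the impurity-based importances shown are a quick built-in check rather than the SHAP, Sobol, or LIME analyses used in the study.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
n_days = 600

# Synthetic stand-ins for the inputs (NOT the study's data).
solar = rng.uniform(15.0, 30.0, n_days)  # solar radiation
wind = rng.uniform(1.0, 5.0, n_days)     # wind speed
# Smooth season-like Kc curve with a little noise.
kc = 0.4 + 0.6 * np.sin(np.linspace(0.0, np.pi, n_days)) ** 2
kc = kc + rng.normal(0.0, 0.01, n_days)

# Antecedent crop coefficient Kc(d-1): yesterday's Kc predicts today's.
X = np.column_stack([kc[:-1], solar[1:], wind[1:]])
y = kc[1:]

# Chronological split: fit on the first 70% of days, check on the rest.
cut = int(0.7 * len(y))
model = ExtraTreesRegressor(n_estimators=100, random_state=0)
model.fit(X[:cut], y[:cut])
pred = model.predict(X[cut:])

# Impurity-based importances, normalised to sum to 1.
importances = dict(zip(["kc_lag1", "solar", "wind"],
                       model.feature_importances_))
print(importances)
```

Because the synthetic weather columns are pure noise here, the lagged Kc dominates by construction; the example only shows the mechanics of building the lag feature and fitting the model.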
Main Results
- The Extra Tree (ET) model demonstrated the highest predictive accuracy with a correlation coefficient (r) of 0.96, Nash–Sutcliffe model efficiency coefficient (NSE) of 0.93, Root Mean Square Error (RMSE) of 0.05, and Mean Absolute Error (MAE) of 0.02.
- XGBoost and Random Forest (RF) models also performed robustly (r = 0.96, NSE = 0.92, RMSE = 0.06, MAE = 0.02 for both), while CatBoost showed slightly lower accuracy (r = 0.95, NSE = 0.91, RMSE = 0.06, MAE = 0.02).
- SHAP and Sobol sensitivity analyses consistently identified the antecedent crop coefficient [Kc(d-1)] and solar radiation (Sin) as the most influential variables across all models, with Kc(d-1) contributing over 90% to the output variance.
- LIME results provided local interpretability, revealing dynamic crop-climate interactions and identifying critical thresholds for low Kc values, such as wind speed below 2.56 m/s and relative humidity above 24%.
- The interpretability analyses confirmed that the models' predictions align with established physical understanding of crop physiology, where prior soil moisture and growth stage transitions significantly drive Kc dynamics.
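The four accuracy scores reported above follow standard definitions, sketched here with NumPy. The function name `evaluate` and the sample values are illustrative only and are not taken from the paper.

```python
import numpy as np

def evaluate(obs, sim):
    """Return r, NSE, RMSE and MAE for observed vs. simulated values."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    # Pearson correlation coefficient.
    r = np.corrcoef(obs, sim)[0, 1]
    # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    # skill of always predicting the observed mean.
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    return {"r": r, "NSE": nse, "RMSE": rmse, "MAE": mae}

# Made-up Kc values for demonstration.
obs = [0.40, 0.55, 0.80, 1.05, 0.90]
sim = [0.42, 0.53, 0.83, 1.00, 0.92]
scores = evaluate(obs, sim)
perfect = evaluate(obs, obs)
print(scores)
```

Since Kc is dimensionless, RMSE and MAE are dimensionless as well. RMSE penalises large individual errors more heavily than MAE, which is why the two can differ noticeably for the same model.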
Contributions
- This study is the first to integrate SHAP, Sobol sensitivity analysis, and LIME within a unified interpretability framework for crop coefficient (Kc) modeling in arid agroecosystems, enabling robust validation of both predictive accuracy and physical relevance.
- It systematically quantifies feature importance at both global (SHAP, Sobol) and local (LIME) levels, bridging the gap between "black-box" machine learning outputs and agronomic process understanding.
- The research provides a transparent, scalable, and transferable approach for daily Kc estimation, validated against a 36-year dataset, which supports improved irrigation scheduling and sustainable water management in water-scarce regions.
- It offers actionable insights for irrigation optimization and climate adaptation by linking ML predictions to interpretable variables and identifying critical thresholds for effective water management.
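To make the global, variance-based side of this framework concrete, the sketch below estimates first-order Sobol indices with a Saltelli-style Monte Carlo estimator on a toy linear function. It is an assumption-laden illustration: `first_order_sobol` and `toy_model` are hypothetical names, inputs are treated as independent and uniform on [0, 1], and in practice the fitted ML model would take the place of `toy_model`, typically through a dedicated sensitivity-analysis library.

```python
import numpy as np

def first_order_sobol(f, n_inputs, n_samples=50000, seed=0):
    """Estimate first-order Sobol indices for independent U(0, 1) inputs."""
    rng = np.random.default_rng(seed)
    A = rng.random((n_samples, n_inputs))
    B = rng.random((n_samples, n_inputs))
    fA, fB = f(A), f(B)
    both = np.concatenate([fA, fB])
    mean, var = both.mean(), both.var()
    indices = np.empty(n_inputs)
    for i in range(n_inputs):
        AB = A.copy()
        AB[:, i] = B[:, i]  # A with column i taken from B
        # Share of output variance explained by input i alone.
        # Centring fB reduces Monte Carlo noise without adding bias.
        indices[i] = np.mean((fB - mean) * (f(AB) - fA)) / var
    return indices

def toy_model(X):
    # Linear toy function: analytic indices are 0.9, 0.1 and 0.0.
    return 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.0 * X[:, 2]

S = first_order_sobol(toy_model, n_inputs=3)
print(S)
```

A first-order index near 0.9 means that input alone accounts for about 90% of the output variance. Real predictors such as lagged Kc and weather variables are correlated, so the independence assumption here is a simplification.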
Funding
Not explicitly stated in the provided text.
Citation
@article{Elbeltagi2025interpretable,
author = {Elbeltagi, Ahmed and Srivastava, Aman and Cao, Xinchun and El Bilali, Ali and Raza, Ali and Khadke, Leena and Salem, Ali},
title = {An interpretable machine learning approach based on SHAP, Sobol and LIME values for precise estimation of daily soybean crop coefficients},
journal = {Scientific Reports},
year = {2025},
doi = {10.1038/s41598-025-20386-y},
url = {https://doi.org/10.1038/s41598-025-20386-y}
}
Original Source: https://doi.org/10.1038/s41598-025-20386-y