Al-Taher et al. (2025) Optimizing cotton green water footprint prediction using hybrid machine learning algorithms: a case study of Al-Gezira state, Sudan
Identification
- Journal: Applied Water Science
- Year: 2025
- Date: 2025-11-14
- Authors: Rogaia H. Al-Taher, Mohamed E. Abuarab, A. Ahmed, Sarah A. Helalia, Elbashir A. Hammad, Ali Mokhtar
- DOI: 10.1007/s13201-025-02656-2
Research Groups
- Department of Agricultural Engineering, Faculty of Agriculture, Cairo University, Giza 12613, Egypt
- Department of Natural Resources, Faculty of African Postgraduate Studies, Cairo University, Giza 12613, Egypt
- Department of Agricultural Engineering, Faculty of Agriculture, University of Khartoum, Khartoum, Sudan
- School of Geographic Sciences, Key Laboratory of Geographic Information Science (Ministry of Education), East China Normal University, Shanghai, China
Short Summary
This study optimizes cotton green water footprint (GWFP) prediction in Al-Gezira State, Sudan, by comparing single machine learning models (RF, XGBoost, SVR) with their hybrid combinations, using climatic and remote sensing data from 2001–2020. The hybrid models significantly outperform the single models in accuracy and error reduction.
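For context on the target variable, a green water footprint is conventionally computed under the Water Footprint Assessment Manual approach. Green evapotranspiration (ET_green) is the part of crop evapotranspiration met by effective rainfall. The green crop water use is CWU_green = 10 × ΣET_green, where the factor 10 converts mm to m³ ha⁻¹. Finally, GWFP = CWU_green / Y, with yield Y in t ha⁻¹, which gives m³ t⁻¹. This summary does not state the paper's exact equations. The formulation, the USDA SCS effective-rainfall rule, and the function names below are therefore assumptions; treat this as a minimal sketch rather than the authors' method.

```python
def effective_precip_usda_scs(p_mm: float) -> float:
    """Monthly effective precipitation (mm), USDA SCS approximation (assumed)."""
    if p_mm <= 250.0:
        return p_mm * (125.0 - 0.2 * p_mm) / 125.0
    return 125.0 + 0.1 * p_mm


def green_water_footprint(
    etc_mm: list[float], precip_mm: list[float], yield_t_ha: float
) -> float:
    """Return the green water footprint (m3 per tonne) for one season.

    etc_mm: monthly crop evapotranspiration ETc (mm) over the season.
    precip_mm: monthly total precipitation (mm) for the same months.
    yield_t_ha: crop yield (t/ha).
    """
    if len(etc_mm) != len(precip_mm):
        raise ValueError("etc_mm and precip_mm must cover the same months")
    if yield_t_ha <= 0:
        raise ValueError("yield_t_ha must be positive")
    # Green ET: the part of each month's crop water demand met by effective rain.
    et_green = [
        min(etc, effective_precip_usda_scs(p)) for etc, p in zip(etc_mm, precip_mm)
    ]
    # 1 mm of water depth over 1 ha equals 10 m3.
    cwu_green_m3_ha = 10.0 * sum(et_green)
    return cwu_green_m3_ha / yield_t_ha


# Usage with made-up numbers (not data from the study):
print(green_water_footprint([100.0, 100.0], [100.0, 0.0], 2.0))  # 420.0
```

Capping green ET at effective rainfall reflects the idea that any unmet crop demand would have to come from irrigation, which counts toward the blue water footprint instead.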
Objective
- Assess the response of cotton green water footprint (GWFP) to climate change over a time series spanning 2001–2020.
- Utilize four remote sensing indices to examine their influence on cotton GWFP throughout the same time series.
- Develop and compare three single machine learning models (SVR, RF, and XGB) along with their hybrid combinations for cotton GWFP prediction.
- Identify the optimal model within the best scenario that achieves a high level of accuracy and minimal error in predicting the GWFP of cotton.
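The objectives do not say how the hybrid combinations are formed. One simple option is equal-weight averaging of the base models' predictions, which is what scikit-learn's `VotingRegressor` does. The sketch below assumes that scheme; stacking or weighted blending would be equally plausible. It uses toy stand-in regressors so that it runs with the standard library alone. In practice, scikit-learn's `RandomForestRegressor` and `SVR` and the `xgboost` package's `XGBRegressor` all expose `fit(X, y)` and `predict(X)`, so they could be passed in directly. `AveragingHybrid`, `MeanRegressor`, and `FirstFeatureLinear` are hypothetical names.

```python
from collections.abc import Sequence
from statistics import fmean
from typing import Protocol

Matrix = Sequence[Sequence[float]]


class Regressor(Protocol):
    """Any model with scikit-learn-style fit/predict methods."""

    def fit(self, X: Matrix, y: Sequence[float]) -> "Regressor": ...

    def predict(self, X: Matrix) -> Sequence[float]: ...


class AveragingHybrid:
    """Hypothetical hybrid: equal-weight mean of the base models' predictions."""

    def __init__(self, models: Sequence[Regressor]) -> None:
        if not models:
            raise ValueError("need at least one base model")
        self.models = list(models)

    def fit(self, X: Matrix, y: Sequence[float]) -> "AveragingHybrid":
        # Each base model is trained independently on the same inputs.
        for model in self.models:
            model.fit(X, y)
        return self

    def predict(self, X: Matrix) -> list[float]:
        per_model = [list(model.predict(X)) for model in self.models]
        # zip(*per_model) groups the models' predictions for each sample.
        return [fmean(preds) for preds in zip(*per_model)]


class MeanRegressor:
    """Toy stand-in model: always predicts the training mean."""

    def fit(self, X: Matrix, y: Sequence[float]) -> "MeanRegressor":
        self.mean_ = fmean(y)
        return self

    def predict(self, X: Matrix) -> list[float]:
        return [self.mean_ for _ in X]


class FirstFeatureLinear:
    """Toy stand-in model: least-squares line on the first feature only."""

    def fit(self, X: Matrix, y: Sequence[float]) -> "FirstFeatureLinear":
        x = [row[0] for row in X]
        mx, my = fmean(x), fmean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        self.slope_ = sxy / sxx
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, X: Matrix) -> list[float]:
        return [self.intercept_ + self.slope_ * row[0] for row in X]


X_train = [[1.0], [2.0], [3.0]]
y_train = [2.0, 4.0, 6.0]
hybrid = AveragingHybrid([MeanRegressor(), FirstFeatureLinear()]).fit(X_train, y_train)
print(hybrid.predict([[4.0]]))  # [6.0]
```

A triple hybrid such as RF-XGB-SVR would simply pass three base models to the same wrapper.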
Study Configuration
- Spatial Scale: Al-Gezira State, Sudan, located between the Blue Nile and White Nile, south of Khartoum. Coordinates range from 13°30′ N to 15°30′ N latitude and 32°30′ E to 33°40′ E longitude, at an elevation of 406.65 m above sea level.
- Temporal Scale: 2001 to 2020 (20 years), with monthly climatic data and remote sensing data focused on the growing season (April to November).
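The following sketch shows one way to restrict monthly inputs to the April–November growing season and collapse them to one feature row per year. The record layout (keys `year`, `month`, `precip_mm`, `tmax_c`) is an assumption, as is the choice to sum precipitation but average Tmax. The summary does not describe the paper's aggregation.

```python
from collections import defaultdict
from statistics import fmean

# Growing season from the study summary: April (4) through November (11).
SEASON_MONTHS = range(4, 12)


def seasonal_features(records):
    """Collapse monthly records into one feature dict per year.

    records: iterable of dicts with hypothetical keys
             'year', 'month', 'precip_mm', 'tmax_c'.
    Precipitation is summed and Tmax averaged over season months only.
    """
    by_year = defaultdict(list)
    for rec in records:
        if rec["month"] in SEASON_MONTHS:
            by_year[rec["year"]].append(rec)
    return {
        year: {
            "precip_mm": sum(r["precip_mm"] for r in recs),
            "tmax_c": fmean(r["tmax_c"] for r in recs),
        }
        for year, recs in sorted(by_year.items())
    }


# Usage with synthetic values for one year (not data from the study).
demo = [
    {"year": 2001, "month": m, "precip_mm": float(m), "tmax_c": 30.0 + m}
    for m in range(1, 13)
]
print(seasonal_features(demo))  # {2001: {'precip_mm': 60.0, 'tmax_c': 37.5}}
```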
Methodology and Data
- Models used:
  - Single Machine Learning Models: Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Regressor (SVR).
  - Hybrid Machine Learning Models: RF-XGB, RF-SVR, XGB-SVR, RF-XGB-SVR.
- Data sources:
  - Climatic data: Monthly minimum and maximum air temperatures (Tmin, Tmax, in °C), wind speed (WS, in m s⁻¹), relative humidity (RH, in %), and precipitation (P, in mm) from the NASA website (0.5° × 0.5° resolution). Solar radiation (SR) from climate.northwestknowledge.net.
  - Remote sensing data: Landsat 7 ETM+ and Landsat 8 OLI imagery (30 m spatial resolution, Level-2 data) accessed via the Google Earth Engine (GEE) platform. Four spectral indices: Enhanced Vegetation Index (EVI), Normalized Difference Vegetation Index (NDVI), Soil-Adjusted Vegetation Index (SAVI), and Normalized Difference Water Index (NDWI).
  - Crop data: Cotton yield (t ha⁻¹).
  - Soil data: Digital Soil Map of the World (Calcaric Fluvisols, Chromic Vertisols, Pellic Vertisols).
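Landsat 7 ETM+ and Landsat 8 OLI number their bands differently, so the same index needs different band inputs for each sensor. The sketch below assumes the Collection 2 Level-2 surface reflectance products as exposed in GEE, with `SR_B*` band names and the published scale factor 0.0000275 and offset −0.2. The summary says only "Level 2", so the collection version is an assumption. The index formulas are the commonly used ones, but the summary does not give the paper's coefficients. NDWI is implemented here in the McFeeters green/NIR form, which may not be the variant the authors used. All function and dictionary names are hypothetical.

```python
# Collection 2 Level-2 surface reflectance scaling (assumed product version).
SR_SCALE, SR_OFFSET = 0.0000275, -0.2

# Band names in the GEE Collection 2 Level-2 products (assumption).
BANDS = {
    "LE07": {"blue": "SR_B1", "green": "SR_B2", "red": "SR_B3", "nir": "SR_B4"},
    "LC08": {"blue": "SR_B2", "green": "SR_B3", "red": "SR_B4", "nir": "SR_B5"},
}


def to_reflectance(dn: float) -> float:
    """Convert a stored Level-2 integer value to surface reflectance."""
    return dn * SR_SCALE + SR_OFFSET


def ndvi(nir: float, red: float) -> float:
    return (nir - red) / (nir + red)


def evi(nir: float, red: float, blue: float) -> float:
    # Common coefficients: G = 2.5, C1 = 6, C2 = 7.5, L = 1.
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def savi(nir: float, red: float, soil_l: float = 0.5) -> float:
    # soil_l is the soil-brightness correction factor; 0.5 is the usual default.
    return (1.0 + soil_l) * (nir - red) / (nir + red + soil_l)


def ndwi(green: float, nir: float) -> float:
    # McFeeters green/NIR form; a NIR/SWIR variant (Gao) also exists.
    return (green - nir) / (green + nir)


def indices_for_pixel(sensor: str, stored: dict[str, float]) -> dict[str, float]:
    """Compute the four indices from one pixel's stored Level-2 band values."""
    b = {name: to_reflectance(stored[band]) for name, band in BANDS[sensor].items()}
    return {
        "NDVI": ndvi(b["nir"], b["red"]),
        "EVI": evi(b["nir"], b["red"], b["blue"]),
        "SAVI": savi(b["nir"], b["red"]),
        "NDWI": ndwi(b["green"], b["nir"]),
    }
```

Within GEE itself, such indices are usually computed on whole images with the Earth Engine API (for example `ee.Image.normalizedDifference` for NDVI) rather than pixel by pixel in Python. The per-pixel form here only makes the band mapping explicit.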
Main Results
- Hybrid machine learning models (double or triple combinations) consistently achieved the highest R² values (1.0 or 0.99) and the lowest RMSE values across all scenarios, significantly outperforming the single models.
- The RF-XGB-SVR hybrid model under Scenario 5 (effective precipitation and maximum temperature) yielded the lowest RMSE, 31.35 m³ t⁻¹.
- Single models, particularly SVR and XGB under Scenario 3 (remote sensing indices only), performed worst, with the lowest R² values (0.0676 and 0.0767, respectively) and the highest RMSE values (e.g., 337.61 m³ t⁻¹ for XGB under Scenario 3).
- Scenario 3 (remote sensing indices only) consistently produced the weakest statistics across all models, indicating that remote sensing indices alone contribute little to GWFP prediction.
- Climatic parameters showed the strongest correlations with GWFP: effective precipitation (Peff) had the strongest positive correlation (0.75), and maximum temperature (Tmax) had the strongest negative correlation (−0.7).
- The XGB-SVR hybrid model under Scenario 3 had the lowest interquartile range (IQR) of GWFP prediction errors (0.047), followed by RF-XGB-SVR under Scenario 3 (0.052).
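The comparisons above rest on R², RMSE, correlation coefficients, and the IQR of prediction errors. These statistics can be sketched in plain Python under a few assumptions. R² is taken as the coefficient of determination, since the summary does not say whether the paper used that or the squared Pearson correlation. The correlations in the results are assumed to be Pearson coefficients, and errors are defined as predicted minus observed GWFP. The quartile method ("inclusive", i.e. linear interpolation) is also a choice the summary does not specify.

```python
from math import sqrt
from statistics import fmean, quantiles


def _check(obs, pred):
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")


def rmse(obs, pred):
    """Root mean square error, in the units of the target (here m3 t-1)."""
    _check(obs, pred)
    return sqrt(fmean((o - p) ** 2 for o, p in zip(obs, pred)))


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    _check(obs, pred)
    mean_obs = fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation, e.g. between a climatic input and GWFP."""
    _check(x, y)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def error_iqr(obs, pred):
    """Interquartile range of prediction errors (pred - obs)."""
    _check(obs, pred)
    errors = [p - o for o, p in zip(obs, pred)]
    q1, _, q3 = quantiles(errors, n=4, method="inclusive")
    return q3 - q1
```

Because RMSE carries the target's units while R² and r are unitless, a model can have a near-perfect R² yet a non-trivial RMSE when GWFP values span a wide range.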
Contributions
- First study to integrate single and hybrid machine learning algorithms with multi-source data (climatic, agronomic, remote sensing) for cotton green water footprint (GWFP) prediction in Al-Gezira State, Sudan.
- Demonstrates the superior performance of hybrid ML models over single models for GWFP prediction, offering a more accurate and robust modeling approach.
- Identifies optimal input variable scenarios for GWFP prediction, highlighting the critical role of climatic parameters and the limited standalone impact of remote sensing indices.
- Provides a framework adaptable for GWFP prediction of other crops and supports strategic water resource management and climate change mitigation policies.
Funding
- Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).
Citation
@article{AlTaher2025Optimizing,
author = {Al-Taher, Rogaia H. and Abuarab, Mohamed E. and Ahmed, A. and Helalia, Sarah A. and Hammad, Elbashir A. and Mokhtar, Ali},
title = {Optimizing cotton green water footprint prediction using hybrid machine learning algorithms: a case study of Al-Gezira state, Sudan},
journal = {Applied Water Science},
year = {2025},
doi = {10.1007/s13201-025-02656-2},
url = {https://doi.org/10.1007/s13201-025-02656-2}
}
Generated by BiblioAssistant using gemini-2.5-flash (Google API)
Original Source: https://doi.org/10.1007/s13201-025-02656-2