Nouri et al. (2025) Mitigating crop modeling uncertainties through machine learning in drylands
Identification
- Journal: Scientific Reports
- Year: 2025
- Date: 2025-11-28
- Authors: Milad Nouri, Shadman Veysi
- DOI: 10.1038/s41598-025-26811-6
Research Groups
Soil and Water Research Institute, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran.
Short Summary
This study developed a novel machine learning (ML)-based clustering–unbiasing–ensembling framework to improve the reliability of gridded meteorological data for the CSM-CERES-Wheat crop model in data-scarce drylands of Iran. The framework, particularly when correcting all meteorological variables, significantly enhanced wheat yield and water stress simulations in approximately 60% of cases, outperforming classical methods.
Objective
- To develop and evaluate a novel machine learning-based clustering–unbiasing–ensembling framework for bias correction and multi-ensembling of gridded meteorological datasets to improve dry farming wheat yield and water stress simulations using the CSM-CERES-Wheat model in data-scarce drylands of western Iran.
Study Configuration
- Spatial Scale: Western and northern regions of Iran, covering 104 sites, with a digital elevation model (DEM) at 30-arc-second spatial resolution.
- Temporal Scale: Daily meteorological data (precipitation, minimum and maximum temperature, solar radiation) over a historical period; the start and end dates are not explicitly provided.
Methodology and Data
- Models used:
  - Crop Model: Cropping System Model-Crop Environment Resource Synthesis-Wheat (CSM-CERES-Wheat), integrated into the Decision Support System for Agro-technology Transfer (DSSAT v4.8.2).
  - Machine Learning Algorithms: Random Forest (RF), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBM), Support Vector Machines (SVM).
  - Clustering and Dimensionality Reduction: K-means clustering, Principal Component Analysis (PCA).
- Data sources:
  - Observation Data: Daily minimum and maximum temperature (Tmin, Tmax), solar radiation (SR), and precipitation (mm) for 104 sites from the Iran Meteorological Organization (IRIMO).
  - Gridded Meteorological Data: ERA5-Land (ERA5L), Climate Forecast System (CFS), Integrated Multi-satellite Retrievals for GPM (IMERG), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), Climate Hazards Group InfraRed Precipitation with Station Data (CHIRPS).
  - Soil Data: Silt and clay contents from SoilGrids250m version 2.0.
  - Ancillary Data: Digital Elevation Model (DEM) at 30-arc-second spatial resolution from the USGS website.
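The clustering, unbiasing and ensembling steps listed above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the site attributes used for clustering (elevation, silt, clay), the number of clusters, the train/test split, the two gridded products and all data values are hypothetical synthetic stand-ins, and Random Forest is used for every cluster for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT data from the paper).
n_sites, n_days, n_train = 30, 200, 140
site_attrs = rng.normal(size=(n_sites, 3))            # hypothetical: elevation, silt, clay
obs = rng.uniform(0.0, 30.0, size=(n_sites, n_days))  # "station" Tmax, deg C
# Two hypothetical gridded products with different biases and noise.
prod_a = 1.2 * obs + 2.0 + rng.normal(0.0, 1.0, size=obs.shape)
prod_b = 0.8 * obs - 1.5 + rng.normal(0.0, 1.0, size=obs.shape)

# 1) Clustering: standardise site attributes, reduce with PCA, group with K-means.
pcs = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(site_attrs))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pcs)

# 2) Unbiasing + ensembling: one regressor per cluster maps the gridded
#    products (features) onto the station observations (target).
train_days, test_days = slice(0, n_train), slice(n_train, n_days)
corrected = np.full((n_sites, n_days - n_train), np.nan)
for k in np.unique(labels):
    sites = np.where(labels == k)[0]
    X_train = np.column_stack([prod_a[sites, train_days].ravel(),
                               prod_b[sites, train_days].ravel()])
    y_train = obs[sites, train_days].ravel()
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    X_test = np.column_stack([prod_a[sites, test_days].ravel(),
                              prod_b[sites, test_days].ravel()])
    corrected[sites] = model.predict(X_test).reshape(len(sites), -1)

def rmse(a, b):
    """Root mean square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Held-out comparison: raw product vs. cluster-wise ML ensemble.
rmse_raw = rmse(prod_a[:, test_days], obs[:, test_days])
rmse_corrected = rmse(corrected, obs[:, test_days])
print(f"raw product RMSE: {rmse_raw:.2f}, ML ensemble RMSE: {rmse_corrected:.2f}")
```

Any regressor with a scikit-learn style `fit`/`predict` interface, for example LightGBM's `LGBMRegressor`, could be swapped in per cluster and per variable, which mirrors how the study picks the best-performing algorithm for each case.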
Main Results
- Original gridded meteorological datasets performed poorly for daily precipitation: average Nash–Sutcliffe Efficiency (NSE) < 0.5, ratio of Root Mean Square Error to the standard deviation of observations (RSR) > 70%, and both Normalized Root Mean Square Error (nRMSE) and Absolute Mean Relative Error (AMRE) > 30%. ERA5-Land performed relatively better for maximum temperature (Tmax) (NSE = 0.88) and solar radiation (SR) (NSE = 0.77).
- Light Gradient Boosting Machine (LGBM) and Random Forest (RF) were the best-performing ML algorithms for ensembling and unbiasing meteorological datasets. RF performed best for precipitation, while LGBM excelled for Tmin and Tmax, and both performed well for SR depending on the cluster.
- ML-based multi-ensembling significantly improved the accuracy of meteorological inputs:
  - Average NSE for precipitation increased from 0.11–0.17 to 0.47.
  - Average NSE for Tmax increased from 0.68–0.88 to 0.96.
  - Average NSE for Tmin increased from 0.75–0.76 to 0.89.
  - Average NSE for SR increased from 0.56–0.77 to 0.90.
- Correcting individual meteorological variables had limited impact on crop model performance (e.g., precipitation-only correction raised the share of reliable yield simulations to only 4.8–10.6% of cases).
- The "TOTAL" scenario, which combined all corrected meteorological variables (precipitation, temperature, and solar radiation), significantly improved dry farming wheat yield simulations, achieving reliable results in approximately 60% of cases.
- The "TOTAL" scenario also produced reliable Soil Water Stress Factor (SWFAC) simulations in approximately 64.7% of cases.
- ML-based unbiasing-ensembling substantially outperformed the conventional linear scaling and equal-weight averaging method, which achieved reliable yield modeling in only about 33% of cases and failed to capture temporal variability.
- Bootstrapping analysis confirmed the robustness and reliability of the ML-based clustering–unbiasing–ensembling approach, with low Relative Confidence Interval Width (RCIW < 30%) for the TOTAL scenario, supporting its transferability.
- ML-corrected data more accurately captured the probability density distributions of SR, Tmin, and Tmax during critical wet days (March–April–May) compared to raw gridded data, particularly in representing cloudiness patterns affecting SR.
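The skill scores and the classical baseline referred to in these results can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: NSE, RSR and nRMSE follow their standard definitions, AMRE is left out because its exact formula is not given here, the linear scaling is applied over the whole series (the paper may apply it per month or season), and all series values are made-up toy numbers.

```python
import math

def _sums(obs, sim):
    """Return (sum of squared errors, sum of squared deviations of obs from its mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return sse, sst

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect, <= 0 is no better than the observed mean."""
    sse, sst = _sums(obs, sim)
    return 1.0 - sse / sst

def rsr(obs, sim):
    """RMSE divided by the standard deviation of observations (ratio; x100 for percent)."""
    sse, sst = _sums(obs, sim)
    return math.sqrt(sse / sst)

def nrmse(obs, sim):
    """RMSE normalised by the observed mean, in percent."""
    sse, _ = _sums(obs, sim)
    return 100.0 * math.sqrt(sse / len(obs)) / (sum(obs) / len(obs))

def linear_scaling(raw, obs, multiplicative=False):
    """Match the mean of `raw` to the mean of `obs`.

    Additive shift is typical for temperature, a multiplicative factor for precipitation.
    """
    mean_raw, mean_obs = sum(raw) / len(raw), sum(obs) / len(obs)
    if multiplicative:
        return [r * mean_obs / mean_raw for r in raw]
    return [r + (mean_obs - mean_raw) for r in raw]

def equal_weight_mean(members):
    """Average several equally long series value by value."""
    return [sum(vals) / len(vals) for vals in zip(*members)]

# Toy temperature series (illustrative values only).
obs = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]
prod_a = [13.5, 14.0, 18.5, 13.0, 12.5, 17.5]
prod_b = [8.0, 11.5, 13.0, 10.5, 7.0, 12.5]

corr_a = linear_scaling(prod_a, obs)
corr_b = linear_scaling(prod_b, obs)
ensemble = equal_weight_mean([corr_a, corr_b])

print("raw product A NSE:", round(nse(obs, prod_a), 3))
print("baseline ensemble NSE:", round(nse(obs, ensemble), 3))
print("baseline ensemble RSR:", round(rsr(obs, ensemble), 3))
print("baseline ensemble nRMSE (%):", round(nrmse(obs, ensemble), 3))
```

Because both scores share the same two sums, RSR equals the square root of 1 - NSE. Linear scaling only matches the mean, so it cannot repair errors in day-to-day variability, which is consistent with the reported failure of the classical method to capture temporal variability.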
Contributions
- Introduces a novel clustering–unbiasing–ensembling framework leveraging machine learning for comprehensive meteorological data correction in process-based crop modeling.
- Demonstrates the superior performance of tree-based ML algorithms (Random Forest and Light Gradient Boosting Machine) over conventional statistical methods for bias correction and multi-ensembling of gridded meteorological data, especially for precipitation.
- Highlights the critical importance of a holistic correction approach (ensembling all key meteorological variables simultaneously) to maintain physical consistency and significantly enhance the accuracy of crop yield and water stress simulations in data-scarce drylands.
- Provides a robust and transferable framework, validated by bootstrapping analysis, for improving climate data reliability and supporting decision-making in susceptible dryland agricultural systems under climate extremes.
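The bootstrapping check mentioned above can be illustrated with a percentile bootstrap. This is a sketch under assumptions, not the paper's procedure: RCIW is assumed here to be the width of the 95% confidence interval divided by the point estimate (in percent), RMSE is used as the example metric, and the yield values are made-up toy numbers.

```python
import math
import random

def rmse(obs, sim):
    """Root mean square error between paired observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def bootstrap_rciw(obs, sim, metric, n_boot=1000, alpha=0.05, seed=42):
    """Percentile-bootstrap CI of `metric` and its relative width in percent.

    Assumed definition: RCIW = 100 * (upper - lower) / |point estimate|.
    """
    rng = random.Random(seed)
    n = len(obs)
    scores = []
    for _ in range(n_boot):
        # Resample (obs, sim) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([obs[i] for i in idx], [sim[i] for i in idx]))
    scores.sort()
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    lower, upper = scores[lo_idx], scores[hi_idx]
    point = metric(obs, sim)
    return 100.0 * (upper - lower) / abs(point), lower, upper

# Toy observed vs. simulated yields in t/ha (illustrative values only).
obs_yield = [1.2, 0.9, 1.8, 2.1, 1.5, 0.7, 1.1, 1.9, 1.4, 1.6, 0.8, 2.0]
sim_yield = [1.0, 1.1, 1.6, 2.3, 1.4, 0.9, 1.0, 1.7, 1.6, 1.5, 1.0, 1.8]

rciw, ci_lower, ci_upper = bootstrap_rciw(obs_yield, sim_yield, rmse)
print(f"RMSE 95% CI: [{ci_lower:.3f}, {ci_upper:.3f}], RCIW: {rciw:.1f}%")
```

A narrow interval relative to the point estimate indicates that the skill score does not hinge on a few site-years, which is the sense in which a low RCIW supports robustness.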
Funding
The authors acknowledge the Soil and Water Research Institute and the Iran Meteorological Organization for providing essential data support. No specific funding projects, programs, or reference codes were explicitly stated in the paper.
Citation
@article{Nouri2025Mitigating,
author = {Nouri, Milad and Veysi, Shadman},
title = {Mitigating crop modeling uncertainties through machine learning in drylands},
journal = {Scientific Reports},
year = {2025},
doi = {10.1038/s41598-025-26811-6},
url = {https://doi.org/10.1038/s41598-025-26811-6}
}
Original Source: https://doi.org/10.1038/s41598-025-26811-6