Cheshmberah et al. (2025) Ensemble machine learning for predicting soil hydraulic properties in semi-arid regions
Identification
- Journal: Modeling Earth Systems and Environment
- Year: 2025
- Date: 2025-10-14
- Authors: Fatemeh Cheshmberah, Ali Asghar Zolfaghari, Ruhollah Taghizadeh‐Mehrjardi
- DOI: 10.1007/s40808-025-02648-w
Research Groups
- Department of Desert Studies, Semnan University, Semnan, Islamic Republic of Iran
- Department of Geosciences, Soil Science and Geomorphology, University of Tübingen, Tübingen, Germany
Short Summary
This study developed an ensemble machine learning approach combining Random Forest (RF) and Cubist models to predict and map soil hydraulic properties in a semi-arid region of Iran. The RF–Cubist ensemble consistently outperformed individual models, achieving higher accuracy and improved spatial reliability for field capacity (FC), permanent wilting point (PWP), and available water capacity (AWC).
Objective
- To develop an advanced Digital Soil Mapping (DSM) framework using an ensemble machine learning approach (RF-Cubist) to improve the predictive accuracy and spatial reliability of soil hydraulic properties (Field Capacity, Permanent Wilting Point, Available Water Capacity) in semi-arid regions.
- To evaluate the performance of the ensemble model against individual RF and Cubist models, explicitly incorporating spatial dependence analysis to reduce bias.
- To quantify prediction uncertainty through bootstrap simulations and assess the sensitivity of Available Water Capacity (AWC) estimates to errors in Field Capacity (FC) and Permanent Wilting Point (PWP) predictions.
Study Configuration
- Spatial Scale: Qazvin Province, central Iran (approximately 152,000 hectares). 150 surface soil samples collected from 0–30 cm depth. Covariates resampled to a uniform 30 m grid.
- Temporal Scale: The study period is not explicitly stated. The covariates (Landsat 8 and Sentinel-2 imagery, SRTM DEM) represent a single snapshot in time rather than a time series.
Methodology and Data
- Models used: An RF–Cubist ensemble, compared against standalone Random Forest (RF) and Cubist models. Features were selected with the Boruta algorithm. Predictive performance was assessed with R², RMSE, RMSLE, and bias, and Moran's I was used to test model residuals for spatial autocorrelation. Uncertainty was summarized by the standard deviation, confidence intervals, and coefficient of variation over 100 bootstrap repetitions. A perturbation-based sensitivity analysis was conducted for AWC.
- Data sources:
  - Soil samples: 150 surface soil samples (0–30 cm) collected using Conditional Latin Hypercube Sampling (cLHS). Laboratory analyses included sand, silt, and clay content (hydrometer method), soil bulk density (core method), EC, CaCO3 (calcimetric method), organic matter (Walkley–Black method), and the soil water retention curve (pressure plate extractor) to determine FC (at 300 cm H₂O) and PWP (at 15,000 cm H₂O). AWC was derived as FC − PWP.
  - Remote sensing: Spectral bands and vegetation indices derived from Landsat 8 (OLI) and Sentinel-2 imagery.
  - Digital Elevation Model (DEM): SRTM DEM, providing spatial descriptors such as Valley Depth (VD) and Channel Network Distance (CN).
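The evaluation measures named above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the summary does not state the exact formulas, so the sketch assumes R² is the coefficient of determination (1 − SS_res/SS_tot), bias is the mean of predicted minus observed, and Moran's I uses inverse-distance weights between distinct sample locations. The function names `regression_metrics` and `morans_i` are hypothetical, and the significance test behind the reported p-values is omitted.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, RMSLE and bias for paired observations and predictions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # RMSLE uses log(1 + x), so values must be non-negative (true for water contents).
    sq_log = sum(
        (math.log1p(p) - math.log1p(o)) ** 2 for o, p in zip(observed, predicted)
    )
    return {
        "r2": 1.0 - ss_res / ss_tot,     # assumed definition: coefficient of determination
        "rmse": math.sqrt(ss_res / n),
        "rmsle": math.sqrt(sq_log / n),
        "bias": sum(residuals) / n,      # mean error, predicted minus observed
    }


def morans_i(coords, values, power=1.0):
    """Global Moran's I with inverse-distance weights (weighting scheme is an assumption).

    coords must be distinct (x, y) locations; coincident points would give zero distance.
    """
    n = len(values)
    mean_v = sum(values) / n
    dev = [v - mean_v for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over all i != j
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(coords[i], coords[j]) ** power
            cross += w * dev[i] * dev[j]
            w_sum += w
    return (n / w_sum) * (cross / sum(d * d for d in dev))


# Illustrative values only; these are not data from the study.
observed = [30.0, 35.0, 28.0, 40.0]
predicted = [31.0, 33.0, 29.0, 38.0]
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
residuals = [p - o for o, p in zip(observed, predicted)]

print(regression_metrics(observed, predicted))
print(morans_i(points, residuals))
```

Values of Moran's I near its null expectation of −1/(n − 1) suggest no spatial structure in the residuals; judging significance would additionally require a permutation or normal-approximation test, which this sketch leaves out.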
Main Results
- The RF–Cubist ensemble consistently outperformed individual RF and Cubist models for all predicted soil hydraulic properties.
- Predictive Accuracy (RF-Cubist):
  - Field Capacity (FC): R² = 0.70, RMSE = 3.97%
  - Permanent Wilting Point (PWP): R² = 0.71, RMSE = 1.81%
  - Available Water Capacity (AWC): R² = 0.65, RMSE = 3.95%
- Uncertainty Analysis (RF-Cubist): Achieved the lowest standard deviations (FC = 3.98, PWP = 1.81, AWC = 3.96) and coefficients of variation (FC = 11.95%, PWP = 13.62%, AWC = 19.90%), indicating greater prediction stability.
- Spatial Autocorrelation (Moran's I): The RF model showed significant positive spatial autocorrelation in its residuals (I = 0.096, p = 0.028), whereas the residuals of the Cubist (I = 0.008, p = 0.389) and RF–Cubist (I = 0.045, p = 0.169) models showed no significant autocorrelation. The ensemble therefore did not retain the spatial structure left in the RF residuals.
- Feature Importance: Valley Depth (VD), derived from the DEM, was identified as a critical variable for predicting PWP and FC.
- Error Propagation: Sensitivity analysis revealed that AWC estimates are substantially more sensitive to FC prediction errors than to PWP errors. For example, a ±5% change in FC resulted in an average ±8.44% change in AWC, whereas a ±5% change in PWP caused an average ∓3.44% change.
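The direction and size of this sensitivity follow from the definition AWC = FC − PWP. Scaling FC by a fraction δ changes AWC by δ·FC/AWC, while scaling PWP by δ changes it by −δ·PWP/AWC. Because FC is always larger than PWP, FC errors weigh more, and the two magnitudes differ by exactly δ, which agrees with the reported figures (8.44 − 3.44 = 5.00). The sketch below illustrates the perturbation on a single sample. The function name `awc_relative_change` and the input values are hypothetical, and the study's own procedure (averaging over all samples) may differ in detail.

```python
def awc_relative_change(fc, pwp, fc_change=0.0, pwp_change=0.0):
    """Percent change in AWC = FC - PWP when FC and/or PWP are scaled by a fraction."""
    base = fc - pwp
    perturbed = fc * (1.0 + fc_change) - pwp * (1.0 + pwp_change)
    return 100.0 * (perturbed - base) / base


# Hypothetical single-sample water contents (%), not values from the study.
fc, pwp = 30.0, 12.0

up_fc = awc_relative_change(fc, pwp, fc_change=0.05)     # about +8.33
up_pwp = awc_relative_change(fc, pwp, pwp_change=0.05)   # about -3.33

print(f"+5% FC  -> AWC changes by {up_fc:+.2f}%")
print(f"+5% PWP -> AWC changes by {up_pwp:+.2f}%")
# The magnitudes differ by the perturbation size itself (5 percentage points).
print(f"difference in magnitude: {up_fc - abs(up_pwp):.2f}")
```

An increase in PWP lowers AWC, which is why the summary reports the PWP effect with the opposite sign (∓).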
Contributions
- Developed and validated a novel ensemble machine learning framework (RF-Cubist) for Digital Soil Mapping of key soil hydraulic properties (FC, PWP, AWC) in semi-arid environments, demonstrating superior accuracy and spatial reliability compared to standalone models.
- Explicitly addressed spatial autocorrelation in model residuals using Moran's I, confirming that the ensemble approach effectively minimizes spatial bias in predictions.
- Provided a comprehensive uncertainty assessment (SD, CI, CV) for the predicted soil hydraulic properties, enhancing the interpretability and trustworthiness of the generated digital soil maps.
- Quantified the propagation of prediction errors from FC and PWP to AWC through a perturbation-based sensitivity analysis, highlighting the greater influence of FC errors on AWC estimates.
- Generated high-resolution digital maps of FC, PWP, and AWC, offering valuable tools for precision agriculture, water resource management, and drought risk mitigation in water-limited regions.
Funding
- Iran National Science Foundation (INSF) under Grant No. 97012557.
Citation
@article{Cheshmberah2025Ensemble,
author = {Cheshmberah, Fatemeh and Zolfaghari, Ali Asghar and Taghizadeh-Mehrjardi, Ruhollah},
title = {Ensemble machine learning for predicting soil hydraulic properties in semi-arid regions},
journal = {Modeling Earth Systems and Environment},
year = {2025},
doi = {10.1007/s40808-025-02648-w},
url = {https://doi.org/10.1007/s40808-025-02648-w}
}
Original Source: https://doi.org/10.1007/s40808-025-02648-w