Arshad et al. (2025) Enhancing assimilated soil moisture prediction from environmental data using advanced machine learning
Identification
- Journal: Environmental Systems Research
- Year: 2025
- Date: 2025-12-24
- Authors: Sana Arshad, Amna Ashraf, Main Al-Dalahmeh, Endre Harsányi, Safwan S. Mohammed
- DOI: 10.1186/s40068-025-00451-1
Research Groups
- Department of Geography and Geoinformatics, The Islamia University of Bahawalpur, Pakistan.
- Department of Artificial Intelligence, The Islamia University of Bahawalpur, Pakistan.
- Institute of Business Administration, Faculty of Business, Applied Science Private University, Jordan.
- Institute of Land Use, Technical and Precision Technology, University of Debrecen, Hungary.
- Institutes for Agricultural Research and Educational Farm, University of Debrecen, Hungary.
Short Summary
This study improves prediction of top-layer (0–10 cm) soil moisture in the arid irrigated and rainfed regions of Punjab, Pakistan, by combining t-Distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction with machine learning models. In both regions, t-SNE-enhanced Gradient Boosting Regression outperforms the best models trained without t-SNE, which makes it a more reliable tool for drought early warning and agricultural management.
Objective
- To improve the prediction accuracy of assimilated soil moisture (0–10 cm) using multisource environmental data by integrating non-linear feature extraction (t-SNE) with machine learning algorithms (Random Forest, Gradient Boosting Regression, and Artificial Neural Networks) across distinct agroclimatic zones.
Study Configuration
- Spatial Scale: Regional scale focusing on two distinct areas in Punjab, Pakistan: the "Rainfed" northern region (Potohar and Salt Range, 22,250 $\text{km}^2$) and the "Arid Irrigated" southern region (Indus plains and Cholistan desert, 45,580 $\text{km}^2$).
- Temporal Scale: Monthly time series spanning 41 years, from January 1982 to December 2022 ($n = 492$ months).
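As a quick consistency check on the record length, counting calendar months inclusively from January 1982 through December 2022 reproduces the stated $n = 492$, which is 41 full years of monthly data. A minimal Python sketch using only the standard library (the helper name `months_inclusive` is hypothetical):

```python
from datetime import date


def months_inclusive(start: date, end: date) -> int:
    """Count calendar months from start's month through end's month, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# Only year and month matter; the day is a placeholder.
n = months_inclusive(date(1982, 1, 1), date(2022, 12, 1))
years = n / 12
print(n, years)
```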
Methodology and Data
- Models used: Random Forest (RF), Gradient Boosting Regression (GBR), and Artificial Neural Network (ANN-MLP). These were integrated with t-Distributed Stochastic Neighbor Embedding (t-SNE) in two and three dimensions (2D and 3D) for feature extraction.
- Data sources:
  - FLDAS (Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System): Used for top-layer soil moisture (0–10 cm) and 12 environmental variables (e.g., humidity, evapotranspiration, soil temperature, net longwave radiation).
  - TerraClimate: Used for 6 climate and water balance variables (e.g., climate water deficit, vapor pressure deficit, temperature extremes).
  - Ancillary Data: CHIRPS (precipitation) and MERRA-2 (meteorological forcings), integrated within the assimilation systems.
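The pipeline described above, in which t-SNE extracts features that then feed a regressor, can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code. The data are synthetic (492 samples by 18 predictors, mirroring the 12 FLDAS and 6 TerraClimate variables). The hyperparameters are hypothetical: perplexity 30, default GBR settings, and a chronological 80/20 split. The summary also does not say whether the original predictors are kept alongside the embedding, so `append_original` is an assumed switch. The helper name `tsne_gbr` is likewise hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.manifold import TSNE
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def tsne_gbr(X, y, n_components=2, append_original=True, seed=0):
    """Fit a GBR on t-SNE features and return (R^2, RMSE) on a held-out split.

    Assumptions (not from the study): the embedding is computed on all
    samples before splitting, the last 20% of samples form the test set,
    and append_original controls whether raw predictors are kept.
    """
    # Standardize predictors so no single variable dominates t-SNE distances.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    emb = TSNE(n_components=n_components, perplexity=30.0,
               random_state=seed).fit_transform(Xs)
    feats = np.hstack([Xs, emb]) if append_original else emb
    # shuffle=False keeps time order: train on earlier months, test on later ones.
    X_tr, X_te, y_tr, y_te = train_test_split(
        feats, y, test_size=0.2, shuffle=False)
    model = GradientBoostingRegressor(random_state=seed).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
    return float(r2_score(y_te, pred)), rmse


# Synthetic stand-in data; the values are invented, not taken from FLDAS/TerraClimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(492, 18))
y = (0.2 + 0.02 * np.tanh(X[:, 0] * X[:, 1]) + 0.01 * X[:, 2]
     + rng.normal(0.0, 0.005, 492))

for dims in (2, 3):
    r2_val, rmse_val = tsne_gbr(X, y, n_components=dims)
    print(f"{dims}D-t-SNE + GBR: R2={r2_val:.3f}, RMSE={rmse_val:.4f}")
```

One consequence of this design is that scikit-learn's `TSNE` provides only `fit` and `fit_transform`, with no `transform` for unseen samples. The embedding is therefore fitted on all samples at once, and a model built this way cannot embed genuinely new observations without re-running t-SNE.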
Main Results
- Regional Characteristics: Mean soil moisture was significantly higher in the rainfed region (0.245 $m^3/m^3$) than in the irrigated region (0.175 $m^3/m^3$).
- Predictive Performance (Irrigated): The 2D-t-SNE + GBR model achieved the highest accuracy ($R^2 = 0.889$, $RMSE = 0.011$ $m^3/m^3$), outperforming the baseline GBR without t-SNE ($R^2 = 0.845$).
- Predictive Performance (Rainfed): The 3D-t-SNE + GBR model achieved the highest accuracy ($R^2 = 0.754$, $RMSE = 0.020$ $m^3/m^3$), compared to the best baseline model (RF, $R^2 = 0.676$).
- Feature Extraction Impact: t-SNE outperformed Principal Component Analysis (PCA) as a feature extractor for this application, which the authors attribute to its ability to capture non-linear manifold structure in high-dimensional environmental datasets.
- Environmental Drivers: Soil moisture showed strong correlations with net longwave radiation flux, relative humidity, and latent heat net flux across both regions.
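The accuracy figures above are reported as $R^2$ and RMSE, and the driver analysis is expressed as correlations. For reference, here is a minimal standard-library Python sketch of these metrics. It assumes the correlations are Pearson coefficients, because the summary does not say which coefficient was used. The toy values are invented for illustration and are not data from the study.

```python
import math
import statistics


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    """Root-mean-square error, in the units of the target (here m^3/m^3)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy soil-moisture values (m^3/m^3); illustrative only.
obs = [0.15, 0.17, 0.19, 0.18, 0.16]
pred = [0.16, 0.17, 0.18, 0.18, 0.15]
print(round(r2(obs, pred), 3), round(rmse(obs, pred), 4))
```

Because RMSE is in the units of the target, the reported values of 0.011 and 0.020 $m^3/m^3$ can be read directly against the regional means of 0.175 and 0.245 $m^3/m^3$.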
Contributions
- Methodological Innovation: The authors present this as one of the first studies to apply t-SNE as a feature extraction (rather than feature selection) technique for enhancing soil moisture regression models.
- Agroclimatic Specificity: The best-performing embedding dimension differed by region (2D for irrigated vs. 3D for rainfed), highlighting the need for tailored modeling in diverse landscapes.
- Operational Value: Provides a robust framework for predicting soil moisture in data-scarce regions, directly supporting climate-resilient agriculture and drought monitoring systems.
Funding
- University of Debrecen: Supported through the Program for Scientific Publications (2025). No external funding was received for this research.
Citation
@article{Arshad2025Enhancing,
author = {Arshad, Sana and Ashraf, Amna and Al-Dalahmeh, Main and Harsányi, Endre and Mohammed, Safwan S.},
title = {Enhancing assimilated soil moisture prediction from environmental data using advanced machine learning},
journal = {Environmental Systems Research},
year = {2025},
doi = {10.1186/s40068-025-00451-1},
url = {https://doi.org/10.1186/s40068-025-00451-1}
}
Generated by BiblioAssistant using gemini-3-flash-preview (Google API)
Original Source: https://doi.org/10.1186/s40068-025-00451-1