Liang (2025) Soil Moisture Prediction for Intelligent Irrigation: An XGBoost-based Model with Multi-Dimensional Feature Engineering
Identification
- Journal: Frontiers in Science and Engineering
- Year: 2025
- Date: 2025-11-24
- Authors: Jinqiao Liang
- DOI: 10.54691/kkftyy94
Research Groups
Business School, Shandong Normal University, Jinan, China
Short Summary
This study developed an Extreme Gradient Boosting (XGBoost)-based model with multi-dimensional feature engineering to predict soil moisture at 5 cm depth from hourly meteorological data. On the test set the model reached R² = 0.673, which the study reports as significantly better than traditional linear models, and it is intended to provide reliable support for intelligent irrigation.
Objective
- To develop an XGBoost-based soil moisture prediction model that integrates multi-dimensional meteorological information and captures complex non-linear relationships, thereby providing reliable data support for intelligent irrigation systems.
Study Configuration
- Spatial Scale: A 1-hectare multi-crop farm.
- Temporal Scale: Hourly meteorological and soil moisture data collected from May 1 to July 31, 2021 (3 months), reconstructed into daily-scale data for modeling.
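The reconstruction of hourly records into daily-scale data can be sketched as follows. This is a minimal, dependency-free illustration rather than the study's code: the three-field record layout, the function name `aggregate_daily`, and the sample values are all assumptions. It computes the daily mean, maximum, minimum, and range for a continuous variable (temperature) and the daily total for precipitation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_daily(records):
    """Collapse hourly records into one row of summary statistics per day.

    `records` is an iterable of (timestamp, temperature, precipitation)
    tuples; this three-field layout is an assumption made for illustration.
    """
    by_day = defaultdict(list)
    for ts, temp, precip in records:
        by_day[ts.date()].append((temp, precip))

    daily = {}
    for day, rows in sorted(by_day.items()):
        temps = [t for t, _ in rows]
        daily[day] = {
            "temp_mean": mean(temps),
            "temp_max": max(temps),
            "temp_min": min(temps),
            "temp_range": max(temps) - min(temps),
            # Precipitation is accumulated over the day rather than averaged.
            "precip_total": sum(p for _, p in rows),
        }
    return daily


# Synthetic hourly readings for illustration; not data from the study.
hourly = [
    (datetime(2021, 5, 1, 0), 14.0, 0.0),
    (datetime(2021, 5, 1, 12), 22.0, 1.5),
    (datetime(2021, 5, 1, 18), 18.0, 0.5),
    (datetime(2021, 5, 2, 6), 12.0, 0.0),
    (datetime(2021, 5, 2, 15), 24.0, 0.0),
]
daily = aggregate_daily(hourly)
```

The same pattern would extend to the other continuous variables (pressure, humidity, and so on) by adding one group of statistics per variable.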
Methodology and Data
- Models used: Extreme Gradient Boosting (XGBoost).
- Data sources: On-site farm measurements, including hourly meteorological data (temperature, sea-level pressure, station pressure, water vapor pressure, relative humidity, wind direction, wind force, precipitation) and 5 cm depth absolute soil moisture.
- Data preprocessing: Missing values filled by linear interpolation and similar-day patterns; outliers detected with the interquartile range (IQR) method; hourly data reconstructed into daily-scale data (e.g., daily average, maximum, minimum, and range for continuous variables; daily total for precipitation).
- Feature engineering: Construction of intra-day statistical features, temporal variation features, lag features (1-3 days), and interaction features.
- Feature selection: Forward sequential feature selection combined with F-test, resulting in 44 optimal features.
- Model training and validation: Time-series cross-validation with a chronological split (70% training, 30% testing) and grid search for hyperparameter tuning.
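Two of the steps above, the 1-3 day lag features and the chronological 70/30 split, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the study's implementation: the helper names `add_lag_features` and `chronological_split`, the dictionary row layout, and the synthetic series are hypothetical.

```python
def add_lag_features(rows, column, lags=(1, 2, 3)):
    """Return new rows with `column` from each of the previous `lags` days.

    `rows` is a list of dicts ordered by date. The first max(lags) rows are
    dropped because their lagged values are not available.
    """
    start = max(lags)
    out = []
    for i in range(start, len(rows)):
        row = dict(rows[i])
        for lag in lags:
            row[f"{column}_lag{lag}"] = rows[i - lag][column]
        out.append(row)
    return out


def chronological_split(rows, train_fraction=0.7):
    """Split ordered rows into an earlier training part and a later test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Ten synthetic days of soil moisture, for illustration only.
days = [{"day": d, "soil_moisture": round(0.20 + 0.01 * d, 2)} for d in range(10)]
features = add_lag_features(days, "soil_moisture")
train, test = chronological_split(features)
```

Because each lag column looks only backward in time and the split is never shuffled, every training row precedes every test row, and no test-period value appears among the training features.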
Main Results
- The XGBoost model, using 44 optimal features, achieved a coefficient of determination (R²) of 0.673 on the test set, with a Root Mean Squared Error (RMSE) of 0.038 and a Mean Absolute Error (MAE) of 0.029. The training set R² was 0.892, and the test set AUC was 0.815.
- The model significantly outperformed traditional linear models in capturing complex non-linear relationships.
- Feature importance analysis identified the key driving factors for 5 cm soil moisture (ranked by importance): daily average temperature, 5 cm soil moisture on the previous day, daily total precipitation, daily average relative humidity, intra-day temperature range, total precipitation on the previous day, and the interaction term of temperature × relative humidity. These findings are consistent with soil water evaporation and replenishment mechanisms.
- For a target date, the model predicted a 5 cm soil moisture of 0.2435, a value that meets the minimum crop survival threshold.
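For reference, the three regression metrics reported above (R², RMSE, MAE) follow their standard definitions, sketched here in plain Python. The function name `regression_metrics` and the sample values are illustrative assumptions and are unrelated to the study's data.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Synthetic values for illustration; these are not data from the study.
observed = [0.20, 0.24, 0.28, 0.32]
predicted = [0.22, 0.23, 0.27, 0.33]
metrics = regression_metrics(observed, predicted)
```

R² compares the model against a baseline that always predicts the observed mean, so a test-set value of 0.673 means the model's squared error is about one third of that baseline's. RMSE and MAE are expressed in the same units as the soil moisture values themselves.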
Contributions
- Methodological Innovation: Verified the applicability of the XGBoost algorithm in agricultural temporal prediction, providing a robust technical framework through in-depth multi-dimensional feature engineering (including intra-day statistical, temporal variation, lag, and interaction features) to capture complex relationships and temporal memory effects.
- Practical Application: The high-precision prediction results directly serve as a decision-making basis for intelligent irrigation systems, enabling farmers to optimize irrigation timing and amount, thereby reducing water waste and agricultural production costs.
- Environmental Sustainability: Promotes precision irrigation, which helps avoid soil salinization and groundwater pollution caused by over-irrigation, aligning with green and sustainable agricultural development goals.
- Scientific Validation: Employed time-series cross-validation and a chronological training-test set division to prevent data leakage and ensure the model's generalization ability for "predicting the future" in real-world intelligent irrigation scenarios.
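One common form of time-series cross-validation is the expanding-window scheme sketched below, in which each fold trains on all earlier samples and validates on the block that follows. The summary does not state which fold scheme the study used, so the block layout and the function name `expanding_window_folds` are assumptions for illustration only.

```python
def expanding_window_folds(n_samples, n_folds):
    """Yield (train_indices, validation_indices) with training always earlier.

    The samples are cut into n_folds + 1 contiguous blocks; fold k trains on
    the first k blocks and validates on block k + 1.
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any remainder so that no sample is unused.
        val_end = n_samples if k == n_folds else train_end + block
        yield list(range(train_end)), list(range(train_end, val_end))


folds = list(expanding_window_folds(n_samples=10, n_folds=3))
```

In every fold the largest training index is smaller than the smallest validation index, which is the property that keeps future observations out of model fitting during hyperparameter search.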
Funding
Not specified.
Citation
@article{Liang2025Soil,
author = {Liang, Jinqiao},
title = {Soil Moisture Prediction for Intelligent Irrigation: An XGBoost-based Model with Multi-Dimensional Feature Engineering},
journal = {Frontiers in Science and Engineering},
year = {2025},
doi = {10.54691/kkftyy94},
url = {https://doi.org/10.54691/kkftyy94}
}
Original Source: https://doi.org/10.54691/kkftyy94