Wei et al. (2025) A global long-term daily multilayer soil moisture dataset derived from machine learning
Identification
- Journal: Scientific Data
- Year: 2025
- Date: 2025-12-15
- Authors: Zeyang Wei, Lifei Wei, Ting Wang, Q. Richard Lu, Shuang Tian, Fei Hu Zhang, Yanfei Zhong
- DOI: 10.1038/s41597-025-06436-0
Research Groups
- Faculty of Resources and Environmental Science, Hubei University, Wuhan, China
- Hubei Key Laboratory of Regional Development and Environmental Response, Hubei University, Wuhan, China
- Hubei Spatial Planning Research Institute, Wuhan, China
- College of Geography and Environmental Science, Zhejiang Normal University, Jinhua, China
- State Key Laboratory of Information Engineering in Surveying Mapping and Remote Sensing, Wuhan University, Wuhan, China
Short Summary
This study generated a global, daily, seamless multilayer soil moisture dataset (SWSM) for 2002–2021 at 0.05° spatial resolution using an XGBoost machine learning approach, demonstrating high accuracy against in situ observations across three soil depths. The resulting dataset addresses the scarcity of continuous, high-resolution, deep soil moisture products and provides physically consistent insights into soil moisture controls.
Objective
- To generate a global, daily, seamless multilayer soil moisture dataset (SWSM) for 2002–2021 at 0.05° spatial resolution using XGBoost, addressing the scarcity of continuous, high-resolution soil moisture data for deeper soil horizons.
Study Configuration
- Spatial Scale: Global, 0.05° spatial resolution.
- Temporal Scale: Daily, 2002–2021 (20 years).
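To make the stated scales concrete, the following minimal sketch works out the grid dimensions and the number of daily time steps they imply. It assumes a full-globe grid (180°W to 180°E, 90°S to 90°N), coverage from 1 January 2002 through 31 December 2021, and a row/column layout with the origin at the north-west corner; none of these details is given in this summary, and `cell_index` is a hypothetical helper.

```python
from datetime import date

CELLS_PER_DEG = 20  # 1 / 0.05°, the spatial resolution stated above

# Assumption: the grid covers the whole globe.
N_COLS = 360 * CELLS_PER_DEG  # 7200 columns
N_ROWS = 180 * CELLS_PER_DEG  # 3600 rows

# Assumption: both end dates are included in the record.
N_DAYS = (date(2021, 12, 31) - date(2002, 1, 1)).days + 1  # 7305 days


def cell_index(lat, lon):
    """Map a coordinate to (row, col); row 0 starts at 90°N, col 0 at 180°W (assumed layout)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate outside the global grid")
    # Clamp so that the southern and eastern edges fall in the last cell.
    row = min(int((90.0 - lat) * CELLS_PER_DEG), N_ROWS - 1)
    col = min(int((lon + 180.0) * CELLS_PER_DEG), N_COLS - 1)
    return row, col
```

Under these assumptions each depth layer is a 3600 × 7200 array per day, with 7305 daily fields over the 20 years (five of which are leap years).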
Methodology and Data
- Models used: Extreme Gradient Boosting (XGBoost)
- Data sources:
  - Training Target: International Soil Moisture Network (ISMN) in situ observations (quality-controlled, daily averaged, depth-weighted interpolated, and normalized).
  - Input Features (Environmental Factors):
    - ERA5-Land soil moisture (primary variable)
    - ERA5-Land precipitation
    - ERA5-Land surface net solar radiation
    - GLASS Land Surface Temperature (LST)
    - GLASS Leaf Area Index (LAI)
    - MODIS/Terra+Aqua Land Cover Type (LULC)
    - Global Multi-Resolution Terrain Elevation Data 2010 (GMTED2010)
    - SoilGrids soil texture (clay, silt, sand content)
    - SoilGrids depth to bedrock (DTB)
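The summary notes that the ISMN training targets were depth-weighted before use but does not describe the scheme. One plausible reading, sketched below, averages sensor readings into each target layer with weights proportional to how much of the sensor's depth interval overlaps that layer. The function names and the `(top_cm, bottom_cm, value)` reading format are assumptions for illustration, not the authors' implementation.

```python
# Target layers in cm, taken from the three depth horizons of the dataset.
LAYERS_CM = [(0, 10), (10, 30), (30, 60)]


def overlap_cm(a_top, a_bottom, b_top, b_bottom):
    """Thickness (cm) shared by two depth intervals; 0 if they do not intersect."""
    return max(0.0, min(a_bottom, b_bottom) - max(a_top, b_top))


def layer_mean(readings, layer):
    """Overlap-weighted mean soil moisture for one layer.

    readings: list of (top_cm, bottom_cm, value) tuples for one station and day
              (hypothetical format).
    Returns None when no sensor overlaps the layer.
    """
    layer_top, layer_bottom = layer
    weight_sum = 0.0
    weighted_total = 0.0
    for top, bottom, value in readings:
        weight = overlap_cm(top, bottom, layer_top, layer_bottom)
        weight_sum += weight
        weighted_total += weight * value
    return weighted_total / weight_sum if weight_sum > 0 else None


def to_layers(readings):
    """Aggregate one station-day of readings into all three target layers."""
    return [layer_mean(readings, layer) for layer in LAYERS_CM]
```

For example, sensors covering 0–5 cm, 5–15 cm, and 15–40 cm with readings of 0.20, 0.30, and 0.40 m³/m³ give layer values of about 0.25, 0.375, and 0.40 m³/m³. Point sensors with a single nominal depth would need a different rule (for instance, assignment to the enclosing layer), which this sketch does not cover.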
Main Results
- The SWSM dataset provides global, daily, seamless multilayer soil moisture at 0.05° spatial resolution for three depth horizons: 0–10 cm, 10–30 cm, and 30–60 cm.
- Validation against in situ observations showed high accuracy at all depths: Pearson correlation coefficients (R) exceeded 0.90 (0.905 for 0–10 cm; 0.919 for both 10–30 cm and 30–60 cm), and root mean square errors (RMSE) were below 0.05 m³/m³ (0.047 m³/m³ for 0–10 cm, 0.046 m³/m³ for 10–30 cm, and 0.045 m³/m³ for 30–60 cm).
- Bias was practically zero at all depths, so the unbiased RMSE (ubRMSE) was effectively equal to the RMSE (since ubRMSE² = RMSE² − bias²).
- SWSM outperformed another machine learning product (SoMo.ml) and model-based products (GLDAS, GLEAM) across depths and soil textures in terms of median bias, R, and RMSE.
- Feature importance analysis (SHAP) confirmed physical consistency, revealing depth-dependent patterns where surface layers were dominated by ERA5-Land soil moisture and surface indicators (LST, LAI), while deeper layers increasingly reflected soil texture and depth to bedrock.
- Ablation analysis confirmed the fundamental contribution of ERA5-Land soil moisture and static soil properties, with surface remote-sensing indicators being more important for the top layer.
- Prediction Interval Coverage Probabilities (PICP) ranged from 93.8% to 94.5% with an average interval width of approximately 0.18, indicating reliable uncertainty representation.
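The validation statistics reported above follow standard definitions, which the sketch below spells out in plain Python. It is an illustrative implementation, not the authors' evaluation code; the function names are hypothetical, and the way the prediction intervals themselves are produced is not described in this summary, so the interval bounds are simply taken as inputs.

```python
import math


def validation_metrics(obs, pred):
    """Return bias, RMSE, ubRMSE, and Pearson R for paired observations and predictions."""
    n = len(obs)
    if n != len(pred) or n < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    bias = mean_pred - mean_obs
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)
    # ubRMSE^2 = RMSE^2 - bias^2; clamp at 0 to guard against rounding error.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    r = cov / math.sqrt(var_obs * var_pred)
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r}


def picp(obs, lower, upper):
    """Prediction Interval Coverage Probability: share of observations inside [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return inside / len(obs)


def mean_interval_width(lower, upper):
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
```

With these definitions, a prediction series whose errors cancel on average has zero bias and therefore identical RMSE and ubRMSE, which is the pattern reported for SWSM, whereas a constant offset contributes to RMSE but not to ubRMSE or R.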
Contributions
- Generation of a novel global, long-term (2002–2021), daily, seamless multilayer soil moisture dataset (SWSM) at a fine spatial resolution (0.05°), covering three distinct depths (0–10 cm, 10–30 cm, 30–60 cm).
- Development of an interpretable XGBoost-based framework that effectively integrates multi-source data for robust soil moisture estimation across depths.
- Demonstration of superior performance compared to existing global soil moisture products (SoMo.ml, GLDAS, GLEAM) in terms of accuracy and consistency.
- Provision of novel insights into depth-dependent controls on soil moisture variability through feature importance analysis, validating the physical realism of the dataset.
- Provision of a resource for hydrological modeling, agricultural water management, and climate change studies that addresses critical gaps in existing soil moisture products.
Funding
- National Natural Science Foundation of China (42271392)
- Opening Foundation of Xi’an Key Laboratory of Territorial Spatial Information (3001023545016)
- Open Fund of the Key Laboratory of Natural Resources Monitoring and Supervision in Southern Hilly Region, Ministry of Natural Resources (NRMSSHR2022Y02, NRMSSHR2023Y03)
Citation
@article{Wei2025global,
author = {Wei, Zeyang and Wei, Lifei and Wang, Ting and Lu, Q. Richard and Tian, Shuang and Zhang, Fei Hu and Zhong, Yanfei},
title = {A global long-term daily multilayer soil moisture dataset derived from machine learning},
journal = {Scientific Data},
year = {2025},
doi = {10.1038/s41597-025-06436-0},
url = {https://doi.org/10.1038/s41597-025-06436-0}
}
Original Source: https://doi.org/10.1038/s41597-025-06436-0