Hengl et al. (2026) OpenLandMap-soildb: global soil information at 30 m spatial resolution for 2000–2022+ based on spatiotemporal Machine Learning and harmonized legacy soil samples and observations
Identification
- Journal: Earth System Science Data
- Year: 2026
- Date: 2026-02-06
- Authors: Tomislav Hengl, Davide Consoli, Xuemeng Tian, Travis Nauman, Madlene Nussbaum, Mustafa Serkan Isik, Leandro Parente, Yu-Feng Ho, Rolf Simoes, Surya Gupta, Alessandro Samuel‐Rosa, Taciara Zborowski Horst, José Lucas Safanelli, Nancy L. Harris
- DOI: 10.5194/essd-18-989-2026
Research Groups
- OpenGeoHub Foundation, Doorwerth, the Netherlands
- University of Utrecht, Utrecht, the Netherlands
- Department of Environmental Sciences, University of Basel, Basel, Switzerland
- Universidade Tecnológica Federal do Paraná, Santa Helena and Dois Vizinhos, Paraná, Brazil
- Woodwell Climate Research Center, Falmouth, MA, USA
- World Resources Institute, Washington DC, USA
Short Summary
This paper presents OpenLandMap-soildb, a global dataset providing dynamic predictions of key soil properties and types at 30 m resolution for 2000–2022+ using spatiotemporal Machine Learning. It reveals a global loss of at least 11 Pg of soil organic carbon in the topsoil over the past 25 years, primarily due to deforestation.
Objective
- To assess if Landsat 30 m resolution images improve prediction accuracy and identify key Landsat-derived biophysical indices for soil mapping.
- To evaluate the expected prediction error of global models at unvisited locations compared to observed values.
- To identify key drivers of changes in soil organic carbon (SOC) and pH, particularly the impact of converting tropical forests to croplands and pasturelands over 20–30 years.
- To identify the world's remaining hotspots of SOC stocks.
Study Configuration
- Spatial Scale: Global land mask at 30 m resolution, with some outputs also at 120 m resolution. Predictions are aggregated into 2x2 spatio-temporal blocks.
- Temporal Scale: 2000–2022+ (with 2023 and 2024 under production). Predictions are generated for 5-year time intervals (e.g., 2000–2005, 2005–2010, etc.) and specific years (2000, 2005, 2010, 2015, 2020, 2022). Soil depths include 0–30 cm, 30–60 cm, and 60–100 cm.
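As a rough illustration of how the 30 m grid and the three depth intervals combine, the sketch below converts volumetric SOC density (kg m⁻³) into a stock per pixel by multiplying by layer thickness and pixel area. The density values, the constant 900 m² pixel area, and all function names are hypothetical and chosen for illustration only; they are not taken from the paper or its code base, and a real workflow would need the true per-pixel area of the grid in use.

```python
# Minimal sketch: SOC density (kg per m^3) -> SOC stock per 30 m pixel.
# All names and numbers here are illustrative assumptions.

# Thickness in metres of the three standard depth intervals.
DEPTH_INTERVALS_M = {"0-30 cm": 0.30, "30-60 cm": 0.30, "60-100 cm": 0.40}

# Nominal area of a 30 m x 30 m pixel; ignores projection distortion.
PIXEL_AREA_M2 = 30.0 * 30.0

# 1 Pg = 1e15 g = 1e12 kg.
KG_PER_PG = 1e12


def stock_kg_per_m2(density_kg_m3: float, thickness_m: float) -> float:
    """Areal stock of one layer: density times layer thickness."""
    return density_kg_m3 * thickness_m


def pixel_stock_kg(densities: dict) -> float:
    """Sum the stock of all depth intervals for a single pixel, in kg."""
    return sum(
        stock_kg_per_m2(densities[name], thickness) * PIXEL_AREA_M2
        for name, thickness in DEPTH_INTERVALS_M.items()
    )


# Hypothetical densities for one pixel (kg per m^3), decreasing with depth.
example_densities = {"0-30 cm": 25.0, "30-60 cm": 12.0, "60-100 cm": 6.0}

total_kg = pixel_stock_kg(example_densities)
total_pg = total_kg / KG_PER_PG
print(f"0-100 cm stock for this pixel: {total_kg:.0f} kg ({total_pg:.3e} Pg)")
```

Summing such per-pixel stocks over the land mask is the basic arithmetic behind a global total in Pg; the paper's own aggregation procedure may differ in detail.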
Methodology and Data
- Models used: Spatiotemporal Machine Learning, specifically Quantile Regression Random Forest (QRRF), Random Forest (RF), and LightGBM. The framework is named "EO-SoilMapper" and implemented using the scikit-map library for Python.
- Data sources:
  - Training Data: A large compilation of harmonized and quality-controlled legacy soil samples and observations (e.g., 216,000 for SOC density, 408,000 for SOC content, 272,000 for pH, 363,000 for texture, 134,000 for bulk density, 332,000 for soil types). This includes national/regional monitoring networks (e.g., LUCAS soil, NCSS Soil Characterization Database), compiled international databases (e.g., Africa Soil Profile Database, WoSIS), and citizen science data (LandPKS app). Pseudo-observations were also incorporated to represent extreme conditions.
  - Covariate Layers: Over 160 TB of data, including:
    - Landsat bimonthly and annual global composites (ARD V2) and derived biophysical indices (30 m resolution).
    - 6-scale Digital Terrain Model (DTM) relief parameters (30 m to 960 m, resampled to 30 m).
    - CHELSA Climate time-series of climatic and bioclimatic variables v2.1 (1 km resolution).
    - MODIS Land Surface Temperature MOD11A2 and Water Vapor data sets MCD19A2 (1 km resolution).
    - Additional layers such as peatland extent, bare rock extent, forest/wetland/crop cover, World Karst Aquifer Map, sediment types, bare soil/photosynthetically active vegetation fractions, global water extent probability, snow probability, soil salinity grade, Global Soil Bioclimatic variables, geometric temperature, landform class, and MERIT Hydro upstream area.
- Harmonization: Soil organic carbon values were corrected to the Dry Combustion (DC) method (ISO 10694:1995). Bulk density and rock fragments were also harmonized. Soil texture fractions were transformed using a modified additive log-ratio (ALR) transform.
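To make the texture transformation concrete, the sketch below shows the standard additive log-ratio (ALR) transform and its inverse for clay, silt, and sand percentages. It is not the paper's modified variant, whose details are not given in this summary. The choice of sand as the reference component, the small `eps` floor used to avoid taking the logarithm of zero, and the function names are all assumptions made for illustration.

```python
import math

# Minimal sketch of the *standard* ALR transform for soil texture.
# Assumptions: sand is the reference part; eps guards against zero fractions.


def alr(clay: float, silt: float, sand: float, eps: float = 0.01):
    """Map three texture fractions to two unconstrained ALR coordinates."""
    # Floor tiny or zero values, then re-close the composition to sum to 1.
    parts = [max(p, eps) for p in (clay, silt, sand)]
    total = sum(parts)
    clay_c, silt_c, sand_c = (p / total for p in parts)
    return math.log(clay_c / sand_c), math.log(silt_c / sand_c)


def alr_inverse(z_clay: float, z_silt: float):
    """Map two ALR coordinates back to percentages that sum to 100."""
    e_clay, e_silt = math.exp(z_clay), math.exp(z_silt)
    denom = 1.0 + e_clay + e_silt
    return 100.0 * e_clay / denom, 100.0 * e_silt / denom, 100.0 / denom


# Round trip for a loam-like sample (percent clay, silt, sand).
z = alr(20.0, 30.0, 50.0)
clay, silt, sand = alr_inverse(*z)
print(z, (clay, silt, sand))
```

Modelling the two ALR coordinates instead of the raw fractions means that back-transformed predictions are always positive and always sum to 100%, which independent models for clay, silt, and sand would not guarantee.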
Main Results
- Prediction Accuracy:
  - Soil Organic Carbon (SOC) density: Root Mean Square Error (RMSE) of 17.7 kg m⁻³ (0.486 in log-scale), Concordance Correlation Coefficient (CCC) of 0.88.
  - SOC content: RMSE of 51.3 g kg⁻¹ (0.574 in log-scale), CCC of 0.87.
  - Bulk density: RMSE of 0.15 t m⁻³, CCC of 0.92.
  - Soil pH: RMSE of 0.51, CCC of 0.91.
  - Soil texture (clay, silt, sand): RMSEs of 8.4%, 9.9%, and 12.6% respectively, with CCCs ranging from 0.84 to 0.87.
  - Prediction Interval Coverage Probability (PICP) for 68% intervals ranged from 38% (bulk density) to 67% (SOC content).
- Key Drivers:
  - For SOC density, soil depth, Landsat-derived Gross Primary Productivity (GPP), Normalized Difference Vegetation Index (NDVI), and CHELSA bioclimatic indices were most important.
  - For soil pH, the CHELSA Aridity Index, annual precipitation, and salinity grade were primary explanatory variables.
  - For soil types, elevation and CHELSA climate variables were dominant.
- Global Estimates and Trends:
  - Global SOC stocks for the 0–30 cm depth interval (2020–2022+) are estimated at 461 Pg, with a 68% probability range of 239–890 Pg.
  - Total SOC stocks for 0–1 m depth are estimated at 1037 Pg.
  - The world has lost at least 11 Pg of SOC in the topsoil (0–30 cm) between 2000 and 2022+.
  - SOC loss is primarily driven by deforestation and peatland removal, particularly the conversion of tropical forests to croplands and pasturelands.
  - Soil pH shows a slight trend toward acidification across land cover change classes.
  - Remaining global SOC hotspots are identified in boreal peatlands (Canada and Russian Federation) and tropical peatlands.
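The accuracy figures above rest on three metrics: RMSE, Lin's concordance correlation coefficient (CCC), and the prediction interval coverage probability (PICP). The sketch below computes each from paired observed and predicted values using their textbook definitions. The toy data and function names are hypothetical, and the paper's own evaluation code (including its cross-validation design and any log transformation) may differ.

```python
import math

# Minimal sketch of RMSE, Lin's CCC, and PICP using textbook definitions.
# Data and names are illustrative assumptions, not the paper's code.


def rmse(obs, pred):
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def ccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    # Penalizes both scatter and systematic shift away from the 1:1 line.
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def picp(obs, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return inside / len(obs)


# Toy example with four validation points and a fixed-width interval.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
lower = [p - 0.4 for p in predicted]
upper = [p + 0.4 for p in predicted]

print(rmse(observed, predicted))
print(ccc(observed, predicted))
print(picp(observed, lower, upper))
```

For a nominal 68% interval, a PICP close to 0.68 indicates well-calibrated uncertainty, while a markedly lower value, such as the 38% reported for bulk density, suggests that the intervals are too narrow for that variable.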
Contributions
- Generation of the first global, dynamic soil property maps at 30 m spatial resolution for 2000–2022+, including quantified per-pixel uncertainty, enabling detailed monitoring of soil changes over time.
- Development of OpenLandMap-soildb, a comprehensive and consistent dataset for key soil properties (SOC content, SOC density, pH, texture, bulk density) and USDA soil taxonomy subgroups, surpassing previous efforts in spatial detail and accuracy.
- Implementation of an open, high-performance computing framework (EO-SoilMapper) that integrates extensive Earth Observation time series, climatic, terrain, and human impact variables with harmonized legacy soil data and pseudo-observations.
- Provision of quantitative estimates of global SOC stocks and a direct, data-driven assessment of significant SOC loss (11 Pg in topsoil) over the past 25 years, without relying on process-based assumptions.
- Release of all data products and code under open licenses, fostering transparency, reproducibility, and community engagement for continuous improvement in global soil mapping.
Funding
- Land & Carbon Lab grant from the Bezos Earth Fund.
- Open-Earth-Monitor Cyberinfrastructure project (European Union’s Horizon Europe research and innovation programme, grant agreement No. 101059548).
- AI4SoilHealth project (European Union’s Horizon Europe research and innovation programme, grant agreement No. 101086179).
- NRCS (award NR243A750023C026) and LLNL-LDRD Program (Project No. 24-SI-002, DOE Contract DE-AC52-07NA27344) for JLS.
Citation
@article{Hengl2026OpenLandMapsoildb,
author = {Hengl, Tomislav and Consoli, Davide and Tian, Xuemeng and Nauman, Travis and Nussbaum, Madlene and Isik, Mustafa Serkan and Parente, Leandro and Ho, Yu-Feng and Simoes, Rolf and Gupta, Surya and Samuel‐Rosa, Alessandro and Horst, Taciara Zborowski and Safanelli, José Lucas and Harris, Nancy L.},
title = {OpenLandMap-soildb: global soil information at 30 m spatial resolution for 2000–2022+ based on spatiotemporal Machine Learning and harmonized legacy soil samples and observations},
journal = {Earth System Science Data},
year = {2026},
doi = {10.5194/essd-18-989-2026},
url = {https://doi.org/10.5194/essd-18-989-2026}
}
Original Source: https://doi.org/10.5194/essd-18-989-2026