Xue (2026) Gross Primary Production (GPP) of Vegetation Calculated by Machine Learning
Identification
- Journal: Mendeley Data
- Year: 2026
- Date: 2026-04-08
- Authors: Yayong Xue
- DOI: 10.17632/jzxjxyyp6z
Research Groups
- Xinjiang University, Ürümqi, China
Short Summary
This study evaluates five mainstream Gross Primary Production (GPP) products against flux tower observations and quantifies their uncertainties. It then generates a high-fidelity GPP dataset for mainland China by integrating multi-source data with five machine learning methods, of which Categorical Boosting (CatBoost) performed best.
Objective
- To comprehensively evaluate the performance and quantify the uncertainties of five mainstream GPP products.
- To generate a high-fidelity GPP dataset for mainland China by integrating multi-source data using machine learning methods.
Study Configuration
- Spatial Scale: Mainland China, utilizing data from 15 flux tower sites across various ecosystems.
- Temporal Scale: Not explicitly stated for the generated dataset; the analysis draws on continuous flux tower observations and existing GPP products.
Methodology and Data
- Models used:
  - GPP products: MODIS (Moderate Resolution Imaging Spectroradiometer), PML-V2 (Penman-Monteith-Leuning Version 2), GOSIF (Global OCO-2 (Orbiting Carbon Observatory-2) based Solar-Induced chlorophyll Fluorescence), CEDAR (upsCaling Ecosystem Dynamics with ARtificial intelligence), TL-LUE (Two-Leaf Light Use Efficiency).
  - Uncertainty quantification: Bayesian Three-Cornered Hat method.
  - Machine learning methods: Categorical Boosting (CatBoost), Support Vector Machine Regression, Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), Random Forest.
- Data sources:
  - Flux tower observations from 15 sites in mainland China (Changbaishan, Qianyanzhou, Yanshan, Maoershan, Huzhong, Dangxiong, Haibei, Inner Mongolia, Ruoergai, Naqu, Damao, Yucheng, Jinzhou, Changlin, Panjin).
  - Multi-source data, including common remote sensing products (e.g., MODIS, GOSIF).
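The uncertainty step listed above uses a Bayesian Three-Cornered Hat method, whose details are not given in this record. As a rough illustration of the underlying idea only, the sketch below implements the classical (non-Bayesian) three-cornered hat for three products, which estimates each product's error variance from pairwise differences under the assumption that the errors are mutually uncorrelated. The function name, the synthetic "truth" series, and the noise levels are all hypothetical and are not taken from the dataset.

```python
import math
import random
from statistics import pvariance


def three_cornered_hat(a, b, c):
    """Classical three-cornered hat for three collocated series.

    Returns the estimated error variance of a, b and c, assuming the
    errors of the three series are mutually uncorrelated.
    """
    var_ab = pvariance([x - y for x, y in zip(a, b)])
    var_ac = pvariance([x - y for x, y in zip(a, c)])
    var_bc = pvariance([x - y for x, y in zip(b, c)])
    return (
        0.5 * (var_ab + var_ac - var_bc),  # error variance of a
        0.5 * (var_ab + var_bc - var_ac),  # error variance of b
        0.5 * (var_ac + var_bc - var_ab),  # error variance of c
    )


# Synthetic example: a seasonal "true" signal plus independent Gaussian noise.
# The noise standard deviations are made-up values for illustration.
rng = random.Random(0)
truth = [5.0 + 3.0 * math.sin(2.0 * math.pi * t / 365.0) for t in range(5000)]
noise_sd = (0.5, 1.0, 1.5)
products = [[v + rng.gauss(0.0, sd) for v in truth] for sd in noise_sd]

estimated_var = three_cornered_hat(*products)
for sd, est in zip(noise_sd, estimated_var):
    print(f"true error variance {sd ** 2:.2f}, estimated {est:.2f}")
```

The true signal cancels in every pairwise difference, so no reference observations are needed. With five products, as in this study, a generalized formulation would be required; the three-product case is shown here only because it has a closed form.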
Main Results
- Categorical Boosting (CatBoost) exhibited the best performance among the five machine learning methods, demonstrating the lowest Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), and the highest coefficient of determination (R²) when validated against flux tower data.
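For readers unfamiliar with the three validation metrics named above, the following minimal sketch shows how they are commonly computed and how candidate models could be ranked by them. It assumes R² is defined as 1 - SS_res / SS_tot (the record does not say which definition was used), and the observation and prediction values, as well as the names `model_a` and `model_b`, are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean Absolute Error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical tower GPP observations and two hypothetical model outputs.
obs = [2.0, 4.0, 6.0, 8.0]
predictions = {
    "model_a": [2.5, 3.5, 6.5, 7.5],
    "model_b": [3.0, 3.0, 7.0, 7.0],
}

scores = {
    name: {"rmse": rmse(obs, pred), "mae": mae(obs, pred), "r2": r2(obs, pred)}
    for name, pred in predictions.items()
}

# Rank by lowest RMSE; the best model should also have the lowest MAE
# and the highest R² if the three metrics agree.
best = min(scores, key=lambda name: scores[name]["rmse"])
print(best, scores[best])
```

In this toy case all three metrics agree on the ranking, which mirrors the situation reported for CatBoost; in practice the metrics can disagree, and the selection criterion should then be stated explicitly.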
Contributions
- Provides a comprehensive evaluation and uncertainty quantification of five widely used GPP products.
- Generates a new high-fidelity GPP dataset for mainland China by integrating multi-source data with advanced machine learning techniques.
- Identifies Categorical Boosting as a superior machine learning method for GPP estimation in the studied region.
Funding
- National Natural Science Foundation of China (Grant ID: 42301127)
Citation
@article{Xue2026Gross,
author = {Xue, Yayong},
title = {Gross Primary Production (GPP) of Vegetation Calculated by Machine Learning},
journal = {Mendeley Data},
year = {2026},
doi = {10.17632/jzxjxyyp6z},
url = {https://doi.org/10.17632/jzxjxyyp6z}
}
Original Source: https://doi.org/10.17632/jzxjxyyp6z