Xue et al. (2026) Gross Primary Production (GPP) for China from 2001–2020 Estimated by Machine Learning Methods
Identification
- Journal: Mendeley Data
- Year: 2026
- Date: 2026-04-10
- Authors: Yayong Xue, Xiaojian Zhao
- DOI: 10.17632/jzxjxyyp6z.2
Research Groups
- Xinjiang University
Short Summary
This study evaluated five existing Gross Primary Production (GPP) products and five machine learning methods to generate a high-fidelity GPP dataset for China from 2001–2020, identifying Categorical Boosting as the best-performing method.
Objective
- To comprehensively evaluate existing GPP products and machine learning methods to generate a high-fidelity GPP dataset for China.
Study Configuration
- Spatial Scale: Mainland China, with a spatial resolution of 0.05 degrees.
- Temporal Scale: 2001–2020.
Methodology and Data
- Models used: five machine learning regressors (Categorical Boosting (CatBoost), Support Vector Machine Regression, Light Gradient Boosting, Extreme Gradient Boosting, and Random Forest), plus the Bayesian Three-Cornered Hat method for quantifying the uncertainty of the existing GPP products.
- Data sources:
  - Five mainstream GPP products: MODIS (Moderate Resolution Imaging Spectroradiometer), PML-V2 (Penman-Monteith-Leuning Version 2), GOSIF (Global Orbiting Carbon Observatory-2 based Solar-Induced chlorophyll Fluorescence), CEDAR (upsCaling Ecosystem Dynamics with ARtificial intelligence), and TL-LUE (Two-Leaf Light Use Efficiency).
  - Flux tower observations from 15 sites in mainland China.
  - Multi-source data (integrated for machine learning).
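The uncertainty quantification step listed above can be illustrated with the classical three-cornered hat idea, which estimates each product's error variance from the variances of pairwise differences, without needing a reference truth. This is a minimal sketch of the classical (non-Bayesian) form for three collocated series only, not the Bayesian variant used in the study. It assumes the products' errors are mutually uncorrelated; the function name `three_cornered_hat` and the synthetic series are hypothetical.

```python
import random
from statistics import variance


def three_cornered_hat(a, b, c):
    """Estimate the error variance of three collocated series.

    Classical three-cornered hat, assuming mutually uncorrelated errors:
        var_a = 0.5 * (Var(a - b) + Var(a - c) - Var(b - c)), and so on.
    Estimates can come out negative for short or correlated series.
    """
    if not (len(a) == len(b) == len(c)) or len(a) < 2:
        raise ValueError("series must have equal length of at least 2")

    def diff_var(x, y):
        # Sample variance of the pairwise difference series.
        return variance([xi - yi for xi, yi in zip(x, y)])

    v_ab, v_ac, v_bc = diff_var(a, b), diff_var(a, c), diff_var(b, c)
    return (
        0.5 * (v_ab + v_ac - v_bc),
        0.5 * (v_ab + v_bc - v_ac),
        0.5 * (v_ac + v_bc - v_ab),
    )


# Synthetic demonstration (hypothetical values, not data from the study):
# one shared signal plus independent noise of different magnitude per "product".
rng = random.Random(0)
signal = [rng.uniform(0.0, 10.0) for _ in range(5000)]
product_a = [s + rng.gauss(0.0, 0.5) for s in signal]
product_b = [s + rng.gauss(0.0, 1.0) for s in signal]
product_c = [s + rng.gauss(0.0, 2.0) for s in signal]
print(three_cornered_hat(product_a, product_b, product_c))
```

Because the shared signal cancels in every pairwise difference, the estimates depend only on the noise terms, which is why no flux tower reference is needed for this step.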
Main Results
- Five mainstream GPP products were comprehensively evaluated, and their uncertainties quantified using the Bayesian Three-Cornered Hat method.
- Five machine learning methods were trained and compared for GPP estimation.
- Categorical Boosting (CatBoost) demonstrated the best performance among the tested machine learning methods, exhibiting the lowest Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), and the highest coefficient of determination (R²) when validated against flux tower data from 15 sites.
- A high-fidelity GPP dataset for China from 2001–2020 was generated using the CatBoost method at a 0.05-degree spatial resolution.
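The three validation metrics named above (RMSE, MAE, and R²) can be computed from paired observed and predicted GPP values as sketched below. This is an illustrative helper, not the authors' code: the function names `regression_metrics` and `rank_models` are hypothetical, the sample numbers are made up, and R² is assumed here to mean 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation).

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, and R² for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R² is undefined")
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,  # assumption: coefficient of determination
    }


def rank_models(observed, predictions_by_model):
    """Return model names ordered from lowest to highest RMSE."""
    scores = {
        name: regression_metrics(observed, predicted)
        for name, predicted in predictions_by_model.items()
    }
    return sorted(scores, key=lambda name: scores[name]["rmse"])


# Hypothetical tower observations and model outputs, for illustration only.
tower_gpp = [2.1, 4.8, 7.9, 5.2, 1.0]
candidates = {
    "model_a": [2.0, 5.0, 7.5, 5.5, 1.2],
    "model_b": [3.0, 4.0, 6.5, 6.5, 2.0],
}
for model_name in rank_models(tower_gpp, candidates):
    print(model_name, regression_metrics(tower_gpp, candidates[model_name]))
```

Ranking by RMSE alone is a simplification. The summary reports that CatBoost led on all three metrics, so any of them would give the same ordering for the top model in that case.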
Contributions
- Comprehensive evaluation and uncertainty quantification of multiple mainstream GPP products for China.
- Generation of a new high-fidelity GPP dataset for China (2001–2020) using advanced machine learning techniques.
- Identification of Categorical Boosting as a superior machine learning method for GPP estimation in the Chinese context.
Funding
- National Natural Science Foundation of China, Beijing (Grant ID: 42301127)
Citation
@article{Xue2026Gross,
author = {Xue, Yayong and Zhao, Xiaojian},
title = {Gross Primary Production (GPP) for China from 2001–2020 Estimated by Machine Learning Methods},
journal = {Mendeley Data},
year = {2026},
doi = {10.17632/jzxjxyyp6z.2},
url = {https://doi.org/10.17632/jzxjxyyp6z.2}
}
Original Source: https://doi.org/10.17632/jzxjxyyp6z.2