Xue et al. (2026) Gross Primary Production (GPP) for China from 2001–2020 Estimated by Machine Learning Methods

Identification

Journal: Mendeley Data
Year: 2026
Date: 2026-04-10
Authors: Yayong Xue, Xiaojian Zhao
DOI: 10.17632/jzxjxyyp6z.2

Research Groups

Xinjiang University

Short Summary

This study evaluated five existing Gross Primary Production (GPP) products and five machine learning methods to generate a high-fidelity GPP dataset for China from 2001–2020, identifying Categorical Boosting as the best-performing method.

Objective

To comprehensively evaluate existing GPP products and machine learning methods to generate a high-fidelity GPP dataset for China.

Study Configuration

Spatial Scale: Mainland China, with a spatial resolution of 0.05 degrees.
Temporal Scale: 2001–2020.

Methodology and Data

Models used: Categorical Boosting (CatBoost), Support Vector Machine Regression, Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), and Random Forest. The Bayesian Three-Cornered Hat method was used to quantify the uncertainty of the existing GPP products.
Data sources:
- Five mainstream GPP products: MODIS (Moderate Resolution Imaging Spectroradiometer), PML-V2 (Penman-Monteith-Leuning Version 2), GOSIF (Global Orbiting Carbon Observatory-2 based Solar-Induced chlorophyll Fluorescence), CEDAR (sCaling Ecosystem Dynamics with ARtificial intelligence), and TL-LUE (Two-Leaf Light Use Efficiency).
- Flux tower observations from 15 sites in mainland China.
- Multi-source data, integrated as inputs to the machine learning models.
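
As a rough illustration of the idea behind three-cornered hat uncertainty estimation, the sketch below implements the classical (non-Bayesian) three-cornered hat for three collocated series. It is not the Bayesian variant used in this study, whose weighting details are not given here. The function name `three_cornered_hat` and its inputs are hypothetical, and the estimate assumes that the errors of the three products are mutually uncorrelated.

```python
from statistics import pvariance


def three_cornered_hat(a, b, c):
    """Estimate the error variance of three collocated series.

    Classical three-cornered hat: the true signal cancels in pairwise
    differences, so with mutually uncorrelated errors
    Var(a - b) = Var(err_a) + Var(err_b), and likewise for the other pairs.
    Solving the three equations gives one error variance per series.
    """
    if not (len(a) == len(b) == len(c)) or len(a) < 2:
        raise ValueError("series must have equal length of at least 2")

    def diff_var(x, y):
        # Population variance of the pairwise difference series.
        return pvariance([xi - yi for xi, yi in zip(x, y)])

    v_ab = diff_var(a, b)
    v_ac = diff_var(a, c)
    v_bc = diff_var(b, c)

    var_a = (v_ab + v_ac - v_bc) / 2
    var_b = (v_ab + v_bc - v_ac) / 2
    var_c = (v_ac + v_bc - v_ab) / 2
    return var_a, var_b, var_c
```

Because the estimate relies only on differences between products, no ground-truth series is required. When the errors are in fact correlated, the estimates can be biased or even negative.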

Main Results

Five mainstream GPP products were comprehensively evaluated, and their uncertainties quantified using the Bayesian Three-Cornered Hat method.
Five machine learning methods were trained and compared for GPP estimation.
Categorical Boosting (CatBoost) demonstrated the best performance among the tested machine learning methods, exhibiting the lowest Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), and the highest coefficient of determination (R²) when validated against flux tower data from 15 sites.
A high-fidelity GPP dataset for China from 2001–2020 was generated using the CatBoost method at a 0.05-degree spatial resolution.
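
The three validation metrics named above can be computed as in the following minimal sketch. It assumes R² is the coefficient of determination defined as 1 - SS_res / SS_tot (the source does not state which definition was used). The function name `regression_metrics` and any input values are illustrative and are not taken from the dataset.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE and R^2 for paired observed/predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]

    # Sum of squared residuals, shared by RMSE and R^2.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n

    # Total sum of squares around the observed mean.
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observations are constant")
    r2 = 1 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "r2": r2}
```

Applied to each candidate model's predictions at the flux tower sites, such a function would yield the values on which a ranking like the one described here is based: lower RMSE and MAE and higher R² indicate a better fit.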

Contributions

Comprehensive evaluation and uncertainty quantification of multiple mainstream GPP products for China.
Generation of a new high-fidelity GPP dataset for China (2001–2020) using advanced machine learning techniques.
Identification of Categorical Boosting as the best-performing of the five tested machine learning methods for GPP estimation in China.

Funding

National Natural Science Foundation of China, Beijing (Grant ID: 42301127)

Citation

@article{Xue2026Gross,
  author = {Xue, Yayong and Zhao, Xiaojian},
  title = {Gross Primary Production (GPP) for China from 2001–2020 Estimated by Machine Learning Methods},
  journal = {Mendeley Data},
  year = {2026},
  doi = {10.17632/jzxjxyyp6z.2},
  url = {https://doi.org/10.17632/jzxjxyyp6z.2}
}

Original Source: https://doi.org/10.17632/jzxjxyyp6z.2