Hasan (2025) Global Synthetic Crop Yield, Meteorological, and Climate Teleconnection Dataset for Machine Learning Benchmarking
Identification
- Journal: Mendeley Data
- Year: 2025
- Date: 2025-12-05
- Authors: Hasan, Raza
- DOI: 10.17632/y7hkz2zfcc.1
Research Groups
- Raza Hasan (Contributor)
- Elsevier Inc. (Publisher/Host)
Short Summary
This data publication presents a high-fidelity synthetic dataset of global crop yield, local meteorological conditions, and large-scale climate teleconnection indices from 1990 to 2023. The dataset was generated to benchmark machine learning architectures, particularly Spatial-Temporal Graph Neural Networks (ST-GNNs), and it explicitly models physical correlations between global climate drivers and regional weather patterns.
Objective
- To generate a synthetic dataset that explicitly models the physical correlations between global climate drivers (e.g., ENSO, NAO) and regional weather patterns, making it suitable for benchmarking Spatial-Temporal Graph Neural Networks (ST-GNNs) and architectures like HESE-GNN-CP.
Study Configuration
- Spatial Scale: Global, encompassing 15 distinct regions.
- Temporal Scale: 34 years, from 1990 to 2023 inclusive.
Methodology and Data
- Models used:
  - A physics-informed Python simulation script (GlobalSyntheticDataGenerator) was used to generate the synthetic data.
  - The dataset is designed for benchmarking machine learning architectures, specifically HESE-GNN-CP and Spatial-Temporal Graph Neural Networks (ST-GNNs).
- Data sources: The dataset is entirely synthetic, generated through a multi-step simulation process:
  - Global Signal Generation: Sinusoidal functions with added Gaussian noise simulated periodic climate phenomena (e.g., ENSO, NAO cycles of approximately 5–7 years).
  - Teleconnection Coupling: Each of the 15 regions was assigned a "Coupling Coefficient" based on real-world atmospheric physics to link global signals to regional conditions.
  - Local Weather Simulation: Local temperature and rainfall were generated by modulating a region's baseline climate (determined by Latitude/Longitude centroids) with the weighted global signals.
  - Yield Calculation: Final crop yield was computed using a non-linear biological stress function: Yield = BaseYield × TempStress × WaterStress × TechTrend + Noise.
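The four generation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's GlobalSyntheticDataGenerator script: the function name `simulate_synthetic_yields`, the annual time step, the 5-year and 7-year signal periods, the randomly drawn coupling coefficients and latitudes, the shapes of the two stress curves, and every numeric constant are hypothetical choices made only to show how the steps fit together.

```python
import numpy as np


def simulate_synthetic_yields(n_regions=15, start_year=1990, end_year=2023, seed=0):
    """Hypothetical sketch of the four-step pipeline (not the original script)."""
    rng = np.random.default_rng(seed)
    years = np.arange(start_year, end_year + 1)
    t = years - start_year                      # elapsed years, 0..33
    shape = (n_regions, years.size)

    # Step 1: global signals as sinusoids plus Gaussian noise.
    # Periods of 5 and 7 years are assumed values within the stated 5-7 range.
    enso = np.sin(2 * np.pi * t / 5.0) + rng.normal(0.0, 0.2, years.size)
    nao = np.sin(2 * np.pi * t / 7.0) + rng.normal(0.0, 0.2, years.size)

    # Step 2: one coupling coefficient per region and per signal.
    # The source derives these from atmospheric physics; here they are random.
    k_enso = rng.uniform(-1.0, 1.0, n_regions)
    k_nao = rng.uniform(-1.0, 1.0, n_regions)
    signal = k_enso[:, None] * enso[None, :] + k_nao[:, None] * nao[None, :]

    # Step 3: local weather = baseline climate modulated by the weighted signals.
    # The baseline temperature is an assumed linear function of |latitude|.
    lat = rng.uniform(-45.0, 60.0, n_regions)
    base_temp = 28.0 - 0.3 * np.abs(lat)              # degrees Celsius
    base_rain = rng.uniform(400.0, 1400.0, n_regions)  # mm per year
    temp = base_temp[:, None] + 1.5 * signal + rng.normal(0.0, 0.5, shape)
    rain = base_rain[:, None] * (1.0 + 0.2 * signal) + rng.normal(0.0, 50.0, shape)
    rain = np.clip(rain, 0.0, None)                   # rainfall cannot be negative

    # Step 4: Yield = BaseYield * TempStress * WaterStress * TechTrend + Noise.
    temp_stress = np.exp(-((temp - 22.0) / 10.0) ** 2)  # assumed optimum at 22 C
    water_stress = 1.0 - np.exp(-rain / 500.0)          # assumed saturating response
    tech_trend = 1.0 + 0.01 * t                         # assumed 1% gain per year
    base_yield = 5.0                                    # assumed, tonnes per hectare
    crop_yield = (base_yield * temp_stress * water_stress * tech_trend[None, :]
                  + rng.normal(0.0, 0.1, shape))
    crop_yield = np.clip(crop_yield, 0.0, None)

    return {"years": years, "enso": enso, "nao": nao,
            "temp": temp, "rain": rain, "crop_yield": crop_yield}


if __name__ == "__main__":
    data = simulate_synthetic_yields()
    print(data["crop_yield"].shape)  # (regions, years)
```

Each array is indexed as (region, year), so a region's yield series shares the same global signals as every other region but responds to them through its own coupling coefficients. That shared-driver structure is what the dataset is meant to expose to spatial-temporal models.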
Main Results
- The primary result is the dataset itself: a high-fidelity synthetic record of global agricultural production, local meteorological conditions, and large-scale climate teleconnection indices for the period 1990–2023.
- This dataset explicitly incorporates physical correlations between major global climate drivers (ENSO, NAO) and regional weather patterns, making it uniquely suited for testing and benchmarking Spatial-Temporal Graph Neural Networks (ST-GNNs) and similar machine learning architectures.
Contributions
- The dataset's original value lies in its explicit modeling of physical correlations between global climate drivers and regional weather patterns, a feature often lacking in standard agricultural datasets. This makes it an ideal benchmark for machine learning models, particularly ST-GNNs, designed to capture long-distance climate links (teleconnections).
Funding
- Not specified in the provided text.
Citation
@article{Hasan2025Global,
author = {Hasan, Raza},
title = {Global Synthetic Crop Yield, Meteorological, and Climate Teleconnection Dataset for Machine Learning Benchmarking},
journal = {Mendeley Data},
year = {2025},
doi = {10.17632/y7hkz2zfcc.1},
url = {https://doi.org/10.17632/y7hkz2zfcc.1}
}
Original Source: https://doi.org/10.17632/y7hkz2zfcc.1