Hasan (2025) Global Synthetic Crop Yield, Meteorological, and Climate Teleconnection Dataset for Machine Learning Benchmarking
Identification
- Journal: Mendeley Data
- Year: 2025
- Date: 2025-12-05
- Authors: Hasan, Raza
- DOI: 10.17632/y7hkz2zfcc
Research Groups
- Raza Hasan
Short Summary
This data publication introduces a synthetic dataset covering 1990–2023, designed to benchmark how well machine learning models, particularly Spatial-Temporal Graph Neural Networks (ST-GNNs), capture the impact of global climate teleconnections on regional weather and crop yields.
Objective
- To generate a high-fidelity synthetic dataset that explicitly models global climate teleconnections (e.g., the El Niño-Southern Oscillation, ENSO, and the North Atlantic Oscillation, NAO) and their influence on regional meteorological conditions and crop yields, so that Spatial-Temporal Graph Neural Networks (ST-GNNs) can be benchmarked on how well they capture these relationships.
Study Configuration
- Spatial Scale: Global, with data generated for 15 distinct regions.
- Temporal Scale: 1990–2023
Methodology and Data
- Models used: A physics-informed Python simulation script (GlobalSyntheticDataGenerator) that combines (1) sinusoidal functions with Gaussian noise for the global signals (ENSO, NAO), (2) region-specific coupling coefficients based on real-world atmospheric physics, and (3) a non-linear biological stress function for yield calculation: Yield = BaseYield * TempStress * WaterStress * TechTrend + Noise.
- Data sources: Synthetic data generated based on real-world atmospheric physics principles, latitude/longitude centroids for regional baselines, and simulated global climate indices (ENSO, NAO).
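The yield formula above can be illustrated with a minimal sketch. Only the multiplicative structure (BaseYield * TempStress * WaterStress * TechTrend + Noise) comes from the summary. The bell-shaped stress curve, the linear technology trend, and every parameter value (`t_opt`, `t_tol`, `r_opt`, `r_tol`, `tech_rate`, `noise_sd`) are assumptions chosen for illustration, not the dataset author's actual definitions.

```python
import numpy as np

def stress(value, optimum, tolerance):
    """Hypothetical bell-shaped stress factor in (0, 1].

    Equals 1 at the optimum and decays symmetrically on both sides.
    """
    return np.exp(-((value - optimum) / tolerance) ** 2)

def crop_yield(base_yield, temp_c, rain_mm, year, rng,
               t_opt=22.0, t_tol=8.0, r_opt=800.0, r_tol=400.0,
               tech_rate=0.01, start_year=1990, noise_sd=0.1):
    """Yield = BaseYield * TempStress * WaterStress * TechTrend + Noise.

    All default parameter values are illustrative assumptions.
    """
    temp_stress = stress(temp_c, t_opt, t_tol)
    water_stress = stress(rain_mm, r_opt, r_tol)
    # Assumed linear technology trend: 1.0 in the start year, growing by tech_rate per year.
    tech_trend = 1.0 + tech_rate * (year - start_year)
    # Additive Gaussian noise, matching the "+ Noise" term of the formula.
    noise = rng.normal(0.0, noise_sd, size=np.shape(temp_c))
    return base_yield * temp_stress * water_stress * tech_trend + noise

rng = np.random.default_rng(42)
# A hot, dry year is penalised by both stress factors at once.
stressed = crop_yield(5.0, temp_c=30.0, rain_mm=400.0, year=2000, rng=rng)
```

Because the stress factors multiply rather than add, a severe shortfall in either temperature or water suppresses yield regardless of how favourable the other factor is, which is what makes the relationship non-linear.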
Main Results
- The generated high-fidelity synthetic dataset covers global agricultural production, local meteorological conditions, and climate teleconnection indices from 1990 to 2023.
- This dataset explicitly incorporates physics-informed correlations between global climate drivers (ENSO, NAO) and regional weather patterns, providing a robust benchmark for Spatial-Temporal Graph Neural Networks (ST-GNNs).
- The generation process involved simulating global climate signals, coupling them to 15 regions with physics-informed coefficients, simulating local temperature and rainfall, and calculating yield using a non-linear biological stress function.
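The first three steps of this generation process (global signals, regional coupling, local weather) might be sketched as follows. This is not the GlobalSyntheticDataGenerator script itself. The annual time step, the oscillation periods, the randomly drawn coupling coefficients, the evenly spaced centroid latitudes, and the baseline formulas are all placeholder assumptions; the summary states only that coefficients are physics-informed and that baselines derive from latitude/longitude centroids.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1990, 2024)   # assumed annual steps covering 1990-2023
n_regions = 15

def global_index(years, period, noise_sd, rng):
    """Sinusoid plus Gaussian noise, standing in for an index such as ENSO or NAO."""
    phase = 2.0 * np.pi * (years - years[0]) / period
    return np.sin(phase) + rng.normal(0.0, noise_sd, size=years.shape)

# Step 1: global climate signals (periods in years are placeholder assumptions).
enso = global_index(years, period=4.0, noise_sd=0.3, rng=rng)
nao = global_index(years, period=7.0, noise_sd=0.3, rng=rng)
indices = np.stack([enso, nao], axis=0)            # shape (2, n_years)

# Step 2: region-specific coupling coefficients, one row per region,
# one column per index. Drawn at random here purely for illustration.
coupling_temp = rng.uniform(-1.0, 1.0, size=(n_regions, 2))       # deg C per index unit
coupling_rain = rng.uniform(-100.0, 100.0, size=(n_regions, 2))   # mm per index unit

# Step 3: local weather = latitude-based baseline + coupled signal + local noise.
lat = np.linspace(-40.0, 60.0, n_regions)          # hypothetical centroid latitudes
base_temp = 28.0 - 0.35 * np.abs(lat)              # cooler away from the equator
base_rain = np.full(n_regions, 800.0)

temp = (base_temp[:, None] + coupling_temp @ indices
        + rng.normal(0.0, 0.5, size=(n_regions, years.size)))
rain = (base_rain[:, None] + coupling_rain @ indices
        + rng.normal(0.0, 50.0, size=(n_regions, years.size)))
rain = np.clip(rain, 0.0, None)                    # rainfall cannot be negative
```

The resulting `temp` and `rain` arrays have one row per region and one column per year, and would feed the yield calculation in the final step. Because every region is driven by the same two global indices, regions that are far apart remain statistically linked, which is the long-distance dependency an ST-GNN benchmark is meant to probe.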
Contributions
- A novel synthetic dataset that integrates physics-informed global climate teleconnections with their regional impacts, which the author presents as an advantage over standard agricultural datasets.
- A specialized benchmark for evaluating how well Spatial-Temporal Graph Neural Networks (ST-GNNs) model complex, long-distance climate-agriculture interactions.
Funding
- Not specified in the provided text.
Citation
@article{Hasan2025Global,
author = {Hasan, Raza},
title = {Global Synthetic Crop Yield, Meteorological, and Climate Teleconnection Dataset for Machine Learning Benchmarking},
journal = {Mendeley Data},
year = {2025},
doi = {10.17632/y7hkz2zfcc},
url = {https://doi.org/10.17632/y7hkz2zfcc}
}
Original Source: https://doi.org/10.17632/y7hkz2zfcc