Ofori-Ampofo et al. (2025) On the strategy of exploring spatio-temporal information from Earth observation data for crop yield prediction

Identification

Journal: Smart Agricultural Technology
Year: 2025
Date: 2025-10-17
Authors: Stella Ofori-Ampofo, Rıdvan Salih Kuzu, Peter Schauer, Martin Willberg, Aaron Höhl, Xiao Xiang Zhu
DOI: 10.1016/j.atech.2025.101540

Research Groups

Technical University of Munich, Germany (TUM School of Engineering and Design, Department of Aerospace and Geodesy; Munich Center for Machine Learning)
Remote Sensing Technology Institute, German Aerospace Center (DLR), Weßling, Germany
Industrieanlagen-Betriebsgesellschaft mbH (IABG), Ottobrunn, Germany

Short Summary

This study compares strategies for encoding spatial and temporal information from Earth observation data for county-level corn yield prediction in the USA, across a range of machine learning models. It finds that time series data alone are sufficient for effective yield prediction, that surface reflectance is a critical predictor, and that recent historical data matter more for model accuracy than long-term records.

Objective

To conduct a comprehensive comparison of existing spatial and temporal encoding techniques for corn yield prediction in the USA, evaluating their trade-offs and contributions to prediction accuracy under a unified dataset and experimental setup.

Study Configuration

Spatial Scale: County level (473 counties across Iowa, Illinois, Indiana, Nebraska, and Minnesota, USA). Data resolutions: MODIS at 500 meters, Daymet at 1 kilometer, USDA-NASS crop type maps at 30 meters.
Temporal Scale: 19 years of county-level corn yield records (2003-2021). Satellite image time series (SITS) from MODIS at an 8-day interval (46 timesteps/year, truncated to 27 timesteps covering April-October). Daily Daymet weather data aggregated to the same 8-day resolution.
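
As an illustration of this temporal alignment, the sketch below aggregates a daily series into 46 windows of 8 days and then keeps 27 consecutive timesteps. It is a minimal sketch rather than the authors' pipeline: the function names, the mean reducer, and the start index are assumptions, since this summary states only the 8-day resolution and the 27-timestep April-October truncation.

```python
import numpy as np

def aggregate_to_8day(daily, reducer=np.mean):
    """Aggregate a daily series of shape (365, ...) into 46 windows of 8 days.

    Windows start at day-of-year 1, 9, 17, ..., 361; the last window is
    shorter (5 days for a 365-day year).
    """
    daily = np.asarray(daily, dtype=float)
    starts = range(0, daily.shape[0], 8)
    return np.stack([reducer(daily[s:s + 8], axis=0) for s in starts])

def truncate_season(series, first_step, n_steps=27):
    """Keep n_steps consecutive timesteps starting at index first_step."""
    return series[first_step:first_step + n_steps]

# Synthetic stand-in for one daily weather variable (not real Daymet data).
daily_tmax = np.arange(365, dtype=float)
tmax_8day = aggregate_to_8day(daily_tmax)

# Assumption: index 11 (window starting at day-of-year 89) opens the season,
# so the 27 kept windows run from late March to the end of October.
season = truncate_season(tmax_8day, first_step=11)
```

For precipitation, passing `reducer=np.sum` would give 8-day totals instead of means; which aggregation was used for each variable is not stated here.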

Methodology and Data

Models used:
- Spatial Encoding Strategies: Pixel averages, Image histograms, Pixel-set encoders (PSE).
- Temporal Encoding Strategies: Random Forests (RF), XGBoost, Support Vector Machines (SVM), Multilayer Perceptrons (MLP), Temporal Convolutional Neural Networks (TempCNN), Multi-Scale Residual Network (MSResNet), InceptionTime, Long Short-Term Memory (LSTM), LSTM with Attention, Lightweight Temporal Attention Encoder (LTAE).
- Combined Models: Histogram-LSTM, Histogram-TempCNN, Histogram-2D CNN, PSE-LTAE.
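
To make the first two spatial encodings concrete, the sketch below reduces the cropland pixels of one county either to a mean per timestep and band (pixel averages) or to a normalized histogram per timestep and band (image histograms). It is a hedged illustration only: the array layout, the 32 bins, the [0, 1] value range, and the function names are assumptions and are not taken from the paper.

```python
import numpy as np

def pixel_average(pixels):
    """Collapse cropland pixels to one mean value per timestep and band.

    pixels: array of shape (T, N, C) = (timesteps, pixels, bands).
    Returns an array of shape (T, C).
    """
    return pixels.mean(axis=1)

def pixel_histogram(pixels, n_bins=32, value_range=(0.0, 1.0)):
    """Encode the pixel distribution as a histogram per timestep and band.

    Returns an array of shape (T, C, n_bins). Each histogram sums to 1
    when all pixel values fall inside value_range.
    """
    n_steps, n_pixels, n_bands = pixels.shape
    hist = np.empty((n_steps, n_bands, n_bins))
    for t in range(n_steps):
        for c in range(n_bands):
            counts, _ = np.histogram(pixels[t, :, c], bins=n_bins, range=value_range)
            hist[t, c] = counts / n_pixels
    return hist

# Synthetic county: 27 timesteps, 500 cropland pixels, 7 bands (illustrative sizes).
rng = np.random.default_rng(0)
county = rng.uniform(0.0, 1.0, size=(27, 500, 7))

avg = pixel_average(county)          # shape (27, 7)
hist = pixel_histogram(county)       # shape (27, 7, 32)
flat = hist.reshape(hist.shape[0], -1)  # one flattened vector per timestep
```

The flattened form keeps one feature vector per timestep, which is the kind of input a 1D temporal model can consume. The pixel-set encoder is not sketched here because it involves learned layers.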
Data sources:
- Yield Data: County-level corn yield (bushels per acre) from the National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) (2003-2021).
- Satellite Imagery: Gridded 8-day MODIS surface reflectance (MOD09A1.061, 500 m resolution) for visible and infrared bands.
- Spectral Indices: Normalized Difference Vegetation Index (NDVI) and Normalized Difference Water Index (NDWI) derived from MODIS bands.
- Weather Data: Gridded daily precipitation, minimum temperature, and maximum temperature from Daymet (1 km resolution).
- Ancillary Data: Annual gridded crop type maps from USDA-NASS (30 m resolution) for crop masking.
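
The spectral indices and crop masking listed above can be sketched as follows. NDVI is the standard normalized difference of near-infrared and red reflectance. For NDWI, the NIR/SWIR formulation is assumed here because the summary does not say which variant was used. The helper names are hypothetical, and the crop mask is assumed to be already resampled from 30 m to the MODIS grid.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)

def ndwi(nir, swir):
    """Normalized Difference Water Index (assumed NIR/SWIR variant)."""
    return normalized_difference(nir, swir)

def masked_mean(values, crop_mask):
    """Average values over pixels flagged as corn in a boolean mask."""
    return float(np.nanmean(values[crop_mask]))

# Three synthetic pixels; the third has no valid reflectance.
nir = np.array([0.5, 0.4, 0.0])
red = np.array([0.1, 0.2, 0.0])
corn = np.array([True, True, False])  # hypothetical mask on the MODIS grid

index = ndvi(nir, red)
county_ndvi = masked_mean(index, corn)
```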

Main Results

The average Mean Absolute Percentage Error (MAPE) on test sets was generally below 10%. Root Mean Squared Error (RMSE) was higher in 2021 than in 2020.
Among models using time-series pixel averages, SVM demonstrated strong performance, outperforming other classical baselines (RF, XGBoost, MLP) and even some deep learning models such as LSTM and Histogram-2D CNN.
TempCNN achieved the best overall performance (lowest RMSE) when averaged across both test years (RMSE 13.89 bu/acre or 872.1 kg/hectare in 2020, 16.05 bu/acre or 1007.4 kg/hectare in 2021), with modest improvements over MSResNet and SVM.
The performance of LTAE declined when combined with the Pixel-Set Encoder (PSE-LTAE), possibly due to challenges in handling high spectral variation within large prediction units.
Flattening histograms and modeling the temporal component with 1D temporal convolution (Histogram-TempCNN) proved more efficient than using LSTMs or 2D CNNs on histogram images.
Surface reflectance (SR) features consistently performed well independently (highest R²), indicating their strong ability to capture yield variability. Models combining SR with either weather or spectral indices achieved the best overall performance and stability.
An 8-year training window (2012-2019) generalized better to test years than a 17-year (2003-2019) or 4-year (2016-2019) window, suggesting the presence of concept drift and the benefit of more recent data.
In-season forecasting showed that longer time spans improved prediction accuracy, with optimal performance observed around late August, after which extending time steps provided diminishing returns.
Including the previous year's data (2020) in the training set significantly improved the prediction accuracy for the current year (2021), reducing RMSE by 2 bu/acre (approximately 125.5 kg/hectare) and improving R² by 12%.
The MSResNet model achieved RMSEs of 13.77 bu/acre (864.6 kg/hectare) in 2020 and 14.89 bu/acre (934.7 kg/hectare) in 2021 (when 2020 was included in training), outperforming some existing 1D CNN-LSTM and LSTM-attention models (reported around 17 bu/acre or 1067.1 kg/hectare).
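
For reference, the metrics and unit conversion quoted in this section can be computed as in the sketch below. The metric definitions are the standard ones. The conversion assumes the usual US convention of 56 lb per bushel of shelled corn, which reproduces several of the kg/hectare figures above; small differences on others may come from rounding of the bu/acre values. The yield numbers in the example are invented for illustration.

```python
import numpy as np

LB_PER_BUSHEL_CORN = 56.0        # assumed standard weight of a bushel of shelled corn
KG_PER_LB = 0.45359237
HECTARES_PER_ACRE = 0.40468564224

def bu_acre_to_kg_ha(value):
    """Convert a corn yield (or yield error) from bu/acre to kg/hectare."""
    return value * LB_PER_BUSHEL_CORN * KG_PER_LB / HECTARES_PER_ACRE

def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the units of the inputs."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Invented county yields in bu/acre, for illustration only.
observed = [180.0, 200.0, 160.0]
predicted = [171.0, 210.0, 168.0]

error_bu = rmse(observed, predicted)
error_kg = bu_acre_to_kg_ha(error_bu)
error_pct = mape(observed, predicted)
fit = r2(observed, predicted)
```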

Contributions

Provides the first comprehensive comparison of various spatial and temporal encoding techniques for corn yield prediction in the USA, using a unified dataset and experimental setup.
Introduces and evaluates advanced deep learning architectures (e.g., MSResNet, InceptionTime, LTAE) for yield prediction, including the application of pixel-set encoders from crop classification studies to large prediction units.
Offers practical insights into the trade-offs between different data encoding strategies and their impact on predictive performance.
Conducts an in-depth analysis of critical factors influencing model performance, such as training data sample size, feature combinations, in-season forecasting capabilities, and the significance of prior-year observations.
Highlights the effectiveness of predicting crop yield using solely time series data without explicit spatial features and underscores the importance of surface reflectance as a key predictor.
Demonstrates that reliance on long-term historical data may hinder models' ability to reflect current conditions accurately, suggesting the presence of concept drift.
Reinforces the value of temporal continuity in training data, showing that including the previous year's data significantly enhances current year prediction accuracy.
Compiles and makes available a multi-source dataset for crop monitoring in the USA, facilitating future methodological advancements in remote sensing for agricultural applications.

Funding

Munich Aerospace e.V. scholarship (S. Ofori-Ampofo)
ML4Earth project by the German Federal Ministry for Economic Affairs and Climate Action (grant number 50EE2201C) (A. Höhl)
MONITOR and ML4Earth projects (for dataset collation)

Citation

@article{OforiAmpofo2025strategy,
  author = {Ofori-Ampofo, Stella and Kuzu, Rıdvan Salih and Schauer, Peter and Willberg, Martin and Höhl, Aaron and Zhu, Xiao Xiang},
  title = {On the strategy of exploring spatio-temporal information from Earth observation data for crop yield prediction},
  journal = {Smart Agricultural Technology},
  year = {2025},
  doi = {10.1016/j.atech.2025.101540},
  url = {https://doi.org/10.1016/j.atech.2025.101540}
}

Original Source: https://doi.org/10.1016/j.atech.2025.101540