Yu et al. (2025) An efficient and physics-informed regional maize yield estimation scheme by combining data assimilation and machine learning
Identification
- Journal: Computers and Electronics in Agriculture
- Year: 2025
- Date: 2025-11-06
- Authors: Danyang Yu, Yuanyuan Zha, Yijian Zeng, Peiyu Lai, Wanxue Zhu, Jiang Bian, Qi Yang, Xi Huang, Zhongbo (Bob) Su
- DOI: 10.1016/j.compag.2025.111142
Research Groups
- State Key Laboratory of Water Resources Engineering and Management, Wuhan University, Wuhan, Hubei 430072, China
- Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, PO Box 217, 7500 AE Enschede, the Netherlands
- Department of Crop Sciences, University of Göttingen, Von-Siebold-Str. 8, 37075 Göttingen, Germany
- Department of Bioproducts and Biosystems Engineering, University of Minnesota, Saint Paul, MN 55108, USA
- Center for Agricultural Water Research in China, China Agricultural University, Beijing 100083, China
Short Summary
This study developed an efficient, physics-informed framework for regional maize yield estimation that integrates data assimilation (DA) with machine learning (ML). Applied to Shandong Province, China, it substantially reduced computational cost relative to pixel-based DA while maintaining comparable accuracy.
Objective
- To develop an efficient, physics-informed regional yield estimation framework by integrating data assimilation (DA) with machine learning (ML) to reduce computational costs and improve accuracy.
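The idea of pairing DA with ML can be illustrated with a toy sketch. It assumes, without confirmation from this summary, that the expensive DA step is run on only a small sample of pixels and that a cheap regressor trained on those DA-derived yields is then applied everywhere else. The function `expensive_da_yield`, the single per-pixel feature, the sample size, and the straight-line fit (a simple stand-in for the ML models named in the paper) are all hypothetical.

```python
import random

def expensive_da_yield(feature):
    """Hypothetical stand-in for pixel-level DA with a crop model.
    In practice this step is slow; here it is a toy linear relation (t/ha)."""
    return 2.0 + 10.0 * feature

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in for an ML model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

rng = random.Random(0)

# One hypothetical remote-sensing feature per pixel for the whole region.
pixels = [rng.uniform(0.3, 0.9) for _ in range(10_000)]

# Run the expensive step on a small sample only (assumed design).
sample = rng.sample(pixels, 50)
virtual_yields = [expensive_da_yield(x) for x in sample]

# Train the cheap surrogate on the DA-derived yields, then predict everywhere.
slope, intercept = fit_line(sample, virtual_yields)
predicted = [slope * x + intercept for x in pixels]

# Fraction of pixels that needed the expensive DA run.
da_fraction = len(sample) / len(pixels)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, DA run on {da_fraction:.1%} of pixels")
```

In this toy setting the surrogate recovers the underlying relation exactly because the stand-in DA output is noise-free and linear. Real DA-derived yields would be noisy and nonlinear, which is where models such as RF or XGBoost become relevant.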
Study Configuration
- Spatial Scale: Regional (Shandong Province, China), county-level (114 counties), pixel-based for DA.
- Temporal Scale: Maize growing season, with calibration and validation years.
Methodology and Data
- Models used:
  - Crop Growth Model: SWAP
  - Data Assimilation Algorithm: Iterative Ensemble Smoother
  - Machine Learning Models: Feature Tokenizer Transformer (FTT), Artificial Neural Network (ANN), eXtreme Gradient Boosting (XGBoost), Random Forest (RF)
- Data sources:
  - Remote sensing data (e.g., optical imagery from Sentinel-2, Landsat, MODIS)
  - "Virtual" yield observations (generated by DA with the SWAP model)
  - Meteorological conditions, soil properties, crop types
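To give a feel for what an iterative ensemble smoother does, the following is a minimal scalar sketch. The summary does not say which variant the paper uses. This example uses the multiple-data-assimilation form (ES-MDA) with equal inflation factors, purely for illustration. The `forward` function is a hypothetical one-parameter stand-in for a crop model, not SWAP, and all numerical values are invented.

```python
import math
import random

def forward(theta):
    """Hypothetical toy model mapping one parameter to one simulated observation."""
    return 3.0 * theta

def es_mda_scalar(prior, d_obs, obs_var, n_iter=4, seed=0):
    """Scalar ES-MDA sketch: repeat a damped ensemble update n_iter times.
    Equal inflation factors alpha = n_iter, so the sum of 1/alpha equals 1."""
    rng = random.Random(seed)
    thetas = list(prior)
    n = len(thetas)
    alpha = float(n_iter)
    for _ in range(n_iter):
        preds = [forward(t) for t in thetas]
        mean_t = sum(thetas) / n
        mean_p = sum(preds) / n
        # Sample covariance between parameter and prediction, and prediction variance.
        cov_tp = sum((t - mean_t) * (p - mean_p) for t, p in zip(thetas, preds)) / (n - 1)
        var_p = sum((p - mean_p) ** 2 for p in preds) / (n - 1)
        # Gain with observation-error variance inflated by alpha.
        gain = cov_tp / (var_p + alpha * obs_var)
        noise_sd = math.sqrt(alpha * obs_var)
        # Each member is pulled toward its own perturbed copy of the observation.
        thetas = [
            t + gain * (d_obs + rng.gauss(0.0, noise_sd) - p)
            for t, p in zip(thetas, preds)
        ]
    return thetas

rng = random.Random(42)
true_theta = 2.0
d_obs = forward(true_theta)                         # synthetic, noise-free observation
prior = [rng.gauss(1.0, 0.5) for _ in range(500)]   # deliberately biased prior ensemble

posterior = es_mda_scalar(prior, d_obs, obs_var=0.01)
prior_mean = sum(prior) / len(prior)
post_mean = sum(posterior) / len(posterior)
print(f"prior mean {prior_mean:.3f} -> posterior mean {post_mean:.3f} (truth {true_theta})")
```

In a real application the parameter and observation would be vectors (for example, several crop parameters and a time series of remotely sensed values), and the scalar division becomes a matrix solve.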
Main Results
- The DA-ML framework achieved a computational time saving of over 99.8 % compared to traditional pixel-based DA methods, i.e., a speed-up of more than 500×.
- The framework maintained yield estimation accuracy comparable to that of the pixel-based DA approach.
- Random Forest (RF) achieved the best performance among the ML models for regional yield estimates across 114 counties:
  - Calibration year: R² = 0.62, RMSE = 1.19 × 10⁵ t
  - Validation year: R² = 0.49, RMSE = 1.28 × 10⁵ t
- Relative importance analysis indicated that radiation, soil moisture, and rainfall during the jointing stage are the most influential variables for regional maize yield, followed by wind speed, air humidity, and temperature during the reproductive stage.
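For readers unfamiliar with the two accuracy metrics reported above, this short sketch computes them for county-level totals. It assumes R² means the coefficient of determination (1 − SS_res/SS_tot). The paper may instead report the squared correlation coefficient, which this summary does not specify. The four county values are invented for illustration and are expressed in units of 10⁵ t.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical county yield totals (units of 1e5 t): statistics vs. model estimates.
observed_totals = [1.0, 2.0, 3.0, 4.0]
estimated_totals = [1.1, 1.9, 3.2, 3.8]

county_rmse = rmse(observed_totals, estimated_totals)
county_r2 = r_squared(observed_totals, estimated_totals)
print(f"R2 = {county_r2:.2f}, RMSE = {county_rmse:.3f} x 1e5 t")
```

Because RMSE here is computed on county totals in tonnes, its magnitude depends on county size as well as on per-hectare accuracy.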
Contributions
- Development of a reliable, rapid-response DA-ML system for regional crop yield estimation and management decisions.
- Significant reduction in computational costs for regional yield estimation compared to traditional data assimilation methods.
- Identification of critical environmental conditions influencing maize yield at different growth stages.
- Demonstration of the framework's potential applicability to other crops and regions.
Funding
- Not specified in the provided text.
Citation
@article{Yu2025efficient,
author = {Yu, Danyang and Zha, Yuanyuan and Zeng, Yijian and Lai, Peiyu and Zhu, Wanxue and Bian, Jiang and Yang, Qi and Huang, Xi and Su, Zhongbo (Bob)},
title = {An efficient and physics-informed regional maize yield estimation scheme by combining data assimilation and machine learning},
journal = {Computers and Electronics in Agriculture},
year = {2025},
doi = {10.1016/j.compag.2025.111142},
url = {https://doi.org/10.1016/j.compag.2025.111142}
}
Original Source: https://doi.org/10.1016/j.compag.2025.111142