Ci et al. (2025) Multi-timescale evapotranspiration fusion: A novel autoencoder with automated machine learning-based approach for enhanced estimation accuracy

Identification

Journal: Agricultural Water Management
Year: 2025
Date: 2025-12-16
Authors: Mengtao Ci, Xingming Hao, Fan Sun, Qixiang Liang, Xue Fan, Jingjing Zhang, Haibing Xiong, Jinfan Xu, Xinran Guo
DOI: 10.1016/j.agwat.2025.110086

Research Groups

Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
University of Chinese Academy of Sciences, Beijing, China
Akesu National Station of Observation and Research for Oasis Agro-ecosystem, Akesu, Xinjiang, China

Short Summary

This study developed AGFusionET, a novel multi-timescale fusion model combining autoencoders and automated machine learning (AutoML), to integrate 20 heterogeneous evapotranspiration (ET) products. It generated a global, high-resolution (0.05 degrees) ET dataset for 1982–2023, demonstrating superior accuracy (Kling-Gupta Efficiency of 0.88, Root Mean Square Error of 12.12 mm/month) compared to benchmark products.

Objective

To construct a next-generation global evapotranspiration (ET) dataset featuring high spatial resolution (0.05 degrees), extended temporal coverage (1982–2023), and global robustness.
To advance ET fusion methodologies from traditional linear weighting towards a new paradigm centered on multimodality, automation, and deep feature-driven approaches by integrating autoencoders and automated machine learning (AutoML).

Study Configuration

Spatial Scale: Global, with a final resolution of 0.05 degrees (approximately 5.6 km at the equator).
Temporal Scale: 1982–2023, with monthly aggregation for analysis and validation.
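The 5.6 km figure follows from the length of one degree of longitude at the equator. The short sketch below reproduces it. It assumes a spherical Earth with the WGS84 equatorial radius and a full-globe regular latitude-longitude grid; the dataset's actual latitudinal extent is not stated in this summary, and the function names are illustrative.

```python
import math

EARTH_EQ_RADIUS_KM = 6378.137  # WGS84 equatorial radius (spherical approximation)


def cell_width_km(res_deg, lat_deg=0.0):
    """East-west width (km) of a res_deg grid cell at the given latitude."""
    km_per_deg = 2 * math.pi * EARTH_EQ_RADIUS_KM / 360.0  # ~111.32 km at the equator
    return res_deg * km_per_deg * math.cos(math.radians(lat_deg))


def global_grid_shape(res_deg):
    """(rows, cols) of a hypothetical full-globe regular lat/lon grid."""
    return round(180 / res_deg), round(360 / res_deg)


print(round(cell_width_km(0.05), 2))        # 5.57
print(round(cell_width_km(0.05, 60.0), 2))  # 2.78
print(global_grid_shape(0.05))              # (3600, 7200)
```

Because the east-west width shrinks with the cosine of latitude, a 0.05 degree cell is only about half as wide at 60 degrees as at the equator.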

Methodology and Data

Models used:
- AGFusionET (Proposed): A multi-timescale fusion model integrating:
  - Autoencoder: A symmetric deep autoencoder with a 12-dimensional bottleneck layer for nonlinear dimensionality reduction and latent feature extraction from high-dimensional ET products.
  - AutoGluon: An automated machine learning (AutoML) framework for automated feature selection, ensemble model construction, and hyperparameter optimization. It uses a multilayer stacked ensembling architecture, incorporating 12 distinct base model types (e.g., k-nearest neighbors, Random Forest, LightGBM, CatBoost, XGBoost, NeuralNetFastAI, NeuralNetTorch).
- Comparison Models: Random Forest (RF), Gradient Boosting Regressor (GBR), Light Gradient Boosting Machine (LGBM), and a Stacked Ensemble model.
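The autoencoder step compresses the 20 co-located product values into a 12-dimensional latent vector. The following is a minimal NumPy sketch of that idea, not the authors' network. The paper describes a symmetric deep autoencoder; here we assume a single 16-unit tanh layer on each side of the bottleneck, a linear output layer, and plain full-batch gradient descent. The synthetic 20-column matrix is a stand-in for the 20 ET products at sampled pixel-months, and all names are hypothetical.

```python
import numpy as np


def train_autoencoder(X, latent_dim=12, hidden_dim=16, epochs=3000, lr=0.05, seed=0):
    """Symmetric autoencoder d -> hidden -> latent -> hidden -> d (illustrative only).

    Hidden layers use tanh, the output layer is linear, and training is
    full-batch gradient descent on the mean squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    sizes = [d, hidden_dim, latent_dim, hidden_dim, d]
    # Xavier-style initialisation: variance 1 / fan_in for each weight matrix
    W = [rng.normal(0.0, np.sqrt(1.0 / a), size=(a, c)) for a, c in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(c) for c in sizes[1:]]
    losses = []
    for _ in range(epochs):
        # Forward pass, keeping every layer's output for backpropagation
        acts = [X]
        for i in range(4):
            z = acts[-1] @ W[i] + b[i]
            acts.append(z if i == 3 else np.tanh(z))
        err = acts[-1] - X
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradient of the mean squared error w.r.t. each layer
        grad = 2.0 * err / err.size
        for i in reversed(range(4)):
            grad_W = acts[i].T @ grad
            grad_b = grad.sum(axis=0)
            if i > 0:
                # Propagate through W[i] (before updating it) and the tanh derivative
                grad = (grad @ W[i].T) * (1.0 - acts[i] ** 2)
            W[i] -= lr * grad_W
            b[i] -= lr * grad_b
    return W, b, losses


def encode(X, W, b):
    """Map inputs to the latent bottleneck (the first two layers)."""
    h = X
    for i in range(2):
        h = np.tanh(h @ W[i] + b[i])
    return h


# Synthetic stand-in: 20 correlated "products" driven by 3 hidden signals plus noise
rng = np.random.default_rng(1)
signal = rng.normal(size=(300, 3))
X = signal @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(300, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each column
W, b, losses = train_autoencoder(X)
Z = encode(X, W, b)  # latent features, shape (300, 12)
```

In the paper these latent features are passed to AutoGluon for stacked ensembling. With AutoGluon installed, a comparable step might look like `TabularPredictor(label="et_obs", eval_metric="root_mean_squared_error").fit(train_df, presets="best_quality")`, where the label column and the composition of `train_df` (latent features plus any covariates) are our assumptions, not the authors' configuration.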
Data sources:
- Eddy covariance observations: 585 datasets compiled from FLUXNET, AmeriFlux, EuroFlux, AsiaFlux, ChinaFLUX, and two national field observation stations in Xinjiang, China (Aksu Oasis Farmland Ecosystem, Fukang Desert Ecosystem). These served as ground truth for model training and validation.
- Evapotranspiration (ET) products: 20 global ET or latent heat flux (LE) products in six categories: statistical/empirical ensemble, reanalysis-based, remote sensing (RS)-based, process-based model, surface energy balance-based, and water balance-based. All products were harmonized to a 0.05 degree spatial resolution and a monthly temporal scale.
- Meteorological forcing data: ERA5-Land reanalysis product (0.1 degrees native spatial resolution, monthly temporal scale), including total precipitation (tp), 2 m air temperature (t2m), dew point temperature (d2m), 10 m u- and v-wind components (u10, v10), surface net solar radiation (ssr), surface net thermal radiation (str), soil moisture, and derived Vapour Pressure Deficit (VPD). Resampled to 0.05 degrees.
- Normalised Difference Vegetation Index (NDVI) data: Global Inventory Modelling and Mapping Studies (GIMMS) 3G dataset (1/12 degrees native resolution, 1982–2022) and MOD13C1 observations for 2023. Resampled to 0.05 degrees.
- Digital Elevation Model (DEM) data: GLOBE Digital Elevation Model (1 km spatial resolution).
- MODIS land cover type product (MCD12C1): IGBP classification (0.05 degrees spatial resolution, 2001–2023), used to identify dominant land cover types and snow/ice.
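Several of the harmonization steps listed above reduce to simple conversions. The sketch below illustrates them under assumptions that the source does not confirm: a constant latent heat of vaporisation for converting LE products to ET, the FAO-56 Tetens formula for deriving VPD from ERA5-Land `t2m` and `d2m` (both in kelvin), a hypothetical grouping of IGBP codes into the five classes reported under Main Results, and the per-pixel mode across years as the "dominant" land cover. All function and constant names are illustrative.

```python
import numpy as np

# Assumed constant latent heat of vaporisation (J/kg); the paper may use a
# temperature-dependent value instead.
LAMBDA_V = 2.45e6
SECONDS_PER_DAY = 86400.0


def le_to_et_mm_month(le_w_m2, days_in_month):
    """Convert mean latent heat flux (W/m^2) to ET (mm/month).

    1 kg of water over 1 m^2 is 1 mm, so ET [mm/day] = LE * 86400 / lambda.
    """
    return np.asarray(le_w_m2) * SECONDS_PER_DAY / LAMBDA_V * days_in_month


def vpd_kpa(t2m_k, d2m_k):
    """VPD (kPa) from 2 m air temperature and dew point, both in kelvin.

    Uses the FAO-56 Tetens form e_s(T) = 0.6108 exp(17.27 T / (T + 237.3)),
    with actual vapour pressure taken as e_s at the dew point.
    """
    def e_sat(t_c):
        return 0.6108 * np.exp(17.27 * t_c / (t_c + 237.3))
    return e_sat(np.asarray(t2m_k) - 273.15) - e_sat(np.asarray(d2m_k) - 273.15)


def wind_speed(u10, v10):
    """10 m wind speed from the u and v components."""
    return np.hypot(u10, v10)


# Hypothetical grouping of IGBP codes into the five classes used in Main Results
IGBP_GROUPS = {
    "forests": {1, 2, 3, 4, 5},
    "shrublands": {6, 7},
    "savannas": {8, 9},
    "grasslands": {10},
    "croplands": {12, 14},
}
SNOW_ICE = 15  # IGBP "Permanent Snow and Ice"


def igbp_group(code):
    """Broad land cover group for an IGBP code, or None if unassigned."""
    for name, codes in IGBP_GROUPS.items():
        if code in codes:
            return name
    return None


def dominant_class(yearly_codes):
    """Most frequent class code (0-17) per pixel along axis 0 (years).

    Codes outside 0-17 (e.g. fill values) are not counted; ties go to the lower code.
    """
    yearly_codes = np.asarray(yearly_codes)
    counts = np.stack([(yearly_codes == c).sum(axis=0) for c in range(18)])
    return counts.argmax(axis=0)
```

For example, a mean LE of 100 W/m^2 over a 30-day month corresponds to about 105.8 mm/month under the constant-lambda assumption.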

Main Results

Superior Performance: AGFusionET consistently outperformed all 20 benchmark ET products and comparison machine learning models across various validation metrics and scales.
High Accuracy: Achieved a Kling-Gupta Efficiency (KGE) of 0.88 and a Root Mean Square Error (RMSE) of 12.12 mm/month when validated against monthly ET observations from all available flux tower sites.
Robust Generalization: On an independent validation set, AGFusionET models showed RMSE values ranging from 16.4 to 16.6 mm/month and KGE scores between 0.835 and 0.855, demonstrating strong robustness and extrapolation capabilities to unseen spatial domains.
Land Cover Adaptability: AGFusionET achieved the lowest RMSE and smallest uncertainty across all five IGBP land cover types (croplands: 16.63 mm/month; grasslands: 14.35 mm/month; shrublands: 11.08 mm/month; savannas: 13.09 mm/month; forests: 13.59 mm/month), indicating strong adaptability to diverse ecosystems.
Spatial and Temporal Consistency: The generated global ET dataset (1982–2023, 0.05 degrees) exhibited enhanced spatio-temporal continuity and harmonized inter-product consistency.
Uncertainty Reduction: AGFusionET showed the lowest uncertainty (10.9 mm/year) of all evaluated products, well below that of the individual ET products.
Trend Representation: When validated against a water-balance-based ET (ETWB) dataset for 56 major river basins, AGFusionET represented trends better overall than the other reanalysis and model-driven products (Nash–Sutcliffe Efficiency of 0.409).
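The headline metrics can be computed as follows. KGE is written here in the original Gupta et al. (2009) form; the paper may instead use the modified KGE′, which replaces the standard-deviation ratio with a ratio of coefficients of variation. The trend helper uses an ordinary least-squares slope, which is an assumption (Sen's slope is another common choice), and all function names are illustrative.

```python
import numpy as np


def rmse(obs, sim):
    """Root mean square error, in the units of the inputs (e.g. mm/month)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def kge(obs, sim):
    """Kling-Gupta Efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2).

    r: Pearson correlation; alpha: ratio of standard deviations (sim/obs);
    beta: ratio of means (sim/obs). A perfect match gives 1.
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of obs about its mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))


def linear_trend(years, values):
    """Ordinary least-squares slope (units per year), an assumed trend estimator."""
    return float(np.polyfit(np.asarray(years, float), np.asarray(values, float), 1)[0])


obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 41.0]
print(rmse(obs, sim), kge(obs, sim), nse(obs, sim))
```

For the basin comparison, one would compute one trend per basin for the product and for ETWB, then apply `nse` to the two sets of 56 trend values; how the paper pairs basins and periods is not detailed in this summary.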

Contributions

Introduces AGFusionET, a novel and generalisable framework for multi-source evapotranspiration (ET) data fusion, leveraging autoencoders for deep feature extraction and AutoML for automated model selection and hyperparameter optimization.
Generates a high-quality, long-term (1982–2023), and high-resolution (0.05 degrees) global ET dataset with significantly improved spatio-temporal continuity and cross-regional robustness.
Demonstrates superior accuracy and reliability in ET estimation across diverse ecosystems, particularly in arid and high-latitude regions, outperforming 20 existing benchmark ET products.
Effectively addresses critical challenges in multi-source ET data integration, such as temporal misalignment, input redundancy, and bias propagation, through a batchwise modeling strategy and deep feature integration.
Provides a reliable and unified foundational dataset that can support various applications, including hydrological modeling, drought monitoring, water resources management, and climate change research.
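The source does not detail the batchwise modeling strategy. One plausible reading, offered here only as a hypothetical sketch, is that time steps are grouped by which input products are available, so that each batch gets its own model trained on a fixed, complete set of inputs; this sidesteps temporal misalignment between products with different coverage periods. The product names and coverage years below are placeholders, not the 20 products used in the paper.

```python
from collections import defaultdict

# Hypothetical coverage periods (first year, last year), inclusive
COVERAGE = {
    "product_A": (1982, 2023),
    "product_B": (2000, 2023),
    "product_C": (1982, 2016),
}


def availability_batches(coverage, years):
    """Group years by the exact set of products that cover them.

    Returns a dict mapping frozenset(product names) -> list of years,
    in order of first appearance.
    """
    batches = defaultdict(list)
    for year in years:
        available = frozenset(
            name for name, (start, end) in coverage.items() if start <= year <= end
        )
        batches[available].append(year)
    return dict(batches)


batches = availability_batches(COVERAGE, range(1982, 2024))
for products, years in batches.items():
    # In this reading, one fusion model would be trained per batch,
    # using only the products in `products` as inputs.
    print(sorted(products), years[0], "-", years[-1])
```

Grouping on the exact availability set keeps every batch free of missing inputs, at the cost of training more models when coverage periods are staggered.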

Funding

Strategic Priority Research Program (Category B) of the Chinese Academy of Sciences [grant number XDB0720101]

Citation

@article{Ci2025Multitimescale,
  author = {Ci, Mengtao and Hao, Xingming and Sun, Fan and Liang, Qixiang and Fan, Xue and Zhang, Jingjing and Xiong, Haibing and Xu, Jinfan and Guo, Xinran},
  title = {Multi-timescale evapotranspiration fusion: A novel autoencoder with automated machine learning-based approach for enhanced estimation accuracy},
  journal = {Agricultural Water Management},
  year = {2025},
  doi = {10.1016/j.agwat.2025.110086},
  url = {https://doi.org/10.1016/j.agwat.2025.110086}
}

Original Source: https://doi.org/10.1016/j.agwat.2025.110086