Zeynoddin et al. (2025) Overcoming hydrological forecasting challenges through augmented adaptive deep algorithms: a case study of the great lakes across Canada and the U.S

Identification

Journal: Theoretical and Applied Climatology
Year: 2025
Date: 2025-10-14
Authors: Mohammad Zeynoddin, Hossein Bonakdari, Afshin Amiri, Silvio José Gumière, Tadros Ghobrial
DOI: 10.1007/s00704-025-05819-y

Research Groups

Department of Soils and Agri-Food Engineering, Université Laval, Québec, Canada
Department of Civil Engineering, University of Ottawa, Ottawa, Canada
Department of Civil and Water Engineering, Université Laval, Québec, Canada

Short Summary

This study proposes a novel framework integrating grid search-optimized Seasonal Autoregressive Integrated Moving Average (GS-SARIMA), Long Short-Term Memory (LSTM), and Extreme Gradient Boosting (XGB) models, optimized using the Augmented Weighted Mean Vector Optimizer (AWMVO), for accurate lake level forecasting in the Great Lakes. The GS-SARIMA model achieved the highest accuracy for site-specific predictions, while AWMVO-LSTM demonstrated superior generalizability across lakes despite higher computational cost, highlighting trade-offs between accuracy, efficiency, and transferability.

Objective

To develop and evaluate a novel framework integrating GS-SARIMA, AWMVO-LSTM, and AWMVO-XGB models for accurate and generalizable lake level forecasting.
To compare these models across accuracy, computational efficiency, and generalizability for the Great Lakes under diverse hydrological conditions.
To improve forecasting accuracy, ensure model interpretability, and develop a scalable approach for practical water resource management.

Study Configuration

Spatial Scale: The Great Lakes (Superior, Huron-Michigan, Erie, and Ontario) spanning the border of the United States and Canada.
Temporal Scale: Monthly observations from September 1981 to December 2023 (508 data points). Data was chronologically split into training (304 points, ~60%), validation (102 points, ~20%), and testing (102 points, ~20%) subsets.
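
The chronological split can be illustrated with a minimal sketch. The function name `chronological_split` and the placeholder series are assumptions made for illustration; only the subset sizes (304, 102, 102) come from the study. The point is that the series is cut in time order, without shuffling, so the test subset always follows the training and validation subsets.

```python
def chronological_split(series, n_train, n_val):
    """Cut a time-ordered sequence into train/validation/test without shuffling."""
    if n_train + n_val >= len(series):
        raise ValueError("n_train + n_val must leave at least one test point")
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test


# Placeholder indices standing in for the 508 monthly lake levels (not real data).
levels = list(range(508))
train, val, test = chronological_split(levels, n_train=304, n_val=102)
print(len(train), len(val), len(test))
```

Passing explicit counts rather than fractions avoids rounding ambiguity: 60% of 508 is not a whole number, so the reported subset sizes are used directly.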

Methodology and Data

Models used:
- Grid Search-optimized Seasonal Autoregressive Integrated Moving Average (GS-SARIMA)
- Long Short-Term Memory (LSTM)
- Extreme Gradient Boosting (XGB)
- Augmented Weighted Mean Vector Optimizer (AWMVO) for hyperparameter optimization of LSTM and XGB.
- Optimization process enhanced with composite cost functions (combining root mean square deviation and correlation) and data-driven feature engineering (lag selection based on autocorrelation analysis).
Data sources: Monthly time series data for lake level height collected from the Virtual Stations on the Database for Hydrological Time Series of Inland Waters (DAHITI) platform, based on satellite altimetry.
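
A composite cost of the kind described under the models can be sketched as follows. The summary does not give the paper's exact formulation, so the weighted sum below, the weight `w`, and the function names are all assumptions: the deviation term is penalized directly, and the correlation term is converted to a penalty as `1 - r` so that lower cost is always better.

```python
import math


def rmsd(obs, sim):
    """Root mean square deviation between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def pearson_r(obs, sim):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    vo = sum((o - mo) ** 2 for o in obs)
    vs = sum((s - ms) ** 2 for s in sim)
    if vo == 0 or vs == 0:
        return 0.0
    return cov / math.sqrt(vo * vs)


def composite_cost(obs, sim, w=0.5):
    """Hypothetical composite cost: w * RMSD + (1 - w) * (1 - r)."""
    return w * rmsd(obs, sim) + (1 - w) * (1 - pearson_r(obs, sim))


# Made-up lake levels in metres, for illustration only.
obs = [183.10, 183.30, 183.20, 183.50]
good = [183.12, 183.28, 183.21, 183.47]
bad = [183.50, 183.20, 183.30, 183.10]
print(composite_cost(obs, good), composite_cost(obs, bad))
```

An optimizer such as AWMVO would then search the hyperparameter space for the configuration that minimizes this value on the validation subset. Because RMSD carries units of metres while `1 - r` is dimensionless, the weight `w` also sets the relative scale of the two terms.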

Main Results

GS-SARIMA: Achieved the highest overall accuracy with a mean test Kling-Gupta Efficiency (KGE) of 0.965 (range: 0.946–0.989) and a consistently low mean absolute percentage error (MAPE) of 0.039%. It excelled in capturing seasonal and nonseasonal patterns but required site-specific tuning, limiting its generalizability. Average runtime was approximately 4.5 hours per lake.
AWMVO-LSTM: Demonstrated superior generalizability, maintaining comparable accuracy (mean test KGE: 0.933, range: 0.890–0.960; mean MAPE: 0.046%) when transferred to other lakes without retraining (zero-shot). However, it was the most computationally expensive model, averaging nearly 19 hours of training time per lake.
AWMVO-XGB: Was the most computationally efficient model, with an average runtime of approximately 0.36 hours per lake. However, it showed the lowest average accuracy (mean test KGE: 0.838; mean MAPE: 0.073%, max: 0.176%), underperforming in particular on high-variability systems such as Lake Ontario.
Trade-offs: The study highlighted significant trade-offs between computational efficiency, accuracy, and generalizability among the models.
Lake Ontario: Consistently presented the greatest modeling challenge across all models, showing higher mean square deviation (MSDr) values and larger deviations in standard deviation, suggesting potential unmodeled external influences.
Uncertainty Analysis (U95): GS-SARIMA (average 0.045 meters) and LSTM (average 0.049 meters) showed comparable uncertainty in test predictions. XGB had the highest average test-phase uncertainty (0.052 meters), with the maximum observed for Lake Ontario (0.073 meters).
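
The metrics reported above can be computed with a short sketch. Several points are assumptions rather than details taken from the paper: KGE is written in its original three-component form (correlation, variability ratio, bias ratio), and the paper may use a variant; U95 is taken as 1.96 times the square root of the sum of the squared error standard deviation and the squared RMSE, a definition common in the forecasting literature that this summary does not confirm; the sample values are invented.

```python
import math
from statistics import mean, pstdev


def kge(obs, sim):
    """Kling-Gupta Efficiency, original form: 1 - sqrt((r-1)^2 + (a-1)^2 + (b-1)^2)."""
    mu_o, mu_s = mean(obs), mean(sim)
    sd_o, sd_s = pstdev(obs), pstdev(sim)
    if sd_o == 0 or sd_s == 0 or mu_o == 0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (sd_o * sd_s)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def mape(obs, sim):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(o - s) / abs(o) for o, s in zip(obs, sim)) / len(obs)


def u95(obs, sim):
    """Assumed U95 definition: 1.96 * sqrt(SD_e^2 + RMSE^2), in the units of the data."""
    errors = [s - o for o, s in zip(obs, sim)]
    sd_e = pstdev(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return 1.96 * math.sqrt(sd_e ** 2 + rmse ** 2)


# Invented lake levels in metres, for illustration only.
obs = [183.10, 183.25, 183.40, 183.30, 183.15]
sim = [183.12, 183.22, 183.37, 183.33, 183.14]
print(kge(obs, sim), mape(obs, sim), u95(obs, sim))
```

The sketch also shows why the reported MAPE values are so small: the error is divided by the absolute water level, so a deviation of a few centimetres on a level of well over a hundred metres (as in the invented data here) amounts to only hundredths of a percent. This is why KGE and U95 are the more discriminating measures in the comparison.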

Contributions

Proposes a novel framework integrating grid search-optimized SARIMA, AWMVO-LSTM, and AWMVO-XGB models with advanced optimization and composite cost functions for hydrological forecasting.
Provides a systematic evaluation and comparison of diverse modeling techniques (statistical, deep learning, tree-based) in terms of accuracy, computational efficiency, and generalizability for lake level forecasting.
Demonstrates the effectiveness of zero-shot transferability for LSTM models across diverse hydrological systems, offering a pathway towards more scalable and adaptable forecasting solutions.
Emphasizes the importance of optimizing predictive models with advanced techniques customized to the specific hydrological and statistical characteristics of the data.
Utilizes a univariate input approach (past lake levels only) which implicitly captures the integrated effects of meteorological conditions and human activities, simplifying the modeling framework while maintaining predictive power.
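
The univariate setup, combined with autocorrelation-based lag selection, can be sketched as below. The threshold rule, the function names, and the synthetic seasonal series are assumptions for illustration; the summary states only that lags were selected through autocorrelation analysis and that past lake levels were the sole inputs.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given positive lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return num / denom


def select_lags(series, max_lag, threshold):
    """Keep lags whose absolute autocorrelation reaches the (hypothetical) threshold."""
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(series, k)) >= threshold]


def make_lagged(series, lags):
    """Build inputs X (past values at the chosen lags) and targets y (current value)."""
    start = max(lags)
    X = [[series[t - k] for k in lags] for t in range(start, len(series))]
    y = series[start:]
    return X, y


# Synthetic monthly series with a 12-month cycle (not real lake data).
levels = [183.0 + 0.3 * math.sin(2 * math.pi * t / 12) for t in range(120)]
lags = select_lags(levels, max_lag=12, threshold=0.8)
X, y = make_lagged(levels, lags)
print(lags, len(X), len(y))
```

Each row of `X` holds only earlier values of the same series, so any model fitted on `(X, y)`, whether LSTM or XGB, sees the effects of weather and regulation only through their imprint on past levels.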

Funding

Not applicable.

Citation

@article{Zeynoddin2025Overcoming,
  author = {Zeynoddin, Mohammad and Bonakdari, Hossein and Amiri, Afshin and Gumière, Silvio José and Ghobrial, Tadros},
  title = {Overcoming hydrological forecasting challenges through augmented adaptive deep algorithms: a case study of the great lakes across Canada and the U.S},
  journal = {Theoretical and Applied Climatology},
  year = {2025},
  doi = {10.1007/s00704-025-05819-y},
  url = {https://doi.org/10.1007/s00704-025-05819-y}
}

Original Source: https://doi.org/10.1007/s00704-025-05819-y