Barzegar et al. (2026) Explaining Great Lakes water level variability through interpretable ensemble machine learning

Identification

Journal: The Science of The Total Environment
Year: 2026
Date: 2026-01-01
Authors: Rahim Barzegar, Ehsan Raei, Jan Adamowski
DOI: 10.1016/j.scitotenv.2025.181302

Research Groups

Groundwater Research Group (GRES), Research Institute on Mines and Environment (RIME), Université du Québec en Abitibi-Témiscamingue (UQAT), Amos, Québec, Canada
Department of Bioresource Engineering, McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada
United Nations University Institute for Water, Environment and Health (UNU-INWEH), Richmond Hill, ON, Canada

Short Summary

This study develops an interpretable ensemble machine learning framework to quantify the immediate and lagged controls of environmental drivers on monthly water-level fluctuations in four of the Great Lakes (Superior, Michigan, Erie, and Ontario). It finds that boosting-based models and a supervised committee ensemble improve predictions over Random Forest and AdaBoost, that inflow and outflow are the dominant drivers, and that temperature, evaporation, and runoff act as secondary, lake-specific modulators.

Objective

To quantify the immediate and lagged controls of environmental drivers on monthly water-level fluctuations in Lakes Superior, Michigan, Erie, and Ontario using an interpretable multi-model machine learning framework.

Study Configuration

Spatial Scale: Lakes Superior, Michigan, Erie, and Ontario (Great Lakes region).
Temporal Scale: Monthly fluctuations over the period 1982–2022, with lagged predictors up to six months.
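
As a rough illustration of what "lagged predictors up to six months" could look like in practice, the sketch below builds lag-0 to lag-6 copies of a monthly driver series in plain Python. The function name `make_lagged_rows`, the sample values, and the choice to drop months without a full lag history are assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Sequence


def make_lagged_rows(series: Sequence[float], name: str,
                     max_lag: int = 6) -> List[Dict[str, float]]:
    """Return one row per month holding the value at lags 0..max_lag.

    Months without a full max_lag history are dropped (an assumption;
    the paper's handling of the first months is not stated here).
    """
    rows = []
    for t in range(max_lag, len(series)):
        # Lag k is the value observed k months before month t.
        rows.append({f"{name}_lag{k}": series[t - k]
                     for k in range(max_lag + 1)})
    return rows


# Hypothetical monthly inflow values, for illustration only.
inflow = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
rows = make_lagged_rows(inflow, "inflow", max_lag=6)

print(len(rows))               # 2 usable months out of 8
print(rows[0]["inflow_lag0"])  # 16.0 (current month)
print(rows[0]["inflow_lag6"])  # 10.0 (six months earlier)
```

Each row can then serve as the predictor vector for the water level of the same month, so a tree-based model sees both the current and the delayed state of each driver.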

Methodology and Data

Models used: Eight tree-based algorithms (Random Forest, Extra Trees, Gradient Boosting Regression Trees (GBRT), Histogram-Based Gradient Boosting (HGBRT), XGBoost, LightGBM, CatBoost, and AdaBoost) integrated into a Supervised Committee Machine Learning (SCML) ensemble. SHapley Additive exPlanations (SHAP) and Variogram Analysis of Response Surfaces (VARS) were used for interpretability.
Data sources: Environmental drivers used as predictors include inflow, outflow, evaporation, runoff, and air temperature.
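
The committee idea can be illustrated with a toy combiner that learns one weight per base model from observed water levels and then blends the base predictions. The sketch below uses inverse-MSE weighting in plain Python as a simple stand-in; the combiner actually used in the paper's SCML ensemble is not described in this summary, and all function names and numbers here are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def committee_weights(obs: Sequence[float],
                      preds: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Weight each base model by its inverse MSE, normalized to sum to 1."""
    inv = {m: 1.0 / max(rmse(obs, p) ** 2, 1e-12) for m, p in preds.items()}
    total = sum(inv.values())
    return {m: v / total for m, v in inv.items()}


def committee_predict(weights: Dict[str, float],
                      preds: Dict[str, Sequence[float]]) -> List[float]:
    """Blend base-model predictions using the learned weights."""
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in weights) for i in range(n)]


# Hypothetical water levels (m) and base-model predictions.
obs = [183.2, 183.4, 183.1, 183.3]
preds = {
    "model_a": [183.3, 183.5, 183.2, 183.4],  # biased by +0.1 m
    "model_b": [183.0, 183.2, 182.9, 183.1],  # biased by -0.2 m
}

# For brevity the weights are fitted and evaluated on the same data;
# in practice they would be fitted on a separate validation split.
w = committee_weights(obs, preds)
blend = committee_predict(w, preds)

print({m: round(v, 3) for m, v in w.items()})
print(round(rmse(obs, blend), 3))
```

In this toy case the two base models err in opposite directions, so the blend ends up with a lower RMSE than either of them, which is the kind of stabilizing effect a committee is meant to provide.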

Main Results

Boosting-based machine learning models (XGBoost, LightGBM, HGBRT) significantly improved Great Lakes water-level prediction compared to Random Forest and AdaBoost.
The Supervised Committee Machine Learning (SCML) ensemble provided the most stable overall predictions, achieving Root Mean Square Error (RMSE) values as low as 0.118 m.
The SHAP–VARS framework effectively identified dominant drivers and revealed their lagged sensitivities.
Inflow and outflow are the overwhelmingly dominant drivers of lake-level dynamics across the Great Lakes.
Evaporation, runoff, and air temperature act as secondary but lake-specific modulators of water levels.
Runoff and inflow primarily govern water levels in Lakes Superior and Michigan, while inflow overwhelmingly drives Lake Erie's dynamics.
Temperature and evaporation exert strong long-lag effects, particularly evident in Lake Ontario.
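
The attribution idea behind SHAP, which underlies the driver rankings above, can be sketched with exact Shapley values on a toy model. The example below enumerates all feature coalitions in plain Python and replaces absent features with baseline values; it does not use the `shap` library, and the toy model, its coefficients, and the input values are hypothetical rather than taken from the paper.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(model: Callable[[Dict[str, float]], float],
                   x: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(x)
    n = len(names)

    def value(coalition) -> float:
        point = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return model(point)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude f.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_level_change(p: Dict[str, float]) -> float:
    # Hypothetical water-balance style model: inflow raises the level,
    # outflow and evaporation lower it.
    return 0.5 * p["inflow"] - 0.4 * p["outflow"] - 0.1 * p["evaporation"]


x = {"inflow": 4.0, "outflow": 2.0, "evaporation": 1.0}
baseline = {"inflow": 2.0, "outflow": 2.0, "evaporation": 0.0}
phi = shapley_values(toy_level_change, x, baseline)

print({f: round(v, 3) for f, v in phi.items()})
```

For a linear toy model like this one, each attribution reduces to the coefficient times the feature's departure from its baseline, and the attributions sum to the difference between the prediction at `x` and at the baseline. Applied to lagged predictors, the same decomposition would assign a separate contribution to each driver at each lag.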

Contributions

Development of an interpretable, multi-model machine learning framework for enhanced Great Lakes water-level prediction.
Quantification of immediate and lagged controls of environmental drivers on lake levels using a novel SHAP–VARS framework.
Identification of dominant and secondary environmental drivers and their lake-specific lagged sensitivities across the Great Lakes.
Significant improvement in water-level prediction stability and accuracy through the application of a Supervised Committee Machine Learning (SCML) ensemble.

Funding

Not mentioned in the provided paper text.

Citation

@article{Barzegar2026Explaining,
  author = {Barzegar, Rahim and Raei, Ehsan and Adamowski, Jan},
  title = {Explaining Great Lakes water level variability through interpretable ensemble machine learning},
  journal = {The Science of The Total Environment},
  year = {2026},
  doi = {10.1016/j.scitotenv.2025.181302},
  url = {https://doi.org/10.1016/j.scitotenv.2025.181302}
}

Original Source: https://doi.org/10.1016/j.scitotenv.2025.181302