Boo et al. (2026) Deep learning for groundwater level simulation in unconfined aquifers across the contiguous United States: Analyzing simulations at multiple lead times and integrating groundwater signatures

Identification

Journal: Journal of Hydrology
Year: 2026
Date: 2026-01-13
Authors: Kenneth Beng Wee Boo, M. Chow, Wai Peng Wong, Ali Najah Ahmed, Ahmed El-Shafie
DOI: 10.1016/j.jhydrol.2026.134949

Research Groups

Department of Civil Engineering, School of Engineering, Monash University Malaysia
Monash Climate-Resilient Infrastructure Research Hub (M-CRInfra), School of Engineering, Monash University Malaysia
School of Information Technology, Monash University Malaysia
School of Engineering, Faculty of Engineering and Technology, Sunway University
National Water and Energy Center, United Arab Emirates University

Short Summary

This study simulates daily groundwater levels in 249 unconfined wells across the contiguous United States using deep learning models (LSTM-based Seq2Seq and Seq2One) at 1-day to 7-day lead times. The research demonstrates satisfactory model performance (median NSE of 0.744 for 1-day lead and 0.603 for 7-day lead) and highlights the utility of integrating groundwater signatures for comprehensive model evaluation, revealing correlations between model accuracy and groundwater dynamics.

Objective

To evaluate LSTM-based Seq2Seq and Seq2One models for multistep groundwater level (GWL) simulation at lead times of 1 to 7 days in 249 unconfined aquifer monitoring wells across the contiguous United States (CONUS).
To study the effects of changing the training-validation split for LSTM-based Seq2Seq models with subpar performance (i.e., with a Nash-Sutcliffe Efficiency (NSE) score < 0).
To expand the evaluation of the LSTM-based Seq2Seq model by integrating groundwater signatures into three distinct analyses: a) measuring the correlation between model performance and groundwater signatures, b) quantifying how well each signature is simulated, and c) computing the correlation between simulated and observed groundwater signatures.

Study Configuration

Spatial Scale: 249 unconfined groundwater monitoring wells across the contiguous United States (CONUS). Meteorological data is gridded at 1 km × 1 km resolution.
Temporal Scale: Daily groundwater level simulations at 1-day to 7-day lead times. Models are fed with 365 consecutive days of daily meteorological data. Dataset split: Training (01 October 2009 to 30 September 2019), Validation (01 October 2019 to 30 September 2021), Test (01 October 2021 to 30 September 2023).
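As an illustration of the split above, the following minimal sketch assigns a calendar date to its period and counts the days in each. The boundary dates come from this summary; the names SPLITS, assign_split, and split_lengths are hypothetical and are not taken from the paper.

```python
from datetime import date

# Split boundaries as stated in the summary (inclusive on both ends).
SPLITS = {
    "train": (date(2009, 10, 1), date(2019, 9, 30)),
    "validation": (date(2019, 10, 1), date(2021, 9, 30)),
    "test": (date(2021, 10, 1), date(2023, 9, 30)),
}


def assign_split(day):
    """Return the split name for a date, or None if it lies outside all periods."""
    for name, (start, end) in SPLITS.items():
        if start <= day <= end:
            return name
    return None


def split_lengths():
    """Return the number of calendar days in each period."""
    return {name: (end - start).days + 1 for name, (start, end) in SPLITS.items()}
```

Each period starts on 01 October and ends on 30 September, so the split is 10, 2, and 2 such years for training, validation, and testing.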

Methodology and Data

Models used: Long Short-Term Memory (LSTM)-based Sequence-to-Sequence (Seq2Seq) and Sequence-to-One (Seq2One) deep learning models.
Data sources:
- Daily groundwater level (GWL) time series: United States Geological Survey (USGS) portal.
- Daily meteorological (forcing) data: Daymet dataset (NASA ORNL DAAC), including precipitation, shortwave radiation, maximum and minimum air temperature, and water vapor pressure.
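The difference between the two model types can be sketched through how training samples might be built from these data. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the 365-day forcing window ends on the day before the first simulated day, and make_samples and its parameters are hypothetical names.

```python
def make_samples(forcing, gwl, window=365, max_lead=7, lead=None):
    """Build (input, target) pairs from aligned daily series.

    forcing: list of per-day feature lists (e.g. the Daymet variables).
    gwl: list of daily groundwater levels, same length as forcing.
    lead=None gives a Seq2Seq-style target (levels at leads 1..max_lead).
    lead=k gives a Seq2One-style target (the level at lead k only).
    """
    if len(forcing) != len(gwl):
        raise ValueError("forcing and gwl must have the same length")
    samples = []
    # t indexes the last day of the input window (assumed alignment).
    for t in range(window - 1, len(gwl) - max_lead):
        x = forcing[t - window + 1 : t + 1]
        if lead is None:
            y = gwl[t + 1 : t + 1 + max_lead]
        else:
            y = gwl[t + lead]
        samples.append((x, y))
    return samples
```

Under this framing, one Seq2Seq-style sample carries all seven lead times, whereas a Seq2One-style setup needs a separate target (and, presumably, a separate model) for each lead time.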

Main Results

The best performing models achieved median Nash-Sutcliffe Efficiency (NSE) scores of 0.744 for 1-day lead simulations and 0.603 for 7-day lead simulations.
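For reference, NSE compares the squared simulation error with the variance of the observations around their mean. A minimal sketch using the standard definition follows; the function name nse is illustrative.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(simulated):
        raise ValueError("series must have the same length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    obs_variation = sum((o - mean_obs) ** 2 for o in observed)
    if obs_variation == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1 - squared_error / obs_variation
```

A perfect simulation scores 1, a simulation that always returns the observed mean scores 0, and a score below 0 means the simulation is worse than that constant-mean baseline.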
Model performance generally declines as the simulation lead time increases.
LSTM-based Seq2One models outperformed Seq2Seq models by a statistically significant margin for 1-day lead simulations (p = 9.772 × 10⁻³), but the performance difference was not statistically significant for 4-day and 7-day lead simulations.
Ensemble mean models consistently outperformed single models and ensemble median models, with the ensemble mean recommended for predictions.
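The two ensemble variants can be sketched as a per-time-step reduction over member predictions. This summary does not state the ensemble size or how members differ, so the structure below (a list of equally long prediction series, one per member) is an assumption, and ensemble_series is a hypothetical name.

```python
from statistics import mean, median


def ensemble_series(member_predictions, how="mean"):
    """Combine member predictions time step by time step.

    member_predictions: list of equally long prediction lists, one per member.
    how: "mean" or "median".
    """
    reducers = {"mean": mean, "median": median}
    if how not in reducers:
        raise ValueError("how must be 'mean' or 'median'")
    reducer = reducers[how]
    # zip(*...) groups the predictions of all members for each time step.
    return [reducer(step_values) for step_values in zip(*member_predictions)]
```

The resulting series can then be scored against observations in the same way as a single model's output.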
Models generally underestimated the variability of observed GWLs (α-NSE < 1).
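A minimal sketch of this diagnostic follows, assuming the commonly used definition of α-NSE as the ratio of the simulated to the observed standard deviation. The summary does not spell out the formula, so treat the definition and the name alpha_nse as assumptions.

```python
from statistics import pstdev


def alpha_nse(observed, simulated):
    """Ratio of simulated to observed standard deviation (assumed definition)."""
    sd_obs = pstdev(observed)
    if sd_obs == 0:
        raise ValueError("alpha is undefined for a constant observed series")
    return pstdev(simulated) / sd_obs
```

Under this definition, a value below 1 means the simulated series fluctuates less than the observed one.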
Approximately 14% of the models exhibited subpar performance (NSE < 0), which was generally not improved by simply changing the training-validation data splits.
Groundwater signatures: Yearly variance (0.465) and annual periodicity (0.443) showed moderate positive Spearman correlations with model NSE.
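Spearman correlation is the Pearson correlation of the ranks of two variables, here one signature value and one NSE score per well. The sketch below uses only the standard library and average ranks for ties; the function names are illustrative and do not reflect the authors' implementation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because it works on ranks, the coefficient measures monotonic rather than strictly linear association, which suits signatures with skewed distributions across wells.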
Failed models were predominantly found in locations with lower periodicity signatures and negative skewness, suggesting challenges in modeling systems influenced by irregular anthropogenic activities.
The models generally captured most groundwater dynamics well, with median evaluation criterion (En) scores above 0.7 for most signatures, but struggled with reproducing skewness, longest recession, and low pulse duration.
Site aridity showed a negative correlation with model performance (−0.367), implying worse performance in drier locations.

Contributions

Comprehensive evaluation of LSTM-based Seq2Seq and Seq2One models for multi-step daily groundwater level simulation across a large sample of 249 unconfined wells in the contiguous United States.
Investigation into the efficacy of alternative training-validation splits for addressing poorly performing deep learning models in groundwater hydrology.
Pioneering integration of groundwater signatures into a multi-faceted model evaluation framework, providing deeper insights into model capabilities, limitations, and the specific groundwater dynamics captured or missed.
Confirmation of the benefits of ensemble modeling in deep learning for groundwater level prediction, specifically recommending the use of ensemble mean over median for improved accuracy.

Funding

Monash University (financial support for the first author's research project and resources for the project).

Citation

@article{Boo2026Deep,
  author = {Boo, Kenneth Beng Wee and Chow, M. and Wong, Wai Peng and Ahmed, Ali Najah and El-Shafie, Ahmed},
  title = {Deep learning for groundwater level simulation in unconfined aquifers across the contiguous United States: Analyzing simulations at multiple lead times and integrating groundwater signatures},
  journal = {Journal of Hydrology},
  year = {2026},
  doi = {10.1016/j.jhydrol.2026.134949},
  url = {https://doi.org/10.1016/j.jhydrol.2026.134949}
}

Original Source: https://doi.org/10.1016/j.jhydrol.2026.134949