Kim et al. (2025) Development of the machine learning and deep learning models with SHAP strategy for predicting groundwater levels in South Korea
Identification
- Journal: Scientific Reports
- Year: 2025
- Date: 2025-10-10
- Authors: Sungwon Kim, Meysam Alizamir, Salim Heddam, Sun Woo Chang, Il-Moon Chung, Özgür Kişi, Christoph Külls
- DOI: 10.1038/s41598-025-19545-y
Research Groups
- Department of Railroad Construction and Safety Engineering, Dongyang University, Yeongju, Republic of Korea
- Institute of Research and Development, Duy Tan University, Da Nang, Vietnam
- School of Engineering and Technology, Duy Tan University, Da Nang, Vietnam
- Faculty of Science, Agronomy Department, Hydraulics Division, Laboratory of Research in Biodiversity Interaction Ecosystem and Biotechnology, University 20 Août 1955, Skikda, Algeria
- Department of Hydro Science and Engineering Research, Korea Institute of Civil Engineering and Building Technology, Goyang-si, Republic of Korea
- Department of Land, Water and Environmental Research, Korea Institute of Civil Engineering and Building Technology, Goyang-si, Republic of Korea
- Department of Civil Engineering, University of Applied Sciences, Lübeck, Germany
- Department of Civil Engineering, Ilia State University, Tbilisi, Georgia
- School of Civil, Environmental and Architectural Engineering, Korea University, Seoul, South Korea
Short Summary
This study developed and compared machine learning and deep learning models for predicting groundwater levels (GWLs) on Jeju Island, South Korea, under three input data scenarios. The Random Forest model with lagged GWL inputs (Scenario 03) achieved the highest predictive accuracy. SHAP analysis was used to interpret its predictions, and a one-way ANOVA test was used to check them statistically against the measured values.
Objective
- To develop and evaluate various machine learning (SGB, RF, GRNN, GMDH) and deep learning (Deep ESN, LSTM) models for predicting groundwater levels (GWLs) in the Bongseong well, Jeju Island, South Korea.
- To compare the predictive performance of these models across three input data scenarios, each combining meteorological data with one additional input group: (01) GWLs from neighboring wells, (02) local groundwater indicators, or (03) lagged GWLs from the Bongseong well itself.
- To enhance the interpretability of the best-performing model's predictions using the SHapley Additive exPlanations (SHAP) strategy and validate the results with a one-way Analysis of Variance (ANOVA) test.
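The lagged-GWL inputs of the third scenario can be illustrated with a minimal sketch. It assumes a plain list of daily values and a hypothetical helper name, `make_lagged_rows`; the paper's actual preprocessing is not described in this summary, and the meteorological inputs are left out for brevity.

```python
def make_lagged_rows(series, max_lag=15):
    """Pair each target value with the max_lag values that precede it.

    features[k - 1] holds the value k days before the target (lag k),
    so features[0] plays the role of GWL_T-01.
    """
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - k] for k in range(1, max_lag + 1)]
        rows.append((features, series[t]))
    return rows


# Synthetic integer series standing in for daily GWL readings (not real data).
gwl = list(range(100, 120))
rows = make_lagged_rows(gwl, max_lag=15)

first_features, first_target = rows[0]
print(len(rows))          # 5 usable samples from 20 days with 15 lags
print(first_target)       # 115
print(first_features[0])  # 114, the 1-day lag
```

The first `max_lag` days cannot be used as targets because their lag window is incomplete, which is why 20 input days yield only 5 samples here.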
Study Configuration
- Spatial Scale: Jeju Island, South Korea, specifically Aewol-eup. Focus on the Bongseong monitoring well, with data from 7 other monitoring wells (Sanga1, Sanga2, Sanga3, Eom1, Jangcheon1, Hagwi1, Hagwi3) and meteorological stations (Aewol (1), Witse Oreum).
- Temporal Scale: Daily time scale, from June 1, 2011, to December 31, 2020 (3,502 days). Data split into 80% for training (June 1, 2011 – January 31, 2019) and 20% for testing (February 1, 2019 – December 31, 2020).
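The day counts behind the chronological split can be checked with the standard library. This is only an arithmetic check of the dates stated above, with both endpoints counted inclusively.

```python
from datetime import date

start = date(2011, 6, 1)
train_end = date(2019, 1, 31)
end = date(2020, 12, 31)

# Inclusive day counts: add 1 because both endpoints are part of the record.
n_total = (end - start).days + 1
n_train = (train_end - start).days + 1
n_test = n_total - n_train
train_fraction = n_train / n_total

print(n_total)                   # 3502
print(n_train, n_test)           # 2802 700
print(round(train_fraction, 3))  # 0.8
```

The split is chronological rather than random, so the testing period lies entirely after the training period.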
Methodology and Data
- Models used:
  - Machine Learning: Stochastic Gradient Boosting (SGB), Random Forest (RF), Generalized Regression Neural Networks (GRNN), Group Method of Data Handling (GMDH).
  - Deep Learning: Deep Echo State Network (Deep ESN), Long Short-Term Memory (LSTM).
  - Interpretability & Validation: SHapley Additive exPlanations (SHAP) strategy, one-way Analysis of Variance (ANOVA) test.
- Data sources:
  - Meteorological data: Daily rainfall (Aewol (1) station); daily air temperature, relative humidity, and wind speed (Witse Oreum station).
  - Groundwater data: Daily groundwater levels (GWLs) from the Bongseong well and 7 other monitoring wells; daily groundwater temperature, electrical conductivity, and pressure from the Bongseong well.
  - Data Access: Groundwater Information Management System, Jeju Island (https://water.jeju.go.kr/obsvsystem/gwobsv/obsvData).
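The one-way ANOVA test listed above compares group means through an F-statistic. The following is a minimal sketch of that statistic in plain Python, using made-up numbers; the function name `one_way_anova_f` is hypothetical, and the P-value (which requires the F distribution, for example via `scipy.stats.f_oneway`) is not computed here.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F-statistic for two or more groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Toy "measured" and "predicted" groups (illustrative values only).
f_shifted = one_way_anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
f_identical = one_way_anova_f([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

print(f_shifted)    # 1.5
print(f_identical)  # 0.0
```

An F-statistic near zero means the group means are almost identical, which is the situation in which the test does not reject the null hypothesis of equal means.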
Main Results
- The Random Forest model in Scenario 03 (RF3) achieved the best overall predictive accuracy for GWLs in the Bongseong well during the testing procedure, with an RMSE of 0.053 m, a Correlation Coefficient (CC) of 1.000, and a Nash–Sutcliffe Efficiency (NSE) of 1.000.
- Scenario 03, which used meteorological data and 1-day to 15-day lagged GWLs from the Bongseong well itself, consistently outperformed Scenario 01 (meteorological data + GWLs from 7 other wells) and Scenario 02 (meteorological data + local groundwater indicators).
- SHAP analysis of the best model (RF3) identified the 1-day lagged GWL (GWL_T-01) as the most influential input feature, with the strongest positive impact on predictive ability.
- For every Scenario 03 model, the one-way ANOVA test found no statistically significant difference between predicted and measured values (no null hypothesis was rejected). RF3 had the highest P-value (0.993) and the lowest F-statistic (6.9 × 10⁻⁵).
- Based on testing RMSE, RF3 outperformed the compared models by 116.98% (vs. RF1), 552.83% (vs. GRNN1), 1,439.62% (vs. RF2), 737.74% (vs. GRNN2), and 83.02% (vs. GMDH3). Because several of these values exceed 100%, they cannot be percentage reductions in RMSE; they are consistent with RMSE differences expressed relative to RF3's own RMSE.
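The evaluation metrics and the percentage comparison can be sketched as follows. The metric definitions are the standard ones. The relative-gap formula, (RMSE_other − RMSE_RF3) / RMSE_RF3 × 100, is an assumption inferred from the reported percentages, and the "implied" RMSE values are back-calculated under that assumption rather than quoted from the paper.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def cc(obs, pred):
    """Pearson correlation coefficient."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def implied_rmse(rmse_best, gap_percent):
    """Invert the ASSUMED formula gap = 100 * (other - best) / best."""
    return rmse_best * (1.0 + gap_percent / 100.0)


# Toy data: a constant +1 bias keeps CC at 1 but lowers NSE.
obs = [1.0, 2.0, 3.0, 4.0]
biased = [2.0, 3.0, 4.0, 5.0]
toy_rmse, toy_cc, toy_nse = rmse(obs, biased), cc(obs, biased), nse(obs, biased)

# Reported RF3 testing RMSE and reported percentage gaps.
RMSE_RF3 = 0.053
reported_gaps = {"RF1": 116.98, "GRNN1": 552.83, "RF2": 1439.62,
                 "GRNN2": 737.74, "GMDH3": 83.02}
implied = {name: round(implied_rmse(RMSE_RF3, gap), 3)
           for name, gap in reported_gaps.items()}
print(implied)
```

Under the assumed formula, every back-calculated RMSE lands on a clean three-decimal value, which supports reading the percentages as gaps relative to RF3's RMSE. The toy data also show why CC alone can overstate agreement: a biased prediction is perfectly correlated with the observations yet has a much lower NSE.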
Contributions
- Comprehensive comparative evaluation of six machine learning and deep learning models, including less commonly applied ones, for groundwater level prediction in a hydrogeologically unique region.
- Systematic assessment of three distinct input data scenarios, highlighting the critical importance of incorporating lagged groundwater level time series data for superior predictive performance.
- Pioneering application of the SHAP strategy to interpret the contributions of individual input features to the model's predictions, providing valuable insights into the underlying hydrological processes and model behavior.
- Robust statistical validation of model performance using a one-way ANOVA test, enhancing the reliability and trustworthiness of the predictive results.
- Identification of the Random Forest model with lagged GWL inputs as a highly effective and interpretable solution for groundwater level forecasting in complex environments.
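The idea behind SHAP feature attribution can be illustrated with exact Shapley values on a toy model. This sketch enumerates all feature orderings and replaces "absent" features with a baseline value; it is not the SHAP library's algorithm and not the paper's RF3 model. The linear `toy_model`, its weights, and the input values are all hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    Features not yet "added" keep their baseline value. Cost grows as n!,
    so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev_output = model(current)
        for i in order:
            current[i] = x[i]
            new_output = model(current)
            phi[i] += new_output - prev_output
            prev_output = new_output
    return [p / len(orderings) for p in phi]


def toy_model(v):
    # Hypothetical stand-in: the 1-day lag dominates, the third input is ignored.
    return 0.9 * v[0] + 0.1 * v[1] + 0.0 * v[2]


x = [12.0, 11.0, 5.0]          # e.g. lag-1 GWL, lag-2 GWL, an unused input
baseline = [10.0, 10.0, 10.0]  # reference input, e.g. training means
phi = shapley_values(toy_model, x, baseline)
print(phi)  # approximately [1.8, 0.1, 0.0]
```

The attributions sum to the difference between the model output at `x` and at the baseline (here about 1.9), which is the additivity property that SHAP summary plots rely on.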
Funding
- KICT Research Program (Project no. 20250442-001: Development of Demonstration Technology for Integrated Operation of Subsurface Dam and Sand Storage Dam) funded by the Ministry of Science and ICT.
- Korea Environment Industry & Technology Institute (KEITI) through the Water Management for Drought Program, funded by the Korea Ministry of Environment (MOE) (2020361002).
Citation
@article{Kim2025Development,
author = {Kim, Sungwon and Alizamir, Meysam and Heddam, Salim and Chang, Sun Woo and Chung, Il-Moon and Kişi, Özgür and Külls, Christoph},
title = {Development of the machine learning and deep learning models with SHAP strategy for predicting groundwater levels in South Korea},
journal = {Scientific Reports},
year = {2025},
doi = {10.1038/s41598-025-19545-y},
url = {https://doi.org/10.1038/s41598-025-19545-y}
}
Original Source: https://doi.org/10.1038/s41598-025-19545-y