Samalavičius et al. (2026) Sinkhole risk forecasting in the Lithuania–Latvia Karst region using artificial intelligence
Identification
- Journal: Journal of Hydrology: Regional Studies
- Year: 2026
- Date: 2026-03-31
- Authors: Vytautas Samalavičius, Jānis Bikše, I. Zaslavsky, Ieva Lekstutytė, Jurga Arustienė, Gintaras Žaržojus, Assemzhan Kunsakova, Inga Retiķe, Sonata Gadeikienė, Saulius Gadeikis
- DOI: 10.1016/j.ejrh.2026.103372
Research Groups
- Institute of Geosciences at Vilnius University, Lithuania
- Faculty of Science and Technology at University of Latvia, Latvia
- San Diego Supercomputer Center, University of California San Diego, USA
Short Summary
This study develops an end-to-end, remote-sensing–informed and data-driven workflow to reconstruct missing daily groundwater-level (GWL) records and to forecast monthly sinkhole formation risk in the Lithuania–Latvia transboundary gypsum karst region. Models combining groundwater level, seasonal encoding, and hydroclimatic features achieved an accuracy of ~0.96, a high-risk precision of ~0.98, and a high-risk recall of ~0.85, highlighting multi-week hydroclimatic preconditioning as the dominant driver.
Objective
- To develop an end-to-end, remote-sensing–informed and data-driven workflow to reconstruct missing daily groundwater-level (GWL) records.
- To forecast monthly sinkhole formation risk in the Lithuania–Latvia transboundary gypsum karst region.
- To address two problems simultaneously: (1) reconstructing daily gaps in sparse GWL records; and (2) producing operational predictors of sinkhole risk at monthly resolution without relying on dense in situ networks.
Study Configuration
- Spatial Scale: Lithuania–Latvia transboundary gypsum karst region, approximately 150–170 kilometers long and 50–80 kilometers wide, encompassing seven groundwater monitoring wells.
- Temporal Scale: Data records from 2003 to 2024, with daily resolution for groundwater level reconstruction and monthly resolution for sinkhole risk forecasting.
Methodology and Data
- Models used:
- Groundwater Level Imputation (Regression): ExtraTreesRegressor (ETR), RandomForestRegressor (RFR), GradientBoostingRegressor (GBR). ETR was selected as the primary model.
- Sinkhole Risk Assessment (Classification): RandomForestClassifier (RFC), LightGBM (LGBM), XGBoost (XGB), ExtraTreesClassifier (ETC). RFC was selected as the primary model.
- Techniques: Supervised machine learning, time-aware cross-validation with block masking, SMOTE–Tomek for class imbalance control, Variance Inflation Factor (VIF) for multicollinearity screening, SHAP for feature attribution, and GridSearchCV for hyperparameter tuning.
- Data sources:
- Climate: European gridded observational climate dataset (E-OBS) for daily near-surface air temperature (degrees Celsius) and precipitation (millimeters).
- Evapotranspiration: Global Land Evaporation Amsterdam Model (GLEAM) for Actual Evapotranspiration (AET) and Potential Evapotranspiration (PET) (millimeters per day).
- Water Storage: Global Land Data Assimilation System (GLDAS) for GRACE-based Terrestrial Water Storage (TWS) and Groundwater Storage (GWS) anomalies (millimeters).
- Groundwater Observations: Latvian Environment, Geology and Meteorology Centre (Latvia) and Lithuanian Geological Survey (LGT) PožVIS information system (Lithuania).
- Sinkhole Data: LGT GEOLIS (Lithuania) for sinkhole formation dates.
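The seasonal encoding and the block-masked validation listed under Techniques can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`seasonal_encoding`, `block_mask`), the synthetic series, the block length, and the carry-forward imputer are all assumptions made for illustration, and the imputer merely stands in for a trained regressor such as ETR. The idea shown is that whole contiguous blocks of days are hidden, mimicking multi-day sensor gaps, and the error is scored only on the hidden days.

```python
import math
import random
from datetime import date, timedelta


def seasonal_encoding(d):
    """Encode day-of-year as (sin, cos) so late December sits next to early January."""
    angle = 2.0 * math.pi * (d.timetuple().tm_yday - 1) / 365.25
    return math.sin(angle), math.cos(angle)


def block_mask(n_days, block_len, n_blocks, seed=0):
    """Return sorted indices of contiguous blocks to hide (blocks may overlap).

    Index 0 is never hidden so the placeholder imputer below always has a
    visible starting value.
    """
    rng = random.Random(seed)
    hidden = set()
    for _ in range(n_blocks):
        start = rng.randrange(1, n_days - block_len + 1)
        hidden.update(range(start, start + block_len))
    return sorted(hidden)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic daily groundwater levels (hypothetical values, in metres).
start = date(2020, 1, 1)
days = [start + timedelta(days=i) for i in range(730)]
gwl = [1.0 + 0.5 * seasonal_encoding(d)[0] for d in days]

hidden = block_mask(len(gwl), block_len=30, n_blocks=5, seed=42)
hidden_set = set(hidden)

# Placeholder imputer (assumption): carry the last visible value forward.
imputed = []
last_seen = gwl[0]
for i, value in enumerate(gwl):
    if i in hidden_set:
        imputed.append(last_seen)
    else:
        last_seen = value
        imputed.append(value)

# Score only on the hidden days, as a gap-filling model would be judged.
mae = mean_absolute_error([gwl[i] for i in hidden], [imputed[i] for i in hidden])
print(f"MAE on masked blocks: {mae:.3f} m")
```

Hiding contiguous blocks rather than random single days avoids rewarding a model for interpolating between immediate neighbours, which is why such a scheme is a more honest test of gap filling in time series.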
Main Results
- Groundwater Level Imputation: The ExtraTreesRegressor (ETR) model achieved cross-validated R² values ranging from 0.44 to 0.79 and Mean Absolute Error (MAE) from 0.12 to 0.40 meters across different wells. Same-day GLDAS groundwater storage was identified as the dominant predictor, and models incorporating seasonal encoding showed improved performance. Increased high-frequency variability (noise) in groundwater levels correlated with reduced model accuracy.
- Sinkhole Risk Assessment: A "high risk" threshold was empirically defined as ≥4 newly formed sinkholes per month. Models combining groundwater features with seasonal encoding (GS Combined) demonstrated the highest performance, with an accuracy of approximately 0.96, high-risk precision of approximately 0.98, and high-risk recall of approximately 0.85. Explainability analyses indicated that sinkhole formation is primarily driven by the long-term hydraulic state of the groundwater system and its seasonal modulation, with elevated groundwater levels and positive anomalies increasing risk. Sinkhole clusters frequently coincided with, or occurred within ±30 days of, groundwater-level peaks.
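To make the reported metrics concrete, the sketch below labels months as high risk using the ≥4 sinkholes-per-month threshold stated above, then computes accuracy and high-risk precision and recall. The monthly counts and the predictions are invented for illustration and do not reproduce the study's data; the function names are likewise hypothetical.

```python
HIGH_RISK_THRESHOLD = 4  # from the summary: >= 4 newly formed sinkholes per month


def label_high_risk(monthly_counts, threshold=HIGH_RISK_THRESHOLD):
    """Map monthly sinkhole counts to binary labels (1 = high risk)."""
    return [1 if count >= threshold else 0 for count in monthly_counts]


def classification_scores(y_true, y_pred):
    """Accuracy plus precision and recall for the high-risk (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical monthly counts and hypothetical classifier output.
counts = [0, 1, 5, 4, 2, 7, 0, 3]
y_true = label_high_risk(counts)
y_pred = [0, 0, 1, 0, 0, 1, 0, 0]  # one high-risk month is missed

scores = classification_scores(y_true, y_pred)
print(scores)
```

In this toy case precision is perfect while recall is lower, the same pattern as the reported ~0.98 precision and ~0.85 recall: alarms are rarely false, but some high-risk months go undetected.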
Contributions
- Developed a novel, end-to-end, remote-sensing–driven machine learning workflow that simultaneously reconstructs missing daily groundwater-level records and forecasts monthly sinkhole formation risk in data-scarce karst regions.
- Demonstrated that process-informed ML pipelines can effectively bridge the gap between sparse observations and operationally relevant geohazard assessment.
- Identified multi-week hydroclimatic preconditioning and seasonally modulated groundwater levels as dominant drivers for sinkhole formation, with sinkhole clusters occurring within ±30 days of groundwater-level peaks.
- Provided a scalable, cost-effective framework for early-warning systems in transboundary karst settings, reducing reliance on dense in situ monitoring networks.
- Emphasized the importance of data continuity, sensor maintenance, and metadata documentation over spatial density for effective sinkhole management.
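The ±30-day relationship between sinkhole events and groundwater-level peaks mentioned in the contributions can be checked with a simple window test. The sketch below is an assumption-laden illustration: peaks are taken as strict local maxima of a coarsely sampled series, which may differ from how the study identified peaks, and all dates and levels are hypothetical.

```python
from datetime import date


def find_peaks(series):
    """Return dates of strict local maxima in a list of (date, level) pairs."""
    peaks = []
    for i in range(1, len(series) - 1):
        prev_level = series[i - 1][1]
        level = series[i][1]
        next_level = series[i + 1][1]
        if level > prev_level and level > next_level:
            peaks.append(series[i][0])
    return peaks


def within_window(event, peaks, window_days=30):
    """True if the event date lies within +/- window_days of any peak date."""
    return any(abs((event - peak).days) <= window_days for peak in peaks)


# Hypothetical groundwater levels (metres) sampled on the first of each month.
series = [
    (date(2023, 1, 1), 1.0),
    (date(2023, 2, 1), 1.4),
    (date(2023, 3, 1), 1.9),
    (date(2023, 4, 1), 1.5),
    (date(2023, 5, 1), 1.1),
    (date(2023, 6, 1), 1.3),
    (date(2023, 7, 1), 1.0),
]

# Hypothetical sinkhole formation dates.
events = [date(2023, 3, 20), date(2023, 4, 15), date(2023, 6, 25)]

peaks = find_peaks(series)
flags = [within_window(event, peaks) for event in events]
share = sum(flags) / len(flags)
print(f"{share:.0%} of events fall within 30 days of a peak")
```

A check like this also shows why record continuity matters: a gap around a true peak would hide it from `find_peaks`, and nearby events would wrongly appear unrelated to groundwater highs.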
Funding
- Research Council of Lithuania (LMTLT), Agreement no. S-IMPRESSU-24-3.
- GRANDE-U “Groundwater Resilience Assessment through Integrated Data Exploration for Ukraine” project (NSF Award 2409395).
- Latvian Council of Science, Contract no. 11-1.N-462.
Citation
@article{Samalavičius2026Sinkhole,
author = {Samalavičius, Vytautas and Bikše, Jānis and Zaslavsky, I. and Lekstutytė, Ieva and Arustienė, Jurga and Žaržojus, Gintaras and Kunsakova, Assemzhan and Retiķe, Inga and Gadeikienė, Sonata and Gadeikis, Saulius},
title = {Sinkhole risk forecasting in the Lithuania–Latvia Karst region using artificial intelligence},
journal = {Journal of Hydrology: Regional Studies},
year = {2026},
doi = {10.1016/j.ejrh.2026.103372},
url = {https://doi.org/10.1016/j.ejrh.2026.103372}
}
Original Source: https://doi.org/10.1016/j.ejrh.2026.103372