Ali et al. (2025) Aquifer-specific flood forecasting using machine learning: A comparative analysis for three distinct sedimentary aquifers

Identification

Journal: The Science of The Total Environment
Year: 2025
Date: 2025-10-30
Authors: Ali J. Ali, Ashraf Ahmed
DOI: 10.1016/j.scitotenv.2025.180756

Research Groups

Department of Civil and Environmental Engineering, Brunel University London, Uxbridge, UB8 3PH, United Kingdom

Short Summary

This study compares four machine learning models (TFT, Informer, LSTM, XGBoost) for multi-horizon (1-4 days) flood forecasting across three distinct sedimentary aquifers (Limestone, Chalk, Greensand) in the Thames Basin, UK. The results show that model accuracy depends strongly on aquifer-specific hydrogeological characteristics: Limestone yields very high accuracy (R² = 0.98–0.99), whereas Greensand is poorly predictable (R² ≤ 0).

Objective

To ascertain how aquifer-specific variables affect the prediction reliability of four machine learning models (Temporal Fusion Transformer, Informer, Long Short-Term Memory, and XGBoost) for multi-horizon (1-4 days) flood forecasting in the Thames Basin, UK.
To enhance flood forecasting and risk management in the Thames Basin by combining hydrological records of rainfall, groundwater levels, and river stages with advanced machine learning algorithms.

Study Configuration

Spatial Scale: Thames Basin, UK (area exceeding 16,200 km²), focusing on three distinct sedimentary aquifer types: Chalk, Limestone, and Greensand. Three monitoring sites were selected within each aquifer type.
Temporal Scale: Data collected from April 2011 to early 2025. Hourly observations were aggregated to daily averages. Multi-horizon flood forecasting was performed for lead times of 1 to 4 days.
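
To illustrate what multi-horizon forecasting means in terms of training samples, the sketch below pairs each input window with the four daily values that follow it. It is not taken from the paper: the function name `make_multi_horizon_samples`, the 3-day lookback, and the example river levels are assumptions chosen for illustration.

```python
from typing import List, Tuple


def make_multi_horizon_samples(
    series: List[float], lookback: int = 3, horizons: int = 4
) -> Tuple[List[List[float]], List[List[float]]]:
    """Pair each lookback window with the next `horizons` daily values.

    `lookback=3` is an illustrative assumption; `horizons=4` matches the
    1-4 day lead times described in the summary.
    """
    inputs, targets = [], []
    # Stop early enough that every window still has `horizons` future values.
    for start in range(len(series) - lookback - horizons + 1):
        end = start + lookback
        inputs.append(series[start:end])
        targets.append(series[end:end + horizons])
    return inputs, targets


# Hypothetical daily river levels (invented values, not from the study).
river = [1.0, 1.2, 1.5, 1.4, 1.3, 1.1, 1.0, 0.9]
X, Y = make_multi_horizon_samples(river)
print(X[0], "->", Y[0])
```

Each target row holds the day+1 to day+4 values, so a model trained on these pairs produces all four lead times in one pass. Training one model per horizon is an equally valid design; the summary does not say which the authors used.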

Methodology and Data

Models used: Temporal Fusion Transformer (TFT), Informer, Long Short-Term Memory (LSTM), Extreme Gradient Boosting (XGBoost).
Data sources: Environment Agency's Hydrological Data Explorer and local weather stations. Data types included rainfall, groundwater levels (GWL), and river levels.
Data preprocessing: Hourly observations were aggregated to daily averages. Missing data were filled using linear interpolation. All variables were normalized to a 0-1 scale using the MinMaxScaler method. Lagged features (1-3 days) and 3-day rolling averages for rainfall, river level, and GWL were created.
Validation: The dataset was split chronologically, with 85% used for training/validation and the remaining 15% held out as a test set.
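
The preprocessing and validation steps listed above can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' pipeline: all function names and sample values are hypothetical, the rolling mean is assumed to include the current day, and the scaler is applied to the whole series for brevity (the summary does not say how it was fitted). In practice, pandas and scikit-learn's `MinMaxScaler` would typically handle these steps.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple


def daily_means(hourly: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average (timestamp 'YYYY-MM-DD HH:MM', value) pairs per calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in hourly:
        by_day[timestamp[:10]].append(value)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps linearly between the nearest known values."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values to the 0-1 range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant series
    return [(v - lo) / span for v in values]


def lag_and_rolling(values: List[float], lags=(1, 2, 3), window: int = 3):
    """Build rows with lagged values and a rolling mean ending on day t."""
    rows = []
    for t in range(max(max(lags), window - 1), len(values)):
        row = {f"lag_{k}": values[t - k] for k in lags}
        row["roll_mean"] = mean(values[t - window + 1:t + 1])
        row["value"] = values[t]
        rows.append(row)
    return rows


def chronological_split(rows: list, train_fraction: float = 0.85):
    """Keep time order: earliest rows for training, latest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Invented example data with one missing day.
hourly = [("2011-04-01 00:00", 1.0), ("2011-04-01 12:00", 3.0),
          ("2011-04-02 00:00", 5.0)]
daily = daily_means(hourly)

raw = [2.0, None, 4.0, 6.0, 10.0, 8.0, 6.0]
filled = interpolate_linear(raw)      # [2.0, 3.0, 4.0, 6.0, 10.0, 8.0, 6.0]
scaled = min_max_scale(filled)        # [0.0, 0.125, 0.25, 0.5, 1.0, 0.75, 0.5]
rows = lag_and_rolling(scaled)
train, test = chronological_split(rows)
```

Fitting the scaler on the training portion only would avoid leaking test-period extremes into the features.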

Main Results

Model performance varied significantly based on aquifer type and forecasting horizon.
Limestone Aquifer: Demonstrated very high prediction accuracy across all models and horizons, with R² values consistently between 0.98 and 0.99. TFT and Informer showed marginally better performance at shorter horizons (SMAPE: 1.53–1.64%). This high accuracy is attributed to rapid and distinct groundwater-river interactions (correlation coefficient, r = 0.84).
Chalk Aquifer: Showed moderate prediction accuracy. For a 1-day horizon, R² values ranged from 0.77 to 0.80, decreasing to 0.48–0.62 for a 4-day horizon. LSTM and Informer performed slightly better at shorter horizons, while TFT maintained more stable performance over longer horizons. Groundwater-river interaction showed a moderate correlation (r = 0.26).
Greensand Aquifer: Exhibited poor forecasting ability across all models and horizons, with R² values often near zero or negative (R² ≤ 0), particularly for horizons longer than two days. This is attributed to delayed, complex, and diffuse groundwater-river interactions, and a weak negative association between river levels and groundwater (r = -0.14).
Transformer-based models (TFT and Informer) generally outperformed XGBoost, especially in aquifers with quick groundwater-river reactions (e.g., Limestone). LSTM also proved to be a reliable sequential baseline.
Prediction accuracy consistently decreased with increasing forecasting horizons (1-4 days) across all aquifers and models.
RMSE and MAE values, reported on the normalized 0-1 scale, reflected these trends (e.g., Limestone RMSE: 0.02–0.04; Greensand RMSE: 0.06–0.07).
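
The metrics quoted above can be computed as in the following sketch, which also shows how R² becomes negative when a forecast is worse than simply predicting the observed mean. The SMAPE variant used here (the mean of 2|y − ŷ| / (|y| + |ŷ|), in percent) is an assumption, since the summary does not state which definition the authors used. The example values are invented and sit on a 0-1 scale only to mirror the normalized reporting.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], p: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def mae(y: Sequence[float], p: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def r2(y: Sequence[float], p: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def smape(y: Sequence[float], p: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed variant; see lead-in)."""
    total = 0.0
    for a, b in zip(y, p):
        denom = abs(a) + abs(b)
        if denom > 0:  # a term with both values zero contributes no error
            total += 2.0 * abs(a - b) / denom
    return 100.0 * total / len(y)


# Invented normalized river levels and two hypothetical forecasts.
observed = [0.2, 0.4, 0.6, 0.8]
good = [0.25, 0.35, 0.65, 0.75]   # tracks the observations closely
poor = [0.8, 0.6, 0.4, 0.2]       # moves opposite to the observations

for name, forecast in (("good", good), ("poor", poor)):
    print(name, round(r2(observed, forecast), 3),
          round(rmse(observed, forecast), 3),
          round(mae(observed, forecast), 3),
          round(smape(observed, forecast), 2))
```

A negative R² means the squared error exceeds the variance of the observations, which is the sense in which the Greensand forecasts carry no predictive skill relative to the mean.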

Contributions

Presents the first aquifer-specific, multi-horizon comparative analysis of flood forecasting in the Thames Basin using advanced machine learning.
Explicitly demonstrates that geological variability across different aquifer types (Chalk, Limestone, Greensand) significantly impacts the reliability of flood forecasts, challenging assumptions of hydrological homogeneity in previous studies.
Provides the first evidence that transformer-based models can detect consistent differences in predictability driven by subsurface controls.
Introduces a novel paradigm for flood forecasting that integrates both data-driven approaches and hydrogeological information.
Offers a more physically consistent early warning method by combining groundwater level data with advanced transformer architectures.
Highlights the critical influence of subsurface hydrology on prediction reliability, revealing aquifer-specific geological limitations in ML-based forecasting, with direct implications for resilience planning and flood risk management at the watershed scale.

Funding

UKRI project 10063665

Citation

@article{Ali2025Aquiferspecific,
  author = {Ali, Ali J. and Ahmed, Ashraf},
  title = {Aquifer-specific flood forecasting using machine learning: A comparative analysis for three distinct sedimentary aquifers},
  journal = {The Science of The Total Environment},
  year = {2025},
  doi = {10.1016/j.scitotenv.2025.180756},
  url = {https://doi.org/10.1016/j.scitotenv.2025.180756}
}

Original Source: https://doi.org/10.1016/j.scitotenv.2025.180756