Wang et al. (2026) Flash flood forecasting in North East England through weak label-guided mixture of experts with multi-scale explainability
Identification
- Journal: Journal of Hydrology: Regional Studies
- Year: 2026
- Date: 2026-03-31
- Authors: Jessica Wang, J.E. Sanderson, Wai Lok Woo
- DOI: 10.1016/j.ejrh.2026.103402
Research Groups
- School of Computer Science, Northumbria University, Newcastle upon Tyne, United Kingdom
Short Summary
This study introduces a Weak Label–Guided Mixture of Experts (WL–MoE) framework for multi-horizon flash flood water-level forecasting in five fast-response catchments in North East England. The framework significantly improves predictive accuracy, particularly for high-water events, by leveraging specialized convolutional experts and a two-stage training scheme, while also providing multi-scale interpretability.
Objective
- To develop and evaluate a Weak Label–Guided Mixture of Experts (WL–MoE) framework for robust, multi-horizon flash flood water-level forecasting that is accurate under severe regime imbalance and interpretable, particularly for high-impact events.
Study Configuration
- Spatial Scale: Five fast-response catchments in Northumberland, North East England, along the River Tyne and its tributaries: Haltwhistle (W1), Acomb (W2), Riding Mill (W3), Stocksfield (W4), and Hepscott (W5).
- Temporal Scale: Water-level and rainfall records at 15-minute resolution, covering 2016–2025 for most sites and 2022–2025 for Stocksfield. The model forecasts water level 32 steps ahead, corresponding to an 8-hour lead time.
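The temporal setup above (15-minute sampling, 32-step targets) can be illustrated with a minimal windowing sketch. It is not the authors' preprocessing: the function name `make_windows` and the `lookback` input length are assumptions introduced here, since the summary does not state how inputs are sliced.

```python
import numpy as np

STEP_MINUTES = 15    # sampling interval stated in the summary
HORIZON_STEPS = 32   # forecast length stated in the summary


def make_windows(level, rain, lookback, horizon=HORIZON_STEPS):
    """Slice aligned level and rainfall series into (input, target) pairs.

    `lookback` is a hypothetical input length; the summary does not give one.
    Returns X with shape (n, lookback, 2) and Y with shape (n, horizon).
    """
    level = np.asarray(level, dtype=float)
    rain = np.asarray(rain, dtype=float)
    if level.shape != rain.shape:
        raise ValueError("level and rain must be aligned series of equal length")
    n = len(level) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window sizes")
    feats = np.stack([level, rain], axis=-1)  # shape (T, 2)
    # Inputs: the last `lookback` steps of both variables.
    X = np.stack([feats[i:i + lookback] for i in range(n)])
    # Targets: the next `horizon` water-level values.
    Y = np.stack([level[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y


# 32 steps at 15 minutes each gives the 8-hour lead time.
lead_time_hours = HORIZON_STEPS * STEP_MINUTES / 60
```

With a 100-step series and `lookback=8`, this yields 100 - 8 - 32 + 1 = 61 training pairs.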
Methodology and Data
- Models used:
  - Proposed: Weak Label–Guided Mixture of Experts (WL–MoE) framework, comprising:
    - Continuous Wavelet Transform (CWT) for input transformation (Morlet wavelet for water level, Mexican Hat wavelet for rainfall).
    - Soft gating network (Multilayer Perceptron) for input-conditional expert weighting.
    - Multiple convolutional expert networks (CNNs) with identical backbone architecture.
    - Two-stage training scheme: weak-label guided initialisation (using DTW-based TimeSeriesKMeans clustering) followed by fine-tuning.
    - Explainability suite: Macro-level expert-usage profiling and micro-level Grad-CAM saliency maps.
  - Baselines: Multilayer Perceptron (MLP), Long Short-Term Memory network (LSTM), Convolutional Neural Network (CNN), Informer (Transformer-based), TimesNet (multi-period time-series), TsMixer (all-MLP mixing).
- Data sources: Public hydrometric archive from the UK Department for Environment, Food and Rural Affairs (DEFRA).
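The soft-gating step listed above can be sketched as a softmax-weighted sum of expert forecasts. This is a generic mixture-of-experts combination in NumPy, not the paper's implementation: the function names, array shapes, and random stand-in inputs are assumptions, and the gate and expert networks themselves are omitted.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mixture_forecast(gate_logits, expert_forecasts):
    """Combine expert outputs with soft gate weights.

    gate_logits:      (batch, n_experts) raw gate scores
    expert_forecasts: (batch, n_experts, horizon) one forecast per expert
    Returns (forecast, weights) with shapes (batch, horizon), (batch, n_experts).
    """
    weights = softmax(np.asarray(gate_logits, dtype=float))
    forecasts = np.asarray(expert_forecasts, dtype=float)
    # Weighted sum over the expert axis for every sample and horizon step.
    combined = np.einsum("be,beh->bh", weights, forecasts)
    return combined, weights


# Random stand-ins for gate scores and expert outputs (illustration only).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 9))       # nine experts, the pool size the study reports
experts = rng.normal(size=(4, 9, 32))  # 32-step horizon
y_hat, w = mixture_forecast(logits, experts)
```

Averaging `w` over many samples (`w.mean(axis=0)`) gives a per-expert usage profile, which is one simple way to approximate the macro-level expert-usage view.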
Main Results
- WL–MoE raised the mean Nash–Sutcliffe Efficiency (NSE) from 0.8344 to 0.9008 on the full test set and from 0.3942 to 0.7285 on the high-water subset, where the lower figures are the averages for TsMixer, the strongest baseline.
- The largest performance gains were observed during rapidly rising high-water events, indicating that specialized experts better represent flood-onset and recession dynamics in flashy catchments.
- The smooth soft-gating behavior suggests that hydrological states are transitional rather than sharply separated, contributing to more reliable and interpretable forecasts.
- An expert pool of nine experts provided the optimal balance between predictive accuracy, stable expert specialization, and computational cost, with diminishing returns observed for larger pools.
- Macro-level interpretability revealed consistent specialization of experts into distinct hydrometeorological niches. Micro-level Grad-CAM saliency maps highlighted physically meaningful time–scale features, such as high-frequency rainfall for flood onset and mid/low frequencies for recession.
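For reference, the NSE scores quoted above follow the standard definition: one minus the ratio of squared forecast error to the variance of the observations about their mean. The sketch below computes it, plus a thresholded variant. The `threshold` rule for selecting high-water samples is an assumption, since the summary does not say how the high-water subset is defined.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - SSE / sum((obs - mean(obs))^2)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    denom = np.sum((obs - obs.mean()) ** 2)
    if denom == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - np.sum((obs - sim) ** 2) / denom


def nse_high_water(observed, simulated, threshold):
    """NSE restricted to observations at or above `threshold`.

    The threshold rule is a hypothetical stand-in for the paper's
    high-water subset definition.
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    mask = obs >= threshold
    return nse(obs[mask], sim[mask])
```

A score of 1 means a perfect forecast and 0 means no better than predicting the observed mean. Because the subset score is measured against the subset's own mean, the same absolute error tends to be penalised more heavily there, which may partly explain why high-water scores are lower across models.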
Contributions
- Introduction of WL–MoE, a weak-label-guided regime-conditional mixture-of-experts framework for flash-flood water-level forecasting, featuring a learned soft-gating function and a two-stage training design to stabilize expert specialization under severe regime imbalance.
- Proposal of a two-stage training scheme that counterbalances severe class skew, ensuring minority flood regimes are represented by dedicated experts and improving peak-level accuracy.
- Unification of pattern-level expert-usage profiling with instance-level Grad-CAM saliency to provide global and local explanations that respect temporal order and align with domain expectations.
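The weak labels that guide the first training stage come from DTW-based clustering. As a rough illustration of that idea, the sketch below computes a plain dynamic time warping distance and assigns a window to its nearest cluster centroid. It uses a squared-difference local cost with a final square root, which is one common convention. The helper `weak_label` and the example centroids are hypothetical, and the clustering step that would produce real centroids is not shown.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # a advances, b repeats
                                 cost[i, j - 1],      # b advances, a repeats
                                 cost[i - 1, j - 1])  # both advance
    return float(np.sqrt(cost[n, m]))


def weak_label(window, centroids):
    """Index of the closest centroid under DTW (hypothetical helper)."""
    return int(np.argmin([dtw_distance(window, c) for c in centroids]))
```

Because DTW allows samples to repeat along the alignment path, two hydrographs with the same shape but slightly different timing can still fall into the same cluster, for example `dtw_distance([0, 1, 1, 2], [0, 1, 2])` is zero.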
Funding
- This project was funded by DEFRA (Department for Environment, Food and Rural Affairs) as part of the £200 million Flood and Coastal Innovation Programmes, managed by the Environment Agency.
Citation
@article{Wang2026Flash,
author = {Wang, Jessica and Sanderson, J.E. and Woo, Wai Lok},
title = {Flash flood forecasting in North East England through weak label-guided mixture of experts with multi-scale explainability},
journal = {Journal of Hydrology: Regional Studies},
year = {2026},
doi = {10.1016/j.ejrh.2026.103402},
url = {https://doi.org/10.1016/j.ejrh.2026.103402}
}
Original Source: https://doi.org/10.1016/j.ejrh.2026.103402