Guo et al. (2026) Simulation and Rapid Prediction of Water Quantity and Quality Processes Based on Numerical Models and Deep Learning
Identification
- Journal: Water Resources Management
- Year: 2026
- Date: 2026-03-01
- Authors: Qingyuan Guo, Jingming Hou, Tian Wang, Xinxin Pan, Guangxue Luan, R.F. Zhang
- DOI: 10.1007/s11269-026-04505-6
Research Groups
- State Key Laboratory of Water Engineering Ecology and Environment in Arid Area, Xi’an University of Technology, Xi’an, China
- Xi’an Meteorological Bureau of Shaanxi Province, Xi’an, China
Short Summary
This study develops a coupled 1D-2D numerical model (GAST-SWMM) to simulate urban water quantity and quality processes, generating a training database for a Long Short-Term Memory (LSTM) deep learning model. The LSTM model provides rapid and accurate predictions of pollutant concentrations on urban surfaces and within sewer networks, outperforming other machine learning models and significantly reducing computational time.
Objective
- To develop a city-scale coupled 1D-2D numerical model for pollutant transport and dispersion to generate a training database for machine learning models, addressing data scarcity in urban pollutant monitoring.
- To develop and evaluate an LSTM deep learning model for rapid and accurate prediction of pollutant concentrations on urban surfaces and within sewer networks, comparing its performance against other machine learning models (KNN, RF) and the physics-based model.
Study Configuration
- Spatial Scale: Xiaozhai subdistrict, Yanta District, Xi’an, Shaanxi Province, China, covering approximately 17 square kilometers (km²). Digital Elevation Model (DEM) resolution is 4 meters × 4 meters (m × m). The sewer system includes 735 stormwater nodes, 766 pipes, 98 subcatchments, and 3 outfalls.
- Temporal Scale: Daily rainfall data from 1990–2020 for Xi’an. Simulation duration for each rainfall event is 4 hours (h) with a 10-minute (min) time interval, resulting in 25 time steps. Rainfall events with return periods ranging from 5 to 200 years were generated.
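The event discretisation above can be illustrated with a minimal sketch: a 4 h event sampled every 10 min gives 24 intervals, hence 25 values once the initial time is counted. The sketch fills that grid with a Chicago-type (Keifer-Chu) design storm, the family of formula this summary attributes to the synthetic events. The coefficients `a`, `b`, `c` and the peak ratio `r` are placeholder values chosen for illustration only; they are not the Xi’an formula used in the paper, and the function name is hypothetical.

```python
def chicago_hyetograph(a, b, c, r, duration_min=240.0, step_min=10.0):
    """Return (times_min, intensities) for a Keifer-Chu (Chicago) design storm.

    a, b, c are coefficients of an IDF curve i = a / (t + b)**c (t in minutes);
    r in (0, 1) is the fraction of the duration that precedes the peak.
    All parameter values passed below are illustrative assumptions.
    """
    n_points = int(round(duration_min / step_min)) + 1  # 240 / 10 + 1 = 25
    t_peak = r * duration_min
    times, intensities = [], []
    for k in range(n_points):
        t = k * step_min
        if t <= t_peak:
            tau = (t_peak - t) / r          # rescaled time before the peak
        else:
            tau = (t - t_peak) / (1.0 - r)  # rescaled time after the peak
        # Instantaneous intensity of the Chicago storm at rescaled time tau.
        i = a * ((1.0 - c) * tau + b) / (tau + b) ** (1.0 + c)
        times.append(t)
        intensities.append(i)
    return times, intensities


# Placeholder coefficients (assumed, not taken from the paper).
times, intensities = chicago_hyetograph(a=20.0, b=15.0, c=0.75, r=0.375)
print(len(times), times[intensities.index(max(intensities))])
```

With `r = 0.375` the peak falls exactly on the grid point at 90 min; other ratios place it between grid points, in which case the sampled maximum slightly underestimates the true peak intensity.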
Methodology and Data
- Models used:
  - Physics-based Numerical Model: GAST-SWMM coupled model (GAST for 2D surface flow and pollutant transport based on shallow water equations, SWMM for 1D sewer network hydrodynamics and pollutant transport). GAST uses a finite-volume method with HLLC Riemann solver, TVD-MUSCL scheme, two-step Runge-Kutta method, and GPU parallel computing (CUDA).
  - Deep Learning Model: Long Short-Term Memory (LSTM) neural network.
  - Comparative Machine Learning Models: K-Nearest Neighbors (KNN) and Random Forest (RF).
- Data sources:
  - Training Database Generation: 500 synthetic rainfall events generated using the Chicago rainfall intensity formula for Xi’an (based on 1990–2020 data), driving the calibrated GAST-SWMM model to produce surface and sewer pollutant concentration time series.
  - Calibration/Validation Data: Field observations of rainfall, flow, and water quality (Total Suspended Solids - TSS) from two events (June 23, 2016, and July 24, 2016) at a downstream outfall, reported by Xue (2018).
  - Geospatial Data: Digital Elevation Model (DEM), land use data, subcatchment delineation, sewer network data, and node information.
- Input Features for LSTM: Maximum rainfall intensity within 2 h, maximum rainfall intensity within 4 h, cumulative rainfall, mean rainfall intensity, pre-peak rainfall, temporal variation coefficient, and peak rainfall coefficient.
- Evaluation Metrics: Nash–Sutcliffe Efficiency (NSE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE).
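For readers unfamiliar with the three evaluation metrics, the following is a minimal sketch using their standard textbook definitions. The concentration values are made up for illustration and are not results from the paper; the function names are likewise illustrative.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 minus squared error over observed variance."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def mae(observed, simulated):
    """Mean Absolute Error, in the units of the data."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def rmse(observed, simulated):
    """Root Mean Square Error, in the units of the data."""
    mse = sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    return math.sqrt(mse)


# Made-up concentrations in kg/m^3, for illustration only.
reference = [0.02, 0.15, 0.20, 0.12, 0.05]
predicted = [0.03, 0.14, 0.18, 0.12, 0.06]
print(nse(reference, predicted), mae(reference, predicted), rmse(reference, predicted))
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the prediction is no better than the mean of the reference series. MAE and RMSE keep the units of the data, which is why the summary reports them in kg/m³.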
Main Results
- Physics-based Model Performance: The coupled 1D-2D GAST-SWMM model achieved good accuracy, with Nash–Sutcliffe Efficiency (NSE) values greater than 0.8 for both flow and water quality simulations when validated against observed data.
- Pollutant Transport Characteristics (Physics-based Model):
  - Surface pollutant concentrations generally decrease as the rainfall return period increases, because higher rainfall intensity produces stronger dilution (e.g., 0.15–0.20 kilograms per cubic meter (kg/m³) in early stages for small return periods).
  - Pollutant loads, however, increase substantially and the affected area expands with increasing rainfall return periods, as larger runoff volumes mobilize more pollutants.
  - Localized high-concentration zones emerge in later stages of intense rainfall events (e.g., 10-year and 50-year return periods) due to sewer surcharge and overflow, particularly in low-lying areas.
  - Pollutant concentrations at sewer nodes typically peak between 1800 seconds (s) and 3600 s, with peak flows at key nodes ranging from 0.27 cubic meters per second (m³/s) to 10.14 m³/s, and at the outlet from 28.65 m³/s to 53.59 m³/s.
- Deep Learning Model Performance (LSTM):
  - Accuracy: The LSTM model demonstrated high predictive accuracy for pollutant concentrations.
    - For surface concentrations, NSE consistently exceeded 0.95, MAE was below 0.0015 kg/m³, and RMSE was below 0.002 kg/m³.
    - For sewer node concentrations, NSE was approximately 0.95. Errors were higher for peak concentrations (e.g., MAE up to 0.00872 kg/m³ for Node 1 under a 2-year event, RMSE up to 0.01308 kg/m³ for Node 3 under a 10-year event).
    - Forecasted maximum and mean concentrations, and total pollutant load, were generally lower than simulated values, while the forecasted pollutant-affected area was higher. The largest errors (0.00245–0.01304 kg/m³) occurred in predicting maximum pollutant concentrations.
  - Efficiency: The LSTM model's average simulation time for a single rainfall event was less than 0.83 s, making it 5317 to 6000 times faster than the GAST-SWMM model.
  - Comparison: LSTM consistently maintained an accuracy (NSE) above 0.90 across all scenarios, outperforming KNN (accuracy ranging from 92.36% down to 88.93%) and RF (accuracy ranging from 89.93% down to 88.64%) in both accuracy and computational efficiency.
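To put the reported speedup in perspective, the short sketch below back-calculates the GAST-SWMM runtime implied by the two figures quoted above. This is a rough estimate, not a value reported in the summary: 0.83 s is given only as an upper bound on the LSTM time, and the assumption that the same LSTM time applies to both speedup factors is made here for illustration. The function name is hypothetical.

```python
def implied_physics_runtime_s(surrogate_time_s, speedup):
    """Runtime implied for the slower model from the faster model's time and the speedup."""
    return surrogate_time_s * speedup


LSTM_TIME_S = 0.83  # upper bound quoted in the summary, used here as an assumption

for speedup in (5317, 6000):
    seconds = implied_physics_runtime_s(LSTM_TIME_S, speedup)
    print(f"speedup {speedup}: about {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Under these assumptions the physics-based simulation of a single event would take on the order of 74 to 83 minutes at most, which illustrates why a sub-second surrogate matters for early-warning use.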
Contributions
- Development of a novel city-scale coupled 1D-2D numerical model (GAST-SWMM) for comprehensive simulation of urban water quantity and quality processes, including surface flow, sewer network dynamics, and pollutant transport (build-up, wash-off, and dispersion).
- Introduction of a robust methodology to generate high-accuracy training databases for machine learning models using physics-based simulations, effectively addressing the challenge of limited pollutant monitoring data in urban areas.
- Creation of an LSTM-based rapid prediction model capable of forecasting urban pollutant concentrations on surfaces and within sewer networks with high accuracy (NSE > 0.95) and exceptional computational efficiency (less than 0.83 s per event), significantly outperforming traditional numerical models and other ML approaches (KNN, RF).
- Provision of technical support for rapid response, early warning systems, and refined urban water-environment management in data-scarce cities, enabling dynamic prediction of pollutant transport under various rainfall conditions.
Funding
- Open Research Fund Program of State Key Laboratory of Eco-hydraulics in Northwest Arid Region, Xi’an University of Technology (Grant No. 2024KFKT-7)
- National Key R&D Program of China (2024YFC3012403)
- Science and Technology Program of Xi’an (24SFSF0010)
- Technology Innovation Leading Program of Shaanxi (2024QY-SZX-27)
Citation
@article{Guo2026Simulation,
author = {Guo, Qingyuan and Hou, Jingming and Wang, Tian and Pan, Xinxin and Luan, Guangxue and Zhang, R.F.},
title = {Simulation and Rapid Prediction of Water Quantity and Quality Processes Based on Numerical Models and Deep Learning},
journal = {Water Resources Management},
year = {2026},
doi = {10.1007/s11269-026-04505-6},
url = {https://doi.org/10.1007/s11269-026-04505-6}
}
Original Source: https://doi.org/10.1007/s11269-026-04505-6