Gonzalez et al. (2026) Machine learning and predictive models for water management: a systematic review

Identification

Journal: Frontiers in Water
Year: 2026
Date: 2026-02-05
Authors: Miguel Gonzalez, Sergio Gabriel Ceballos Pérez, Hugo Nathanael Lara Figueroa, Francisco Jacob Ávila Camacho, Leonardo Miguel Moreno Villalba, Juan Manuel Stein Carrillo, A. Cano
DOI: 10.3389/frwa.2026.1756052

Research Groups

Financial Engineering Department, Polytechnic University of Pachuca, Pachuca, Hidalgo, Mexico
Department of Researchers for Mexico, Secretary of Science, Humanities, Technology and Innovation, Mexico City, Mexico
National Technological Institute of Mexico/TES Ecatepec, Ecatepec, Estado de México, Mexico

Short Summary

This systematic review analyzes the application of machine learning (ML) in water management, identifying dominant algorithms, performance metrics, and methodological gaps. It concludes that ML is a strategic tool for water management, particularly for forecasting and bias correction, but requires improved reproducibility, uncertainty quantification, and integration of anthropogenic factors for operational maturity.

Objective

To conduct a systematic review of the literature on machine learning applications in water management, focusing on prevalent hydrological tasks, algorithms, performance, data schemas, and validation protocols, and to identify current gaps and future research opportunities.

Study Configuration

Spatial Scale: The reviewed studies predominantly focused on medium-sized basins (1,000–50,000 km²), with coverage also extending to small (<1,000 km²) and large (>50,000 km²) basins. Geographic focus was primarily on Asia, Europe, and North America.
Temporal Scale: Data series used in the reviewed studies typically spanned 10–50 years. The systematic review itself analyzed publications from 2010 to 2025.

Methodology and Data

Models used:
- Machine Learning: Deep learning models (Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN)), ensemble methods (Random Forest (RF), XGBoost, CatBoost), shallow models (Artificial Neural Networks (ANN), Support Vector Machines (SVM), Extreme Learning Machines (ELM)).
- Hybrid Models: Combinations of ML algorithms with physical hydrological models (e.g., SWAT, VIC, H08, CaMa-Flood).
Data sources:
- Review Process: Scopus, Web of Science, IEEE Xplore, ScienceDirect, MDPI Water thematic collection, Google Scholar, and manual reference screening.
- Reviewed Studies: In-situ stations, remote sensing (e.g., TRMM, MODIS, CHIRPS), reanalysis datasets (e.g., NLDAS, GLDAS), and global hydrological databases (e.g., CAMELS, Caravan, GRDC, E-OBS, PRISM, USGS).

Main Results

Deep learning models, particularly LSTM, demonstrated superior performance in hydrological time series prediction, achieving a median Nash–Sutcliffe Efficiency (NSE) of 0.87 (range: 0.75–0.94) for streamflow and reducing Root Mean Square Error (RMSE) by 15–40% compared to statistical or shallow ANN models.
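
For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how NSE and RMSE are conventionally computed from paired observed and simulated series. The function names and the sample values are illustrative assumptions, not data from the review.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and of equal length")
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - (squared error / variance of obs about its mean)."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    if sst == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - sse / sst


# Hypothetical streamflow values (e.g. m^3/s), for illustration only.
observed = [10.0, 12.0, 15.0, 11.0, 9.0]
simulated = [11.0, 12.0, 14.0, 10.0, 9.5]
print(f"NSE = {nse(observed, simulated):.3f}, RMSE = {rmse(observed, simulated):.3f}")
```

An NSE of 1 indicates a perfect fit, while 0 means the model is no better than predicting the observed mean, which is why values such as 0.87 are read as strong performance.
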
Ensemble methods (RF, XGBoost, CatBoost) showed high robustness and consistency, especially in data-constrained or noisy environments, with a median NSE of 0.81 and high classification accuracy (90–100%) for hydrological drought.
Hybrid ML + physical model approaches effectively reduced structural biases of physical models, improving accuracy with a median NSE of 0.91 and RMSE reductions of 18–33% compared to physical baselines.
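
One common way to build such a hybrid is residual correction: a data-driven model learns the error of the physical simulation and adds it back. The sketch below illustrates that idea only; it is not the method of any specific reviewed study. Ordinary least squares stands in for the ML learner (an assumption made to keep the example dependency-free), and all names and numbers are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean square error between two equal-length series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def fit_residual_corrector(phys_sim, obs):
    """Fit residual = a + b * phys_sim by ordinary least squares.

    OLS is a stand-in here; a reviewed study would typically use
    an ML model such as LSTM or Random Forest for this step.
    """
    n = len(phys_sim)
    if n != len(obs) or n < 2:
        raise ValueError("need at least two paired samples")
    residuals = [o - p for o, p in zip(obs, phys_sim)]
    mean_p = sum(phys_sim) / n
    mean_r = sum(residuals) / n
    var_p = sum((p - mean_p) ** 2 for p in phys_sim)
    if var_p == 0:
        raise ValueError("physical simulation is constant; slope is undefined")
    b = sum((p - mean_p) * (r - mean_r) for p, r in zip(phys_sim, residuals)) / var_p
    a = mean_r - b * mean_p
    return a, b


def apply_correction(phys_sim, a, b):
    """Add the predicted residual to the physical model output."""
    return [p + a + b * p for p in phys_sim]


# Hypothetical data: the physical model underestimates high flows.
phys_train = [6.0, 8.0, 11.0, 16.0, 23.0, 15.0]
obs_train = [5.0, 8.0, 12.0, 20.0, 30.0, 18.0]
phys_test = [7.0, 14.0, 25.0]
obs_test = [6.5, 17.0, 33.0]

a, b = fit_residual_corrector(phys_train, obs_train)
corrected = apply_correction(phys_test, a, b)
print(f"RMSE physical only: {rmse(obs_test, phys_test):.2f}")
print(f"RMSE after correction: {rmse(obs_test, corrected):.2f}")
```

The corrector is fitted on one period and evaluated on a separate one, mirroring the train/test separation that a fair comparison against the physical baseline requires.
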
Key methodological gaps include limited reproducibility (only 17% of studies published full hyperparameters and only 11% reported quantitative uncertainty intervals), insufficient integration of anthropogenic factors (only 22% explicitly modeled human interventions), and limited use of comprehensive hydrological metrics (56% of studies reported NSE or Kling–Gupta Efficiency (KGE), and 11% reported the KGE decomposition).
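
The KGE decomposition mentioned above separates model skill into correlation, variability, and bias terms. The following sketch assumes the widely used original formulation, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), with alpha the ratio of standard deviations and beta the ratio of means. The function name and sample data are illustrative.

```python
import math
from statistics import mean, pstdev


def kge_components(obs, sim):
    """Return (kge, r, alpha, beta) for paired observed and simulated series.

    r     : linear correlation between sim and obs
    alpha : variability ratio, std(sim) / std(obs)
    beta  : bias ratio, mean(sim) / mean(obs)
    """
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("need at least two paired samples")
    mu_o, mu_s = mean(obs), mean(sim)
    sd_o, sd_s = pstdev(obs), pstdev(sim)
    if sd_o == 0 or sd_s == 0 or mu_o == 0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (sd_o * sd_s)
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return kge, r, alpha, beta


# Hypothetical series: the simulation is well correlated but biased high.
observed = [10.0, 12.0, 15.0, 11.0, 9.0]
simulated = [12.0, 14.5, 18.0, 13.0, 11.0]
kge, r, alpha, beta = kge_components(observed, simulated)
print(f"KGE={kge:.3f}  r={r:.3f}  alpha={alpha:.3f}  beta={beta:.3f}")
```

Reporting the three components alongside the aggregate score shows whether a low KGE stems from timing errors, damped variability, or systematic bias, which a single NSE value cannot distinguish.
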
The geographic focus of the reviewed studies was predominantly Asia, Europe, and North America, with data series typically spanning 10–50 years.

Contributions

Provides a structured synthesis of the state-of-the-art in ML for water management, following PRISMA 2020 guidelines.
Systematically compares methodological approaches and identifies dominant algorithms and their performance characteristics across hydrological tasks (streamflow, drought, flood, and global hydrology).
Highlights critical challenges in the field, including poor reproducibility, inadequate uncertainty quantification, insufficient integration of anthropogenic factors, and the opacity of deep learning models.
Proposes a clear research agenda emphasizing the need for physically informed hybrid models, adoption of FAIR principles for reproducibility, explicit representation of human activities, quantitative uncertainty analysis, and standardized benchmarking using hydrologically relevant metrics.
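
As an illustration of what quantitative uncertainty analysis can look like in its simplest form, the sketch below builds prediction intervals from the empirical quantiles of residuals on a held-out calibration set. This is one basic approach chosen for illustration, not a method prescribed by the review. It assumes that future errors resemble calibration errors, and all names and values are hypothetical.

```python
def empirical_quantile(values, q):
    """Linearly interpolated sample quantile for q in [0, 1]."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("values must be non-empty and q must lie in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def residual_interval(cal_obs, cal_pred, new_pred, coverage=0.9):
    """Return (lower, upper) bounds for each new prediction.

    Bounds are the prediction shifted by the lower and upper
    quantiles of the calibration residuals (obs - pred).
    """
    if len(cal_obs) != len(cal_pred) or not cal_obs:
        raise ValueError("calibration series must be non-empty and of equal length")
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    residuals = [o - p for o, p in zip(cal_obs, cal_pred)]
    tail = (1.0 - coverage) / 2.0
    low_shift = empirical_quantile(residuals, tail)
    high_shift = empirical_quantile(residuals, 1.0 - tail)
    return [(p + low_shift, p + high_shift) for p in new_pred]


# Hypothetical calibration data and new point forecasts.
cal_obs = [10.0, 12.0, 9.0, 15.0, 11.0]
cal_pred = [9.0, 13.0, 9.0, 14.0, 12.0]
new_pred = [20.0, 8.0]

for point, (lower, upper) in zip(new_pred, residual_interval(cal_obs, cal_pred, new_pred)):
    print(f"forecast {point:.1f}, 90% interval [{lower:.1f}, {upper:.1f}]")
```

With only five calibration points the interval is crude; in practice a much longer calibration record would be needed, and the interval's empirical coverage should itself be checked on independent data.
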

Funding

The authors declared that no financial support was received for this work or its publication.

Citation

@article{Gonzalez2026Machine,
  author = {Gonzalez, Miguel and Ceballos Pérez, Sergio Gabriel and Lara Figueroa, Hugo Nathanael and Ávila Camacho, Francisco Jacob and Moreno Villalba, Leonardo Miguel and Stein Carrillo, Juan Manuel and Cano, A.},
  title = {Machine learning and predictive models for water management: a systematic review},
  journal = {Frontiers in Water},
  year = {2026},
  doi = {10.3389/frwa.2026.1756052},
  url = {https://doi.org/10.3389/frwa.2026.1756052}
}

Original Source: https://doi.org/10.3389/frwa.2026.1756052