Abbasizadeh et al. (2025) Can causal discovery lead to a more robust prediction model for runoff signatures?
Identification
- Journal: Hydrology and Earth System Sciences
- Year: 2025
- Date: 2025-09-30
- Authors: Hossein Abbasizadeh, Petr Máca, Martin Hanel, Mads Troldborg, Amir AghaKouchak
- DOI: 10.5194/hess-29-4761-2025
Research Groups
- Faculty of Environmental Sciences, Czech University of Life Sciences Prague, Prague, Czech Republic
- The James Hutton Institute, Invergowrie, Dundee DD2 5DA, Scotland, UK
- Department of Civil and Environmental Engineering, University of California, Irvine, CA, USA
- United Nations University Institute for Water, Environment and Health, Hamilton, ON, Canada
- Department of Earth System Science, University of California, Irvine, CA, USA
Short Summary
This study investigates whether incorporating causal relationships between catchment attributes, climate indices, and runoff signatures can lead to more robust and interpretable prediction models. The findings indicate that models trained on causally identified parent variables, particularly Bayesian Networks and Generalized Additive Models, demonstrate enhanced robustness and parsimony across diverse environments compared to models using all available predictors.
Objective
- To determine if direct causes (causal parents) of runoff signatures, representing independent causal mechanisms, can explain catchment responses across different environments.
- To investigate whether training prediction models using causal parents results in more robust, parsimonious, and physically interpretable predictions compared to models that do not use causal information.
- Specifically, the study aims to: (1) identify causal relationships between catchment attributes, climate characteristics, and 11 runoff signatures using the PC causal discovery algorithm; (2) execute prediction models using both causal parents and all selected variables across the entire dataset and clustered subsets; and (3) evaluate the robustness of these causal and non-causal models.
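Step (1) relies on conditional-independence tests: the PC algorithm removes an edge between two variables when they become independent given some set of other variables. The sketch below illustrates one such test, assuming a Fisher z test on partial correlation, which is a common choice for continuous Gaussian data; the summary does not state which test the authors used. The data are synthetic and all variable names (`driver`, `sig_a`, `sig_b`) are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for the null hypothesis of zero (partial) correlation."""
    stat = math.sqrt(n - n_cond - 3) * math.atanh(r)
    return math.erfc(abs(stat) / math.sqrt(2))


# Synthetic data (hypothetical): one common driver feeds two otherwise unrelated variables.
rng = random.Random(0)
n = 2000
driver = [rng.gauss(0, 1) for _ in range(n)]
sig_a = [d + 0.5 * rng.gauss(0, 1) for d in driver]
sig_b = [d + 0.5 * rng.gauss(0, 1) for d in driver]

r_marginal = pearson(sig_a, sig_b)                # strong: both share the driver
r_partial = partial_corr(sig_a, sig_b, driver)    # close to zero once the driver is conditioned on
p_marginal = fisher_z_pvalue(r_marginal, n, 0)
p_partial = fisher_z_pvalue(r_partial, n, 1)

print(f"marginal r = {r_marginal:.3f}, p = {p_marginal:.3g}")
print(f"partial  r = {r_partial:.3f}, p = {p_partial:.3g}")
```

In a PC-style search, the edge between `sig_a` and `sig_b` would be dropped whenever the conditional test fails to reject independence at the chosen significance level, which is how a strongly correlated variable can still be excluded from the set of causal parents.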
Study Configuration
- Spatial Scale: 671 catchments spanning the contiguous USA.
- Temporal Scale: Catchment attributes and runoff signatures were calculated over the period from 1 October 1989 to 30 September 2009.
Methodology and Data
- Models used:
  - Causal Discovery: Peter and Clark (PC) causal discovery algorithm (PC-stable variant).
  - Prediction Models: Bayesian Network (BN) (specifically Gaussian BN), Generalized Additive Model (GAM), Random Forest (RF).
  - Clustering: K-medoids (PAM) for continuous variables (soil, topography), Gower distance for mixed variables (climate, geology, vegetation).
- Data sources:
  - Catchment Attributes and MEteorology for Large-sample Studies (CAMELS) dataset (Newman et al., 2015; Addor et al., 2017).
  - The dataset includes 41 catchment and climate attributes (22 continuous variables selected for causal discovery and prediction) and 11 runoff signatures.
  - Attribute data sources: State Soil Geographic Database (STATSGO), Global Lithological Map (GLiM), GLobal HYdrogeology MaPS (GLHYMPS), Moderate Resolution Imaging Spectroradiometer (MODIS).
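The Gower distance used for the mixed-type attribute groups can be sketched as follows: each numeric attribute contributes its absolute difference scaled by the attribute's range, each categorical attribute contributes 0 for a match and 1 for a mismatch, and the contributions are averaged. This is a minimal illustration, not the authors' implementation; the three catchments, their attribute names, and their values are made up, and the final step only picks a single medoid rather than running the full PAM procedure.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records with numeric and categorical attributes.

    numeric_ranges maps each numeric attribute to its (non-zero) range in the dataset;
    any attribute not listed there is treated as categorical.
    """
    total = 0.0
    for key, value in a.items():
        if key in numeric_ranges:
            total += abs(value - b[key]) / numeric_ranges[key]
        else:
            total += 0.0 if value == b[key] else 1.0
    return total / len(a)


# Hypothetical catchments (values invented for illustration only).
catchments = [
    {"aridity": 0.5, "frac_snow": 0.1, "geol_class": "siliciclastic"},
    {"aridity": 1.5, "frac_snow": 0.3, "geol_class": "siliciclastic"},
    {"aridity": 2.5, "frac_snow": 0.5, "geol_class": "carbonate"},
]
numeric_keys = ["aridity", "frac_snow"]
numeric_ranges = {
    k: max(c[k] for c in catchments) - min(c[k] for c in catchments)
    for k in numeric_keys
}

# Pairwise distance matrix, the input a k-medoids routine would work from.
dist = [[gower_distance(a, b, numeric_ranges) for b in catchments] for a in catchments]

# The medoid of a single cluster is the member with the smallest total distance to the others.
medoid = min(range(len(catchments)), key=lambda i: sum(dist[i]))
print("medoid index:", medoid)
```

Because the result is a plain distance matrix, the same medoid-based clustering can be applied to continuous and mixed attribute groups alike; only the distance function changes.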
Main Results
- The PC algorithm successfully identified physically meaningful causal parents for most runoff signatures, aligning with hydrological processes (e.g., snow fraction driving baseflow index, precipitation characteristics for high flows). However, it occasionally missed expected influences (e.g., land cover for high-flow frequency).
- Causal parents identified by the PC algorithm were not always the most highly correlated variables, emphasizing the distinction between correlation and causation.
- Bayesian Network (BN) models exhibited the smallest decrease in accuracy between training and test simulations across all environments, demonstrating high transferability and robustness to sample size and predictor distribution shifts, despite generally lower overall accuracy compared to non-linear models.
- Using causal parents significantly improved the robustness and parsimony of Generalized Additive Models (GAMs), mitigating overfitting, especially in environments with smaller training sets. Causal GAMs often outperformed non-causal GAMs in test sets.
- Random Forest (RF) models achieved the highest overall accuracy. However, non-causal RF models (using all variables) often showed a more significant drop in accuracy between training and test phases compared to causal RF models (using only causal parents). For low-flow duration, high-flow duration, low flows (Q5), and high flows (Q95), causal RF models performed comparably to non-causal RF models with fewer predictors.
- Model performance was highest in catchments with homogeneous soil properties and high precipitation/low elevation. Conversely, performance was lowest in areas characterized by high variability in forest fraction/leaf area index or high elevation/steep slopes.
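The robustness comparison in these results rests on how much accuracy a model loses between training and test data. A minimal sketch of that bookkeeping is shown below, assuming the coefficient of determination (R²) as the accuracy measure; the summary does not name the metric used in the paper, and all observations and predictions here are invented for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def accuracy_drop(obs_train, pred_train, obs_test, pred_test):
    """Training accuracy minus test accuracy; smaller means more robust."""
    return r_squared(obs_train, pred_train) - r_squared(obs_test, pred_test)


# Invented numbers: the "non-causal" model fits training data perfectly but
# predicts a constant on the test set; the "causal" model fits slightly worse
# in training and transfers better.
obs_train = [1.0, 2.0, 3.0, 4.0, 5.0]
obs_test = [2.0, 4.0, 6.0]

noncausal_train = [1.0, 2.0, 3.0, 4.0, 5.0]
noncausal_test = [4.0, 4.0, 4.0]
causal_train = [1.5, 2.0, 3.0, 4.0, 4.5]
causal_test = [3.0, 4.0, 5.0]

drop_noncausal = accuracy_drop(obs_train, noncausal_train, obs_test, noncausal_test)
drop_causal = accuracy_drop(obs_train, causal_train, obs_test, causal_test)

print(f"non-causal drop: {drop_noncausal:.2f}")
print(f"causal drop:     {drop_causal:.2f}")
```

A model with higher training accuracy can therefore still be the less robust choice if its drop is larger, which is the pattern the results describe for non-causal GAM and RF models.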
Contributions
- Introduces a novel framework for predicting runoff signatures by integrating causal inference techniques into predictive modeling, moving beyond traditional correlation-based feature selection.
- Demonstrates the first application of causal discovery to identify direct causes of runoff signatures, enhancing the interpretability and physical consistency of hydrological prediction models.
- Shows that using causally identified parent variables can improve model robustness and parsimony, particularly for GAMs, by reducing overfitting and maintaining accuracy across diverse environmental conditions with fewer predictors.
- Provides insights into how causal discovery can inform catchment classification and regionalization efforts by identifying independent variables that determine consistent model performance across different catchment groups.
Funding
- e-INFRA CZ project (ID 90254)
- Česká Zemědělská Univerzita v Praze (grant nos. 2023B0026 and 2024B0003)
- Ministerstvo Školství, Mládeže a Tělovýchovy (AdAgriF – Advanced methods of greenhouse gases emission reduction and sequestration in agriculture and forest landscape for climate change mitigation (grant no. CZ.02.01.01/00/22_008/0004635))
Citation
@article{Abbasizadeh2025Can,
author = {Abbasizadeh, Hossein and Máca, Petr and Hanel, Martin and Troldborg, Mads and AghaKouchak, Amir},
title = {Can causal discovery lead to a more robust prediction model for runoff signatures?},
journal = {Hydrology and Earth System Sciences},
year = {2025},
doi = {10.5194/hess-29-4761-2025},
url = {https://doi.org/10.5194/hess-29-4761-2025}
}
Original Source: https://doi.org/10.5194/hess-29-4761-2025