Papalexiou et al. (2025) Machine unlearning: bias correction in neural network downscaled storms
Identification
- Journal: Journal of Hydrology
- Year: 2025
- Date: 2025-11-28
- Authors: Simon Michael Papalexiou, Antonios Mamalakis
- DOI: 10.1016/j.jhydrol.2025.134689
Research Groups
- Institute of Global Water Security, Hamburg University of Technology, Hamburg, Germany
- Department of Environmental Sciences, University of Virginia, Charlottesville, VA, USA
- School of Data Science, University of Virginia, Charlottesville, VA, USA
Short Summary
This study evaluates four neural network models for downscaling precipitation on synthetic benchmark storms and shows that pairing machine learning with post-processing bias correction ("machine unlearning") is essential for reliable outputs, particularly for the Wasserstein Generative Adversarial Network (WGAN). Raw neural network outputs fail to reproduce key statistical properties and wet/dry boundaries, so systematic bias correction is required for operational use.
Objective
- To evaluate the performance of various neural network models (Linear Network, Fully Connected Network, UNet, and Wasserstein Generative Adversarial Network) in downscaling synthetic precipitation storm fields from a coarse 6 × 6 grid to a fine 60 × 60 grid.
- To demonstrate that combining machine learning with post-processing bias correction approaches ("machine unlearning") yields improved and operationally reliable performance by correcting biases in statistical properties and extreme values.
Study Configuration
- Spatial Scale: Downscaling from a coarse 6 × 6 grid to a fine 60 × 60 grid (a tenfold refinement in each spatial dimension, i.e., 100 times more grid cells).
- Dataset and Temporal Dynamics: 55,000 synthetic storm fields generated, of which 50,000 were used for training/validation and 5,000 for testing; storms move with a constant advection velocity of (6, -3) grid units per time step.
Methodology and Data
- Models used:
- Storm Generation: Complete Stochastic Modeling System (CoSMoS)
- Downscaling Neural Networks:
- Linear Network (LNet)
- Fully Connected Network (FCNet)
- Convolutional Neural Network (UNet)
- Wasserstein Generative Adversarial Network (WGAN)
- Bias Correction:
- Linear Bias Correction (LBC)
- Nonlinear Bias Correction (NLBC) using Generalized Exponential Type I (GE1) distribution fitting
- Data sources: Synthetic benchmark storms generated by CoSMoS, characterized by a highly skewed GE4 distribution, Ali-Mikhail-Haq-Weibull spatiotemporal correlation structure, constant advection velocity, and affine anisotropy.
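The linear bias correction step described above enforces the observed probability of zero (p0) and rescales the remaining wet values. A minimal, hypothetical sketch of such a correction, assuming NumPy arrays of downscaled fields, is shown below; the thresholding and rescaling details are illustrative and are not the authors' implementation.

```python
import numpy as np

def linear_bias_correction(pred, p0, obs_mean):
    """Hypothetical linear bias correction for a downscaled field:
    1) clip negative values to zero,
    2) threshold at the p0-quantile so a fraction p0 of cells is exactly dry,
    3) linearly rescale the remaining wet values to the observed wet-cell mean."""
    x = np.clip(pred, 0.0, None)          # remove spurious negative precipitation
    thresh = np.quantile(x, p0)           # wet/dry boundary matching observed p0
    wet = x > thresh
    out = np.zeros_like(x)
    out[wet] = x[wet] - thresh            # shift so the wet/dry boundary sits at 0
    if out[wet].size and out[wet].mean() > 0:
        out[wet] *= obs_mean / out[wet].mean()  # match observed wet-cell mean
    return out
```

This per-field quantile thresholding is one simple way to realize the "match p0, then correct the marginal statistics" idea; a calibration against the full training distribution would be the operational variant.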
Main Results
- Raw NN Outputs:
- LNet, FCNet, and UNet oversmoothed storm cells and boundaries, failing to capture fine-scale structures.
- All models produced small negative values and failed to reproduce the observed probability of zero (p0 = 0.7), leading to inaccurate wet/dry regions.
- Summary statistics for positive precipitation were distorted: mean and second L-moment were underestimated, while L-skewness and L-kurtosis were overestimated.
- LNet, FCNet, and UNet overestimated temporal and spatial correlations.
- WGAN accurately captured complex spatiotemporal dependencies and realistic spatial patterns but still failed to reproduce the correct probability of zero and tail behavior.
- Linear Bias Correction (LBC):
- Eliminated negative values and accurately represented wet/dry boundaries by matching the observed p0.
- Significantly improved summary statistics: mean and second L-moment were close to observed values, and L-skewness and L-kurtosis overestimations were largely eliminated.
- Resulted in minor improvements in spatiotemporal dependence structures.
- WGAN continued to show superior performance, but discrepancies in tail behavior persisted.
- Nonlinear Bias Correction (NLBC):
- Effectively aligned the empirical distributions with benchmark data, significantly improving the representation of extreme values and correcting tail behavior.
- Preserved accurate wet/dry regions and well-reproduced summary statistics.
- For WGAN, NLBC resulted in minimal mean bias, better alignment in L-skewness and L-kurtosis, and additional slight enhancements in spatiotemporal dependencies, exhibiting nearly zero bias in spatial correlations.
- WGAN, combined with NLBC, emerged as the most promising model for replicating both distributional properties and fine-scale dependencies.
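The nonlinear correction in the paper fits a GE1 distribution to align the full marginal, including the tail. As a distribution-agnostic analogue of that idea, empirical quantile mapping of wet values can be sketched as follows; this is an illustrative stand-in, not the paper's GE1-based procedure.

```python
import numpy as np

def quantile_map(pred_wet, ref_wet):
    """Empirical quantile mapping: replace each predicted wet value with the
    reference value at the same empirical non-exceedance probability,
    preserving the rank order of the predictions."""
    ranks = np.argsort(np.argsort(pred_wet))        # 0..n-1 rank of each value
    probs = (ranks + 0.5) / pred_wet.size           # plotting positions
    ref_sorted = np.sort(ref_wet)
    ref_probs = (np.arange(ref_wet.size) + 0.5) / ref_wet.size
    return np.interp(probs, ref_probs, ref_sorted)  # map through reference CDF
```

Because the mapping is monotone, spatial structure (which value is larger than which) is untouched; only the marginal distribution, and hence the tail behavior, is corrected.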
Contributions
- This study provides a comprehensive comparison of four neural network architectures for precipitation downscaling using fully controlled synthetic benchmark storms, allowing for rigorous evaluation of statistical and spatiotemporal properties.
- It introduces and demonstrates the critical role of post-processing bias correction, termed "machine unlearning," for neural network outputs in hydrologic downscaling, addressing limitations in reproducing wet/dry boundaries, marginal distributions, and extreme values.
- It highlights WGAN's superior ability to capture complex spatiotemporal dependencies and fine-scale storm structures compared to other neural networks, while also showing that its distributional biases can be effectively corrected.
- The research proposes a principled two-stage workflow for operational downscaling: first, using generative models for realistic structural representation, followed by calibration (bias correction) to enforce marginal properties, mirroring established practices in climate modeling.
Software
- The CoSMoS R package (Papalexiou et al., 2021a; Papalexiou et al., 2021b) was used to generate the synthetic benchmark storms.
Citation
@article{Papalexiou2025Machine,
author = {Papalexiou, Simon Michael and Mamalakis, Antonios},
title = {Machine unlearning: bias correction in neural network downscaled storms},
journal = {Journal of Hydrology},
year = {2025},
doi = {10.1016/j.jhydrol.2025.134689},
url = {https://doi.org/10.1016/j.jhydrol.2025.134689}
}
Original Source: https://doi.org/10.1016/j.jhydrol.2025.134689