Saliba et al. (2026) Genetic-algorithm based changepoints detection and homogenization of precipitation series
Identification
- Journal: Stochastic Environmental Research and Risk Assessment
- Year: 2026
- Date: 2026-01-01
- Authors: Youssef Saliba, Alina Bărbulescu
- DOI: 10.1007/s00477-025-03142-6
Research Groups
- Technical University of Civil Engineering, Bucharest, Romania
- Transilvania University of Brașov, Brașov, Romania
Short Summary
This study introduces a multi-island genetic algorithm (GA) within a Minimum Description Length (MDL)-based statistical framework to detect and correct artificial changepoints in precipitation time series. The method reliably identifies changepoints in synthetic data and successfully homogenizes the monthly Sulina precipitation series by correcting a detected shift in May 2004.
Objective
- To develop and apply an MDL-based methodology, incorporating a multi-island genetic algorithm, to detect and correct changepoints (abrupt shifts in mean values) in climate time series, specifically focusing on precipitation.
- To propose and implement a homogenization procedure for monthly precipitation series from a network of meteorological stations, leveraging the GA approach and reference series to minimize an MDL objective function, especially when metadata is unavailable.
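The reference-series idea in the second objective can be illustrated with a minimal sketch. Beyond "compare the target with reference series", everything here is an assumption for illustration, not the authors' procedure: the composite is a weighted mean of the reference stations (for example weighted by their correlation with the target), the shift is the post- minus pre-changepoint median of the target–reference difference series, and the correction is additive and applied to the post-changepoint segment. The names `reference_composite` and `homogenize` are hypothetical.

```python
from statistics import median


def reference_composite(refs, weights):
    """Weighted mean of several reference series (hypothetical weighting scheme)."""
    total = sum(weights)
    return [sum(w * r[t] for r, w in zip(refs, weights)) / total
            for t in range(len(refs[0]))]


def homogenize(target, refs, weights, cp):
    """Shift target[cp:] so the target-reference differences have equal segment medians.

    cp is the 0-based index of the first post-changepoint value (assumed convention).
    Returns the adjusted series and the estimated additive shift.
    """
    comp = reference_composite(refs, weights)
    diff = [y - c for y, c in zip(target, comp)]
    shift = median(diff[cp:]) - median(diff[:cp])
    adjusted = target[:cp] + [y - shift for y in target[cp:]]
    return adjusted, shift


# Toy data: the target jumps by +2.0 from index 4; the references do not.
ref_a = [1.0, 2.0, 1.5, 1.0, 2.0, 1.5, 2.0, 1.0]
ref_b = [1.5, 2.5, 2.0, 1.5, 2.5, 2.0, 2.5, 1.5]
target = [1.0, 2.0, 1.5, 1.0, 4.0, 3.5, 4.0, 3.0]
adjusted, shift = homogenize(target, [ref_a, ref_b], [0.74, 0.71], cp=4)
print(round(shift, 6))  # 2.0
```

Because the second reference differs from the first by a constant 0.5, it moves both segment medians equally and leaves the estimated shift unchanged; medians rather than means keep the estimate robust to a few extreme months.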
Study Configuration
- Spatial Scale: A network of 10 meteorological stations in Dobrogea, Romania. The homogenization procedure was applied to the Sulina station, using Jurilovca and Tulcea stations as references. Distances between stations are measured in kilometers.
- Temporal Scale: Monthly precipitation series spanning from 1965 to 2019.
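Both scales can be made concrete with two small helpers. The haversine great-circle distance is the geographic-proximity measure listed in the methodology; the Earth radius of 6371 km is an assumed value, and since station coordinates are not given in this summary, the example uses generic points. The index helper assumes that index 1 is January 1965, a convention inferred from the reported correspondence between index 473 and May 2004. Both function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the paper's constant is not stated here


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km between points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = phi2 - phi1, math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def month_index_to_date(index, start_year=1965, start_month=1):
    """Map a 1-based monthly index (1 = first month of the record) to (year, month)."""
    offset = (start_month - 1) + (index - 1)
    return start_year + offset // 12, offset % 12 + 1


print(month_index_to_date(473))            # (2004, 5)
print(round(haversine_km(0, 0, 0, 1), 2))  # 111.19 (one degree of longitude at the equator)
```

Under this convention the 1965–2019 record has 660 months, and index 660 maps to December 2019.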
Methodology and Data
- Models used:
  - Regression modeling with autoregressive error structures (periodic autoregressive process, PAR)
  - Iterative Cochrane–Orcutt method
  - Yule–Walker equations
  - Minimum Description Length (MDL) principle for model selection
  - Multi-island Genetic Algorithm (GA) with island partitioning, migration, and partial restarts
  - Generalized Least Squares (GLS) for iterative parameter estimation
  - Haversine distance for geographic proximity
  - Pearson correlation coefficient for statistical similarity
  - Median for estimating the central tendency of segments
  - Welch’s t-test for comparing means
  - Autocorrelation function (ACF) for assessing temporal consistency
- Data sources:
  - Synthetic data generated to emulate realistic climate variability with seasonal components, linear trends, abrupt mean shifts, and autocorrelated errors.
  - Monthly precipitation series recorded at 10 meteorological stations in Dobrogea, Romania, for the period 1965–2019. The data are available on the ECAD site (https://www.ecad.eu/).
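The regression-with-autoregressive-errors machinery (Cochrane–Orcutt iterations driven by Yule–Walker estimates) can be sketched for the simplest case. This is a simplified stand-in, not the paper's model: a linear trend with non-periodic AR(1) errors, no seasonal or changepoint terms, and a lag-1 Yule–Walker estimate in place of the periodic autoregression fitted by GLS. The function names and simulated parameters are arbitrary.

```python
import random


def ols_line(x, y):
    """Closed-form OLS fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def yule_walker_ar1(e):
    """Lag-1 Yule-Walker estimate of an AR(1) coefficient: phi = gamma(1) / gamma(0)."""
    n, m = len(e), sum(e) / len(e)
    g0 = sum((v - m) ** 2 for v in e) / n
    g1 = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, n)) / n
    return g1 / g0


def cochrane_orcutt(t, y, max_iter=50, tol=1e-8):
    """Fit y = a + b*t + e with AR(1) errors e by alternating OLS and quasi-differencing."""
    a, b = ols_line(t, y)
    phi = 0.0
    for _ in range(max_iter):
        resid = [yi - a - b * ti for ti, yi in zip(t, y)]
        new_phi = yule_walker_ar1(resid)
        # y_t - phi*y_{t-1} = a*(1 - phi) + b*(t_t - phi*t_{t-1}) + white noise
        ys = [y[i] - new_phi * y[i - 1] for i in range(1, len(y))]
        ts = [t[i] - new_phi * t[i - 1] for i in range(1, len(t))]
        a_star, b = ols_line(ts, ys)
        a = a_star / (1.0 - new_phi)  # undo the a*(1 - phi) scaling of the intercept
        converged = abs(new_phi - phi) < tol
        phi = new_phi
        if converged:
            break
    return a, b, phi


# Simulated linear trend plus AR(1) noise (arbitrary parameters).
rng = random.Random(42)
n, a_true, b_true, phi_true = 2000, 10.0, 0.01, 0.5
t, y, e = list(range(n)), [], 0.0
for ti in t:
    e = phi_true * e + rng.gauss(0.0, 1.0)
    y.append(a_true + b_true * ti + e)
a_hat, b_hat, phi_hat = cochrane_orcutt(t, y)
```

Quasi-differencing with the estimated coefficient turns the autocorrelated errors into approximately white noise, so ordinary least squares on the transformed data behaves like the GLS fit for this AR(1) case.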
Main Results
- Changepoint Detection on Simulated Data:
  - For a series of length 200 with one changepoint (CP) at time index 100, the CP was detected at index 101 (one step off) for a shift-to-noise ratio κ = 1.0, and exactly at index 100 for κ = 1.5 and 2.0.
  - For a series of length 400 with two CPs at time indices 150 and 300 (κ = 1.0), the first CP was detected exactly, and the second was found at time index 303 in 99 of 100 runs.
  - For a series of length 800 with three CPs at time indices 200, 400, and 600 (κ = 1.0), the CPs were identified close to their true locations (true CP at 200 detected around 199, 400 around 398, and 600 around 598).
- Changepoint Detection and Homogenization of Sulina Series:
  - The algorithm identified a significant changepoint at index 473 (May 2004) in the Sulina precipitation series; the nonparametric CUSUM test confirmed it.
  - The homogenization procedure used Jurilovca and Tulcea as reference stations (distances 73.946 km and 65.507 km, correlations 0.74 and 0.71, respectively) and applied an adjustment factor of −0.197542 to the post-CP segment.
  - Diagnostic tests confirmed the effectiveness of the homogenization:
    - Welch’s t-test: before adjustment, the pre- and post-CP means of the target–reference difference series differed significantly (−0.487 vs. −0.802); after adjustment they were much closer (pre-CP −0.486, post-CP −0.605).
    - The autocorrelation function (ACF) of the adjusted target–reference differences showed no abrupt shifts or structural breaks: besides the lag-0 value of 1, which holds by definition, the lag-1 autocorrelation was about 0.2, indicating consistent serial dependence.
  - Spatial coherence analysis showed that the adjusted Sulina series agreed more closely with the local neighbor composite (Pearson correlation 0.767, bias −0.491, RMSE 0.789) than with the regional average (Pearson correlation 0.725, bias −0.54, RMSE 0.83).
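The three diagnostics above can be reproduced in outline with the standard library. This is a sketch: `welch_t` returns only the statistic and the Welch–Satterthwaite degrees of freedom (a p-value needs the t distribution, for example `scipy.stats.ttest_ind(x, y, equal_var=False)`), and `agreement` assumes that bias means the mean of target minus composite, which the summary does not define. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # sample variances use n - 1
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def acf(x, max_lag):
    """Sample autocorrelations r(k) = gamma(k) / gamma(0) for k = 0..max_lag."""
    n, m = len(x), mean(x)
    gamma = [sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / n
             for k in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]


def agreement(target, composite):
    """Pearson r, bias (assumed: mean of target - composite) and RMSE."""
    mt, mc = mean(target), mean(composite)
    cov = sum((a - mt) * (b - mc) for a, b in zip(target, composite))
    sd = math.sqrt(sum((a - mt) ** 2 for a in target) * sum((b - mc) ** 2 for b in composite))
    diff = [a - b for a, b in zip(target, composite)]
    return cov / sd, mean(diff), math.sqrt(mean([d * d for d in diff]))


t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])  # t = -sqrt(3) ≈ -1.732, df ≈ 4.41
r = acf([1.0, -1.0] * 5, 2)  # r[0] = 1 by definition; r[1] = -0.9 for this alternating series
```

The lag-0 autocorrelation is always 1 because it divides the lag-0 autocovariance by itself, so only the higher lags carry diagnostic information.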
Contributions
- Introduction of a novel Minimum Description Length (MDL) methodology for detecting and correcting changepoints in hydrological time series, specifically designed for situations without metadata.
- Development of a multi-island genetic algorithm (GA) for optimal selection of changepoint locations and autoregressive orders, incorporating island partitioning, migration, and partial restarts to enhance search efficiency and prevent premature convergence.
- Integration of regression modeling with periodically autoregressive (PAR) dependent residuals, estimated iteratively via a Generalized Least Squares (GLS) framework, to accurately capture seasonal cycles and autocorrelation.
- Demonstration of the methodology's robustness through extensive simulations under various noise levels and multiple structural breaks, and its practical effectiveness in homogenizing a real-world precipitation series (Sulina, Romania).
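The island-model search in the second contribution can be sketched in miniature. This is a toy illustration under stated assumptions, not the authors' algorithm: it searches changepoint locations only (no autoregressive orders), scores candidates with a simplified MDL-style cost for mean shifts with iid Gaussian errors, (n/2) ln(RSS/n) + (1/2) sum_j ln(n_j) + m ln(n), and uses arbitrary settings (4 islands of 30 individuals, ring migration every 10 generations, re-seeding of the worst half every 40). All names and parameter values are hypothetical.

```python
import math
import random


def mdl_cost(cps, prefix, prefix_sq, n):
    """Assumed MDL-style cost: (n/2)ln(RSS/n) + (1/2)sum ln(n_j) + m ln(n)."""
    bounds = [0, *cps, n]
    rss, length_term = 0.0, 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        k, s = hi - lo, prefix[hi] - prefix[lo]
        rss += (prefix_sq[hi] - prefix_sq[lo]) - s * s / k  # within-segment RSS
        length_term += 0.5 * math.log(k)
    return 0.5 * n * math.log(max(rss, 1e-12) / n) + length_term + len(cps) * math.log(n)


def repair(cps, n, min_seg):
    """Sort, deduplicate, and drop changepoints that would leave a segment < min_seg."""
    out = []
    for c in sorted(set(cps)):
        if c - (out[-1] if out else 0) >= min_seg and n - c >= min_seg:
            out.append(c)
    return tuple(out)


def mutate(cps, n, min_seg, rng):
    """Nudge, add, or remove one changepoint."""
    cps, move = list(cps), rng.random()
    if cps and move < 0.6:
        i = rng.randrange(len(cps))
        cps[i] += rng.choice((-1, 1)) * rng.choice((1, 2, 5, 10, 20))
    elif move < 0.8 or not cps:
        cps.append(rng.randrange(min_seg, n - min_seg + 1))
    else:
        cps.pop(rng.randrange(len(cps)))
    return repair(cps, n, min_seg)


def crossover(a, b, n, min_seg, rng):
    """One-point crossover on the time axis: a's changepoints before the cut, b's after."""
    cut = rng.randrange(n)
    return repair([c for c in a if c < cut] + [c for c in b if c >= cut], n, min_seg)


def island_ga(y, n_islands=4, pop_size=30, generations=150,
              migrate_every=10, restart_every=40, min_seg=5, seed=0):
    rng, n = random.Random(seed), len(y)
    prefix, prefix_sq = [0.0], [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    cache = {}

    def cost(c):
        if c not in cache:
            cache[c] = mdl_cost(c, prefix, prefix_sq, n)
        return cache[c]

    def random_individual():
        picks = [rng.randrange(min_seg, n - min_seg + 1) for _ in range(rng.randint(0, 4))]
        return repair(picks, n, min_seg)

    islands = [[random_individual() for _ in range(pop_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        for idx, pop in enumerate(islands):
            pop.sort(key=cost)
            children = pop[: pop_size // 5]  # elitism: keep the best 20 %
            while len(children) < pop_size:
                p1 = min(rng.sample(pop, 3), key=cost)  # tournament selection
                p2 = min(rng.sample(pop, 3), key=cost)
                child = crossover(p1, p2, n, min_seg, rng)
                if rng.random() < 0.8:
                    child = mutate(child, n, min_seg, rng)
                children.append(child)
            islands[idx] = children
        if gen % migrate_every == 0:  # ring migration: best of island i-1 replaces worst of i
            bests = [min(pop, key=cost) for pop in islands]
            for idx, pop in enumerate(islands):
                pop.sort(key=cost)
                pop[-1] = bests[idx - 1]
        if gen % restart_every == 0:  # partial restart: re-seed the worst half
            for pop in islands:
                pop.sort(key=cost)
                pop[pop_size // 2:] = [random_individual() for _ in range(pop_size - pop_size // 2)]
    return min((ind for pop in islands for ind in pop), key=cost)


# Toy series: mean 0, shifted to 4 on indices 100..199, unit Gaussian noise.
data_rng = random.Random(1)
y = [(4.0 if 100 <= t < 200 else 0.0) + data_rng.gauss(0.0, 1.0) for t in range(300)]
best = island_ga(y, seed=7)
print(best)  # changepoint indices (start of each new segment)
```

Elitism and migration of each island's best individual preserve good solutions, while the periodic partial restart injects fresh random candidates; in this sketch that restart is the safeguard against premature convergence.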
Funding
Not explicitly stated in the provided text.
Citation
@article{Saliba2026Geneticalgorithm,
author = {Saliba, Youssef and Bărbulescu, Alina},
title = {Genetic-algorithm based changepoints detection and homogenization of precipitation series},
journal = {Stochastic Environmental Research and Risk Assessment},
year = {2026},
doi = {10.1007/s00477-025-03142-6},
url = {https://doi.org/10.1007/s00477-025-03142-6}
}
Original Source: https://doi.org/10.1007/s00477-025-03142-6