Boukdire et al. (2025) Interpolation and Machine Learning Methods for Sub-Hourly Missing Rainfall Data Imputation in a Data-Scarce Environment: One- and Two-Step Approaches
⚠️ Warning: This summary was generated from the abstract only, as the full text was not available.
Identification
- Journal: Hydrology
- Year: 2025
- Date: 2025-11-10
- Authors: Mohamed Boukdire, Çağrı Alperen İnan, Giada Varra, Renata Della Morte, Luca Cozzolino
- DOI: 10.3390/hydrology12110297
Research Groups
Not specified in the provided text.
Short Summary
This study develops and evaluates machine learning and interpolation approaches for imputing missing 10-minute rainfall data, demonstrating that a two-step machine learning approach, which first classifies rain/no-rain periods, consistently outperforms direct methods and traditional interpolation techniques.
Objective
- To develop and test machine learning (Multilayer Perceptron, Random Forest) and interpolation methods (Inverse Distance Weighting, Ordinary Kriging) for the imputation of missing 10-minute rainfall data, particularly under data-scarce conditions.
Study Configuration
- Spatial Scale: Two distinct environments in the Campania region (Southern Italy): mountainous and coastal. Spatial scenarios included using all nearby stations, stations within the same cluster, and the three most highly correlated stations.
- Temporal Scale: 10-minute rainfall data. Two input configurations were tested: data from the imputed time interval only, and data from a time window spanning several intervals before and after the imputed one.
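The "three most highly correlated stations" scenario and the time-window input can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the Pearson correlation criterion, the window half-width parameter, and the zero padding at the ends of the record are all assumptions, since the summary does not specify them.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def top_correlated_stations(target, neighbours, k=3):
    """Names of the k neighbouring stations most correlated with the target.

    target: list of 10-minute rainfall depths at the station to impute.
    neighbours: dict mapping station name -> list of depths (same length).
    """
    scores = {name: pearson(target, series) for name, series in neighbours.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def window_features(neighbours, names, t, half_width):
    """One feature row: readings of the chosen stations from t-half_width to t+half_width.

    half_width=0 reproduces the "imputed interval only" configuration.
    """
    row = []
    for name in names:
        series = neighbours[name]
        for lag in range(-half_width, half_width + 1):
            idx = t + lag
            # Assumption: pad with 0.0 where the window runs past the record.
            row.append(series[idx] if 0 <= idx < len(series) else 0.0)
    return row
```

Each row produced by `window_features` would then serve as the input vector for a model estimating the missing value at time index `t`.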
Methodology and Data
- Models used: Multilayer Perceptron (MLP) and Random Forest (RF) for machine learning; Inverse Distance Weighting (IDW) and Ordinary Kriging (OK) for interpolation. In the two-step approach, a Random Forest classifier performs the first step (rain/no-rain classification).
- Data sources: Sub-hourly rainfall depth observations from rain gauges, under data-scarce conditions where rainfall depth was the only available variable.
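For reference, the simpler of the two interpolation baselines, IDW, can be written in a few lines. The sketch below is a generic textbook formulation, not the configuration used in the paper: the planar coordinates, the exponent `power=2.0`, and the function name are assumptions.

```python
import math


def idw_estimate(target_xy, stations, power=2.0):
    """Inverse Distance Weighting estimate of rainfall depth at target_xy.

    stations: non-empty list of ((x, y), depth) pairs from neighbouring gauges.
    Each weight is 1 / distance**power, so closer gauges count more.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), depth in stations:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return depth  # a gauge at the target location: use its reading
        weight = 1.0 / distance ** power
        weighted_sum += weight * depth
        weight_total += weight
    return weighted_sum / weight_total
```

Because the weights depend only on distance, IDW cannot draw on lagged readings or on non-linear relations between stations, which is the kind of structure the machine learning models are able to use.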
Main Results
- The two-step approach consistently improved imputation accuracy across all spatial and temporal scenarios compared to the direct approach.
- The Random Forest classifier in the two-step approach showed strong performance in classifying rain/no-rain periods, with an average weighted F1 score of 0.961 (mountainous) and 0.957 (coastal), and average Accuracy of 0.928 (mountainous) and 0.946 (coastal).
- For the regression step, Random Forest achieved the highest performance in the mountainous area, with an R² of 0.541 and a Root Mean Square Error (RMSE) of 0.109 mm when all stations were used.
- Using time windows with lagged data significantly improved results by capturing atmospheric dynamics and linking rainfall observations across time steps and stations.
- Machine learning models consistently outperformed spatial interpolation methods due to their ability to manage complex data structures.
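The logic of the two-step approach and the reported metrics can be sketched as follows. This is an illustrative outline under stated assumptions: the classifier and regressor are passed in as plain callables (hypothetical stand-ins for the Random Forest and MLP models), the clipping of negative depths to zero is an assumption, and the metric functions follow the standard definitions of RMSE, R², and support-weighted F1 rather than any code from the paper.

```python
import math


def two_step_impute(features, is_rain, rain_depth):
    """Step 1: classify rain / no rain. Step 2: estimate depth only if rain is predicted."""
    if not is_rain(features):
        return 0.0
    return max(0.0, rain_depth(features))  # assumption: clip negative estimates


def direct_impute(features, rain_depth):
    """One-step baseline: regress depth for every interval, dry or wet."""
    return max(0.0, rain_depth(features))


def rmse(observed, predicted):
    """Root mean square error, in the units of the data (mm here)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination; raises ZeroDivisionError if observed is constant."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    total = 0.0
    for cls in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
        total += f1 * (tp + fn) / len(y_true)
    return total
```

With a toy threshold classifier and a mean-of-neighbours regressor in place of the real models, `two_step_impute` returns exactly zero for intervals classified as dry, whereas `direct_impute` can return small positive depths for them. Avoiding such spurious light rainfall in a zero-dominated 10-minute series is one plausible reason for the improvement the summary reports.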
Contributions
- Introduction and validation of a novel two-step imputation approach that separates rain/no-rain classification from rainfall depth estimation, leading to improved accuracy for sub-hourly rainfall data.
- Demonstration of the significant benefits of incorporating time-lagged data windows for capturing atmospheric dynamics in rainfall imputation.
- Confirmation that machine learning models are superior to traditional spatial interpolation methods for high-resolution rainfall data imputation, especially in data-scarce conditions.
- Evaluation of these methods in diverse geographical settings (mountainous and coastal) under challenging data-scarce conditions.
Funding
Not specified in the provided text.
Citation
@article{Boukdire2025Interpolation,
author = {Boukdire, Mohamed and İnan, Çağrı Alperen and Varra, Giada and Della Morte, Renata and Cozzolino, Luca},
title = {Interpolation and Machine Learning Methods for Sub-Hourly Missing Rainfall Data Imputation in a Data-Scarce Environment: One- and Two-Step Approaches},
journal = {Hydrology},
year = {2025},
doi = {10.3390/hydrology12110297},
url = {https://doi.org/10.3390/hydrology12110297}
}
Original Source: https://doi.org/10.3390/hydrology12110297