Poyam et al. (2025) Assessment of performance of conventional and machine learning methods for estimating missing precipitation data
Identification
- Journal: Acta Geophysica
- Year: 2025
- Date: 2025-11-07
- Authors: Akhilesh Poyam, Vikas Kumar Vidyarthi, Manikant Verma
- DOI: 10.1007/s11600-025-01717-z
Research Groups
Department of Civil Engineering, National Institute of Technology Raipur, Raipur, India.
Short Summary
This study assesses the performance of fifteen conventional methods and two machine learning methods (Artificial Neural Network and Long Short-Term Memory) for estimating missing precipitation data. The Artificial Neural Network (ANN) generally outperforms all conventional methods. It also outperforms LSTM for the longer missing periods (6 months and 1 month), while LSTM is slightly more accurate for 15-day gaps.
Objective
- Calculate missing precipitation data using fifteen different conventional methods based on weightage, spatial distance, and linear regression techniques.
- Predict missing precipitation using Artificial Neural Network (ANN) and Long Short-Term Memory (LSTM) techniques.
- Compare the performance of all fifteen conventional methods, ANN, and LSTM in filling missing precipitation data.
Study Configuration
- Spatial Scale: Raipur rain gauge station (target) and 12 surrounding rain gauge stations in Chhattisgarh, India. The study area covers approximately 226 square kilometers, with radial distances between stations ranging from 29 kilometers to 130 kilometers.
- Temporal Scale: 41 years (1980-2020) of precipitation data for training and testing. Missing data periods were simulated for 6 months, 1 month, and 15 days.
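The evaluation design above can be sketched as follows. A known stretch of record at the target gauge is hidden, and each method is asked to fill it. This is a minimal illustration, not the authors' procedure. The summary does not state the time step or where within 1980-2020 the gaps were placed. The sketch therefore assumes a daily series, a hypothetical start index, and 30-day and 182-day stand-ins for "1 month" and "6 months". `mask_gap` and `GAP_LENGTHS` are illustrative names.

```python
import numpy as np

# Assumed gap lengths in days (30-day month, 182-day half-year); the summary
# gives only "15 days", "1 month" and "6 months".
GAP_LENGTHS = {"15 days": 15, "1 month": 30, "6 months": 182}


def mask_gap(series, start, length):
    """Blank out a contiguous run of `length` values starting at index `start`.

    Returns the masked copy (gap filled with NaN) and the held-out true values,
    so an imputation method can be scored against them afterwards.
    """
    series = np.asarray(series, dtype=float)
    if start < 0 or start + length > series.size:
        raise ValueError("gap falls outside the series")
    masked = series.copy()
    truth = series[start:start + length].copy()
    masked[start:start + length] = np.nan
    return masked, truth


if __name__ == "__main__":
    # Synthetic, skewed "daily rainfall" used purely for illustration.
    rng = np.random.default_rng(0)
    daily = rng.gamma(shape=0.5, scale=8.0, size=365)
    masked, truth = mask_gap(daily, start=150, length=GAP_LENGTHS["1 month"])
    print(int(np.isnan(masked).sum()), truth.size)  # 30 30
```

Keeping the held-out `truth` separate from the masked series lets every method, conventional or machine learning, be scored on the same hidden values.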
Methodology and Data
- Models used:
- Conventional: Arithmetic Average (AA), Normal Ratio (NR), Correlation Coefficient Weighted (CCW), Inverse Distance Weighting (IDW), Geographical Coordinates (GC), UK method (UK), Closest Station Method (CSM), Multiple Imputation (MI, based on k-nearest neighbors), Multiple Linear Regression (MLR), Normal Ratio with Geographical Coordinates (NRGC), Modified Correlation Coefficient with Inverse Distance Weighting (MCCIDW), Modified Old Normal Ratio with Inverse Distance (ONRID), Normal Ratio Inverse Distance Weighting with Correlation (NRIDC), Modified Normal Ratio based on Square Root Distance (MNRT), Weighted Linear Regression (WLR).
- Machine Learning: Artificial Neural Network (ANN), a feed-forward Multi-Layer Perceptron trained by backpropagation with the Levenberg–Marquardt algorithm; Long Short-Term Memory (LSTM).
- Data sources: Ground-based precipitation data from 12 rain gauge stations in Chhattisgarh, India, obtained from the India Meteorological Department (IMD), Pune.
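Several of the conventional estimators listed above reduce to a weighted average of the neighboring gauges' values on the missing day. They differ only in how the weights are set. The sketch below uses common textbook forms of four of them:

- AA uses equal weights.
- NR scales each neighbor by the ratio of long-term normal annual precipitation.
- IDW uses inverse squared distance.
- CCW weights each neighbor by its correlation with the target gauge.

The function names, the IDW exponent of 2 and the example inputs are assumptions for illustration. The exact formulations in the paper may differ, as may the modified variants such as MCCIDW, NRIDC and MNRT.

```python
import numpy as np


def arithmetic_average(p):
    """AA: plain mean of the neighboring gauges' values for the missing day."""
    return float(np.mean(np.asarray(p, dtype=float)))


def normal_ratio(p, normals, target_normal):
    """NR: scale each neighbor by target_normal / neighbor_normal, then average.

    `normals` are long-term mean annual totals at the neighbors (assumed input).
    """
    p, normals = np.asarray(p, dtype=float), np.asarray(normals, dtype=float)
    return float(np.mean(target_normal / normals * p))


def inverse_distance(p, dist_km, power=2.0):
    """IDW: weights proportional to 1 / d**power (power=2 is a common default)."""
    w = 1.0 / np.asarray(dist_km, dtype=float) ** power
    return float(np.sum(w * np.asarray(p, dtype=float)) / np.sum(w))


def correlation_weighted(p, r):
    """CCW: weights equal to each neighbor's correlation with the target gauge."""
    r = np.asarray(r, dtype=float)
    return float(np.sum(r * np.asarray(p, dtype=float)) / np.sum(r))


if __name__ == "__main__":
    p = [10.0, 20.0, 30.0]  # hypothetical rainfall (mm) at three neighbors
    print(
        round(arithmetic_average(p), 2),
        round(normal_ratio(p, [1000, 1200, 1500], 1200), 2),
        round(inverse_distance(p, [30, 60, 120]), 2),
        round(correlation_weighted(p, [0.9, 0.6, 0.3]), 2),
    )  # 20.0 18.67 12.86 16.67
```

The hypothetical distances of 30, 60 and 120 km fall within the 29 to 130 km range reported for this network. With them, IDW gives the nearest gauge 16 of every 21 units of weight, so its estimate stays close to that gauge's 10 mm.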
Main Results
- The Artificial Neural Network (ANN) technique generally outperformed all fifteen conventional methods across all assessed missing data durations.
- For 6 months of missing data, ANN performed best with a Mean Absolute Error (MAE) of 1.82 mm, Pearson's correlation coefficient (R) of 0.970, and Root Mean Square Error (RMSE) of 3.78 mm, outperforming LSTM.
- For 1 month of missing data, ANN performed best with MAE of 2.62 mm, R of 0.976, and RMSE of 4.22 mm, also outperforming LSTM.
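- For 15 days of missing data, LSTM showed slightly lower errors than ANN (MAE of 2.30 mm and RMSE of 3.28 mm, with R of 0.987), although ANN achieved a marginally higher R of 0.99.
- The accuracy of all methods increased as the duration of missing precipitation data decreased (15 days > 1 month > 6 months). For ANN the trend is clearest in R, which rose from 0.970 (6 months) to 0.976 (1 month) to 0.99 (15 days). ANN's MAE and RMSE were in fact slightly lower for 6 months (1.82 mm, 3.78 mm) than for 1 month (2.62 mm, 4.22 mm).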
- Among conventional methods, the Normal Ratio (NR) method performed best in the weightage-based category, while Inverse Distance Weighting (IDW) and Geographical Coordinates (GC) methods performed well in the spatial distance-based category.
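The three scores quoted in the results can be computed as below. This is a minimal sketch, and `scores` is an illustrative name.

MAE and RMSE are in millimeters, like the data. RMSE is never smaller than MAE, because squaring weights large misses (typically heavy-rain days) more heavily. This is consistent with every RMSE above exceeding its MAE.

R measures how closely the estimates track the observed pattern, not their bias. A method can therefore combine a high R with a non-trivial MAE.

```python
import numpy as np


def scores(observed, estimated):
    """Return MAE, Pearson's R and RMSE between observed and estimated series."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(estimated, dtype=float)
    err = e - o
    mae = float(np.mean(np.abs(err)))          # mean absolute error
    rmse = float(np.sqrt(np.mean(err ** 2)))   # root mean square error
    r = float(np.corrcoef(o, e)[0, 1])         # Pearson correlation coefficient
    return {"MAE": mae, "R": r, "RMSE": rmse}


if __name__ == "__main__":
    res = scores([0, 2, 4, 6], [1, 2, 3, 7])
    print({k: round(v, 3) for k, v in res.items()})
    # {'MAE': 0.75, 'R': 0.933, 'RMSE': 0.866}
```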
Contributions
- Provided a comprehensive comparative assessment of fifteen conventional and two state-of-the-art machine learning methods for imputing missing precipitation data.
- Demonstrated that Artificial Neural Networks outperform all conventional methods and, for the longer missing periods (6 months and 1 month), also outperform Long Short-Term Memory.
- Identified LSTM as a highly effective method for shorter missing periods (15 days), showing slightly better accuracy than ANN in this specific scenario.
- Highlighted the practical advantage of ANN, which requires only precipitation data from nearby gauges without needing additional spatial information (like distance or location), making it suitable for data-scarce or inaccessible regions.
- Offered a robust framework and specific recommendations for selecting appropriate methods for missing precipitation data estimation in hydro-climatic studies.
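The practical point above is that a data-driven model needs only the neighboring gauges' precipitation as inputs. It can be illustrated with the simplest model of that kind, multiple linear regression (MLR, one of the fifteen conventional methods).

This is not the authors' ANN. It is a plain least-squares stand-in that shows the same input layout: one column per neighboring gauge and one row per day, with no distances or coordinates.

The names `fit_mlr` and `predict_mlr`, the synthetic data and the clipping of negative estimates at zero are all assumptions for illustration.

```python
import numpy as np


def fit_mlr(neighbors, target):
    """Least-squares fit of target ~ b0 + sum_i b_i * neighbor_i.

    neighbors: (n_days, n_gauges) precipitation at surrounding gauges.
    target:    (n_days,) precipitation at the target gauge (no gaps).
    """
    X = np.asarray(neighbors, dtype=float)
    X = np.column_stack([np.ones(X.shape[0]), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X, np.asarray(target, dtype=float), rcond=None)
    return coef


def predict_mlr(coef, neighbors):
    """Estimate target-gauge rainfall on days where only neighbors are observed."""
    X = np.asarray(neighbors, dtype=float)
    X = np.column_stack([np.ones(X.shape[0]), X])
    # Assumed post-processing: rainfall cannot be negative, so clip at zero.
    return np.maximum(X @ coef, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    nb = rng.gamma(shape=0.5, scale=8.0, size=(200, 3))  # 3 synthetic neighbors
    tgt = 1.0 + 0.5 * nb[:, 0] + 0.3 * nb[:, 1]          # synthetic, noise-free target
    coef = fit_mlr(nb, tgt)
    print(coef)  # close to 1.0, 0.5, 0.3 and 0.0
```

An ANN would consume the same `neighbors` matrix but replace the linear map with a trained nonlinear one.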
Funding
The data were acquired freely from the India Meteorological Department, Pune. Computational facilities were provided by the Department of Civil Engineering, NIT Raipur. No specific funding projects or programs were listed.
Citation
@article{Poyam2025Assessment,
author = {Poyam, Akhilesh and Vidyarthi, Vikas Kumar and Verma, Manikant},
title = {Assessment of performance of conventional and machine learning methods for estimating missing precipitation data},
journal = {Acta Geophysica},
year = {2025},
doi = {10.1007/s11600-025-01717-z},
url = {https://doi.org/10.1007/s11600-025-01717-z}
}
Original Source: https://doi.org/10.1007/s11600-025-01717-z