Malakouti (2025) Leveraging SHapley Additive exPlanations (SHAP) and fuzzy logic for efficient rainfall forecasts
Identification
- Journal: Scientific Reports
- Year: 2025
- Date: 2025-10-20
- Authors: Seyed Matin Malakouti
- DOI: 10.1038/s41598-025-22081-4
Research Groups
Amirkabir University of Technology, Tehran, Iran
Short Summary
This study introduces a hybrid machine learning framework combining a Light Gradient Boosting Machine (LGBM) classifier with a fuzzy logic system to deliver rapid, reliable, and interpretable daily rainfall forecasts using ten years of meteorological data from diverse Australian locations. The framework demonstrates superior accuracy and computational efficiency compared to conventional models, providing valuable insights for decision-makers.
Objective
- To develop and evaluate a hybrid machine learning framework, integrating a Light Gradient Boosting Machine (LGBM) classifier with a fuzzy logic system, for efficient, accurate, and interpretable daily rainfall forecasting.
Study Configuration
- Spatial Scale: Seven weather stations across Australia (Sydney Airport, Canberra, Melbourne, Brisbane, Adelaide, Perth, Darwin), covering diverse climate regimes from temperate to subtropical.
- Temporal Scale: Ten years of daily meteorological observations (2013–2022).
Methodology and Data
- Models used:
  - Primary: Hybrid Light Gradient Boosting Machine (LGBM) classifier combined with a Mamdani-style fuzzy logic system.
  - Baselines for comparison: Logistic Regression, Decision Trees, Random Forests, Gradient Boosting, Artificial Neural Networks (ANN), Support Vector Machines (SVM), AdaBoost, XGBoost, RAINER (ensemble), AdaNAS (neural-fuzzy).
- Data sources:
  - Daily meteorological records from seven Australian weather stations (Kaggle dataset).
  - 14 meteorological features: Minimum Temperature (°C), Maximum Temperature (°C), Relative Humidity at 9 am (%), Relative Humidity at 3 pm (%), Sea-Level Pressure at 9 am (hPa), Sea-Level Pressure at 3 pm (hPa), Wind Speed at 9 am (m/s), Wind Speed at 3 pm (m/s), Wind Gust Speed (m/s), Cloud cover at 9 am (oktas), Cloud cover at 3 pm (oktas), Sunshine (hours), Evaporation (mm), Daily Rainfall (mm).
- Data preprocessing: Median imputation for skewed variables (precipitation), k-nearest-neighbors (KNN) imputation for temperature and humidity. Feature engineering included 1-day and 7-day lag features, 30-day moving averages, cumulative monthly sums, diurnal temperature range, and vapor-pressure deficit.
- Evaluation: 10-fold cross-validation, grid search for hyperparameter tuning.
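The preprocessing and feature-engineering steps listed above can be illustrated with a minimal sketch using only the Python standard library. The function names, the five-day sample series, and the Tetens approximation for saturation vapor pressure are assumptions made for illustration; the summary does not state which vapor-pressure formula or tooling the paper used, and the KNN imputation step is not shown.

```python
import math
from statistics import median


def impute_median(values):
    """Fill missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def lag(values, days):
    """Shift a daily series back by `days`; early entries have no history."""
    return ([None] * days + list(values))[:len(values)]


def trailing_mean(values, window):
    """Mean of the current day and up to `window - 1` preceding days."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def diurnal_range(min_temp_c, max_temp_c):
    """Daily maximum minus daily minimum temperature (°C)."""
    return [hi - lo for lo, hi in zip(min_temp_c, max_temp_c)]


def vapor_pressure_deficit_kpa(temp_c, rel_humidity_pct):
    """Vapor-pressure deficit in kPa.

    Assumption: saturation vapor pressure from the Tetens approximation;
    the paper's exact formula is not given in this summary.
    """
    saturation = 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))
    return saturation * (1.0 - rel_humidity_pct / 100.0)


# Made-up five-day rainfall sample (mm), for illustration only.
rain_mm = [0.0, None, 5.2, 0.0, 12.4]
rain_filled = impute_median(rain_mm)      # median imputation
rain_lag1 = lag(rain_filled, 1)           # 1-day lag feature
rain_ma3 = trailing_mean(rain_filled, 3)  # short window; the paper uses 30 days
```

The same `lag` and `trailing_mean` helpers would cover the 7-day lag and 30-day moving average by changing the `days` and `window` arguments.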
Main Results
- Rainfall "tomorrow" prediction:
  - The LGBM model achieved an accuracy of 85.42% and an Area Under the Curve (AUC) of 0.8818.
  - Average execution time was 4.678 seconds per forecast.
  - The calibration curve (reliability) showed high reliability with a calibration root-mean-square error (Cal-RMSE) of approximately 0.03.
  - Class prediction error analysis indicated a false-positive rate of approximately 3% and a false-negative rate of approximately 17%.
- Rainfall "today" prediction:
  - The LGBM model achieved a mean accuracy of 99.6% (range: 98.8%–100.0%) and an average AUC of 0.998 (range: 0.995–1.000) across 10-fold cross-validation.
  - Average inference time was 2.98 seconds per forecast.
- Fuzzy Logic Component:
  - Produced a likelihood score of 78.53% for given conditions (25 °C, 65% humidity).
  - Matched validation data with 100% accuracy after tuning membership-function parameters.
  - The fuzzy system's calibration curve showed good alignment with observed frequencies.
- Performance Comparison: The hybrid framework consistently outperformed conventional classifiers (e.g., Logistic Regression, Decision Trees, Random Forests, Gradient Boosting) in both accuracy and computational efficiency. It also demonstrated competitive or superior performance compared to other hybrid models (XGBoost–Fuzzy, RAINER, AdaNAS) across diverse Australian climates.
- Interpretability: The fuzzy logic component provides interpretable rule-based outputs (e.g., "High Humidity3pm AND Low Sunshine ⇒ Very High Rain Likelihood"). SHAP analysis confirmed Sunshine and Humidity3pm as top features, followed by Cloud3pm and Pressure3pm.
- Feature Importance (SHAP): Wind direction (e.g., WindDir3pm_E) and rainToday_No were identified as top features, followed by continuous variables such as Humidity3pm, Cloud3pm, Pressure3pm, and WindGustSpeed. WindGustSpeed exceeding 41.16 m/s (80 knots) significantly elevated rain probability.
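To make the Mamdani-style rule evaluation concrete, the following is a minimal sketch of how a rule such as "High Humidity3pm AND Low Sunshine ⇒ Very High Rain Likelihood" could be evaluated. Everything beyond that one rule is an assumption: the second rule, the piecewise-linear membership shapes, their breakpoints, and the neutral fallback value are hypothetical and are not taken from the paper, so the sketch is not expected to reproduce the reported 78.53% score.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x, lo, hi):
    """Membership falling linearly from 1 at `lo` to 0 at `hi`."""
    return 1.0 - ramp_up(x, lo, hi)


def rain_likelihood(humidity_3pm_pct, sunshine_hours):
    """Mamdani inference: min for AND, clipping, max aggregation, centroid."""
    # Fuzzify the inputs (breakpoints are hypothetical).
    hum_high = ramp_up(humidity_3pm_pct, 50.0, 90.0)
    hum_low = ramp_down(humidity_3pm_pct, 30.0, 70.0)
    sun_low = ramp_down(sunshine_hours, 2.0, 8.0)
    sun_high = ramp_up(sunshine_hours, 4.0, 10.0)

    # Each rule: (firing strength, output membership over 0..100 %).
    rules = [
        # Rule quoted in the summary.
        (min(hum_high, sun_low), lambda y: ramp_up(y, 60.0, 90.0)),
        # Hypothetical counterpart: dry and sunny implies low likelihood.
        (min(hum_low, sun_high), lambda y: ramp_down(y, 10.0, 40.0)),
    ]

    # Clip each output set at its firing strength, aggregate with max,
    # then take the centroid over a discretized 0..100 % universe.
    numerator = 0.0
    denominator = 0.0
    for step in range(101):
        y = float(step)
        mu = max(min(strength, mf(y)) for strength, mf in rules)
        numerator += y * mu
        denominator += mu
    if denominator == 0.0:
        return 50.0  # no rule fired; neutral midpoint (assumption)
    return numerator / denominator


humid_overcast = rain_likelihood(95.0, 1.0)  # fires the quoted rule fully
dry_sunny = rain_likelihood(20.0, 11.0)      # fires the hypothetical rule
```

Because each rule is a readable condition with a named consequent, the firing strengths can be reported alongside the score, which is the kind of rule-based explanation the summary attributes to the fuzzy component.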
Contributions
- Design of a novel hybrid Light Gradient Boosting Machine (LGBM) and fuzzy logic system that effectively balances predictive performance with inference speed for rainfall forecasting.
- Systematic comparison of the proposed framework against several baseline machine learning algorithms on a large, real-world meteorological dataset from diverse Australian climatic regions.
- Demonstration of how fuzzy-logic explanations enhance decision-makers’ trust in model outputs by providing interpretable, rule-based insights into rainfall likelihood.
Funding
Not explicitly stated in the paper.
Citation
@article{Malakouti2025Leveraging,
author = {Malakouti, Seyed Matin},
title = {Leveraging SHapley Additive exPlanations (SHAP) and fuzzy logic for efficient rainfall forecasts},
journal = {Scientific Reports},
year = {2025},
doi = {10.1038/s41598-025-22081-4},
url = {https://doi.org/10.1038/s41598-025-22081-4}
}
Original Source: https://doi.org/10.1038/s41598-025-22081-4