Kashefi et al. (2026) Prediction of monthly precipitation and maximum 24 h precipitation using Random Forest, Decision Tree and XGBoost models
Identification
- Journal: Modeling Earth Systems and Environment
- Year: 2026
- Date: 2026-01-06
- Authors: Mahdi Kashefi, Hojat Karami, Mehdi Niksefat, Hamidreza Ghazvinian
- DOI: 10.1007/s40808-025-02714-3
Research Groups
- Semnan University, Semnan, Iran: Mahdi Kashefi, Hojat Karami, Hamidreza Ghazvinian
- Iran University of Science and Technology, Tehran, Iran: Mehdi Niksefat
Short Summary
This study evaluated Decision Tree, Random Forest, and XGBoost models for predicting monthly total precipitation and monthly maximum 24-hour precipitation in Lamerd, Iran. XGBoost consistently outperformed the other two models, and average relative humidity was the most influential meteorological input.
Objective
- To compare the performance of Decision Tree, Random Forest, and XGBoost algorithms in modeling monthly precipitation and monthly maximum 24-hour precipitation in Lamerd, Fars Province, Iran.
- To identify the most effective model for precipitation prediction in this semi-arid region and assess the sensitivity of the models to various meteorological input parameters.
Study Configuration
- Spatial Scale: Lamerd city, Fars Province, Iran (approximately 7.889 × 10^9 m², i.e. about 7,889 km²). Data from Lamerd Synoptic Station (27°20’ N, 53°11’ E, 410 m above sea level).
- Temporal Scale: Monthly data from January 1996 to December 2024 (28 years, 336 monthly records). Training data: 1996–2018; Testing data: 2019–2024.
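A chronological split of this kind can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `monthly_index` and `chronological_split` are hypothetical, the records are placeholder zeros over a short toy period, and only the cutoff year (2018) is taken from the summary above.

```python
from datetime import date


def monthly_index(start_year, end_year):
    """Return the first day of every month from start_year through end_year."""
    return [
        date(year, month, 1)
        for year in range(start_year, end_year + 1)
        for month in range(1, 13)
    ]


def chronological_split(records, last_train_year):
    """Split (date, value) records by year so that no test record precedes a training record."""
    train = [r for r in records if r[0].year <= last_train_year]
    test = [r for r in records if r[0].year > last_train_year]
    return train, test


# Toy period with placeholder values (assumption); real station data would replace the zeros.
records = [(d, 0.0) for d in monthly_index(2016, 2021)]
train, test = chronological_split(records, last_train_year=2018)

print("last training month:", train[-1][0])
print("first test month:", test[0][0])
```

Splitting by date rather than by random sampling keeps future observations out of the training set, which is the property the summary refers to as a time-series-consistent split.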
Methodology and Data
- Models used: Decision Tree (DT), Random Forest (RF), XGBoost (XGB).
- Data sources: Monthly meteorological data from Lamerd Synoptic Station, operated by the Iran Meteorological Organization (IRIMO).
- Input variables: Average temperature, absolute maximum temperature, absolute minimum temperature, average maximum temperature, average minimum temperature, average relative humidity, maximum relative humidity, minimum relative humidity, and evaporation.
- Target variables: Monthly total precipitation (Pmonth) and monthly maximum 24-hour precipitation (PMax24hr).
- Preprocessing: Data normalization (scaling to the range 0.1–0.9), quality control, and imputation of missing daily records.
- Validation:
  - Split: 70% training / 30% testing (time-series consistent), with 5-fold time-series cross-validation for XGBoost.
  - Performance metrics: R², RMSE, MAE, NRMSE, NSE, KGE.
  - Distribution comparison: Kruskal-Wallis test.
  - Extreme event detection: Probability of Detection (POD), False Alarm Ratio (FAR), and Critical Success Index (CSI), with a threshold of 20 mm for PMax24hr.
  - Sensitivity analysis: RMSE perturbation and SHAP values.
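The scaling step and several of the listed metrics can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names are hypothetical, the sample values are invented, and KGE uses the common 2009 formulation (correlation, variability ratio, bias ratio). NRMSE is left out because the summary does not state which normalization the authors used.

```python
import math


def scale_to_range(values, lo=0.1, hi=0.9, vmin=None, vmax=None):
    """Min-max scale values into [lo, hi]; pass vmin/vmax from the training set to avoid leakage."""
    vmin = min(values) if vmin is None else vmin
    vmax = max(values) if vmax is None else vmax
    if vmax == vmin:
        raise ValueError("cannot scale a constant series")
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]


def mean(xs):
    return sum(xs) / len(xs)


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    """Mean absolute error."""
    return mean([abs(s - o) for o, s in zip(obs, sim)])


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance about the mean."""
    o_bar = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation r, variability ratio alpha, and bias ratio beta."""
    o_bar, s_bar = mean(obs), mean(sim)
    sd_o = math.sqrt(mean([(o - o_bar) ** 2 for o in obs]))
    sd_s = math.sqrt(mean([(s - s_bar) ** 2 for s in sim]))
    cov = mean([(o - o_bar) * (s - s_bar) for o, s in zip(obs, sim)])
    r = cov / (sd_o * sd_s)
    alpha = sd_s / sd_o
    beta = s_bar / o_bar
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Invented monthly precipitation values in mm, for illustration only.
observed = [0.0, 12.0, 35.0, 4.0, 60.0]
simulated = [1.0, 10.0, 30.0, 6.0, 55.0]

scaled = scale_to_range(observed)
print("scaled:", scaled)
print("RMSE:", rmse(observed, simulated), "MAE:", mae(observed, simulated))
print("NSE:", nse(observed, simulated), "KGE:", kge(observed, simulated))
```

NSE and KGE both equal 1 for a perfect prediction, so values closer to 1 indicate a better fit, whereas RMSE and MAE are in the units of the target (mm here) and are better when smaller.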
Main Results
- All three models (DT, RF, XGBoost) exhibited satisfactory performance in predicting both monthly and monthly maximum 24-hour precipitation.
- The XGBoost model consistently outperformed Decision Tree and Random Forest models in both training and testing phases for both precipitation types.
- XGBoost performance on the test dataset:
  - For monthly precipitation (Pmonth): R² = 0.9337, MAE = 6.5032 mm, RMSE = 11.2456 mm, NRMSE = 9.0434.
  - For monthly maximum 24-hour precipitation (PMax24hr): R² = 0.9450, MAE = 3.0358 mm, RMSE = 6.3950 mm, NRMSE = 6.0559.
- Sensitivity analysis (XGBoost, SHAP-based):
  - Average relative humidity (RHavg) was the most influential input parameter for both monthly precipitation (31–33% importance) and monthly maximum 24-hour precipitation (22–25% importance).
  - Evaporation (E) consistently demonstrated the lowest importance (1–2% for Pmonth, 2–3% for PMax24hr).
- Extreme event detection (PMax24hr > 20 mm, test phase): XGBoost achieved the highest POD (0.87) and CSI (0.72) with a relatively low FAR (0.18), indicating superior detection of extreme events.
- The Kruskal-Wallis test showed no significant difference in median ranks between observed and XGBoost-predicted values for PMax24hr (p-value = 0.19) and Pmonth (p-value = 0.09), suggesting effective capture of distributional features.
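The extreme-event scores reported above follow from a contingency count of hits, misses, and false alarms. The sketch below uses the standard definitions POD = hits / (hits + misses), FAR = false alarms / (hits + false alarms), and CSI = hits / (hits + misses + false alarms). The function name `extreme_event_scores` and the sample values are hypothetical, and the 20 mm threshold is taken from the summary.

```python
def extreme_event_scores(observed, predicted, threshold=20.0):
    """Compute POD, FAR, and CSI for events defined as values strictly above the threshold."""
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs > threshold
        pred_event = pred > threshold
        if obs_event and pred_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif pred_event:
            false_alarms += 1

    nan = float("nan")  # returned when a score is undefined (zero denominator)
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    csi = hits / (hits + misses + false_alarms) if hits + misses + false_alarms else nan
    return {"POD": pod, "FAR": far, "CSI": csi}


# Invented monthly maximum 24-hour precipitation values in mm, for illustration only.
observed = [5.0, 25.0, 40.0, 10.0, 22.0, 3.0]
predicted = [8.0, 30.0, 18.0, 21.0, 26.0, 1.0]

scores = extreme_event_scores(observed, predicted, threshold=20.0)
print(scores)
```

In this toy series there are two hits, one miss, and one false alarm. Months where neither series exceeds the threshold do not enter any of the three scores, which is why these metrics suit rare events such as heavy rainfall in a dry climate.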
Contributions
- Implemented a time-series-consistent data split (1996–2018 for training, 2019–2024 for testing) to ensure temporal robustness, addressing a common methodological gap in regional precipitation forecasting studies.
- Conducted a comprehensive head-to-head comparative evaluation of Decision Tree, Random Forest, and XGBoost models for both total monthly and monthly maximum 24-hour precipitation.
- Performed a detailed sensitivity analysis using both RMSE perturbation and SHAP values to provide interpretable insights into the dominant meteorological drivers of precipitation in a semi-arid climate.
- Utilized a diverse set of meteorological input variables, including various temperature and humidity parameters, to capture their collective impact on precipitation.
- Incorporated robust evaluation metrics, including hit-rate metrics for extreme event detection and the Kruskal-Wallis test for distributional comparison, enhancing the validation of model predictions.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Citation
@article{Kashefi2026Prediction,
author = {Kashefi, Mahdi and Karami, Hojat and Niksefat, Mehdi and Ghazvinian, Hamidreza},
title = {Prediction of monthly precipitation and maximum 24 h precipitation using Random Forest, Decision Tree and XGBoost models},
journal = {Modeling Earth Systems and Environment},
year = {2026},
doi = {10.1007/s40808-025-02714-3},
url = {https://doi.org/10.1007/s40808-025-02714-3}
}
Original Source: https://doi.org/10.1007/s40808-025-02714-3