Sumith et al. (2026) Comparative Analysis of Deep Learning and Machine Learning Models for Evapotranspiration Prediction in Semi-Arid Regions: Statistical Model Evaluation with a Paired t-Test and Bootstrap Resampling

Identification

Journal: Water Resources Management
Year: 2026
Date: 2026-02-23
Authors: K. V. Sumith, Bhavya S
DOI: 10.1007/s11269-025-04411-3

Research Groups

Department of Civil Engineering, Sir M. Visvesvaraya Institute of Technology, Bangalore, India

Short Summary

This study evaluates deep learning models (LSTM, RNN, GRU) and traditional machine learning models (DT, RF, SVM, ANN, GBM) for evapotranspiration prediction in semi-arid regions using climatic factors. The Long Short-Term Memory (LSTM) model achieved the highest predictive accuracy (test R² of 0.66), and the authors identify it as the most effective and reliable model for this application.

Objective

To evaluate and compare the effectiveness of deep learning (LSTM, RNN, GRU) and traditional machine learning models (DT, RF, SVM, ANN, GBM) in predicting evapotranspiration (ET) using climatic factors (rainfall, temperature, sunshine hours) in semi-arid regions.
To statistically assess the performance differences and uncertainties of these models using paired t-tests and bootstrap resampling.
To identify the most accurate and reliable model for enhancing water resource management tools in water-scarce regions.

Study Configuration

Spatial Scale: Gandhi Krishi Vigyan Kendra (GKVK), Bangalore, Karnataka, India (13.1° N latitude, 77.6° E longitude), an agricultural research campus spanning approximately 559.14 hectares at an altitude of 930 meters above sea level.
Temporal Scale: Monthly climatic data from January 1996 to December 2020 (25 years).

Methodology and Data

Models used:
- Deep Learning: Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU).
- Machine Learning: Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Artificial Neural Network (ANN), Gradient Boosting Machine (GBM).
Data sources: Monthly climatic data from the Indian Meteorological Department (IMD).
- Input variables: Rainfall (millimeters), Mean Temperature (degrees Celsius), Daily Sunshine Hours (hours).
- Target variable: Evapotranspiration (ET) (millimeters).
Procedure:
- Data pre-processing included linear interpolation for missing values, outlier correction using climatological thresholds, derivation of year and month features, min-max normalization, and chronological splitting into 70% training and 30% testing sets.
- 5-fold cross-validation was used for hyperparameter tuning via a grid search strategy.
- Model performance was evaluated using R² (Coefficient of Determination), RMSE (Root Mean Squared Error), MSE (Mean Squared Error), and MAE (Mean Absolute Error).
- Statistical comparison was performed using paired t-tests (significance at p < 0.05).
- Uncertainty analysis was conducted using bootstrap resampling (1,000 iterations) to estimate 95% confidence intervals and standard errors.
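
The chronological split and min-max normalization listed above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the synthetic temperature series are hypothetical, and fitting the scaler on the training portion only is an assumption (the summary does not say whether normalization was fitted before or after the split). With 25 years of monthly data there are 300 records, so a 70/30 split gives 210 training and 90 testing months.

```python
def chronological_split(rows, train_fraction=0.7):
    """Split time-ordered rows without shuffling; the earliest rows form the training set."""
    cut = int(round(len(rows) * train_fraction))
    return rows[:cut], rows[cut:]


def fit_min_max(values):
    """Return the (min, max) pair learned from the given values."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values relative to a previously fitted range; the fitted range maps to [0, 1]."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical monthly mean temperature: 25 years x 12 months = 300 values.
temperature = [20.0 + 5.0 * ((m % 12) / 11.0) for m in range(300)]

train, test = chronological_split(temperature)

# Assumption: the scaler is fitted on the training months only, then reused
# on the test months, so no information from the test period leaks backwards.
lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)

print(len(train), len(test))
```

Test values that fall outside the training range would scale below 0 or above 1 under this choice, which is expected behavior rather than an error.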

Main Results

The Long Short-Term Memory (LSTM) model exhibited the highest predictive performance with a test R² of 0.66, a test RMSE of 0.11 millimeters, and a test MAE of 0.07 millimeters.
The Recurrent Neural Network (RNN) followed with a test R² of 0.59, a test RMSE of 0.11 millimeters, and a test MAE of 0.07 millimeters.
Gated Recurrent Unit (GRU) achieved a test R² of 0.57, a test RMSE of 0.11 millimeters, and a test MAE of 0.08 millimeters.
Traditional machine learning models such as Decision Tree (DT) and Gradient Boosting Machine (GBM) showed pronounced overfitting, with high training R² (0.97 and 0.99, respectively) but poor generalization to unseen data (test R² of -0.15 and 0.03, respectively).
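
A negative test R², as reported for DT, means the model's squared error on the test set exceeds that of simply predicting the test-set mean. The sketch below computes the four reported metrics from their standard definitions and shows how R² drops below zero. The function name and the toy values are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MSE, and MAE from paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,  # negative when ss_res > ss_tot
        "rmse": math.sqrt(mse),
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
    }


# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
good = regression_metrics(y_true, [1.1, 1.9, 3.2, 3.8])  # close predictions
bad = regression_metrics(y_true, [4.0, 3.0, 2.0, 1.0])   # worse than the mean

print(good["r2"], bad["r2"])
```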
Paired t-tests (significance at p < 0.05) gave the following pairwise results:
- LSTM significantly outperformed RNN (p = 0.0091).
- RNN significantly outperformed GRU (p = 0.0324).
- No statistically significant difference was found between LSTM and GRU (p = 0.2694).
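
For readers unfamiliar with the test, a paired t-test can be sketched with the standard library alone. The summary does not state which paired quantities the authors compared, so this sketch assumes per-sample absolute errors of two models on the same test months. The function names and error values are hypothetical. The two-sided p-value is obtained by numerically integrating the Student t density with Simpson's rule; in practice a statistics library would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=20000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|] (Simpson's rule)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def paired_t_test(a, b):
    """Paired t-test on two equal-length sequences; returns (t statistic, p-value)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t_stat = mean_d / math.sqrt(var_d / n)
    return t_stat, two_sided_p(t_stat, n - 1)


# Hypothetical per-sample absolute errors for two models on the same test samples.
errors_a = [0.05, 0.07, 0.06, 0.08, 0.04, 0.09, 0.05, 0.06]
errors_b = [0.08, 0.09, 0.07, 0.11, 0.06, 0.10, 0.08, 0.07]

t_stat, p_value = paired_t_test(errors_a, errors_b)
print(t_stat, p_value, p_value < 0.05)
```

A negative t statistic here indicates that the first model's errors are smaller on average. The p-value only says whether the mean difference is distinguishable from zero, which is why pairwise results need not form a consistent ordering across three models.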
Bootstrap analysis indicated low standard errors for LSTM (SE = 0.0118), RNN (SE = 0.0113), and GRU (SE = 0.0114), with tight 95% confidence intervals, suggesting stable and reliable predictions for these deep learning models.
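
The bootstrap procedure can be sketched as below. The summary does not say which statistic was resampled, so this example assumes the test-set RMSE and a percentile-based 95% interval. The function name, the seed, and the synthetic data are hypothetical. It resamples test points independently with replacement, which ignores any temporal autocorrelation in monthly data.

```python
import math
import random


def bootstrap_rmse(y_true, y_pred, iterations=1000, seed=0):
    """Return (standard error, (lower, upper)) for RMSE via the percentile bootstrap."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(iterations):
        # Draw n indices with replacement and recompute RMSE on the resample.
        idx = [rng.randrange(n) for _ in range(n)]
        mse = sum((y_true[i] - y_pred[i]) ** 2 for i in idx) / n
        stats.append(math.sqrt(mse))
    stats.sort()
    mean_stat = sum(stats) / iterations
    # Standard error = sample standard deviation of the bootstrap statistics.
    se = math.sqrt(sum((s - mean_stat) ** 2 for s in stats) / (iterations - 1))
    lower = stats[int(0.025 * iterations)]
    upper = stats[int(0.975 * iterations) - 1]
    return se, (lower, upper)


# Synthetic stand-in for 90 normalized test observations and predictions.
data_rng = random.Random(42)
y_true = [data_rng.random() for _ in range(90)]
y_pred = [y + data_rng.gauss(0.0, 0.1) for y in y_true]

se, (lower, upper) = bootstrap_rmse(y_true, y_pred)
print(se, lower, upper)
```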

Contributions

Provides a comprehensive comparative analysis of a wide array of deep learning and traditional machine learning models for evapotranspiration prediction in semi-arid regions.
Utilizes robust statistical methods, including paired t-tests and bootstrap resampling, to rigorously evaluate model performance, statistical significance of differences, and prediction uncertainty.
Identifies LSTM as the most effective and reliable model for ET prediction in the studied context, highlighting its superior ability to capture temporal dependencies and seasonal patterns.
Offers practical implications for improved irrigation management, drought monitoring, and water resource planning in water-scarce regions by providing accurate and reliable ET forecasts.

Funding

The authors declare that no funding was received for this research.

Citation

@article{Sumith2026Comparative,
  author = {Sumith, K. V. and S, Bhavya},
  title = {Comparative Analysis of Deep Learning and Machine Learning Models for Evapotranspiration Prediction in Semi-Arid Regions: Statistical Model Evaluation with a Paired t-Test and Bootstrap Resampling},
  journal = {Water Resources Management},
  year = {2026},
  doi = {10.1007/s11269-025-04411-3},
  url = {https://doi.org/10.1007/s11269-025-04411-3}
}

Original Source: https://doi.org/10.1007/s11269-025-04411-3