PANDYA et al. (2025) From raw to reliable: machine learning bias correction of reanalysis data for improved drought severity classification

Identification

Journal: Journal of Hydrology
Year: 2025
Date: 2025-12-31
Authors: PA PANDYA, G.V. Prajapati, Biswajeet Pradhan, D.D. Vadalia, Paras Hirapara, D.J. Patel, S.H. Parmar
DOI: 10.1016/j.jhydrol.2025.134892

Research Groups

Centre of Excellence on Soil and Water Management, Research, Testing and Training Centre, Junagadh Agricultural University, Junagadh 362001 Gujarat, India
Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), School of Civil and Environmental Engineering, Faculty of Engineering & IT, University of Technology Sydney, Ultimo, NSW 2007, Australia
Directorate of Information Technology, Junagadh Agricultural University, Junagadh 362001 Gujarat, India

Short Summary

This study develops a scalable machine learning bias correction approach for reanalysis data (ERA5, NASA POWER) to improve drought severity classification in vulnerable regions of India. It finds that Random Forest is the most reliable method for bias correction, significantly enhancing the accuracy of drought monitoring.

Objective

To develop and evaluate a comprehensive and scalable machine learning bias correction approach for reanalysis datasets (ERA5 and NASA POWER) for minimum temperature, maximum temperature, and rainfall.
To assess the impact of these bias corrections on drought severity classification using the Standardized Precipitation Evapotranspiration Index (SPEI).

Study Configuration

Spatial Scale: Saurashtra and Middle Gujarat, India. Evaluation included leave-one-station-out (LOSO) tests across multiple stations.
Temporal Scale: Temporally independent tests were conducted. Analysis covered periods experiencing moderate, severe, and extreme drought events.
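
The LOSO protocol named above can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout, the station labels, the toy temperatures, and the mean-bias corrector (a simple stand-in for MLR, RF, or XGBoost) are all assumptions made for illustration.

```python
import math


def loso_splits(records):
    """Yield (held_out_station, train, test), holding out one station at a time."""
    stations = sorted({r["station"] for r in records})
    for held_out in stations:
        train = [r for r in records if r["station"] != held_out]
        test = [r for r in records if r["station"] == held_out]
        yield held_out, train, test


def rmse(pred, obs):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def fit_mean_bias(train):
    """Toy corrector (assumption): subtract the mean reanalysis-minus-observed bias."""
    bias = sum(r["reanalysis"] - r["observed"] for r in train) / len(train)
    return lambda x: x - bias


# Hypothetical maximum temperatures (degC); the warm bias differs by station.
records = [
    {"station": "A", "reanalysis": 32.0, "observed": 30.0},
    {"station": "A", "reanalysis": 35.0, "observed": 33.0},
    {"station": "B", "reanalysis": 30.0, "observed": 27.0},
    {"station": "B", "reanalysis": 31.0, "observed": 28.0},
    {"station": "C", "reanalysis": 28.0, "observed": 27.0},
    {"station": "C", "reanalysis": 29.0, "observed": 28.0},
]

# Train on all stations but one, then score the corrector on the unseen station.
loso_rmse = {}
for station, train, test in loso_splits(records):
    correct = fit_mean_bias(train)
    pred = [correct(r["reanalysis"]) for r in test]
    obs = [r["observed"] for r in test]
    loso_rmse[station] = rmse(pred, obs)
```

Because each station's error is computed by a corrector that never saw that station, the per-station RMSE values indicate spatial generalization rather than in-sample fit.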

Methodology and Data

Models used: Quantile Mapping (QM), Multiple Linear Regression (MLR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) for bias correction. Standardized Precipitation Evapotranspiration Index (SPEI) for drought monitoring. Root Mean Square Error (RMSE), Pearson’s correlation, and Cohen’s Kappa coefficient for evaluation.
Data sources: ERA5 and NASA POWER reanalysis data. Observed data serve as the reference (ground truth) for bias correction; this is implied rather than explicitly described.
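
As an illustration of the quantile mapping baseline, the sketch below implements a basic empirical variant in plain Python. The summary does not say which QM formulation the paper uses, so the rank interpolation, the clamping at the training extremes, the function names, and the toy values are all assumptions.

```python
from bisect import bisect_right


def fractional_rank(sorted_sample, x):
    """Position of x within the sorted sample, scaled to [0, 1] and clamped."""
    n = len(sorted_sample)
    if x <= sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return 1.0
    hi = bisect_right(sorted_sample, x)  # first index whose value exceeds x
    lo = hi - 1
    frac = (x - sorted_sample[lo]) / (sorted_sample[hi] - sorted_sample[lo])
    return (lo + frac) / (n - 1)


def quantile(sorted_sample, p):
    """Linearly interpolated value at fractional rank p in [0, 1]."""
    pos = p * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac


def fit_quantile_mapping(model_train, obs_train):
    """Return a function mapping a model value to the observed value of equal rank."""
    model_sorted = sorted(model_train)
    obs_sorted = sorted(obs_train)

    def correct(x):
        return quantile(obs_sorted, fractional_rank(model_sorted, x))

    return correct


# Hypothetical training data: the reanalysis is both too warm and too variable.
model_train = [10.0, 12.0, 14.0, 16.0, 18.0]
obs_train = [8.0, 9.0, 10.0, 11.0, 12.0]

qm = fit_quantile_mapping(model_train, obs_train)
corrected_median = qm(14.0)   # median of the model maps to median of the observations
corrected_between = qm(13.0)  # interpolated between neighbouring ranks
corrected_extreme = qm(25.0)  # beyond the training range: clamped to the observed maximum
```

QM only reshapes the marginal distribution and uses no additional predictors, which is one plausible reason a regression or tree-based corrector can leave smaller residual errors.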

Main Results

Random Forest (RF) emerged as the most reliable overall machine learning method for bias correction, demonstrating strong temporal and spatial generalization.
In temporally independent tests, MLR and RF both performed strongly for temperature. For rainfall, MLR performed slightly better, with RF giving comparable results. Quantile Mapping (QM) retained substantially higher residual errors.
Under leave-one-station-out (LOSO) evaluation, RF clearly outperformed all other approaches across minimum temperature, maximum temperature, and rainfall.
- RF achieved Root Mean Square Error (RMSE) values generally near 1 °C for temperature and approximately 120 mm for rainfall.
- QM RMSE values were considerably higher, often exceeding 3 °C for minimum and maximum temperature and reaching 135–160 mm or more for rainfall.
Bias correction significantly improved categorical drought agreement, measured by Cohen's Kappa coefficient.
- For ERA5, Kappa improved from 0.389 to 0.494 with MLR and to 0.519–0.586 with RF/XGBoost.
- For NASA POWER, Kappa improved from 0.288 to 0.588 with MLR and to 0.546–0.613 with RF/XGBoost.
Most improvements in drought classification occurred for moderate drought months, where correct classifications rose from 63% to 81% (+18 percentage points) for ERA5 and from 46% to 79% (+33 percentage points) for NASA POWER. Gains for severe and extreme drought events were limited due to their rarity.
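
To make the Kappa figures above concrete, the following Python sketch classifies SPEI values into drought categories and computes Cohen's Kappa between two classifications. The thresholds (-1.0, -1.5, -2.0) follow a commonly used SPI/SPEI convention and are an assumption, since the summary does not state the paper's class boundaries. The SPEI series are invented toy values, not results from the study.

```python
from collections import Counter


def drought_class(spei):
    """Map an SPEI value to a drought category (assumed thresholds)."""
    if spei <= -2.0:
        return "extreme"
    if spei <= -1.5:
        return "severe"
    if spei <= -1.0:
        return "moderate"
    return "none"


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: agreement between two label sequences beyond chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the product of the marginal class frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both sequences use a single class
    return (observed - expected) / (1.0 - expected)


# Hypothetical monthly SPEI from station observations and from raw reanalysis.
obs_spei = [-0.2, -1.2, -1.7, -2.3, 0.5, -1.1, -0.8, -1.3]
raw_spei = [-0.4, -0.7, -1.2, -1.6, 0.3, -0.5, -1.1, -1.4]

obs_classes = [drought_class(v) for v in obs_spei]
raw_classes = [drought_class(v) for v in raw_spei]

kappa_raw = cohens_kappa(obs_classes, raw_classes)
kappa_perfect = cohens_kappa(obs_classes, obs_classes)
```

Kappa discounts the agreement expected by chance, so it is a stricter measure than raw accuracy when most months fall in the no-drought class.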

Contributions

Presents a comprehensive and scalable machine learning-based bias correction methodology for reanalysis data (ERA5, NASA POWER) specifically tailored for drought monitoring in data-scarce, heterogeneous regions.
Identifies Random Forest as the most robust and generalizable machine learning technique for bias correction of temperature and rainfall across both temporal and spatial scales.
Quantifies the significant improvement in drought severity classification accuracy (using SPEI and Cohen's Kappa) when using bias-corrected reanalysis data compared to raw data, particularly for moderate drought events.
Underscores the limitations of using raw reanalysis data for operational drought monitoring and provides a robust solution for transforming them into reliable datasets.

Funding

Not explicitly mentioned in the provided paper text.

Citation

@article{PANDYA2025From,
  author = {PANDYA, PA and Prajapati, G.V. and Pradhan, Biswajeet and Vadalia, D.D. and Hirapara, Paras and Patel, D.J. and Parmar, S.H.},
  title = {From raw to reliable: machine learning bias correction of reanalysis data for improved drought severity classification},
  journal = {Journal of Hydrology},
  year = {2025},
  doi = {10.1016/j.jhydrol.2025.134892},
  url = {https://doi.org/10.1016/j.jhydrol.2025.134892}
}

Original Source: https://doi.org/10.1016/j.jhydrol.2025.134892