PANDYA et al. (2025) From raw to reliable: machine learning bias correction of reanalysis data for improved drought severity classification
Identification
- Journal: Journal of Hydrology
- Year: 2025
- Date: 2025-12-31
- Authors: PA PANDYA, G.V. Prajapati, Biswajeet Pradhan, D.D. Vadalia, Paras Hirapara, D.J. Patel, S.H. Parmar
- DOI: 10.1016/j.jhydrol.2025.134892
Research Groups
- Centre of Excellence on Soil and Water Management, Research, Testing and Training Centre, Junagadh Agricultural University, Junagadh 362001 Gujarat, India
- Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), School of Civil and Environmental Engineering, Faculty of Engineering & IT, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Directorate of Information Technology, Junagadh Agricultural University, Junagadh 362001 Gujarat, India
Short Summary
This study develops a scalable machine learning bias correction approach for reanalysis data (ERA5, NASA POWER) to improve drought severity classification in vulnerable regions of India. It finds that Random Forest is the most reliable method for bias correction, significantly enhancing the accuracy of drought monitoring.
Objective
- To develop and evaluate a comprehensive and scalable machine learning bias correction approach for reanalysis datasets (ERA5 and NASA POWER) for minimum temperature, maximum temperature, and rainfall.
- To assess the impact of these bias corrections on drought severity classification using the Standardized Precipitation Evapotranspiration Index (SPEI).
Study Configuration
- Spatial Scale: Saurashtra and Middle Gujarat, India. Evaluation included leave-one-station-out (LOSO) tests across multiple stations.
- Temporal Scale: Temporally independent tests were conducted. Analysis covered periods experiencing moderate, severe, and extreme drought events.
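The leave-one-station-out (LOSO) protocol mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `loso_splits` and `rmse` and the toy station labels are assumptions made for the example. Each station is held out in turn, so a bias-correction model is always scored on a location it never saw during training.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def loso_splits(
    station_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_station, train_indices, test_indices) for each station."""
    for held_out in sorted(set(station_ids)):
        train_idx = [i for i, s in enumerate(station_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(station_ids) if s == held_out]
        yield held_out, train_idx, test_idx


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical station label for each record in a pooled dataset.
stations = ["A", "A", "B", "C", "C"]
splits = list(loso_splits(stations))
```

In practice, a model would be fitted on the `train_idx` records and scored with `rmse` on the `test_idx` records of each split, and the per-station scores would then be summarised.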
Methodology and Data
- Models used: Quantile Mapping (QM), Multiple Linear Regression (MLR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) for bias correction. Standardized Precipitation Evapotranspiration Index (SPEI) for drought monitoring. Pearson’s correlation and Cohen’s Kappa coefficient for evaluation.
- Data sources: ERA5 and NASA POWER reanalysis data, plus station observations (implied as the ground truth for bias correction).
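As an illustration of the simplest method in the list above, the following sketch shows one common form of Quantile Mapping: empirical (non-parametric) mapping, where each reanalysis value is replaced by the observed value at the same rank-based quantile. The summary does not state which QM variant the study used, so this choice, the function name `empirical_quantile_map`, and the toy numbers are all assumptions.

```python
from bisect import bisect_right
from typing import List, Sequence


def empirical_quantile_map(
    model_train: Sequence[float],
    obs_train: Sequence[float],
    model_new: Sequence[float],
) -> List[float]:
    """Map each new model value to the observed value at the same quantile."""
    m_sorted = sorted(model_train)
    o_sorted = sorted(obs_train)
    n_m, n_o = len(m_sorted), len(o_sorted)
    if n_m == 0 or n_o == 0:
        raise ValueError("training samples must be non-empty")
    corrected = []
    for x in model_new:
        # Number of training model values less than or equal to x.
        rank = bisect_right(m_sorted, x)
        # ceil(rank * n_o / n_m) - 1 in integer arithmetic, clamped to range.
        idx = (rank * n_o + n_m - 1) // n_m - 1
        idx = min(n_o - 1, max(0, idx))
        corrected.append(o_sorted[idx])
    return corrected


# Toy data: the model runs at twice the observed value.
corrected = empirical_quantile_map(
    model_train=[2, 4, 6, 8, 10],
    obs_train=[1, 2, 3, 4, 5],
    model_new=[4, 10, 0, 7],
)
```

Values outside the training range are clamped to the smallest or largest observed value, which is one reason a purely distribution-based correction can leave larger residual errors than a regression model that uses additional predictors.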
Main Results
- Random Forest (RF) emerged as the most reliable overall machine learning method for bias correction, demonstrating strong temporal and spatial generalization.
- In temporally independent tests, MLR and RF both performed strongly for temperature. For rainfall, MLR performed slightly better, with RF giving comparable results. Quantile Mapping (QM) retained substantially higher residual errors.
- Under leave-one-station-out (LOSO) evaluation, RF clearly outperformed all other approaches across minimum temperature, maximum temperature, and rainfall.
- RF achieved Root Mean Square Error (RMSE) values generally near 1 °C for temperature and approximately 120 mm for rainfall.
- QM RMSE values were considerably higher, often exceeding 3 °C for minimum and maximum temperature and reaching roughly 135–160 mm or more for rainfall.
- Bias correction significantly improved categorical drought agreement, measured by Cohen's Kappa coefficient.
- For ERA5, Kappa improved from 0.389 (raw) to 0.494 with MLR and to 0.519–0.586 with RF/XGBoost.
- For NASA POWER, Kappa improved from 0.288 (raw) to 0.588 with MLR and to 0.546–0.613 with RF/XGBoost.
- Most improvements in drought classification occurred for moderate drought months, where correct classifications rose from 63% to 81% (+18 percentage points) for ERA5 and from 46% to 79% (+33 percentage points) for NASA POWER. Gains for severe and extreme drought events were limited due to their rarity.
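To make the Kappa figures above concrete, the sketch below computes Cohen's Kappa between two sequences of drought classes derived from SPEI values. The class thresholds (-1.0, -1.5, -2.0) follow a widely used convention and are an assumption here, since the summary does not list the thresholds applied in the study. The function names and sample labels are also illustrative.

```python
from collections import Counter
from typing import Sequence


def spei_class(spei: float) -> str:
    """Assign a drought class using commonly used SPEI cut-offs (assumed)."""
    if spei <= -2.0:
        return "extreme"
    if spei <= -1.5:
        return "severe"
    if spei <= -1.0:
        return "moderate"
    return "none"


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's Kappa: agreement between two labelings, corrected for chance."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of months with identical classes.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from the marginal class frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both labelings use a single identical class
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical monthly classes from observations and from corrected reanalysis.
observed = ["none", "none", "moderate", "moderate"]
corrected = ["none", "moderate", "moderate", "moderate"]
kappa = cohens_kappa(observed, corrected)
```

Because Kappa discounts agreement expected by chance, it is less flattering than raw accuracy when most months fall in the "none" class, which is typical of drought records.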
Contributions
- Presents a comprehensive and scalable machine learning-based bias correction methodology for reanalysis data (ERA5, NASA POWER) specifically tailored for drought monitoring in data-scarce, heterogeneous regions.
- Identifies Random Forest as the most robust and generalizable machine learning technique for bias correction of temperature and rainfall across both temporal and spatial scales.
- Quantifies the significant improvement in drought severity classification accuracy (using SPEI and Cohen's Kappa) when using bias-corrected reanalysis data compared to raw data, particularly for moderate drought events.
- Underscores the limitations of using raw reanalysis data for operational drought monitoring and provides a robust solution for transforming them into reliable datasets.
Funding
- Not explicitly mentioned in the provided paper text.
Citation
@article{PANDYA2025From,
author = {PANDYA, PA and Prajapati, G.V. and Pradhan, Biswajeet and Vadalia, D.D. and Hirapara, Paras and Patel, D.J. and Parmar, S.H.},
title = {From raw to reliable: machine learning bias correction of reanalysis data for improved drought severity classification},
journal = {Journal of Hydrology},
year = {2025},
doi = {10.1016/j.jhydrol.2025.134892},
url = {https://doi.org/10.1016/j.jhydrol.2025.134892}
}
Original Source: https://doi.org/10.1016/j.jhydrol.2025.134892