Sun et al. (2025) A Machine Learning-Based Quality Control Algorithm for Heavy Rainfall Using Multi-Source Data

⚠️ Warning: This summary was generated from the abstract only, as the full text was not available.

Identification

Journal: Remote Sensing
Year: 2025
Date: 2025-12-09
Authors: Hao Sun, Qing Zhou, Lijuan Shi, Cuina Li, Shuang Qin, Dan Yao, Mingyi Xu, Yang Huang, Hu Qin, Y. Guan
DOI: 10.3390/rs17243976

Research Groups

Not specified in the provided text.

Short Summary

This study developed a machine learning-based quality control algorithm for heavy rainfall that integrates multi-sensor observations. Gradient boosting models significantly outperformed conventional threshold-based methods, improving the reliability of quality-controlled data, and SHAP analysis made the model's decisions interpretable.

Objective

To develop and evaluate a machine learning-based quality control algorithm for heavy rainfall by integrating automatic weather station observations with remote sensing data and minute-level data.

Study Configuration

Spatial Scale: Regional (implied by automatic weather station observations and remote sensing data).
Temporal Scale: Training and evaluation samples span 1 June 2022 to 31 December 2024 and include minute-level data.

Methodology and Data

Models used: eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), and Gradient Boosted Regression Trees (GBRT), with SHapley Additive exPlanations (SHAP) used for model interpretability.
Data sources: Automatic weather station observations, remote sensing data (radar composite reflectivity, satellite cloud-top temperature), minute-level precipitation data, and metadata.
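
The four models listed above share one core idea: an ensemble is built in rounds, and each new tree is fitted to the errors left by the trees before it. The sketch below illustrates that idea with one-split trees (stumps) in plain Python. It is not the authors' implementation and not the internals of XGBoost, LightGBM, or CatBoost. The records, units (dBZ, K, mm), learning rate, and number of rounds are all assumptions made up for illustration.

```python
import math

# Toy, hand-made records (NOT from the paper). Each row is a hypothetical
# (radar composite reflectivity in dBZ, cloud-top temperature in K,
#  maximum 1-minute precipitation in mm). Label 1 = anomalous heavy-rain report.
X = [
    (48.0, 205.0, 2.1), (52.0, 210.0, 2.8), (45.0, 215.0, 1.9), (55.0, 200.0, 3.0),
    (5.0, 285.0, 30.0), (0.0, 290.0, 45.0), (8.0, 280.0, 0.0), (3.0, 288.0, 25.0),
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
LEARNING_RATE = 0.5  # assumed value, shared by training and prediction


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return the single-feature split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]


def stump_predict(stump, row):
    j, thr, lmean, rmean = stump
    return lmean if row[j] <= thr else rmean


def fit_boosting(X, y, n_rounds=20):
    """Gradient boosting for log-loss: each stump fits the residuals y - p."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + LEARNING_RATE * stump_predict(stump, row) for s, row in zip(scores, X)]
    return stumps


def predict_proba(stumps, row):
    """Probability that a record is anomalous under the fitted ensemble."""
    return sigmoid(sum(LEARNING_RATE * stump_predict(s, row) for s in stumps))


stumps = fit_boosting(X, y)
predictions = [int(predict_proba(stumps, row) >= 0.5) for row in X]
print(predictions)
```

The toy records are deliberately separable, so the ensemble reproduces the labels. Real station data would need held-out evaluation, and a production system would more plausibly rely on one of the libraries named above than on a hand-written booster.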

Main Results

Gradient boosting models significantly outperformed conventional precipitation-threshold-based methods for heavy rainfall quality control.
XGBoost in particular increased precision by 0.110 and recall by 0.162, substantially reducing both false alarms and missed detections of anomalous heavy rainfall events.
Radar composite reflectivity, satellite cloud-top temperature, and minute-level precipitation were identified as dominant contributors to model predictions.
SHAP-based interpretability analysis showed that the model's decision logic aligns with meteorological physical principles: it treats characteristic patterns (e.g., low radar reflectivity and elevated cloud-top temperatures) as indicators of anomalous rainfall events.
The model effectively identified anomalous minute-level precipitation extremes as critical signals for detecting instrument malfunctions, data encoding errors, and transmission errors.
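
The precision and recall gains reported above map directly onto false alarms and missed detections. Precision is TP / (TP + FP), so it rises as false alarms fall. Recall is TP / (TP + FN), so it rises as missed detections fall. The sketch below computes both from binary flags. The label vectors are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives (false alarms), and false negatives (misses)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def precision_recall(y_true, y_pred):
    tp, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical flags for ten heavy-rain reports (1 = anomalous).
y_true          = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
threshold_flags = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]  # made-up threshold-style baseline
model_flags     = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # made-up learned-model output

baseline_scores = precision_recall(y_true, threshold_flags)
model_scores = precision_recall(y_true, model_flags)
print(baseline_scores, model_scores)
```

In this toy case the second set of flags has one fewer false alarm and one fewer miss than the first, which is why both metrics move together.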

Contributions

Developed a novel machine learning-based quality control algorithm for heavy rainfall through the integration of multi-sensor observations.
Demonstrated superior performance of gradient boosting models over conventional threshold-based methods, significantly enhancing the reliability of quality-controlled heavy rainfall data.
Provided interpretability of the model's decision logic using SHAP, confirming its physical consistency with meteorological principles and increasing trustworthiness.
Identified key data features (radar composite reflectivity, satellite cloud-top temperature, minute-level precipitation) as dominant predictors for detecting anomalous heavy rainfall events.
Showcased the potential for operational implementation due to improved accuracy, reduced false alarms and missed detections, and the ability to identify various types of data errors.
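
To make the SHAP-based interpretability point concrete, the sketch below computes exact Shapley values by brute force for a three-feature toy scoring function. This is the quantity that SHAP estimates, computed here by averaging each feature's marginal contribution over all feature orderings. It does not use the SHAP library, and the scoring function, feature names, cut-offs, and baseline record are hypothetical stand-ins, not the trained model or values from the paper.

```python
from itertools import permutations

FEATURES = ("radar_dbz", "cloud_top_k", "max_minute_mm")  # hypothetical names


def toy_anomaly_score(x):
    """Hypothetical stand-in for a trained model: higher means more suspicious."""
    score = 0.0
    if x["radar_dbz"] < 15.0:        # weak radar echo under a heavy-rain report
        score += 0.4
    if x["cloud_top_k"] > 260.0:     # warm cloud tops
        score += 0.3
    if x["max_minute_mm"] > 20.0:    # implausible one-minute spike
        score += 0.2
    if x["radar_dbz"] < 15.0 and x["cloud_top_k"] > 260.0:
        score += 0.1                 # interaction: both signals together
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Features not yet 'revealed' in an ordering keep their baseline value.
    """
    totals = {name: 0.0 for name in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Made-up reference record (plausible heavy rain) and made-up suspect record.
baseline = {"radar_dbz": 45.0, "cloud_top_k": 210.0, "max_minute_mm": 2.0}
suspect = {"radar_dbz": 3.0, "cloud_top_k": 288.0, "max_minute_mm": 30.0}

phi = shapley_values(toy_anomaly_score, suspect, baseline)
print(phi)
```

The attributions sum to the difference between the suspect score and the baseline score, and the interaction term is split evenly between the two features that produce it. Brute-force enumeration is only practical for a handful of features, so real tree ensembles need the faster algorithms that SHAP provides.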

Funding

Not specified in the provided text.

Citation

@article{Sun2025Machine,
  author = {Sun, Hao and Zhou, Qing and Shi, Lijuan and Li, Cuina and Qin, Shuang and Yao, Dan and Xu, Mingyi and Huang, Yang and Qin, Hu and Guan, Y.},
  title = {A Machine Learning-Based Quality Control Algorithm for Heavy Rainfall Using Multi-Source Data},
  journal = {Remote Sensing},
  year = {2025},
  doi = {10.3390/rs17243976},
  url = {https://doi.org/10.3390/rs17243976}
}

Original Source: https://doi.org/10.3390/rs17243976