Seo et al. (2026) Global 0.25-degree gridded Snow water equivalent data derived from machine learning using in-situ measurements
Identification
- Journal: Scientific Data
- Year: 2026
- Date: 2026-03-10
- Authors: Jungho Seo, Mahdi Panahi, Junsu Kim, Sayed Mohammadreza Bateni, Yeonjoo Kim
- DOI: 10.1038/s41597-026-06895-z
Research Groups
- Civil and Environmental Engineering, Yonsei University, Seoul, South Korea
- Geography, Kyung Hee University, Seoul, South Korea
- Civil, Environmental and Construction Engineering, and Water Resources Research Center, University of Hawaii at Manoa, Honolulu, Hawaii, USA
- UNESCO-UNISA African Chair in Nanoscience and Nanotechnology College of Graduate Studies, University of South Africa, Muckleneuk Ridge, Pretoria, South Africa
Short Summary
This study developed SWEML, a novel global daily snow water equivalent (SWE) product at 0.25° (~25 km) resolution for 1980–2020, produced with a Random Forest machine learning model trained on in-situ measurements. In comparisons with ten existing reference datasets, SWEML showed superior accuracy (overall RMSE 10.33 mm), particularly in high-elevation regions, and performed robustly even in data-sparse areas such as the Andes.
Objective
- To develop and validate a novel global daily snow water equivalent (SWE) product (SWEML) at 0.25° spatial resolution for 1980–2020, using a Random Forest machine learning model trained on in-situ measurements, meteorological forcings, and terrain attributes.
Study Configuration
- Spatial Scale: Global, covering all land areas from 90°S to 90°N and 180°W to 180°E, excluding Antarctica, at a 0.25° (approximately 25 km) grid resolution.
- Temporal Scale: Daily resolution for a 41-year period from 1980 to 2020.
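The grid and time dimensions implied by this configuration can be checked with a few lines of Python. This is a sketch that assumes a cell-centred grid of 0.25° cells spanning the full longitude and latitude range; the actual SWEML file layout is not described in this summary and may differ (ERA5, for example, distributes its 0.25° fields on 721 latitude nodes that include the poles). Ocean and Antarctic cells would be masked out separately.

```python
from datetime import date

# Assumption: a cell-centred global grid of 0.25-degree cells (180W-180E, 90S-90N).
RESOLUTION_DEG = 0.25

n_lon = round(360 / RESOLUTION_DEG)  # longitude columns
n_lat = round(180 / RESOLUTION_DEG)  # latitude rows

# Cell-centre coordinates: the first centre sits half a cell in from the grid edge.
lons = [-180 + RESOLUTION_DEG * (i + 0.5) for i in range(n_lon)]
lats = [-90 + RESOLUTION_DEG * (j + 0.5) for j in range(n_lat)]

# Daily time steps from 1 January 1980 to 31 December 2020, inclusive.
n_days = (date(2020, 12, 31) - date(1980, 1, 1)).days + 1

print(n_lon, n_lat, n_lon * n_lat, n_days)
```

Under these assumptions the full grid has 1440 × 720 cells (1,036,800 in total) and the 41-year record, which includes 11 leap years, contains 14,976 daily time steps.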
Methodology and Data
- Models used:
- Machine Learning: Random Forest (RF) algorithm, specifically Random Forest Regression (RFR) for SWE estimation and Random Forest Classification (RFC) for regionalization.
- Clustering: Mini-batch k-means (MBK) clustering for regionalizing in-situ measurements into 14 clusters.
- Data sources:
- In-situ SWE measurements: SNOTEL, RSSD, GHCNd, HSSC, SCAN, CSS, NVE, Swiss GCOS sites (total 11,687 grid points).
- Remote Sensing (bias-corrected): Daily SWE Dataset for China (Jiang et al., 2022a).
- Meteorological Forcing: ERA5 reanalysis dataset (precipitation, precipitation type, relative humidity, snow depth, land surface temperature, downward solar radiation, antecedent precipitation index).
- Terrain Attributes: GLDAS Dominant Vegetation Type Dataset Version 2, ETOPO1 (mean and standard deviation of digital elevation), ERA5 reanalysis dataset (slope, aspect, anisotropy of orography).
- Snow Observation Mask: Derived using ERA5-Land SWE, snow depth, and precipitation type.
- Reference Datasets for Validation: GLDAS v2.0/v2.2, MERRA-2, ESA GlobSnow v3 (ESAGB), ESA CCI CDR v3.1 (ESASWE), Brown Temperature Index Model (B-TIM/BRSWE), Crocus snowpack model (CROCUS/CSSWE), AMSR-E/AMSR2, University of Arizona SWE (UASWE), Andes Snow Reanalysis SWE (ADSWE), Airborne Gamma SWE (GAMMA).
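The modelling chain listed above (mini-batch k-means regionalization, a Random Forest classifier that assigns grid cells to clusters, and one Random Forest regressor per cluster) can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the authors' code: the clustering variables (latitude, longitude, elevation), the two forcing predictors, the synthetic SWE target, the number of clusters (4 instead of the study's 14), and the helper `predict_swe` are all hypothetical.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical station attributes used for regionalization: lat (deg), lon (deg), elevation (m).
n_sites = 600
site_attrs = np.column_stack([
    rng.uniform(30, 70, n_sites),
    rng.uniform(-150, 30, n_sites),
    rng.uniform(0, 3500, n_sites),
])
# Hypothetical forcing predictors: precipitation (mm) and temperature (degC).
site_forcing = np.column_stack([rng.gamma(2.0, 3.0, n_sites), rng.normal(-5, 6, n_sites)])
X = np.column_stack([site_attrs, site_forcing])
# Synthetic SWE target (mm): larger when wet, cold and high; clipped at zero.
swe = np.clip(5 * site_forcing[:, 0] - 4 * site_forcing[:, 1]
              + 0.02 * site_attrs[:, 2] + rng.normal(0, 5, n_sites), 0, None)

# 1) Regionalize stations with mini-batch k-means on standardized attributes.
N_CLUSTERS = 4  # hypothetical; the study uses 14
scaled = StandardScaler().fit_transform(site_attrs)
site_cluster = MiniBatchKMeans(n_clusters=N_CLUSTERS, batch_size=128,
                               n_init=3, random_state=0).fit_predict(scaled)

# 2) Classifier mapping attributes -> cluster, so any grid cell can be assigned one.
rfc = RandomForestClassifier(n_estimators=100, random_state=0).fit(site_attrs, site_cluster)

# 3) One regression forest per cluster, trained only on that cluster's stations.
regressors = {
    k: RandomForestRegressor(n_estimators=100, random_state=0)
       .fit(X[site_cluster == k], swe[site_cluster == k])
    for k in range(N_CLUSTERS)
}

def predict_swe(grid_attrs, grid_forcing):
    """Route each grid cell to its predicted cluster and apply that cluster's regressor."""
    clusters = rfc.predict(grid_attrs)
    features = np.column_stack([grid_attrs, grid_forcing])
    out = np.empty(len(features))
    for k in np.unique(clusters):
        sel = clusters == k
        out[sel] = regressors[k].predict(features[sel])
    return out

# Usage on five hypothetical grid cells.
grid_attrs = np.column_stack([rng.uniform(30, 70, 5), rng.uniform(-150, 30, 5),
                              rng.uniform(0, 3500, 5)])
grid_forcing = np.column_stack([rng.gamma(2.0, 3.0, 5), rng.normal(-5, 6, 5)])
pred = predict_swe(grid_attrs, grid_forcing)
```

In this sketch every grid cell is routed through the classifier first, so cells far from any station still receive a cluster-specific regressor; that routing is what allows a scheme of this kind to produce estimates in regions without training data.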
Main Results
- A global daily SWE product (SWEML) with 0.25° spatial resolution for 1980–2020 was successfully developed.
- SWEML achieved an overall root mean square error (RMSE) of 10.33 mm and a bias of -7.13 mm when compared against raw in-situ measurements.
- SWEML consistently outperformed seven diverse reference datasets (GLDAS, ESAGB, AMSR-E/AMSR2, BRSWE, CSSWE, MERRA-2, ESASWE) in accuracy, with substantially lower RMSEs across snow-dominant regions. In the Rocky Mountains, for example, SWEML's RMSE was 16.51 mm, a 68.35% reduction relative to the dataset with the second-lowest RMSE in that region.
- The product demonstrated the highest accuracy in high-elevation regions (above 2000 m), with an RMSE of 7.30 mm, normalized mean absolute error (NMAE) of 0.32%, and Pearson correlation coefficient (R) of 0.98.
- SWEML exhibited lower biases and reduced variability in bias compared to most reference datasets.
- The model effectively captured temporal SWE features, including interannual peak SWE patterns and consistently low monthly RMSEs, particularly during the critical snowmelt period from spring to early summer.
- In the Andes, a region without training data, SWEML's spatial patterns and peak SWE time series showed high consistency with the high-resolution ADSWE, achieving a correlation of 0.79 for annual mean peak SWE.
- Validation against Airborne Gamma SWE (GAMMA) over North America showed SWEML had a strong correlation (0.76) with UASWE, a high-resolution assimilated product.
- The underlying Random Forest Regression models demonstrated high predictive skill (R² ranging from 0.81 to 0.99) across the 14 clusters, maintaining robust performance even under spatially independent cross-validation conditions.
- The Random Forest Classification model for cluster assignment achieved a high overall accuracy of 0.895.
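The scores reported above can be reproduced for any pair of predicted and observed series with the standard formulas below. This is a sketch under stated assumptions: bias is taken as mean(prediction − observation), so negative values mean underestimation; NMAE is assumed to be the mean absolute error divided by the mean observation (definitions vary between studies); and the percentage RMSE reduction is assumed to be relative to the reference dataset's RMSE. The helper `implied_ref_rmse` is hypothetical and simply inverts that last formula.

```python
import math

def rmse(pred, obs):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def bias(pred, obs):
    """Mean error; negative values mean the product underestimates SWE."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

def nmae(pred, obs):
    """Assumed definition: mean absolute error divided by the mean observation."""
    mae = sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)
    return mae / (sum(obs) / len(obs))

def pearson_r(pred, obs):
    """Pearson correlation coefficient."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)

def r2(pred, obs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

def rmse_reduction_pct(rmse_model, rmse_ref):
    """Percentage RMSE reduction relative to a reference dataset."""
    return 100 * (rmse_ref - rmse_model) / rmse_ref

def implied_ref_rmse(rmse_model, reduction_pct):
    """Hypothetical helper: recover the reference RMSE from a reported reduction."""
    return rmse_model / (1 - reduction_pct / 100)

# Toy SWE series in mm (illustrative values only).
obs = [0, 10, 40, 80, 120]
pred = [2, 8, 35, 85, 110]
print(rmse(pred, obs), bias(pred, obs), nmae(pred, obs), pearson_r(pred, obs), r2(pred, obs))
print(implied_ref_rmse(16.51, 68.35))
```

If the reduction is defined this way, the Rocky Mountains figures (16.51 mm, 68.35%) imply that the dataset with the second-lowest RMSE had an RMSE of roughly 52 mm in that region.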
Contributions
- Presents the first global, gap-free daily SWE product (SWEML) generated with a machine learning approach (Random Forest), covering 41 years (1980–2020).
- Provides a highly accurate and robust SWE dataset across diverse geographical regions and elevations, demonstrating superior performance compared to existing conventional and assimilated SWE products.
- Highlights the capability of machine learning to capture the complexities of SWE dynamics, offering valuable insights that can inform the development of physically based models.
- Demonstrates reliable predictive performance in data-sparse regions (e.g., the Andes) without direct training data, showcasing the potential of ML to overcome observational limitations.
- Offers a valuable resource for various applications, including water resource management, climate change impact assessment (e.g., drought and flood prediction), and improved initialization and data assimilation in land surface and global climate models.
- Because the ERA5 forcing data are open access and regularly updated, the SWEML product can readily be extended and updated with the same methodology.
Funding
- Basic Science Research Program through the National Research Foundation of Korea (RS-2024-00456724)
- Korea Environment Industry & Technology Institute (KEITI) R&D Program for Innovative Flood Protection Technologies against Climate Crisis (MOE, RS-2023-00218873)
- KEITI Wetland Ecosystem Value Evaluation and Carbon Absorption Value Promotion Technology Development Project (2022003640002)
- Yonsei Frontier Program for Outstanding Scholars of 2023
- International Joint Research Grant by Yonsei Graduate School
Citation
@article{Seo2026Global,
author = {Seo, Jungho and Panahi, Mahdi and Kim, Junsu and Bateni, Sayed Mohammadreza and Kim, Yeonjoo},
title = {Global 0.25-degree gridded Snow water equivalent data derived from machine learning using in-situ measurements},
journal = {Scientific Data},
year = {2026},
doi = {10.1038/s41597-026-06895-z},
url = {https://doi.org/10.1038/s41597-026-06895-z}
}
Original Source: https://doi.org/10.1038/s41597-026-06895-z