Yoon et al. (2025) Interpretational Pitfalls in SOM-Based Clustering: A Case Study of Extreme Cold Events in South Korea
Identification
- Journal: Atmosphere
- Year: 2025
- Date: 2025-12-29
- Authors: Jae-Seung Yoon, Sunmin Park, Il-Ung Chung
- DOI: 10.3390/atmos17010044
Research Groups
- Division of Ocean and Atmosphere Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Climate Change Research Team, National Institute of Meteorological Sciences, Seogwipo-si, Republic of Korea
- Department of Atmospheric and Environmental Sciences, Gangneung-Wonju National University, Gangneung, Republic of Korea
Short Summary
This study examines interpretational pitfalls of Self-Organizing Map (SOM) clustering applied to extreme cold events in South Korea. It reveals substantial within-node heterogeneity, in which many events match their assigned cluster patterns poorly, and proposes a pattern-correlation-based post-processing method (SOM-PC) to enhance the physical interpretability of SOM-derived patterns.
Objective
- To highlight the interpretational risks inherent in Self-Organizing Map (SOM) clustering when identifying large-scale atmospheric patterns associated with regional extreme weather events, specifically severe cold events in South Korea.
- To demonstrate how misinterpretations can arise due to within-node heterogeneity in SOM clusters.
- To propose and evaluate a spatial pattern correlation analysis as a post-processing step to mitigate these issues and enhance the physical interpretability of SOM-derived patterns.
Study Configuration
- Spatial Scale: Daily mean surface temperature and sea level pressure data on a 2.5° × 2.5° grid. Cold events are defined over the Korean Peninsula (34–43° N, 124–131° E), and the SOM-based cluster analysis of large-scale circulation patterns uses a larger domain (20–80° N, 40–240° E).
- Temporal Scale: 73 years, from 1949 to 2021, focusing on 223 severe January cold events.
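The two domains above can be sketched on a regular 2.5° grid. This minimal Python example assumes the NCEP-NCAR grid layout (latitudes from 90° N to 90° S, longitudes 0–357.5° E) and a cosine-latitude weighted box average; the helpers `box_mask` and `area_mean` are hypothetical, and the summary does not state how the area average is weighted or which threshold defines a "severe" cold event.

```python
import numpy as np

# Assumed 2.5° x 2.5° global grid (NCEP-NCAR layout): 73 latitudes, 144 longitudes
lats = np.arange(90.0, -90.1, -2.5)   # 90° N ... 90° S
lons = np.arange(0.0, 360.0, 2.5)     # 0° ... 357.5° E

def box_mask(lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Boolean selectors for grid points inside an inclusive lat/lon box (0-360° E)."""
    lat_sel = (lats >= lat_min) & (lats <= lat_max)
    lon_sel = (lons >= lon_min) & (lons <= lon_max)
    return lat_sel, lon_sel

def area_mean(field, lats, lat_sel, lon_sel):
    """Cosine-latitude weighted box mean of a (..., lat, lon) field, e.g. (time, lat, lon)."""
    sub = field[..., lat_sel, :][..., lon_sel]
    w = np.broadcast_to(np.cos(np.deg2rad(lats[lat_sel]))[:, None], sub.shape[-2:])
    return (sub * w).sum(axis=(-2, -1)) / w.sum()

# Korean Peninsula box used to define cold events (34-43° N, 124-131° E)
kp_lat, kp_lon = box_mask(lats, lons, 34, 43, 124, 131)
# Larger SOM domain (20-80° N, 40-240° E; 240° E corresponds to 120° W)
som_lat, som_lon = box_mask(lats, lons, 20, 80, 40, 240)
print(kp_lat.sum() * kp_lon.sum(), som_lat.sum() * som_lon.sum())  # → 12 2025
```

On this grid the Korean Peninsula box contains only 12 grid points (4 latitudes × 3 longitudes), while the SOM domain contains 2025, which illustrates how coarse the local event definition is compared with the circulation field being clustered.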
Methodology and Data
- Models used: Self-Organizing Map (SOM) for clustering, and a pattern-correlation-based post-processing approach (SOM-PC) to refine SOM results.
- Data sources: Daily mean surface temperature and sea level pressure from the NCEP-NCAR reanalysis dataset.
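For readers unfamiliar with SOM, the clustering step can be sketched as a minimal online SOM in NumPy. Each input row is assumed to be one event's flattened circulation anomaly map over the SOM domain; the learning-rate and neighbourhood schedules, iteration count, and random initialisation are illustrative assumptions, not the settings used by Yoon et al. (2025), and `train_som` and `assign_nodes` are hypothetical names.

```python
import numpy as np

def train_som(X, rows=2, cols=2, n_iter=5000, lr0=0.5, sigma0=1.0, seed=0):
    """Minimal online SOM. X: (n_samples, n_features); returns (rows*cols, n_features) node patterns."""
    rng = np.random.default_rng(seed)
    # Lattice coordinates of the nodes, e.g. (0,0), (0,1), (1,0), (1,1) for a 2 x 2 map
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node patterns with randomly chosen input samples
    W = X[rng.choice(len(X), size=rows * cols, replace=False)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 1e-3)     # shrinking neighbourhood radius
        x = X[rng.integers(len(X))]                  # one randomly drawn sample
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit (Euclidean)
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)   # squared lattice distance to the BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))         # Gaussian neighbourhood weights
        W += lr * h[:, None] * (x - W)               # pull BMU and neighbours toward x
    return W

def assign_nodes(X, W):
    """Index of the best-matching node for every sample."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

A 2 × 3 configuration corresponds to `rows=2, cols=3`. Note that `assign_nodes` places every event in its nearest node no matter how poor the match is; nothing in the SOM itself flags a weakly matching member, which is how within-node heterogeneity can go unnoticed.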
Main Results
- The initial SOM classification (2 × 2 nodes) showed that 30.94% (69 of 223) of severe cold events were "atypical," exhibiting weak or conflicting similarity with their assigned cluster patterns (pattern correlation coefficient, PCC < 0.4).
- Optimizing the SOM node configuration to 2 × 3 (6 clusters) reduced the proportion of atypical cases to 21.52% (48 of 223), a reduction of about 9.42 percentage points, but did not fully eliminate within-node heterogeneity.
- The SOM-PC approach, which explicitly identified and excluded atypical cases using a PCC threshold of 0.4, substantially reduced the proportion of atypical cases: 3.9% (6 of the remaining 154 events) versus 30.94% in the original SOM classification, a reduction of 27.04 percentage points.
- The average PCCs between SOM-PC cluster patterns and their member cases showed a modest increase compared to standard SOM clustering, indicating improved internal consistency.
- Long-term trend analyses of cold event occurrences demonstrated a strong dependence on the chosen PCC threshold, highlighting the sensitivity of SOM-based circulation trend analyses to classification choices.
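The PCC diagnostic behind the "atypical" label can be sketched as follows. This assumes a centred (Pearson) pattern correlation with optional area weights (for example cosine latitude) and a strict `< 0.4` cut; the summary does not specify whether the authors' PCC is centred or weighted, and `pattern_correlation`, `flag_atypical`, and `atypical_percent` are hypothetical helpers. The last helper only reproduces the reported percentages from the stated counts.

```python
import numpy as np

def pattern_correlation(a, b, weights=None):
    """Centred (Pearson) pattern correlation between two maps, optionally area-weighted."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    w = np.ones_like(a) if weights is None else np.ravel(weights).astype(float)
    w = w / w.sum()
    da = a - (w * a).sum()          # remove the weighted spatial mean
    db = b - (w * b).sum()
    cov = (w * da * db).sum()
    return float(cov / np.sqrt((w * da ** 2).sum() * (w * db ** 2).sum()))

def flag_atypical(cases, labels, node_patterns, threshold=0.4, weights=None):
    """Return (is_atypical, pcc): PCC of each case with its assigned node pattern."""
    pcc = np.array([pattern_correlation(c, node_patterns[k], weights)
                    for c, k in zip(cases, labels)])
    return pcc < threshold, pcc

def atypical_percent(n_atypical, n_total):
    """Share of atypical cases in percent."""
    return 100.0 * n_atypical / n_total

# Counts reported in the summary; prints 30.94%, 21.52% and 3.90%
for name, n_atyp, n_tot in [("SOM 2x2", 69, 223), ("SOM 2x3", 48, 223), ("SOM-PC", 6, 154)]:
    print(f"{name}: {atypical_percent(n_atyp, n_tot):.2f}%")
```

The SOM-PC share is computed over the 154 retained events rather than all 223, so the 27.04-point drop compares shares of different populations. Changing `threshold` changes which events count as typical members, which is the mechanism behind the threshold-dependent trends reported above.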
Contributions
- Quantifies and highlights the significant interpretational pitfalls of SOM-based clustering, particularly the issue of within-node heterogeneity, when applied to localized extreme weather events.
- Demonstrates that optimizing SOM node configuration alone is insufficient to fully resolve the problem of atypical cases being grouped into representative clusters.
- Proposes and validates a pattern-correlation-based post-processing methodology (SOM-PC) as an effective strategy to explicitly identify and exclude atypical cases, thereby enhancing the physical interpretability and internal consistency of SOM-derived clusters.
- Underscores the critical importance of applying explicit diagnostics for within-cluster heterogeneity to ensure robust interpretation of results from SOM and similar data-driven tools in climate and weather studies.
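The SOM-PC idea (flag members whose PCC with their node pattern falls below a threshold, exclude them, and recompute the node patterns) can be sketched as below. This is one possible reading of the post-processing step under stated assumptions: node patterns are taken as member composites (means), the PCC is centred and unweighted (via `np.corrcoef`), and a node whose members are all excluded keeps its original composite. The function `som_pc` is hypothetical and is not the authors' implementation.

```python
import numpy as np

def som_pc(cases, labels, n_nodes, threshold=0.4):
    """Hypothetical SOM-PC-style post-processing sketch.

    cases:  (n_cases, n_gridpoints) flattened anomaly maps
    labels: (n_cases,) NumPy array of SOM node indices, one per case
    Returns (keep, refined, retained_pcc): mask of retained cases, node
    composites rebuilt from retained cases, and the PCC of each retained
    case with its refined composite.
    """
    def pcc(a, b):
        # Centred, unweighted pattern correlation between two flattened maps
        return float(np.corrcoef(a, b)[0, 1])

    def composites(mask, fallback):
        # Mean map of the selected members of each node; fallback if a node is empty
        out = fallback.copy()
        for k in range(n_nodes):
            members = cases[mask & (labels == k)]
            if len(members):
                out[k] = members.mean(axis=0)
        return out

    all_cases = np.ones(len(cases), dtype=bool)
    original = composites(all_cases, np.zeros((n_nodes, cases.shape[1])))
    # Retain only cases whose PCC with their node composite reaches the threshold
    keep = np.array([pcc(c, original[k]) >= threshold for c, k in zip(cases, labels)])
    refined = composites(keep, original)
    # Re-check retained cases against the refined composites
    retained_pcc = np.array([pcc(c, refined[k])
                             for c, k in zip(cases[keep], labels[keep])])
    return keep, refined, retained_pcc
```

In this sketch, `keep.sum()` and `(retained_pcc < threshold).sum()` play the roles of the retained events and the remaining atypical cases; rerunning with other thresholds shows how strongly the retained set, and any trend computed from it, depends on that choice.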
Funding
- Korea Meteorological Administration Research and Development Program “Development and Assessment of Climate Change Scenario” (Grant KMA2018-00321)
- Research Institute of Natural Science, Gangneung-Wonju National University
Citation
@article{Yoon2025Interpretational,
author = {Yoon, Jae-Seung and Park, Sunmin and Chung, Il-Ung},
title = {Interpretational Pitfalls in SOM-Based Clustering: A Case Study of Extreme Cold Events in South Korea},
journal = {Atmosphere},
year = {2025},
doi = {10.3390/atmos17010044},
url = {https://doi.org/10.3390/atmos17010044}
}
Original Source: https://doi.org/10.3390/atmos17010044