Gbodjo et al. (2025) Self-supervised representation learning for cloud detection using Sentinel-2 images
Identification
- Journal: Remote Sensing of Environment
- Year: 2025
- Date: 2025-12-25
- Authors: Yawogan Jean Eudes Gbodjo, Lloyd Haydn Hughes, Matthieu Molinier, Devis Tuia, Jun Li
- DOI: 10.1016/j.rse.2025.115205
Research Groups
- VTT Technical Research Centre of Finland Ltd, Espoo, Finland
- ICEYE Oy, Espoo, Finland
- Environmental Computational Science and Earth Observation Laboratory, EPFL, Sion, Switzerland
- Nanjing University of Aeronautics and Astronautics, Nanjing, China
Short Summary
This study leverages self-supervised representation learning (Momentum Contrast and DeepCluster) for accurate cloud and cloud shadow detection in Sentinel-2 imagery, demonstrating that these methods outperform industry standards and several supervised approaches while requiring significantly less labeled data.
Objective
- To address cloud and cloud shadow detection in optical satellite images using self-supervised representation learning.
- To assess if Momentum Contrast (MoCo) and DeepCluster, fine-tuned with limited labels, can outperform existing physical rule-based, weakly supervised, and fully supervised methods for cloud and cloud shadow detection using Sentinel-2 imagery.
Study Configuration
- Spatial Scale: Country-level (China, for WHUS2–CD+) and global (CloudSEN12 dataset). Image patches of 256 × 256 and 509 × 509 pixels, resampled to 10 m spatial resolution.
- Temporal Scale: Monotemporal (single-scene) cloud masking approaches are employed, utilizing spatially and temporally diverse Sentinel-2 image datasets.
Methodology and Data
- Models used:
- Self-supervised learning methods: Momentum Contrast (MoCo), DeepCluster.
- Backbone architecture: ResNet-18 (modified for 13 spectral bands).
- Segmentation network: 4-layer convolutional network.
- Loss functions: InfoNCE (for MoCo pretraining), Cross Entropy (for DeepCluster pretraining), combined Dice loss and Binary Cross Entropy (for fine-tuning).
- Data sources:
- Sentinel-2 images (Level 1C, Top of Atmosphere reflectance, 13 spectral bands, 10 m spatial resolution).
- SEN12MS dataset (global land cover classification dataset, used for MoCo pretraining).
- WHUS2–CD+ dataset (36 Sentinel-2 tiles over China, manually labeled for clouds).
- CloudSEN12 dataset (global dataset, 49,400 image patch collections, with high-quality annotations for clear pixels, thin cloud, thick cloud, and shadow pixels).
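To make the MoCo pretraining objective concrete, the per-query InfoNCE loss can be sketched as below. This is a minimal illustrative implementation, not the authors' code; the temperature value 0.07 is an assumption (MoCo's common default), and all embeddings are assumed L2-normalized.

```python
import numpy as np

def info_nce_loss(query, pos_key, neg_keys, temperature=0.07):
    """InfoNCE loss for one query embedding against one positive key
    and a queue of negative keys, as in MoCo-style pretraining.
    Assumes all vectors are L2-normalized; temperature is illustrative."""
    l_pos = query @ pos_key                       # similarity to positive (scalar)
    l_neg = neg_keys @ query                      # similarities to K negatives, shape (K,)
    logits = np.concatenate(([l_pos], l_neg)) / temperature
    # Cross-entropy with the positive at index 0 (log-softmax, numerically stable).
    logits -= logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]
```

Minimizing this loss pulls the query toward its positive (an augmented view of the same patch) and away from the queued negatives, which is what yields transferable features before fine-tuning on labeled cloud masks.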
Main Results
- MoCo and DeepCluster, fine-tuned with only 25% of annotated data, consistently outperformed physical rule-based methods (FMask, Sen2Cor) and weakly supervised methods (WDCD, GAN-CDM) for cloud detection on both WHUS2–CD+ and CloudSEN12 datasets.
- On WHUS2–CD+, MoCo (FT 25%) achieved an F1-score of 0.88 and IoU of 0.79; DeepCluster (FT 25%) achieved an F1-score of 0.89 and IoU of 0.80, significantly higher than Sen2Cor (F1: 0.63, IoU: 0.47).
- On CloudSEN12, MoCo (FT 25%) achieved a Balanced Accuracy (BA) of 0.90; DeepCluster (FT 25%) achieved a BA of 0.90, higher than FMask (BA: 0.84) and Sen2Cor (BA: 0.71).
- When compared to fully supervised methods trained with limited data (15%, 25%, 50%), MoCo and DeepCluster consistently outperformed CD-FM3SF on WHUS2–CD+ and generally outperformed UNetMobV2 on CloudSEN12 (especially at 15% and 25% fractions).
- Both self-supervised methods handle clouds better than cloud shadows, with cloud shadow detection showing higher omission and commission errors.
- A fraction of the annotations (25% or 50%) already yields near-best performance, suggesting that relatively little labeled data is needed for reliable results.
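The scores above (F1, IoU, Balanced Accuracy) are standard binary-segmentation metrics computed from the confusion matrix; a minimal sketch, assuming flat 0/1 mask arrays with 1 = cloud:

```python
import numpy as np

def binary_mask_metrics(y_true, y_pred):
    """F1, IoU and Balanced Accuracy (BA) for binary cloud masks.
    Inputs are 0/1 arrays of equal shape (1 = cloud)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    f1 = 2 * tp / (2 * tp + fp + fn)            # harmonic mean of precision/recall
    iou = tp / (tp + fp + fn)                   # intersection over union
    ba = 0.5 * (tp / (tp + fn) + tn / (tn + fp))  # mean of the two class recalls
    return f1, iou, ba
```

Note that F1 and IoU are monotonically related (F1 = 2·IoU / (1 + IoU)), which is why the reported pairs such as F1 0.88 / IoU 0.79 move together; BA is used on CloudSEN12 because it is robust to the class imbalance between clear and cloudy pixels.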
Contributions
- First study to examine Momentum Contrast (MoCo) and DeepCluster, two self-supervised learning methods, for cloud and cloud shadow detection using Sentinel-2 data. Previous works focused on cloud type classification or segmentation using different inputs (MODIS, ground-based images).
- Provided a comprehensive evaluation of the proposed methods using various training data fractions on a country-level (WHUS2–CD+) and a global (CloudSEN12) dataset, comparing them to several state-of-the-art physical rule-based, weakly supervised, and fully supervised methods.
Funding
- RepreSent project, funded by the European Space Agency (ESA) Contract No: 4000137253/22/I-DT – CCN3.
Citation
@article{Gbodjo2025Selfsupervised,
author = {Gbodjo, Yawogan Jean Eudes and Hughes, Lloyd Haydn and Molinier, Matthieu and Tuia, Devis and Li, Jun},
title = {Self-supervised representation learning for cloud detection using Sentinel-2 images},
journal = {Remote Sensing of Environment},
year = {2025},
doi = {10.1016/j.rse.2025.115205},
url = {https://doi.org/10.1016/j.rse.2025.115205}
}
Original Source: https://doi.org/10.1016/j.rse.2025.115205