Gbodjo et al. (2025) Self-supervised representation learning for cloud detection using Sentinel-2 images
Identification
- Journal: Remote Sensing of Environment
- Year: 2025
- Date: 2025-12-25
- Authors: Yawogan Jean Eudes Gbodjo, Lloyd Haydn Hughes, Matthieu Molinier, Devis Tuia, Jun Li
- DOI: 10.1016/j.rse.2025.115205
Research Groups
- VTT Technical Research Centre of Finland Ltd, Espoo, Finland
- ICEYE Oy, Espoo, Finland
- Environmental Computational Science and Earth Observation Laboratory, EPFL, Sion, Switzerland
- Nanjing University of Aeronautics and Astronautics, Nanjing, China
Short Summary
This study leverages self-supervised representation learning (Momentum Contrast and DeepCluster) for accurate cloud and cloud shadow detection in Sentinel-2 imagery, demonstrating that these methods outperform industry standards and several supervised approaches while requiring significantly less labeled data.
Objective
- To address cloud and cloud shadow detection in optical satellite images using self-supervised representation learning.
- To assess if Momentum Contrast (MoCo) and DeepCluster, fine-tuned with limited labels, can outperform existing physical rule-based, weakly supervised, and fully supervised methods for cloud and cloud shadow detection using Sentinel-2 imagery.
Study Configuration
- Spatial Scale: Country-level (China, for WHUS2–CD+) and global (CloudSEN12 dataset). Image patches of 256 × 256 and 509 × 509 pixels, resampled to 10 m spatial resolution.
- Temporal Scale: Monotemporal (single-scene) cloud masking approaches are employed, utilizing spatially and temporally diverse Sentinel-2 image datasets.
Methodology and Data
- Models used:
- Self-supervised learning methods: Momentum Contrast (MoCo), DeepCluster.
- Backbone architecture: ResNet-18 (modified for 13 spectral bands).
- Segmentation network: 4-layer convolutional network.
- Loss functions: InfoNCE (for MoCo pretraining), Cross Entropy (for DeepCluster pretraining), combined Dice loss and Binary Cross Entropy (for fine-tuning).
- Data sources:
- Sentinel-2 images (Level 1C, Top of Atmosphere reflectance, 13 spectral bands, 10 m spatial resolution).
- SEN12MS dataset (global land cover classification dataset, used for MoCo pretraining).
- WHUS2–CD+ dataset (36 Sentinel-2 tiles over China, manually labeled for clouds).
- CloudSEN12 dataset (global dataset, 49,400 image patch collections, with high-quality annotations for clear pixels, thin cloud, thick cloud, and shadow pixels).
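To make the MoCo pretraining objective concrete, the per-query InfoNCE loss can be sketched as below. This is a minimal illustrative implementation, not the authors' code; the temperature value 0.07 is an assumption (MoCo's common default), and all embeddings are assumed L2-normalized.

```python
import numpy as np

def info_nce_loss(query, pos_key, neg_keys, temperature=0.07):
    """InfoNCE loss for one query embedding against one positive key
    and a queue of negative keys, as in MoCo-style pretraining.
    Assumes all vectors are L2-normalized; temperature is illustrative."""
    l_pos = query @ pos_key                       # similarity to positive (scalar)
    l_neg = neg_keys @ query                      # similarities to K negatives, shape (K,)
    logits = np.concatenate(([l_pos], l_neg)) / temperature
    # Cross-entropy with the positive at index 0 (log-softmax, numerically stable).
    logits -= logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]
```

Minimizing this loss pulls the query toward its positive (an augmented view of the same patch) and away from the queued negatives, which is what yields transferable features before fine-tuning on labeled cloud masks.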
Main Results
- MoCo and DeepCluster, fine-tuned with only 25% of annotated data, consistently outperformed physical rule-based methods (FMask, Sen2Cor) and weakly supervised methods (WDCD, GAN-CDM) for cloud detection on both WHUS2–CD+ and CloudSEN12 datasets.
- On WHUS2–CD+, MoCo (FT 25%) achieved an F1-score of 0.88 and IoU of 0.79; DeepCluster (FT 25%) achieved an F1-score of 0.89 and IoU of 0.80, significantly higher than Sen2Cor (F1: 0.63, IoU: 0.47).
- On CloudSEN12, MoCo (FT 25%) achieved a Balanced Accuracy (BA) of 0.90; DeepCluster (FT 25%) achieved a BA of 0.90, higher than FMask (BA: 0.84) and Sen2Cor (BA: 0.71).
- When compared to fully supervised methods trained with limited data (15%, 25%, 50%), MoCo and DeepCluster consistently outperformed CD-FM3SF on WHUS2–CD+ and generally outperformed UNetMobV2 on CloudSEN12 (especially at 15% and 25% fractions).
- Both self-supervised methods handle clouds better than cloud shadows, with cloud shadow detection showing higher omission and commission errors.
- A fraction of the annotations (25% or 50%) already yields near-best performance, suggesting that relatively little labeled data is needed for reliable results.
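The scores above (F1, IoU, Balanced Accuracy) are standard binary-segmentation metrics computed from the confusion matrix; a minimal sketch, assuming flat 0/1 mask arrays with 1 = cloud:

```python
import numpy as np

def binary_mask_metrics(y_true, y_pred):
    """F1, IoU and Balanced Accuracy (BA) for binary cloud masks.
    Inputs are 0/1 arrays of equal shape (1 = cloud)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    f1 = 2 * tp / (2 * tp + fp + fn)            # harmonic mean of precision/recall
    iou = tp / (tp + fp + fn)                   # intersection over union
    ba = 0.5 * (tp / (tp + fn) + tn / (tn + fp))  # mean of the two class recalls
    return f1, iou, ba
```

Note that F1 and IoU are monotonically related (F1 = 2·IoU / (1 + IoU)), which is why the reported pairs such as F1 0.88 / IoU 0.79 move together; BA is used on CloudSEN12 because it is robust to the class imbalance between clear and cloudy pixels.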
Contributions
- First study to examine Momentum Contrast (MoCo) and DeepCluster, two self-supervised learning methods, for cloud and cloud shadow detection using Sentinel-2 data. Previous works focused on cloud type classification or segmentation using different inputs (MODIS, ground-based images).
- Provided a comprehensive evaluation of the proposed methods using various training data fractions on a country-level (WHUS2–CD+) and a global (CloudSEN12) dataset, comparing them to several state-of-the-art physical rule-based, weakly supervised, and fully supervised methods.
Funding
- RepreSent project, funded by the European Space Agency (ESA) Contract No: 4000137253/22/I-DT – CCN3.
Citation
@article{Gbodjo2025Selfsupervised,
author = {Gbodjo, Yawogan Jean Eudes and Hughes, Lloyd Haydn and Molinier, Matthieu and Tuia, Devis and Li, Jun},
title = {Self-supervised representation learning for cloud detection using Sentinel-2 images},
journal = {Remote Sensing of Environment},
year = {2025},
doi = {10.1016/j.rse.2025.115205},
url = {https://doi.org/10.1016/j.rse.2025.115205}
}
Original Source: https://doi.org/10.1016/j.rse.2025.115205