Doornbos et al. (2025) Ending Overfitting for UAV Applications - Self-Supervised Pretraining on Multispectral UAV Data
Identification
- Journal: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
- Year: 2025
- Date: 2025-10-29
- Authors: Jurrian Doornbos, Önder Babur
- DOI: 10.5194/isprs-annals-x-2-w2-2025-31-2025
Research Groups
- Information Technology Group, Wageningen University, Wageningen, the Netherlands
- Software Engineering and Technology, Eindhoven University of Technology, Eindhoven, the Netherlands
Short Summary
This research investigates whether self-supervised pretraining can address the "small data problem" in UAV-based deep learning for remote sensing. It demonstrates that using an efficient self-supervised learning framework (FastSiam) tailored for multispectral UAV imagery significantly improves model generalization and reduces overfitting, even with extremely limited labelled data, outperforming end-to-end trained models.
Objective
- To investigate whether transfer learning techniques, specifically self-supervised pretraining, can overcome the "small data problem" in UAV-based deep learning models, enabling them to generalize effectively across diverse environments without requiring prohibitive amounts of labelled examples.
- To showcase efficient self-supervised pretraining on a large-scale, diverse UAV multispectral dataset.
- To determine the effectiveness of the pretraining process on task-specific evaluation.
Study Configuration
- Spatial Scale:
- Pretraining: msuav100k dataset comprising 104,840 image chips of 512 pixels x 512 pixels, derived from 28 diverse open-access datasets. Ground Sample Distances (GSDs) range from 0.01 meters to 0.25 meters.
- Task-specific evaluation (Vineyard Segmentation): Three vineyards in Portugal, covered by four orthomosaic sites (Esac1, Esac2, Valdoeiro, Quinta de Baixo). Vineyard areas: 23,000 m², 29,000 m², 32,000 m². Orthomosaics divided into 224 pixels x 224 pixels sub-images. GSDs: 0.025 meters (Esac1 & Esac2), 0.0125 meters (Valdoeiro), 0.0145 meters (Quinta de Baixo).
- Temporal Scale:
- Pretraining: 2 epochs (ResNet18: 18,000 seconds, ≈5 hours; Swin-T: 187,200 seconds, ≈52 hours).
- Task-specific training: 300 epochs.
- Vineyard data collection: October 2022 (Esac1 & Esac2), April 2022 (Valdoeiro), July 2022 (Quinta de Baixo).
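The chipping step described above (512 px chips for pretraining, 224 px sub-images for evaluation) can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the non-overlapping stride and the discarding of partial edge tiles are assumptions.

```python
import numpy as np

def chip_orthomosaic(ortho: np.ndarray, chip: int) -> list[np.ndarray]:
    """Cut a (bands, H, W) orthomosaic into non-overlapping chip x chip tiles.

    Edge regions smaller than the chip size are discarded; the paper does
    not specify its exact tiling/overlap strategy, so this is a sketch.
    """
    _, h, w = ortho.shape
    tiles = []
    for y in range(0, h - chip + 1, chip):
        for x in range(0, w - chip + 1, chip):
            tiles.append(ortho[:, y:y + chip, x:x + chip])
    return tiles

# Toy example: a 4-band, 1024 x 1100 px mosaic yields 2 x 2 = 4 chips of 512 px;
# the 76 px remainder along the width is dropped.
mosaic = np.zeros((4, 1024, 1100), dtype=np.float32)
chips = chip_orthomosaic(mosaic, 512)
print(len(chips), chips[0].shape)  # 4 (4, 512, 512)
```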
Methodology and Data
- Models used:
- Self-supervised learning framework: FastSiam (an efficiency-optimized variant of SimSiam).
- Backbone architectures: ResNet18 (11.2 million parameters) and Swin Transformer (Tiny) (27.5 million parameters).
- Segmentation head: U-Net-like encoder-decoder structure with skip connections (ResNet18-based head: 3.9 million parameters; Swin Transformer-based head: 8.5 million parameters).
- Baseline: RandomForest classifier.
- Data sources:
- Pretraining dataset: msuav100k, a collection of 104,840 multispectral UAV image chips (512 pixels x 512 pixels) from 28 diverse datasets. Imagery contains at least four spectral bands (Green, Red, RedEdge, Near-Infrared). Captured by various sensors including DJI Mavic 3M, DJI Phantom 4 Multispectral, Parrot Sequoia, MicaSense RedEdge, and MicaSense Altum/PT.
- Task-specific evaluation dataset: Vineyard segmentation dataset from Barros et al. (2022). Multispectral UAV imagery (Red, Green, Blue, Red-Edge, Near-Infrared bands) captured by a DJI drone with a Micasense Altum sensor over three distinct vineyards in Portugal. Includes annotated segmentation masks for binary classification of vine plants.
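FastSiam, like SimSiam, minimizes a negative cosine similarity between a predicted embedding and a stop-gradient target; FastSiam's target is commonly described as the averaged projection of several additional augmented views. A minimal NumPy sketch of that objective follows; the embedding dimension, view count, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def neg_cos_sim(p: np.ndarray, z: np.ndarray) -> float:
    """Negative cosine similarity D(p, z); z plays the role of the
    stop-gradient target in SimSiam/FastSiam, so it is a constant here."""
    p = p / np.linalg.norm(p)
    z = z / np.linalg.norm(z)
    return float(-np.dot(p, z))

def fastsiam_loss(pred: np.ndarray, target_views: list[np.ndarray]) -> float:
    """FastSiam-style loss: regress one view's prediction onto the *average*
    projection of the remaining augmented views (stop-gradient applied)."""
    target = np.mean(target_views, axis=0)
    return neg_cos_sim(pred, target)

rng = np.random.default_rng(0)
pred = rng.normal(size=128)           # predictor output for one view
views = [rng.normal(size=128) for _ in range(3)]  # projections of other views
loss = fastsiam_loss(pred, views)
# D is bounded in [-1, 1]; training drives it toward -1 (perfect alignment).
print(-1.0 <= loss <= 1.0)  # True
```

The averaging over several target views is what stabilizes training enough for the very short (≈1 epoch) convergence reported in Main Results.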
Main Results
- Self-supervised pretraining using FastSiam significantly improves model performance and generalization compared to end-to-end training across all tested scenarios.
- For models trained on the Train-all subset, the ResNet18 backbone with pretraining (ResNet18-bb) achieved a mean F1 score of 0.72, a 19-percentage-point improvement over its end-to-end trained counterpart (ResNet18-ee, mean F1 of 0.53).
- For models trained on the Train-vary subset (diverse samples), the Swin-T architecture with a pretrained backbone (Swin-T-bb) achieved the highest mean F1 score of 0.80 across all test sites, demonstrating superior generalization.
- Pretrained models benefit more from diversity in training samples than from sheer volume, especially in low-data scenarios.
- FastSiam pretraining converges rapidly, with both ResNet18 and Swin-T backbones reaching their minimum negative cosine similarity loss within approximately one epoch (3,200 training steps).
- Pretraining acts as an effective regularizer, limiting overfitting and improving generalization across varying environmental conditions.
- While pretrained backbones showed statistically significant improvements even with a single labelled image (Train-1), the overall F1 scores (below 0.7) indicate that a minimum threshold of task-specific data is still required for usable results.
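The F1 scores above are pixel-wise scores for the binary vine/background segmentation task. As a reference for how such a score is computed (the exact averaging across sites is not reproduced here), a minimal sketch:

```python
import numpy as np

def f1_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Pixel-wise F1 for binary (vine = 1 / background = 0) masks."""
    tp = np.sum((pred == 1) & (truth == 1))  # vine pixels correctly found
    fp = np.sum((pred == 1) & (truth == 0))  # background mistaken for vine
    fn = np.sum((pred == 0) & (truth == 1))  # vine pixels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy masks: 3 of 4 predicted vine pixels are correct; one vine pixel is missed.
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
pred  = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0]])
print(round(f1_score(pred, truth), 2))  # 0.75
```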
Contributions
- Demonstrates the efficacy of self-supervised pretraining, specifically using FastSiam on a large-scale, diverse multispectral UAV dataset (msuav100k), to address the "small data problem" in remote sensing.
- Provides strong evidence that self-supervised pretraining acts as an effective regularizer, significantly reducing overfitting and enhancing generalization capabilities of deep learning models for UAV applications.
- Highlights that for pretrained models, the diversity of training samples is more critical than their sheer volume for achieving robust performance in downstream tasks.
- Establishes a pathway for making advanced Deep Learning techniques more accessible for practical UAV applications by requiring modest computational resources and fewer labelled examples.
Funding
- Horizon Europe program, ICAERUS project (contract number 101060643).
Citation
@article{Doornbos2025Ending,
author = {Doornbos, Jurrian and Babur, Önder},
title = {Ending Overfitting for UAV Applications - Self-Supervised Pretraining on Multispectral UAV Data},
journal = {ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences},
year = {2025},
doi = {10.5194/isprs-annals-x-2-w2-2025-31-2025},
url = {https://doi.org/10.5194/isprs-annals-x-2-w2-2025-31-2025}
}
Original Source: https://doi.org/10.5194/isprs-annals-x-2-w2-2025-31-2025