Doornbos et al. (2025) Ending Overfitting for UAV Applications - Self-Supervised Pretraining on Multispectral UAV Data
Identification
- Journal: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
- Year: 2025
- Date: 2025-10-29
- Authors: Jurrian Doornbos, Önder Babur
- DOI: 10.5194/isprs-annals-x-2-w2-2025-31-2025
Research Groups
- Information Technology Group, Wageningen University, Wageningen, the Netherlands
- Software Engineering and Technology, Eindhoven University of Technology, Eindhoven, the Netherlands
Short Summary
This research investigates whether self-supervised pretraining can address the "small data problem" in UAV-based deep learning for remote sensing. It demonstrates that using an efficient self-supervised learning framework (FastSiam) tailored for multispectral UAV imagery significantly improves model generalization and reduces overfitting, even with extremely limited labelled data, outperforming end-to-end trained models.
Objective
- To investigate whether transfer learning techniques, specifically self-supervised pretraining, can overcome the "small data problem" in UAV-based deep learning models, enabling them to generalize effectively across diverse environments without requiring prohibitive amounts of labelled examples.
- To showcase efficient self-supervised pretraining on a large-scale, diverse UAV multispectral dataset.
- To determine the effectiveness of the pretraining process on task-specific evaluation.
Study Configuration
- Spatial Scale:
- Pretraining: msuav100k dataset comprising 104,840 image chips of 512 pixels x 512 pixels, derived from 28 diverse open-access datasets. Ground Sample Distances (GSDs) range from 0.01 meters to 0.25 meters.
- Task-specific evaluation (Vineyard Segmentation): Three vineyards in Portugal, covered by four orthomosaic sites (Esac1, Esac2, Valdoeiro, Quinta de Baixo). Vineyard areas: 23,000 m², 29,000 m², 32,000 m². Orthomosaics divided into 224 pixels x 224 pixels sub-images. GSDs: 0.025 meters (Esac1 & Esac2), 0.0125 meters (Valdoeiro), 0.0145 meters (Quinta de Baixo).
- Temporal Scale:
- Pretraining: 2 epochs (ResNet18: 18,000 seconds, ≈5 hours; Swin-T: 187,200 seconds, ≈52 hours).
- Task-specific training: 300 epochs.
- Vineyard data collection: October 2022 (Esac1 & Esac2), April 2022 (Valdoeiro), July 2022 (Quinta de Baixo).
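The chipping step described above (512 px chips for pretraining, 224 px sub-images for evaluation) can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the non-overlapping stride and the discarding of partial edge tiles are assumptions.

```python
import numpy as np

def chip_orthomosaic(ortho: np.ndarray, chip: int) -> list[np.ndarray]:
    """Cut a (bands, H, W) orthomosaic into non-overlapping chip x chip tiles.

    Edge regions smaller than the chip size are discarded; the paper does
    not specify its exact tiling/overlap strategy, so this is a sketch.
    """
    _, h, w = ortho.shape
    tiles = []
    for y in range(0, h - chip + 1, chip):
        for x in range(0, w - chip + 1, chip):
            tiles.append(ortho[:, y:y + chip, x:x + chip])
    return tiles

# Toy example: a 4-band, 1024 x 1100 px mosaic yields 2 x 2 = 4 chips of 512 px;
# the 76 px remainder along the width is dropped.
mosaic = np.zeros((4, 1024, 1100), dtype=np.float32)
chips = chip_orthomosaic(mosaic, 512)
print(len(chips), chips[0].shape)  # 4 (4, 512, 512)
```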
Methodology and Data
- Models used:
- Self-supervised learning framework: FastSiam (an efficiency-optimized variant of SimSiam).
- Backbone architectures: ResNet18 (11.2 million parameters) and Swin Transformer (Tiny) (27.5 million parameters).
- Segmentation head: U-Net-like encoder-decoder structure with skip connections (ResNet18-based head: 3.9 million parameters; Swin Transformer-based head: 8.5 million parameters).
- Baseline: RandomForest classifier.
- Data sources:
- Pretraining dataset: msuav100k, a collection of 104,840 multispectral UAV image chips (512 pixels x 512 pixels) from 28 diverse datasets. Imagery contains at least four spectral bands (Green, Red, RedEdge, Near-Infrared). Captured by various sensors including DJI Mavic 3M, DJI Phantom 4 Multispectral, Parrot Sequoia, MicaSense RedEdge, and MicaSense Altum/PT.
- Task-specific evaluation dataset: Vineyard segmentation dataset from Barros et al. (2022). Multispectral UAV imagery (Red, Green, Blue, Red-Edge, Near-Infrared bands) captured by a DJI drone with a Micasense Altum sensor over three distinct vineyards in Portugal. Includes annotated segmentation masks for binary classification of vine plants.
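FastSiam, like SimSiam, minimizes a negative cosine similarity between a predicted embedding and a stop-gradient target; FastSiam's target is commonly described as the averaged projection of several additional augmented views. A minimal NumPy sketch of that objective follows; the embedding dimension, view count, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def neg_cos_sim(p: np.ndarray, z: np.ndarray) -> float:
    """Negative cosine similarity D(p, z); z plays the role of the
    stop-gradient target in SimSiam/FastSiam, so it is a constant here."""
    p = p / np.linalg.norm(p)
    z = z / np.linalg.norm(z)
    return float(-np.dot(p, z))

def fastsiam_loss(pred: np.ndarray, target_views: list[np.ndarray]) -> float:
    """FastSiam-style loss: regress one view's prediction onto the *average*
    projection of the remaining augmented views (stop-gradient applied)."""
    target = np.mean(target_views, axis=0)
    return neg_cos_sim(pred, target)

rng = np.random.default_rng(0)
pred = rng.normal(size=128)           # predictor output for one view
views = [rng.normal(size=128) for _ in range(3)]  # projections of other views
loss = fastsiam_loss(pred, views)
# D is bounded in [-1, 1]; training drives it toward -1 (perfect alignment).
print(-1.0 <= loss <= 1.0)  # True
```

The averaging over several target views is what stabilizes training enough for the very short (≈1 epoch) convergence reported in Main Results.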
Main Results
- Self-supervised pretraining using FastSiam significantly improves model performance and generalization compared to end-to-end training across all tested scenarios.
- For models trained on the Train-all subset, the ResNet18 backbone with pretraining (ResNet18-bb) achieved a mean F1 score of 0.72, a 19-percentage-point improvement over its end-to-end trained counterpart (ResNet18-ee, mean F1 of 0.53).
- For models trained on the Train-vary subset (diverse samples), the Swin-T architecture with a pretrained backbone (Swin-T-bb) achieved the highest mean F1 score of 0.80 across all test sites, demonstrating superior generalization.
- Pretrained models benefit more from diversity in training samples than from sheer volume, especially in low-data scenarios.
- FastSiam pretraining converges rapidly, with both ResNet18 and Swin-T backbones reaching their minimum negative cosine similarity loss within approximately one epoch (3,200 training steps).
- Pretraining acts as an effective regularizer, limiting overfitting and improving generalization across varying environmental conditions.
- While pretrained backbones showed statistically significant improvements even with a single labelled image (Train-1), the overall F1 scores (below 0.7) indicate that a minimum threshold of task-specific data is still required for usable results.
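The F1 scores above are pixel-wise scores for the binary vine/background segmentation task. As a reference for how such a score is computed (the exact averaging across sites is not reproduced here), a minimal sketch:

```python
import numpy as np

def f1_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Pixel-wise F1 for binary (vine = 1 / background = 0) masks."""
    tp = np.sum((pred == 1) & (truth == 1))  # vine pixels correctly found
    fp = np.sum((pred == 1) & (truth == 0))  # background mistaken for vine
    fn = np.sum((pred == 0) & (truth == 1))  # vine pixels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy masks: 3 of 4 predicted vine pixels are correct; one vine pixel is missed.
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
pred  = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0]])
print(round(f1_score(pred, truth), 2))  # 0.75
```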
Contributions
- Demonstrates the efficacy of self-supervised pretraining, specifically using FastSiam on a large-scale, diverse multispectral UAV dataset (msuav100k), to address the "small data problem" in remote sensing.
- Provides strong evidence that self-supervised pretraining acts as an effective regularizer, significantly reducing overfitting and enhancing generalization capabilities of deep learning models for UAV applications.
- Highlights that for pretrained models, the diversity of training samples is more critical than their sheer volume for achieving robust performance in downstream tasks.
- Establishes a pathway for making advanced Deep Learning techniques more accessible for practical UAV applications by requiring modest computational resources and fewer labelled examples.
Funding
- Horizon Europe program, ICAERUS project (contract number 101060643).
Citation
@article{Doornbos2025Ending,
author = {Doornbos, Jurrian and Babur, Önder},
title = {Ending Overfitting for UAV Applications - Self-Supervised Pretraining on Multispectral UAV Data},
journal = {ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences},
year = {2025},
doi = {10.5194/isprs-annals-x-2-w2-2025-31-2025},
url = {https://doi.org/10.5194/isprs-annals-x-2-w2-2025-31-2025}
}
Original Source: https://doi.org/10.5194/isprs-annals-x-2-w2-2025-31-2025