Wang et al. (2025) CitrusNet: A vision transformer-CNN approach for citrus detection from multi-source imagery with multi-scale feature integration
Identification
- Journal: Computers and Electronics in Agriculture
- Year: 2025
- Date: 2025-12-01
- Authors: Haochen Wang, Juan Shi, Hamed Karimian, Fei Wang, Faizan Javed, Bo Liu, Shengnan Shi, Ziwei Li, Tao Yang
- DOI: 10.1016/j.compag.2025.111260
Research Groups
- School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang, China
- Key Laboratory of Marine Meteorological Disaster Prevention and Mitigation of Jiangsu Province, Lianyungang, China
- School of Engineering, The University of Western Australia, Crawley, WA, Australia
- Centre for Water and Spatial Science, The University of Western Australia, Crawley, WA, Australia
- School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, NSW, Australia
Short Summary
This paper introduces CitrusNet, a deep learning model that combines Vision Transformers and Convolutional Neural Networks with multi-scale feature integration to detect citrus fruits accurately across diverse multi-source imagery, outperforming state-of-the-art models.
Objective
- To develop a robust deep learning model capable of accurately detecting citrus fruits from multi-source imagery (UAV, mobile, AI-synthesized) despite variations in scale, resolution, and sensor types, for efficient crop monitoring and management.
Study Configuration
- Spatial Scale: Multi-scale; the model targets citrus fruits of varying sizes against complex backgrounds, using imagery from UAVs, mobile devices, and AI synthesis.
- Temporal Scale: Intended to support crop monitoring and yield forecasting, but no acquisition duration or frequency is specified.
Methodology and Data
- Models used: CitrusNet, a hybrid Vision Transformer-CNN model, incorporating:
  - An improved Residual Multi-Layer Perceptron (Res-MLP) based on the Swin Transformer.
  - A Convolutional Neural Network (CNN) with a plug-and-play Adaptive Feature Fusion Module (AFM); a sketch of such a fusion module follows this list.
  - A decoupled detection head with a Multi-Scale Depthwise Fusion Module (MSDM).
- Data sources: Self-created Citrus Multi-Source Detection Dataset (CMSDD), comprising UAV, mobile, and AI-synthesized imagery.
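
The AFM is described here only at a functional level (automatic adjustment of feature-channel weights for multi-scale fusion). Below is a minimal sketch of what such a plug-and-play channel-reweighting fusion module could look like, assuming a squeeze-and-excitation-style gate over concatenated local (CNN) and global (transformer) features. The class name `AdaptiveFusionModule`, the `reduction` parameter, and the layer layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a plug-and-play channel-reweighting fusion module,
# in the spirit of the AFM described above. Names and structure are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class AdaptiveFusionModule(nn.Module):
    """Fuses two feature maps of equal spatial size by learning
    per-channel weights from their concatenation (SE-style gating)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        fused = 2 * channels
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global context
        self.gate = nn.Sequential(                   # excite: channel weights
            nn.Linear(fused, fused // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(fused // reduction, fused),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(fused, channels, kernel_size=1)

    def forward(self, x_local: torch.Tensor, x_global: torch.Tensor) -> torch.Tensor:
        x = torch.cat([x_local, x_global], dim=1)    # (B, 2C, H, W)
        b, c, _, _ = x.shape
        w = self.gate(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return self.project(x * w)                   # reweighted, projected back to C channels

# Example: fuse a CNN feature map with a same-resolution transformer feature map.
if __name__ == "__main__":
    afm = AdaptiveFusionModule(channels=256)
    local_feat = torch.randn(1, 256, 40, 40)
    global_feat = torch.randn(1, 256, 40, 40)
    print(afm(local_feat, global_feat).shape)        # torch.Size([1, 256, 40, 40])
```

Because the gate operates only on pooled channel statistics, a module of this kind adds little compute and can be dropped between existing stages, which is what "plug-and-play" usually implies.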
Main Results
- CitrusNet achieved high performance for citrus detection:
  - Precision: 91.20%
  - Recall: 87.16%
  - F1 score: 0.891 (consistent with the reported precision and recall; see the check after this list)
  - mAP50: 94.07%
  - mAP50:95: 84.25%
- The model demonstrated superior accuracy and robustness compared to state-of-the-art models, making it a promising solution for citrus crop monitoring.
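
As a quick consistency check (not from the paper), the reported F1 score follows from the precision and recall as their harmonic mean:

```python
# Consistency check: F1 is the harmonic mean of precision and recall.
precision, recall = 0.9120, 0.8716
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.891, matching the reported F1 score
```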
Contributions
- Introduction of CitrusNet, a novel hybrid Vision Transformer-CNN deep learning model designed for robust citrus detection in multi-source imagery.
- Enhancement of the Residual Multi-Layer Perceptron (Res-MLP) using Swin Transformer to improve perceptual capability across diverse scales by integrating local and global features.
- Proposal of a plug-and-play Adaptive Feature Fusion Module (AFM) within the CNN, which automatically adjusts feature channel weights to enhance multi-scale feature fusion.
- Development of a decoupled detection head incorporating a Multi-Scale Depthwise Fusion Module (MSDM) to improve the model's adaptability to complex backgrounds and targets at varying scales (a sketch of such a head follows this list).
- Creation of the Citrus Multi-Source Detection Dataset (CMSDD) to facilitate research and development in multi-source citrus detection.
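
Neither the decoupled head nor the MSDM is specified in detail in this summary. The sketch below is a hypothetical rendering that assumes the MSDM aggregates parallel depthwise convolutions at several kernel sizes and that the head separates classification and box-regression branches. All names (`MultiScaleDepthwiseFusion`, `DecoupledHead`, `num_classes`) and the exact layer layout are assumptions, not the paper's code.

```python
# Minimal sketch of a multi-scale depthwise fusion block feeding a
# decoupled (classification / regression) detection head. The structure
# is an assumption based on the description above, not the paper's code.
import torch
import torch.nn as nn

class MultiScaleDepthwiseFusion(nn.Module):
    """Aggregates depthwise convolutions at several kernel sizes,
    then mixes channels with a pointwise convolution."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = sum(branch(x) for branch in self.branches)  # multi-scale context
        return self.mix(out) + x                          # residual connection

class DecoupledHead(nn.Module):
    """Separate branches for class scores and box regression."""

    def __init__(self, channels: int, num_classes: int = 1):
        super().__init__()
        self.msdm = MultiScaleDepthwiseFusion(channels)
        self.cls_branch = nn.Conv2d(channels, num_classes, kernel_size=1)
        self.reg_branch = nn.Conv2d(channels, 4, kernel_size=1)  # box offsets

    def forward(self, x: torch.Tensor):
        x = self.msdm(x)
        return self.cls_branch(x), self.reg_branch(x)

# Example: one detection level with 256 channels and a single "citrus" class.
if __name__ == "__main__":
    head = DecoupledHead(channels=256, num_classes=1)
    cls_out, reg_out = head(torch.randn(1, 256, 40, 40))
    print(cls_out.shape, reg_out.shape)  # (1, 1, 40, 40) and (1, 4, 40, 40)
```

Depthwise convolutions keep the per-branch cost low, which is why they are a common choice when several kernel sizes are run in parallel to capture targets at different scales.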
Funding
- Not specified in the provided paper text.
Citation
@article{Wang2025CitrusNet,
author = {Wang, Haochen and Shi, Juan and Karimian, Hamed and Wang, Fei and Javed, Faizan and Liu, Bo and Shi, Shengnan and Li, Ziwei and Yang, Tao},
title = {CitrusNet: A vision transformer-CNN approach for citrus detection from multi-source imagery with multi-scale feature integration},
journal = {Computers and Electronics in Agriculture},
year = {2025},
doi = {10.1016/j.compag.2025.111260},
url = {https://doi.org/10.1016/j.compag.2025.111260}
}
Original Source: https://doi.org/10.1016/j.compag.2025.111260