Tan et al. (2026) UGFF-VLM: Uncertainty-Guided and Frequency-Fused Vision-Language Model for Remote Sensing Farmland Segmentation
⚠️ Warning: This summary was generated from the abstract only, as the full text was not available.
Identification
- Journal: Remote Sensing
- Year: 2026
- Date: 2026-01-15
- Authors: Ke Tan, Yanlan Wu, Hui Yang, Xiaochun Ma
- DOI: 10.3390/rs18020282
Research Groups
Not specified in the provided text.
Short Summary
This paper proposes an Uncertainty-Guided and Frequency-Fused Vision-Language Model (UGFF-VLM) for remote sensing farmland extraction, targeting two challenges in existing approaches: ambiguous text-visual alignment and the loss of high-frequency boundary details. UGFF-VLM delivers strong and stable performance, attaining the highest mean Intersection over Union (mIoU) among compared methods and markedly improving boundary precision and robustness across diverse geographical environments.
Objective
- To address the challenges of ambiguous text-visual alignment and loss of high-frequency boundary details in existing vision-language models for remote sensing farmland extraction.
- To propose an Uncertainty-Guided and Frequency-Fused Vision-Language Model (UGFF-VLM) that enhances the model's ability to recognize polymorphic features and effectively improves boundary segmentation accuracy.
Study Configuration
- Spatial Scale: Not specified in the provided text.
- Temporal Scale: Not specified in the provided text.
Methodology and Data
- Models used: Uncertainty-Guided and Frequency-Fused Vision-Language Model (UGFF-VLM), which integrates:
  - A vision-language model (as the base architecture).
  - An Uncertainty-Guided Adaptive Alignment (UGAA) module.
  - A Frequency-Enhanced Cross-Modal Fusion (FECF) mechanism.
- Data sources: FarmSeg-VL dataset.
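The abstract describes the UGAA module as weighting cross-modal fusion by alignment confidence: text guidance is trusted where text and visual features agree, and down-weighted where alignment is uncertain. The sketch below is an illustrative interpretation of that idea, not the paper's implementation; the cosine-similarity confidence measure, the sigmoid gate, and all shapes are assumptions for demonstration.

```python
# Illustrative sketch (NOT the paper's UGAA): fuse per-pixel visual features
# with a text embedding, gated by an alignment-confidence score. All shapes
# and the gating formula are assumptions for demonstration.
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity along the last axis."""
    a_n = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b_n = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    return np.sum(a_n * b_n, axis=-1)

def uncertainty_guided_fusion(visual, text):
    """Blend a (H, W, C) visual feature map with a (C,) text embedding.

    Pixels whose visual feature aligns well with the text prompt
    (high similarity -> low uncertainty) receive more text influence;
    ambiguous pixels fall back on the visual evidence.
    """
    conf = cosine_similarity(visual, text)            # (H, W), in [-1, 1]
    gate = 1.0 / (1.0 + np.exp(-4.0 * conf))          # sigmoid gate in (0, 1)
    fused = gate[..., None] * text + (1.0 - gate[..., None]) * visual
    return fused, gate

rng = np.random.default_rng(0)
visual = rng.standard_normal((8, 8, 16))
text = rng.standard_normal(16)
fused, gate = uncertainty_guided_fusion(visual, text)
print(fused.shape)  # (8, 8, 16)
```

The gate acts as a soft per-pixel confidence map: in a trained model it would be produced by a learned uncertainty estimator rather than a fixed sigmoid over cosine similarity.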
Main Results
- The proposed UGFF-VLM delivers strong and stable performance for remote sensing farmland extraction, achieving the highest mIoU among compared methods across diverse geographical environments.
- Significantly improves boundary precision and boundary segmentation accuracy.
- Shows enhanced robustness against false positives.
- Mitigates the recognition confusion and poor generalization that farmland feature polymorphism causes in purely vision-based models.
Contributions
- Proposes the UGFF-VLM, a novel vision-language model specifically designed for remote sensing farmland extraction, which leverages semantic prior knowledge from textual descriptions.
- Introduces an Uncertainty-Guided Adaptive Alignment (UGAA) module to dynamically adjust cross-modal fusion based on alignment confidence, improving text-visual alignment.
- Develops a Frequency-Enhanced Cross-Modal Fusion (FECF) mechanism to preserve high-frequency boundary details in the frequency domain, addressing detail loss during fusion.
- Provides a reliable method for the precise delineation of agricultural parcels in diverse landscapes, overcoming limitations of existing methods regarding recognition confusion, poor generalization, and boundary accuracy.
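The FECF mechanism is described as preserving high-frequency boundary details in the frequency domain during fusion. As a conceptual illustration only (the paper's actual mechanism is not available from the abstract), the sketch below amplifies the high-frequency band of a feature map via a 2D FFT; the cutoff radius and boost factor are arbitrary assumptions.

```python
# Illustrative sketch (NOT the paper's FECF): emphasize high-frequency
# (boundary) content of a single-channel feature map in the Fourier domain.
# The cutoff radius and boost factor are arbitrary assumptions.
import numpy as np

def high_frequency_boost(feat, radius_frac=0.1, boost=2.0):
    """Amplify spatial frequencies outside a centered low-frequency disc.

    feat: (H, W) feature map. Returns a real-valued map in which
    edges/boundaries are emphasized relative to smooth regions.
    """
    H, W = feat.shape
    spec = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.mgrid[0:H, 0:W]
    dist = np.hypot(yy - H / 2, xx - W / 2)
    mask = np.where(dist <= radius_frac * min(H, W), 1.0, boost)
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec * mask)))

# A vertical step edge: boosting its high frequencies sharpens the jump.
feat = np.zeros((32, 32))
feat[:, 16:] = 1.0
out = high_frequency_boost(feat)
print(out.shape)  # (32, 32)
```

In a segmentation network this kind of high-pass emphasis would be applied to (or fused with) intermediate features so that parcel boundaries survive the downsampling and cross-modal fusion stages.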
Funding
Not specified in the provided text.
Citation
@article{Tan2026UGFFVLM,
author = {Tan, Ke and Wu, Yanlan and Yang, Hui and Ma, Xiaochun},
title = {UGFF-VLM: Uncertainty-Guided and Frequency-Fused Vision-Language Model for Remote Sensing Farmland Segmentation},
journal = {Remote Sensing},
year = {2026},
doi = {10.3390/rs18020282},
url = {https://doi.org/10.3390/rs18020282}
}
Original Source: https://doi.org/10.3390/rs18020282