
Detailed Information

Vision Foundation Model Guided Multi-Modal Fusion Network for Remote Sensing Semantic Segmentation (EI indexed)   Cited by: 64

Document type: Journal article

English title: Vision Foundation Model Guided Multi-Modal Fusion Network for Remote Sensing Semantic Segmentation

Authors: Pan, Chen[1]; Fan, Xijian[1,2]; Tjahjadi, Tardi[3]; Guan, Haiyan[4]; Ye, Qiaolin[1]; Fu, Liyong[2]; Wang, Ruili[1,5]

First author: Pan, Chen

Affiliations: [1] Nanjing Forestry University, Nanjing, 210037, China; [2] Institute of Forest Resource Information Techniques, Chinese Academy of Forestry, Beijing, 100091, China; [3] University of Warwick, Coventry, CV4 7AL, United Kingdom; [4] Nanjing University of Information Science and Technology, Nanjing, 210044, China; [5] Massey University, Auckland, 0745, New Zealand

Year: 2024

Journal: SSRN

Indexed by: EI (Accession No.: 20240263165)

Language: English

Keywords: Convolutional neural networks; Mapping; Modal analysis; Remote sensing; Semantics

Abstract: With the rapid development of Earth observation sensors, the fusion of remote sensing (RS) data in multi-modal semantic segmentation has garnered significant research focus in recent years. The fusion of multi-modal data presents challenges due to discrepancies in image acquisition mechanisms among different sensors, leading to misalignment issues. To mitigate this challenge, this paper presents VSGNet, a novel multi-modal fusion framework designed for RS semantic segmentation. The work aims to utilise vision structure guidance derived from a vision foundation model for accurate segmentation without the need for auxiliary sensors. Specifically, the framework incorporates a cross-modal collaborative network for feature embedding that blends a convolutional neural network and a vision transformer to simultaneously capture both local information and long-range dependencies from the input modalities. Subsequently, a multi-scale cross-modal feature fusion comprising fusion enhancement and feature re-calibration modules is proposed to emphasise the adaptive multi-scale interaction of diverse complementary cues between each modality while suppressing the impact of noise and uncertainties present in RS data. Extensive experiments conducted on five diverse RS datasets, i.e., ISPRS Potsdam, ISPRS Vaihingen, LoveDA, iSAID and Tree Mapping, demonstrate that VSGNet outperforms state-of-the-art RS semantic segmentation models. The source code for implementing VSGNet and the Tree Mapping dataset will be publicly available at https://github.com/Pcccc1/VSGNet. © 2024, The Authors. All rights reserved.
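The abstract's feature re-calibration idea (re-weighting each modality's features using complementary cues from the other modality before fusing) can be illustrated with a minimal sketch. This is an assumption-laden toy in NumPy, not the authors' implementation (which is referenced at the GitHub URL above); the function name `recalibrate_and_fuse` and the specific gating scheme are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recalibrate_and_fuse(feat_a, feat_b):
    """Toy cross-modal re-calibration: channel-wise gating.

    feat_a, feat_b: (C, H, W) feature maps from two modalities
    (e.g. optical imagery and a DSM). Each modality's channels are
    re-weighted by a gate computed from the global average of the
    *other* modality, then the gated maps are summed.
    """
    # Global average pooling per channel -> shape (C,)
    gap_a = feat_a.mean(axis=(1, 2))
    gap_b = feat_b.mean(axis=(1, 2))
    # Gates derived from the complementary modality, broadcast over H, W
    gate_a = sigmoid(gap_b)[:, None, None]
    gate_b = sigmoid(gap_a)[:, None, None]
    return gate_a * feat_a + gate_b * feat_b

# Toy usage: two 4-channel 8x8 feature maps
rng = np.random.default_rng(0)
fused = recalibrate_and_fuse(rng.normal(size=(4, 8, 8)),
                             rng.normal(size=(4, 8, 8)))
print(fused.shape)  # (4, 8, 8)
```

In the paper this interaction is described as multi-scale and paired with a fusion-enhancement module; the sketch only shows the single-scale gating intuition.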

