ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., V-1-2022, 129–136, 2022
https://doi.org/10.5194/isprs-annals-V-1-2022-129-2022
 
17 May 2022

PSCNET: EFFICIENT RGB-D SEMANTIC SEGMENTATION PARALLEL NETWORK BASED ON SPATIAL AND CHANNEL ATTENTION

S. Q. Du (1, 2), S. J. Tang (1, 2), W. X. Wang (1, 2), X. M. Li (1, 2), Y. H. Lu (3), and R. Z. Guo (1, 2)
  • (1) School of Architecture and Urban Planning, Research Institute for Smart Cities, Shenzhen University, Shenzhen, P.R. China
  • (2) Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen, P.R. China
  • (3) School of Resource and Environmental Sciences, Wuhan University, Wuhan 430072, P.R. China

Keywords: Deep Learning, Semantic Segmentation, RGB-D Fusion, Channel Attention, Spatial Attention

Abstract. RGB-D semantic segmentation is a key technology for constructing indoor semantic maps. However, traditional RGB-D semantic segmentation networks often suffer from redundant parameters and modules. In this paper, we design an improved semantic segmentation network, PSCNet, that reduces redundant parameters and makes the model easier to implement. Building on the DeepLabv3+ framework, we improve the original model in three ways: attention module selection, backbone simplification, and Atrous Spatial Pyramid Pooling (ASPP) module simplification. Specifically, we use spatial-channel co-attention, remove the last module from the depth backbone, and redesign WW-ASPP with depthwise convolution. Compared with DeepLabv3+, PSCNet has approximately the same number of parameters but achieves a 5% improvement in mIoU. Meanwhile, PSCNet runs inference at 47 FPS on an RTX 3090, which is much faster than state-of-the-art semantic segmentation networks.
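The abstract credits depthwise convolution with slimming the ASPP module. The rough weight count below illustrates why. It is a minimal sketch under stated assumptions and not the paper's WW-ASPP. The 2048 input channels, 256 output channels, three 3x3 atrous branches, image-pooling branch, and 1x1 projection follow the common DeepLabv3+ ASPP layout and are assumptions here. Biases and batch-norm parameters are ignored. Dilation enlarges the receptive field but does not change the number of weights.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias omitted).
    Dilation changes the receptive field, not the weight count."""
    return k * k * c_in * c_out


def separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by
    a 1x1 pointwise conv that mixes channels (biases omitted)."""
    return k * k * c_in + c_in * c_out


def aspp_params(c_in: int = 2048, c_out: int = 256, n_atrous: int = 3,
                separable: bool = False) -> int:
    """Hypothetical DeepLabv3+-style ASPP: one 1x1 branch, n_atrous 3x3
    atrous branches, an image-pooling 1x1 branch, and a 1x1 projection
    over the concatenated branch outputs. Only the atrous branches are
    swapped to depthwise-separable convs when separable=True."""
    atrous = separable_params if separable else conv_params
    total = conv_params(1, c_in, c_out)                      # 1x1 branch
    total += n_atrous * atrous(3, c_in, c_out)               # atrous branches
    total += conv_params(1, c_in, c_out)                     # image pooling branch
    total += conv_params(1, (n_atrous + 2) * c_out, c_out)   # projection
    return total


standard = aspp_params()
light = aspp_params(separable=True)
print(standard, light, round(standard / light, 2))
```

Under these assumptions the script prints `15532032 3004416 5.17`: each 3x3 atrous branch drops from 4,718,592 to 542,720 weights, so the whole ASPP block shrinks roughly fivefold. The savings the paper reports may differ, since its channel widths and branch layout are not given in the abstract.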