ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., IV-1, 29-36, 2018
https://doi.org/10.5194/isprs-annals-IV-1-29-2018
© Author(s) 2018. This work is distributed under
the Creative Commons Attribution 4.0 License.

26 Sep 2018

SEMANTIC SEGMENTATION OF AERIAL IMAGERY VIA MULTI-SCALE SHUFFLING CONVOLUTIONAL NEURAL NETWORKS WITH DEEP SUPERVISION

K. Chen1,2, M. Weinmann3, X. Sun1, M. Yan1, S. Hinz4, B. Jutzi4, and M. Weinmann4
  • 1Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing, P. R. China
  • 2University of Chinese Academy of Sciences, Beijing, P. R. China
  • 3Institute of Computer Science II, University of Bonn, Bonn, Germany
  • 4Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology, Karlsruhe, Germany

Keywords: Semantic Segmentation, Aerial Imagery, Multi-Modal Data, Multi-Scale, CNN, Deep Supervision

Abstract. In this paper, we address the semantic segmentation of aerial imagery based on the use of multi-modal data given in the form of true orthophotos and the corresponding Digital Surface Models (DSMs). We present the Deeply-supervised Shuffling Convolutional Neural Network (DSCNN), a multi-scale extension of the Shuffling Convolutional Neural Network (SCNN) with deep supervision. Thereby, we take advantage of the shuffling operator of the SCNN to effectively upsample feature maps and then fuse multi-scale features derived from the intermediate layers of the SCNN, which results in the Multi-scale Shuffling Convolutional Neural Network (MSCNN). Based on the MSCNN, we derive the DSCNN by introducing additional losses at the intermediate layers of the MSCNN. In addition, we investigate the impact of using different sets of hand-crafted radiometric and geometric features derived from the true orthophotos and the DSMs on the semantic segmentation task. For performance evaluation, we use a commonly used benchmark dataset. The achieved results reveal that both multi-scale fusion and deep supervision contribute to an improvement in performance. Furthermore, using a diversity of hand-crafted radiometric and geometric features as input for the DSCNN does not yield the best numerical results, but it leads to smoother and improved detections for several objects.
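The two core ingredients named in the abstract can be sketched briefly. The shuffling operator is commonly realized as a periodic shuffle (sub-pixel rearrangement) that turns a feature map of shape (C·r², H, W) into one of shape (C, H·r, W·r), and deep supervision adds weighted auxiliary losses from intermediate layers to the main loss. The following NumPy sketch illustrates both under these common conventions; it is not the authors' implementation, and the auxiliary-loss weight is a hypothetical hyperparameter:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Periodic shuffling operator: rearrange a (C*r^2, H, W) feature map
    into a (C, H*r, W*r) map, upsampling spatially by the factor r."""
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_r2 // (r * r)
    # Split channels into (C, r, r), then interleave the two r-axes
    # into the spatial dimensions.
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)        # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

def deeply_supervised_loss(main_loss, aux_losses, aux_weight=0.5):
    """Combine the main segmentation loss with auxiliary losses computed
    at intermediate layers (deep supervision). aux_weight is illustrative."""
    return main_loss + aux_weight * sum(aux_losses)

# Four input channels with r=2 shuffle into one 2x2 output channel.
y = pixel_shuffle(np.arange(4.0).reshape(4, 1, 1), r=2)
print(y.shape)   # (1, 2, 2)
```

With this channel ordering, input channel i·r + j lands at the spatial offset (i, j) within each r×r output block, so the single-pixel example above unfolds channels 0..3 into the 2×2 grid [[0, 1], [2, 3]].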