ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., V-3-2022, 17–23, 2022
https://doi.org/10.5194/isprs-annals-V-3-2022-17-2022
17 May 2022

HOW FAR SHOULD I LOOK? A NEURAL ARCHITECTURE SEARCH STRATEGY FOR SEMANTIC SEGMENTATION OF REMOTE SENSING IMAGES

M. C. M. de Paulo1, J. N. Turnes3, P. N. Happ2, M. P. Ferreira1, H. A. Marques1, and R. Q. Feitosa2
  • 1Dept. of Defense Engineering, Military Institute of Engineering, Rio de Janeiro, Brazil
  • 2Dept. of Electrical Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
  • 3Dept. of System Design Engineering, University of Waterloo, Waterloo, Canada

Keywords: Neural Architecture Search, Semantic Segmentation, Remote Sensing, Satellite imagery, Convolutional Neural Networks

Abstract. Neural architecture search (NAS) is a subset of automated machine learning that seeks the best-performing neural network for a given task. In this article, a network search space is defined and applied to the semantic segmentation of satellite imagery. Owing to the spatial nature of the data, the search space uses cells that group parallel operations with kernels of different sizes, providing options to capture the neighborhood information needed for better classification. The architecture search space follows a UNet-like network. The proposed approach uses scaled sigmoid gates, a network-pruning strategy adapted here to select the best operations within each cell of the search space. The architecture found by the proposed approach applies wider kernels to lower-resolution feature maps, suggesting that some pixels require information from pixels farther away than expected. The resulting network was compared to a very similar UNet-like network that uses only 3×3 convolutions, and it achieved slightly better results on the test set.
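The scaled-sigmoid gating idea described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `scaled_sigmoid_cell` and the branch callables are placeholders standing in for the cell's parallel convolutions of different kernel sizes, and the scaling factor `beta` is assumed to grow during the search so that the gates saturate toward 0 or 1, effectively pruning weak branches.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scaled_sigmoid_cell(feature_map, branches, alphas, beta):
    """Combine parallel branch outputs, each weighted by a scaled
    sigmoid gate sigmoid(beta * alpha). As beta increases during the
    architecture search, the gates approach binary {0, 1} decisions,
    so low-weight branches are effectively pruned from the cell."""
    gates = sigmoid(beta * np.asarray(alphas, dtype=float))
    out = np.zeros_like(feature_map, dtype=float)
    for gate, branch in zip(gates, branches):
        out += gate * branch(feature_map)
    return out, gates
```

With a large `beta`, branches whose learned `alpha` is negative are gated out almost entirely, which mirrors how a single operation per cell survives at the end of the search:

```python
fm = np.ones((4, 4))
# Identity-style placeholder branches (real branches would be
# convolutions with 3x3, 5x5, 7x7 kernels, etc.).
branches = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x]
out, gates = scaled_sigmoid_cell(fm, branches, alphas=[2.0, -1.5, 0.5], beta=50.0)
# gates ~ [1, 0, 1], so out ~ 1*fm + 3*fm = 4 everywhere
```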