QUALITY ASSESSMENT OF 3D RECONSTRUCTION USING FISHEYE AND PERSPECTIVE SENSORS

Recent mathematical advances, growing alongside the use of unmanned aerial vehicles, have not only overcome the restriction of roll and pitch angles during flight but also enabled the use of non-metric cameras in photogrammetric methods, providing more flexibility in sensor selection. Fisheye cameras, for example, advantageously provide images with wide coverage; however, these images are extremely distorted, and their non-uniform resolution makes them more difficult to use for mapping or terrestrial 3D modelling. In this paper, we compare the usability of different camera-lens combinations, using the complete workflow implemented in Pix4Dmapper to produce terrestrial reconstructions of a well-known historical site in Switzerland: the Chillon Castle. We assess the accuracy of the outcome acquired by consumer cameras with perspective and fisheye lenses, comparing the results to a laser scanner point cloud.


INTRODUCTION
In recent years, more and more decent-quality sensors have become available at an affordable price. In this paper, we address the question of the most suitable sensor-lens combination for terrestrial and indoor 3D modelling purposes. The goal is to identify the camera with the highest accuracy and to discuss the accuracy achieved with various data capture and processing setups. The use of fisheye cameras, for example, resolves the issue of limited space and restricted conditions for setting up camera stations; both problems are dominantly present in indoor and terrestrial city modelling. Fisheye lenses have a wide range of focus, and their wide field of view makes it possible to capture a scene with far fewer images. This also leads to a more stable and scalable bundle block adjustment of the cameras' exterior and interior orientations. These fisheye lens advantages relate to the efficiency of data capture, which today, since the processing is highly automated, contributes dominantly to the total cost of terrestrial or indoor surveying. The disadvantage of fisheye lenses is their large and very non-uniform ground sampling distance (GSD) compared to perspective lenses. Thus, finding a good compromise between accuracy and acquisition plus processing effort is one of the major questions we address in this paper. We began to look at fisheye lenses as a result of our previous work (C. Strecha, T. Pylvanainen, P. Fua, 2010) on large-scale city modelling. During this work we observed that terrestrial city modelling needs many perspective images to reconstruct particularly narrow streets, for which fisheye lenses would help substantially to reduce the complexity and acquisition time.

Software
All results were generated by Pix4Dmapper (Pix4D, 2014) version 1.2. The software is partially based on earlier published work (C. Strecha, T. Pylvanainen, P. Fua, 2010) (C. Strecha, T. Tuytelaars, L. Van Gool, 2003) (Strecha, et al., 2008) and implements a fully automated photogrammetric workflow for indoor, terrestrial, oblique and nadir imagery. It is capable of handling images with missing or inaccurate geo-information. The approach builds on the automation of computer vision techniques (C. Strecha, W. von Hansen, L. Van Gool, et al., 2008).

The software supports both standard perspective camera models and equidistant fisheye camera models (Kannala, et al., 2006) (Schwalbe) (Hughes, et al., 2010). Given the world coordinates (X, Y, Z) in the camera-centric coordinate system, the angle θ between the incident ray and the camera direction is given by

    θ = arctan( √(X² + Y²) / Z ).

The angle θ is then further modelled by the polynomial coefficients p_i, which are part of the camera model:

    ρ = p₁ θ + p₂ θ² + p₃ θ³ + p₄ θ⁴.

The projection of the world coordinates into image pixel coordinates (x', y') is then given by

    x_h = ρ X / √(X² + Y²),    y_h = ρ Y / √(X² + Y²),
    x' = C x_h + D y_h + c_x,
    y' = E x_h + F y_h + c_y,

with the affine transformation parameters (C, D, E, F, c_x, c_y). Note that (C, F) indicates the focal length and (c_x, c_y) is the principal point coordinate in image space. The polynomial coefficient p₁ is redundant to the parameters (C, F) and is defined to be 1.
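The equidistant fisheye projection described above can be sketched in a few lines. This is a minimal illustration of the model, not Pix4Dmapper's implementation; the function name and the example parameter values are our own, and the sketch assumes the point is not exactly on the optical axis:

```python
import numpy as np

def fisheye_project(X, Y, Z, p2, p3, p4, C, D, E, F, cx, cy):
    # Angle between the incident ray and the camera direction.
    theta = np.arctan2(np.hypot(X, Y), Z)
    # Polynomial distortion model; p1 is fixed to 1 (redundant to C, F).
    rho = theta + p2 * theta**2 + p3 * theta**3 + p4 * theta**4
    # Normalized image coordinates (assumes X, Y not both zero).
    r = np.hypot(X, Y)
    xh, yh = rho * X / r, rho * Y / r
    # Affine transformation (C, D, E, F) plus principal point (cx, cy).
    return C * xh + D * yh + cx, E * xh + F * yh + cy
```

For a point at 45 degrees off-axis with zero distortion coefficients, the pixel offset from the principal point is simply the focal length times θ, as expected for an equidistant model.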

Reconstruction Site
The Chillon Castle (Figure 1) is an island castle located on the shore of Lake Geneva, close to the city of Montreux. It consists of 100 independent, yet partly connected, buildings and is one of the most visited historic monuments in Switzerland.
Figure 1. The Chillon Castle

GNSS Data:
A Trimble R10 GNSS receiver was used in RTK mode with a virtual reference station to measure 11 marked points twice, under different satellite constellations. The mean difference between the two measurements was 10 mm in x and y and 20 mm in z. Some of the GCPs were visible in the images (as shown in Figure 3) and some of them were used to define the tachymetric network as described in the following.

Tachymeter Data:
A tachymetric network (Figure 2) was measured with a Trimble 5601 total station from the land side in order to survey control points which are measurable in both terrestrial and aerial images and detectable in the laser scan. Placing surveying marks on facades is strictly prohibited at the cultural heritage site; instead, we used distinctive natural points of the buildings as control points (Figure 3). Most of the points were measured from different viewpoints with both instrument faces. They could be surveyed with reflectorless distance measurement or as a spatial intersection without distance measurement (e.g. the spheroids on top of the roofs). The least-squares adjustment leads to standard deviations of 8 mm, 7 mm and 6 mm in x, y and z, respectively, taking the errors of the GNSS points into account. To acquire a comparable geodetic datum for the laser scan, 9 distinctive points from the tachymetric measurement were used, along with the laser scan targets, during registration of the scans. This resulted in mean residuals of 2 mm for the laser targets and 10 mm for the tachymetric points after adjustment, which seems sufficient, considering that the tachymetric points were not perfectly detectable in the gridded point cloud.
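The spatial intersection without distance measurement mentioned above amounts to intersecting observation rays from several stations. A minimal least-squares sketch of this generic technique (our own formulation and function name, not the survey software actually used):

```python
import numpy as np

def intersect_rays(origins, directions):
    """Least-squares 3D point closest to a set of rays, each given by a
    station position and a (not necessarily unit) direction vector."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        # Projector onto the plane perpendicular to the ray direction;
        # minimizing the summed squared distances to all rays gives A x = b.
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ o
    return np.linalg.solve(A, b)
```

With error-free directions from two or more non-parallel stations, the solution coincides with the target point; with noisy angular observations it is the point minimizing the squared distances to all rays.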
Gross errors were removed manually from the registered scan result; each point was on average 5 mm apart from the adjacent ones. The colored laser scan point cloud is shown in Figure 5 below. The following processing was performed using Pix4Dmapper in default mode for processing and point cloud densification.
Everything was calculated on a standard computer with 16 GB of RAM.
For cameras writing the geolocation directly into the image EXIF (Canon 6D), the initial processing automatically uses the corresponding coordinate system. For all other cameras, a local coordinate system was used. After importing the ground control points in the Swiss coordinate system (CH1903) and marking three of them, a re-optimization led to the final coordinate system, after which a GCP only needs to be selected in the rayCloud editor. The image view then automatically presents, side by side, all the local image parts where this GCP can be seen. This makes the GCP selection highly efficient and avoids opening and closing many images.
Multi-view stereo processing of terrestrial pictures tends to produce point clouds with errors from "densification of the sky." Therefore, Pix4Dmapper implements an "Image Annotation" option: an automatic segmentation of the image, such that one can simply draw once around the outline of the measurable object to filter out the sky region, as seen in Figure 7. Since this cut affects rays to infinity, there is no need to mark the annotation areas in all of the pictures.

In the following, we describe the different datasets taken. For each camera, we give a short description and show the reconstruction. Table 1 shows the characteristics of the different datasets. The residuals of the GCPs are listed from south to north (left to right from the view of the land-side facade) and grouped into GCPs on the facade to the land side; above this facade, GCPs on top of the towers and on the backside of the facade (all natural distinctive points); and GCPs on the ground, marked with plates, mainly for the aerial pictures but partially also detectable in the terrestrial images.

Sony NEX 7:
This image set surrounds the whole exterior of the castle. In front of the facade, different distances were used in a quite limited space. In addition to the land-side pictures, images were taken from the lake side using a boat. Because the lens was unmounted between the lake-side and land-side images, variation in the interior parameters can be expected; therefore, two different sets of camera parameters were calculated. To do this, two separate Pix4Dmapper projects were generated and merged after finishing the initial processing (bundle adjustment) of the two individual projects. The result of the bundle adjustment and the dense point cloud are shown in Figure 8.

Sony Alpha7R:
The Sony Alpha7R has a 36 MP full-frame CMOS sensor. Currently, only a limited set of lenses supporting the full-frame sensor is available. We used the Sony 2.8/16 mm fisheye lens with M-mount and the corresponding E-mount adapter to get the full 36 MP image. All images were taken with a tripod.

Canon 7D:
Comparable to the NEX 7, this image set nearly surrounds the whole exterior, excluding a part in the north of the land side with a larger height difference and vegetation. In addition to the land-side pictures, which included images normal to the facade and oblique images in horizontal and vertical orientation, pictures were also taken of the facades on the lake side using a boat, leading to a nearly closed circle around the facade.

Canon 6D:
The Canon 6D has a 20.2 MP full-frame CMOS sensor. An integrated GPS receiver writes the geolocation into the EXIF data, which was automatically used by Pix4Dmapper as an estimate of the exterior positions.

Table 2. Residuals of the used GCPs.

RESULTS AND DISCUSSION
Figure 16 shows the color-coded difference between the laser scan and the reconstructions of the different datasets as signed distances in the range [-150 mm, 150 mm]. In the histograms, the distances of most points fall within this range. Some plots have large red areas. These do not represent reconstruction errors, but simply show that no data was acquired in this area for the dataset in question.
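The signed distances underlying such a comparison can, in principle, be computed as a nearest-neighbour comparison between the two point sets, signed along the surface normal. The brute-force sketch below is our own illustration (a production pipeline would use a spatial index such as a k-d tree and compare against the mesh surface rather than raw points):

```python
import numpy as np

def signed_cloud_distances(scan_pts, recon_pts, recon_normals):
    # For each laser point, find the nearest reconstructed point and
    # sign the offset along that point's unit surface normal.
    # Brute force O(N*M); fine for small clouds, not for millions of points.
    d2 = ((scan_pts[:, None, :] - recon_pts[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    diff = scan_pts - recon_pts[nearest]
    return (diff * recon_normals[nearest]).sum(axis=1)
```

Positive values then indicate laser points in front of the reconstructed surface and negative values points behind it, matching the red/blue convention of a signed-distance plot.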
The datasets from the NEX 7 (16 mm), Canon 7D (20 mm) and Canon 6D (28 mm) show an obvious systematic error in the areas where single laser scan stations are registered. The point cloud registration of the scan station in the north, which is quite near the end of the wall, does not seem to be sufficient, and the area in the very north shows identical errors of up to 4 cm. Besides the imperfect scan registration in these areas, the scanner rays were reflected at an acute angle from the surface, which possibly also had an effect on the results.
The parts of the building with a surface normal pointing more towards the scan station, like the building projection at the bottom in the north area or the parts of the last small tower facing the scan station, have small differences of up to 10 mm. For all datasets, we fitted a Gaussian to the deviation histograms and estimated the mean and standard deviation. Table 3 summarizes these results. For all datasets, the mean is significantly lower than the standard deviation, indicating that the results agree within their error. The datasets taken with a tripod have the lowest deviations, followed by the Phantom 2 Vision. The fisheye lenses mostly reach good accuracies; when the field of view was not too large, no significant degradation in accuracy could be detected compared to a perspective lens. The 8 mm fisheye has the highest deviation, indicating that its 180-degree field of view introduces a larger error. For the GoPro Hero3+, the rolling shutter also contributes to the overall error, making it less accurate.
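The Gaussian fit to a deviation histogram can be approximated by estimating the mean and standard deviation from the samples inside the evaluated range. This is a sketch under that assumption; the paper does not specify its exact fitting procedure, and the function name is ours:

```python
import numpy as np

def fit_deviation_gaussian(dev_mm, clip_mm=150.0):
    # Keep only deviations inside the evaluated range [-clip, clip]
    # (matching the [-150 mm, 150 mm] window used for Figure 16),
    # then estimate the Gaussian mean and (unbiased) standard deviation.
    d = dev_mm[np.abs(dev_mm) <= clip_mm]
    return d.mean(), d.std(ddof=1)
```

A mean much smaller than the standard deviation, as reported in Table 3, then indicates no significant systematic offset between the photogrammetric model and the laser scan.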
Table 2 shows the results of the bundle adjustment. The standard deviation of the GCPs is in the range of a few millimeters and comparable for all cameras. The projection and xyz errors show a clear increase for the two rolling-shutter cameras (GoPro and Phantom 2 Vision). While the projection error of the global-shutter cameras is around 0.5 pixels, the rolling-shutter cameras have an error of 2 to 3 pixels. Taking the rolling shutter into account in the mathematical model would most likely decrease this error.

Table 3. Comparison of the different datasets to the laser scan data, which in its highest point-sampling mode required an acquisition time of ~4 hours. The deviation to the laser scan corresponds to a Gaussian fit of the deviation histograms in Figure 16.

CONCLUSION
We compared photogrammetric 3D models from images of different cameras and lenses with the results of a laser scanner.
The accuracy of image-derived 3D models is generally comparable to that of a laser scanner, but with the large advantages of ease of use, highly reduced acquisition time (4 hours for the laser scan versus 10 minutes for the GoPro), and cost. Different perspective and fisheye lenses were compared to investigate whether the influence of the larger and non-uniform GSD of fisheye lenses outweighs their clear advantages for image-based 3D modelling (i.e. better performance in poorly-textured indoor scenarios as well as flexibility in capturing a complex scene at a very limited distance).
From the experiments one can conclude that even fisheye lenses can reach accuracies substantially below 10 cm for an approximate object distance of 15 m. The worst result is obtained with a full 180-degree lens, while the less extreme fisheye lenses of the Phantom 2 Vision and the GoPro give results that are additionally disturbed by their rolling shutters and by the fact that they were triggered handheld. This was done on purpose, to show their accuracy when the acquisition time is as short as possible. Our experience shows that the quality of both the Phantom and the GoPro increases when the cameras are mounted on a tripod or when the rolling shutter is accounted for in the camera model.
The presented scenario shows that while the accuracy of a full 180-degree fisheye is not as good as that of a perspective lens, smaller-angle fisheye lenses on good cameras give results comparable to perspective lenses. The rolling shutter of consumer cameras has an additional effect, which makes the comparison more difficult.
For indoor scenarios as shown in Figure 17, perspective cameras would be impractical. Fisheye cameras yield very good results in this indoor situation.

Figure 2. Tachymetric network for surveying the control and verification points

Figure 4. Results of tachymetric control and verification points

Laser Scanner:
Three scan stations were set up to get independent measurements of the facade on the land side. The two scanners, a Faro Focus 3D X330 and a Trimble TX5 (identical to the Faro Focus 3D X130), measured simultaneously. The ranging error of both is specified as ±2 mm at 10 and 25 meters. Six spherical laser scan targets were placed around the scene.

Figure 5. Laser scan point cloud registered from three individual laser scans (blue points: laser scan targets, red points: tachymetric points). The registration was performed using the software bundled with the laser scanner.

Image acquisition and processing
For the cameras and lenses that possessed a manual setting option, the following settings were used in order to keep stable image-capturing conditions:
- No zooming for lenses with variable focal length
- Manual focus, set once for all images
- Physical image stabilization turned off
- Small ISO values to prevent ISO noise
- Higher aperture values to get a larger depth of field
- Tripod mount for all cameras except the GoPro and Phantom 2 Vision datasets

Figure 6. Efficient marking of GCPs: the white ray shows the measurement from the currently selected image

Figure 8. Results of the bundle adjustment. Automatic tie points, GCPs, and camera centres are shown in the rayCloud Editor (top) and the dense model (bottom) for the Sony NEX 7

Figure 9. Results of the bundle adjustment. Automatic tie points, GCPs, and camera centres shown in the rayCloud Editor (top) and the resulting dense point cloud (bottom) for the Canon 7D

Figure 11. Results of the bundle adjustment. Automatic tie points, GCPs, and camera centres shown in the rayCloud Editor (top) and the dense point cloud (bottom) for the 8 mm dataset

Figure 12. Results of the bundle adjustment. Automatic tie points, GCPs, and camera centres shown in the rayCloud Editor

Figure 13. Results of the bundle adjustment. Automatic tie points, GCPs, and camera centres shown in the rayCloud Editor (top) and the dense point cloud (bottom) for the 28 mm lens dataset

Figure 14. Results of the bundle adjustment. Automatic tie points, GCPs, and camera centres shown in the rayCloud Editor (top) and the resulting dense point cloud (bottom)

Figure 15. Results of the bundle adjustment. Automatic tie points, GCPs, and camera centres shown in the rayCloud Editor (top) and the resulting dense point cloud (bottom)

Figure 16. Color-coded difference between the laser scan points and the photogrammetrically triangulated meshes (red and blue values indicate missing data in the photogrammetric point cloud)

Table 1. Data description for the different datasets

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-3/W4, 2015. PIA15+HRIGI15 - Joint ISPRS conference 2015, 25-27 March 2015, Munich, Germany. This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper. doi:10.5194/isprsannals-II-3-W4-215-2015