PRELIMINARY INVESTIGATION ON POSSIBILITY OF SUPER RESOLUTION OF UAV ORTHOIMAGES

Since inspection of infrastructures using UAV images seems to be efficient, many systems for infrastructure maintenance using UAV images have been developed recently. For the purpose of more efficient image acquisition, we started an investigation on the possibility of super resolution (SR) of UAV images to obtain high-resolution (HR) orthoimages suitable for infrastructure maintenance. This paper reports a preliminary investigation using existing UAV images that were acquired for 3D measurement and were not intended to be utilized for SR. We produced HR orthoimages by three SR methods: image interpolation of a single low-resolution (LR) image by cubic convolution, SR by resampling, and SR based on observation equations. Both SR by resampling and SR based on observation equations utilize multiple overlapping LR images. The results of the investigation demonstrate that SR based on observation equations using multiple overlapping images would be able to provide higher resolution orthoimages than those produced by an ordinary method. The results also show that an inaccurate DSM utilized in SR processing degrades the quality of SR results. Furthermore, the results illustrate that the quality of the result of SR processing depends considerably on the characteristics of the lens utilized in image acquisition. We think that further investigations on SR using UAV images would be necessary in order to put SR to practical use.


Inspection of infrastructures using UAV images
Japan has been building many infrastructures since the end of World War II. In the 21st century the importance of infrastructure maintenance has grown rapidly, and it now exceeds that of infrastructure building. Efficient inspection of infrastructures is urgently required because of a shortage of workers, especially skilled workers. Therefore, inspection of infrastructures using digital camera images is required in order to reduce manual labour.
Inspection of infrastructures using images acquired by a digital camera on board an unmanned aerial vehicle (UAV), which has become popular recently, seems to be a promising means for efficient inspection. Many systems for maintenance of concrete structures by using UAV images have been developed. Many of them aim only to detect hairline cracks of submillimetre width on the surface of a concrete structure.
A system developed by Mizukami, et al. (2018) aimed to detect cracks 0.2 mm wide on a surface automatically by image processing techniques. Moreover, their system aimed to detect exfoliations 10 mm deep on a surface, openings 10 mm wide in a joint, and level differences 10 mm high in a joint by using 3D measurement results provided by structure from motion (SfM) software. The performance of their system satisfies the standards of maintenance of concrete structures such as dams, head works, and open channels in Japan.
Their system was designed to acquire high-resolution images of 0.2 mm to 0.5 mm GSD (ground sampling distance) in order to detect cracks 0.2 mm wide on a surface. On the other hand, their system was designed to acquire images of 5 mm GSD or smaller in order to detect exfoliations 10 mm deep on a surface, openings 10 mm wide in a joint, and level differences 10 mm high in a joint.
Since the difference in GSD between the two requirements is very large, their system utilizes two different sets of a camera and a lens. They adopted a lens of 135 mm focal length for detection of cracks 0.2 mm wide on a surface, and a lens of 50 mm focal length for 3D measurement.
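The dependence of GSD on focal length can be illustrated with the usual pinhole relation GSD = pixel pitch × object distance / focal length. The following is a minimal sketch only; the pixel pitch (6 µm) and object distance (10 m) are hypothetical values chosen for illustration and are not taken from the system of Mizukami, et al.

```python
def gsd_mm(pixel_pitch_um: float, focal_length_mm: float, distance_m: float) -> float:
    """Ground sampling distance in mm for a pinhole camera facing the surface."""
    pixel_pitch_mm = pixel_pitch_um / 1000.0
    distance_mm = distance_m * 1000.0
    return pixel_pitch_mm * distance_mm / focal_length_mm

# Hypothetical camera and distance (assumptions, not values from the paper).
gsd_50 = gsd_mm(6.0, 50.0, 10.0)    # lens of 50 mm focal length
gsd_135 = gsd_mm(6.0, 135.0, 10.0)  # lens of 135 mm focal length
ratio = gsd_50 / gsd_135            # equals 135 / 50 for the same camera and distance
```

With the same camera and object distance, the GSD scales inversely with the focal length, so switching the lens alone changes the GSD by a factor of 2.7.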

Need of super resolution of UAV images
Maintenance of concrete structures such as dams, head works, and open channels requires detection of cracks 0.2 mm wide on a surface, while maintenance of concrete structures such as coastal dikes requires detection of cracks only 1.0 mm wide on a surface.
We considered that methods to raise the resolution of observed images might enable detection of cracks 1.0 mm wide on a surface by using UAV images of 5 mm GSD. If UAV images acquired for 3D measurement by using a lens of 50 mm focal length can be utilized for detection of cracks 1.0 mm wide, the cost of inspection will be reduced.
Accordingly, we decided to start an investigation on the possibility of super resolution (SR) of UAV images. Most cases of UAV photogrammetry in Japan adopt high overlapping ratios in image acquisition planning. In a typical image acquisition plan, the overlapping ratio of successive images along a flight strip, which is called OL for short, is 90%, and that of adjacent flight strips, which is called SL for short, is 60%. With such high overlapping ratios, a region of interest (ROI) is photographed by many UAV images. Figure 1 shows a ROI, which was photographed by 27 UAV images as Figure 2 shows. This suggests that SR using multiple overlapping images may be applicable to obtain HR images. However, it is expected that an unstable flight of a UAV might make SR rather difficult. Since infrastructure inspection using UAV images frequently utilizes orthoimages, our study focuses on the image resolution of UAV orthoimages. Considering that UAV images utilized for 3D measurement are highly overlapped, our study also focuses on SR using multiple overlapping images. This paper reports our preliminary investigation using existing UAV images that were acquired for 3D measurement and were not intended to be utilized for SR.
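The number of images that photograph a ROI can be estimated roughly from the overlapping ratios. The sketch below assumes an idealised regular flight over flat terrain and ignores edge effects; the function name is hypothetical.

```python
def images_covering_point(ol: float, sl: float) -> float:
    """Approximate number of images that see one ground point.

    ol: overlapping ratio of successive images along a strip (0..1)
    sl: overlapping ratio of adjacent flight strips (0..1)
    """
    along_strip = 1.0 / (1.0 - ol)    # images per point within one strip
    across_strips = 1.0 / (1.0 - sl)  # strips that see the point
    return along_strip * across_strips

n_typical = images_covering_point(0.90, 0.60)  # about 25 images
```

For OL of 90% and SL of 60% the estimate is about 25 images, which is of the same order as the 27 images of the ROI in Figure 2.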

Techniques of super resolution
Super resolution (SR) aims to obtain resolution beyond the diffraction limit of an optical device. SR can be classified into two types: one is a technique to obtain higher resolution during observation by using a dedicated observation system, and the other is a technique to obtain higher resolution after observation by using digital image processing. The former is called hardware-based SR, or device-dependent SR. The latter is called SR by data processing, digital SR, or mathematical SR. Since it is difficult to have a dedicated device for SR on board our UAV system, we decided to investigate methods of SR by data processing.
SR by data processing can be classified into two types: one is processed in the frequency domain, and the other is processed in the spatial domain (Yang, Huang, 2017). Since several SR methods in the spatial domain seem to be easily adaptable to a UAV imaging system for inspection of facilities made of concrete, we decided to investigate these methods.
Furthermore, SR methods in the spatial domain can be classified into two types: one is methods to obtain a high-resolution (HR) image from a single observed low-resolution (LR) image, and the other is those to construct a HR image from several observed LR images. The former, which is denoted SR-S for short in the paper, is also called image interpolation (Meijering, 2002). We investigated two types of cubic convolution popular in image processing as SR-S.

Let $f(x, y)$ be an analogue image formed on the focal plane of an imaging system, $g(i, j)$ be a HR (fine) image, and $g_k(m, n)$ be observed LR (coarse) images. Here, $K$ is the number of the observed LR images, and $k = 1, 2, \cdots, K$ represents a sequence of multiple LR images. $g(i, j)$ and $g_k(m, n)$ are both sampled images.

$g_k(m, n)$ is obtained by shifting the sampling position by an amount $\Delta_k$ smaller than the pixel interval of the LR image when sampling $f(x, y)$. $g_k(m, n)$ is sometimes called a sub-pixel shift and overlapping image. The pixel shifts $\Delta_k$ may be unequally spaced when a camera on board a UAV acquires images. LR images utilized in SR are sub-pixel shift and overlapping images.
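The notion of sub-pixel shift and overlapping LR images can be illustrated by simulating observations from a known HR image. The sketch below is an illustration under simplifying assumptions that are not from the paper: shifts are whole HR pixels (a sub-pixel shift in LR units when smaller than the scale), each LR pixel is the mean of a square block of HR pixels, and boundaries are cyclic.

```python
import numpy as np

def simulate_lr(hr, scale, shift):
    """Observe one LR image from an HR image.

    shift = (dy, dx) moves the sampling grid by that many HR pixels;
    each LR pixel is the mean of a scale x scale block of HR pixels.
    """
    dy, dx = shift
    shifted = np.roll(hr, (-dy, -dx), axis=(0, 1))  # cyclic boundary for simplicity
    h, w = shifted.shape
    blocks = shifted.reshape(h // scale, scale, w // scale, scale)
    return blocks.mean(axis=(1, 3))

hr = np.arange(64, dtype=float).reshape(8, 8)
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]   # half an LR pixel when scale = 2
lr_images = [simulate_lr(hr, 2, s) for s in shifts]
```

Each of the four LR images samples the same scene on a grid displaced by half an LR pixel, so together they carry information that no single LR image contains.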

Super resolution by using multiple low resolution images
Popular methods of SR by using multiple LR images are (i) SR by resampling and (ii) SR based on observation equations.

SR by resampling:
SR by resampling, which is denoted SR-R for short, is a method that constructs a HR image by averaging multiple resampled LR images. Each observed pixel of a LR image $g_k(m, n)$ is mapped to the corresponding sampling position on a HR image $g(i, j)$ by a geometric transformation $T_k$. An average of the resampled LR images $g'_k(i, j)$ becomes a HR image $g(i, j)$:

$$g(i, j) = \frac{1}{K} \sum_{k=1}^{K} g'_k(i, j) \qquad (1)$$

Translation, Helmert transformation, affine transformation, or perspective transformation is often utilized as $T_k$ depending on the problem. Nearest-neighbour interpolation or cubic convolution is popular as an interpolation method utilized in this method. This method is sometimes called SR by registration.
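A minimal sketch of SR by resampling is given below. It assumes translation as the geometric transformation, nearest-neighbour resampling, known shifts expressed in whole HR pixels, and cyclic boundaries; these are illustrative assumptions, not the implementation used in the study.

```python
import numpy as np

def sr_by_resampling(lr_images, shifts, scale):
    """Average LR images after resampling them onto the HR grid.

    shifts holds (dy, dx) sampling offsets in HR pixels, one per LR image.
    """
    acc = None
    for lr, (dy, dx) in zip(lr_images, shifts):
        # Nearest-neighbour resampling onto the HR grid.
        up = np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)
        # Move the resampled image back to its position on the HR grid.
        up = np.roll(up, (dy, dx), axis=(0, 1))
        acc = up if acc is None else acc + up
    return acc / len(lr_images)

# Two flat LR images: the average on the HR grid must stay flat.
flat = [np.full((4, 4), 3.0), np.full((4, 4), 3.0)]
hr_flat = sr_by_resampling(flat, [(0, 0), (1, 1)], scale=2)
```

Replacing the nearest-neighbour step by bi-linear interpolation or cubic convolution changes only the resampling line; the averaging stays the same.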

SR based on observation equations: A simple model in which a LR image $g_k(m, n)$ is a sub-pixel shift and overlapping image is usually adopted in SR based on observation equations, which is denoted SR-O for short. We assume that each pixel of a HR image $g(i, j)$ and a LR image $g_k(m, n)$ is a square area, and is sampled at uniform intervals on an analogue image $f(x, y)$.
Let $h(x, y)$ be a point spread function (PSF), then a LR image $g_k(m, n)$ of a pixel shift $\Delta_k = (\Delta x_k, \Delta y_k)$ can be represented by the following equation:

$$g_k(m, n) = \iint h(x - x_m - \Delta x_k,\; y - y_n - \Delta y_k)\, f(x, y)\, dx\, dy$$

where $(x_m, y_n)$ is the position of the pixel $(m, n)$ without the pixel shift. If the grey level $f(x, y)$ of the analogue image takes a constant value $g(i, j)$ within the pixel of the corresponding HR image, that is to say the PSF is a square wave, $g_k(m, n)$ can be represented by the following equation:

$$g_k(m, n) = \sum_{i} \sum_{j} a_k(m, n; i, j)\, g(i, j)$$

where $a_k(m, n; i, j)$ is the area of $g(i, j)$ included in the pixel $(m, n)$ of $g_k(m, n)$, when the area of one pixel of $g_k(m, n)$ is 1.
Observation Equations (2) as to all pixels of all LR images $g_k(m, n)$ provide $g(i, j)$, if the number of observation equations is equal to or more than the number of pixels of the estimated HR image. Solving the observation equations is usually executed by iteration.
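The structure of the observation equations can be shown with a one-dimensional example. The sketch below builds the area coefficients for a square-wave PSF and solves the system by least squares; the function name, the 1-D simplification, and the direct solver are assumptions made for illustration (the study itself uses an iterative method).

```python
import numpy as np

def observation_matrix(n_hr, scale, shifts):
    """Each row holds the fraction of one 1-D LR pixel covered by each HR pixel."""
    rows = []
    for shift in shifts:
        start = shift
        while start + scale <= n_hr + 1e-9:   # keep LR pixels inside the HR signal
            row = np.zeros(n_hr)
            for i in range(n_hr):
                overlap = min(i + 1, start + scale) - max(i, start)
                row[i] = max(overlap, 0.0) / scale
            rows.append(row)
            start += scale
    return np.array(rows)

hr_true = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 7.0])
A = observation_matrix(len(hr_true), 2.0, shifts=[0.0, 0.5, 1.0, 1.5])
b = A @ hr_true                                 # simulated LR observations
hr_est, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solution
```

Here nine observation equations constrain six unknown HR pixels. Having at least as many equations as unknowns does not by itself guarantee a unique solution: with a square-wave PSF some patterns can remain unobservable, in which case `lstsq` returns the minimum-norm solution that still reproduces every observation.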
We adopt the iterative back projection (IBP) method proposed by Irani and Peleg (Irani, Peleg, 1993) in the investigation. Since the method is basic and simple but useful, it is one of the most popular methods. SR by the Irani and Peleg method is performed by repeating the following processing, as Figure 4 shows.

Step 1: Set an initial value $g^{(0)}(i, j)$ of the HR image. Any image can be utilized because it does not affect the convergence result. However, from the viewpoint of the convergence speed, the average value image of LR images acquired by using Equation (1) is often utilized.
Step 2: Perform the simulation of a LR image observation on $g^{(t)}(i, j)$. Let the obtained LR image be $\hat{g}_k^{(t)}(m, n)$, then the observation simulation is expressed by the following equation.

$$\hat{g}_k^{(t)}(m, n) = \sum_{i} \sum_{j} h(i - x,\; j - y)\, g^{(t)}(i, j)$$

where $h$ is a PSF in the observation, and $(x, y)$ is the coordinates on the $(i, j)$ plane (HR image) at which the centre of the pixel $(m, n)$ on the LR image is located.
Step 3: $\hat{g}_k^{(t)}(m, n)$ should be equal to the actual observed LR image $g_k(m, n)$. Let the observation simulation error $e^{(t)}$ be defined by the following equation:

$$e^{(t)} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \sum_{m} \sum_{n} \left( g_k(m, n) - \hat{g}_k^{(t)}(m, n) \right)^2}$$

If $e^{(t)}$ is equal to or smaller than an appropriate designated value $\varepsilon$, the processing is terminated.
Step 4: Modify $g^{(t)}(i, j)$ so that $e^{(t)}$ becomes smaller. Then set the modified $g^{(t)}(i, j)$ as $g^{(t+1)}(i, j)$, and return to Step 2. Let $(m, n)$ be the pixel of the LR image $g_k$ that observes the pixel $(i, j)$ of the HR image, and $\Delta g_k(m, n) = g_k(m, n) - \hat{g}_k^{(t)}(m, n)$ be the correction amount for the pixel $(m, n)$. Moreover, let $w_k(m, n)$ be a weight as to $\Delta g_k(m, n)$, and let its weighted average be a total correction amount $\Delta g(i, j)$ as to the pixel $(i, j)$ of the HR image. $g^{(t+1)}(i, j)$ is represented by the following equation.

$$g^{(t+1)}(i, j) = g^{(t)}(i, j) + \Delta g(i, j), \qquad \Delta g(i, j) = \frac{\sum_{k} \sum_{(m, n)} w_k(m, n)\, \Delta g_k(m, n)}{\sum_{k} \sum_{(m, n)} w_k(m, n)}$$

where the weight $w_k(m, n)$ is determined by the PSF $h$ and $c$, which is an appropriate scaling parameter.
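Steps 1 to 4 can be sketched in a few lines. The code below is an illustration in the spirit of the IBP method, not the implementation used in the study: it assumes a square-wave PSF, shifts of whole HR pixels, cyclic boundaries, equal weights for all LR images, a scaling parameter of 1, and a zero image as the initial value.

```python
import numpy as np

def observe(hr, scale, shift):
    """Simulated LR observation: shifted sampling grid, block mean (square-wave PSF)."""
    shifted = np.roll(hr, (-shift[0], -shift[1]), axis=(0, 1))
    h, w = shifted.shape
    return shifted.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

def back_project(diff, scale, shift):
    """Spread each LR correction over the HR pixels that the LR pixel observed."""
    up = np.repeat(np.repeat(diff, scale, axis=0), scale, axis=1)
    return np.roll(up, shift, axis=(0, 1))

def ibp(lr_images, shifts, scale, hr_shape, n_iter=30, tol=1e-6):
    hr = np.zeros(hr_shape)                       # Step 1: initial value
    errors = []
    for _ in range(n_iter):
        # Step 2: simulate every LR observation and compare with the real one.
        diffs = [lr - observe(hr, scale, s) for lr, s in zip(lr_images, shifts)]
        # Step 3: RMS observation simulation error.
        err = float(np.sqrt(np.mean([np.mean(d ** 2) for d in diffs])))
        errors.append(err)
        if err <= tol:
            break
        # Step 4: add the equally weighted average of the back-projected corrections.
        corrections = [back_project(d, scale, s) for d, s in zip(diffs, shifts)]
        hr = hr + np.mean(corrections, axis=0)
    return hr, errors

rng = np.random.default_rng(0)
truth = rng.random((8, 8))
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
lr_images = [observe(truth, 2, s) for s in shifts]
hr_est, errors = ibp(lr_images, shifts, 2, truth.shape)
```

Under these assumptions the simulation error does not increase from one iteration to the next, but fine detail converges slowly, which is one reason a good initial value such as the average image of Equation (1) is attractive.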
Many studies on SR using multiple LR images have adopted the IBP method proposed by Irani and Peleg or its modified versions. Many of them utilized synthesized images, and only a few studies processed real images. Fukue et al. (2009) studied the impact of radiometric density noise and geometric position noise. They utilized a digitized aerial photograph as a HR image, but the LR images were synthesized by assumed observation equations. Matsumoto et al. (2009) reported an experiment on SR for detection of hairline cracks of sub-millimetre width on a concrete structure. They processed LR images acquired by a real camera, but not in an actual field situation. There are few studies on SR using multiple UAV images for inspection of infrastructure.

PROCESS OF SUPER RESOLUTION IN THE STUDY
We conducted SR processing in the study as follows:
[1] Estimate internal and external camera parameters of UAV images by using SfM software Pix4Dmapper Pro. This step has already been conducted by the dataset providers.
[2] Generate point clouds by using Pix4Dmapper. This step has already been conducted by the dataset providers as well.
[3] Create a digital surface model (DSM) from the generated point clouds.
[4] Create HR orthoimages by using the estimated internal and external camera parameters, and the created DSM. We adopt four interpolation methods: nearest neighbour interpolation (NN), bi-linear interpolation (BL), cubic convolution proposed by Riffman (1973) (CR), and cubic convolution proposed by Simon (1975) (CS). Results by CR and CS are utilized for evaluation of SR by using a single LR image (SR-S).
The interpolation functions w(t) of cubic convolution by Riffman and by Simon are represented by Equation (10) and Equation (11) respectively.
[5] Conduct SR by resampling (SR-R). Since the accuracy of the camera parameters of each image and of the point clouds obtained by Pix4Dmapper is insufficient for SR using multiple LR images, fine image registration is executed in the study. The image in which the ROI lies nearest to the image centre is selected as the reference orthoimage of the fine registration. The grey levels of each orthoimage other than the reference orthoimage are adjusted to those of the reference orthoimage as closely as possible.
We adopt translation as the geometric transformation, and we estimate shifts between orthoimages by least squares matching. Then each LR orthoimage for SR is created by using its estimated shift to the reference orthoimage. Results obtained with the four interpolation methods NN, BL, CR, and CS are utilized for evaluation of SR-R.
[6] Conduct SR based on observation equations (SR-O). We conduct fine image registration in the same way as in the abovementioned SR-R.
We adopt two types of PSF $h(x, y)$: one is a square wave function, and the other is based on the Gaussian function. We assume that the latter has an effect on the 8-neighbourhood of the target pixel $(i, j)$ in the HR image, and is defined by the following equation.
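Three building blocks of steps [4] to [6] can be sketched as follows. All three functions are illustrative assumptions rather than the formulas of the study: `cubic_kernel` is the general one-parameter cubic convolution kernel, in which a = −1 is the value commonly associated with the kernel attributed to Riffman (1973), while the parameter of the Simon (1975) variant is not stated in this text and is therefore left to the caller; `gaussian_psf_3x3` uses a standard deviation in HR pixels, which may differ from the parametrisation of the Gaussian PSF in the study; `estimate_shift` performs a single linearised least-squares step for a translation, a simplified stand-in for least squares matching.

```python
import numpy as np

def cubic_kernel(t, a=-1.0):
    """One-parameter cubic convolution kernel w(t) with support |t| < 2."""
    t = np.abs(np.asarray(t, dtype=float))
    inner = (a + 2.0) * t**3 - (a + 3.0) * t**2 + 1.0           # |t| < 1
    outer = a * t**3 - 5.0 * a * t**2 + 8.0 * a * t - 4.0 * a   # 1 <= |t| < 2
    return np.where(t < 1.0, inner, np.where(t < 2.0, outer, 0.0))

def gaussian_psf_3x3(sigma):
    """Gaussian weights on a pixel and its 8-neighbourhood, normalised to sum to 1."""
    offsets = np.array([-1.0, 0.0, 1.0])
    g = np.exp(-offsets**2 / (2.0 * sigma**2))
    psf = np.outer(g, g)
    return psf / psf.sum()

def estimate_shift(ref, img):
    """One linearised least-squares step for (dy, dx) with img(y, x) ~ ref(y - dy, x - dx)."""
    gy, gx = np.gradient(ref)
    design = np.column_stack([gy.ravel(), gx.ravel()])
    rhs = (ref - img).ravel()
    (dy, dx), *_ = np.linalg.lstsq(design, rhs, rcond=None)
    return dy, dx

# The four kernel weights for a fractional offset of 0.3 pixel sum to 1.
weights = cubic_kernel(np.array([-1.3, -0.3, 0.7, 1.7]))

# A narrow Gaussian (hypothetical sigma of 0.3 HR pixel) is close to a single-pixel PSF.
psf = gaussian_psf_3x3(0.3)

# Recover a known sub-pixel shift between two smooth synthetic images.
yy, xx = np.mgrid[0:41, 0:41].astype(float)
blob = lambda cy, cx: np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * 5.0 ** 2))
dy, dx = estimate_shift(blob(20.0, 20.0), blob(20.3, 19.8))
```

The single linearised step is adequate only for shifts well below one pixel on smooth images; a practical least squares matching would iterate and resample the image after each step.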

DATASETS UTILIZED IN THE STUDY
We utilized two existing datasets in the preliminary investigation: one is a dataset for a feasibility study of UAV photogrammetry for "i-Construction" (N. Takahashi, et al., 2017), and the other is a dataset acquired in an evaluation experiment of the system developed by Mizukami, et al. The former is called Dataset T, while the latter is called Dataset M.

Dataset T
In the 2016 fiscal year the Ministry of Land, Infrastructure, Transport and Tourism of Japan started a program integrating construction and ICT in earthwork and concrete placing. The program was named "i-Construction". Dataset T was acquired in a field experiment to investigate whether the procedures of UAV photogrammetry following the standards for "i-Construction" are feasible or not.
The standard of UAV photogrammetry for "i-Construction" includes some requirements in image acquisition planning and measurement accuracy (Ministry of Land, Infrastructure, Transport and Tourism of Japan, 2016). In image acquisition planning, GSD should be equal to or smaller than 10mm. As for overlapping ratios, OL should be equal to or greater than 90%, and SL should be equal to or greater than 60%.
The field experiment by Takahashi, et al. was executed at a reinforcement site of the river embankment of the Edo River. A Sony α6000 and a ZEISS Loxia 2.8/21 were utilized in the experiment. We utilized 891 UAV images that were acquired under an image acquisition plan of 10 mm GSD, 90% OL, and 60% SL. Figure 5 shows an orthoimage produced by Pix4Dmapper. Pix4Dmapper reported that the average GSD is 9.7 mm and the mean reprojection error is 0.169 pixels.

Dataset M
Dataset M was obtained in an evaluation experiment of the system developed by Mizukami, et al. The 3D measurement of their system basically follows the standard of UAV photogrammetry for "i-Construction". However, considering the required accuracy of 3D measurement, their system requires that the GSD of images should be equal to or smaller than 5 mm.
The experiment was executed at a coastal dike of the Nishi-Kunisaki Coast. A Sony α9 and a ZEISS Loxia 2/50 were utilized for 3D measurement in the experiment. We utilized 579 UAV images that were acquired under an image acquisition plan of 3.6 mm GSD, 93% OL, and 63% SL. Figure 6 shows an orthoimage produced by Pix4Dmapper. Pix4Dmapper reported that the average GSD is 3.8 mm and the mean reprojection error is 0.114 pixels.

RESULTS AND DISCUSSION
As for Dataset T, we show orthoimages of 4 mm GSD created from UAV images of 9.7 mm GSD here. Orthoimages around a square target of 30 cm side, which served as a control point for 3D measurement, are mainly shown. On the other hand, as for Dataset M, we show orthoimages of 1 mm GSD created from UAV images of 3.8 mm GSD here. Orthoimages around a square chart of 28 cm side are mainly shown. The chart shown in Figure 7 was placed in order to check the resolution of acquired images. The widths of its black and white stripes are 1 mm, 2 mm, …, 8 mm. The contrast of all the following result images is enhanced.

NN and BL cannot make HR images, as expected. The results shown in Figure 8 indicate that CR and CS would not improve image resolution greatly. We cannot find a large difference between CR and CS either.

Figure 9 shows results of SR by resampling (SR-R) using overlapping LR images. The result orthoimages of Dataset T were created from 16 UAV images, and those of Dataset M were created from 40 UAV images.
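The magnification and the ratio of observations to unknowns implied by these figures can be checked with simple arithmetic. The sketch below is an idealised count that assumes every utilized image covers the whole ROI and contributes independent observations; the function name is hypothetical.

```python
def sr_budget(lr_gsd_mm, hr_gsd_mm, n_images):
    """Return (magnification, HR pixels per LR pixel, observations per unknown)."""
    factor = lr_gsd_mm / hr_gsd_mm
    hr_pixels_per_lr_pixel = factor ** 2
    obs_per_unknown = n_images / hr_pixels_per_lr_pixel
    return factor, hr_pixels_per_lr_pixel, obs_per_unknown

dataset_t = sr_budget(9.7, 4.0, 16)   # magnification 2.425, about 2.7 observations per unknown
dataset_m = sr_budget(3.8, 1.0, 40)   # magnification 3.8, about 2.8 observations per unknown
```

In both datasets the idealised count gives more than two observations per HR pixel, so the condition that the number of observation equations is not smaller than the number of HR pixels is met on paper.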

SR by resampling (SR-R)
Jaggies cannot be found in the orthoimages created by using NN, and the orthoimages created by using BL are not excessively smoothed. As for NN and BL, SR-R would be able to provide much higher resolution orthoimages than SR-S.
As for CR and CS, Figure 9 shows that SR-R would be able to provide better orthoimages than SR-S. However, we cannot find that CR and CS would improve the resolution of the created orthoimages greatly. The small improvement is contrary to our expectation, and we suspect that a possible cause is the degradation of resolution near the outer edges of an acquired image. We discuss this later.
As for the interpolation method, CR and CS provide higher resolution orthoimages than NN and BL. However, we think that the differences in the quality of the created orthoimages between the four interpolation methods would not be so large. As with SR-S, we cannot select the better interpolation method for SR-R between CR and CS.

SR based on observation equations (SR-O)
Figure 10 and Figure 11 show results of SR based on observation equations (SR-O) using overlapping LR images. The results shown in Figure 10 were obtained when a square wave function was utilized as a PSF, while those shown in Figure 11 were obtained when CS was utilized as an interpolation method. As with SR-R, the result orthoimages of Dataset T were created from 16 UAV images, and those of Dataset M were created from 40 UAV images. Figure 10 indicates that SR-O would be able to provide higher resolution orthoimages than SR-R for all interpolation methods. However, the degree of improvement of the resolution is smaller than we expected. We suspect that the degradation of resolution near the outer edges of an acquired image would make the improvement of resolution smaller.
The differences in the quality of the created orthoimages between the four interpolation methods would be small. We recognize that CR and CS provide somewhat higher resolution orthoimages than NN and BL.

Figure 9. Results of SR-R orthoimages.

Since σ = 3 in the Gaussian function means that the degree of spreading (blurring) of a point object to adjacent pixels is extremely small, we think that the effect of a Gaussian function of σ = 3 as a PSF in SR-O is nearly equal to that of a square wave function as a PSF in SR-O.
Our current investigation results indicate that the best SR method for UAV orthoimages would be SR-O using a square wave function or a Gaussian function of σ = 3 as a PSF and CR or CS as an interpolation method. However, we recognize that the resolution of the created orthoimages is not sufficient for our requirements. Figure 12 shows two orthoimages of Dataset T. One is an orthoimage created from a reference image by using NN as an interpolation method, and the other is a result orthoimage of SR-O using a square wave function as a PSF and CS as an interpolation method.

Impact of an inaccurate DSM utilized in SR processing
The orthoimage created by SR-O does not seem to provide higher resolution than that created from a single image in areas covered by grass. Moreover, a small vertical pole shown in the right centre part of the orthoimage created from a single image disappeared in the orthoimage created by SR-O.
An inaccurate DSM utilized in SR processing caused these faults, as Fukue et al. (2009) suggest. We should utilize a DSM accurate enough to provide HR orthoimages by SR-O. However, we consider that it would not be so easy.

In order to investigate the impact of the degradation of resolution near the outer edges, we created two types of orthoimages by SR-O using a square wave function as a PSF and CS as an interpolation method. Let $r$ be a radial distance from the image centre on the focal plane. One is orthoimages created from images where the photographed ROI is located inside the circle of $r = 18$, and the other is orthoimages created from images where the photographed ROI is located inside the circle of $r = 12$. The results indicate that excluding degraded images, in which the ROI lies near the outer edges, would be effective in creating HR orthoimages. The quality of orthoimages created from even half of the acquired images would be superior to that of orthoimages created from all of the acquired images.
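The selection of images by radial distance can be sketched as follows. The unit of the radial distance (assumed here to be millimetres on the focal plane), the sensor geometry, and the image identifiers are all hypothetical values for illustration and are not taken from the datasets.

```python
import math

def radial_distance_mm(col, row, width_px, height_px, pixel_pitch_mm):
    """Distance on the focal plane from the image centre to pixel (col, row)."""
    dx = (col - (width_px - 1) / 2.0) * pixel_pitch_mm
    dy = (row - (height_px - 1) / 2.0) * pixel_pitch_mm
    return math.hypot(dx, dy)

def select_images(roi_positions, width_px, height_px, pixel_pitch_mm, r_max_mm):
    """Keep the images whose ROI centre lies within r_max_mm of the image centre."""
    return [
        image_id
        for image_id, (col, row) in roi_positions.items()
        if radial_distance_mm(col, row, width_px, height_px, pixel_pitch_mm) <= r_max_mm
    ]

# Hypothetical 6000 x 4000 pixel sensor with 0.006 mm pixel pitch.
roi_positions = {"a": (3000, 2000), "b": (5900, 3900), "c": (4500, 2000)}
kept = select_images(roi_positions, 6000, 4000, 0.006, r_max_mm=12.0)
```

Image "b" photographs the ROI near a corner of the frame and is rejected, while "a" and "c" are kept. A stricter radius trades the number of overlapping images against their sharpness.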

CONCLUSION
The results of our investigation demonstrate that SR-O would be able to provide higher resolution orthoimages than those produced by an ordinary method. On the other hand, the results show that the improvement of resolution would not be large relative to the number of overlapping LR images utilized.
The results also show that an inaccurate DSM utilized in SR processing degrades the quality of SR results. However, we consider that it would not be easy to produce a DSM accurate enough for the required HR images. We have decided to develop a method to produce a more accurate DSM for SR processing.
Furthermore, the results illustrate that the quality of the results of SR processing depends considerably on the characteristics of the lens utilized in image acquisition. If a lens has a large degradation of resolution near the outer edges, one should not utilize acquired images in which the ROI was photographed near the outer edges.
We think that further investigations on SR using UAV images would be necessary in order to put SR to practical use. As a first step, we will conduct an experiment in order to design an image acquisition plan for not only 3D measurement but also SR processing.