An Approach For Stitching Satellite Images In A Bigdata Mapreduce Framework

In this study, we present a two-step map/reduce framework for stitching satellite mosaic images. The proposed system enables the recognition and extraction of objects whose parts fall in separate satellite mosaic images. Stitching, however, is a time- and resource-consuming process. The major aim of the study is to improve the performance of image stitching by utilizing a big data framework. To realize this, we first convert the images into bitmaps (first mapper), then into string form consisting of 255s and 0s (second mapper), and finally find the best possible matching position of the images with a reduce function.


INTRODUCTION
Image stitching is the process of obtaining a single image from images that share common areas. Stitched images are used for panoramic views, high-resolution display of mosaic images on digital maps, medical imaging, and other applications related to 3D environment modeling with real-world images (Chia-Yen Chen, 1998; "Image stitching - Wikipedia", 2017). Image stitching techniques are basically divided into direct techniques and feature-based techniques. Direct techniques operate on the pixel intensities of the input images: the images are shifted relative to each other, and each pixel intensity of one image is compared with each pixel intensity of the other in order to find the degree of similarity between the pictures. Comparing every pixel in this way has a high complexity. Methods that use such pixel-to-pixel mapping are commonly known as direct methods (Pravenaa and Menaka, 2016). The algorithm developed in this study uses a direct technique. Feature-based techniques establish a relationship between images based on properties extracted from the input images (Pravenaa and Menaka, 2016; Bonny and Uddin, 2016). This approach proceeds through the stages of feature extraction, image registration and image blending. Many feature detection methods are available for feature-based techniques, such as SURF, SIFT, MSER (Shaikh and Patankar, 2015), Harris and FAST (Pravenaa and Menaka, 2016). Image registration is a pre-processing step used to merge images of the same scene taken at different times (Sayar et al., 2013). Image blending is the process of obtaining a seamless image with a smoother transition between images (Pravenaa and Menaka, 2016).
Apache Hadoop is a software library written in the Java programming language that makes it possible to process large datasets on clusters of computers. Hadoop is a data processing environment built on a distributed file system and specially designed for very large-scale data processing (Zikopoulos, 2012). Map/reduce is a distributed programming model that consists of map and reduce steps. Users define a map function that processes key/value pairs and a reduce function that combines all the values associated with the same key. Programs written with the map/reduce programming paradigm are automatically parallelized and can be executed on a large cluster (Sanjay Ghemawat, 2004).
Earlier work aimed at creating a distributed and scalable architecture by using a big data framework based on map/reduce for mosaic satellite image stitching and object extraction. (Eken and Sayar, 2015) performed a vector-based case study to demonstrate that high performance for stitching satellite images can be achieved with the Hadoop map/reduce framework. (Sozykin and Epanchintsev, 2015) presented a distributed system for image processing that uses Hadoop's map/reduce computation paradigm. Basic image processing operations such as SIFT and edge detection are distributed using the Java image processing libraries OpenIMAJ and Java2D; in their work, existing libraries are made available for distributed use. (Rajak et al., 2015) presented a Hadoop map/reduce based architecture that stores the program output in HBase for remote sensing satellite data. The algorithms proposed for the map/reduce solution are image registration, watershed image segmentation, image mosaicing and Gaussian filtering. Experiments on Landsat satellite images have shown that using Hadoop clusters to process high-resolution satellite image data has a positive effect on productivity: a speedup of at least 7x can be achieved even for complex image processing algorithms using a four-node cluster.
(Vemula and Crick, 2015) developed a library based on Hadoop map/reduce that makes it possible to process images on a large scale. Their work abstracts the technical details of Hadoop's map/reduce system and provides an easy mechanism for users to manipulate large image data sets. They developed a distributed system for Laplacian filtering, Canny edge detection and k-means image segmentation. (White et al., 2010) implemented various practical computer vision algorithms, such as classifier training, sliding windows, clustering, bag-of-features, background subtraction and image registration, using the map/reduce framework. (Golpayegani and Halem, 2009) and (Lv et al., 2010) run several satellite image processing algorithms on the Hadoop map/reduce framework, but instead of using the raw images in Hadoop they first convert them to text format and then to binary form. This pre-processing takes considerable computation time because the images are not used in raw form. (Tesfamariam, 2011) also addressed the processing of large satellite images based on map/reduce and carried out a case study on edge detection algorithms such as Sobel, Laplacian and Canny.
Because the referenced studies report that the Hadoop map/reduce programming approach improves performance, the Hadoop map/reduce framework was chosen for the image stitching process. In this study, algorithms and results are presented for creating a single new image from two pictures, taking as reference the point at which their overlap is largest, using the Hadoop map/reduce distributed computation paradigm.

ARCHITECTURE
In this study, images are combined with reference to the coordinate at which they intersect the most. The developed algorithm works on a single node and was written in the Java programming language in accordance with the map/reduce framework. The application architecture consists of two mapper functions running in parallel, and the output of the first mapper function is used as the input of the second mapper function. The first mapper creates a new space for the images; the second mapper calculates the common intersections between the images and combines the two images with reference to the coordinate at which the calculated intersection is largest.

Conversion of Images to Bitmap and String Format
Images are converted into bitmaps, which are represented as matrices consisting of the elements "0" and "255". This conversion is performed using a threshold value of "128". In this study, bitmaps are represented as strings so that they can be supplied to the map function as text input. In the string format, the columns of each image row are separated by a comma ',' character and each image row is followed by a semicolon ';' character. Representing image1 (Figure 1.a) as a string yields the value "255,255,255,255,255;255,0,255,255,255;255,255,0,0,0;255,255,255,255,255;255,255,255,255,255;".
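The conversion described above can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class name `BitmapEncoder` and its method names are hypothetical, the input is assumed to be a grayscale matrix with values in 0..255, and the rule that values below the threshold become black is an assumption, since the text only states the threshold value.

```java
/** Hypothetical helper (not from the paper): thresholds a grayscale matrix and serializes it as text. */
final class BitmapEncoder {
    static final int THRESHOLD = 128; // threshold value stated in the text

    /** Maps each gray value to 0 (black) or 255 (white). */
    static int[][] toBitmap(int[][] gray) {
        int[][] bitmap = new int[gray.length][];
        for (int r = 0; r < gray.length; r++) {
            bitmap[r] = new int[gray[r].length];
            for (int c = 0; c < gray[r].length; c++) {
                // Assumption: values below the threshold become black, all others white.
                bitmap[r][c] = gray[r][c] < THRESHOLD ? 0 : 255;
            }
        }
        return bitmap;
    }

    /** Separates columns with ',' and ends every row with ';'. */
    static String toBitmapString(int[][] bitmap) {
        StringBuilder sb = new StringBuilder();
        for (int[] row : bitmap) {
            for (int c = 0; c < row.length; c++) {
                if (c > 0) sb.append(',');
                sb.append(row[c]);
            }
            sb.append(';');
        }
        return sb.toString();
    }

    /** Inverse of toBitmapString: parses the text form back into a matrix. */
    static int[][] parse(String text) {
        String[] rows = text.trim().split(";");
        int[][] bitmap = new int[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            String[] cells = rows[r].split(",");
            bitmap[r] = new int[cells.length];
            for (int c = 0; c < cells.length; c++) {
                bitmap[r][c] = Integer.parseInt(cells[c].trim());
            }
        }
        return bitmap;
    }

    public static void main(String[] args) {
        int[][] gray = { {200, 10}, {128, 127} };
        System.out.println(toBitmapString(toBitmap(gray))); // prints 255,0;255,0;
    }
}
```

The `parse` method is included because a mapper that receives the string form as text input would need to turn it back into a matrix before comparing pixels.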

Creating New Space for Images
Images are defined in a new space so that all possible matching cases between the two images can be calculated. Defining this new space for the images is the task of the first map function.
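One possible way to build such a space is sketched below in Java. The text does not specify the layout of the new space, so the padding rule is an assumption: image2 is surrounded by white pixels, (n-1) rows and (m-1) columns on every side, so that an (n x m) image1 can be placed at every position where it overlaps image2 by at least one pixel. The class name `NewSpace` and the method `embed` are hypothetical.

```java
/** Hypothetical helper (not from the paper): embeds image2 in a larger, white-padded matrix. */
final class NewSpace {
    static final int WHITE = 255;

    /**
     * Returns image2 surrounded by white padding.
     * Assumption: (n - 1) rows and (m - 1) columns of padding on each side,
     * where n x m is the size of image1.
     */
    static int[][] embed(int[][] image2, int n, int m) {
        int a = image2.length;
        int b = image2[0].length;
        int padRows = n - 1;
        int padCols = m - 1;
        int[][] space = new int[a + 2 * padRows][b + 2 * padCols];
        for (int[] row : space) {
            java.util.Arrays.fill(row, WHITE);
        }
        for (int r = 0; r < a; r++) {
            // Copy one row of image2 into the centre of the padded matrix.
            System.arraycopy(image2[r], 0, space[r + padRows], padCols, b);
        }
        return space;
    }

    public static void main(String[] args) {
        int[][] image2 = { {0, 0}, {0, 0} };
        int[][] space = embed(image2, 2, 3);
        System.out.println(space.length + " x " + space[0].length); // prints 4 x 6
    }
}
```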

Calculation of the Maximum Number of Black Pixel Overlaps Between Images
Using a block whose size is taken from the first matrix, the first matrix is slid over the second matrix and the overlapping black pixels are counted. The elements of image1 are treated as a floating frame over image2 (in the new space). For each position of the frame, the black pixels found at the same indices are counted under a one-to-one matching between the first image and the second image. The state of the floating window is shown in Figure 3, where the elements of the first image matrix are matched with 5x5 matrices, each marked with a different color, in the second image matrix. This operation is performed for all 5x5 matrices defined on image2, and the common black pixels are counted for each pairing. The coordinate at which the highest number of intersections is obtained is saved and used in the merging process of the next step. The mathematical expression for the largest number of matches between the pictures is given by equation (1).

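$$\text{Max number of intersections} = \max_{(x,y)} \sum_{i=1}^{n} \sum_{j=1}^{m} B_1(i,j)\, B_2(x+i,\, y+j) \qquad (1)$$
Here image1 has (n x m) pixels, (x,y) ranges over the positions of the floating window on image2 in the new space, and B_1 and B_2 equal 1 where the corresponding pixel of image1 or image2 is black and 0 otherwise.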
These operations are performed in the second map function defined in the algorithm. The input of this map function is the output of the first map function.
Input: Image1 (String Format) <tab> Image2 in New Space (String Format)
Output: New Image Defined in String Format as the Combination of Image1 and Image2
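The search for the best matching position can be sketched in plain Java as follows. This is an illustrative sketch rather than the authors' mapper code: the class `OverlapSearch` and its methods are hypothetical, both inputs are assumed to be already parsed into 0/255 matrices, and ties are resolved in favour of the first position found, which the text does not specify. In a Hadoop job, logic of this kind would be called from inside the map function.

```java
/** Hypothetical helper (not from the paper): exhaustive search for the best black-pixel overlap. */
final class OverlapSearch {
    static final int BLACK = 0;

    /** Counts positions where image1 and the window of `space` at (top, left) are both black. */
    static int countBlackOverlap(int[][] image1, int[][] space, int top, int left) {
        int count = 0;
        for (int i = 0; i < image1.length; i++) {
            for (int j = 0; j < image1[i].length; j++) {
                if (image1[i][j] == BLACK && space[top + i][left + j] == BLACK) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Returns {row, col, count} for the window position with the most common black pixels. */
    static int[] findBest(int[][] image1, int[][] space) {
        int n = image1.length;
        int m = image1[0].length;
        int[] best = {0, 0, -1};
        for (int top = 0; top + n <= space.length; top++) {
            for (int left = 0; left + m <= space[0].length; left++) {
                int count = countBlackOverlap(image1, space, top, left);
                if (count > best[2]) { // assumption: the first maximum wins on ties
                    best = new int[] {top, left, count};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] image1 = { {255, 0}, {0, 255} };
        int[][] space = { {255, 255, 255}, {255, 255, 0}, {255, 0, 255} };
        int[] best = findBest(image1, space);
        System.out.println("row=" + best[0] + " col=" + best[1] + " count=" + best[2]);
        // prints row=1 col=1 count=2
    }
}
```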

Combining Images
The images are combined with reference to the largest intersection coordinate calculated between the two images. After all matching cases are considered, the largest intersection coordinate for merging the first and second pictures is found to be the point (4,6) on the second picture. With reference to point (4,6), the elements of image1, a matrix of 5x5 units, are marked by the green-bordered area shown in Figure 4.a. When the matrix elements framed in green on image1 and image2 overlap each other, three black pixel values are matched. The matrix elements of image1 are then placed on image2 with reference to point (4,6) on image2.
If the size of image1 is (n x m) and the size of image2 is (a x b), the size of the merged image is defined as (n+a) x (m+b).
The merging algorithm for image1 of dimension (n x m) and image2 of dimension (a x b) is shown in Algorithm 1.
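A merging step of this kind might look like the following Java sketch. It should not be read as the authors' Algorithm 1: the class `ImageMerger` is hypothetical, the placement is expressed as a shift (dr, dc) of image1 relative to the top-left corner of the unpadded image2, and the rule that a merged pixel is black when it is black in either image is an assumption. The canvas size (n+a) x (m+b) follows the text.

```java
/** Hypothetical helper (not from the paper): overlays two 0/255 bitmaps on a white canvas. */
final class ImageMerger {
    static final int BLACK = 0;
    static final int WHITE = 255;

    /**
     * Merges image1 (n x m) and image2 (a x b) on an (n + a) x (m + b) canvas.
     * (dr, dc) is the assumed shift of image1 relative to image2's top-left corner.
     */
    static int[][] merge(int[][] image1, int[][] image2, int dr, int dc) {
        int n = image1.length, m = image1[0].length;
        int a = image2.length, b = image2[0].length;
        if (dr < -(n - 1) || dr > a - 1 || dc < -(m - 1) || dc > b - 1) {
            throw new IllegalArgumentException("images do not overlap for this shift");
        }
        int[][] merged = new int[n + a][m + b];
        for (int[] row : merged) {
            java.util.Arrays.fill(row, WHITE);
        }
        // Whichever image starts further up/left is anchored at row/column 0.
        paint(merged, image2, Math.max(0, -dr), Math.max(0, -dc));
        paint(merged, image1, Math.max(0, dr), Math.max(0, dc));
        return merged;
    }

    /** Copies only the black pixels, so black wins wherever the images overlap. */
    private static void paint(int[][] canvas, int[][] image, int top, int left) {
        for (int i = 0; i < image.length; i++) {
            for (int j = 0; j < image[i].length; j++) {
                if (image[i][j] == BLACK) {
                    canvas[top + i][left + j] = BLACK;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[][] merged = merge(new int[][] { {0, 255} }, new int[][] { {255, 0} }, 0, 1);
        System.out.println(java.util.Arrays.toString(merged[0])); // prints [255, 0, 255, 255]
    }
}
```

How the shift is derived from a reference point such as (4,6) depends on how the new space was padded, which is why the sketch takes the shift directly as a parameter.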

APPLICATION
Figure 5. Simple images used as input data in the application. (a) Image1 (b) Image2
Figure 7. Implementation of merging two real gray images. (c) Two images combined
Because the developed algorithm has O(n^4) complexity and runs on a single node, its running time makes it impractical for large images. The time performance graphs of the map functions in Figure 8.a and 8.b show a dramatic increase in processing time as the matrix size grows. Although the tests show that large images can be processed, the desired time performance cannot be achieved. The next step is to make the algorithm usable on large images by running it on a Hadoop cluster.
Figure 8. Application performance graph for different matrix sizes. (a) Matrix size vs. process time (sec) for the mapper that creates the new space (b) Matrix size vs. process time (sec) for the mapper that merges the images
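The fourth-order growth can be made plausible with a short operation count. The derivation below is a sketch under an assumption the text does not state explicitly: the new space is taken to be large enough that an (n x m) image1 can be tested at every position where it overlaps an (a x b) image2 by at least one pixel, which gives (a+n-1)(b+m-1) window positions, each costing n·m pixel comparisons.

```latex
T(n, m, a, b) = (a + n - 1)\,(b + m - 1)\, n\, m
\qquad\Longrightarrow\qquad
T(n, n, n, n) = (2n - 1)^2\, n^2 = 4n^4 - 4n^3 + n^2 = O(n^4)
```

Under this count, doubling the side length of two equally sized square images multiplies the number of comparisons by roughly sixteen.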

CONCLUSION
In this study, an algorithm was developed to process images in accordance with the Hadoop map/reduce framework. The intention was to test the performance of a map function that receives an image expressed in text format as its input. The aim is to create a scalable system for large images by reading the images through the Hadoop file system and using the map/reduce programming approach. The merging process, which has O(n^4) complexity, was solved with reference to the point at which the two images are most similar to each other. One point of improvement for the developed algorithm is to determine the threshold value from the balance of black and white pixels in the images when they are converted to black and white. Converting images to black and white with a fixed threshold value of "128" makes matches harder to find and reduces matching accuracy on real images with a high white density or black density. The developed algorithm is based on counting black pixels. It is successful when the black color intensity of the pictures is low, but when the white density of the image is too high, it is difficult to catch the intersections. A next step is to add a preprocessing step that subtracts the overall color intensity of the images; calculating the intersection with reference to this step is expected to improve the performance of the application. A preprocessing step that detects the edges in the images can also help to obtain rapid and accurate results when calculating the matching cases between images and detecting the point at which the images are combined. These preprocessing steps can be included in future work.