A COARSE-TO-FINE BAND REGISTRATION FRAMEWORK FOR MULTI/HYPERSPECTRAL REMOTE SENSING IMAGES CONSIDERING CLOUD INFLUENCE

Band registration is one of the most critical steps in the production of multi/hyperspectral images and determines the accuracy of applications directly. Because of the characteristics of imaging devices in some multi/hyperspectral satellites, there may be a time difference between bands during push-broom imaging, which leads to displacements of moving clouds with respect to the ground. And a large number of feature points may gather around cloud contours due to the high contrast and rich texture, resulting in building a transformation more suitable for moving clouds and making ground objects ghosted and blurred. This brings a big challenge for registration methods based on feature extraction and matching. In this paper, we propose a novel coarse-to-fine band registration framework for multi/hyperspectral images containing moving clouds. In the coarse registration stage, a cloud mask is generated by grayscale stretching, morphology and other operations. Based on this mask, a coarse matching of cloud-free regions is performed to eliminate large misalignment between bands. In the refinement stage, low-rank analysis and RASL (Robust Alignment by Sparse and Low-rank decomposition) are used to optimize the rank of coarse results to achieve fine registration between bands. After experiments on a total of 102 images (83 cloudy images and 19 cloud-free images with all 32 bands) from Zhuhai-1 hyperspectral satellite, our method can achieve a registration accuracy of 0.6 pixels in cloudy images, 0.41 pixels in cloud-free images, which is enough for subsequent applications.


INTRODUCTION
With the continuous development of science and technology, more multi/hyperspectral sensors have been equipped on earth observation satellites, which effectively expands the ability to observe the earth through rich spectral information. However, due to the characteristics of imaging devices in some multi/hyperspectral satellites, the imaging time of the same ground object may vary in different bands during push-broom imaging, which leads to misalignment between bands. Therefore, band registration is one of the most important steps in the production of multi/hyperspectral data and there are mainly two types of registration methods at present. One is the method that takes into account satellite imaging models (hereinafter referred to as model-based method), and the other is the method that uses computer vision (hereinafter referred to as vision-based method). Model-based methods use attitude, orbit, and time data of satellites during imaging to build a geometric positioning model between ground points and image pixels. The relative relationship between bands can be restored using this pointto-point model, which may achieve a high registration accuracy. But it needs many satellite operational parameters that are not easy to get and use. And the calculation for the satellite imaging model is complicated and tedious. Vision-based methods usually estimate a transformation by extracting and matching tie points 1 in different bands and uses it for alignment. Compared with model-based methods, it is more straightforward Figure 1. Influence of moving clouds on multi/hyperspectral band registration in the cloudy image. (a) Registration results of band red, green and blue using the vision-based method. Roads become rainbow-like due to the movement of clouds. (b) Results after eliminating cloud motion with our methods. and requires fewer parameters and computations.However, as mentioned before, there may be a slight movement of moving clouds in different bands with respect to the ground due to the imaging time difference. And many tie points may gather around cloud contours because of the strong contrast and rich texture when extracting and matching features. Transformations based on these tie points are inaccurate and will lead to a poor registration or even failure for ground objects while a good accuracy for clouds as shown in Fig.1. It brings a big challenge for vision-based methods, but many do not take into account this impact of moving clouds on band registration. Aiming at the problem of poor registration for ground objects in cloudy multi/hyperspectral images, we propose a novel coarse-to-fine band registration framework that eliminates the influence of moving clouds.
The main structure of this paper is as follows: Some work related to this topic is introduced in Section 2. Then the coarseto-fine band registration framework is described in Section 3 in detail. Extensive experiments are performed on data from Zhuhai-1 hyperspectral satellite and results are in Section 4. Conclusions are summarized in Section 5.

Cloud Detection in Remote Sensing Images
Clouds cover about 50% of the earth's surface (Randall, Corsetti, 1989), (King et al., 2013), which indicates that clouds inevitably appear in most remote sensing images. Cloud detection is a subject that has been studied for years, many methods have been proposed for general or specific scenarios. In general, it can be divided into two categories: methods based on image grayscale (hereinafter referred to as grayscale-based method) and methods based on physical spectrum (hereinafter referred to as spectrum-based method).Grayscale-based methods use computer vision methods for cloud detection based on the grayscale value of pixels rather than the physical spectral reflection characteristics, which are quite simple and easy to use. It can be further divided into learning-based methods and non-learning methods. Different convolutional neural networks (Ozkan et al., 2018), (Li et al., 2019a), (Shao et al., 2019) were used for cloud detection and got some good results. As for non-learning methods, it involves broader technologies. A nonnegative matrix factorization method was used for cloud detection (Li et al., 2019b). (Xu et al., 2019) studied noise-adjusted principal components transform for removing clouds in optical remote sensing images. , (Deng et al., 2016) tried to propose a general and automatic method for detecting clouds. Besides using a single image, (Wen et al., 2018),  used low-rank analysis to remove clouds in remote sensing image sequences and proposed a coarse-to-fine framework for detecting and inpainting clouds from other time serial images. Spectrum-based methods are mainly used in multispectral images. It can be divided into three categories: threshold, radial transfer, and statistical. Among them, the threshold method is a commonly used method because of its simplicity and small calculation amount (Hagolle et al., 2010), (Jedlovec et al., 2008), (Zhu et al., 2015). Clouds are considered cold water vapor which has higher reflectivity and lower temperature than the ground. Simple visible and infrared window thresholding methods perform quite well in cloud detection (Ackerman et al., 1998). It may have higher detection accuracy in complex scenes while a narrower application because of its requirements on the data type. And it may be hard to distinguish objects with similar spectral characteristics such as snow from clouds.

Multi/Hyperspectral Image Band Registration
Currently, two kinds of methods are widely used for multi/hyperspectral band registration : model-based methods and vision-based methods. Model-based methods usually build a rigid geometric imaging model for simulation based on satellite operational parameters during the observation process and solve the geographic location of ground objects pixel by pixel. It may achieve high accuracy in some complex scenarios, while it has to use many operational parameters that are not easy to get and may change often. On the other hand, the process of model-based methods is quite sophisticated, which hinders the versatility for different sensors and platforms, such as small drones. (Jiang et al., 2013) achieved high accuracy band-to-band registration for ZY-3 satellite by virtual re-imaging. (Jiang et al., 2019) proposed a band-to-band registration method based on positioning consistency constraint for Zhuhai-1 hyperspectral satellite. A pixel in a certain band can be calculated to the corresponding ground point by establishing the imaging model, and then the ground point can be calculated back to other bands. Theoretically, points in the other bands should coincide with the points in the reference band, but they will not actually coincide completely. This difference between bands can be used to find the relationship for band registration. Vision-based methods usually extract corresponding feature pairs (such as points, lines, etc.) and use these relationships to fit a transformation (such as affine, homography, etc.) between bands. (Pan et al., 2011) used a Harris corner detector (Harris et al., 1988) for feature detection and matching. And they also used OpenMP technology to implement the parallel band-to-band registration. (Yi et al., 2008) modified the SIFT(Scale-invariant Feature Transform) (Lowe, 1999) for better performance on multispectral images, which they called SR-SIFT.  also modified the SIFT for remote sensing image registration. Besides modifying existing features in computer vision, (Ye, Shan, 2014) proposed a two-step registration method for multispectral images. They used SR-SIFT for primarily matching. And then a Harris corner detector and an LSS(Local Self-Similarity) descriptor were used for fine matching. It achieves good results in cloud-free images, but cannot eliminate the cloud movement in bands. There are still some other methods that do not leverage on feature extraction and matching. (Dong et al., 2011) provided a registration method based on optical flow, it is suitable for small images but needs lots of calculations when images are large. (Wang et al., 2018) proposed a deep learning framework for remote sensing image registration. However, it just works on optical images and needs a lot of labeled data to train the network. (Peng et al., 2012) proposed a new method for alignment of batch images, which is called RASL (Robust Alignment by Sparse and Low-rank decomposition). They regard image alignment as an optimization problem of the rank of a matrix and achieve some great results. Inspired by this work, (Hu et al., 2014) used rank minimization for multiangle multi/hyperspectral images. But the initialization of optimization is complex which makes it hard for practical use. Generally compared with model-based methods, vison-based methods are more universal and can be used for alignment when we do not know parameters for building an imaging model.
As a summary, we propose a coarse-to-fine multi/hyperspectral band registration framework for cloudy images to eliminate the influence of moving clouds on registration. We adopt grayscalebased methods for cloud detection because of its simplicity and good performance, which can be used for panchromatic images. And we achieve coarse alignment by vision-based methods for the high efficiency and wide applicability. Finally, we use RASL for fine registration based on coarse stage results. Fig.2 shows the main flow of the coarse-to-fine framework for multi/hyperspectral band registration.In the coarse stage, two processings are performed on the input band simultaneously, one is to generate the cloud mask for feature filtering based on thresholding and morphological operations, the other is to enhance the contrast of cloud-free part for a better feature extraction. After extraction and filtering, the filtered features are matched to find a coarse band-wise transformation. In the refinement stage, results from the coarse registration are used as initial inputs to RASL for optimization, and finally a more accurate registration result is obtained. Figure 2. Flowchart of the proposed coarse-to-fine registration framework for multi/hyperspectral images.

Coarse Band Registration
In cloudy images, a large amount of cloud-contour feature points are included because of the strong contrast and rich texture around cloud contours. And due to the brightness of clouds, the ground becomes darker after grayscale stretching, which further aggravates this situation and builds a model more suitable for clouds, resulting in a poor registration accuracy for ground objects. In the coarse stage, our goal is to increase the proportion of ground feature points in total as much as possible, so as to find the model that fits most matches (ground features) for alignment. We generate a cloud mask to filter cloud-contour points and perform grayscale stretching to enhance the contrast of the ground for more feature points. In practice, the cloud mask and filtered points may not be absolutely accurate due to errors, but it is enough for the initialization of the refinement stage. For finding the cloud mask, the input image is pre-processed to make the bright part brighter and the dark part darker. A new stretching function is proposed as Eq.1: Where x and y are the input and output normalized grayscale value, m is a coefficient used to adjust the stretching effect.
The larger m indicates more obvious effects. We set m = 4 for experiments in this paper. After obtaining the stretched image, OTSU binarization (Otsu, 1979) is performed on it. Then a dilation operation is used to remove fragmented pixels and noise, and multiple erosion operations are used to expand the cloud range. For the ability to describe details we use small morphological kernels. And to filter "fake" feature points around cloud contours as possible, a "greedy strategy" is adopted for a complete coverage of the cloud. On the other hand, due to limited sensor quantization levels, the grayscale contrast of remote sensing images is usually low, which may lead to failure and mismatching of algorithms based on grayscale information. Therefore, grayscale stretching is usually performed on images before feature extraction. But with the existence of clouds, grayscale value and contrast in dark cloud-free parts(compared with clouds) reduces significantly after ordinary grayscale stretching, which affects the quantity and quality of features. For a better feature extraction, another effective grayscale stretching algorithm is designed by combining logarithmic stretching and linear stretching. The core grayscale transformation is as shown in Eq.2.
Where f (i, j) and g(i, j) are the input and output grayscale value of the pixel at the location (i, j). a and b represent the minimum and maximum grayscale values before stretching, while c and d are the expected corresponding values after stretching. k is the adjustment coefficient, usually set to 1. According to the normal distribution, we take grayscale values in the range of 2σ for stretching, that is, keeping pixels with a distribution in the range of about 96%. As for pixels out the range of 2% -98%, we set them the grayscale value of 2% or 98% pixels. To further increase the stability and reliability of features in low texture and contrast parts, a multifeature strategy is adopted. Single feature may not output satisfactory results in remote sensing images. While different feature extraction algorithms can complement each other and help to solve this problem. We mainly use three mixed features of SIFT (Lowe, 1999), SURF(Speeded Up Robust Features) (Bay et al., 2006), and ORB(Oriented FAST and Rotated BRIEF) (Rublee et al., 2011) in this paper. Compared with SURF and ORB, SIFT is more robust, resulting in a better performance in complex scenarios. While SURF and ORB are more efficient based on simpler algorithms. These three features are extracted independently and combined to get final feature points. After obtaining feature points, FLANN(Fast Library for Approximate Nearest Neighbors) (Muja, Lowe, 2009) is used for fast matching, and RANSAC(RANdom SAmple Consensus) (Fischler, Bolles, 1981) algorithm is used to eliminate the error matches. Finally, the least-squares method is used to solve the optimal transformation model. We select 2D Affine as the band-wise transformation model in this paper. In order to reduce the influence of grayscale difference between different bands on matching, we adopt "iterative reference" strategy based on the idea of adjacent bands are more similar in spectrum signatures. It takes the resampled output band of this iteration as the input for the next iteration.Some results in coarse stage are shown in Fig.3.

Fine Band Registration
The coarse stage eliminates large-scale translation, rotation, and scaling between bands, but it is not accurate enough. Results from the coarse stage are used as initial values for further optimization. Inspired by (Peng et al., 2012), the registration of hyperspectral bands can be converted to the optimization of the rank of a low-rank matrix. Suppose I1, I2, ..., In are images aligned with each other, w and h are width and height of the image. Images can be reduced to n one-dimensional vector v1, v2, ..., vn by stacking rows or columns of every image together, the length of this vector is m = w × h. Since images are aligned and similar in content, there will be a linear relationship The rank of A will be minimum when this linear relationship is strongest, which means that images are aligned.
Considering practical multi/hyperspectral bands of remote sensing images, this linear relationship is destroyed to a certain extent due to cloud movement, occlusion, and spectral reflection characteristics of different ground objects between bands. The aforementioned method cannot be used directly and it is necessary to eliminate the non-linear change of grayscale values between bands as much as possible. Inspired by RPCA(Robust Principal Component Analysis) (Candès et al., 2011), it can be resolved by decomposing the band into a lowrank part and a sparse part. Suppose B1, B2, ..., Bn are bands in multi/hyperspectral image, Bi indicates the i-th band(i ≤ n). These multi/hyperspectral bands can be expressed as: B1 = B l 1 + B s 1 , B2 = B l 2 + B s 2 ,..., Bn = B l n + B s n , while B l i indicates the low-rank part of i-th band, and B s i indicates the sparse part. Generally, the value of B s i is large but only affects a part of the band, and is used to correct the non-linear variation of grayscale values between bands. Let vec() be the operation that converting a two-dimensional band into a one-dimensional vector by rows or columns and D be the matrix after each band is stacked, so we can write Eq.3.
Where L = [vec(B l 1 )|vec(B l 2 )| · · · |vec(B l n )] is a low-rank matrix used to describe the overall linear relationship between bands. S = [vec(B s 1 )|vec(B s 2 )| · · · |vec(B s n )] is a sparse matrix used to express the non-linear relationship components between bands, such as the non-linear grayscale change caused by spectral characteristics of the ground objects. In addition to the nonlinear changes in grayscale between bands, there is some misalignment which breaks the linear relationship between bands. For B1, B2, ..., Bn bands, we can introduce n transformations τ = {τ1, τ2, · · · , τn} (such as affine, homography, etc.) to make B1 • τ1, B2 • τ2, · · · , Bn • τn aligned as written Eq.4 and adjust transformations τ simultaneously during the optimization for finding the minimum rank of D.
Both the non-linear changes of grayscale and misalignment between bands can be expressed by Eq.4, these two variables can be optimized at the same time to achieve accurate registration between bands. Our goal is to find the optimal low-rank matrix L and transformation τ . If bands are strictly aligned, the rank of L should be optimized as much as possible, and the objective function is as follows: The L0 norm indicates the number of non-zero elements in the sparse matrix S. The above equation can be more conveniently solved in the Lagrangian form: Where γ is a parameter used to adjust the weight. While in Eq.6 both the objective and constraints are nonconvex and nonlinear, it is hard to solve this optimization problem directly. To solve this problem, we adopt the methods in (Chandrasekaran et al., 2011), (Lin et al., 2009), (Peng et al., 2012). We convert the nonconvex objective function in Eq.6 to its convex surrogate constraint, replace the rank() with the nuclear norm or sum of the singular values: where m is the row number of L and then replace the L0 norm S 0 with L1 norm: ij |Sij|. After applying this operation to Eq. 6, we can get a new objective function to optimize in Eq.7. Here λ is a coefficient. Suggested by (Candès et al., 2011), it should be √ C/m , where C is typically a constant and usually set to C ≈ 1 . m is the row number of L and equals to the product of w and h of a certain band. Now the objective function is continuous and convex. The only problem is the non-linearity of constraint D • τ = L + S . If τ changes very small, we can linearize the current estimate of τ to approximate this constraint. We assume that transformation τ = [τ1|τ2| · · · |τn] ∈ R p×n , p suggests the number of parameters for the transformation model. For example, p = 6 in 2D Affine model. n indicates the number of bands needs to be aligned. For is the Jacobian of the i-th band corresponding to the transformation parameters τi and {εi} is the standard basis for R n . Eq.7 can be written as follows: Here Eq.8 is a convex optimization problem, where L, S, ∆τ are unknowns. By continuously iterating to solve Eq.8, we can get a series of convex problems until it converges. Finally, we can get the minimum estimate result of Eq.5.

Data Introduction
The experiment data comes from Zhuhai-1 hyperspectral satellite constellation(OHS-A,B,C,D), which was launched on April 26, 2018. It is the first commercial hyperspectral satellite in China and opens a new era of commercialization(Orbita, 2020). A push-broom mode is adopted for hyperspectral imaging in Zhuhai-1 satellite. The imaging system is composed of three separated CMOS(Complementary Metal Oxide Semiconductors) sensors with an overlap of fifty pixels among them (Jiang et al., 2019). Every CMOS sensor contains 5056 × 2968 pixels with a GSD(Ground Sample Distance) of 10m, and is divided into 32 stripe regions along the flight direction with a filter on its surface, covering a range of 400 -1000nm in spectrum. In practice, 8 lines are used for integral imaging, as shown in Fig.4. The different imaging position on CMOS leads to the misalignment between bands. However, we cannot directly eliminate it by history data due to its variation among images with different time and sensors, which is caused by random factors during the imaging. For Zhuhai-1 Satellite, the difference between adjacent bands along the flight direction in the raw image is 64.6 pixels in average, and the cumulative offset of all bands reaches about 1950 pixels in general. Experiments were performed on a total of 102 hyperspectral images, of which 83 cloudy images and 19 cloud-free images using all 32 bands with a size of 600 × 400 pixels, as listed in Tab Table 1. Information about experiment data. We did not use data of OHS-B because we did not get data from it.

Evaluation Method
Registration accuracy can be evaluated by manually selecting tie points on different bands. However, it is not realistic to use this method with a large amount of data in experiments.  uses automatic feature extraction and matching to find tie points for evaluation in multispectral images. However, almost no ground object appears in all hyperspectral bands due to diverse reflection characteristics, which brings great challenges for tie point extraction. Although the aforementioned "iterative reference" strategy can handle some spectral variation between bands, it is not feasible to use it to obtain the same tie point on all bands directly. But we can get the band-wise transformation by feature extraction and matching first, and then multiply all band-wise transformations together to get an overall transformation for evaluation. These band-wise transformations have an approximate direction due to the unique push-broom imaging, which means the overall transformation reflects the maximum registration accuracy. In other words, the accuracy for any two bands between the first and the last band in the aligned image will be smaller than it. For one checkpoint, we calculate Euclidean distance between its two corresponding coordinates P in the first band and P in the last band as the registration error. For one image, we select some evenly distributed points and then transform them to the last band based on the overall transformation. The final registration accuracy of this image is calculated by RMSE(Root Mean Squared Error) of all selected checkpoints, as written in Eq.9.
Where erri is the registration error for i-th checkpoint, n is the number of checkpoints. We extract 4000 SIFT feature points for band-wise 2D Affine transformations and select 17 checkpoints for evaluation, as shown in Fig.5.

Results and Comparisons
Because we do not get operational parameters for building imaging model, only vision-based methods are compared. To verify our coarse-to-fine registration framework, "direct method", "cloud mask method" and our method were performed on 83 cloudy images respectively. "direct method" means we extract and match features directly and use it for building transformations without filtering any cloud-contour points. "cloud mask method" refers that we take into account the movement of clouds and use methods proposed in this paper to eliminate the movement and filter tie points. As aforementioned, a mix-feature strategy(SIFT, SURF, and ORB) is used for feature extraction. We extract a maximum number of 20,000 feature points per band. And the morphological operation kernel size for generating the cloud mask is 7, the number of iterations is 8. Finally, we calculate the RMSE of all 17 checkpoints in every image as the registration error using Eq.9. The registration accuracy of all 83 cloudy images is shown in Fig.6. The comparison of some registration results in cloudy images is shown in Fig.7. It can be seen that the "direct method" has a better registration result on cloud contours, while ghosting appears on ground objects. "cloud mask method" has partially eliminated the influence of cloud movement, making the registration of the ground objects slightly better. Compared with the first two methods, our method can best eliminate the influence of cloud movement and achieve the highest registration accuracy for ground objects. Comprehensive statistics are performed on all 83 cloudy images, results are shown in Tab.2. In cloudy images, the mean RMSE of our method on all test data is 0.6 pixels. Compared with the 3.44 pixels accuracy of the "direct method", it is significantly improved by about 80%. To evaluate the variation of accuracy, we compare the biggest and smallest RMSE among all separated bands. Our method achieves an RMSE within a range of 0.28 -0.9 in pixel, which means all bands achieved an accuracy better than 1 pixel. The comparisons of the first and the last band of the same ground features are shown in Fig.8. And the composite images after band registration are shown in Fig.9. In order to test the generality of our method, experiments were also performed on 19 cloud-free images. Since there are no moving clouds in images, the "direct method" is compared with our method. The results are shown in Tab.3. For images without clouds, the mean RMSE registration accuracy of the "direct method" is 0.81 pixels while we can continue improving about 50% to a mean RMSE of 0.41 pixels on test data. Some registration results are shown in Fig.10.  After a large number of experiments, our method can achieve an accuracy of mean RMSE of 0.6 pixels in cloudy images and 0.41 pixels in cloud-free images. It is slightly better than res-   ults in (Jiang et al., 2019), which uses the geometric positioning model for Zhuhai-1 hyperspectral satellite band registration. And we also simplifies the production of data to some extent. In addition, based on our framework, every band is decomposed into a sparse and a low-rank part.We can effectively identity and remove speckle noises and unevenly stripes using sparse mat-  Table 2. Comparison of "direct method", "cloud mask method" and our framework in 83 cloudy images. (Accuracy unit: pixel). Max dx, dy and ds are the max errors among all images in x-direction, y-direction and plane respectively, while Min dx, dy and ds are the opposite. Var. means the variance. Increase PCT1,PCT2,PCT3 are the percentage increase in "cloud mask method" relative to the "direct method",our method relative to the "cloud mask method", and our method relative to the "direct method" respectively.  Table 3. Comparison of "direct method" and our framework in 19 cloud-free images. (Accuracy unit: pixel). Max dx, dy and ds are the max errors among all images in x-direction, y-direction and plane respectively, while Min dx, dy and ds are the opposite. Var. means the variance. Increase PCT is the percentage increase in our framework relative to the "direct method".
rix, while keeping spectral signatures of ground objects. And inpainting corrupted or damaged parts by low-rank component of that band.Some results are shown in Fig.11.

CONCLUSION
In this paper, we analyzed the influence of moving clouds on multi/hyperspectral band registration and proposed a novel coarse-to-fine framework, which achieves high-accuracy in cloudy images. In the coarse stage, a registration is perfumed based on generated cloud mask and feature matching. In the refinement stage, the low-rank analysis and RASL method are used to optimize the rank of coarse results for fine registration results. After a large number of experiments, our method can achieve a good accuracy in both cloudy and cloud-free images, which can meet the needs of subsequent applications. In the future, data from other hyperspectral satellites can also be used for testing. In addition, how to give a more accurate initial value for optimization is also an open problem needs to be studied.