DECISION-BASED FUSION OF PANSHARPENED VHR SATELLITE IMAGES USING TWO-LEVEL ROLLING SELF-GUIDANCE FILTERING AND EDGE INFORMATION

: Pan-sharpening (PS) fuses low-resolution multispectral (LR MS) images with high-resolution panchromatic (HR PAN) bands to produce HR MS data. Current PS methods either better maintain the spectral information of MS images, or better transfer the PAN spatial details to the MS bands. In this study, we propose a decision-based fusion method that integrates two basic pan-sharpened very-high-resolution (VHR) satellite imageries taking advantage of both images simultaneously. It uses two-level rolling self-guidance filtering (RSGF) and Canny edge detection. The method is tested on Worldview (WV)-2 and WV-4 VHR satellite images on the San Fransisco and New York areas, using four PS algorithms. Results indicate that the proposed method increased the overall spectral-spatial quality of the base pan-sharpened images by 7.2% and 9.8% for the San Fransisco and New York areas, respectively. Our method therefore effectively addresses decision-level fusion of different base pan-sharpened images.


INTRODUCTION
Earth observation (EO) satellites such as WorldView, QuickBird and Gaofen, provide a low-spatial-resolution multispectral (MS) image of rich spectral content and a single-band high-spatialresolution panchromatic (PAN) image without considerable spectral information.Fusing the MS and PAN images produces a high-resolution MS (called pan-sharpened) image of a relatively high spectral information content.
Several image fusion methods (also called pan-sharpening algorithms) have been proposed to integrate the above-mentioned images (Snehmani et al., 2017;Javan et al., 2021;Peijuan et al., 2021).Generally, every PS process can be performed on three levels: at the pixel level, the feature level, or the decision level (Belgiu and Stein, 2019).PS methods can be classified into five major categories of component substitutions (CS)-based methods, multi-resolution analysis (MRA)-based methods, variational optimization (VO)-based methods, hybrid methods and deep learning (DL)-based methods (Javan et al., 2021).Methods from each category have their strengths and weaknesses: some are good at preserving the spectral information of the MS image, while others focus more on transferring the spatial information of the PAN band to the MS image.In the first case, the pan-sharpened image largely has the spectral and color characteristics of the MS image, while in the latter case it may be more similar to the PAN image in terms of spatial features, for instance, sharpness of edges and spatial patterns.
Fusion of two or more pan-sharpened images being produced using different PS methods can be a good solution to enable us to take advantage of each base method, i.e. to enjoy the strength of each algorithm and on the other hand avoid their weaknesses.In this regard, Lou et al. (2013) purposed a decision-based method for the fusion of two pan-sharpened remotely sensing images.Their method was based on image segmentation and then locally selecting the best pan-sharpened image.Due to the segmentation * Corresponding author errors, their algorithms cannot always produce high-quality pansharpened images.Based on the object-based spectral quality assessment (Samadzadegan and Javan, 2011;Rodríguez-Esparragón et al., 2017;Toosi et al., 2020a) we proposed a decision-based fusion of multiple pan-sharpened images based on image multi-resolution segmentation and classification and showed that our new combined fusion strategy can increase the spectral quality of the initial pan-sharpened images by about 37%, without image spatial distortions and spectral artifacts (Toosi et al., 2020b).
Nowadays this kind of very-high-resolution (VHR) pansharpened images are widely used in applications where both spatial resolution and spectral information are required simultaneously (Mehravar et al., 2022).Urban mapping and land use/land cover (LULC) classification are two topics that require "high spatiality" and "high spectrality" information for these data (Javan et al., 2019;Gaur et al., 2015).Due to the wide range of applications more endeavors are needed to be conducted to take the advantage of two or more pan-sharpened images as much as possible.To do this, inspired by the guided filtering-based common pan-sharpening method developed by Meng et al. (2016), we propose a new method for decision-level fusion of two pan-sharpened images (produced by the means of two different basic PS algorithms) based on the edge-preserving rolling self-guidance filtering (RSGF) paradigm and edge detection.
The remainder of this paper is organized as follows: Section 2 is devoted to describing the proposed methodology, describing the concepts of RSGF, and the dataset.The results of the implementation phase and some discussions related to the obtained results are presented in Section 3. Finally, some concluding remarks are drawn in Section 4.

Methodology
A diagram of the proposed method for decision-based fusion of two pan-sharpened images is presented in Figure 1.According to this flowchart, the initial PAN and MS images are fused using two different PS methods, called "Method 1" and "Method 2", while the output of the two PS methods are known as "PS 1" and "PS 2", respectively.
Next, the spectral and spatial qualities of the obtained pansharpened images are assessed compared to the initial MS and PAN images.The standard form of Wald's protocol (Wald et al., 1997) is commonly used to assess the spectral quality (Figure 2a).Based upon the mentioned standard scheme the initial MS and PAN data are downscaled (with a scale factor of 1/4) and resampled and then fused and the fused product is compared to the downscaled images.In our study, we modify Wald's protocol (Figure 2-b), due to our specific decision-fusion paradigm.In this scheme, the MS image is sharpened using the PAN band and the obtained pan-sharpened image is rescaled and resampled to the 4×4 downer scale.Finally, it is compared with the initial MS and PAN data (Ghassemian, 2016).
After the spectral and spatial quality assessment, the pansharpened output with a relatively better spectral quality is denoted as PSSpc while that with a relatively better spatial quality is denoted as PSSpt.Then the histogram of the target PSSpt image is matched to that of the reference PSSpc image, and the output is named PSSpt-HM.We chose this alternative in order to have more compatible outputs in the subsequent processing chain.We then minimize the objective function according to Eq. 1.

𝑗
(1) where M(Pj) is the optimal color mapping function that aims to minimize the distance (d) between the histogram sets of PSSpc and PSSpt, named  = {  } =1  and  = {} =1  , simultaneously.Thus, minimization of Θ leads to matching the histogram of PSSpt to that of PSSpc.
At the next step the obtained PSSpt-HM is decomposed into its principal component as PC1, PC2, …, PCn.In this way, a singleband image, i.e.PC1, is obtained by reducing the data dimensionality that contains the majority of spatial information of the PSSpt-HM image, with the lowest possible noise level.As an equation, the PCs are obtained as Eq. 2.
The Canny edge detector (Canny, 1986) is used to extract edges from the PC1 resulting into the extracted edge map E. In addition, the edge-preserving rolling self-guidance filter (RSGF) (Zhang et al., 2014) is applied to PC1.The obtained result is a base layer (B) that contains both low frequency and strong edge information (Meng et al., 2016).The base layer may contain a low variety of slight edges.By subtracting the PC1 and B, smoothed by a Gaussian low-pass filter (GLF), the detail layer (D) is obtained (Eq.3). =  1 −  *  (3) where  *  is referred to as the filtered base layer.
By applying multi-level thresholding (MLT) (Otsu, 1979) D is segmented into a desired detail mask (Dm) and an area containing no considerable details.Eq. 4 shows how this method looks for the optimum threshold to minimize the intra-class variance of image pixels.The overall spatial features of PC1 are extracted by integrating the edge map E with the detailed map Dm using the OR logical operation as Eq. 6.
= ||  (6) where || denotes the OR logical operator.Pixels that were recognized as edges in E or as details in Dm, will be present in the final SF image.
As in the VHR satellite image due to the high diversity of pixel values and high heterogeneousness of pixels around the edges and spatial details we dilate the obtained primary SF by morphological operations (Eq.7) to obtain neighbor pixels around the edge and details pixels: =   ⊕ SE (7) where and SE denote the morphological dilation operation and the desired structural element, respectively,   is the primary spatial features map and   the dilated map.For brevity,   is called the m1 mask and its complement representing the flat area the m2 mask ( 2 = 1 −  1 ).
The m1 and m2 masks can be exerted to the PSSpt-HM and PSSpc, respectively according to the linear composition equation of Eq. 8, providing us the final pan-sharpened image, denoted as PSfinal.
=  1 ×    +  2 ×   (8) Based upon the decision-based fusion paradigm of Eq. 8, PSfinal takes the pixels related to spatial features from all spectral bands of PSSpt-HM.This is because we consider the spatial characteristics of objects in these areas to be important and PSSpt-HM better performs from the spatial quality point of view.Similarly, the reminder pixels which are related to the flat area with no noticeable spatial features are captured from the PSSpc image and are assigned to PSfinal.
During the final phase of the proposed method, the quality of the obtained PSfinal is evaluated both subjectively and objectively.The subjective evaluation is done by the human (expert) visual system (HVS) while the latter is done using the quality assessment metric and according to the second form of Wald's protocol.

Rolling guidance filtering (RGF)
VHR satellite images contain a variety of spatial features including structures and object edges.Guidance filters as edgepreserving filters are utilized to perform edge-preserving smoothing on these kinds of images.This is usually conducted using the content of another image, called a guidance image, to influence the filtering process.Sometimes the target image can jointly be considered as the guidance image; in such a case the guidance filtering is called self-guidance filtering (Meng et al., 2016).
The edge-preserving guidance filtering involves an input image I, a guidance image G, and an output image IGF (which is assumed to be a linear transformation of G in a local kernel wk centered at pixel k (Eq.9).
=     +   , ∀ ∈   (9) where Gi and IGFi stand for the ith pixel of the guidance image and output images in kernel wk.Also, wk is a square kernel of size (2R+1)×(2R+1), where R is the radius of the kernel.Furthermore, as and bk are constant linear coefficients in wk and can be estimated by minimizing the squared difference between the input and output images (He et al., 2012;Gao et al., 2016).From a practical point of view, the guidance filtered images are usually estimated as Eq. 10.
=  ̅    +  ̅  (10) where Different types of guidance filters have been proposed so far which are used for PS tasks as well (Gao et al., 2016).One of the most effective methods was proposed by Zhang et al. (2014).They developed an iterative edge-preserving image guidance filtering method called rolling guidance filtering (RGF) that converges rapidly.This fast method is extensible to accommodate various types of image operations and achieves real-time performance and produces artifact-free results.

Data description
In this study, two VHR satellite imageries are used.Each dataset includes one PAN band and its corresponding MS images.The first dataset is a Worldview (WV)-2 image from the San Francisco-United States of America (USA) and the second one is a WV-4 from the New York (NY)-USA.The mentioned dataset are shown in Figure 3 in true color composite (TCC), i.e.RGB, and some pieces of information are provided in

State-of-the-art PS methods
In this study, four PS methods are used: the Ehlers (Klonus and Ehlers, 2009), Gram-Schmidt (GS) (Laben and Brower, 2000), Smoothing Filter-based Intensity Modulation (SFIM) (Liu, 2000), and Brovey (Gillespie et al., 1987) methods.A brief description of these methods is provided in Table 2.It is to some degree irrelevant which PS methods is used in this study because it does not make much of a difference when implementing our method.We tried as much as possible to choose methods that behave differently in terms of performance based on the results of our former meta-analysis study (Javan et al., 2021).

Ehlers
In this method (which is also known as Fast Fourier-Transform IHS-Transform) image fusion is conducted in the Fourier space of the PAN and MS images.
*MSMean: a mean image that is produced by an average filter.

RESULTS AND DISCUSSIONS
Ehlers and SFIM PS methods are applied to the San Francisco dataset and Brovy-GS pairs are used for PS of the NY images.It is noteworthy that as usually there is a ratio of 1:4 between the MS and PAN images so MS data is first upsampled to the resolution of the initial PAN band (↑  ̃, 4×4) and then  ̃ fused with the PAN information.Figure 4 presents the results of all PS methods in two case studies.Based on the second form of Wald's protocol the spectral and spatial quality assessment is conducted based on the multi-modal Spectral Angle Mapper (SAM) (Alparone et al., 2007) and the Spatial Correlation Coefficient (SCC) (Pushparaj and Hegde, 2017), respectively.These metrics as two of the most commonly used measures for spectral and spatial quality assessment are defined according to Eq. 13 and Eq.14, respectively.Where If and Ir referred to the fused and MS images, respectively.The bar symbol above some elements is related to the mean value of that element.Furthermore, M and N define the image dimensions.SAM values vary between 0 to 90 degrees where the SAM=0, as the ideal value, means that there is no spectral difference between the reference MS and pan-sharpened images.On the other hand, SCC measures the spatial similarity between the pan-sharpened image and the reference PAN band.The ideal SCC value is +1 where the spatial similarity is maximum and the value -1 refers to the minimum possible spatial similarity.
In the NY case, the Ehlers method has relatively better SCC while SFIM has a relatively better spectral performance so Ehlers and SFIM are known as PSSpt and PSSpc, respectively.In the case of the San Francisco dataset, GS produced a pan-sharpened image with relatively better SCC, and on the other hand, SAM for the Brovey method was closer to the ideal value, so here GS and Brovey are labeled as PSSpt and PSSpc, respectively.The results of the quality assessment are provided in During the next step, the histogram of Ehlers and GS pansharpened images are matched to that of SFIM and Brovey, respectively.Then they are decomposed to their principal components using the PCA that their first principal components (PC1) containing the maximum spatial information and the least possible noise are shown in Figure 5.The obtained PC1s are now ready for extraction of some important spatial information.First, the edges are extracted from both PC1s using the Canny operator (Figure 6).  = 4,   = 0.1,  = 4,  = 0.05.This led to the production of base layers (B) that contain both low-frequency information and strong edges (Figure 7).A Gaussian filter (with  = 5) was applied to the obtained base layer (B) and the filtered output is subtracted from the PC1s.This operation leads to the extraction of detail layers (D). Figure 8 presents the obtained detail layers for both case studies.MLT provides us with the optimum thresholds for segmenting the significant features.For the San Francisco and NY cases, the thresholds were estimated as 22.04 and 3.95, respectively.Figure 9 presents the mentioned thresholds (T) on the histograms of detail layers and also the obtained masks which are produced by applying T to the detail layers.The primary edge map (E) is integrated with the obtained detail mask (Dm) using the OR logical operation and then using a structural element with the size of 2, the results are dilated to contain the neighboring pixels around the edge, and details pixels as The final results which are called SFfinal-mask (or for brevity we call it m1) and its complement (m2) are shown in Figure 10.Note that m1 and m2 are masks that are related to the high-details and flat areas, respectively.
The linear combination of m1, m2, PSSpt-HM, and PSSpt (Eq.8) produces the final PS image in a decision-based fusion scheme.The obtained decision-fused PS image for both case studies are shown in Figure 11.Here the concept of our decision-based fusion is better found.The "decision-based fusion" term indicates that the pixels of two base pan-sharpened images are transferred to the final image with a decision that is made based on whether the pixels are part of the detailed class or part of the flat class.As the subjective (visual) spectral and spatial quality assessment of the results, Figure 12 provides a comparison between the initial pan-sharpened images and the final decisionbased fused images.By scrutinizing the results it can be seen that in the flat areas the final pan-sharpened outputs are similar to the PSSpc images while in the details area they inherit the characteristics of the PSSpt-HM.
Here the objective quality assessment is done using SAM and SCC metrics.For spatial quality assessment, according to the second scheme of the Wads' protocol, the obtained results are compared to the initial PAN images while for the spectral quality evaluation they are first downsampled to the resolution of initial MS bands, and then the evaluation is conducted between PSfinal and MS.By calculating the above-mentioned metrics it is seen that for the San Francisco case study the evaluation results are as follows: SAM=3.0201 and SCC=0.9700.Furthermore, for the NY case, the results are as: SAM=0.7770 and SCC=0.9641.By Where QIP Spc ,   , and  stands for the improvement of spectral quality, the improvement of spatial quality, and the overall improvement of joint spectral-spatial quality (all in percent unit) of the final pan-sharpened image in comparison to the initial pan-sharpened images, respectively.the tilts symbol (~) denotes the metric value for the final pan-sharpened image while the min and max are referred to as the minimum and maximum value of SAM or SCC for the primary pan-sharpened images, respectively.Since the performance of no two PS methods is the same, therefore our proposed method is applicable for fusing them at a decision level.The significant role of our proposed decisionfusion method is grasped when the two base PS methods have completely different performances in terms of preserving spectral information of the MS image and transmitting spatial information of the PAN band to the MS image.For better understanding, we consider the results of our meta-analysis on the performance of a variety of state-of-the-art PS methods (Javan et al., 2021).In the mentioned study a scatter plot has been produced in which the relative strength of different methods is illustrated in Figure 13.Our decision-level fusion method is so effective for the fusion of two pan-sharpened images that are located on the diagonal of this scatter plot (shown in yellow directions).As far as the methods located in these directions, our proposed method can produce more interesting results, as it can create new pan-sharpened data with both characteristics of the base images.

CONCLUSIONS
In the current study, we proposed a new decision-based fusion method for the integration of two pan-sharpened VHR satellite imageries.The developed method is based on the extraction of significant spatial features using edge-preserving rolling selfguidance filtering and the edge information of images.The performance of the proposed algorithm was tested using four state-of-the-art PS methods and two VHR satellite images.The results indicate that this decision-level fusion of pansharpened images can take the advantage of the base pansharpened images.It can effectively make a trade-off between the spectral and spatial quality of the base pan-sharpened images, i.e. it tries to reach the overall spectral-spatial quality of the final pansharpened image somewhere between the least and the most prior quality of the initial pan-sharpened images.
Future research can implement a decision-based fusion merging our method with the results of the object-level analysis of image spectral quality to propose a method that can decide both based on the spatial fidelity and the local spectral quality of the image.
The result of our method can effectively be used in basic remote sensing research as it can create new data that could not be produced by any single PS method.At the end of this paper, we hope that the proposed method will open a new horizon in remote sensing image fusion.

5 )Figure 1 .
Flowchart of the proposed decision-fusion methodwhere  0 2 () and  1 2 () are variances of the two segments, and  0 of the two segments separated by an optimum threshold (T).The vertical bar | indicates conditional equality, i.e. conditional on those pixels of D where the D>T.

Figure 3 .
Figure 3. Presentation of datasets: (a) Resized PAN and MS image from San Francisco, (b) Resized PAN and MS image from NY on component substitution of the first GS component (GS1) of the MS image and the PAN band.SFIM   =    * ×

Figure 12 .
Figure 12.Subjective evaluation of final pan-sharpened images obtained by the decision-fusion method: (a) San Francisco dataset, (b) NY dataset

Table 1 .
Table 1 related to them.Data descriptions

Table 3 ,
in which the superior methods are discriminated by bold font and the others are shown by the standard font.

Table 3 .
The results of spectral and spatial quality assessment