MOVING OBJECT DETECTION METHOD OF VIDEO SATELLITE BASED ON TRACKING CORRECTION DETECTION

How to achieve high-precision, real-time dynamic monitoring and tracking of moving targets with video satellites is a focus of current research, because video satellites provide instantaneous and continuous dynamic observation of targets in a given area. Existing detection and tracking methods for moving objects suffer from false detections and missed detections, which reduce the accuracy of moving object detection. In this paper, a Tracking Correction Detection (TCD) method is proposed to solve these problems. First, a background model is established using an improved ViBe target detection algorithm, and the moving target mask is obtained by adaptive threshold calculation. Using a pyramid-structure iterative algorithm, each moving object is then classified as noise or a real object according to the set of detection results from different detection windows. A high-order correlation vector tracking method is used to correct the detection result of the moving target acquired in the previous frame, which finally yields an accurate detection result. A comparison of the frame difference (FD), GMM, ViBe and TCD methods shows that the TCD method is more robust to noise, illumination changes and dynamic background changes, produces more complete detection results, and has better real-time performance. The target detection accuracy of the TCD method reaches 85%, which gives it high engineering application value.


INTRODUCTION
Video satellites are one of the hot spots in the development of remote sensing satellites. Through continuous imaging of targets over a period of time, they enable dynamic, real-time monitoring of hotspots and targets. With the development of video satellite technology, dozens of low Earth orbit (LEO) small satellites have been launched worldwide for video imaging, such as South Africa's SumbandilaSat video satellite (Triharjanto, R. H et al., 2007), the United States' SkySat-1 and SkySat-2 satellites (Ao W et al., 2019), and China's "Tian Tuo-3" test satellite and "Jilin-1" commercial satellite (Chen L et al., 2020). The "Jilin-1" commercial satellite system consists of an optical remote sensing satellite, two video satellites and a technical verification satellite, as well as the "Jilin-1" video 03, 04, 05, 06 and 07 satellites, with a spatial resolution of 0.92 m and a coverage of 19 km × 4.5 km. A key issue in remote sensing and computer vision is how to make full use of these remote sensing video data and effectively detect important targets in the Earth observation field of view.
At present, the commonly used moving object detection methods include the frame difference (FD) method, background difference, optical flow, the Gaussian mixture model (GMM) and the ViBe algorithm. The FD method is simple and fast to compute and is robust in scenes containing moving targets. However, it is better suited to simple motion scenes, because "holes" tend to appear inside the detected target (Ju J et al., 2019). The background difference method is the most widely used target detection method. Its algorithm is relatively simple, its computational cost is small, and it can detect targets that are stationary for a short time. However, it is not suitable for complex backgrounds because it is sensitive to changes in the external environment (Nisarg Shah et al., 2018).
The optical flow method can accurately calculate the velocity of a moving target and handle target rotation, but it struggles to meet real-time requirements because its computation is highly complex and it is sensitive to noise (Jing Bai et al., 2018). The GMM usually adapts well to complex scenes, but it is computationally expensive, inefficient and sensitive to illumination (Charles-Alban Deledalle et al., 2018). The ViBe algorithm is robust and has high detection precision, but in complex scenes the detected moving object may contain many holes or be broken in the middle, and ghosts may appear while the background samples are being built. On this basis, track-before-detect (TBD) methods have been proposed for moving object detection, mainly including dynamic programming track before detect (DP-TBD) (Meng N et al., 2019), particle filter track before detect (PF-TBD) and Hough transform track before detect (HT-TBD) (Meiqin Liu et al., 2017; Liu, Y. F et al., 2017). These methods greatly improve the detection precision of moving targets but lack robustness and real-time performance. In recent years, second-order relationships between adjacent targets have been established to solve the target tracking association problem, for example the MCMC data association method, the network flow method (Azadi Moghaddam Arani, A et al., 2019), the K shortest paths (KSP) method and the minimum clique graph optimization method (Yue H et al., 2019; Porretta, L et al., 2019). However, because these methods consider only second-order relationships between targets, they are not robust to nonlinear motion in dense scenes or to frequent occlusion. To address these limitations, a video satellite moving target detection method based on tracking correction detection is proposed in this paper.
First, a contrast gain algorithm is used to pre-process the video data and increase the difference between the moving target and the background, which facilitates later detection of the moving target. Next, the improved ViBe algorithm is used to obtain the moving object mask. To effectively remove the large number of holes in the moving object, an iterative algorithm based on a pyramid structure (PS) is proposed, which takes the centre of the foreground object as its centre and analyzes the position information of the object in the original image through detection grids at different scales. Using the Sobel derivative to detect edges, the moving object is classified as noise or a real object according to the set of detection results from the different detection windows. Because moving target tracking is prone to errors and misses, the high-order vector association method is used to measure the similarity of target trajectories generated in multiple time domains and to associate trajectories correctly by distinguishing targets of similar appearance. Finally, through tracking after detection, the targets detected by ViBe are further corrected to improve the detection accuracy of the moving target.

Data preprocessing
The data captured by the video satellite are color images. In moving object detection, color images are processed less efficiently than grayscale images. It is therefore necessary to distinguish the image frames in the video stream, separating bright frames from dark frames, to improve target detection efficiency. In this paper, the average grayscale value of each frame is obtained by weighting its three RGB channels, so that bright and dark frames can be separated. The grayscale value of the image frame at the point (x, y) is:
g(x, y) = α gR(x, y) + β gG(x, y) + γ gB(x, y)    (1)
where gR(x, y), gG(x, y) and gB(x, y) are the R, G and B components of the image at position (x, y), and α, β and γ (0 < α, β, γ < 1 and α + β + γ = 1) are the proportions of the three channels. The three parameter values can be adjusted to suit images in different color spaces.
Because the color information of moving targets extracted by video satellites is severely scattered under limited optical resolution, the targets appear as white spots in video satellite images, while the background information is weak. In this paper, a frame enhancement method based on the Non-subsampled Shearlet Transform (NSST) and adaptive parameter guided filtering (Sharma et al., 2018) is used. First, the noise in the high-frequency components of each frame is filtered by a nonlinear transformation to enhance the texture and edges of the frame. Next, the guided filter is used to enhance the low-frequency component, which improves the sharpness and contrast of the image frame. Finally, the frame enhancement result is obtained by the inverse NSST, which increases the difference between the moving target and the background and facilitates later detection of the moving target. The specific solution method is shown in formula (2).
In formula (2), RE is the enhanced image and εE is the enhancement parameter: the larger its value, the clearer the details of the enhanced image, although the noise is amplified at the same time. g is the image to be filtered, and I is the output image of the filter.
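As a minimal sketch of the channel weighting in formula (1), the following Python snippet computes the weighted grayscale value of a frame and uses the frame mean to separate bright from dark frames. The default weights (0.299, 0.587, 0.114), the function names and the brightness threshold of 128 are illustrative assumptions and are not taken from this paper.

```python
import numpy as np

def weighted_gray(frame_rgb, alpha=0.299, beta=0.587, gamma=0.114):
    """Weighted grayscale g = alpha*R + beta*G + gamma*B, with weights summing to 1.

    The default weights are an assumption (a common luma weighting), not the
    values used by the authors.
    """
    if not np.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("alpha + beta + gamma must equal 1")
    rgb = frame_rgb.astype(np.float64)
    return alpha * rgb[..., 0] + beta * rgb[..., 1] + gamma * rgb[..., 2]

def is_bright_frame(frame_rgb, threshold=128.0):
    """Hypothetical bright/dark split based on the mean grayscale value."""
    return float(weighted_gray(frame_rgb).mean()) >= threshold

# Small synthetic frame with constant color (R=200, G=100, B=50).
frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[..., 0] = 200
frame[..., 1] = 100
frame[..., 2] = 50
gray = weighted_gray(frame)
bright = is_bright_frame(frame)
```

Changing α, β and γ only requires passing different keyword arguments, which mirrors the adjustment to different color spaces described above.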

Improved pyramid iterative moving target detection algorithm based on ViBe
The ViBe detection algorithm, proposed by Barnich et al. in 2011, is a fast pixel-level background modeling and foreground detection algorithm (Chen R et al., 2019). Its main steps are background model initialization, foreground detection and background model updating. ViBe initializes the background samples with a random selection strategy, compares the current pixel with its background samples to decide whether the pixel belongs to the foreground or the background, and updates the background samples with random update and neighborhood diffusion mechanisms.
(1) Background model initialization. The background model is initialized by randomly selecting the neighborhood pixel values from the first frame of the video sequence. According to the size of the image, a sample set is established for each pixel of the background model.
(2) Foreground detection. Each pixel of the i-th frame is processed to determine whether it belongs to the background. The decision uses dis(fi(x, y), vj), the Euclidean distance between the pixel value fi(x, y) of the i-th frame at location (x, y) and the background sample vj, together with the radius threshold R.
(3) Background model updating. When the pixel (x, y) is a background pixel, the background model is updated. The update consists of a sample set update and a neighborhood update. In the sample set update, a randomly selected sample in the pixel's own background sample set is replaced with the current pixel value. In the neighborhood update, a location is randomly selected in the neighborhood of the pixel, and a random sample in the background sample set of that location is replaced with the current pixel value.
Because the moving target detected by the ViBe algorithm contains many holes, erosion and dilation operations are used to remove noise from the detected moving object and to close the object. An iterative algorithm based on a pyramid structure (PSIA) is proposed to analyze the position information of the object in the original image, using detection grids of different scales centered on the foreground target center. By using the Sobel derivative to detect edges (Kun Zhang et al., 2018), the combined detection results of the detection windows at different scales determine whether the moving object is noise or a real object. The PSIA first establishes a set of detection templates for each moving target, denoted Tpl. Taking the currently detected moving target point as the center, the N pixels of its neighborhood are selected as the detection template. The moving target is then detected by examining the neighborhood of each target point.
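The three ViBe steps described above (initialization, foreground detection and background updating) can be sketched in a few lines of NumPy. This is a simplified illustration of the standard ViBe scheme on grayscale frames, where the Euclidean distance reduces to an absolute difference; it is not the improved variant proposed in this paper. The parameter values (20 samples, radius 20, 2 required matches, update factor 16) and all function names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_model(first_frame, n_samples=20):
    """Fill each pixel's sample set with random pixels from its 3 x 3 neighborhood."""
    h, w = first_frame.shape
    padded = np.pad(first_frame, 1, mode="edge")
    ys, xs = np.mgrid[0:h, 0:w]
    samples = np.empty((n_samples, h, w), dtype=np.float64)
    for k in range(n_samples):
        dy = rng.integers(-1, 2, size=(h, w))
        dx = rng.integers(-1, 2, size=(h, w))
        samples[k] = padded[ys + 1 + dy, xs + 1 + dx]
    return samples

def classify(frame, samples, radius=20.0, min_matches=2):
    """Return True where at least min_matches samples lie within radius (background)."""
    close = np.abs(samples - frame[None]) < radius
    return close.sum(axis=0) >= min_matches

def update(frame, samples, background, subsample=16):
    """Randomly refresh the sample sets of background pixels and of one neighbor."""
    n, h, w = samples.shape
    for y, x in zip(*np.nonzero(background)):
        if rng.integers(subsample) == 0:
            # Sample set update: overwrite a random sample of this pixel.
            samples[rng.integers(n), y, x] = frame[y, x]
        if rng.integers(subsample) == 0:
            # Neighborhood update: overwrite a random sample of a nearby pixel.
            ny = min(max(y + rng.integers(-1, 2), 0), h - 1)
            nx = min(max(x + rng.integers(-1, 2), 0), w - 1)
            samples[rng.integers(n), ny, nx] = frame[y, x]

# Flat background, then a 2 x 2 bright object appears in the second frame.
first = np.full((8, 8), 100.0)
model = init_model(first)
second = first.copy()
second[3:5, 3:5] = 200.0
bg = classify(second, model)
fg = ~bg
update(second, model, bg)
```

Only pixels classified as background feed the update, so the bright object in this example never contaminates the model.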
To keep the algorithm sensitive to moving objects of different sizes, a neighborhood weight coefficient is set, which determines the weight of each neighborhood according to its size, and a detection pyramid is constructed.
The most important part of pyramid construction is establishing the neighborhood, which usually adopts either the average neighborhood method or the random neighborhood method. Suppose a circle of radius R is drawn around an assumed target point P. The largest square inscribed in the circle is then drawn, and all points that fall on the boundary of the square are said to form the same neighborhood set θ. The length of a set is the radius of the circle, and the size of a set is the number of elements it contains. The center of the circle is the center of the neighborhood, and the length of the center set is 0. The size of a set is related to its radius R. Each pixel in Figure 1 is represented by a square. There are three sets in total. The center set θ0 has a length of 0, and its size L0 is set to 1. The other two sets, θ1 and θ2, have lengths R1 = 1 and R2 = 2, and their sizes L1 and L2 are set to 8 and 25, respectively.

Figure 1. Same neighborhood sets
Both the mean neighborhood method and the random neighborhood method are based on the same neighborhood set, whose total probability is 1. In the mean neighborhood, every element has the same probability, whereas in the random neighborhood the elements may have different probabilities. As shown in Figure 2, all detection units of the same neighborhood in panel (1) have the same probability, while the detection units in panel (2) do not necessarily have the same probability.

Figure 2. Two kinds of neighborhood

Because the location and size of the moving object in the image are uncertain, the relationship between the moving object and the neighborhood origin point falls into one of three cases: exclusion, inclusion and true inclusion.

Figure 3. Relationship between moving object and neighborhood origin point. (a) exclusion, (b) inclusion, (c) true inclusion

Exclusion means that the detection template and the moving target template intersect only partly, with only a little overlap between the moving target and the detection template. Let ϕT denote the intersection of the moving object detection template and the neighborhood. If the intersection is smaller than a given threshold, the moving object and the detection template overlap too little, and the detection template is regarded as invalid, meaning that it does not take part in the moving target calculation. Conversely, if the moving target and its neighborhood overlap to a larger degree, the template is considered credible and is used to calculate and detect the moving target. True inclusion is a special case of inclusion (namely card(ϕT) ≤ card(Mtpl)), in which the detection template can be directly judged to be a moving target.
When the moving target is large, such as an aircraft or an aircraft carrier, the fixed detection grid may not contain the moving target completely. When the target movement is not obvious, the target may be misjudged as static, which causes a false or missed detection. Therefore, grids of different sizes are set up to accommodate moving objects of different sizes, and a queue Qk(i, j) is taken as the status queue of the detection results of the points in the k-th level image.
Here M is the grid size, k is a constant (k > 1) and W is the base weight, which is generally set to 1. When a point x(i, j) is detected as a moving target in the k-th grid, a weight is added to the queue. Formula (6) shows that, within a certain range, the detection weight W increases as the detection grid M grows. Once M exceeds this range (namely M ≥ Mk), the grid complexity increases and the weight W becomes smaller. In this case, the algorithm degrades to a pure ViBe detection algorithm, which reduces the risk of misidentifying large buildings as moving targets because of background motion. After the statistics of the grid detection results are complete, the results are judged: when the state value of a point is greater than a given threshold, the point is considered foreground; otherwise it is considered background.
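The multi-scale voting described above can be illustrated with a small sketch: each candidate pixel is tested in windows of several sizes, every positive test adds a weight to the pixel's state value, and the state value is finally compared with a threshold. The per-scale weights are passed in as plain inputs rather than computed from formula (6), and the window test used here (a minimum foreground fraction, `min_fraction`) is a hypothetical stand-in for the template and Sobel edge checks of the paper. All names and values are assumptions.

```python
import numpy as np

def window_hit(mask, y, x, size, min_fraction=0.3):
    """True if the size x size window centered on (y, x) holds enough foreground."""
    half = size // 2
    win = mask[max(y - half, 0): y + half + 1, max(x - half, 0): x + half + 1]
    return win.mean() >= min_fraction

def pyramid_vote(mask, grid_sizes, weights, threshold):
    """Accumulate a weighted state value per candidate pixel and threshold it."""
    state = np.zeros(mask.shape, dtype=np.float64)
    for size, weight in zip(grid_sizes, weights):
        for y, x in zip(*np.nonzero(mask)):
            if window_hit(mask, y, x, size):
                state[y, x] += weight
    return state > threshold, state

# A 3 x 3 blob (real object) and one isolated pixel (noise).
mask = np.zeros((15, 15), dtype=bool)
mask[6:9, 6:9] = True
mask[1, 12] = True

kept, state = pyramid_vote(mask, grid_sizes=(3, 5, 7), weights=(1.0, 2.0, 3.0),
                           threshold=2.5)
```

In this toy case the blob pixels collect votes at the two smaller scales and survive, while the isolated pixel collects none and is rejected as noise.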
Because the detected target contains many holes and much noise, morphological operations are applied to the final result, using a morphological template of size 3 to dilate the image.
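A minimal sketch of this morphological step is given below, assuming a binary mask and a 3 × 3 square structuring element. It implements dilation, erosion and their combination (closing) directly in NumPy so that the example is self-contained; the function names are illustrative, and an image-processing library would normally be used instead.

```python
import numpy as np

def dilate3(mask):
    """Binary dilation with a 3 x 3 square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode3(mask):
    """Binary erosion with a 3 x 3 square (pixels outside the image count as set)."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=True)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def close3(mask):
    """Closing = dilation followed by erosion; fills small holes."""
    return erode3(dilate3(mask))

# A 5 x 5 object with a one-pixel hole in the middle.
blob = np.zeros((9, 9), dtype=bool)
blob[2:7, 2:7] = True
blob[4, 4] = False
closed = close3(blob)
```

Dilation alone would also fill the hole but would enlarge the object by one pixel on each side; the subsequent erosion restores the original outline.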
According to its principle, the ViBe algorithm produces ghosts of moving objects in the stable stage. Although moving objects are small in video satellite imagery, large moving objects such as airplanes, warships and rockets still produce ghosts. In addition, because the working environment of the video satellite is complex and unpredictable, phenomena such as specular reflection from high-rise buildings and reflection from lakes cause false detections. A pixel memory mechanism is therefore introduced, which keeps a memory for each pixel. If a pixel exists only for a limited time, it is deleted, so that ghosting and glare are eliminated quickly.

High-order correlation multi-target tracking technique
Because every frame acquired by the video satellite contains many targets, targets often cross the image boundary or are occluded. To distinguish each target from the others and track its trajectory across frames, all targets in each frame need to be associated with the corresponding targets in the preceding and subsequent frames. A high-order vector correlation is used to realize moving target tracking in this work. Figure 4 shows the relationship between the moving target video frame and the moving target vector. The moving object vector consists of the components center, size, speed and archX, where center is the position of the center of mass of the moving object, size is the minimum bounding rectangle of the moving object, speed is the velocity of the moving object, and archX is the angle between the moving direction and the x-axis (the increasing direction of the image column). The center of mass is computed from P(x, y), the gray value of the target point, over Ω, the area of the moving target.
In some special situations, the centroid of the moving object deviates too far from the center of its bounding rectangle and is then not suitable as the tracking point. Therefore, the distance between the center of mass of the moving object and the center of its minimum bounding rectangle is taken into account, and a tracking point is selected by setting a threshold on this distance. The direction of the moving object trajectory is determined from the angle between the moving direction of the object and the x-axis. The trajectory of the moving object is then calculated from the velocity of the moving object and this angle.
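The components of the moving object vector can be illustrated with the following sketch. It assumes a gray-value-weighted centroid over the target region, which is one common way to compute a center of mass from P(x, y) and Ω, and it falls back to the bounding-rectangle center when the two points are further apart than a threshold. The function names, the threshold `max_offset` and the unit time step are assumptions for illustration. Angles are measured in image coordinates, where y increases downward.

```python
import math
import numpy as np

def gray_centroid(gray, mask):
    """Gray-value-weighted centroid (x, y) of the target region."""
    ys, xs = np.nonzero(mask)
    w = gray[ys, xs].astype(np.float64)
    return float((xs * w).sum() / w.sum()), float((ys * w).sum() / w.sum())

def bbox_center(mask):
    """Center (x, y) of the minimum axis-aligned bounding rectangle."""
    ys, xs = np.nonzero(mask)
    return float(xs.min() + xs.max()) / 2.0, float(ys.min() + ys.max()) / 2.0

def track_point(gray, mask, max_offset=2.0):
    """Use the centroid unless it lies too far from the bounding-rectangle center."""
    c, b = gray_centroid(gray, mask), bbox_center(mask)
    return c if math.dist(c, b) <= max_offset else b

def motion(prev_pt, cur_pt, dt=1.0):
    """Speed (pixels per unit time) and angle to the x-axis in degrees."""
    dx, dy = cur_pt[0] - prev_pt[0], cur_pt[1] - prev_pt[1]
    return math.hypot(dx, dy) / dt, math.degrees(math.atan2(dy, dx))

# Uniformly bright 3 x 3 target: centroid and bounding-rectangle center coincide.
gray = np.ones((10, 10))
mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 3:6] = True
point = track_point(gray, mask)

# Displacement of (3, 4) pixels between two frames.
speed, angle = motion((4.0, 3.0), (7.0, 7.0))
```

The tuple (point, bounding rectangle, speed, angle) then plays the role of the center, size, speed and archX components described above.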
Because a target is easily lost when it is occluded, the moving target memory algorithm (J. Zhong et al., 2018) is used to make a combined judgement based on the position, grayscale, area, shape and other attributes of the moving target. The position of the target is predicted while it is occluded, and tracking continues when the target reappears.
In this paper, the moving target is tracked to further correct the detection results. The detection result A is intersected with the set B of targets that can be tracked normally, which improves the detection precision of the moving target.
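One possible reading of the intersection of the detection result A with the tracked targets B is sketched below: a detection is kept only if it overlaps some normally tracked target. Representing targets as boxes (x0, y0, x1, y1) and matching them by intersection over union with a threshold `min_iou` are assumptions made for this example; the paper does not specify the matching criterion.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def correct_detections(detections, tracked, min_iou=0.5):
    """Keep only detections (set A) that overlap a tracked target (set B)."""
    return [d for d in detections if any(iou(d, t) >= min_iou for t in tracked)]

detections = [(10, 10, 14, 14), (40, 40, 42, 42)]  # second box has no track
tracked = [(11, 10, 15, 14)]
corrected = correct_detections(detections, tracked)
```

Detections without a supporting track, such as isolated noise responses, are discarded by this step.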

ANALYSIS OF EXPERIMENTAL RESULTS
In this paper, three sets of data are used for experimental verification. The experiments were run on Ubuntu 16.04 with 16 GB of memory, and the algorithms were developed with OpenCV 3.4.3. GMM, FD and ViBe are compared with the TCD method proposed in this paper. Precision, recall and F-Score are used to evaluate the accuracy of moving target detection. Precision is the proportion of detected foreground pixels that are truly foreground; recall is the proportion of all positive samples in the test set that are correctly identified as positive.
The F-Score is a combined measure of precision and recall. α is a parameter that balances the weights of recall and precision. In the experiments, α is set to 1, because precision and recall are treated as equally important.
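For reference, the three measures can be computed from the counts of true positives (TP), false positives (FP) and false negatives (FN) as in the sketch below. The weighted form F = (1 + α²)·P·R / (α²·P + R) is a commonly used definition and is an assumption here; with α = 1 it reduces to the harmonic mean 2PR / (P + R). The counts in the example are made up for illustration and are not experimental results.

```python
def detection_scores(tp, fp, fn, alpha=1.0):
    """Return (precision, recall, F-score) from pixel or object counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = alpha ** 2 * precision + recall
    f_score = (1 + alpha ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_score

# Illustrative counts only: 90 correct detections, 10 false alarms, 30 misses.
precision, recall, f_score = detection_scores(tp=90, fp=10, fn=30)
```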

Experiment 1
The SkySat-1 data are used to verify the accuracy of the algorithm. SkySat-1 is a video staring satellite launched by Skybox Imaging in 2013. It is the world's first low Earth orbit video staring satellite, with a resolution of 1.1 m and a swath of 2.0 km × 1.1 km, and it supports staring at the ground for 90 seconds. The study area is located at Burj Khalifa, and the images were taken on April 9, 2014. To simplify the statistics, five random numbers are first generated, the five adjacent frames at each are taken as a set, and the detection results are then collected for statistics. In this paper, the base weight W is set to 10, and the range of the detection grid M is {9, 49, 91, 121, 196}. Each time, a detection grid size is drawn from this range and added to the detection state matrix. The neighborhood maximum iteration value is set equal to M (namely M = Mk). The confidence threshold is set to 80 (namely Tc = 80). The results of the four moving target detection algorithms, FD, GMM, ViBe and TCD, are shown in Figure 5. Because of the high resolution of the satellite, the movement of both the target and the terrain background is obvious. The accuracy evaluation method is used to analyze and verify the results of these algorithms (Table 1). The results show that the FD method has the highest false detection rate. The detection result of the ViBe method is better than those of GMM and FD. The precision and recall of the TCD algorithm proposed in this paper are the highest.

Experiment 2
The second set of data is from the Jilin-1 satellite, which was launched by China Changguang Satellite Company on October 7, 2015. The Jilin-1 commercial satellite system operates in a sun-synchronous orbit at an altitude of about 650 km and comprises an optical remote sensing satellite, two smart imaging video satellites with a ground pixel resolution of 1.12 m, and a technology validation satellite. The Jilin-1 satellite has a video coverage of 4.6 km × 3.4 km, a duration of 90 seconds and a frame rate of 25 FPS. The study area chosen for this work is a flat area located in Tianjin, China, and the images were taken on April 23, 2017. The results of the four moving target detection algorithms, FD, GMM, ViBe and TCD, are shown in Figure 6. The accuracy evaluation method is used to analyze and verify the results of these algorithms (Table 2). Because the moving targets in these data have weak intensity, the detection results of the four methods are not very robust. However, the TCD algorithm improves the precision to about 90% with a recall of not less than 70%, both of which are higher than those of the other three algorithms. In addition, the FD algorithm also performs well in flat areas, because the error caused by background motion is relatively small.

Experiment 3
The third set of data is from the UrtheCast satellite, which was launched by UrtheCast Company, Canada in January 2014. The satellite has a spatial resolution of 1.0 m, a video coverage of 5.0 km × 3.4 km, a video duration of 90 seconds and a frame rate of 3 FPS, and it records 1080P full HD color video. The results of the four moving target detection algorithms, FD, GMM, ViBe and TCD, are shown in Figure 7. The accuracy evaluation method is used to analyze and verify the results of these algorithms (Table 3). The results show that the precision, recall and F-Score of the TCD algorithm all outperform those of the other three methods.
From the above three sets of experimental results, it can be seen that the ViBe algorithm has very good detection precision, but its missed detection rate is too high and its recall is very low. The F-Score shows that the traditional ViBe algorithm is not as good as the FD method. The GMM has a relatively good detection effect: its detection precision is higher and its recall is relatively stable. The TCD algorithm greatly reduces the missed detection rate of weak targets and improves the detection result through multiple overlapping detections of the detection area. Hence, the TCD algorithm shows clear advantages over the other three algorithms in these experiments.

CONCLUSIONS
Building on traditional methods, a moving object detection method for video satellites based on tracking correction detection was proposed in this paper. Preprocessing by image contrast enhancement increases the difference between the moving target and the background, which facilitates later detection of the moving target. Because the ViBe algorithm detects moving targets incompletely, an iterative algorithm based on a pyramid structure is proposed. Taking the center of the foreground object as the center, the position information of the object in the original image is analyzed through detection grids at different scales. According to the set of detection results from the different detection windows, the moving object is identified as noise or a real object. Because moving target tracking is prone to errors and misses, the high-order vector association method is used to measure the similarity of target trajectories generated in multiple time domains and to associate trajectories correctly by distinguishing targets of similar appearance. Finally, through tracking after detection, the targets detected by ViBe are further corrected to improve the detection accuracy of the moving target.