EFFECT OF KEYFRAMES EXTRACTION FROM THERMAL INFRARED VIDEO STREAM TO GENERATE DENSE POINT CLOUD OF THE BUILDING'S FACADE

ABSTRACT: Keyframes extraction is a necessary and effective step in the 3D reconstruction of objects from a thermal video sequence: it increases geometric accuracy, reduces the volume of aerial triangulation calculations, and supports the generation of the dense point cloud. The primary goal of this paper is to assess the effect of keyframes extraction from a thermal infrared video sequence on the geometric accuracy of the generated dense point cloud. The keyframes extraction method presented in this paper consists of three basic steps: (A) identifying and removing blur frames from the sequence of recorded frames; (B) applying the standard baseline condition between sequence frames to establish the overlap condition and prevent degeneracy conditions; and (C) evaluating degeneracy conditions and extracting keyframes using the Geometric Robust Information Criterion (GRIC). The performance of keyframes extraction is evaluated by the increase in density of the generated 3D point cloud and the reduction in reprojection error. Based on the results presented in this paper, using keyframes increases the density of the thermal infrared dense point cloud by about 0.03% to 0.10% of points per square meter and reduces the reprojection error by about 0.005% of pixels (2 times).


INTRODUCTION
Today, 3D information obtained from overlapping close-range images has a wide range of applications, including 3D modeling of urban environments (Bakogiannis, 2020), urban mapping and planning (Peleshko, 2020), virtual reality (Caciora, 2021), change detection (Han, 2021) and damage assessment (Chowdhury, 2020), architecture (Liu, 2021), and digital tourism (Poux, 2020). Owing to advancements in camera technology and image processing algorithms, Unmanned Aerial Vehicles (UAVs) have attracted the interest of researchers and practitioners in computer vision and photogrammetry in recent years. UAVs have become a standard and reliable platform for 3D model data collection due to advantages such as high maneuverability in urban environments, the ability to obtain images with high overlap and different viewing angles from a close distance to the object, and the ability to use image processing algorithms such as Structure from Motion (SfM) (Jarzabek-Rychard, 2016). However, the accuracy of the models generated by UAV photogrammetry depends on factors such as flight path design, capturing overlapped images with a standard baseline, avoiding dead areas, and having high contrast in the images (Koch, 2019; Motayyeb, 2022).
In addition to traditional imaging with metric and nonmetric cameras, video has been proposed as a source of overlapped images for photogrammetry applications. Because video recording captures information as a stream faster than still imaging, it yields a large volume of frames; this creates very high overlap between sequence frames and can reduce possible dead areas. Furthermore, still images may be blurred by the vibrations of the UAV platform, whereas in video recording the large number of frames allows this problem to be overcome by extracting the non-blur frames. However, the use of video frames in the 3D modeling of objects is associated with imaging geometry challenges and heavy computation. For example, a short baseline between two sequence frames leads to degeneracy conditions, in which the fundamental matrix cannot be estimated during the modeling process; and as the volume of images increases, it becomes difficult for the processing system to process them concurrently (Zhang, 2017). Therefore, extracting keyframes that represent all captured frames and overcome the issues mentioned above is critical in the 3D modeling process: to reconstruct the 3D model from video frames, keyframes must be extracted for accurate geometric estimation of the 3D model (Choi, 2016).
The remainder of this section reviews related work on keyframes extraction from a video frame sequence. Keyframes extraction has been investigated from two perspectives: radiometric and geometric. From a radiometric standpoint, the radiometric quality of each frame is evaluated, and low-quality or blurry frames are eliminated. Various studies and methodologies have been proposed to determine the degree of blur in frames (Ming-Chao, 1997; Frederic, 2002; Ong, 2003). The Blur Metric (BluM) is one of these (Crete, 2007). Its principle is to blur the original image and study the behavior of neighboring pixels. Using a threshold (obtained by averaging the BluM measurement across a set of high-quality frames), the blurred frames are eliminated (Crete, 2007). Another criterion for checking blur frames is SIEDS, an automatic approach based on quantifying the blurriness in the image (Sieberth, 2016). This method digitally processes images with a certain degree of blurriness to determine the quantitative degree of blurriness of the image.
From the geometric standpoint, frames with poor modeling geometry are identified and removed from the sequence of frames. Geometric keyframes extraction examines criteria such as the number of frames, the position of the extracted frames, their stability, and the baseline and overlap between the frames, in order to avoid degeneracy conditions and obtain epipolar geometry suitable for 3D reconstruction. Xie et al. (2015) proposed a hierarchical approach for keyframes extraction. To extract keyframes accurately, this method considers only one reference keyframe and several candidate frames adjacent to it as a sequence to calculate local correspondences. The keyframe extraction criterion is then calculated using the ratio of corresponding points and the Geometric Robust Information Criterion (GRIC) (Torr, 1998) (Xie, 2015). Hossein Pour et al. (2016) presented a method for keyframes extraction from video sequences that minimizes reprojection error. To avoid degeneracy conditions, the method removes blur frames, applies an overlap filter between frames, selects an appropriate baseline between two frames, and uses the GRIC criterion (Hossein Pour, 2016). Choi et al. (2016) presented a method for extracting frames containing useful information from a video captured by a handheld camera. Their approach combines extraction criteria based on determining the appropriate baseline between frames, frame skipping for fast search in the video, the GRIC criterion computed from the frame-by-frame homography and fundamental matrix, and the removal of blur frames (Choi, 2016). Zhang et al. (2017) presented a fast approach to keyframe extraction and an optimal matching method based on geometric constraints of the flight path and direction. This method is primarily intended to improve keyframe extraction efficiency and obtain more accurate corresponding points. A frame is extracted as a keyframe if, according to its GRIC value and its number of corresponding points, it meets the degeneracy and correspondence-ratio requirements (Zhang, 2017). Dadras Javan et al. (2019) used the BluM metric to assess radiometric quality and the method of (Seo, 2003; Seo, 2008) for geometric keyframe extraction from a sequence of thermal video frames (Dadras Javan, 2019). Azimi et al. (2022) proposed a keyframes extraction method that divides the angle between the normal to the surface and the observation vector of each point in each image into four distinct patches. The keyframe is chosen as the camera frame that covers most areas of all points (Azimi, 2022).
In this paper, a thermal infrared video sequence is used instead of the visible range to investigate the application of GRIC in keyframes extraction. Thermal infrared cameras are used because of the limitations of visible cameras in adverse weather and at night. The main disadvantages of thermal infrared images are their low spatial resolution and geometric accuracy. In other words, because of the relatively large pixel dimensions and short focal length of thermal infrared cameras, 3D models reconstructed from thermal infrared images have a low spatial resolution (Dadras Javan, 2019). One limitation of previous studies is determining the appropriate threshold to identify and remove blur frames, which is addressed in this paper's proposed method by transferring images to the frequency domain. In addition, because the video data used in this paper were recorded from building facades with very low texture, feature extraction and matching in thermal infrared frames are further challenges. In this regard, instead of the Kanade-Lucas-Tomasi (KLT) feature tracker used in other keyframes extraction methods, the proposed method uses the Scale-Invariant Feature Transform (SIFT) algorithm to extract and match key points (Suhr, 2009; Kumar, 2018; Wang, 2022). The steps of the keyframe extraction method presented in this paper are as follows: (1) recognizing and removing blur frames from the sequence of recorded thermal infrared video frames; (2) applying the standard baseline condition between sequence frames to establish the overlap condition and avoid degeneracy; and (3) evaluating degeneracy conditions and extracting keyframes using GRIC.
The second goal of this paper is to evaluate the keyframe extraction method by investigating its role in generating the dense point cloud of building facades. This goal is associated with three-dimensional modeling of buildings based on thermal infrared images to evaluate building thermal properties, heat loss, air leakage, and humidity (Kylili, 2014; Dahaghin, 2021). This paper is organized as follows: Section 2 presents the proposed method, and Section 3 discusses the implementation and the results of the proposed algorithm's evaluation. Section 4 presents conclusions and suggestions for future work.

PROPOSED METHOD
To improve the geometric accuracy and calculation speed of the thermal infrared dense point cloud, keyframes extraction from the video is required as a pre-processing step. The main goal of the paper is to investigate the effect of keyframes extraction from the thermal infrared video sequence on the geometric accuracy of the thermal infrared dense point cloud. Figure 1 depicts the proposed method for keyframes extraction from thermal infrared video. According to Figure 1, a geometric calibration of the nonmetric thermal infrared camera is performed first, to reduce the relative orientation and bundle adjustment errors and to generate the dense point cloud with optimal geometric accuracy. The amount of blurriness, the baseline and overlap, and the degeneracy conditions are then examined, and the keyframes are extracted. Finally, the reprojection error and the density of the thermal infrared point cloud are used to evaluate the effect of keyframes extraction from the thermal infrared video sequence. The proposed method's steps are outlined below.

Geometric Calibration of Thermal Infrared Camera
In this paper, the geometric calibration of the thermal infrared camera is carried out using a combination of photogrammetry and computer vision methods. The calibration pattern is a rectangular plate with hollow circles. Using a calibration pattern with circular targets, extracting the two-dimensional coordinates of the circle centers, relating them to the ground space, and estimating the calibration parameters yields acceptable results (Usamentiaga, 2017). Furthermore, because of the flexible geometry of the circle, the optimal ellipse can be identified in the images captured from the calibration pattern (Datta, 2009; Usamentiaga, 2017). The calibration pattern used in this paper is shown in Figure 2. Because of the low spatial resolution and low contrast of thermal infrared cameras, circular targets are captured as ellipses in the image; therefore, the Hough Transformation (Chia, 2007) algorithm was used to fit the ellipse targets and extract the exact two-dimensional coordinates of their focal centers in the image space. Finally, using the geometric calibration mathematical model, the association between two-dimensional and three-dimensional space is established, and the geometric calibration parameters are estimated. The collinearity condition is defined by equations (1) and (2), where:
c = principal distance
r = rotation matrix elements
xa, ya = image coordinates
xp, yp = principal point coordinates
X0, Y0, Z0 = coordinates of projection centre
X, Y, Z = object coordinates
Additionally, lens distortion parameters are iteratively calculated using Brown's equations (Brown, 1971), defined by equations (3) and (4), where:
x', y' = corrected image coordinates
k = radial distortion coefficients
p = tangential distortion coefficients
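For reference, the collinearity condition and Brown's distortion model are commonly written as below. This is a sketch of the standard textbook forms, not necessarily the exact notation of equations (1) to (4): the row-wise indexing of the rotation elements r_ij, the reduced coordinates, the symbol \rho for the radial distance, and the ordering of p_1 and p_2 are assumptions, and sign conventions differ between references.

```latex
% Collinearity condition (standard form; index convention of r_{ij} assumed)
\begin{align}
x_a &= x_p - c\,\frac{r_{11}(X - X_0) + r_{12}(Y - Y_0) + r_{13}(Z - Z_0)}
                     {r_{31}(X - X_0) + r_{32}(Y - Y_0) + r_{33}(Z - Z_0)} \\
y_a &= y_p - c\,\frac{r_{21}(X - X_0) + r_{22}(Y - Y_0) + r_{23}(Z - Z_0)}
                     {r_{31}(X - X_0) + r_{32}(Y - Y_0) + r_{33}(Z - Z_0)}
\end{align}

% Brown's model with reduced coordinates
% \bar{x} = x_a - x_p, \bar{y} = y_a - y_p, \rho^2 = \bar{x}^2 + \bar{y}^2
\begin{align}
x' &= \bar{x} + \bar{x}\,(k_1 \rho^2 + k_2 \rho^4 + k_3 \rho^6)
      + p_1(\rho^2 + 2\bar{x}^2) + 2 p_2\,\bar{x}\bar{y} \\
y' &= \bar{y} + \bar{y}\,(k_1 \rho^2 + k_2 \rho^4 + k_3 \rho^6)
      + p_2(\rho^2 + 2\bar{y}^2) + 2 p_1\,\bar{x}\bar{y}
\end{align}
```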

Evaluating the Radiometric Quality and Removing the Blur Frame
Frames of low radiometric quality, which appear as motion blur, are one of the significant limitations in the processing of video frames (Cai, 2009). Because keyframe extraction is an essential step in 3D reconstruction from video, identifying and removing motion-blurred frames has been proposed (Rashidi, 2013). Changes in the intensity of pixels along edges have been extensively studied to quantify blurriness (Marziliano, 2002; Yun-Chung, 2004; Varadarajan, 2008). In this paper, a metric based on the Fast Fourier Transform (FFT) is used to identify and remove blur frames. This metric expresses the radiometric quality of an image based on its blurriness in the frequency domain (De, 2013; Pagaduan, 2021). The magnitude spectrum image of the FFT is frequently displayed to assess the geometric and radiometric quality of frames because it carries information about the geometric structure and radiometric quality of the image in the spatial domain (Gonzalez, Woods, 2002). The FFT is expressed by equations (5) and (6) (Abdel-Qader, 2003).
The proposed method for determining the radiometric quality of frames consists of three steps: (a) removing image noise, which would otherwise interfere with blur detection; (b) determining the optimal threshold value for blur detection; and (c) classifying the frames into blur and non-blur frames based on the optimal threshold value. Because thermal infrared images are inherently noisy, noise removal is a necessary step. In this paper, the bilateral filter is used for this purpose (Tomasi, 1998; Paris, 2009); it reduces noise while preserving the main edges of the image. The FFT yields the frequency content of the image, and blurriness is decided from the balance between high- and low-frequency values: if low-frequency components dominate, the image is considered blurred, and vice versa (Abdel-Qader, 2003). After the frames are converted to the frequency domain and the magnitude spectrum image is generated, the optimal threshold value is determined from the high-frequency values within intervals of 50 frames. The average of the magnitude spectrum is calculated for each frame in the interval, and frames whose average is less than the threshold are identified as blurred and removed from the keyframes extraction process to improve geometric and radiometric accuracy.
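The frequency-domain blur check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a naive discrete Fourier transform in pure Python stands in for an FFT routine, the `cutoff` parameter and the use of the window mean as the threshold are hypothetical choices, and the bilateral filtering step is omitted.

```python
import cmath
import math


def magnitude_spectrum(image):
    """Return the 2D DFT magnitude |F(u, v)| of a small grayscale image.

    `image` is a list of equally long rows of numbers. A naive O(N^4) DFT
    is used so the sketch needs no third-party package; it is only
    practical for tiny inputs.
    """
    rows, cols = len(image), len(image[0])
    spectrum = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = -2.0 * math.pi * (u * x / rows + v * y / cols)
                    total += image[x][y] * cmath.exp(1j * angle)
            out_row.append(abs(total))
        spectrum.append(out_row)
    return spectrum


def sharpness_score(image, cutoff=1):
    """Mean log-magnitude of the frequencies outside a low-frequency block.

    `cutoff` (hypothetical parameter) is the half-width of the block of
    low frequencies around DC that is ignored.
    """
    spec = magnitude_spectrum(image)
    rows, cols = len(spec), len(spec[0])
    values = []
    for u in range(rows):
        for v in range(cols):
            # Distance from DC, accounting for the wrap-around of the DFT.
            du, dv = min(u, rows - u), min(v, cols - v)
            if du > cutoff or dv > cutoff:
                values.append(math.log1p(spec[u][v]))
    return sum(values) / len(values)


def split_blurred(scores, window=50):
    """Flag frames whose score is below the mean of their window.

    Using the window mean as the threshold is an assumption made for
    illustration only.
    """
    blurred = []
    for start in range(0, len(scores), window):
        chunk = scores[start:start + window]
        threshold = sum(chunk) / len(chunk)
        blurred.extend(score < threshold for score in chunk)
    return blurred
```

A frame with strong high-frequency content (for example a checkerboard) receives a higher score than a flat frame, and `split_blurred` returns one Boolean per frame. In practice the same score would be computed with a library FFT on full-resolution frames.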

Keyframes Extraction with Optimal Geometry
Keyframes extraction with optimal geometry selects, from the video frames of high radiometric quality (non-blur frames), those that contain acceptable geometric information for 3D reconstruction, in order to improve geometric accuracy and reduce the calculation volume of the 3D reconstruction process. Figure 3 illustrates the process. According to Figure 3, the method checks the standard baseline between sequence frames by evaluating the overlap between features extracted in sequence frames, and then applies the GRIC criterion to obtain frames with optimal geometry for the 3D reconstruction process.

Key Points Extraction and Matching
The SIFT algorithm is used to extract features in this paper (Lowe, 2004). For each pair of frames, the key point descriptors are then matched using the kd-tree method of the Approximate Nearest Neighbors (ANN) algorithm (Arya, 1998). To match the key points of two frames I and J, a kd-tree is constructed from the feature descriptors of frame J. For each feature in frame I, the kd-tree is used to locate the nearest neighbor in frame J. To increase efficiency, ANN's priority search method is used, with each search restricted to visiting no more than 200 bins of the tree (Snavely, 2008). Instead of classifying false matches by thresholding the distance to the nearest neighbor, the ratio test described by Lowe (2004) is used in this paper. For each feature descriptor in frame I, the two nearest neighbors in frame J are found, with distances d1 and d2. If the ratio of d1 to d2 is less than 0.6, the correspondence is accepted as a match.
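The ratio test can be illustrated with a short sketch. The function name and the brute-force nearest-neighbor search are assumptions made for brevity; they stand in for the kd-tree based ANN search described in the text, and descriptors are represented as plain tuples rather than SIFT vectors.

```python
import math


def ratio_test_matches(desc_i, desc_j, max_ratio=0.6):
    """Match descriptors of frame I to frame J using the ratio test.

    Returns a list of (index_in_i, index_in_j) pairs. A brute-force search
    replaces the kd-tree / ANN search for the sake of a compact example.
    """
    matches = []
    for idx_i, d_i in enumerate(desc_i):
        # Distances from this descriptor to every descriptor of frame J.
        dists = sorted(
            (math.dist(d_i, d_j), idx_j) for idx_j, d_j in enumerate(desc_j)
        )
        if len(dists) < 2:
            continue  # the ratio test needs two neighbours
        (d1, best_j), (d2, _) = dists[0], dists[1]
        # Accept only if the best match is clearly better than the runner-up.
        if d2 > 0 and d1 / d2 < max_ratio:
            matches.append((idx_i, best_j))
    return matches
```

A descriptor that lies almost equally far from its two nearest neighbors is rejected as ambiguous, which is the purpose of the test.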

Standard Baseline between Sequence Frames
One of the most critical steps in the 3D reconstruction process is determining the amount of overlap and the standard baseline of sequence frames. The baseline between sequence frames should be long enough to reduce the uncertainty of the depth calculated by triangulating the corresponding features during 3D reconstruction. Figure 4 illustrates that a short baseline increases measurement error compared to a long baseline (Ahmed, 2010; Choi, 2016). In this paper, the ratio coefficient of the corresponding key points between two frames is used to evaluate the standard baseline between the frames for keyframes extraction, according to equation (7):
$$RC = \frac{T_C}{T_f} \tag{7}$$
where:
RC = ratio of corresponding key points
TC = number of corresponding key points
Tf = total number of key points extracted
The numerical value of RC decreases as the camera moves. In the first few frames, where the camera is fixed or moves only slightly, RC is close to one; as the camera moves, RC decreases. This criterion is therefore a suitable way to estimate the camera movement and establish an appropriate standard baseline between two sequence frames. Finally, to determine the search range for keyframe extraction, a minimum and a maximum threshold must be chosen. The minimum and maximum permissible thresholds in this paper are 0.6 and 0.8. Frames are considered to have a standard baseline if the value of RC falls between these two thresholds.
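As a minimal sketch of this baseline check, the ratio coefficient RC = TC / Tf and the two thresholds can be coded as below. The function names and the returned labels are hypothetical, and taking Tf as the key point count of the reference frame is an assumption, since the text does not say which frame's count is used.

```python
def ratio_coefficient(num_matched, num_keypoints):
    """RC = TC / Tf: share of extracted key points that found a match."""
    if num_keypoints <= 0:
        raise ValueError("num_keypoints must be positive")
    return num_matched / num_keypoints


def baseline_status(rc, lower=0.6, upper=0.8):
    """Classify a frame pair by its ratio coefficient.

    RC above the upper threshold means the camera has barely moved (short
    baseline); RC below the lower threshold means too little overlap
    (long baseline).
    """
    if rc > upper:
        return "too_short"
    if rc < lower:
        return "too_long"
    return "standard"
```

For example, 700 matches out of 1000 key points give RC = 0.7, which falls inside the permitted range, so the pair would proceed to the degeneracy check.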

Keyframes Extraction Based on the Prevention of Degeneracy Conditions
The fundamental matrix describes the relative geometry of the camera at different positions as well as the relationship between corresponding features in two frames. In the case of degeneracy, however, estimating the position of the camera is impossible. The two essential degeneracy modes are motion degeneracy and structure degeneracy (Torr et al., 1999). Motion degeneracy occurs when the camera rotates around its axis without translation; in this case, the epipolar geometry is not established. However, the camera's homography matrix can be calculated using known control points in three-dimensional space. Structure degeneracy occurs when all three-dimensional points of an object lie on a plane. In this case, as with motion degeneracy, the fundamental matrix cannot be estimated from the corresponding features, and the epipolar geometry is not established. In degeneracy modes, the homography matrix is used to relate frame pairs. Consequently, the homography and fundamental matrices are compared, and the GRIC is used for this comparison. The criterion is the sum of a goodness-of-fit term and model complexity terms and is defined by equation (8):
$$\mathrm{GRIC} = \sum_{i=1}^{n} \rho(e_i^2) + \lambda_1 d n + \lambda_2 k \tag{8}$$
where:
n = number of corresponding extracted features used to solve the fundamental matrix and homography
e_i = residual of the i-th correspondence
σ = standard deviation of the measurement error of the points
r = dimension of the measurement data (for two frames, r = 4, the coordinates of the corresponding points in the two frames)
k = number of motion model parameters (8 for the homography and 7 for the fundamental matrix)
d = dimension of the model structure (2 for the homography and 3 for the fundamental matrix)
Following Torr (1998), λ1 = ln(r) and λ2 = ln(rn). The goodness-of-fit component is evaluated from the residuals according to equation (9):
$$\rho(e_i^2) = \min\left(\frac{e_i^2}{\sigma^2},\ \lambda_3 (r - d)\right) \tag{9}$$
Residuals for the homography matrix are computed using the symmetric transfer error of the homography estimated by the RANSAC algorithm (Fischler, 1981), and residuals for the fundamental matrix are computed using the Sampson error of the fundamental matrix estimated by RANSAC. The parameter λ3 limits the influence of large residuals of the estimated homography and fundamental matrices and is set to 2 in this paper. Table 1 compares the geometric parameters needed to compute the GRIC for the homography and fundamental models (Torr, 1999).
Setting aside the goodness-of-fit component of equation (8), the penalty terms give the fundamental matrix a higher GRIC value than the homography matrix, because its model structure has the higher dimension. The fundamental matrix model is therefore preferred only when it fits the correspondences clearly better than the homography. If the residuals of the fundamental matrix are high, degeneracy conditions exist, and the homography matrix must be employed. Finally, according to the condition in equation (10), a frame is extracted as a keyframe when the GRIC value of the fundamental matrix model between two frames is lower than that of the homography matrix model:
$$\mathrm{GRIC}_F < \mathrm{GRIC}_H \tag{10}$$
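The model comparison can be sketched in a few lines. The code assumes the usual form of Torr's criterion, GRIC = Σ min(e_i²/σ², λ3(r − d)) + λ1·d·n + λ2·k with λ1 = ln(r) and λ2 = ln(rn); the function names, the default σ = 1, and the squared residuals passed in as plain lists are illustrative assumptions. In a real pipeline the residuals would come from RANSAC estimates of the two models.

```python
import math


def gric(residuals_sq, sigma, d, k, r=4, lambda3=2.0):
    """GRIC score for one motion model; lower is better.

    residuals_sq: squared residuals e_i^2, one per correspondence
    sigma: assumed standard deviation of the measurement error
    d: dimension of the model structure (2 for H, 3 for F)
    k: number of model parameters (8 for H, 7 for F)
    r: dimension of the data (4 for two-frame correspondences)
    """
    n = len(residuals_sq)
    lambda1 = math.log(r)
    lambda2 = math.log(r * n)
    cap = lambda3 * (r - d)  # limits the influence of large residuals
    fit = sum(min(e2 / sigma ** 2, cap) for e2 in residuals_sq)
    return fit + lambda1 * d * n + lambda2 * k


def is_keyframe_candidate(res_f_sq, res_h_sq, sigma=1.0):
    """True when the fundamental matrix model has the lower GRIC."""
    gric_f = gric(res_f_sq, sigma, d=3, k=7)
    gric_h = gric(res_h_sq, sigma, d=2, k=8)
    return gric_f < gric_h
```

With equally small residuals for both models, the homography obtains the lower score and the pair is treated as degenerate; the fundamental matrix only wins when the homography fits the correspondences markedly worse.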

Generation of Thermal Infrared Point Cloud
In this paper, the keyframes extracted from the thermal infrared video are processed with photogrammetry and computer vision techniques, namely SfM and Multi-View Stereo (MVS), to generate a dense point cloud (Ullman, 1979; Furukawa, 2010). Following the extraction of keyframes from the thermal infrared video sequence, the SIFT technique extracts the corresponding features between the keyframes. The fundamental matrix can then be computed from the corresponding key features between pairs of frames (Longuet-Higgins, 1981). The candidate fundamental matrix is evaluated using the RANSAC algorithm, and any incorrect correspondences are removed. Using the correct corresponding key features and the bundle adjustment, the SfM algorithm converts the two-dimensional coordinates of the corresponding features of keyframe pairs into three-dimensional coordinates, and the sparse thermal infrared point cloud is generated. The MVS algorithm is then applied to increase the density of the sparse point cloud. The implementation and evaluation of the paper's results are presented in the following sections.

EXPERIMENTAL RESULTS
To implement and assess the proposed method, two thermal infrared video datasets were recorded from the facade of the patriarchal palace in Aliabad village, Aradan-Garmsar city, Semnan province. The study area is located at longitude 52.3034 and latitude 35.1600. The flight path was designed to collect data from the building's facade vertically at a distance of 11 meters and a flight altitude of 1.70 meters. The data were collected from two different facades of the building with the same flight parameter settings, during the early winter season, in the early night hours, and under the same weather conditions. The first dataset (test) is used to implement the proposed method; the second dataset is used to evaluate its performance and was recorded by the UAV under the same conditions as the first. A vertical flight UAV with a low flight altitude and an MC1-640s thermal infrared camera produced by KeiiElectro Optics Technology, with a frame rate of 30 frames per second, was used to collect the data. Tables 2 and 3 give the technical specifications of the thermal infrared camera and the UAV.

Camera Calibration
In this paper, the interior orientation parameters and lens distortions are estimated and used as input to the point cloud generation algorithm, to improve the geometric accuracy of the dense point cloud generated with the SfM and MVS algorithms. After the focal centers of the ellipse targets are extracted in the two-dimensional image space using the Hough transform algorithm, the connection between the two-dimensional and three-dimensional space is established using the collinearity condition equations, and the interior orientation parameters are calculated. Figure 5 depicts the results of extracting the focal centers of 221 ellipse targets using the Hough transform algorithm. Table 4 contains the interior orientation parameters and lens distortions, in pixels, estimated for the thermal infrared camera used in this paper with the collinearity condition equations of geometric calibration. It lists the value and the average standard deviation of each calibration parameter for 13 images of the calibration pattern captured using the proposed method. The cx and cy parameters in Table 4 represent the focal length along the x and y axes, xp and yp are the principal point coordinates, k1, k2, and k3 are the radial distortion coefficients of the lens, and p1 and p2 are the tangential distortion coefficients.

Keyframes Extraction and Generating Point Cloud
The purpose of this paper is to evaluate the effect of keyframes extraction on the generation of the thermal infrared dense point cloud, measured by the increase in density and the reduction in reprojection error of the 3D point cloud. These criteria are relevant to the triangulation calculations of 3D reconstruction because keyframes have accurate geometry and high radiometric quality. The proposed algorithm was developed in Python using the OpenCV library. In the first step, the primary frames of the thermal infrared video are extracted as input data for the proposed method. The blur frames are then identified with the FFT metric by selecting the optimal threshold and removed from the keyframe extraction process, to improve the geometric accuracy and radiometric quality of the thermal infrared dense point cloud. Figure 6 depicts the results of extracting blur and non-blur frames. After assessing the radiometric quality of the frames, 853 blur frames were identified and removed from the test dataset, which contained 1957 primary frames, and 214 blur frames were identified and removed from the evaluation dataset, which had 907 primary frames.
Lower and upper thresholds were used to evaluate the ratio coefficient of the corresponding key points between pairs of frames; that is, the thresholds bound the proportion of corresponding key points between two frames. The lower and upper threshold values in this paper are 0.6 and 0.8, respectively. If the ratio calculated between two frames is greater than the upper threshold, the baseline between the two frames is short; if it is less than the lower threshold, the baseline is long. Figure 7 shows an example of the ratio coefficient calculated between frames over various frame intervals, together with the lower and upper threshold limits of 0.6 and 0.8.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume X-4/W1-2022, GeoSpatial Conference 2022 - Joint 6th SMPR and 4th GIResearch Conferences, 19-22 February 2023, Tehran, Iran (virtual)
Finally, the frames whose ratio coefficient lies between the lower and upper thresholds proceed to the evaluation of the degeneracy conditions with the GRIC. In this step, the GRIC value is computed for the fundamental and homography matrices between pairs of frames using equation (8). Then, using equation (10), a frame is extracted as a keyframe when the fundamental matrix model between the two frames has a lower GRIC value than the homography matrix model.
The numerical evaluation of the density of the generated dense point cloud and of the reprojection error, using all video frames and using only the keyframes of the test dataset, is presented in Table 5. The results show an increase in density and a decrease in reprojection error for the test dataset's point cloud generated using keyframes. Figure 10 shows the 3D dense thermal point clouds generated using the video data and the keyframes of the evaluation dataset; these results show that using keyframes increases the density of the output point cloud. The corresponding numerical evaluation for the evaluation dataset is presented in Table 6 and likewise shows an increase in density and a decrease in reprojection error for the point cloud generated using keyframes. Based on the results of Tables 5 and 6, the use of keyframes increases the density by about 0.03% to 0.10% of points per square meter and reduces the reprojection error by about 0.005% of pixels (2 times) for the thermal infrared dense point clouds of the test and evaluation datasets. In the future, the proposed method will be quantitatively and qualitatively compared to a competing keyframe extraction method for generating the thermal dense point cloud. The conclusion and suggestions for future research are presented next.

CONCLUSION
The primary goal of this paper is to investigate the effect of keyframe extraction from a thermal infrared video sequence on the geometric accuracy of a dense thermal point cloud. To this end, a method for evaluating the effect of extracting keyframes from a sequence of thermal infrared images on the generation of a dense thermal point cloud has been presented. Based on the findings, extracting keyframes from an image sequence improves the geometric accuracy of the point cloud, increases the speed, and decreases the volume of triangulation calculations. Based on the paper's results, keyframe extraction increases the density of the thermal infrared point cloud by about 0.03% to 0.10% of points per square meter and reduces the reprojection error by about 0.005% of pixels (2 times). Among the challenges of this work are the limitations of the thermal infrared camera, such as low contrast and low spatial resolution. These constraints reduce the texture of the images and limit the number of features that can be extracted from them to assess the baseline and overlap, match the features, and generate points in three-dimensional space. Another limitation, and a decisive step in identifying and removing blur frames, is selecting the optimal threshold for evaluating the radiometric quality of the frames. To address these limitations, future studies will investigate enhancing the contrast of thermal infrared images, to increase the detail and the number of features that can be extracted from them, as well as improving the matching methods. In future research, an attempt will also be made to compare the proposed method with a competing method from both a quantitative and a qualitative standpoint.

Figure 1. Flowchart of the proposed method.

Figure 3. Flowchart of keyframe extraction with optimal geometry.
Figure 4 depicts the comparison of short and long baselines.
Figure 4. Standard base length. A. Long. B. Short.
of the measurement error in calculating the extracted features.
contain more detailed information about the technical specifications of the thermal infrared camera and UAV used in the paper.

Figure 5 depicts the results of extracting the focal centers of 221 ellipse targets using the Hough transform algorithm. Table 4 also contains the numerical results of the interior orientation parameters and lens distortions, in pixels, estimated for the thermal infrared camera using the collinearity condition equations of geometric calibration used in this

Figure 6 depicts the results of evaluating the radiometric quality of the frames. Figure 6.A shows the high and low frequencies in the magnitude spectrum of the frames, Figure 6.B shows a non-blur frame, and Figure 6.C shows a blur frame. In this paper, the video is processed in intervals of 50 frames so that blur frames are identified and removed within each interval. As previously stated, the optimal threshold value is determined from the high-frequency values. To identify and remove blur frames, the average value of the magnitude spectrum image is calculated for each frame. Within each interval, the frames with the lowest average values are labeled blur frames, while the frames with the highest average values are retained as non-blur frames.
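The interval-based blur screening described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes grayscale frames held as 2D NumPy arrays, uses the log-scaled magnitude spectrum (a common display convention, assumed here), and keeps only the single sharpest frame per interval. The function names `spectrum_mean` and `sharpest_per_interval` are hypothetical.

```python
import numpy as np

def spectrum_mean(frame):
    """Mean of the log-scaled FFT magnitude spectrum of a 2D grayscale frame.

    Blur attenuates high frequencies, so a blurred frame scores lower
    than a sharp frame of the same scene.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(np.asarray(frame, dtype=np.float64)))
    return float(np.mean(np.log1p(np.abs(spectrum))))

def sharpest_per_interval(frames, interval=50):
    """Return the index of the highest-scoring frame in each interval.

    `interval=50` follows the interval length stated in the text; keeping
    exactly one frame per interval is an assumption of this sketch.
    """
    scores = [spectrum_mean(f) for f in frames]
    keep = []
    for start in range(0, len(scores), interval):
        window = scores[start:start + interval]
        keep.append(start + int(np.argmax(window)))
    return keep
```

In practice the score of each frame could also be compared against a fixed threshold instead of taking the per-interval maximum; the text notes that choosing that threshold is itself a limitation of the method.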

Figure 7. The calculation of the corresponding ratio coefficient between the frames.
Figure 7 shows an example of calculating the corresponding ratio coefficient between the frames over various frame intervals, together with the lower and upper threshold limits, which are 0.6 and 0.8, respectively. Frames whose ratio falls between the lower and upper thresholds satisfy the standard overlapping baseline and proceed to the evaluation of degeneracy conditions using the GRIC selection criterion. In this step, the degeneracy conditions between pairs of frames are assessed by computing the GRIC value for the fundamental matrix and the homography matrix using equation (8). In the following step, using equation (10), a frame is extracted as a keyframe when the GRIC value of the fundamental matrix model between the two frames is lower than that of the homography matrix model. Figure 8 depicts the GRIC values and the resulting keyframes for several extracted keyframes.
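The two tests described above, the baseline gate and the GRIC comparison, can be sketched as follows. Equations (8) and (10) are not reproduced in this section, so the sketch assumes the commonly cited form of Torr's GRIC, with data dimension r = 4, model dimension d = 3 and k = 7 parameters for the fundamental matrix, and d = 2 and k = 8 for the homography; the paper's exact formulation may differ. The corresponding ratio is assumed here to be the share of reference-frame features matched in the candidate frame. All function names are hypothetical.

```python
import numpy as np

def correspondence_ratio(n_matched, n_features):
    """Assumed definition: matched features divided by reference-frame features."""
    return n_matched / n_features if n_features else 0.0

def baseline_ok(ratio, lower=0.6, upper=0.8):
    """Keep a frame pair only if its ratio lies between the stated thresholds."""
    return lower <= ratio <= upper

def gric(residuals_sq, sigma, d, k, r=4):
    """GRIC score (Torr's form, assumed) from squared residuals of a fitted model.

    d: dimension of the model manifold, k: number of model parameters,
    r: dimension of the data (4 for a pair of 2D points), sigma: noise std.
    """
    e2 = np.asarray(residuals_sq, dtype=np.float64)
    n = e2.size
    lam1, lam2, lam3 = np.log(r), np.log(r * n), 2.0
    rho = np.minimum(e2 / sigma**2, lam3 * (r - d))  # robust, capped residual term
    return float(rho.sum() + lam1 * d * n + lam2 * k)

def is_keyframe(res_f_sq, res_h_sq, sigma=1.0):
    """True when the fundamental matrix explains the matches better than the homography."""
    gric_f = gric(res_f_sq, sigma, d=3, k=7)
    gric_h = gric(res_h_sq, sigma, d=2, k=8)
    return gric_f < gric_h
```

The residual arrays would come from fitting both models to the same set of matches; when the homography fits about as well as the fundamental matrix (a short baseline or a planar scene), its lower complexity penalty gives it the smaller GRIC and the frame is not selected.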
Figure 9 depicts the thermal infrared 3D dense point clouds generated from the video data and from the keyframes of the test dataset.
Figure 9. The output of the generation of the 3D thermal dense point cloud of the test dataset. A. Video. B. Keyframe.

Figure 9.A depicts the thermal infrared 3D dense point cloud of the test dataset in video mode, and Figure 9.B illustrates the 3D dense point cloud in keyframe mode. The visual results show that using keyframes increases the density of the output point cloud. In addition, Table 5 presents the numerical evaluation of the density of the generated dense point cloud and the reprojection error for the video data and the keyframes of the test dataset.
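The two evaluation quantities named above can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: the reprojection error is taken as the root mean square pixel distance between observed and reprojected image points, and the density as the point count divided by a known facade area in square meters. The function names are hypothetical.

```python
import numpy as np

def rms_reprojection_error(observed_px, projected_px):
    """RMS pixel distance between observed and reprojected 2D points (N x 2 arrays)."""
    obs = np.asarray(observed_px, dtype=np.float64)
    proj = np.asarray(projected_px, dtype=np.float64)
    squared_dist = np.sum((obs - proj) ** 2, axis=1)
    return float(np.sqrt(squared_dist.mean()))

def point_density(n_points, area_m2):
    """Points per square meter over a surface of known area (assumed planar facade)."""
    return n_points / area_m2
```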
Figure 10. The output of the generation of the 3D thermal dense point cloud of the evaluation dataset.

Figure 10.A depicts the thermal infrared 3D dense point cloud of the evaluation dataset in video mode, and Figure 10.B illustrates the 3D dense point cloud in keyframe mode. The visual results show that using keyframes increases the density of the output point cloud. In addition, the results of the numerical

Table 4. Interior orientation parameters and thermal infrared camera lens distortions (pixels).

Table 5. The output results of the thermal infrared dense point cloud generation for the test dataset.

Table 6. The output results of the thermal infrared dense point cloud generation for the evaluation dataset.