AUTOMATED CO-REGISTRATION OF INTRA-EPOCH AND INTER-EPOCH SERIES OF MULTISPECTRAL UAV IMAGES FOR CROP MONITORING

Abstract. The application of UAV-based aerial imagery has advanced exponentially in the past two decades. This can be attributed to UAV operational flexibility, ultra-high spatial resolution, inexpensiveness, and UAV-based sensors enhancement. Nonetheless, the application of multitemporal series of multispectral UAV imagery still suffers significant misregistration errors, and therefore becoming a concern for applications such as precision agriculture. Direct image georeferencing and co-registration is commonly done using ground control points; this is usually costly and time consuming. This research proposes a novel approach for automatic co-registration of multitemporal UAV imagery using intensity-based keypoints. The Speeded Up Robust Features (SURF), Binary Robust Invariant Scalable Keypoints (BRISK), Maximally Stable Extremal Regions (MSER) and KAZE algorithms, were tested and parameters optimized. Image matching performance of these algorithms informed the decision to pursue further experiments with only SURF and KAZE. Optimally parametrized SURF and KAZE algorithms obtained co-registration accuracies of 0.1 and 0.3 pixels for intra-epoch and inter-epoch images respectively. To obtain better intra-epoch co-registration accuracy, collective band processing is advised whereas one-to-one matching strategy is recommended for inter-epoch co-registration. The results were tested using a maize crop monitoring case and the; comparison of spectral response of vegetation between the UAV sensors, Parrot Sequoia and Micro MCA was performed. Due to the missing incidence sensor, spectral and radiometric calibration of Micro MCA imagery is observed to be key in achieving optimal response. Also, the cameras have different specifications and thus differ in the quality of their respective photogrammetric outputs.



INTRODUCTION
Recently, the application of drone technology in crop monitoring has become rife. Nex & Remondino (2014) review the use of unmanned aerial vehicles for 3D mapping applications, and highlights agriculture as a domain that consumes digital surface models (DSM) and orthoimages to extract useful information on crop status. In addition, the ultra-high multispectral and multitemporal resolution of UAV imagery is undoubtedly an enabler of Precision Agriculture to obtain actionable crop properties (Elarab et al. 2015).
UAVs are embraced across domains because they are flexible lowaltitude Remote Sensing (RS) platforms. Thus, they are not affected by cloud occlusion, and can achieve ground sampling distances (GSD) of up to 3cm or less depending on the flight parameters and the aim of the acquisition (Nex & Remondino, 2014). This is still ten times higher the spatial resolution of the best VHR satellite imagery. In addition, UAVs provide an inexpensive alternative to satellites and other platforms for aerial image acquisition; they increasingly offer tools and inspire innovations that seal the gap between terrestrial and aerial (high-altitude) platforms (Nex et al. 2015).
Conversely, UAVs face some drawbacks: regulatory constraints on the application of drones and licensing of drone pilots vary from country to country; limited areal coverage due to the battery endurance per flight; the instability of lightweight platforms; atmospheric elements such as strong winds and rain affect drone operations; the payload limit; image co-registration complexities, and difficulties in radiometric and geometric corrections (Freeman et al. 2015;Yang et al. 2017).
Accurate image co-registration is vital for reliable change detection assessment and accurate comparative analysis of crop phenology (Fytsilis et al., 2016;Tilly et al., 2014). Several models and algorithms that automate the co-registration process have been proposed. However, multispectral cameras with several lenses still suffer misregistration setbacks as demonstrated in related works of Jhan et al. (2017) and Rey et al. (2013). This is partly due to the fact that the technology space is dynamic and new camera sensors with different specifications and more abilities are continuously being engineered.
On the other hand, co-registration of multitemporal series is vital for reliable spatiotemporal analysis of crop's spectral properties. Misclassification of crop growth per pixel, vegetation index extraction errors, interpolation errors in values between available observations, and harvest index variation prediction are some of the inherent errors due to misregistration (Lobell, 2013).
The aim of this study was to provide a novel approach for accurate UAV-based multispectral and multitemporal monitoring of crops without repeated establishment of Ground Control Points (GCPs) which is laborious during photogrammetric processing. Usually, GCPs are meant to minimize systematic errors and deformations in images, stabilize bundle solutions, and determine correct 3D reconstruction (Nex & Remondino, 2014). However, the lack of GCPs did not hamper this study since the acquisition of the first epoch was assumed to be the reference epoch, and registration assessments of subsequent acquisitions were based on the first epoch.

RELATED WORKS
Misregistration of UAV multispectral and multitemporal imagery has attracted the need of researchers to propose and develop novel methodologies to resolve this problem. Kelcey & Lucieer (2012) suggest procedures to calibrate the six bands Mini MCA camera including radiometric correction, noise reduction and affine transformation for simultaneous image registration and correction of lens distortion. Turner et al. (2014) develop a semi-automated workflow for accurate spatial co-registration of a visible camera, six -band Micro MCA multispectral sensor, and a thermal infrared camera. Using the Scale Invariant Feature Transform (SIFT), a mean accuracy of 1.78pixels is achieved. This was deemed sufficient for monitoring of Antarctic moss beds. Jhan et al. (2016) present a modified projective transformation model based on the principles of plane-to-plane projection to undertake accurate band-to-band registration (BBR) of RGB and Mini MCA 12 multispectral imagery. It is noted that feature matching for narrow band multispectral and hyperspectral sensor with no overlapping spectral range is difficult. An accuracy of 0.33 pixels is achieved for the proposed BBR method. However, coregistration errors of less 0.6 pixels was obtained between Mini MCA reference band and the RGB ortho-images.
A novel approach to automate the co-registration of UAV-based multi-temporal RGB image blocks without the use of GCPs is presented by Aicardi et al., (2016). The first acquisition is chosen as the reference dataset. The orientation parameters of the anchor images are fixed; this constrains the bundle block adjustment of the slave images to be aligned with the reference image. An array of tests to assess both manual and automatic registration approaches for the selected anchor images provides reliable results, which are quite comparable to a GCP-based strategy. Onyango et. al. (2017) use keypoint descriptors to accurately estimate orientation parameters of UAV images through coregistration of oblique imagery. Using AKAZE, brute force is implemented to find putative correspondences and Lowe's ratio test used to discard wrong matches. Multiple homographies are computed using the putative correspondences to filter out remaining mismatches.
Recently, Banerjee, Raval, & Cullen (2018) optimize feature descriptors techniques to align UAV-hyperspectral images in a spectrally complex environment. It is observed that for band-toband alignment, keypoint descriptors are inclined to spectral order vis a vis temporal order. In addition to spatial invariance, spectrally invariant descriptors will go a long way in improving the efficacy of the band-to-band alignment process.

EQUIPMENT AND DATA
This research seeks to accurately co-register and assess the data quality of the images acquired by the Micro-MCA Tetracam camera mounted on the Matrice 600 UAV, and the Parrot Sequoia camera mounted on the Phantom 4 UAV.

The Parrot Sequoia and Micro MCA 6 Tetracam
The Parrot Sequoia (PS) multispectral sensor captures the electromagnetic spectrum in four separate parts: green, red, rededge and Near Infrared (NIR). It incorporates the Global Positioning System (GPS), Inertial Measurement Unit (IMU) and magnetometer thus increased accuracy of data capture. The Parrot Sequoia also integrates an irradiance sensor to continuously record light conditions. Figure 1 shows the Parrot Sequoia camera system. The micro MCA multispectral camera has six separate cameras. Each camera is synchronized with the other cameras so that each can capture the same scene at the exact same time of exposure. During each exposure instant, six separate channels of visible or NIR radiation move through each lens and filter to form separate monochromatic images on each sensor.  The spectral wavelength range of each UAV is shown in table 1.   Figure 3 below. The two sensors were mounted on different UAV platforms due to their physical properties of size and weight, and the complexity of sensor and UAV integration in the case of the Micro MCA.

Image Acquisition and Data properties
The maize field (approximately 25 acres) is located in the periphery of Gronau city, Germany (52° 10'N, 6° 55'E). Image acquisition was done in three time-steps for the Parrot Sequoia and only one acquisition for Micro MCA as shown in

METHODOLOGY
This section gives a detailed description of the approaches taken and methods used to realize the main aim of co-registration of multitemporal series of multispectral UAV imagery. The general overview of methods, processes, decisions, intermediate and final outputs are captured in the flowchart in Figure 4.

Photogrammetric Workflow
Pix4D software was used for most of the photogrammetric workflow involving the Parrot Sequoia images. Aligning the Micro MCA images was done using Tetracam's Pixel Wrench II (PW2), based on a calibration file that contains the relative orientation between the master and slave bands since the Micro MCA images are captured in a RAW file format. The RAW format images were converted to multipage TIFs using PW2 and subsequently processed in Agisoft Photoscan due to its aggressiveness to correct the rolling shutter effect.
Initial photogrammetric processing includes keypoints detection and matching of single images; estimation of interior and exterior orientation, aerial triangulation, bundle block adjustment, tie point generation, as well as georeferencing. Dense points sufficient enough to estimate planes and geometry of the image scene are then generated. The point clouds are used as an input to generate a Digital Surface Model (DSM), which is thereafter also used as an input to generate orthophoto bands for the whole scene. Each band therefore had its independent orthophoto. The identification of corresponding keypoints between successive overlapping images is an integral part of image coregistration. Desirable keypoints must be devoid of noise, blurs, illumination variances and geometric differences. Experimentation revealed that for SURF, the higher the strongest feature threshold, MetricThreshold, the lesser the blobs, and the higher the octaves the larger the detected blobs.
For BRISK, the minimum contrast, MinContrast', specifies the minimum intensity difference between a region and its immediate surrounding. It is a scalar in the range of zero (0) and one (1). An increase in this value would lead to a decrease in the number of blobs detected. Similarly, the minimum quality, 'MinQuality', ranges between zero (0) and one (1); it denotes the minimum accepted quality of detected regions. When the value tends towards one (1) erroneous blobs are removed.
In MSER, the size of the region is a two-element vector denoting the minimum and maximum areas of regions in pixels to be allowed in the detection process. At varying intensity thresholds, the maximum area variation between extremal regions is specified by a positive scalar between 0.1 to 1. An increase in this value results in detection of more external regions which may be less stable.
Finally, for KAZE, the local extrema is a function of the Hessian threshold, which is specified as a scalar greater than or equal to zero (0). An increase in this value excludes less significant local extrema. The multiscale detection factor and the scale levels are scalars in the range of 3 to 10. Larger features are detected by increasing the multiscale detection factor whereas smoother scale changes and additional intermediate scales between octaves are realized by increasing the scale levels.

Feature Description:
Feature description is a function of the neighbouring pixels; it is done by extracting the intensity gradients of the neighbouring pixels and stored as a vector of numbers describing the center pixel. The vector size of the neighbourhood can vary between 64 and 128 pixels, and the descriptor can be considered rotation invariant if orientation of the feature vector is computed.

Feature Matching:
To select strong matches, a matching threshold is specified according to different metrics (e.g. L1 or L2 norm); it represents a percent of the distance from a perfect match. Two feature vectors are a match when the distance between them is less than the set threshold. The higher the matching threshold, the more matches obtained (not necessarily 'good' matches).

Outlier Removal and Transformation matrix
In this study, the outliers in matched points were excluded using the M-estimator Sample Consensus (MSAC), which is a variant of the RANSAC (Torr & Zisserman, 2000). RANSAC suffers a setback; it is sensitive to the threshold that defines inliers and outlier. A very large threshold tends to rank all the hypotheses equally and qualify them as good for the fitted model. Conversely, a very small threshold tends to be unstable in estimating parameters. The MSAC (see equation 1) partially compensates for this undesirable effect. It penalizes the outliers equally but scores the inliers on how well they fit the data. .
Where ρ2 is the robust error term, and T is the threshold for considering inliers.
A set of putative matches are taken in, and a random selection picks the best set of matches to fit the model, and computes the transformation matrix between the inlying points. The outlier matches are defined by a distance threshold between features in band 'A' and band 'B' upon inverting the geometric transformation. Only points that meet this threshold were used to compute the transformation matrix. Estimation of the transformation matrix was done at the image and orthophoto level. At both levels, the transformation matrix was compared element by element, and quantified as a RMSE for comparison between different camera positions. The 2D similarity transformation method was used in this study because it retains angles and length ratios, and because the orthophotos are planimetric and geometrically similar. The Transformation matrix was further decomposed to fetch out the band-to-band rotation and translation (i.e. relative orientation).

Band-to-band Co-registration 4.3.1 Intra-Epoch registration:
To examine misregistration within a single acquisition, it was necessary to establish the best band to use as the reference. The reference band should have sufficient keypoints to be matched with features extracted from other bands.

Inter-Epoch registration:
Two approaches were evaluated as shown in Figure 6. The 'many-to-one' registration involved estimating the transformation matrix between all the subsequent bands of each acquisition with the red edge band of the first acquisition; geometrically transforming them; assessing their pairwise registration accuracy; and stacking them together per epoch. On the other hand, the 'one-to-one' approach involved using all bands in the first acquisition as the reference. Bands from subsequent acquisitions were considered slaves, and were thus aligned to spectrally corresponding bands of the first acquisition.

Co-registration Accuracy Assessment
The misregistration error between the spectral bands was measured by computing the projection distance between inlying point pairs. The horizontal positional positions of inliers before and after registration were used to assess the co-registration accuracy as illustrated in Figure 7. The positional accuracy and distance between detected features between bands is expected to be less than half a pixel after co-registration. For a perfect registration, the differential distance between conjugate pairs should be zero. Thus, values tending towards zero are desirable. (2) (3) The RMSEx and RMSEy were used to evaluate systematic displacements in either direction. The combined RMSEr was used for overall registration accuracy assessment. The closer the value is to zero, the more accurate it is. The registration threshold was 0.5 of a pixel, therefore RMSEs less than 0.5 were considered 'good' and acceptable.

Vegetation Index and Spectral Analysis
Spectral indices are designed to give an approximate measure of vegetation status. The Normalized Difference Vegetation Index (NDVI) was used to characterize crop health in this research. NDVI is computed as shown below:- To statistically assess and compare spectral variability between the two UAV cameras, intra-farm zonation was done. Spectral signatures of two classes of crops (photosynthetically active and less active) were extracted from corresponding composite images of Parrot Sequoia and Micro MCA Tetracam. This was aimed at explaining the spectral variability between the sensors.

EXPERIMENTATION, RESULTS AND ANALYSIS
In this section, a series of tests are run to inform coregistration decisions including selection of the master band, and the optimal parameters to use. All the algorithms are tested using default parameters, and further optimized for this particular vegetation scene and dataset combination. In addition, the experiments aim to compare the performance of the algorithms since they are architecturally different; SURF and KAZE use float descriptors, while MSER and BRISK use binary descriptors.

Master band Selection 5.1.1 Feature Detection using default parameters
The results indicated that KAZE outperformed SURF, BRISK and MSER by detecting three times the number of points detected. Also, the red edge was selected as the master band because it had the highest number of detected keypoints thus offering a higher chance of correct matches with other bands. The performance of the algorithms is shown in Figures 8.  Figure 9 shows algorithm performance for inter-epoch matching. Despite being seen to have more outliers than inliers in the first and second band combinations, the overall performance of KAZE depicts more inliers than the other algorithms. On the other hand, BRISK and MSER failed to converge to find sufficient points after 1000 iterations for the master and red band combination. Thus, KAZE and SURF were selected for subsequent tests in this study.

Parameterization for feature detection
The results show that the higher the octave the more features are detected cumulatively. KAZE presents a sharp cumulative increase in points detected from the first to the second octave, but somewhat plateaus by the third octave.  It was observed that a stable condition of feature detection is reached in scale level five and second octave because the number of features detected per band beyond these points are less than five percent of the total number of features detected. In addition, an increase in scale levels increases the computational time.

Parameterization for feature description and matching
A feature size of 128 provides a higher description accuracy but consequently decreased the number of matched and inlying features, which were insufficient for MSAC to fit the best model to estimate the transformation matrix. Therefore, misregistration errors were still evident after registration.  On the other hand, with a descriptor size of 64, the number of matched and inlying features increased. The descriptor size of 64 was therefore selected since the number of inliers were enough for accurate estimation of the transformation matrix.

Intra-epoch band-to-band registration (Parrot Sequoia)
Using the Red edge as the master, image level analysis revealed a systematic displacement attributed to the basis distance of the cameras as presented graphically in Figure 11. Figure 11. Showing image level systematic displacement The red edge and NIR combination exhibited a displacement of about 11 pixels in the "Y" direction. The red edge and red showed a shift of about 6 pixels in both directions. The last combination, red edge and green, unveiled a uniform displacement of about 6 pixels in the "X" direction.
At the image level, misregistration is reduced from an average of 10 pixels to 0.28 pixel. The co-registration results at the image level within same epoch are presented in Table 6.  Table 6. Image point pair distances before and after registration On the other hand, misregistration of compositely processed orthophotos is at a minimal average of 0.3 pixels. This can be attributed to corrections during image level co-registration, triangulation, georeferencing, and orthorectification.

Intra-epoch accuracy assessment
At the orthophoto level, the results show that the bands are aligned to subpixel accuracy. The positional RMSE of the inliers is equal for epoch one and three, and decimal differences in epoch two, before and after registration. Since the registration procedure was intensity-based, the slight differences in epoch 2 could be attributed to randomness during outlier removal.  Table 7. PS horizontal positional RMSE (pixels) using SURF

Inter-epoch band-to-band registration (Parrot Sequoia) 5.5.1 Many-to-One Registration
Subpixel accuracies were obtained using projection distance thresholds of 0.7 and 0.5 for SURF and KAZE respectively. SURF could not find sufficient point pairs to estimate geometric transformation between orthophotos of epoch 1 and 2 at a projection distance threshold of 0.5. However, the point pair distances obtained with SURF and KAZE after registration is presented in Figure 13. Misregistration errors in the range of 0.02 -1.16 pixels for SURF, against KAZE's 0.02 -1.37 pixels for all band combinations were evident. Figure 12. Point pair statistic of many-to-one band registration

One-to-One Registration
The displacement between corresponding bands was seen to be systematic across all the bands. It was observed that the red band combination recorded the lowest number of inliers whereas green had the highest. Nonetheless, the inliers allowed for accurate estimation of the transformation matrix, and thus subpixel accuracies were obtained. From the statistics presented in boxplots presented in Figure 13 (b), it is observed that the mean registration error is in the range of 0.26 -0.38.

Inter-epoch accuracy assessment
The results presented in this section are those of aligning epochs two and three to epoch one using both matching strategies. As demonstrated in tables 8 and 9, both SURF and KAZE obtained subpixel registration accuracies. SURF however recorded lower RMSE values than KAZE.  Table 9. Points pair RMSE (pixels) of one-to-one registration The one-to-one registration approach yields better results than many-to-one approach. The many-to-one approach recorded an average RMSE of 0.5 against the one-to-one approach of 0.36 across all band combinations. The similarity in spectral properties per band combination in one-to-one approach is one of the possible reasons as to why band pairs are better aligned.

Parrot Sequoia Versus Micro MCA NDVI Analysis
Comparative NDVI zonal statistics were computed for both UAV images. Zones A, C, D, E, G, H and I are within the maize field but with different crop densities as shown in Figure 14. The Vegetation Index (VI) response as extracted with PS and Micro MCA are highly correlated; the spatial averages of each zone recorded a positive correlation of 0.86. Despite the differences in GSD and in image quality between the two cameras, the spatial variability of NDVI is highly comparable. The average deviation of the mean values between the two datasets is 0.16, with the PS registering higher scores than Micro MCA. As shown in Figure 15, the highest average deviations are seen in zones E and I; Micro MCA depicts comparatively low NDVI values than PS.

Inter-epoch Analysis of NDVI
Having established the variations in NDVI from images acquired in the same day but with different spatial resolutions, an interepoch comparative analysis of vegetation performance was done for all the three epochs. Being a qualitative assessment, the zonal statistics presented in Figure 16 illustrate the observed spatiotemporal dynamics of NDVI values.

Spectral Variability Analysis
Both cameras can spectrally distinguish photosynthetically active and less active vegetation; the green band reflects more than the red which absorbs most reflectance, and a sharp transition is seen in the red edge. The average spectral deviation between the cameras is by a factor of 1.6 and 1.3 for photosynthetically active and less active crops respectively. The observed deviations could be attributed to image quality; difference in camera calibration, and spectral band width. More significantly, it can be attributed to the missing incidence sensor on the Micro MCA thus changes in irradiance during capture are not accounted for. Vegetation surfaces do not reflect light evenly in all directions i.e. Lambertian properties. The non-uniform reflectance hampers pixel-wise comparison and thus zonal comparison in this study.

DISCUSSION
The complex nature of vegetation surfaces calls for more stringent parameters as far as UAV-based multispectral sensing is concerned. In this study, a side overlap of 40% was used for image acquisition. This was observed to be insufficient to generate a stable photogrammetric block; artificial zonal variability corresponding to flight lines was evident on the orthophoto as shown on Figure 19 below. The results show that region detectors that use float descriptors are more robust than the ones using binary detectors. SURF and KAZE detected and indexed correct matches in all the spectral channels while the binary descriptors, MSER and BRISK, failed to find enough keypoints to qualify as inliers for all the band combinations; binary descriptors compare pixel differences whereas float descriptors compute intensity gradients.
In this study, horizontal positional RMSE of the inlying conjugate points was used for accuracy assessment; another possible way to assess the co-registration accuracy could be to compute the epipolar geometry between band-pairs and compute the residual error of the distances of matched points from their corresponding epipolar lines (Onyango et al., 2017). It is however important to note that accuracy varies depending on the method used to estimate geometric transformation. Jhan et al. (2017), argue that image planes are not exactly parallel thus the use of similarity and affine transformation for band-to-band coregistration is unsuitable. On the flip side, image blocks in this study were orthorectified, thus assumptions of planarity, parallelism and similarity were made. As such, the use of similarity transformation method to estimate the geometric transformation.
SURF and KAZE obtained subpixel registration accuracies but SURF was observed to be faster. KAZE employs the additive operator splitting, which has been reported to be quite inefficient (Gerke, Nex, & Jende, 2016). On the other hand, KAZE is more rigorous in feature detection than SURF. The results of this study demonstrated that both KAZE and SURF are effective algorithms for co-registration of multispectral images.
Intra-epoch and inter-epoch coregistration achieved sub-pixel accuracy in the range of 0.16 -0.22 and 0.28 -0.39 pixels respectively, which is adequate for accurate crop monitoring. The results of Townshend et al. (1992), demonstrate that to obtain NDVI values with a 90% confidence level, intra-epoch coregistration accuracy of 0.2 pixels or less should be obtained.

CONCLUSION
The main objective of this research was to investigate intra-epoch and inter-epoch misregistration of multispectral UAV imagery, and to explore the potentials of unmanned aerial systems for crop monitoring. This study proposes an intensity-based feature detection and description method to automatically co-register both intra-epoch and inter-epoch multispectral imagery. SURF and KAZE were tested and both detectors demonstrated the ability to co-register multispectral imagery to subpixel accuracy.
In light of the results obtained in this study, SURF is equally robust, more efficient, and computationally inexpensive than KAZE. On the other hand, KAZE is more vigorous than SURF and always detects more keypoints hence increasing the chances to get more correct matches per band combination. Both algorithms can be used satisfactorily for intra-epoch and interepoch co-registration, although their performance will vary based on parameterization and image quality. The presented coregistration approaches of many-to-one and one-to-one can therefore be used but the one-to-one is preferred.
The analysis of NDVI results between Sequoia and Micro MCA demonstrated that due to differences in spectral regimes of multispectral UAV imagery, the use of one system throughout the monitoring period is prudent. In summation, both Parrot Sequoia and Micro MCA are applicable for crop monitoring; they both have the spectral bands vital for monitoring photosynthetic activities of crops. Although Micro MCA has two more bands and can therefore sense more spectral properties, these additional spectral features offer a basis for future research. Also, in this study, GCPs were not used to scale, georeferenced and estimate distortions within the photos. Thus, greater inter-epoch displacements. To investigate the misregistration error between orthophotos that have been processed using GCPs in all the epochs would go a long way in contributing to inter-epoch coregistration methods.