THE EFFECT OF SPECTRAL MIXTURES ON WEED SPECIES CLASSIFICATION

Site-specific weed management (SSWM) is a precise and resource-efficient approach that can result in more productive and sustainable agricultural practices. SSWM requires weed maps, in which vegetation pixels are segmented from the soil and other materials and then classified into crops and different weed species. Such classification at high spatial resolution is important for SSWM, since preventing economic losses due to weeds requires management decisions at meter scales. In this regard, hyperspectral sensors can capture variations in leaf anatomy and biochemistry, which offers many advantages for weed classification. However, the typical tradeoff between spectral and spatial resolution produces mixed pixels, which challenge the application of hyperspectral imaging at large scales, at high plant densities, and for tiny seedlings at early growth stages. Mixture analysis methods have previously been shown to offer opportunities for dealing with mixed pixels in vegetation ecology and agriculture. Nonetheless, they have not been widely used for weed classification. This study aims to reveal the impact of spectral mixtures on classification results using supervised classification, spectral unmixing, and spatial analysis. We attempted to characterize how the spectral mixture of different weed species and soil at different growth stages affects classification results. Our results suggest that spectral mixtures are probably a significant factor driving misclassification when classifying weed species. Their effect can be characterized by spatial analysis and by the fractions obtained through spectral unmixing. We assume that the subpixel information provided by the fraction maps may add information about the spectral mixture that can assist in interpreting misclassified pixels alongside the widely used confusion matrix. This contribution is highly relevant at coarser spatial resolutions.


Site-specific weed management (SSWM)
Weeds are the main biotic factor limiting agricultural production, causing 9% of the world's yield losses (Neve et al., 2018). The annual economic loss due to weeds in the U.S. alone is estimated at $27 billion (Neve et al., 2018). Herbicides are the primary tool for weed control in most industrialized countries. However, herbicide application is associated with environmental pollution and risks to human health, and over-reliance on herbicides has resulted in herbicide-resistant weeds. Site-specific weed management (SSWM) aims to reduce herbicide use by spraying only weed patches and adjusting herbicide rates according to weed density and species composition (Wang, Zhang, and Wei, 2019). SSWM is a more precise and resource-efficient approach that can result in more productive and sustainable agricultural practices. SSWM requires weed maps, in which vegetation pixels are segmented from the soil and other materials and then classified into crops and different weed species. Image data for SSWM are acquired from proximal and remote platforms. Ground-based proximal sensing captures spectral and spatial data at high resolution; however, it is more time-consuming and less efficient at large scales than remote sensing. Remote sensing (e.g., satellite or airborne sensors) covers larger areas at the price of coarser spatial resolution. High spatial resolution is critical for SSWM, since preventing economic losses due to weeds requires management decisions at meter scales.

Hyperspectral imaging for weed classification
Image-based machine learning techniques offer great potential for weed identification (Lati et al., 2021). Various sensing technologies have previously been assessed for weed classification, including RGB, multispectral and hyperspectral cameras, and LiDAR. Each technology has its own advantages and limitations. Hyperspectral sensors can capture variations in traits such as leaf anatomy and biochemistry. For example, the visible range (VIS) responds to variation in pigment content and photosynthetic activity (Ustin et al., 2009), the near-infrared range (NIR) responds to anatomical traits (Zwiggelaar, 1998), and the shortwave infrared (SWIR) indicates water, sugar, and protein content in the leaf (Buitrago et al., 2018). Therefore, spectral signatures capture variation related to species, physiological state (Ronay et al., 2021), and phenological stage (Basinger et al., 2020). Despite this potential to improve classification results, few studies have attempted to classify weed species using hyperspectral data. The classification faces many environmental and biological challenges. First, a plant's spectral reflectance may vary across phenological stages and environmental conditions. Second, high spatial resolution is needed for detection and classification at early growth stages (usually the critical period for weed control) and for overcoming overlapping leaves at later growth stages. Basinger et al. (2020) investigated the effect of phenological stage on crop and weed species classification using hyperspectral spectra at the leaf and canopy levels. The authors indicated that even small phenological changes within a week could affect the plant spectra. Herrmann et al. (2013) examined the potential of ground-level spectral imaging with high spectral and spatial resolution for weed-crop classification in a wheat field. They found that sunlit vegetation can be separated better than shaded vegetation.
In addition, they highlighted the importance of the red edge for classification among broadleaf weeds, grasses, and wheat. Finally, the authors noted the need to further explore the influence of reduced spatial resolution in order to deal with mixed pixels.

The tradeoff between spectral and spatial resolution
Hyperspectral imaging has yielded promising weed/crop classification results. Nonetheless, previous studies have used ground-based hyperspectral data at the leaf level or images captured from a low altitude above the canopy. Such data acquisition is impractical for field application, and up-scaling is essential. However, there is a tradeoff between spectral and spatial resolution. At coarser spatial resolutions, the spectral signature of a pixel may include the reflectance of multiple land-cover types, creating a spectral mixture. As each pixel captures the reflectance of a larger area, more materials are likely to be involved in the mixture. Thus, mixed pixels pose a challenge for image classification, since the training procedure mostly relies on "pure-pixel" spectra. Under high weed densities, the spectra of multiple species will be present in the same pixel. This mixing may lead to errors in species classification and to misdetection of the species' locations. The same problem occurs at early growth stages, when weed seedlings are smaller than the pixel size. In such cases, the pixel spectrum is a mixture of the weed and surrounding soil spectra.
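The link between coarser resolution and mixed pixels can be illustrated with a small simulation. The sketch below assumes a linear (areal) mixing model, in which each coarse pixel is the area-weighted mean of the fine pixels it covers, and degrades a toy label map by block averaging. The function names (`degrade`, `class_fractions`, `mixed_share`) and the purity threshold are hypothetical illustrations, not part of the study.

```python
import numpy as np

def degrade(cube, factor):
    """Block-average a (rows, cols, bands) array by an integer factor.

    Under a linear (areal) mixing assumption, each coarse pixel is the
    area-weighted mean of the fine pixels it covers. Trailing rows and
    columns that do not fill a whole block are cropped.
    """
    rows = cube.shape[0] // factor * factor
    cols = cube.shape[1] // factor * factor
    blocks = cube[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor, -1)
    return blocks.mean(axis=(1, 3))

def class_fractions(label_map, n_classes, factor):
    """Fraction of each class inside every coarse pixel."""
    one_hot = np.eye(n_classes)[label_map]  # (rows, cols, n_classes)
    return degrade(one_hot, factor)

def mixed_share(fractions, purity=0.999):
    """Share of coarse pixels whose dominant class covers less than `purity`."""
    return float(np.mean(fractions.max(axis=-1) < purity))

# Toy scene: a plant (class 1) occupying columns 3-7 of an 8 x 8 soil (class 0) grid.
labels = np.zeros((8, 8), dtype=int)
labels[:, 3:] = 1
for factor in (1, 2, 4):
    print(factor, mixed_share(class_fractions(labels, 2, factor)))
# prints "1 0.0", then "2 0.25", then "4 0.5"
```

Because block averaging is linear, degrading a cube built from class spectra gives the same result as multiplying the coarse class fractions by those spectra, which is the assumption spectral unmixing later inverts.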
Some studies have investigated the effect of spectral mixing on estimating physiological traits such as chlorophyll content. Jay et al. (2017) suggested that, depending on crop type, a resolution that balances estimation accuracy and data acquisition efficiency can be found for different physiological traits. van Leeuwen et al. (2021) used simulated hyperspectral images to evaluate the limits of species identification and the effect of spatial resolution and species richness in the scene. Classification accuracies were lower at coarser spatial resolutions, yet species richness in the scene did not affect accuracy. In the context of SSWM, Louargant et al. (2017) used multispectral images with degraded resolutions for weed classification. In addition, they simulated mixed pixels to test the effect of spectral mixing on soil-vegetation and monocotyledon-dicotyledon weed classification. They concluded that pixels with a high vegetation rate are required to correctly separate soil from vegetation and monocots from dicots. Thus, previous research that addressed the spectral mixture effect relied primarily on simulated data. Moreover, the mixing effect has not been examined for weed classification using hyperspectral images. Therefore, it should be further characterized using real hyperspectral data at varying resolutions. It is also valuable to investigate how this effect interacts with growth stage, species composition, density, etc.

Spectral mixture analysis
The components of a spectral mixture can be quantified using spectral unmixing methods, which recover the subpixel information that is missing at coarser resolutions. Hyperspectral unmixing is the process of separating a spectral mixture into a collection of pure spectral signatures, the so-called endmembers (EMs). Unmixing an image produces a set of fraction maps, one per EM, and each map quantifies that EM's abundance in every pixel. Recent algorithmic advances even make it possible to create high-resolution fraction maps by fusing them with high-resolution RGB images, yielding abundance maps with additional subpixel spatial information (Kizel and Benediktsson, 2020). Thus, fraction maps provide valuable subpixel information that is essential for the accurate classification of weed species at different scales. Several studies in agriculture and ecology have demonstrated the use of unmixing methods for various applications. In agriculture, unmixing methods were examined for monitoring diseases and pests, assuming that infected and healthy plants are separate endmembers. The pixel purity index, which identifies the most "pure" pixels in a spectral image, was used to identify endmembers and develop thresholds to classify late blight infection in tomatoes (Zhang, Qin, and Liu, 2005). Another study used fraction maps of healthy-plant, damaged-plant, and soil endmembers to detect the severity of spider mite damage in cotton (Fitzgerald, Maas, and Detar, 2004). In ecology, fraction maps were used to estimate forest species abundance (Stagakis, Vanikiotis, and Sykioti, 2016). Medina, Manian, and Chinea (2013) used fraction maps of airborne hyperspectral images for biodiversity assessment, calculating the Shannon entropy of each pixel as a measure of its mixing level, which corresponds to biodiversity.
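A per-pixel Shannon entropy of the fraction vector, of the kind described for Medina, Manian, and Chinea (2013), can be sketched as follows. This is an illustration under stated assumptions, not their implementation: it uses the natural logarithm, clips negative fractions to zero, renormalizes each pixel to sum to one, and treats 0 log 0 as 0. The function name is hypothetical.

```python
import numpy as np

def fraction_entropy(fractions):
    """Shannon entropy (natural log) of the EM fractions of each pixel.

    `fractions` has shape (..., n_endmembers). Fractions are clipped at zero
    and renormalized to sum to one; 0 * log(0) is treated as 0.
    A pure pixel scores 0 and an even n-way mixture scores log(n).
    """
    f = np.clip(np.asarray(fractions, dtype=float), 0.0, None)
    f = f / f.sum(axis=-1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(f > 0, f * np.log(f), 0.0)
    return -terms.sum(axis=-1)

pixels = np.array([[1.0, 0.0, 0.0, 0.0],        # pure pixel
                   [0.5, 0.5, 0.0, 0.0],        # two-way mixture
                   [0.25, 0.25, 0.25, 0.25]])   # even four-way mixture
print(fraction_entropy(pixels))  # approximately [0, 0.693, 1.386]
```

Applied to a stack of fraction maps with shape (rows, cols, n_endmembers), the same function returns a (rows, cols) map of mixing level.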
Hence, mixture analysis methods have been shown to support vegetation mapping and agricultural applications and should be utilized in precision agriculture and research. Different vegetation spectra can be successfully unmixed into fractions that provide information about species abundances and the physiological condition of crop plants. Nonetheless, the unmixing approach has not yet been evaluated for weed mapping and SSWM, and its ability to mitigate the tradeoff between spatial and spectral resolution has not been determined. Furthermore, the analysis of classification results usually relies on a confusion matrix and overlooks the effect of spectral mixtures on classification. We assume that the subpixel information provided by fraction maps may add information about the spectral mixture that can assist in interpreting misclassified pixels alongside the widely used confusion matrix. This contribution is highly relevant at coarser spatial resolutions. Revealing the impact of spectral mixtures on classification results may allow locating pixels in the scene that are more prone to misclassification. Once those pixels are located, it may be beneficial to treat them differently to achieve higher classification accuracy and more accurate vegetation maps. The main objective of this study is to use mixture analysis to investigate and understand the relationship between spectral mixtures and the misclassification of weed species. Here, as a preliminary step, we attempted to characterize how the spectral mixture of different weed species and soil at different growth stages affects classification results. For testing and validation, we acquired a time series of hyperspectral images of a scene containing multiple weed species during early growth. First, we classified the images using support vector machine (SVM) supervised classification. In addition, we unmixed the images to derive fraction maps corresponding to four EMs.
Then, we analyzed the misclassifications in the scene based on two factors: 1) their spatial location and 2) the fraction composition of each pixel.

Data acquisition
To investigate the effect of spectral mixtures on weed species classification, we conducted an experiment with data acquired across different growth stages, species, functional groups, compositions, and plant densities. Three weed species were used: Amaranthus retroflexus, Solanum nigrum, and Setaria adhaerens, representing different botanical groups (monocots, dicots) and photosynthesis mechanisms (C3, C4). In addition, a crop plant (Triticum) was selected, and all species were arranged to simulate a scene with different combinations of species and sowing densities. We sowed the plants in a sowing tray divided into cells (2 x 2 cm), which allowed us to control the densities and arrangement of the different species (Figure 1a). Each variation included nine cells, with wheat sown in the middle cell and the weeds around it in different combinations and densities. Each cell in the tray contained a single species. Low- and high-density cells received 5-8 and 10-16 seeds, respectively. The tray included 26 variations of species composition and density, with four randomly distributed repetitions across the scene (Figure 1b). The trays were irrigated daily using sprinklers to allow germination. The plants emerged ~5 days after sowing. We acquired hyperspectral images (Specim IQ, Oulu, Finland) of the scene on six days during the two weeks after emergence. Each image frame contained a barium-sulfate calibration panel so that reflectance spectra could be retrieved against the same reference under the same conditions. For each image, the pixels of each species were manually labeled on the RGB image obtained by the camera, which provides a higher resolution for sampling (Figure 2). Labeling was performed with the MATLAB app "Image Labeler".
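Retrieving reflectance from an in-scene white panel is commonly done by dividing each band by the mean panel signal. The sketch below shows that generic normalization; it is not a description of the Specim IQ processing chain. Dark-frame subtraction, a nominal panel reflectance of 1.0, and the function name `to_reflectance` are all assumptions.

```python
import numpy as np

def to_reflectance(raw, panel_mask, dark=0.0, panel_reflectance=1.0):
    """Convert raw digital numbers to reflectance using an in-scene white panel.

    raw: (rows, cols, bands) raw image; panel_mask: boolean (rows, cols) map of
    panel pixels; dark: dark-current level (scalar or per-band array);
    panel_reflectance: nominal panel reflectance (scalar or per band, assumed 1.0).
    """
    signal = raw.astype(float) - dark
    white = signal[panel_mask].mean(axis=0)  # mean panel spectrum, one value per band
    return signal / white * panel_reflectance
```

Because the panel appears in every frame, each image is normalized against a reference recorded under its own illumination, which is what makes spectra from different acquisition days comparable.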
Three grouping methods were chosen to allow classification based on plant species (Amaranthus retroflexus, Solanum nigrum, Setaria adhaerens), botanical group (monocots, dicots), and photosynthesis mechanism (C3, C4).
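The three grouping methods can be derived from a single species-level ground truth by relabeling. A minimal sketch follows; the dictionary keys and the `regroup` helper are hypothetical, while the group assignments (A. retroflexus: dicot, C4; S. nigrum: dicot, C3; S. adhaerens: monocot, C4) reflect standard botany.

```python
# Grouping of the three weed species used in the study. The string labels are
# illustrative; the botanical group and photosynthetic pathway of each species
# are standard botanical facts.
SPECIES_GROUPS = {
    "Amaranthus retroflexus": {"botanical": "dicot", "photosynthesis": "C4"},
    "Solanum nigrum": {"botanical": "dicot", "photosynthesis": "C3"},
    "Setaria adhaerens": {"botanical": "monocot", "photosynthesis": "C4"},
}

def regroup(species_labels, grouping):
    """Map per-pixel species labels to 'species', 'botanical', or 'photosynthesis' labels."""
    if grouping == "species":
        return list(species_labels)
    return [SPECIES_GROUPS[s][grouping] for s in species_labels]

print(regroup(["Solanum nigrum", "Setaria adhaerens"], "photosynthesis"))  # ['C3', 'C4']
```

Note that the two coarser groupings split the species differently: the dicots A. retroflexus and S. nigrum share a botanical label but not a photosynthetic one.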

SVM classification
To examine the effect of mixed pixels on weed species classification, we applied a support vector machine (SVM) classifier to the data. First, to reduce the dimensionality of the data, we applied principal component analysis (PCA) and selected the first ten components of the hyperspectral data for SVM classification. The study relies only on vegetation pixels, since soil-vegetation classification is better established than species classification. The classifier was trained on a random sample of 5% of the pixels from each group. To ensure that classifier performance does not depend on a particular training set, the training process was repeated ten times, each time with a different randomly selected set of pixels. Finally, the classification results were compared to the ground truth image to generate a confusion matrix and calculate the overall accuracy.
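The classification protocol above can be sketched with scikit-learn. This is an illustration of the described steps, not the authors' code, and it rests on assumptions the text does not state: an RBF-kernel `SVC` with default hyperparameters, PCA fitted on all vegetation pixels, stratified 5% training samples per class, and evaluation on the remaining 95% of labeled pixels. The function names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.svm import SVC

def stratified_indices(labels, frac, rng):
    """Random `frac` of the pixel indices of each class (at least one per class)."""
    picks = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        n = max(1, int(round(frac * members.size)))
        picks.append(rng.choice(members, size=n, replace=False))
    return np.concatenate(picks)

def repeated_pca_svm(spectra, labels, n_components=10, frac=0.05, n_repeats=10, seed=0):
    """PCA to `n_components`, then an SVM retrained on `n_repeats` random training sets.

    spectra: (n_pixels, n_bands) vegetation pixels; labels: (n_pixels,) ground truth.
    Returns a list of (overall_accuracy, confusion_matrix) tuples, one per repeat,
    each computed on the pixels not used for training.
    """
    scores = PCA(n_components=n_components).fit_transform(spectra)
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_repeats):
        train = stratified_indices(labels, frac, rng)
        test = np.setdiff1d(np.arange(labels.size), train)
        model = SVC(kernel="rbf").fit(scores[train], labels[train])
        predicted = model.predict(scores[test])
        results.append((accuracy_score(labels[test], predicted),
                        confusion_matrix(labels[test], predicted)))
    return results
```

The spread of the ten overall accuracies indicates how sensitive the classifier is to the particular training sample.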

Spectral unmixing
Fraction maps were generated to test the hypothesis that spectral mixtures increase the rate of misclassification. For this purpose, we used the Vectorized Projected Gradient Descent Unmixing (VPGDU) algorithm, which is robust to illumination variation and computationally efficient (Kizel et al., 2017). To prevent bias in EM extraction, we extracted eight different sets of four EMs, each from a random 5% of the pixels in each group based on the ground truth image (Figure 2), and used the mean EMs.
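The VPGDU algorithm itself is not reproduced here. As a hedged stand-in, the sketch below solves the same fully constrained linear unmixing problem, x ≈ E a with nonnegative fractions a that sum to one, using plain projected gradient descent with a Euclidean projection onto the probability simplex. It also averages EMs over eight random 5% draws per class, assuming each EM is the mean spectrum of its sampled pixels (the text does not say how EMs are computed from the samples). All function names are hypothetical.

```python
import numpy as np

def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto {a : a >= 0, sum(a) = 1}."""
    n, k = V.shape
    U = -np.sort(-V, axis=1)                            # rows sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    positive = U - css / np.arange(1, k + 1) > 0
    rho = k - 1 - np.argmax(positive[:, ::-1], axis=1)  # last index where the test holds
    theta = css[np.arange(n), rho] / (rho + 1)
    return np.maximum(V - theta[:, None], 0.0)

def mean_endmembers(X, labels, classes, frac=0.05, n_sets=8, seed=0):
    """Average, over `n_sets` draws, of per-class mean spectra of a random `frac` of pixels.

    X: (n_pixels, n_bands). Returns E with shape (n_bands, n_endmembers).
    """
    rng = np.random.default_rng(seed)
    sets = []
    for _ in range(n_sets):
        columns = []
        for c in classes:
            members = np.flatnonzero(labels == c)
            n = max(1, int(round(frac * members.size)))
            columns.append(X[rng.choice(members, size=n, replace=False)].mean(axis=0))
        sets.append(np.column_stack(columns))
    return np.mean(sets, axis=0)

def unmix_pgd(X, E, n_iter=1000):
    """Fully constrained least-squares unmixing by projected gradient descent.

    X: (n_pixels, n_bands) spectra; E: (n_bands, n_endmembers).
    Minimizes 0.5 * ||x - E a||^2 for every pixel with a on the simplex.
    """
    step = 1.0 / np.linalg.norm(E, 2) ** 2     # 1 / Lipschitz constant of the gradient
    A = np.full((X.shape[0], E.shape[1]), 1.0 / E.shape[1])
    for _ in range(n_iter):
        gradient = (A @ E.T - X) @ E           # row i is E^T (E a_i - x_i)
        A = project_rows_to_simplex(A - step * gradient)
    return A
```

Reshaping the returned (n_pixels, n_endmembers) array back to the image grid gives one fraction map per EM.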

Characterizing misclassifications
We hypothesize that mixed pixels are most likely to occur in areas of transition from one land-cover type to another. Accordingly, misclassifications are more likely to occur around edges within the image. Thus, we first used the Canny algorithm to detect the edges. Then, we calculated the distance of each pixel from the nearest edge. Finally, we examined the correlation between the distance from the nearest edge and the probability of misclassifying a pixel. In addition, we tested the relationship between each EM fraction and the probability of a pixel being misclassified.
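The edge-distance analysis can be sketched with NumPy and SciPy, assuming a boolean edge map is already available (for example from `skimage.feature.canny`; the text does not state which image the detector was applied to). SciPy's `distance_transform_edt` measures distance to the nearest zero-valued pixel, so the edge map is inverted first. The binning scheme and the Pearson correlation over bin centers are assumptions, and the function names are hypothetical. The same binning applies unchanged to EM fractions, e.g. with `bins = np.linspace(0, 1, 11)`.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_to_nearest_edge(edges):
    """Euclidean distance (pixels) from every pixel to the closest edge pixel.

    `edges` is a boolean map, e.g. the output of a Canny detector.
    distance_transform_edt measures distance to the nearest zero, so edges are inverted.
    """
    return distance_transform_edt(np.logical_not(edges))

def probability_by_bin(values, misclassified, bins):
    """Mean misclassification rate of the pixels falling in each bin of `values`.

    The outermost bins are open-ended, so every pixel is counted; empty bins give NaN.
    """
    values = np.ravel(values)
    misclassified = np.ravel(misclassified).astype(float)
    idx = np.digitize(values, bins[1:-1])   # bin index 0 .. len(bins) - 2
    probs = np.full(len(bins) - 1, np.nan)
    for b in range(len(bins) - 1):
        in_bin = idx == b
        if in_bin.any():
            probs[b] = misclassified[in_bin].mean()
    return probs

def bin_correlation(bins, probs):
    """Pearson correlation between bin centers and per-bin probabilities (NaNs skipped)."""
    centers = 0.5 * (bins[:-1] + bins[1:])
    ok = ~np.isnan(probs)
    return np.corrcoef(centers[ok], probs[ok])[0, 1]
```

A negative correlation for distance, as in the hypothesis above, means that pixels far from edges are less likely to be misclassified.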

Figure 2
An example of the ground truth data for the three grouping methods. (a) RGB visualization of the hyperspectral image; (b), (c), and (d) ground truth data for plant species, botanical groups, and photosynthesis mechanisms, respectively.

The probability for misclassifications as affected by distance from the nearest edge
The overall accuracy (OA) of classification between species ranged from 81.01% to 94.82%. When overlaid on the image, misclassified pixels appear to be located mostly on the edges of the plants' leaves, at transitions between weed and soil or between two weed species. The regions containing misclassifications seem to be characterized by mixed pixels (Figure 3b, 3c). Figure 4 presents the probability of pixel misclassification as a function of distance from the nearest edge. The analysis showed that the probability of misclassification decreased as the distance from the nearest edge increased. This suggests that spectral mixtures, which are more likely to occur at the boundaries between different objects, are a major factor in misclassification between weed species.

Probability for misclassifications as affected by specific EM fractions
As a first step, we examined how the misclassification rate is affected by a pixel's spectral mixture. We tested the hypothesis that the fraction of a specific endmember in a pixel affects the pixel's probability of being misclassified. Accordingly, we expected that for the weed EMs, the probability of misclassification would decrease as the fraction of the endmember increases. Conversely, we expected that as the soil fraction increases, the probability of misclassifying weed species would increase. Figures 5, 6, 7, and 8 present the results obtained for each of the EMs. As expected, as the soil fraction increased, the probability of misclassifying weed species increased (Figure 5). We repeated a similar analysis for all the images acquired during the experiment. For Setaria adhaerens, the opposite trend was expected, and the results confirmed that as the Setaria adhaerens fraction increases, the probability of misclassification decreases (Figure 6). Unexpectedly, this trend was less pronounced for the Amaranthus retroflexus fraction (Figure 7). For Solanum nigrum, the fraction did not seem to influence the misclassification rate (Figure 8). This might be related to the fact that Amaranthus retroflexus and Solanum nigrum are both dicotyledonous weeds and therefore share more spectral similarities (Herrmann et al., 2013).

Figure 4
The effect of distance (D) from the nearest edge on mean misclassification probability (P) on 6 days of measurements. Distance is measured in pixels. As the distance from the nearest edge increases, the probability of misclassification decreases.
The mean was calculated based on 5 different sowing plates. Error bars represent the probabilities' standard deviation.

Figure 5
The effect of soil fraction (F) in the pixel on the mean misclassification probability (P) on 6 days of measurements. As the soil fraction increases, the probability of misclassification between weed species increases. The mean was calculated based on 5 different sowing plates. Error bars represent the probabilities' standard deviation.

Figure 6
The effect of Setaria adhaerens fraction (F) in the pixel on misclassification probability (P) on 6 days of measurements. The mean was calculated based on 5 different sowing plates. Error bars represent the probabilities' standard deviation.

Figure 7
The effect of Amaranthus retroflexus fraction (F) in the pixel on misclassification probability (P) on 6 days of measurements. The mean was calculated based on 5 different sowing plates. Error bars represent the probabilities' standard deviation.

To better understand these results, further analysis needs to consider the other fractions involved in the mixture and their sizes relative to one another. Additionally, unmixing will be tested for the other grouping methods, which will allow examining this relationship for monocotyledons versus dicotyledons. The results are consistent with those of Louargant et al. (2017), who showed the effect of the vegetation fraction on the outcome of vegetation-soil and monocotyledon-dicotyledon classification. However, the results presented here suggest an alternative approach to characterizing this relationship, using spectral unmixing.

CONCLUSIONS
This study aims to investigate and understand the relationship between spectral mixtures and the misclassification of weed species. Preliminary results showed that misclassifications can be characterized by their location in the image and are more frequent near the edges between different weed species and between weeds and soil. Quantitative and visual analyses suggested that those pixels are generally affected by the fractions of the different EMs in the scene. This outcome was later confirmed by the combined analysis of the unmixing and classification results. The fractions of soil and Setaria adhaerens were shown to affect the misclassification rate, but this effect depended on EM identity, as the fractions of Amaranthus retroflexus and Solanum nigrum were not found to affect the misclassification rate. This leads to the preliminary conclusion that spectral mixtures are probably a significant factor driving misclassification when attempting to classify weed species. In addition, misclassified pixels can be characterized by combining spatial and spectral analysis. The preliminary findings raised several questions regarding the characteristics of spectral mixtures that lead to misclassification: Are specific mixtures more prone to misclassification than others? Is the identity of those mixtures affected by the growth stage? Does the spatial location of a specific mixture influence its probability of being misclassified? Therefore, we aim to continue characterizing those pixels to establish the effect of spectral mixtures on classification.