AUTOMATIC REFINEMENT OF TRAINING DATA FOR CLASSIFICATION OF SATELLITE IMAGERY

In this paper, we present a method for the automatic refinement of training data. Many machine learning classifiers used in remote sensing applications rely on previously labelled training data. This labelling is often done by human operators under time constraints. Hence, the selection of training data must be kept practical, which implies a certain inaccuracy. This results in erroneously tagged regions enclosed within competing classes. To address this problem, we propose a method that removes outliers from the training data using an iterative training-classification scheme. Outliers are detected by their newly determined class membership as well as through an analysis of the uncertainty of classified samples. A sample selection method that incorporates the quality of neighbouring samples is presented and compared to alternative strategies. Since iterative approaches tend to propagate errors, which might lead to degenerating classes, a robust stopping criterion based on training data characteristics is also described. Our experiments using a support vector machine (SVM) show that outliers are reliably removed, allowing a more convenient sample selection. The classification result for unknown scenes of the corresponding validation set improves from 70.36% to 79.12% on average. Additionally, the average complexity of the SVM model is decreased by 82.75%, resulting in a similar reduction of processing time.


INTRODUCTION
Today, the increasing amount of image data originating from sensors such as satellites provides a broad basis for several applications in Geographic Information Systems (GIS). Evaluating these data, however, often demands more manpower than is available. Hence, (semi-)automatic systems based on computer vision and machine learning algorithms are of great interest for these applications (Förstner, 2009). To this end, methods involving pixel-wise and object-wise classification as well as segmentation have been proposed for land cover classification (Helmholz et al., 2010). A comprehensive review in (Mountrakis et al., 2011) shows that much recent research has been done on support vector machine classification. Supervised methods like SVMs still require user interaction in the training process. The selection of samples (i.e. training data) is crucial and directly influences the classification. With real-world data, optimal sample selection is neither possible nor practical for the human operator. This particularly applies to small enclosures of dissimilar regions within competing classes (e.g. bushes and tree groups within settlements). These outliers are incorrect samples that consequently reduce classification quality. To address this problem, we propose a method that automatically optimises training data for SVM classification with respect to correctness while concurrently reducing the complexity of the derived model.

Related Work
Little research has been done in this area recently. (Tolba, 2010) describes a method to locate outliers on a low-dimensional manifold that was mapped from a higher-dimensional space of training samples. (Xu et al., 2006) directly modify the standard soft margin principle to suppress outliers during training. Surveys on outlier detection methodologies are given by (Chandola et al., 2009) and (Escalante, 2005). A survey by (Hodge and Austin, 2004) points out decision tree methods for supervised machine learning approaches. (John, 1995) uses iterative training and pruning of misclassified labels to keep inliers. (Brodley and Friedl, 1996) extend this idea to land cover classification: a consensus voting scheme is employed to filter the results of an ensemble of classifiers in order to eliminate mislabelled samples. For SVM classification, further methodologies can be roughly categorised into online learning and batch learning based techniques.
In online learning, samples are added one at a time. This can be utilised for active learning, where the algorithm consecutively queries new samples for annotation while updating the trained model. Although primarily focused on large data sets, such an algorithm queries critical or important samples and thus avoids outliers. In (Laskov et al., 2006), incremental support vector learning is utilised for active learning. (Li and Sethi, 2006) propose confidence-based active learning to optimise the training by only processing uncertain samples (which hold the most information).
In batch learning, all samples are available at once. Several proposed approaches focus mainly on training set reduction while keeping cluster boundaries. (Bakır et al., 2005) first train independent SVMs on subsets of the training data. These are used to classify the training data, to identify uncertain samples and to discard all others. In a second step, a final SVM is trained on the remaining samples. (Wagstaff et al., 2010) propose using probability estimates of a low-complexity SVM to determine uncertain regions for high-accuracy classification. (Wang et al., 2005) describe two approaches based on confidence and Hausdorff distance to remove unneeded samples.

Contribution
As stated in the previous paragraph, most batch learning based methods focus on training set reduction. Samples are removed such that only those at class boundaries remain, since these will most likely become support vectors. However, uncertain samples at boundaries might also originate from outliers that unnecessarily increase the complexity of the model. In our approach, we pick up the idea of using uncertainty information, but use it to remove outliers.
In an iterative training-classification scheme, we identify and remove misclassified samples. In contrast to outlier detection methods like those in (John, 1995) or (Brodley and Friedl, 1996), our approach takes the uncertainty of the executing classifier into account. The uncertainty of classified samples is evaluated by employing probability estimates (Wu et al., 2003). Highly uncertain samples, which primarily represent transitions between distinct textures in the image, are removed to further improve the refinement. An underrated problem of sample removal is the degeneration of classes. To prevent underrepresented classes from vanishing from the training data, we describe a robust stopping criterion for the iterative refinement. Our experiments show that manual sample selection can be kept practical, since small enclosures of competing classes are sorted out automatically. We show that both the quality and the processing time of classification improve, even with low-quality training data. It is important to note that, first and foremost, outliers are removed that would otherwise lead to a wrong model and thereby to unnecessarily increased model complexity. The primary aim is not to search for the most trivial samples in order to simplify the model.
In the following section, we first describe the methodology of the proposed approach in detail. Subsequently, the experimental setup is outlined. Finally, results are presented that illustrate the benefit of our approach.

Base System
The iterative refinement proposed in this paper is built around a common base system which consists of feature extraction and training/classification as explained in the following subsections.
Feature Extraction
Radiometric and statistical features are extracted within a local N × N neighbourhood of each pixel. This is done for all available spectral image bands (e.g. R, G, B, NIR) and at different levels of detail; the results are concatenated into the feature vector. Considering two scales, two features and four bands, the feature vector dimension adds up to d_f = 2 × 2 × 4 = 16.
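A minimal sketch of the per-pixel feature vector follows. The paper does not name its two features or say how the second level of detail is formed, so, purely as an assumption, we use the local mean (radiometric) and the local standard deviation (statistical) computed in two window sizes. The function names and the window sizes (3, 7) are hypothetical, and border handling is omitted.

```python
import math


def local_mean_std(band, row, col, size):
    """Mean and standard deviation of `band` in a size x size window centred on (row, col).

    Assumption: the window lies completely inside the image (no border handling).
    """
    half = size // 2
    values = [band[r][c]
              for r in range(row - half, row + half + 1)
              for c in range(col - half, col + half + 1)]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def feature_vector(bands, row, col, window_sizes=(3, 7)):
    """Concatenate (mean, std) for every scale and every spectral band.

    Hypothetical features and scales: with 2 scales, 2 features and 4 bands
    (R, G, B, NIR) the result has 2 * 2 * 4 = 16 entries, i.e. d_f = 16.
    """
    features = []
    for size in window_sizes:
        for band in bands:
            features.extend(local_mean_std(band, row, col, size))
    return features
```

The same routine would be applied to every pixel; for real images, vectorised image filters would be far faster than these explicit loops.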

SVM Training and Classification
For training and classification, all feature vectors are passed to an SVM (Vapnik, 1998). (Burges, 1998) gives a comprehensive introduction to support vector machines. Our implementation is based on LIBSVM (Chang and Lin, 2001). For our tests, the common radial basis function (RBF) kernel was used. No explicit parameter tuning was done; instead, general parameters were chosen with respect to robust classification in all our scenes. This favours good generalisation and biases the model against overfitting.
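For reference, the standard RBF kernel and the decision function of a binary SVM are given below. The paper does not report its values of γ or of the cost parameter, so the symbols are generic. Evaluating f requires one kernel evaluation per support vector, which is why classification time grows with the number of support vectors.

```latex
\begin{aligned}
K(\mathbf{x}, \mathbf{x}') &= \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad \gamma > 0 \\
f(\mathbf{x}) &= \operatorname{sign}\Big(\sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b\Big)
\end{aligned}
```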

Iterative Refinement
The general method of our approach is shown in Figure 2. As the contribution of this paper, additional steps are introduced to optimise the model. After training, the training data are reclassified (Figure 3(a)). As can be seen, some samples are misclassified with respect to their original labels. Due to the robust, generalising parameters, the training is not prone to overfitting. Hence, misclassified samples can safely be treated as outliers and exempted from further consideration. Uncertain samples are excluded as well. For this purpose, class membership probabilities are estimated by pairwise coupling (Wu et al., 2003) for each sample.
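To make the probability estimation step concrete, here is a minimal sketch of pairwise coupling following the second method of Wu et al. (2003), which, to our knowledge, also underlies LIBSVM's probability outputs. In practice LIBSVM computes these estimates itself when probability output is enabled; the plain-Python function below, its name and its tolerances are our own illustration.

```python
def pairwise_coupling(r, max_iter=1000, tol=1e-12):
    """Combine pairwise estimates r[i][j] ~ P(y = i | y in {i, j}, x) into class probabilities.

    Sketch of the second method of Wu et al. (2003): minimise
        sum_i sum_{j != i} (r[j][i] * p[i] - r[i][j] * p[j]) ** 2   s.t. sum(p) == 1
    with their fixed-point iteration. Assumes 0 < r[i][j] < 1 and r[i][j] + r[j][i] == 1.
    """
    k = len(r)
    # Q[t][t] = sum_{s != t} r[s][t]^2 and Q[t][j] = -r[j][t] * r[t][j] for j != t
    Q = [[0.0] * k for _ in range(k)]
    for t in range(k):
        for j in range(k):
            if j != t:
                Q[t][t] += r[j][t] ** 2
                Q[t][j] = -r[j][t] * r[t][j]

    def q_times(p):
        return [sum(Q[t][j] * p[j] for j in range(k)) for t in range(k)]

    p = [1.0 / k] * k
    for _ in range(max_iter):
        Qp = q_times(p)
        pQp = sum(pt * qt for pt, qt in zip(p, Qp))
        # Optimality condition: every component of Qp equals p^T Q p
        if max(abs(qt - pQp) for qt in Qp) < tol:
            break
        for t in range(k):
            Qp = q_times(p)
            pQp = sum(pt * qt for pt, qt in zip(p, Qp))
            # Update p[t] with the other entries fixed, then renormalise to sum 1
            p[t] += (pQp - Qp[t]) / Q[t][t]
            total = sum(p)
            p = [v / total for v in p]
    return p
```

For k classes, r[i][j] would come from the binary SVM trained on classes i and j; the diagonal of r is ignored.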

Convergence Characteristics
Gradually removing outliers entails the risk of degenerating classes.A class might be underrepresented due to a low number of samples or many outliers.Additionally, low separability in feature space can lead to a heavy bias towards other classes.Hence, it is essential to keep track of characteristic details of the refinement process.In our case, one such detail appears at the beginning of the iteration.
In Figure 5, the number of samples for four different classes is plotted over 75 iterations. The local minimum at the start of the iteration is significant, as it already indicates a converging class. Usually, the initial selection of samples contains a large number of outliers, many of which are removed during the first iteration. Therefore, the number of samples drops. Afterwards, a new SVM model is trained on the improved sample set. This leads to a better representation of the data and consequently to fewer uncertain samples and outliers. As a result, fewer samples are removed, i.e. the number of samples increases.
Accordingly, two factors can cause a missing minimum at the start of the iteration. First, the initial set contains no or only few outliers; in this case, the number of samples will hardly drop at all. Second, the number of samples decreases monotonically. This indicates that the SVM model did not improve for this class, causing degeneration.

EXPERIMENTS
In the following, the experimental setup is outlined, which consists of image data, training subsets, and validation subsets with reference. Finally, the evaluation procedure is described.

Image Data
Input data for our tests originate from the IKONOS satellite. The images are ortho-rectified and consist of the four spectral bands red (R), green (G), blue (B), and near infrared (NIR) with 8-bit colour depth per band. The spatial resolution is 1 m. The scenes cover areas of Hildesheim, Germany, and Weiterstadt, Germany.

Validation Set and Reference
Representative scenes from the city and environs of Hildesheim and Weiterstadt were manually referenced with pixel accuracy, both originating from the same IKONOS image data. Classification result and reference are compared pixel-wise to obtain the rate of correct detection. The reference features more classes than our classification system. Thus, classes are mapped where necessary, e.g. inner city and suburban areas are combined into settlement. In many applications, line objects such as streets are treated separately from area objects, since training them with the same classifier is not efficient. Hence, line objects are ignored in our evaluation so as not to distort the results.
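The pixel-wise evaluation with class mapping and ignored line objects can be sketched as follows. The function name, the string class names and the use of plain nested lists are illustrative assumptions; the actual reference classes are not listed in the paper.

```python
def correct_rate(prediction, reference, class_map, ignore=frozenset()):
    """Pixel-wise rate of correct detection.

    prediction, reference: 2-D lists of class names with identical shape.
    class_map: maps reference classes onto classifier classes (e.g. both
        'inner city' and 'suburban' onto 'settlement'); unmapped names are kept.
    ignore: reference classes left out of the evaluation (e.g. line objects such as streets).
    """
    correct = total = 0
    for pred_row, ref_row in zip(prediction, reference):
        for pred, ref in zip(pred_row, ref_row):
            if ref in ignore:
                continue  # e.g. streets are not evaluated
            total += 1
            if class_map.get(ref, ref) == pred:
                correct += 1
    return correct / total if total else 0.0
```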

Training Sets
Training data for each class were manually selected with a graphical tool. The data originate from the same geographical region as, but do not overlap with, the corresponding validation set.

Evaluation
The contour lengths of the original and the optimised labels are compared as a measure of how convenient the training data selection is.
The longer the contours are, the more effort a human operator has to put into sample selection. Hence, the increase in contour length achieved by refinement corresponds to effort that can be saved during the initial sample selection. All contours are one pixel wide. Contour pixels were simply counted using an 8-neighbourhood. Since contours of manually selected labels are oriented horizontally and vertically, whereas pixel counting tends to underestimate the length of diagonal contour segments, this method yields a reasonable lower bound that does not favour our approach.
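One possible reading of the contour count is sketched below: a labelled pixel counts as a contour pixel if any of its 8 neighbours carries a different label, is invalid, or lies outside the image. This exact definition, the integer label encoding and the value 0 for invalid pixels are our assumptions.

```python
def contour_length(labels, invalid=0):
    """Count contour pixels in a label image (2-D list of ints).

    Assumed definition: a valid pixel is a contour pixel if at least one of its
    8 neighbours has a different label (including `invalid`) or lies outside the image.
    """
    rows, cols = len(labels), len(labels[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            label = labels[r][c]
            if label == invalid:
                continue
            neighbours = ((r + dr, c + dc)
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1) if dr or dc)
            if any(not (0 <= rr < rows and 0 <= cc < cols) or labels[rr][cc] != label
                   for rr, cc in neighbours):
                count += 1
    return count
```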

RESULTS
Figure 6 shows a characteristic result of the refinement process (set T02) with respect to the evaluation items defined in section 3.4. The four graphs illustrate the convergence of the process. In Figure 6(a), the number of samples of all but one class drops directly to the characteristic minimum discussed in section 2.2.2. The remaining class, Cropland/Grassland (c1), does not contain a significant number of outliers, resulting in immediate convergence. This is also due to the homogeneity of this class (see c1 in Figure 3(a), for instance). As can be seen in Figure 6(b), the number of support vectors decreases rapidly, which indicates a lower complexity of the model. The overall contour length (Figure 6(c)) increases as expected due to outlier removal. Furthermore, a local maximum stands out significantly. It relates to the characteristic local minima of the number of samples, but does not turn out to be as consistent. The last plot depicts the overall classification result on the validation set for each iteration. Here, the initial training set leads to 71.33% correct classifications. The rate increases nearly monotonically until convergence, reaching 79.49%. This trend is observable in all test sets. The classification results are listed in Table 2.
For scene Weiterstadt, test sets T04 and T05 show the largest improvement. These were trained with GIS data, which is very prone to outliers. The highest improvement overall was achieved for a test set of scene Hildesheim (T10). It is a very small test set that does not represent the scene well. These observations validate our approach of refining poor training data, resulting in an average improvement from 70.36% to 79.12%. The impact on a human operator is estimated by evaluating the contour length as described in section 3.4. On average, refinement compensates for an initial sample selection with 35.6% shorter contours.
Additionally, the processing time for classification is substantially reduced on account of the less complex SVM model. The number of support vectors, which directly correlates with processing time, is reduced by 82.75% on average.

Comparison of Sample Selection Strategies
A comparison of sample selection strategies is given in Figure 7. Here, test set T01 was optimised with different sample selection strategies, each tested with and without removal of uncertain samples. The results clearly support our approach, which outperforms the alternative strategies: allowing only clean samples (S3, see Figure 4) and removing uncertain samples leads to a considerable improvement of the classification result.

Comparison to Slack Variables
To handle outliers in training data, slack variables were introduced to SVMs (Cortes and Vapnik, 1995); they allow a certain number of feature vectors to be located on the wrong side of the hyperplane. A cost parameter C is used to penalise these outliers. In Figure 8, the classification result and the total number of support vectors are plotted for varying values of C for set T10. This set clearly shows the important characteristics: allowing more outliers (C = 1.0) helps to improve the initial classification result. Nevertheless, the model can still be refined by our approach, and the model complexity is reduced as well. This applies even more to higher values of C.
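For reference, the standard soft-margin formulation (Cortes and Vapnik, 1995) is given below. Here, ξ_i are the slack variables and φ is the feature map induced by the kernel; a sample with 0 < ξ_i ≤ 1 lies inside the margin, and one with ξ_i > 1 lies on the wrong side of the hyperplane. A small C makes slack cheap and tolerates more outliers, whereas a large C penalises them more heavily.

```latex
\begin{aligned}
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad & \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & y_i \left( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
\end{aligned}
```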

Class Degeneration
Within our setup, three test sets show degeneration of classes (T05, T06, T10). T06 clearly exhibits the important characteristics (Figure 9). The samples of three of the five classes do not change significantly, while the number of samples of the other two drops rapidly. Their monotonic decline indicates that the newly derived SVM models do not improve. This is where the stopping criterion steps in: as soon as the number of samples relative to its initial value reaches a threshold Ts without the typical local minimum in the number of samples, the iteration is stopped (iteration 9 in the example of Figure 9). Here, this holds true for both classes, even though only one degenerates. The threshold Ts is set to 30%. The results are not very sensitive to the choice of Ts, as the iteration reliably stops before degeneration and a rapid loss in the correct classification rate occur.
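The stopping criterion can be sketched as below. The function names and the history format are hypothetical, and since the paper does not define the "typical local minimum" more precisely, we assume that any strict local minimum (a drop followed by a rise) in a class's sample count history counts as that signal. Ts = 30% is taken from the text.

```python
def has_local_minimum(counts):
    """True if the count sequence contains a strict local minimum (a drop, then a rise)."""
    return any(counts[i] < counts[i - 1] and counts[i] < counts[i + 1]
               for i in range(1, len(counts) - 1))


def should_stop(history, ts=0.30):
    """Hypothetical stopping rule.

    history: dict mapping class -> list of sample counts per iteration
             (index 0 is the initial training set).
    Stops as soon as any class has shrunk to at most ts times its initial count
    without having shown the characteristic local minimum.
    """
    for counts in history.values():
        initial = counts[0]
        if initial > 0 and counts[-1] / initial <= ts and not has_local_minimum(counts):
            return True
    return False
```

A class whose count first drops and then recovers is regarded as converging, so it never triggers the stop on its own, even if its count later sinks below Ts.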

CONCLUSIONS
In this paper, we presented a general method for the automatic refinement of training data for SVM classification. The refinement targets convenient sample selection. We have shown that, by incorporating uncertainty into an iterative outlier detection, the correct classification rate can be significantly improved while the complexity of the derived model is significantly reduced. The correct classification rate was improved from 70.36% to 79.12% on average. For a human operator, the refinement compensates for an initial sample selection with 35.6% shorter contours. Additionally, since poor training data are compensated, automatic training with samples from sources like GIS becomes practical. The processing time for classification is reduced considerably, as the complexity of the SVM model decreases by 82.75% on average. Since the internals of the classifier are not modified, the refinement could also be applied to other margin-based classifiers, provided that probability estimates are available.
ITERATIVE TRAINING DATA REFINEMENT
Without loss of generality, examples and explanations refer to a four-class problem. Training samples were chosen from the following classes, which cover nearly the entire area of the classified scenes:
• Cropland/Grassland (c1)
• Forest (c2)
• Industry (c3)
• Settlement (c4)
Figure 1 shows a typical, manually selected training set (T01) representing the aforementioned classes with patches of c4, c3, c1, c2, c4 from left to right. Additionally, class membership is represented by a label image.

Figure 1: Training set T01 (a) with the corresponding labels (b). Black marks invalid regions not used as samples; grey values represent labels. Classes from left to right: c4, c3, c2, c1, c4.
The exemplary training data set T01 consists of five manually selected sample patches of the four classes c1-c4 and their labels (Figure 1).

Figure 2: Training data refinement scheme. Larger images of the pictograms showing intermediate results are depicted in Figure 3.
Panels of Figure 3: (a) Reclassification result; black marks invalid regions not used as samples, grey values represent labels. (b) Uncertainty; values range from uncertain (0, black) to certain (1, white). (c) Optimised labels; black marks invalid regions not used as samples, grey values represent labels.

Figure 3: Intermediate results of the refinement process shown in Figure 2. Reclassification result (a) and uncertainty (b) are used to refine the label image, resulting in (c).
The first-best to second-best ratio of the class probabilities defines the uncertainty U of each sample (Figure 3(b)). Given a threshold Tu, uncertain pixels with U > Tu are excluded. This modified label image (Figure 3(c)) serves as the basis for the next iteration.

Sample Selection
Figure 4 shows how samples are selected from the label image. Grey represents a label of a certain class, while black marks invalid samples. These either originate from the initial label image or were removed during the refinement process. The boxes indicate the N × N neighbourhood used for feature extraction; a dot marks the center position. As shown, a sample is only used in case S3, where the entire local neighbourhood does not cover any invalid region.
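The refinement step and the sample selection rule can be sketched together as follows. The paper's wording leaves the orientation of the ratio open, so we assume U = (second-best probability) / (best probability), which lies in [0, 1] and grows with uncertainty, the orientation consistent with excluding samples with U > Tu. The marker INVALID = None, all function names and the nested-list representation are hypothetical.

```python
INVALID = None  # hypothetical marker for pixels that are not used as samples


def uncertainty(probs):
    """Assumed orientation: U = second-best / best probability (0 = certain, 1 = tie)."""
    best, second = sorted(probs, reverse=True)[:2]
    return second / best


def refine_labels(labels, predicted, probs, tu):
    """One refinement step on a label image (2-D lists of equal shape).

    labels: current labels (INVALID where there is no sample)
    predicted: classes obtained by reclassifying the training data
    probs: per-pixel lists of class probabilities (e.g. from pairwise coupling)
    Misclassified and uncertain (U > tu) pixels become INVALID.
    """
    refined = []
    for lab_row, pred_row, prob_row in zip(labels, predicted, probs):
        row = []
        for lab, pred, p in zip(lab_row, pred_row, prob_row):
            if lab is INVALID or pred != lab or uncertainty(p) > tu:
                row.append(INVALID)
            else:
                row.append(lab)
        refined.append(row)
    return refined


def valid_sample_centers(labels, n):
    """Centers whose whole n x n window lies inside the image and holds no INVALID pixel (case S3)."""
    half = n // 2
    rows, cols = len(labels), len(labels[0])
    centers = []
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = (labels[rr][cc]
                      for rr in range(r - half, r + half + 1)
                      for cc in range(c - half, c + half + 1))
            if all(v is not INVALID for v in window):
                centers.append((r, c))
    return centers
```

Treating windows that extend beyond the image border as invalid is a further assumption of this sketch; the paper only discusses invalid label regions.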

Figure 4: Training sample selection. Boxes S1-S3 indicate the N × N neighbourhood used for feature extraction, with its center marked by a dot. Class labels are depicted in grey; invalid area is black. S1 and S2 are not valid because of an invalid center position and an invalid neighbourhood, respectively. Only S3 is a valid sample.

Figure 5: Convergence characteristics showing the number of samples per class (class is colour coded) for 25 steps of iteration. Each of the four graphs belongs to one of the classes c1-c4. A significant detail is the minimum at the beginning of the iteration.

Figure 6: Results of the refinement process for each of 75 iterations. The four characteristics discussed in section 3.4 are plotted: (a) number of samples per class (class is colour coded) of training set T02; (b) number of support vectors per class (class is colour coded) of training set T02, with magnification of the important section; (c) contour length of the trained label image of training set T02; (d) overall classification result of the validation set.

Figure 7: Comparison of sample selection strategies and impact of uncertainty information. S2 and S3 refer to the sample selection cases of Figure 4; +UR indicates removal of uncertain samples from the training set.

Figure 8: Comparison of different values of the cost parameter C when classifying with an SVM using slack variables.

Figure 9: Degeneration of classes. (a) Important section of the number of samples per class (class is colour coded) for each of the first 23 iterations; the graph of the sixth class of this set is omitted for better visualisation, since it converges immediately. (b) Overall classification result for each iteration.
The German GIS ATKIS was rasterised with the same spatial resolution as the image data and mapped to the corresponding labels. In this way, it was used to automatically generate training data. Since ATKIS contains errors and is more strongly generalised than our manually referenced scenes, it is a demanding stress test for our system. Table 1 lists all training data sets. For scene Hildesheim, two sets with six classes were created.

Table 2: Classification results of the validation set for training data sets T01-T11; see Table 1 for reference.