BOOSTED UNSUPERVISED MULTI-SOURCE SELECTION FOR DOMAIN ADAPTATION
- 1Institut für Informationsverarbeitung, Leibniz Universität Hannover, Germany
- 2Institute of Photogrammetry and GeoInformation, Leibniz Universität Hannover, Germany
Keywords: Transfer Learning, Domain Adaptation, Negative Transfer, Source Selection, Machine Learning, Remote Sensing
Abstract. Supervised machine learning needs high quality, densely sampled and labelled training data. Transfer learning (TL) techniques have been devised to reduce this dependency by adapting classifiers trained on different, but related, (source) training data to new (target) data sets. A problem in TL is how to quantify the relatedness of a source quickly and robustly, because transferring knowledge from unrelated data can degrade the performance of a classifier. In this paper, we propose a method that can select a nearly optimal source from a large number of candidate sources. This operation depends only on the marginal probability distributions of the data, thus allowing the use of the often abundant unlabelled data. We extend this method to multi-source selection by optimizing a weighted combination of sources. The source weights are computed using a very fast boosting-like optimization scheme. The run-time complexity of our method scales linearly in regard to the number of candidate sources and the size of the training set and is thus applicable to very large data sets. We also propose a modification of an existing TL algorithm to handle multiple weighted training sets. Our method is evaluated on five survey regions. The experiments show that our source selection method is effective in discriminating between related and unrelated sources, almost always generating results within 3% in overall accuracy of a classifier based on fully labelled training data. We also show that using the selected source as training data for a TL method will additionally result in a performance improvement.