4 research outputs found

    Semi-supervised learning using multiple clusterings with limited labeled data

    Supervised classification consists of learning a predictive model from a set of labeled samples. It is generally accepted that the accuracy of predictive models increases as more labeled samples become available. Labeled samples are usually difficult to obtain because the labeling step is often performed manually. Unlabeled samples, in contrast, are easily available. As the labeling task is tedious and time consuming, users generally provide a very limited number of labeled objects, and designing approaches able to work efficiently with very few labeled samples is highly challenging. In this context, semi-supervised approaches have been proposed to leverage both labeled and unlabeled data. In this paper, we focus on cases where the number of labeled samples is very limited. We review and formalize eight semi-supervised learning algorithms and introduce a new method that combines supervised and unsupervised learning in order to use both labeled and unlabeled data. The main idea of this method is to produce new features derived from a first step of data clustering. These features are then used to enrich the description of the input data, leading to a better use of the data distribution. The efficiency of all the methods is compared on various artificial datasets, UCI datasets, and on the classification of a very high resolution remote sensing image. The experiments reveal that our method performs well, especially when the number of labeled samples is very limited. They also confirm that combining labeled and unlabeled data is very useful in pattern recognition.
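    The abstract does not give implementation details, but the core idea (derive cluster-membership features from all available data, then train a standard classifier on the enriched labeled samples) can be sketched roughly as follows. This is a minimal illustration assuming k-means clusterings with several cluster counts and one-hot membership features; the function name, parameters, and choice of classifier are illustrative, not the authors' actual method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def cluster_enriched_features(X_labeled, X_unlabeled, n_clusters_list=(5, 10, 20)):
    """Append one-hot cluster-membership features, computed on all available
    samples (labeled + unlabeled), to the original feature vectors."""
    X_all = np.vstack([X_labeled, X_unlabeled])
    extra = []
    for k in n_clusters_list:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_all)
        extra.append(np.eye(k)[km.labels_])   # one-hot encoding of cluster ids
    X_all_enriched = np.hstack([X_all] + extra)
    n_lab = X_labeled.shape[0]
    return X_all_enriched[:n_lab], X_all_enriched[n_lab:]

# Hypothetical usage: cluster-derived features are built from all samples,
# but the classifier itself is trained on the (few) labeled samples only.
# X_lab_enr, X_unlab_enr = cluster_enriched_features(X_lab, X_unlab)
# clf = RandomForestClassifier(random_state=0).fit(X_lab_enr, y_lab)
# y_pred = clf.predict(X_unlab_enr)
```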

    Heterogeneous information fusion: combination of multiple supervised and unsupervised classification methods based on belief functions

    In real-life machine learning applications, a common problem is that raw data (e.g. remote sensing data) is sometimes inaccessible due to the confidentiality and privacy constraints of corporations, making supervised classification methods difficult to apply. Moreover, even when raw data is accessible, the limited number of labeled samples can seriously hinder supervised methods. Recently, supervised and unsupervised classification (clustering) results related to specific applications have been published by more and more organizations. Therefore, the combination of supervised classification and clustering results has gained increasing attention as a way to improve the accuracy of supervised predictions. Incorporating clustering results with supervised classifications at the output level helps lessen the reliance on information at the raw data level, which is pertinent for improving accuracy when raw data is inaccessible or training samples are limited. We focus on the combination of multiple supervised classification and clustering results at the output level based on belief functions, for three purposes: (1) to improve the accuracy of classification when raw data is inaccessible or training samples are highly limited; (2) to reduce uncertain and imprecise information in the supervised results; and (3) to study how supervised classification and clustering results affect the combination at the output level. Our contributions consist of a transformation method to transfer heterogeneous information into the same frame, and an iterative fusion strategy to retain most of the trustworthy information in multiple supervised classification and clustering results.
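    As a rough illustration of output-level fusion with belief functions, the sketch below restricts itself to Bayesian mass functions (mass assigned to singleton classes only), maps each cluster onto the class frame via the class distribution of its labeled members with a simple discounting factor, and combines sources pairwise with Dempster's rule. The transformation and iterative fusion strategy proposed in the paper are more elaborate; all names and parameters here are assumptions made for illustration.

```python
import numpy as np

def dempster_combine(m1, m2):
    """Dempster's rule for Bayesian mass functions on the same class frame:
    element-wise product, then renormalisation of the non-conflicting mass."""
    fused = m1 * m2
    k = fused.sum()                      # 1 - k is the degree of conflict
    return m1 if k == 0 else fused / k   # total conflict: keep the first source

def clustering_to_masses(cluster_labels, cluster_class_dist, n_classes, alpha=0.8):
    """Transfer a clustering result onto the class frame: each cluster inherits
    the class distribution of its labeled members (cluster_class_dist maps
    cluster id -> distribution), discounted by alpha toward a uniform,
    ignorance-like distribution."""
    uniform = np.full(n_classes, 1.0 / n_classes)
    masses = np.tile(uniform, (len(cluster_labels), 1))
    for c, dist in cluster_class_dist.items():
        masses[cluster_labels == c] = alpha * np.asarray(dist) + (1 - alpha) * uniform
    return masses

def fuse_sources(sources):
    """Iteratively fuse per-sample mass functions coming from several supervised
    classifiers and clusterings, then decide by maximum fused mass."""
    fused = sources[0]
    for m in sources[1:]:
        fused = np.array([dempster_combine(a, b) for a, b in zip(fused, m)])
    return fused.argmax(axis=1)
```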