Information theoretic combination of pattern classifiers
Combining several classifiers has proved to be an effective machine learning technique. Two factors clearly influence the performance of an ensemble of classifiers: the diversity between classifiers and their individual accuracies. In this paper we propose an information theoretic framework that establishes a link between these quantities. As they turn out to be contradictory, we propose an information theoretic score (ITS) that expresses a trade-off between individual accuracy and diversity. This score can be used directly, for example, to select an optimal ensemble from a pool of classifiers. We perform experiments in the context of overproduction and selection of classifiers, showing that selection based on the ITS outperforms state-of-the-art diversity-based selection techniques.
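The selection idea in this abstract can be sketched in a few lines: score each candidate subset of a classifier pool by a weighted combination of mean accuracy and mean pairwise diversity, and keep the best subset. The disagreement measure and the weight `lam` below are illustrative stand-ins, not the paper's actual ITS:

```python
import itertools
from statistics import mean

def accuracy(preds, y):
    """Fraction of samples a classifier labels correctly."""
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def disagreement(p1, p2):
    """Pairwise diversity stand-in: fraction of samples where two classifiers differ."""
    return sum(a != b for a, b in zip(p1, p2)) / len(p1)

def select_ensemble(pool, y, k, lam=0.5):
    """Pick the size-k subset of `pool` (lists of predicted labels) that
    maximizes mean accuracy + lam * mean pairwise disagreement."""
    best, best_score = None, -1.0
    for subset in itertools.combinations(range(len(pool)), k):
        acc = mean(accuracy(pool[i], y) for i in subset)
        div = mean(disagreement(pool[i], pool[j])
                   for i, j in itertools.combinations(subset, 2))
        score = acc + lam * div
        if score > best_score:
            best, best_score = subset, score
    return best, best_score
```

With `lam = 0` this degenerates to picking the k most accurate classifiers; increasing `lam` trades individual accuracy for complementarity, which is the tension the ITS formalizes.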
Information theoretic combination of classifiers with application to face detection
Combining several classifiers has become a very active subdiscipline of pattern recognition. For years, the pattern recognition community focused on seeking optimal learning algorithms able to produce very accurate classifiers. However, empirical experience has shown that it is often much easier to find several reasonably good classifiers than a single very accurate predictor. The advantages of combining classifiers over single-classifier schemes are twofold: it helps reduce computational requirements by using simpler models, and it can improve classification performance. It is commonly admitted that classifiers need to be complementary in order to improve their performance by aggregation. This complementarity is usually termed diversity in the classifier combination community. Although diversity is a very intuitive concept, explicitly using diversity measures to create classifier ensembles has not been as successful as expected. In this thesis, we propose an information theoretic framework for combining classifiers. In particular, we prove by means of information theoretic tools that diversity between classifiers is not sufficient to guarantee optimal classifier combination. In fact, we show that diversity and the accuracies of the individual classifiers are generally contradictory: two very accurate classifiers cannot be diverse and, conversely, two very diverse classifiers will necessarily have poor classification performance. To tackle this contradiction, we propose an information theoretic score (ITS) that expresses a trade-off between these two quantities. A first possible application is to use this new score as a selection criterion for extracting a good ensemble from a predefined pool of classifiers. We also propose an ensemble creation technique based on AdaBoost that takes the information theoretic score into account for iteratively selecting the classifiers.
As an illustration of an efficient classifier combination technique, we propose several algorithms for building ensembles of Support Vector Machines (SVMs). Support Vector Machines are one of the most popular discriminative approaches in pattern recognition and are often considered state-of-the-art in binary classification. However, these classifiers have one severe drawback when facing a very large number of training examples: they become computationally expensive to train. This problem can be addressed by decomposing the learning into several classification tasks with lower computational requirements. We propose to train several SVMs in parallel on subsets of the complete training set, and develop several algorithms for designing efficient ensembles of SVMs that take our information theoretic score into account. The second part of this thesis concentrates on human face detection, a very challenging binary pattern recognition task. Here we focus on two main aspects: feature extraction and how to apply classifier combination techniques to face detection systems. We introduce new geometrical filters, called anisotropic Gaussian filters, that are very effective at modeling face appearance. Finally, we propose a parallel mixture of boosted classifiers that reduces the false positive rate and decreases the training time while keeping the testing time unchanged. The complete face detection system is evaluated on several datasets, showing that it compares favorably to state-of-the-art techniques.
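The decomposition scheme described above (train base learners in parallel on disjoint subsets, then vote) can be sketched minimally as follows. A toy 1-D nearest-centroid rule stands in for an SVM, and `train_centroid`, `train_parallel_ensemble`, and `majority_vote` are hypothetical names for illustration only:

```python
from collections import Counter

def train_centroid(xs, ys):
    """Toy stand-in for an SVM trained on one subset: a 1-D nearest-centroid rule."""
    c0 = sum(x for x, y in zip(xs, ys) if y == 0) / max(1, sum(1 for y in ys if y == 0))
    c1 = sum(x for x, y in zip(xs, ys) if y == 1) / max(1, sum(1 for y in ys if y == 1))
    return lambda x: 0 if abs(x - c0) <= abs(x - c1) else 1

def train_parallel_ensemble(xs, ys, n_parts):
    """Split the training set into n_parts disjoint subsets and train one
    base learner per subset; each training job could run in parallel."""
    members = []
    for i in range(n_parts):
        members.append(train_centroid(xs[i::n_parts], ys[i::n_parts]))
    return members

def majority_vote(members, x):
    """Aggregate the ensemble by majority voting on the predicted label."""
    return Counter(m(x) for m in members).most_common(1)[0][0]
```

Each base learner sees only 1/n_parts of the data, which is the source of the training-time savings the abstract claims for SVM ensembles.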
Land cover classification using fuzzy rules and aggregation of contextual information through evidence theory
Land cover classification using multispectral satellite image is a very
challenging task with numerous practical applications. We propose a multi-stage
classifier that involves fuzzy rule extraction from the training data and then
generation of a possibilistic label vector for each pixel using the fuzzy rule
base. To exploit the spatial correlation of land cover types we propose four
different information aggregation methods which use the possibilistic class
label of a pixel and those of its eight spatial neighbors for making the final
classification decision. Three of the aggregation methods use Dempster-Shafer
theory of evidence while the remaining one is modeled after the fuzzy k-NN
rule. The proposed methods are tested with two benchmark seven channel
satellite images and the results are found to be quite satisfactory. They are
also compared with a Markov random field (MRF) model-based contextual
classification method and found to perform consistently better. Comment: 14 pages, 2 figures
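Three of the aggregation methods above rely on Dempster-Shafer theory, whose core operation is Dempster's rule of combination: two mass functions over subsets of a frame are fused by multiplying masses of intersecting focal elements and renormalizing away the conflict. A minimal sketch (the mass functions in the usage test are invented, not the paper's data):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions over the same
    frame, each given as a dict mapping frozenset (focal element) -> mass."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # intersecting evidence reinforces the common hypothesis
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # disjoint focal elements contribute to the conflict mass
            conflict += wa * wb
    norm = 1.0 - conflict  # renormalize over non-conflicting mass
    return {s: w / norm for s, w in combined.items()}
```

In the paper's setting, the possibilistic label vectors of a pixel and its eight neighbors would play the role of the mass functions being fused.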
One-class classifiers based on entropic spanning graphs
One-class classifiers offer valuable tools to assess the presence of outliers
in data. In this paper, we propose a design methodology for one-class
classifiers based on entropic spanning graphs. Our approach takes into account
the possibility to process also non-numeric data by means of an embedding
procedure. The spanning graph is learned on the embedded input data and the
outcoming partition of vertices defines the classifier. The final partition is
derived by exploiting a criterion based on mutual information minimization.
Here, we compute the mutual information by using a convenient formulation
provided in terms of the α-Jensen difference. Once training is
completed, in order to associate a confidence level with the classifier
decision, a graph-based fuzzy model is constructed. The fuzzification process
is based only on topological information of the vertices of the entropic
spanning graph. As such, the proposed one-class classifier is suitable also for
data characterized by complex geometric structures. We provide experiments on
well-known benchmarks containing both feature vectors and labeled graphs. In
addition, we apply the method to the protein solubility recognition problem by
considering several representations for the input samples. Experimental results
demonstrate the effectiveness and versatility of the proposed method with
respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada
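The flavor of a spanning-graph one-class classifier can be illustrated with a much simpler relative: build the Euclidean minimum spanning tree over the samples and flag points attached by an unusually long edge. This is a hedged sketch of the general idea only, not the paper's entropic-graph and mutual-information procedure, and the `factor` threshold is an invented parameter:

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns, for each
    vertex added to the tree, the length of the edge that attached it."""
    n = len(points)
    best = {j: math.dist(points[0], points[j]) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=best.get)          # cheapest vertex to attach next
        edges.append((j, best.pop(j)))
        for k in best:                        # relax distances via new vertex
            d = math.dist(points[j], points[k])
            if d < best[k]:
                best[k] = d
    return edges

def one_class_flags(points, factor=3.0):
    """Flag points whose attaching MST edge is much longer than the mean edge."""
    edges = mst_edges(points)
    mean_len = sum(length for _, length in edges) / len(edges)
    return {j for j, length in edges if length > factor * mean_len}
```

Long MST edges correspond to points far from the bulk of the data, which is the same geometric intuition the entropic spanning graph exploits.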
Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles
We examine a network of learners that address the same classification task
but must learn from different data sets. The learners cannot share data but
instead share their models. Models are shared only once so as to limit the
network load. We introduce DELCO (standing for Decentralized Ensemble
Learning with COpulas), a new approach for aggregating the predictions of
the classifiers trained by each learner. The proposed method aggregates the
base classifiers using a probabilistic model relying on Gaussian copulas.
Experiments on ensembles of logistic regression classifiers demonstrate
competitive accuracy and increased robustness in the case of dependent
classifiers. A companion Python implementation can be downloaded at
https://github.com/john-klein/DELC
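A Gaussian copula separates the dependence structure of classifier outputs from their marginals: uniforms are mapped through the standard normal quantile function and coupled via a correlation matrix. A minimal bivariate sketch of the copula density itself (this illustrates the building block, not DELCO's full aggregation model):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Invert the standard normal CDF by bisection (simple, stdlib-only)."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gaussian_copula_density(u, rho):
    """Bivariate Gaussian copula density at u = (u1, u2) with correlation rho.
    c(u) = |R|^{-1/2} exp(-z^T (R^{-1} - I) z / 2), z_i = Phi^{-1}(u_i)."""
    z1, z2 = norm_ppf(u[0]), norm_ppf(u[1])
    det = 1.0 - rho * rho
    quad = (rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / det
    return math.exp(-0.5 * quad) / math.sqrt(det)
```

With `rho = 0` the density is identically 1 (independent classifiers); a nonzero `rho` reweights joint outputs, which is how a copula-based aggregator can account for dependent base classifiers.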