24,313 research outputs found
Heuristic Ternary Error-Correcting Output Codes Via Weight Optimization and Layered Clustering-Based Approach
One important classifier ensemble for multiclass classification problems is
Error-Correcting Output Codes (ECOC). It bridges multiclass problems and
binary classifiers by decomposing a multiclass problem into a series of
binary-class problems. In this paper, we present a heuristic ternary code,
named Weight Optimization and Layered Clustering-based ECOC (WOLC-ECOC). It
starts with an arbitrary valid ECOC and iterates the following two steps until
the training risk converges. The first step, named Layered Clustering-based
ECOC (LC-ECOC), constructs multiple strong classifiers on the most confusing
binary-class problem. The second step adds the new classifiers to the ECOC via
a novel Optimized Weighted (OW) decoding algorithm, in which the optimization
problem of the decoding is solved by the cutting plane algorithm. Technically,
LC-ECOC prevents the heuristic training process from being blocked by a
difficult binary-class problem, while OW decoding guarantees that the training
risk does not increase, which keeps the code length small. Results on 14 UCI
datasets and a music genre classification problem demonstrate the
effectiveness of WOLC-ECOC.
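The ECOC decomposition described in this abstract can be sketched with a toy ternary coding matrix. The matrix, dataset, and base learner below are illustrative choices, not the WOLC-ECOC construction itself:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Ternary coding matrix: rows are classes, columns are binary dichotomies.
# +1/-1 place a class on the positive/negative side; 0 leaves it out.
M = np.array([[+1, +1,  0],
              [-1,  0, +1],
              [ 0, -1, -1]])

X, y = make_classification(n_samples=300, n_classes=3, n_informative=6,
                           n_clusters_per_class=1, random_state=0)

# One binary classifier per column, trained only on the classes it involves.
dichotomizers = []
for col in range(M.shape[1]):
    mask = M[y, col] != 0
    clf = LogisticRegression(max_iter=1000).fit(X[mask], M[y[mask], col])
    dichotomizers.append(clf)

# Decoding: choose the class whose code word best matches the predicted bits.
preds = np.column_stack([c.predict(X) for c in dichotomizers])
decoded = np.argmax(preds @ M.T, axis=1)
print("training accuracy:", (decoded == y).mean())
```

The inner-product decoding above is the simplest choice; OW decoding in the paper instead learns the column weights to guarantee the training risk never increases.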
A new Bayesian ensemble of trees classifier for identifying multi-class labels in satellite images
Classification of satellite images is a key component of many remote sensing
applications. One of the most important products of a raw satellite image is
the classified map which labels the image pixels into meaningful classes.
Though several parametric and non-parametric classifiers have been developed
thus far, accurate labeling of the pixels remains a challenge. In this
paper, we propose a new reliable multiclass-classifier for identifying class
labels of a satellite image in remote sensing applications. The proposed
multiclass-classifier is a generalization of a binary classifier based on the
flexible ensemble of regression trees model called Bayesian Additive Regression
Trees (BART). We used three small areas from the LANDSAT 5 TM image, acquired
on August 15, 2009 (path/row: 08/29, L1T product, UTM map projection) over
Kings County, Nova Scotia, Canada to classify the land-use. Several prediction
accuracy and uncertainty measures have been used to compare the reliability of
the proposed classifier with the state-of-the-art classifiers in remote
sensing. Comment: 31 pages, 6 figures, 4 tables
META-DES.Oracle: Meta-learning and feature selection for ensemble selection
The key issue in Dynamic Ensemble Selection (DES) is defining a suitable
criterion for calculating the classifiers' competence. There are several
criteria available to measure the level of competence of base classifiers, such
as local accuracy estimates and ranking. However, using only one criterion may
lead to a poor estimation of the classifier's competence. In order to deal with
this issue, we have proposed a novel dynamic ensemble selection framework using
meta-learning, called META-DES. An important aspect of the META-DES framework
is that multiple criteria can be embedded in the system encoded as different
sets of meta-features. However, some DES criteria are not suitable for every
classification problem. For instance, local accuracy estimates may produce poor
results when there is a high degree of overlap between the classes. Moreover, a
higher classification accuracy can be obtained if the performance of the
meta-classifier is optimized for the corresponding data. In this paper, we
propose a novel version of the META-DES framework based on the formal
definition of the Oracle, called META-DES.Oracle. The Oracle is an abstract
method that represents an ideal classifier selection scheme. A meta-feature
selection scheme using an overfitting cautious Binary Particle Swarm
Optimization (BPSO) is proposed for improving the performance of the
meta-classifier. The difference between the outputs obtained by the
meta-classifier and those presented by the Oracle is minimized. Thus, the
meta-classifier is expected to obtain results that are similar to the Oracle.
Experiments carried out using 30 classification problems demonstrate that the
optimization procedure based on the Oracle definition leads to a significant
improvement in classification accuracy when compared to previous versions of
the META-DES framework and other state-of-the-art DES techniques. Comment: Paper published in Information Fusion
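For context, the single-criterion baseline that META-DES generalizes, dynamic selection by local accuracy in the query's validation neighborhood, can be sketched as follows. The pool, data, and neighborhood size are illustrative; this is not the META-DES meta-classifier itself:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Pool of weak base classifiers built by bagging shallow trees.
rng = np.random.default_rng(0)
pool = []
for _ in range(15):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    pool.append(DecisionTreeClassifier(max_depth=3, random_state=0)
                .fit(X_tr[idx], y_tr[idx]))

# Competence criterion: accuracy on the query's k nearest validation points.
nn = NearestNeighbors(n_neighbors=7).fit(X_val)

def des_predict(x_query):
    _, ind = nn.kneighbors(x_query.reshape(1, -1))
    neigh = ind[0]
    competence = [(c.predict(X_val[neigh]) == y_val[neigh]).mean() for c in pool]
    return pool[int(np.argmax(competence))].predict(x_query.reshape(1, -1))[0]
```

As the abstract notes, this single criterion degrades under class overlap, which is the motivation for encoding several criteria as meta-features.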
Object Recognition Based on Amounts of Unlabeled Data
This paper proposes a novel semi-supervised method for object recognition.
First, based on Boost Picking, a universal algorithm named Boost Picking
Teaching (BPT) is proposed to train an effective binary classifier using only
a few labeled data and large amounts of unlabeled data. Then, an ensemble
strategy is detailed that synthesizes multiple BPT-trained binary classifiers
into a high-performance multi-classifier. The rationality of the strategy is
also analyzed in theory. Finally, the proposed method is tested on two
databases, CIFAR-10 and CIFAR-100. Using 2% labeled data and 98% unlabeled
data, the accuracies of the proposed method on the two datasets are 78.39% and
50.77%, respectively. Comment: 16 pages, 6 figures, 2 tables
A New Approach in Persian Handwritten Letters Recognition Using Error Correcting Output Coding
Classification ensemble, which uses weighted voting of outputs, is the art of
combining a set of basic classifiers to generate high-performance, robust and
more stable results. This study aims to improve the results of identifying
Persian handwritten letters using the Error-Correcting Output Coding (ECOC)
ensemble method. Furthermore, feature selection is used to reduce the cost of
errors in our proposed method. ECOC is a method for decomposing a multi-way
classification problem into many binary classification tasks, and then
combining the results of the subtasks into a hypothesized solution to the
original problem. First, the image features are extracted by Principal
Component Analysis (PCA). After that, ECOC, with a Support Vector Machine
(SVM) as the base classifier, is used to identify the Persian handwritten
letters. The empirical results of applying this ensemble method to 10
real-world datasets of Persian handwritten letters indicate that it identifies
the Persian handwritten letters better than other ensemble methods and single
classifiers. Moreover, by testing a number of different features, this paper
found that the additional cost of the feature selection stage can be reduced
by using this method. Comment: Journal of Advances in Computer Research
Estimating the Accuracies of Multiple Classifiers Without Labeled Data
In various situations one is given only the predictions of multiple
classifiers over a large unlabeled test dataset. This scenario raises the
following questions: Without any labeled data and without any a-priori
knowledge about the reliability of these different classifiers, is it possible
to consistently and computationally efficiently estimate their accuracies?
Furthermore, also in a completely unsupervised manner, can one construct a more
accurate unsupervised ensemble classifier? In this paper, focusing on the
binary case, we present simple, computationally efficient algorithms to answer
these questions. Furthermore, under standard classifier independence
assumptions, we prove our methods are consistent and study their asymptotic
error. Our approach is spectral, based on the fact that the off-diagonal
entries of the classifiers' covariance matrix and 3-d tensor are rank-one. We
illustrate the competitive performance of our algorithms via extensive
experiments on both artificial and real datasets.
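The rank-one covariance fact this abstract relies on can be illustrated with simulated predictions. Balanced classes and the simple alternating rank-one fit below are simplifying assumptions, not the paper's exact estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate balanced hidden labels y in {-1,+1} and m=5 independent classifiers,
# classifier i being correct with (balanced) accuracy p[i].
n = 20000
y = rng.choice([-1, 1], size=n)
p = np.array([0.9, 0.8, 0.7, 0.65, 0.6])
correct = rng.random((p.size, n)) < p[:, None]
F = np.where(correct, y, -y)                  # m x n prediction matrix

# Under conditional independence the off-diagonal covariance entries are
# rank one: Q_ij = v_i * v_j with v_i = 2*p_i - 1. Recover v by alternating
# a rank-one eigen-fit with imputation of the (unknown) diagonal.
Q = np.cov(F)
v = np.full(p.size, 0.5)
for _ in range(50):
    R = Q.copy()
    np.fill_diagonal(R, v ** 2)               # impute diagonal from current fit
    lam, U = np.linalg.eigh(R)
    v = np.sqrt(max(lam[-1], 0.0)) * U[:, -1]
v *= np.sign(v.sum())                         # fix the global sign ambiguity
acc_est = (1 + v) / 2                         # estimated accuracies, no labels used
```

The recovered `acc_est` closely tracks the true `p` even though `y` is never shown to the estimator, which is the phenomenon the paper makes rigorous (including the 3-d tensor variant).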
Enhancing Multi-Class Classification of Random Forest using Random Vector Functional Neural Network and Oblique Decision Surfaces
Both neural networks and decision trees are popular machine learning methods
and are widely used to solve problems from diverse domains. These two
classifiers are commonly used base classifiers in an ensemble framework. In
this paper, we first present a new variant of oblique decision tree based on a
linear classifier, and then construct an ensemble classifier based on the
fusion of a fast neural network, the Random Vector Functional Link (RVFL)
network, and oblique decision trees. The RVFL network has an elegant
closed-form solution with an extremely short training time. The neural network partitions
each training bag (obtained using bagging) at the root level into C subsets
where C is the number of classes in the dataset and subsequently, C oblique
decision trees are trained on such partitions. The proposed method provides a
rich insight into the data by grouping the confusing or hard to classify
samples for each class and thus provides an opportunity to employ a fine-grained
classification rule over the data. The performance of the ensemble classifier
is evaluated on several multi-class datasets where it demonstrates a superior
performance compared to other state-of-the-art classifiers. Comment: 8 pages, 5 figures
Less Is More: A Comprehensive Framework for the Number of Components of Ensemble Classifiers
The number of component classifiers chosen for an ensemble greatly impacts
the prediction ability. In this paper, we use a geometric framework for a
priori determining the ensemble size, which is applicable to most existing
batch and online ensemble classifiers. There are only a limited number of
studies on the ensemble size examining Majority Voting (MV) and Weighted
Majority Voting (WMV). Almost all of them are designed for batch-mode, hardly
addressing online environments. Big data dimensions and resource limitations,
in terms of time and memory, make determination of ensemble size crucial,
especially for online environments. For the MV aggregation rule, our framework
proves that the more strong components we add to the ensemble, the more
accurate predictions we can achieve. For the WMV aggregation rule, our
framework proves the existence of an ideal number of components, which is equal
to the number of class labels, with the premise that components are completely
independent of each other and strong enough. While giving the exact definition
for a strong and independent classifier in the context of an ensemble is a
challenging task, our proposed geometric framework provides a theoretical
explanation of diversity and its impact on the accuracy of predictions. We
conduct a series of experimental evaluations to show the practical value of our
theorems and existing challenges. Comment: This is an extended version of the work presented as a short paper at
the Conference on Information and Knowledge Management (CIKM), 201
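The MV part of the claim, that adding more strong, independent components yields more accurate majority votes, is easy to check numerically. The per-component accuracy of 0.6 and full independence are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10000, 0.6          # test points and accuracy of each strong component

def mv_accuracy(m):
    """Majority-vote accuracy of m independent components, each correct w.p. p."""
    votes = rng.random((m, n)) < p          # True means a correct vote
    return (votes.sum(axis=0) > m / 2).mean()

for m in (1, 5, 25, 125):                   # odd sizes avoid ties
    print(m, round(mv_accuracy(m), 3))
```

Accuracy climbs steadily toward 1 as components are added, matching the MV result; the WMV result says that with weighting and fully independent strong components, the ideal ensemble size drops to the number of class labels.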
Novelty Detection in MultiClass Scenarios with Incomplete Set of Class Labels
We address the problem of novelty detection in multiclass scenarios where
some class labels are missing from the training set. Our method is based on the
initial assignment of confidence values, which measure the affinity between a
new test point and each known class. We first compare the values of the two top
elements in this vector of confidence values. In the heart of our method lies
the training of an ensemble of classifiers, each trained to discriminate known
from novel classes based on some partition of the training data into
presumed-known and presumed-novel classes. Our final novelty score is derived
from the output of this ensemble of classifiers. We evaluated our method on two
datasets of images containing a relatively large number of classes - the
Caltech-256 and CIFAR-100 datasets. We compared our method to four
alternative methods that represent commonly used approaches: the one-class
SVM, novelty based on k-NN, novelty based on maximal confidence, and the recent
KNFST method. The results show a very clear and marked advantage for our method
over all alternative methods, in an experimental setup where class labels are
missing during training. Comment: 10 pages
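The ensemble-of-partitions idea at the heart of the method can be sketched as follows. The data, partition count, and base learner are illustrative stand-ins for the paper's setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy multiclass training data with labels 0..5 standing in for known classes.
X, y = make_classification(n_samples=600, n_classes=6, n_informative=8,
                           random_state=0)

rng = np.random.default_rng(0)
ensemble = []
for _ in range(10):
    # Random split of the known classes into presumed-known / presumed-novel.
    presumed_novel = rng.choice(6, size=2, replace=False)
    z = np.isin(y, presumed_novel).astype(int)   # 1 = presumed-novel
    ensemble.append(RandomForestClassifier(n_estimators=50, random_state=0)
                    .fit(X, z))

def novelty_score(X_query):
    # Final score: average "novel" probability across the ensemble members.
    return np.mean([c.predict_proba(X_query)[:, 1] for c in ensemble], axis=0)

print(novelty_score(X[:3]))
```

Each member learns what "novel relative to some known classes" looks like, so at test time a class never seen in training still tends to receive a high averaged score.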
Quantum ensembles of quantum classifiers
Quantum machine learning witnesses an increasing amount of quantum algorithms
for data-driven decision making, a problem with potential applications ranging
from automated image recognition to medical diagnosis. Many of those algorithms
are implementations of quantum classifiers, or models for the classification of
data inputs with a quantum computer. Following the success of collective
decision making with ensembles in classical machine learning, this paper
introduces the concept of quantum ensembles of quantum classifiers. Creating
the ensemble corresponds to a state preparation routine, after which the
quantum classifiers are evaluated in parallel and their combined decision is
accessed by a single-qubit measurement. This framework naturally allows for
exponentially large ensembles in which -- similar to Bayesian learning -- the
individual classifiers do not have to be trained. As an example, we analyse an
exponentially large quantum ensemble in which each classifier is weighted
according to its performance in classifying the training data, leading to new
results for quantum as well as classical machine learning. Comment: 19 pages, 9 figures