On Reject and Refine Options in Multicategory Classification
In many real applications of statistical learning, a misclassification can be
too costly to afford; in this case, a reject option,
which defers the decision until further investigation is conducted, is often
preferred. In recent years, there has been much development for binary
classification with a reject option. Yet, little progress has been made for the
multicategory case. In this article, we propose margin-based multicategory
classification methods with a reject option. In addition, and more importantly,
we introduce a new and unique refine option for the multicategory problem,
where the class of an observation is predicted to be from a set of class
labels, whose cardinality is not necessarily one. The main advantage of both
options lies in their capacity to identify error-prone observations.
Moreover, the refine option can provide more constructive information for
classification by effectively ruling out implausible classes. Efficient
implementations have been developed for the proposed methods. On the
theoretical side, we offer a novel statistical learning theory and show a fast
convergence rate of the excess $\ell$-risk of our methods with emphasis on
diverging dimensionality and number of classes. The results can be further
improved under a low noise assumption. A set of comprehensive simulation and
real data studies has shown the usefulness of the new learning tools compared
to regular multicategory classifiers. Detailed proofs of theorems and extended
numerical results are included in the supplemental materials available online.
Comment: A revised version of this paper was accepted for publication in the
Journal of the American Statistical Association Theory and Methods Section.
52 pages, 6 figures
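The reject and refine options described in this abstract can be illustrated with a minimal sketch. It assumes per-class probability estimates are already available and applies simple thresholds to them; this is a stand-in for illustration, not the margin-based method the paper proposes. The function names and the `threshold` and `coverage` parameters are hypothetical.

```python
def predict_with_reject(probs, threshold=0.5):
    """Return the index of the top-scoring class, or None to reject.

    Assumption: `probs` holds estimated class probabilities; the decision
    is deferred when the best class is not confident enough.
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None


def predict_with_refine(probs, coverage=0.9):
    """Return a set of plausible classes instead of a single label.

    Classes are added in decreasing order of probability until their
    total reaches `coverage`, so implausible classes are ruled out.
    """
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    chosen, total = [], 0.0
    for k in order:
        chosen.append(k)
        total += probs[k]
        if total >= coverage:
            break
    return sorted(chosen)
```

With a confident estimate the refine rule returns a single label; with an ambiguous one it returns several candidates, which matches the abstract's point that the predicted set need not have cardinality one.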
Extending twin support vector machine classifier for multi-category classification problems
© 2013 – IOS Press and the authors. All rights reserved.
The twin support vector machine classifier (TWSVM) was proposed by Jayadeva et al. for binary classification problems. TWSVM not only overcomes the difficulty of handling unbalanced exemplars in binary classification problems, but is also four times faster in training a classifier than classical support vector machines. This paper proposes one-versus-all twin support vector machine classifiers (OVA-TWSVM) for multi-category classification problems by utilizing the strengths of TWSVM. OVA-TWSVM extends TWSVM to solve k-category classification problems by developing k TWSVMs, where in the ith TWSVM we solve only the quadratic programming problems (QPPs) for the ith class and obtain the ith nonparallel hyperplane corresponding to the ith class data. OVA-TWSVM uses the well-known one-versus-all (OVA) approach to construct a corresponding twin support vector machine classifier. We analyze the efficiency of OVA-TWSVM theoretically, and perform experiments to test its efficiency on both synthetic data sets and several benchmark data sets from the UCI machine learning repository. Both the theoretical analysis and the experimental results demonstrate that OVA-TWSVM can outperform the traditional OVA-SVMs classifier. Further experimental comparisons with other multiclass classifiers demonstrated that comparable performance could be achieved.
This work is supported in part by the Fundamental Research Funds for the Central Universities (grant GK201102007) in PR China, by the Natural Science Basis Research Plan in Shaanxi Province of China (Program No. 2010JM3004), by the Chinese Academy of Sciences under the Innovative Group Overseas Partnership Grant, and by the Natural Science Foundation of China Major International Joint Research Project (No. 71110107026).
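The one-versus-all construction in this abstract yields one hyperplane per class. A minimal sketch of how such hyperplanes might be used at prediction time follows. It assumes the k hyperplanes (w_i, b_i) have already been obtained by solving the QPPs, which is not shown, and it assumes the usual twin-SVM convention of assigning a point to the class whose hyperplane is nearest. The function name and the input layout are hypothetical.

```python
import math


def ova_nearest_hyperplane(x, hyperplanes):
    """Return the index of the class whose hyperplane lies closest to x.

    `hyperplanes` is a list of (w, b) pairs, one per class, where w is a
    list of weights and b a bias. Training (solving the QPPs) is assumed
    to have happened elsewhere.
    """
    def distance(w, b):
        # Perpendicular distance from x to the hyperplane w.x + b = 0.
        norm = math.sqrt(sum(wi * wi for wi in w))
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

    return min(range(len(hyperplanes)), key=lambda i: distance(*hyperplanes[i]))


# Three illustrative hyperplanes in the plane (made-up values).
planes = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], -4.0)]
label = ova_nearest_hyperplane([2.0, 2.0], planes)
```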
Elephant Search with Deep Learning for Microarray Data Analysis
Despite a plethora of research on microarray gene expression data analysis,
analyzing large and complex gene expression data effectively and efficiently
remains a challenge for researchers. The feature
(gene) selection method is of paramount importance for understanding the
differences in biological and non-biological variation between samples. In
order to address this problem, a novel elephant search (ES) based optimization
is proposed to select the best gene expressions from the large volume of
microarray data. Further, a machine learning method is envisioned to leverage
such high-dimensional and complex microarray datasets, extracting hidden
patterns to make meaningful predictions and accurate
classifications. In particular, stochastic gradient descent based deep learning
(DL) with softmax activation function is then used on the reduced features
(genes) for better classification of different samples according to their gene
expression levels. The experiments are carried out on nine of the most popular
cancer microarray gene selection datasets, obtained from the UCI machine learning
repository. The empirical results obtained by the proposed elephant search
based deep learning (ESDL) approach are compared with the most recently published
article to assess its suitability for future bioinformatics research.
Comment: 12 pages, 5 Tables
Multiclass Learning with Simplex Coding
In this paper we discuss a novel framework for multiclass learning, defined
by a suitable coding/decoding strategy, namely the simplex coding, that allows
a relaxation approach commonly used in binary classification to be generalized
to multiple classes. In this framework, a relaxation error analysis can be developed
avoiding constraints on the considered hypotheses class. Moreover, we show that
in this setting it is possible to derive the first provably consistent
regularized method with training/tuning complexity that is independent of the
number of classes. Tools from convex analysis are introduced that can be used
beyond the scope of this paper.
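The coding/decoding strategy named in this abstract can be sketched as follows, under stated assumptions: each class is mapped to a unit-norm code vector, any two code vectors have inner product -1/(T-1) for T classes, and decoding picks the class whose code vector has the largest inner product with the predicted score. For simplicity the code vectors here are embedded in T coordinates (centered basis vectors) rather than in T-1 coordinates. The function names are hypothetical.

```python
import math


def simplex_code(num_classes):
    """Return one unit-norm code vector per class.

    Each vector is a centered, rescaled standard basis vector, so any two
    distinct vectors have inner product -1 / (num_classes - 1).
    """
    T = num_classes
    scale = math.sqrt(T / (T - 1))
    return [
        [scale * ((1.0 if i == t else 0.0) - 1.0 / T) for i in range(T)]
        for t in range(T)
    ]


def decode(score, codes):
    """Return the class whose code vector best aligns with `score`."""
    def inner(u, v):
        return sum(a * b for a, b in zip(u, v))

    return max(range(len(codes)), key=lambda t: inner(score, codes[t]))


codes = simplex_code(3)
predicted = decode(codes[2], codes)  # a score equal to class 2's own code
```

For T = 2 the two code vectors are opposite unit vectors, which corresponds to the familiar ±1 labels of binary classification.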
- …