Optimizing 0/1 Loss for Perceptrons by Random Coordinate Descent
The 0/1 loss is an important cost function for perceptrons. Nevertheless, it cannot be easily minimized by most existing perceptron learning algorithms. In this paper, we propose a family of random coordinate descent algorithms that directly minimize the 0/1 loss for perceptrons, and we prove their convergence. Our algorithms are computationally efficient and usually achieve the lowest 0/1 loss among the algorithms compared. Such advantages make them favorable for nonseparable real-world problems. Experiments show that our algorithms are especially useful for ensemble learning, and when coupled with AdaBoost they achieve the lowest test error on many complex data sets.
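As a rough sketch of the idea (not the paper's exact procedure), one simple variant of random coordinate descent on the 0/1 loss repeatedly picks a random coordinate of the weight vector, tries a grid of candidate step sizes along it, and keeps any update that lowers the training 0/1 loss; the function names and the step-size grid below are illustrative assumptions.

import numpy as np

def zero_one_loss(w, X, y):
    # Fraction of points misclassified by the perceptron sign(X @ w); y in {-1, +1}.
    return np.mean(np.sign(X @ w) != y)

def random_coordinate_descent(X, y, n_iters=1000, seed=0):
    # Assumes X already carries a constant column for the bias term.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    best = zero_one_loss(w, X, y)
    steps = np.concatenate([-np.logspace(-3, 1, 10), np.logspace(-3, 1, 10)])
    for _ in range(n_iters):
        j = rng.integers(d)                # random coordinate
        for s in steps:                    # crude line search along that coordinate
            cand = w.copy()
            cand[j] += s
            loss = zero_one_loss(cand, X, y)
            if loss < best:                # keep only improving updates
                w, best = cand, loss
    return w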
Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking
Bipartite ranking is a fundamental ranking problem that learns to order relevant instances ahead of irrelevant ones. The pair-wise approach for bipartite ranking constructs a quadratic number of pairs to solve the problem, which is infeasible for large-scale data sets. The point-wise approach, albeit more efficient, often results in inferior performance. That is, it is difficult to conduct bipartite ranking accurately and efficiently at the same time. In this paper, we develop a novel active sampling scheme within the pair-wise approach to conduct bipartite ranking efficiently. The scheme is inspired by active learning and reaches competitive ranking performance while focusing on only a small subset of the many pairs during training. Moreover, we propose a general Combined Ranking and Classification (CRC) framework to conduct bipartite ranking accurately. The framework unifies the point-wise and pair-wise approaches and is simply based on the idea of treating each instance point as a pseudo-pair. Experiments on 14 real-world large-scale data sets demonstrate that the proposed algorithm of Active Sampling within CRC, when coupled with a linear Support Vector Machine, usually outperforms state-of-the-art point-wise and pair-wise ranking approaches in terms of both accuracy and efficiency.
Comment: a shorter version was presented in ACML 201
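A rough sketch of the CRC idea under one simple reading: pair-wise examples are difference vectors between relevant and irrelevant instances, each pseudo-pair is just the instance itself with its class label, and both kinds of examples are fed to one linear SVM. The sampling here is uniform rather than active, and all helper names are illustrative rather than taken from the paper.

import numpy as np
from sklearn.svm import LinearSVC

def crc_training_set(X, y, n_pairs=2000, seed=0):
    # y in {+1, -1}: relevant vs. irrelevant instances.
    rng = np.random.default_rng(seed)
    pos, neg = X[y == 1], X[y == -1]
    i = rng.integers(len(pos), size=n_pairs)
    j = rng.integers(len(neg), size=n_pairs)
    pairs = pos[i] - neg[j]                      # pair-wise part: difference vectors
    flip = rng.random(n_pairs) < 0.5             # flip half of the pairs for symmetry
    pairs[flip] *= -1
    pair_labels = np.where(flip, -1.0, 1.0)
    # Point-wise part as pseudo-pairs: each instance kept as-is with its own label.
    X_crc = np.vstack([pairs, X])
    y_crc = np.concatenate([pair_labels, y])
    return X_crc, y_crc

# Usage sketch (X, y assumed given): train a linear SVM on the combined set,
# then rank instances by their decision values.
# X_crc, y_crc = crc_training_set(X, y)
# scores = LinearSVC().fit(X_crc, y_crc).decision_function(X)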
Infinite Ensemble Learning with Support Vector Machines
Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of base learners. However, existing algorithms are limited to combining only a finite number of base learners, and the generated ensemble is usually sparse. It is not clear whether we should construct an ensemble classifier with a larger or even an infinite number of base learners.
In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on SVM. The framework could output an infinite and nonsparse ensemble, and can be applied to construct new kernels for SVM as well as to interpret existing ones. We demonstrate the framework with a concrete application, the stump kernel, which embodies infinitely many decision stumps. The stump kernel is simple, yet powerful.
Experimental results show that SVM with the stump kernel usually achieves better performance than boosting, even with noisy data.
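As a sketch, such a kernel can be plugged into an off-the-shelf SVM solver as a precomputed Gram matrix; the code below assumes the stump kernel takes the form of a constant minus the L1 distance between inputs, with the constant treated as a tunable offset.

import numpy as np
from sklearn.svm import SVC

def stump_kernel(A, B, delta=10.0):
    # Assumed form: K(x, x') = delta - ||x - x'||_1, with delta a constant offset.
    return delta - np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)

# Usage sketch with a precomputed Gram matrix (X_train, y_train, X_test assumed given):
# clf = SVC(kernel="precomputed", C=1.0).fit(stump_kernel(X_train, X_train), y_train)
# y_pred = clf.predict(stump_kernel(X_test, X_train))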
Reduction from Complementary-Label Learning to Probability Estimates
Complementary-Label Learning (CLL) is a weakly-supervised learning problem that aims to learn a multi-class classifier from only complementary labels, each of which indicates a class to which an instance does not belong. Existing approaches mainly adopt the paradigm of reduction to ordinary classification, applying specific transformations and surrogate losses to connect CLL back to ordinary classification. Those approaches, however, face several limitations, such as a tendency to overfit or a reliance on deep models. In this paper, we sidestep those limitations with a novel perspective: reduction to probability estimates of complementary classes. We prove that accurate probability estimates of complementary labels lead to good classifiers through a simple decoding step. The proof establishes a reduction framework from CLL to probability estimation. The framework explains several key CLL approaches as special cases and allows us to design an improved algorithm that is more robust in noisy environments. The framework also suggests a validation procedure based on the quality of the probability estimates, offering an alternative way to validate models with only complementary labels. The flexible framework opens a wide range of unexplored opportunities for using deep and non-deep models to estimate the probabilities and solve the CLL problem. Empirical experiments further verify the framework's efficacy and robustness in various settings.
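One plausible reading of the reduction, assuming the probability estimates come from any probabilistic multi-class model trained on the complementary labels and that decoding simply picks the class with the lowest estimated complementary-label probability (the paper's exact decoding and training details may differ):

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_complementary_estimator(X, comp_y):
    # comp_y[i] is a complementary label: a class that instance i does NOT belong to.
    # Step 1: estimate P(complementary label = k | x) with any probabilistic model.
    return LogisticRegression(max_iter=1000).fit(X, comp_y)

def decode(model, X):
    # Step 2: the class least likely to appear as a complementary label
    # is taken as the predicted (ordinary) class.
    probs = model.predict_proba(X)               # shape (n_samples, n_classes)
    return model.classes_[np.argmin(probs, axis=1)]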
Ordinal Regression by Extended Binary Classification
We present a reduction framework from ordinal regression to binary classification based on extended examples. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranking rule from the binary classifier. A weighted 0/1 loss of the binary classifier then bounds the mislabeling cost of the ranking rule. Our framework allows us not only to design good ordinal regression algorithms based on well-tuned binary classification approaches, but also to derive new generalization bounds for ordinal regression from known bounds for binary classification. In addition, our framework unifies many existing ordinal regression algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms, which demonstrates the usefulness of our framework.
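A small sketch of the three steps, assuming K ordered ranks 1..K, uniform mislabeling costs, and logistic regression as the (arbitrary) binary learner: every example is extended into K-1 binary questions "is the rank greater than k?", one classifier answers all of them, and the predicted rank is one plus the number of positive answers; the threshold encoding and learner choice below are illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

def extend(X, y, K):
    # Step 1: one extended example per (instance, threshold k), k = 1..K-1,
    # asking "is y > k?"; the threshold is appended as a one-hot block.
    n, d = X.shape
    rows, labels = [], []
    for k in range(1, K):
        thr = np.zeros(K - 1)
        thr[k - 1] = 1.0
        rows.append(np.hstack([X, np.tile(thr, (n, 1))]))
        labels.append((y > k).astype(int))
    return np.vstack(rows), np.concatenate(labels)

def fit_ordinal(X, y, K):
    # Step 2: any binary classifier trained on the extended examples.
    Xe, ye = extend(X, y, K)
    return LogisticRegression(max_iter=1000).fit(Xe, ye)

def predict_rank(clf, X, K):
    # Step 3: predicted rank = 1 + number of thresholds answered "yes".
    n, _ = X.shape
    votes = np.zeros(n, dtype=int)
    for k in range(1, K):
        thr = np.zeros(K - 1)
        thr[k - 1] = 1.0
        votes += clf.predict(np.hstack([X, np.tile(thr, (n, 1))]))
    return 1 + votes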