Supervised Classification: Quite a Brief Overview
The original problem of supervised classification considers the task of
automatically assigning objects to their respective classes on the basis of
numerical measurements derived from these objects. Classifiers are the tools
that implement the actual functional mapping from these measurements---also
called features or inputs---to the so-called class label---or output. The
fields of pattern recognition and machine learning study ways of constructing
such classifiers. The main idea behind supervised methods is that of learning
from examples: given a number of example input-output pairs, to what extent
can one learn the general mapping that takes any new and unseen feature vector
to its correct class? This chapter provides a basic introduction to the
underlying ideas of how to approach a supervised classification problem. In
addition, it provides an overview of some specific classification techniques,
delves into the issues of object representation and classifier evaluation, and
(very) briefly covers some variations on the basic supervised classification
task that may also be of interest to the practitioner.
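
As a minimal sketch of the learning-from-examples idea described in this
abstract, the following assumes scikit-learn is available; the dataset and
classifier are illustrative choices, not taken from the chapter:

    # Learn a functional mapping from feature vectors to class labels
    # from example input-output pairs, then test it on unseen data.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # Example input-output relations: feature vectors X and class labels y.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Fit the classifier on the labeled examples.
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Evaluate how well the learned mapping generalizes to unseen inputs.
    print("accuracy on unseen data:", clf.score(X_test, y_test))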
Error Bounds for Piecewise Smooth and Switching Regression
The paper deals with regression problems, in which the nonsmooth target is
assumed to switch between different operating modes. Specifically, piecewise
smooth (PWS) regression considers target functions switching deterministically
via a partition of the input space, while switching regression considers
arbitrary switching laws. The paper derives generalization error bounds in
these two settings by following the approach based on Rademacher complexities.
For PWS regression, our derivation involves a chaining argument and a
decomposition of the covering numbers of PWS classes in terms of the ones of
their component functions and the capacity of the classifier partitioning the
input space. This yields error bounds with a radical dependency on the number
of modes. For switching regression, the decomposition can be performed directly
at the level of the Rademacher complexities, which yields bounds with a linear
dependency on the number of modes. By using once more chaining and a
decomposition at the level of covering numbers, we show how to recover a
radical dependency. Examples of applications are given in particular for PWS
and switching regression with linear and kernel-based component functions.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
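
As a hedged illustration of the PWS regression setting above, the sketch
below fits one linear component function per mode under a known partition of
the input space; the partition, data, and least-squares fit are assumptions
for illustration, not the paper's estimator:

    # Piecewise smooth regression: the target switches deterministically
    # between two linear component functions via a partition of the input.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 1))

    # Deterministic switching law: mode 0 if x < 0, mode 1 otherwise.
    modes = (X[:, 0] >= 0).astype(int)
    y = np.where(modes == 0, 2 * X[:, 0] + 1, -3 * X[:, 0]) \
        + 0.1 * rng.standard_normal(200)

    # Fit one linear component function per mode by least squares.
    coefs = []
    for m in (0, 1):
        Xm = np.column_stack([X[modes == m, 0], np.ones((modes == m).sum())])
        coefs.append(np.linalg.lstsq(Xm, y[modes == m], rcond=None)[0])

    def predict(x):
        # The partition selects the component function; when the partition
        # must itself be learned, its capacity enters the error bounds.
        a, b = coefs[0] if x < 0 else coefs[1]
        return a * x + b

    print(predict(-0.5), predict(0.5))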
Non-uniform Feature Sampling for Decision Tree Ensembles
We study the effectiveness of non-uniform randomized feature selection in
decision tree classification. We experimentally evaluate two feature selection
methodologies, based on information extracted from the provided dataset:
\emph{leverage scores-based} and \emph{norm-based} feature selection.
Experimental evaluation of the proposed feature selection techniques indicates
that such approaches might be more effective than naive uniform feature
selection, while offering performance comparable to the random forest
algorithm [3].
Comment: 7 pages, 7 figures, 1 table
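
The sketch below illustrates the two non-uniform sampling schemes named in
this abstract: feature-sampling probabilities proportional to squared column
norms, or to leverage scores of the data matrix. The normalization and the
number of features drawn are assumptions, not details from the paper:

    # Non-uniform feature sampling for a tree in an ensemble.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 20))  # rows = objects, columns = features

    # Norm-based: P(feature j) proportional to ||A[:, j]||^2.
    col_norms = np.sum(A**2, axis=0)
    p_norm = col_norms / col_norms.sum()

    # Leverage-score-based: leverage of column j is ||V[j, :]||^2,
    # with V from the thin SVD A = U S V^T.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    leverage = np.sum(Vt**2, axis=0)
    p_lev = leverage / leverage.sum()

    # Draw a feature subset for one tree according to the chosen scheme.
    k = 5
    features = rng.choice(A.shape[1], size=k, replace=False, p=p_lev)
    print("sampled features:", features)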
Learning Kernel-Based Halfspaces with the Zero-One Loss
We describe and analyze a new algorithm for agnostically learning
kernel-based halfspaces with respect to the \emph{zero-one} loss function.
Unlike most previous formulations which rely on surrogate convex loss functions
(e.g. hinge-loss in SVM and log-loss in logistic regression), we provide finite
time/sample guarantees with respect to the more natural zero-one loss function.
The proposed algorithm can learn kernel-based halfspaces in worst-case time
$\mathrm{poly}(\exp(L\log(L/\epsilon)))$, for \emph{any} distribution, where
$L$ is a Lipschitz constant (which can be thought of as the reciprocal of the
margin), and the learned classifier is worse than the optimal halfspace by at
most $\epsilon$. We also prove a hardness result, showing that under a certain
cryptographic assumption, no algorithm can learn kernel-based halfspaces in
time polynomial in $L$.
Comment: This is a full version of the paper appearing in the 23rd
International Conference on Learning Theory (COLT 2010). Compared to the
previous arXiv version, this version contains some small corrections in the
proof of Lemma 3 and in the appendix.
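
To make concrete the contrast this abstract draws between the zero-one loss
and surrogate convex losses, the sketch below evaluates all three for a
kernel-based halfspace with illustrative coefficients; it only demonstrates
the loss functions and is not the paper's learning algorithm:

    # Zero-one vs. surrogate losses for h(x) = sign(sum_i alpha_i k(x_i, x)).
    import numpy as np

    def rbf_kernel(X, Z, gamma=1.0):
        # Gaussian kernel matrix k(x, z) = exp(-gamma * ||x - z||^2).
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 2))
    y = np.sign(X[:, 0] + 0.3 * rng.standard_normal(50))  # labels in {-1, +1}

    alpha = rng.standard_normal(50) * 0.1           # illustrative coefficients
    margins = y * (rbf_kernel(X, X) @ alpha)        # y_i * h(x_i) before sign

    zero_one = np.mean(margins <= 0)                # the loss analyzed here
    hinge = np.mean(np.maximum(0, 1 - margins))     # SVM surrogate
    log_loss = np.mean(np.log1p(np.exp(-margins)))  # logistic surrogate

    print(f"zero-one: {zero_one:.3f}  hinge: {hinge:.3f}  log: {log_loss:.3f}")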