Classification with Minimax Fast Rates for Classes of Bayes Rules with Sparse Representation
We construct a classifier which attains a fast rate of convergence under
sparsity and margin assumptions. An approach close to the one used in
approximation theory for the estimation of functions is employed to obtain
this result. The idea is to expand the Bayes rule in a fundamental system
made of indicators of dyadic sets and to assume that its coefficients belong
to a kind of ball. This assumption can be seen as a sparsity assumption, in
the sense that the proportion of nonzero coefficients decreases as the
"frequency" grows. Finally, rates of convergence are obtained by the usual
trade-off between a bias term and a variance term.
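To make the construction concrete, here is a minimal one-dimensional sketch: estimate the Bayes rule cell by cell on a dyadic partition of [0, 1) and pick the partition depth by a bias/variance trade-off. The function names, the plug-in labeling rule, and the hold-out depth selection are assumptions made for the illustration, not the paper's estimator.

    import numpy as np

    def dyadic_plugin_classifier(x, y, depth):
        """Plug-in rule on the dyadic partition of [0, 1) at a given depth:
        each cell gets the sign of the average label inside it (-1, 0 or +1),
        so the expansion is sparse when the Bayes rule is constant on most cells."""
        k = 2 ** depth
        cells = np.minimum((x * k).astype(int), k - 1)  # dyadic cell of each sample
        coef = np.zeros(k)
        for c in range(k):
            inside = y[cells == c]
            if inside.size:
                coef[c] = np.sign(inside.mean())
        return coef

    def choose_depth(x, y, depths, n_val=200, seed=0):
        """Pick the depth by the bias/variance trade-off, approximated here
        with a simple hold-out split (an assumption for the sketch)."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(x))
        train, val = idx[n_val:], idx[:n_val]
        best_depth, best_err = depths[0], np.inf
        for d in depths:
            coef = dyadic_plugin_classifier(x[train], y[train], d)
            k = 2 ** d
            pred = coef[np.minimum((x[val] * k).astype(int), k - 1)]
            err = np.mean(pred != y[val])  # deeper grids cut bias but raise variance
            if err < best_err:
                best_depth, best_err = d, err
        return best_depth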
S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification
This paper investigates the problem of active learning for binary label
prediction on a graph. We introduce a simple and label-efficient algorithm
called S2 for this task. At each step, S2 selects the vertex to be labeled
based on the structure of the graph and all previously gathered labels.
Specifically, S2 queries for the label of the vertex that bisects the *shortest
shortest* path between any pair of oppositely labeled vertices. We present a
theoretical estimate of the number of queries S2 needs in terms of a novel
parametrization of the complexity of binary functions on graphs. We also
present experimental results demonstrating the performance of S2 on both real
and synthetic data. While other graph-based active learning algorithms have
shown promise in practice, our algorithm is the first with both good
performance and theoretical guarantees. Finally, we demonstrate the
implications of the S2 algorithm to the theory of nonparametric active
learning. In particular, we show that S2 achieves near minimax optimal excess
risk for an important class of nonparametric classification problems.
Comment: A version of this paper appears in the Conference on Learning Theory (COLT) 2015.
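The query rule described above translates into a short sketch. The following is a minimal Python rendering of the selection step, assuming the graph is given as an adjacency list adj and that labels maps already-queried vertices to ±1; the full S2 algorithm additionally falls back to random sampling when no oppositely labeled pair exists and removes discovered cut edges before the next bisection, both of which this sketch omits.

    from collections import deque

    def shortest_path(adj, src, dst):
        """Breadth-first shortest path in an unweighted graph given as an
        adjacency list; returns the vertex sequence from src to dst, or None."""
        prev = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if u == dst:
                path = []
                while u is not None:
                    path.append(u)
                    u = prev[u]
                return path[::-1]
            for v in adj[u]:
                if v not in prev:
                    prev[v] = u
                    queue.append(v)
        return None

    def s2_query(adj, labels):
        """Return the next vertex to label: the midpoint of the shortest of all
        shortest paths joining an oppositely labeled pair of vertices."""
        pos = [v for v, lab in labels.items() if lab == +1]
        neg = [v for v, lab in labels.items() if lab == -1]
        best = None
        for p in pos:
            for n in neg:
                path = shortest_path(adj, p, n)
                if path and (best is None or len(path) < len(best)):
                    best = path
        if best is None or len(best) <= 2:
            return None  # nothing to bisect: adjacent pair found, or no opposite pair yet
        return best[len(best) // 2]  # bisecting vertex of the shortest shortest path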
Risk Bounds for Embedded Variable Selection in Classification Trees
The problems of model and variable selection for classification trees are jointly considered. A penalized criterion is proposed which explicitly takes into account the number of variables, and a risk bound inequality is provided for the tree classifier minimizing this criterion. This penalized criterion is compared to the one used during the pruning step of the CART algorithm. It is shown that the two criteria are similar under some specific margin assumptions. In practice, the tuning parameter of the CART penalty has to be calibrated by hold-out. Simulation studies are performed which confirm that the hold-out procedure mimics the form of the proposed penalized criterion.
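Schematically, the comparison is between a per-leaf charge and a charge that also counts the splitting variables. In the sketch below, the shape of the penalty, the constants c1 and c2, and the log factor are assumptions made for illustration, not the paper's exact criterion.

    import math

    def cart_pruning_criterion(emp_risk, n_leaves, alpha):
        """CART pruning score: empirical risk plus a charge per leaf,
        with alpha calibrated by hold-out in practice."""
        return emp_risk + alpha * n_leaves

    def variable_aware_criterion(emp_risk, n_leaves, n_vars, n_samples,
                                 c1=1.0, c2=1.0):
        """Penalized score that also charges for the number of distinct
        variables used in the splits; the exact form (c1, c2, log factor)
        is an assumption of this sketch, not the paper's expression."""
        penalty = (c1 * n_leaves + c2 * n_vars * math.log(n_samples)) / n_samples
        return emp_risk + penalty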
Classification algorithms using adaptive partitioning
Algorithms for binary classification based on adaptive tree partitioning are
formulated and analyzed for both their risk performance and their friendliness
to numerical implementation. The algorithms can be viewed as generating a set
approximation to the Bayes set and thus fall into the general category of set
estimators. In contrast with the most studied tree-based algorithms, which
utilize piecewise constant approximation on the generated partition [IEEE
Trans. Inform. Theory 52 (2006) 1335-1353; Mach. Learn. 66 (2007) 209-242], we
consider decorated trees, which allow us to derive higher order methods.
Convergence rates for these methods are derived in terms of the parameter of
the margin conditions and the rate of best approximation of the Bayes set by
decorated adaptive partitions. They can also be expressed in terms of the
Besov smoothness of the regression function, which governs its approximability
by piecewise polynomials on adaptive partitions. The execution of the algorithms
does not require knowledge of the smoothness or margin conditions. Besov
smoothness conditions are weaker than the commonly used H\"{o}lder conditions,
which govern approximation by nonadaptive partitions, and therefore for a given
regression function can result in a higher rate of convergence. This in turn
mitigates the compatibility conflict between smoothness and margin parameters.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1234 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
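As a caricature of the set-estimation viewpoint, the sketch below grows an adaptive dyadic partition of [0, 1) in one dimension, refining a cell while its labels are mixed and labeling each leaf by majority vote. It corresponds to the piecewise-constant case only; the decorations that yield the higher-order methods and the paper's model-selection step are omitted, and all thresholds are assumptions made for the example.

    import numpy as np

    def grow_dyadic_tree(x, y, a=0.0, b=1.0, depth=0, max_depth=8, min_leaf=10):
        """Greedy adaptive dyadic partitioning of [a, b): halve a cell while
        its labels are mixed, then label each leaf by majority vote.
        Returns a list of (left, right, label) leaves approximating the Bayes set."""
        inside = (x >= a) & (x < b)
        ys = y[inside]
        if ys.size == 0:
            return [(a, b, 1)]  # arbitrary label for an empty cell
        maj = 1 if 2 * (ys == 1).sum() >= ys.size else -1
        if np.all(ys == maj) or depth >= max_depth or ys.size < min_leaf:
            return [(a, b, maj)]
        mid = (a + b) / 2  # refine: split the cell into its two dyadic children
        return (grow_dyadic_tree(x, y, a, mid, depth + 1, max_depth, min_leaf)
                + grow_dyadic_tree(x, y, mid, b, depth + 1, max_depth, min_leaf))

    def classify(leaves, x0):
        """Predict by the label of the leaf cell containing x0."""
        for left, right, label in leaves:
            if left <= x0 < right:
                return label
        return 1  # default for points outside [0, 1)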