Maximum Margin Multiclass Nearest Neighbors
We develop a general framework for margin-based multicategory classification
in metric spaces. The basic work-horse is a margin-regularized version of the
nearest-neighbor classifier. We prove generalization bounds that match the
state of the art in sample size n and significantly improve the dependence on
the number of classes k. Our point of departure is a nearly Bayes-optimal
finite-sample risk bound independent of k. Although k-free, this bound is
unregularized and non-adaptive, which motivates our main result: Rademacher and
scale-sensitive margin bounds with a logarithmic dependence on k. As the best
previous risk estimates in this setting were of order sqrt(k), our bound is
exponentially sharper. From the algorithmic standpoint, in doubling metric
spaces our classifier may be trained on n examples in O(n^2 log n) time and
evaluated on new points in O(log n) time.
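As an illustration of the kind of classifier the abstract describes, the sketch below implements a margin-style pruned nearest-neighbor rule in Python: training points whose label conflicts with an already-kept point within a margin gamma are discarded, and new points are classified by 1-NN over the kept set. The pruning rule, the Euclidean metric, and the function names are assumptions made here for illustration; this is not the authors' exact construction nor their fast doubling-metric implementation.

```python
import numpy as np

def build_margin_net(X, y, gamma):
    """Greedily keep training points that do not conflict, within distance
    gamma, with an already-kept point of a different label (a crude
    margin-style pruning; illustrative only)."""
    kept = []
    for i in range(len(X)):
        if all(y[j] == y[i] or np.linalg.norm(X[i] - X[j]) >= gamma for j in kept):
            kept.append(i)
    return X[kept], y[kept]

def nn_predict(X_net, y_net, x):
    """Classify x by its nearest neighbor among the kept points (1-NN)."""
    return y_net[np.argmin(np.linalg.norm(X_net - x, axis=1))]

# Toy usage with the Euclidean metric on 200 random points in the plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_net, y_net = build_margin_net(X, y, gamma=0.3)
print(len(X_net), nn_predict(X_net, y_net, np.array([1.0, 1.0])))
```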
Soft Methodology for Cost-and-error Sensitive Classification
Many real-world data mining applications involve different costs for different
types of classification errors and thus call for cost-sensitive classification
algorithms. Existing algorithms for cost-sensitive classification are
successful in minimizing the cost, but can incur a high error rate as a
trade-off. The high error rate holds back the practical use of those
algorithms. In this paper, we propose a novel cost-sensitive classification
methodology that takes both the cost and the error rate into account. The
methodology, called soft cost-sensitive classification, is established from a
multicriteria optimization problem of the cost and the error rate, and can be
viewed as regularizing cost-sensitive classification with the error rate. The
simple methodology allows immediate improvements of existing cost-sensitive
classification algorithms. Experiments on the benchmark and the real-world data
sets show that our proposed methodology indeed achieves lower test error rates
than, and similar (sometimes lower) test costs to, existing cost-sensitive
classification algorithms. We also demonstrate that the methodology can be
extended to consider a weighted error rate instead of the original error rate.
This extension is useful for tackling unbalanced classification problems.
Comment: A shorter version appeared in KDD '1
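One natural reading of "regularizing cost-sensitive classification with the error rate" is to blend the task's cost matrix with the plain 0/1-error cost matrix before handing it to any existing cost-sensitive learner. The sketch below shows that convex combination; the normalization, the blending weight alpha, and the function name are assumptions for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

def soften_cost_matrix(C, alpha=0.5):
    """Blend a task-specific cost matrix C (C[y, k] = cost of predicting
    class k when the true class is y) with the 0/1-error cost matrix.
    alpha = 0 keeps the (normalized) original costs, alpha = 1 recovers the
    plain error rate. Illustrative reading of the abstract only."""
    K = C.shape[0]
    C01 = 1.0 - np.eye(K)          # 1 for every misclassification, 0 on the diagonal
    Cn = C / max(C.max(), 1e-12)   # scale costs so the two terms are comparable
    return (1.0 - alpha) * Cn + alpha * C01

# Toy usage: an asymmetric 3-class cost matrix, softened towards the error rate.
C = np.array([[0.0, 1.0, 5.0],
              [2.0, 0.0, 1.0],
              [10.0, 3.0, 0.0]])
print(soften_cost_matrix(C, alpha=0.3))
```

Under this reading, the softened matrix is simply passed to an existing cost-sensitive algorithm in place of the original costs, which is how immediate improvements of existing algorithms would be obtained.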
Reduction Scheme for Empirical Risk Minimization and Its Applications to Multiple-Instance Learning
In this paper, we propose a simple reduction scheme for empirical risk
minimization (ERM) that preserves empirical Rademacher complexity. The
reduction allows us to transfer known generalization bounds and algorithms for
ERM to the target learning problems in a straightforward way. In particular, we
apply our reduction scheme to the multiple-instance learning (MIL) problem, for
which generalization bounds and ERM algorithms have been extensively studied.
We show that various learning problems can be reduced to MIL. Examples include
top-1 ranking learning, multi-class learning, and labeled and complementarily
labeled learning. It turns out that some of the derived generalization bounds
are, despite the simplicity of their derivation, competitive with or
incomparable to the existing bounds. Moreover, in one setting of labeled and
complementarily labeled learning, the derived algorithm is the first
polynomial-time algorithm for that setting.
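To make the target problem of the reduction concrete, the sketch below sets up the standard multiple-instance learning (MIL) convention, in which a bag's score is the maximum of its instances' scores, and runs a tiny hinge-loss ERM over bags by subgradient descent. The max-aggregation rule is the usual MIL assumption; the ERM routine, learning rate, and toy data are illustrative and are not the reduction or the algorithms from the paper.

```python
import numpy as np

def bag_scores(W, bags):
    """Score each bag by the maximum instance score under a linear model W:
    the standard MIL rule that a bag is positive iff some instance is."""
    return np.array([np.max(bag @ W) for bag in bags])

def mil_erm_hinge(bags, labels, dim, lr=0.1, epochs=200):
    """Tiny ERM sketch: subgradient descent on the hinge loss of each bag's
    highest-scoring (witness) instance. Illustrative only."""
    W = np.zeros(dim)
    for _ in range(epochs):
        for bag, y in zip(bags, labels):
            i = int(np.argmax(bag @ W))    # current witness instance
            if y * (bag[i] @ W) < 1.0:     # violated margin: take a hinge step
                W = W + lr * y * bag[i]
    return W

# Toy usage: positive bags hide one "witness" instance in the positive region.
rng = np.random.default_rng(0)
def pos_bag():
    bag = rng.normal(size=(5, 2)) - 2.0    # background instances
    bag[0] = rng.normal(size=2) + 2.0      # single positive witness
    return bag
bags = [pos_bag(), pos_bag(), rng.normal(size=(5, 2)) - 2.0, rng.normal(size=(5, 2)) - 2.0]
labels = np.array([1, 1, -1, -1])
W = mil_erm_hinge(bags, labels, dim=2)
print(np.sign(bag_scores(W, bags)))        # bag-level predictions
```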