25 research outputs found

    Maximum Margin Multiclass Nearest Neighbors

    We develop a general framework for margin-based multicategory classification in metric spaces. The basic work-horse is a margin-regularized version of the nearest-neighbor classifier. We prove generalization bounds that match the state of the art in sample size $n$ and significantly improve the dependence on the number of classes $k$. Our point of departure is a nearly Bayes-optimal finite-sample risk bound independent of $k$. Although $k$-free, this bound is unregularized and non-adaptive, which motivates our main result: Rademacher and scale-sensitive margin bounds with a logarithmic dependence on $k$. As the best previous risk estimates in this setting were of order $\sqrt{k}$, our bound is exponentially sharper. From the algorithmic standpoint, in doubling metric spaces our classifier may be trained on $n$ examples in $O(n^2 \log n)$ time and evaluated on new points in $O(\log n)$ time.

    Soft Methodology for Cost-and-error Sensitive Classification

    Many real-world data mining applications assign different costs to different types of classification errors and thus call for cost-sensitive classification algorithms. Existing algorithms for cost-sensitive classification succeed in minimizing the cost, but can incur a high error rate as the trade-off. The high error rate holds back the practical use of those algorithms. In this paper, we propose a novel cost-sensitive classification methodology that takes both the cost and the error rate into account. The methodology, called soft cost-sensitive classification, is derived from a multicriteria optimization problem over the cost and the error rate, and can be viewed as regularizing cost-sensitive classification with the error rate. The simple methodology allows immediate improvements of existing cost-sensitive classification algorithms. Experiments on benchmark and real-world data sets show that our proposed methodology indeed achieves lower test error rates and similar (sometimes lower) test costs than existing cost-sensitive classification algorithms. We also demonstrate that the methodology can be extended to consider the weighted error rate instead of the original error rate. This extension is useful for tackling unbalanced classification problems. Comment: A shorter version appeared in KDD '1

    Reduction Scheme for Empirical Risk Minimization and Its Applications to Multiple-Instance Learning

    In this paper, we propose a simple reduction scheme for empirical risk minimization (ERM) that preserves empirical Rademacher complexity. The reduction allows us to transfer known generalization bounds and algorithms for ERM to the target learning problems in a straightforward way. In particular, we apply our reduction scheme to the multiple-instance learning (MIL) problem, for which generalization bounds and ERM algorithms have been extensively studied. We show that various learning problems can be reduced to MIL. Examples include top-1 ranking learning, multi-class learning, and labeled and complementarily labeled learning. It turns out that some of the generalization bounds derived are, despite the simplicity of their derivation, incomparable or competitive with the existing bounds. Moreover, in some settings of labeled and complementarily labeled learning, the algorithm derived is the first polynomial-time algorithm.