35 research outputs found

    Top-k Multiclass SVM

    Class ambiguity is typical in image classification problems with a large number of classes. When classes are difficult to discriminate, it makes sense to allow k guesses and to evaluate classifiers on the top-k error instead of the standard zero-one loss. We propose the top-k multiclass SVM as a direct method to optimize for top-k performance. Our generalization of the well-known multiclass SVM is based on a tight convex upper bound of the top-k error. We propose a fast optimization scheme based on an efficient projection onto the top-k simplex, which is of independent interest. Experiments on five datasets show consistent improvements in top-k accuracy compared to various baselines. Comment: NIPS 2015.
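
    For concreteness, here is a minimal NumPy sketch of a top-k hinge loss in the spirit of the abstract above: it averages the k largest margin violations against the true class. The function name and this exact formulation are illustrative assumptions; the paper's tight convex upper bound and its projection onto the top-k simplex may differ in the details.

```python
import numpy as np

def topk_hinge_loss(scores, y, k=5):
    """Illustrative top-k hinge loss: average of the k largest margin
    violations 1 + s_j - s_y over the wrong classes j != y, clamped at
    zero. An assumed variant, not necessarily the paper's exact bound."""
    margins = 1.0 + scores - scores[y]  # hinge margins vs. the true class
    margins[y] = 0.0                    # the true class incurs no violation
    topk = np.sort(margins)[-k:]        # the k largest violations
    return max(0.0, float(topk.mean()))

# The loss vanishes once the true class outscores all but at most
# k - 1 other classes by the unit margin.
rng = np.random.default_rng(0)
print(topk_hinge_loss(rng.normal(size=10), y=3, k=5))
```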

    Fast training of multi-class support vector machines.

    We present new decomposition algorithms for training multi-class support vector machines (SVMs), in particular the variants proposed by Lee, Lin, & Wahba (LLW) and by Weston & Watkins (WW). Although these two types of machines have desirable theoretical properties, they have rarely been used in practice because efficient training algorithms have been missing. Training is accelerated by considering hypotheses without bias, by second-order working set selection, and by using working sets of size two instead of applying sequential minimal optimization (SMO). We derive a new bound for the generalization performance of multi-class SVMs. The bound depends on the sum of target margin violations, which corresponds to the loss function employed in the WW machine. The improved training scheme allows us to perform a thorough empirical comparison of the Crammer & Singer (CS), the WW, and the LLW machine. In our experiments, all machines gave better generalization results than the baseline one-vs-all approach. The two-variable decomposition algorithm outperformed SMO. The LLW SVM performed best in terms of accuracy, at the cost of slower training. The WW SVM led to better-generalizing hypotheses than the CS machine and did not require longer training times. Thus, we see no reason to prefer the CS variant over the WW SVM.
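
    To make the loss functions being compared concrete, here is a small NumPy sketch of the commonly cited textbook formulations of the WW and CS losses; the generalization bound discussed above depends on exactly the sum of violations that the WW loss accumulates. The function names are mine, and this is a sketch of the standard definitions rather than code from the paper.

```python
import numpy as np

def ww_loss(scores, y):
    """Weston & Watkins loss: sum of margin violations
    max(0, 1 + s_j - s_y) over all wrong classes j != y."""
    viol = np.maximum(0.0, 1.0 + scores - scores[y])
    viol[y] = 0.0  # the true class contributes no violation
    return float(viol.sum())

def cs_loss(scores, y):
    """Crammer & Singer loss: only the single worst violation,
    max(0, 1 + max_{j != y} s_j - s_y), enters the objective."""
    others = np.delete(scores, y)
    return max(0.0, 1.0 + float(others.max()) - float(scores[y]))
```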

    Loss Functions for Top-k Error: Analysis and Insights

    In order to push performance on realistic computer vision tasks, the number of classes in modern benchmark datasets has increased significantly in recent years. This increase in the number of classes comes with increased ambiguity between the class labels, raising the question of whether the top-1 error is the right performance measure. In this paper, we provide an extensive comparison and evaluation of established multiclass methods, comparing their top-k performance from both a practical and a theoretical perspective. Moreover, we introduce novel top-k loss functions as modifications of the softmax and the multiclass SVM losses and provide efficient optimization schemes for them. In the experiments, we compare all of the proposed and established methods for top-k error optimization on various datasets. An interesting insight of this paper is that the softmax loss yields competitive top-k performance for all k simultaneously. For a specific top-k error, our new top-k losses typically lead to further improvements while being faster to train than the softmax. Comment: In Computer Vision and Pattern Recognition (CVPR), 2016.
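
    Since the paper's central observation concerns top-k performance across all k, a short sketch of how the top-k error is evaluated may help; a monotone transform such as softmax does not change the class ranking, which is why one set of scores can be scored for every k at once. The function name and the random-data demo below are illustrative assumptions.

```python
import numpy as np

def topk_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the
    k highest-scoring classes."""
    topk = np.argpartition(scores, -k, axis=1)[:, -k:]  # unordered top-k indices
    hits = (topk == labels[:, None]).any(axis=1)
    return float(1.0 - hits.mean())

# With random scores over 100 classes the top-k error is roughly 1 - k/100.
rng = np.random.default_rng(1)
S = rng.normal(size=(1000, 100))
y = rng.integers(0, 100, size=1000)
for k in (1, 3, 5):
    print(k, topk_error(S, y, k=k))
```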

    Structured learning for non-smooth ranking losses

    Learning to rank from relevance judgments is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has recently been applied to optimize important non-decomposable ranking criteria like AUC (area under the ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize two other criteria widely used to evaluate search systems, MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain), in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries. For example, MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that comes close to the best accuracy of learners trained on individual criteria. In fact, experiments on the popular LETOR and TREC datasets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
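
    For readers unfamiliar with the two criteria the paper optimizes, here is a minimal NumPy sketch of MRR and NDCG@k under their standard definitions: the reciprocal rank of the first relevant result, and log2-discounted graded gains normalized by the ideal ordering. The function names and the toy inputs are illustrative assumptions, not code from the paper.

```python
import numpy as np

def mrr(ranked_relevance):
    """Mean reciprocal rank over queries: 1 / rank of the first
    relevant result (0 if no relevant result is retrieved)."""
    rrs = []
    for rel in ranked_relevance:
        hits = np.flatnonzero(rel)
        rrs.append(1.0 / (hits[0] + 1) if hits.size else 0.0)
    return float(np.mean(rrs))

def ndcg_at_k(gains, k=10):
    """NDCG@k for one ranked list of graded relevance gains: the gain
    at rank i is discounted by 1/log2(i + 1), and the ideal ordering
    of the same gains normalizes the score."""
    gains = np.asarray(gains, dtype=float)
    disc = 1.0 / np.log2(np.arange(2, k + 2))       # discounts for ranks 1..k
    top = gains[:k]
    dcg = float((top * disc[:top.size]).sum())
    ideal = np.sort(gains)[::-1][:k]                # best possible ordering
    idcg = float((ideal * disc[:ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0

print(mrr([[0, 0, 1, 0], [1, 0, 0, 0]]))   # (1/3 + 1) / 2
print(ndcg_at_k([3, 2, 3, 0, 1], k=5))     # graded gains in ranked order
```

    MRR's hard cutoff at the first relevant hit and NDCG's graded discounting reward different orderings, which is one way to see why a single feature map may not serve both criteria.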

    Hybrid MPI/OpenMP Parallel Linear Support Vector Machine Training
