Top-k Multiclass SVM
Class ambiguity is typical in image classification problems with a large
number of classes. When classes are difficult to discriminate, it makes sense
to allow k guesses and evaluate classifiers based on the top-k error instead of
the standard zero-one loss. We propose top-k multiclass SVM as a direct method
to optimize for top-k performance. Our generalization of the well-known
multiclass SVM is based on a tight convex upper bound of the top-k error. We
propose a fast optimization scheme based on an efficient projection onto the
top-k simplex, which is of independent interest. Experiments on five datasets
show consistent improvements in top-k accuracy over various baselines.
Comment: NIPS 201
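The top-k error the paper optimizes counts a prediction as correct whenever the true class appears among the k highest-scoring classes. A minimal sketch of this evaluation metric (function and variable names are my own, not the paper's):

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k
    highest-scoring classes. scores: (n, C), labels: (n,)."""
    # indices of the k largest scores per sample (order within the k is irrelevant)
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

scores = np.array([[0.1, 0.5, 0.3, 0.2],
                   [0.7, 0.2, 0.06, 0.04]])
labels = np.array([3, 0])
print(top_k_error(scores, labels, k=2))  # 0.5: sample 0 misses its top-2
```

With k=1 this reduces to the standard zero-one error; the paper's contribution is a convex surrogate that targets this k-dependent measure directly.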
Fast training of multi-class support vector machines.
We present new decomposition algorithms for training multi-class support vector machines (SVMs), in particular the variants proposed by Lee, Lin, & Wahba (LLW) and Weston & Watkins (WW). Although these two types of machines have desirable theoretical properties, they have rarely been used in practice because efficient training algorithms have been missing. Training is accelerated by considering hypotheses without bias, by second-order working set selection, and by using working sets of size two instead of applying sequential minimal optimization (SMO). We derive a new bound for the generalization performance of multi-class SVMs. The bound depends on the sum of target margin violations, which corresponds to the loss function employed in the WW machine. The improved training scheme allows us to perform a thorough empirical comparison of the Crammer & Singer (CS), the WW, and the LLW machine. In our experiments, all machines gave better generalization results than the baseline one-vs-all approach. The two-variable decomposition algorithm outperformed SMO. The LLW SVM performed best in terms of accuracy, at the cost of slower training. The WW SVM led to better generalizing hypotheses than the CS machine and did not require longer training times. Thus, we see no reason to prefer the CS variant over the WW SVM.
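The WW loss mentioned above, the sum of target margin violations, penalizes every wrong class whose score comes within the margin of the true class's score. A minimal per-sample sketch (identifiers are my own):

```python
import numpy as np

def ww_hinge_loss(scores, label, margin=1.0):
    """Weston-Watkins multiclass hinge loss for one sample:
    sum over wrong classes j of max(0, margin - (s_y - s_j))."""
    s_y = scores[label]
    violations = np.maximum(0.0, margin - (s_y - scores))
    violations[label] = 0.0  # the true class does not violate its own margin
    return float(violations.sum())

print(ww_hinge_loss(np.array([2.0, 1.5, -0.5]), label=0))  # 0.5
```

Here only class 1 (score 1.5) falls inside the unit margin of the true class (score 2.0), contributing 0.5; class 2 is separated by more than the margin and contributes nothing.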
Loss Functions for Top-k Error: Analysis and Insights
In order to push the performance on realistic computer vision tasks, the
number of classes in modern benchmark datasets has significantly increased in
recent years. This increase in the number of classes comes along with increased
ambiguity between the class labels, raising the question of whether top-1 error is the
right performance measure. In this paper, we provide an extensive comparison
and evaluation of established multiclass methods, examining their top-k
performance from both a practical and a theoretical perspective.
Moreover, we introduce novel top-k loss functions as modifications of the
softmax and the multiclass SVM losses and provide efficient optimization
schemes for them. In the experiments, we compare all of the proposed and
established methods for top-k error optimization on various datasets. An
interesting insight of this paper is that the softmax loss yields competitive
top-k performance for all k simultaneously. For a specific top-k error, our new
top-k losses typically lead to further improvements while being faster to train
than the softmax.
Comment: In Computer Vision and Pattern Recognition (CVPR), 201
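One natural way to modify the multiclass hinge loss for a top-k target, in the spirit of the losses studied here, is to penalize only the k largest margin violations, so the loss vanishes as soon as the true class reaches the top k. The sketch below is a simplified illustration of that idea, not the paper's exact formulation (its published variants differ in details such as the treatment of the true class and smoothing):

```python
import numpy as np

def topk_hinge_sketch(scores, label, k=2, margin=1.0):
    """Illustrative top-k hinge loss: average of the k largest
    per-class margin terms over the wrong classes, clipped at zero."""
    a = margin + scores - scores[label]  # margin term for every class
    a = np.delete(a, label)              # drop the true class
    k_largest = np.sort(a)[-k:]          # k worst offenders
    return max(0.0, float(k_largest.mean()))

print(topk_hinge_sketch(np.array([2.0, 1.5, 1.8, -1.0]), label=0, k=2))  # 0.65
```

With k=1 this recovers the familiar Crammer-Singer-style max-margin loss; larger k tolerates up to k-1 classes scoring above the true one, mirroring the top-k error.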
Structured learning for non-smooth ranking losses
Learning to rank from relevance judgments is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has been applied recently to optimize important non-decomposable ranking criteria like AUC (area under ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize for two other criteria widely used to evaluate search systems, MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain), in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries. For example, MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
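The two target metrics above are standard and easy to state directly: the reciprocal rank rewards placing the first relevant item early, while NDCG discounts each item's gain logarithmically by rank and normalizes against the ideal ordering. A minimal sketch (using the common exponential gain 2^rel - 1; function names are my own):

```python
import numpy as np

def reciprocal_rank(relevances):
    """1 / rank of the first relevant item, 0 if none.
    relevances: binary relevance in ranked (retrieved) order."""
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def dcg(relevances):
    """Discounted cumulative gain: sum of (2^rel - 1) / log2(rank + 1)."""
    rel = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(((2.0 ** rel - 1.0) / discounts).sum())

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Both metrics are non-smooth and non-decomposable over item pairs, which is why the paper resorts to max-margin structured learning rather than plain gradient-based surrogates.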