Locally Non-linear Embeddings for Extreme Multi-label Learning

Author: Bhatia Kush
Jain Himanshu
Jain Prateek
Kar Purushottam
Varma Manik
Publication date: 09/07/2015

The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding-based approaches make training and prediction tractable by assuming that the training label matrix is low-rank, so that the effective number of labels can be reduced by projecting the high-dimensional label vectors onto a low-dimensional linear subspace. Still, leading embedding approaches have been unable to deliver high prediction accuracies or scale to large problems, as the low-rank assumption is violated in most real-world applications. This paper develops the X-One classifier to address both limitations. The main technical contribution in X-One is a formulation for learning a small ensemble of local distance-preserving embeddings that can accurately predict infrequently occurring (tail) labels. This allows X-One to break free of the traditional low-rank assumption and boost classification accuracy by learning embeddings that preserve pairwise distances only between the nearest label vectors. We conducted extensive experiments on several real-world and benchmark data sets and compared our method against state-of-the-art methods for extreme multi-label classification. Experiments reveal that X-One can make significantly more accurate predictions than state-of-the-art methods, including both embedding methods (by as much as 35%) and tree-based methods (by as much as 6%). X-One can also scale efficiently to data sets with a million labels, which are beyond the reach of leading embedding methods.

arXiv.org e-Print Archive

CiteSeerX
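The abstract's central idea, preserving distances only between the nearest label vectors rather than between all pairs, can be illustrated with a small loss function. This is a minimal sketch under stated assumptions, not the X-One formulation: it assumes binary label vectors compared by Euclidean distance, a fixed neighbourhood size `k`, a squared-error penalty, and embeddings supplied directly by the caller. The helper names `nearest_label_pairs` and `local_distance_loss` are hypothetical, and the ensemble, the mapping from features to embeddings, and the optimizer are not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label_pairs(label_vectors, k):
    """Pairs (i, j) such that j is one of the k nearest label vectors to i."""
    n = len(label_vectors)
    pairs = set()
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(label_vectors[i], label_vectors[j]),
        )
        for j in others[:k]:
            pairs.add((i, j))
    return pairs


def local_distance_loss(embeddings, label_vectors, k):
    """Squared mismatch between embedding and label-space distances,
    summed only over nearest-neighbour pairs in label space.

    Assumption (not the paper's exact objective): Euclidean distances on
    binary label vectors and a plain squared-error penalty. Distances
    between far-apart label vectors are deliberately ignored."""
    loss = 0.0
    for i, j in nearest_label_pairs(label_vectors, k):
        d_label = euclidean(label_vectors[i], label_vectors[j])
        d_embed = euclidean(embeddings[i], embeddings[j])
        loss += (d_embed - d_label) ** 2
    return loss


# Four training points over four labels, as binary label vectors.
Y = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
Z_good = [[0, 0], [1, 0], [10, 0], [11, 0]]
Z_bad = [[0, 0], [3, 0], [10, 0], [11, 0]]
print(local_distance_loss(Z_good, Y, k=1))  # 0.0
print(local_distance_loss(Z_bad, Y, k=1))   # 8.0
```

With `k=1` the neighbour pairs are (0, 1) and (2, 3) in both directions, so `Z_good` scores zero even though it places points 1 and 2 much farther apart than their label vectors are; only stretching a neighbour pair, as `Z_bad` does, is penalized.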

On label dependence in multilabel classification

Author: Cheng Weiwei
Dembczyński Krzysztof
Hüllermeier Eyke
Waegeman Willem
Publication venue: Ghent University, KERMIT, Department of Applied Mathematics, Biometrics and Process Control
Publication date: 01/01/2010

Ghent University Academic Bibliography

Nearest Labelset Using Double Distances for Multi-label Classification

Author: Gweon Hyukjun
Schonlau Matthias
Steiner Stefan
Publication date: 15/02/2017

Multi-label classification is a type of supervised learning where an instance may be associated with multiple labels simultaneously. Predicting each label independently has been criticized for not exploiting any correlation between labels. In this paper we propose a novel approach, Nearest Labelset using Double Distances (NLDD), that predicts the labelset observed in the training data that minimizes a weighted sum of the distances to the new instance in both the feature space and the label space. The weights specify the relative trade-off between the two distances and are estimated from a binomial regression of the number of misclassified labels as a function of the two distances. Model parameters are estimated by maximum likelihood. Because NLDD only considers labelsets observed in the training data, it implicitly takes label dependencies into account. Experiments on benchmark multi-label data sets show that the proposed method on average outperforms other well-known approaches in terms of Hamming loss, 0/1 loss, and multi-label accuracy, and ranks second after ECC on the F-measure.

arXiv.org e-Print Archive
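The NLDD decision rule described in the abstract above can be sketched as a search over the labelsets observed in training. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance in both spaces, takes the new instance's label-space position `y_hat_new` to be a vector of estimated per-label probabilities (for example from independent binary classifiers), and treats the weights `w_x` and `w_y` as given rather than fitting them with the binomial regression the abstract describes. The function name `nldd_predict` is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nldd_predict(x_new, y_hat_new, train_X, train_Y, w_x, w_y):
    """Return the observed training labelset that minimises
    w_x * (feature-space distance) + w_y * (label-space distance).

    Assumptions: Euclidean distance in both spaces; y_hat_new is a
    hypothetical per-label probability estimate for the new instance;
    w_x and w_y are supplied by the caller (the paper fits them by
    binomial regression, which is not shown here)."""
    best_labelset, best_score = None, math.inf
    for x_i, y_i in zip(train_X, train_Y):
        score = w_x * euclidean(x_new, x_i) + w_y * euclidean(y_hat_new, y_i)
        if score < best_score:
            best_labelset, best_score = y_i, score
    return best_labelset


train_X = [[0, 0], [1, 1], [5, 5]]
train_Y = [(1, 0, 0), (1, 1, 0), (0, 0, 1)]

print(nldd_predict([0.9, 0.9], [0.8, 0.6, 0.1], train_X, train_Y, 1.0, 1.0))  # (1, 1, 0)
print(nldd_predict([0.9, 0.9], [0.9, 0.1, 0.0], train_X, train_Y, 0.0, 1.0))  # (1, 0, 0)
```

Because the function can only return a labelset seen in training, label combinations that never co-occurred are never predicted, which is how the method accounts for label dependence implicitly. The second call shows the trade-off: with the feature-space weight set to zero, the prediction follows the label-probability estimate alone.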