Locally Non-linear Embeddings for Extreme Multi-label Learning
The objective in extreme multi-label learning is to train a classifier that
can automatically tag a novel data point with the most relevant subset of
labels from an extremely large label set. Embedding based approaches make
training and prediction tractable by assuming that the training label matrix is
low-rank and hence the effective number of labels can be reduced by projecting
the high dimensional label vectors onto a low dimensional linear subspace.
Still, leading embedding approaches have been unable to deliver high prediction
accuracies or scale to large problems because the low-rank assumption is violated in
most real-world applications.
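A minimal Python sketch of the generic low-rank label-embedding step described above (not X-One itself), assuming a dense 0/1 label matrix Y with one row per training point and an SVD-based linear projection. The function names low_rank_label_embedding, decode and reconstruction_error are hypothetical. The toy matrix at the end shows how a few tail labels, each used by a single point, raise the rank and leave a residual that a two-dimensional embedding cannot remove.

```python
import numpy as np


def low_rank_label_embedding(Y, k):
    """Embed each row of the n x L label matrix Y into k dimensions by
    projecting onto the top-k right singular vectors of Y (hypothetical helper)."""
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    V = Vt[:k].T          # L x k orthonormal basis of the label subspace
    Z = Y @ V             # n x k low-dimensional label embeddings
    return Z, V


def decode(Z, V):
    """Map embeddings back to real-valued label scores in the original space."""
    return Z @ V.T


def reconstruction_error(Y, k):
    """Frobenius norm of what the rank-k linear embedding fails to capture."""
    Z, V = low_rank_label_embedding(Y, k)
    return float(np.linalg.norm(Y - decode(Z, V)))


# Two "head" labels shared by groups of points: the label matrix has rank 2.
Y_head = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
# Add one "tail" label per point: the rank jumps to 6.
Y_full = np.hstack([Y_head, np.eye(6)])

print(reconstruction_error(Y_head, 2))  # essentially 0
print(reconstruction_error(Y_full, 2))  # about 2.0
```

By the Eckart-Young theorem the residual is the square root of the sum of the discarded squared singular values. Y_full has singular values 2, 2, 1, 1, 1, 1, so a rank-2 embedding leaves a residual of 2, while the head-only matrix is reconstructed exactly.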
This paper develops the X-One classifier to address both limitations. The
main technical contribution in X-One is a formulation for learning a small
ensemble of local distance preserving embeddings which can accurately predict
infrequently occurring (tail) labels. This allows X-One to break free of the
traditional low-rank assumption and boost classification accuracy by learning
embeddings which preserve pairwise distances between only the nearest label
vectors.
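The idea of preserving distances only between the nearest label vectors can be illustrated with a loss that counts distance errors on neighboring pairs and ignores every other pair. This is a hedged sketch assuming Euclidean distances and a fixed, hypothetical n_neighbors parameter; it is not X-One's actual objective or optimizer, which the abstract does not spell out.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def nearest_neighbor_mask(Y, n_neighbors):
    """Symmetric boolean mask marking pairs (i, j) where j is among the
    n_neighbors nearest label vectors of i, or vice versa."""
    D = pairwise_distances(Y)
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :n_neighbors]
    mask = np.zeros(D.shape, dtype=bool)
    mask[np.arange(D.shape[0])[:, None], idx] = True
    return mask | mask.T


def local_distortion(Y, Z, mask):
    """Squared distance errors between label vectors Y and embeddings Z,
    summed only over the pairs selected by mask."""
    diff = pairwise_distances(Y) - pairwise_distances(Z)
    return float(np.sum((diff * mask) ** 2))


# Two tight clusters of 1-D label vectors; Z pushes the clusters far apart
# but keeps each within-cluster distance unchanged.
Y = np.array([[0.0], [1.0], [10.0], [11.0]])
Z = np.array([[0.0], [1.0], [100.0], [101.0]])
mask = nearest_neighbor_mask(Y, n_neighbors=1)

print(local_distortion(Y, Z, mask))                    # 0.0: near pairs preserved
print(local_distortion(Y, Z, ~np.eye(4, dtype=bool)))  # large: far pairs distorted
```

In the toy example the gap between the two clusters is stretched tenfold, yet the local loss stays at zero. That freedom to distort far-apart pairs is what lets a locally distance-preserving embedding avoid a single global low-rank constraint.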
We conducted extensive experiments on several real-world as well as benchmark
data sets and compared our method against state-of-the-art methods for extreme
multi-label classification. Experiments reveal that X-One can make
significantly more accurate predictions than the state-of-the-art methods,
including both embedding methods (by as much as 35%) and tree-based methods (by as much as
6%). X-One can also scale efficiently to data sets with a million labels, which
are beyond the reach of leading embedding methods.
The singular values and vectors of low rank perturbations of large rectangular random matrices
In this paper, we consider the singular values and singular vectors of
finite, low rank perturbations of large rectangular random matrices.
Specifically, we prove almost sure convergence of the extreme singular values
and appropriate projections of the corresponding singular vectors of the
perturbed matrix. As in the prequel, where we considered the eigenvalue aspect
of the problem, the non-random limiting value is shown to depend explicitly on
the limiting singular value distribution of the unperturbed matrix via an
integral transform that linearizes rectangular additive convolution in free
probability theory. The large matrix limit of the extreme singular values of
the perturbed matrix differs from that of the original matrix if and only if
the singular values of the perturbing matrix are above a certain critical
threshold that depends on this same integral transform. We
examine the consequences of this singular value phase transition for the
associated left and right singular vectors and discuss the finite-size
fluctuations around these non-random limits.
Comment: 22 pages, presentation of the main results and of the hypotheses
slightly modified
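The phase transition described above can be checked numerically in the simplest setting. This sketch assumes a rank-one perturbation theta * u v^T of an n x m matrix X with i.i.d. N(0, 1/m) entries and c = n/m <= 1. For that standard Gaussian special case, the largest singular value tends to sqrt((1 + theta^2)(c + theta^2)) / theta when theta exceeds c^(1/4), and stays at the bulk edge 1 + sqrt(c) otherwise. The general result in the abstract covers other unperturbed matrices through the integral transform; the function names here are hypothetical.

```python
import numpy as np


def predicted_top_singular_value(theta, c):
    """Limit of the largest singular value of theta * u v^T + X, assuming X is
    n x m with i.i.d. N(0, 1/m) entries and n/m -> c in (0, 1]."""
    if theta <= c ** 0.25:
        return 1.0 + np.sqrt(c)  # below threshold: stuck at the bulk edge
    return np.sqrt((1.0 + theta**2) * (c + theta**2)) / theta


def simulated_top_singular_value(theta, n, m, seed=0):
    """Largest singular value of one random draw of the perturbed matrix."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 1.0 / np.sqrt(m), size=(n, m))
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)               # unit left vector
    v = rng.normal(size=m)
    v /= np.linalg.norm(v)               # unit right vector
    perturbed = X + theta * np.outer(u, v)
    return float(np.linalg.svd(perturbed, compute_uv=False)[0])


n, m = 400, 800
c = n / m
for theta in (0.5, 2.0):                 # c ** 0.25 is about 0.84
    print(theta,
          round(simulated_top_singular_value(theta, n, m), 3),
          round(float(predicted_top_singular_value(theta, c)), 3))
```

Below the threshold the simulated value sits near 1 + sqrt(c), so the perturbation is invisible in the spectrum; above it, a singular value separates from the bulk and tracks the closed-form limit up to finite-size fluctuations.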
Optimization via Low-rank Approximation for Community Detection in Networks
Community detection is one of the fundamental problems of network analysis,
for which a number of methods have been proposed. Most model-based or
criteria-based methods have to solve an optimization problem over a discrete
set of labels to find communities, which is computationally infeasible. Some
fast spectral algorithms have been proposed for specific methods or models, but
only on a case-by-case basis. Here we propose a general approach for maximizing
a function of a network adjacency matrix over discrete labels by projecting the
set of labels onto a subspace approximating the leading eigenvectors of the
expected adjacency matrix. This projection onto a low-dimensional space makes
the feasible set of labels much smaller and the optimization problem much
easier. We prove a general result about this method and show how to apply it to
several previously proposed community detection criteria, establishing its
consistency for label estimation in each case and demonstrating the fundamental
connection between spectral properties of the network and various model-based
approaches to community detection. Simulations and applications to real-world
data are included to demonstrate that our method performs well for multiple problems
over a wide range of parameters.
Comment: 45 pages, 7 figures; added discussions about computational complexity
and extension to more than two communities
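The projection idea can be sketched for two communities as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure. It uses the two leading eigenvectors of the observed adjacency matrix A as a plug-in stand-in for those of the expected adjacency matrix. It takes Newman-Girvan modularity as one example criterion, chosen here for illustration. It enumerates candidate labelings e = sign(U w) by sweeping directions w, with a hypothetical n_directions parameter.

```python
import numpy as np


def modularity_matrix(A):
    """Newman-Girvan modularity matrix B = A - d d^T / (2|E|)."""
    d = A.sum(axis=1)
    return A - np.outer(d, d) / d.sum()


def projected_label_search(A, n_directions=360):
    """Two-community search: candidate labelings are e = sign(U w), where U
    holds the two leading eigenvectors of A and w sweeps half the unit circle
    (w and -w give e and -e, which describe the same partition)."""
    B = modularity_matrix(A)
    _, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    U = vecs[:, -2:]                     # plug-in estimate of the leading eigenspace
    best_e, best_q = None, -np.inf
    for t in np.linspace(0.0, np.pi, n_directions, endpoint=False):
        w = np.array([np.cos(t), np.sin(t)])
        e = np.where(U @ w >= 0, 1.0, -1.0)   # labels in {+1, -1}^n
        q = e @ B @ e                         # modularity up to a factor 1/(4|E|)
        if q > best_q:
            best_e, best_q = e, q
    return best_e, best_q


# Usage: a two-block stochastic block model with a clear community structure.
rng = np.random.default_rng(1)
n = 200
z = np.repeat([1.0, -1.0], n // 2)               # true labels
P = np.where(np.outer(z, z) > 0, 0.3, 0.05)      # within / between edge probabilities
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)              # symmetric, no self-loops

e, q = projected_label_search(A)
accuracy = max(np.mean(e == z), np.mean(e == -z))  # labels are defined up to a swap
print(accuracy)
```

The loop evaluates at most n_directions labelings instead of all 2^n, which is the sense in which projecting onto a low-dimensional space shrinks the search. A faithful implementation would enumerate the vertices of the projected set exactly rather than rely on a fixed angular grid.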