11,463 research outputs found
Tight Continuous Relaxation of the Balanced -Cut Problem
Spectral Clustering as a relaxation of the normalized/ratio cut has become
one of the standard graph-based clustering methods. Existing methods for the
computation of multiple clusters, corresponding to a balanced -cut of the
graph, are either based on greedy techniques or heuristics which have weak
connection to the original motivation of minimizing the normalized cut. In this
paper we propose a new tight continuous relaxation for any balanced -cut
problem and show that a related recently proposed relaxation is in most cases
loose leading to poor performance in practice. For the optimization of our
tight continuous relaxation we propose a new algorithm for the difficult
sum-of-ratios minimization problem which achieves monotonic descent. Extensive
comparisons show that our method outperforms all existing approaches for ratio
cut and other balanced -cut criteria.Comment: Long version of paper accepted at NIPS 201
Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking
Bipartite ranking is a fundamental ranking problem that learns to order
relevant instances ahead of irrelevant ones. The pair-wise approach for
bi-partite ranking construct a quadratic number of pairs to solve the problem,
which is infeasible for large-scale data sets. The point-wise approach, albeit
more efficient, often results in inferior performance. That is, it is difficult
to conduct bipartite ranking accurately and efficiently at the same time. In
this paper, we develop a novel active sampling scheme within the pair-wise
approach to conduct bipartite ranking efficiently. The scheme is inspired from
active learning and can reach a competitive ranking performance while focusing
only on a small subset of the many pairs during training. Moreover, we propose
a general Combined Ranking and Classification (CRC) framework to accurately
conduct bipartite ranking. The framework unifies point-wise and pair-wise
approaches and is simply based on the idea of treating each instance point as a
pseudo-pair. Experiments on 14 real-word large-scale data sets demonstrate that
the proposed algorithm of Active Sampling within CRC, when coupled with a
linear Support Vector Machine, usually outperforms state-of-the-art point-wise
and pair-wise ranking approaches in terms of both accuracy and efficiency.Comment: a shorter version was presented in ACML 201
- …