Approximate kernel clustering
In the kernel clustering problem we are given a large $n \times n$ positive
semi-definite matrix $A = (a_{ij})$ with $\sum_{i,j=1}^{n} a_{ij} = 0$ and a small
$k \times k$ positive semi-definite matrix $B = (b_{ij})$. The goal is to find a
partition $S_1, \ldots, S_k$ of $\{1, \ldots, n\}$ which maximizes the quantity
$\sum_{i,j=1}^{k} \big( \sum_{(p,q) \in S_i \times S_j} a_{pq} \big) b_{ij}$. We study the
computational complexity of this generic clustering problem, which originates in
the theory of machine learning. We design a constant factor polynomial time
approximation algorithm for this problem, answering a question posed by Song,
Smola, Gretton and Borgwardt. In some cases we manage to compute the sharp
approximation threshold for this problem assuming the Unique Games Conjecture
(UGC). In particular, when $B$ is the $3 \times 3$ identity matrix, the UGC
hardness threshold of this problem is exactly $\frac{16\pi}{27}$. We present
and study a geometric conjecture of independent interest which we show would
imply that the UGC threshold when $B$ is the $k \times k$ identity matrix is
$\frac{8\pi}{9}\big(1 - \frac{1}{k}\big)$ for every $k \ge 3$.
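To make the objective concrete, here is a minimal Python sketch (our illustration, with hypothetical names; it evaluates the quantity being maximized, not the paper's approximation algorithm):

    import numpy as np

    def kernel_clustering_objective(A, B, labels):
        # A: n x n positive semi-definite matrix, entries summing to zero
        # B: k x k positive semi-definite matrix
        # labels[p] in {0, ..., k-1}: index of the part S_i containing p
        # Returns sum_{i,j} ( sum_{p in S_i, q in S_j} A[p, q] ) * B[i, j]
        k = B.shape[0]
        S = np.eye(k)[labels].T    # S[i, p] = 1 iff point p lies in part i
        C = S @ A @ S.T            # C[i, j] = total A-mass between parts i and j
        return float((C * B).sum())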
Approximate Clustering via Metric Partitioning
In this paper we consider two metric covering/clustering problems:
the \textit{Minimum Cost Covering Problem} (MCC) and $k$-clustering. In the MCC
problem, we are given two point sets $X$ (clients) and $Y$ (servers), and a
metric on $X \cup Y$. We would like to cover the clients by balls centered at
the servers. The objective function to minimize is the sum of the $\alpha$-th
powers of the radii of the balls. Here $\alpha \ge 1$ is a parameter of the
problem (but not of a problem instance). MCC is closely related to the
$k$-clustering problem; the main difference between $k$-clustering and MCC is
that in $k$-clustering one needs to select $k$ balls to cover the clients.
For any $\epsilon > 0$, we describe quasi-polynomial time $(1+\epsilon)$-approximation
algorithms for both of the problems. However, in the case of
$k$-clustering the algorithm uses $(1+\epsilon)k$ balls. Prior to our work, a
$3^{\alpha}$ and a $c^{\alpha}$ approximation were achieved by
polynomial-time algorithms for MCC and $k$-clustering, respectively, where
$c > 1$ is an absolute constant. These two problems are thus interesting examples of
metric covering/clustering problems that admit a $(1+\epsilon)$-approximation
(using $(1+\epsilon)k$ balls in the case of $k$-clustering), if one is willing to
settle for quasi-polynomial time. In contrast, for the variant of MCC where
$\alpha$ is part of the input, we show under standard assumptions that no
polynomial time algorithm can achieve an approximation factor better than
$O(\log |X|)$ for $\alpha \ge \log |X|$. Comment: 19 pages
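For concreteness, a small Python helper (our illustration, with hypothetical names, not from the paper) that evaluates the MCC objective for a given assignment of clients to servers:

    import numpy as np

    def mcc_cost(dist, assign, alpha):
        # dist[j, i]: metric distance from server j to client i
        # assign[i]:  index of the server whose ball covers client i
        # A used server's ball must reach its farthest assigned client,
        # and the objective is the sum of the alpha-th powers of the radii.
        cost = 0.0
        for j in np.unique(assign):
            radius = dist[j, assign == j].max()
            cost += radius ** alpha
        return cost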
Large Scale Spectral Clustering Using Approximate Commute Time Embedding
Spectral clustering is a novel clustering method which can detect complex
shapes of data clusters. However, it requires the eigendecomposition of the
graph Laplacian matrix, which takes $O(n^3)$ time and thus is not
suitable for large scale systems. Recently, many methods have been proposed to
accelerate spectral clustering. These approximate methods usually involve
sampling techniques, through which a lot of information in the original data
may be lost. In this work, we propose a fast and accurate spectral clustering
approach using an approximate commute time embedding, which is similar to the
spectral embedding. The method does not require any sampling technique or the
computation of any eigenvector at all. Instead it uses random projection and a
linear time solver to find the approximate embedding. Experiments on several
synthetic and real datasets show that the proposed approach has better
clustering quality and is faster than state-of-the-art approximate spectral
clustering methods.
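A minimal sketch of this pipeline, assuming a connected weighted graph given by a sparse adjacency matrix; conjugate gradients stands in for the nearly-linear-time Laplacian solver, and the function name is ours:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import cg

    def approx_commute_embedding(A, k_rp=24):
        # A: sparse symmetric adjacency matrix of a connected graph (n x n)
        n = A.shape[0]
        deg = np.asarray(A.sum(axis=1)).ravel()
        L = sp.diags(deg) - A                        # graph Laplacian
        # signed, weighted edge-vertex incidence matrix (m x n)
        E = sp.triu(A, k=1).tocoo()
        m, sw = E.nnz, np.sqrt(E.data)
        B = sp.csr_matrix(
            (np.concatenate([sw, -sw]),
             (np.tile(np.arange(m), 2), np.concatenate([E.row, E.col]))),
            shape=(m, n))
        # project the m-dimensional edge space down to k_rp dimensions
        Q = np.random.choice([-1.0, 1.0], (k_rp, m)) / np.sqrt(k_rp)
        Y = (B.T @ Q.T).T                            # = Q B, sparse-friendly
        # each right-hand side is orthogonal to the all-ones null space of L,
        # so CG converges on the singular but consistent systems L z = y
        Z = np.vstack([cg(L, y, atol=1e-8)[0] for y in Y])
        return Z.T   # row p: approximate commute-time coordinates of node p

Running k-means on the rows of the returned embedding then plays the role that clustering in the spectral embedding plays in standard spectral clustering.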
Fast Approximate Spectral Clustering for Dynamic Networks
Spectral clustering is a widely studied problem, yet its complexity is
prohibitive for dynamic graphs of even modest size. We claim that it is
possible to reuse information of past cluster assignments to expedite
computation. Our approach builds on a recent idea of sidestepping the main
bottleneck of spectral clustering, i.e., computing the graph eigenvectors, by
using fast Chebyshev graph filtering of random signals. We show that the
proposed algorithm achieves clustering assignments with quality approximating
that of spectral clustering and that it can yield significant complexity
benefits when the graph dynamics are appropriately bounded.
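A sketch of the filtering idea under stated assumptions (Python; it approximates an ideal low-pass filter on the Laplacian spectrum by a Chebyshev expansion and applies it to random signals, which is the general technique referred to above; the paper's exact filter design and its reuse of past assignments are not reproduced):

    import numpy as np

    def cheby_lowpass_features(L, lam_max, lam_cut, n_signals=20, order=50):
        # Approximately apply h(lam) = 1 for lam <= lam_cut (else 0) to
        # random signals, where L is a (sparse) Laplacian with spectrum
        # contained in [0, lam_max] and 0 < lam_cut < lam_max.
        n = L.shape[0]
        X = np.random.randn(n, n_signals) / np.sqrt(n_signals)
        scale = lambda V: (2.0 / lam_max) * (L @ V) - V   # spectrum -> [-1, 1]
        # Chebyshev coefficients of the step function on [-1, 1]
        theta0 = np.arccos(2.0 * lam_cut / lam_max - 1.0)
        ks = np.arange(1, order + 1)
        c0 = 2.0 * (np.pi - theta0) / np.pi
        ck = -2.0 * np.sin(ks * theta0) / (np.pi * ks)
        # three-term recurrence: T_{j+1}(Lt) X = 2 Lt T_j(Lt) X - T_{j-1}(Lt) X
        T_prev, T_cur = X, scale(X)
        Y = 0.5 * c0 * T_prev + ck[0] * T_cur
        for j in range(1, order):
            T_prev, T_cur = T_cur, 2.0 * scale(T_cur) - T_prev
            Y += ck[j] * T_cur
        return Y   # rows serve as approximate spectral features for k-means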
ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly
Matrix completion and approximation are popular tools to capture a user's
preferences for recommendation and to approximate missing data. Instead of
using low-rank factorization we take a drastically different approach, based on
the simple insight that an additive model of co-clusterings allows one to
approximate matrices efficiently. This allows us to build a concise model that,
per bit of model learned, significantly beats all factorization approaches to
matrix approximation. Even more surprisingly, we find that summing over small
co-clusterings is more effective in modeling matrices than classic
co-clustering, which uses just one large partitioning of the matrix.
Occam's razor suggests that the simple structure induced by our model
captures the latent preferences and decision-making processes present in the
real world better than classic co-clustering or matrix factorization does. We
provide an iterative minimization algorithm, a collapsed
Gibbs sampler, theoretical guarantees for matrix approximation, and excellent
empirical evidence for the efficacy of our approach. We achieve
state-of-the-art results on the Netflix problem with a fraction of the model
complexity. Comment: 22 pages, under review for conference publication
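A toy backfitting sketch of the additive co-clustering idea (Python, our illustration with hypothetical names; the paper itself fits a Bayesian model with the collapsed Gibbs sampler mentioned above):

    import numpy as np

    def block_means(R, r, c, k):
        # mean of residual R over each (row-cluster, column-cluster) block
        T = np.zeros((k, k))
        for a in range(k):
            for b in range(k):
                blk = R[np.ix_(r == a, c == b)]
                T[a, b] = blk.mean() if blk.size else 0.0
        return T

    def additive_cocluster(M, n_stencils=3, k=8, n_iter=10, seed=0):
        # Fit a sum of simple co-clusterings ("stencils"): each stencil
        # assigns every row and column to one of k clusters and stores one
        # value per block; each stencil is fit on the residual of the others.
        rng = np.random.default_rng(seed)
        n, m = M.shape
        R, stencils = M.astype(float).copy(), []
        for _ in range(n_stencils):
            r = rng.integers(0, k, n)            # row cluster assignments
            c = rng.integers(0, k, m)            # column cluster assignments
            for _ in range(n_iter):
                T = block_means(R, r, c, k)
                # move each row/column to its squared-error-minimizing cluster
                r = ((R[:, None, :] - T[:, c][None]) ** 2).sum(2).argmin(1)
                c = ((R.T[:, None, :] - T.T[:, r][None]) ** 2).sum(2).argmin(1)
            T = block_means(R, r, c, k)
            R -= T[np.ix_(r, c)]                 # subtract this stencil's blocks
            stencils.append((r, c, T))
        return stencils                          # M ~ sum over stencils of T[r, c]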