18,589 research outputs found
Approximation Algorithms for Bregman Co-clustering and Tensor Clustering
In the past few years powerful generalizations to the Euclidean k-means
problem have been made, such as Bregman clustering [7], co-clustering (i.e.,
simultaneous clustering of rows and columns of an input matrix) [9,18], and
tensor clustering [8,34]. Like k-means, these more general problems also suffer
from the NP-hardness of the associated optimization. Researchers have developed
approximation algorithms of varying degrees of sophistication for k-means,
k-medians, and more recently also for Bregman clustering [2]. However, there
seem to be no approximation algorithms for Bregman co- and tensor clustering.
In this paper we derive the first (to our knowledge) guaranteed methods for
these increasingly important clustering settings. Going beyond Bregman
divergences, we also prove an approximation factor for tensor clustering with
arbitrary separable metrics. Through extensive experiments we evaluate the
characteristics of our method, and show that it also has practical impact.Comment: 18 pages; improved metric cas
Clustering Boolean Tensors
Tensor factorizations are computationally hard problems, and in particular,
are often significantly harder than their matrix counterparts. In case of
Boolean tensor factorizations -- where the input tensor and all the factors are
required to be binary and we use Boolean algebra -- much of that hardness comes
from the possibility of overlapping components. Yet, in many applications we
are perfectly happy to partition at least one of the modes. In this paper we
investigate what consequences does this partitioning have on the computational
complexity of the Boolean tensor factorizations and present a new algorithm for
the resulting clustering problem. This algorithm can alternatively be seen as a
particularly regularized clustering algorithm that can handle extremely
high-dimensional observations. We analyse our algorithms with the goal of
maximizing the similarity and argue that this is more meaningful than
minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient
0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm
for Boolean tensor clustering achieves high scalability, high similarity, and
good generalization to unseen data with both synthetic and real-world data
sets
Clustering {Boolean} Tensors
Tensor factorizations are computationally hard problems, and in particular, are often significantly harder than their matrix counterparts. In case of Boolean tensor factorizations -- where the input tensor and all the factors are required to be binary and we use Boolean algebra -- much of that hardness comes from the possibility of overlapping components. Yet, in many applications we are perfectly happy to partition at least one of the modes. In this paper we investigate what consequences does this partitioning have on the computational complexity of the Boolean tensor factorizations and present a new algorithm for the resulting clustering problem. This algorithm can alternatively be seen as a particularly regularized clustering algorithm that can handle extremely high-dimensional observations. We analyse our algorithms with the goal of maximizing the similarity and argue that this is more meaningful than minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient 0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm for Boolean tensor clustering achieves high scalability, high similarity, and good generalization to unseen data with both synthetic and real-world data sets
- …