Clustering and Community Detection with Imbalanced Clusters
Spectral clustering methods, which are frequently used in clustering and
community detection applications, are sensitive to the specific graph
construction, particularly when imbalanced clusters are present. We show that
ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced cluster sizes since they tend to emphasize cut sizes over cut
values. We propose a graph partitioning problem that seeks minimum cut
partitions under minimum size constraints on partitions to deal with imbalanced
cluster sizes. Our approach parameterizes a family of graphs by adaptively
modulating node degrees on a fixed node set, yielding a set of
parameter-dependent cuts reflecting varying levels of imbalance. The solution to our
problem is then obtained by optimizing over these parameters. We present
rigorous limit cut analysis results to justify our approach and demonstrate the
superiority of our method through experiments on synthetic and real datasets
for data clustering, semi-supervised learning, and community detection.

Comment: Extended version of arXiv:1309.2303 with new applications. Accepted to IEEE TSIP.
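The RCut and NCut objectives mentioned above can be made concrete with a short sketch. This is not the authors' constrained partitioning method, only the two standard objectives evaluated on a weighted adjacency matrix; the per-cluster denominators (|A| for RCut, vol(A) for NCut) show where small clusters get penalized.

```python
import numpy as np

def cut_value(W, mask):
    """Total weight of edges crossing the partition (mask marks cluster A)."""
    return W[np.ix_(mask, ~mask)].sum()

def ratio_cut(W, mask):
    """RCut(A, B) = cut(A, B)/|A| + cut(A, B)/|B|."""
    c = cut_value(W, mask)
    return c / mask.sum() + c / (~mask).sum()

def normalized_cut(W, mask):
    """NCut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B)."""
    c = cut_value(W, mask)
    d = W.sum(axis=1)  # node degrees
    return c / d[mask].sum() + c / d[~mask].sum()

# Two internally connected pairs joined by one weak edge.
W = np.array([[0, 1, 0.0, 0],
              [1, 0, 0.5, 0],
              [0, 0.5, 0, 1],
              [0, 0, 1, 0.0]])
mask = np.array([True, True, False, False])
print(ratio_cut(W, mask), normalized_cut(W, mask))  # prints: 0.5 0.4
```

Because both denominators shrink as a cluster shrinks, a tiny true cluster inflates the objective even when the cut weight itself is small, which is exactly the imbalance sensitivity the abstract describes.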
Deep Divergence-Based Approach to Clustering
A promising direction in deep learning research consists in learning
representations and simultaneously discovering cluster structure in unlabeled
data by optimizing a discriminative loss function. As opposed to supervised
deep learning, this line of research is in its infancy, and how to design and
optimize suitable loss functions to train deep neural networks for clustering
is still an open question. Our contribution to this emerging field is a new
deep clustering network that leverages the discriminative power of
information-theoretic divergence measures, which have been shown to be
effective in traditional clustering. We propose a novel loss function that
incorporates geometric regularization constraints, thus avoiding degenerate
structures of the resulting clustering partition. Experiments on synthetic
benchmarks and real datasets show that the proposed network achieves
competitive performance with respect to other state-of-the-art methods, scales
well to large datasets, and does not require pre-training steps.
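As a rough illustration of a divergence-based clustering loss (a sketch loosely inspired by the Cauchy–Schwarz divergence; this is not the paper's exact architecture, loss, or regularization), one can minimize the kernel-space cosine similarity between soft cluster-assignment vectors, which pushes cluster pairs apart:

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Pairwise Gaussian kernel matrix over the rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def cs_similarity_loss(A, K):
    """Sum over cluster pairs of the kernel-space cosine similarity between
    soft assignment vectors (columns of A). Minimizing this similarity
    maximizes the Cauchy-Schwarz divergence between cluster pairs."""
    k = A.shape[1]
    loss = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            num = A[:, i] @ K @ A[:, j]
            den = np.sqrt((A[:, i] @ K @ A[:, i]) * (A[:, j] @ K @ A[:, j]))
            loss += num / (den + 1e-12)
    return loss
```

On two well-separated blobs, a correct one-hot assignment yields a near-zero loss (cross-cluster kernel entries vanish), while an assignment that mixes the blobs scores much higher; in a deep clustering network a loss of this shape would be backpropagated through the assignment layer.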
On interference among moving sensors and related problems
We show that for any set of points moving along "simple" trajectories
(i.e., each coordinate is described with a polynomial of bounded degree) in
and any parameter , one can select a fixed non-empty
subset of the points of size , such that the Voronoi diagram of
this subset is "balanced" at any given time (i.e., it contains points
per cell). We also show that the bound is near optimal even for
the one dimensional case in which points move linearly in time. As
applications, we show that one can assign communication radii to the sensors of
a network of moving sensors so that at any given time their interference is
. We also show some results in kinetic approximate range
counting and kinetic discrepancy. In order to obtain these results, we extend
well-known results from ε-net theory to kinetic environments.
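The notion of a Voronoi diagram that stays "balanced" over time can be illustrated in the one-dimensional, linear-motion case. The subset-selection guarantee is the paper's contribution and is not reproduced here; the sketch below merely measures, for a given subset and sampled times, how many points land in each subset point's Voronoi cell (helper names are made up for the illustration):

```python
import numpy as np

def voronoi_cell_counts(positions, subset_idx):
    """Assign every point to its nearest subset point (its 1-D Voronoi cell)
    and count the points per cell."""
    dist = np.abs(positions[:, None] - positions[subset_idx][None, :])
    nearest = dist.argmin(axis=1)
    return np.bincount(nearest, minlength=len(subset_idx))

def max_cell_load(x0, v, subset_idx, times):
    """Worst cell load over the sampled times, for points moving as x0 + t*v."""
    return max(voronoi_cell_counts(x0 + t * v, subset_idx).max() for t in times)

# Eight evenly spaced static points; the subset {1, 5} splits them 4 / 4.
x0 = np.arange(8.0)
v = np.zeros(8)
print(voronoi_cell_counts(x0, [1, 5]))  # prints: [4 4]
```

A balanced subset in the paper's sense keeps this per-cell count bounded at *every* time, not just at sampled ones, even as the motion reshuffles which points fall in which cell.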
Evaluating Stability in Massive Social Networks: Efficient Streaming Algorithms for Structural Balance
Structural balance theory studies stability in networks. Given an n-vertex
complete graph whose edges are labeled positive or negative, the
graph is considered \emph{balanced} if every triangle either consists of three
positive edges (three mutual ``friends''), or one positive edge and two
negative edges (two ``friends'' with a common ``enemy''). From a computational
perspective, structural balance turns out to be a special case of correlation
clustering with the number of clusters at most two. The two main algorithmic
problems of interest are: detecting whether a given graph is balanced, or
finding a partition that approximates the \emph{frustration index},
i.e., the minimum number of edge flips that turn the graph balanced.
We study these problems in the streaming model where edges are given one by
one and focus on \emph{memory efficiency}. We provide randomized single-pass
algorithms for: determining whether an input graph is balanced with
memory, and finding a partition that induces a -approximation to the frustration index with memory. We further provide several new lower bounds,
complementing different aspects of our algorithms such as the need for
randomization or approximation.
To obtain our main results, we develop a method using pseudorandom generators
(PRGs) to sample edges between independently-chosen \emph{vertices} in graph
streaming. Furthermore, our algorithm that approximates the frustration index
improves the running time of the state-of-the-art correlation clustering with
two clusters (Giotis-Guruswami algorithm [SODA 2006]) from
to time for
-approximation. These results may be of independent interest.
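The balance-detection problem has a simple offline baseline: a signed graph is balanced iff its vertices split into two camps with every positive edge inside a camp and every negative edge across camps. A union-find structure with parity bits checks this in near-linear time. This is the classical offline check, not the paper's memory-efficient streaming algorithm:

```python
class ParityDSU:
    """Union-find where each node stores its parity (same/opposite camp)
    relative to its parent; used to 2-color a signed graph."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.parity = [0] * n

    def find(self, v):
        """Return (root, parity of v relative to root), with path compression."""
        if self.parent[v] == v:
            return v, 0
        root, p = self.find(self.parent[v])
        self.parent[v] = root
        self.parity[v] ^= p  # re-express parity relative to the root
        return root, self.parity[v]

    def union(self, u, v, rel):
        """Merge u and v with required relative parity rel (0 = same camp,
        1 = opposite camps). Returns False on a contradiction."""
        ru, pu = self.find(u)
        rv, pv = self.find(v)
        if ru == rv:
            return (pu ^ pv) == rel
        self.parent[ru] = rv
        self.parity[ru] = pu ^ pv ^ rel
        return True

def is_balanced(n, signed_edges):
    """signed_edges: iterable of (u, v, sign), sign +1 (friend) / -1 (enemy)."""
    dsu = ParityDSU(n)
    return all(dsu.union(u, v, 0 if s > 0 else 1) for u, v, s in signed_edges)

# Three friends: balanced. Two friends with a shared friend but mutual enmity: not.
print(is_balanced(3, [(0, 1, 1), (0, 2, -1), (1, 2, -1)]))  # prints: True
print(is_balanced(3, [(0, 1, 1), (0, 2, 1), (1, 2, -1)]))   # prints: False
```

The streaming setting studied in the abstract is harder precisely because this check, as written, stores state for all n vertices and revisits edges implicitly through path compression, whereas the paper targets sublinear memory over a single pass.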