8,667 research outputs found
Phase Transitions in Semidefinite Relaxations
Statistical inference problems arising within signal processing, data mining,
and machine learning naturally give rise to hard combinatorial optimization
problems. These problems become intractable when the dimensionality of the data
is large, as is often the case for modern datasets. A popular idea is to
construct convex relaxations of these combinatorial problems, which can be
solved efficiently for large scale datasets.
Semidefinite programming (SDP) relaxations are among the most powerful
methods in this family, and are surprisingly well-suited for a broad range of
problems where data take the form of matrices or graphs. It has been observed
several times that, when the `statistical noise' is small enough, SDP
relaxations correctly detect the underlying combinatorial structures.
In this paper we develop asymptotic predictions for several `detection
thresholds,' as well as for the estimation error above these thresholds. We
study some classical SDP relaxations for statistical problems motivated by
graph synchronization and community detection in networks. We map these
optimization problems to statistical mechanics models with vector spins, and
use non-rigorous techniques from statistical mechanics to characterize the
corresponding phase transitions. Our results clarify the effectiveness of SDP
relaxations in solving high-dimensional statistical problems.Comment: 71 pages, 24 pdf figure
Detectability thresholds and optimal algorithms for community structure in dynamic networks
We study the fundamental limits on learning latent community structure in
dynamic networks. Specifically, we study dynamic stochastic block models where
nodes change their community membership over time, but where edges are
generated independently at each time step. In this setting (which is a special
case of several existing models), we are able to derive the detectability
threshold exactly, as a function of the rate of change and the strength of the
communities. Below this threshold, we claim that no algorithm can identify the
communities better than chance. We then give two algorithms that are optimal in
the sense that they succeed all the way down to this limit. The first uses
belief propagation (BP), which gives asymptotically optimal accuracy, and the
second is a fast spectral clustering algorithm, based on linearizing the BP
equations. We verify our analytic and algorithmic results via numerical
simulation, and close with a brief discussion of extensions and open questions.Comment: 9 pages, 3 figure
Uncovering nodes that spread information between communities in social networks
From many datasets gathered in online social networks, well defined community
structures have been observed. A large number of users participate in these
networks and the size of the resulting graphs poses computational challenges.
There is a particular demand in identifying the nodes responsible for
information flow between communities; for example, in temporal Twitter networks
edges between communities play a key role in propagating spikes of activity
when the connectivity between communities is sparse and few edges exist between
different clusters of nodes. The new algorithm proposed here is aimed at
revealing these key connections by measuring a node's vicinity to nodes of
another community. We look at the nodes which have edges in more than one
community and the locality of nodes around them which influence the information
received and broadcasted to them. The method relies on independent random walks
of a chosen fixed number of steps, originating from nodes with edges in more
than one community. For the large networks that we have in mind, existing
measures such as betweenness centrality are difficult to compute, even with
recent methods that approximate the large number of operations required. We
therefore design an algorithm that scales up to the demand of current big data
requirements and has the ability to harness parallel processing capabilities.
The new algorithm is illustrated on synthetic data, where results can be judged
carefully, and also on a real, large scale Twitter activity data, where new
insights can be gained
A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks
This paper presents a novel spectral algorithm with additive clustering
designed to identify overlapping communities in networks. The algorithm is
based on geometric properties of the spectrum of the expected adjacency matrix
in a random graph model that we call stochastic blockmodel with overlap (SBMO).
An adaptive version of the algorithm, that does not require the knowledge of
the number of hidden communities, is proved to be consistent under the SBMO
when the degrees in the graph are (slightly more than) logarithmic. The
algorithm is shown to perform well on simulated data and on real-world graphs
with known overlapping communities.Comment: Journal of Theoretical Computer Science (TCS), Elsevier, A Para\^itr
- …