Detecting communities is Hard (And Counting Them is Even Harder)
We consider the algorithmic problem of community detection in networks. Given an undirected friendship graph G, a subset
S of vertices is an (a,b)-community if:
* every member of the community is friends with at least an a-fraction of the community; and
* every non-member is friends with at most a b-fraction of the community.
[Arora, Ge, Sachdeva, Schoenebeck 2012] gave a quasi-polynomial
time algorithm for enumerating all the (a,b)-communities
for any constants a>b.
Here, we prove that, assuming the Exponential Time Hypothesis (ETH),
quasi-polynomial time is in fact necessary - and even for a much weaker
approximation desideratum. Namely, distinguishing between:
* G contains a (1,o(1))-community; and
* G does not contain a (b,b+o(1))-community
for any b.
We also prove that counting the number of (1,o(1))-communities
requires quasi-polynomial time, assuming the weaker counting analogue #ETH.
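The definition above translates directly into a membership check. A minimal sketch (a hypothetical helper, not from the paper; note that verifying a given set S is the easy direction, while enumerating or even detecting communities is what the hardness result concerns) is shown below, taking the member fraction over the other |S|-1 members, which is one reasonable convention:

```python
def is_ab_community(adj, S, a, b):
    """Return True if S is an (a,b)-community in the graph given as an
    adjacency-set dict.  Fractions are relative to the community size;
    members are compared against the other |S|-1 members (a convention
    choice -- the paper's exact convention is not spelled out here)."""
    S = set(S)
    for v in S:                         # members: friends with >= a-fraction of S
        if len(adj[v] & S) < a * (len(S) - 1):
            return False
    for v in set(adj) - S:              # non-members: at most a b-fraction of S
        if len(adj[v] & S) > b * len(S):
            return False
    return True

# A triangle {0, 1, 2} with an extra vertex 3 attached to vertex 0:
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(is_ab_community(adj, {0, 1, 2}, a=1.0, b=0.5))   # the triangle qualifies
```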
Is It Easier to Count Communities Than Find Them?
Random graph models with community structure have been studied extensively in the literature. For both the problems of detecting and recovering community structure, an interesting landscape of statistical and computational phase transitions has emerged. A natural unanswered question is: might it be possible to infer properties of the community structure (for instance, the number and sizes of communities) even in situations where actually finding those communities is believed to be computationally hard? We show the answer is no. In particular, we consider certain hypothesis testing problems between models with different community structures, and we show (in the low-degree polynomial framework) that testing between two options is as hard as finding the communities.
In addition, our methods give the first computational lower bounds for testing between two different "planted" distributions, whereas previous results have considered testing between a planted distribution and an i.i.d. "null" distribution.
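As an illustrative toy (not the paper's construction), the low-degree polynomial viewpoint can be seen with the triangle count, a degree-3 polynomial in the edge-indicator variables: it can separate a two-community block model from a one-community model tuned to the same edge density, where a degree-1 statistic (the edge count) cannot. All parameters below are illustrative choices:

```python
import itertools, random

def sample_sbm(n, k, p_in, p_out, rng):
    """Adjacency sets for an n-vertex block model with k equal communities."""
    block = [v * k // n for v in range(n)]
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        p = p_in if block[u] == block[v] else p_out
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def edges(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2

def triangles(adj):
    """Triangle count: a degree-3 polynomial in the edge indicators."""
    return sum(1 for u, v, w in itertools.combinations(sorted(adj), 3)
               if v in adj[u] and w in adj[u] and w in adj[v])

rng = random.Random(0)
planted = sample_sbm(60, 2, 0.6, 0.02, rng)   # two communities of 30
null    = sample_sbm(60, 1, 0.305, 0.0, rng)  # one block, matched edge density
# edges(planted) and edges(null) are close in expectation (~540 each),
# while triangles(planted) is markedly larger: the degree-3 statistic
# separates models that degree-1 statistics cannot.
```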
Detecting Communities under Differential Privacy
Complex networks usually exhibit community structure: groups of nodes that
share many links with other nodes in the same group and relatively few with
the rest of the network. This feature captures valuable information about
the organization and even the evolution of the network. Over the last decade, a
great number of community detection algorithms have been proposed to deal
with increasingly complex networks. However, the problem of doing so in a
private manner is rarely considered. In this paper, we solve this problem under
differential privacy, a prominent privacy concept for releasing private data.
We analyze the major challenges behind the problem and propose several schemes
to tackle them from two perspectives: input perturbation and algorithm
perturbation. We choose the Louvain method as the back-end community detector
for the input perturbation schemes and propose LouvainDP, which runs the Louvain
algorithm on a noisy super-graph. For algorithm perturbation, we design
ModDivisive, which uses the exponential mechanism with modularity as the score
function. We have thoroughly evaluated our techniques on real graphs of
different sizes and verified that they outperform the state of the art.
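A minimal sketch of the two ingredients named above, modularity as a quality score and the exponential mechanism, assuming a fixed list of candidate partitions and a given sensitivity bound (the paper's ModDivisive builds its candidates divisively over a tree, which is not reproduced here; the graph and parameters are illustrative):

```python
import math, random

def modularity(adj, parts):
    """Newman modularity of a partition `parts` of the graph `adj`."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for part in parts:
        s = set(part)
        e_in = sum(1 for v in s for u in adj[v] if u in s) / 2
        deg = sum(len(adj[v]) for v in s)
        q += e_in / m - (deg / (2 * m)) ** 2
    return q

def exp_mechanism(adj, candidates, eps, sensitivity, rng):
    """Exponential mechanism: pick a candidate partition with probability
    proportional to exp(eps * modularity / (2 * sensitivity))."""
    weights = [math.exp(eps * modularity(adj, c) / (2 * sensitivity))
               for c in candidates]
    r = rng.random() * sum(weights)
    for c, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return c
    return candidates[-1]

# Two triangles joined by a bridge edge (2-3):
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
candidates = [[[0, 1, 2], [3, 4, 5]],      # the "natural" split
              [[0, 1, 2, 3, 4, 5]]]        # everything in one community
rng = random.Random(1)
chosen = exp_mechanism(adj, candidates, eps=50.0, sensitivity=1.0, rng=rng)
```

With a large privacy budget eps the mechanism concentrates on the higher-modularity partition; as eps shrinks, the choice becomes closer to uniform, which is the privacy/utility trade-off.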
On Efficiently Detecting Overlapping Communities over Distributed Dynamic Graphs
Modern networks are huge and highly dynamic, which challenges the efficiency
of community detection algorithms. In this paper, we study the
problem of overlapping community detection on distributed and dynamic graphs.
Given a distributed, undirected and unweighted graph, the goal is to detect
overlapping communities incrementally as the graph is dynamically changing. We
propose an efficient algorithm, called \textit{randomized Speaker-Listener
Label Propagation Algorithm} (rSLPA), based on the \textit{Speaker-Listener
Label Propagation Algorithm} (SLPA) by relaxing the probability distribution of
label propagation. Besides detecting high-quality communities, rSLPA can
incrementally update the detected communities after a batch of edge insertion
and deletion operations. To the best of our knowledge, rSLPA is the first
algorithm that can incrementally capture the same communities as those obtained
by applying the detection algorithm from scratch on the updated graph.
Extensive experiments are conducted on both synthetic and real-world datasets,
and the results show that our algorithm can achieve high accuracy and
efficiency at the same time.
Comment: A short version of this paper will be published as an ICDE'2018 poster.
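A minimal static SLPA-style sketch (the randomized relaxation and the incremental batch updates that distinguish rSLPA are not reproduced here; parameters are illustrative): each node keeps a memory of labels, every listener collects one label "spoken" by each neighbour, and labels retained above a frequency threshold define the overlapping communities.

```python
import random
from collections import Counter

def slpa(adj, iterations, threshold, rng):
    """Speaker-Listener Label Propagation sketch returning a list of
    (possibly overlapping) communities as sets of nodes."""
    memory = {v: [v] for v in adj}      # each node starts with its own label
    nodes = list(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for listener in nodes:
            if not adj[listener]:
                continue
            # each neighbour "speaks" one label sampled from its memory;
            # the listener keeps the most frequently heard label
            heard = [rng.choice(memory[s]) for s in adj[listener]]
            memory[listener].append(Counter(heard).most_common(1)[0][0])
    comms = {}
    for v, mem in memory.items():
        for label, count in Counter(mem).items():
            if count / len(mem) >= threshold:   # post-processing threshold
                comms.setdefault(label, set()).add(v)
    return list(comms.values())

# Two triangles joined by a bridge edge (2-3):
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
comms = slpa(adj, iterations=20, threshold=0.3, rng=random.Random(42))
```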
Discovering Communities of Community Discovery
Discovering communities in complex networks means grouping nodes that are
similar to each other, in order to uncover latent information about them. There are hundreds of
different algorithms to solve the community detection task, each with its own
understanding and definition of what a "community" is. Dozens of review works
attempt to order such a diverse landscape -- classifying community discovery
algorithms by the process they employ to detect communities, by their
explicitly stated definition of community, or by their performance on a
standardized task. In this paper, we classify community discovery algorithms
according to a fourth criterion: the similarity of their results. We create an
Algorithm Similarity Network (ASN), whose nodes are the community detection
approaches, connected if they return similar groupings. We then perform
community detection on this network, grouping algorithms that consistently
return the same partitions or overlapping coverage over a span of more than one
thousand synthetic and real world networks. This paper is an attempt to create
a similarity-based classification of community detection algorithms based on
empirical data. It improves over the state of the art by comparing more than
seventy approaches and discovering that the ASN contains well-separated groups,
which makes it a sensible tool for practitioners, aiding their choice of
algorithms that fit their analytic needs.
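The ASN construction can be sketched with a pair-counting similarity. The sketch below uses the Rand index on a single network for simplicity; the paper's actual similarity measure may differ and is aggregated over a thousand-plus networks. Algorithm names and outputs are hypothetical:

```python
import itertools

def rand_index(p1, p2, nodes):
    """Fraction of node pairs on which two partitions agree
    (placed together in both, or separated in both)."""
    lab1 = {v: i for i, c in enumerate(p1) for v in c}
    lab2 = {v: i for i, c in enumerate(p2) for v in c}
    agree = total = 0
    for u, v in itertools.combinations(nodes, 2):
        total += 1
        agree += (lab1[u] == lab1[v]) == (lab2[u] == lab2[v])
    return agree / total

def similarity_network(results, nodes, tau):
    """ASN sketch: connect two algorithms if their groupings are similar."""
    edges = set()
    for a, b in itertools.combinations(list(results), 2):
        if rand_index(results[a], results[b], nodes) >= tau:
            edges.add((a, b))
    return edges

nodes = range(4)
results = {"alg_A": [[0, 1], [2, 3]],     # hypothetical algorithm outputs
           "alg_B": [[0, 1], [2, 3]],
           "alg_C": [[0, 2], [1, 3]]}
asn_edges = similarity_network(results, nodes, tau=0.9)
```

Community detection run on the resulting algorithm-level network then groups methods that consistently return the same partitions.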
Phase Transitions of the Typical Algorithmic Complexity of the Random Satisfiability Problem Studied with Linear Programming
Here we study the NP-complete K-SAT problem. Although the worst-case
complexity of NP-complete problems is conjectured to be exponential, there
exist parametrized random ensembles of problems where solutions can typically
be found in polynomial time for suitable ranges of the parameter. In fact,
random K-SAT, with the ratio of clauses to variables as control parameter,
can be solved quickly for small enough values of this ratio. It shows a phase
transition between a satisfiable phase and an unsatisfiable phase. For branch and bound algorithms,
which operate in the space of feasible Boolean configurations, the empirically
hardest problems are located only close to this phase transition. Here we study
K-SAT and the related optimization problem MAX-SAT by a linear
programming approach, which is widely used for practical problems and allows
for polynomial run time. In contrast to branch and bound it operates outside
the space of feasible configurations. On the other hand, finding a solution
within polynomial time is not guaranteed. We investigated several variants,
such as including artificial objective functions, so-called cutting-plane
approaches, and a mapping to the NP-complete vertex-cover problem. We observed
several easy-hard transitions, from regions where the problems are typically
solvable in polynomial time by the given algorithms to regions where they are
not solvable in polynomial time. For the related vertex-cover problem on random
graphs these easy-hard transitions can be identified with structural properties
of the graphs, like percolation transitions. For the present random K-SAT
problem we have investigated numerous structural properties that also exhibit
clear transitions, but these appear not to be correlated with the easy-hard
transitions observed here. This renders the behaviour of random K-SAT more
complex than that of, e.g., the vertex-cover problem.
Comment: 11 pages, 5 figures
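The linear programming approach above is, in its generic form, the standard LP relaxation of satisfiability. A textbook sketch under the usual conventions (not necessarily the paper's exact formulation): every Boolean variable is relaxed to a real x_i in [0,1], and every clause becomes a linear constraint.

```latex
% Clause c with positive literals P_c and negated literals N_c:
\sum_{i \in P_c} x_i \;+\; \sum_{j \in N_c} (1 - x_j) \;\ge\; 1,
\qquad 0 \le x_i \le 1 \quad \text{for all } i .
```

Note that whenever every clause contains at least two literals, the point x_i = 1/2 for all i satisfies all of these constraints, so the bare relaxation is always feasible and uninformative on its own; this is one reason artificial objective functions and cutting planes, as investigated above, are needed before the LP can discriminate satisfiable from unsatisfiable instances.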
Automatic Detection of Online Jihadist Hate Speech
We have developed a system that automatically detects online jihadist hate
speech with over 80% accuracy, by using techniques from Natural Language
Processing and Machine Learning. The system is trained on a corpus of 45,000
subversive Twitter messages collected from October 2014 to December 2016. We
present a qualitative and quantitative analysis of the jihadist rhetoric in the
corpus, examine the network of Twitter users, outline the technical procedure
used to train the system, and discuss examples of use.
Comment: 31 pages
Bridges of the BeltLine
As currently realized, the Atlanta BeltLine weaves under, over, and through a multitude of overpasses, footbridges, and tunnels. As in any city, this significant feature is simultaneously an asset and a potential hazard. These types of structures are "vulnerable critical facilities" that should be included in emergency risk assessments and mitigation planning (FEMA, 2013). As such, the Bridges of the BeltLine project was proposed as a mixed-methods study to understand how people's movement along the BeltLine can inform emergency management mitigation, planning, and response. Understanding pedestrian flow in cities has been underfunded and understudied but is nonetheless critical to city infrastructure monitoring and improvement projects. This study focused on developing inexpensive, low-power consumption sensors capable of detecting human presence while preserving privacy, as well as a survey designed to collect data that the sensors cannot. The survey data were intended to describe BeltLine users, querying on demographics, reasons, frequency, duration of use, and mode of travel to and on the BeltLine. After conferring with the Atlanta BeltLine, Inc. (ABI) leadership, it became apparent that ABI's primary interest is in understanding which communities are being served by the BeltLine and whether it has changed commuting and travel behaviors or created new demand. As a result, the project's original focus on emergency management was expanded to explore which communities are being served and for what kind of use. 
As such, the project's revised objective was two-fold: (a) to understand whether the BeltLine is serving the adjacent communities and for what purposes it is used, and (b) to inform emergency mitigation, planning, and response. This research was made possible by a grant from Georgia Tech's Executive Vice President of Research, Small Bets Seed Grants program, with supplemental funding from the Center for the Development and Application of Internet of Things Technologies (CDAIT).