
    Detecting communities is Hard (And Counting Them is Even Harder)

    We consider the algorithmic problem of community detection in networks. Given an undirected friendship graph G, a subset S of vertices is an (a,b)-community if every member of the community is friends with at least an a-fraction of the community, and every non-member is friends with at most a b-fraction of the community. [Arora, Ge, Sachdeva, Schoenebeck 2012] gave a quasi-polynomial time algorithm for enumerating all the (a,b)-communities for any constants a > b. Here, we prove that, assuming the Exponential Time Hypothesis (ETH), quasi-polynomial time is in fact necessary, even for a much weaker approximation desideratum: namely, distinguishing between the case where G contains a (1,o(1))-community and the case where G does not contain a (b,b+o(1))-community for any b. We also prove that counting the number of (1,o(1))-communities requires quasi-polynomial time assuming the weaker #ETH.
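
The (a,b)-community definition above translates directly into a membership check. The following is a minimal Python sketch under stated assumptions, not code from the paper: the graph is a dict of adjacency sets with no self-loops, and a member's fraction is measured against the other |S| - 1 members so that a clique satisfies a = 1 (the abstract does not fix this normalization). The function name is_ab_community is hypothetical.

```python
from typing import Dict, Hashable, Set

# Undirected graph as adjacency sets; assumed to contain no self-loops.
Graph = Dict[Hashable, Set[Hashable]]


def is_ab_community(graph: Graph, community: Set[Hashable], a: float, b: float) -> bool:
    """Check the (a,b)-community conditions for `community` in `graph`.

    Assumption: a member's fraction is taken over the *other* members
    (|S| - 1), so a clique counts as a (1, b)-community. Sets with fewer
    than two vertices are rejected because that fraction is undefined.
    """
    size = len(community)
    if size < 2:
        return False
    # Every member must be adjacent to at least an a-fraction of the other members.
    for u in community:
        if len(graph[u] & community) < a * (size - 1):
            return False
    # Every non-member may be adjacent to at most a b-fraction of the community.
    for v in graph.keys() - community:
        if len(graph[v] & community) > b * size:
            return False
    return True
```

For example, with a 4-clique {0, 1, 2, 3} and an outside vertex 4 adjacent only to vertex 0 (and to a further vertex 5), the clique is a (1, 0.25)-community but not a (1, 0.2)-community, since vertex 4 is adjacent to 1/4 of it.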

    Is It Easier to Count Communities Than Find Them?

    Random graph models with community structure have been studied extensively in the literature. For both the problems of detecting and recovering community structure, an interesting landscape of statistical and computational phase transitions has emerged. A natural unanswered question is: might it be possible to infer properties of the community structure (for instance, the number and sizes of communities) even in situations where actually finding those communities is believed to be computationally hard? We show the answer is no. In particular, we consider certain hypothesis testing problems between models with different community structures, and we show (in the low-degree polynomial framework) that testing between two options is as hard as finding the communities. In addition, our methods give the first computational lower bounds for testing between two different "planted" distributions, whereas previous results have considered testing between a planted distribution and an i.i.d. "null" distribution.
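
To make "random graph models with community structure" concrete, here is a minimal sampler for the planted partition model (a symmetric stochastic block model), a standard example of such a model; the abstract does not say which models the paper uses, and the function and parameter names are hypothetical. Each vertex receives a uniformly random label in range(k), and each pair is joined independently with probability p_in inside a community and p_out across communities.

```python
import random
from typing import List, Set, Tuple


def planted_partition(
    n: int, k: int, p_in: float, p_out: float, seed: int = 0
) -> Tuple[List[int], List[Set[int]]]:
    """Sample (labels, adjacency sets) from a k-community planted partition model."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]  # hidden community of each vertex
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:  # independent edge for every vertex pair
                adj[u].add(v)
                adj[v].add(u)
    return labels, adj
```

A testing problem of the kind described would, for instance, ask whether a graph was drawn with k = 2 or k = 3 planted communities; that particular pairing is only an illustration.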

    Detecting Communities under Differential Privacy

    Complex networks usually exhibit community structure: groups of nodes that share many links with other nodes in the same group and relatively few with nodes in the rest of the network. This feature captures valuable information about the organization and even the evolution of the network. Over the last decade, a great number of community detection algorithms have been proposed to deal with increasingly complex networks. However, the problem of doing this in a private manner has rarely been considered. In this paper, we solve this problem under differential privacy, a prominent privacy concept for releasing private data. We analyze the major challenges behind the problem and propose several schemes to tackle them from two perspectives: input perturbation and algorithm perturbation. For the input perturbation schemes, we choose the Louvain method as the back-end community detection algorithm and propose LouvainDP, which runs the Louvain algorithm on a noisy super-graph. For algorithm perturbation, we design ModDivisive, which uses the exponential mechanism with modularity as the score. We have thoroughly evaluated our techniques on real graphs of different sizes and verified that they outperform the state of the art.
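
The algorithm-perturbation idea (the exponential mechanism with modularity as the score) can be sketched as follows. This is not ModDivisive itself, whose divisive search and sensitivity analysis the abstract does not describe: the sketch computes Newman modularity Q = sum over communities c of [L_c/m - (d_c/2m)^2] for a labeled partition (L_c internal edges and total degree d_c of community c, m edges in total), then samples one of several candidate partitions with probability proportional to exp(epsilon * Q / (2 * sensitivity)). The sensitivity is an assumed, caller-supplied bound, and all names are hypothetical.

```python
import math
import random
from typing import Dict, Sequence, Set, TypeVar

T = TypeVar("T")


def modularity(adj: Dict[int, Set[int]], labels: Dict[int, int]) -> float:
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] of an undirected graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    if m == 0:
        return 0.0
    internal: Dict[int, float] = {}  # L_c: edges with both ends in community c
    degree: Dict[int, float] = {}    # d_c: total degree of community c
    for u, nbrs in adj.items():
        c = labels[u]
        degree[c] = degree.get(c, 0) + len(nbrs)
        # Each internal edge is seen from both endpoints, hence the / 2.
        internal[c] = internal.get(c, 0) + sum(1 for v in nbrs if labels[v] == c) / 2
    return sum(internal[c] / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def exponential_mechanism(
    candidates: Sequence[T],
    scores: Sequence[float],
    epsilon: float,
    sensitivity: float,
    rng: random.Random,
) -> T:
    """Pick candidates[i] with probability proportional to exp(eps * score_i / (2 * sensitivity))."""
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    best = max(scores)
    # Subtracting the best score avoids overflow and leaves the probabilities unchanged.
    weights = [math.exp(epsilon * (s - best) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

For two triangles joined by a single edge, splitting along that edge gives Q = 5/14, while putting every node in one community gives Q = 0; a large epsilon makes the mechanism almost always return the split, and a small epsilon makes the choice close to uniform.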

    On Efficiently Detecting Overlapping Communities over Distributed Dynamic Graphs

    Modern networks are both huge and highly dynamic, which challenges the efficiency of community detection algorithms. In this paper, we study the problem of overlapping community detection on distributed and dynamic graphs. Given a distributed, undirected and unweighted graph, the goal is to detect overlapping communities incrementally as the graph changes. We propose an efficient algorithm, called the randomized Speaker-Listener Label Propagation Algorithm (rSLPA), derived from the Speaker-Listener Label Propagation Algorithm (SLPA) by relaxing the probability distribution of label propagation. Besides detecting high-quality communities, rSLPA can incrementally update the detected communities after a batch of edge insertions and deletions. To the best of our knowledge, rSLPA is the first algorithm that can incrementally capture the same communities as those obtained by running the detection algorithm from scratch on the updated graph. Extensive experiments on both synthetic and real-world datasets show that our algorithm achieves high accuracy and efficiency at the same time. Comment: A short version of this paper will be published as an ICDE'2018 poster.
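
rSLPA builds on SLPA, whose basic speaker-listener loop can be sketched as below. This minimal sketch follows standard single-machine, non-incremental SLPA rather than rSLPA. Every node keeps a memory of labels. In each round, each listener's neighbours (speakers) send a label drawn with probability proportional to its frequency in their memory, and the listener stores the most frequent label it received. Labels that make up at least a threshold fraction of a node's memory define the overlapping communities. The names, default iteration count and threshold are illustrative assumptions. The sketch reproduces neither rSLPA's relaxed distribution and incremental updates nor SLPA's removal of nested communities.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Set


def slpa(
    adj: Dict[Hashable, Set[Hashable]],
    iterations: int = 20,
    threshold: float = 0.2,
    seed: int = 0,
) -> Dict[Hashable, Set[Hashable]]:
    """Basic SLPA sketch: returns {label: set of nodes}; communities may overlap."""
    rng = random.Random(seed)
    # Each node's memory starts with its own id as the only label.
    memory: Dict[Hashable, List[Hashable]] = {u: [u] for u in adj}
    nodes = list(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)  # listeners are visited in random order, updated one by one
        for listener in nodes:
            if not adj[listener]:
                continue  # isolated nodes hear nothing and keep their own label
            # Speaker rule: each neighbour sends one label, drawn with
            # probability proportional to its frequency in that neighbour's memory.
            received = [rng.choice(memory[s]) for s in adj[listener]]
            # Listener rule: store the most frequent received label (random tie-break).
            counts = Counter(received)
            top = max(counts.values())
            winners = [label for label, c in counts.items() if c == top]
            memory[listener].append(rng.choice(winners))
    # Post-processing: keep labels that make up at least `threshold` of a memory.
    communities: Dict[Hashable, Set[Hashable]] = {}
    for u, labels in memory.items():
        for label, c in Counter(labels).items():
            if c / len(labels) >= threshold:
                communities.setdefault(label, set()).add(u)
    return communities
```

On two disjoint triangles, labels can never cross between the components, so every returned community lies inside one triangle.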

    Discovering Communities of Community Discovery

    Discovering communities in complex networks means grouping nodes that are similar to each other in order to uncover latent information about them. There are hundreds of different algorithms for the community detection task, each with its own understanding and definition of what a "community" is. Dozens of review works attempt to bring order to such a diverse landscape, classifying community discovery algorithms by the process they employ to detect communities, by their explicitly stated definition of community, or by their performance on a standardized task. In this paper, we classify community discovery algorithms according to a fourth criterion: the similarity of their results. We create an Algorithm Similarity Network (ASN), whose nodes are the community detection approaches, connected if they return similar groupings. We then perform community detection on this network, grouping algorithms that consistently return the same partitions or overlapping coverage across more than one thousand synthetic and real-world networks. This paper is an attempt to create a similarity-based classification of community detection algorithms based on empirical data. It improves over the state of the art by comparing more than seventy approaches, and it finds that the ASN contains well-separated groups, making it a sensible tool for practitioners choosing algorithms that fit their analytic needs.
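
The construction of an Algorithm Similarity Network can be sketched as follows. The sketch assumes each algorithm returns a partition (one label per node) on every benchmark network and uses normalized mutual information (NMI, arithmetic-mean normalization) as the similarity. The abstract does not specify the paper's similarity measure, its handling of overlapping coverage, or any edge threshold, so those choices and all names here are hypothetical. Two algorithms are linked when their NMI, averaged over all networks, reaches the threshold. Community detection would then be run on the resulting graph.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def nmi(x: Sequence[int], y: Sequence[int]) -> float:
    """Normalized mutual information 2 * I(X;Y) / (H(X) + H(Y)) of two labelings."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    mi = sum(
        c / n * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )
    if hx + hy == 0:  # both labelings put every node in a single group
        return 1.0
    return 2 * mi / (hx + hy)


def algorithm_similarity_network(
    results: Dict[str, List[List[int]]], threshold: float
) -> Set[Tuple[str, str]]:
    """results[alg][i] is the label vector algorithm `alg` returned on network i."""
    edges: Set[Tuple[str, str]] = set()
    for a, b in combinations(sorted(results), 2):
        scores = [nmi(x, y) for x, y in zip(results[a], results[b])]
        # Link two algorithms when their average similarity reaches the threshold.
        if scores and sum(scores) / len(scores) >= threshold:
            edges.add((a, b))
    return edges
```

NMI ignores the names of the labels, so two algorithms that return the same grouping under different label ids still score 1.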

    Phase Transitions of the Typical Algorithmic Complexity of the Random Satisfiability Problem Studied with Linear Programming

    We study the NP-complete K-SAT problem. Although the worst-case complexity of NP-complete problems is conjectured to be exponential, there exist parametrized random ensembles of problems where solutions can typically be found in polynomial time for suitable ranges of the parameter. In fact, random K-SAT, with α = M/N (the ratio of the number of clauses M to the number of variables N) as control parameter, can be solved quickly for small enough values of α. It shows a phase transition between a satisfiable phase and an unsatisfiable phase. For branch and bound algorithms, which operate in the space of feasible Boolean configurations, the empirically hardest problems are located only close to this phase transition. Here we study K-SAT (K = 3, 4) and the related optimization problem MAX-SAT by a linear programming approach, which is widely used for practical problems and allows for polynomial run time. In contrast to branch and bound, it operates outside the space of feasible configurations. On the other hand, finding a solution within polynomial time is not guaranteed. We investigated several variants, such as including artificial objective functions, so-called cutting-plane approaches, and a mapping to the NP-complete vertex-cover problem. We observed several easy-hard transitions, from regions where the problems are typically solvable in polynomial time by the given algorithms to regions where they are not. For the related vertex-cover problem on random graphs, these easy-hard transitions can be identified with structural properties of the graphs, such as percolation transitions. For the present random K-SAT problem, we investigated numerous structural properties that also exhibit clear transitions, but they appear not to be correlated with the easy-hard transitions observed here. This renders the behaviour of random K-SAT more complex than that of, e.g., the vertex-cover problem. Comment: 11 pages, 5 figures
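
The random K-SAT ensemble with control parameter α = M/N can be sketched as follows. Each of the M = round(αN) clauses picks K distinct variables uniformly at random and negates each with probability 1/2. This is a common convention for the ensemble, assumed here rather than taken from the paper. The exhaustive satisfiability check stands in for neither the paper's linear-programming approach nor branch and bound, and it is only practical for small N. All names are hypothetical.

```python
import itertools
import random
from typing import List

# A clause is a list of literals: +i means variable i is true, -i means false (1-based).
Clause = List[int]


def random_ksat(n: int, alpha: float, k: int = 3, seed: int = 0) -> List[Clause]:
    """Draw M = round(alpha * n) random k-clauses over n variables."""
    rng = random.Random(seed)
    m = round(alpha * n)
    formula: List[Clause] = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)  # k distinct variables per clause
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula


def satisfiable(formula: List[Clause], n: int) -> bool:
    """Exhaustive check over all 2^n assignments; only feasible for small n."""
    for bits in itertools.product((False, True), repeat=n):
        # A clause is satisfied if any literal agrees with the assignment.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in formula):
            return True
    return False
```

Sweeping α for a fixed small N and recording the fraction of satisfiable instances gives a finite-size view of the satisfiable/unsatisfiable transition. For K = 3 this transition is known to lie near α ≈ 4.27 in the large-N limit, and it is strongly smeared at small N.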

    Automatic Detection of Online Jihadist Hate Speech

    We have developed a system that automatically detects online jihadist hate speech with over 80% accuracy using techniques from Natural Language Processing and Machine Learning. The system is trained on a corpus of 45,000 subversive Twitter messages collected from October 2014 to December 2016. We present a qualitative and quantitative analysis of the jihadist rhetoric in the corpus, examine the network of Twitter users, outline the technical procedure used to train the system, and discuss examples of use. Comment: 31 pages

    Bridges of the BeltLine

    As currently realized, the Atlanta BeltLine weaves under, over, and through a multitude of overpasses, footbridges, and tunnels. As in any city, this significant feature is simultaneously an asset and a potential hazard. These types of structures are "vulnerable critical facilities" that should be included in emergency risk assessments and mitigation planning (FEMA, 2013). Accordingly, the Bridges of the BeltLine project was proposed as a mixed-methods study to understand how people's movement along the BeltLine can inform emergency management mitigation, planning, and response. Pedestrian flow in cities has been understudied and research on it underfunded, yet understanding it is critical to city infrastructure monitoring and improvement projects. This study focused on developing inexpensive, low-power sensors capable of detecting human presence while preserving privacy, as well as a survey designed to collect data that the sensors cannot. The survey was intended to describe BeltLine users, asking about demographics, reasons for use, frequency and duration of use, and mode of travel to and on the BeltLine. After conferring with the leadership of Atlanta BeltLine, Inc. (ABI), it became apparent that ABI's primary interest is in understanding which communities are being served by the BeltLine and whether it has changed commuting and travel behaviors or created new demand. As a result, the project's original focus on emergency management was expanded to explore which communities are being served and for what kind of use.
    The project's revised objective was therefore two-fold: (a) to understand whether the BeltLine is serving the adjacent communities, and for what purposes, and (b) to inform emergency mitigation, planning, and response. This research was made possible by a grant from Georgia Tech's Executive Vice President of Research, Small Bets Seed Grants program, with supplemental funding from the Center for the Development and Application of Internet of Things Technologies (CDAIT).