3,843 research outputs found

    Statistical Mechanics of Community Detection

    Full text link
    Starting from a general \textit{ansatz}, we show how community detection can be interpreted as finding the ground state of an infinite range spin glass. Our approach applies to weighted and directed networks alike. It contains the \textit{at hoc} introduced quality function from \cite{ReichardtPRL} and the modularity QQ as defined by Newman and Girvan \cite{Girvan03} as special cases. The community structure of the network is interpreted as the spin configuration that minimizes the energy of the spin glass with the spin states being the community indices. We elucidate the properties of the ground state configuration to give a concise definition of communities as cohesive subgroups in networks that is adaptive to the specific class of network under study. Further we show, how hierarchies and overlap in the community structure can be detected. Computationally effective local update rules for optimization procedures to find the ground state are given. We show how the \textit{ansatz} may be used to discover the community around a given node without detecting all communities in the full network and we give benchmarks for the performance of this extension. Finally, we give expectation values for the modularity of random graphs, which can be used in the assessment of statistical significance of community structure

    Explain3D: Explaining Disagreements in Disjoint Datasets

    Get PDF
    Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4)~We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently

    Communities in Networks

    Full text link
    We survey some of the concepts, methods, and applications of community detection, which has become an increasingly important area of network science. To help ease newcomers into the field, we provide a guide to available methodology and open problems, and discuss why scientists from diverse backgrounds are interested in these problems. As a running theme, we emphasize the connections of community detection to problems in statistical physics and computational optimization.Comment: survey/review article on community structure in networks; published version is available at http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pd

    Loom: Query-aware Partitioning of Online Graphs

    Full text link
    As with general graph processing systems, partitioning data over a cluster of machines improves the scalability of graph database management systems. However, these systems will incur additional network cost during the execution of a query workload, due to inter-partition traversals. Workload-agnostic partitioning algorithms typically minimise the likelihood of any edge crossing partition boundaries. However, these partitioners are sub-optimal with respect to many workloads, especially queries, which may require more frequent traversal of specific subsets of inter-partition edges. Furthermore, they largely unsuited to operating incrementally on dynamic, growing graphs. We present a new graph partitioning algorithm, Loom, that operates on a stream of graph updates and continuously allocates the new vertices and edges to partitions, taking into account a query workload of graph pattern expressions along with their relative frequencies. First we capture the most common patterns of edge traversals which occur when executing queries. We then compare sub-graphs, which present themselves incrementally in the graph update stream, against these common patterns. Finally we attempt to allocate each match to single partitions, reducing the number of inter-partition edges within frequently traversed sub-graphs and improving average query performance. Loom is extensively evaluated over several large test graphs with realistic query workloads and various orderings of the graph updates. We demonstrate that, given a workload, our prototype produces partitionings of significantly better quality than existing streaming graph partitioning algorithms Fennel and LDG

    DEMON: a Local-First Discovery Method for Overlapping Communities

    Full text link
    Community discovery in complex networks is an interesting problem with a number of applications, especially in the knowledge extraction task in social and information networks. However, many large networks often lack a particular community organization at a global level. In these cases, traditional graph partitioning algorithms fail to let the latent knowledge embedded in modular structure emerge, because they impose a top-down global view of a network. We propose here a simple local-first approach to community discovery, able to unveil the modular organization of real complex networks. This is achieved by democratically letting each node vote for the communities it sees surrounding it in its limited view of the global system, i.e. its ego neighborhood, using a label propagation algorithm; finally, the local communities are merged into a global collection. We tested this intuition against the state-of-the-art overlapping and non-overlapping community discovery methods, and found that our new method clearly outperforms the others in the quality of the obtained communities, evaluated by using the extracted communities to predict the metadata about the nodes of several real world networks. We also show how our method is deterministic, fully incremental, and has a limited time complexity, so that it can be used on web-scale real networks.Comment: 9 pages; Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, August 12-16, 201
    corecore