    Finding missing edges in networks based on their community structure

    Many edge prediction methods have been proposed, based on various local or global properties of the structure of an incomplete network. Community structure is another significant feature of networks: vertices in a community are more densely connected than average. Vertices in the same community often have "similar" properties, which suggests that missing edges are more likely to be found within communities than elsewhere. We use this insight to propose a strategy for edge prediction that combines existing edge prediction methods with community detection. We show that this method gives better prediction accuracy than existing edge prediction methods alone. Comment: 7 pages, 6 figures
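    The abstract does not say how edge prediction and community detection are combined, so the following Python sketch shows only one plausible scheme, under stated assumptions: community labels are supplied by some earlier community detection step, the baseline predictor is the common-neighbors score, and same-community pairs have their score multiplied by a hypothetical `boost` factor. None of these names come from the paper.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Baseline local predictor: the number of neighbors u and v share."""
    return len(adj[u] & adj[v])


def community_aware_scores(adj, community, boost=2.0):
    """Score every unobserved pair with the baseline predictor, then multiply
    the score by `boost` when both endpoints share a community label.
    The multiplicative `boost` rule is a hypothetical combination scheme."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # observed edge, not a candidate
        base = common_neighbors(adj, u, v)
        scores[(u, v)] = base * boost if community[u] == community[v] else base
    return scores


def top_candidates(scores, k):
    """Return the k highest-scoring pairs; ties are broken by node order."""
    return sorted(scores, key=lambda pair: (-scores[pair], pair))[:k]


if __name__ == "__main__":
    adj = build_adjacency([(0, 1), (0, 2), (2, 3), (3, 4), (3, 5), (4, 5)])
    community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    scores = community_aware_scores(adj, community)
    print(top_candidates(scores, 1))  # [(1, 2)]
```

    In the toy network, the non-adjacent pairs (1, 2), (0, 3), (2, 4) and (2, 5) each share exactly one neighbor, so the baseline alone cannot separate them; the community term promotes (1, 2), the only within-community candidate.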

    Link-Prediction Enhanced Consensus Clustering for Complex Networks

    Many real networks that are inferred or collected from data are incomplete due to missing edges. Missing edges can be inherent to the dataset (Facebook friend links will never be complete) or the result of sampling (one may only have access to a portion of the data). As a consequence, downstream analyses that consume the network often yield less accurate results than they would if the edges were complete. Community detection algorithms, in particular, often suffer when critical intra-community edges are missing. We propose a novel consensus clustering algorithm to enhance community detection on incomplete networks. Our framework applies existing community detection algorithms to networks imputed by our link-prediction-based algorithm, then merges their multiple outputs into a final consensus output. On average, our method boosts the performance of existing algorithms by 7% on artificial data and by 17% on ego networks collected from Facebook.
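    The abstract leaves the merging step unspecified. A common way to build a consensus from several partitions is a co-association count: link two nodes when enough runs place them together, then read off the connected components. The Python sketch below assumes each input partition is a dict from node to community label (for example, produced by running detection algorithms on differently imputed networks) and uses a hypothetical `threshold` of 0.5; it may differ from the paper's actual procedure.

```python
from itertools import combinations


def co_association(partitions, nodes):
    """Count, for each node pair, how many partitions put both nodes in the
    same community."""
    counts = {}
    for labels in partitions:
        for u, v in combinations(nodes, 2):
            if labels[u] == labels[v]:
                counts[(u, v)] = counts.get((u, v), 0) + 1
    return counts


def consensus_partition(partitions, nodes, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the partitions and
    return the connected components of that consensus graph."""
    parent = {u: u for u in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), count in co_association(partitions, nodes).items():
        if count / len(partitions) > threshold:
            parent[find(u)] = find(v)

    groups = {}
    for u in nodes:
        groups.setdefault(find(u), set()).add(u)
    return sorted(groups.values(), key=min)
```

    Thresholding the co-association counts suppresses assignments that only a minority of runs agree on, such as node 2 briefly joining the other group in a single run.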

    Evaluating Overfit and Underfit in Models of Network Community Structure

    A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will overfit or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over- and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate over- and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared. Comment: 22 pages, 13 figures, 3 tables

    The impact of partially missing communities on the reliability of centrality measures

    Network data is usually not error-free, and the absence of some nodes is a very common type of measurement error. Studies have shown that the reliability of centrality measures is severely affected by missing nodes. This paper investigates the reliability of centrality measures when missing nodes are likely to belong to the same community. We study the behavior of five commonly used centrality measures in uniform and scale-free networks under various error scenarios. We find that centrality measures are generally more reliable when missing nodes are likely to belong to the same community than when nodes are missing uniformly at random. In scale-free networks, however, betweenness centrality becomes less reliable when missing nodes are more likely to belong to the same community. Moreover, in scale-free networks, centrality measures are more reliable when the community structure is stronger; we do not observe this effect for uniform networks. Our observations suggest that the impact of missing nodes on the reliability of centrality measures might not be as severe as the literature suggests.
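    To make "reliability" concrete, the Python sketch below compares a centrality ranking before and after nodes go missing. The choices are ours, not the paper's: degree centrality stands in for the five measures, reliability is scored as the overlap of the top-k nodes, and the hypothetical `community_missing` sampler draws all missing nodes from one community to mimic the clustered-missingness scenario.

```python
import random


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree centrality: the number of neighbors of each node."""
    return {u: len(nbrs) for u, nbrs in adj.items()}


def drop_nodes(adj, missing):
    """Return the observed network: missing nodes and their edges removed."""
    return {u: nbrs - missing for u, nbrs in adj.items() if u not in missing}


def community_missing(community, m, rng):
    """Hypothetical biased sampler: draw all m missing nodes from a single
    randomly chosen community (only communities with at least m members)."""
    groups = {}
    for u, label in community.items():
        groups.setdefault(label, []).append(u)
    eligible = sorted(c for c, members in groups.items() if len(members) >= m)
    label = rng.choice(eligible)
    return set(rng.sample(sorted(groups[label]), m))


def top_k_overlap(true_c, observed_c, k):
    """Share of the true top-k nodes (ranked among surviving nodes) that are
    also in the observed top-k; 1.0 means the head of the ranking is unchanged."""
    survivors = [u for u in true_c if u in observed_c]

    def head(c):
        return set(sorted(survivors, key=lambda u: (-c[u], u))[:k])

    return len(head(true_c) & head(observed_c)) / k


if __name__ == "__main__":
    adj = build_adjacency([(0, 1), (0, 2), (0, 3), (4, 5), (4, 6)])
    community = {0: "a", 1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
    missing = community_missing(community, 2, random.Random(0))
    true_c = degree_centrality(adj)
    observed_c = degree_centrality(drop_nodes(adj, missing))
    print(missing, top_k_overlap(true_c, observed_c, 2))
```

    For a uniform-at-random baseline, `set(rng.sample(sorted(adj), m))` draws the same number of nodes without regard to community, so the two scenarios can be compared on the same network.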

    Community Detection in Networks with Node Attributes

    Community detection algorithms are fundamental tools that allow us to uncover organizational principles in networks. When detecting communities, there are two possible sources of information one can use: the network structure, and the features and attributes of nodes. Even though communities form around nodes that have common edges and common attributes, algorithms have typically focused on only one of these two data modalities: community detection algorithms traditionally focus only on the network structure, while clustering algorithms mostly consider only node attributes. In this paper, we develop Communities from Edge Structure and Node Attributes (CESNA), an accurate and scalable algorithm for detecting overlapping communities in networks with node attributes. CESNA statistically models the interaction between the network structure and the node attributes, which leads to more accurate community detection as well as improved robustness in the presence of noise in the network structure. CESNA has a linear runtime in the network size and is able to process networks an order of magnitude larger than comparable approaches. Finally, CESNA also helps with the interpretation of detected communities by finding relevant node attributes for each community. Comment: Published in the proceedings of IEEE ICDM '1

    Communities in Networks

    We survey some of the concepts, methods, and applications of community detection, which has become an increasingly important area of network science. To help ease newcomers into the field, we provide a guide to available methodology and open problems, and discuss why scientists from diverse backgrounds are interested in these problems. As a running theme, we emphasize the connections of community detection to problems in statistical physics and computational optimization. Comment: survey/review article on community structure in networks; published version is available at http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pd

    A General Framework for Complex Network Applications

    Complex network theory has been applied to solving practical problems from different domains. In this paper, we present a general framework for complex network applications. The keys to a successful application are a thorough understanding of the real system and a correct mapping of complex network theory to practical problems in the system. Despite certain limitations discussed in this paper, complex network theory provides a foundation on which to develop powerful tools for analyzing and optimizing large interconnected systems. Comment: 8 pages

    Finding and evaluating community structure in networks

    We propose and study a set of algorithms for discovering community structure in networks, that is, natural divisions of network nodes into densely connected subgroups. Our algorithms all share two defining features: first, they iteratively remove edges from the network to split it into communities, identifying the edges to remove with one of a number of possible "betweenness" measures; second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems. Comment: 16 pages, 13 figures
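    The divisive procedure described above can be sketched in plain Python. This is a minimal sketch, not the authors' code: it assumes an unweighted, undirected graph stored as a dict of neighbor sets, uses shortest-path edge betweenness (our choice among the possible betweenness measures), and scores each split with modularity, the name now usually given to the strength measure this paper introduced. All function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph.
    Each pair is counted from both endpoints, which doubles every score
    but does not change which edge is largest."""
    scores = {}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        pred = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate path dependencies back from the farthest nodes.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                scores[edge] = scores.get(edge, 0.0) + share
                delta[v] += share
    return scores


def components(adj):
    """Connected components, each as a set, ordered by smallest node."""
    seen, comps = set(), []
    for s in sorted(adj):
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        seen.add(s)
        while queue:
            for w in adj[queue.popleft()]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def modularity(adj, communities):
    """Q = sum over communities c of (e_c / m - (d_c / 2m)^2), where e_c is
    the number of internal edges and d_c the total degree of c."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for comm in communities:
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        degree = sum(len(adj[u]) for u in comm)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


def girvan_newman(adj):
    """Repeatedly delete the highest-betweenness edge, recomputing betweenness
    after every deletion, and keep the split with the highest modularity
    (always scored against the original network)."""
    work = {u: set(nbrs) for u, nbrs in adj.items()}
    best_comms = components(work)
    best_q = modularity(adj, best_comms)
    while any(work.values()):
        scores = edge_betweenness(work)
        u, v = tuple(max(scores, key=scores.get))
        work[u].discard(v)
        work[v].discard(u)
        comms = components(work)
        q = modularity(adj, comms)
        if q > best_q:
            best_q, best_comms = q, comms
    return best_q, best_comms
```

    On two triangles joined by a single bridge, the bridge carries every shortest path between the triangles, so it is removed first and the best split is the two triangles. Because betweenness is recomputed after every deletion, each step costs a full all-pairs pass, which makes this approach practical mainly for small networks.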