5,547 research outputs found

    Mixture models and exploratory analysis in networks

    Get PDF
    Networks are widely used in the biological, physical, and social sciences as a concise mathematical representation of the topology of systems of interacting components. Understanding the structure of these networks is one of the outstanding challenges in the study of complex systems. Here we describe a general technique for detecting structural features in large-scale network data which works by dividing the nodes of a network into classes such that the members of each class have similar patterns of connection to other nodes. Using the machinery of probabilistic mixture models and the expectation-maximization algorithm, we show that it is possible to detect, without prior knowledge of what we are looking for, a very broad range of types of structure in networks. We give a number of examples demonstrating how the method can be used to shed light on the properties of real-world networks, including social and information networks.Comment: 8 pages, 4 figures, two new examples in this version plus minor correction

    Model selection and hypothesis testing for large-scale network models with overlapping groups

    Full text link
    The effort to understand network systems in increasing detail has resulted in a diversity of methods designed to extract their large-scale structure from data. Unfortunately, many of these methods yield diverging descriptions of the same network, making both the comparison and understanding of their results a difficult challenge. A possible solution to this outstanding issue is to shift the focus away from ad hoc methods and move towards more principled approaches based on statistical inference of generative models. As a result, we face instead the more well-defined task of selecting between competing generative processes, which can be done under a unified probabilistic framework. Here, we consider the comparison between a variety of generative models including features such as degree correction, where nodes with arbitrary degrees can belong to the same group, and community overlap, where nodes are allowed to belong to more than one group. Because such model variants possess an increasing number of parameters, they become prone to overfitting. In this work, we present a method of model selection based on the minimum description length criterion and posterior odds ratios that is capable of fully accounting for the increased degrees of freedom of the larger models, and selects the best one according to the statistical evidence available in the data. In applying this method to many empirical unweighted networks from different fields, we observe that community overlap is very often not supported by statistical evidence and is selected as a better model only for a minority of them. On the other hand, we find that degree correction tends to be almost universally favored by the available data, implying that intrinsic node proprieties (as opposed to group properties) are often an essential ingredient of network formation.Comment: 20 pages,7 figures, 1 tabl

    Data-driven Computational Social Science: A Survey

    Get PDF
    Social science concerns issues on individuals, relationships, and the whole society. The complexity of research topics in social science makes it the amalgamation of multiple disciplines, such as economics, political science, and sociology, etc. For centuries, scientists have conducted many studies to understand the mechanisms of the society. However, due to the limitations of traditional research methods, there exist many critical social issues to be explored. To solve those issues, computational social science emerges due to the rapid advancements of computation technologies and the profound studies on social science. With the aids of the advanced research techniques, various kinds of data from diverse areas can be acquired nowadays, and they can help us look into social problems with a new eye. As a result, utilizing various data to reveal issues derived from computational social science area has attracted more and more attentions. In this paper, to the best of our knowledge, we present a survey on data-driven computational social science for the first time which primarily focuses on reviewing application domains involving human dynamics. The state-of-the-art research on human dynamics is reviewed from three aspects: individuals, relationships, and collectives. Specifically, the research methodologies used to address research challenges in aforementioned application domains are summarized. In addition, some important open challenges with respect to both emerging research topics and research methods are discussed.Comment: 28 pages, 8 figure
    • …
    corecore