Mixture models and exploratory analysis in networks
Networks are widely used in the biological, physical, and social sciences as
a concise mathematical representation of the topology of systems of interacting
components. Understanding the structure of these networks is one of the
outstanding challenges in the study of complex systems. Here we describe a
general technique for detecting structural features in large-scale network data
which works by dividing the nodes of a network into classes such that the
members of each class have similar patterns of connection to other nodes. Using
the machinery of probabilistic mixture models and the expectation-maximization
algorithm, we show that it is possible to detect, without prior knowledge of
what we are looking for, a very broad range of types of structure in networks.
We give a number of examples demonstrating how the method can be used to shed
light on the properties of real-world networks, including social and
information networks.
Comment: 8 pages, 4 figures, two new examples in this version plus minor
correction
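The abstract above mentions fitting a probabilistic mixture model to a network with the expectation-maximization algorithm. As a rough illustration only, the sketch below assumes one common parameterization, which the abstract itself does not spell out: each group r has a prior weight pi[r] and a distribution theta[r][j] over the nodes that its members link to. The function name `em_network_mixture`, the smoothing constant `eps`, and the linear-ramp initialization are hypothetical choices made for this example, not details taken from the paper.

```python
import math


def em_network_mixture(adj, q_init, n_iter=50, eps=1e-9):
    """Soft-cluster nodes by their connection patterns (illustrative sketch).

    adj[i][j] is 1 if there is an edge from node i to node j, else 0.
    q_init[i][r] is the initial probability that node i belongs to group r.
    eps is a small smoothing term (an assumption) that keeps logs finite.
    """
    n = len(adj)
    k = len(q_init[0])
    q = [row[:] for row in q_init]
    degree = [sum(row) for row in adj]  # out-degree of each node

    for _ in range(n_iter):
        # M-step: re-estimate group weights and per-group link distributions.
        pi = [sum(q[i][r] for i in range(n)) / n for r in range(k)]
        theta = []
        for r in range(k):
            denom = sum(degree[i] * q[i][r] for i in range(n)) + n * eps
            theta.append([
                (sum(adj[i][j] * q[i][r] for i in range(n)) + eps) / denom
                for j in range(n)
            ])

        # E-step: recompute memberships from the current parameters,
        # working in log space to avoid underflow.
        for i in range(n):
            log_w = [
                math.log(pi[r] + eps)
                + sum(adj[i][j] * math.log(theta[r][j]) for j in range(n))
                for r in range(k)
            ]
            top = max(log_w)
            w = [math.exp(v - top) for v in log_w]
            total = sum(w)
            q[i] = [v / total for v in w]

    return q, pi, theta


# Toy network: two disjoint 4-node cliques (nodes 0-3 and nodes 4-7).
adj = [[0] * 8 for _ in range(8)]
for clique in (range(0, 4), range(4, 8)):
    for i in clique:
        for j in clique:
            if i != j:
                adj[i][j] = 1

# Arbitrary asymmetric starting point; a uniform start would never split.
q_init = [[(i + 1) / 9, (8 - i) / 9] for i in range(8)]

q, pi, theta = em_network_mixture(adj, q_init)
labels = [max(range(2), key=lambda r: q[i][r]) for i in range(8)]
```

On this toy input the two cliques should end up in different groups, since their members link to disjoint sets of nodes. Real networks would need multiple restarts and a convergence check, both omitted here for brevity.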
Model selection and hypothesis testing for large-scale network models with overlapping groups
The effort to understand network systems in increasing detail has resulted in
a diversity of methods designed to extract their large-scale structure from
data. Unfortunately, many of these methods yield diverging descriptions of the
same network, making both the comparison and understanding of their results a
difficult challenge. A possible solution to this outstanding issue is to shift
the focus away from ad hoc methods and move towards more principled approaches
based on statistical inference of generative models. As a result, we face
instead the more well-defined task of selecting between competing generative
processes, which can be done under a unified probabilistic framework. Here, we
consider the comparison between a variety of generative models including
features such as degree correction, where nodes with arbitrary degrees can
belong to the same group, and community overlap, where nodes are allowed to
belong to more than one group. Because such model variants possess an
increasing number of parameters, they become prone to overfitting. In this
work, we present a method of model selection based on the minimum description
length criterion and posterior odds ratios that is capable of fully accounting
for the increased degrees of freedom of the larger models, and selects the best
one according to the statistical evidence available in the data. In applying
this method to many empirical unweighted networks from different fields, we
observe that community overlap is very often not supported by statistical
evidence and is selected as a better model only for a minority of them. On the
other hand, we find that degree correction tends to be almost universally
favored by the available data, implying that intrinsic node properties (as
opposed to group properties) are often an essential ingredient of network
formation.
Comment: 20 pages, 7 figures, 1 tabl
Data-driven Computational Social Science: A Survey
Social science studies individuals, relationships, and society as a whole. The
complexity of its research topics makes it an amalgamation of multiple
disciplines, such as economics, political science, and sociology. For
centuries, scientists have conducted many studies to understand the mechanisms
of society. However, owing to the limitations of traditional research methods,
many critical social issues remain to be explored. Computational social science
has emerged to address those issues, driven by rapid advances in computing
technology and by in-depth studies in social science. With the aid of advanced
research techniques, various kinds of data from diverse areas can now be
acquired, and they can help us look at social problems from a new perspective.
As a result, using such data to reveal issues in computational social science
has attracted increasing attention. In this paper we present what is, to the
best of our knowledge, the first survey of data-driven computational social
science, focusing primarily on application domains involving human
dynamics. The state-of-the-art research on human dynamics is reviewed from
three aspects: individuals, relationships, and collectives. Specifically, the
research methodologies used to address research challenges in the aforementioned
application domains are summarized. In addition, some important open challenges
with respect to both emerging research topics and research methods are
discussed.
Comment: 28 pages, 8 figure