826 research outputs found
Topics in social network analysis and network science
This chapter introduces statistical methods used in the analysis of social
networks and in the rapidly evolving parallel-field of network science.
Although several instances of social network analysis in health services
research have appeared recently, the majority involve only the most basic
methods and thus scratch the surface of what might be accomplished.
Cutting-edge methods using relevant examples and illustrations in health
services research are provided
Scalable Inference of Customer Similarities from Interactions Data using Dirichlet Processes
Under the sociological theory of homophily, people who are similar to one
another are more likely to interact with one another. Marketers often have
access to data on interactions among customers from which, with homophily as a
guiding principle, inferences could be made about the underlying similarities.
However, larger networks face a quadratic explosion in the number of potential
interactions that need to be modeled. This scalability problem renders
probability models of social interactions computationally infeasible for all
but the smallest networks. In this paper we develop a probabilistic framework
for modeling customer interactions that is both grounded in the theory of
homophily, and is flexible enough to account for random variation in who
interacts with whom. In particular, we present a novel Bayesian nonparametric
approach, using Dirichlet processes, to moderate the scalability problems that
marketing researchers encounter when working with networked data. We find that
this framework is a powerful way to draw insights into latent similarities of
customers, and we discuss how marketers can apply these insights to
segmentation and targeting activities
Nonparametric Bayes Modeling of Populations of Networks
Replicated network data are increasingly available in many research fields.
In connectomic applications, inter-connections among brain regions are
collected for each patient under study, motivating statistical models which can
flexibly characterize the probabilistic generative mechanism underlying these
network-valued data. Available models for a single network are not designed
specifically for inference on the entire probability mass function of a
network-valued random variable and therefore lack flexibility in characterizing
the distribution of relevant topological structures. We propose a flexible
Bayesian nonparametric approach for modeling the population distribution of
network-valued data. The joint distribution of the edges is defined via a
mixture model which reduces dimensionality and efficiently incorporates network
information within each mixture component by leveraging latent space
representations. The formulation leads to an efficient Gibbs sampler and
provides simple and coherent strategies for inference and goodness-of-fit
assessments. We provide theoretical results on the flexibility of our model and
illustrate improved performance --- compared to state-of-the-art models --- in
simulations and application to human brain networks
Sequences of purchases in credit card data reveal life styles in urban populations
Zipf-like distributions characterize a wide set of phenomena in physics,
biology, economics and social sciences. In human activities, Zipf-laws describe
for example the frequency of words appearance in a text or the purchases types
in shopping patterns. In the latter, the uneven distribution of transaction
types is bound with the temporal sequences of purchases of individual choices.
In this work, we define a framework using a text compression technique on the
sequences of credit card purchases to detect ubiquitous patterns of collective
behavior. Clustering the consumers by their similarity in purchases sequences,
we detect five consumer groups. Remarkably, post checking, individuals in each
group are also similar in their age, total expenditure, gender, and the
diversity of their social and mobility networks extracted by their mobile phone
records. By properly deconstructing transaction data with Zipf-like
distributions, this method uncovers sets of significant sequences that reveal
insights on collective human behavior.Comment: 30 pages, 26 figure
A Latent Parameter Node-Centric Model for Spatial Networks
Spatial networks, in which nodes and edges are embedded in space, play a
vital role in the study of complex systems. For example, many social networks
attach geo-location information to each user, allowing the study of not only
topological interactions between users, but spatial interactions as well. The
defining property of spatial networks is that edge distances are associated
with a cost, which may subtly influence the topology of the network. However,
the cost function over distance is rarely known, thus developing a model of
connections in spatial networks is a difficult task.
In this paper, we introduce a novel model for capturing the interaction
between spatial effects and network structure. Our approach represents a unique
combination of ideas from latent variable statistical models and spatial
network modeling. In contrast to previous work, we view the ability to form
long/short-distance connections to be dependent on the individual nodes
involved. For example, a node's specific surroundings (e.g. network structure
and node density) may make it more likely to form a long distance link than
other nodes with the same degree. To capture this information, we attach a
latent variable to each node which represents a node's spatial reach. These
variables are inferred from the network structure using a Markov Chain Monte
Carlo algorithm.
We experimentally evaluate our proposed model on 4 different types of
real-world spatial networks (e.g. transportation, biological, infrastructure,
and social). We apply our model to the task of link prediction and achieve up
to a 35% improvement over previous approaches in terms of the area under the
ROC curve. Additionally, we show that our model is particularly helpful for
predicting links between nodes with low degrees. In these cases, we see much
larger improvements over previous models
Recommended from our members
The analysis of social network data: an exciting frontier for statisticians
The catalyst for this paper is the recent interest in the relationship between social networks and an individual's health, which has arisen following a series of papers by Nicholas Christakis and James Fowler on person- to-person spread of health behaviors. In this issue, they provide a detailed explanation of their methods that offers insights, justifications, and responses to criticisms [1]. In this paper, we introduce some of the key statistical methods used in social network analysis and indicate where those used by Christakis and Fowler (CF) fit into the general framework. The intent is to provide the background necessary for readers to be able to make their own evaluation of the work by CF and understand the challenges of research involving social networks. We entertain possible solutions to some of the difficulties encountered in accounting for confounding effects in analyses of peer effects and provide comments on the contributions of CF
- …