Scalable Inference of Customer Similarities from Interactions Data using Dirichlet Processes
Under the sociological theory of homophily, people who are similar to one
another are more likely to interact with one another. Marketers often have
access to data on interactions among customers from which, with homophily as a
guiding principle, inferences could be made about the underlying similarities.
However, larger networks face a quadratic explosion in the number of potential
interactions that need to be modeled. This scalability problem renders
probability models of social interactions computationally infeasible for all
but the smallest networks. In this paper we develop a probabilistic framework
for modeling customer interactions that is both grounded in the theory of
homophily and flexible enough to account for random variation in who
interacts with whom. In particular, we present a novel Bayesian nonparametric
approach, using Dirichlet processes, to moderate the scalability problems that
marketing researchers encounter when working with networked data. We find that
this framework is a powerful way to draw insights into latent similarities of
customers, and we discuss how marketers can apply these insights to
segmentation and targeting activities.
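To make the scalability argument concrete, here is a minimal sketch of two ideas the abstract relies on: the quadratic growth in the number of customer pairs, and the way a Dirichlet process groups customers into an unbounded number of clusters. It uses the Chinese restaurant process, a standard sequential view of the Dirichlet process. The function names and the parameter `alpha` are illustrative assumptions; this is not the authors' model.

```python
import random


def potential_pairs(n):
    """Number of unordered customer pairs among n customers (grows as n^2)."""
    return n * (n - 1) // 2


def crp_assignments(n, alpha, seed=0):
    """Sample cluster labels for n customers from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (an assumed, illustrative name):
    larger values tend to produce more clusters.
    """
    rng = random.Random(seed)
    labels = []
    sizes = []  # sizes[k] = number of customers already placed in cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


if __name__ == "__main__":
    print(potential_pairs(1000))  # pairs to model if every dyad is explicit
    labels = crp_assignments(1000, alpha=2.0)
    print(len(set(labels)))  # number of clusters actually used
```

Under this kind of clustering, similarity can be reasoned about at the level of clusters rather than individual pairs, which is one way the pairwise cost might be reduced.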
Structure and Dynamics of Information Pathways in Online Media
Diffusion of information, spread of rumors and infectious diseases are all
instances of stochastic processes that occur over the edges of an underlying
network. Many times networks over which contagions spread are unobserved, and
such networks are often dynamic and change over time. In this paper, we
investigate the problem of inferring dynamic networks based on information
diffusion data. We assume there is an unobserved dynamic network that changes
over time, while we observe the results of a dynamic process spreading over the
edges of the network. The task then is to infer the edges and the dynamics of
the underlying network.
We develop an on-line algorithm that relies on stochastic convex optimization
to efficiently solve the dynamic network inference problem. We apply our
algorithm to information diffusion among 3.3 million mainstream media and blog
sites and experiment with more than 179 million different pieces of information
spreading over the network in a one year period. We study the evolution of
information pathways in the online media space and find interesting insights.
Information pathways for general recurrent topics are more stable across time
than for on-going news events. Clusters of news media sites and blogs often
emerge and vanish in a matter of days for on-going news events. Major social
movements and events involving the civil population, such as the Libyan civil war
or the Syrian uprising, lead to an increased number of information pathways among
blogs as well as an overall increase in the network centrality of blogs and
social media sites.
Comment: To appear at the 6th International Conference on Web Search and Data
Mining (WSDM '13)
Numeric Input Relations for Relational Learning with Applications to Community Structure Analysis
Most work in the area of statistical relational learning (SRL) is focussed on
discrete data, even though a few approaches for hybrid SRL models have been
proposed that combine numerical and discrete variables. In this paper we
distinguish numerical random variables for which a probability distribution is
defined by the model from numerical input variables that are only used for
conditioning the distribution of discrete response variables. We show how
numerical input relations can very easily be used in the Relational Bayesian
Network framework, and that existing inference and learning methods need only
minor adjustments to be applied in this generalized setting. The resulting
framework provides natural relational extensions of classical probabilistic
models for categorical data. We demonstrate the usefulness of RBN models with
numeric input relations by several examples.
In particular, we use the augmented RBN framework to define probabilistic
models for multi-relational (social) networks in which the probability of a
link between two nodes depends on numeric latent feature vectors associated
with the nodes. A generic learning procedure can be used to obtain a
maximum-likelihood fit of model parameters and latent feature values for a
variety of models that can be expressed in the high-level RBN representation.
Specifically, we propose a model that allows us to interpret learned latent
feature values as community centrality degrees by which we can identify nodes
that are central for one community, that are hubs between communities, or that
are isolated nodes. In a multi-relational setting, the model also provides a
characterization of how different relations are associated with each community.
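As a rough illustration of a link model driven by numeric latent features, the sketch below scores a pair of nodes by the inner product of their feature vectors and maps that score to a probability with a logistic function. The abstract does not state the functional form, so the logistic link, the `bias` term, and all function names are assumptions made for illustration only.

```python
import math


def link_probability(u, v, bias=0.0):
    """Probability of a link between two nodes with latent feature vectors u, v.

    Assumed form: logistic function of (bias + inner product of u and v).
    """
    score = bias + sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-score))


def log_likelihood(features, edges, non_edges, bias=0.0):
    """Log-likelihood of observed links and non-links given latent features.

    features: dict mapping node -> list of floats
    edges, non_edges: iterables of (node, node) pairs
    """
    ll = 0.0
    for i, j in edges:
        ll += math.log(link_probability(features[i], features[j], bias))
    for i, j in non_edges:
        ll += math.log(1.0 - link_probability(features[i], features[j], bias))
    return ll


if __name__ == "__main__":
    feats = {"a": [2.0, 0.0], "b": [1.5, 0.0], "c": [0.0, 2.0]}
    print(link_probability(feats["a"], feats["b"]))  # same community: high
    print(link_probability(feats["a"], feats["c"]))  # different: 0.5
    print(log_likelihood(feats, [("a", "b")], [("a", "c")]))
```

A maximum-likelihood fit in this spirit would adjust both `bias` and the feature values to increase `log_likelihood`; nodes with a large value in one coordinate and near-zero values elsewhere could then be read as central to a single community.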
Modeling Information Propagation with Survival Theory
Networks provide a skeleton for the spread of contagions such as information,
ideas, behaviors and diseases. Many times networks over which contagions
diffuse are unobserved and need to be inferred. Here we apply survival theory
to develop general additive and multiplicative risk models under which the
network inference problems can be solved efficiently by exploiting their
convexity. Our additive risk model generalizes several existing network
inference models. We show all these models are particular cases of our more
general model. Our multiplicative model allows for modeling scenarios in which
a node can either increase or decrease the risk of activation of another node,
in contrast with previous approaches, which consider only positive risk
increments. We evaluate the performance of our network inference algorithms on
large synthetic and real cascade datasets, and show that our models are able to
predict the length and duration of cascades in real data.
Comment: To appear at ICML '1
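The additive risk idea can be sketched as follows, under a simplifying assumption the abstract does not state: each already-active node j contributes a constant rate to the hazard of node i, and the contributions add up. The log-likelihood of one cascade then combines a survival term for the time each node stayed inactive with a log-hazard term at its activation time. The data layout and names (`times`, `rates`, `horizon`) are hypothetical.

```python
import math


def cascade_log_likelihood(times, rates, horizon):
    """Log-likelihood of one cascade under an additive, constant-rate hazard.

    times:   dict node -> activation time (nodes absent here never activated)
    rates:   dict (j, i) -> nonnegative rate at which active j raises i's hazard
    horizon: end of the observation window
    Assumes every activated non-source node has positive total hazard.
    """
    nodes = {n for pair in rates for n in pair} | set(times)
    ll = 0.0
    for i in nodes:
        t_i = times.get(i, horizon)
        # Nodes that were active strictly before i (or before the horizon).
        parents = [(j, t_j) for j, t_j in times.items() if j != i and t_j < t_i]
        # Survival term: i stayed inactive while exposed to each parent.
        ll -= sum(rates.get((j, i), 0.0) * (t_i - t_j) for j, t_j in parents)
        # Hazard term: only for nodes that activated and had earlier parents.
        if i in times and parents:
            hazard = sum(rates.get((j, i), 0.0) for j, _ in parents)
            ll += math.log(hazard)
    return ll


if __name__ == "__main__":
    times = {"a": 0.0, "b": 1.0}
    rates = {("a", "b"): 2.0, ("a", "c"): 0.5}
    print(cascade_log_likelihood(times, rates, horizon=3.0))
```

With constant rates, this expression is concave in the rates (a log of a linear function minus a linear function), which is consistent with the convexity the abstract exploits for network inference.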
Estimating spillovers using imprecisely measured networks
In many experimental contexts, whether and how network interactions impact
the outcome of interest for both treated and untreated individuals are key
concerns. Network data are often assumed to perfectly represent these possible
interactions. This paper considers the problem of estimating treatment effects
when measured connections are, instead, a noisy representation of the true
spillover pathways. We show that existing methods, using the potential outcomes
framework, yield biased estimators in the presence of this mismeasurement. We
develop a new method, using a class of mixture models, that can account for
missing connections and discuss its estimation via the Expectation-Maximization
algorithm. We check our method's performance by simulating experiments on real
network data from 43 villages in India. Finally, we use data from a previously
published study to show that estimates using our method are more robust to the
choice of network measure.
Topics in social network analysis and network science
This chapter introduces statistical methods used in the analysis of social
networks and in the rapidly evolving parallel field of network science.
Although several instances of social network analysis in health services
research have appeared recently, the majority involve only the most basic
methods and thus scratch the surface of what might be accomplished.
Cutting-edge methods using relevant examples and illustrations in health
services research are provided.