Structure and Dynamics of Information Pathways in Online Media
Diffusion of information, spread of rumors and infectious diseases are all
instances of stochastic processes that occur over the edges of an underlying
network. Many times networks over which contagions spread are unobserved, and
such networks are often dynamic and change over time. In this paper, we
investigate the problem of inferring dynamic networks based on information
diffusion data. We assume there is an unobserved dynamic network that changes
over time, while we observe the results of a dynamic process spreading over the
edges of the network. The task then is to infer the edges and the dynamics of
the underlying network.
We develop an on-line algorithm that relies on stochastic convex optimization
to efficiently solve the dynamic network inference problem. We apply our
algorithm to information diffusion among 3.3 million mainstream media and blog
sites and experiment with more than 179 million different pieces of information
spreading over the network in a one year period. We study the evolution of
information pathways in the online media space and report several findings.
Information pathways for general recurrent topics are more stable across time
than for on-going news events. Clusters of news media sites and blogs often
emerge and vanish in a matter of days for on-going news events. Major social
movements and events involving the civil population, such as the Libyan civil war
or the Syrian uprising, lead to an increased number of information pathways among
blogs as well as an overall increase in the network centrality of blogs and
social media sites.
Comment: To appear at the 6th International Conference on Web Search and Data
Mining (WSDM '13)
Modeling Information Propagation with Survival Theory
Networks provide a skeleton for the spread of contagions, such as information,
ideas, behaviors and diseases. Many times networks over which contagions
diffuse are unobserved and need to be inferred. Here we apply survival theory
to develop general additive and multiplicative risk models under which the
network inference problems can be solved efficiently by exploiting their
convexity. Our additive risk model generalizes several existing network
inference models. We show all these models are particular cases of our more
general model. Our multiplicative model allows for modeling scenarios in which
a node can either increase or decrease the risk of activation of another node,
in contrast with previous approaches, which consider only positive risk
increments. We evaluate the performance of our network inference algorithms on
large synthetic and real cascade datasets, and show that our models are able to
predict the length and duration of cascades in real data.
Comment: To appear at ICML '1
Collaborative Inference of Coexisting Information Diffusions
Recently, diffusion history inference has become an emerging
research topic due to its great benefits for various applications, whose
purpose is to reconstruct the missing histories of information diffusion traces
according to incomplete observations. The existing methods, however, often
focus only on a single information diffusion trace, while in a real-world social
network, there often coexist multiple information diffusions over the same
network. In this paper, we propose a novel approach called Collaborative
Inference Model (CIM) for the problem of the inference of coexisting
information diffusions. By exploiting the synergism between the coexisting
information diffusions, CIM holistically models multiple information diffusions
as a sparse 4th-order tensor called Coexisting Diffusions Tensor (CDT) without
any prior assumption of diffusion models, and collaboratively infers the
histories of the coexisting information diffusions via a low-rank approximation
of CDT with a fusion of heterogeneous constraints generated from additional
data sources. To improve the efficiency, we further propose an optimal
algorithm called Time Window based Parallel Decomposition Algorithm (TWPDA),
which can speed up the inference without compromise on the accuracy by
utilizing the temporal locality of information diffusions. The extensive
experiments conducted on real-world datasets and synthetic datasets verify the
effectiveness and efficiency of CIM and TWPDA.
Influence Maximization with Bandits
We consider the problem of influence maximization, the problem of
maximizing the number of people that become aware of a product by finding the
'best' set of 'seed' users to expose the product to. Most prior work on this
topic assumes that we know the probability of each user influencing each other
user, or we have data that lets us estimate these influences. However, this
information is typically not initially available or is difficult to obtain. To
avoid this assumption, we adopt a combinatorial multi-armed bandit paradigm
that estimates the influence probabilities as we sequentially try different
seed sets. We establish bounds on the performance of this procedure under the
existing edge-level feedback as well as a novel and more realistic node-level
feedback. Beyond our theoretical results, we describe a practical
implementation and experimentally demonstrate its efficiency and effectiveness
on four real datasets.
Comment: 12 pages
Early Warning Analysis for Social Diffusion Events
There is considerable interest in developing predictive capabilities for
social diffusion processes, for instance to permit early identification of
emerging contentious situations, rapid detection of disease outbreaks, or
accurate forecasting of the ultimate reach of potentially viral ideas or
behaviors. This paper proposes a new approach to this predictive analytics
problem, in which analysis of meso-scale network dynamics is leveraged to
generate useful predictions for complex social phenomena. We begin by deriving
a stochastic hybrid dynamical systems (S-HDS) model for diffusion processes
taking place over social networks with realistic topologies; this modeling
approach is inspired by recent work in biology demonstrating that S-HDS offer a
useful mathematical formalism with which to represent complex, multi-scale
biological network dynamics. We then perform formal stochastic reachability
analysis with this S-HDS model and conclude that the outcomes of social
diffusion processes may depend crucially upon the way the early dynamics of the
process interacts with the underlying network's community structure and
core-periphery structure. This theoretical finding provides the foundations for
developing a machine learning algorithm that enables accurate early warning
analysis for social diffusion events. The utility of the warning algorithm, and
the power of network-based predictive metrics, are demonstrated through an
empirical investigation of the propagation of political memes over social media
networks. Additionally, we illustrate the potential of the approach for
security informatics applications through case studies involving early warning
analysis of large-scale protest events and politically motivated cyber
attacks.
Modeling the structure and evolution of discussion cascades
We analyze the structure and evolution of discussion cascades in four popular
websites: Slashdot, Barrapunto, Meneame and Wikipedia. Despite the big
heterogeneities between these sites, a preferential attachment (PA) model with
bias to the root can capture the temporal evolution of the observed trees and
many of their statistical properties, namely, probability distributions of the
branching factors (degrees), subtree sizes and certain correlations. The
parameters of the model are learned efficiently using a novel maximum
likelihood estimation scheme for PA and provide a figurative interpretation
about the communication habits and the resulting discussion cascades on the
four different websites.
Comment: 10 pages, 11 figures
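A minimal sketch of how a preferential attachment process with a bias toward the root can grow a discussion tree. This is an illustration under stated assumptions, not the estimation scheme from the paper: the function name `grow_cascade`, the parameters `root_bias` and `alpha`, and the weight formula (bias times degree, raised to a power) are all hypothetical choices made for the example.

```python
import random


def grow_cascade(n_comments, root_bias=2.0, alpha=1.0, seed=0):
    """Grow a discussion tree one comment at a time.

    Returns a list `parents` where parents[i] is the node that comment i
    replies to; node 0 is the root post and has parent None.
    Assumption: attachment weight = (bias * degree) ** alpha, where the
    bias is `root_bias` for the root and 1.0 for every other node.
    """
    rng = random.Random(seed)
    parents = [None]  # node 0 is the root post
    degree = [1]      # replies received + 1, so every weight is nonzero

    for new_node in range(1, n_comments + 1):
        # Weight each existing node by its (biased) popularity.
        weights = [
            ((root_bias if node == 0 else 1.0) * degree[node]) ** alpha
            for node in range(new_node)
        ]
        parent = rng.choices(range(new_node), weights=weights, k=1)[0]
        parents.append(parent)
        degree[parent] += 1
        degree.append(1)

    return parents


if __name__ == "__main__":
    tree = grow_cascade(10)
    print(tree)
```

Raising `root_bias` in this sketch produces shallower, star-like trees (most comments reply to the post itself), while a value near 1.0 lets popular comments attract replies on equal terms with the root.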
Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm
Information spreads across social and technological networks, but often the
network structures are hidden from us and we only observe the traces left by
the diffusion processes, called cascades. Can we recover the hidden network
structures from these observed cascades? What kind of cascades and how many
cascades do we need? Are there some network structures which are more difficult
than others to recover? Can we design efficient inference algorithms with
provable guarantees?
Despite the increasing availability of cascade data and methods for inferring
networks from these data, a thorough theoretical understanding of the above
questions remains largely unexplored in the literature. In this paper, we
investigate the network structure inference problem for a general family of
continuous-time diffusion models using an $\ell_1$-regularized likelihood
maximization framework. We show that, as long as the cascade sampling process
satisfies a natural incoherence condition, our framework can recover the
correct network structure with high probability if we observe $O(d^3 \log N)$
cascades, where $d$ is the maximum number of parents of a node and $N$ is the
total number of nodes. Moreover, we develop a simple and efficient
soft-thresholding inference algorithm, which we use to illustrate the
consequences of our theoretical results, and show that our framework
outperforms other alternatives in practice.
Comment: To appear in the 31st International Conference on Machine Learning
(ICML), 201
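For readers unfamiliar with the term, soft-thresholding is the standard shrinkage operator associated with an absolute-value (sparsity-inducing) penalty. The sketch below shows only that generic operator; it is not the inference algorithm from the paper, and the names `soft_threshold`, `soft_threshold_vector`, and `lam` are hypothetical.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values in [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def soft_threshold_vector(values, lam):
    """Apply soft-thresholding to each entry of a list of numbers."""
    return [soft_threshold(v, lam) for v in values]


if __name__ == "__main__":
    # Small entries are zeroed out, large ones are shrunk by lam.
    print(soft_threshold_vector([2.0, -0.2, -1.5], 0.5))
```

In a network inference setting, applying such an operator to candidate edge weights sets weak edges exactly to zero, which is what yields a sparse estimated network.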