On the Convexity of Latent Social Network Inference
In many real-world scenarios, it is nearly impossible to collect explicit
social network data. In such cases, whole networks must be inferred from
underlying observations. Here, we formulate the problem of inferring latent
social networks based on network diffusion or disease propagation data. We
consider contagions propagating over the edges of an unobserved social network,
where we only observe the times when nodes became infected, but not who
infected them. Given such node infection times, we then identify the optimal
network that best explains the observed data. We present a maximum likelihood
approach based on convex programming with an l1-like penalty term that
encourages sparsity. Experiments on real and synthetic data reveal that our
method near-perfectly recovers the underlying network structure as well as the
parameters of the contagion propagation model. Moreover, our approach scales
well as it can infer optimal networks of thousands of nodes in a matter of
minutes.
Comment: NIPS, 201
Uncovering the Temporal Dynamics of Diffusion Networks
Time plays an essential role in the diffusion of information, influence and
disease over networks. In many cases we only observe when a node copies
information, makes a decision or becomes infected -- but the connectivity,
transmission rates between nodes and transmission sources are unknown.
Inferring the underlying dynamics is of outstanding interest since it enables
forecasting, influencing and retarding infections, broadly construed. To this
end, we model diffusion processes as discrete networks of continuous temporal
processes occurring at different rates. Given cascade data -- observed
infection times of nodes -- we infer the edges of the global diffusion network
and estimate the transmission rates of each edge that best explain the observed
data. The optimization problem is convex. The model naturally (without
heuristics) imposes sparse solutions and requires no parameter tuning. The
problem decouples into a collection of independent smaller problems, thus
scaling easily to networks on the order of hundreds of thousands of nodes.
Experiments on real and synthetic data show that our algorithm both recovers
the edges of diffusion networks and accurately estimates their transmission
rates from cascade data.
Comment: To appear in the 28th International Conference on Machine Learning
(ICML), 2011. Website: http://www.stanford.edu/~manuelgr/netrate
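The abstract above does not state the transmission likelihood, so the following is only a minimal sketch assuming exponentially distributed transmission delays. The function name `cascade_log_likelihood` and its arguments are hypothetical. Under that assumption, the log-likelihood of one cascade is a sum of per-node terms, and each term involves only the rates of edges pointing into that node, which is consistent with the decoupling into independent smaller problems mentioned in the abstract.

```python
import math


def cascade_log_likelihood(times, rates, horizon):
    """Log-likelihood of one cascade (assumption: exponential transmission delays).

    times:   dict node -> infection time; nodes missing from it were never infected
    rates:   dict (j, i) -> transmission rate of edge j -> i (non-negative)
    horizon: end of the observation window
    """
    nodes = set(times) | {n for edge in rates for n in edge}
    total = 0.0
    for i in nodes:
        t_i = times.get(i)
        # Potential parents: nodes infected strictly before i (or before the horizon).
        parents = [j for j in times if j != i and (t_i is None or times[j] < t_i)]
        if t_i is None:
            # Survival terms: i escaped infection from every parent until the horizon.
            for j in parents:
                total -= rates.get((j, i), 0.0) * (horizon - times[j])
            continue
        if not parents:
            continue  # cascade source: nothing to explain
        hazard = sum(rates.get((j, i), 0.0) for j in parents)
        if hazard <= 0.0:
            return float("-inf")  # i was infected but no parent could transmit
        total += math.log(hazard)
        for j in parents:
            total -= rates.get((j, i), 0.0) * (t_i - times[j])
    return total


# Toy cascade: "a" infects "b" at time 1.0; "c" stays uninfected until the horizon.
example_times = {"a": 0.0, "b": 1.0}
example_rates = {("a", "b"): 2.0, ("a", "c"): 0.5}
example_ll = cascade_log_likelihood(example_times, example_rates, horizon=3.0)
```

Each per-node term is concave in the rates (a log of a sum minus linear terms), so maximizing it is a convex problem under this assumed model.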
Learning user-specific latent influence and susceptibility from information cascades
Predicting cascade dynamics has important implications for understanding
information propagation and launching viral marketing. Previous works mainly
take a pair-wise approach, modeling the propagation probability between pairs of
users using n^2 independent parameters for n users. Consequently, these models
suffer from a severe overfitting problem, especially for pairs of users without
direct interactions, limiting their prediction accuracy. Here we propose to
model the cascade dynamics by learning two low-dimensional user-specific
vectors from observed cascades, capturing their influence and susceptibility
respectively. This model requires far fewer parameters and can thus combat the
overfitting problem. Moreover, this model can naturally capture
context-dependent factors such as the cumulative effect in information propagation.
Extensive experiments on a synthetic dataset and a large-scale microblogging
dataset demonstrate that this model outperforms the existing pair-wise models
at predicting cascade dynamics, cascade size, and "who will be retweeted".
Comment: from The 29th AAAI Conference on Artificial Intelligence (AAAI-2015
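To make the parameter saving concrete, here is a small sketch comparing the n^2 pair-wise parameters with two d-dimensional vectors per user. The function names and the dot-product score are illustrative assumptions; the abstract does not say how the two vectors are combined into a propagation probability.

```python
def pairwise_param_count(n_users):
    """One independent parameter per ordered pair of users (n^2, as in the abstract)."""
    return n_users ** 2


def latent_param_count(n_users, dim):
    """Two dim-dimensional vectors per user: influence and susceptibility."""
    return 2 * n_users * dim


def propagation_score(influence_u, susceptibility_v):
    """Assumed scoring rule: inner product of u's influence and v's susceptibility."""
    return sum(a * b for a, b in zip(influence_u, susceptibility_v))


n_users, dim = 1000, 10
pairwise = pairwise_param_count(n_users)   # 1,000,000 parameters
latent = latent_param_count(n_users, dim)  # 20,000 parameters
score = propagation_score([1.0, 0.0, 2.0], [0.5, 3.0, 0.25])
```

With these example sizes the latent model uses 50 times fewer parameters, and a score exists for every pair of users, including pairs that never interacted directly.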
Submodular Inference of Diffusion Networks from Multiple Trees
Diffusion and propagation of information, influence and diseases take place
over increasingly larger networks. We observe when a node copies information,
makes a decision or becomes infected, but networks are often hidden or
unobserved. Since networks are highly dynamic, changing and growing rapidly, we
only observe a relatively small set of cascades before a network changes
significantly. Scalable network inference based on a small cascade set is then
necessary for understanding the rapidly evolving dynamics that govern
diffusion. In this article, we develop a scalable approximation algorithm with
provable near-optimal performance based on submodular maximization which
achieves high accuracy in such scenarios, solving an open problem first
introduced by Gomez-Rodriguez et al. (2010). Experiments on synthetic and real
diffusion data show that our algorithm in practice achieves an optimal
trade-off between accuracy and running time.
Comment: To appear in the 29th International Conference on Machine Learning
(ICML), 2012. Website:
http://www.stanford.edu/~manuelgr/network-inference-multitree
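The abstract does not spell out the objective, so the sketch below only illustrates the generic greedy scheme for maximizing a monotone submodular function under a cardinality constraint. The coverage objective (how many cascades a set of candidate edges "explains") and all names are hypothetical stand-ins, not the authors' likelihood.

```python
def greedy_maximize(candidates, objective, budget):
    """Greedily pick up to `budget` items, each time taking the largest marginal gain."""
    selected = []
    for _ in range(budget):
        base = objective(selected)
        best_item, best_gain = None, 0
        for item in candidates:
            if item in selected:
                continue
            gain = objective(selected + [item]) - base
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:
            break  # no remaining item improves the objective
        selected.append(best_item)
    return selected


# Hypothetical toy data: the cascade ids each candidate edge could explain.
explains = {
    ("a", "b"): {1, 2, 3},
    ("b", "c"): {3, 4},
    ("a", "c"): {1, 2},
}


def coverage(edges):
    """Number of distinct cascades explained by the chosen edges (submodular)."""
    covered = set()
    for edge in edges:
        covered |= explains[edge]
    return len(covered)


chosen_edges = greedy_maximize(list(explains), coverage, budget=2)
```

For monotone submodular objectives such as this coverage function, the greedy choice is known to reach at least a (1 - 1/e) fraction of the optimal value, which is the usual sense of "provable near-optimal performance".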
Phantom cascades: The effect of hidden nodes on information diffusion
Research on information diffusion generally assumes complete knowledge of the
underlying network. However, in the presence of factors such as increasing
privacy awareness, restrictions on application programming interfaces (APIs)
and sampling strategies, this assumption rarely holds in the real world, which
in turn leads to an underestimation of the size of information cascades. In
this work we study the effect of hidden network structure on information
diffusion processes. We characterise information cascades through activation
paths traversing visible and hidden parts of the network. We quantify diffusion
estimation error while varying the amount of hidden structure in five empirical
and synthetic network datasets and demonstrate the effect of topological
properties on this error. Finally, we suggest practical recommendations for
practitioners and propose a model to predict the cascade size with minimal
information regarding the underlying network.
Comment: Preprint submitted to Elsevier Computer Communication
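As a rough illustration of why hidden nodes lead to underestimated cascade sizes, the sketch below compares the cascade traced on a full toy graph with the one traceable through visible nodes only. It assumes, purely for simplicity, that every edge transmits; the graph, the node names, and the function `reachable` are hypothetical.

```python
from collections import deque


def reachable(graph, seed, allowed=None):
    """Nodes reachable from `seed`, optionally restricted to the `allowed` node set."""
    if allowed is not None and seed not in allowed:
        return set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in seen:
                continue
            if allowed is not None and neighbor not in allowed:
                continue
            seen.add(neighbor)
            queue.append(neighbor)
    return seen


# Toy directed graph; node "c" is hidden from the observer.
toy_graph = {"a": ["b", "e"], "b": ["c"], "c": ["d"]}
visible_nodes = {"a", "b", "d", "e"}

true_size = len(reachable(toy_graph, "a"))                             # full cascade
observed_size = len(reachable(toy_graph, "a", allowed=visible_nodes))  # visible paths only
```

Here node "d" is activated only through the hidden node "c", so an observer following visible links counts 3 nodes instead of 5.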
Collaborative Inference of Coexisting Information Diffusions
Diffusion history inference, which aims to reconstruct the missing histories of
information diffusion traces from incomplete observations, has recently become an
emerging research topic due to its great benefits for various applications.
The existing methods, however, often focus on only a single information
diffusion trace, while in a real-world social network multiple information
diffusions often coexist over the same network. In this paper, we propose a
novel approach called Collaborative Inference Model (CIM) for the problem of
inferring coexisting information diffusions. By exploiting the synergism between the coexisting
information diffusions, CIM holistically models multiple information diffusions
as a sparse 4th-order tensor called Coexisting Diffusions Tensor (CDT) without
any prior assumption of diffusion models, and collaboratively infers the
histories of the coexisting information diffusions via a low-rank approximation
of CDT with a fusion of heterogeneous constraints generated from additional
data sources. To improve the efficiency, we further propose an optimal
algorithm called Time Window based Parallel Decomposition Algorithm (TWPDA),
which can speed up the inference without compromising accuracy by
utilizing the temporal locality of information diffusions. Extensive
experiments conducted on real-world and synthetic datasets verify the
effectiveness and efficiency of CIM and TWPDA.
Latent Self-Exciting Point Process Model for Spatial-Temporal Networks
We propose a latent self-exciting point process model that describes
geographically distributed interactions between pairs of entities. In contrast
to most existing approaches that assume fully observable interactions, here we
consider a scenario where certain interaction events lack information about
participants. Instead, this information needs to be inferred from the available
observations. We develop an efficient approximate algorithm based on
variational expectation-maximization to infer unknown participants in an event
given the location and the time of the event. We validate the model on
synthetic as well as real-world data, and obtain very promising results on the
identity-inference task. We also use our model to predict the timing and
participants of future events, and demonstrate that it compares favorably with
baseline approaches.
Comment: 20 pages, 6 figures (v3); 11 pages, 6 figures (v2); previous version
appeared in the 9th Bayesian Modeling Applications Workshop, UAI'1
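For readers unfamiliar with self-exciting point processes, the following sketch evaluates the conditional intensity of a plain univariate Hawkes process with an exponential kernel. This is a simplifying assumption: the model in the abstract is latent and spatial-temporal, and its kernel is not given there. The parameter names `baseline`, `jump`, and `decay` are hypothetical.

```python
import math


def hawkes_intensity(t, event_times, baseline, jump, decay):
    """Intensity at time t: baseline plus an exponentially decaying boost per past event.

    Assumed form: lambda(t) = baseline + sum over t_k < t of jump * exp(-decay * (t - t_k)).
    """
    boost = sum(
        jump * math.exp(-decay * (t - t_k))
        for t_k in event_times
        if t_k < t
    )
    return baseline + boost


past_events = [0.0, 1.0]
rate_now = hawkes_intensity(1.5, past_events, baseline=0.5, jump=1.0, decay=2.0)
rate_before = hawkes_intensity(0.0, past_events, baseline=0.5, jump=1.0, decay=2.0)
```

Each past event raises the rate of future events, and the effect fades over time; before any event has occurred the intensity equals the baseline.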