Estimating Infection Sources in Networks Using Partial Timestamps
We study the problem of identifying infection sources in a network based on
the network topology, and a subset of infection timestamps. In the case of a
single infection source in a tree network, we derive the maximum likelihood
estimator of the source and the unknown diffusion parameters. We then introduce
a new heuristic involving an optimization over a parametrized family of Gromov
matrices to develop a single source estimation algorithm for general graphs.
Simulations demonstrate that our approach achieves better
estimation accuracy than the breadth-first search tree heuristic commonly
adopted in the literature, as well as several other benchmark algorithms, even
though some of these require additional information such as the diffusion
parameters. We next develop a
multiple sources estimation algorithm for general graphs, which first
partitions the graph into source candidate clusters, and then applies our
single source estimation algorithm to each cluster. We show that if the graph
is a tree, then each source candidate cluster contains at least one source.
Simulations using synthetic and real networks, and experiments using real-world
data suggest that our proposed algorithms are able to estimate the true
infection source(s) to within a small number of hops with a small portion of
the infection timestamps being observed.
Comment: 15 pages, 15 figures, accepted by IEEE Transactions on Information Forensics and Security
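The paper's Gromov-matrix estimator is beyond a short sketch, but the two-stage structure of the multiple-sources algorithm can be illustrated with stand-ins: connected components of the infected subgraph as the source candidate clusters, and the classic Jordan-center heuristic as the per-cluster single-source estimator. Both choices are assumptions for illustration, not the paper's method.

```python
import networkx as nx

def jordan_center(H):
    """Classic single-source heuristic: the node of minimum eccentricity
    in the (connected) infected subgraph.  An illustrative stand-in for
    the paper's Gromov-matrix estimator."""
    ecc = nx.eccentricity(H)
    return min(ecc, key=ecc.get)

def estimate_sources(G, infected, single_source_estimator=jordan_center):
    """Two-stage pipeline in the spirit of the abstract:
    (1) partition the infected nodes into candidate clusters, here simply
        the connected components of the infected subgraph (a stand-in for
        the paper's partitioning step), then
    (2) run a single-source estimator on each cluster."""
    H = G.subgraph(infected)
    return [single_source_estimator(G.subgraph(c))
            for c in nx.connected_components(H)]
```

On a graph containing two separate infected regions this returns one source estimate per region, matching the guarantee that, on trees, each candidate cluster contains at least one true source.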
It’s Always April Fools’ Day! On the Difficulty of Social Network Misinformation Classification via Propagation Features
Given the huge impact that Online Social Networks (OSNs)
have had on the way people get informed and form their opinions,
they have become an attractive playground for malicious entities
that want to spread misinformation and leverage its effect.
In fact, misinformation spreads easily on OSNs and is a serious
threat to modern society, possibly influencing the outcome
of elections or even putting people’s lives at risk (e.g., through
“anti-vaccine” misinformation). It is therefore of paramount
importance for society to have some form of “validation” of
information spreading through OSNs. Wide-scale validation would
greatly benefit from automatic tools.
In this paper, we show that it is difficult to automatically
classify misinformation using only structural properties of content
propagation cascades. We focus on structural properties because they
would be inherently difficult to manipulate with the aim of
circumventing classification systems. To support our claim, we carry
out an extensive evaluation on Facebook posts belonging to conspiracy
theories (as representative of misinformation) and scientific news
(as representative of fact-checked content). Our findings show that
conspiracy content actually reverberates in a way that is hard to
distinguish from that of scientific content: for the classification
mechanisms we investigated, the classification F1-score never exceeds
0.65 during the content propagation stages, and remains below 0.7 even
after propagation is complete.
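To make the setting concrete, here is a minimal sketch of classifying cascades from structural features alone, assuming a standard descriptor set (size, depth, maximum breadth, structural virality) and a hand-rolled F1; the paper's actual feature set and classifiers may differ.

```python
import networkx as nx
from collections import Counter

def cascade_features(tree, root):
    """Structural descriptors of a propagation cascade, modelled as a
    tree rooted at the original post: size, depth, maximum breadth, and
    structural virality (mean pairwise shortest-path distance)."""
    depths = nx.single_source_shortest_path_length(tree, root)
    n = tree.number_of_nodes()
    pair_sum = sum(d for _, targets in nx.all_pairs_shortest_path_length(tree)
                   for d in targets.values())
    return {
        "size": n,
        "depth": max(depths.values()),
        "breadth": max(Counter(depths.values()).values()),
        "virality": pair_sum / (n * (n - 1)) if n > 1 else 0.0,
    }

def f1_score(y_true, y_pred):
    """Hand-rolled F1 so the sketch needs no ML library."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

A broadcast-style cascade (a star) and a viral chain (a path) separate cleanly on these descriptors; the paper's finding is that real conspiracy and science cascades do not separate nearly as well.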
Reconstructing Graph Diffusion History from a Single Snapshot
Diffusion on graphs is ubiquitous with numerous high-impact applications. In
these applications, complete diffusion histories play an essential role in
terms of identifying dynamical patterns, reflecting on precaution actions, and
forecasting intervention effects. Despite their importance, complete diffusion
histories are rarely available and are highly challenging to reconstruct due to
ill-posedness, explosive search space, and scarcity of training data. To date,
few methods exist for diffusion history reconstruction. They are exclusively
based on the maximum likelihood estimation (MLE) formulation and require
knowledge of the true diffusion parameters. In this paper, we study an even
harder problem, namely reconstructing Diffusion history from A single SnapsHot
(DASH), where
we seek to reconstruct the history from only the final snapshot without knowing
true diffusion parameters. We start with theoretical analyses that reveal a
fundamental limitation of the MLE formulation. We prove: (a) estimation error
of diffusion parameters is unavoidable due to NP-hardness of diffusion
parameter estimation, and (b) the MLE formulation is sensitive to estimation
error of diffusion parameters. To overcome the inherent limitation of the MLE
formulation, we propose a novel barycenter formulation: finding the barycenter
of the posterior distribution of histories, which is provably stable against
the estimation error of diffusion parameters. We further develop an effective
solver named DIffusion hiTting Times with Optimal proposal (DITTO) by reducing
the problem to estimating posterior expected hitting times via the
Metropolis--Hastings Markov chain Monte Carlo method (M--H MCMC) and employing
an unsupervised graph neural network to learn an optimal proposal to accelerate
the convergence of M--H MCMC. We conduct extensive experiments to demonstrate
the efficacy of the proposed method.
Comment: Full version of the KDD 2023 paper. Our code is available at
https://github.com/q-rz/KDD23-DITT
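The reduction at the heart of DITTO, estimating a posterior expectation with Metropolis-Hastings MCMC, can be illustrated on a toy target: the chain below runs over a small ring of states rather than over diffusion histories, and the proposal is a plain symmetric step, not the paper's learned GNN proposal.

```python
import random

def mh_expectation(weights, steps=100_000, seed=0):
    """Estimate E[i] under the unnormalized target `weights` (all > 0)
    with Metropolis-Hastings: a symmetric +/-1 proposal on a ring of
    states, accepted with probability min(1, w[y]/w[x]).  A toy
    stand-in for DITTO's structured chain over diffusion histories."""
    rng = random.Random(seed)
    n = len(weights)
    x, total = 0, 0.0
    for _ in range(steps):
        y = (x + rng.choice((-1, 1))) % n        # symmetric ring proposal
        if rng.random() < min(1.0, weights[y] / weights[x]):
            x = y                                # accept the move
        total += x                               # running sum for the mean
    return total / steps
```

For the symmetric target `[1, 2, 3, 2, 1]` the exact expectation of the state index is 2.0, and the chain's running mean converges to it; DITTO's GNN-learned proposal plays the role of making this convergence fast on the much larger space of histories.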
Rumour source detection in social networks using partial observations
The spread of information on graphs has been extensively studied in engineering, biology, and economics. Recently, however, several authors have started to address the more challenging inverse problem of localizing the origin of an epidemic, given observed traces of infection. In this paper, we introduce a novel technique to estimate the location of a source of multiple epidemics on a general graph, assuming knowledge of the start times of the rumours and using observations from a small number of monitors.
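Under the simplifying assumptions of a known start time and deterministic unit per-hop delays (the abstract's setting is more general), monitor-based localization reduces to a least-squares fit over candidate sources; the sketch below is an illustration of this idea, not the paper's estimator.

```python
import networkx as nx

def localize_from_monitors(G, start_time, arrivals):
    """Rank candidate sources by how well shortest-path delays explain
    the arrival times observed at a few monitor nodes.  Assumes a known
    start time and unit per-hop delay; `arrivals` maps each monitor
    node to its observed infection time."""
    best, best_err = None, float("inf")
    for s in G.nodes:
        d = nx.single_source_shortest_path_length(G, s)
        err = sum((arrivals[m] - start_time - d[m]) ** 2 for m in arrivals)
        if err < best_err:
            best, best_err = s, err
    return best
```

On a 4x4 grid, three monitors already pin down the source exactly when the observations are noise-free; with noisy delays the same least-squares score simply returns the best-explaining candidate.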
Back to the Source: an Online Approach for Sensor Placement and Source Localization
Source localization, the act of finding the originator of a disease or rumor in a network, has become an important problem in sociology and epidemiology. The localization is done using the infection state and time of infection of a few designated sensor nodes; however, maintaining sensors can be very costly in practice. We propose the first online approach to source localization: we deploy a priori only a small number of sensors (which reveal if they are reached by an infection) and then iteratively choose the best location to place new sensors in order to localize the source. This approach allows for source localization with a very small number of sensors; moreover, the source can be found while the epidemic is still ongoing. Our method applies to a general network topology and performs well even with random transmission delays.
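A minimal sketch of the online idea, assuming deterministic unit delays so that each placed sensor reveals its exact distance to the source (the paper additionally handles random transmission delays): greedily place the sensor whose observation would split the surviving candidate set most evenly.

```python
import networkx as nx
from collections import Counter

def next_sensor(G, candidates, placed):
    """Greedy online placement: pick the node whose observed distance to
    the source would split the current candidate set most evenly, i.e.
    minimize the largest group of candidates sharing one distance value."""
    best, best_worst = None, float("inf")
    for v in G.nodes:
        if v in placed:
            continue
        d = nx.single_source_shortest_path_length(G, v)
        worst = max(Counter(d[c] for c in candidates).values())
        if worst < best_worst:
            best, best_worst = v, worst
    return best

def localize(G, source, budget):
    """Simulate the online loop: each placed sensor reveals its distance
    to the (hidden) source, and only candidates consistent with every
    observation survive."""
    candidates, placed = set(G.nodes), []
    while len(candidates) > 1 and len(placed) < budget:
        v = next_sensor(G, candidates, placed)
        placed.append(v)
        d = nx.single_source_shortest_path_length(G, v)
        candidates = {c for c in candidates if d[c] == d[source]}
    return candidates, placed
```

On a path graph a single well-chosen sensor (an endpoint) already identifies the source, since every candidate sits at a distinct distance from it; general graphs require a few more rounds, which is exactly the budget the online strategy tries to keep small.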