Toward automatic censorship detection in microblogs
Social media is an area where users often experience censorship through a
variety of means such as the restriction of search terms or active and
retroactive deletion of messages. In this paper we examine the feasibility of
automatically detecting censorship of microblogs. We use a network growing
model to simulate discussion over a microblog follow network and compare two
censorship strategies to simulate varying levels of message deletion. Using
topological features extracted from the resulting graphs, a classifier is
trained to detect whether or not a given communication graph has been censored.
The results show that censorship detection is feasible under empirically
measured levels of message deletion. The proposed framework can enable
automated censorship measurement and tracking, which, when combined with
aggregated citizen reports of censorship, can allow users to make informed
decisions about online communication habits.
Comment: 13 pages. Updated with example cascades figure and typo fixes. To
appear at the International Workshop on Data Mining in Social Networks
(PAKDD-SocNet) 201
Disease spread over randomly switched large-scale networks
In this paper we study disease spread over a randomly switched network, which
is modeled by a stochastic switched differential equation based on the
so-called N-intertwined model for disease spread over static networks. Assuming
that all the edges of the network are independently switched, we present
sufficient conditions for the convergence of infection probability to zero.
Though the stability theory for switched linear systems can naively derive a
necessary and sufficient condition for the convergence, the condition cannot be
used for large-scale networks because, for a network with n agents, it
requires computing the maximum real eigenvalue of a matrix of size exponential
in n. On the other hand, our conditions, which are also based on the spectral
theory of random matrices can be checked by computing the maximum real
eigenvalue of a matrix of size exactly
Statistical clustering of temporal networks through a dynamic stochastic block model
Statistical node clustering in discrete time dynamic networks is an emerging
field that raises many challenges. Here, we explore statistical properties and
frequentist inference in a model that combines a stochastic block model (SBM)
for its static part with independent Markov chains for the evolution of the
node groups through time. We model binary data as well as weighted dynamic
random graphs (with discrete or continuous edge values). Our approach,
motivated by the importance of controlling for label switching issues across
the different time steps, focuses on detecting groups characterized by a stable
within-group connectivity behavior. We study identifiability of the model
parameters, propose an inference procedure based on a variational expectation
maximization algorithm as well as a model selection criterion to select the
number of groups. We carefully discuss our initialization strategy, which plays
an important role in the method and compare our procedure with existing ones on
synthetic datasets. We also illustrate our approach on dynamic contact
networks, one of encounters among high school students and two others on animal
interactions. An implementation of the method is available as an R package
called dynsbm
Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models
Motivated by a real-life problem of sharing social network data that contain
sensitive personal information, we propose a novel approach to release and
analyze synthetic graphs in order to protect privacy of individual
relationships captured by the social network while maintaining the validity of
statistical results. A case study using a version of the Enron e-mail corpus
dataset demonstrates the application and usefulness of the proposed techniques
in solving the challenging problem of maintaining privacy and supporting
open access to network data to ensure reproducibility of existing studies and
discovering new scientific insights that can be obtained by analyzing such
data. We use a simple yet effective randomized response mechanism to generate
synthetic networks under ε-edge differential privacy, and then use
likelihood-based inference for missing data and Markov chain Monte Carlo
techniques to fit exponential-family random graph models to the generated
synthetic networks.
Comment: Updated, 39 page
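As a rough illustration of the randomized response idea mentioned in this abstract, the sketch below flips each dyad (possible edge) of an undirected graph independently with probability 1 / (1 + e^ε). With this symmetric flip probability, the likelihood ratio for any single dyad is bounded by e^ε, which is the usual edge differential privacy guarantee. The function name, the edge-list representation, and the use of a single flip probability for all dyads are assumptions made for illustration; the mechanism used in the paper may differ in its details.

```python
import math
import random
from itertools import combinations


def randomized_response_graph(nodes, edges, epsilon, rng=None):
    """Release a synthetic undirected graph via symmetric randomized response.

    Hypothetical helper (not from the paper): every dyad is flipped
    independently with probability 1 / (1 + e^epsilon).
    """
    rng = rng or random.Random()
    flip_prob = 1.0 / (1.0 + math.exp(epsilon))
    true_edges = {frozenset(e) for e in edges}

    released = set()
    for u, v in combinations(nodes, 2):
        dyad = frozenset((u, v))
        present = dyad in true_edges
        # Flip the true state of this dyad with probability flip_prob.
        if rng.random() < flip_prob:
            present = not present
        if present:
            released.add(dyad)
    return released
```

A smaller ε gives a flip probability closer to 1/2 and therefore stronger privacy but a noisier synthetic graph. Any model fitted to the released graph has to account for the known flipping process, which is the role the abstract assigns to likelihood-based inference for missing data.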
Properties of Healthcare Teaming Networks as a Function of Network Construction Algorithms
Network models of healthcare systems can be used to examine how providers
collaborate, communicate, and refer patients to each other. Most healthcare service
network models have been constructed from patient claims data, using billing
claims to link patients with providers. The data sets can be quite large,
making standard methods for network construction computationally challenging
and thus requiring the use of alternate construction algorithms. While these
alternate methods have seen increasing use in generating healthcare networks,
there is little to no literature comparing the differences in the structural
properties of the generated networks. To address this issue, we compared the
properties of healthcare networks constructed using different algorithms and
the 2013 Medicare Part B outpatient claims data. Three different algorithms
were compared: binning, sliding frame, and trace-route. Unipartite networks
linking either providers or healthcare organizations by shared patients were
built using each method. We found that each algorithm produced networks with
substantially different topological properties. Provider networks adhered to a
power law, and organization networks to a power law with exponential cutoff.
Censoring networks to exclude edges with fewer than 11 shared patients, a common
de-identification practice for healthcare network data, markedly reduced edge
numbers and greatly altered measures of vertex prominence such as the
betweenness centrality. We identified patterns in the distance patients travel
between network providers, and most strikingly between providers in the
Northeast United States and Florida. We conclude that the choice of network
construction algorithm is critical for healthcare network analysis, and discuss
the implications for selecting the algorithm best suited to the type of
analysis to be performed.
Comment: With links to comprehensive, high resolution figures and networks via
figshare.co
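To make the shared-patient construction and the censoring step concrete, here is a minimal sketch. It assumes claims are available as (patient, provider) pairs and links two providers whenever they share a patient, with no time window. This is a naive projection and not the binning, sliding frame, or trace-route algorithms compared in the paper. The function name and input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_patient_network(claims, min_shared=11):
    """Build a unipartite provider network weighted by shared patients.

    claims: iterable of (patient_id, provider_id) pairs (assumed format).
    Edges with fewer than min_shared shared patients are censored.
    """
    providers_by_patient = defaultdict(set)
    for patient, provider in claims:
        providers_by_patient[patient].add(provider)

    shared = defaultdict(int)
    for providers in providers_by_patient.values():
        # Every pair of providers seen by the same patient shares that patient.
        for a, b in combinations(sorted(providers), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}
```

Calling the function with min_shared=1 keeps every edge, so comparing that result with the default threshold shows how many edges the censoring removes.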
Theories for influencer identification in complex networks
In social and biological systems, the structural heterogeneity of interaction
networks gives rise to the emergence of a small set of influential nodes, or
influencers, in a series of dynamical processes. Although much smaller than the
entire network, these influencers were observed to be able to shape the
collective dynamics of large populations in different contexts. As such, the
successful identification of influencers should have profound implications in
various real-world spreading dynamics such as viral marketing, epidemic
outbreaks and cascading failure. In this chapter, we first summarize the
centrality-based approach in finding single influencers in complex networks,
and then discuss the more complicated problem of locating multiple influencers
from a collective point of view. Progress rooted in collective influence
theory, belief-propagation and computer science will be presented. Finally, we
present some applications of influencer identification in diverse real-world
systems, including online social platforms, scientific publication, brain
networks and socioeconomic systems.
Comment: 24 pages, 6 figure