What Stops Social Epidemics?
Theoretical progress in understanding the dynamics of spreading processes on
graphs suggests the existence of an epidemic threshold below which no epidemics
form and above which epidemics spread to a significant fraction of the graph.
We have observed information cascades on the social media site Digg that spread
fast enough for one initial spreader to infect hundreds of people, yet end up
affecting only 0.1% of the entire network. We find that two effects, previously
studied in isolation, combine cooperatively to drastically limit the final size
of cascades on Digg. First, because of the highly clustered structure of the
Digg network, most people who are aware of a story have been exposed to it via
multiple friends. This structure lowers the epidemic threshold while moderately
slowing the overall growth of cascades. In addition, we find that the mechanism
for social contagion on Digg points to a fundamental difference between
information spread and other contagion processes: despite multiple
opportunities for infection within a social group, people are less likely to
become spreaders of information with repeated exposure. The consequences of
this mechanism become more pronounced for more clustered graphs. Ultimately,
this effect severely curtails the size of social epidemics on Digg.
Comment: 8 pages, 10 figures, accepted in ICWSM1
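The mechanism this abstract describes, an activation probability that decays with repeated exposure on a clustered graph, can be sketched as a toy simulation. The ring-lattice topology, the decay form p0/m, and all parameter values below are illustrative assumptions, not the paper's fitted model:

```python
import random

def ring_lattice(n, k):
    # Clustered graph: each node links to its k nearest neighbors on each side.
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0]
            for i in range(n)}

def cascade_size(graph, p_exposure, seed=0):
    """Spread from `seed`; a node activates on its m-th exposure
    with probability p_exposure(m)."""
    exposures = {v: 0 for v in graph}
    active, frontier = {seed}, [seed]
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph[u]:
                if v in active:
                    continue
                exposures[v] += 1
                if random.random() < p_exposure(exposures[v]):
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

random.seed(1)
g = ring_lattice(2000, 3)
p0 = 0.3
# Ordinary contagion: every exposure succeeds with the same probability.
constant = sum(cascade_size(g, lambda m: p0) for _ in range(20)) / 20
# Information-like contagion: repeated exposures grow less effective.
decaying = sum(cascade_size(g, lambda m: p0 / m) for _ in range(20)) / 20
```

Because clustered graphs deliver many redundant exposures, the decaying-response rule wastes most of them, so its average cascade size stays at or below that of the constant-probability contagion.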
Phantom cascades: The effect of hidden nodes on information diffusion
Research on information diffusion generally assumes complete knowledge of the
underlying network. However, in the presence of factors such as increasing
privacy awareness, restrictions on application programming interfaces (APIs)
and sampling strategies, this assumption rarely holds in the real world, which
in turn leads to an underestimation of the size of information cascades. In
this work we study the effect of hidden network structure on information
diffusion processes. We characterise information cascades through activation
paths traversing visible and hidden parts of the network. We quantify diffusion
estimation error while varying the amount of hidden structure in five empirical
and synthetic network datasets and demonstrate the effect of topological
properties on this error. Finally, we offer practical recommendations for
practitioners and propose a model to predict the cascade size with minimal
information about the underlying network.
Comment: Preprint submitted to Elsevier Computer Communication
Information is not a Virus, and Other Consequences of Human Cognitive Limits
The many decisions people make about what to pay attention to online shape
the spread of information in online social networks. Due to the constraints of
available time and cognitive resources, the ease of discovery strongly impacts
how people allocate their attention to social media content. As a consequence,
the position of information in an individual's social feed, as well as explicit
social signals about its popularity, determine whether it will be seen, and the
likelihood that it will be shared with followers. Accounting for these
cognitive limits simplifies the mechanics of information diffusion in online social
networks and explains puzzling empirical observations: (i) information
generally fails to spread in social media and (ii) highly connected people are
less likely to re-share information. Studies of information diffusion on
different social media platforms reviewed here suggest that the interplay
between human cognitive limits and network structure differentiates the spread
of information from other social contagions, such as the spread of a virus
through a population.
Comment: accepted for publication in Future Interne
Why Do Cascade Sizes Follow a Power-Law?
We introduce a random directed acyclic graph and use it to model the
information diffusion network. Subsequently, we analyze the cascade generation
model (CGM) introduced by Leskovec et al. [19]. Until now, only empirical
studies of this model have been done. In this paper, we present the first
theoretical proof that the sizes of cascades generated by the CGM follow a
power-law distribution, which is consistent with multiple empirical analyses of
large social networks. We compare the assumptions of our model with the
Twitter social network and test the goodness of the approximation.
Comment: 8 pages, 7 figures, accepted to WWW 201
Non-Conservative Diffusion and its Application to Social Network Analysis
The random walk is fundamental to modeling dynamic processes on networks.
Metrics based on the random walk have been used in many applications from image
processing to Web page ranking. However, how appropriate are random walks for
modeling and analyzing social networks? We argue that unlike a random walk,
which conserves the quantity diffusing on a network, many interesting social
phenomena, such as the spread of information or disease on a social network,
are fundamentally non-conservative. When an individual infects her neighbor
with a virus, the total amount of infection increases. We classify diffusion
processes as conservative and non-conservative and show how these differences
impact the choice of metrics used for network analysis, as well as our
understanding of network structure and behavior. We show that Alpha-Centrality,
which mathematically describes non-conservative diffusion, leads to new
insights into the behavior of spreading processes on networks. We give a
scalable approximate algorithm for computing the Alpha-Centrality in a massive
graph. We validate our approach on real-world online social networks of Digg.
We show that a non-conservative metric, such as Alpha-Centrality, produces
better agreement with empirical measures of influence than conservative metrics,
such as PageRank. We hope that our investigation will inspire further
exploration into the realms of conservative and non-conservative metrics in
social network analysis.
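A common formulation of Alpha-Centrality (following Bonacich) is c = (I - alpha * A^T)^{-1} s, which can be approximated scalably by truncating the power series c = s + alpha*A^T s + alpha^2*(A^T)^2 s + ... The sketch below illustrates that series on a toy graph; it is not the paper's exact algorithm, and the graph and parameters are assumptions:

```python
def alpha_centrality(adj, alpha, s, iters=50):
    """Approximate c = (I - alpha * A^T)^{-1} s via the truncated series
    c = s + alpha*A^T s + alpha^2*(A^T)^2 s + ...; this converges when
    alpha is below 1 / lambda_max of the adjacency matrix A."""
    n = len(adj)
    c = list(s)
    term = list(s)
    for _ in range(iters):
        new = [0.0] * n
        for i in adj:
            for j in adj[i]:          # directed edge i -> j
                new[j] += alpha * term[i]
        term = new
        c = [c[i] + term[i] for i in range(n)]
    return c

# Toy directed graph: 0 -> 1, 0 -> 2, 1 -> 2.
adj = {0: [1, 2], 1: [2], 2: []}
c = alpha_centrality(adj, alpha=0.5, s=[1.0, 1.0, 1.0])
# Node 2 sits at the end of the most incoming paths, so it scores highest.
```

Note the non-conservative character: each propagation step adds new "influence mass" rather than redistributing a fixed total, unlike a random-walk metric such as PageRank.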
Quantifying Information Overload in Social Media and its Impact on Social Contagions
Information overload has become a ubiquitous problem in modern society.
Social media users and microbloggers receive an endless flow of information,
often at a rate far higher than their cognitive abilities to process the
information. In this paper, we conduct a large scale quantitative study of
information overload and evaluate its impact on information dissemination in
the Twitter social media site. We model social media users as information
processing systems that queue incoming information according to some policies,
process information from the queue at some unknown rates and decide to forward
some of the incoming information to other users. We show how timestamped data
about tweets received and forwarded by users can be used to uncover key
properties of their queueing policies and estimate their information processing
rates and limits. Such an understanding of users' information processing
behaviors allows us to infer whether and to what extent users suffer from
information overload.
Our analysis provides empirical evidence of information processing limits for
social media users and the prevalence of information overloading. The most
active and popular social media users are often the ones that are overloaded.
Moreover, we find that the rate at which users receive information impacts
their processing behavior, including how they prioritize information from
different sources, how much information they process, and how quickly they
process information. Finally, the susceptibility of a social media user to
social contagions depends crucially on the rate at which she receives
information. An exposure to a piece of information, be it an idea, a convention
or a product, is much less effective for users that receive information at
higher rates, meaning they need more exposures to adopt a particular contagion.
Comment: To appear at ICWSM '1
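The queueing framing can be sketched in a few lines: treat the user as a bounded feed processed from the top at a fixed rate, so the fraction of received items actually processed falls as the arrival rate rises. The LIFO discipline, the bounded queue, and all rates below are illustrative assumptions, not the paper's inferred policies:

```python
import random
from collections import deque

def fraction_processed(arrival_prob, service_limit, steps, rng, max_queue=50):
    """Discrete-time sketch of a user as a bounded attention queue:
    newest items sit at the top of the feed, old items fall off the
    bottom, and at most `service_limit` items are processed per step."""
    queue = deque(maxlen=max_queue)   # bounded feed: old items drop off
    arrived = processed = 0
    for _ in range(steps):
        n_new = sum(rng.random() < arrival_prob for _ in range(10))
        arrived += n_new
        for _ in range(n_new):
            queue.appendleft(object())        # newest at the top
        for _ in range(min(service_limit, len(queue))):
            queue.popleft()                   # process most recent first
            processed += 1
    return processed / max(arrived, 1)

rng = random.Random(3)
low = fraction_processed(0.1, 3, 500, rng)    # ~1 item/step: not overloaded
high = fraction_processed(0.9, 3, 500, rng)   # ~9 items/step: overloaded
```

Under overload, most arriving items are never processed at all, which is one mechanism by which a single exposure becomes less effective for high-rate users.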
Computational Social Scientist Beware: Simpson's Paradox in Behavioral Data
Observational data about human behavior is often heterogeneous, i.e.,
generated by subgroups within the population under study that vary in size and
behavior. Heterogeneity predisposes analysis to Simpson's paradox, whereby the
trends observed in data that has been aggregated over the entire population may
be substantially different from those of the underlying subgroups. I illustrate
Simpson's paradox with several examples coming from studies of online behavior
and show that the aggregate response can lead to wrong conclusions about the
underlying individual behavior. I then present a simple method to test whether
Simpson's paradox is affecting results of analysis. The presence of Simpson's
paradox in social data suggests that important behavioral differences exist
within the population, and failure to take these differences into account can
distort the studies' findings.
Comment: to appear in Journal of Computational Social Science
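A compact way to see the paradox is the classic kidney-stone treatment data (Charig et al., a standard textbook example): treatment A has the higher success rate within both subgroups, yet the lower rate in aggregate, because the two treatments were applied to case mixes of very different difficulty:

```python
def rate(successes, total):
    return successes / total

# Classic kidney-stone counts: (successes, cases) per treatment and subgroup.
A_small, B_small = (81, 87), (234, 270)      # small stones (easier cases)
A_large, B_large = (192, 263), (55, 80)      # large stones (harder cases)

# A wins within each subgroup...
a_small_better = rate(*A_small) > rate(*B_small)   # 0.93 vs 0.87
a_large_better = rate(*A_large) > rate(*B_large)   # 0.73 vs 0.69

# ...but loses in aggregate, because A was mostly given the hard cases.
A_total = (81 + 192, 87 + 263)    # (273, 350)
B_total = (234 + 55, 270 + 80)    # (289, 350)
b_wins_overall = rate(*B_total) > rate(*A_total)   # 0.83 vs 0.78
```

Disaggregating by subgroup, the kind of check the abstract's test formalizes, reveals that the aggregate comparison is confounded by the uneven allocation of cases.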