Pushing Your Point of View: Behavioral Measures of Manipulation in Wikipedia
As a major source for information on virtually any topic, Wikipedia serves an
important role in public dissemination and consumption of knowledge. As a
result, it presents tremendous potential for people to promulgate their own
points of view; such efforts may be more subtle than typical vandalism. In this
paper, we introduce new behavioral metrics to quantify the level of controversy
associated with a particular user: a Controversy Score (C-Score) based on the
amount of attention the user focuses on controversial pages, and a Clustered
Controversy Score (CC-Score) that also takes into account topical clustering.
We show that both these measures are useful for identifying people who try to
"push" their points of view, by showing that they are good predictors of which
editors get blocked. The metrics can be used to triage potential POV pushers.
We apply this idea to a dataset of users who requested promotion to
administrator status and easily identify some editors who significantly changed
their behavior upon becoming administrators. At the same time, such behavior is
not rampant. Those who are promoted to administrator status tend to have more
stable behavior than comparable groups of prolific editors. This suggests that
the Adminship process works well, and that the Wikipedia community is not
overwhelmed by users who become administrators to promote their own points of
view.
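
The abstract describes the C-Score only as reflecting how much of a user's attention goes to controversial pages; it does not give a formula. As a minimal sketch of that idea, and not the paper's actual definition, the score below is the edit-weighted mean of a per-page controversy value. The function name `c_score`, the page names, and the controversy values in [0, 1] are all hypothetical.

```python
from collections import Counter

def c_score(edits, controversy):
    """Edit-weighted mean controversy of the pages a user edited.

    edits: list of page titles, one entry per edit (assumed input format).
    controversy: dict mapping page title -> controversy value in [0, 1]
                 (assumed to be supplied by some external measure).
    """
    counts = Counter(edits)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # a user with no edits gets a neutral score
    weighted = sum(n * controversy.get(page, 0.0) for page, n in counts.items())
    return weighted / total

# Hypothetical user: three edits on a controversial page, one on a calm page.
edits = ["page_a", "page_a", "page_a", "page_b"]
controversy = {"page_a": 1.0, "page_b": 0.0}
score = c_score(edits, controversy)
```

Under these assumptions a user who edits only controversial pages scores close to 1, while one who spreads edits over uncontroversial pages scores close to 0. The clustered variant (CC-Score) would additionally need a topical clustering of pages, which is not sketched here.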
Temporal similarity metrics for latent network reconstruction: The role of time-lag decay
When investigating the spreading of a piece of information or the diffusion
of an innovation, we often lack information on the underlying propagation
network. Reconstructing the hidden propagation paths based on the observed
diffusion process is a challenging problem which has recently attracted
attention from diverse research fields. To address this reconstruction problem,
based on static similarity metrics commonly used in the link prediction
literature, we introduce new node-node temporal similarity metrics. The new
metrics take as input the time-series of multiple independent spreading
processes, based on the hypothesis that two nodes are more likely to be
connected if they were often infected at similar points in time. This
hypothesis is implemented by introducing a time-lag function which penalizes
distant infection times. We find that the choice of this time-lag function
strongly affects the metrics' reconstruction accuracy, depending on the
network's clustering coefficient, and we provide an extensive comparative
analysis of static and temporal similarity metrics for network reconstruction.
Our findings
shed new light on the notion of similarity between pairs of nodes in complex
networks.
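
To make the time-lag idea concrete, here is a minimal sketch of one possible temporal similarity: for each spreading process in which both nodes were infected, it adds a term that decays with the gap between their infection times. The exponential decay, the parameter `tau`, and the function name `temporal_similarity` are assumptions chosen for illustration; the abstract does not state which time-lag functions the authors use.

```python
import math

def temporal_similarity(times_i, times_j, tau=1.0):
    """Sum of exponentially decayed time gaps over shared spreading processes.

    times_i, times_j: infection times of two nodes, one entry per process,
                      with None where the node was never infected.
    tau: decay scale of the (assumed) exponential time-lag function.
    """
    total = 0.0
    for t_i, t_j in zip(times_i, times_j):
        if t_i is None or t_j is None:
            continue  # the process did not reach both nodes
        total += math.exp(-abs(t_i - t_j) / tau)
    return total

# Hypothetical infection times over three independent processes.
node_a = [1.0, 2.0, None]
node_b = [1.0, 4.0, 3.0]
sim = temporal_similarity(node_a, node_b, tau=2.0)
```

Node pairs would then be ranked by this score, with the highest-scoring pairs taken as candidate links. A small `tau` rewards only near-simultaneous infections, while a large `tau` makes the score approach a plain count of co-infections.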
Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention in Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, a high-accuracy supervisory signal, and multilanguage
sentiment prediction while respecting every applicable privacy request. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysi