Distilling Information Reliability and Source Trustworthiness from Digital Traces
Online knowledge repositories typically rely on their users or dedicated
editors to evaluate the reliability of their content. These evaluations can be
viewed as noisy measurements of both information reliability and information
source trustworthiness. Can we leverage these noisy evaluations, often biased,
to distill a robust, unbiased and interpretable measure of both notions?
In this paper, we argue that the temporal traces left by these noisy
evaluations give cues on the reliability of the information and the
trustworthiness of the sources. Then, we propose a temporal point process
modeling framework that links these temporal traces to robust, unbiased and
interpretable notions of information reliability and source trustworthiness.
Furthermore, we develop an efficient convex optimization procedure to learn the
parameters of the model from historical traces. Experiments on real-world data
gathered from Wikipedia and Stack Overflow show that our modeling framework
accurately predicts evaluation events, provides an interpretable measure of
information reliability and source trustworthiness, and yields interesting
insights about real-world events. Comment: Accepted at 26th World Wide Web conference (WWW-17)
Fake News Detection in Social Networks via Crowd Signals
Our work considers leveraging crowd signals for detecting fake news and is
motivated by tools recently introduced by Facebook that enable users to flag
fake news. By aggregating users' flags, our goal is to select a small subset of
news every day, send them to an expert (e.g., via a third-party fact-checking
organization), and stop the spread of news identified as fake by an expert. The
main objective of our work is to minimize the spread of misinformation by
stopping the propagation of fake news in the network. It is especially
challenging to achieve this objective as it requires detecting fake news with
high-confidence as quickly as possible. We show that in order to leverage
users' flags efficiently, it is crucial to learn about users' flagging
accuracy. We develop a novel algorithm, DETECTIVE, that performs Bayesian
inference for detecting fake news and jointly learns about users' flagging
accuracy over time. Our algorithm employs posterior sampling to actively trade
off exploitation (selecting news that maximize the objective value at a given
epoch) and exploration (selecting news that maximize the value of information
towards learning about users' flagging accuracy). We demonstrate the
effectiveness of our approach via extensive experiments and show the power of
leveraging community signals for fake news detection.
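
The exploration/exploitation trade-off described in this abstract can be illustrated with a small posterior-sampling sketch. This is not the DETECTIVE algorithm itself. It is a simplified stand-in that assumes each user's flagging accuracy has a Beta posterior, that flags are combined with a naive-Bayes rule, and that users who do not flag are ignored. All function names and the 0.5 prior are hypothetical.

```python
import math
import random


def sample_accuracies(posteriors, rng):
    """Draw one plausible accuracy per user from its Beta(alpha, beta) posterior."""
    return {user: rng.betavariate(a, b) for user, (a, b) in posteriors.items()}


def fake_probability(flaggers, accuracies, prior=0.5):
    """Naive-Bayes estimate that an item is fake, given the users who flagged it.

    Assumption: a flag from a user with accuracy `acc` is `acc` likely under
    "fake" and `1 - acc` likely under "not fake".
    """
    log_odds = math.log(prior / (1.0 - prior))
    for user in flaggers:
        acc = min(max(accuracies[user], 1e-6), 1.0 - 1e-6)  # avoid log(0)
        log_odds += math.log(acc / (1.0 - acc))
    return 1.0 / (1.0 + math.exp(-log_odds))


def select_for_review(flags, posteriors, k, rng):
    """Pick the k items that look most likely fake under one posterior sample."""
    accuracies = sample_accuracies(posteriors, rng)
    scores = {item: fake_probability(users, accuracies) for item, users in flags.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def update(posteriors, flaggers, is_fake):
    """After the expert's verdict, credit or penalise every user who flagged the item."""
    for user in flaggers:
        a, b = posteriors[user]
        posteriors[user] = (a + 1, b) if is_fake else (a, b + 1)


rng = random.Random(0)
posteriors = {"alice": (1, 1), "bob": (1, 1)}          # uniform priors on accuracy
flags = {"story-1": ["alice"], "story-2": ["alice", "bob"]}
chosen = select_for_review(flags, posteriors, k=1, rng=rng)
update(posteriors, flags[chosen[0]], is_fake=True)      # pretend the expert said "fake"
```

Because the accuracies are sampled rather than fixed at their mean, users with little history occasionally look very reliable or very unreliable, which is what drives exploration in this kind of scheme.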
Can Who-Edits-What Predict Edit Survival?
As the number of contributors to online peer-production systems grows, it
becomes increasingly important to predict whether the edits that users make
will eventually be beneficial to the project. Existing solutions either rely on
a user reputation system or consist of a highly specialized predictor that is
tailored to a specific peer-production system. In this work, we explore a
different point in the solution space that goes beyond user reputation but does
not involve any content-based feature of the edits. We view each edit as a game
between the editor and the component of the project. We posit that the
probability that an edit is accepted is a function of the editor's skill, of
the difficulty of editing the component and of a user-component interaction
term. Our model is broadly applicable, as it only requires observing data about
who makes an edit, what the edit affects and whether the edit survives or not.
We apply our model to Wikipedia and the Linux kernel, two examples of
large-scale peer-production systems, and we seek to understand whether it can
effectively predict edit survival: in both cases, we provide a positive answer.
Our approach significantly outperforms those based solely on user reputation
and bridges the gap with specialized predictors that use content-based
features. It is simple to implement, computationally inexpensive, and in
addition it enables us to discover interesting structure in the data. Comment: Accepted at KDD 201
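
One way to read the model in this abstract is as a logistic function of three terms: editor skill, component difficulty, and a user-component interaction. The sketch below assumes that form, with the interaction written as a dot product of latent vectors. The abstract does not state the exact parameterisation, so the function name, the sign convention, and the `bias` term are illustrative assumptions.

```python
import math


def survival_probability(skill, difficulty, user_vec, item_vec, bias=0.0):
    """Probability that an edit survives, under an assumed logistic form.

    skill      -- scalar skill of the editor (higher is better)
    difficulty -- scalar difficulty of editing the component
    user_vec, item_vec -- latent vectors whose dot product is the
                          user-component interaction term (assumption)
    """
    interaction = sum(x * y for x, y in zip(user_vec, item_vec))
    score = skill - difficulty + interaction + bias
    return 1.0 / (1.0 + math.exp(-score))


# An average editor on an average component, with no interaction signal.
baseline = survival_probability(0.0, 0.0, [0.0, 0.0], [0.0, 0.0])

# The same editor on a harder component where the latent vectors align.
aligned = survival_probability(0.0, 1.0, [1.0, 0.5], [1.0, 1.0])
```

Note that the only inputs are who made the edit, which component it touched, and (for training) whether it survived, which matches the abstract's claim that no content-based features are needed.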
UNIPoint: Universally Approximating Point Processes Intensities
Point processes are a useful mathematical tool for describing events over
time, and so there are many recent approaches for representing and learning
them. One notable open question is how to precisely describe the flexibility of
point process models and whether there exists a general model that can
represent all point processes. Our work bridges this gap. Focusing on the
widely used event intensity function representation of point processes, we
provide a proof that a class of learnable functions can universally approximate
any valid intensity function. The proof connects the well-known
Stone-Weierstrass theorem for function approximation, the uniform density of
non-negative continuous functions using a transfer function, the formulation
of the parameters of a piecewise continuous function as a dynamic system, and
a recurrent neural network implementation for capturing the dynamics. Using
these insights, we design and implement UNIPoint, a novel neural point process
model, using recurrent neural networks to parameterise sums of basis functions
upon each event. Evaluations on synthetic and real-world datasets show that
this simpler representation performs better than Hawkes process variants and
more complex neural network-based approaches. We expect this result will
provide a practical basis for selecting and tuning models, as well as
furthering theoretical work on representational complexity and learnability.
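
The idea of an intensity built from a sum of basis functions passed through a transfer function can be sketched as follows. The choices here are assumptions made for illustration: exponential basis functions, a softplus transfer function, and parameters supplied directly as `(alpha, beta)` pairs. In the model described above a recurrent neural network would produce those parameters after each event, and that part is omitted.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)); maps any real number to a positive one."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def intensity(t, last_event_time, params):
    """Event intensity at time t, given the most recent event time.

    params -- list of (alpha, beta) pairs, one per exponential basis function
              alpha * exp(-beta * tau), where tau is the time since the last event.
    The softplus keeps the result positive even when the sum is negative.
    """
    tau = t - last_event_time
    total = sum(alpha * math.exp(-beta * tau) for alpha, beta in params)
    return softplus(total)


# Two basis functions: one excitatory and fast-decaying, one inhibitory and slow.
params = [(2.0, 1.5), (-0.5, 0.1)]
values = [intensity(t, last_event_time=0.0, params=params) for t in (0.0, 1.0, 5.0)]
```

The transfer function is what lets the unconstrained sum serve as a valid (non-negative) intensity, which is the role the abstract assigns to it.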