Study of the Temporal-Statistics-Based Reputation Models for Q&A Systems
Q&A systems are becoming a vital source of knowledge in many different domains. In some cases, they are also associated with services which provide employers with important information regarding the expertise of their potential employees. Therefore, the reputation earned in such communities can be associated with better job opportunities, and its significance is increasing. However, in a community where there is no direct financial motivation for participation, a reputation score is not solely an expertise metric: it is also a powerful motivator for remaining an active community member. Despite this complexity, algorithms for calculating reputation scores need to be as easy to understand (and implement) as possible. Therefore, the designers of Q&A reputation systems often implement a set of fixed rules, to some extent trading quality for quantity. Our goal is to study whether (and how) the temporal statistics of a Q&A website can be incorporated into its reputation system. We want the proposed mechanism to dynamically adjust the impact of a single answer evaluation on the reputation of its producer. We would like the proposed model to accurately reflect the expertise of content producers.
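The abstract's idea of dynamically adjusting the impact of a single answer evaluation can be illustrated with a minimal sketch. This is not the paper's actual model; the function names and the activity-ratio weighting scheme below are assumptions made purely for illustration.

```python
# Hypothetical sketch: scale each vote's reputation impact by the
# site's recent activity level, so votes cast in busy periods count
# for less than votes cast in quiet periods.
# All names and the weighting formula are illustrative assumptions,
# not the model proposed in the paper.

def vote_impact(base_points, votes_in_window, baseline_votes):
    """Weight a vote's reputation impact by current site activity."""
    activity_ratio = votes_in_window / max(baseline_votes, 1)
    return base_points / max(activity_ratio, 1e-6)

def update_reputation(reputation, base_points, votes_in_window, baseline_votes):
    """Apply one temporally weighted vote to a user's reputation."""
    return reputation + vote_impact(base_points, votes_in_window, baseline_votes)
```

For example, an upvote worth 10 points during a window with half the baseline vote volume would add 20 points under this weighting, while a vote during a normal-activity window would add the usual 10.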
Postmortem Analysis of Decayed Online Social Communities: Cascade Pattern Analysis and Prediction
Recently, many online social networks, such as MySpace, Orkut, and
Friendster, have faced inactivity decay of their members, which contributed to
the collapse of these networks. The reasons, mechanics, and prevention
mechanisms of such inactivity decay are not fully understood. In this work, we
analyze decayed and alive sub-websites from the StackExchange platform. The
analysis mainly focuses on the inactivity cascades that occur among the members
of these communities. We provide measures to understand the decay process and
statistical analysis to extract the patterns that accompany the inactivity
decay. Additionally, we predict cascade size and cascade virality using machine
learning. The results of this work include a statistically significant
difference in decay patterns between the decayed and the alive
sub-websites. These patterns are mainly: cascade size, cascade virality,
cascade duration, and cascade similarity. Additionally, the contributed
prediction framework showed satisfactory prediction results compared to a
baseline predictor. Supported by empirical evidence, the main findings of this
work are: (1) the decay process is not governed by only one network measure; it
is better described using multiple measures; (2) the expert members of the
StackExchange sub-websites were mainly responsible for the activity or
inactivity of the StackExchange sub-websites; (3) the Statistics sub-website is
going through decay dynamics that may lead to it becoming fully-decayed; and
(4) decayed sub-websites were originally less resilient to inactivity decay,
unlike the alive sub-websites.
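Finding (1) above, that decay is better described by multiple network measures than by any single one, can be sketched as a simple multi-feature predictor. This is not the paper's prediction framework; the feature names and weights below are illustrative assumptions.

```python
# Illustrative sketch (not the paper's framework): predicting cascade
# size from several network measures at once, reflecting the finding
# that no single measure governs the decay process.
# Feature names and weights are assumptions for demonstration only.

def predict_cascade_size(features, weights, bias):
    """A simple linear predictor over multiple network measures."""
    return bias + sum(w * features[name] for name, w in weights.items())

# Toy usage with made-up measures of one inactivity cascade:
example = {"virality": 2.0, "duration": 5.0, "similarity": 0.3}
weights = {"virality": 1.5, "duration": 0.4, "similarity": 2.0}
pred = predict_cascade_size(example, weights, bias=1.0)
```

A predictor of this shape can serve as the kind of baseline the abstract's contributed framework is compared against.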
Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro-blogs dealing
with politics has recently attracted researchers in several fields including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches are limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper intends to develop a
so-called active learning process for automatically annotating French language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinion from two French politicians over time. We
therefore review state of the art NLP-based ML algorithms to automatically
annotate tweets using a manual initiation step as bootstrap. This paper focuses
on key issues in active learning while building a large annotated data set
from noisy data: noise is introduced by human annotators, by the abundance
of data, and by the label distribution across data and entities. In turn, we show that Twitter
characteristics such as the author's name or hashtags can be considered as the
bearing point to not only improve automatic systems for Opinion Mining (OM) and
Topic Classification but also to reduce noise in human annotations. However, a
later thorough analysis shows that reducing noise might induce the loss of
crucial information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science -
Vol 3 - Contextualisation digitale - 201
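The annotation process described above, a manual bootstrap followed by automatic propagation to the most informative items, is the classic pool-based active-learning loop. The sketch below is a generic illustration under that assumption, not the paper's pipeline; the toy classifier and all names are hypothetical.

```python
# Minimal pool-based active-learning sketch with uncertainty sampling.
# Illustrative only: the paper's pipeline works on French tweets with
# NLP features and expert annotators; here the "classifier" is a stub.

def uncertainty(prob):
    """Least-confidence score for a binary classifier.

    Probabilities near 0.5 are most uncertain (score near 1.0);
    probabilities near 0 or 1 are most certain (score near 0.0).
    """
    return 1.0 - abs(prob - 0.5) * 2

def active_learning_round(pool, predict_proba, budget):
    """Select the `budget` most uncertain unlabeled items for human annotation."""
    scored = sorted(pool, key=lambda x: uncertainty(predict_proba(x)), reverse=True)
    return scored[:budget]

# Toy usage: items are stand-ins for tweets, and the stub classifier
# simply returns the item itself as its predicted probability.
pool = [0.1, 0.45, 0.8, 0.5, 0.95]
picked = active_learning_round(pool, lambda x: x, budget=2)
```

Each round, the items returned by `active_learning_round` would go to human annotators, the classifier would be retrained on the enlarged labeled set, and the loop would repeat until the annotation budget is spent.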
Can Who-Edits-What Predict Edit Survival?
As the number of contributors to online peer-production systems grows, it
becomes increasingly important to predict whether the edits that users make
will eventually be beneficial to the project. Existing solutions either rely on
a user reputation system or consist of a highly specialized predictor that is
tailored to a specific peer-production system. In this work, we explore a
different point in the solution space that goes beyond user reputation but does
not involve any content-based feature of the edits. We view each edit as a game
between the editor and the component of the project. We posit that the
probability that an edit is accepted is a function of the editor's skill, of
the difficulty of editing the component and of a user-component interaction
term. Our model is broadly applicable, as it only requires observing data about
who makes an edit, what the edit affects and whether the edit survives or not.
We apply our model on Wikipedia and the Linux kernel, two examples of
large-scale peer-production systems, and we seek to understand whether it can
effectively predict edit survival: in both cases, we provide a positive answer.
Our approach significantly outperforms those based solely on user reputation
and bridges the gap with specialized predictors that use content-based
features. It is simple to implement, computationally inexpensive, and in
addition it enables us to discover interesting structure in the data.
Comment: Accepted at KDD 201
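The model described in this abstract, where acceptance probability combines editor skill, component difficulty, and a user-component interaction term, can be sketched as a logistic function over those three terms. The exact functional form and parameter names below are assumptions for illustration, not the paper's published specification.

```python
# Hedged sketch of the abstract's model: the probability that an edit
# survives as a function of editor skill, component difficulty, and a
# low-rank user-component interaction term.
# The logistic form and all parameter names are illustrative assumptions.
import math

def survival_prob(skill, difficulty, user_vec, comp_vec):
    """P(edit survives) via a logistic over skill, difficulty, interaction."""
    interaction = sum(u * c for u, c in zip(user_vec, comp_vec))
    return 1.0 / (1.0 + math.exp(-(skill - difficulty + interaction)))
```

Note that, as the abstract emphasizes, such a model needs only who made the edit, what it affected, and whether it survived; no content-based features of the edit itself appear anywhere in the computation.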