Impact Of Content Features For Automatic Online Abuse Detection
Online communities have gained considerable importance in recent years due to
the increasing number of people connected to the Internet. Moderating user
content in online communities is mainly performed manually, and reducing the
workload through automatic methods is of great financial interest for community
maintainers. Often, the industry uses basic approaches such as bad words
filtering and regular expression matching to assist the moderators. In this
article, we consider the task of automatically determining if a message is
abusive. This task is complex since messages are written in a non-standardized
way, including spelling errors, abbreviations, and community-specific codes.
First, we evaluate the system that we propose using standard features of online
messages. Then, we evaluate the impact of the addition of pre-processing
strategies, as well as original specific features developed for the community
of an online in-browser strategy game. We finally propose to analyze the
usefulness of this wide range of features using feature selection. This work
can lead to two possible applications: 1) automatically flag potentially
abusive messages to draw the moderator's attention to a narrow subset of
messages; and 2) fully automate the moderation process by deciding whether a
message is abusive without any human intervention.
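As a rough illustration of the baseline the abstract mentions (bad-word filtering plus regular expression matching), here is a minimal sketch. The word list, the obfuscation pattern, and the function name `flag_message` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import re

# Hypothetical word list and pattern, for illustration only.
BAD_WORDS = {"idiot", "moron"}
# Catches simple digit-for-letter substitutions such as "1d10t".
OBFUSCATED = re.compile(r"\b[i1]d[i1][o0]t\b", re.IGNORECASE)


def flag_message(text: str) -> bool:
    """Return True if the message should be shown to a moderator."""
    # Bad-word filtering: exact match on lowercased alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if any(token in BAD_WORDS for token in tokens):
        return True
    # Regular expression matching: catch a known obfuscated spelling.
    return OBFUSCATED.search(text) is not None
```

A filter of this kind only covers the spellings it was written for, which is consistent with the abstract's point that non-standardized writing makes the task hard.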
Graph-based Features for Automatic Online Abuse Detection
While online communities have become increasingly important over the years,
the moderation of user-generated content is still performed mostly manually.
Automating this task is an important step in reducing the financial cost
associated with moderation, but the majority of automated approaches strictly
based on message content are highly vulnerable to intentional obfuscation. In
this paper, we discuss methods for extracting conversational networks based on
raw multi-participant chat logs, and we study the contribution of graph
features to a classification system that aims to determine if a given message
is abusive. The conversational graph-based system yields unexpectedly high
performance, with results comparable to those previously obtained with a
content-based approach.
Abusive Language Detection in Online Conversations by Combining Content-and Graph-based Features
In recent years, online social networks have allowed worldwide users to meet
and discuss. As guarantors of these communities, the administrators of these
platforms must prevent users from adopting inappropriate behaviors. This
verification task, mainly done by humans, is more and more difficult due to the
ever-growing number of messages to check. Methods have been proposed to
automate this moderation process, mainly by providing approaches based on the
textual content of the exchanged messages. Recent work has also shown that
characteristics derived from the structure of conversations, in the form of
conversational graphs, can help detect these abusive messages. In this
paper, we take advantage of both sources of information by proposing
fusion methods integrating content- and graph-based features. Our experiments on
raw chat logs show that the content of the messages and their dynamics
within a conversation contain partially complementary information, allowing
performance improvements on an abusive message classification task with a final
F-measure of 93.26%.
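For reference, the F-measure is commonly defined as the harmonic mean of precision and recall. The formula below assumes the balanced F1 variant, which the abstract does not state explicitly; $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives.

```latex
\[
F_1 = \frac{2 \cdot P \cdot R}{P + R},
\qquad P = \frac{TP}{TP + FP},
\qquad R = \frac{TP}{TP + FN}
\]
```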
Understanding Psycholinguistic Behavior of predominant drunk texters in Social Media
In the last decade, social media has evolved into one of the leading platforms
to create, share, or exchange information; it is commonly used as a way for
individuals to maintain social connections. In this online digital world,
people post texts or pictures to express their views socially and create
user-user engagement through discussions and conversations. Thus, social media
has established itself to bear signals relating to human behavior. One can
easily design a user characteristic network by scraping through someone's social
media profiles. In this paper, we investigate the potential of social media in
characterizing and understanding predominant drunk texters from the perspective
of their social, psychological and linguistic behavior as evident from the
content generated by them. Our research aims to analyze the behavior of drunk
texters on social media and to contrast this with non-drunk texters. We use
Twitter social media to obtain the set of drunk texters and non-drunk texters
and show that we can classify users into these two respective sets using
various psycholinguistic features with an overall average accuracy of 96.78%
with very high precision and recall. Note that such an automatic classification
can have far-reaching impact - (i) on health research related to addiction
prevention and control, and (ii) in eliminating abusive and vulgar contents
from Twitter, borne by the tweets of drunk texters.
Comment: 6 pages, 8 Figures, ISCC 2018 Workshops - ICTS4eHealth 201
Hoaxy: A Platform for Tracking Online Misinformation
Massive amounts of misinformation have been observed to spread in
uncontrolled fashion across social media. Examples include rumors, hoaxes, fake
news, and conspiracy theories. At the same time, several journalistic
organizations devote significant efforts to high-quality fact checking of
online claims. The resulting information cascades contain instances of both
accurate and inaccurate information, unfold over multiple time scales, and
often reach audiences of considerable size. All these factors pose challenges
for the study of the social dynamics of online news sharing. Here we introduce
Hoaxy, a platform for the collection, detection, and analysis of online
misinformation and its related fact-checking efforts. We discuss the design of
the platform and present a preliminary analysis of a sample of public tweets
containing both fake news and fact checking. We find that, in the aggregate,
the sharing of fact-checking content typically lags that of misinformation by
10--20 hours. Moreover, fake news are dominated by very active users, while
fact checking is a more grass-roots activity. With the increasing risks
connected to massive online misinformation, social news observatories have the
potential to help researchers, journalists, and the general public understand
the dynamics of real and fake news sharing.
Comment: 6 pages, 6 figures, submitted to Third Workshop on Social News On the We
Characterizing Pedophile Conversations on the Internet using Online Grooming
Cyber-crime targeting children, such as online pedophile activity, is a major
and growing concern to society. A deep understanding of predatory chat
conversations on the Internet has implications in designing effective solutions
to automatically identify malicious conversations from regular conversations.
We believe that a deeper understanding of the pedophile conversation can result
in more sophisticated and robust surveillance systems than the majority of
current systems, which rely only on shallow processing such as simple word
counting or keyword spotting.
In this paper, we study pedophile conversations from the perspective of
online grooming theory and perform a series of linguistic-based empirical
analyses on several pedophile chat conversations to gain useful insights and
patterns. We manually annotated 75 pedophile chat conversations with six stages
of online grooming and tested several hypotheses on them. The results of our
experiments reveal that relationship forming is the most dominant online
grooming stage in contrast to the sexual stage. We use a widely used
word-counting program (LIWC) to create psycho-linguistic profiles for each of
the six online grooming stages to discover interesting textual patterns useful
to improve our understanding of the online pedophile phenomenon. Furthermore,
we present empirical results that throw light on various aspects of a pedophile
conversation such as probability of state transitions from one stage to
another, distribution of a pedophile chat conversation across various online
grooming stages and correlations between pre-defined word categories and online
grooming stages.