Impact Of Content Features For Automatic Online Abuse Detection
Online communities have gained considerable importance in recent years due to
the increasing number of people connected to the Internet. Moderating user
content in online communities is mainly performed manually, and reducing the
workload through automatic methods is of great financial interest for community
maintainers. Often, the industry uses basic approaches such as bad-word
filtering and regular expression matching to assist the moderators. In this
article, we consider the task of automatically determining if a message is
abusive. This task is complex because messages are written in a non-standardized
way, including spelling errors, abbreviations, and community-specific codes.
First, we evaluate the system that we propose using standard features of online
messages. Then, we evaluate the impact of the addition of pre-processing
strategies, as well as original specific features developed for the community
of an online in-browser strategy game. Finally, we use feature selection to
analyze the usefulness of this wide range of features. This work
can lead to two possible applications: 1) automatically flagging potentially
abusive messages to draw the moderator's attention to a narrow subset of
messages; and 2) fully automating the moderation process by deciding whether a
message is abusive without any human intervention.
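The abstract names bad-word filtering and regular expression matching as the basic approaches industry uses to assist moderators, and notes that non-standardized spelling makes the task hard. The following is a minimal sketch of such a baseline, not the system proposed in the article. The blocklist `BAD_WORDS`, the substitution table `LEET_MAP`, and the function names are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical blocklist, for illustration only; a real community would maintain its own.
BAD_WORDS = ["idiot", "noob"]

# Assumed mapping of common character substitutions back to letters (not exhaustive).
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "@": "a", "$": "s"})


def compile_pattern(word: str) -> re.Pattern:
    """Build a regex that tolerates repeated letters and separators between letters."""
    # Each letter may repeat ("noooob"), and non-alphanumeric characters
    # (punctuation, spaces, underscores) may be inserted ("i.d.i.o.t").
    body = r"[\W_]*".join(re.escape(ch) + "+" for ch in word)
    # Word boundaries avoid matching inside longer words, at the cost of
    # missing inflected forms such as plurals.
    return re.compile(r"\b" + body + r"\b")


PATTERNS = {word: compile_pattern(word) for word in BAD_WORDS}


def normalize(message: str) -> str:
    """Lowercase the message and undo simple character substitutions."""
    return message.lower().translate(LEET_MAP)


def flag_message(message: str) -> list[str]:
    """Return the blocklisted words found in the message (empty list if none)."""
    text = normalize(message)
    return [word for word, pattern in PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    for msg in ["You are such a NOOOOB", "what a 1d10t", "Good game, well played!"]:
        print(msg, "->", flag_message(msg))
```

Such a filter catches some spelling variants ("NOOOOB", "1d10t", "i.d.i.o.t") but still misses others (for example "idiots", because of the word boundary) and cannot judge context, which illustrates why the article moves beyond these baselines to content features and pre-processing.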
Towards Better Understanding Researcher Strategies in Cross-Lingual Event Analytics
With an increasing amount of information on globally important events, there
is a growing demand for efficient analytics of multilingual event-centric
information. Such analytics is particularly challenging due to the large amount
of content, the event dynamics and the language barrier. Although memory
institutions increasingly collect event-centric Web content in different
languages, very little is known about the strategies of researchers who conduct
analytics of such content. In this paper, we present researchers' strategies for
content, method and feature selection in the context of cross-lingual
event-centric analytics, as observed in two case studies on multilingual Wikipedia.
We discuss the factors influencing these strategies, the findings enabled by
the adopted methods, and the current limitations, and we provide
recommendations for services supporting researchers in cross-lingual
event-centric analytics.
Comment: In Proceedings of the International Conference on Theory and Practice
of Digital Libraries 201
Analyzing Controversial Topics within Facebook
Social media plays a significant role in the dissemination of information. Now more than ever, consumers turn to social media sites (SMS) to catch up on current events and share their perspectives. While the public enjoys this form of communication, it also has drawbacks. Because SMS capture many perspectives, they often give rise to public discourse and, in some cases, controversy. Misinformation and disinformation continue to spread throughout the internet, leaving many consumers misinformed. This further inflames such discourse and lets real issues be forgotten as online debate spirals away from reality and false information gains traction. Given these issues, this paper seeks to demonstrate a rudimentary curve-fitting measurement as a proof of concept for capturing controversy on Facebook from the reactions of its user base toward controversial topics.