HUMAN: Hierarchical Universal Modular ANnotator
A lot of real-world phenomena are complex and cannot be captured by single
task annotations. This causes a need for subsequent annotations, with
interdependent questions and answers describing the nature of the subject at
hand. Even when a phenomenon is easily captured by a single task, the
high specialisation of most annotation tools can result in having to switch to
another tool if the task only slightly changes.
We introduce HUMAN, a novel web-based annotation tool that addresses the
above problems by a) covering a variety of annotation tasks on both textual and
image data, and b) using an internal deterministic state machine,
allowing the researcher to chain different annotation tasks in an
interdependent manner. Further, the modular nature of the tool makes it easy to
define new annotation tasks and integrate machine learning algorithms e.g., for
active learning. HUMAN comes with an easy-to-use graphical user interface that
simplifies the annotation task and management.
Comment: 7 pages, 4 figures, EMNLP - Demonstrations 202
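The idea of chaining interdependent annotation tasks through a deterministic state machine, as described in the HUMAN abstract, can be illustrated with a minimal sketch. This is not HUMAN's actual implementation or configuration format: the class `AnnotationStateMachine`, the state names, and the answers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationStateMachine:
    """Hypothetical sketch: each state is one annotation task, and the
    annotator's answer deterministically selects the next task."""
    start: str
    end: str = "done"
    # Maps (current_state, answer) -> next_state.
    transitions: dict = field(default_factory=dict)

    def add(self, state: str, answer: str, next_state: str) -> None:
        key = (state, answer)
        # Determinism: one (state, answer) pair may lead to only one state.
        if key in self.transitions:
            raise ValueError(f"duplicate transition for {key}")
        self.transitions[key] = next_state

    def run(self, answers: list) -> list:
        """Return the sequence of tasks visited for a list of answers."""
        state = self.start
        path = [state]
        for answer in answers:
            if state == self.end:
                break
            key = (state, answer)
            if key not in self.transitions:
                raise ValueError(f"no transition defined for {key}")
            state = self.transitions[key]
            path.append(state)
        return path


# Illustrative (assumed) task chain: a follow-up question is asked only
# when the first answer makes it relevant.
machine = AnnotationStateMachine(start="is_offensive")
machine.add("is_offensive", "no", "done")
machine.add("is_offensive", "yes", "explicit_or_implicit")
machine.add("explicit_or_implicit", "implicit", "done")
machine.add("explicit_or_implicit", "explicit", "mark_span")
machine.add("mark_span", "submitted", "done")

print(machine.run(["yes", "explicit", "submitted"]))
print(machine.run(["no"]))
```

In this sketch, a new annotation task is added by registering further transitions, which loosely mirrors the modularity the abstract describes.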
Toxicity
In research on online comments on social media platforms, different terms are widely used to describe comments that are hateful or disrespectful and thereby poison a discussion. This chapter takes a theoretical perspective on the term toxicity and related research in the field of computer science. More specifically, it explains the usage of the term and why its exact interpretation depends on the platform in question. Further, the chapter discusses the advantages of toxicity over other terms and provides an overview of the available toxic comment datasets. Finally, it introduces the concept of engaging comments as the counterpart of toxic comments, leading to a task that is complementary to the prevention and removal of toxic comments: the fostering and highlighting of engaging comments.
CHILab @ HaSpeeDe 2: Enhancing Hate Speech Detection with Part-of-Speech Tagging
This paper describes two neural network systems for hate speech detection that use not only the pre-processed text but also its Part-of-Speech (PoS) tags. The first system uses a Transformer encoder block, a relatively recent neural network architecture that emerged as a substitute for recurrent neural networks. The second system uses a depth-wise separable convolutional neural network, a type of CNN that became known in image processing for its computational efficiency. The systems participated in the HaSpeeDe 2 task of the EVALITA 2020 workshop under the team name CHILab, where our best system, the Transformer-based one, ranked first in two out of four tasks and third in the other two. The systems were also tested on English, Spanish and German.
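The efficiency attributed to depth-wise separable convolutions can be made concrete by counting weights. The sketch below compares a standard 1D convolution with a depth-wise separable one. It assumes a depth multiplier of 1 and ignores bias terms, and the layer sizes are illustrative, not taken from the CHILab systems.

```python
def standard_conv1d_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    # One kernel of width kernel_size per (input channel, output channel) pair.
    return kernel_size * in_channels * out_channels


def separable_conv1d_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    # Depth-wise step: one kernel per input channel (depth multiplier 1 assumed).
    depthwise = kernel_size * in_channels
    # Point-wise step: a width-1 convolution that mixes the channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Illustrative sizes (assumed): kernel width 3, 128 channels in and out.
standard = standard_conv1d_params(3, 128, 128)
separable = separable_conv1d_params(3, 128, 128)
print(standard, separable)  # 49152 16768
```

For these assumed sizes the separable layer needs roughly a third of the weights of the standard layer, and the gap widens as the kernel grows.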
Empowering NGOs in Countering Online Hate Messages
Studies on online hate speech have mostly focused on the automated detection
of harmful messages. Little attention has been devoted so far to the
development of effective strategies to fight hate speech, in particular through
the creation of counter-messages. While existing manual scrutiny and
intervention strategies are time-consuming and not scalable, advances in
natural language processing have the potential to provide a systematic approach
to hatred management. In this paper, we introduce a novel ICT platform that NGO
operators can use to monitor and analyze social media data, along with a
counter-narrative suggestion tool. Our platform aims at increasing the
efficiency and effectiveness of operators' activities against islamophobia. We
test the platform with more than one hundred NGO operators in three countries
through qualitative and quantitative evaluation. Results show that NGOs favor
the platform solution with the suggestion tool, and that the time required to
produce counter-narratives significantly decreases.
Comment: Preprint of the paper published in Online Social Networks and Media Journal (OSNEM)
Overview of GermEval Task 2, 2019 shared task on the identification of offensive language
We present the second edition of the GermEval Shared Task on the Identification of Offensive Language. This shared task deals with the classification of German tweets from Twitter. Two subtasks were continued from the first edition, namely a coarse-grained binary classification task and a fine-grained multi-class classification task. As a novel subtask, we introduce the classification of offensive tweets as explicit or implicit.
The shared task had 13 participating groups submitting 28 runs for the coarse-grained
task, another 28 runs for the fine-grained task, and 17 runs for the implicit-explicit
task.
We evaluate the results of the systems submitted to the shared task. The shared task homepage can be found at https://projects.fzai.h-da.de/iggsa