Results of the PolEval 2019 Shared Task 6: first dataset and Open Shared Task for automatic cyberbullying detection in Polish Twitter

Authors: Dybała Paweł; Pieciukiewicz Agata; Ptaszynski Michal
Publication venue: Institute of Computer Sciences, Polish Academy of Sciences
Publication date: 01/01/2019

In this paper we describe the first dataset for the Polish language containing annotations of harmful and toxic language. The dataset was created to study harmful Internet phenomena such as cyberbullying and hate speech, which have recently been growing dramatically on the Polish Internet as well as worldwide. The dataset was automatically collected from Polish Twitter accounts and annotated by layperson volunteers under the supervision of a cyberbullying and hate-speech expert. Together with the dataset, we propose the first open shared task for Polish that uses the dataset to classify such harmful phenomena. In particular, we propose two subtasks: 1) binary classification of harmful and non-harmful tweets, and 2) multiclass classification distinguishing two types of harmful information (cyberbullying and hate speech) from other tweets. The first edition of the shared task was a success, receiving fourteen submissions in total, which indicates a high demand for research applying such data.

Jagiellonian University Repository