Automating News Comment Moderation with Limited Resources: Benchmarking in Croatian and Estonian

Pelicon, A; Pollak, S; Pranjić, M; Purver, M; Shekhar, R

Publication date: 8 September 2020
Publisher: German Society for Computational Linguistics & Language Technology

Abstract

This article describes initial work into the automatic classification of user-generated content in news media to support human moderators. We work with real-world data (comments posted by readers under online news articles) in two less-resourced European languages, Croatian and Estonian. We describe our dataset, and experiments into automatic classification using a range of models. Performance obtained is reasonable but not as good as might be expected given similar work in offensive language classification in other languages; we then investigate possible reasons in terms of the variability and reliability of the data and its annotation.

https://jlcl.org/content/2-allissues/1-heft1-2020/jlcl_2020-1_3.pd

Available Versions

Queen Mary Research Online

oai:qmro.qmul.ac.uk:123456789/...

Last time updated on 25/09/2020

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

oai:zenodo.org:4032371

Last time updated on 02/12/2022