Automating News Comment Moderation with Limited Resources: Benchmarking in Croatian and Estonian

Abstract

This article describes initial work on the automatic classification of user-generated content in news media to support human moderators. We work with real-world data (comments posted by readers under online news articles) in two less-resourced European languages, Croatian and Estonian. We describe our dataset and our experiments on automatic classification using a range of models. The performance obtained is reasonable but not as good as might be expected given similar work on offensive language classification in other languages; we then investigate possible reasons in terms of the variability and reliability of the data and its annotation.

https://jlcl.org/content/2-allissues/1-heft1-2020/jlcl_2020-1_3.pd
