Detecting Hate Speech Online: A Case of Croatian

Abstract

This project proposes a NooJ algorithm designed to find and categorize various slurs, insults and, ultimately, hate speech in Croatian. The results also provide a more detailed insight into inappropriate language in Croatian. We strongly emphasize the ethical considerations of (mis)identifying hate speech, since misidentification can lead to unethical and undeserved censorship of speech that is inappropriate but nonetheless free. We therefore tried to draw a clear distinction between insults and hate speech. The test corpus consists of written online comments and remarks posted on five Croatian Facebook news pages during a one-week period. Because the language actually used in informal online communication differs from standard Croatian grammar and syntax, false negatives present the biggest difficulty: some variations (substandard use of cases, spelling errors, colloquialisms) are impossible to predict and are therefore extremely hard to implement in the algorithm.

Similar works