Leveraging Sentiment Data for the Detection of Homophobic/Transphobic Content in a Multi-Task, Multi-Lingual Setting Using Transformers

Abstract

Hateful content is published and spread on social media at an increasing rate, harming the user experience.In addition, hateful content targeting particular, marginalized/vulnerable groups (e.g. homophobic/trans-phobic content) can cause even more harm to members of said groups. Hence, detecting hateful contentis crucial, regardless of its origin, or the language used. The large variety of (often underresourced)languages used, however, makes this task daunting, especially as many users use code-mixing in theirmessages. To help overcome these difficulties, the approach we present here uses a multi-languageframework. And to further mitigate the scarcity of labelled data, it also leverages data from the relatedtask of sentiment-analysis to improve the detection of homophobic/transphobic content. We evaluatedour system by participating in a sentiment analysis and hate speech detection challenge. Results showthat our multi-task model outperforms its single-task counterpart (on average, by 24%) on the detection ofhomophobic/transphobic content. Moreover, the results achieved in detecting homophobic/transphobiccontent put our system in 1st or 2nd place for three out of four languages examined.Licens fulltext: CC BY License</p

    Similar works

    Full text

    thumbnail-image