Some users of social media are spreading racist, sexist, and otherwise
hateful content. For the purpose of training a hate speech detection system,
the reliability of the annotations is crucial, but there is no universally
agreed-upon definition of hate speech. We collected potentially hateful
messages and asked two groups of internet users to decide whether each message
was hate speech and whether it should be banned, and to rate its degree of
offensiveness.
One of the groups was shown a definition prior to completing the survey. We
aimed to assess whether hate speech can be annotated reliably, and the extent
to which existing definitions are in accordance with subjective ratings. Our
results indicate that showing users a definition caused them to partially align
their own opinion with the definition but did not improve reliability, which
was very low overall. We conclude that the presence of hate speech should
perhaps not be considered a binary yes-or-no decision, and that raters need more
detailed instructions for the annotation task.