'Institute of Electrical and Electronics Engineers (IEEE)'
Doi
Abstract
The popularity of social networks has only increased
in recent years. In theory, the use of social media was proposed
so we could share our views online, keep in contact with loved
ones or share good moments of life. However, the reality is
not so perfect, so you have people sharing hate speech-related
messages, or using it to bully specific individuals, for instance,
or even creating robots where their only goal is to target specific
situations or people. Identifying who wrote such text is not easy
and there are several possible ways of doing it, such as using
natural language processing or machine learning algorithms
that can investigate and perform predictions using the metadata associated with it. In this work, we present an initial
investigation of which are the best machine learning techniques
to detect offensive language in tweets. After an analysis of the
current trend in the literature about the recent text classification
techniques, we have selected Linear SVM and Naive Bayes
algorithms for our initial tests. For the preprocessing of data,
we have used different techniques for attribute selection that
will be justified in the literature section. After our experiments,
we have obtained 92% of accuracy and 95% of recall to detect
offensive language with Naive Bayes and 90% of accuracy and
92% of recall with Linear SVM. From our understanding, these
results overcome our related literature and are a good indicative
of the importance of the data description approach we have used