1,389 research outputs found
Classifying racist texts using a support vector machine
In this poster we present an overview of the techniques we used to develop and evaluate a text categorisation system to automatically classify racist texts. Detecting racism is difficult because the presence of indicator words is insufficient to indicate racist texts, unlike some other text classification tasks. Support Vector Machines (SVM) are used to automatically categorise web pages based on whether or not they are racist. Different interpretations of what constitutes a term are taken, and in this poster we look at three representations of a web page within an SVM -- bag-of-words, bigrams and part-of-speech tags
Automatic text categorisation of racist webpages
Automatic Text Categorisation (TC) involves the assignment of one or more predefined categories to text documents in order that they can be effectively managed. In this thesis we examine the possibility of applying automatic text categorisation to the problem of categorising texts (web pages) based on whether or not they are racist.
TC has proven successful for topic-based problems such as news story categorisation. However, the problem of detecting racism is dissimilar to topic-based problems in that lexical items present in racist documents can also appear in anti-racist documents or indeed potentially any document. The mere presence of a potentially racist term does not necessarily mean the document is racist. The difficulty is finding what discerns racist documents from non-racist.
We use a machine learning method called Support Vector Machines (SVM) to automatically learn features of racism in order to be capable of making a decision about the target class of unseen documents. We examine various representations within an SVM so as to identify the most effective method for handling this problem. Our work shows that it is possible to develop automatic categorisation of web pages, based on these approache
Detecting and Monitoring Hate Speech in Twitter
Social Media are sensors in the real world that can be used to measure the pulse of societies.
However, the massive and unfiltered feed of messages posted in social media is a phenomenon that
nowadays raises social alarms, especially when these messages contain hate speech targeted to a
specific individual or group. In this context, governments and non-governmental organizations
(NGOs) are concerned about the possible negative impact that these messages can have on individuals
or on the society. In this paper, we present HaterNet, an intelligent system currently being used by
the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that
identifies and monitors the evolution of hate speech in Twitter. The contributions of this research
are many-fold: (1) It introduces the first intelligent system that monitors and visualizes, using social
network analysis techniques, hate speech in Social Media. (2) It introduces a novel public dataset on
hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification
approaches based on different document representation strategies and text classification models. (4)
The best approach consists of a combination of a LTSM+MLP neural network that takes as input the
tweetâs word, emoji, and expression tokensâ embeddings enriched by the tf-idf, and obtains an area
under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the
literatureThe work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation
grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Unionâs Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledge
Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata
The popularity of social networks has only increased
in recent years. In theory, the use of social media was proposed
so we could share our views online, keep in contact with loved
ones or share good moments of life. However, the reality is
not so perfect, so you have people sharing hate speech-related
messages, or using it to bully specific individuals, for instance,
or even creating robots where their only goal is to target specific
situations or people. Identifying who wrote such text is not easy
and there are several possible ways of doing it, such as using
natural language processing or machine learning algorithms
that can investigate and perform predictions using the metadata associated with it. In this work, we present an initial
investigation of which are the best machine learning techniques
to detect offensive language in tweets. After an analysis of the
current trend in the literature about the recent text classification
techniques, we have selected Linear SVM and Naive Bayes
algorithms for our initial tests. For the preprocessing of data,
we have used different techniques for attribute selection that
will be justified in the literature section. After our experiments,
we have obtained 92% of accuracy and 95% of recall to detect
offensive language with Naive Bayes and 90% of accuracy and
92% of recall with Linear SVM. From our understanding, these
results overcome our related literature and are a good indicative
of the importance of the data description approach we have used
Violence Detection in Social Media-Review
Social media has become a vital part of humansâ day to day life. Different users engage with social media differently. With the increased usage of social media, many researchers have investigated different aspects of social media. Many examples in the recent past show, content in the social media can generate violence in the user community. Violence in social media can be categorised into aggregation in comments, cyber-bullying and incidents like protests, murders. Identifying violent content in social media is a challenging task: social media posts contain both the visual and text as well as these posts may contain hidden meaning according to the usersâ context and other background information. This paper summarizes the different social media violent categories and existing methods to detect the violent content.Keywords: Machine learning, natural language processing, violence, social media, convolution neural networ
- âŠ