58,129 research outputs found
Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata
The popularity of social networks has only increased
in recent years. In theory, the use of social media was proposed
so we could share our views online, keep in contact with loved
ones or share good moments of life. However, the reality is
not so perfect, so you have people sharing hate speech-related
messages, or using it to bully specific individuals, for instance,
or even creating robots where their only goal is to target specific
situations or people. Identifying who wrote such text is not easy
and there are several possible ways of doing it, such as using
natural language processing or machine learning algorithms
that can investigate and perform predictions using the metadata associated with it. In this work, we present an initial
investigation of which are the best machine learning techniques
to detect offensive language in tweets. After an analysis of the
current trend in the literature about the recent text classification
techniques, we have selected Linear SVM and Naive Bayes
algorithms for our initial tests. For the preprocessing of data,
we have used different techniques for attribute selection that
will be justified in the literature section. After our experiments,
we have obtained 92% of accuracy and 95% of recall to detect
offensive language with Naive Bayes and 90% of accuracy and
92% of recall with Linear SVM. From our understanding, these
results overcome our related literature and are a good indicative
of the importance of the data description approach we have used
Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media
Sentiment analysis has been emerging recently as one of the major natural
language processing (NLP) tasks in many applications. Especially, as social
media channels (e.g. social networks or forums) have become significant sources
for brands to observe user opinions about their products, this task is thus
increasingly crucial. However, when applied with real data obtained from social
media, we notice that there is a high volume of short and informal messages
posted by users on those channels. This kind of data makes the existing works
suffer from many difficulties to handle, especially ones using deep learning
approaches. In this paper, we propose an approach to handle this problem. This
work is extended from our previous work, in which we proposed to combine the
typical deep learning technique of Convolutional Neural Networks with domain
knowledge. The combination is used for acquiring additional training data
augmentation and a more reasonable loss function. In this work, we further
improve our architecture by various substantial enhancements, including
negation-based data augmentation, transfer learning for word embeddings, the
combination of word-level embeddings and character-level embeddings, and using
multitask learning technique for attaching domain knowledge rules in the
learning process. Those enhancements, specifically aiming to handle short and
informal messages, help us to enjoy significant improvement in performance once
experimenting on real datasets.Comment: A Preprint of an article accepted for publication by Inderscience in
IJCVR on September 201
Traffic event detection framework using social media
This is an accepted manuscript of an article published by IEEE in 2017 IEEE International Conference on Smart Grid and Smart Cities (ICSGSC) on 18/09/2017, available online: https://ieeexplore.ieee.org/document/8038595
The accepted version of the publication may differ from the final published version.© 2017 IEEE. Traffic incidents are one of the leading causes of non-recurrent traffic congestions. By detecting these incidents on time, traffic management agencies can activate strategies to ease congestion and travelers can plan their trip by taking into consideration these factors. In recent years, there has been an increasing interest in Twitter because of the real-time nature of its data. Twitter has been used as a way of predicting revenues, accidents, natural disasters, and traffic. This paper proposes a framework for the real-time detection of traffic events using Twitter data. The methodology consists of a text classification algorithm to identify traffic related tweets. These traffic messages are then geolocated and further classified into positive, negative, or neutral class using sentiment analysis. In addition, stress and relaxation strength detection is performed, with the purpose of further analyzing user emotions within the tweet. Future work will be carried out to implement the proposed framework in the West Midlands area, United Kingdom.Published versio
- …