    Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata

    The popularity of social networks has only increased in recent years. In theory, social media was meant to let us share our views online, keep in contact with loved ones, or share the good moments of life. The reality is less perfect: people share hate speech, use these platforms to bully specific individuals, or create bots whose only goal is to target specific situations or people. Identifying who wrote such text is not easy, and there are several possible ways of doing it, such as using natural language processing or machine learning algorithms that make predictions from the metadata associated with the text. In this work, we present an initial investigation into which machine learning techniques are best at detecting offensive language in tweets. After analysing current trends in the literature on recent text classification techniques, we selected the Linear SVM and Naive Bayes algorithms for our initial tests. For data preprocessing, we used different attribute selection techniques, which are justified in the literature section. In our experiments, we obtained 92% accuracy and 95% recall in detecting offensive language with Naive Bayes, and 90% accuracy and 92% recall with Linear SVM. To our understanding, these results surpass those reported in the related literature and are a good indication of the importance of the data description approach we used.
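
As a rough illustration of the kind of classifier this abstract names, the sketch below trains a multinomial Naive Bayes model with Laplace smoothing on a handful of made-up messages and computes recall for the offensive class. It is not the authors' pipeline: the toy data, the whitespace tokenizer, and the names `NaiveBayes`, `TRAIN`, and `recall` are assumptions introduced here, and the paper's metadata features and attribute selection are not reproduced.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for real preprocessing.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.class_counts = Counter()            # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_tokens = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total_tokens + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def recall(y_true, y_pred, positive):
    # Recall = true positives / (true positives + false negatives).
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data, invented purely for illustration.
TRAIN = [
    ("you are a stupid idiot", "offensive"),
    ("shut up you idiot", "offensive"),
    ("have a lovely day", "clean"),
    ("thank you for the lovely photo", "clean"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
```

Calling `model.predict("what an idiot")` scores the message against each class and returns the label with the higher log posterior. Recall matters for this task because a missed offensive message (a false negative) is usually costlier than a false alarm.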

    Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media

    Sentiment analysis has recently emerged as one of the major natural language processing (NLP) tasks in many applications. The task is increasingly crucial because social media channels (e.g. social networks or forums) have become significant sources for brands to observe user opinions about their products. However, real data obtained from social media contains a high volume of short and informal messages. Existing approaches, especially those based on deep learning, have difficulty handling this kind of data. In this paper, we propose an approach to this problem. It extends our previous work, in which we combined the typical deep learning technique of Convolutional Neural Networks with domain knowledge; the combination was used to obtain additional training data through augmentation and a more reasonable loss function. In this work, we further improve our architecture with several substantial enhancements, including negation-based data augmentation, transfer learning for word embeddings, the combination of word-level and character-level embeddings, and a multitask learning technique for attaching domain knowledge rules to the learning process. These enhancements, which specifically target short and informal messages, yield a significant improvement in performance in experiments on real datasets. Comment: A preprint of an article accepted for publication by Inderscience in IJCVR on September 201

    Traffic event detection framework using social media

    This is an accepted manuscript of an article published by IEEE in 2017 IEEE International Conference on Smart Grid and Smart Cities (ICSGSC) on 18/09/2017, available online: https://ieeexplore.ieee.org/document/8038595 The accepted version of the publication may differ from the final published version. © 2017 IEEE. Traffic incidents are one of the leading causes of non-recurrent traffic congestion. By detecting these incidents in time, traffic management agencies can activate strategies to ease congestion, and travelers can take them into account when planning their trips. In recent years, there has been increasing interest in Twitter because of the real-time nature of its data. Twitter has been used to predict revenues, accidents, natural disasters, and traffic. This paper proposes a framework for the real-time detection of traffic events using Twitter data. The methodology consists of a text classification algorithm that identifies traffic-related tweets. These traffic messages are then geolocated and further classified as positive, negative, or neutral using sentiment analysis. In addition, stress and relaxation strength detection is performed in order to further analyze user emotions within the tweet. Future work will implement the proposed framework in the West Midlands area, United Kingdom.
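
The three stages described in this abstract (traffic filtering, geolocation, sentiment labelling) can be sketched as a small pipeline. This is only a toy under stated assumptions: the abstract does not specify the classifier, geolocation method, or sentiment tool, so simple keyword and lexicon lookups are substituted here. The word lists, the `GAZETTEER` entries with approximate coordinates, and the function `process_tweet` are all hypothetical, and the stress and relaxation strength step is omitted.

```python
import re

# Hypothetical keyword lists; a real system would use a trained text classifier.
TRAFFIC_TERMS = {"traffic", "accident", "crash", "congestion", "roadworks", "queue"}
POSITIVE_TERMS = {"clear", "cleared", "moving", "reopened"}
NEGATIVE_TERMS = {"stuck", "awful", "delays", "closed", "crash"}

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "birmingham": (52.4862, -1.8904),
    "coventry": (52.4068, -1.5197),
}


def process_tweet(text):
    """Return a dict describing a traffic tweet, or None if it is not traffic-related."""
    tokens = re.findall(r"[a-z]+", text.lower())

    # Stage 1: keep only traffic-related messages (keyword stand-in for a classifier).
    if not TRAFFIC_TERMS.intersection(tokens):
        return None

    # Stage 2: geolocate by the first place name found in the gazetteer.
    place = next((tok for tok in tokens if tok in GAZETTEER), None)
    coords = GAZETTEER.get(place)

    # Stage 3: lexicon-based sentiment, +1 per positive term and -1 per negative term.
    score = sum(tok in POSITIVE_TERMS for tok in tokens) - sum(
        tok in NEGATIVE_TERMS for tok in tokens
    )
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"

    return {"place": place, "coords": coords, "sentiment": sentiment}
```

For example, `process_tweet("Stuck in awful traffic near Birmingham")` passes the traffic filter, is matched to the Birmingham gazetteer entry, and is labelled negative, while a message with no traffic terms is dropped at the first stage.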