
    Deep Neural Networks for Sentiment Analysis in Tweets with Emoticons

    Businesses glean meaningful feedback about products and services from social media posts in order to improve the quality of those products and services and to meet customer expectations. Sentiment analysis is increasingly being used to help businesses by assigning positive or negative polarity to such posts. Although methods currently exist to determine the polarity of sentiments, such methods are unreliable when posts contain terms that are not typically part of a standard dictionary used for sentiment analysis, such as slang and informal language. This dissertation has aimed to empirically investigate alternative methods to improve the classification accuracy of sentiments in such contexts. Specifically, it considers posts written in English that include emoticons. The benchmark Sentiment140 English language datasets were used for evaluation and labeled tweets that included emoticons. Two types of deep neural networks, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, were used for classification since they have been demonstrated to produce the best results. All terms in the tweets were represented using the pre-trained embedding vectors word2vec, GloVe, and fastText. Baseline models were trained and tested using tweets with their emoticons removed. For each baseline model, a corresponding model was trained that included emoticons as inputs; in others, emoticons were replaced with English language. Accuracy, precision, recall, and F1 scores of models using emoticons were compared to their corresponding baseline models that did not use emoticons. Experiments were conducted on data with emoticons and with emoticons removed for all the models. Our experiments showed that an LSTM that uses an attention model with fastText embedding outperformed the linear models for identifying sentiment for all the datasets used. We also learned that when we replaced emoticons with English language, the sentiment classification accuracy improved. We therefore concluded that inclusion of emoticons as features achieves the highest accuracy in our research on sentiment classification.
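The two preprocessing variants mentioned in this abstract (emoticons removed for the baseline, emoticons replaced with English words) could be sketched roughly as follows. This is only an illustration: the emoticon-to-word map and the function names are hypothetical, and the abstract does not say how the dissertation implemented either step.

```python
import re

# Hypothetical emoticon-to-word map; the dissertation's actual mapping is not given.
EMOTICON_WORDS = {
    ":)": "happy",
    ":-)": "happy",
    ":D": "laughing",
    ":(": "sad",
    ":-(": "sad",
    ";)": "wink",
}

# Try longer emoticons first so that multi-character forms are matched whole.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICON_WORDS, key=len, reverse=True))
)


def remove_emoticons(text):
    """Baseline variant: drop every known emoticon and normalise whitespace."""
    return " ".join(_PATTERN.sub(" ", text).split())


def replace_emoticons(text):
    """Replacement variant: swap each known emoticon for an English word."""
    swapped = _PATTERN.sub(lambda m: " " + EMOTICON_WORDS[m.group(0)] + " ", text)
    return " ".join(swapped.split())
```

Under these assumptions, each tweet would be passed through one of the two functions before tokenisation and embedding lookup, so that the baseline and emoticon-aware models see otherwise identical text.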

    Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

    NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within sentiment, emotion and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches. Comment: Accepted at EMNLP 2017. Please include EMNLP in any citations. Minor changes from the EMNLP camera-ready version. 9 pages + references and supplementary material
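As a rough illustration of emoji-based distant supervision, the sketch below turns a raw tweet into a (text, label) training pair by treating the emoji it contains as a noisy label. The four-emoji label set, the function name, and the rule of skipping tweets with several distinct emojis are all assumptions made for this example; the abstract only states that 64 common emojis were used and does not describe the pipeline.

```python
# Hypothetical label set; the paper uses 64 common emojis, which are not listed here.
EMOJI_LABELS = ["😂", "😍", "😭", "😡"]


def make_example(tweet):
    """Return (text_without_emoji, label_index) for a tweet, or None.

    Assumption for this sketch: tweets with no listed emoji, or with more
    than one distinct listed emoji, are skipped.
    """
    present = [e for e in EMOJI_LABELS if e in tweet]
    if len(present) != 1:
        return None
    emoji = present[0]
    # Remove the emoji so the model must predict it from the words alone.
    text = " ".join(tweet.replace(emoji, " ").split())
    return text, EMOJI_LABELS.index(emoji)
```

A classifier pretrained on such pairs could then be fine-tuned on a smaller, manually labeled sentiment, emotion, or sarcasm dataset.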

    ArAutoSenti: Automatic annotation and new tendencies for sentiment classification of Arabic messages

    The file attached to this record is the author's final peer reviewed version. A corpus-based sentiment analysis approach for messages written in Arabic and its dialects is presented and implemented. The originality of this approach resides in the automatic construction of the annotated sentiment corpus, which relies mainly on a sentiment lexicon that is also constructed automatically. For the classification step, shallow and deep classifiers are used, with features extracted by applying word embedding models. To validate the constructed corpus, we carried out a manual review and found that 85.17% of it was correctly annotated. This approach is applied to the under-resourced Algerian dialect and is tested on two external test corpora presented in the literature. The obtained results are very encouraging, with an F1-score of up to 88% (on the first test corpus) and up to 81% (on the second test corpus). These results represent a 20% and a 6% improvement, respectively, when compared with existing work in the research literature.

    Emotion Classification of Indonesian Tweets using Bidirectional LSTM

    Emotion classification can be a powerful tool to derive narratives from social media data. Traditional machine learning models that perform emotion classification on Indonesian Twitter data exist but rely on closed-source features. Recurrent neural networks can meet or exceed the performance of state-of-the-art traditional machine learning techniques using exclusively open-source data and models. Specifically, these results show that recurrent neural network variants can produce more than an 8% gain in accuracy in comparison with logistic regression and SVM techniques and a 15% gain over random forest when using FastText embeddings. This research found a statistically significant difference in performance between a single-layer bidirectional long short-term memory model and a two-layer stacked bidirectional long short-term memory model, in favor of the single-layer model. This research also found that a single-layer bidirectional long short-term memory recurrent neural network matched the performance of a state-of-the-art logistic regression model with supplemental closed-source features from a study by Saputri et al. [8] when classifying the emotion of Indonesian tweets.