
    Hybrid Sentiment Classification of Reviews Using Synonym Lexicon and Word embedding

    Sentiment analysis is used to extract useful information from a given set of documents using Natural Language Processing (NLP) techniques. These techniques have wide scope in fields that deal with huge amounts of data, such as e-commerce, business and market analysis, social media, and the review impact of products and movies. Sentiment analysis can be applied to these data to automatically find their polarity (positive, neutral, or negative) or more complex sentiments such as happiness, sadness, anger, and joy for a particular product or service, based on user reviews. Sentiment analysis is not limited to finding the polarity of reviews. It utilizes machine learning algorithms with vectorization techniques based on textual documents to train classifier models. These models are later used to perform sentiment analysis on a dataset from the domain on which the classifier was trained. Text documents are vectorized using word-embedding-based and hybrid vectorization. The proposed methodology focuses on fast and accurate sentiment prediction with higher confidence values over datasets in both Tamil and English.

    Emotion-aware polarity lexicons for Twitter sentiment analysis.

    Theoretical frameworks in psychology map the relationships between emotions and sentiments. In this paper we study the role of such a mapping for computational emotion detection from text (e.g. social media), with the aim of understanding the usefulness of an emotion-rich corpus of documents (e.g. tweets) for learning polarity lexicons for sentiment analysis. We propose two different methods that leverage a corpus of emotion-labelled tweets to learn word-polarity lexicons. The proposed methods model the emotion corpus using a generative unigram mixture model (UMM), combined with the emotion-sentiment mapping proposed in psychology, for automated generation of word-polarity lexicons that capture emotion-rich vocabulary. We comparatively evaluate the quality of the proposed mixture model in learning emotion-aware sentiment lexicons against lexicons generated using supervised latent Dirichlet allocation (sLDA) and word-document frequency (WDF) statistics. Sentiment analysis experiments on benchmark Twitter data sets confirm the quality of our proposed lexicons. Further, a comparative analysis with sLDA- and WDF-based emotion-aware lexicons and with standard sentiment lexicons that are agnostic to emotion knowledge suggests that the proposed lexicons lead to significantly better performance in both sentiment classification and sentiment intensity prediction tasks.

    Impact of Online Education and Sentiment Analysis from Twitter Data using Topic Modeling Algorithms

    During a pandemic, all industries suffer greatly, and every sector of the world suffers in some way, including the education sector. Internet expressions reflect users' feelings about a product or service. Sentiment analysis determines the polarity of information in source data toward a subject under investigation. The goal of this study is to examine social media expressions about online teaching and learning, as online education will become a part of everyday life in the future. For the prototype implementation, we collected data from Twitter using keywords related to online education, and from a Google Form completed by engineering undergraduate students. This analysis will assist teachers, parents, and the student community in understanding the benefits and drawbacks of the education industry, allowing for further improvement in educational outcomes. We used aspect-based sentiment analysis and topic modeling to determine sentiment polarity and important topics for education sector stakeholders. First, we used the TextBlob Python package to determine sentiment polarity, and Bag of Words, LDA, and LSA models to discover topics. After modeling topics from the collected data, topic coherence is used to assess the degree of semantic similarity between high-scoring words in a topic. A word cloud and LDAvis are used to visualize the data. The experimental results are promising and will assist education stakeholders in addressing the concerns identified in social media expressions.
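The topic-coherence step can be illustrated with a minimal sketch. The abstract does not say which coherence measure was used; the code below assumes the UMass variant, which scores a topic by how often its top words co-occur in the same documents. The function name `umass_coherence` and the toy documents are hypothetical and are not taken from the study.

```python
import math

def umass_coherence(top_words, documents):
    """UMass-style coherence for one topic (assumed measure, not the paper's).

    top_words: the topic's highest-scoring words, most probable first.
    documents: tokenised documents (lists of words).
    Every word in top_words must occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            # The +1 keeps the logarithm finite when the pair never co-occurs.
            score += math.log((doc_freq(w_i, w_j) + 1) / doc_freq(w_j))
    return score

# Hypothetical toy corpus standing in for pre-processed tweets.
docs = [
    ["online", "class", "exam"],
    ["online", "class", "teacher"],
    ["exam", "stress"],
    ["online", "exam"],
]
coherent = umass_coherence(["online", "class"], docs)
incoherent = umass_coherence(["class", "stress"], docs)
print(coherent, incoherent)
```

Scores closer to zero indicate top words that tend to appear together; more negative scores indicate words that rarely co-occur.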

    Sentiment Analysis of Tweets using Unsupervised Learning Techniques and the K-Means Algorithm

    Today, web content such as images, text, speeches, and videos is user-generated, and social networks have become increasingly popular as a means for people to share their ideas and opinions. Twitter is one of the most popular social media platforms for expressing feelings about events as they occur. The main objective of this study is to classify and analyze the content published on Twitter by affiliates of the Pension and Funds Administration (AFP). The study incorporates machine learning techniques for data mining, cleaning, tokenization, exploratory analysis, classification, and sentiment analysis. To obtain and examine the data, Twitter was searched with the hashtag #afp, followed by descriptive and exploratory analysis, including metrics of the tweets. A content analysis was then carried out, including word frequency calculation, lemmatization, classification of words by sentiment and emotion, and a word cloud. The study uses tweets published in May 2022. Sentiment was distributed across three polarity classes, positive, neutral, and negative, representing 22%, 4%, and 74% respectively. Using unsupervised learning with the K-Means algorithm, we determined the number of clusters with the elbow method. Finally, the sentiment analysis and the clusters formed indicate a very pronounced dispersion: the distances are not very similar, even though the data were standardized.
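The elbow method mentioned above can be sketched as follows. This is a minimal illustration, not the study's code: it runs a plain K-Means on hypothetical one-dimensional values (standing in for standardized tweet features) and records the within-cluster sum of squares (inertia) for several values of k. The deterministic farthest-point initialisation and the name `kmeans_1d` are assumptions made to keep the example reproducible.

```python
def kmeans_1d(points, k, max_iter=100):
    """Lloyd's K-Means on 1-D data; returns (centroids, inertia)."""
    # Deterministic farthest-point initialisation (an assumption for
    # reproducibility; library implementations usually initialise randomly).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(abs(p - c) for c in centroids))
        )
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, inertia

# Hypothetical data with three well-separated groups.
points = [1, 2, 3, 11, 12, 13, 21, 22, 23]
inertias = {k: kmeans_1d(points, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

The inertia falls sharply up to k = 3 and barely improves afterwards, so the "elbow" of the curve suggests three clusters for this toy data. On real tweet features the bend is often far less pronounced, which is consistent with the dispersion the authors report.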

    Sentiment analysis on Twitter data using machine learning

    In the world of social media, people are increasingly responsive to products and to events that are currently occurring. These responses take the form of raw textual data (semi-structured data) in different languages and terms, containing noise as well as critical information that encourages analysts to discover knowledge and patterns in the available dataset. This is useful for decision making and for strategic decisions about the future market. To discover this unknown information from linguistic data, Natural Language Processing (NLP) and data mining techniques are the most-researched approaches to sentiment analysis. The derived approach analyzes Twitter data to detect the sentiment of people throughout the world using machine learning techniques. The dataset available for this research is from Twitter for the 2014 Soccer World Cup, held in Brazil. During this period, many people gave their opinions, emotions, and attitudes about the game, promotions, and players. The data are filtered and analyzed using natural language processing techniques, and sentiment polarity is calculated based on the emotion words detected in the user tweets. The dataset is normalized for use by machine learning algorithms and prepared using natural language processing techniques such as word tokenization, stemming and lemmatization, a POS (part-of-speech) tagger, NER (named entity recognition), and a parser to extract emotions from the text of each tweet. The approach is implemented in Python with the Natural Language Toolkit (NLTK), which is openly available for academic and research purposes. The derived algorithm extracts emotional words using WordNet, together with the POS of each word in a sentence that has a meaning in the current context, and assigns sentiment polarity using the SentiWordNet dictionary or a lexicon-based method.
The resultant polarity is further analyzed using the Naïve Bayes and SVM (Support Vector Machine) machine learning algorithms, and the data are visualized on the WEKA platform. Finally, the goal is to compare the results of both implementations and identify the best approach for sentiment analysis of semi-structured social media data. Master of Science (MSc) in Computational Science
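The lexicon-based polarity step can be sketched in a few lines. This is a simplified illustration under stated assumptions: the tiny `TOY_LEXICON` is hypothetical and merely stands in for a resource such as SentiWordNet, and the POS tagging, lemmatization, and classifier stages described in the abstract are omitted.

```python
import re

# Hypothetical word scores; a real system would look these up in a
# sentiment lexicon instead of hard-coding them.
TOY_LEXICON = {
    "great": 1.0, "love": 1.0, "win": 0.5,
    "bad": -1.0, "terrible": -1.0, "lose": -0.5,
}

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())

def polarity(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the tokens and map the total to a label."""
    score = sum(lexicon.get(token, 0.0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for tweet in [
    "What a great goal, love it!",
    "Terrible defending, bad game",
    "Kick-off at nine",
]:
    print(tweet, "->", polarity(tweet))
```

Labels produced this way could then serve as the polarity assignments that a Naïve Bayes or SVM classifier is trained and evaluated on, as the abstract describes.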

    Hybrid Words Representation for the classification of low quality text

    University of Technology Sydney. Faculty of Engineering and Information Technology. Language enables humans to communicate with others. For instance, we talk and give our opinions and suggestions, all using natural language; to be more precise, we use words while communicating with others. However, in today's world, we wish to communicate with computers just as we do with humans. This is not an easy task because humans communicate in an unstructured and informal way, whereas computers need structured and clean data. It is therefore essential for computers to understand and classify text accurately for proper human-computer interaction. For classifying a text, the first question we must address is how to improve low-quality text. The next immediate challenge is to find the best representation so that text can be classified accurately. The way text is organized reflects polysemy and the semantic and syntactical coupling relationships embedded in its contents. Effectively capturing such content relationships is therefore crucial for a better understanding of text representations. This is especially challenging in environments where text messages are short, informal, and noisy, and involve natural language ambiguities. Existing sentiment classification methods mainly target documents and clean textual data, and cannot capture the relationships, attributes, and characteristics within tweet messages. Social media analysis, especially the analysis of tweet messages on Twitter, has become increasingly relevant since a significant portion of data is ubiquitous in nature. Social-media-based short text is valuable for many reasons and is increasingly explored in text analysis, social media analysis, and recommendation. At the same time, there are a number of challenges that need to be addressed in this space.
One of the main issues is that traditional word embeddings are unable to capture polysemy (they assign the same representation to a word irrespective of its context and meaning) or out-of-vocabulary words (they assign a random representation). Furthermore, traditional word embeddings fail to capture the sentiment information of words, which results in similar vector representations for words with opposite polarities. Thus, ignoring polysemy within the context and the sentiment polarity of words in a tweet reduces tweet classification performance. To address these research challenges and the limitations associated with word-level representations, this thesis focuses on improving the representation of low-quality text by improving the unstructured and informal nature of tweets, so as to utilize the information thoroughly, and on managing natural language ambiguities to build a more robust sentiment classification model. Compared to previous studies, the proposed models can deal with the ubiquitous nature of short text, polysemy, and semantic and syntactical relationships within content, thereby addressing natural language ambiguity problems. Chapter 4 presents the effects of pre-processing techniques using two different word representation models with machine learning and deep learning classifiers. We then present our recommended combination of pre-processing techniques, which improves low-quality text by performing sentiment-aware tokenization, correction of spelling mistakes, word segmentation, and other techniques to utilize most of the information hidden in unstructured text. The experimental results show that the proposed combination performs well compared to other combinations. Chapter 5 presents the hybrid words representation. In this chapter, we propose our Deep Intelligent Contextual Embedding for Twitter sentiment analysis.
The proposed model addresses natural language ambiguities and is devised to capture polysemy in context, as well as the semantics, syntax, and sentiment knowledge of words. A bi-directional Long Short-Term Memory network with attention is employed to determine the sentiment. We evaluate the proposed model through quantitative and qualitative analysis. The experimental results show that the proposed model outperforms various word embedding models in the sentiment analysis of tweets. The above-mentioned methods can be applied to any social media classification task. The performance of the proposed models is compared with different models, which supports the effectiveness of the proposed models and bounds the information loss in their generated high-quality representations.

    Semantic Sentiment Analysis of Twitter Data

    The Internet and the proliferation of smart mobile devices have changed the way information is created, shared, and spread: microblogs such as Twitter, weblogs such as LiveJournal, social networks such as Facebook, and instant messengers such as Skype and WhatsApp are now commonly used to share thoughts and opinions about anything in the surrounding world. This has resulted in the proliferation of social media content, creating new opportunities to study public opinion at a scale that was never possible before. Naturally, this abundance of data has quickly attracted business and research interest from various fields, including marketing, political science, and social studies, among many others, which are interested in questions like these: Do people like the new Apple Watch? Do Americans support ObamaCare? How do the Scottish feel about Brexit? Answering these questions requires studying the sentiment of opinions people express in social media, which has given rise to the fast growth of the field of sentiment analysis in social media, with Twitter being especially popular for research due to its scale, representativeness, variety of topics discussed, and ease of public access to its messages. Here we present an overview of work on sentiment analysis on Twitter. Comment: Microblog sentiment analysis; Twitter opinion mining; In the Encyclopedia on Social Network Analysis and Mining (ESNAM), Second edition. 201