Hybrid Sentiment Classification of Reviews Using Synonym Lexicon and Word embedding
Sentiment analysis is used to extract useful
information from a given set of documents by
using Natural Language Processing (NLP)
techniques. These techniques have wide scope in
various fields that deal with huge amounts of
data, such as e-commerce, business and market
analysis, social media, and the review impact of
products and movies. Sentiment analysis can be
applied to these data to automatically find the
polarity of the data (positive, neutral, or
negative) or more complex sentiments such as
happiness, sadness, anger, and joy for a
particular product or service based on user
reviews, so it is not limited to finding the
polarity of the reviews. Sentiment analysis
uses machine learning algorithms with
vectorization techniques based on textual
documents to train classifier models. These
models are later used to perform sentiment
analysis on a dataset from the particular domain
on which the classifier model was trained.
Vectorization of text documents is done using
word-embedding-based and hybrid vectorization.
The proposed methodology focuses on fast and
accurate sentiment prediction with a higher
confidence value on datasets in both Tamil
and English.
Emotion-aware polarity lexicons for Twitter sentiment analysis.
Theoretical frameworks in psychology map the relationships between emotions and sentiments. In this paper we study the role of such a mapping for computational emotion detection from text (e.g. social media), with the aim of understanding the usefulness of an emotion-rich corpus of documents (e.g. tweets) for learning polarity lexicons for sentiment analysis. We propose two different methods that leverage a corpus of emotion-labelled tweets to learn word-polarity lexicons. The proposed methods model the emotion corpus using a generative unigram mixture model (UMM), combined with the emotion-sentiment mapping proposed in psychology, for the automated generation of word-polarity lexicons that capture emotion-rich vocabulary. We compare the quality of the proposed mixture model in learning emotion-aware sentiment lexicons with that of lexicons generated using supervised latent Dirichlet allocation (sLDA) and word-document frequency (WDF) statistics. Sentiment analysis experiments on benchmark Twitter data sets confirm the quality of our proposed lexicons. Further, a comparative analysis with sLDA- and WDF-based emotion-aware lexicons and with standard sentiment lexicons that are agnostic to emotion knowledge suggests that the proposed lexicons lead to significantly better performance in both sentiment classification and sentiment intensity prediction tasks.
Impact of Online Education and Sentiment Analysis from Twitter Data using Topic Modeling Algorithms
During a pandemic, all industries suffer greatly, and every sector of the world suffers in some way, including the education sector. Internet expressions reflect users' feelings about a product or service. Sentiment analysis determines the polarity of the information in source data toward the subject under investigation. The goal of this study is to examine social media expressions about online teaching and learning, as online education will become a part of everyday life in the future. For the prototype implementation, we collected data from Twitter using keywords related to online education and from engineering undergraduate students using a Google Form. This analysis will assist teachers, parents, and the student community in understanding the benefits and drawbacks of the education industry, allowing for further improvement in educational outcomes. We used aspect-based sentiment analysis and topic modeling to determine sentiment polarity and important topics for education sector stakeholders. To begin, we used the TextBlob Python package to determine sentiment polarity, and Bag of Words, LDA, and LSA models to discover topics. After modeling topics from the collected data, topic coherence is used to assess the degree of semantic similarity between high-scoring words in each topic. The word cloud and LDAvis are used to visualize the data. The experimental results are promising and will assist education stakeholders in addressing the concerns identified in social media expressions.
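As a rough illustration of the Bag of Words step mentioned in the abstract above, the sketch below builds a document-term count matrix in plain Python. The tokenizer, the function names, and the two example sentences are assumptions made for illustration; the study's own preprocessing and its TextBlob, LDA, and LSA steps are not reproduced here.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic runs (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return a sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Hypothetical example sentences, not data from the study.
docs = ["online classes are flexible", "online exams are stressful"]
vocabulary, matrix = bag_of_words(docs)
```

A matrix of this kind is the usual input to topic models such as LDA or LSA, which then group co-occurring words into topics.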
Sentiment Analysis for the Low-Resourced Latinised Arabic "Arabizi"
The expansion of digital communication mediums from private mobile messaging into the public through social media presented an opportunity for the data science research and industry to mine the generated big data for artificial information extraction. A popular information extraction task is sentiment analysis, which aims at extracting polarity opinions, positive, negative, or neutral, from the written natural language. This science helped organisations better understand the public’s opinion towards events, news, public figures, and products.
However, sentiment analysis has advanced for the English language ahead of Arabic. While sentiment analysis for Arabic is developing in the literature of Natural Language Processing (NLP), a popular variety of Arabic, Arabizi, has been overlooked for sentiment analysis advancements.
Arabizi is an informal transcription of the spoken dialectal Arabic in Latin script used for social texting. It is known to be common among the Arab youth, yet it is overlooked in efforts on Arabic sentiment analysis for its linguistic complexities.
Like Arabic, Arabizi is rich in inflectional morphology, but it is also code-switched with English or French and transcribed without adhering to a standard orthography. The rich morphology, inconsistent orthography, and code-switching compound one another and multiply the lexical sparsity of the language: each Arabizi word can be spelled in many ways, in addition to the mixing of other languages within the same textual context. The resulting high degree of lexical sparsity defies the very basics of sentiment analysis, the classification of positive and negative words. Arabizi also faces a severe shortage of the data resources that are required to set out any sentiment analysis approach.
In this thesis, we tackle this gap by conducting research on sentiment analysis for Arabizi. We addressed the sparsity challenge by harvesting Arabizi data from multi-lingual social media text using deep learning to build Arabizi resources for sentiment analysis. We developed six new morphologically and orthographically rich Arabizi sentiment lexicons and set the baseline for Arabizi sentiment analysis on social media
Sentiment Analysis of Tweets using Unsupervised Learning Techniques and the K-Means Algorithm
Abstract: Today, web content such as images, text, speech, and video is user-generated, and social networks have become increasingly popular as a means for people to share their ideas and opinions. One of the most popular social media platforms for expressing feelings about current events is Twitter. The main objective of this study is to classify and analyze the content published on Twitter by the affiliates of the Pension and Funds Administration (AFP). This study incorporates machine learning techniques for data mining, cleaning, tokenization, exploratory analysis, classification, and sentiment analysis. To carry out the study, tweets with the hashtag #afp were collected, followed by descriptive and exploratory analysis, including metrics of the tweets. Finally, a content analysis was carried out, including word frequency calculation, lemmatization, classification of words by sentiment and emotion, and a word cloud. The study uses tweets published in May 2022. The sentiment distribution was also computed across three polarity classes: positive, neutral, and negative, representing 22%, 4%, and 74%, respectively. Supported by the unsupervised learning method and the K-Means algorithm, we were able to determine the number of clusters using the elbow method. Finally, the sentiment analysis and the clusters formed indicate a very pronounced dispersion: the distances are not very similar, even though the data were standardized.
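The elbow method mentioned in the abstract above can be sketched as follows: run K-Means for several values of k, record the within-cluster sum of squared distances (inertia), and look for the k after which the curve flattens. This is a minimal plain-Python sketch on hypothetical 2-D points, with a deterministic initialisation chosen only to keep the example reproducible; it is not the study's implementation, and the names `k_means` and `elbow_curve` are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], k: int, iterations: int = 50) -> Tuple[List[Point], float]:
    """Lloyd's algorithm; returns the final centroids and the inertia."""
    # Assumption: evenly spaced input points serve as initial centroids.
    # Real code would normally use random or k-means++ initialisation.
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia


def elbow_curve(points: List[Point], max_k: int) -> List[Tuple[int, float]]:
    """Inertia for k = 1..max_k; the 'elbow' is where the drop levels off."""
    return [(k, k_means(points, k)[1]) for k in range(1, max_k + 1)]


# Hypothetical toy data: three well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
curve = elbow_curve(points, 4)
```

On this toy data the inertia drops sharply up to k = 3 and only slightly afterwards, which is the pattern the elbow method looks for. On real tweet vectors the curve is typically much less clear-cut, consistent with the pronounced dispersion the abstract reports.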
Sentiment analysis on Twitter data using machine learning
In the world of social media, people are highly responsive to products or events
that are currently occurring. These responses take the form of raw textual data
(semi-structured data) in different languages and terms, which contains noise as
well as critical information that encourages analysts to discover knowledge and patterns
in the available dataset. This is useful for decision making and for strategic decisions
about the future market. To discover this unknown information from linguistic data, Natural Language
Processing (NLP) and data mining techniques are the most heavily researched approaches to
sentiment analysis. The derived approach analyzes Twitter data to detect the sentiment
of people throughout the world using machine learning techniques. The data set
available for this research is from Twitter for the 2014 soccer World Cup, held in Brazil. During
this period, many people gave their opinions, emotions, and attitudes about the game,
promotion, and players. The data are filtered and analyzed using natural language processing
techniques, and sentiment polarity is calculated based on the emotion words detected
in the user tweets. The data set is normalized for use by machine learning algorithms and
prepared using natural language processing techniques such as word tokenization, stemming
and lemmatization, a POS (part-of-speech) tagger, NER (named entity recognition), and a
parser to extract emotions from the textual data of each tweet. This approach is implemented using the Python programming language and the Natural Language Toolkit (NLTK),
which is openly available for academic as well as research purposes. The derived algorithm
extracts emotional words using WordNet, with the POS (part of speech) of each word in a
sentence that has a meaning in the current context, and assigns sentiment polarity using
the ‘SentiWordNet’ dictionary or a lexicon-based method. The resulting polarity
is further analyzed using the Naïve Bayes and SVM (support vector machine) machine
learning algorithms, and the data are visualized on the WEKA platform. Finally, the goal is to compare
the results of both implementations and determine the best approach for sentiment analysis on
social media for semi-structured data.
Master of Science (MSc) in Computational Science
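The lexicon-based polarity assignment described in the abstract above can be illustrated with a small sketch. The tiny hand-written lexicon here is a hypothetical stand-in for SentiWordNet, the regular-expression tokenizer replaces the NLTK pipeline, and the POS-aware lookup the abstract describes is omitted; all names are assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
TOY_LEXICON: Dict[str, float] = {
    "great": 1.0,
    "love": 1.0,
    "good": 0.5,
    "bad": -0.5,
    "terrible": -1.0,
    "hate": -1.0,
}


def tokenize(text: str) -> List[str]:
    """Lowercase the tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarity_score(text: str, lexicon: Dict[str, float] = TOY_LEXICON) -> float:
    """Sum the lexicon scores of all tokens; unknown words contribute 0."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


def polarity_label(text: str, lexicon: Dict[str, float] = TOY_LEXICON) -> str:
    """Map the summed score to one of three polarity classes."""
    score = polarity_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical example tweets, not taken from the World Cup data set.
labels = [
    polarity_label("What a great match, I love this team!"),
    polarity_label("Terrible refereeing, I hate it"),
    polarity_label("The match starts at nine"),
]
```

Labels produced this way could then serve as training targets for classifiers such as Naïve Bayes or an SVM, which is the comparison the abstract sets out to make.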
Hybrid Words Representation for the classification of low quality text
University of Technology Sydney. Faculty of Engineering and Information Technology.
Language enables humans to communicate with others. For instance, we talk and give our opinions and suggestions using natural language; to be more precise, we use words while communicating with others. However, in today's world, we wish to communicate with computers just as we do with humans. This is not an easy task, because humans communicate in an unstructured and informal way, whereas computers need structured and clean data. It is therefore essential for computers to understand and classify text accurately for proper human-computer interaction. To classify a text, the first question we must address is how to improve low-quality text. The next immediate challenge is to find the best representation so that the text can be classified accurately. The way text is organized reflects the polysemy and the semantic and syntactic coupling relationships that are embedded in its contents. Capturing such content relationships effectively is therefore crucial for a better understanding of text representations. This is especially challenging in environments where the text messages are short, informal, and noisy, and involve natural language ambiguities. Existing sentiment classification methods mainly target documents and clean textual data, and cannot capture the relationships, attributes, and characteristics within tweet messages.
Social media analysis, especially the analysis of tweet messages on Twitter, has become increasingly relevant since a significant portion of data is ubiquitous in nature. Social-media-based short text is valuable for many reasons and is increasingly explored in text analysis, social media analysis, and recommendation. At the same time, there are a number of challenges that need to be addressed in this space. One of the main issues is that traditional word embeddings are unable to capture polysemy (they assign the same representation to a word irrespective of its context and meaning) and out-of-vocabulary words (they assign a random representation). Furthermore, traditional word embeddings fail to capture the sentiment information of words, which results in similar word vector representations for words with opposite polarities. Thus, ignoring polysemy within the context and the sentiment polarity of words in a tweet reduces the performance of tweet classification.
In order to address the above-mentioned research challenges and limitations associated with word-level representations, this thesis focuses on improving the representation of low-quality text by addressing the unstructured and informal nature of tweets, so as to use the information thoroughly, and on managing natural language ambiguities to build a more robust sentiment classification model. Compared with previous studies, the proposed models can deal with the ubiquitous nature of short text and with polysemy and semantic and syntactic relationships within content, thereby addressing natural language ambiguity problems.
Chapter 4 presents the effects of pre-processing techniques using two different word representation models with machine learning and deep learning classifiers. We then present our recommended combination (approach) of pre-processing techniques, which improves low-quality text by performing sentiment-aware tokenization, correction of spelling mistakes, word segmentation, and other techniques to use most of the information hidden in unstructured text. The experimental results show that the proposed combination performs well compared with other combinations.
Chapter 5 presents the hybrid words representation. In this chapter, we propose our Deep Intelligent Contextual Embedding for Twitter sentiment analysis. The proposed model addresses natural language ambiguities and is devised to capture polysemy in context, as well as the semantics, syntax, and sentiment knowledge of words. A bi-directional Long Short-Term Memory network with attention is employed to determine the sentiment. We evaluate the proposed model by performing quantitative and qualitative analysis. The experimental results show that the proposed model outperforms various word embedding models in the sentiment analysis of tweets.
The above-mentioned methods can be applied to any social media classification task. The performance of the proposed models is compared with that of different models; the comparison supports the effectiveness of the proposed models and bounds the information loss in their generated high-quality representations.
Semantic Sentiment Analysis of Twitter Data
Internet and the proliferation of smart mobile devices have changed the way
information is created, shared, and spreads, e.g., microblogs such as Twitter,
weblogs such as LiveJournal, social networks such as Facebook, and instant
messengers such as Skype and WhatsApp are now commonly used to share thoughts
and opinions about anything in the surrounding world. This has resulted in the
proliferation of social media content, thus creating new opportunities to study
public opinion at a scale that was never possible before. Naturally, this
abundance of data has quickly attracted business and research interest from
various fields including marketing, political science, and social studies,
among many others, which are interested in questions like these: Do people like
the new Apple Watch? Do Americans support ObamaCare? How do Scottish feel about
the Brexit? Answering these questions requires studying the sentiment of
opinions people express in social media, which has given rise to the fast
growth of the field of sentiment analysis in social media, with Twitter being
especially popular for research due to its scale, representativeness, variety
of topics discussed, as well as ease of public access to its messages. Here we
present an overview of work on sentiment analysis on Twitter.
Comment: Microblog sentiment analysis; Twitter opinion mining; In the
Encyclopedia on Social Network Analysis and Mining (ESNAM), Second edition.
201
- …