A Fine Grain Sentiment Analysis with Semantics in Tweets
Social networking is nowadays a major source of new information in the world. Microblogging sites like Twitter have millions of active users (320 million active users on Twitter on the 30th September 2015) who share their opinions in real time, generating huge amounts of data. These data are, in most cases, available to any network user. The opinions of Twitter users have become something that companies and other organisations study to see whether or not their users like the products or services they offer. One way to assess opinions on Twitter is to classify the sentiment of the tweets as positive or negative. This process is usually done at a coarse-grained level, with each tweet classified as a whole. However, tweets can be partially positive and negative at the same time, referring to different entities. As a result, general approaches usually classify these tweets as “neutral”. In this paper, we propose a semantic analysis of tweets, using Natural Language Processing to classify the sentiment with regard to the entities mentioned in each tweet. We offer a combination of Big Data tools (under the Apache Hadoop framework) and sentiment analysis using RDF graphs supporting the study of the tweet’s lexicon. This work has been empirically validated using a sporting event, the 2014 Phillips 66 Big 12 Men’s Basketball Championship. The experimental results show a clear correlation between the predicted sentiments and specific events during the championship.
Trustworthiness in Social Big Data Incorporating Semantic Analysis, Machine Learning and Distributed Data Processing
This thesis presents several state-of-the-art approaches constructed for the purpose of (i) studying the trustworthiness of users in Online Social Network platforms, (ii) deriving concealed knowledge from their textual content, and (iii) classifying and predicting the domain knowledge of users and their content. The developed approaches are refined through proof-of-concept experiments, several benchmark comparisons, and appropriate and rigorous evaluation metrics to verify and validate their effectiveness and efficiency, and hence, those of the applied frameworks
A Complete Text-Processing Pipeline for Business Performance Tracking
Natural text processing is amongst the most researched domains because of its varied applications. However, most existing works focus on improving the performance of machine learning models instead of applying those models in practical business cases. We present a text processing pipeline that enables business users to identify business performance factors through sentiment analysis and opinion summarization of customer feedback. The pipeline performs fine-grained sentiment classification of customer comments, and the results are used for the sentiment trend tracking process. The pipeline also performs topic modelling in which key aspects of customer comments are clustered using their correlation scores. The results are used to produce abstractive opinion summarization. The proposed text processing pipeline is evaluated using two business cases in the food and retail domains. The performance of the sentiment analysis component is measured using mean absolute error (MAE) rate, root mean squared error (RMSE) rate, and coefficient of determination.
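The three evaluation measures named in this abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of MAE, RMSE, and the coefficient of determination; the function name `regression_metrics` and the example ratings are hypothetical and are not taken from the paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    # Mean absolute error: average magnitude of the errors
    mae = sum(abs(e) for e in errors) / n
    # Root mean squared error: penalises large errors more heavily
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # Coefficient of determination: 1 - residual / total sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all true values are identical
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, rmse, r2

# Hypothetical fine-grained sentiment scores (1-5), for illustration only
true_scores = [1, 2, 3, 4, 5]
pred_scores = [1, 2, 3, 4, 4]
mae, rmse, r2 = regression_metrics(true_scores, pred_scores)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

Treating sentiment labels as ordinal scores is what makes these regression-style measures applicable: a prediction that is off by one class is penalised less than one that is off by four.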
SmokEng: Towards Fine-grained Classification of Tobacco-related Social Media Text
Contemporary datasets on tobacco consumption focus on one of two topics,
either public health mentions and disease surveillance, or sentiment analysis
on topical tobacco products and services. However, two primary considerations
are not accounted for: the language of the demographic affected and a
combination of the topics mentioned above in a fine-grained classification
mechanism. In this paper, we create a dataset of 3144 tweets, which are
selected based on the presence of colloquial slang related to smoking and
analyze it based on the semantics of the tweet. Each class is created and
annotated based on the content of the tweets such that further hierarchical
methods can be easily applied.
Further, we demonstrate the efficacy of standard text classification methods on
this dataset by designing experiments that perform both binary and
multi-class classification. Our experiments tackle the identification of either
a specific topic (such as tobacco product promotion), a general mention
(cigarettes and related products) or a more fine-grained classification. This
methodology paves the way for further analysis, such as understanding sentiment
or style, which makes this dataset a vital contribution to both disease
surveillance and tobacco use research.
Comment: Accepted at the Workshop on Noisy User-generated Text (W-NUT) at
EMNLP-IJCNLP 201
Editor’s Note
Digital information has redefined the way in which both public
and private organizations are faced with the use of data to improve
decision making. The importance of Big Data lies in the huge amount
of data generated every day, especially following the emergence of
online social networks (Facebook, Twitter, Google Plus, etc.) and the
exponential growth of devices such as smartphones, smartwatches
and other wearables, sensor networks, etc. as well as the possibility of
taking into account increasingly updated and more varied information
for decision making. [1]
With proper Big Data analysis we can spot trends, build models from
historical data to predict future events, or extract patterns from user
behaviour, and thus tailor services more closely to the needs of
users.
Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election
Social media has become an emerging alternative to opinion polls for public
opinion collection, yet as a passive data source it still poses many
challenges, such as structurelessness, quantifiability, and representativeness.
Social media data with geotags provide new opportunities to unveil the
geographic locations of users expressing their opinions. This paper aims to
answer two questions: 1) whether quantifiable measurement of public opinion can
be obtained from social media and 2) whether it can produce better or
complementary measures compared to opinion polls. This research proposes a
novel approach to measure the relative opinion of Twitter users towards public
issues in order to accommodate more complex opinion structures and take
advantage of the geography pertaining to the public issues. To ensure that this
new measure is technically feasible, a modeling framework is developed
including building a training dataset by adopting a state-of-the-art approach
and devising a new deep learning method called Opinion-Oriented Word Embedding.
With a case study of the tweets selected for the 2016 U.S. presidential
election, we demonstrate the predictive superiority of our relative opinion
approach and we show how it can aid visual analytics and support opinion
predictions. Although the relative opinion measure proves to be more robust
than polling, our study also suggests that the former can advantageously
complement the latter in opinion prediction.
Basic tasks of sentiment analysis
Subjectivity detection is the task of identifying objective and subjective
sentences. Objective sentences are those which do not exhibit any sentiment.
So, it is desired for a sentiment analysis engine to find and separate the
objective sentences for further analysis, e.g., polarity detection. In
subjective sentences, opinions can often be expressed on one or multiple
topics. Aspect extraction is a subtask of sentiment analysis that consists in
identifying opinion targets in opinionated text, i.e., in detecting the
specific aspects of a product or service the opinion holder is either praising
or complaining about.
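The tasks described in this abstract (subjectivity detection, aspect extraction, and polarity detection) can be sketched with a toy lexicon-based example in Python. This is a minimal illustration under strong assumptions: the word lists, the three-token window, and the function names are all hypothetical, and practical systems typically use trained models rather than fixed lexicons.

```python
import re

# Tiny hypothetical lexicons, for illustration only
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "slow"}
ASPECTS = {"battery", "screen", "service", "price"}

def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", sentence.lower())

def is_subjective(sentence):
    """Treat a sentence as subjective if it contains any opinion word."""
    return any(t in POSITIVE or t in NEGATIVE for t in tokenize(sentence))

def extract_aspects(sentence, window=3):
    """Pair each known aspect term with the polarity of the nearest
    opinion word found within `window` tokens on either side."""
    tokens = tokenize(sentence)
    results = []
    for i, tok in enumerate(tokens):
        if tok not in ASPECTS:
            continue
        best = None  # (distance, polarity) of the closest opinion word
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if tokens[j] in POSITIVE:
                polarity = "positive"
            elif tokens[j] in NEGATIVE:
                polarity = "negative"
            else:
                continue
            distance = abs(i - j)
            if best is None or distance < best[0]:
                best = (distance, polarity)
        if best is not None:
            results.append((tok, best[1]))
    return results

sentences = [
    "The phone has a 6 inch screen.",                     # objective
    "The battery is great but the service is terrible.",  # subjective
]
# Step 1: separate subjective sentences from objective ones
subjective = [s for s in sentences if is_subjective(s)]
# Step 2: find opinion targets and their polarity in what remains
aspects = [extract_aspects(s) for s in subjective]
print(aspects)
```

The second sentence shows why aspect-level analysis matters: it carries a positive opinion about one target and a negative opinion about another, which a single sentence-level label would obscure.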
Target-oriented Sentiment Classification with Sequential Cross-modal Semantic Graph
Multi-modal aspect-based sentiment classification (MABSC) is the task of
classifying the sentiment of a target entity mentioned in a sentence and an
image. However, previous methods failed to account for the fine-grained
semantic association between the image and the text, which resulted in limited
identification of fine-grained image aspects and opinions. To address these
limitations, in this paper we propose a new approach called SeqCSG, which
enhances the encoder-decoder sentiment classification framework using
sequential cross-modal semantic graphs. SeqCSG utilizes image captions and
scene graphs to extract both global and local fine-grained image information
and considers them as elements of the cross-modal semantic graph along with
tokens from tweets. The sequential cross-modal semantic graph is represented as
a sequence with a multi-modal adjacency matrix indicating relationships between
elements. Experimental results show that the approach outperforms existing
methods and achieves state-of-the-art performance on two standard datasets.
Further analysis has demonstrated that the model can implicitly learn the
correlation between fine-grained information of the image and the text with the
given target. Our code is available at https://github.com/zjukg/SeqCSG.
Comment: ICANN 2023, https://github.com/zjukg/SeqCS
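The idea of representing a graph as a flat sequence plus an adjacency matrix, as mentioned in this abstract, can be sketched generically in Python. This is not the SeqCSG implementation: the edge rules (chaining neighbouring tokens, linking identical words across modalities), the function names, and the example inputs are all assumptions made for illustration.

```python
def flatten(segments):
    """Flatten (modality, tokens) segments into one element sequence and
    link each token to its successor within the same segment."""
    elements, edges = [], []
    for modality, tokens in segments:
        start = len(elements)
        for offset, tok in enumerate(tokens):
            elements.append((modality, tok))
            if offset > 0:
                edges.append((start + offset - 1, start + offset))
    return elements, edges

def cross_modal_edges(elements):
    """Assumed rule: link elements from different modalities that share
    the same surface form."""
    links = []
    for i, (mod_i, text_i) in enumerate(elements):
        for j in range(i + 1, len(elements)):
            mod_j, text_j = elements[j]
            if mod_i != mod_j and text_i.lower() == text_j.lower():
                links.append((i, j))
    return links

def build_adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix with self-loops."""
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix

# Hypothetical inputs: tweet text, an image caption, one scene-graph triple
segments = [
    ("tweet", ["the", "player", "scores"]),
    ("caption", ["a", "player", "kicks", "a", "ball"]),
    ("scene", ["player", "kicks", "ball"]),  # subject, relation, object
]
elements, edges = flatten(segments)
edges += cross_modal_edges(elements)
adjacency = build_adjacency(len(elements), edges)
for (modality, text), row in zip(elements, adjacency):
    print(f"{modality:8s}{text:8s}{row}")
```

A matrix of this kind could be supplied to a sequence model as an attention mask so that each element attends only to related elements; whether and how SeqCSG does this is described in the paper and its repository, not here.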