Tripartite Graph Clustering for Dynamic Sentiment Analysis on Social Media
The growing popularity of social media (e.g., Twitter) allows users to easily
share information with each other and influence others by expressing their own
sentiments on various subjects. In this work, we propose an unsupervised
tri-clustering framework, which analyzes both user-level and tweet-level
sentiments through co-clustering of a tripartite graph. A compelling feature of
the proposed framework is that the quality of sentiment clustering of tweets,
users, and features can be mutually improved by joint clustering. We further
investigate the evolution of user-level sentiments and latent feature vectors
in an online framework and devise an efficient online algorithm to sequentially
update the clustering of tweets, users and features with newly arrived data.
The online framework not only provides better quality of both dynamic
user-level and tweet-level sentiment analysis, but also improves the
computational and storage efficiency. We verified the effectiveness and
efficiency of the proposed approaches on the November 2012 California ballot
Twitter data.
Comment: A short version is in Proceeding of the 2014 ACM SIGMOD International
Conference on Management of dat
Comparing and Combining Sentiment Analysis Methods
Several messages express opinions about events, products, and services,
political views or even their author's emotional state and mood. Sentiment
analysis has been used in several applications including analysis of the
repercussions of events in social networks, analysis of opinions about products
and services, and simply to better understand aspects of social communication
in Online Social Networks (OSNs). There are multiple methods for measuring
sentiments, including lexical-based approaches and supervised machine learning
methods. Despite the wide use and popularity of some methods, it is unclear
which method is better for identifying the polarity (i.e., positive or
negative) of a message as the current literature does not provide a method of
comparison among existing methods. Such a comparison is crucial for
understanding the potential limitations, advantages, and disadvantages of
popular methods in analyzing the content of OSN messages. Our study aims at
filling this gap by presenting comparisons of eight popular sentiment analysis
methods in terms of coverage (i.e., the fraction of messages whose sentiment is
identified) and agreement (i.e., the fraction of identified sentiments that are
in tune with ground truth). We develop a new method that combines existing
approaches, providing the best coverage results and competitive agreement. We
also present a free Web service called iFeel, which provides an open API for
accessing and comparing results across different sentiment methods for a given
text.
Comment: Proceedings of the first ACM conference on Online social networks
(2013) 27-3
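
The two evaluation measures defined in this abstract, coverage and agreement, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the use of `None` for "no sentiment identified", and the toy data are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def coverage(predictions: Sequence[Optional[str]]) -> float:
    """Fraction of messages for which the method identified any sentiment.

    Assumption: a prediction of None means the method returned no polarity.
    """
    if not predictions:
        return 0.0
    identified = sum(1 for p in predictions if p is not None)
    return identified / len(predictions)


def agreement(predictions: Sequence[Optional[str]],
              ground_truth: Sequence[str]) -> float:
    """Fraction of identified sentiments that match the ground truth.

    Messages with no identified sentiment are excluded from the denominator.
    """
    pairs = [(p, g) for p, g in zip(predictions, ground_truth) if p is not None]
    if not pairs:
        return 0.0
    return sum(1 for p, g in pairs if p == g) / len(pairs)


# Hypothetical toy data: four messages, one of which the method skips.
preds = ["positive", None, "negative", "positive"]
truth = ["positive", "negative", "negative", "negative"]

print(coverage(preds))          # 3 of 4 messages were labeled
print(agreement(preds, truth))  # 2 of the 3 labeled messages are correct
```

Under these definitions a method can score high on agreement while covering few messages, which is why the abstract reports the two measures separately.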
Fidelity-Weighted Learning
Training deep neural networks requires many training samples, but in practice
training labels are expensive to obtain and may be of varying quality, as some
may be from trusted expert labelers while others might be from heuristics or
other sources of weak supervision such as crowd-sourcing. This creates a
fundamental quality-versus-quantity trade-off in the learning process. Do we
learn from the small amount of high-quality data or the potentially large
amount of weakly-labeled data? We argue that if the learner could somehow know
and take the label-quality into account when learning the data representation,
we could get the best of both worlds. To this end, we propose
"fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach
for training deep neural networks using weakly-labeled data. FWL modulates the
parameter updates to a student network (trained on the task we care about) on a
per-sample basis according to the posterior confidence of its label-quality
estimated by a teacher (who has access to the high-quality labels). Both
student and teacher are learned from the data. We evaluate FWL on two tasks in
information retrieval and natural language processing where we outperform
state-of-the-art alternative semi-supervised methods, indicating that our
approach makes better use of strong and weak labels, and leads to better
task-dependent data representations.
Comment: Published as a conference paper at ICLR 201
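
The idea of modulating parameter updates per sample by label confidence can be sketched as follows. This is a simplified illustration, not the FWL implementation: the student is assumed to be a linear model trained with squared error, and the confidence values are passed in as plain numbers, whereas in the abstract they are estimated by a teacher. The function name `confidence_weighted_sgd` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

# Each sample: (features, weak label, confidence in [0, 1]).
Sample = Tuple[Sequence[float], float, float]


def confidence_weighted_sgd(weights: List[float],
                            samples: Sequence[Sample],
                            lr: float) -> List[float]:
    """One SGD pass over weakly labeled samples for a linear student.

    The step size of each sample is scaled by the confidence attached to its
    label, so low-confidence labels move the parameters less.
    """
    w = list(weights)
    for x, y_weak, confidence in samples:
        prediction = sum(wi * xi for wi, xi in zip(w, x))
        error = prediction - y_weak
        step = lr * confidence  # per-sample modulation of the update
        w = [wi - step * error * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical toy data: the same weak label with different confidences.
trusted = [([1.0], 2.0, 1.0)]
untrusted = [([1.0], 2.0, 0.0)]

print(confidence_weighted_sgd([0.0], trusted, lr=0.5))    # full-size update
print(confidence_weighted_sgd([0.0], untrusted, lr=0.5))  # no update
```

With confidence 1 the update reduces to ordinary SGD, and with confidence 0 the sample is ignored; intermediate values interpolate between the two.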