3,883 research outputs found
Semantic Sentiment Analysis of Twitter Data
Internet and the proliferation of smart mobile devices have changed the way
information is created, shared, and spreads, e.g., microblogs such as Twitter,
weblogs such as LiveJournal, social networks such as Facebook, and instant
messengers such as Skype and WhatsApp are now commonly used to share thoughts
and opinions about anything in the surrounding world. This has resulted in the
proliferation of social media content, thus creating new opportunities to study
public opinion at a scale that was never possible before. Naturally, this
abundance of data has quickly attracted business and research interest from
various fields including marketing, political science, and social studies,
among many others, which are interested in questions like these: Do people like
the new Apple Watch? Do Americans support ObamaCare? How do Scottish feel about
the Brexit? Answering these questions requires studying the sentiment of
opinions people express in social media, which has given rise to the fast
growth of the field of sentiment analysis in social media, with Twitter being
especially popular for research due to its scale, representativeness, variety
of topics discussed, as well as ease of public access to its messages. Here we
present an overview of work on sentiment analysis on Twitter.Comment: Microblog sentiment analysis; Twitter opinion mining; In the
Encyclopedia on Social Network Analysis and Mining (ESNAM), Second edition.
201
A context based model for sentiment analysis in twitter for the italian language
Studi recenti per la Sentiment
Analysis in Twitter hanno tentato di creare
modelli per caratterizzare la polaritÂŽa di
un tweet osservando ciascun messaggio
in isolamento. In realt`a, i tweet fanno
parte di conversazioni, la cui natura pu`o
essere sfruttata per migliorare la qualit`a
dellâanalisi da parte di sistemi automatici.
In (Vanzo et al., 2014) `e stato proposto un
modello basato sulla classificazione di sequenze
per la caratterizzazione della polarit`
a dei tweet, che sfrutta il contesto in
cui il messaggio `e immerso. In questo lavoro,
si vuole verificare lâapplicabilit`a di
tale metodologia anche per la lingua Italiana.Recent works on Sentiment
Analysis over Twitter leverage the idea
that the sentiment depends on a single
incoming tweet. However, tweets are
plunged into streams of posts, thus making
available a wider context. The contribution
of this information has been recently
investigated for the English language by
modeling the polarity detection as a sequential
classification task over streams of
tweets (Vanzo et al., 2014). Here, we want
to verify the applicability of this method
even for a morphological richer language,
i.e. Italian
Recommended from our members
New topic detection in microblogs and topic model evaluation using topical alignment
textThis thesis deals with topic model evaluation and new topic detection in microblogs. Microblogs are short and thus may not carry any contextual clues. Hence it becomes challenging to apply traditional natural language processing algorithms on such data. Graphical models have been traditionally used for topic discovery and text clustering on sets of text-based documents. Their unsupervised nature allows topic models to be trained easily on datasets meant for specific domains. However the advantage of not requiring annotated data comes with a drawback with respect to evaluation difficulties. The problem aggravates when the data comprises microblogs which are unstructured and noisy.
We demonstrate the application of three types of such models to microblogs - the Latent Dirichlet Allocation, the Author-Topic and the Author-Recipient-Topic model. We extensively evaluate these models under different settings, and our results show that the Author-Recipient-Topic model extracts the most coherent topics. We also addressed the problem of topic modeling on short text by using clustering techniques. This technique helps in boosting the performance of our models.
Topical alignment is used for large scale assessment of topical relevance by comparing topics to manually generated domain specific concepts. In this thesis we use this idea to evaluate topic models by measuring misalignments between topics. Our study on comparing topic models reveals interesting traits about Twitter messages, users and their interactions and establishes that joint modeling on author-recipient pairs and on the content of tweet leads to qualitatively better topic discovery.
This thesis gives a new direction to the well known problem of topic discovery in microblogs. Trend prediction or topic discovery for microblogs is an extensive research area. We propose the idea of using topical alignment to detect new topics by comparing topics from the current week to those of the previous week. We measure correspondence between a set of topics from the current week and a set of topics from the previous week to quantify five types of misalignments: \textit{junk, fused, missing} and \textit{repeated}. Our analysis compares three types of topic models under different settings and demonstrates how our framework can detect new topics from topical misalignments. In particular so-called \textit{junk} topics are more likely to be new topics and the \textit{missing} topics are likely to have died or die out.
To get more insights into the nature of microblogs we apply topical alignment to hashtags. Comparing topics to hashtags enables us to make interesting inferences about Twitter messages and their content. Our study revealed that although a very small proportion of Twitter messages explicitly contain hashtags, the proportion of tweets that discuss topics related to hashtags is much higher.Computer Science
TK: The Twitter Top-K Keywords Benchmark
Information retrieval from textual data focuses on the construction of
vocabularies that contain weighted term tuples. Such vocabularies can then be
exploited by various text analysis algorithms to extract new knowledge, e.g.,
top-k keywords, top-k documents, etc. Top-k keywords are casually used for
various purposes, are often computed on-the-fly, and thus must be efficiently
computed. To compare competing weighting schemes and database implementations,
benchmarking is customary. To the best of our knowledge, no benchmark currently
addresses these problems. Hence, in this paper, we present a top-k keywords
benchmark, TK, which features a real tweet dataset and queries with
various complexities and selectivities. TK helps evaluate weighting
schemes and database implementations in terms of computing performance. To
illustrate TK's relevance and genericity, we successfully performed
tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand, and on
different relational (Oracle, PostgreSQL) and document-oriented (MongoDB)
database implementations, on the other hand
Mining Frequency of Drug Side Effects Over a Large Twitter Dataset Using Apache Spark
Despite clinical trials by pharmaceutical companies as well as current FDA reporting systems, there are still drug side effects that have not been caught. To find a larger sample of reports, a possible way is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports for drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. In this work, we have improved on previous pipelines in sentimental analysis-based mining, processing, and extracting tweets with drug-caused side effects. We have also added a new ensemble classifier using a combination of sentiment analysis features to increase the accuracy of identifying drug-caused side effects. In addition, the frequency count for the side effects is also provided. Furthermore, we have also implemented the same pipeline in Apache Spark to improve the speed of processing of tweets by 2.5 times, as well as to support the process of large tweet datasets. As the frequency count of drug side effects opens a wide door for further analysis, we present a preliminary study on this issue, including the side effects of simultaneously using two drugs, and the potential danger of using less-common combination of drugs. We believe the pipeline design and the results present in this work would have great implication on studying drug side effects and on big data analysis in general
Task-specific Word Identification from Short Texts Using a Convolutional Neural Network
Task-specific word identification aims to choose the task-related words that
best describe a short text. Existing approaches require well-defined seed words
or lexical dictionaries (e.g., WordNet), which are often unavailable for many
applications such as social discrimination detection and fake review detection.
However, we often have a set of labeled short texts where each short text has a
task-related class label, e.g., discriminatory or non-discriminatory, specified
by users or learned by classification algorithms. In this paper, we focus on
identifying task-specific words and phrases from short texts by exploiting
their class labels rather than using seed words or lexical dictionaries. We
consider the task-specific word and phrase identification as feature learning.
We train a convolutional neural network over a set of labeled texts and use
score vectors to localize the task-specific words and phrases. Experimental
results on sentiment word identification show that our approach significantly
outperforms existing methods. We further conduct two case studies to show the
effectiveness of our approach. One case study on a crawled tweets dataset
demonstrates that our approach can successfully capture the
discrimination-related words/phrases. The other case study on fake review
detection shows that our approach can identify the fake-review words/phrases.Comment: accepted by Intelligent Data Analysis, an International Journa
- âŠ