Automatically extracting polarity-bearing topics for cross-domain sentiment classification
The joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required for JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that, by augmenting the original feature space with polarity-bearing topics, in-domain supervised classifiers learned from the augmented feature representation achieve state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criterion for cross-domain sentiment classification, our proposed approach performs better than or comparably to previous approaches. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.
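The selection step mentioned in the abstract above can be illustrated with a minimal sketch of information gain over binary (present/absent) features. This is not the authors' implementation: the function names, the representation of documents as token sets, and feature names such as `TOPIC_POS` (standing in for a polarity-bearing topic added to the feature space) are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, feature):
    """Entropy of the labels minus the weighted entropy after
    splitting the documents on presence/absence of `feature`."""
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    conditional = ((len(with_f) / n) * entropy(with_f)
                   + (len(without_f) / n) * entropy(without_f))
    return entropy(labels) - conditional


def select_top_features(docs, labels, k):
    """Return the k features with the highest information gain
    (ties broken alphabetically so the result is deterministic)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda f: (-information_gain(docs, labels, f), f))
    return ranked[:k]


# Toy data (hypothetical): each document is a set of word features
# augmented with an assumed polarity-bearing topic feature.
toy_docs = [
    {"great", "plot", "TOPIC_POS"},
    {"great", "acting", "TOPIC_POS"},
    {"dull", "plot", "TOPIC_NEG"},
    {"dull", "acting", "TOPIC_NEG"},
]
toy_labels = ["pos", "pos", "neg", "neg"]
```

In this toy set, features that split the classes cleanly (such as `great` or `TOPIC_POS`) receive the maximum gain of one bit, while `plot`, which occurs equally in both classes, receives zero and would be filtered out.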
Multilingual Twitter Sentiment Classification: The Role of Human Annotators
What are the limits of automated Twitter sentiment classification? We analyze
a large set of manually labeled tweets in different languages, use them as
training data, and construct automated classification models. It turns out that
the quality of classification models depends much more on the quality and size
of training data than on the type of the model trained. Experimental results
indicate that there is no statistically significant difference between the
performance of the top classification models. We quantify the quality of
training data by applying various annotator agreement measures, and identify
the weakest points of different datasets. We show that the model performance
approaches the inter-annotator agreement when the size of the training set is
sufficiently large. However, it is crucial to regularly monitor the self- and
inter-annotator agreements since this improves the training datasets and
consequently the model performance. Finally, we show that there is strong
evidence that humans perceive the sentiment classes (negative, neutral, and
positive) as ordered.
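As a rough illustration of how agreement between two annotators (or between one annotator's repeated labels) might be quantified, the sketch below computes Cohen's kappa. The abstract only says that "various annotator agreement measures" were applied, so the choice of kappa here is an assumption, as are the function name and the toy labels.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same class
    # if each labelled independently according to their own class frequencies.
    expected = sum(counts_a[c] * counts_b[c]
                   for c in set(labels_a) | set(labels_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical class
    return (observed - expected) / (1 - expected)


# Toy example (hypothetical labels for six tweets).
annotator_1 = ["neg", "neg", "neu", "pos", "pos", "pos"]
annotator_2 = ["neg", "neu", "neu", "pos", "pos", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Note that kappa treats the classes as unordered; a measure that respects the negative < neutral < positive ordering discussed in the abstract would penalize a negative/positive disagreement more than a negative/neutral one.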
Cross-Lingual Classification of Crisis Data
Many citizens nowadays flock to social media during crises to share or acquire the latest information about the event. Due to the sheer volume of data typically circulated during such events, it is necessary to be able to efficiently filter out irrelevant posts, thus focusing attention on the posts that are truly relevant to the crisis. Current methods for classifying the relevance of posts to a crisis or set of crises typically struggle to deal with posts in different languages, and it is not viable during rapidly evolving crisis situations to train new models for each language. In this paper we test statistical and semantic classification approaches on cross-lingual datasets from 30 crisis events, consisting of posts written mainly in English, Spanish, and Italian. We experiment with scenarios where the model is trained on one language and tested on another, and where the data is translated to a single language. We show that the addition of semantic features extracted from external knowledge bases improves accuracy over a purely statistical model.
Tweet categorization by combining content and structural knowledge
Twitter is a worldwide social media platform where millions of people frequently express ideas and opinions
about any topic. This widespread success makes the analysis of tweets an interesting and possibly
lucrative task, since tweets are rarely objective and have become a target for large-scale analysis. In
this paper, we explore the idea of integrating two fundamental aspects of a tweet, its textual
content and its underlying structural information, when addressing the tweet categorization task. Thus,
we analyze not only the textual content of tweets but also the structural information provided by the
relationship between tweets and users, and we propose different methods for effectively combining both
kinds of feature models extracted from the different knowledge sources. To test our approach, we
address the specific task of determining the political opinion of Twitter users within their political context,
observing that our most refined knowledge integration approach performs markedly better (about
5 points higher) than the classic text-based model.
Ministerio de Economía y Competitividad TIN2012-38536-C03-02; Junta de Andalucía P11-TIC-7684