1,353 research outputs found

    Crowdsourced real-world sensing: sentiment analysis and the real-time web

    The advent of the real-time web is proving both challenging and disruptive for a number of areas of research, notably information retrieval and web data mining. As an area of research reaching maturity, sentiment analysis offers a promising direction for modelling the text content available in real-time streams. This paper reviews the real-time web as a new area of focus for sentiment analysis and discusses the motivations and challenges behind such a direction.

    Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold

    Sentiment analysis over Twitter offers organisations and individuals a fast and effective way to monitor the public's feelings towards them and their competitors. To assess the performance of sentiment analysis methods over Twitter, a small set of evaluation datasets has been released in the last few years. In this paper we present an overview of eight publicly available and manually annotated evaluation datasets for Twitter sentiment analysis. Based on this review, we show that a common limitation of most of these datasets, when assessing sentiment analysis at target (entity) level, is the lack of distinct sentiment annotations for the tweets and for the entities contained in them. For example, the tweet "I love iPhone, but I hate iPad" can be annotated with a mixed sentiment label, but the entity iPhone within this tweet should be annotated with a positive sentiment label. Aiming to overcome this limitation, and to complement current evaluation datasets, we present STS-Gold, a new evaluation dataset where tweets and targets (entities) are annotated individually and therefore may present different sentiment labels. This paper also provides a comparative study of the various datasets along several dimensions, including total number of tweets, vocabulary size and sparsity. We also investigate the pair-wise correlation among these dimensions as well as their correlations to the sentiment classification performance on different datasets.
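The distinction between tweet-level and entity-level labels can be sketched as a small data structure. This is only an illustration: the class and field names are hypothetical and are not the STS-Gold file format, and the negative label for iPad is an assumption inferred from the example tweet (the abstract only states the iPhone label).

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedTweet:
    """Hypothetical record holding separate tweet-level and entity-level labels."""
    text: str
    tweet_label: str
    entity_labels: dict = field(default_factory=dict)


def entities_disagreeing_with_tweet(tweet):
    """Return, sorted by name, the entities whose label differs from the tweet's label."""
    return sorted(
        entity
        for entity, label in tweet.entity_labels.items()
        if label != tweet.tweet_label
    )


example = AnnotatedTweet(
    text="I love iPhone, but I hate iPad",
    tweet_label="mixed",
    # "positive" for iPhone comes from the abstract; "negative" for iPad is assumed.
    entity_labels={"iPhone": "positive", "iPad": "negative"},
)

disagreeing = entities_disagreeing_with_tweet(example)
```

A dataset that stores only `tweet_label` cannot express that both entities in this tweet carry a label different from the tweet as a whole.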

    Classifying sentiment in microblogs: is brevity an advantage?

    Microblogs as a new textual domain offer a unique proposition for sentiment analysis. Their short document length suggests any sentiment they contain is compact and explicit. However, this short length coupled with their noisy nature can pose difficulties for standard machine learning document representations. In this work we examine the hypothesis that it is easier to classify the sentiment in these short-form documents than in longer-form documents. Surprisingly, we find classifying sentiment in microblogs easier than in blogs and make a number of observations pertaining to the challenge of supervised learning for sentiment analysis in microblogs.

    Sentiment Analysis and Political Party Classification in 2016 U.S. President Debates in Twitter

    We introduce a framework that combines tweet sentiment analysis with available default user profiles to classify the political party of users who posted tweets during the 2016 U.S. presidential debates. The work focuses on extracting event-related information within a short event period instead of collecting tweets over a long period, as most previous work does. Our framework is not limited to debate events; researchers can use it to build the rationale for studies of other events. In sentiment analysis, we show that all three Naïve Bayes classifiers with different distributions obtain accuracy above 75%, and the results reveal that positive tweets most likely follow Gaussian or Multinomial distributions while negative tweets most likely follow a Bernoulli distribution in our training data. We also show that, under an unbalanced sparse term-document setting, tuning the Laplace smoothing parameter to adjust the weights of new terms in a tweet, instead of using the “Add-1” parameter, can help improve the classifier’s performance in a targeted direction. Finally, we show sentiment might help classifying political part
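The effect of tuning the smoothing parameter rather than fixing it at “Add-1” can be illustrated with a minimal multinomial Naïve Bayes sketch. This is not the authors' implementation: the function names, the whitespace tokenisation, and the four toy documents are assumptions made for illustration, and `alpha` must be greater than zero.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive smoothing (alpha > 0)."""
    vocab = {word for doc in docs for word in doc.split()}
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    class_counts = Counter(labels)
    classes = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        classes[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "counts": word_counts[label],
            # Smoothed denominator: class word total plus alpha per vocabulary term.
            "denom": total_words + alpha * len(vocab),
        }
    return {"vocab": vocab, "alpha": alpha, "classes": classes}


def word_prob(model, label, word):
    """Smoothed estimate of P(word | label)."""
    params = model["classes"][label]
    return (params["counts"][word] + model["alpha"]) / params["denom"]


def predict_nb(model, doc):
    """Return the label with the highest log-posterior; out-of-vocabulary words are skipped."""
    scores = {}
    for label, params in model["classes"].items():
        score = params["log_prior"]
        for word in doc.split():
            if word in model["vocab"]:
                score += math.log(word_prob(model, label, word))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy data, not taken from the paper.
toy_docs = ["great debate win", "love this candidate",
            "terrible debate loss", "hate this answer"]
toy_labels = ["pos", "pos", "neg", "neg"]

add_one = train_nb(toy_docs, toy_labels, alpha=1.0)
tuned = train_nb(toy_docs, toy_labels, alpha=0.1)
```

Comparing `word_prob(add_one, "neg", "great")` with `word_prob(tuned, "neg", "great")` shows that a smaller `alpha` gives less probability mass to a term never seen in that class. This is the kind of weight adjustment the abstract refers to.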

    Modeling the formation of attentive publics in social media: the case of Donald Trump

    Previous research has shown the importance of Donald Trump’s Twitter activity, and that of his Twitter following, in spreading his message during the primary and general election campaigns of 2015–2016. However, we know little about how the publics who followed Trump and amplified his messages took shape. We take this case as an opportunity to theorize and test questions about the assembly of what we call “attentive publics” in social media. We situate our study in the context of current discussions of audience formation, attention flow, and hybridity in the United States’ political media system. From this we derive propositions concerning how attentive publics aggregate around a particular object, in this case Trump himself, which we test using time series modeling. We also present an exploration of the possible role of automated accounts in these processes. Our results reiterate the media hybridity described by others, while emphasizing the importance of news media coverage in building social media attentive publics.

    Tweets as data: Demonstration of TweeQL and TwitInfo

    Microblogs such as Twitter are a tremendous repository of user-generated content. Increasingly, we see tweets used as data sources for novel applications such as disaster mapping, brand sentiment analysis, and real-time visualizations. In each scenario, the workflow for processing tweets is ad hoc, and a lot of unnecessary work goes into repeating common data processing patterns. We introduce TweeQL, a stream query processing language that presents a SQL-like query interface for unstructured tweets to generate structured data for downstream applications. We have built several tools on top of TweeQL, most notably TwitInfo, an event timeline generation and exploration interface that summarizes events as they are discussed on Twitter. Our demonstration will allow the audience to interact with both TweeQL and TwitInfo to convey the value of data embedded in tweets.