131 research outputs found

Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold

Author: Alani Harith
Fernández Miriam
He Yulan
Saif Hassan
Publication date: 01/01/2013

Sentiment analysis over Twitter offers organisations and individuals a fast and effective way to monitor the public's feelings towards them and their competitors. To assess the performance of sentiment analysis methods over Twitter, a small set of evaluation datasets has been released in the last few years. In this paper we present an overview of eight publicly available and manually annotated evaluation datasets for Twitter sentiment analysis. Based on this review, we show that a common limitation of most of these datasets, when assessing sentiment analysis at target (entity) level, is the lack of distinctive sentiment annotations among the tweets and the entities contained in them. For example, the tweet "I love iPhone, but I hate iPad" can be annotated with a mixed sentiment label, but the entity iPhone within this tweet should be annotated with a positive sentiment label. Aiming to overcome this limitation, and to complement current evaluation datasets, we present STS-Gold, a new evaluation dataset where tweets and targets (entities) are annotated individually and therefore may present different sentiment labels. This paper also provides a comparative study of the various datasets along several dimensions, including total number of tweets, vocabulary size and sparsity. We also investigate the pair-wise correlation among these dimensions as well as their correlations to the sentiment classification performance on different datasets.
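
The difference between a tweet-level label and entity-level labels can be pictured with a small record type. This is a minimal sketch, not the STS-Gold file format: the class name, field names and helper function are hypothetical, and the negative label for iPad is an assumption inferred from the example sentence (the abstract only states the label for iPhone).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AnnotatedTweet:
    """Hypothetical record: one label for the tweet, one label per entity."""
    text: str
    tweet_label: str
    entity_labels: Dict[str, str] = field(default_factory=dict)


# Example sentence from the abstract; the iPad label is an assumption.
tweet = AnnotatedTweet(
    text="I love iPhone, but I hate iPad",
    tweet_label="mixed",
    entity_labels={"iPhone": "positive", "iPad": "negative"},
)


def entities_disagreeing_with_tweet(record: AnnotatedTweet) -> Dict[str, str]:
    """Return the entities whose label differs from the tweet-level label."""
    return {
        entity: label
        for entity, label in record.entity_labels.items()
        if label != record.tweet_label
    }


print(entities_disagreeing_with_tweet(tweet))
```

A dataset that stores only `tweet_label` cannot express this disagreement, which is the limitation the abstract describes.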

CiteSeerX

Open Research Online (The Open University)

Umigon: Sentiment analysis for tweets based on lexicons and heuristics

Author: Levallois C. (Clément)
Publication date: 01/01/2013

Umigon has been developed since December 2012 as a web application providing a service of sentiment detection in tweets. It has been designed to be fast and scalable. Umigon also provides indications for additional semantic features present in the tweets, such as time indications or markers of subjectivity. Umigon is in continuous development; it can be tried freely at www.umigon.com. Its code is open source at: https://github.com/seinecle/Umigon

EUR Research Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Erasmus University Digital Repository

JOINT_FORCES : unite competing sentiment classifiers with random forest

Author: Cieliebak Mark
Dürr Oliver
Uzdilli Fatih
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2014

In this paper, we describe how we created a meta-classifier to detect the message-level sentiment of tweets. We participated in SemEval-2014 Task 9B by combining the results of several existing classifiers using a random forest. The results of 5 other teams from the competition as well as from 7 general-purpose commercial classifiers were used to train the algorithm. This way, we were able to get a boost of up to 3.24 F1 score points.
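
The idea of feeding the outputs of existing classifiers into a random forest can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the three base classifiers, the training rows and every function name are made up, and the forest is a deliberately small variant (bootstrap samples, a random subset of features per split, Gini impurity, majority vote).

```python
import random
from collections import Counter

# Each row holds the labels three hypothetical base classifiers gave one tweet.
# (outputs, gold label, number of copies); all values are invented.
PATTERNS = [
    (("positive", "positive", "positive"), "positive", 4),
    (("negative", "negative", "negative"), "negative", 4),
    (("positive", "negative", "positive"), "positive", 2),
    (("negative", "positive", "negative"), "negative", 2),
    (("positive", "positive", "negative"), "positive", 2),
    (("negative", "negative", "positive"), "negative", 2),
]
ROWS, GOLD = [], []
for outputs, gold, copies in PATTERNS:
    ROWS.extend([outputs] * copies)
    GOLD.extend([gold] * copies)


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_by(rows, labels, feature):
    """Group rows and labels by the value of one base classifier's output."""
    groups = {}
    for row, label in zip(rows, labels):
        group = groups.setdefault(row[feature], ([], []))
        group[0].append(row)
        group[1].append(label)
    return groups


def build_tree(rows, labels, rng, n_candidates, depth=0, max_depth=5):
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)  # leaf: a plain label

    def weighted_gini(feature):
        groups = split_by(rows, labels, feature)
        return sum(len(g[1]) * gini(g[1]) for g in groups.values()) / len(labels)

    # Only a random subset of features competes for each split.
    candidates = rng.sample(range(len(rows[0])), n_candidates)
    best = min(candidates, key=weighted_gini)
    groups = split_by(rows, labels, best)
    if len(groups) == 1:
        return majority(labels)
    children = {
        value: build_tree(g_rows, g_labels, rng, n_candidates, depth + 1, max_depth)
        for value, (g_rows, g_labels) in groups.items()
    }
    return (best, children, majority(labels))  # internal node


def predict_tree(node, row):
    while isinstance(node, tuple):
        feature, children, fallback = node
        node = children.get(row[feature], fallback)  # unseen value: use fallback
    return node


def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_candidates = max(1, round(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng, n_candidates))
    return forest


def predict_forest(forest, row):
    return majority([predict_tree(tree, row) for tree in forest])


forest = train_forest(ROWS, GOLD)
print(predict_forest(forest, ("positive", "positive", "positive")))
```

In practice the feature rows would come from the real systems being combined, and a library implementation of random forests would normally replace the hand-written trees.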

ZHAW digitalcollection

NILC_USP: an improved hybrid system for sentiment analysis in Twitter messages.

Author: Amzil Zouher
Cole Richard B.
Herrenknecht Christine
Hess Philipp
Mccarron Pearse
Sibat Manoella
Zendong Suzie Zita
Publication venue: Dublin
Publication date: 01/08/2014

This paper describes the NILC_USP system that participated in SemEval-2014 Task 9: Sentiment Analysis in Twitter, a re-run of the SemEval 2013 task under the same name. Our system is an improved version of the system that participated in the 2013 task. This system adopts a hybrid classification process that uses three classification approaches: rule-based, lexicon-based and machine learning. We suggest a pipeline architecture that extracts the best characteristics from each classifier. In this work, we want to verify how this hybrid approach would improve with better classifiers. The improved system achieved an F-score of 65.39% in the Twitter message-level subtask for the 2013 dataset (+9.08% of improvement) and 63.94% for the 2014 dataset.
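
One way to picture a hybrid pipeline of this kind is a cascade in which each stage answers only when it is confident and otherwise defers to the next. The sketch below is an assumption about how such a cascade could look, not the NILC_USP system itself: the ordering of the stages, the emoticon rules, the toy lexicon, the threshold and the stand-in for the trained model are all invented for illustration.

```python
import re

# Toy resources, invented for this sketch.
EMOTICON_RULES = {":)": "positive", ":(": "negative"}
LEXICON = {"love": 1.0, "great": 1.0, "hate": -1.0, "awful": -1.0}


def rule_based(text):
    """Return a label if an emoticon rule fires, otherwise None."""
    for emoticon, label in EMOTICON_RULES.items():
        if emoticon in text:
            return label
    return None


def lexicon_based(text, threshold=1.0):
    """Sum word polarities; answer only when the score clears the threshold."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(token, 0.0) for token in tokens)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return None


def machine_learning_stub(text):
    """Placeholder for a trained model; it always answers 'neutral' here."""
    return "neutral"


def classify(text):
    """Try each stage in order and return the first confident answer."""
    return rule_based(text) or lexicon_based(text) or machine_learning_stub(text)


for example in ("Great game :(", "I love this phone", "Meeting at noon"):
    print(example, "->", classify(example))
```

The cascade lets a precise but narrow stage decide the cases it covers, leaving the remaining tweets to the broader stages.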

NRC Publications Archive

Crossref

ArchiMer - Institutional Archive of Ifremer

NILC_USP: an improved hybrid system for sentiment analysis in Twitter messages.

Author: Avanço Lucas Vinicius
Balage Filho Pedro Paulo
Nunes Maria das Graças Volpe
Pardo Thiago Alexandre Salgueiro
Publication venue: Dublin
Publication date: 01/01/2014

Crossref

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

Do Convolutional Networks need to be Deep for Text Classification ?

Author: Cerisara Christophe
Denis Alexandre
Le Hoa T.
Publication date: 13/07/2017

We study in this work the importance of depth in convolutional models for text classification, when either character or word inputs are considered. We show on 5 standard text classification and sentiment analysis tasks that deep models indeed perform better than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms deep models such as DenseNet with word inputs. Our shallow word model further establishes new state-of-the-art performances on two datasets: Yelp Binary (95.9%) and Yelp Full (64.9%).

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Data Sets: Word Embeddings Learned from Tweets and General Data

Author: Li Quanzhi
Liu Xiaomo
Nourbakhsh Armineh
Shah Sameena
Publication date: 03/05/2017

A word embedding is a low-dimensional, dense and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy and have unique lexical and semantic features that are different from other types of text. Therefore, it is necessary to have word embeddings learned specifically from tweets. In this paper, we present ten word embedding data sets. In addition to the data sets learned from just tweet data, we also built embedding sets from the general data and the combination of tweets with the general data. The general data consist of news articles, Wikipedia data and other web data. These ten embedding models were learned from about 400 million tweets and 7 billion words from the general text. In this paper, we also present two experiments demonstrating how to use the data sets in some NLP tasks, such as tweet sentiment analysis and tweet topic classification tasks.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

LT3: sentiment classification in user-generated content using a rich feature set

Author: De Clercq Orphée
Hoste Veronique
Lefever Els
Van de Kauter Marjan
Van Hee Cynthia
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2014

Crossref

Ghent University Academic Bibliography