TASS 2015 – La evolución de los sistemas de análisis de opiniones para español
Sentiment analysis in microblogging remains a timely task that makes it possible to determine the polarity of the opinions published minute by minute on social media. TASS is a participatory workshop whose goal is to promote research on, and development of, new algorithms, resources and techniques for sentiment analysis in Spanish. This paper describes the fourth edition of TASS, summarizing the main contributions of the submitted systems, analyzing the results and showing how they have evolved. In addition to a brief analysis of the participating systems, a new corpus of labelled tweets in the political domain is presented, which was developed for the Sentiment Analysis at Aspect Level task.
This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), REDES project (TIN2015-65136-C2-1-R) and Ciudad2020 (INNPRONTA IPT-20111006) from the Spanish Government.
Multilingual sentiment analysis in social media.
252 p.
This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language a tool providing only analysis for Basque would not be enough for a real world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish.
The thesis addresses the following challenges to build such a system:
- Analysing methods for creating Sentiment lexicons, suitable for less resourced languages.
- Analysis of social media (specifically Twitter): Tweets pose several challenges in order to understand and extract opinions from such messages. Language identification and microtext normalization are addressed.
- Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well known social media benchmarks.
- Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations.
DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text
This paper describes the development of a multilingual, manually annotated
dataset for three under-resourced Dravidian languages generated from social
media comments. The dataset was annotated for sentiment analysis and offensive
language identification for a total of more than 60,000 YouTube comments. The
dataset consists of around 44,000 comments in Tamil-English, around 7,000
comments in Kannada-English, and around 20,000 comments in Malayalam-English.
The data was manually annotated by volunteer annotators and shows high
inter-annotator agreement as measured by Krippendorff's alpha. The dataset contains all
types of code-mixing phenomena since it comprises user-generated content from a
multilingual country. We also present baseline experiments to establish
benchmarks on the dataset using machine learning methods. The dataset is
available on GitHub
(https://github.com/bharathichezhiyan/DravidianCodeMix-Dataset) and Zenodo
(https://zenodo.org/record/4750858#.YJtw0SYo_0M).
Comment: 36 page
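The abstract above reports agreement in terms of Krippendorff's alpha. As a rough illustration of what that statistic measures, the following is a minimal sketch for nominal labels, computed from a coincidence matrix. The function name and the toy labels are assumptions made for this example; they are not taken from the dataset or from the authors' annotation tooling.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists: the labels that the annotators assigned
    to each item (missing annotations are simply left out).
    """
    # Coincidence matrix: every ordered pair of labels within an item,
    # weighted by 1 / (number of labels on that item - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label says nothing about agreement
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable labels.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more labels")

    # Observed and expected disagreement (the common 1/n factor cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label was ever used, so no disagreement is possible
    return 1.0 - observed / expected


# Hypothetical toy annotations: two annotators, four comments.
toy_units = [
    ["positive", "positive"],
    ["positive", "negative"],
    ["negative", "negative"],
    ["negative", "negative"],
]
alpha = krippendorff_alpha_nominal(toy_units)
print(round(alpha, 4))
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance; the sketch handles only nominal categories, not ordinal or interval distance metrics.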
Discovering a tourism destination with social media data: BERT-based sentiment analysis
Purpose – The main purpose of this paper is to analyze a tourist destination using sentiment analysis
techniques with data from Twitter and Instagram to find the most representative entities (or places) and
perceptions (or aspects) of the users.
Design/methodology/approach – The authors used 90,725 Instagram posts and 235,755 Twitter tweets
to analyze tourism in Granada (Spain) to identify the important places and perceptions mentioned by travelers
on both social media sites. The authors used several approaches for sentiment classification for English and
Spanish texts, including deep learning models.
Findings – The best results on a test set were obtained using a bidirectional encoder representations
from transformers (BERT) model for Spanish texts and Tweeteval for English texts, and these were
subsequently used to analyze the data sets. It was then possible to identify the most important
entities and aspects, and this, in turn, provided interesting insights for researchers, practitioners,
travelers and tourism managers so that services could be improved and better marketing strategies
formulated.
Research limitations/implications – The authors propose a Spanish-Tourism-BERT model for
performing sentiment classification together with a process to find places through hashtags and to reveal the
important negative aspects of each place.
Practical implications – The study enables managers and practitioners to apply the Spanish-BERT
model, together with the Spanish Tourism data set that the authors released, in applications that find both
positive and negative perceptions.
Originality/value – This study presents a novel approach to applying sentiment analysis in
the tourism domain. First, a way to evaluate the different existing models and tools is presented;
second, a model is trained using BERT (a deep learning model); third, an approach for identifying
the acceptance of the places of a destination through hashtags is presented; and, finally,
the reasons why users express positivity or negativity are evaluated through the identification of entities and
aspects.
Spanish Ministerio de Ciencia e Innovacion, Agencia Estatal de Investigacion PID2019-106758GB-C31
European Commissio
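The hashtag-based step mentioned in the abstract above can be pictured with a small sketch. It assumes that each post already carries a sentiment label from some classifier, pulls hashtags out with a regular expression, and reports the share of negative posts per hashtag. The function name, the label strings and the example posts are all hypothetical; this is not the authors' pipeline.

```python
import re
from collections import Counter, defaultdict

# Matches a '#' followed by word characters (Unicode-aware for str patterns).
HASHTAG = re.compile(r"#(\w+)")


def negative_share_by_hashtag(posts):
    """Return {hashtag: fraction of posts labelled 'negative'}.

    `posts` is an iterable of (text, label) pairs, where label is assumed
    to be one of 'positive', 'neutral' or 'negative'.
    """
    counts = defaultdict(Counter)
    for text, label in posts:
        # Lower-case and de-duplicate so each post counts once per hashtag.
        for tag in {t.lower() for t in HASHTAG.findall(text)}:
            counts[tag][label] += 1
    return {
        tag: label_counts["negative"] / sum(label_counts.values())
        for tag, label_counts in counts.items()
    }


# Hypothetical labelled posts, invented for illustration only.
example_posts = [
    ("Loved the #Alhambra at sunset", "positive"),
    ("Endless queue at the #alhambra entrance", "negative"),
    ("#Albaicin streets are lovely, great view of the #Alhambra", "positive"),
]
shares = negative_share_by_hashtag(example_posts)
for tag in sorted(shares):
    print(tag, round(shares[tag], 2))
```

Treating a hashtag as a proxy for a place is itself an assumption; in practice the hashtags would need to be mapped to known places before the shares are interpreted.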
Fine-grained Subjectivity and Sentiment Analysis: Recognizing the intensity, polarity, and attitudes of private states
Private states (mental and emotional states) are part of the information that is conveyed in many forms of discourse. News articles often report emotional responses to news stories; editorials, reviews, and weblogs convey opinions and beliefs. This dissertation investigates the manual and automatic identification of linguistic expressions of private states in a corpus of news documents from the world press. A term for the linguistic expression of private states is subjectivity.
The conceptual representation of private states used in this dissertation is that of Wiebe et al. (2005). As part of this research, annotators are trained to identify expressions of private states and their properties, such as the source and the intensity of the private state. This dissertation then extends the conceptual representation of private states to better model the attitudes and targets of private states. The inter-annotator agreement studies conducted for this dissertation show that the various concepts in the original and extended representation of private states can be reliably annotated.
Exploring the automatic recognition of various types of private states is also a large part of this dissertation. Experiments are conducted that focus on three types of fine-grained subjectivity analysis: recognizing the intensity of clauses and sentences, recognizing the contextual polarity of words and phrases, and recognizing the attribution levels where sentiment and arguing attitudes are expressed. Various supervised machine learning algorithms are used to train automatic systems to perform each of these tasks. These experiments result in automatic systems for performing fine-grained subjectivity analysis that significantly outperform baseline systems.