392 research outputs found
Research, development and evaluation of a practical model for sentiment analysis
Sentiment Analysis is the task of extracting subjective information from input sources
produced by a speaker or writer. It usually refers to identifying whether a text has
positive or negative polarity. The main approaches to Sentiment Analysis are
lexicon-based (dictionary-based) methods and machine learning schemes. Lexicon-based models
use a predefined set of words, each of which has an
associated polarity. Document polarity depends on the feature selection method and on how
the word scores are combined. Machine learning approaches usually rely on supervised classifiers.
Although classifiers adapt well to specific contexts, they need to be trained on large
amounts of labelled data, which may not be available, especially for emerging topics.
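The lexicon-based idea described above can be illustrated with a minimal sketch. The toy lexicon, its scores, and the averaging rule are assumptions made for illustration only; the abstract does not specify which lexicon or combination rule the project uses.

```python
import re

# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
LEXICON = {
    "good": 1.0, "great": 1.0, "happy": 0.8,
    "bad": -1.0, "awful": -1.0, "sad": -0.8,
}

def document_polarity(text, lexicon=LEXICON):
    """Average the polarity scores of the lexicon words found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return 0.0  # no lexicon word found: treat the text as neutral
    return sum(scores) / len(scores)

def label(text):
    """Map the averaged score to a coarse polarity label."""
    score = document_polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Averaging is only one possible way to combine word scores; summing, or weighting by part of speech, would change how the document polarity is derived.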
This project, unlike most research in this field, aims to go further in
emotion detection and focuses on identifying the actual sentiment of documents
rather than on whether they have a positive or negative connotation. The set of
sentiments used in this approach has been taken from Plutchik's wheel of emotions,
which defines eight basic bipolar sentiments and another eight advanced emotions, each composed
of two basic ones. Moreover, in this project we have created a new scheme for Sentiment Analysis (SA) that combines
a lexicon-based model for obtaining term emotions with a statistical approach for identifying the
most relevant topics in the document, which are the targets of the sentiments. With this
approach we have tried to overcome the disadvantages of simple bag-of-words models, which
make no distinction between parts of speech (POS) and commonly weight all words
using the tf-idf scheme, which overweights the most frequently used words. Furthermore,
this project presents a heuristic learning method that
improves the initial knowledge by converging towards human-like sensitivity.
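As a rough illustration of how term emotions might be combined into one of Plutchik's advanced emotions, the sketch below counts the basic emotions attached to the words of a document and, when the two most frequent ones form a primary dyad, returns the dyad. The term lexicon and the combination rule are hypothetical; the abstract does not describe the project's actual rule.

```python
from collections import Counter

# Plutchik's eight primary dyads: pairs of adjacent basic emotions on the wheel.
DYADS = {
    frozenset(("joy", "trust")): "love",
    frozenset(("trust", "fear")): "submission",
    frozenset(("fear", "surprise")): "awe",
    frozenset(("surprise", "sadness")): "disapproval",
    frozenset(("sadness", "disgust")): "remorse",
    frozenset(("disgust", "anger")): "contempt",
    frozenset(("anger", "anticipation")): "aggressiveness",
    frozenset(("anticipation", "joy")): "optimism",
}

# Hypothetical term lexicon: word -> basic emotion.
TERM_EMOTIONS = {
    "party": "joy", "friend": "trust", "storm": "fear", "gift": "surprise",
    "funeral": "sadness", "rotten": "disgust", "enemy": "anger",
    "tomorrow": "anticipation",
}

def detect_emotion(tokens):
    """Return a dyad if the two most frequent basic emotions form one,
    otherwise the most frequent basic emotion, or None if nothing matches."""
    counts = Counter(TERM_EMOTIONS[t] for t in tokens if t in TERM_EMOTIONS)
    if not counts:
        return None
    top = [emotion for emotion, _ in counts.most_common(2)]
    if len(top) == 2 and frozenset(top) in DYADS:
        return DYADS[frozenset(top)]
    return top[0]
```

For example, a description containing "party" and "friend" maps to joy and trust, which combine into love under this rule.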
To test the performance of the proposed scheme, an Android application for mobile devices
has been developed. The app lets users take photos and enter descriptions, which
are processed and classified with emotions. The user may correct the classification
so that system performance statistics can be extracted.
Ingeniería Informática
FINE-GRAINED EMOTION DETECTION IN MICROBLOG TEXT
Automatic emotion detection in text is concerned with using natural language processing techniques to recognize emotions expressed in written discourse. Endowing computers with the ability to recognize emotions in a particular kind of text, microblogs, has important applications in sentiment analysis and affective computing. In order to build computational models that can recognize the emotions represented in tweets we need to identify a set of suitable emotion categories. Prior work has mainly focused on building computational models for only a small set of six basic emotions (happiness, sadness, fear, anger, disgust, and surprise). This thesis describes a taxonomy of 28 emotion categories, an expansion of these six basic emotions, developed inductively from data. This set of 28 emotion categories represents a set of fine-grained emotion categories that are representative of the range of emotions expressed in tweets, microblog posts on Twitter.
The ability of humans to recognize these fine-grained emotion categories is characterized using inter-annotator reliability measures based on annotations provided by expert and novice annotators. A set of 15,553 human-annotated tweets forms a gold standard corpus, EmoTweet-28. For each emotion category, we have extracted a set of linguistic cues (i.e., punctuation marks, emoticons, emojis, abbreviated forms, interjections, lemmas, hashtags and collocations) that can serve as salient indicators for that emotion category.
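The abstract does not say which inter-annotator reliability measures were used. As one common example, the sketch below computes Cohen's kappa for two annotators who each assign a single emotion label per tweet; the choice of measure and the function name are assumptions.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently
    # according to their own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance. With many categories and multiple labels per tweet, other measures may be more appropriate.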
We evaluated the performance of automatic classification techniques on the set of 28 emotion categories through a series of experiments using several classifier and feature combinations. Our results show that it is feasible to extend machine learning classification to fine-grained emotion detection in tweets (i.e., as many as 28 emotion categories) with results that are comparable to state-of-the-art classifiers that detect six to eight basic emotions in text. Classifiers using features extracted from the linguistic cues associated with each category equal or better the performance of conventional corpus-based and lexicon-based features for fine-grained emotion classification.
This thesis makes an important theoretical contribution in the development of a taxonomy of emotion in text. In addition, this research also makes several practical contributions, particularly in the creation of language resources (i.e., corpus and lexicon) and machine learning models for fine-grained emotion detection in text.
Fine-grained Subjectivity and Sentiment Analysis: Recognizing the intensity, polarity, and attitudes of private states
Private states (mental and emotional states) are part of the information that is conveyed in many forms of discourse. News articles often report emotional responses to news stories; editorials, reviews, and weblogs convey opinions and beliefs. This dissertation investigates the manual and automatic identification of linguistic expressions of private states in a corpus of news documents from the world press. A term for the linguistic expression of private states is subjectivity.
The conceptual representation of private states used in this dissertation is that of Wiebe et al. (2005). As part of this research, annotators are trained to identify expressions of private states and their properties, such as the source and the intensity of the private state. This dissertation then extends the conceptual representation of private states to better model the attitudes and targets of private states. The inter-annotator agreement studies conducted for this dissertation show that the various concepts in the original and extended representation of private states can be reliably annotated.
Exploring the automatic recognition of various types of private states is also a large part of this dissertation. Experiments are conducted that focus on three types of fine-grained subjectivity analysis: recognizing the intensity of clauses and sentences, recognizing the contextual polarity of words and phrases, and recognizing the attribution levels where sentiment and arguing attitudes are expressed. Various supervised machine learning algorithms are used to train automatic systems to perform each of these tasks. These experiments result in automatic systems for performing fine-grained subjectivity analysis that significantly outperform baseline systems.
Automated Semantic Understanding of Human Emotions in Writing and Speech
Affective Human Computer Interaction (A-HCI) will be critical for the success of new technologies that will be prevalent in the 21st century. If cell phones and the internet are any indication, there will be continued rapid development of automated assistive systems that help humans to live better, more productive lives. These will not be just passive systems such as cell phones, but active assistive systems like robot aides in use in hospitals, homes, entertainment rooms, offices, and other work environments. Such systems will need to be able to properly deduce human emotional state before they determine how to best interact with people. This dissertation explores and extends the body of knowledge related to Affective HCI. New semantic methodologies are developed and studied for reliable and accurate detection of human emotional states and magnitudes in written and spoken speech, and for mapping emotional states and magnitudes to 3-D facial expression outputs. The automatic detection of affect in language is based on natural language processing and machine learning approaches. Two affect corpora were developed to perform this analysis. Emotion classification is performed at the sentence level using a step-wise approach which incorporates sentiment flow and sentiment composition features. For emotion magnitude estimation, a regression model was developed to predict the evolving emotional magnitude of actors. Emotional magnitudes at any point during a story or conversation are determined by 1) the previous emotional state magnitude; 2) new text and speech inputs that might act upon that state; and 3) information about the context the actors are in. Acoustic features are also used to capture additional information from the speech signal. Evaluation of the automatic understanding of affect is performed by testing the model on a testing subset of the newly extended corpus.
To visualize actor emotions as perceived by the system, a methodology was also developed to map predicted emotion class magnitudes to 3-D facial parameters using vertex-level mesh morphing. The developed sentence-level emotion state detection approach achieved classification accuracies as high as 71% for the neutral vs. emotion classification task in a test corpus of children’s stories. After class re-sampling, the results of the step-wise classification methodology on a test subset of a medical drama corpus achieved accuracies in the 56% to 84% range for each emotion class and polarity. For emotion magnitude prediction, the developed recurrent (prior-state feedback) regression model using both text-based and acoustic-based features achieved correlation coefficients in the range of 0.69 to 0.80. This prediction function was modeled using a non-linear approach based on Support Vector Regression (SVR) and performed better than other approaches based on Linear Regression or Artificial Neural Networks.
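The prior-state feedback idea can be sketched as a loop in which each prediction is fed back as an input to the next step. In the sketch below, `toy_regressor` is a hypothetical stand-in with invented weights, not the dissertation's trained SVR model, and the two input features (a text score and an acoustic score) are assumptions.

```python
def predict_magnitudes(feature_seq, regressor, initial=0.0):
    """Predict an emotion magnitude per step, feeding each prediction
    back in as the prior-state input of the following step."""
    magnitudes = []
    previous = initial
    for features in feature_seq:
        previous = regressor([previous] + list(features))
        magnitudes.append(previous)
    return magnitudes

def toy_regressor(inputs):
    """Hypothetical stand-in for a trained regressor (e.g. an SVR):
    mixes the prior state with a text score and an acoustic score,
    then clamps the result to the range [0, 1]."""
    previous, text_score, acoustic_score = inputs
    value = 0.5 * previous + 0.3 * text_score + 0.2 * acoustic_score
    return max(0.0, min(1.0, value))
```

With this stand-in, a strongly emotional sentence followed by a neutral one yields a magnitude that decays gradually instead of dropping straight to zero, which is the behaviour the feedback term is meant to capture.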
Investigating and extending the methods in automated opinion analysis through improvements in phrase based analysis
Opinion analysis is an area of research that deals with the computational treatment of opinion statements and subjectivity in textual data. It has emerged over the past couple of decades as an active area of research, as it provides solutions to the issues raised by information overload. The problem of information overload emerged with advances in communication technologies, which gave rise to exponential growth in user-generated subjective data available online. Opinion analysis has a rich set of applications that create opportunities for organisations, ranging from tracking user opinions about products and social issues in communities through to engagement in political participation.
The field has been highly active in recent years, and research at different levels of granularity has been, and is being, undertaken. However, there are limitations in the state of the art, especially as dealing with each level of granularity on its own does not solve current research issues. Therefore a novel sentence-level opinion analysis approach utilising clause- and phrase-level analysis is proposed. This approach uses linguistic and syntactic analysis of sentences to understand the interdependence of words within sentences, and further uses rule-based analysis at the phrase level to calculate the opinion at each hierarchical structure of a sentence. The proposed opinion analysis approach requires lexical and contextual resources for implementation. In the context of this thesis the approach is further presented as part of an extended unifying framework for opinion analysis, resulting in the design and construction of a novel corpus. The above contributions to the field (approach, framework and corpus) are evaluated within the thesis and are found to improve on existing limitations in the field, particularly with regard to opinion analysis automation.
Further work is required in integrating a mechanism for greater word sense disambiguation and in lexical resource development.
Sentiment Analysis of Textual Content in Social Networks. From Hand-Crafted to Deep Learning-Based Models
This thesis proposes several advanced methods to automatically analyse textual content shared on social networks and identify people's opinions, emotions and feelings at different levels of analysis and in different languages.
We start by proposing a sentiment analysis system, called SentiRich, based on a set of rich features, including the information extracted from sentiment lexicons and pre-trained word embedding models. Then, we propose an ensemble system based on Convolutional Neural Networks and XGBoost regressors to solve an array of sentiment and emotion analysis tasks on Twitter. These tasks range from typical sentiment analysis tasks to automatically determining the intensity of an emotion (such as joy, fear, anger, etc.) and the intensity of sentiment (aka valence) of the authors from their tweets. We also propose a novel Deep Learning-based system to address the multiple emotion classification problem on Twitter. Moreover, we considered the problem of target-dependent sentiment analysis. For this purpose, we propose a Deep Learning-based system that identifies and extracts the target of the tweets.
While some languages, such as English, have a vast array of resources to enable sentiment analysis, most low-resource languages lack them. We therefore use the Cross-lingual Sentiment Analysis technique to develop a novel, multilingual and Deep Learning-based system for low-resource languages. We propose to combine Multi-Criteria Decision Aid and sentiment analysis to develop a system that gives users the ability to exploit reviews alongside their preferences in the process of ranking alternatives. Finally, we applied the developed systems to the field of communication of destination brands through social networks. To this end, we collected tweets from local people, visitors, and official destination brand offices from different tourist destinations and analysed the opinions and the emotions shared in these tweets. Overall, the methods proposed in this thesis improve on the performance of state-of-the-art approaches and yield interesting findings.
Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis
The most popular sentiment analysis task in Twitter is the automatic classification of tweets into sentiment categories such as positive, negative, and neutral. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. These models are affected by label sparsity, because the manual annotation of tweets is labour-intensive and time-consuming.
This thesis addresses the label sparsity problem for Twitter polarity classification by automatically building two types of resources that can be exploited when labelled data is scarce: opinion lexicons, which are lists of words labelled by sentiment, and synthetically labelled tweets.
In the first part of the thesis, we induce Twitter-specific opinion lexicons by training word-level classifiers using representations that exploit different sources of information: (a) the morphological information conveyed by part-of-speech (POS) tags, (b) associations between words and the sentiment expressed in the tweets that contain them, and (c) distributional representations calculated from unlabelled tweets. Experimental results show that the induced lexicons produce significant improvements over existing manually annotated lexicons for tweet-level polarity classification.
In the second part of the thesis, we develop distant supervision methods for generating synthetic training data for Twitter polarity classification by exploiting unlabelled tweets and prior lexical knowledge. Positive and negative training instances are generated by averaging unlabelled tweets annotated according to a given polarity lexicon. We study different mechanisms for selecting the candidate tweets to be averaged. Our experimental results show that the training data generated by the proposed models produce classifiers that perform significantly better than classifiers trained from tweets annotated with emoticons, a popular distant supervision approach for Twitter sentiment analysis.
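The averaging step described above can be sketched roughly as follows: unlabelled tweets are tagged with a polarity lexicon, and random groups of same-polarity tweets are averaged into synthetic training instances. The toy lexicon, the bag-of-words representation, and the uniform random selection of candidates are all assumptions; the thesis studies several selection mechanisms that the abstract does not detail.

```python
import random
from collections import Counter

# Hypothetical polarity lexicon: word -> +1 (positive) or -1 (negative).
POLARITY_LEXICON = {"love": 1, "great": 1, "happy": 1,
                    "hate": -1, "awful": -1, "sad": -1}

def lexicon_label(tokens):
    """Label a tweet by the sign of its summed lexicon scores (None if zero)."""
    score = sum(POLARITY_LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None

def average_vectors(vectors):
    """Average a list of bag-of-words count vectors term by term."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: count / len(vectors) for term, count in total.items()}

def synthesize(tweets, group_size=2, n_instances=3, seed=0):
    """Build synthetic (vector, label) pairs by averaging random groups
    of lexicon-labelled tweets that share the same polarity."""
    rng = random.Random(seed)
    pools = {"positive": [], "negative": []}
    for tokens in tweets:
        polarity = lexicon_label(tokens)
        if polarity is not None:
            pools[polarity].append(Counter(tokens))
    data = []
    for polarity, pool in pools.items():
        if len(pool) < group_size:
            continue  # not enough tweets of this polarity to form a group
        for _ in range(n_instances):
            group = rng.sample(pool, group_size)
            data.append((average_vectors(group), polarity))
    return data
```

The resulting pairs could then be passed to any supervised classifier in place of manually annotated tweets.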