Text-based Sentiment Analysis and Music Emotion Recognition
Nowadays, with the expansion of social media, large amounts of user-generated
texts like tweets, blog posts or product reviews are shared online. Sentiment polarity
analysis of such texts has become highly attractive and is utilized in recommender
systems, market predictions, business intelligence and more. We also witness deep
learning techniques becoming top performers on those types of tasks. There are
however several problems that need to be solved for efficient use of deep neural
networks on text mining and text polarity analysis.
First of all, deep neural networks are data hungry. They need to be fed with
datasets that are big in size, cleaned and preprocessed as well as properly labeled.
Second, the modern natural language processing concept of word embeddings as a
dense and distributed text feature representation solves sparsity and dimensionality
problems of the traditional bag-of-words model. Still, there are various uncertainties
regarding the use of word vectors: should they be generated from the same dataset
that is used to train the model, or is it better to source them from big and popular
collections that work as generic text feature representations? Third, it is not easy for
practitioners to find a simple and highly effective deep learning setup for various
document lengths and types. Recurrent neural networks are weak with longer texts
and optimal convolution-pooling combinations are not easily conceived. It is thus
convenient to have generic neural network architectures that are effective and can
adapt to various texts, encapsulating much of design complexity.
This thesis addresses the above problems to provide methodological and practical
insights for utilizing neural networks on sentiment analysis of texts and achieving
state-of-the-art results. Regarding the first problem, the effectiveness of various
crowdsourcing alternatives is explored and two medium-sized and emotion-labeled
song datasets are created utilizing social tags. One of the research interests of Telecom
Italia was the exploration of relations between music emotional stimulation and
driving style. Consequently, a context-aware music recommender system that aims
to enhance driving comfort and safety was also designed. To address the second
problem, a series of experiments with large text collections of various contents and
domains were conducted. Word embeddings with different parameters were evaluated
and results revealed that their quality is influenced (mostly but not only) by the
size of texts they were created from. When working with small text datasets, it is
thus important to source word features from popular and generic word embedding
collections. Regarding the third problem, a series of experiments involving convolutional
and max-pooling neural layers were conducted. Various patterns relating
text properties and network parameters with optimal classification accuracy were
observed. Combining convolutions of words, bigrams, and trigrams with regional
max-pooling layers in a couple of stacks produced the best results. The derived
architecture achieves competitive performance on sentiment polarity analysis of
movie, business and product reviews.
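The convolution-pooling combination described above can be illustrated with a minimal sketch. It is not the thesis architecture: it uses a single stack, random untrained weights, and hypothetical names (`conv1d`, `regional_max_pool`, `ngram_features`), and it only shows how filters of width 1, 2 and 3 (words, bigrams, trigrams) slide over a sequence of word vectors before regional max-pooling keeps the strongest response in each region.

```python
import random


def conv1d(embedded, kernel):
    """Slide a (width x dim) kernel over a (length x dim) sequence, then apply ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(embedded) - width + 1):
        total = 0.0
        for offset in range(width):
            vector = embedded[start + offset]
            total += sum(x * w for x, w in zip(vector, kernel[offset]))
        outputs.append(max(0.0, total))
    return outputs


def regional_max_pool(values, region):
    """Take the maximum of each consecutive, non-overlapping region."""
    return [max(values[i:i + region]) for i in range(0, len(values), region)]


def ngram_features(embedded, kernels, region):
    """Concatenate pooled responses of word, bigram and trigram filters."""
    features = []
    for kernel in kernels:
        features.extend(regional_max_pool(conv1d(embedded, kernel), region))
    return features


# Illustrative input: 10 tokens with 4-dimensional random "embeddings" (assumption).
random.seed(0)
DIM, TOKENS = 4, 10
embedded = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(TOKENS)]

# One random filter per n-gram width; a trained model would learn many of each.
kernels = [
    [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(width)]
    for width in (1, 2, 3)
]

features = ngram_features(embedded, kernels, region=4)
```

Unlike global max-pooling, which keeps one value per filter, the regional variant keeps one value per region, so some positional information survives into the next layer.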
Given that labeled data are becoming the bottleneck of the current deep learning
systems, a future research direction could be the exploration of various data programming
possibilities for constructing even bigger labeled datasets. Investigation
of feature-level or decision-level ensemble techniques in the context of deep neural
networks could also be fruitful. Different feature types usually represent complementary
characteristics of data. Combining word embedding and traditional text
features or utilizing recurrent networks on document splits and then aggregating the
predictions could further increase prediction accuracy of such models.
Pushing the envelope of sentiment analysis beyond words and polarities
Idioms are multi-word expressions that hold both a literal and a figurative meaning that is conventionally understood by native speakers. Their overall meaning often cannot be deduced from the literal meaning of their constituent words. Sentiment analysis, also referred to as opinion mining, aims to automatically extract and classify sentiments, opinions, and emotions expressed in text. The research in this thesis is motivated by the fact that idioms, which often express an affective stance towards an entity or an event, are not featured systematically in sentiment analysis. To estimate the degree to which the inclusion of idioms as features may improve the results of traditional sentiment analysis, we compared our results to two state-of-the-art sentiment analysis approaches. Firstly, we collected a set of idioms that are relevant to sentiment analysis, i.e. those that can be mapped to an emotion. These mappings were obtained using a crowdsourcing approach. Secondly, to evaluate the results of sentiment analysis, we assembled a corpus of sentences in which idioms are used in context. Each sentence was annotated with an emotion, which formed the basis for the gold standard used for the comparison against the baseline methods. The classification performance was improved by almost 20 percentage points.
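As a rough illustration of what an idiom-based feature might look like, the sketch below counts idiom occurrences in a sentence and maps them to emotions. The idiom-to-emotion table and the function name `idiom_features` are hypothetical placeholders, not the crowdsourced mappings from the thesis, and the exact-phrase matching is a simplification that misses inflected forms such as "saw red".

```python
import re
from collections import Counter

# Hypothetical idiom-to-emotion mapping, for illustration only.
IDIOM_EMOTIONS = {
    "over the moon": "joy",
    "down in the dumps": "sadness",
    "see red": "anger",
}


def idiom_features(sentence, idiom_emotions=IDIOM_EMOTIONS):
    """Return a count of emotions signalled by idioms found in the sentence."""
    # Normalise to lowercase words separated by single spaces.
    text = " ".join(re.findall(r"[a-z']+", sentence.lower()))
    counts = Counter()
    for idiom, emotion in idiom_emotions.items():
        pattern = r"\b" + re.escape(idiom) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[emotion] += hits
    return counts
```

Such counts could be appended to the feature vector of a conventional sentiment classifier, so that a sentence like "She was over the moon" contributes a `joy` signal even though none of its individual words is strongly affective.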
Despite the positive findings from our initial experiments, the main limitation was the significant knowledge-engineering overhead involved in hand-crafting lexico-semantic resources used to support idiom-based features. To minimise the bottleneck associated with the acquisition of such resources, we scaled up our original approach by automating their engineering. Subsequently, these resources were used to replace the manually engineered counterparts of such features in the originally proposed method. The fully automated approach outperformed the two baseline methods by 7 and 9 percentage points. These improvements, however, were smaller than those achieved in the initial study. Nevertheless, we have demonstrated that idiom-based features can not only be engineered automatically but also improve sentiment classification results when such features are present.
Taking a long-term view of the research in this thesis, we want to address the limitations of state-of-the-art sentiment analysis approaches by focusing on a full range of emotions, rather than sentiment polarity. However, there is no consensus among researchers on a standardised framework for classifying emotions. Proposing such a framework would be a major contribution to the field of sentiment analysis, as it would stimulate its evolution into fully-fledged emotion classification and allow for systematic comparison of independent studies. With this goal in mind, we investigated the utility of different classification frameworks for sentiment analysis. A comprehensive statistical analysis of our experimental results provided explicit evidence that, in relative terms, six basic emotions are best suited for sentiment analysis. However, we identified the major shortcoming of oversimplifying positive emotions
Moment-to-moment mood change modelling in mobile mental health network
Human interests and behaviour change over time and are often affected by multiple factors. In particular, human emotions, mood and its constituent processes change and interact over time. Therefore, modelling human behaviour should take into account the changes over time for customization and adaptation of systems to the users’ specific needs. Understanding and assessing the temporal dynamics of mood are critical for modelling human behaviour for both individuals and groups of people who share similar habits, lifestyle and personal circumstances. Thus, in order to construct a personalized recommendation for a given user, it is first necessary to have some knowledge about previous user interests and behaviours. However, the challenge of obtaining large-scale data on human emotions has left the most fundamental questions on emotions less explored: How do emotions vary across individuals, evolve over time, and connect to social ties? We address these questions using a large-scale dataset that contains both users’ interactions with momentary emotions and topical labels. Using this dataset, we identify patterns of human emotions on different levels, starting from the network level, through the group (cluster) level, and moving towards the user level. At the user level, we identify how human emotions are distributed and vary over time. In particular, we model changes in mood using multi-level multimodal features including users’ sentimental status, engagement and linguistic queries. We also utilise language models to model and understand patterns of mood change. We model the changes of users’ mental states based on replies and responses to posts over time and predict future states. We find that the future mental states can be predicted with reasonable accuracy given users’ historical posts and current participation features.
Our findings are a step towards better understanding the interplay between user behaviour and mood change exhibited while interacting on a mental health network, and towards providing interpretable summaries that health experts and individuals can use in the future to work on possible medical interventions together with clinical experts.
Recent Developments in Smart Healthcare
Medicine is undergoing a sector-wide transformation thanks to the advances in computing and networking technologies. Healthcare is changing from reactive and hospital-centered to preventive and personalized, from disease-focused to well-being-centered. In essence, the healthcare systems, as well as fundamental medicine research, are becoming smarter. We anticipate significant improvements in areas ranging from molecular genomics and proteomics to decision support for healthcare professionals through big data analytics, to support behavior changes through technology-enabled self-management, and social and motivational support. Furthermore, with smart technologies, healthcare delivery could also be made more efficient, higher quality, and lower cost. In this special issue, we received a total of 45 submissions and accepted 19 outstanding papers that roughly span several interesting topics on smart healthcare, including public health, health information technology (Health IT), and smart medicine.
Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine
The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far.
Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web-pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; and statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the basis of a large training corpus. Early works aimed to classify entire documents as containing overall positive or negative polarity, or rating scores of reviews.
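The first of these categories, keyword spotting, can be sketched in a few lines. The affect lexicon and the function name `keyword_spotting` are hypothetical and chosen only for illustration; real lexicons are far larger. The sketch also makes the limitation visible, since it ignores negation and any word outside its lexicon.

```python
import re

# Hypothetical affect lexicon, for illustration only.
AFFECT_WORDS = {
    "happy": "positive", "great": "positive", "love": "positive",
    "sad": "negative", "terrible": "negative", "hate": "negative",
}


def keyword_spotting(text, lexicon=AFFECT_WORDS):
    """Label text by counting unambiguous affect words; ties are neutral."""
    votes = {"positive": 0, "negative": 0}
    for token in re.findall(r"[a-z]+", text.lower()):
        label = lexicon.get(token)
        if label is not None:
            votes[label] += 1
    if votes["positive"] == votes["negative"]:
        return "neutral"
    return max(votes, key=votes.get)
```

For example, `keyword_spotting("I do not love it")` still returns `"positive"`, because the method only sees the affect word "love" and not the negation around it.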
Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist’s overall positive or negative attitude was explicitly indicated. However, opinions and sentiments do not occur only at document level, nor are they limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level.
In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by these. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data.
The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance was compared both with results obtained using standard sentiment analysis techniques and with results obtained using different state-of-the-art knowledge bases such as Princeton’s WordNet, MIT’s ConceptNet and Microsoft’s Probase. Unlike most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by these. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as Social Web, HCI and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work will eventually pave the way for the development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience.