Affective Music Information Retrieval
Much of the appeal of music lies in its power to convey emotions/moods and to
evoke them in listeners. In consequence, the past decade witnessed a growing
interest in modeling emotions from musical signals in the music information
retrieval (MIR) community. In this article, we present a novel generative
approach to music emotion modeling, with a specific focus on the
valence-arousal (VA) dimension model of emotion. The presented generative
model, called acoustic emotion Gaussians (AEG), better accounts for the
subjectivity of emotion perception by the use of probability distributions.
Specifically, it learns from the emotion annotations of multiple subjects a
Gaussian mixture model in the VA space with prior constraints on the
corresponding acoustic features of the training music pieces. Such a
computational framework is technically sound, capable of learning in an online
fashion, and thus applicable to a variety of applications, including
user-independent (general) and user-dependent (personalized) emotion
recognition and emotion-based music retrieval. We report evaluations of the
aforementioned applications of AEG on a larger-scale emotion-annotated corpus,
AMG1608, to demonstrate the effectiveness of AEG and to showcase how
evaluations are conducted for research on emotion-based MIR. Directions of
future work are also discussed. Comment: 40 pages, 18 figures, 5 tables, author versio
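To make the idea of describing perceived emotion with probability distributions in the VA space concrete, the following is a minimal sketch, not the AEG implementation. It assumes a clip's emotion is already given as a small Gaussian mixture over (valence, arousal) and summarises that mixture by a single moment-matched Gaussian. The function name `collapse_mixture` and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def collapse_mixture(weights, means, covs):
    """Moment-match a Gaussian mixture in VA space to a single Gaussian.

    weights: (K,) non-negative mixture weights (normalised here)
    means:   (K, 2) component means as (valence, arousal)
    covs:    (K, 2, 2) component covariance matrices
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)

    # Overall mean is the weighted average of the component means.
    mean = weights @ means
    diffs = means - mean
    # Total covariance = weighted within-component covariance
    #                  + weighted spread of the component means.
    within = np.einsum("k,kij->ij", weights, covs)
    between = np.einsum("k,ki,kj->ij", weights, diffs, diffs)
    return mean, within + between

# Hypothetical mixture: two equally weighted components that agree on
# arousal but disagree on valence (illustrative numbers, not from AMG1608).
weights = [0.5, 0.5]
means = [[0.5, 0.5], [-0.5, 0.5]]
covs = [np.eye(2) * 0.04, np.eye(2) * 0.04]

mean, cov = collapse_mixture(weights, means, covs)
print(mean)
print(cov)
```

In this toy case the disagreement between components shows up as a larger variance along the valence axis, which is the kind of listener subjectivity a distribution can express and a single point estimate cannot.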
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we identified and analyzed gaps within the European research effort during our second year.
In this period we focused on three directions: technological issues, user-centred issues and use cases, and
socio-economic and legal aspects. These were assessed through two central studies: first, a concerted vision of the
functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a
related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and
consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several
meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project
coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of
gaps: core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical
research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as
emerging legal challenges.
User-Centered Categorization of Mood in Fiction
Readers articulate mood in deeply subjective ways, yet the underlying structure of users’ understanding of the media they consume has important implications for retrieval and access. User articulations might at first seem too idiosyncratic, but organizing them meaningfully has considerable potential to provide a better searching experience for all involved. The current study develops mood categories inductively for fiction organization and retrieval in information systems.
We developed and distributed an open-ended survey to 76 fiction readers to understand their preferences with regard to the affective elements in fiction. From the fiction reader responses, the research team identified 161 mood terms and used them for further categorization.
Our inductive approach resulted in 30 categories, including angry, cozy, dark, and nostalgic. Results include three overlapping mood families: Emotion, Tone/Narrative, and Atmosphere/Setting, which in turn relate to structures that connect reader-generated data with conceptual frameworks in previous studies.
The inherent complexity of “mood” should not dissuade us from carefully investigating users’ preferences in this regard. Adding to the existing efforts of classifying moods conducted by experts, the current study presents mood terms provided by actual end-users when describing different moods in fiction. This study offers a useful roadmap for creating taxonomies for retrieval and description, as well as structures derived from user-provided terms that ultimately have the potential to improve user experience.
Text-based Sentiment Analysis and Music Emotion Recognition
Nowadays, with the expansion of social media, large amounts of user-generated
texts like tweets, blog posts or product reviews are shared online. Sentiment polarity
analysis of such texts has become highly attractive and is utilized in recommender
systems, market predictions, business intelligence and more. We also witness deep
learning techniques becoming top performers on those types of tasks. There are
however several problems that need to be solved for efficient use of deep neural
networks on text mining and text polarity analysis.
First of all, deep neural networks are data hungry. They need to be fed with
datasets that are big in size, cleaned and preprocessed as well as properly labeled.
Second, the modern natural language processing concept of word embeddings as a
dense and distributed text feature representation solves sparsity and dimensionality
problems of the traditional bag-of-words model. Still, there are various uncertainties
regarding the use of word vectors: should they be generated from the same dataset
that is used to train the model, or is it better to source them from big and popular
collections that work as generic text feature representations? Third, it is not easy for
practitioners to find a simple and highly effective deep learning setup for various
document lengths and types. Recurrent neural networks are weak with longer texts
and optimal convolution-pooling combinations are not easily conceived. It is thus
convenient to have generic neural network architectures that are effective and can
adapt to various texts, encapsulating much of design complexity.
This thesis addresses the above problems to provide methodological and practical
insights for utilizing neural networks on sentiment analysis of texts and achieving
state of the art results. Regarding the first problem, the effectiveness of various
crowdsourcing alternatives is explored and two medium-sized and emotion-labeled
song datasets are created utilizing social tags. One of the research interests of Telecom
Italia was the exploration of relations between music emotional stimulation and
driving style. Consequently, a context-aware music recommender system that aims
to enhance driving comfort and safety was also designed. To address the second
problem, a series of experiments with large text collections of various contents and
domains were conducted. Word embeddings of different parameters were exercised
and results revealed that their quality is influenced (mostly but not only) by the
size of texts they were created from. When working with small text datasets, it is
thus important to source word features from popular and generic word embedding
collections. Regarding the third problem, a series of experiments involving convolutional
and max-pooling neural layers were conducted. Various patterns relating
text properties and network parameters with optimal classification accuracy were
observed. Combining convolutions of words, bigrams, and trigrams with regional
max-pooling layers in a couple of stacks produced the best results. The derived
architecture achieves competitive performance on sentiment polarity analysis of
movie, business and product reviews.
Given that labeled data are becoming the bottleneck of the current deep learning
systems, a future research direction could be the exploration of various data programming
possibilities for constructing even bigger labeled datasets. Investigation
of feature-level or decision-level ensemble techniques in the context of deep neural
networks could also be fruitful. Different feature types do usually represent complementary
characteristics of data. Combining word embedding and traditional text
features or utilizing recurrent networks on document splits and then aggregating the
predictions could further increase the prediction accuracy of such models.
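The combination of word, bigram, and trigram convolutions with regional max-pooling described in the abstract can be sketched roughly as follows. This is a NumPy-only illustration of the general technique, not the thesis architecture: the layer sizes, the ReLU activation, the non-overlapping pooling regions, and the random (untrained) weights are all assumptions, and `ngram_conv` and `regional_max_pool` are hypothetical helper names.

```python
import numpy as np

def ngram_conv(embeddings, kernel):
    """Valid 1-D convolution over word positions, followed by ReLU.

    embeddings: (T, D) matrix of T word vectors of dimension D
    kernel:     (n, D, F) weights for F filters spanning n consecutive words
    returns:    (T - n + 1, F) feature map
    """
    n = kernel.shape[0]
    steps = embeddings.shape[0] - n + 1
    windows = np.stack([embeddings[i:i + n] for i in range(steps)])
    return np.maximum(np.einsum("tnd,ndf->tf", windows, kernel), 0.0)

def regional_max_pool(features, region):
    """Max over non-overlapping regions of `region` positions.

    Positions that do not fill a complete trailing region are dropped.
    """
    steps = features.shape[0] // region
    trimmed = features[: steps * region]
    return trimmed.reshape(steps, region, -1).max(axis=1)

rng = np.random.default_rng(0)
T, D, F = 12, 8, 4                      # assumed toy sizes
doc = rng.normal(size=(T, D))           # stand-in for pretrained word embeddings

# One convolution branch per n-gram width: words, bigrams, trigrams.
kernels = {n: rng.normal(size=(n, D, F)) for n in (1, 2, 3)}
pooled = {n: regional_max_pool(ngram_conv(doc, k), region=3)
          for n, k in kernels.items()}

# Concatenate all branches into one document feature vector.
features = np.concatenate([p.ravel() for p in pooled.values()])
print({n: p.shape for n, p in pooled.items()}, features.shape)
```

Unlike global max-pooling, pooling over regions keeps some coarse positional information. In a trained model the concatenated vector would feed further stacks or a classifier.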
PERSONALIZED INDEXING OF MUSIC BY EMOTIONS
How a person interprets music and what prompts a person to feel certain emotions are two very subjective things. This dissertation presents a method by which a system can learn and track a user’s listening habits in order to recommend songs that fit the user’s specific way of interpreting music and emotions. First, a literature review gives an overview of the current state of recommender systems and describes classifiers. Next, the process of collecting user data is discussed, followed by the process of training and testing personalized classifiers. Finally, a system is presented that combines the personalized classifiers with clustered data into a hierarchy of recommender systems.
Content Recommendation Through Linked Data
Nowadays, people can easily obtain a huge amount of information from the Web, but often they have no criteria to discern it. This issue is known as information overload. Recommender systems are software tools to suggest interesting items to users and can help them to deal with a vast amount of information. Linked Data is a set of best practices to publish data on the Web, and it is the basis of the Web of Data, an interconnected global dataspace.
This thesis discusses how to discover information useful for the user from the vast amount of structured data, and notably Linked Data available on the Web. The work addresses this issue by considering three research questions: how to exploit existing relationships between resources published on the Web to provide recommendations to users; how to represent the user and his context to generate better recommendations for the current situation; and how to effectively visualize the recommended resources and their relationships.
To address the first question, the thesis proposes a new algorithm based on Linked Data which exploits existing relationships between resources to recommend related resources. The algorithm was integrated into a framework to deploy and evaluate Linked Data based recommendation algorithms. In fact, a related problem is how to compare them and how to evaluate their performance when applied to a given dataset. The user evaluation showed that our algorithm improves the rate of new recommendations, while maintaining a satisfying prediction accuracy. To represent the user and their context, this thesis presents the Recommender System Context ontology, which is exploited in a new context-aware approach that can be used with existing recommendation algorithms. The evaluation showed that this method can significantly improve the prediction accuracy. As regards the problem of effectively visualizing the recommended resources and their relationships, this thesis proposes a visualization framework for DBpedia (the Linked Data version of Wikipedia) and mobile devices, which is designed to be extended to other datasets.
In summary, this thesis shows how it is possible to exploit structured data available on the Web to recommend useful resources to users. Linked Data were successfully exploited in recommender systems. Various proposed approaches were implemented and applied to use cases of Telecom Italia.
Moody Music Generator: Characterising Control Parameters Using Crowdsourcing.
We characterise the expressive effects of a music generator capable of varying its moods through two control parameters. The two control parameters were constructed on the basis of existing work on valence and arousal in music, and intended to provide control over those two mood factors. In this paper we conduct a listener study to determine how people actually perceive the various moods the generator can produce. Rather than directly attempting to validate that our two control parameters represent arousal and valence, we conduct an open-ended study to crowd-source labels characterising different parts of this two-dimensional control space. Our aim is to characterise perception of the generator’s expressive space, without constraining listeners’ responses to labels specifically aimed at validating the original arousal/valence motivation. Subjects were asked to listen to clips of generated music over the Internet, and to describe the moods with free-text labels. We find that the arousal parameter does roughly map to perceived arousal, but that the nominal “valence” parameter has strong interaction with the arousal parameter, and produces different effects in different parts of the control space. We believe that the characterisation methodology described here is general and could be used to map the expressive range of other parameterisable generators.
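One simple way to characterise parts of a two-dimensional control space with free-text labels is to bin the responses into grid cells and report the most frequent label per cell. The sketch below illustrates that idea only; it is not the analysis used in the paper. It assumes both control parameters are scaled to [0, 1], the function name `label_regions` is hypothetical, and the responses are made-up toy data.

```python
from collections import Counter, defaultdict

def label_regions(responses, bins=2):
    """Return the most common label for each occupied cell of a bins x bins grid.

    responses: iterable of (param_a, param_b, label), with both control
               parameters assumed to lie in [0, 1]
    """
    cells = defaultdict(Counter)
    for a, b, label in responses:
        # Clamp so that a parameter value of exactly 1.0 falls in the last cell.
        col = min(int(a * bins), bins - 1)
        row = min(int(b * bins), bins - 1)
        # Normalise case and whitespace so trivial variants are counted together.
        cells[(col, row)][label.strip().lower()] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in cells.items()}

# Made-up listener responses, for illustration only.
responses = [
    (0.9, 0.8, "Excited"), (0.8, 0.9, "excited"), (0.85, 0.7, "tense"),
    (0.1, 0.2, "calm"), (0.2, 0.1, "calm "), (0.3, 0.3, "sad"),
]

summary = label_regions(responses)
print(summary)
```

A finer grid, or grouping synonymous labels before counting, would give a more detailed map, at the cost of needing more responses per cell.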