4,815 research outputs found

    Affective Music Information Retrieval

    Full text link
    Much of the appeal of music lies in its power to convey emotions/moods and to evoke them in listeners. In consequence, the past decade witnessed a growing interest in modeling emotions from musical signals in the music information retrieval (MIR) community. In this article, we present a novel generative approach to music emotion modeling, with a specific focus on the valence-arousal (VA) dimension model of emotion. The presented generative model, called \emph{acoustic emotion Gaussians} (AEG), better accounts for the subjectivity of emotion perception by the use of probability distributions. Specifically, it learns from the emotion annotations of multiple subjects a Gaussian mixture model in the VA space with prior constraints on the corresponding acoustic features of the training music pieces. Such a computational framework is technically sound, capable of learning in an online fashion, and thus applicable to a variety of applications, including user-independent (general) and user-dependent (personalized) emotion recognition and emotion-based music retrieval. We report evaluations of the aforementioned applications of AEG on a larger-scale emotion-annotated corpora, AMG1608, to demonstrate the effectiveness of AEG and to showcase how evaluations are conducted for research on emotion-based MIR. Directions of future work are also discussed.Comment: 40 pages, 18 figures, 5 tables, author versio

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    Get PDF
    After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

    User-Centered Categorization of Mood in Fiction

    Get PDF
    Readers articulate mood in deeply subjective ways, yet the underlying structure of users’ understanding of the media they consume has important implications for retrieval and access. User articulations might at first seem too idiosyncratic, but organizing them meaningfully has considerable potential to provide a better searching experience for all involved. The current study develops mood categories inductively for fiction organization and retrieval in information systems.We developed and distributed an open-ended survey to 76 fiction readers to understand their preferences with regard to the affective elements in fiction. From the fiction reader responses, the research team identified 161 mood terms and used them for further categorization.Our inductive approach resulted in 30 categories, including angry, cozy, dark, and nostalgic. Results include three overlapping mood families: Emotion, Tone/Narrative, and Atmosphere/Setting, which in turn relate to structures that connect reader-generated data with conceptual frameworks in previous studies.The inherent complexity of “mood” should not dissuade us from carefully investigating users’ preferences in this regard. Adding to the existing efforts of classifying moods conducted by experts, the current study presents mood terms provided by actual end-users when describing different moods in fiction. This study offers a useful roadmap for creating taxonomies for retrieval and description, as well as structures derived from user-provided terms that ultimately have the potential to improve user experience

    Text-based Sentiment Analysis and Music Emotion Recognition

    Get PDF
    Nowadays, with the expansion of social media, large amounts of user-generated texts like tweets, blog posts or product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is utilized in recommender systems, market predictions, business intelligence and more. We also witness deep learning techniques becoming top performers on those types of tasks. There are however several problems that need to be solved for efficient use of deep neural networks on text mining and text polarity analysis. First of all, deep neural networks are data hungry. They need to be fed with datasets that are big in size, cleaned and preprocessed as well as properly labeled. Second, the modern natural language processing concept of word embeddings as a dense and distributed text feature representation solves sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties regarding the use of word vectors: should they be generated from the same dataset that is used to train the model or it is better to source them from big and popular collections that work as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks are weak with longer texts and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective and can adapt to various texts, encapsulating much of design complexity. This thesis addresses the above problems to provide methodological and practical insights for utilizing neural networks on sentiment analysis of texts and achieving state of the art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored and two medium-sized and emotion-labeled song datasets are created utilizing social tags. One of the research interests of Telecom Italia was the exploration of relations between music emotional stimulation and driving style. Consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains were conducted. Word embeddings of different parameters were exercised and results revealed that their quality is influenced (mostly but not only) by the size of texts they were created from. When working with small text datasets, it is thus important to source word features from popular and generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers were conducted. Various patterns relating text properties and network parameters with optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business and product reviews. Given that labeled data are becoming the bottleneck of the current deep learning systems, a future research direction could be the exploration of various data programming possibilities for constructing even bigger labeled datasets. Investigation of feature-level or decision-level ensemble techniques in the context of deep neural networks could also be fruitful. Different feature types do usually represent complementary characteristics of data. Combining word embedding and traditional text features or utilizing recurrent networks on document splits and then aggregating the predictions could further increase prediction accuracy of such models

    PERSONALIZED INDEXING OF MUSIC BY EMOTIONS

    Get PDF
    How a person interprets music and what prompts a person to feel certain emotions are two very subjective things. This dissertation presents a method where a system can learn and track a user’s listening habits with the purpose of recommending songs that fit the user’s specific way of interpreting music and emotions. First a literature review is presented which shows an overview of the current state of recommender systems, as well as describing classifiers; then the process of collecting user data is discussed; then the process of training and testing personalized classifiers is described; finally a system combining the personalized classifiers with clustered data into a hierarchy of recommender systems is presented

    Content Recommendation Through Linked Data

    Get PDF
    Nowadays, people can easily obtain a huge amount of information from the Web, but often they have no criteria to discern it. This issue is known as information overload. Recommender systems are software tools to suggest interesting items to users and can help them to deal with a vast amount of information. Linked Data is a set of best practices to publish data on the Web, and it is the basis of the Web of Data, an interconnected global dataspace. This thesis discusses how to discover information useful for the user from the vast amount of structured data, and notably Linked Data available on the Web. The work addresses this issue by considering three research questions: how to exploit existing relationships between resources published on the Web to provide recommendations to users; how to represent the user and his context to generate better recommendations for the current situation; and how to effectively visualize the recommended resources and their relationships. To address the first question, the thesis proposes a new algorithm based on Linked Data which exploits existing relationships between resources to recommend related resources. The algorithm was integrated into a framework to deploy and evaluate Linked Data based recommendation algorithms. In fact, a related problem is how to compare them and how to evaluate their performance when applied to a given dataset. The user evaluation showed that our algorithm improves the rate of new recommendations, while maintaining a satisfying prediction accuracy. To represent the user and their context, this thesis presents the Recommender System Context ontology, which is exploited in a new context-aware approach that can be used with existing recommendation algorithms. The evaluation showed that this method can significantly improve the prediction accuracy. As regards the problem of effectively visualizing the recommended resources and their relationships, this thesis proposes a visualization framework for DBpedia (the Linked Data version of Wikipedia) and mobile devices, which is designed to be extended to other datasets. In summary, this thesis shows how it is possible to exploit structured data available on the Web to recommend useful resources to users. Linked Data were successfully exploited in recommender systems. Various proposed approaches were implemented and applied to use cases of Telecom Italia

    Moody Music Generator: Characterising Control Parameters Using Crowdsourcing.

    Get PDF
    Abstract. We characterise the expressive effects of a music generator capable of varying its moods through two control parameters. The two control parameters were constructed on the basis of existing work on va-lence and arousal in music, and intended to provide control over those two mood factors. In this paper we conduct a listener study to determine how people actually perceive the various moods the generator can produce. Rather than directly attempting to validate that our two control param-eters represent arousal and valence, instead we conduct an open-ended study to crowd-source labels characterising different parts of this two-dimensional control space. Our aim is to characterise perception of the generator’s expressive space, without constraining listeners ’ responses to labels specifically aimed at validating the original arousal/valence moti-vation. Subjects were asked to listen to clips of generated music over the Internet, and to describe the moods with free-text labels. We find that the arousal parameter does roughly map to perceived arousal, but that the nominal “valence ” parameter has strong interaction with the arousal parameter, and produces different effects in different parts of the con-trol space. We believe that the characterisation methodology described here is general and could be used to map the expressive range of other parameterisable generators.
    • …
    corecore