Music emotion recognition: a multimodal machine learning approach
Music emotion recognition (MER) is an emerging domain within the Music Information Retrieval (MIR) scientific community; moreover, searching for music by emotion is one of the selection methods most preferred by web users. As the world goes digital, the musical content in online databases such as Last.fm has expanded exponentially, requiring substantial manual effort to manage and keep up to date. Therefore, the demand for innovative and adaptable search mechanisms that can be personalized according to users' emotional state has gained increasing attention in recent years. This thesis addresses the music emotion recognition problem by presenting several classification models fed by textual features as well as audio attributes extracted from the music. In this study, we build both supervised and semi-supervised classification designs across four research experiments that address the emotional role of audio features such as tempo, acousticness, and energy, as well as the impact of textual features extracted by two different approaches, TF-IDF and Word2Vec. Furthermore, we propose a multimodal approach using a combined feature set consisting of features from the audio content as well as from context-aware data. For this purpose, we generated a ground-truth dataset containing over 1,500 labeled song lyrics, together with an unlabeled big-data collection of more than 2.5 million Turkish documents, in order to build an accurate automatic emotion classification system. The analytical models were built by applying several algorithms to cross-validated data using Python. In conclusion, the best performance attained was 44.2% accuracy when employing only audio features, whereas with textual features better performances were observed, with accuracy scores of 46.3% and 51.3% under the supervised and semi-supervised learning paradigms, respectively.
Finally, even though we created a comprehensive feature set combining audio and textual features, this approach did not yield any significant improvement in classification performance.
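As a rough illustration of the textual pipeline this abstract describes, a TF-IDF feature extractor feeding a supervised classifier might look like the sketch below. This is not the thesis code: scikit-learn is an assumed library choice, and the tiny lyric snippets and emotion labels are invented placeholders.

```python
# Minimal sketch: TF-IDF lyric features feeding a supervised classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for a labeled lyrics corpus (placeholders only).
lyrics = [
    "tears fall down my lonely heart",
    "dance all night feel the joy",
    "rage and fire burn inside",
    "sunshine smiles and happy days",
]
labels = ["sad", "happy", "angry", "happy"]

# Unigram + bigram TF-IDF features, then a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(lyrics, labels)
print(model.predict(["joy and sunshine tonight"])[0])
```

A Word2Vec variant would replace the TF-IDF step with averaged word vectors; the rest of the pipeline stays the same.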
An HMM-Based Framework for Supporting Accurate Classification of Music Datasets
In this paper, we use Hidden Markov Models (HMM) and Mel-Frequency Cepstral Coefficients (MFCC) to build statistical models of classical music composers directly from music datasets. Several musical pieces are divided by instrumentation (String, Piano, Chorus, Orchestra), and, for each instrumentation, statistical models of the composers are computed. We selected 19 different composers spanning four centuries, using a total of 400 musical pieces. Each musical piece is classified as belonging to a composer if the corresponding HMM gives the highest likelihood for that piece. We show that the models so developed can be used to obtain useful information on the correlation between composers. Moreover, using the maximum-likelihood approach, we also classified the instrumentation used by the same composer. Besides serving as an analysis tool, the described approach has been used as a classifier. Overall, this yields an HMM-based framework for supporting accurate classification of music datasets. On a dataset of String Quartet movements, we obtained an average composer classification accuracy of more than 96%. As regards instrumentation classification, we obtained an average accuracy of slightly less than 100% for Piano, Orchestra, and String Quartet. In this paper, the most significant results from our experimental assessment and analysis are reported and discussed in detail.
Cuzzocrea, Alfredo; Mumolo, Enzo; Vercelli, Gianni
The Constructivistly-Organised Dimensional-Appraisal (CODA) Model and Evidence for the Role of Goal-directed Processes in Emotional Episodes Induced by Music
The study of affective responses to music is a flourishing field. Advancements in the study of this phenomenon have been complemented by the introduction of several music-specific models of emotion, two of the most well-cited being the BRECVEMA and the Multifactorial Process Model. These two models have undoubtedly contributed to the field. However, contemporary developments in the wider affective sciences (broadly described as the ‘rise of affectivism’) have yet to be incorporated into the music emotion literature. These developments in the affective sciences may aid in addressing remaining gaps in the music literature, in particular in acknowledging individual and contextual differences.
The first aim of this thesis was to outline contemporary theories from the wider affective sciences and subsequently critique current popular models of musical emotions through the lens of these advancements. The second aim was to propose a new model based on this critique: the Constructivistly-Organised Dimensional-Appraisal (CODA) model. The CODA model draws together multiple competing models into a single framework centred on goal-directed appraisal mechanisms, which are key to the wider affective sciences but are a less commonly acknowledged component of musical affect. The third aim was to empirically test some of the core hypotheses of the CODA model: in particular, examining goal-directed mechanisms, their validity in a musical context, and their ability to address individual and contextual differences in musically induced affect. Across four experiments, spanning exploratory and lab-based designs through to real-world applications, the results support the role of goal-directed mechanisms in musically induced emotional episodes. Experiment one presents a first test battery of multiple appraisal dimensions developed for music. The results show that several of the hypothesised appraisal dimensions are valid dimensions in a musical context. Moreover, these mechanisms cluster into goal-directed latent variables. Experiment two develops a new set of stimuli annotations relating to musical goals, showing that music can be more or less appropriate for different musical goals (functions). Experiment three, using the new stimuli set from experiment two, tests the effects of different goals, with more or less appropriate music, on musically induced affect. These results show that goal-directed mechanisms can change induced core affect (valence and arousal) and intensity, even for the same piece of music. Experiment four extends the study of goal-directed mechanisms into a real-world context through an interdisciplinary and cross-cultural design.
The final experiment demonstrates how goal-directed mechanisms can be manipulated through different algorithms to induce negative affect in a Colombian population.
The main conclusion of this thesis is that the CODA model, and more specifically its goal-directed mechanisms, provides a valuable, non-reductive, and more efficient approach to addressing individual and contextual differences in musically induced emotional episodes in the new era of affectivism.
Automated Classification of Emotions Using Song Lyrics
This thesis explores the classification of emotions in song lyrics, using automatic approaches applied to a novel corpus of 100 popular songs. I use crowdsourcing via Amazon Mechanical Turk to collect line-level emotion annotations for this collection of song lyrics. I then build classifiers that rely on textual features to automatically identify the presence of one or more of the following six Ekman emotions: anger, disgust, fear, joy, sadness, and surprise. I compare different classification systems and evaluate the performance of the automatic systems against the manual annotations. I also introduce a system that uses data collected from the social network Twitter. I use the Twitter API to collect a large corpus of tweets manually labeled by their authors with one of the six emotions of interest. I then compare the classification of emotions obtained when training on data automatically collected from Twitter versus data obtained through crowdsourced annotations.
A mood-based music classification and exploration system
Thesis (S.M.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 89-93). By Owen Craigie Meyers.
Mood classification of music is an emerging domain of music information retrieval. In the approach presented here, features extracted from an audio file are used in combination with the affective value of song lyrics to map a song onto a psychologically based emotion space. The motivation behind this system is the lack of intuitive and contextually aware playlist generation tools available to music listeners. The need for such tools is made obvious by the fact that digital music libraries are constantly expanding, making it increasingly difficult to recall a particular song in the library or to create a playlist for a specific event. By combining audio content information with context-aware data, such as song lyrics, this system allows the listener to automatically generate a playlist to suit their current activity or mood.
Moanin' at Midnight: Patterns, themes, and imagery in blues songs by Howlin' Wolf
Blues music originated in the Deep South, where it matured into a recognizable style known as the country blues. Socioeconomic changes in the 1920s and 1940s encouraged large numbers of blacks living in the rural South to migrate northward. Chicago, Illinois was a destination for many blacks from the Mississippi Delta region, and the country blues was similarly transplanted to an urban environment. In this new setting, the familiar music of the country began to change, and a new style of blues evolved: urban blues. One way to better understand the links between the country and urban styles would be to look at the country and urban features in the lyrics of a performer who played in both styles, Howlin' Wolf. This study explores the country and urban features of Howlin' Wolf's music during the 1950s through an analysis of the patterns, themes, and imagery in his song lyrics. The lyrics are analyzed in the context of three theories: oral formulas in blues composition; the bluesman as fictional persona; and thematic patterns in blues lyrics. The thematic-patterns theory proved the most useful in identifying the patterns, themes, and imagery in the sample. The results indicate a nearly even split in the sample between country and urban lyrical features. The sample indicates that Howlin' Wolf's music did change after he migrated northward, but it retained many of the major features of the country blues style.
Text-based Sentiment Analysis and Music Emotion Recognition
Nowadays, with the expansion of social media, large amounts of user-generated
texts like tweets, blog posts or product reviews are shared online. Sentiment polarity
analysis of such texts has become highly attractive and is utilized in recommender
systems, market predictions, business intelligence and more. We also witness deep
learning techniques becoming top performers on those types of tasks. There are
however several problems that need to be solved for efficient use of deep neural
networks on text mining and text polarity analysis.
First of all, deep neural networks are data hungry. They need to be fed with
datasets that are big in size, cleaned and preprocessed as well as properly labeled.
Second, the modern natural language processing concept of word embeddings as a
dense and distributed text feature representation solves sparsity and dimensionality
problems of the traditional bag-of-words model. Still, there are various uncertainties
regarding the use of word vectors: should they be generated from the same dataset
that is used to train the model, or is it better to source them from big and popular
collections that work as generic text feature representations? Third, it is not easy for
practitioners to find a simple and highly effective deep learning setup for various
document lengths and types. Recurrent neural networks are weak with longer texts
and optimal convolution-pooling combinations are not easily conceived. It is thus
convenient to have generic neural network architectures that are effective and can
adapt to various texts, encapsulating much of design complexity.
This thesis addresses the above problems to provide methodological and practical
insights for utilizing neural networks on sentiment analysis of texts and achieving
state of the art results. Regarding the first problem, the effectiveness of various
crowdsourcing alternatives is explored and two medium-sized and emotion-labeled
song datasets are created utilizing social tags. One of the research interests of Telecom
Italia was the exploration of relations between music emotional stimulation and
driving style. Consequently, a context-aware music recommender system that aims
to enhance driving comfort and safety was also designed. To address the second
problem, a series of experiments with large text collections of various contents and
domains were conducted. Word embeddings of different parameters were exercised
and results revealed that their quality is influenced (mostly but not only) by the
size of texts they were created from. When working with small text datasets, it is
thus important to source word features from popular and generic word embedding
collections. Regarding the third problem, a series of experiments involving convolutional
and max-pooling neural layers were conducted. Various patterns relating
text properties and network parameters with optimal classification accuracy were
observed. Combining convolutions of words, bigrams, and trigrams with regional
max-pooling layers in a couple of stacks produced the best results. The derived
architecture achieves competitive performance on sentiment polarity analysis of
movie, business and product reviews.
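The architecture described here, parallel convolutions over words, bigrams, and trigrams followed by regional max-pooling, might be sketched as below. This is a loose reconstruction, not the thesis implementation: PyTorch is an assumed framework, `AdaptiveMaxPool1d` approximates the regional pooling, and all dimensions are placeholders.

```python
# Sketch: word/bigram/trigram convolutions with regional max-pooling.
import torch
import torch.nn as nn

class ConvSentiment(nn.Module):
    def __init__(self, vocab=1000, dim=32, classes=2, regions=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # Kernel sizes 1, 2, 3 act on words, bigrams, and trigrams.
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, 16, kernel_size=k, padding=k - 1) for k in (1, 2, 3)
        )
        # Pool each feature map into a fixed number of regions.
        self.pool = nn.AdaptiveMaxPool1d(regions)
        self.fc = nn.Linear(3 * 16 * regions, classes)

    def forward(self, tokens):
        x = self.embed(tokens).transpose(1, 2)  # (batch, dim, seq)
        feats = [self.pool(torch.relu(c(x))) for c in self.convs]
        return self.fc(torch.cat(feats, dim=1).flatten(1))

model = ConvSentiment()
logits = model(torch.randint(0, 1000, (8, 50)))  # batch of 8 sequences
print(logits.shape)
```

Stacking a second convolution-pooling pair on top of the concatenated features, as the text suggests, would follow the same pattern.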
Given that labeled data are becoming the bottleneck of the current deep learning
systems, a future research direction could be the exploration of various data programming
possibilities for constructing even bigger labeled datasets. Investigation
of feature-level or decision-level ensemble techniques in the context of deep neural
networks could also be fruitful. Different feature types do usually represent complementary
characteristics of data. Combining word embedding and traditional text
features or utilizing recurrent networks on document splits and then aggregating the
predictions could further increase the prediction accuracy of such models.