1,019 research outputs found

    Music emotion recognition: a multimodal machine learning approach

    Music emotion recognition (MER) is an emerging domain of the Music Information Retrieval (MIR) scientific community, and searching for music by emotion is one of the selection methods most preferred by web users. As the world goes digital, the musical content in online databases such as Last.fm has expanded exponentially, requiring substantial manual effort to manage and keep up to date. Therefore, the demand for innovative and adaptable search mechanisms that can be personalized according to users’ emotional state has gained increasing attention in recent years. This thesis addresses the music emotion recognition problem by presenting several classification models fed with textual features as well as audio attributes extracted from the music. In this study, we build both supervised and semi-supervised classification designs under four research experiments that address the emotional role of audio features, such as tempo, acousticness, and energy, as well as the impact of textual features extracted by two different approaches, TF-IDF and Word2Vec. Furthermore, we propose a multimodal approach using a combined feature set consisting of features from the audio content as well as from context-aware data. For this purpose, we generated a ground-truth dataset containing over 1,500 labeled song lyrics, together with an unlabeled corpus of more than 2.5 million Turkish documents, in order to build an accurate automatic emotion classification system. The analytical models were built by applying several algorithms to cross-validated data using Python. The experiments showed that the best performance attained with audio features alone was 44.2% accuracy, whereas textual features yielded better results, with accuracy scores of 46.3% and 51.3% under the supervised and semi-supervised learning paradigms, respectively. Finally, even though we created a comprehensive feature set combining audio and textual features, this approach did not yield any significant improvement in classification performance.
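Below is a minimal sketch, assuming scikit-learn and pandas, of how a multimodal classifier of this kind could be cross-validated: TF-IDF features from the lyrics are concatenated with standardized audio descriptors (tempo, acousticness, energy) and fed to a linear classifier. The file name, column names, and label set are illustrative assumptions and are not taken from the thesis.

```python
# Minimal sketch (not the thesis code): supervised emotion classification that
# combines TF-IDF lyric features with audio descriptors, using scikit-learn.
# The CSV file, column names, and emotion labels below are assumed for illustration.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

songs = pd.read_csv("labeled_songs.csv")           # hypothetical ground-truth file
X = songs[["lyrics", "tempo", "acousticness", "energy"]]
y = songs["emotion"]                                # e.g. happy / sad / angry / relaxed

features = ColumnTransformer([
    ("text", TfidfVectorizer(max_features=5000), "lyrics"),
    ("audio", StandardScaler(), ["tempo", "acousticness", "energy"]),
])

model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])

# Cross-validated accuracy, mirroring the evaluation style described above.
print(cross_val_score(model, X, y, cv=10, scoring="accuracy").mean())
```

A semi-supervised variant (for example, self-training on the unlabeled corpus mentioned above) could reuse the same feature construction and swap only the final estimator.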

    Modeling media as latent semantics based on cognitive components


    An HMM-Based Framework for Supporting Accurate Classification of Music Datasets

    In this paper, we use Hidden Markov Models (HMM) and Mel-Frequency Cepstral Coefficients (MFCC) to build statistical models of classical music composers directly from the music datasets. Several musical pieces are divided by instrument (String, Piano, Chorus, Orchestra), and, for each instrument, statistical models of the composers are computed. We selected 19 different composers spanning four centuries, using a total of 400 musical pieces. Each musical piece is classified as belonging to a composer if the corresponding HMM gives the highest likelihood for that piece. We show that the so-developed models can be used to obtain useful information on the correlation between the composers. Moreover, by using the maximum-likelihood approach, we also classified the instrumentation used by the same composer. Besides serving as an analysis tool, the described approach has been used as a classifier. Overall, this yields an HMM-based framework for supporting accurate classification of music datasets. On a dataset of String Quartet movements, we obtained an average composer classification accuracy of more than 96%. As regards instrumentation classification, we obtained an average accuracy of slightly less than 100% for Piano, Orchestra, and String Quartet. In this paper, the most significant results coming from our experimental assessment and analysis are reported and discussed in detail. (Cuzzocrea, Alfredo; Mumolo, Enzo; Vercelli, Gianni)
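As an illustration of the maximum-likelihood scheme described above (one HMM per composer, trained on MFCC frames, with a piece assigned to the composer whose model scores it highest), here is a hedged sketch using librosa and hmmlearn. The number of hidden states, MFCC settings, and file layout are assumptions for demonstration and do not reproduce the paper's exact configuration.

```python
# Illustrative sketch of the HMM-per-composer idea: train one Gaussian HMM per
# composer on MFCC frames, then classify a piece by maximum log-likelihood.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_features(path, n_mfcc=13):
    """Return an (n_frames, n_mfcc) MFCC matrix for one audio file."""
    y, sr = librosa.load(path, sr=22050)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_composer_models(pieces_by_composer, n_states=5):
    """Fit one Gaussian HMM per composer on the concatenated MFCC frames."""
    models = {}
    for composer, paths in pieces_by_composer.items():
        feats = [mfcc_features(p) for p in paths]
        lengths = [f.shape[0] for f in feats]
        hmm = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        hmm.fit(np.vstack(feats), lengths)
        models[composer] = hmm
    return models

def classify(path, models):
    """Assign the piece to the composer whose HMM gives the highest log-likelihood."""
    feats = mfcc_features(path)
    return max(models, key=lambda composer: models[composer].score(feats))
```

The same scoring loop can be reused for instrumentation classification by training one model per instrument instead of per composer.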

    The Constructivistly-Organised Dimensional-Appraisal (CODA) Model and Evidence for the Role of Goal-directed Processes in Emotional Episodes Induced by Music

    The study of affective responses to music is a flourishing field. Advancements in the study of this phenomenon have been complemented by the introduction of several music-specific models of emotion, two of the most well-cited being the BRECVEMA and the Multifactorial Process Model. These two models have undoubtedly contributed to the field. However, contemporary developments in the wider affective sciences (broadly described as the ‘rise of affectivism’) have yet to be incorporated into the music emotion literature. These developments may aid in addressing remaining gaps in the music literature, in particular in acknowledging individual and contextual differences. The first aim of this thesis was to outline contemporary theories from the wider affective sciences and subsequently critique current popular models of musical emotions through the lens of these advancements. The second aim was to propose a new model based on this critique: the Constructivistly-Organised Dimensional-Appraisal (CODA) model. The CODA model draws together multiple competing models into a single framework centralised around goal-directed appraisal mechanisms, which are key to the wider affective sciences but are a less commonly acknowledged component of musical affect. The third aim was to empirically test some of the core hypotheses of the CODA model, in particular examining goal-directed mechanisms, their validity in a musical context, and their ability to address individual and contextual differences in musically induced affect. Across four experiments, ranging from exploratory and lab-based designs through to real-world applications, the results support the role of goal-directed mechanisms in musically induced emotional episodes. Experiment one presents a first test battery of multiple appraisal dimensions developed for music. The results show that several of the hypothesised appraisal dimensions are valid in a musical context. Moreover, these mechanisms cluster into goal-directed latent variables. Experiment two develops a new set of stimulus annotations relating to musical goals, showing that music can be more or less appropriate for different musical goals (functions). Experiment three, using the new stimulus set from experiment two, tests the effects of different goals, with more or less appropriate music, on musically induced affect. These results show that goal-directed mechanisms can change induced core affect (valence and arousal) and intensity, even for the same piece of music. Experiment four extends the study of goal-directed mechanisms into a real-world context through an interdisciplinary and cross-cultural design. The final experiment demonstrates how goal-directed mechanisms can be manipulated through different algorithms to induce negative affect in a Colombian population. The main conclusion of this thesis is that the CODA model, and more specifically goal-directed mechanisms, provides a valuable, non-reductive, and more efficient approach to addressing individual and contextual differences in musically induced emotional episodes in the new era of affectivism.

    A mood-based music classification and exploration system

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 89-93). Mood classification of music is an emerging domain of music information retrieval. In the approach presented here, features extracted from an audio file are used in combination with the affective value of song lyrics to map a song onto a psychologically based emotion space. The motivation behind this system is the lack of intuitive and contextually aware playlist generation tools available to music listeners. The need for such tools is made obvious by the fact that digital music libraries are constantly expanding, thus making it increasingly difficult to recall a particular song in the library or to create a playlist for a specific event. By combining audio content information with context-aware data, such as song lyrics, this system allows the listener to automatically generate a playlist to suit their current activity or mood. By Owen Craigie Meyers.
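As a toy illustration of the core idea above (mapping a song onto a psychologically based emotion space by blending an audio-derived estimate with a lyric-derived one), here is a minimal sketch. The two-dimensional valence/arousal space, the weighting, and the example values are assumptions, not the system described in the thesis.

```python
# Toy sketch: place a song in a valence/arousal emotion space by blending
# audio-derived and lyric-derived estimates. Weights and values are illustrative.
from dataclasses import dataclass

@dataclass
class EmotionPoint:
    valence: float   # negative (-1.0) to positive (+1.0)
    arousal: float   # calm (-1.0) to energetic (+1.0)

def combine(audio: EmotionPoint, lyrics: EmotionPoint, w_audio: float = 0.6) -> EmotionPoint:
    """Weighted blend of the audio and lyric estimates for one song."""
    w_lyrics = 1.0 - w_audio
    return EmotionPoint(
        valence=w_audio * audio.valence + w_lyrics * lyrics.valence,
        arousal=w_audio * audio.arousal + w_lyrics * lyrics.arousal,
    )

# Example: bright-sounding audio paired with melancholic lyrics lands the song
# in a high-arousal, mildly positive region of the space.
song = combine(EmotionPoint(valence=0.7, arousal=0.8),
               EmotionPoint(valence=-0.2, arousal=0.3))
print(song)
```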

    Moanin' at midnight: Patterns, themes, and imagery in blues songs by Howlin' Wolf

    Blues music originated in the Deep South, where it matured into a recognizable style known as the country blues. Socioeconomic changes in the 1920s and 1940s encouraged large numbers of blacks living in the rural South to migrate northward. Chicago, Illinois was a destination for many blacks from the Mississippi Delta region, and the country blues was similarly transplanted to an urban environment. In this new setting, the familiar music of the country began to change and a new style of blues evolved, the urban blues. One way to better understand the links between the country and urban styles would be to look at the country and urban features in the lyrics of a performer who played in both styles, Howlin' Wolf. This study explores the country and urban features of Howlin' Wolf's music during the 1950s through an analysis of the patterns, themes, and imagery in his song lyrics. The lyrics are analyzed in the context of three theories: oral formulas in blues composition; the bluesman as fictional persona; and thematic patterns in blues lyrics. The theory of thematic patterns in blues lyrics proved to be the most useful in identifying the patterns, themes, and imagery in the sample. The results indicate a nearly even split in the sample between country and urban lyrical features. The sample indicates that Howlin' Wolf's music did change after he migrated northward, but it retained many of the major features of the country blues style.

    Text-based Sentiment Analysis and Music Emotion Recognition

    Nowadays, with the expansion of social media, large amounts of user-generated texts like tweets, blog posts, or product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is utilized in recommender systems, market prediction, business intelligence, and more. We also witness deep learning techniques becoming top performers on those types of tasks. There are, however, several problems that need to be solved for efficient use of deep neural networks in text mining and text polarity analysis. First of all, deep neural networks are data hungry. They need to be fed with datasets that are big in size, cleaned and preprocessed, as well as properly labeled. Second, the modern natural language processing concept of word embeddings as a dense and distributed text feature representation solves the sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties regarding the use of word vectors: should they be generated from the same dataset that is used to train the model, or is it better to source them from big and popular collections that work as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks are weak with longer texts, and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective and can adapt to various texts, encapsulating much of the design complexity. This thesis addresses the above problems to provide methodological and practical insights for utilizing neural networks in sentiment analysis of texts and achieving state-of-the-art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored and two medium-sized, emotion-labeled song datasets are created using social tags. One of the research interests of Telecom Italia was the exploration of relations between music emotional stimulation and driving style. Consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains were conducted. Word embeddings with different parameters were evaluated, and the results revealed that their quality is influenced (mostly, but not only) by the size of the texts they were created from. When working with small text datasets, it is thus important to source word features from popular and generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers were conducted. Various patterns relating text properties and network parameters to optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business, and product reviews. Given that labeled data are becoming the bottleneck of current deep learning systems, a future research direction could be the exploration of various data programming possibilities for constructing even bigger labeled datasets. Investigation of feature-level or decision-level ensemble techniques in the context of deep neural networks could also be fruitful. Different feature types usually represent complementary characteristics of the data. Combining word-embedding and traditional text features, or applying recurrent networks to document splits and then aggregating the predictions, could further increase the prediction accuracy of such models.
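The following is a hedged sketch of a network in the spirit described above: parallel convolutions over words, bigrams, and trigrams (kernel sizes 1, 2, 3), each followed by regional max-pooling, stacked twice before the classifier. It uses Keras; the vocabulary size, sequence length, and layer widths are assumptions, and it is not the thesis's exact architecture.

```python
# Hedged sketch: word/bigram/trigram convolutions with regional max-pooling,
# stacked twice, for binary sentiment polarity classification (Keras).
from tensorflow.keras import layers, Model

VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 20000, 200, 100   # illustrative assumptions

inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)  # could be seeded with pre-trained vectors

def conv_pool_stack(tensor, filters):
    """One stack: kernel sizes 1, 2, 3 in parallel, each with regional max-pooling."""
    branches = []
    for kernel_size in (1, 2, 3):
        b = layers.Conv1D(filters, kernel_size, padding="same", activation="relu")(tensor)
        b = layers.MaxPooling1D(pool_size=2)(b)       # regional (local) max-pooling
        branches.append(b)
    return layers.Concatenate()(branches)

x = conv_pool_stack(x, filters=64)
x = conv_pool_stack(x, filters=64)
x = layers.GlobalMaxPooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)    # binary sentiment polarity

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```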