    Emotion in Music: representation and computational modeling

    Music emotion recognition (MER) deals with classifying music by emotion using signal processing and machine learning techniques. The emotion ontology for music is not yet well established. Musical emotion can be conceptualized through various emotional models: categorical, dimensional, or domain-specific. Emotion can be represented with a single label, multiple labels, or a probability distribution, and the time scale to which an emotion label applies ranges from half a second to a complete musical piece. Describing musical audio with emotional labels is an inherently subjective task. The MER field relies on ground truth data from human labelers, and the quality of these labels is crucial to the performance of the algorithms trained on them. Lack of agreement between annotators leads to conflicting cues and poor discriminating ability of the algorithms. Conceptualizing musical emotion in a way that is most natural for the listener is therefore crucial both for creating better ground truth and for building intuitive music retrieval systems. In this thesis we deal mainly with the problem of representing musical emotion. The thesis consists of three parts.
    Part I. We model induced musical emotion. We create a game with a purpose, Emotify, to collect emotional labels using the Geneva Emotional Music Scale (GEMS) as the emotional model. The game produces high-quality ground truth, but some modifications to the GEMS model are suggested. We use the data from the game to build a computational model and show that its performance can be improved substantially by developing better features; this step is more important than finding a more sophisticated learning algorithm. We suggest new features that describe the harmonic content of music. A much larger improvement in performance is expected once high-level musical concepts such as rhythmic complexity, articulation, or tonalness can be modeled.
    Part II. In collaboration with M. Soleymani and Y.-H. Yang, we create a benchmark for Music Emotion Variation Detection (MEVD) algorithms, which track per-second change in musical emotion. We describe the steps taken to improve the quality of the ground truth and the benchmark evaluation metrics, and we conduct a systematic evaluation of the algorithms and feature sets submitted to the benchmark. The best approach is to develop separate feature sets for the valence and arousal dimensions and to incorporate local context, either through algorithms capable of learning from time series (LSTM-RNN) or through smoothing (a minimal smoothing sketch appears after this abstract).
    Part III. Building on the experience gained from organizing the benchmark, we suggest that a better approach to MEVD is to view music as a succession of emotionally stable segments and unstable transitional segments. We list the reasons why the established MEVD approach is flawed and cannot produce good-quality ground truth, and we propose an approach based on a CNN combined with MER-informed filtering.
    Three public datasets, corresponding to the three parts of the thesis, are released.
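    The smoothing option mentioned in Part II can be pictured with a short, self-contained sketch. This is not the thesis code: the function names, window size, and random example data are illustrative assumptions, and the point is only to show how a centered moving average can inject local context into per-second valence or arousal predictions.

```python
# Minimal smoothing sketch (not the thesis implementation).
import numpy as np

def smooth_predictions(values: np.ndarray, window: int = 5) -> np.ndarray:
    """Centered moving average over a 1-D sequence of per-second predictions."""
    kernel = np.ones(window) / window
    # mode="same" keeps one value per second; note it zero-pads at the edges,
    # so the first and last few values are pulled toward zero.
    return np.convolve(values, kernel, mode="same")

# Hypothetical per-second arousal predictions for a 45-second excerpt.
raw_arousal = np.random.uniform(-1.0, 1.0, size=45)
smoothed_arousal = smooth_predictions(raw_arousal, window=5)
```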

    Emotion based segmentation of musical audio

    The dominant approach to music emotion variation detection tracks emotion over time continuously, usually at a time resolution of one second. In this paper we discuss the problems associated with this approach and propose moving to coarser time resolutions when tracking emotion over time. We argue that it is more natural from the listener’s point of view to regard emotional variation in music as a progression of emotionally stable segments. To enable such tracking of emotion over time, music must be segmented at its emotional boundaries. To address this problem we conduct a formal evaluation of different segmentation methods applied to the task of emotional boundary detection. We collect emotional boundary annotations from three annotators for 52 musical pieces from the RWC music collection that already have structural annotations in the SALAMI dataset. We investigate how well structural segmentation explains emotional segmentation and find a large overlap, though about a quarter of emotional boundaries do not coincide with structural ones. We also study inter-annotator agreement on emotional segmentation. Lastly, we evaluate different unsupervised segmentation methods on emotional boundary detection and find that, in terms of F-measure, the Structural Features method performs best.
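    Boundary-detection results of this kind are typically scored with a hit-based F-measure, counting a predicted boundary as correct if it falls within a tolerance window of an annotated one. The sketch below is not the paper's evaluation code: the 3-second tolerance and the example boundary lists are assumptions chosen only to make the computation concrete.

```python
# Illustrative boundary F-measure with a tolerance window (assumed settings).
import numpy as np

def boundary_f_measure(predicted, annotated, tolerance=3.0):
    predicted = np.asarray(predicted, dtype=float)
    annotated = np.asarray(annotated, dtype=float)
    matched, hits = set(), 0
    for p in predicted:
        # Greedily match each predicted boundary to the closest unmatched annotation.
        candidates = [(abs(p - a), i) for i, a in enumerate(annotated) if i not in matched]
        if candidates:
            dist, idx = min(candidates)
            if dist <= tolerance:
                matched.add(idx)
                hits += 1
    precision = hits / len(predicted) if len(predicted) else 0.0
    recall = hits / len(annotated) if len(annotated) else 0.0
    f = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f

# Hypothetical predicted vs. annotated emotional boundaries, in seconds.
print(boundary_f_measure([12.0, 31.5, 58.0], [11.0, 30.0, 45.0, 60.0]))
```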

    Studying emotion induced by music through a crowdsourcing game

    One of the major reasons people find music so enjoyable is its emotional impact. Creating emotion-based playlists is a natural way of organizing music. The usability of online music streaming services could be greatly improved by developing emotion-based access methods, and automatic music emotion recognition (MER) is the quickest and most feasible way of achieving this. When resorting to music for emotional regulation, users are interested in MER methods that predict their induced, or felt, emotion. Progress in this area is impeded by the absence of publicly accessible ground-truth data on musically induced emotion. There is also no consensus on which emotional model best fits the demands of users and provides an unambiguous linguistic framework for describing musical emotions. In this paper we address these problems by creating a sizeable, publicly available dataset of 400 musical excerpts from four genres annotated with induced emotion. We collected the data using an online “game with a purpose”, Emotify, which attracted a large and varied sample of participants. We employed a nine-item domain-specific emotional model, GEMS (Geneva Emotional Music Scale). We analyze the collected data and report participant agreement on the different GEMS categories, and we analyze the influence of extra-musical factors (gender, mood, music preferences) on induced emotion. We suggest that modifications to the GEMS model are necessary.
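    One common way to quantify per-category annotator agreement on ratings of this kind is Cronbach's alpha computed over annotators; the paper's exact analysis may differ, and the data layout below (400 excerpts rated by 5 annotators on one GEMS category) is an illustrative assumption.

```python
# Cronbach's alpha sketch for one GEMS category (assumed data layout).
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: array of shape (n_excerpts, n_annotators) for a single category."""
    n_annotators = ratings.shape[1]
    annotator_vars = ratings.var(axis=0, ddof=1)   # variance of each annotator's ratings
    total_var = ratings.sum(axis=1).var(ddof=1)    # variance of the summed scores
    return n_annotators / (n_annotators - 1) * (1.0 - annotator_vars.sum() / total_var)

# Hypothetical binary selections of one GEMS category: 400 excerpts x 5 annotators.
rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(400, 5)).astype(float)
print(cronbach_alpha(ratings))
```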

    Designing Games with a Purpose for Data Collection in Music Research. Emotify and Hooked: Two Case Studies

    Collecting ground-truth data for music research requires large amounts of time and money. To avoid these costs, researchers now collect data through online multiplayer games whose underlying purpose is gathering scientific data. In this paper we present two case studies of such games created for data collection in music information retrieval (MIR): Emotify, for emotional annotation of music, and Hooked, for studying musical catchiness. In addition to the basic requirement of scientific validity, both applications address essential development and design issues, for example acquiring licensed music or employing popular social frameworks. As such, we hope that they may serve as blueprints for the development of future serious games, not only for music but also for other humanistic domains. The pilot launch of these two games showed that their models are capable of engaging participants and supporting large-scale empirical research.

    Learning Music Emotion Primitives via Supervised Dynamic Clustering

    24th ACM Multimedia Conference (MM 2016), Amsterdam, The Netherlands, 15-19 October 2016.
    This paper explores a fundamental problem in music emotion analysis: how to segment a music sequence into a set of basic emotive units, termed emotion primitives. Current work on music emotion analysis is mainly based on fixed-length music segments, which makes accurate emotion recognition difficult. A short music segment, such as an individual music frame, may fail to evoke an emotional response, while a long segment, such as an entire song, may convey various emotions over time. Moreover, the minimum length of a music segment varies depending on the type of emotion. To address these problems, we propose a novel method dubbed supervised dynamic clustering (SDC) to automatically decompose a music sequence into meaningful segments of varying lengths. First, the music sequence is represented as a set of music frames. Then, the frames are clustered according to their valence-arousal values in the emotion space, and the clustering results are used to initialize the segmentation. After that, a dynamic programming scheme is employed to jointly optimize the subsequent segmentation and grouping in the music feature space. Experimental results on a standard dataset show both the effectiveness and the rationality of the proposed method.
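    The dynamic-programming step can be pictured with a small sketch. This is not the authors' SDC implementation: it only splits a sequence of per-frame valence-arousal points into segments by minimizing within-segment squared deviation from the segment mean, with a per-segment penalty (an assumed parameter) controlling how many segments are produced; the clustering initialization and supervision are omitted.

```python
# Illustrative dynamic-programming segmentation of valence-arousal frames
# (not the SDC method itself; clustering and supervision are omitted).
import numpy as np

def dp_segment(frames: np.ndarray, penalty: float = 1.0):
    """frames: shape (n_frames, 2) of valence-arousal values.
    Returns the start indices of the recovered segments."""
    n = len(frames)

    def cost(i, j):  # cost of a single segment covering frames[i:j]
        seg = frames[i:j]
        return ((seg - seg.mean(axis=0)) ** 2).sum() + penalty

    best = np.full(n + 1, np.inf)        # best[j]: minimal cost of segmenting frames[:j]
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)    # back[j]: start of the last segment ending at j
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j)
            if c < best[j]:
                best[j], back[j] = c, i

    bounds, j = [], n
    while j > 0:                          # backtrack to recover segment starts
        bounds.append(back[j])
        j = back[j]
    return sorted(bounds)

# Hypothetical trajectory with two emotionally distinct regions.
frames = np.vstack([np.random.normal(0.5, 0.05, size=(30, 2)),
                    np.random.normal(-0.4, 0.05, size=(40, 2))])
print(dp_segment(frames, penalty=0.5))
```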