
    MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

    Pre-trained language models have achieved impressive results in various music understanding and generation tasks. However, existing pre-training methods for symbolic melody generation struggle to capture the multi-scale, multi-dimensional structural information in note sequences, owing to the domain knowledge discrepancy between text and music. Moreover, the lack of large-scale symbolic melody datasets limits the gains obtainable from pre-training. In this paper, we propose MelodyGLM, a multi-task pre-training framework for generating melodies with long-term structure. We design melodic n-gram and long-span sampling strategies to create local and global blank infilling tasks that model the local and global structures in melodies. Specifically, we incorporate pitch n-grams, rhythm n-grams, and their combined n-grams into the melodic n-gram blank infilling tasks to model the multi-dimensional structures in melodies. To support this, we have constructed MelodyNet, a large-scale symbolic melody dataset containing more than 0.4 million melody pieces, which is used for large-scale pre-training and for building the domain-specific n-gram lexicon. Both subjective and objective evaluations demonstrate that MelodyGLM surpasses standard and previous pre-training methods. In particular, subjective evaluations show that, on the melody continuation task, MelodyGLM achieves average improvements of 0.82, 0.87, 0.78, and 0.94 in consistency, rhythmicity, structure, and overall quality, respectively. Notably, MelodyGLM nearly matches the quality of human-composed melodies on the melody inpainting task.
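
    As a rough sketch of what a melodic n-gram blank infilling task might look like, the snippet below masks note spans whose pitch sequence matches entries of an n-gram lexicon; the note representation, helper names, and toy lexicon are illustrative placeholders, not MelodyGLM's actual tokenization or sampling strategies.

        import random

        MASK = "[MASK]"

        def melodic_ngram_infilling(notes, pitch_lexicon, mask_prob=0.15):
            """Corrupt a note sequence for blank infilling.

            notes: list of (pitch, duration) tuples.
            pitch_lexicon: set of pitch n-grams (tuples of pitches), assumed to come
                from a domain-specific lexicon such as one built from a melody corpus.
            Returns (corrupted sequence, list of (position, masked span) targets).
            """
            corrupted, targets = [], []
            i = 0
            while i < len(notes):
                matched = None
                # Greedily look for the longest lexicon pitch n-gram starting at position i.
                for n in range(min(5, len(notes) - i), 1, -1):
                    pitches = tuple(p for p, _ in notes[i:i + n])
                    if pitches in pitch_lexicon and random.random() < mask_prob:
                        matched = notes[i:i + n]
                        break
                if matched:
                    corrupted.append(MASK)  # one blank replaces the whole span
                    targets.append((len(corrupted) - 1, matched))
                    i += len(matched)
                else:
                    corrupted.append(notes[i])
                    i += 1
            return corrupted, targets

        # Toy usage: a short melody and a two-entry pitch n-gram "lexicon".
        melody = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 1.0), (67, 2.0)]
        lexicon = {(62, 64), (64, 65, 67)}
        print(melodic_ngram_infilling(melody, lexicon, mask_prob=1.0))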

    Understanding Agreement and Disagreement in Listeners’ Perceived Emotion in Live Music Performance

    Emotion perception of music is subjective and time-dependent. Most computational music emotion recognition (MER) systems overlook time- and listener-dependent factors by averaging emotion judgments across listeners. In this work, we investigate the influence of music, setting (live vs. lab vs. online), and individual factors on music emotion perception over time. In an initial study, we explore changes in perceived music emotions among audience members during live classical music performances. Fifteen audience members used a mobile application to annotate time-varying emotion judgments based on the valence-arousal model. Inter-rater reliability analyses indicate that consistency in emotion judgments varies significantly across rehearsal segments, with systematic disagreements in certain segments. In a follow-up study, we examine listeners' reasons for their ratings in segments with high and low agreement. We relate these reasons to acoustic features and individual differences. Twenty-one listeners annotated perceived emotions while watching a recorded video of the live performance. They then reflected on their judgments and provided explanations retrospectively. Disagreements were attributed to listeners attending to different musical features or being uncertain about the expressed emotions. Emotion judgments were significantly associated with personality traits, gender, cultural background, and music preference. Thematic analysis of the explanations revealed cognitive processes underlying music emotion perception, highlighting attributes less frequently discussed in MER studies, such as instrumentation, arrangement, musical structure, and multimodal factors related to performer expression. Exploratory models incorporating these semantic features and individual factors were developed to predict perceived music emotion over time. Regression analyses confirmed the significance of listener-informed semantic features as independent variables, with individual factors acting as moderators between loudness, pitch range, and arousal. In our final study, we analyzed the effects of individual differences on music emotion perception among 128 participants with diverse backgrounds. Participants annotated perceived emotions for 51 piano performances of different compositions from the Western canon, spanning various eras. Linear mixed-effects models revealed significant variations in valence and arousal ratings, as well as in the frequency of emotion ratings, with regard to several individual factors: music sophistication, music preferences, personality traits, and mood states. Additionally, participants' ratings of arousal, valence, and emotional agreement were significantly associated with the historical time periods of the examined clips. This research highlights the complexity of music emotion perception, revealing it to be a dynamic, individual, and context-dependent process. It paves the way for the development of more individually nuanced, time-based models in music psychology, opening up new avenues for personalised music emotion recognition and recommendation, music emotion-driven generation, and therapeutic applications.
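
    For readers unfamiliar with the kind of analysis described in the final study, the following is a minimal illustration of a linear mixed-effects model with a per-listener random intercept and an interaction term as the moderation test, fitted with statsmodels on synthetic stand-in data; the variable names (loudness, pitch_range, openness) and the model specification are placeholders, not the study's actual model.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        n_listeners, n_clips = 20, 10

        # Synthetic stand-in data: per-listener, per-clip arousal ratings plus
        # acoustic features and one individual-difference score ("openness").
        rows = []
        for listener in range(n_listeners):
            openness = rng.normal()
            for clip in range(n_clips):
                loudness = rng.normal()
                pitch_range = rng.normal()
                # The individual factor moderates the loudness-arousal relation.
                arousal = (0.5 * loudness + 0.3 * pitch_range
                           + 0.4 * openness * loudness + rng.normal(scale=0.5))
                rows.append(dict(listener=listener, clip=clip, openness=openness,
                                 loudness=loudness, pitch_range=pitch_range,
                                 arousal=arousal))
        data = pd.DataFrame(rows)

        # Random intercept per listener; the interaction term tests moderation.
        model = smf.mixedlm("arousal ~ loudness * openness + pitch_range",
                            data, groups=data["listener"])
        print(model.fit().summary())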

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f

    Safe and Sound: Proceedings of the 27th Annual International Conference on Auditory Display

    Complete proceedings of the 27th International Conference on Auditory Display (ICAD2022), held June 24-27, 2022, as an online virtual conference.

    Detection and Evaluation of Clusters within Sequential Data

    Motivated by theoretical advancements in dimensionality reduction techniques, we use a recent model, called Block Markov Chains, to conduct a practical study of clustering in real-world sequential data. Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees and can be deployed in sparse data regimes. Despite these favorable theoretical properties, a thorough evaluation of these algorithms in realistic settings has been lacking. We address this issue and investigate the suitability of these clustering algorithms for exploratory data analysis of real-world sequential data. In particular, our sequential data is derived from human DNA, written text, animal movement data, and financial markets. In order to evaluate the determined clusters, and the associated Block Markov Chain model, we further develop a set of evaluation tools. These tools include benchmarking, spectral noise analysis, and statistical model selection tools. An efficient implementation of the clustering algorithm and the new evaluation tools is made available together with this paper. Practical challenges associated with real-world data are encountered and discussed. It is ultimately found that the Block Markov Chain model assumption, together with the tools developed here, can indeed produce meaningful insights in exploratory data analyses despite the complexity and sparsity of real-world data. Comment: 37 pages, 12 figures.
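
    A simplified illustration of clustering states in sequential data under a block-structure assumption is sketched below: build the empirical transition-count matrix, take a low-rank spectral embedding, and run k-means over the states. This is a generic spectral heuristic for exposition only, not the Block Markov Chain clustering algorithm (or its optimality guarantees) evaluated in the paper.

        import numpy as np
        from sklearn.cluster import KMeans

        def cluster_states(sequence, n_states, n_clusters, seed=0):
            """Cluster the states of a sequence via a spectral embedding of its
            empirical transition-count matrix (a simplified block-structure
            heuristic, not the algorithm evaluated in the paper)."""
            counts = np.zeros((n_states, n_states))
            for a, b in zip(sequence[:-1], sequence[1:]):
                counts[a, b] += 1
            # Rank-n_clusters spectral embedding of the count matrix.
            u, s, vt = np.linalg.svd(counts, full_matrices=False)
            embedding = np.hstack([u[:, :n_clusters], vt[:n_clusters, :].T])
            labels = KMeans(n_clusters=n_clusters, n_init=10,
                            random_state=seed).fit_predict(embedding)
            return labels

        # Toy usage: a sequence over 6 states with two latent blocks {0,1,2}, {3,4,5}.
        rng = np.random.default_rng(0)
        seq, state = [], 0
        for _ in range(5000):
            block = state // 3
            block = block if rng.random() < 0.9 else 1 - block  # rarely switch block
            state = int(rng.integers(0, 3)) + 3 * block
            seq.append(state)
        print(cluster_states(seq, n_states=6, n_clusters=2))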

    Creating a Coherent Score: The Music of Single-Player Fantasy Computer Role-Playing Games

    This thesis provides a comprehensive exploration of the music of the ludic genre (Hourigan, 2005) known as the Computer Role-Playing Game (CRPG) and its two main sub-divisions: Japanese and Western Role-Playing Games (JRPGs & WRPGs). It focuses on the narrative category known as genre fiction, concentrating on fantasy fiction (Turco, 1999), and seeks to address one overall question: How do fantasy CRPG composers incorporate the variety of musical material needed to create a coherent score across the JRPG and WRPG divide? Seven main chapters form the thesis text. Chapter One provides an introduction to the thesis, detailing the research contributions and outlining a variety of key terms that must be understood before continuing with the rest of the text. A database accompanying this thesis showcases the vast range of CRPGs available; a literature review tackles relevant existing materials. Chapters Two and Three seek to provide the first canonical history of soundtracks used in CRPGs by dissecting typical narrative structures for games so as to provide context for their musical scores. Through analysis of existing game composer interviews, cultural influences are revealed. Chapters Four and Five mirror one another, with detailed discussion of JRPG and WRPG music respectively, including the influence that anime and Hollywood cinema have had upon them. In Chapter Six, the use of CRPG music outside of video games is explored, particularly the popularity of JRPG soundtracks in the concert hall. Chapter Seven concludes the thesis, summarising the research contributions achieved and areas for future work. Throughout these chapters, the core task is to explain how the two primary sub-genres of CRPGs parted ways and why the music used to accompany these games differs so drastically.