MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation
Pre-trained language models have achieved impressive results in various music
understanding and generation tasks. However, existing pre-training methods for
symbolic melody generation struggle to capture multi-scale, multi-dimensional
structural information in note sequences, due to the domain knowledge
discrepancy between text and music. Moreover, the lack of available large-scale
symbolic melody datasets limits improvements from pre-training. In this paper, we
propose MelodyGLM, a multi-task pre-training framework for generating melodies
with long-term structure. We design the melodic n-gram and long span sampling
strategies to create local and global blank infilling tasks for modeling the
local and global structures in melodies. Specifically, we incorporate pitch
n-grams, rhythm n-grams, and their combined n-grams into the melodic n-gram
blank infilling tasks for modeling the multi-dimensional structures in
melodies. To this end, we have constructed a large-scale symbolic melody
dataset, MelodyNet, containing more than 0.4 million melody pieces. MelodyNet
is utilized for large-scale pre-training and domain-specific n-gram lexicon
construction. Both subjective and objective evaluations demonstrate that
MelodyGLM surpasses the standard and previous pre-training methods. In
particular, subjective evaluations show that, on the melody continuation task,
MelodyGLM gains average improvements of 0.82, 0.87, 0.78, and 0.94 in
consistency, rhythmicity, structure, and overall quality, respectively.
Notably, MelodyGLM nearly matches the quality of human-composed melodies on the
melody inpainting task.
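The melodic n-gram construction described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation; the (pitch, duration) note representation and the n-gram size are assumptions made here for clarity:

```python
from collections import Counter

def melodic_ngrams(notes, n=3):
    """Count pitch, rhythm, and combined n-grams in a note sequence.

    `notes` is a list of (pitch, duration) tuples, a simplified stand-in
    for the paper's symbolic melody representation.
    """
    pitch_grams, rhythm_grams, combined_grams = Counter(), Counter(), Counter()
    for i in range(len(notes) - n + 1):
        window = notes[i : i + n]
        pitch_grams[tuple(p for p, _ in window)] += 1
        rhythm_grams[tuple(d for _, d in window)] += 1
        combined_grams[tuple(window)] += 1
    return pitch_grams, rhythm_grams, combined_grams

# A toy melody in which one 3-note motif repeats.
melody = [(60, 1.0), (62, 0.5), (64, 0.5), (60, 1.0), (62, 0.5), (64, 0.5)]
p, r, c = melodic_ngrams(melody, n=3)
print(p.most_common(1))  # the repeated pitch trigram (60, 62, 64)
```

In the paper, salient n-grams drawn from a domain-specific lexicon are masked for blank infilling; here, raw frequency counts stand in as a crude saliency proxy.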
Understanding Agreement and Disagreement in Listeners’ Perceived Emotion in Live Music Performance
Emotion perception of music is subjective and time-dependent. Most computational music emotion recognition (MER) systems overlook time- and listener-dependent factors by averaging emotion judgments across listeners. In this work, we investigate the influence of music, setting (live vs. lab vs. online), and individual factors on music emotion perception over time. In an initial study, we explore changes in perceived music emotions among audience members during live classical music performances. Fifteen audience members used a mobile application to annotate time-varying emotion judgments based on the valence-arousal model. Inter-rater reliability analyses indicate that consistency in emotion judgments varies significantly across rehearsal segments, with systematic disagreements in certain segments. In a follow-up study, we examine listeners' reasons for their ratings in segments with high and low agreement. We relate these reasons to acoustic features and individual differences. Twenty-one listeners annotated perceived emotions while watching a recorded video of the live performance. They then reflected on their judgments and provided explanations retrospectively. Disagreements were attributed to listeners attending to different musical features or being uncertain about the expressed emotions. Emotion judgments were significantly associated with personality traits, gender, cultural background, and music preference. Thematic analysis of explanations revealed cognitive processes underlying music emotion perception, highlighting attributes less frequently discussed in MER studies, such as instrumentation, arrangement, musical structure, and multimodal factors related to performer expression. Exploratory models incorporating these semantic features and individual factors were developed to predict perceived music emotion over time.
Regression analyses confirmed the significance of listener-informed semantic features as independent variables, with individual factors acting as moderators between loudness, pitch range, and arousal. In our final study, we analyzed the effects of individual differences on music emotion perception among 128 participants with diverse backgrounds. Participants annotated perceived emotions for 51 piano performances of different compositions from the Western canon, spanning various eras. Linear mixed-effects models revealed significant variations in valence and arousal ratings, as well as in the frequency of emotion ratings, with regard to several individual factors: music sophistication, music preferences, personality traits, and mood states. Additionally, participants' ratings of arousal, valence, and emotional agreement were significantly associated with the historical time periods of the examined clips. This research highlights the complexity of music emotion perception, revealing it to be a dynamic, individual, and context-dependent process. It paves the way for the development of more individually nuanced, time-based models in music psychology, opening up new avenues for personalised music emotion recognition and recommendation, music emotion-driven generation, and therapeutic applications.
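As a toy illustration of segment-wise (dis)agreement, one could measure the across-listener spread of ratings within each segment. This is a crude proxy, not the formal inter-rater reliability statistics or mixed-effects models used in the studies, and the ratings and segmentation below are invented:

```python
import statistics

def segment_disagreement(ratings_by_listener, segments):
    """Mean across-listener standard deviation of ratings in each segment.

    `ratings_by_listener` is a list of equal-length rating time series, one
    per listener; `segments` is a list of (start, end) index pairs. Higher
    values indicate more disagreement within that segment.
    """
    results = []
    for start, end in segments:
        spreads = [
            statistics.pstdev(series[t] for series in ratings_by_listener)
            for t in range(start, end)
        ]
        results.append(sum(spreads) / len(spreads))
    return results

# Three listeners rating arousal at four time points: they agree early on
# and diverge later.
ratings = [
    [0.1, 0.2, 0.8, 0.9],
    [0.1, 0.2, 0.2, 0.1],
    [0.1, 0.2, 0.5, 0.5],
]
print(segment_disagreement(ratings, [(0, 2), (2, 4)]))  # low, then high
```

Segments with a high mean spread are the natural candidates for the kind of retrospective "why did you rate it that way?" follow-up described above.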
Proceedings of the 19th Sound and Music Computing Conference
Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France).
https://smc22.grame.f
Safe and Sound: Proceedings of the 27th Annual International Conference on Auditory Display
Complete proceedings of the 27th International Conference on Auditory Display (ICAD2022), June 24-27, 2022. Online virtual conference.
Proceedings of the 1st International Conference on Live Coding
Open-access, peer-reviewed papers on live coding published at the 1st International Conference on Live Coding (ICLC) in Leeds.
Detection and Evaluation of Clusters within Sequential Data
Motivated by theoretical advancements in dimensionality reduction techniques,
we use a recent model, called Block Markov Chains, to conduct a practical study
of clustering in real-world sequential data. Clustering algorithms for Block
Markov Chains possess theoretical optimality guarantees and can be deployed in
sparse data regimes. Despite these favorable theoretical properties, a thorough
evaluation of these algorithms in realistic settings has been lacking.
We address this issue and investigate the suitability of these clustering
algorithms in exploratory data analysis of real-world sequential data. In
particular, our sequential data is derived from human DNA, written text, animal
movement data and financial markets. In order to evaluate the determined
clusters, and the associated Block Markov Chain model, we further develop a set
of evaluation tools. These tools include benchmarking, spectral noise analysis
and statistical model selection tools. An efficient implementation of the
clustering algorithm and the new evaluation tools is made available together
with this paper.
Practical challenges associated with real-world data are encountered and
discussed. It is ultimately found that the Block Markov Chain model assumption,
together with the tools developed here, can indeed produce meaningful insights
in exploratory data analyses despite the complexity and sparsity of real-world
data.
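The spectral idea behind such clustering algorithms can be sketched as follows. This is a heavily simplified two-cluster illustration on an invented toy chain, not the paper's optimality-guaranteed algorithm: it builds the empirical transition count matrix, embeds each state via the leading singular vectors, and runs a few plain k-means (Lloyd) iterations on the rows.

```python
import numpy as np

def bmc_spectral_clusters(sequence, n_states):
    """Split the states of a sequence into two clusters.

    Uses a rank-2 SVD embedding of the empirical transition count matrix,
    followed by farthest-point-initialised Lloyd iterations.
    """
    counts = np.zeros((n_states, n_states))
    for a, b in zip(sequence[:-1], sequence[1:]):
        counts[a, b] += 1
    u, s, _ = np.linalg.svd(counts)
    emb = u[:, :2] * s[:2]  # low-rank embedding of each state (one row each)
    # Deterministic initialisation: row 0 and the row farthest from it.
    far = int(np.argmax(((emb - emb[0]) ** 2).sum(axis=1)))
    centers = np.stack([emb[0], emb[far]])
    for _ in range(20):  # plain Lloyd iterations
        dists = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = emb[labels == k].mean(axis=0)
    return labels

# Toy block-structured chain: states {0, 1} mostly transition among
# themselves, as do states {2, 3}, with rare cross-block jumps.
rng = np.random.default_rng(1)
seq, state = [0], 0
for _ in range(2000):
    block = state // 2
    if rng.random() < 0.95:
        state = int(rng.integers(0, 2)) + 2 * block        # stay in block
    else:
        state = int(rng.integers(0, 2)) + 2 * (1 - block)  # jump blocks
    seq.append(state)
labels = bmc_spectral_clusters(seq, n_states=4)
print(labels)  # states 0-1 and 2-3 should land in different clusters
```

The real algorithms refine this picture considerably (e.g. trimming and improvement steps for sparse regimes), which is precisely what gives them the theoretical guarantees mentioned in the abstract.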
CREATING A COHERENT SCORE: THE MUSIC OF SINGLE-PLAYER FANTASY COMPUTER ROLE-PLAYING GAMES
This thesis provides a comprehensive exploration into the music of the ludic genre (Hourigan, 2005) known as a Computer Role-Playing Game (CRPG) and its two main sub-divisions: Japanese and Western Role-Playing Games (JRPGs & WRPGs). It focuses on the narrative category known as genre fiction, concentrating on fantasy fiction (Turco, 1999) and seeks to address one overall question: How do fantasy CRPG composers incorporate the variety of musical material needed to create a coherent score across the JRPG and WRPG divide?
Seven main chapters form the thesis text. Chapter One provides an introduction to the thesis, detailing the research contributions in addition to outlining a variety of key terms that must be understood to continue with the rest of the text. A database accompanying this thesis showcases the vast range of CRPGs available; a literature review tackles relevant existing materials. Chapters Two and Three seek to provide the first canonical history of soundtracks used in CRPGs by dissecting typical narrative structures for games so as to provide context to their musical scores. Through analysis of existing game composer interviews, cultural influences are revealed. Chapters Four and Five mirror one another with detailed discussion respectively regarding JRPG and WRPG music, including the influence that anime and Hollywood cinema have had upon them. In Chapter Six, the use of CRPG music outside of video games is explored, particularly the popularity of JRPG soundtracks in the concert hall. Chapter Seven concludes the thesis, summarising research contributions achieved and areas for future work. Throughout these chapters, the core task is to explain how the two primary sub-genres of CRPGs parted ways and why the music used to accompany these games differs so drastically.