1,287 research outputs found

    RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

    Full text link
    This paper presents a deep reinforcement learning algorithm for online accompaniment generation, with potential for real-time interactive human-machine duet improvisation. Different from offline music generation and harmonization, online music accompaniment requires the algorithm to respond to human input and generate the machine counterpart in sequential order. We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state). The key to this algorithm is a well-functioning reward model. Instead of defining it using music composition rules, we learn this model from monophonic and polyphonic training data. This model considers the compatibility of the machine-generated note with both the machine-generated context and the human-generated context. Experiments show that this algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part. Subjective evaluations on preferences show that the proposed algorithm generates music pieces of higher quality than the baseline method.
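    As a rough illustration of this loop, the sketch below casts one step of online accompaniment as choosing the next machine note (action) that scores best against the recent machine and human context (state). The reward here is a hand-written consonance heuristic standing in for the paper's learned reward model, and the pitch range, context length, and function names are assumptions.

```python
# Illustrative sketch only: the learned reward network from the paper is
# replaced by a hand-written consonance heuristic, and the "policy" is a
# simple epsilon-greedy rule over that stand-in reward.
import random

PITCH_RANGE = list(range(48, 85))   # assumed MIDI pitch vocabulary
CONTEXT_LEN = 8                     # assumed context window (in notes)

def reward_model(machine_ctx, human_ctx, candidate):
    """Stand-in reward: compatibility of `candidate` with both the
    machine-generated and the human-generated context."""
    def consonant(a, b):
        return 1.0 if (a - b) % 12 in (0, 3, 4, 7, 8, 9) else 0.0
    ctx = (machine_ctx + human_ctx)[-CONTEXT_LEN:]
    return sum(consonant(candidate, n) for n in ctx) / max(len(ctx), 1)

def policy(machine_ctx, human_ctx, epsilon=0.1):
    """Pick the next machine note (action) given the current contexts (state)."""
    if random.random() < epsilon:                       # occasional exploration
        return random.choice(PITCH_RANGE)
    return max(PITCH_RANGE,
               key=lambda p: reward_model(machine_ctx, human_ctx, p))

# Online interaction: the machine answers each incoming human note in turn.
human_part = [60, 62, 64, 65, 67, 65, 64, 62]           # toy human melody
machine_part = []
for t, _ in enumerate(human_part):
    machine_part.append(policy(machine_part, human_part[:t + 1]))
print(machine_part)
```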

    Tonal music theory: A psychoacoustic explanation?

    Get PDF
    From the seventeenth century to the present day, tonal harmonic music has had a number of invariant properties such as the use of specific chord progressions (cadences) to induce a sense of closure, the asymmetrical privileging of certain progressions, and the privileging of the major and minor scales. The most widely accepted explanation has been that this is due to a process of enculturation: frequently occurring musical patterns are learned by listeners, some of whom become composers and replicate the same patterns, which go on to influence the next "generation" of composers, and so on. In this paper, however, I present a possible psychoacoustic explanation for some important regularities of tonal-harmonic music. The core of the model is two different measures of pitch-based distance between chords. The first is voice-leading distance; the second is spectral pitch distance, a measure of the distance between the partials in one chord compared to those in another chord. I propose that when a pair of triads has a higher spectral distance than another pair of triads that is voice-leading-close, the former pair is heard as an alteration of the latter pair, and seeks resolution. I explore the extent to which this model can predict the familiar tonal cadences described in music theory (including those containing tritone substitutions), and the asymmetries that are so characteristic of tonal harmony. I also show how it may be able to shed light upon the privileged status of the major and minor scales (over the modes)
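    The sketch below gives crude, illustrative versions of the two distances: a voice-leading distance as the smallest total pitch-class movement between two equal-sized chords, and a spectral distance that compares pitch-class histograms built from the harmonic partials of each chord's tones. The paper's actual spectral pitch distance uses smoothed spectral pitch vectors, so treat this only as an approximation; the chord spellings are examples.

```python
# Crude, illustrative approximations of the two distances; the actual model
# uses smoothed spectral pitch(-class) vectors rather than this histogram.
import math
from itertools import permutations

def voice_leading_distance(chord_a, chord_b):
    """Smallest total movement (in pitch-class semitones) mapping the tones
    of chord_a one-to-one onto chord_b."""
    best = None
    for perm in permutations(chord_b):
        cost = sum(min(abs(a - b) % 12, 12 - abs(a - b) % 12)
                   for a, b in zip(chord_a, perm))
        best = cost if best is None else min(best, cost)
    return best

def spectral_distance(chord_a, chord_b, n_partials=8):
    """Compare pitch-class histograms built from the first few harmonic
    partials of every chord tone (rough proxy for spectral pitch distance)."""
    def spectrum(chord):
        hist = [0.0] * 12
        for pitch in chord:
            for k in range(1, n_partials + 1):
                pc = round(pitch + 12 * math.log2(k)) % 12
                hist[pc] += 1.0 / k        # weight higher partials less
        total = sum(hist)
        return [h / total for h in hist]
    sa, sb = spectrum(chord_a), spectrum(chord_b)
    return sum(abs(x - y) for x, y in zip(sa, sb))

C, G, B_dim = [60, 64, 67], [55, 59, 62], [59, 62, 65]   # C, G, Bdim triads
print(voice_leading_distance(G, C), round(spectral_distance(G, C), 3))
```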

    A Study on Improving Conditional Generation of Musical Components: Focusing on Chords and Expression

    Get PDF
    Doctoral dissertation -- Seoul National University, Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Digital Information Convergence), February 2023. Advisor: Kyogu Lee.
    Conditional generation of musical components (CGMC) creates a part of music based on partial musical components such as melody or chords. CGMC is beneficial for discovering complex relationships among musical attributes. It can also assist non-experts who face difficulties in making music. However, recent studies on CGMC still face two challenges in terms of generation quality and model controllability. First, the structure of the generated music is not robust. Second, only limited ranges of musical factors and tasks have been examined as targets for flexible control of generation. In this thesis, we aim to mitigate these two challenges to improve CGMC systems. For musical structure, we focus on intuitive modeling of musical hierarchy to help the model explicitly learn musically meaningful dependency. To this end, we utilize alignment paths between the raw music data and musical units such as notes or chords. For musical creativity, we facilitate smooth control of novel musical attributes using latent representations. We attempt to achieve disentangled representations of the intended factors by regularizing them with data-driven inductive bias. This thesis verifies the proposed approaches in two representative CGMC tasks, melody harmonization and expressive performance rendering. A variety of experimental results show the potential of the proposed approaches to expand musical creativity under stable generation quality.
    Korean abstract: CGMC, the field of conditionally generating musical components, aims to generate the remaining parts of a piece of music from given parts such as a melody or chords. The field lends itself to exploring complex relationships among musical elements and can help non-experts who find it difficult to make music. Recent studies have used deep learning to improve the performance of CGMC systems, but they still show two limitations in terms of generation quality and controllability. First, the musical structure of the generated music is not clear. Second, only a narrow range of musical factors and tasks has been explored as targets of flexible control. This thesis therefore aims to resolve these two limitations. First, it focuses on intuitively modeling the musical hierarchy that underlies musical structure, using alignment paths between the raw data and musical units such as notes or chords so that the model can explicitly learn musically meaningful dependencies. Second, it uses latent representations to flexibly control novel musical attributes; in particular, the latent representations are constrained with unsupervised or self-supervised learning frameworks so that they become disentangled with respect to the intended factors. The thesis validates the two approaches on two representative CGMC tasks, melody harmonization and expressive performance rendering. A variety of experimental results suggest that the proposed methods can expand the musical creativity of CGMC systems while maintaining stable generation quality.
    Contents:
    Chapter 1 Introduction: Motivation; Definitions; Tasks of Interest (Generation Quality, Controllability); Approaches (Modeling Musical Hierarchy, Regularizing Latent Representations, Target Tasks); Outline of the Thesis
    Chapter 2 Background: Music Generation Tasks (Melody Harmonization, Expressive Performance Rendering); Structure-enhanced Music Generation (Hierarchical Music Generation, Transformer-based Music Generation); Disentanglement Learning (Unsupervised, Supervised, and Self-supervised Approaches); Controllable Music Generation (Score Generation, Performance Rendering); Summary
    Chapter 3 Translating Melody to Chord: Structured and Flexible Harmonization of Melody with Transformer. Sections: Introduction; Proposed Methods (Standard Transformer Model, STHarm; Variational Transformer Model, VTHarm; Regularized Variational Transformer Model, rVTHarm; Training Objectives); Experimental Settings (Datasets, Comparative Methods, Training, Metrics); Evaluation (Chord Coherence and Diversity, Harmonic Similarity to Human, Controlling Chord Complexity, Subjective Evaluation, Qualitative Results, Ablation Study); Conclusion and Future Work
    Chapter 4 Sketching the Expression: Flexible Rendering of Expressive Piano Performance with Self-supervised Learning. Sections: Introduction; Proposed Methods (Data Representation, Modeling Musical Hierarchy, Overall Network Architecture, Regularizing the Latent Variables, Overall Objective); Experimental Settings (Dataset and Implementation, Comparative Methods); Evaluation (Generation Quality, Disentangling Latent Representations, Controllability of Expressive Attributes, KL Divergence, Ablation Study, Subjective Evaluation, Qualitative Examples, Extent of Control); Conclusion
    Chapter 5 Conclusion and Future Work: Conclusion; Future Work (Deeper Investigation of Controllable Factors, More Analysis of Qualitative Evaluation Results, Improving Diversity and Scale of Dataset)
    Bibliography; Abstract in Korean
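    One common way to realize the kind of latent regularization described above, for an attribute such as chord complexity, is an attribute-regularization loss that ties one latent dimension to the attribute's ordering. The sketch below shows that generic formulation with toy data; it is an assumption made for illustration, not necessarily the exact objective used in the thesis.

```python
# Generic attribute-regularization loss (AR-VAE-style): pairwise differences
# along one latent dimension should match the sign of the pairwise differences
# of the attribute (here, a made-up "chord complexity" score per sample).
import numpy as np

def attribute_regularization_loss(z_dim, attribute, delta=1.0):
    dz = z_dim[:, None] - z_dim[None, :]          # pairwise latent differences
    da = attribute[:, None] - attribute[None, :]  # pairwise attribute differences
    return float(np.mean((np.tanh(delta * dz) - np.sign(da)) ** 2))

rng = np.random.default_rng(0)
z = rng.normal(size=16)               # one latent dimension for a toy batch
complexity = rng.uniform(0, 1, 16)    # toy attribute values for the same batch
print(attribute_regularization_loss(z, complexity))
```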

    Deep Learning Techniques for Music Generation -- A Survey

    Full text link
    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:
    - Objective: What musical content is to be generated (e.g., melody, polyphony, accompaniment or counterpoint)? For what destination and for what use: to be performed by a human (in the case of a musical score) or by a machine (in the case of an audio file)?
    - Representation: What are the concepts to be manipulated (e.g., waveform, spectrogram, note, chord, meter and beat)? What format is to be used (e.g., MIDI, piano roll or text)? How will the representation be encoded (e.g., scalar, one-hot or many-hot)?
    - Architecture: What type(s) of deep neural network are to be used (e.g., feedforward network, recurrent network, autoencoder or generative adversarial network)?
    - Challenge: What are the limitations and open challenges (e.g., variability, interactivity and creativity)?
    - Strategy: How do we model and control the process of generation (e.g., single-step feedforward, iterative feedforward, sampling or input manipulation)?
    For each dimension, we conduct a comparative analysis of various models and techniques and propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.
    Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
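    As a small concrete example of the encoding choices listed under Representation, the sketch below contrasts a one-hot vector for a single note with a many-hot (piano-roll style) vector for a chord; the pitch range and helper names are assumptions, not taken from the survey.

```python
# One-hot (single note) vs. many-hot (chord) encodings over an assumed
# 88-key piano range; helper names and range are illustrative.
import numpy as np

LOW, HIGH = 21, 108                 # MIDI numbers of an 88-key piano
N = HIGH - LOW + 1

def one_hot(pitch):
    v = np.zeros(N, dtype=np.float32)
    v[pitch - LOW] = 1.0            # exactly one active unit
    return v

def many_hot(pitches):
    v = np.zeros(N, dtype=np.float32)
    for p in pitches:
        v[p - LOW] = 1.0            # one active unit per sounding pitch
    return v

print(one_hot(60).sum(), many_hot([60, 64, 67]).sum())   # 1.0 3.0
```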

    Polyphonic music generation using neural networks

    Get PDF
    In this project, the application of generative models to polyphonic music generation is investigated. Polyphonic music generation falls into the field of algorithmic composition, which aims to develop models that automate, partially or completely, the composition of musical pieces. This process is challenging both in generating musical pieces that are enjoyable and in performing a robust evaluation of the model to guide improvements. An extensive survey of the development of the field and the state of the art is carried out. From this, two distinct generative models were chosen for the problem of polyphonic music generation: the Restricted Boltzmann Machine and the Generative Adversarial Network (GAN). In particular, for the GAN, two architectures were used, the Deep Convolutional GAN and the Wasserstein GAN with gradient penalty. To train these models, a dataset containing over 9000 samples of classical musical pieces was used. Using a piano-roll representation, the pieces were converted into binary 2D arrays in which the vertical dimension corresponds to pitch, the horizontal dimension represents time, and note events are marked by active units. The first 16 seconds of each piece were extracted and used for training after data cleansing and preprocessing. Using implementations of these models, samples of musical pieces were generated. Based on listening tests performed by participants, the Deep Convolutional GAN achieved the best scores, with its compositions rated on average 4.80 on a scale from 1 to 5 of how enjoyable the pieces were. To perform a more objective evaluation, musical features describing rhythmic and melodic characteristics were extracted from the generated pieces and compared against the training dataset. These features included an implementation of the Krumhansl-Schmuckler algorithm for musical key detection and the average information rate used as an estimator of long-term musical structure. Within each set of generated samples, pairwise Euclidean distances between feature values were computed; the same was done between each set of generated samples and the features extracted from the training data, resulting in two sets of distances, the intra-set and inter-set distances. Using kernel density estimation, the probability density functions of these distances were obtained. Finally, the Kullback-Leibler divergence between the intra-set and inter-set distances of each feature was calculated for each generative model; a lower divergence indicates that the distributions are more similar. On average, the Restricted Boltzmann Machine obtained the lowest Kullback-Leibler divergences.
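    A minimal sketch of this evaluation for a single scalar feature is shown below: pairwise intra-set and inter-set distances, kernel density estimates of both distance distributions, and the Kullback-Leibler divergence between them. The feature values are toy data and the helper names are assumptions; only the overall procedure follows the description above.

```python
# Toy version of the feature-based comparison for one scalar feature:
# intra-set vs. inter-set distances, KDE of both, then KL divergence.
import numpy as np
from scipy.stats import gaussian_kde, entropy

def intra_distances(x):
    """Pairwise absolute (1-D Euclidean) distances within one feature set."""
    return np.abs(x[:, None] - x[None, :])[np.triu_indices(len(x), k=1)]

def inter_distances(x, y):
    """Pairwise distances between a generated set and the training set."""
    return np.abs(x[:, None] - y[None, :]).ravel()

rng = np.random.default_rng(0)
generated = rng.normal(0.55, 0.10, 100)   # toy feature values, generated set
training = rng.normal(0.50, 0.10, 100)    # toy feature values, training set

intra = intra_distances(generated)
inter = inter_distances(generated, training)

# Evaluate both kernel density estimates on a common grid, then KL(intra || inter);
# a lower value means the generated features sit closer to the training data.
grid = np.linspace(0.0, max(intra.max(), inter.max()), 200)
p = gaussian_kde(intra)(grid) + 1e-12
q = gaussian_kde(inter)(grid) + 1e-12
print(entropy(p, q))
```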

    Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey

    Full text link
    Several adaptations of Transformer models have been developed in various domains since their breakthrough in Natural Language Processing (NLP). This trend has spread into the field of Music Information Retrieval (MIR), including studies processing music data. However, the practice of leveraging NLP tools for symbolic music data is not novel in MIR. Music has frequently been compared to language, as the two share several similarities, including the sequential representation of both text and music. These analogies are also reflected in similar tasks across MIR and NLP. This survey reviews NLP methods applied to symbolic music generation and information retrieval studies along two axes. We first propose an overview of representations of symbolic music adapted from natural language sequential representations; such representations are designed by considering the specificities of symbolic music. These representations are then processed by models, possibly originally developed for text and adapted for symbolic music, that are trained on various tasks. We describe these models, in particular deep learning models, through different prisms, highlighting music-specialized mechanisms. We finally present a discussion surrounding the effective use of NLP tools for symbolic music data. This includes technical issues regarding NLP methods and fundamental differences between text and music, which may open several doors for further research into more effectively adapting NLP tools to symbolic MIR.
    Comment: 36 pages, 5 figures, 4 tables
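    As an illustration of such text-like sequential representations, the sketch below encodes a short melody as a stream of event tokens, loosely in the spirit of MIDI-event or REMI-style vocabularies discussed in this line of work; the token vocabulary and note format are assumptions made for the example.

```python
# Simplified event-token encoding of a short melody (loosely REMI/MIDI-event
# flavoured); the vocabulary and (bar, beat, pitch, duration) format are
# assumptions made for this sketch only.
def tokenize(notes):
    """notes: list of (bar, beat, pitch, duration_in_beats) tuples."""
    tokens, current_bar = [], None
    for bar, beat, pitch, dur in sorted(notes):
        if bar != current_bar:           # emit a Bar token at every new bar
            tokens.append("Bar")
            current_bar = bar
        tokens += [f"Position_{beat}", f"Pitch_{pitch}", f"Duration_{dur}"]
    return tokens

melody = [(0, 0, 60, 1), (0, 1, 62, 1), (0, 2, 64, 2), (1, 0, 65, 4)]
print(tokenize(melody))
# ['Bar', 'Position_0', 'Pitch_60', 'Duration_1', 'Position_1', ...]
```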

    Creative Chord Sequence Generation for Electronic Dance Music

    Get PDF
    This paper describes the theory and implementation of a digital audio workstation plug-in for chord sequence generation. The plug-in is intended to encourage and inspire a composer of electronic dance music to explore loops through chord sequence pattern definition, position locking, and generation into unlocked positions. A basic cyclic first-order statistical model is extended with latent diatonicity variables, permitting sequences to depart from a specified key. The degree of diatonicity of generated sequences can be explored, and parameters for voicing the sequences can be manipulated. Feedback on the concepts, interface, and usability was given by a small focus group of musicians and music producers.
    This research was supported by the project I2C8 (Inspiring to Create), which is funded by the European Union's Horizon 2020 Research and Innovation programme under grant agreement number 754401.
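    A hedged sketch of the underlying idea is given below: a first-order chord transition model fills the unlocked positions of a loop while locked positions stay fixed. The transition probabilities are a toy stand-in rather than the plug-in's statistics, and the simple seeding from the last locked chord only approximates the cyclic model described in the paper.

```python
# Toy first-order chord model with position locking; the transition table is
# a hand-written stand-in, not the plug-in's statistics, and cyclic behaviour
# is approximated by seeding generation from the last locked chord.
import random

TRANSITIONS = {          # P(next chord | current chord), toy diatonic tendencies
    "C":  {"F": 0.3, "G": 0.3, "Am": 0.2, "Dm": 0.2},
    "Dm": {"G": 0.6, "F": 0.2, "C": 0.2},
    "Em": {"Am": 0.5, "F": 0.3, "C": 0.2},
    "F":  {"G": 0.4, "C": 0.4, "Dm": 0.2},
    "G":  {"C": 0.6, "Am": 0.3, "Em": 0.1},
    "Am": {"F": 0.4, "Dm": 0.3, "G": 0.3},
}

def sample_next(current):
    chords, weights = zip(*TRANSITIONS[current].items())
    return random.choices(chords, weights=weights)[0]

def generate_loop(pattern):
    """pattern: list where a chord name is a locked position and None is free."""
    out = list(pattern)
    prev = next((c for c in reversed(out) if c), "C")   # seed from last locked chord
    for i, slot in enumerate(out):
        if slot is None:
            out[i] = sample_next(prev)                  # fill unlocked position
        prev = out[i]
    return out

print(generate_loop(["C", None, None, "G", None, None, "F", None]))
```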