361 research outputs found

    Rhythm, Chord and Melody Generation for Lead Sheets using Recurrent Neural Networks

    Music that is generated by recurrent neural networks often lacks a sense of direction and coherence. We therefore propose a two-stage LSTM-based model for lead sheet generation, in which the harmonic and rhythmic templates of the song are produced first, after which, in a second stage, a sequence of melody notes is generated conditioned on these templates. A subjective listening test shows that our approach outperforms the baselines and increases perceived musical coherence. Comment: 8 pages, 2 figures, 3 tables, 2 appendices
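    The staged setup can be pictured with a small PyTorch sketch (an illustration under assumed vocabulary sizes and token layouts, not the paper's implementation): one LSTM models the harmonic/rhythmic template sequence, and a second LSTM predicts melody tokens conditioned on that template.

```python
# Illustrative two-stage sketch: stage 1 predicts a per-step chord/rhythm
# "template" token; stage 2 predicts melody notes conditioned on it.
# Vocabulary sizes and dimensions are placeholders, not the paper's values.
import torch
import torch.nn as nn

class TemplateLSTM(nn.Module):
    def __init__(self, n_chords=24, n_rhythm=16, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_chords * n_rhythm, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_chords * n_rhythm)

    def forward(self, template_tokens):            # (batch, time)
        h, _ = self.lstm(self.embed(template_tokens))
        return self.out(h)                          # next-template logits

class MelodyLSTM(nn.Module):
    def __init__(self, n_pitch=130, n_template=24 * 16, hidden=128):
        super().__init__()
        self.note_embed = nn.Embedding(n_pitch, hidden)
        self.tmpl_embed = nn.Embedding(n_template, hidden)
        self.lstm = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_pitch)

    def forward(self, notes, template_tokens):      # both (batch, time)
        x = torch.cat([self.note_embed(notes),
                       self.tmpl_embed(template_tokens)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                           # next-note logits

# Tiny usage example with random token sequences
toks = torch.randint(0, 24 * 16, (1, 32))
notes = torch.randint(0, 130, (1, 32))
template_logits = TemplateLSTM()(toks)          # (1, 32, 384)
melody_logits = MelodyLSTM()(notes, toks)       # (1, 32, 130)
```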

    Deep Learning Techniques for Music Generation -- A Survey

    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:
    - Objective: What musical content is to be generated (e.g., melody, polyphony, accompaniment or counterpoint)? For what destination and for what use: to be performed by a human (in the case of a musical score) or by a machine (in the case of an audio file)?
    - Representation: What are the concepts to be manipulated (e.g., waveform, spectrogram, note, chord, meter and beat)? What format is to be used (e.g., MIDI, piano roll or text)? How will the representation be encoded (e.g., scalar, one-hot or many-hot)?
    - Architecture: What type(s) of deep neural network are to be used (e.g., feedforward network, recurrent network, autoencoder or generative adversarial network)?
    - Challenge: What are the limitations and open challenges (e.g., variability, interactivity and creativity)?
    - Strategy: How do we model and control the process of generation (e.g., single-step feedforward, iterative feedforward, sampling or input manipulation)?
    For each dimension, we conduct a comparative analysis of various models and techniques and propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and prospects. Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
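    As a concrete illustration of the representation/encoding dimension, the sketch below (assuming a 128-pitch MIDI range and a fixed time grid; not taken from any particular system in the survey) shows one-hot, many-hot and piano-roll encodings.

```python
# Minimal encoding sketch: one-hot for a single note, many-hot for a chord,
# and a binary piano roll over a fixed time grid (assumed 128 MIDI pitches).
import numpy as np

N_PITCHES = 128

def one_hot(pitch):
    """One-hot vector for a single melody note (monophonic)."""
    v = np.zeros(N_PITCHES, dtype=np.float32)
    v[pitch] = 1.0
    return v

def many_hot(pitches):
    """Many-hot vector for a chord (several simultaneous pitches)."""
    v = np.zeros(N_PITCHES, dtype=np.float32)
    v[pitches] = 1.0
    return v

def piano_roll(events, n_steps):
    """Binary piano roll from (pitch, onset_step, duration_steps) events."""
    roll = np.zeros((n_steps, N_PITCHES), dtype=np.float32)
    for pitch, onset, dur in events:
        roll[onset:onset + dur, pitch] = 1.0
    return roll

# e.g. a C major chord and a short two-note melody fragment
chord = many_hot([60, 64, 67])
roll = piano_roll([(60, 0, 4), (62, 4, 4)], n_steps=8)
```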

    ํŠน์„ฑ ์กฐ์ ˆ์ด ๊ฐ€๋Šฅํ•œ ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง ๊ธฐ๋ฐ˜์˜ ๊ตฌ์กฐ์  ๋ฉœ๋กœ๋”” ์ƒ์„ฑ

    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์‚ฐ์—…๊ณตํ•™๊ณผ, 2021.8. ๋ฐ•์ข…ํ—Œ.This thesis aims to generate structural melodies using attribute controllable deep neural networks. The development of music-composing artificial intelligence can inspire professional composers and reduce the difficulty of creating and provide the public with the combination and utilization of music and various media content. For a melody generation model to function as a composer, it must control specific desired characteristics. The characteristics include quantifiable attributes, such as pitch level and rhythm density, and chords, which are essential elements that comprise modern popular (pop) music along with melodies. First, this thesis introduces a melody generation model that separately produces rhythm and pitch conditioned on chord progressions. The quantitative evaluation results demonstrate that the melodies produced by the proposed model have a distribution more similar to the dataset than other baseline models. Qualitative analysis reveals the presence of repetition and variation within the generated melodies. Using a subjective human listening test, we conclude that the model successfully produced new melodies that sound pleasant in rhythm and pitch. Four quantifiable attributes are considered: pitch level, pitch variety, rhythm density, and rhythm variety. We improve the previous study of training a variational autoencoder (VAE) and a discriminator in an adversarial manner to eliminate attribute information from the encoded latent variable. Rhythm and pitch VAEs are separately trained to control pitch-and rhythm-related attributes entirely independently. The experimental results indicate that though the ratio of the outputs belonging to the intended bin is not high, the model learned the relative order between the bins. Finally, a hierarchical song structure generation model is proposed. A sequence-to-sequence framework is adopted to capture the similar mood between two parts of the same song. The time axis is compressed by applying attention with different lengths of query and key to model the hierarchy of music. The concept of musical contrast is implemented by controlling attributes with relative bin information. The human evaluation results suggest the possibility of solving the problem of generating different structures of the same song with the sequence-to-sequence framework and reveal that the proposed model can create song structures with musical contrasts.๋ณธ ๋…ผ๋ฌธ์€ ํŠน์„ฑ ์กฐ์ ˆ์ด ๊ฐ€๋Šฅํ•œ ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง์„ ํ™œ์šฉํ•˜์—ฌ ๊ตฌ์กฐ์  ๋ฉœ๋กœ๋””๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๊ธฐ๋ฒ•์„ ์—ฐ๊ตฌํ•œ๋‹ค. ์ž‘๊ณก์„ ๋•๋Š” ์ธ๊ณต์ง€๋Šฅ์˜ ๊ฐœ๋ฐœ์€ ์ „๋ฌธ ์ž‘๊ณก๊ฐ€์—๊ฒŒ๋Š” ์ž‘๊ณก์˜ ์˜๊ฐ์„ ์ฃผ์–ด ์ฐฝ์ž‘์˜ ๊ณ ํ†ต์„ ๋œ ์ˆ˜ ์žˆ๊ณ , ์ผ๋ฐ˜ ๋Œ€์ค‘์—๊ฒŒ๋Š” ๊ฐ์ข… ๋ฏธ๋””์–ด ์ฝ˜ํ…์ธ ์˜ ์ข…๋ฅ˜์™€ ์–‘์ด ์ฆ๊ฐ€ํ•˜๋Š” ์ถ”์„ธ์—์„œ ํ•„์š”๋กœ ํ•˜๋Š” ์Œ์•…์„ ์ œ๊ณตํ•ด์คŒ์œผ๋กœ ์ธํ•ด ๋‹ค๋ฅธ ๋ฏธ๋””์–ด ๋งค์ฒด์™€์˜ ๊ฒฐํ•ฉ ๋ฐ ํ™œ์šฉ์„ ์ฆ๋Œ€ํ•  ์ˆ˜ ์žˆ๋‹ค. ์ž‘๊ณก ์ธ๊ณต์ง€๋Šฅ์˜ ์ˆ˜์ค€์ด ์ธ๊ฐ„ ์ž‘๊ณก๊ฐ€์˜ ์ˆ˜์ค€์— ๋‹ค๋‹ค๋ฅด๊ธฐ ์œ„ํ•ด์„œ๋Š” ์˜๋„์— ๋”ฐ๋ฅธ ํŠน์„ฑ ์กฐ์ ˆ ์ž‘๊ณก์ด ๊ฐ€๋Šฅํ•ด์•ผ ํ•œ๋‹ค. ์—ฌ๊ธฐ์„œ ๋งํ•˜๋Š” ํŠน์„ฑ์ด๋ž€ ์Œ์˜ ๋†’์ด๋‚˜ ๋ฆฌ๋“ฌ์˜ ๋ฐ€๋„์™€ ๊ฐ™์ด ์ˆ˜์น˜ํ™” ๊ฐ€๋Šฅํ•œ ํŠน์„ฑ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ, ๋ฉœ๋กœ๋””์™€ ํ•จ๊ฒŒ ์Œ์•…์˜ ๊ธฐ๋ณธ ๊ตฌ์„ฑ ์š”์†Œ๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋Š” ์ฝ”๋“œ ๋˜ํ•œ ํฌํ•จํ•œ๋‹ค. 
๊ธฐ์กด์—๋„ ํŠน์„ฑ ์กฐ์ ˆ์ด ๊ฐ€๋Šฅํ•œ ์Œ์•… ์ƒ์„ฑ ๋ชจ๋ธ์ด ์ œ์•ˆ๋˜์—ˆ์œผ๋‚˜ ์ž‘๊ณก๊ฐ€๊ฐ€ ๊ณก ์ „์ฒด์˜ ๊ตฌ์„ฑ์„ ์—ผ๋‘์— ๋‘๊ณ  ๊ฐ ๋ถ€๋ถ„์„ ์ž‘๊ณกํ•˜๋“ฏ ๊ธด ๋ฒ”์œ„์˜ ๊ตฌ์กฐ์  ํŠน์ง• ๋ฐ ์Œ์•…์  ๋Œ€์กฐ๊ฐ€ ๊ณ ๋ ค๋œ ํŠน์„ฑ ์กฐ์ ˆ์— ๊ด€ํ•œ ์—ฐ๊ตฌ๋Š” ๋งŽ์ง€ ์•Š๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋จผ์ € ์ฝ”๋“œ ์กฐ๊ฑด๋ถ€ ๋ฉœ๋กœ๋”” ์ƒ์„ฑ์— ์žˆ์–ด ๋ฆฌ๋“ฌ๊ณผ ์Œ๋†’์ด๋ฅผ ๊ฐ๊ฐ ๋”ฐ๋กœ ์ƒ์„ฑํ•˜๋Š” ๋ชจ๋ธ๊ณผ ๊ทธ ํ•™์Šต ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ •๋Ÿ‰์  ํ‰๊ฐ€์˜ ๊ฒฐ๊ณผ๋Š” ์ œ์•ˆํ•œ ๊ธฐ๋ฒ•์ด ๋‹ค๋ฅธ ๋น„๊ต ๋ชจ๋ธ๋“ค์— ๋น„ํ•ด ๊ทธ ์ƒ์„ฑ ๊ฒฐ๊ณผ๊ฐ€ ๋ฐ์ดํ„ฐ์…‹๊ณผ ๋” ์œ ์‚ฌํ•œ ๋ถ„ํฌ๋ฅผ ๋‚˜ํƒ€๋‚ด๊ณ  ์žˆ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค. ์ •์„ฑ์  ํ‰๊ฐ€ ๊ฒฐ๊ณผ ์ƒ์„ฑ๋œ ์Œ์•…์—์„œ ์ ๋‹นํ•œ ๋ฐ˜๋ณต๊ณผ ๋ณ€ํ˜•์ด ํ™•์ธ๋˜๋ฉฐ, ์‚ฌ๋žŒ์ด ๋“ฃ๊ธฐ์— ์Œ์ •๊ณผ ๋ฐ•์ž ๋ชจ๋‘ ๋“ฃ๊ธฐ ์ข‹์€ ์ƒˆ๋กœ์šด ๋ฉœ๋กœ๋””๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒฐ๋ก ์„ ๋„์ถœํ•œ๋‹ค. ์ˆ˜์น˜ํ™” ๊ฐ€๋Šฅํ•œ ํŠน์„ฑ์œผ๋กœ๋Š” ์Œ์˜ ๋†’์ด, ์Œ๋†’์ด ๋ณ€ํ™”, ๋ฆฌ๋“ฌ์˜ ๋ฐ€๋„, ๋ฆฌ๋“ฌ์˜ ๋ณต์žก๋„ ๋„ค ๊ฐ€์ง€ ํŠน์„ฑ์„ ์ •์˜ํ•œ๋‹ค. ํŠน์„ฑ ์กฐ์ ˆ์ด ๊ฐ€๋Šฅํ•œ ๋ณ€์ดํ˜• ์˜คํ† ์ธ์ฝ”๋”๋ฅผ ํ•™์Šตํ•˜๊ธฐ ์ž ์žฌ ๋ณ€์ˆ˜๋กœ๋ถ€ํ„ฐ ํŠน์„ฑ ์ •๋ณด๋ฅผ ์ œ์™ธํ•˜๋Š” ํŒ๋ณ„๊ธฐ๋ฅผ ์ ๋Œ€์ ์œผ๋กœ ํ•™์Šตํ•˜๋Š” ๊ธฐ์กด ์—ฐ๊ตฌ๋ฅผ ๋ฐœ์ „์‹œ์ผœ, ์Œ๋†’์ด์™€ ๋ฆฌ๋“ฌ ๊ด€๋ จ ํŠน์„ฑ์„ ์™„์ „ํžˆ ๋…๋ฆฝ์ ์œผ๋กœ ์กฐ์ ˆํ•  ์ˆ˜ ์žˆ๋„๋ก ๋‘ ๊ฐœ์˜ ๋ชจ๋ธ์„ ๋ถ„๋ฆฌํ•˜์—ฌ ํ•™์Šตํ•œ๋‹ค. ๊ฐ ๊ตฌ๊ฐ„๋งˆ๋‹ค ๋™์ผํ•œ ์–‘์˜ ๋ฐ์ดํ„ฐ๋ฅผ ํฌํ•จํ•˜๋„๋ก ํŠน์„ฑ ๊ฐ’์— ๋”ฐ๋ผ ๊ตฌ๊ฐ„์„ ๋‚˜๋ˆˆ ํ›„ ํ•™์Šตํ•œ ๊ฒฐ๊ณผ, ์ƒ์„ฑ ๊ฒฐ๊ณผ๊ฐ€ ์˜๋„ํ•œ ๊ตฌ๊ฐ„์— ์ •ํ™•ํžˆ ํฌํ•จ๋˜๋Š” ๋น„์œจ์€ ๋†’์ง€ ์•Š์ง€๋งŒ ์ƒ๊ด€๊ณ„์ˆ˜๋Š” ๋†’๊ฒŒ ๋‚˜ํƒ€๋‚œ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ ์•ž์˜ ๋‘ ์—ฐ๊ตฌ์˜ ๊ฒฐ๊ณผ๋ฅผ ํ™œ์šฉํ•˜์—ฌ, ์Œ์•…์ ์œผ๋กœ ๋น„์Šทํ•˜๋ฉด์„œ๋„ ์„œ๋กœ ๋Œ€์กฐ๋ฅผ ์ด๋ฃจ๋Š” ๊ณก ๊ตฌ์กฐ ์ƒ์„ฑ ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์‹œํ€€์Šค-ํˆฌ-์‹œํ€€์Šค ๋ฌธ์ œ ์ƒํ™ฉ์—์„œ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๋Š” ํŠธ๋žœ์Šคํฌ๋จธ ๋ชจ๋ธ์„ ๋ฒ ์ด์Šค๋ผ์ธ์œผ๋กœ ์‚ผ์•„ ์–ดํ…์…˜ ๋งค์ปค๋‹ˆ์ฆ˜์„ ์ ์šฉํ•œ๋‹ค. ์Œ์•…์˜ ๊ณ„์ธต์  ๊ตฌ์กฐ๋ฅผ ๋ฐ˜์˜ํ•˜๊ธฐ ์œ„ํ•ด ๊ณ„์ธต์  ์–ดํ…์…˜์„ ์ ์šฉํ•˜๋ฉฐ, ์ด ๋•Œ ์ƒ๋Œ€์  ์œ„์น˜ ์ž„๋ฒ ๋”ฉ์„ ํšจ์œจ์ ์œผ๋กœ ๊ณ„์‚ฐํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•œ๋‹ค. ์Œ์•…์  ๋Œ€์กฐ๋ฅผ ๊ตฌํ˜„ํ•˜๊ธฐ ์œ„ํ•ด ์•ž์„œ ์ •์˜ํ•œ ๋„ค ๊ฐ€์ง€ ํŠน์„ฑ ์ •๋ณด๋ฅผ ์กฐ์ ˆํ•˜๋„๋ก ์ ๋Œ€์  ํ•™์Šต์„ ์ง„ํ–‰ํ•˜๊ณ , ์ด ๋•Œ ํŠน์„ฑ ์ •๋ณด๋Š” ์ •ํ™•ํ•œ ๊ตฌ๊ฐ„ ์ •๋ณด๊ฐ€ ์•„๋‹Œ ์ƒ๋Œ€์  ๊ตฌ๊ฐ„ ๋น„๊ต ์ •๋ณด๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. 
์ฒญ์ทจ ์‹คํ—˜ ๊ฒฐ๊ณผ ๊ฐ™์€ ๊ณก์˜ ๋‹ค๋ฅธ ๊ตฌ์กฐ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฌธ์ œ๋ฅผ ์‹œํ€€์Šค-ํˆฌ-์‹œํ€€์Šค ๋ฐฉ๋ฒ•์œผ๋กœ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ€๋Šฅ์„ฑ์„ ์ œ์‹œํ•˜๊ณ , ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ์Œ์•…์  ๋Œ€์กฐ๊ฐ€ ๋‚˜ํƒ€๋‚˜๋Š” ๊ณก ๊ตฌ์กฐ ์ƒ์„ฑ์ด ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ์ ์„ ๋ณด์—ฌ์ค€๋‹ค.Chapter 1 Introduction 1 1.1 Background and Motivation 1 1.2 Objectives 4 1.3 Thesis Outline 6 Chapter 2 Literature Review 7 2.1 Chord-conditioned Melody Generation 7 2.2 Attention Mechanism and Transformer 10 2.2.1 Attention Mechanism 10 2.2.2 Transformer 10 2.2.3 Relative Positional Embedding 12 2.2.4 Funnel-Transformer 14 2.3 Attribute Controllable Music Generation 16 Chapter 3 Problem Definition 17 3.1 Data Representation 17 3.1.1 Datasets 18 3.1.2 Preprocessing 19 3.2 Notation and Formulas 21 3.2.1 Chord-conditioned Melody Generation 21 3.2.2 Attribute Controllable Melody Generation 22 3.2.3 Song Structure Generation 22 3.2.4 Notation 22 Chapter 4 Chord-conditioned Melody Generation 24 4.1 Methodology 24 4.1.1 Model Architecture 24 4.1.2 Relative Positional Embedding 27 4.2 Training and Generation 29 4.2.1 Two-phase Training 30 4.2.2 Pitch-varied Rhythm Data 30 4.2.3 Generating Melodies 31 4.3 Experiments 32 4.3.1 Experiment Settings 32 4.3.2 Baseline Models 33 4.4 Evaluation Results 34 4.4.1 Quantitative Evaluation 34 4.4.2 Qualitative Evaluation 42 Chapter 5 Attribute Controllable Melody Generation 48 5.1 Attribute Definition 48 5.1.1 Pitch-Related Attributes 48 5.1.2 Rhythm-Related Attributes 49 5.2 Model Architecture 51 5.3 Experiments 54 5.3.1 Data Preprocessing 54 5.3.2 Training 56 5.4 Results 58 5.4.1 Quantitative Results 58 5.4.2 Output Examples 60 Chapter 6 Hierarchical Song Structure Generation 68 6.1 Baseline 69 6.2 Proposed Model 70 6.2.1 Relative Hierarchical Attention 70 6.2.2 Model Architecture 78 6.3 Experiments 84 6.3.1 Training and Generation 84 6.3.2 Human Evaluation 85 6.4 Evaluation Results 86 6.4.1 Control Success Ratio 86 6.4.2 Human Perception Ratio 86 6.4.3 Generated Samples 88 Chapter 7 Conclusion 104 7.1 Summary and Contributions 104 7.2 Limitations and Future Research 107 Appendices 108 Chapter A MGEval Results Between the Music of Different Genres 109 Chapter B MGEval Results of CMT and Baseline Models 116 Chapter C Samples Generated by CMT 126 Bibliography 129 ๊ตญ๋ฌธ์ดˆ๋ก 144๋ฐ•

    Toward Interactive Music Generation: A Position Paper

    Music generation using deep learning has received considerable attention in recent years. Researchers have developed various generative models capable of imitating musical conventions, comprehending musical corpora, and generating new samples based on what was learned. Although the samples generated by these models are persuasive, they often lack musical structure and creativity. For instance, a vanilla end-to-end approach, which deals with all levels of music representation at once, does not offer human-level control and interaction during the learning process, leading to constrained results. Indeed, music creation is an iterative process in which a musician follows certain principles and reuses or adapts various musical features. Moreover, a musical piece adheres to a musical style, which breaks down into the distinct concepts of timbre style, performance style, and composition style, together with the coherence between these aspects. Here, we study and analyze current advances in music generation using deep learning models according to different criteria. We discuss the shortcomings and limitations of these models regarding interactivity and adaptability. Finally, we outline potential future research directions involving multi-agent systems and reinforcement learning algorithms to alleviate these shortcomings and limitations.

    NBP 2.0: Updated Next Bar Predictor, an Improved Algorithmic Music Generator

    Deep neural network advancements have enabled machines to produce melodies emulating human-composed music. However, implementing such machines is costly in terms of resources. In this paper, we present NBP 2.0, a refinement of the previous next bar predictor (NBP) model with two notable improvements: first, transforming each training instance to anchor all of its notes to its musical scale, and second, changing the model architecture itself. NBP 2.0 maintains its straightforward and lightweight implementation, which is an advantage over the baseline models. The improvements were assessed using quantitative and qualitative metrics, and the results show that the gains from these changes are notable.
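    One way to read the scale-anchoring step (a guess at the preprocessing, not the NBP 2.0 code) is as transposing every training instance into a common reference key before training, as in the small sketch below.

```python
# Hypothetical scale-anchoring preprocessing: shift every training instance
# so its tonic lands on C, so the model sees all melodies in one reference key.
KEY_TO_SEMITONES = {"C": 0, "Db": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
                    "Gb": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11}

def anchor_to_c(midi_pitches, key):
    """Shift all MIDI pitches down by the key's offset from C."""
    shift = KEY_TO_SEMITONES[key]
    return [p - shift for p in midi_pitches]

# e.g. a melody fragment in G major keeps its contour but is re-rooted on C
print(anchor_to_c([67, 69, 71, 74], key="G"))  # [60, 62, 64, 67]
```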

    Study of optical interconnection networks based on multimode polymer waveguides
