
    A Study on Improving Conditional Generation of Musical Components: Focusing on Harmony and Expression

    Get PDF
    Thesis (Ph.D.) -- Seoul National University, Graduate School of Convergence Science and Technology, Department of Convergence Science (Digital Information Convergence), February 2023. Advisor: Kyogu Lee.

    Conditional generation of musical components (CGMC) creates part of a piece of music from partial musical components such as a melody or chords. CGMC is useful for discovering complex relationships among musical attributes, and it can assist non-experts who have difficulty making music. However, recent CGMC studies still face two challenges concerning generation quality and model controllability. First, the structure of the generated music is not robust. Second, only a limited range of musical factors and tasks has been examined as targets for flexible control of generation. In this thesis, we aim to mitigate these two challenges to improve CGMC systems. For musical structure, we focus on intuitive modeling of musical hierarchy to help the model explicitly learn musically meaningful dependencies. To this end, we utilize alignment paths between the raw music data and musical units such as notes or chords. For musical creativity, we facilitate smooth control of novel musical attributes using latent representations. We attempt to achieve disentangled representations of the intended factors by regularizing them with a data-driven inductive bias. This thesis validates the proposed approaches on two representative CGMC tasks: melody harmonization and expressive performance rendering. A variety of experimental results show the potential of the proposed approaches to expand musical creativity while maintaining stable generation quality.

    Table of contents:
    Chapter 1 Introduction
        1.1 Motivation
        1.2 Definitions
        1.3 Tasks of Interest
            1.3.1 Generation Quality
            1.3.2 Controllability
        1.4 Approaches
            1.4.1 Modeling Musical Hierarchy
            1.4.2 Regularizing Latent Representations
            1.4.3 Target Tasks
        1.5 Outline of the Thesis
    Chapter 2 Background
        2.1 Music Generation Tasks
            2.1.1 Melody Harmonization
            2.1.2 Expressive Performance Rendering
        2.2 Structure-enhanced Music Generation
            2.2.1 Hierarchical Music Generation
            2.2.2 Transformer-based Music Generation
        2.3 Disentanglement Learning
            2.3.1 Unsupervised Approaches
            2.3.2 Supervised Approaches
            2.3.3 Self-supervised Approaches
        2.4 Controllable Music Generation
            2.4.1 Score Generation
            2.4.2 Performance Rendering
        2.5 Summary
    Chapter 3 Translating Melody to Chord: Structured and Flexible Harmonization of Melody with Transformer
        3.1 Introduction
        3.2 Proposed Methods
            3.2.1 Standard Transformer Model (STHarm)
            3.2.2 Variational Transformer Model (VTHarm)
            3.2.3 Regularized Variational Transformer Model (rVTHarm)
            3.2.4 Training Objectives
        3.3 Experimental Settings
            3.3.1 Datasets
            3.3.2 Comparative Methods
            3.3.3 Training
            3.3.4 Metrics
        3.4 Evaluation
            3.4.1 Chord Coherence and Diversity
            3.4.2 Harmonic Similarity to Human
            3.4.3 Controlling Chord Complexity
            3.4.4 Subjective Evaluation
            3.4.5 Qualitative Results
            3.4.6 Ablation Study
        3.5 Conclusion and Future Work
    Chapter 4 Sketching the Expression: Flexible Rendering of Expressive Piano Performance with Self-supervised Learning
        4.1 Introduction
        4.2 Proposed Methods
            4.2.1 Data Representation
            4.2.2 Modeling Musical Hierarchy
            4.2.3 Overall Network Architecture
            4.2.4 Regularizing the Latent Variables
            4.2.5 Overall Objective
        4.3 Experimental Settings
            4.3.1 Dataset and Implementation
            4.3.2 Comparative Methods
        4.4 Evaluation
            4.4.1 Generation Quality
            4.4.2 Disentangling Latent Representations
            4.4.3 Controllability of Expressive Attributes
            4.4.4 KL Divergence
            4.4.5 Ablation Study
            4.4.6 Subjective Evaluation
            4.4.7 Qualitative Examples
            4.4.8 Extent of Control
        4.5 Conclusion
    Chapter 5 Conclusion and Future Work
        5.1 Conclusion
        5.2 Future Work
            5.2.1 Deeper Investigation of Controllable Factors
            5.2.2 More Analysis of Qualitative Evaluation Results
            5.2.3 Improving Diversity and Scale of Dataset
    Bibliography
    Abstract (in Korean)
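
    For a concrete sense of what "regularizing latent representations with a data-driven inductive bias" can look like in practice, here is a minimal sketch in the style of attribute-regularized VAEs, where one latent dimension is encouraged to vary monotonically with an attribute such as chord complexity. The function name, loss form, and weights are our illustrative assumptions, not the thesis's actual objective (e.g. in rVTHarm).

```python
import torch

def attribute_regularizer(z: torch.Tensor, attr: torch.Tensor,
                          dim: int = 0, gamma: float = 10.0) -> torch.Tensor:
    """Encourage latent dimension `dim` to increase with `attr`.

    z: (B, D) latent codes; attr: (B,) attribute values (e.g. chord
    complexity). Pairwise differences of z[:, dim] are pushed to agree
    in sign with pairwise differences of attr.
    """
    dz = z[:, dim].unsqueeze(0) - z[:, dim].unsqueeze(1)  # (B, B) latent diffs
    da = attr.unsqueeze(0) - attr.unsqueeze(1)            # (B, B) attribute diffs
    return torch.nn.functional.l1_loss(torch.tanh(gamma * dz), torch.sign(da))

# Hypothetical training step: combine with the usual VAE terms, e.g.
#   loss = recon + beta * kl + lam * attribute_regularizer(z, complexity)
```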

    A Functional Taxonomy of Music Generation Systems

    Get PDF
    Digital advances have transformed the face of automatic music generation since its beginnings at the dawn of computing. Despite the many breakthroughs, issues such as the musical tasks targeted by different machines and the degree to which they succeed remain open questions. We present a functional taxonomy for music generation systems with reference to existing systems. The taxonomy organizes systems according to the purposes for which they were designed. It also reveals the inter-relatedness amongst the systems. This design-centered approach contrasts with predominant methods-based surveys and facilitates the identification of grand challenges to set the stage for new breakthroughs.
    Comment: survey, music generation, taxonomy, functional survey, automatic composition, algorithmic composition

    Probabilistic models for music

    Get PDF
    This thesis proposes to analyse symbolic musical data from a statistical viewpoint, using state-of-the-art machine learning techniques. Our main argument is that it is possible to design generative models that can predict and generate music given arbitrary contexts, in a genre similar to a training corpus, using a minimal amount of data. For instance, a carefully designed generative model could guess what would be a good accompaniment for a given melody. Conversely, we propose generative models in this thesis that can be sampled to generate realistic melodies given a harmonic context. Most computer music research has so far been devoted to the direct modeling of audio data. However, most music models today do not consider musical structure at all. We argue that reliable symbolic music models such as the ones presented in this thesis could dramatically improve the performance of audio algorithms applied in more general contexts. Our main contributions in this thesis are three-fold: we show empirically that long-term dependencies are present in music data, and we provide quantitative measures of such dependencies; we show empirically that using domain knowledge captures long-term dependencies in music signals better than standard statistical models for temporal data, and we describe many probabilistic models aimed at capturing various aspects of symbolic polyphonic music, which can be used for music prediction and can moreover be sampled to generate realistic music sequences; and we design various representations for music that can be used as observations by the proposed probabilistic models.
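
    As a toy illustration of the kind of conditional generative model the abstract describes (melodies given harmonic context), the sketch below, which is ours and far simpler than the thesis's models, learns next-note distributions conditioned on the current chord and previous note by counting, and can then be sampled:

```python
import random
from collections import Counter, defaultdict

class ChordConditionedMelody:
    """Count-based model of P(next note | current chord, previous note)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, pieces):
        # pieces: iterable of [(chord_label, midi_note), ...] sequences
        for piece in pieces:
            for (_, prev), (chord, nxt) in zip(piece, piece[1:]):
                self.counts[(chord, prev)][nxt] += 1

    def sample(self, chords, start_note):
        notes, prev = [start_note], start_note
        for chord in chords:
            dist = self.counts.get((chord, prev))
            if not dist:                 # unseen context: hold the note
                notes.append(prev)
                continue
            prev = random.choices(list(dist), weights=dist.values())[0]
            notes.append(prev)
        return notes
```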

    iJazzARTIST: Intelligent Jazz Accompanist for Real-Time human-computer Interactive muSic improvisaTion

    Get PDF
    Some of the most essential characteristics of improvisation on jazz standards are reflected in the accompaniment. Given a lead sheet as common ground, the collaborative process of real-time music improvisation between a human and an artificial agent is a scenario of great interest in the MIR domain. So far, approaches to jazz improvisation accompaniment have presented systems that cannot generate accompaniment while also adapting to dynamically variable constraints that depend on new, improvised data. This thesis proposes a jazz accompaniment system capable of providing proper chord voicings for the solo while complying with both the soloist's intentions and the constraints set in advance by the lead sheet. The artificial agent consists of two sub-systems: a model responsible for predicting the human soloist's intentions, and a second system that performs the accompaniment task by exploiting the first model's predictions of those intentions. Both models are built on Recurrent Neural Networks (RNNs). The dataset used for training underwent multi-stage processing, including probabilistic refinement, to preserve and enrich the information required for the task. The system was tested on two jazz standards, demonstrating compliance with the harmonic constraints as well as output variability that depends on the solo improvisation. Emerging limitations and potential future directions are discussed in the conclusion of this work.
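
    To make the two-module design concrete, here is a hedged PyTorch sketch of how such an agent could be wired: one GRU anticipates the soloist's next note tokens, and a second GRU emits voicing tokens conditioned on that anticipation plus the lead-sheet chord. All module names, vocabulary sizes, and shapes are our illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SoloistPredictor(nn.Module):
    """GRU that anticipates the human soloist's next note tokens."""
    def __init__(self, solo_vocab=128, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(solo_vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, solo_vocab)

    def forward(self, solo_tokens):          # (B, T) int64
        h, _ = self.rnn(self.emb(solo_tokens))
        return self.out(h)                   # (B, T, solo_vocab) logits

class Accompanist(nn.Module):
    """GRU that emits chord-voicing tokens, conditioned on the predicted
    solo continuation and the lead-sheet chord constraint."""
    def __init__(self, solo_vocab=128, chord_vocab=60,
                 voicing_vocab=500, hidden=256):
        super().__init__()
        self.solo_proj = nn.Linear(solo_vocab, hidden)
        self.chord_emb = nn.Embedding(chord_vocab, hidden)
        self.rnn = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, voicing_vocab)

    def forward(self, solo_logits, chords):  # (B,T,solo_vocab), (B,T)
        x = torch.cat([self.solo_proj(solo_logits.softmax(-1)),
                       self.chord_emb(chords)], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                   # (B, T, voicing_vocab)

# usage: voicing_logits = Accompanist()(SoloistPredictor()(solo), chords)
```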

    Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls

    Full text link
    We propose Polyffusion, a diffusion model that generates polyphonic music scores by regarding music as image-like piano roll representations. The model is capable of controllable music generation with two paradigms: internal control and external control. Internal control refers to the process in which users pre-define a part of the music and then let the model infill the rest, similar to the task of masked music generation (or music inpainting). External control conditions the model on external yet related information, such as chord, texture, or other features, via the cross-attention mechanism. We show that by using internal and external controls, Polyffusion unifies a wide range of music creation tasks, including melody generation given accompaniment, accompaniment generation given melody, arbitrary music segment inpainting, and music arrangement given chords or textures. Experimental results show that our model significantly outperforms existing Transformer and sampling-based baselines, and that using pre-trained disentangled representations as external conditions yields more effective controls.
    Comment: In Proceedings of the 24th Conference of the International Society for Music Information Retrieval (ISMIR 2023), Milan, Italy
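
    The "internal control" paradigm is described as masked infilling (music inpainting). A common way to implement that with a trained diffusion model, sketched below under our own assumptions about the sampler (Polyffusion's exact procedure may differ), is to re-impose the user-fixed piano-roll region at the matching noise level on every reverse step:

```python
import torch

def inpaint(model, x_known, mask, alphas_cumprod, steps):
    """Fill the unmasked region of a piano roll with a diffusion model.

    x_known: (B, 1, H, W) piano roll with the user-given part filled in.
    mask:    same shape, 1.0 where the user fixed the content.
    model(x_t, t) is assumed to return an estimate of the clean roll x0.
    """
    x = torch.randn_like(x_known)
    for t in reversed(range(steps)):
        a = alphas_cumprod[t]
        # Noise the known region to step t's level, then splice it in so
        # the model only has to complete the masked-out remainder.
        known_t = a.sqrt() * x_known + (1 - a).sqrt() * torch.randn_like(x_known)
        x = mask * known_t + (1 - mask) * x
        x0_hat = model(x, t)
        # Deterministic DDIM-style update toward the current x0 estimate.
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = (x - a.sqrt() * x0_hat) / (1 - a).sqrt()
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps
    return mask * x_known + (1 - mask) * x
```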

    Deep Learning Techniques for Music Generation -- A Survey

    Full text link
    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology for our analysis based on five dimensions:
    - Objective: What musical content is to be generated (e.g., melody, polyphony, accompaniment, or counterpoint)? For what destination and what use: to be performed by humans (a musical score) or by a machine (an audio file)?
    - Representation: What concepts are manipulated (e.g., waveform, spectrogram, note, chord, meter, beat)? What format is used (e.g., MIDI, piano roll, or text)? How is the representation encoded (e.g., scalar, one-hot, or many-hot)?
    - Architecture: What type(s) of deep neural network are used (e.g., feedforward network, recurrent network, autoencoder, or generative adversarial network)?
    - Challenge: What are the limitations and open challenges (e.g., variability, interactivity, creativity)?
    - Strategy: How do we model and control the generation process (e.g., single-step feedforward, iterative feedforward, sampling, or input manipulation)?
    For each dimension, we conduct a comparative analysis of various models and techniques, and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based music generation systems selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge, and strategy. The last section includes some discussion and prospects. The encodings named in the representation dimension are illustrated in the sketch after this entry.
    Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 2019
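
    As a toy illustration (our example, not the survey's) of the one-hot and many-hot encodings mentioned under the representation dimension:

```python
import numpy as np

N_PITCHES = 128  # MIDI pitch range

def one_hot(pitch):
    """Monophonic step: exactly one active pitch."""
    v = np.zeros(N_PITCHES)
    v[pitch] = 1.0
    return v

def many_hot(pitches):
    """Polyphonic step: several simultaneously active pitches."""
    v = np.zeros(N_PITCHES)
    v[pitches] = 1.0
    return v

melody_step = one_hot(60)                 # middle C
chord_step = many_hot([60, 64, 67])       # C major triad
piano_roll = np.stack([chord_step] * 4)   # (time, pitch): chord held 4 steps
```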

    Predictive Models for Music

    Get PDF
    Modeling long-term dependencies in time series has proved very difficult to achieve with traditional machine learning methods. This problem arises when considering music data. In this paper, we introduce generative models for melodies. We decompose melodic modeling into two subtasks. We first propose a rhythm model based on the distributions of distances between subsequences. Then, we define a generative model for melodies given chords and rhythms, based on modeling sequences of Narmour features. The rhythm model consistently outperforms a standard Hidden Markov Model in terms of conditional prediction accuracy on two different music databases. Using a similar evaluation procedure, the proposed melodic model consistently outperforms an Input/Output Hidden Markov Model. Furthermore, sampling these models given appropriate musical contexts generates realistic melodies.
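
    The rhythm model is described only at a high level, but its underlying statistic, distances between subsequences, is easy to illustrate. The sketch below (our reading, not the paper's code) summarizes a binary onset pattern by the empirical distribution of Hamming distances between its fixed-length subsequences; strongly periodic rhythms concentrate this distribution near zero:

```python
from collections import Counter
from itertools import combinations

def subsequences(onsets, length):
    """All contiguous windows of the given length."""
    return [tuple(onsets[i:i + length]) for i in range(len(onsets) - length + 1)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def distance_distribution(onsets, length=8):
    """Empirical distribution of pairwise subsequence distances."""
    subs = subsequences(onsets, length)
    dists = Counter(hamming(a, b) for a, b in combinations(subs, 2))
    total = sum(dists.values())
    return {d: c / total for d, c in sorted(dists.items())}

# A strongly periodic pattern concentrates probability mass near 0:
pattern = [1, 0, 0, 1, 0, 0, 1, 0] * 4
print(distance_distribution(pattern))
```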