103 research outputs found

    MULTI-STEP CHORD SEQUENCE PREDICTION BASED ON AGGREGATED MULTI-SCALE ENCODER-DECODER NETWORKS

    This paper studies the prediction of chord progressions for jazz music using machine learning models. The motivation for our study comes from the recent success of neural networks in automatic music composition. Although high accuracies are obtained in single-step prediction scenarios, most models fail to generate accurate multi-step chord predictions. In this paper, we postulate that this comes from the multi-scale structure of musical information and propose new architectures based on an iterative temporal aggregation of input labels. Specifically, the input and ground truth labels are merged into increasingly large temporal bags, and we train one encoder-decoder network per temporal scale on these bags. In a second step, we use the pre-trained encoder bottleneck features at each scale to train a final encoder-decoder network. Furthermore, we rely on different reductions of the initial chord alphabet into three adapted chord alphabets. We perform evaluations against several state-of-the-art models and show that our multi-scale architecture outperforms existing methods in terms of accuracy and perplexity, while requiring relatively few parameters. We analyze musical properties of the results, showing the influence of downbeat position within the analysis window on accuracy, and evaluate errors using a musically-informed distance metric.
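
The temporal aggregation described in this abstract can be illustrated with a minimal sketch. It assumes that a "bag" is the multi-hot union of the chord labels found in a fixed number of consecutive time steps and that the scales double (1, 2, 4, 8). Both choices, and the function names `aggregate_into_bags` and `multi_scale_bags`, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def aggregate_into_bags(one_hot_seq, bag_size):
    """Merge consecutive one-hot chord frames into multi-hot bags.

    one_hot_seq has shape (T, K): one row per time step, K chord classes.
    The result has shape (T // bag_size, K). Trailing frames that do not
    fill a whole bag are dropped (an assumption of this sketch).
    """
    seq = np.asarray(one_hot_seq)
    n_bags = seq.shape[0] // bag_size
    trimmed = seq[: n_bags * bag_size]
    grouped = trimmed.reshape(n_bags, bag_size, seq.shape[1])
    # A class is "on" in a bag if it occurs at any step inside that bag.
    return grouped.max(axis=1)


def multi_scale_bags(one_hot_seq, scales=(1, 2, 4, 8)):
    """Return one aggregated sequence per temporal scale (hypothetical scales)."""
    return {s: aggregate_into_bags(one_hot_seq, s) for s in scales}


# Toy sequence of 8 time steps over a 4-symbol chord alphabet.
labels = np.array([0, 0, 1, 2, 2, 2, 3, 0])
one_hot = np.eye(4, dtype=int)[labels]
bags = multi_scale_bags(one_hot)
```

In this toy setup each entry of `bags` could serve as the input and target of a separate encoder-decoder at that scale. How the paper actually builds and normalizes its bags may differ.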

    A Study on Improving Conditional Generation of Musical Components: Focusing on Chord and Expression

    Doctoral thesis -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Digital Contents and Information Studies major), 2023. 2. Kyogu Lee.
    Conditional generation of musical components (CGMC) creates a part of music based on partial musical components such as melody or chord. CGMC is beneficial for discovering complex relationships among musical attributes. It can also assist non-experts who face difficulties in making music. However, recent studies on CGMC still face two challenges in terms of generation quality and model controllability. First, the structure of the generated music is not robust. Second, only limited ranges of musical factors and tasks have been examined as targets for flexible control of generation. In this thesis, we aim to mitigate these two challenges to improve CGMC systems. For musical structure, we focus on intuitive modeling of musical hierarchy to help the model explicitly learn musically meaningful dependencies. To this end, we utilize alignment paths between the raw music data and musical units such as notes or chords. For musical creativity, we facilitate smooth control of novel musical attributes using latent representations. We attempt to achieve disentangled representations of the intended factors by regularizing them with data-driven inductive bias. This thesis verifies the proposed approaches in two representative CGMC tasks, melody harmonization and expressive performance rendering. A variety of experimental results suggest that the proposed approaches can expand musical creativity while keeping generation quality stable.
    Contents:
    Chapter 1 Introduction: 1.1 Motivation; 1.2 Definitions; 1.3 Tasks of Interest (1.3.1 Generation Quality; 1.3.2 Controllability); 1.4 Approaches (1.4.1 Modeling Musical Hierarchy; 1.4.2 Regularizing Latent Representations; 1.4.3 Target Tasks); 1.5 Outline of the Thesis
    Chapter 2 Background: 2.1 Music Generation Tasks (2.1.1 Melody Harmonization; 2.1.2 Expressive Performance Rendering); 2.2 Structure-enhanced Music Generation (2.2.1 Hierarchical Music Generation; 2.2.2 Transformer-based Music Generation); 2.3 Disentanglement Learning (2.3.1 Unsupervised Approaches; 2.3.2 Supervised Approaches; 2.3.3 Self-supervised Approaches); 2.4 Controllable Music Generation (2.4.1 Score Generation; 2.4.2 Performance Rendering); 2.5 Summary
    Chapter 3 Translating Melody to Chord: Structured and Flexible Harmonization of Melody with Transformer: 3.1 Introduction; 3.2 Proposed Methods (3.2.1 Standard Transformer Model (STHarm); 3.2.2 Variational Transformer Model (VTHarm); 3.2.3 Regularized Variational Transformer Model (rVTHarm); 3.2.4 Training Objectives); 3.3 Experimental Settings (3.3.1 Datasets; 3.3.2 Comparative Methods; 3.3.3 Training; 3.3.4 Metrics); 3.4 Evaluation (3.4.1 Chord Coherence and Diversity; 3.4.2 Harmonic Similarity to Human; 3.4.3 Controlling Chord Complexity; 3.4.4 Subjective Evaluation; 3.4.5 Qualitative Results; 3.4.6 Ablation Study); 3.5 Conclusion and Future Work
    Chapter 4 Sketching the Expression: Flexible Rendering of Expressive Piano Performance with Self-supervised Learning: 4.1 Introduction; 4.2 Proposed Methods (4.2.1 Data Representation; 4.2.2 Modeling Musical Hierarchy; 4.2.3 Overall Network Architecture; 4.2.4 Regularizing the Latent Variables; 4.2.5 Overall Objective); 4.3 Experimental Settings (4.3.1 Dataset and Implementation; 4.3.2 Comparative Methods); 4.4 Evaluation (4.4.1 Generation Quality; 4.4.2 Disentangling Latent Representations; 4.4.3 Controllability of Expressive Attributes; 4.4.4 KL Divergence; 4.4.5 Ablation Study; 4.4.6 Subjective Evaluation; 4.4.7 Qualitative Examples; 4.4.8 Extent of Control); 4.5 Conclusion
    Chapter 5 Conclusion and Future Work: 5.1 Conclusion; 5.2 Future Work (5.2.1 Deeper Investigation of Controllable Factors; 5.2.2 More Analysis of Qualitative Evaluation Results; 5.2.3 Improving Diversity and Scale of Dataset)
    Bibliography
    Abstract (in Korean)

    The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

    With recent breakthroughs in artificial neural networks, deep generative models have become one of the leading techniques for computational creativity. Despite very promising progress on image and short sequence generation, symbolic music generation remains a challenging problem since the structure of compositions is usually complicated. In this study, we attempt to solve the melody generation problem constrained by a given chord progression. This music meta-creation problem can also be incorporated into a plan recognition system with user inputs and predictive structural outputs. In particular, we explore the effect of explicit architectural encoding of musical structure by comparing two sequential generative models: LSTM (a type of RNN) and WaveNet (a dilated temporal CNN). As far as we know, this is the first study applying WaveNet to symbolic music generation, as well as the first systematic comparison between temporal CNNs and RNNs for music generation. We conduct a survey to evaluate our generations and implement a Variable Markov Oracle for music pattern discovery. Experimental results show that encoding structure more explicitly with a stack of dilated convolution layers improves performance significantly, and that a global encoding of the underlying chord progression in the generation procedure improves it further. Comment: 8 pages, 13 figures
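
One reason a stack of dilated convolutions can encode long-range structure is that its receptive field grows quickly with depth. The sketch below computes it with the standard formula for stacked stride-1 convolutions. The kernel size and the dilation schedule are illustrative assumptions, not the settings reported in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in time steps, of stacked stride-1 dilated convolutions.

    Each layer with dilation d extends the field by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical 8-layer stack with doubling dilations: 1, 2, 4, ..., 128.
doubling = [2 ** i for i in range(8)]
dilated_field = receptive_field(2, doubling)

# The same depth without dilation, for comparison.
plain_field = receptive_field(2, [1] * 8)
```

With these assumed settings the dilated stack covers 256 time steps while the undilated stack of the same depth covers 9, which is the kind of gap that lets a convolutional model span several bars of a chord progression.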

    MANIFOLD REPRESENTATIONS OF MUSICAL SIGNALS AND GENERATIVE SPACES

    Among the diverse research fields within computer music, the synthesis and generation of audio signals epitomize the cross-disciplinarity of this domain, jointly nourishing both scientific and artistic practices since its creation. Inherent in computer music since its genesis, audio generation has inspired numerous approaches, evolving with both musical practices and scientific and technical advances. Moreover, some synthesis processes also naturally handle the reverse process, named analysis, such that synthesis parameters can be partially or totally extracted from actual sounds, providing an alternative representation of the analyzed audio signals. On top of that, the recent rise of machine learning algorithms has earnestly questioned the field of scientific research, bringing powerful data-centred methods that raised several epistemological questions amongst researchers, in spite of their efficiency. In particular, a family of machine learning methods called generative models focuses on the generation of original content using features extracted from an existing dataset. Such methods question not only previous approaches to generation, but also the way of integrating these methods into existing creative processes. While these new generative frameworks are progressively being introduced in the domain of image generation, the application of such generative techniques in audio synthesis is still marginal. In this work, we aim to propose a new audio analysis-synthesis framework based on these modern generative models, enhanced by recent advances in machine learning. We first review existing approaches, both in sound synthesis and in generative machine learning, and focus on how our work fits into both practices and what can be expected from combining them. Subsequently, we look more closely at generative models and at how modern advances in the domain can be exploited to learn complex sound distributions while remaining flexible enough to be integrated into the creative flow of the user. We then propose an inference/generation process, mirroring the analysis/synthesis paradigms that are natural in the audio processing domain, using latent models based on a continuous higher-level space that we use to control the generation. We first provide preliminary results of our method applied to spectral information extracted from several datasets, and evaluate the results both qualitatively and quantitatively. Subsequently, we study how to make these methods more suitable for learning audio data, tackling three aspects in turn. First, we propose two latent regularization strategies specifically designed for audio, based on signal/symbol translation and on perceptual constraints. Then, we propose methods to address the inner temporality of musical signals, based on the extraction of multi-scale representations and on prediction, which allow the obtained generative spaces to also model the dynamics of the signal. In the last chapter, we move from a scientific approach to a more research-and-creation-oriented point of view. First, we describe the architecture and design of our open-source library, vsacids, intended for use by expert and non-expert music makers as an integrated creation tool. Then, we propose a first musical use of our system through the creation of a real-time performance, called ægo, based jointly on our framework vsacids and an explorative agent trained with reinforcement learning during the performance. Finally, we draw some conclusions on the different ways to improve and reinforce the proposed generation method, as well as possible further creative applications.

    Deep Learning Language Models for music analysis and generation

    [Abstract] In this project, we tackle the problem of predicting the next note in a monophonic musical piece. We choose a symbolic representation and extract it from digital sheet music. The problem is approached as four separate tasks, each corresponding to a specific property of the musical note. For each task, we compare the performance of single-output and multi-output deep learning algorithms. Despite the severe class imbalance in our dataset, our models manage to generate balanced predictions for the four features. Bachelor's thesis. Computer Engineering. Academic year 2021/202

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
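
For readers unfamiliar with the term, a series x is said to Granger-cause a series y if the past of x improves the prediction of y beyond what the past of y alone achieves. The sketch below shows the traditional autoregressive version of that idea on synthetic data. It is not the classification-based procedure studied in the article. The helper names, the lag order and the synthetic series are all assumptions, and a real analysis would add a significance test rather than report a raw error reduction.

```python
import numpy as np


def lag_matrix(series, p):
    """Rows are [s[t-1], ..., s[t-p]] for t = p .. len(series) - 1."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])


def rss(design, target):
    """Residual sum of squares of a least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_improvement(x, y, p=2):
    """Relative drop in residual error when lags of x are added to predict y."""
    target = y[p:]
    rss_restricted = rss(lag_matrix(y, p), target)
    rss_full = rss(np.column_stack([lag_matrix(y, p), lag_matrix(x, p)]), target)
    return 1.0 - rss_full / rss_restricted


# Synthetic example: x drives y with a one-step delay, but not the reverse.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
noise = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * noise[t]

forward = granger_improvement(x, y)   # lags of x added to a model of y
backward = granger_improvement(y, x)  # lags of y added to a model of x
```

On this synthetic pair, `forward` should be close to 1 and `backward` close to 0, reflecting that only x carries predictive information about y.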