Leveraging Generative Models for Music and Signal Processing

Abstract

Thesis (Ph.D.)--University of Washington, 2021

Generative models can serve as a powerful primitive for creative interaction with data. Generative models give us the ability to synthesize or re-synthesize multimedia; conditional generative modeling empowers us to control the outputs of these models. By steering a generative model with conditioning information, we can sketch the essential aspects of our creative vision, and the generative model will fill in the details. This dissertation explores the possibilities of generative modeling as a creative tool, with a focus on applications to music and audio. The dissertation proceeds in three parts.

1. We develop algorithms and evaluation metrics for aligning musical scores to audio. Alignments provide us with a dense set of labels on musical audio that we can use to supervise conditional generation tasks such as transcription: synthesis of a musical score conditioned on an audio performance. This work on alignments leads to the construction of MusicNet: a collection of 330 freely-licensed classical music recordings, together with over 1 million annotated labels indicating the precise time of each note in every recording and the instrument that plays each note. We use this dataset to train state-of-the-art music transcription models for the MIREX Multiple Fundamental Frequency Estimation task.

2. We construct autoregressive generative models of musical scores, which exploit invariances in the structure of music. Whereas most recent work on music modeling has represented music as an ordered sequence of notes, we explore an alternative representation of music as a multi-dimensional tensor. We consider a variety of factorizations of the joint distribution over these tensors. We then turn our attention to discriminative modeling of scores using this tensor representation. We construct a classifier that can reliably identify the composer of a classical musical score. Our methods, which operate on the generic tensor score representation, outperform previously reported results from SVM and kNN classifiers with handcrafted features specialized for the composer classification task.

3. We develop a sampling algorithm for likelihood-based models that allows us to steer an unconditional generative model using conditioning information. We work within a Bayesian posterior sampling framework, using a pre-trained unconditional generative model as a prior, to sample from the posterior distribution induced by a conditional likelihood. Samples are obtained by running noise-annealed Langevin dynamics, which constructs a Markov chain that approximates a sample from the posterior distribution. We develop these ideas for a variety of models and applications, including source separation, in both the visual and audio domains.
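Part 1's score-to-audio alignment is only summarized above. As a rough illustration of the kind of alignment involved, the sketch below runs dynamic time warping over two feature sequences (for instance, spectrogram frames of a synthesized score and of a recording); the cosine local cost and the feature choice are assumptions made for this example, not the dissertation's exact pipeline.

```python
import numpy as np

def dtw_align(score_feats, audio_feats):
    """Align two feature sequences (frames x dims) with dynamic time warping.

    Returns a list of (score_frame, audio_frame) index pairs on the optimal
    warping path. Cosine distance is an illustrative choice of local cost.
    """
    # Pairwise cosine distances between frames.
    a = score_feats / (np.linalg.norm(score_feats, axis=1, keepdims=True) + 1e-8)
    b = audio_feats / (np.linalg.norm(audio_feats, axis=1, keepdims=True) + 1e-8)
    cost = 1.0 - a @ b.T                      # shape (n, m)

    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)     # accumulated cost with inf padding
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                 acc[i, j - 1],      # deletion
                                                 acc[i - 1, j - 1])  # match

    # Backtrack from the corner to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```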
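Part 2's tensor factorizations are likewise not spelled out in the abstract. As a toy sketch only, a binary piano-roll tensor of shape (time, pitch) can be scored by the chain rule, predicting each time slice from the slices before it; the placeholder `predict_slice` model below is an assumption for illustration, not the dissertation's architecture.

```python
import numpy as np

def log_likelihood(roll, predict_slice):
    """Chain-rule factorization of a binary piano-roll tensor `roll` (time x pitch).

    `predict_slice(prefix)` is any model that maps the first t slices to
    per-pitch note-on probabilities for slice t (a stand-in, not the
    dissertation's model).
    """
    total = 0.0
    for t in range(roll.shape[0]):
        p = np.clip(predict_slice(roll[:t]), 1e-6, 1 - 1e-6)   # P(x_t | x_<t)
        x = roll[t]
        total += np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return total

# Example with a trivial "model" that always predicts a base note-on rate of 0.05.
roll = (np.random.rand(64, 128) < 0.05).astype(float)
print(log_likelihood(roll, lambda prefix: np.full(128, 0.05)))
```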
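Part 3 describes posterior sampling with noise-annealed Langevin dynamics. The following is a minimal sketch of that style of update rule, assuming placeholder callables `prior_score(x, sigma)` for the noise-smoothed prior score and `likelihood_score(x, y, sigma)` for the gradient of the conditional log-likelihood; these stand in for trained models and are not specified in the abstract.

```python
import numpy as np

def annealed_langevin_posterior(y, prior_score, likelihood_score,
                                shape, sigmas, steps_per_level=100, eps=2e-5):
    """Noise-annealed Langevin dynamics targeting p(x | y) ∝ p(x) p(y | x).

    `sigmas` is a decreasing sequence of noise levels; at each level we run
    Langevin updates whose step size shrinks with the noise scale.
    """
    x = np.random.randn(*shape)
    for sigma in sigmas:
        step = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_level):
            grad = prior_score(x, sigma) + likelihood_score(x, y, sigma)
            x = x + 0.5 * step * grad + np.sqrt(step) * np.random.randn(*shape)
    return x
```

Under this sketch, dropping the `likelihood_score` term recovers unconditional sampling from the prior, while supplying a separation or inpainting likelihood steers the same pre-trained prior toward the conditioning information.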
