Scaling Machine Learning Systems using Domain Adaptation
Machine-learned components, particularly those trained using deep learning methods, are becoming integral parts of modern intelligent systems, with applications including computer vision, speech processing, natural language processing and human activity recognition. As these machine learning (ML) systems scale to real-world settings, they will encounter scenarios where the distribution of the data in the real world (i.e., the target domain) differs from that of the data on which they were trained (i.e., the source domain). This phenomenon, known as domain shift, can significantly degrade the performance of ML systems in new deployment scenarios. In this thesis, we study the impact of domain shift caused by variations in system hardware, software and user preferences on the performance of ML systems. After quantifying the performance degradation of ML models in target domains due to the various types of domain shift, we propose unsupervised domain adaptation (uDA) algorithms that leverage unlabeled data collected in the target domain to improve the performance of the ML model. At its core, this thesis argues for the need to develop uDA solutions that adhere to the practical scenarios in which ML systems will scale. More specifically, we consider four scenarios: (i) opaque ML systems, wherein parameters of the source prediction model are not made accessible in the target domain, (ii) transparent ML systems, wherein source model parameters are accessible and can be modified in the target domain, (iii) ML systems where source and target domains do not have identical label spaces, and (iv) distributed ML systems, wherein the source and target domains are geographically distributed, and their datasets are private and cannot be exchanged during adaptation. We study the unique challenges and constraints of each scenario and propose novel uDA algorithms that outperform state-of-the-art baselines.
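The effect of domain shift, and the idea of correcting it with unlabeled target data, can be illustrated with a toy example. The sketch below is not one of the thesis's algorithms. It assumes a one-dimensional, two-class problem with balanced classes, a threshold classifier fitted on the source domain, and a target domain that differs only by a constant offset; the adaptation step is plain mean alignment, and all names (`make_domain`, `accuracy`, the offset of 2.0) are hypothetical.

```python
import random
import statistics


def make_domain(rng, n, shift):
    """Two 1-D Gaussian classes; `shift` moves the whole domain (assumed form of domain shift)."""
    xs, ys = [], []
    for label, centre in ((0, -1.0), (1, 1.0)):
        for _ in range(n):
            xs.append(rng.gauss(centre, 0.5) + shift)
            ys.append(label)
    return xs, ys


def accuracy(xs, ys, threshold):
    """Fraction of samples for which 'x > threshold' matches the label."""
    hits = sum((x > threshold) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)


rng = random.Random(0)
src_x, src_y = make_domain(rng, 500, shift=0.0)  # labeled source domain
tgt_x, tgt_y = make_domain(rng, 500, shift=2.0)  # target domain; labels used only for evaluation

# Fit the source model: threshold halfway between the two source class means.
mean0 = statistics.fmean(x for x, y in zip(src_x, src_y) if y == 0)
mean1 = statistics.fmean(x for x, y in zip(src_x, src_y) if y == 1)
threshold = (mean0 + mean1) / 2

# Apply the unchanged source model to the shifted target data.
source_only_acc = accuracy(tgt_x, tgt_y, threshold)

# Unsupervised adaptation: align the target mean to the source mean.
# Only unlabeled target inputs are used, and the model itself is not modified.
offset = statistics.fmean(src_x) - statistics.fmean(tgt_x)
adapted_x = [x + offset for x in tgt_x]
adapted_acc = accuracy(adapted_x, tgt_y, threshold)

print(f"source-only accuracy on target: {source_only_acc:.3f}")
print(f"after mean alignment:           {adapted_acc:.3f}")
```

Mean alignment works here only because the shift is a pure translation and the classes are balanced in both domains; realistic shifts generally call for richer methods such as those the thesis proposes.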
TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer
In this work, we address the problem of musical timbre transfer, where the
goal is to manipulate the timbre of a sound sample from one instrument to match
another instrument while preserving other musical content, such as pitch,
rhythm, and loudness. In principle, one could apply image-based style transfer
techniques to a time-frequency representation of an audio signal, but this
depends on having a representation that allows independent manipulation of
timbre as well as high-quality waveform generation. We introduce TimbreTron, a
method for musical timbre transfer which applies "image" domain style transfer
to a time-frequency representation of the audio signal, and then produces a
high-quality waveform using a conditional WaveNet synthesizer. We show that the
Constant Q Transform (CQT) representation is particularly well-suited to
convolutional architectures due to its approximate pitch equivariance. Based on
human perceptual evaluations, we confirmed that TimbreTron recognizably
transferred the timbre while otherwise preserving the musical content, for both
monophonic and polyphonic samples.
Comment: 17 pages, published as a conference paper at ICLR 201
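The pitch-equivariance property mentioned in the abstract follows from the logarithmic frequency spacing of the CQT: multiplying every frequency by a constant ratio moves every partial by the same number of bins. The sketch below illustrates this on idealized bin indices only; it does not compute a real transform and is not the TimbreTron code. The constants `F_MIN` and `BINS_PER_OCTAVE`, the 10 Hz linear bin width, and the helper names are assumptions chosen for illustration.

```python
import math

BINS_PER_OCTAVE = 12  # assumption: one CQT bin per semitone
F_MIN = 32.70         # assumption: lowest bin centre in Hz (roughly C1)


def cqt_bin(freq_hz):
    """Fractional bin index on a log2-spaced (CQT-style) frequency axis."""
    return BINS_PER_OCTAVE * math.log2(freq_hz / F_MIN)


def linear_bin(freq_hz, hz_per_bin=10.0):
    """Fractional bin index on a linearly spaced (STFT-style) frequency axis."""
    return freq_hz / hz_per_bin


def harmonics(f0, count=4):
    """Frequencies of the first `count` harmonics of a note with fundamental f0."""
    return [f0 * k for k in range(1, count + 1)]


def shift_pitch(freqs, semitones):
    """Transpose all frequencies by the given number of equal-tempered semitones."""
    ratio = 2 ** (semitones / 12)
    return [f * ratio for f in freqs]


note = harmonics(220.0)         # A3 and its overtones
shifted = shift_pitch(note, 3)  # the same note three semitones higher

# Displacement of each harmonic, measured in bins, on both axes.
cqt_moves = [cqt_bin(b) - cqt_bin(a) for a, b in zip(note, shifted)]
lin_moves = [linear_bin(b) - linear_bin(a) for a, b in zip(note, shifted)]

print("CQT axis:   ", [round(m, 3) for m in cqt_moves])
print("linear axis:", [round(m, 3) for m in lin_moves])
```

On the log-spaced axis every harmonic moves by the same amount, so a transposition is a translation of the harmonic pattern, which a convolutional layer handles naturally. On the linear axis the displacement grows with harmonic number, so the pattern stretches. In an actual CQT the property holds only approximately, because of finite bin resolution and windowing.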
Disentanglement by Cyclic Reconstruction
Deep neural networks have demonstrated their ability to automatically extract
meaningful features from data. However, in supervised learning, information
specific to the dataset used for training, but irrelevant to the task at hand,
may remain encoded in the extracted representations. This remaining information
introduces a domain-specific bias, weakening the generalization performance. In
this work, we propose splitting the information into a task-related
representation and its complementary context representation. We propose an
original method, combining adversarial feature predictors and cyclic
reconstruction, to disentangle these two representations in the single-domain
supervised case. We then adapt this method to the unsupervised domain
adaptation problem, consisting of training a model capable of performing on
both a source and a target domain. In particular, our method promotes
disentanglement in the target domain, despite the absence of training labels.
This enables the isolation of task-specific information from both domains and a
projection into a common representation. The task-specific representation
allows efficient transfer of knowledge acquired from the source domain to the
target domain. In the single-domain case, we demonstrate the quality of our
representations on information retrieval tasks and the generalization benefits
induced by sharpened task-specific representations. We then validate the
proposed method on several classical domain adaptation benchmarks and
illustrate the benefits of disentanglement for domain adaptation.