Singing voice correction using canonical time warping
Expressive singing voice correction is an appealing but challenging problem.
A robust time-warping algorithm that synchronizes two singing recordings can
provide a promising solution. We therefore propose to address the problem with
canonical time warping (CTW), which aligns amateur singing recordings to
professional ones. A new pitch contour is generated from the alignment
information, and a pitch-corrected singing voice is synthesized back through
a vocoder. The objective evaluation shows that CTW is robust against
pitch-shifting and time-stretching effects, and the subjective test
demonstrates that CTW prevails over the other methods, including DTW and
commercial auto-tuning software. Finally, we demonstrate the applicability of
the proposed method in a practical, real-world scenario.
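The alignment step can be illustrated with a toy sketch. The example below uses plain dynamic time warping (DTW) rather than full CTW (which additionally learns a feature projection before warping), and the pitch contours and values are hypothetical, not from the paper:

```python
# Toy sketch: align an "amateur" pitch contour to a "professional" one
# with plain DTW, then read the corrected pitch off the alignment path.
# CTW would additionally learn a shared feature space; this omits that.

def dtw_path(a, b):
    """Return the DTW alignment path between 1-D sequences a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Backtrack from the corner to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
        if step == cost[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif step == cost[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return path[::-1]

amateur = [60.0, 60.5, 62.0, 62.0, 64.1]   # hypothetical pitch contour (semitones)
professional = [60.0, 62.0, 64.0]          # hypothetical reference contour
path = dtw_path(amateur, professional)
# Corrected contour: replace each amateur frame with its aligned reference pitch.
corrected = [None] * len(amateur)
for i, j in path:
    corrected[i] = professional[j]
print(corrected)
```

In the full system, the corrected contour would then drive a vocoder to resynthesize the voice; here it is just a list of target pitches.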
TimewarpVAE: Simultaneous Time-Warping and Representation Learning of Trajectories
Human demonstrations of trajectories are an important source of training data
for many machine learning problems. However, the difficulty of collecting human
demonstration data for complex tasks makes learning efficient representations
of those trajectories challenging. For many problems, such as for handwriting
or for quasistatic dexterous manipulation, the exact timings of the
trajectories should be factored from their spatial path characteristics. In
this work, we propose TimewarpVAE, a fully differentiable manifold-learning
algorithm that incorporates Dynamic Time Warping (DTW) to simultaneously learn
both timing variations and latent factors of spatial variation. We show how the
TimewarpVAE algorithm learns appropriate time alignments and meaningful
representations of spatial variations in small handwriting and fork
manipulation datasets. Our model achieves lower spatial reconstruction test
error than baseline approaches, and the learned low-dimensional representations
can be used to efficiently generate semantically meaningful novel trajectories.
Comment: 17 pages, 12 figures
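The core idea of factoring timing from spatial path can be sketched with a toy monotone warp. The functions below are illustrative assumptions, not the paper's architecture: they only show that replaying a trajectory under a different monotone timing visits the same spatial positions, so spatial content and timing can be modelled separately:

```python
# Toy sketch: a trajectory's spatial path is unchanged when it is replayed
# under a monotone time warp. TimewarpVAE *learns* such warps jointly with
# a latent spatial code; this example uses a fixed, hand-chosen warp.

def resample(traj, times):
    """Linearly interpolate `traj` (uniformly sampled on [0, 1]) at `times`."""
    n = len(traj)
    out = []
    for t in times:
        x = t * (n - 1)
        i = min(int(x), n - 2)
        frac = x - i
        out.append(traj[i] * (1 - frac) + traj[i + 1] * frac)
    return out

def warp(t, s=2.0):
    """Monotone warp of [0, 1]: t**s slows the start, keeps both endpoints."""
    return t ** s

# A straight-line "handwriting stroke" traced at uniform speed.
canonical = [0.0, 0.25, 0.5, 0.75, 1.0]
uniform_times = [i / 4 for i in range(5)]
# The same spatial path traced with different timing: the warped replay
# stays on the identical path, only the schedule of positions changes.
warped_times = [warp(t) for t in uniform_times]
replay = resample(canonical, warped_times)
print(replay)
```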
Generating Labels for Regression of Subjective Constructs using Triplet Embeddings
Human annotations serve an important role in computational models where the
target constructs under study are hidden, such as dimensions of affect. This is
especially relevant in machine learning, where subjective labels derived from
related observable signals (e.g., audio, video, text) are needed to support
model training and testing. Current research trends focus on correcting
artifacts and biases introduced by annotators during the annotation process
while fusing them into a single annotation. In this work, we propose a novel
annotation approach using triplet embeddings. By lifting the absolute
annotation process to relative annotations where the annotator compares
individual target constructs in triplets, we leverage the accuracy of
comparisons over absolute ratings by human annotators. We then build a
1-dimensional embedding in Euclidean space that is indexed in time and serves
as a label for regression. In this setting, the annotation fusion occurs
naturally as a union of sets of sampled triplet comparisons among different
annotators. We show that by using our proposed sampling method to find an
embedding, we are able to accurately represent synthetic hidden constructs in
time under noisy sampling conditions. We further validate this approach using
human annotations collected from Mechanical Turk and show that we can recover
the underlying structure of the hidden construct up to bias and scaling
factors.
Comment: 9 pages, 5 figures, accepted journal paper
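A toy sketch of the triplet idea follows, assuming noiseless annotators and a simple hinge-style update; the paper's sampling scheme and embedding method are more involved, and all names and parameters here are illustrative:

```python
import random

# Toy sketch: recover a 1-D, time-indexed embedding from triplet
# comparisons of the form "item i is closer to j than to k". Fusion
# across annotators is just a union of their triplet sets.

random.seed(0)
T = 20
hidden = [(t / (T - 1)) ** 2 for t in range(T)]  # hidden construct over time

def compare(i, j, k):
    """Annotator oracle: is item i closer to j than to k in the construct?"""
    return abs(hidden[i] - hidden[j]) < abs(hidden[i] - hidden[k])

def sample_triplets(n):
    """Sample triplets and orient each one as (anchor, near, far)."""
    out = []
    for _ in range(n):
        i, j, k = random.sample(range(T), 3)
        out.append((i, j, k) if compare(i, j, k) else (i, k, j))
    return out

# Two "annotators": fusing their answers is a union of triplet sets.
triplets = list(set(sample_triplets(1000)) | set(sample_triplets(1000)))

x = [random.uniform(-1.0, 1.0) for _ in range(T)]  # embedding to learn

def violations(emb):
    return sum(1 for i, nr, fr in triplets
               if abs(emb[i] - emb[nr]) >= abs(emb[i] - emb[fr]))

before = violations(x)
step = 0.01
for _ in range(200):
    for i, nr, fr in triplets:
        if abs(x[i] - x[nr]) >= abs(x[i] - x[fr]):  # violated triplet
            x[nr] += step * (x[i] - x[nr])          # pull "near" toward anchor
            x[fr] -= step * (x[i] - x[fr])          # push "far" away from anchor
after = violations(x)
print(before, after)
```

As in the paper, the recovered embedding can at best match the hidden construct up to bias and scaling, since triplet comparisons are invariant to shifting and (sign-preserving) rescaling.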
Sequence Alignment with Dirichlet Process Mixtures
We present a probabilistic model for unsupervised alignment of
high-dimensional time-warped sequences based on the Dirichlet Process Mixture
Model (DPMM). We follow the approach introduced in (Kazlauskaite, 2018) of
simultaneously representing each data sequence as a composition of a true
underlying function and a time-warping, both of which are modelled using
Gaussian processes (GPs) (Rasmussen, 2005), and aligning the underlying
functions using an unsupervised alignment method. In (Kazlauskaite, 2018) the
alignment is performed using the GP latent variable model (GP-LVM) (Lawrence,
2005) as a model of sequences, while our main contribution is extending this
approach to using DPMM, which allows us to align the sequences temporally and
cluster them at the same time. We show that the DPMM achieves competitive
results in comparison to the GP-LVM on synthetic and real-world data sets, and
discuss the different properties of the estimated underlying functions and the
time-warps favoured by these models.
Comment: 6 pages, 3 figures, "All Of Bayesian Nonparametrics" Workshop at the 32nd Annual Conference on Neural Information Processing Systems (BNP@NeurIPS2018)
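The Dirichlet-process clustering component can be sketched through its Chinese Restaurant Process (CRP) representation, which makes clear that the number of clusters is not fixed in advance. The GP models of the underlying functions and time-warps are omitted, and the function names below are illustrative:

```python
import random

# Toy sketch: draw cluster assignments for n sequences from a CRP(alpha)
# prior, the clustering mechanism underlying the DPMM. In the full model,
# each cluster would share an underlying GP function and each sequence
# would carry its own GP time-warp; both are omitted here.

def crp_assignments(n, alpha, rng):
    """Sample cluster labels for n items from a Chinese Restaurant Process."""
    assignments = []
    counts = []  # number of items at each existing "table" (cluster)
    for i in range(n):
        # Join table t with prob counts[t]/(i + alpha),
        # open a new table with prob alpha/(i + alpha).
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        for t, c in enumerate(counts):
            acc += c
            if r < acc:
                assignments.append(t)
                counts[t] += 1
                break
        else:
            assignments.append(len(counts))
            counts.append(1)
    return assignments

rng = random.Random(1)
z = crp_assignments(12, alpha=1.0, rng=rng)
print(z)  # cluster labels; how many clusters appear is itself random
```

Larger `alpha` favours more clusters; in the full model these prior assignments are refined by how well each sequence's aligned underlying function matches its cluster.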