Deep Autotuner: A Data-Driven Approach to Natural-Sounding Pitch Correction for Singing Voice in Karaoke Performances
We describe a machine-learning approach to pitch correcting a solo singing
performance in a karaoke setting, where the solo voice and accompaniment are on
separate tracks. The proposed approach addresses the situation where no musical
score of either the vocals or the accompaniment exists: it predicts the amount of
correction from the relationship between the spectral contents of the vocal and
accompaniment tracks. Hence, the pitch shift in cents suggested by the model
can be used to make the voice sound in tune with the accompaniment. This
approach differs from commercially used automatic pitch correction systems,
where notes in the vocal tracks are shifted to be centered around notes in a
user-defined score or mapped to the closest pitch among the twelve
equal-tempered scale degrees. We train the model using a dataset of 4,702
amateur karaoke performances selected for good intonation. We present a
Convolutional Gated Recurrent Unit (CGRU) model to accomplish this task. This
method can be extended into unsupervised pitch correction of a vocal
performance, popularly referred to as autotuning.
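For context, the score-free commercial baseline that the abstract contrasts against (mapping a sung pitch to the closest of the twelve equal-tempered scale degrees) can be sketched as below. This is only an illustration of that baseline and of the cents unit, not the paper's model. The function names and the 440 Hz reference for A4 are assumptions; the abstract does not specify a tuning reference.

```python
import math

A4_HZ = 440.0  # assumed reference tuning; not stated in the abstract


def cents_between(f_hz, ref_hz):
    """Interval from ref_hz up to f_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def snap_to_equal_temperament(f_hz, a4_hz=A4_HZ):
    """Return (target_hz, correction_cents) for the nearest equal-tempered pitch.

    correction_cents is the shift that moves f_hz onto target_hz:
    negative when the input is sharp, positive when it is flat.
    """
    semitones_from_a4 = 12.0 * math.log2(f_hz / a4_hz)
    nearest = round(semitones_from_a4)
    target_hz = a4_hz * 2.0 ** (nearest / 12.0)
    correction_cents = 100.0 * (nearest - semitones_from_a4)
    return target_hz, correction_cents
```

A note sung at 450 Hz, for example, is snapped down to 440 Hz whether or not the singer meant to bend it, which is the limitation a data-driven correction aims to avoid.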
Deep Autotuner: a Pitch Correcting Network for Singing Performances
We introduce a data-driven approach to automatic pitch correction of solo
singing performances. The proposed approach predicts note-wise pitch shifts
from the relationship between the respective spectrograms of the singing and
accompaniment. This approach differs from commercial systems, where vocal track
notes are usually shifted to be centered around pitches in a user-defined
score, or mapped to the closest pitch among the twelve equal-tempered scale
degrees. The proposed system treats pitch as a continuous value rather than
relying on a set of discretized notes found in musical scores, thus allowing
for improvisation and harmonization in the singing performance. We train our
neural network model using a dataset of 4,702 amateur karaoke performances
selected for good intonation. Our model is trained on both incorrect
intonation, for which it learns a correction, and intentional pitch variation,
which it learns to preserve. The proposed deep neural network with gated
recurrent units on top of convolutional layers shows promising performance on
the real-world score-free singing pitch correction task of autotuning.
Comment: arXiv admin note: text overlap with arXiv:1902.0095
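As a rough illustration of what treating pitch as a continuous value means, the sketch below applies one shift in cents per note to a list of note frequencies. The shift values stand in for model predictions and are hypothetical, as are the function names. The abstract does not describe how a predicted shift is applied to the audio, so this covers only the frequency arithmetic.

```python
def apply_cents_shift(f_hz, shift_cents):
    """Shift a frequency by a continuous amount in cents (1200 cents = 1 octave)."""
    return f_hz * 2.0 ** (shift_cents / 1200.0)


def correct_notes(note_f0_hz, predicted_shifts_cents):
    """Apply one shift per note; the two lists must have the same length."""
    if len(note_f0_hz) != len(predicted_shifts_cents):
        raise ValueError("need exactly one predicted shift per note")
    return [
        apply_cents_shift(f_hz, cents)
        for f_hz, cents in zip(note_f0_hz, predicted_shifts_cents)
    ]
```

Because the shift is not rounded to a semitone grid, a note can land between equal-tempered pitches, which leaves room for the deliberate pitch variation the abstract says the model learns to preserve.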