3 research outputs found
Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion
Research on deep learning-powered voice conversion (VC) in speech-to-speech
scenarios is getting increasingly popular. Although many of the works in the
field of voice conversion share a common global pipeline, there is a
considerable diversity in the underlying structures, methods, and neural
sub-blocks used across research efforts. Thus, obtaining a comprehensive
understanding of the reasons behind the choice of the different methods in the
voice conversion pipeline can be challenging, and the actual hurdles in the
proposed solutions are often unclear. To shed light on these aspects, this
paper presents a scoping review that explores the use of deep learning in
speech analysis, synthesis, and disentangled speech representation learning
within modern voice conversion systems. We screened 621 publications from more
than 38 different venues between the years 2017 and 2023, followed by an
in-depth review of a final database consisting of 123 eligible studies. Based
on the review, we summarise the most frequently used approaches to voice
conversion based on deep learning and highlight common pitfalls within the
community. Lastly, we condense the knowledge gathered, identify main challenges
and provide recommendations for future research directions