Voice command generation using progressive WaveGANs
Generative Adversarial Networks (GANs) have become exceedingly popular in a
wide range of data-driven research fields, due in part to their success in
image generation. Their ability to generate new samples, often from only a
small amount of input data, makes them an exciting research tool in areas with
limited data resources. One less-explored application of GANs is the synthesis
of speech and audio samples. Herein, we propose a set of extensions to the
WaveGAN paradigm, a recently proposed approach for sound generation using GANs.
The aim of these extensions - preprocessing, Audio-to-Audio generation, skip
connections and progressive structures - is to improve the human likeness of
synthetic speech samples. Scores from listening tests with 30 volunteers
demonstrated a moderate improvement (Cohen's d coefficient of 0.65) in human
likeness using the proposed extensions compared to the original WaveGAN
approach.
Comment: 7 pages, 2 figures
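For readers unfamiliar with the effect size quoted in the abstract, the following is a minimal sketch of how Cohen's d can be computed for two sets of listening-test scores. It assumes the common pooled-standard-deviation form for two independent groups; the abstract does not state which variant the authors used, and a paired design would call for a different denominator. The function name `cohens_d` and the example scores are hypothetical and are not taken from the paper.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation.

    Assumption: independent groups with pooled variance; this is one common
    definition, not necessarily the one used in the paper.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical ratings, for illustration only (not data from the paper).
extended_scores = [2, 4, 6]
baseline_scores = [1, 3, 5]
print(cohens_d(extended_scores, baseline_scores))  # 0.5
```

By the usual rule of thumb, values near 0.5 are read as a medium effect and values near 0.8 as a large one, which is consistent with the abstract describing d = 0.65 as a moderate improvement.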