On Using Backpropagation for Speech Texture Generation and Voice Conversion
Inspired by recent work on neural network image generation that relies on
backpropagation towards the network inputs, we present a proof-of-concept
system for speech texture synthesis and voice conversion based on two
mechanisms: approximate inversion of the representation learned by a speech
recognition neural network, and matching statistics of neuron activations
between source and target utterances. Similar to image texture
synthesis and neural style transfer, the system works by optimizing a cost
function with respect to the input waveform samples. To this end we use a
differentiable mel-filterbank feature extraction pipeline and train a
convolutional CTC speech recognition network. Our system is able to extract
speaker characteristics from very limited amounts of target speaker data, as
little as a few seconds, and can be used to generate realistic speech babble or
reconstruct an utterance in a different voice.
Comment: Accepted to ICASSP 2018
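A minimal sketch of the core mechanism, assuming a PyTorch-style setup. The toy
mel front end, the small convolutional encoder, and the per-channel activation
mean used as the matched statistic are all hypothetical stand-ins for the
paper's differentiable mel-filterbank pipeline and trained CTC network:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MelFrontend(nn.Module):
    # Toy differentiable "mel-filterbank": STFT magnitudes projected
    # through a fixed (here random) filterbank. Illustrative only.
    def __init__(self, n_fft=512, n_mels=40):
        super().__init__()
        self.n_fft = n_fft
        self.register_buffer("fbank", torch.rand(n_fft // 2 + 1, n_mels))

    def forward(self, wav):  # wav: (batch, samples)
        spec = torch.stft(wav, self.n_fft, hop_length=128,
                          window=torch.hann_window(self.n_fft),
                          return_complex=True).abs()            # (B, freq, frames)
        return torch.log1p(spec.transpose(1, 2) @ self.fbank)   # (B, T, mels)

frontend = MelFrontend()
# Small convolutional encoder standing in for the trained CTC recognizer.
encoder = nn.Sequential(nn.Conv1d(40, 64, 5, padding=2), nn.ReLU(),
                        nn.Conv1d(64, 64, 5, padding=2), nn.ReLU())

def activation_stats(wav):
    # Mean activation per channel: one simple statistic that could be
    # matched between source and target utterances.
    return encoder(frontend(wav).transpose(1, 2)).mean(dim=2)

target = torch.randn(1, 16000)  # stands in for a few seconds of target speech
with torch.no_grad():
    target_stats = activation_stats(target)

# Backpropagation towards the input: optimize the waveform samples directly.
wav = torch.randn(1, 16000, requires_grad=True)
opt = torch.optim.Adam([wav], lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    F.mse_loss(activation_stats(wav), target_stats).backward()
    opt.step()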
TET-GAN: Text Effects Transfer via Stylization and Destylization
Text effects transfer technology automatically makes text dramatically
more impressive. However, previous style transfer methods either model
general styles, and so cannot handle the highly structured text effects
that follow the glyph, or require the manual design of subtle matching
criteria for text effects. In this paper, we leverage the powerful representation
abilities of deep neural features for text effects transfer. For this purpose,
we propose a novel Texture Effects Transfer GAN (TET-GAN), which consists of a
stylization subnetwork and a destylization subnetwork. The key idea is to train
our network to accomplish both the objective of style transfer and style
removal, so that it can learn to disentangle and recombine the content and
style features of text effects images. To support the training of our network,
we propose a new text effects dataset with as many as 64 professionally
designed styles on 837 characters. We show that the disentangled feature
representations enable us to transfer or remove all these styles on arbitrary
glyphs using one network. Furthermore, the flexible network design empowers
TET-GAN to efficiently extend to a new text style via one-shot learning where
only one example is required. We demonstrate the superiority of the proposed
method in generating high-quality stylized text over the state-of-the-art
methods.
Comment: Accepted by AAAI 2019. Code and dataset will be available at
http://www.icst.pku.edu.cn/struct/Projects/TETGAN.htm
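A minimal sketch of the dual objective described in the abstract, in a
PyTorch-style setup. The single-convolution encoders and decoders below are
hypothetical placeholders for TET-GAN's stylization and destylization
subnetworks, and the adversarial (GAN) terms are omitted:

import torch
import torch.nn as nn

content_enc = nn.Conv2d(3, 32, 3, padding=1)  # shared content encoder
style_enc   = nn.Conv2d(3, 32, 3, padding=1)  # style encoder
destylizer  = nn.Conv2d(32, 3, 3, padding=1)  # content features -> plain glyph
stylizer    = nn.Conv2d(64, 3, 3, padding=1)  # content + style -> styled glyph
l1 = nn.L1Loss()

def joint_loss(x_plain, x_styled):
    # Style removal: recover the raw glyph from the styled image.
    c = content_enc(x_styled)
    destyle = l1(destylizer(c), x_plain)
    # Style transfer: re-apply the extracted style to the plain glyph.
    s = style_enc(x_styled)
    restyle = l1(stylizer(torch.cat([content_enc(x_plain), s], dim=1)),
                 x_styled)
    return destyle + restyle

x_plain  = torch.rand(4, 3, 64, 64)  # dummy raw glyph images
x_styled = torch.rand(4, 3, 64, 64)  # dummy text-effects images of the same glyphs
joint_loss(x_plain, x_styled).backward()

Training both objectives jointly is what, per the abstract, encourages the
network to disentangle and recombine the content and style features.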
Audio style transfer
'Style transfer' among images has recently emerged as a very active research
topic, fuelled by the power of convolutional neural networks (CNNs), and has
quickly become a very popular technology in social media. This paper investigates
the analogous problem in the audio domain: How to transfer the style of a
reference audio signal to target audio content? We propose a flexible
framework for the task, which uses a sound texture model to extract statistics
characterizing the reference audio style, followed by an optimization-based
audio texture synthesis to modify the target content. In contrast to mainstream
optimization-based visual style transfer methods, the proposed process is initialized
by the target content instead of random noise and the optimized loss is only
about texture, not structure. These differences proved key for audio style
transfer in our experiments. In order to extract features of interest, we
investigate different architectures, whether pre-trained on other tasks, as
done in image style transfer, or engineered based on the human auditory system.
Experimental results on different types of audio signals confirm the potential
of the proposed approach.
Comment: ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), Apr 2018, Calgary, Canada.
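A minimal sketch of the two choices the abstract highlights, content
initialization and a texture-only loss, assuming Gram-matrix activation
statistics as the sound texture model. The one-layer network and all shapes are
hypothetical stand-ins for the pre-trained or auditory-inspired feature
extractors the paper investigates:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative feature extractor over raw audio.
net = nn.Sequential(nn.Conv1d(1, 32, 9, padding=4), nn.ReLU())

def gram(wav):  # wav: (batch, samples)
    # Gram matrix of channel activations: texture statistics with no
    # temporal structure, so the loss is "only about texture".
    f = net(wav.unsqueeze(1))                   # (B, C, T)
    return f @ f.transpose(1, 2) / f.shape[-1]  # (B, C, C)

style_ref = torch.randn(1, 16000)  # dummy reference audio (style)
content   = torch.randn(1, 16000)  # dummy target audio (content)
with torch.no_grad():
    g_style = gram(style_ref)

# Initialize from the content signal rather than from random noise.
x = content.clone().requires_grad_(True)
opt = torch.optim.LBFGS([x], max_iter=100)

def closure():
    opt.zero_grad()
    loss = F.mse_loss(gram(x), g_style)
    loss.backward()
    return loss

opt.step(closure)  # x is nudged toward the reference texture statistics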