Deep AutoRegressive Networks
We introduce a deep, generative autoencoder capable of learning hierarchies
of distributed representations from data. Successive deep stochastic hidden
layers are equipped with autoregressive connections, which enable the model to
be sampled from quickly and exactly via ancestral sampling. We derive an
efficient approximate parameter estimation method based on the minimum
description length (MDL) principle, which can be seen as maximising a
variational lower bound on the log-likelihood, with a feedforward neural
network implementing approximate inference. We demonstrate state-of-the-art
generative performance on a number of classic data sets: several UCI data sets,
MNIST and Atari 2600 games.
Comment: Appears in Proceedings of the 31st International Conference on
Machine Learning (ICML), Beijing, China, 201
Psychophysical identity and free energy
An approach to implementing variational Bayesian inference in biological
systems is considered, under which the thermodynamic free energy of a system
directly encodes its variational free energy. In the case of the brain, this
assumption places constraints on the neuronal encoding of generative and
recognition densities, in particular requiring a stochastic population code.
The resulting relationship between thermodynamic and variational free energies
is prefigured in mind-brain identity theses in philosophy and in the Gestalt
hypothesis of psychophysical isomorphism.
Comment: 22 pages; published as a research article on 8/5/2020 in Journal of
the Royal Society Interfac
Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion
We present a method for converting voices among a set of speakers. Our
method is based on training multiple autoencoder paths, where there is a single
speaker-independent encoder and multiple speaker-dependent decoders. The
autoencoders are trained with the addition of an adversarial loss, which is
provided by an auxiliary classifier in order to guide the output of the encoder
to be speaker independent. The training of the model is unsupervised in the
sense that it does not require collecting the same utterances from the speakers,
nor does it require time alignment over phonemes. Due to the use of a single
encoder, our method can generalize to converting the voice of out-of-training
speakers to speakers in the training dataset. We present subjective tests
corroborating the performance of our method.