Connectionist multivariate density-estimation and its application to speech synthesis
Autoregressive models factorize a multivariate joint probability distribution into a
product of one-dimensional conditional distributions. The variables are assigned
an ordering, and the conditional distribution of each variable is modelled using all
variables preceding it in that ordering as predictors.
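Written out, the factorization described above takes the following form. The symbols are notation assumed here for illustration: $D$ is the number of variables, $o$ is the chosen ordering (a permutation of $1, \dots, D$), and $\mathbf{x}_{o_{<d}}$ denotes the variables that precede $x_{o_d}$ in that ordering.

```latex
p(\mathbf{x}) = \prod_{d=1}^{D} p\left(x_{o_d} \mid \mathbf{x}_{o_{<d}}\right)
```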
Calculating normalized probabilities and sampling both have polynomial computational
complexity under autoregressive models. Moreover, binary autoregressive
models based on neural networks achieve statistical performance similar to that of
some intractable models, like restricted Boltzmann machines, on several datasets.
The use of autoregressive probability density estimators based on neural
networks to model real-valued data, while proposed before, has never been properly
investigated and reported. In this thesis we extend the formulation of neural
autoregressive distribution estimators (NADE) to real-valued data, a model we call
the real-valued neural autoregressive density estimator (RNADE). Its statistical
performance on several datasets, including visual and auditory data, is reported
and compared to that of other models. RNADE obtained higher test likelihoods
than other tractable models, while retaining all the attractive computational
properties of autoregressive models.
However, autoregressive models are limited by the ordering of the variables
inherent to their formulation. Marginalization and imputation tasks can only be
solved analytically if the missing variables are at the end of the ordering. We
present a new training technique that obtains a set of parameters that can be
used for any ordering of the variables. By choosing a model with a convenient
ordering of the dimensions at test time, it is possible to solve any marginalization
or imputation task analytically.
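As a minimal sketch of what choosing a "convenient ordering" could look like, the helper below places the observed dimensions first and the missing ones last, so the missing variables sit at the end of the ordering. The function name `convenient_ordering` and its arguments are hypothetical and are not taken from the thesis.

```python
def convenient_ordering(num_dims, observed):
    """Return an ordering with observed dimensions first and missing ones last.

    Hypothetical helper: `observed` holds the indices of the variables whose
    values are known at test time.
    """
    observed_set = set(observed)
    if any(i < 0 or i >= num_dims for i in observed_set):
        raise ValueError("observed index out of range")
    first = sorted(observed_set)
    # Everything not observed is treated as missing and moved to the end.
    last = [i for i in range(num_dims) if i not in observed_set]
    return first + last


# Dimensions 1 and 3 are observed; 0, 2 and 4 are missing.
print(convenient_ordering(5, [3, 1]))  # prints [1, 3, 0, 2, 4]
```

With the missing variables at the end, marginalizing them amounts to dropping the trailing conditionals of the product, and imputing them amounts to sampling those conditionals in order.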
The same training procedure also makes it practical to train NADEs and
RNADEs with several hidden layers. The resulting deep and tractable models
display higher test likelihoods than the equivalent one-hidden-layer models for all
the datasets tested.
Ensembles of NADEs or RNADEs can be created inexpensively by combining
models that share their parameters but differ in the ordering of the variables. These
ensembles of autoregressive models obtain state-of-the-art statistical performance
for several datasets.
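One simple way to combine such models, assumed here for illustration since the abstract does not specify the combination rule, is a uniform mixture: the ensemble density is the average of the densities assigned by each ordering. The sketch below computes that average in log space; `ensemble_log_density` is a hypothetical name.

```python
import math


def ensemble_log_density(log_densities):
    """Log of the mean density, given one log density per ordering.

    Assumes a uniform mixture over orderings. Subtracting the maximum before
    exponentiating avoids underflow when the log densities are very negative.
    """
    m = max(log_densities)
    total = sum(math.exp(lp - m) for lp in log_densities)
    return m + math.log(total) - math.log(len(log_densities))
```

Because the member models share parameters, each additional ordering costs only another evaluation of the same network rather than another training run.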
Finally, we demonstrate the application of RNADE to speech synthesis, and
confirm that capturing the phone-conditional dependencies of acoustic features
improves the quality of synthetic speech. Our model generates synthetic speech
that was judged by naive listeners as being of higher quality than that generated
by mixture density networks, which are considered a state-of-the-art synthesis
technique.
Deep Architectures for Articulatory Inversion
We implement two deep architectures for the acoustic-articulatory inversion mapping problem: a deep neural network and a deep trajectory mixture density network. We find that in both cases, deep architectures produce more accurate predictions than shallow architectures and that this is due to the higher expressive capability of a deep model and not a consequence of adding more adjustable parameters. We also find that a deep trajectory mixture density network is able to obtain better inversion accuracies than smoothing the results of a deep neural network. Our best model obtained an average root mean square error of 0.885 mm on the MNGU0 test dataset. Index Terms: Articulatory inversion, deep neural network, deep belief network, deep regression network, pretraining
A Deep and Tractable Density Estimator
The Neural Autoregressive Distribution Estimator (NADE) and its real-valued version RNADE are competitive density models of multidimensional data across a variety of domains. These models use a fixed, arbitrary ordering of the data dimensions. One can easily condition on variables at the beginning of the ordering, and marginalize out variables at the end of the ordering; however, other inference tasks require approximate inference. In this work we introduce an efficient procedure to simultaneously train a NADE model for each possible ordering of the variables, by sharing parameters across all these models. We can thus use the most convenient model for each inference task at hand, and ensembles of such models with different orderings are immediately available. Moreover, unlike the original NADE, our training procedure scales to deep models. Empirically, ensembles of Deep NADE models obtain state-of-the-art density estimation performance.
RNADE: The real-valued neural autoregressive density-estimator
We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case.
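The computation described in this abstract can be sketched roughly as follows. This is an illustrative toy with random parameters, not the authors' implementation: the parameter shapes, the rectified-linear hidden units, the names `random_params` and `rnade_style_log_density`, and the omission of any input scaling are all assumptions made here. Each one-dimensional conditional is a mixture of K Gaussians whose parameters are read from a hidden state that is shared and updated across dimensions.

```python
import numpy as np


def random_params(D, H, K, seed=0):
    """Random toy parameters (assumed shapes): W (H, D); V_* (D, K, H); b_* (D, K)."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return (s * rng.standard_normal((H, D)), np.zeros(H),
            s * rng.standard_normal((D, K, H)), np.zeros((D, K)),
            s * rng.standard_normal((D, K, H)), rng.standard_normal((D, K)),
            s * rng.standard_normal((D, K, H)), np.zeros((D, K)))


def rnade_style_log_density(x, params):
    """Log density of a real-valued vector x as a sum of log mixture conditionals."""
    W, c, V_alpha, b_alpha, V_mu, b_mu, V_sigma, b_sigma = params
    a = c.copy()      # hidden pre-activation, shared by all conditionals
    log_p = 0.0
    for d in range(x.shape[0]):
        h = np.maximum(a, 0.0)                    # depends only on x[:d]
        z = V_alpha[d] @ h + b_alpha[d]           # mixture logits, shape (K,)
        log_alpha = z - np.logaddexp.reduce(z)    # log-softmax over components
        mu = V_mu[d] @ h + b_mu[d]
        log_sigma = V_sigma[d] @ h + b_sigma[d]
        # Log density of x[d] under each Gaussian component.
        log_comp = (-0.5 * np.log(2.0 * np.pi) - log_sigma
                    - 0.5 * ((x[d] - mu) / np.exp(log_sigma)) ** 2)
        log_p += np.logaddexp.reduce(log_alpha + log_comp)
        a = a + x[d] * W[:, d]                    # fold x[d] into the shared state
    return log_p


params = random_params(D=3, H=8, K=4)
print(rnade_style_log_density(np.array([0.2, -1.0, 0.5]), params))
```

Because every conditional is a normalized mixture, the product is a normalized joint density, which is what makes the likelihood tractable and directly comparable across methods.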
Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE
Given a transcription, sampling from a good model of acoustic feature trajectories should result in plausible realizations of an utterance. However, samples from current probabilistic speech synthesis systems result in low quality synthetic speech. Henter et al. have demonstrated the need to capture the dependencies between acoustic features conditioned on the phonetic labels in order to obtain high quality synthetic speech. These dependencies are often ignored in neural-network-based acoustic models. We tackle this deficiency by introducing a probabilistic neural network model of acoustic trajectories, trajectory RNADE, able to capture these dependencies. Index Terms: Speech synthesis, artificial neural networks, acoustic modelling, RNADE, trajectory model
Neural Autoregressive Distribution Estimation
We present Neural Autoregressive Distribution Estimation (NADE) models, which
are neural network architectures applied to the problem of unsupervised
distribution and density estimation. They leverage the probability product rule
and a weight sharing scheme inspired from restricted Boltzmann machines, to
yield an estimator that is both tractable and has good generalization
performance. We discuss how they achieve competitive performance in modeling
both binary and real-valued observations. We also present how deep NADE models
can be trained to be agnostic to the ordering of input dimensions used by the
autoregressive product rule decomposition. Finally, we also show how to exploit
the topological structure of pixels in images using a deep convolutional
architecture for NADE.
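For the binary case, the product-rule decomposition with shared weights can be sketched as below. This is a toy under stated assumptions, not the paper's code: the function names, the parameter shapes (W is H x D, V is D x H), and the single sigmoid hidden layer are chosen here for illustration. The helper `total_probability` sums the model over every binary vector to check that it is normalized, which is only feasible for small D.

```python
import itertools

import numpy as np


def nade_log_prob(x, W, V, b, c):
    """Log probability of a binary vector x under a one-hidden-layer NADE-style model."""
    a = c.copy()      # hidden pre-activation, reused across conditionals
    log_p = 0.0
    for d in range(x.shape[0]):
        h = 1.0 / (1.0 + np.exp(-a))      # sigmoid hidden units, depend on x[:d] only
        logit = V[d] @ h + b[d]           # logit of p(x_d = 1 | x[:d])
        # Bernoulli log likelihood, written with logaddexp for stability.
        log_p += x[d] * logit - np.logaddexp(0.0, logit)
        a = a + x[d] * W[:, d]            # weight sharing: update instead of recompute
    return log_p


def total_probability(W, V, b, c):
    """Sum of the model's probabilities over all binary vectors of length D."""
    D = W.shape[1]
    return sum(np.exp(nade_log_prob(np.array(bits, dtype=float), W, V, b, c))
               for bits in itertools.product([0, 1], repeat=D))


rng = np.random.default_rng(0)
D, H = 4, 5
W, V = rng.standard_normal((H, D)), rng.standard_normal((D, H))
b, c = rng.standard_normal(D), rng.standard_normal(H)
print(total_probability(W, V, b, c))
```

Updating the running pre-activation `a` rather than recomputing it for each conditional keeps the cost of one evaluation at O(DH) in this sketch.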