Porting concepts from DNNs back to GMMs
Deep neural networks (DNNs) have been shown to outperform Gaussian mixture models (GMMs) on a variety of speech recognition benchmarks. In this paper we analyze the differences between the DNN and GMM modeling techniques and port the best ideas from DNN-based modeling to a GMM-based system. By going both deep (multiple layers) and wide (multiple parallel sub-models) and by sharing model parameters, we are able to close the gap between the two modeling techniques on the TIMIT database. Since the 'deep' GMMs retain the maximum-likelihood trained Gaussians as the first layer, advanced techniques such as speaker adaptation and model-based noise robustness can be readily incorporated. Despite their similarities, the DNNs and the deep GMMs still show a sufficient amount of complementarity to allow effective system combination.
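As a concrete illustration of the "first layer" described above, the following minimal sketch computes per-component Gaussian log-likelihood activations that deeper layers could recombine. It assumes diagonal covariances; the function name and shapes are illustrative, not taken from the paper's implementation.

```python
# Minimal sketch (not the paper's implementation): the 'first layer' of a
# deep GMM emits per-component Gaussian log-likelihoods as activations.
# Diagonal covariances are assumed here.
import numpy as np

def gmm_first_layer(x, means, variances, log_weights):
    """Per-component scores log w_k + log N(x; mu_k, diag(var_k)).

    x:           (d,) feature vector
    means:       (K, d) component means (maximum-likelihood trained)
    variances:   (K, d) diagonal covariances
    log_weights: (K,) log mixture weights
    """
    diff = x - means                                       # (K, d)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum(diff**2 / variances, axis=1)
    return log_weights + log_norm + log_exp                # (K,) activations

# Example: 3 components in 2 dimensions
rng = np.random.default_rng(0)
means = rng.normal(size=(3, 2))
variances = np.ones((3, 2))
log_weights = np.log(np.full(3, 1 / 3))
print(gmm_first_layer(np.zeros(2), means, variances, log_weights))
```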
High-Rate Vector Quantization for the Neyman-Pearson Detection of Correlated Processes
This paper investigates the effect of quantization on the performance of the
Neyman-Pearson test. It is assumed that a sensing unit observes samples of a
correlated stationary ergodic multivariate process. Each sample is passed
through an N-point quantizer and transmitted to a decision device which
performs a binary hypothesis test. For any false alarm level, it is shown that
the miss probability of the Neyman-Pearson test converges to zero exponentially
as the number of samples tends to infinity, assuming that the observed process
satisfies certain mixing conditions. The main contribution of this paper is to
provide a compact closed-form expression of the error exponent in the high-rate
regime, i.e., when the number N of quantization levels tends to infinity,
generalizing previous results of Gupta and Hero to the case of non-independent
observations. If d represents the dimension of one sample, it is proved that
the error exponent converges at rate N^{2/d} to the one obtained in the absence
of quantization. As an application, relevant high-rate quantization strategies
which lead to a large error exponent are determined. Numerical results indicate
that the proposed quantization rule can yield better performance than existing
ones in terms of detection error.
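To make the rate statement concrete, the sketch below restates the claimed scaling in standard large-deviations notation; the symbols K_N and K_infinity are shorthand introduced here for the quantized and unquantized error exponents and are not taken verbatim from the paper.

```latex
% For a fixed false-alarm level \alpha, the miss probability \beta_n over n
% samples decays exponentially, with exponent K_N under an N-point quantizer
% (the quantity the paper gives in closed form in the high-rate regime):
\lim_{n \to \infty} -\frac{1}{n} \log \beta_n(\alpha) = K_N .
% In the familiar i.i.d. special case, K_\infty reduces to the
% Kullback-Leibler divergence D(P_0 \Vert P_1) by Stein's lemma. The
% high-rate result says the quantization loss vanishes at rate N^{-2/d},
% with d the dimension of one sample:
K_\infty - K_N = O\!\left(N^{-2/d}\right).
```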
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
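As a concrete instance of the exploratory summaries mentioned above, the sketch below computes two standard circular statistics, the mean direction and the mean resultant length; the function name is illustrative.

```python
# Minimal sketch of two standard exploratory summaries for circular data:
# the circular (directional) mean, and the mean resultant length R, which
# measures concentration (R near 1: tightly clustered; near 0: dispersed).
import numpy as np

def circular_summary(angles_rad):
    """Return (circular mean direction in radians, mean resultant length R)."""
    C = np.mean(np.cos(angles_rad))
    S = np.mean(np.sin(angles_rad))
    mean_direction = np.arctan2(S, C)
    R = np.hypot(C, S)
    return mean_direction, R

# Example: angles clustered around pi/4 (wind directions, say)
rng = np.random.default_rng(1)
angles = np.pi / 4 + 0.3 * rng.standard_normal(200)
mu, R = circular_summary(angles)
print(f"mean direction = {mu:.3f} rad, resultant length R = {R:.3f}")
```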
Hyper-Spectral Image Analysis with Partially-Latent Regression and Spatial Markov Dependencies
Hyper-spectral data can be analyzed to recover physical properties at large
planetary scales. This involves resolving inverse problems which can be
addressed within machine learning, with the advantage that, once a relationship
between physical parameters and spectra has been established in a data-driven
fashion, the learned relationship can be used to estimate physical parameters
for new hyper-spectral observations. Within this framework, we propose a
spatially-constrained and partially-latent regression method which maps
high-dimensional inputs (hyper-spectral images) onto low-dimensional responses
(physical parameters such as the local chemical composition of the soil). The
proposed regression model comprises two key features. Firstly, it combines a
Gaussian mixture of locally-linear mappings (GLLiM) with a partially-latent
response model. While the former makes high-dimensional regression tractable,
the latter makes it possible to deal with physical parameters that cannot be observed or,
more generally, with data contaminated by experimental artifacts that cannot be
explained with noise models. Secondly, spatial constraints are introduced in
the model through a Markov random field (MRF) prior which provides a spatial
structure to the Gaussian-mixture hidden variables. Experiments conducted on a
database of remotely sensed observations of Mars collected by the Mars
Express orbiter demonstrate the effectiveness of the proposed model.
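The following sketch illustrates only the locally-linear-mapping idea behind GLLiM in the forward direction: each mixture component carries an affine map, and predictions blend the maps by component responsibility. The actual method is an inverse-regression EM procedure with partially-latent responses; all names and shapes here are assumptions for illustration.

```python
# Minimal sketch of the locally-linear mapping idea (assumed notation; not
# the paper's inverse-regression EM). Component k carries an affine map
# y = A_k x + b_k; the prediction blends maps by responsibility p(k | x).
import numpy as np

def predict_gllim_style(x, weights, means, covs, A, b):
    """Blend K affine maps by the posterior responsibility of each component.

    x:       (d,) input (e.g., a spectrum)
    weights: (K,) mixture weights;  means: (K, d);  covs: (K, d, d)
    A:       (K, p, d) affine slopes;  b: (K, p) offsets
    """
    K = weights.shape[0]
    log_r = np.empty(K)
    for k in range(K):
        diff = x - means[k]
        _, logdet = np.linalg.slogdet(covs[k])
        quad = diff @ np.linalg.solve(covs[k], diff)
        log_r[k] = np.log(weights[k]) - 0.5 * (logdet + quad)
    r = np.exp(log_r - log_r.max())
    r /= r.sum()                          # responsibilities p(k | x)
    return sum(r[k] * (A[k] @ x + b[k]) for k in range(K))

# Tiny example: K=2 components, d=3 inputs, p=1 output
rng = np.random.default_rng(2)
w = np.array([0.5, 0.5]); mu = rng.normal(size=(2, 3))
Sig = np.stack([np.eye(3)] * 2)
A = rng.normal(size=(2, 1, 3)); b = np.zeros((2, 1))
print(predict_gllim_style(rng.normal(size=3), w, mu, Sig, A, b))
```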
Caveats for information bottleneck in deterministic scenarios
Information bottleneck (IB) is a method for extracting information from one random variable X that is relevant for predicting another random variable Y. To do so, IB identifies an intermediate "bottleneck" variable T that has low mutual information I(X;T) and high mutual information I(Y;T). The "IB curve" characterizes the set of bottleneck variables that achieve maximal I(Y;T) for a given I(X;T), and is typically explored by maximizing the "IB Lagrangian", L_IB := I(Y;T) − β·I(X;T). In some cases, Y is a deterministic function of X, including many classification problems in supervised learning where the output class Y is a deterministic function of the input X. We demonstrate three caveats when using IB in any situation where Y is a deterministic function of X: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of β; (2) there are "uninteresting" trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. We also show that when Y is a small perturbation away from being a deterministic function of X, these three caveats arise in an approximate way. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the three caveats on the MNIST dataset.
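To see why the deterministic setting is special, note that when Y = f(X) the relevant information saturates at I(X;Y) = H(Y), which is the source of the caveats above. The sketch below verifies this identity numerically on a toy joint distribution; it is purely illustrative and not taken from the paper.

```python
# Minimal sketch: when Y = f(X) is deterministic, I(X;Y) = H(Y).
# We compute H(Y) and I(X;Y) from a discrete joint distribution to check.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    px = p_xy.sum(axis=1)
    py = p_xy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(p_xy.ravel())

# X uniform on {0,1,2,3}; Y = f(X) = X mod 2 (deterministic)
p_xy = np.zeros((4, 2))
for x in range(4):
    p_xy[x, x % 2] = 0.25
print(mutual_information(p_xy), entropy(p_xy.sum(axis=0)))  # both 1.0 bit
```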