Consistency of maximum likelihood estimation for some dynamical systems
We consider the asymptotic consistency of maximum likelihood parameter
estimation for dynamical systems observed with noise. Under suitable conditions
on the dynamical systems and the observations, we show that maximum likelihood
parameter estimation is consistent. Our proof involves ideas from both
information theory and dynamical systems. Furthermore, we show how some
well-studied properties of dynamical systems imply the general statistical
properties related to maximum likelihood estimation. Finally, we exhibit
classical families of dynamical systems for which maximum likelihood estimation
is consistent. Examples include shifts of finite type with Gibbs measures and
Axiom A attractors with SRB measures.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1259 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
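To make the consistency claim concrete, here is a minimal sketch of the setup: a parameterized map observed with Gaussian noise, with the parameter recovered by maximizing the likelihood. The logistic map, noise level, and parameter range are illustrative assumptions, not the paper's construction.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def orbit(theta, x0=0.3, n=200):
    """Iterate the (assumed) map x -> theta * x * (1 - x)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = theta * x[t - 1] * (1 - x[t - 1])
    return x

def neg_log_lik(theta, y, obs_std=0.1):
    """Negative Gaussian log-likelihood of noisy observations of the orbit."""
    return 0.5 * np.sum((y - orbit(theta, n=len(y))) ** 2) / obs_std ** 2

rng = np.random.default_rng(0)
y = orbit(theta=2.5) + 0.1 * rng.standard_normal(200)   # true theta = 2.5

res = minimize_scalar(neg_log_lik, bounds=(2.0, 3.0), args=(y,), method="bounded")
print(f"estimated theta: {res.x:.4f}")   # consistency: -> 2.5 as n grows
```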
Analyticity of Entropy Rates of Continuous-State Hidden Markov Models
The analyticity of the entropy and relative entropy rates of continuous-state
hidden Markov models is studied here. Using the analytic continuation principle
and the stability properties of the optimal filter, the analyticity of these
rates is shown for analytically parameterized models. The obtained results hold
under relatively mild conditions and cover several classes of hidden Markov
models encountered in practice. These results are relevant for several
(theoretically and practically) important problems arising in statistical
inference, system identification and information theory.
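As a concrete illustration, the entropy rate of an HMM can be estimated by running the optimal filter and averaging the per-step predictive log-likelihoods; by the Shannon-McMillan-Breiman theorem this average converges to the (differential) entropy rate. The sketch below simplifies to a two-state chain with Gaussian emissions, an assumption made only for brevity; the paper's setting is continuous-state.

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.9, 0.1], [0.2, 0.8]])    # transition matrix (illustrative)
means, std = np.array([-1.0, 1.0]), 0.5   # Gaussian emission parameters

def emission_density(y):
    return np.exp(-0.5 * ((y - means) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Simulate an observation sequence.
n, s, ys = 50_000, 0, np.empty(50_000)
for t in range(n):
    s = rng.choice(2, p=P[s])
    ys[t] = means[s] + std * rng.standard_normal()

# Forward (optimal) filter: accumulate log p(y_t | y_1, ..., y_{t-1}).
pi, log_lik = np.array([0.5, 0.5]), 0.0
for y in ys:
    joint = (pi @ P) * emission_density(y)   # predict state, weight by emission
    log_lik += np.log(joint.sum())
    pi = joint / joint.sum()                 # updated filter distribution

print(f"estimated entropy rate: {-log_lik / n:.4f} nats/step")
```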
Convergence and Convergence Rate of Stochastic Gradient Search in the Case of Multiple and Non-Isolated Extrema
The asymptotic behavior of stochastic gradient algorithms is studied. Relying
on results from differential geometry (Lojasiewicz gradient inequality), the
single limit-point convergence of the algorithm iterates is demonstrated and
relatively tight bounds on the convergence rate are derived. In sharp contrast
to the existing asymptotic results, the new results presented here allow the
objective function to have multiple and non-isolated minima. The results also
offer new insights into the asymptotic properties of several classes of
recursive algorithms that are routinely used in engineering, statistics,
machine learning and operations research.
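A minimal sketch of the setting the abstract describes (the objective and constants below are illustrative): stochastic gradient descent on a polynomial objective whose minimum set is the whole unit circle, so no minimizer is isolated, yet the iterates still settle on a single limit point.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(w):
    """Gradient of f(w) = (|w|^2 - 1)^2; the minima form the unit circle,
    a connected set of non-isolated minima. Being polynomial, f satisfies
    the Lojasiewicz gradient inequality."""
    return 4.0 * (w @ w - 1.0) * w

w = np.array([2.0, -1.5])
for k in range(1, 100_001):
    a_k = 0.5 / k ** 0.75                    # sum a_k = inf, sum a_k^2 < inf
    noise = 0.1 * rng.standard_normal(2)     # zero-mean gradient noise
    w = w - a_k * (grad(w) + noise)

print(f"limit point: {w}, |w| = {np.linalg.norm(w):.4f}")   # |w| ~ 1
```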
Nonlinear Information Bottleneck
Information bottleneck (IB) is a technique for extracting information in one
random variable X that is relevant for predicting another random variable Y.
IB works by encoding X in a compressed "bottleneck" random variable M from
which Y can be accurately decoded. However, finding the optimal bottleneck
variable involves a difficult optimization problem, which until recently has
been considered for only two limited cases: discrete X and Y with small state
spaces, and continuous X and Y with a Gaussian joint distribution (in which
case optimal encoding and decoding maps are linear). We propose a method for
performing IB on arbitrarily-distributed discrete and/or continuous X and Y,
while allowing for nonlinear encoding and decoding maps. Our approach relies
on a novel non-parametric upper bound for mutual information. We describe how
to implement our method using neural networks. We then show that it achieves
better performance than the recently-proposed "variational IB" method on
several real-world datasets.
Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals
Reconstruction of the three-dimensional geometry of a visual scene using the
binocular disparity information is an important issue in computer vision and
mobile robotics, which can be formulated as a Bayesian inference problem.
However, computation of the full disparity distribution with an advanced
Bayesian model is usually an intractable problem, and proves computationally
challenging even with a simple model. In this paper, we show how probabilistic
hardware using distributed memory and an alternative representation of data as
stochastic bitstreams can solve that problem with high performance and energy
efficiency. We put forward a way to express discrete probability distributions
using stochastic data representations and perform Bayesian fusion with those
representations, and show how that approach can be applied to disparity
computation. We evaluate the system using a simulated stochastic implementation
and discuss possible hardware implementations of such architectures and their
potential for sensorimotor processing and robotics.
Comment: Preprint of article submitted for publication in the International
Journal of Approximate Reasoning and accepted pending minor revision.
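To illustrate the core idea, here is a toy sketch (values and scale are illustrative, not from the paper): a probability is encoded as a Bernoulli bitstream, the product of two likelihoods needed for Bayesian fusion becomes a bitwise AND of their streams, and normalization reduces to counting ones.

```python
import numpy as np

rng = np.random.default_rng(2)

def bitstream(p, n=100_000):
    """Encode probability p as a stochastic bitstream: each bit is 1 w.p. p."""
    return rng.random(n) < p

# Two sensors' likelihoods over three disparity hypotheses (illustrative).
lik_a = np.array([0.8, 0.4, 0.1])
lik_b = np.array([0.7, 0.5, 0.2])

# Fusion: a product of probabilities is a bitwise AND of their bitstreams.
counts = np.array([np.sum(bitstream(pa) & bitstream(pb))
                   for pa, pb in zip(lik_a, lik_b)])

posterior = counts / counts.sum()   # normalization by counting
print("stochastic posterior:", np.round(posterior, 3))
print("exact posterior:     ", np.round(lik_a * lik_b / (lik_a * lik_b).sum(), 3))
```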
Improvements to deep convolutional neural networks for LVCSR
Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Neural
Networks (DNNs), as they are better able to reduce spectral variation in the
input signal. This has also been confirmed experimentally, with CNNs showing
relative improvements in word error rate (WER) of 4-12% over DNNs
across a variety of LVCSR tasks. In this paper, we describe different methods
to further improve CNN performance. First, we conduct a deep analysis comparing
limited weight sharing and full weight sharing with state-of-the-art features.
Second, we apply various pooling strategies that have shown improvements in
computer vision to an LVCSR speech task. Third, we introduce a method to
effectively incorporate speaker adaptation, namely fMLLR, into log-mel
features. Fourth, we introduce an effective strategy to use dropout during
Hessian-free sequence training. We find that with these improvements,
particularly with fMLLR and dropout, we are able to achieve an additional 2-3%
relative improvement in WER on a 50-hour Broadcast News task over our previous
best CNN baseline. On a larger 400-hour BN task, we find an additional 4-5%
relative improvement over our previous best CNN baseline.
Comment: 6 pages, 1 figure.
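For orientation, a toy CNN over log-mel input with frequency-only pooling and dropout might look as follows; every size here is an assumption for the sketch, not the paper's architecture, and fMLLR and Hessian-free sequence training are omitted.

```python
import torch
import torch.nn as nn

class ToyAcousticCNN(nn.Module):
    """Illustrative CNN over log-mel features (all sizes are assumptions)."""
    def __init__(self, n_mel=40, n_frames=11, n_states=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(9, 9), padding=4),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(3, 1)),   # pool along frequency only
            nn.Conv2d(64, 64, kernel_size=(3, 3), padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Dropout(p=0.5),                  # dropout before the classifier
            nn.Linear(64 * (n_mel // 3) * n_frames, n_states),
        )

    def forward(self, x):                       # x: (batch, 1, n_mel, n_frames)
        return self.net(x)

logits = ToyAcousticCNN()(torch.randn(8, 1, 40, 11))
print(logits.shape)                             # torch.Size([8, 512])
```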
Simulated maximum likelihood for general stochastic volatility models: a change of variable approach
Maximum likelihood has proved to be a valuable tool for fitting the log-normal
stochastic volatility model to financial returns time series. Using a
sequential change-of-variable framework, we are able to cast more general
stochastic volatility models into a form appropriate for importance samplers
based on the Laplace approximation. We apply the methodology to two example
models, showing that efficient importance samplers can be constructed even for
highly non-Gaussian latent processes such as square-root diffusions.
Keywords: Change of Variable; Heston Model; Laplace Importance Sampler;
Simulated Maximum Likelihood; Stochastic Volatility
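A minimal sketch of the simulated-likelihood idea, with one deliberate simplification: latent volatility paths are proposed from the prior transition density rather than from the paper's Laplace approximation to the posterior (which is what makes the sampler efficient); all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Log-normal SV model: h_t = mu + phi*(h_{t-1} - mu) + sig*eta_t,
#                      y_t = exp(h_t / 2) * eps_t    (illustrative parameters).
mu, phi, sig, T = -1.0, 0.95, 0.2, 300

def simulate_returns():
    h = np.empty(T)
    h[0] = mu + sig / np.sqrt(1 - phi ** 2) * rng.standard_normal()
    for t in range(1, T):
        h[t] = mu + phi * (h[t - 1] - mu) + sig * rng.standard_normal()
    return np.exp(h / 2) * rng.standard_normal(T)

def sim_log_lik(y, n_draws=2000):
    """Importance-sampling estimate of log p(y), proposing latent paths from
    the prior; a Laplace importance sampler would centre the proposal at the
    posterior mode instead, greatly reducing variance."""
    h = np.empty((n_draws, T))
    h[:, 0] = mu + sig / np.sqrt(1 - phi ** 2) * rng.standard_normal(n_draws)
    for t in range(1, T):
        h[:, t] = mu + phi * (h[:, t - 1] - mu) + sig * rng.standard_normal(n_draws)
    # log N(y_t; 0, exp(h_t)) summed over t, per draw
    ll = -0.5 * np.sum(np.log(2 * np.pi) + h + y ** 2 * np.exp(-h), axis=1)
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))   # stable log-mean-exp

y = simulate_returns()
print(f"simulated log-likelihood: {sim_log_lik(y):.2f}")
```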