About the posterior distribution in hidden Markov models with unknown number of states
We consider finite state space stationary hidden Markov models (HMMs) in the
situation where the number of hidden states is unknown. We provide a
frequentist asymptotic evaluation of Bayesian analysis methods. Our main result
gives posterior concentration rates for the marginal densities, that is for the
density of a fixed number of consecutive observations. Using conditions on the
prior, we are then able to define a consistent Bayesian estimator of the number
of hidden states. It is known that the likelihood ratio test statistic for
overfitted HMMs has a nonstandard behaviour and is unbounded. Our conditions on
the prior may be seen as a way to penalize parameters to avoid this phenomenon.
Inference of parameters is a much more difficult task than inference of
marginal densities; we nevertheless provide a precise description of the
situation when the observations are i.i.d. and overfitting in the number of
hidden states is allowed.
Comment: Published in the Bernoulli (http://isi.cbs.nl/bernoulli/) at
http://dx.doi.org/10.3150/13-BEJ550 by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
Adaptive estimation of High-Dimensional Signal-to-Noise Ratios
We consider the equivalent problems of estimating the residual variance, the
proportion of explained variance and the signal strength in a
high-dimensional linear regression model with Gaussian random design. Our aim
is to understand the impact of not knowing the sparsity of the regression
parameter and not knowing the distribution of the design on the minimax
estimation rates of these quantities. Depending on the sparsity of the
regression parameter, optimal estimators either rely on estimating the
regression parameter or are based on U-type statistics, and have minimax
rates depending on the sparsity. In the important situation where the
sparsity is unknown, we build an adaptive procedure whose convergence rate
simultaneously achieves the minimax risk over all sparsity levels up to a
logarithmic loss which we prove to be unavoidable. Finally, the knowledge of
the design distribution is shown to play a critical role. When the
distribution of the design is unknown, consistent estimation of the explained
variance is indeed possible in much narrower regimes than for a known design
distribution.
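To illustrate the idea of estimating the explained variance without first estimating the regression parameter itself, here is a minimal method-of-moments sketch for the model y = Xβ + ε with i.i.d. standard Gaussian design and known identity design covariance. This is an illustration only, not the paper's adaptive procedure; the function name and the synthetic check are made up, and the two moment identities used are standard facts about Gaussian/Wishart moments.

```python
import random

def estimate_snr(X, y):
    """Method-of-moments estimates of the signal strength tau = ||beta||^2,
    the residual variance sigma^2, and the proportion of explained variance
    eta = tau / (tau + sigma^2), assuming rows of X are i.i.d. N(0, I_p).

    Uses the identities  E||y||^2     = n (sigma^2 + tau)          and
                         E||X^T y||^2 = n ((n + p + 1) tau + p sigma^2).
    """
    n, p = len(X), len(X[0])
    m1 = sum(v * v for v in y) / n                        # ~ sigma^2 + tau
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    m2 = sum(v * v for v in xty) / n                      # ~ (n+p+1) tau + p sigma^2
    tau = (m2 - p * m1) / (n + 1)                         # solve the two identities
    sigma2 = m1 - tau
    return tau, sigma2, tau / (tau + sigma2)

# synthetic check with known truth: ||beta||^2 = 0.9, sigma^2 = 1.0
random.seed(0)
n, p = 2000, 100
beta = [0.3] * 10 + [0.0] * (p - 10)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, beta)) + random.gauss(0, 1.0) for row in X]
tau_hat, sigma2_hat, eta_hat = estimate_snr(X, y)
```

Note that this estimator never recovers β coordinate-wise, which is why such quadratic-statistic approaches remain usable in the dense regimes where estimating the regression parameter itself is hopeless.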
Non parametric finite translation mixtures with dependent regime
In this paper we consider nonparametric finite translation mixtures. We
prove that all the parameters of the model are identifiable as soon as the
matrix that defines the joint distribution of two consecutive latent variables
is nonsingular and the translation parameters are distinct. Under this
assumption, we provide a consistent estimator of the number of populations, of
the translation parameters and of the distribution of two consecutive latent
variables, which we prove to be asymptotically normally distributed under mild
dependency assumptions. We propose a nonparametric estimator of the unknown
translated density. In case the latent variables form a Markov chain (hidden
Markov models), we prove an oracle inequality leading to the fact that this
estimator is minimax adaptive over regularity classes of densities.
About adaptive coding on countable alphabets
This paper sheds light on universal coding with respect to classes of
memoryless sources over a countable alphabet defined by an envelope function
with finite and non-decreasing hazard rate. We prove that the auto-censuring
(AC) code introduced by Bontemps (2011) is adaptive with respect to the
collection of such classes. The analysis builds on the tight characterization
of universal redundancy rates in terms of metric entropy of small source
classes by Opper and Haussler (1997), and on a careful analysis of the
performance of the AC-coding algorithm. The latter relies on non-asymptotic
bounds for maxima of samples from discrete distributions with finite and
non-decreasing hazard rate.
Identifiability and consistent estimation of nonparametric translation hidden Markov models with general state space
This paper considers hidden Markov models where the observations are given as
the sum of a latent state, which lies in a general state space, and some
independent noise with unknown distribution. It is shown that these fully
nonparametric translation models are identifiable with respect to both the
distribution of the latent variables and the distribution of the noise, under
essentially a light-tail assumption on the latent variables. Two nonparametric
estimation methods are proposed and we prove that the corresponding estimators
are consistent for the weak convergence topology. These results are illustrated
with numerical experiments.
Efficient semiparametric estimation and model selection for multidimensional mixtures
In this paper, we consider nonparametric multidimensional finite mixture
models and we are interested in the semiparametric estimation of the population
weights. Here, the i.i.d. observations are assumed to have at least three
components which are independent given the population. We approximate the
semiparametric model by projecting the conditional distributions on step
functions associated with some partition. Our first main result is that if we
refine the partition slowly enough, the associated sequence of maximum
likelihood estimators of the weights is asymptotically efficient, and the
posterior distribution of the weights, when using a Bayesian procedure,
satisfies a semiparametric Bernstein–von Mises theorem. We then propose a
cross-validation-like procedure to select the partition within a finite
horizon. Our second main result is that the proposed procedure satisfies an
oracle inequality. Numerical experiments on simulated data illustrate our
theoretical results.
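The projection step above can be made concrete with a small sketch: after binning each coordinate with a fixed partition (the step functions), the model becomes a finite mixture of products of multinomials, for which the weights can be fitted by maximum likelihood via EM. This is only a minimal illustration under simplifying assumptions (a fixed partition rather than one selected by cross-validation, two populations, a crude sign-based initialization); all names and constants are illustrative.

```python
import random, math

def make_bins(lo, hi, B):
    """Fixed partition of [lo, hi] into B equal cells (the step-function grid)."""
    width = (hi - lo) / B
    return lambda x: min(B - 1, max(0, int((x - lo) / width)))

def em_weights(data, B=8, lo=-6.0, hi=6.0, K=2, iters=50):
    """EM for the population weights of a K-component mixture whose coordinates
    are conditionally independent, after projecting each conditional law onto
    step functions (i.e. replacing each coordinate by its bin index)."""
    bin_of = make_bins(lo, hi, B)
    binned = [[bin_of(v) for v in x] for x in data]
    n, C = len(binned), len(binned[0])
    # crude data-driven initialization: split on the sign of the first coordinate
    resp = [[1.0, 0.0] if data[i][0] < 0 else [0.0, 1.0] for i in range(n)]
    w = [1.0 / K] * K
    q = [[[1.0 / B] * B for _ in range(C)] for _ in range(K)]
    for _ in range(iters):
        # M-step: update weights and per-coordinate bin probabilities
        for k in range(K):
            nk = sum(resp[i][k] for i in range(n))
            w[k] = nk / n
            for c in range(C):
                counts = [1e-9] * B              # tiny floor avoids empty cells
                for i in range(n):
                    counts[binned[i][c]] += resp[i][k]
                tot = sum(counts)
                q[k][c] = [v / tot for v in counts]
        # E-step: responsibilities under the binned (multinomial-product) model
        for i in range(n):
            lik = [w[k] * math.prod(q[k][c][binned[i][c]] for c in range(C))
                   for k in range(K)]
            s = sum(lik)
            resp[i] = [v / s for v in lik]
    return w

# synthetic check: two populations with weights (0.3, 0.7), three coordinates
random.seed(1)
data = []
for _ in range(1500):
    mu = -2.0 if random.random() < 0.3 else 2.0
    data.append([random.gauss(mu, 1.0) for _ in range(3)])
w_hat = em_weights(data)
```

The point of the projection is that the weights become a finite-dimensional parameter of a fully parametric (multinomial) model, so standard likelihood machinery applies; the papers' results concern how fast the partition may be refined while keeping the weight estimates efficient.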