Variational excess risk bound for general state space models
We consider variational autoencoders (VAEs) for general state space models and analyze the excess risk associated with them using a backward factorization of the variational distributions. Such backward factorizations were recently proposed to perform online variational learning and to obtain upper bounds on the variational estimation error. When independent trajectories of sequences are observed, and under strong mixing assumptions on the state space model and on the variational distribution, we provide an oracle inequality that is explicit in the number of samples and in the length of the observation sequences. We then derive consequences of this theoretical result. In particular, when the data distribution is given by a state space model, we provide upper bounds for the Kullback-Leibler divergence between the data distribution and its estimator, and between the variational posterior and the estimated state space posterior distributions. Under classical assumptions, we prove that our results apply to Gaussian backward kernels built with dense and recurrent neural networks.
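As a rough sketch (in generic notation of our own; the paper's exact parameterization may differ), the backward factorization mentioned above writes the variational smoothing distribution as a product of backward kernels:

```latex
% Backward factorization of the variational distribution over latent
% states x_{0:T} given observations y_{0:T} (generic notation, ours):
\[
  q_\phi(x_{0:T} \mid y_{0:T})
  = q_\phi(x_T \mid y_{0:T}) \prod_{t=0}^{T-1} q_\phi(x_t \mid x_{t+1}, y_{0:t}),
\]
% mirroring the backward decomposition of the exact joint smoothing
% distribution, with each backward kernel parameterized by \phi (e.g. a
% Gaussian whose mean and covariance are produced by a neural network,
% as in the Gaussian backward kernels discussed in the abstract).
```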
Universal coding and order identification by model selection methods
The purpose of these notes is to highlight the far-reaching connections between Information Theory and Statistics. Universal coding and adaptive compression are indeed closely related to statistical inference concerning processes and using maximum likelihood or Bayesian methods. The book is divided into four chapters, the first of which introduces readers to lossless coding, provides an intrinsic lower bound on the codeword length in terms of Shannon's entropy, and presents some coding methods that can achieve this lower bound, provided the source distribution is known. In turn, Chapter 2 addresses universal coding on finite alphabets, and seeks to find coding procedures that can achieve the optimal compression rate, regardless of the source distribution. It also quantifies the speed of convergence of the compression rate to the source entropy rate. These powerful results do not extend to infinite alphabets. In Chapter 3, it is shown that there are no universal codes over the class of stationary ergodic sources over a countable alphabet. This negative result prompts at least two different approaches: the introduction of smaller sub-classes of sources known as envelope classes, over which adaptive coding may be feasible, and the redefinition of the performance criterion by focusing on compressing the message pattern. Finally, Chapter 4 deals with the question of order identification in statistics. This question belongs to the class of model selection problems and arises in various practical situations in which the goal is to identify an integer characterizing the model: the length of dependency for a Markov chain, the number of hidden states for a hidden Markov chain, and the number of populations for a population mixture. The coding ideas and techniques developed in the previous chapters allow us to obtain new results in this area. This book is accessible to anyone with a graduate-level background in mathematics, and will appeal to information theoreticians and mathematical statisticians alike. Except for Chapter 4, all proofs are detailed and all tools needed to understand the text are reviewed.
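A toy illustration (our own, not taken from the book) of the Chapter 1 entropy lower bound: for a known source distribution p, the Shannon code assigns length ceil(-log2 p(x)) to symbol x, satisfies the Kraft inequality, and has an expected length in [H(p), H(p) + 1).

```python
import math

# Source distribution over a 4-symbol alphabet (dyadic, so the bound is tight).
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Shannon entropy H(p): the intrinsic lower bound on expected codeword length.
entropy = -sum(px * math.log2(px) for px in p.values())

# Shannon code lengths: ceil(-log2 p(x)) bits for symbol x.
lengths = {x: math.ceil(-math.log2(px)) for x, px in p.items()}

kraft = sum(2.0 ** -l for l in lengths.values())   # Kraft sum, must be <= 1
expected_length = sum(p[x] * lengths[x] for x in p)

print(f"H(p) = {entropy:.3f} bits")                # 1.750
print(f"Kraft sum = {kraft:.3f}")                  # 1.000
print(f"E[length] = {expected_length:.3f} bits")   # 1.750, matching H(p)
```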
Blind deconvolution
Considering a signal X which is a process of independent and identically distributed random variables, and the signal Y obtained by filtering X through a linear system s, we study the estimation of s from the observation of Y in the following semi-parametric situation: the law of X is unknown and non-Gaussian, and s has a convolution inverse of finite, fixed length. We need no assumption on the phase of the system, i.e. on the causality or non-causality of s. We propose an estimation by objective maximization. The estimators are consistent and asymptotically Gaussian; this result remains valid whatever the dimension of the index space of the series. We study the asymptotic efficiency of the estimator and, in the causal case, we compare it to the usual least squares estimators. The output Y being an autoregressive field, we propose a consistent method for identifying the order of the model. We study different types of robustness: robustness to under-parameterization, and robustness to additive noise on the observations. We also investigate the case where the law of X has infinite moments, and we show that, for "standardized cumulants" as objectives, and under assumptions which are verified in particular for laws in the domains of attraction of stable laws, the obtained estimators are still consistent, and the speed of convergence is, in the causal case, better than for laws with finite variance.
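A minimal sketch (our own toy example, not the paper's exact estimator) of estimation by objective maximization in this setting: fit a finite-length inverse filter g so that the deconvolved output g * Y maximizes a standardized cumulant, here the absolute excess kurtosis, a classic objective when X is i.i.d. and non-Gaussian.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

n, inv_len = 20_000, 5
x = rng.laplace(size=n)                      # non-Gaussian i.i.d. source
s = np.array([1.0, 0.6, 0.2])                # unknown linear system
y = np.convolve(x, s, mode="same")           # observed signal

def neg_abs_kurtosis(g):
    # Deconvolve, standardize, and return minus |excess kurtosis|:
    # the standardized 4th cumulant vanishes iff the output is Gaussian-like.
    z = np.convolve(y, g, mode="same")
    z = (z - z.mean()) / z.std()
    return -abs(np.mean(z ** 4) - 3.0)

g0 = np.zeros(inv_len)
g0[0] = 1.0                                  # start from the identity filter
res = minimize(neg_abs_kurtosis, g0, method="Nelder-Mead")
g_hat = res.x / np.linalg.norm(res.x)        # the scale is unidentifiable
print("estimated inverse filter:", np.round(g_hat, 3))
```

Note that the scale (and, without phase assumptions, the delay) of the filter is unidentifiable from Y alone, which is why the estimate is normalized before inspection.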
Deconvolution of spherical data corrupted with unknown noise
We consider the deconvolution problem for densities supported on a (d-1)-dimensional sphere with unknown center and unknown radius, in the situation where the distribution of the noise is unknown and without any other observations. We propose estimators of the radius, of the center, and of the density of the signal on the sphere that are proved consistent without further information. The estimator of the radius is proved to have an almost parametric convergence rate for any dimension d. When d = 2, the estimator of the density is proved to achieve the same rate of convergence over Sobolev regularity classes of densities as when the noise distribution is known.
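A simulation of the observation model described above (our own toy setup): the signal lives on a sphere with unknown center c and radius r, and only a noise-corrupted version is observed. The naive baselines at the end are NOT the paper's estimators, which require no assumption on the noise; they merely show what breaks without the paper's theory.

```python
import numpy as np

rng = np.random.default_rng(1)

d, n = 3, 5_000
c, r = np.array([2.0, -1.0, 0.5]), 1.5         # unknown center and radius

u = rng.normal(size=(n, d))
u /= np.linalg.norm(u, axis=1, keepdims=True)  # uniform on the unit sphere
eps = rng.laplace(scale=0.1, size=(n, d))      # noise, distribution unknown
y = c + r * u + eps                            # the only available data

# Naive baselines: with centered noise the sample mean recovers c, while
# the mean distance to it over-estimates r by a noise-dependent bias.
c_hat = y.mean(axis=0)
r_hat = np.linalg.norm(y - c_hat, axis=1).mean()
print("center estimate:", np.round(c_hat, 2), " radius estimate:", round(r_hat, 2))
```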
Support and distribution inference from noisy data
We consider noisy observations of a distribution with unknown support. In the deconvolution model, it has been proved recently [19] that, under very mild assumptions, it is possible to solve the deconvolution problem without knowing the noise distribution and with no sample of the noise. We first give general settings where the theory applies and provide classes of supports that can be recovered in this context. We then exhibit classes of distributions over which we prove adaptive minimax rates (up to a log-log factor) for the estimation of the support in Hausdorff distance. Moreover, for the class of distributions with compact support, we provide estimators of the unknown (in general singular) distribution and prove rates of convergence in Wasserstein distance. We also prove an almost matching lower bound on the associated minimax risk.
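A minimal sketch (our own) of the Hausdorff distance used above as the performance criterion for support estimation, computed here between two finite point clouds standing in for the true and estimated supports.

```python
import numpy as np

def hausdorff(A: np.ndarray, B: np.ndarray) -> float:
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    # max( sup_a inf_b d(a, b), sup_b inf_a d(a, b) ).
    return max(dists.min(axis=1).max(), dists.min(axis=0).max())

# Example: a circle versus a jittered copy of itself.
ts = np.linspace(0.0, 2.0 * np.pi, 200)
circle = np.stack([np.cos(ts), np.sin(ts)], axis=1)
noisy = circle + 0.05 * np.random.default_rng(2).normal(size=circle.shape)
print(f"Hausdorff distance: {hausdorff(circle, noisy):.3f}")
```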
Additive smoothing error in backward variational inference for general state-space models
We consider the problem of state estimation in general state-space models using variational inference. For a generic variational family defined using the same backward decomposition as the actual joint smoothing distribution, we establish under mixing assumptions that the variational approximation of expectations of additive state functionals induces an error which grows at most linearly in the number of observations. This guarantee is consistent with the known upper bounds for the approximation of smoothing distributions using standard Monte Carlo methods. We illustrate our theoretical result with state-of-the-art variational solutions based both on the backward parameterization and on alternatives using forward decompositions. This numerical study proposes guidelines for variational inference based on neural networks in state-space models.
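In generic notation (ours, not necessarily the paper's), the additive state functionals discussed above decompose as a sum of local terms, and the result bounds the variational smoothing error for such functionals by a constant times the number of observations:

```latex
% An additive state functional is a sum of local terms over time steps:
\[
  h(x_{0:T}) = \sum_{t=0}^{T-1} \tilde h_t(x_t, x_{t+1}),
\]
% and, under mixing assumptions, the gap between the variational and exact
% smoothing expectations of h grows at most linearly in T:
\[
  \bigl| \mathbb{E}_{q_\phi}\!\left[ h(x_{0:T}) \mid y_{0:T} \right]
       - \mathbb{E}_{p_\theta}\!\left[ h(x_{0:T}) \mid y_{0:T} \right] \bigr|
  \le c \, (T + 1).
\]
```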