
    Variational excess risk bound for general state space models

    Full text link
    In this paper, we consider variational autoencoders (VAE) for general state space models. We consider a backward factorization of the variational distributions to analyze the excess risk associated with VAE. Such backward factorizations were recently proposed to perform online variational learning and to obtain upper bounds on the variational estimation error. When independent trajectories of sequences are observed and under strong mixing assumptions on the state space model and on the variational distribution, we provide an oracle inequality explicit in the number of samples and in the length of the observation sequences. We then derive consequences of this theoretical result. In particular, when the data distribution is given by a state space model, we provide upper bounds for the Kullback-Leibler divergence between the data distribution and its estimator, and between the variational posterior and the estimated state space posterior distributions. Under classical assumptions, we prove that our results can be applied to Gaussian backward kernels built with dense and recurrent neural networks.
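
    For orientation (a sketch in notation chosen here, not taken from the abstract), the backward factorization in question writes the variational smoothing distribution over states $x_{0:T}$ given observations $y_{0:T}$ as
    $$q_\phi(x_{0:T} \mid y_{0:T}) \;=\; q_T(x_T \mid y_{0:T}) \prod_{t=0}^{T-1} q_{t\mid t+1}(x_t \mid x_{t+1}, y_{0:t}),$$
    mirroring the backward decomposition of the exact joint smoothing distribution of a state space model; the Gaussian backward kernels mentioned above correspond to taking each $q_{t\mid t+1}$ Gaussian with mean and covariance parameterized by dense or recurrent networks.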

    Universal coding and order identification by model selection methods

    No full text
    The purpose of these notes is to highlight the far-reaching connections between Information Theory and Statistics. Universal coding and adaptive compression are indeed closely related to statistical inference concerning processes and using maximum likelihood or Bayesian methods. The book is divided into four chapters, the first of which introduces readers to lossless coding, provides an intrinsic lower bound on the codeword length in terms of Shannon’s entropy, and presents some coding methods that can achieve this lower bound, provided the source distribution is known. In turn, Chapter 2 addresses universal coding on finite alphabets, and seeks to find coding procedures that can achieve the optimal compression rate, regardless of the source distribution. It also quantifies the speed of convergence of the compression rate to the source entropy rate. These powerful results do not extend to infinite alphabets. In Chapter 3, it is shown that there are no universal codes over the class of stationary ergodic sources over a countable alphabet. This negative result prompts at least two different approaches: the introduction of smaller sub-classes of sources known as envelope classes, over which adaptive coding may be feasible, and the redefinition of the performance criterion by focusing on compressing the message pattern. Finally, Chapter 4 deals with the question of order identification in statistics. This question belongs to the class of model selection problems and arises in various practical situations in which the goal is to identify an integer characterizing the model: the order of dependency for a Markov chain, the number of hidden states for a hidden Markov chain, and the number of populations for a population mixture. The coding ideas and techniques developed in the previous chapters allow us to obtain new results in this area. This book is accessible to anyone with a graduate-level background in mathematics, and will appeal to information theoreticians and mathematical statisticians alike. Except for Chapter 4, all proofs are detailed and all tools needed to understand the text are reviewed.
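
    As a pointer to the kind of result meant in Chapter 1 (a classical fact, stated here with illustrative notation): for any uniquely decodable binary code with codeword lengths $\ell(x)$ and a source with distribution $P$, the expected codeword length is bounded below by Shannon's entropy,
    $$\mathbb{E}_P[\ell(X)] \;\ge\; H(P) \;=\; -\sum_{x} P(x)\,\log_2 P(x),$$
    and when $P$ is known this bound is achievable to within one bit, e.g. by Huffman coding.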

    Blind deconvolution (Déconvolution aveugle)

    No full text
    We consider a signal X formed of independent and identically distributed random variables, and the signal Y obtained by filtering X through a linear system s. We study the estimation of s from observations of Y in the following semi-parametric setting: the law of X is unknown and non-Gaussian, and s admits a convolution inverse of finite length. No assumption is needed on the phase of the system, i.e. on the causality or non-causality of s. We propose estimation by maximization of an objective function. The estimators are consistent and asymptotically Gaussian; this result remains valid whatever the dimension of the index space of the series. We study the asymptotic efficiency of the estimators and, in the causal case, compare them to the usual least squares estimators. Interpreting the output Y as an autoregressive field, we propose a consistent method for identifying the order of the model. We study several types of robustness: robustness to under-parametrization and robustness to additive noise on the observations. Finally, we investigate the case where the law of X has infinite moments, and show that, for "standardized cumulants" objectives and under assumptions satisfied in particular by laws in the domains of attraction of stable laws, the estimators remain consistent and their rate of convergence in the causal case is better than for laws with finite variance.
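
    Schematically (with symbols chosen here for illustration), the model under study is
    $$Y_t \;=\; (s * X)_t \;=\; \sum_{k} s_k\, X_{t-k},$$
    where the $X_t$ are i.i.d. with unknown non-Gaussian law and $s$ admits a convolution inverse of finite length; the estimator maximizes an objective (contrast) function, for instance one built from the standardized cumulants of the inverse-filtered observations.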

    Deconvolution of spherical data corrupted with unknown noise.

    No full text
    We consider the deconvolution problem for densities supported on a (d-1)-dimensional sphere with unknown center and unknown radius, in the situation where the distribution of the noise is unknown and without any other observations. We propose estimators of the radius, of the center, and of the density of the signal on the sphere that are proved consistent without further information. The estimator of the radius is proved to achieve an almost parametric convergence rate for any dimension d. When d = 2, the estimator of the density is proved to achieve the same rate of convergence over Sobolev regularity classes of densities as when the noise distribution is known.
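
    In schematic form (notation introduced here for illustration), the observations are
    $$Y_i \;=\; c + R\, S_i + \varepsilon_i, \qquad S_i \in \mathbb{S}^{d-1},$$
    where $c$ is the unknown center, $R$ the unknown radius, $S_i$ is drawn from the unknown density on the unit sphere, and the noise $\varepsilon_i$ has an unknown distribution; $c$, $R$ and the density are estimated from the $Y_i$ alone, without any sample of the noise.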

    Support and distribution inference from noisy data

    No full text
    We consider noisy observations of a distribution with unknown support. In the deconvolution model, it has been proved recently [19] that, under very mild assumptions, it is possible to solve the deconvolution problem without knowing the noise distribution and with no sample of the noise. We first give general settings where the theory applies and provide classes of supports that can be recovered in this context. We then exhibit classes of distributions over which we prove adaptive minimax rates (up to a log log factor) for the estimation of the support in Hausdorff distance. Moreover, for the class of distributions with compact support, we provide estimators of the unknown (in general singular) distribution and establish convergence rates in Wasserstein distance. We also prove an almost matching lower bound on the associated minimax risk.
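
    Concretely (notation chosen here for illustration), the deconvolution model is
    $$Y_i \;=\; X_i + \varepsilon_i, \qquad i = 1, \dots, n,$$
    with both the law of the signal $X_i$ (hence its support) and the law of the noise $\varepsilon_i$ unknown; the support is estimated in the Hausdorff distance and the possibly singular distribution of $X_i$ in the Wasserstein distance.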

    Additive smoothing error in backward variational inference for general state-space models

    No full text
    We consider the problem of state estimation in general state-space models using variational inference. For a generic variational family defined using the same backward decomposition as the actual joint smoothing distribution, we establish under mixing assumptions that the variational approximation of expectations of additive state functionals induces an error which grows at most linearly in the number of observations. This guarantee is consistent with the known upper bounds for the approximation of smoothing distributions using standard Monte Carlo methods. We illustrate our theoretical result with state-of-the-art variational solutions based both on the backward parameterization and on alternatives using forward decompositions. This numerical study proposes guidelines for variational inference based on neural networks in state-space models.
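
    The additive state functionals in question can be sketched as follows (notation introduced here for illustration): for a state trajectory $x_{0:T}$,
    $$h_{0:T}(x_{0:T}) \;=\; \sum_{t=0}^{T-1} \tilde{h}_t(x_t, x_{t+1}),$$
    and the result states that, under the mixing assumptions, the error $\big|\mathbb{E}_{q}[h_{0:T}] - \mathbb{E}[h_{0:T} \mid y_{0:T}]\big|$ made by replacing the exact smoothing distribution with its variational approximation grows at most linearly in $T$, matching the known behaviour of standard Monte Carlo smoothing approximations.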