17,234 research outputs found
Density Estimation in Infinite Dimensional Exponential Families
In this paper, we consider an infinite dimensional exponential family $\mathcal{P}$ of probability densities, which are parametrized by functions in a reproducing kernel Hilbert space $H$, and show it to be quite rich in the sense that a broad class of densities on $\mathbb{R}^d$ can be approximated arbitrarily well in Kullback-Leibler (KL) divergence by elements in $\mathcal{P}$. The main goal of the paper is to estimate an unknown density $p_0$ through an element in $\mathcal{P}$. Standard techniques like maximum likelihood estimation (MLE) or pseudo MLE (based on the method of sieves), which are based on minimizing the KL divergence between $p_0$ and $\mathcal{P}$, do not yield practically useful estimators because of their inability to efficiently handle the log-partition function. Instead, we propose an estimator $\hat{p}_n$ based on minimizing the \emph{Fisher divergence} $J(p_0\|p)$ between $p_0$ and $p\in\mathcal{P}$, which involves solving a simple finite-dimensional linear system. When $p_0\in\mathcal{P}$, we show that the proposed estimator is consistent, and provide a convergence rate in Fisher divergence under the smoothness assumption that $\log p_0\in\mathcal{R}(C^\beta)$ for some $\beta>0$, where $C$ is a certain Hilbert-Schmidt operator on $H$ and $\mathcal{R}(C^\beta)$ denotes the image of $C^\beta$. We also investigate the misspecified case of $p_0\notin\mathcal{P}$ and show that $J(p_0\|\hat{p}_n)$ converges to $\inf_{p\in\mathcal{P}}J(p_0\|p)$ as $n\to\infty$, and provide a rate for this convergence under a similar smoothness condition as above. Through numerical simulations we demonstrate that the proposed estimator outperforms the non-parametric kernel density estimator, and that the advantage of the proposed estimator grows as $d$ increases. Comment: 58 pages, 8 figures; fixed some errors and typos
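The reduction of Fisher-divergence minimization to a linear system is easiest to see in a toy finite-dimensional analogue of the setting above. The sketch below is ours, not the paper's RKHS estimator: it score-matches a two-parameter exponential family $p_\theta(x)\propto\exp(\theta_1 x+\theta_2 x^2)$ to samples, where the empirical score-matching objective is quadratic in $\theta$ and the minimizer solves a small linear system. All names and the choice of sufficient statistics are illustrative.

```python
import numpy as np

# Toy analogue of score matching in a *finite-dimensional* exponential family
# p_theta(x) ∝ exp(theta1 * x + theta2 * x^2)  (an unnormalised Gaussian).
# The empirical Fisher-divergence (score matching) objective is quadratic in
# theta, so its minimiser solves a small linear system -- the same phenomenon
# the paper exploits, but with 2 parameters instead of an RKHS function.

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=0.7, size=5000)   # samples from the unknown density

# Sufficient statistics T(x) = (x, x^2) and their first/second derivatives.
dT = np.stack([np.ones_like(x), 2 * x], axis=1)                  # dT/dx, shape (n, 2)
d2T = np.stack([np.zeros_like(x), 2 * np.ones_like(x)], axis=1)  # d^2T/dx^2

# Objective: J(theta) = 1/2 theta^T A theta + b^T theta + const.
A = dT.T @ dT / len(x)          # E[dT dT^T]
b = d2T.mean(axis=0)            # E[d^2 T]
theta = np.linalg.solve(A, -b)  # minimiser of the quadratic objective

# For a Gaussian N(mu, sigma^2), theta = (mu/sigma^2, -1/(2 sigma^2)).
sigma2_hat = -1.0 / (2.0 * theta[1])
mu_hat = theta[0] * sigma2_hat
print(f"estimated mean {mu_hat:.3f}, variance {sigma2_hat:.3f}")
```

Running the sketch recovers the mean and variance of the sampling distribution without ever normalizing the model, which is the practical appeal of minimizing Fisher divergence rather than KL.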
Convergence rates for Bayesian density estimation of infinite-dimensional exponential families
We study the rate of convergence of posterior distributions in density
estimation problems for log-densities in periodic Sobolev classes characterized
by a smoothness parameter p. The posterior expected density provides a
nonparametric estimation procedure attaining the optimal minimax rate of
convergence under Hellinger loss if the posterior distribution achieves the
optimal rate over certain uniformity classes. A prior on the density class of
interest is induced by a prior on the coefficients of the trigonometric series
expansion of the log-density. We show that when p is known, the posterior
distribution of a Gaussian prior achieves the optimal rate provided the prior
variances die off sufficiently rapidly. For a mixture of normal distributions,
the mixing weights on the dimension of the exponential family are assumed to be
bounded below by an exponentially decreasing sequence. To avoid the use of
infinite bases, we develop priors that cut off the series at a
sample-size-dependent truncation point. When the degree of smoothness is
unknown, a finite mixture of normal priors indexed by the smoothness parameter,
which is also assigned a prior, produces the best rate. A rate-adaptive
estimator is derived. Comment: Published at http://dx.doi.org/10.1214/009053606000000911 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
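As a rough illustration of the prior construction described above, the sketch below draws a random log-density from a Gaussian prior on the coefficients of a truncated trigonometric series and renormalizes it numerically. The variance decay and the truncation point used here are our own illustrative choices, not the paper's exact specification.

```python
import numpy as np

# Illustrative draw from a prior of the kind described above: independent
# Gaussian coefficients on a trigonometric expansion of the log-density on
# [0, 1), with variances dying off quickly and the series cut off at a
# sample-size-dependent truncation point.

rng = np.random.default_rng(1)
p = 2.0                                              # smoothness parameter
n = 500                                              # notional sample size
k_max = max(1, int(n ** (1.0 / (2.0 * p + 1.0))))    # truncation point (illustrative)

grid = np.linspace(0.0, 1.0, 1024, endpoint=False)
log_density = np.zeros_like(grid)
for k in range(1, k_max + 1):
    sd = k ** (-(2.0 * p + 1.0) / 2.0)    # rapidly decaying prior std. dev.
    a, b = rng.normal(0.0, sd, size=2)    # cosine and sine coefficients
    log_density += a * np.cos(2 * np.pi * k * grid) + b * np.sin(2 * np.pi * k * grid)

# Exponentiate and renormalise (Riemann sum on the uniform grid) to get a density draw.
density = np.exp(log_density)
density /= density.mean()
print("truncation point:", k_max, "| max density value:", round(float(density.max()), 3))
```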
Kernel Exponential Family Estimation via Doubly Dual Embedding
We investigate penalized maximum log-likelihood estimation for exponential
family distributions whose natural parameter resides in a reproducing kernel
Hilbert space. Key to our approach is a novel technique, doubly dual embedding,
that avoids computation of the partition function. This technique also allows
the development of a flexible sampling strategy that amortizes the cost of
Monte-Carlo sampling in the inference stage. The resulting estimator can be
easily generalized to kernel conditional exponential families. We establish a
connection between kernel exponential family estimation and MMD-GANs, revealing
a new perspective for understanding GANs. Compared to score-matching-based estimators, the proposed method improves both memory and time efficiency while enjoying stronger statistical properties, such as fully capturing smoothness in its statistical convergence rate, where the score matching estimator appears to saturate. Finally, we show that the proposed estimator empirically outperforms the state of the art. Comment: 22 pages, 20 figures; AISTATS 2019
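For context, the dual representation of the log-partition function that approaches of this kind build on is the standard Fenchel (Donsker-Varadhan) identity; we state only the textbook form here, not the paper's specific doubly dual formulation. For a natural parameter $f$ and base density $q_0$,
\[
A(f) \;=\; \log\!\int q_0(x)\,e^{f(x)}\,\mathrm{d}x
\;=\; \sup_{q}\;\Big\{\mathbb{E}_{q}[f(X)] - \mathrm{KL}(q\,\|\,q_0)\Big\},
\]
with the supremum attained at $q \propto q_0\,e^{f}$. Substituting such a dual form into the penalized log-likelihood replaces the intractable partition function by an inner optimization over a distribution $q$, which is the sense in which the partition function never has to be computed explicitly.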
Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families
We propose Kernel Hamiltonian Monte Carlo (KMC), a gradient-free adaptive
MCMC algorithm based on Hamiltonian Monte Carlo (HMC). On target densities
where classical HMC is not an option due to intractable gradients, KMC
adaptively learns the target's gradient structure by fitting an exponential
family model in a Reproducing Kernel Hilbert Space. Computational costs are
reduced by two novel efficient approximations to this gradient. While being
asymptotically exact, KMC mimics HMC in terms of sampling efficiency, and
offers substantial mixing improvements over state-of-the-art gradient-free samplers. We support our claims with experimental studies on both toy and real-world applications, including Approximate Bayesian Computation and exact-approximate MCMC. Comment: 20 pages, 7 figures
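The mechanism of using a learned gradient inside HMC can be sketched generically as follows. This is only the plain leapfrog/Metropolis scheme with a placeholder surrogate gradient; it is not the paper's kernel exponential family fit or its efficient approximations, and the function names are ours. Exactness is preserved because the true (unnormalised) log target is evaluated in the accept/reject step.

```python
import numpy as np

# Minimal sketch of gradient-free HMC: run a standard leapfrog/Metropolis HMC
# step, but drive the dynamics with a *surrogate* gradient of the log target
# (a placeholder callable here, e.g. the gradient of a model fitted to past
# samples).  The accept/reject step uses the true unnormalised log target.

def hmc_step(x, log_target, surrogate_grad, eps=0.1, n_leapfrog=20, rng=None):
    rng = rng or np.random.default_rng()
    p = rng.normal(size=x.shape)                 # resample momentum
    x_new, p_new = x.copy(), p.copy()

    p_new += 0.5 * eps * surrogate_grad(x_new)   # half step for momentum
    for _ in range(n_leapfrog - 1):
        x_new += eps * p_new                     # full step for position
        p_new += eps * surrogate_grad(x_new)     # full step for momentum
    x_new += eps * p_new
    p_new += 0.5 * eps * surrogate_grad(x_new)   # final half step

    # Metropolis correction with the *true* log target keeps the chain exact
    # even though the dynamics used only an approximate gradient.
    log_accept = (log_target(x_new) - 0.5 * p_new @ p_new) \
               - (log_target(x) - 0.5 * p @ p)
    return x_new if np.log(rng.uniform()) < log_accept else x

# Example: standard 2-D Gaussian target with a deliberately imperfect surrogate.
log_target = lambda x: -0.5 * x @ x
surrogate_grad = lambda x: -0.95 * x
samples, x = [], np.zeros(2)
for _ in range(2000):
    x = hmc_step(x, log_target, surrogate_grad)
    samples.append(x)
print("empirical mean:", np.mean(samples, axis=0))
```

Because the leapfrog map with any position-dependent force field is volume-preserving and reversible, the Metropolis correction against the true target yields a valid MCMC step, which is why surrogate-gradient HMC schemes of this kind can remain asymptotically exact.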
Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families
We study online learning under logarithmic loss with regular parametric
models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction
strategy with Jeffreys prior and sequential normalized maximum likelihood
(SNML) coincide and are optimal if and only if the latter is exchangeable, and
if and only if the optimal strategy can be calculated without knowing the time
horizon in advance. They posed the question of which families have exchangeable SNML strategies. This paper fully answers this open problem for one-dimensional exponential families. Exchangeability can occur only for three classes of natural exponential family distributions, namely the Gaussian, the Gamma, and the Tweedie exponential family of order 3/2. Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss, Bayesian Strategy, Jeffreys Prior, Fisher Information. Comment: 23 pages
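For reference, the sequential normalized maximum likelihood strategy discussed above can be written out explicitly; the statement below is the standard definition from this line of work, in our notation, rather than new content from the abstract. After observing $x_1,\dots,x_t$, SNML predicts the next outcome with the density
\[
p_{\mathrm{SNML}}(x_{t+1}\mid x_{1:t})
\;=\;
\frac{\sup_{\theta} p_\theta(x_1,\dots,x_t,x_{t+1})}
     {\int \sup_{\theta} p_\theta(x_1,\dots,x_t,x)\,\mathrm{d}x},
\]
and the strategy is called exchangeable when the resulting joint density $\prod_{t} p_{\mathrm{SNML}}(x_{t+1}\mid x_{1:t})$ is invariant under permutations of the observations.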
Nonparametric Information Geometry
The differential-geometric structure of the set of positive densities on a
given measure space has raised the interest of many mathematicians after the
discovery by C.R. Rao of the geometric meaning of the Fisher information. Most
of the research is focused on parametric statistical models. In a series of papers by the author and coworkers, a particular version of the nonparametric case has been discussed. It consists of a minimalistic structure modeled according to the theory of exponential families: given a reference density, other densities are represented by their centered log-likelihood, which is an element of an Orlicz space. These mappings give a system of charts of a Banach manifold. It has been observed that, while the construction is natural, the practical applicability is limited by the technical difficulty of dealing with such a class of Banach
spaces. It has been suggested recently to replace the exponential function with
other functions with similar behavior but polynomial growth at infinity in
order to obtain more tractable Banach spaces, e.g. Hilbert spaces. We first give a review of our theory with special emphasis on the specific issues of the infinite dimensional setting. In the second part we discuss two specific topics: differential equations and the metric connection. The position of this line of research with respect to other approaches is briefly discussed. Comment: Submitted for publication in the Proceedings of GSI2013, Aug 28-30, 2013, Paris
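Concretely, the chart construction referred to above (in the Pistone-Sempi style of exponential manifolds) can be written as follows; this is a standard statement from that line of research, with notation of our choosing. A density $q$ in the maximal exponential model centred at the reference density $p$ is represented by its centred log-likelihood
\[
u \;=\; s_p(q) \;=\; \log\frac{q}{p} \;-\; \mathbb{E}_p\!\Big[\log\frac{q}{p}\Big],
\qquad
q \;=\; \exp\big(u - K_p(u)\big)\,p,
\]
where $K_p(u)=\log\mathbb{E}_p[e^{u}]$ is the cumulant generating functional and $u$ ranges over a subset of an Orlicz space of $p$-centred random variables; these maps $s_p$ form the system of charts mentioned above.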