Kernel Exponential Family Estimation via Doubly Dual Embedding
We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space. Key to our approach is a novel technique, doubly dual embedding, that avoids computation of the partition function. This technique also allows the development of a flexible sampling strategy that amortizes the cost of Monte-Carlo sampling in the inference stage. The resulting estimator can be easily generalized to kernel conditional exponential families. We establish a connection between kernel exponential family estimation and MMD-GANs, revealing a new perspective for understanding GANs. Compared to the score matching based estimators, the proposed method improves both memory and time efficiency while enjoying stronger statistical properties, such as fully capturing smoothness in its statistical convergence rate while the score matching estimator appears to saturate. Finally, we show that the proposed estimator empirically outperforms state-of-the-art methods in both kernel exponential family estimation and its conditional extension.
Comment: 22 pages, 20 figures; AISTATS 2019.
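To make the "dual" part of the construction concrete, here is a minimal sketch, not the authors' implementation, of the saddle-point reformulation it builds on: the log-partition function log Z(f) = max_q E_q[f(x)] + H(q) is replaced by this Fenchel-dual form, turning penalized MLE into a max-min game between the natural parameter f and a sampler q. The random-feature kernel approximation, diagonal-Gaussian q, toy data, and all constants below are illustrative assumptions.

```python
# Minimal sketch of the saddle-point view of penalized kernel-exponential-
# family MLE (illustrative assumptions throughout; NOT the authors' code).
import math
import torch

torch.manual_seed(0)
d, D, lam = 2, 128, 1e-3                     # data dim, #features, RKHS penalty

# random Fourier features approximating a Gaussian-kernel RKHS: f(x) = <alpha, phi(x)>
W = torch.randn(D, d)
b = 2 * math.pi * torch.rand(D)
def phi(x):
    return math.sqrt(2.0 / D) * torch.cos(x @ W.T + b)

alpha = torch.zeros(D, requires_grad=True)   # natural parameter f
mu = torch.zeros(d, requires_grad=True)      # dual sampler q = N(mu, diag(exp(log_s)^2))
log_s = torch.zeros(d, requires_grad=True)
opt_f = torch.optim.Adam([alpha], lr=1e-2)
opt_q = torch.optim.Adam([mu, log_s], lr=1e-2)

data = 0.5 * torch.randn(512, d) + 1.0       # toy data set

def saddle_objective():
    # E_data[f] - E_q[f] - H(q) - lam * ||f||^2, where log Z(f) has been
    # replaced by its Fenchel dual max_q E_q[f] + H(q); q is reparameterized
    z = mu + log_s.exp() * torch.randn(256, d)
    entropy = (log_s + 0.5 * (1.0 + math.log(2 * math.pi))).sum()
    return ((phi(data) @ alpha).mean() - (phi(z) @ alpha).mean()
            - entropy - lam * alpha.pow(2).sum())

for step in range(2000):
    opt_q.zero_grad(); saddle_objective().backward(); opt_q.step()     # q: minimize
    opt_f.zero_grad(); (-saddle_objective()).backward(); opt_f.step()  # f: maximize
```

As q tightens the bound it tracks the model distribution, which is what lets the trained sampler amortize Monte-Carlo inference.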
Exponential Family Estimation via Adversarial Dynamics Embedding
We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks. We exploit the primal-dual view of the MLE with a kinetics augmented model to obtain an estimate associated with an adversarial dual sampler. To represent this sampler, we introduce a novel neural architecture, dynamics embedding, that generalizes Hamiltonian Monte-Carlo (HMC). The proposed approach inherits the flexibility of HMC while enabling tractable entropy estimation for the augmented model. By learning both a dual sampler and the primal model simultaneously, and sharing parameters between them, we obviate the requirement to design a separate sampling procedure once the model has been trained, leading to more effective learning. We show that many existing estimators, such as contrastive divergence, pseudo/composite-likelihood, score matching, minimum Stein discrepancy estimator, non-local contrastive objectives, noise-contrastive estimation, and minimum probability flow, are special cases of the proposed approach, each expressed by a different (fixed) dual sampler. An empirical investigation shows that adapting the sampler during MLE can significantly improve on state-of-the-art estimators.
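A rough sketch of the dynamics-embedding idea (under simplifying assumptions; not the paper's architecture): the sampler below is a differentiable leapfrog integrator, i.e. the deterministic core of HMC, with a learnable step size, so gradients of a training objective can flow into the sampler's parameters. The quadratic toy energy and the single parameter log_eps stand in for the neural energy and the full sampler parametrization.

```python
import torch

def energy(x):
    # toy quadratic energy U(x); in the paper this is a general (neural) energy
    return 0.5 * (x ** 2).sum(dim=-1)

def leapfrog_sample(x0, log_eps, n_steps=5):
    # differentiable leapfrog dynamics with a learnable step size: gradients of
    # any downstream objective flow through the trajectory into log_eps
    eps = log_eps.exp()
    x = x0.clone().requires_grad_(True)
    p = torch.randn_like(x)                            # auxiliary momentum
    g = torch.autograd.grad(energy(x).sum(), x, create_graph=True)[0]
    p = p - 0.5 * eps * g                              # initial half step
    for i in range(n_steps):
        x = x + eps * p                                # full position step
        g = torch.autograd.grad(energy(x).sum(), x, create_graph=True)[0]
        p = p - (eps if i < n_steps - 1 else 0.5 * eps) * g
    return x

log_eps = torch.zeros((), requires_grad=True)          # trainable sampler parameter
samples = leapfrog_sample(torch.randn(128, 2), log_eps)
# In adversarial dynamics embedding, log_eps (plus initial-distribution and
# momentum parameters) would be trained to track the model distribution while
# the energy performs MLE; here we only confirm differentiability.
samples.mean().backward()
print(log_eps.grad)
```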
The $Z$-invariant massive Laplacian on isoradial graphs
We introduce a one-parameter family of massive Laplacian operators $(\Delta^{m(k)})_{k\in[0,1)}$ defined on isoradial graphs, involving elliptic functions. We prove an explicit formula for the inverse of $\Delta^{m(k)}$, the massive Green function, which has the remarkable property of only depending on the local geometry of the graph, and compute its asymptotics. We study the corresponding statistical mechanics model of random rooted spanning forests. We prove an explicit local formula for an infinite volume Boltzmann measure, and for the free energy of the model. We show that the model undergoes a second order phase transition at $k=0$, thus proving that spanning trees corresponding to the Laplacian introduced by Kenyon are critical. We prove that the massive Laplacian operators $(\Delta^{m(k)})_{k\in(0,1)}$ provide a one-parameter family of $Z$-invariant rooted spanning forest models. When the isoradial graph is moreover $\mathbb{Z}^2$-periodic, we consider the spectral curve of the characteristic polynomial of the massive Laplacian. We provide an explicit parametrization of the curve and prove that it is Harnack and has genus 1. We further show that every Harnack curve of genus 1 with $(z,w)\leftrightarrow(z^{-1},w^{-1})$ symmetry arises from such a massive Laplacian.
Comment: 71 pages, 13 figures, to appear in Inventiones mathematicae.
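For orientation, a massive Laplacian on a graph has the general shape below. This is a schematic form only: the conductances $\rho(xy)$ and squared masses $m^2(x)$ written here are generic placeholders for the paper's specific quantities, which are elliptic functions of the modulus $k$ and the geometry of the isoradial embedding, with $k=0$ degenerating to Kenyon's critical Laplacian.

```latex
% Schematic massive Laplacian and the defining property of its Green function.
\[
  (\Delta^{m} f)(x) \;=\; \sum_{y \sim x} \rho(xy)\,\bigl(f(x) - f(y)\bigr)
  \;+\; m^{2}(x)\, f(x),
\]
\[
  G^{m} \;=\; \bigl(\Delta^{m}\bigr)^{-1},
  \qquad
  \bigl(\Delta^{m} G^{m}(\,\cdot\,, y)\bigr)(x) \;=\; \mathbb{1}_{\{x = y\}}.
\]
```

The paper's "local" property is that $G^{m}(x,y)$ can be computed from the isoradial geometry along a path from $x$ to $y$ alone, without reference to the rest of the graph.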
Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages
We propose an efficient nonparametric strategy for learning a message
operator in expectation propagation (EP), which takes as input the set of
incoming messages to a factor node, and produces an outgoing message as output.
This learned operator replaces the multivariate integral required in classical
EP, which may not have an analytic expression. We use kernel-based regression,
which is trained on a set of probability distributions representing the
incoming messages, and the associated outgoing messages. The kernel approach
has two main advantages: first, it is fast, as it is implemented using a novel
two-layer random feature representation of the input message distributions;
second, it has principled uncertainty estimates, and can be cheaply updated
online, meaning it can request and incorporate new training data when it
encounters inputs on which it is uncertain. In experiments, our approach is
able to solve learning problems where a single message operator is required for
multiple, substantially different data sets (logistic regression for a variety
of classification problems), where it is essential to accurately assess
uncertainty and to efficiently and robustly update the message operator.
Comment: accepted to UAI 2015. Corrected typos; added more content to the appendix. Main results unchanged.
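The two-layer random-feature construction can be sketched roughly as follows (a hypothetical toy version, not the authors' code): layer one forms an empirical kernel mean embedding of each incoming message from its samples, layer two maps that embedding through a second random feature layer, and Bayesian linear regression supplies both the prediction and the predictive variance used for just-in-time updates. The scalar target and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D1, D2, sigma2 = 1, 100, 200, 1e-2        # dims, feature counts, noise var

W1, b1 = rng.normal(size=(D1, d)), rng.uniform(0, 2 * np.pi, D1)
W2, b2 = rng.normal(size=(D2, D1)), rng.uniform(0, 2 * np.pi, D2)

def embed_message(samples):
    # layer 1: empirical kernel mean embedding of one message distribution
    phi = np.sqrt(2.0 / D1) * np.cos(samples @ W1.T + b1)
    return phi.mean(axis=0)

def features(embeddings):
    # layer 2: random features on the message embedding(s)
    return np.sqrt(2.0 / D2) * np.cos(embeddings @ W2.T + b2)

# toy training set: each input message is a Gaussian; target = its mean
train_means = rng.uniform(-3, 3, size=50)
X = np.stack([embed_message(rng.normal(m, 1.0, size=(200, d)))
              for m in train_means])
Phi, y = features(X), train_means

# conjugate Bayesian linear regression posterior (unit Gaussian prior)
A = Phi.T @ Phi / sigma2 + np.eye(D2)
w = np.linalg.solve(A, Phi.T @ y / sigma2)

def predict(samples):
    phi = features(embed_message(samples))
    mean = phi @ w
    var = sigma2 + phi @ np.linalg.solve(A, phi)  # predictive uncertainty:
    return mean, var                              # high var -> request new data

print(predict(rng.normal(1.5, 1.0, size=(200, d))))
```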
Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm
We present a novel algorithm to estimate the barycenter of arbitrary
probability distributions with respect to the Sinkhorn divergence. Based on a
Frank-Wolfe optimization strategy, our approach proceeds by populating the
support of the barycenter incrementally, without requiring any pre-allocation.
We consider discrete as well as continuous distributions, proving convergence
rates of the proposed algorithm in both settings. Key elements of our analysis
are a new result showing that the Sinkhorn divergence on compact domains has
Lipschitz continuous gradient with respect to the Total Variation and a
characterization of the sample complexity of Sinkhorn potentials. Experiments
validate the effectiveness of our method in practice.
Comment: 46 pages, 8 figures.
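To illustrate the free-support Frank-Wolfe scheme, here is a hypothetical sketch (not the authors' code): each iteration linearizes the objective at the current barycenter via Sinkhorn dual potentials and adds the single candidate point minimizing that linearization. For brevity it linearizes the plain entropic OT cost rather than the fully debiased Sinkhorn divergence, and it searches a fixed grid of candidates; eps, the grid, and the toy targets are all assumptions.

```python
import numpy as np
from scipy.special import logsumexp

eps = 0.05                                    # entropic regularization (assumed)
rng = np.random.default_rng(0)

def potentials(a, X, b, Y, n_iter=300):
    # Sinkhorn dual potentials (f, g) for entropic OT between (a, X) and (b, Y)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(n_iter):
        f = -eps * logsumexp((g[None, :] - C) / eps, b=b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps, b=a[:, None], axis=0)
    return f, g

def potential_at(g, b, Y, Z):
    # extend the first potential to arbitrary candidate locations Z
    C = ((Z[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return -eps * logsumexp((g[None, :] - C) / eps, b=b[None, :], axis=1)

# two toy target measures and a candidate grid for new support points
Y1, b1 = rng.normal(-1.0, 0.3, size=(40, 1)), np.full(40, 1 / 40)
Y2, b2 = rng.normal(+1.0, 0.3, size=(40, 1)), np.full(40, 1 / 40)
grid = np.linspace(-2.0, 2.0, 201)[:, None]

X, a = np.zeros((1, 1)), np.ones(1)           # one-point initialization
for t in range(30):
    lin = np.zeros(len(grid))
    for bk, Yk in [(b1, Y1), (b2, Y2)]:
        _, g = potentials(a, X, bk, Yk)
        lin += potential_at(g, bk, Yk, grid)  # linearization of the objective
    z = grid[np.argmin(lin)]                  # Frank-Wolfe vertex: a Dirac mass
    gamma = 2.0 / (t + 2.0)                   # classic Frank-Wolfe step size
    X = np.vstack([X, z[None, :]])            # support grows by one point/step
    a = np.concatenate([(1 - gamma) * a, [gamma]])
print(np.round(X.ravel(), 2))
```

The key design point mirrored here is that the support is populated incrementally, one Dirac per iteration, so no pre-allocation of barycenter points is needed.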
Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings
Current meta-learning approaches focus on learning functional representations
of relationships between variables, i.e. on estimating conditional expectations
in regression. In many applications, however, we are faced with conditional
distributions which cannot be meaningfully summarized using expectation only
(due to e.g. multimodality). Hence, we consider the problem of conditional
density estimation in the meta-learning setting. We introduce a novel technique
for meta-learning which combines neural representation and noise-contrastive
estimation with the established literature of conditional mean embeddings into
reproducing kernel Hilbert spaces. The method is validated on synthetic and
real-world problems, demonstrating the utility of sharing learned
representations across multiple conditional density estimation tasks.
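Stripped of the meta-learning and conditional mean embedding machinery, the noise-contrastive core can be sketched as follows (a minimal, hypothetical single-task version, not the authors' model): an unnormalized conditional model is trained to classify true (x, y) pairs against pairs whose y is drawn from a known noise density, which makes the learned model approximately self-normalizing.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))     # log of unnormalized p(y|x)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
noise = torch.distributions.Normal(0.0, 2.0)          # known noise density p_n(y)

def log_model(x, y):
    return net(torch.stack([x, y], dim=-1)).squeeze(-1)

for step in range(3000):
    x = torch.rand(256) * 4 - 2                       # toy task: y|x ~ N(sin(3x), 0.1^2)
    y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)
    yn = noise.sample(x.shape)                        # one noise draw per x
    # NCE logit: log p_model(y|x) - log p_n(y); train a logistic classifier
    # to separate data pairs from noise pairs
    logit_data = log_model(x, y) - noise.log_prob(y)
    logit_noise = log_model(x, yn) - noise.log_prob(yn)
    loss = F.softplus(-logit_data).mean() + F.softplus(logit_noise).mean()
    opt.zero_grad(); loss.backward(); opt.step()
# After training, exp(log_model(x, y)) approximates the conditional density,
# with normalization absorbed by the NCE objective.
```

In the meta-learning setting described above, the input to the network would additionally be conditioned on a task representation built from conditional mean embeddings, which is the part this sketch omits.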