thesis

Kernel-based distribution features for statistical tests and Bayesian inference

Abstract

The kernel mean embedding is known to provide a data representation which preserves full information of the data distribution. While typically computationally costly, its nonparametric nature has an advantage of requiring no explicit model specification of the data. At the other extreme are approaches which summarize data distributions into a finite-dimensional vector of hand-picked summary statistics. This explicit finite-dimensional representation offers a computationally cheaper alternative. Clearly, there is a trade-off between cost and sufficiency of the representation, and it is of interest to have a computationally efficient technique which can produce a data-driven representation, thus combining the advantages from both extremes. The main focus of this thesis is on the development of linear-time mean-embedding-based methods to automatically extract informative features of data distributions, for statistical tests and Bayesian inference. In the first part on statistical tests, several new linear-time techniques are developed. These include a new kernel-based distance measure for distributions, a new linear-time nonparametric dependence measure, and a linear-time discrepancy measure between a probabilistic model and a sample, based on a Stein operator. These new measures give rise to linear-time and consistent tests of homogeneity, independence, and goodness of fit, respectively. The key idea behind these new tests is to explicitly learn distribution-characterizing feature vectors, by maximizing a proxy for the probability of correctly rejecting the null hypothesis. We theoretically show that these new tests are consistent for any finite number of features. In the second part, we explore the use of random Fourier features to construct approximate kernel mean embeddings, for representing messages in expectation propagation (EP) algorithm. The goal is to learn a message operator which predicts EP outgoing messages from incoming messages. We derive a novel two-layer random feature representation of the input messages, allowing online learning of the operator during EP inference

    Similar works