12 research outputs found

Informative Features for Model Comparison

Author: Gretton Arthur
Hays James
Jitkrittum Wittawat
Kanagawa Heishiro
Sangkloy Patsorn
Schölkopf Bernhard
Publication date: 27/10/2018

Given two candidate models, and a set of target observations, we address the problem of measuring the relative goodness of fit of the two models. We propose two new statistical tests which are nonparametric, computationally efficient (runtime complexity is linear in the sample size), and interpretable. As a unique advantage, our tests can produce a set of examples (informative features) indicating the regions in the data domain where one model fits significantly better than the other. In a real-world problem of comparing GAN models, the test power of our new test matches that of the state-of-the-art test of relative goodness of fit, while being one order of magnitude faster.

Comment: Accepted to NIPS 201

arXiv.org e-Print Archive

UCL Discovery

A Linear-Time Kernel Goodness-of-Fit Test

Author: Fukumizu Kenji
Gretton Arthur
Jitkrittum Wittawat
Szabo Zoltan
Xu Wenkai
Publication date: 24/10/2017

We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples. We learn the test features that best indicate the differences between observed samples and a reference model, by minimizing the false negative rate. These features are constructed via Stein's method, meaning that it is not necessary to compute the normalising constant of the model. We analyse the asymptotic Bahadur efficiency of the new test, and prove that under a mean-shift alternative, our test always has greater relative efficiency than a previous linear-time kernel test, regardless of the choice of parameters for that test. In experiments, the performance of our method exceeds that of the earlier linear-time test, and matches or exceeds the power of a quadratic-time kernel test. In high dimensions and where model structure may be exploited, our goodness of fit test performs far better than a quadratic-time two-sample test based on the Maximum Mean Discrepancy, with samples drawn from the model.

Comment: Accepted to NIPS 201

arXiv.org e-Print Archive

HAL-Polytechnique
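
The abstract above describes features built via Stein's method that avoid the model's normalising constant and cost linear time in the sample size. The following is a minimal sketch of one such statistic, not the authors' implementation. It assumes a Gaussian kernel, a standard normal model whose score function is `-x`, and hand-picked test locations `V` (the paper learns these, which is not shown here). The function names `fssd2_unbiased` and `standard_normal_score` are hypothetical.

```python
import numpy as np

def standard_normal_score(X):
    # Gradient of log p(x) for p = N(0, I); the normalising constant drops out.
    return -X

def fssd2_unbiased(X, V, score, sigma=1.0):
    """Unbiased estimate of a squared Stein-feature discrepancy.

    X: (n, d) sample, V: (J, d) test locations (assumed fixed here),
    score: function returning the (n, d) gradient of the model log density.
    """
    n, d = X.shape
    J = V.shape[0]
    diff = X[:, None, :] - V[None, :, :]                      # (n, J, d)
    K = np.exp(-np.sum(diff ** 2, axis=2) / (2 * sigma ** 2))  # (n, J)
    S = score(X)                                               # (n, d)
    # Stein feature: score(x) * k(x, v) + grad_x k(x, v)
    Xi = S[:, None, :] * K[:, :, None] - diff / sigma ** 2 * K[:, :, None]
    Tau = Xi.reshape(n, J * d) / np.sqrt(d * J)
    # Sum over i != j of Tau_i . Tau_j, computed in O(n) time.
    total = Tau.sum(axis=0)
    return (total @ total - np.sum(Tau ** 2)) / (n * (n - 1))

rng = np.random.default_rng(0)
V = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
X_null = rng.normal(size=(2000, 2))         # drawn from the model
X_alt = rng.normal(size=(2000, 2)) + 1.0    # mean-shifted alternative
stat_null = fssd2_unbiased(X_null, V, standard_normal_score)
stat_alt = fssd2_unbiased(X_alt, V, standard_normal_score)
print(stat_null, stat_alt)
```

Under these assumptions the statistic should be close to zero when the sample comes from the model and clearly positive under the mean shift. Turning it into a test still requires a rejection threshold, which this sketch does not compute.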

Cost-Effective Incentive Allocation via Structured Counterfactual Inference

Author: Jordan Michael I.
Li Chenchen
Lopez Romain
Qi Yuan
Song Le
Xiong Junwu
Yan Xiang
Publication date: 11/11/2019

We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tries to learn a policy for allocating strategic financial incentives to customers and observes only bandit feedback. In contrast to traditional policy optimization frameworks, we take into account the additional reward structure and budget constraints common in this setting, and develop a new two-step method for solving this constrained counterfactual policy optimization problem. Our method first casts the reward estimation problem as a domain adaptation problem with supplementary structure, and then uses the estimators to optimize the policy under constraints. We also establish theoretical error bounds for our estimation procedure and show empirically that the approach leads to significant improvements on both synthetic and real datasets.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Testing Goodness of Fit of Conditional Density Models with Kernels

Author: Jitkrittum Wittawat
Kanagawa Heishiro
Schölkopf Bernhard
Publication date: 01/01/2020

We propose two nonparametric statistical tests of goodness of fit for conditional distributions: given a conditional probability density function $p(y|x)$ and a joint sample, decide whether the sample is drawn from $p(y|x)r_x(x)$ for some density $r_x$. Our tests, formulated with a Stein operator, can be applied to any differentiable conditional density model, and require no knowledge of the normalizing constant. We show that 1) our tests are consistent against any fixed alternative conditional model; 2) the statistics can be estimated easily, requiring no density estimation as an intermediate step; and 3) our second test offers an interpretable test result providing insight on where the conditional model does not fit well in the domain of the covariate. We demonstrate the interpretability of our test on a task of modeling the distribution of New York City's taxi drop-off location given a pick-up point. To our knowledge, our work is the first to propose such conditional goodness-of-fit tests that simultaneously have all these desirable properties.

Comment: In UAI 2020. http://auai.org/uai2020/accepted.ph

arXiv.org e-Print Archive

UCL Discovery

MPG.PuRe

Learning Kernel Tests Without Data Splitting

Author: Jitkrittum Wittawat
Kübler Jonas M.
Muandet Krikamol
Schölkopf Bernhard
Publication date: 05/06/2020

Modern large-scale kernel-based tests such as maximum mean discrepancy (MMD) and kernelized Stein discrepancy (KSD) optimize kernel hyperparameters on a held-out sample via data splitting to obtain the most powerful test statistics. While data splitting results in a tractable null distribution, it suffers from a reduction in test power due to smaller test sample size. Inspired by the selective inference framework, we propose an approach that enables learning the hyperparameters and testing on the full sample without data splitting. Our approach can correctly calibrate the test in the presence of such dependency, and yield a test threshold in closed form. At the same significance level, our approach's test power is empirically larger than that of the data-splitting approach, regardless of its split proportion.

Comment: 24 (10+14) pages, 9 figures. Under Review v2: added missing references and acknowledgment

arXiv.org e-Print Archive

MPG.PuRe
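
For readers unfamiliar with the MMD statistic named in the abstract above, here is a minimal sketch of the standard unbiased quadratic-time estimator of the squared MMD with a Gaussian kernel. It is not the selective-inference procedure of the paper. The fixed bandwidth `sigma` and the function names `gaussian_gram` and `mmd2_unbiased` are assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    # Pairwise Gaussian kernel values k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased U-statistic estimate of squared MMD between samples X and Y."""
    m, n = len(X), len(Y)
    Kxx = gaussian_gram(X, X, sigma)
    Kyy = gaussian_gram(Y, Y, sigma)
    Kxy = gaussian_gram(X, Y, sigma)
    # Exclude the diagonal (i == j) terms from the within-sample averages.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
Y_same = rng.normal(size=(500, 2))           # same distribution as X
Y_shift = rng.normal(size=(500, 2)) + 1.0    # mean-shifted distribution
mmd_same = mmd2_unbiased(X, Y_same)
mmd_shift = mmd2_unbiased(X, Y_shift)
print(mmd_same, mmd_shift)
```

In the data-splitting approach the abstract mentions, a bandwidth such as `sigma` would be tuned on one part of the sample and the statistic computed on the rest, which is the loss of test sample size the paper aims to avoid.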

A linear-time kernel goodness-of-fit test

Author: Fukumizu K
Gretton A
Jitkrittum W
Szabó Z
Xu W
Publication venue: NIPS Foundation
Publication date: 09/12/2017

UCL Discovery

Kernel-based distribution features for statistical tests and Bayesian inference

Author: Jitkrittum Wittawat
Publication venue: UCL (University College London)
Publication date: 28/11/2017

The kernel mean embedding is known to provide a data representation which preserves full information of the data distribution. While typically computationally costly, its nonparametric nature has the advantage of requiring no explicit model specification of the data. At the other extreme are approaches which summarize data distributions into a finite-dimensional vector of hand-picked summary statistics. This explicit finite-dimensional representation offers a computationally cheaper alternative. Clearly, there is a trade-off between cost and sufficiency of the representation, and it is of interest to have a computationally efficient technique which can produce a data-driven representation, thus combining the advantages from both extremes. The main focus of this thesis is on the development of linear-time mean-embedding-based methods to automatically extract informative features of data distributions, for statistical tests and Bayesian inference. In the first part on statistical tests, several new linear-time techniques are developed. These include a new kernel-based distance measure for distributions, a new linear-time nonparametric dependence measure, and a linear-time discrepancy measure between a probabilistic model and a sample, based on a Stein operator. These new measures give rise to linear-time and consistent tests of homogeneity, independence, and goodness of fit, respectively. The key idea behind these new tests is to explicitly learn distribution-characterizing feature vectors, by maximizing a proxy for the probability of correctly rejecting the null hypothesis. We theoretically show that these new tests are consistent for any finite number of features. In the second part, we explore the use of random Fourier features to construct approximate kernel mean embeddings, for representing messages in the expectation propagation (EP) algorithm. The goal is to learn a message operator which predicts EP outgoing messages from incoming messages. We derive a novel two-layer random feature representation of the input messages, allowing online learning of the operator during EP inference.

UCL Discovery
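
The second part of the thesis abstract above uses random Fourier features to approximate kernel mean embeddings. The following is a minimal single-layer sketch of that general idea for a Gaussian kernel. It is not the two-layer representation derived in the thesis. The function names `make_rff`, `rff_features` and `approx_mean_embedding`, and the choice of 5000 features, are assumptions made for illustration.

```python
import numpy as np

def make_rff(d, D, sigma, rng):
    # Frequencies drawn from the Gaussian kernel's spectral density, plus random phases.
    W = rng.normal(scale=1.0 / sigma, size=(D, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return W, b

def rff_features(X, W, b):
    # z(x) such that z(x) . z(y) approximates exp(-||x - y||^2 / (2 sigma^2)).
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def approx_mean_embedding(X, W, b):
    # Finite-dimensional approximation of the kernel mean embedding of the sample X.
    return rff_features(X, W, b).mean(axis=0)

rng = np.random.default_rng(0)
W, b = make_rff(d=3, D=5000, sigma=1.0, rng=rng)
x = rng.normal(size=(1, 3))
y = rng.normal(size=(1, 3))
approx_k = (rff_features(x, W, b) @ rff_features(y, W, b).T).item()
exact_k = float(np.exp(-np.sum((x - y) ** 2) / 2.0))
embedding = approx_mean_embedding(rng.normal(size=(200, 3)), W, b)
print(approx_k, exact_k, embedding.shape)
```

The inner product of two such embeddings approximates the average kernel value between the two samples, which is why a fixed-length vector can stand in for the full embedding at lower cost.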