
Independence Testing for Bounded Degree Bayesian Networks

Author: Bhattacharyya Arnab
Canonne Clément L.
Yang Joy Qiping
Publication date: 19/04/2022

We study the following independence testing problem: given access to samples from a distribution $P$ over $\{0,1\}^n$, decide whether $P$ is a product distribution or whether it is $\varepsilon$-far in total variation distance from any product distribution. For arbitrary distributions, this problem requires $\exp(n)$ samples. We show in this work that if $P$ has a sparse structure, then in fact only linearly many samples are required. Specifically, if $P$ is Markov with respect to a Bayesian network whose underlying DAG has in-degree bounded by $d$, then $\tilde{\Theta}(2^{d/2}\cdot n/\varepsilon^2)$ samples are necessary and sufficient for independence testing.

arXiv.org e-Print Archive
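To make the two hypotheses concrete, here is a small brute-force sketch in Python. It is our own illustration, not the tester from the paper, and the function names are hypothetical. It computes the total variation distance between a distribution P on {0,1}^n, given as an explicit probability table, and the product of P's own one-dimensional marginals. P is a product distribution exactly when this distance is zero, and because the product of marginals is itself a product distribution, the value is an upper bound on the distance from P to the nearest product distribution. The loop over all 2^n points makes it usable only for small n.

```python
from itertools import product


def marginals(p, n):
    """Return q where q[i] = Pr_P[X_i = 1], for P given as {tuple: prob}."""
    q = [0.0] * n
    for x, px in p.items():
        for i in range(n):
            if x[i] == 1:
                q[i] += px
    return q


def tv_to_product_of_marginals(p, n):
    """Total variation distance between P and the product of its marginals.

    Enumerates all 2**n points of {0,1}^n, so this is for small n only.
    """
    q = marginals(p, n)
    total = 0.0
    for x in product((0, 1), repeat=n):
        # Probability of x under the product distribution with marginals q.
        prod_prob = 1.0
        for i in range(n):
            prod_prob *= q[i] if x[i] == 1 else 1.0 - q[i]
        total += abs(p.get(x, 0.0) - prod_prob)
    return total / 2


# Two perfectly correlated bits: far from any product distribution.
print(tv_to_product_of_marginals({(0, 0): 0.5, (1, 1): 0.5}, 2))  # prints 0.5
```

For instance, the uniform distribution on {0,1}^2 gives distance 0, while the perfectly correlated pair above sits at distance 1/2 from the uniform product of its marginals.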

New Algorithms for Large Datasets and Distributions

Author: Gayen Sutanu
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 15/07/2019

In this dissertation, we make progress on certain algorithmic problems, broadly in two computational models: the streaming model for large datasets and the distribution testing model for large probability distributions. First we consider the streaming model, where a large sequence of data items arrives one by one. The computer must make one pass over this sequence, processing every item quickly and in limited space. In Chapter 2, motivated by a bioinformatics application, we consider the problem of estimating the number of low-frequency items in a stream, which has so far received only limited theoretical attention. We give an efficient streaming algorithm for this problem and show that its complexity is almost optimal. In Chapter 3 we consider a distributed variant of the streaming model, where each item of the data sequence arrives at an arbitrary one of a set of computers, which together need to compute certain functions over the entire stream. In such scenarios, combining the data at a single computer is infeasible due to the large communication overhead. We give the first algorithm for k-median clustering in this model. Moreover, we give new algorithms for frequency moments and clustering functions in the distributed sliding window model, where the computation is limited to the most recent W items as the items arrive in the stream. In Chapter 5, in our identity testing problem, given two distributions P (unknown; only samples are available) and Q (known) over a common sample space of exponential size, we need to distinguish P = Q (output ‘yes’) from P being far from Q (output ‘no’). This problem requires an exponential number of samples. To circumvent this lower bound, the problem was recently studied under certain structural assumptions. In particular, optimally efficient testers were given assuming P and Q are product distributions.
For such product distributions, we give the first tolerant testers in Chapter 5; these output ‘yes’ not only when P = Q but also when P is close to Q. Likewise, we study the tolerant closeness testing problem for such product distributions, where Q too is accessed only through samples. Adviser: Vinodchandran N. Variyam
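As a concrete picture of the sliding window model, the following Python sketch maintains the second frequency moment F_2 (the sum of squared item frequencies) of the most recent W items exactly, using a deque and a counter. It is a hypothetical, single-machine, linear-space baseline for illustration only. It is not one of the dissertation's algorithms, whose point is to avoid storing the whole window and to work in the distributed setting.

```python
from collections import Counter, deque


class SlidingWindowF2:
    """Exact F_2 over the last `window` items (stores the whole window)."""

    def __init__(self, window):
        self.window = window
        self.items = deque()      # items currently in the window, oldest first
        self.counts = Counter()   # frequency of each item inside the window
        self.f2 = 0               # running sum of squared frequencies

    def push(self, item):
        """Add a new item, evict the oldest if needed, and return current F_2."""
        self.items.append(item)
        c = self.counts[item]
        self.f2 += 2 * c + 1      # (c + 1)^2 - c^2
        self.counts[item] = c + 1
        if len(self.items) > self.window:
            old = self.items.popleft()
            c = self.counts[old]
            self.f2 -= 2 * c - 1  # c^2 - (c - 1)^2
            if c == 1:
                del self.counts[old]
            else:
                self.counts[old] = c - 1
        return self.f2


w = SlidingWindowF2(3)
print([w.push(s) for s in "abaac"])  # prints [1, 2, 5, 5, 5]
```

Each arrival updates F_2 in constant time using the identity (c + 1)^2 - c^2 = 2c + 1, and once the window is full the oldest item is evicted with the symmetric update. Storing the window costs space linear in W, which is the cost that space-efficient sliding-window algorithms aim to avoid.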

Contraction rates for sparse variational approximations in Gaussian process regression

Author: Nieman Dennis
Szabo Botond
van Zanten Harry
Publication date: 01/01/2022

We study the theoretical properties of a variational Bayes method in the Gaussian process regression model. We consider the inducing variables method introduced by Titsias (2009a) and derive sufficient conditions for obtaining contraction rates for the corresponding variational Bayes (VB) posterior. As examples, we show that for three particular covariance kernels (Matérn, squared exponential, random series prior) the VB approach can achieve optimal, minimax contraction rates for a sufficiently large number of appropriately chosen inducing variables. The theoretical findings are demonstrated by numerical experiments. Comment: 26 pages, 6 figures, 1 table.

arXiv.org e-Print Archive

VU Research Portal

Archivio istituzionale della Ricerca - Bocconi
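For readers unfamiliar with the inducing variables method, the following NumPy sketch computes the predictive mean of the collapsed variational approximation of Titsias (2009a) for one-dimensional inputs. With inducing inputs $Z$, training inputs $X$, noise variance $\sigma^2$ and $\Sigma = (K_{ZZ} + \sigma^{-2} K_{ZX} K_{XZ})^{-1}$, the mean at test points is $\sigma^{-2} K_{*Z} \Sigma K_{ZX} y$. The squared exponential kernel with unit hyperparameters, the small jitter term and the function names are assumptions for illustration; this is not code from the paper. When the inducing inputs coincide with the training inputs, the mean reduces to the exact GP posterior mean, which gives a simple sanity check.

```python
import numpy as np


def se_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel matrix between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


def svgp_predictive_mean(x_train, y, z, x_test, noise_var=0.1, jitter=1e-8):
    """Predictive mean of the collapsed variational (inducing point) posterior.

    Computes sigma^-2 * K_{*Z} (K_ZZ + sigma^-2 K_ZX K_XZ)^{-1} K_ZX y.
    The jitter on K_ZZ is a common numerical safeguard (an assumption here).
    """
    k_zz = se_kernel(z, z) + jitter * np.eye(len(z))
    k_zx = se_kernel(z, x_train)   # m x n, with m inducing points
    k_sz = se_kernel(x_test, z)    # test points x inducing points
    a = k_zz + k_zx @ k_zx.T / noise_var
    return k_sz @ np.linalg.solve(a, k_zx @ y) / noise_var


x = np.linspace(0.0, 5.0, 6)
y = np.sin(x)
x_star = np.array([0.5, 2.5, 4.5])
# Cheaper approximation using every other training input as an inducing input.
print(svgp_predictive_mean(x, y, x[::2], x_star))
```

The dominant cost is forming $K_{ZX} K_{XZ}$, which is $O(nm^2)$ for $m$ inducing points instead of the $O(n^3)$ of exact GP regression; the trade-off studied in the abstract is how many inducing variables are needed for the VB posterior to keep optimal contraction rates.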