On the Power of Conditional Samples in Distribution Testing
In this paper we define and examine the power of the {\em
conditional-sampling} oracle in the context of distribution-property testing.
The conditional-sampling oracle for a discrete distribution $D$ takes as
input a subset $S \subseteq [n]$ of the domain, and outputs a random sample $i$ drawn according to $D$, conditioned on $S$ (and independently of all
prior samples). The conditional-sampling oracle is a natural generalization of
the ordinary sampling oracle, in which $S$ always equals $[n]$.
We show that with the conditional-sampling oracle, testing uniformity,
testing identity to a known distribution, and testing any label-invariant
property of distributions is easier than with the ordinary sampling oracle. On
the other hand, we also show that for some distribution properties the
sample-complexity remains near-maximal even with conditional sampling.
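To make the model concrete, here is a minimal sketch (an illustration, not any paper's construction) of how a conditional-sampling oracle for a distribution $D$ over $\{0, \dots, n-1\}$ behaves: given a query set $S$, it returns a draw from $D$ restricted to $S$ and renormalized.

```python
import random

def make_conditional_oracle(weights):
    """Simulate a conditional-sampling oracle for the distribution D
    proportional to `weights` over the domain {0, ..., n-1}."""
    def cond_sample(S):
        # Restrict D to the query set S and renormalize its mass.
        support = [i for i in S if weights[i] > 0]
        if not support:
            raise ValueError("conditioning set has zero mass under D")
        total = sum(weights[i] for i in support)
        return random.choices(support,
                              weights=[weights[i] / total for i in support])[0]
    return cond_sample

oracle = make_conditional_oracle([0.5, 0.25, 0.25, 0.0])
sample = oracle({0, 1})       # a draw from D conditioned on S = {0, 1}
ordinary = oracle(range(4))   # S = [n] recovers the ordinary sampling oracle
```

Note that querying the full domain recovers the ordinary sampling oracle, matching the generalization described above.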
The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling
We study a basic problem of approximating the size of an unknown set $S$ in a
known universe $U$. We consider two versions of the problem. In both versions
the algorithm can specify subsets $T \subseteq U$. In the first version, which
we refer to as the group query or subset query version, the algorithm is told
whether $S \cap T$ is non-empty. In the second version, which we refer to as the
subset sampling version, if $S \cap T$ is non-empty, then the algorithm receives
a uniformly selected element from $S \cap T$. We study the difference between
these two versions under different conditions on the subsets that the algorithm
may query/sample, both in the case that the algorithm is adaptive and in the
case where it is non-adaptive. In particular, we focus on a natural family of
allowed subsets, which corresponds to intervals, as well as variants of this
family.
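The two query types can be contrasted in a short sketch (a toy illustration with an explicit hidden set, not the paper's algorithms), restricted here to interval queries as in the family highlighted above.

```python
import random

def group_query(S, T):
    """Group/subset query: report only whether S ∩ T is non-empty."""
    return not S.isdisjoint(T)

def subset_sample(S, T):
    """Subset sampling: a uniform element of S ∩ T, or None if empty."""
    intersection = list(S & set(T))
    return random.choice(intersection) if intersection else None

# Hypothetical hidden set, queried with intervals of the universe {0,...,9}.
hidden = {3, 7, 8}
hit = group_query(hidden, set(range(0, 5)))      # interval [0, 5) meets S
element = subset_sample(hidden, set(range(5, 10)))  # uniform over {7, 8}
```

The subset sampling version is strictly more informative per query: a positive group query reveals one bit, while subset sampling additionally exposes a uniformly random witness from the intersection.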
Testing probability distributions underlying aggregated data
In this paper, we analyze and study a hybrid model for testing and learning
probability distributions. Here, in addition to samples, the testing algorithm
is provided with one of two different types of oracles to the unknown
distribution $D$ over $[n]$. More precisely, we define both the dual and
cumulative dual access models, in which the algorithm can both sample from
$D$ and, respectively, for any $i \in [n]$,
- query the probability mass $D(i)$ (query access); or
- get the total mass of $\{1,\dots,i\}$, i.e. $\sum_{j=1}^{i} D(j)$ (cumulative
access).
These two models, by generalizing the previously studied sampling and query
oracle models, allow us to bypass the strong lower bounds established for a
number of problems in these settings, while capturing several interesting
aspects of these problems -- and providing new insight on the limitations of
the models. Finally, we show that while the testing algorithms can be in most
cases strictly more efficient, some tasks remain hard even with this additional
power.
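As a sketch of the cumulative dual access model (hypothetical names; the class below is an illustration, not the paper's formalism), an oracle exposes both sampling from $D$ and prefix-mass queries, which can be simulated from a stored CDF:

```python
import bisect
import itertools
import random

class CumulativeDualOracle:
    """Cumulative dual access to a distribution D over {0, ..., n-1}:
    the tester may draw samples from D and query prefix masses
    D(0) + ... + D(i)."""

    def __init__(self, weights):
        # Precompute the cumulative distribution function once.
        self.cdf = list(itertools.accumulate(weights))

    def sample(self):
        # Standard inverse-CDF sampling from D.
        return bisect.bisect_left(self.cdf, random.random() * self.cdf[-1])

    def cumulative(self, i):
        # Total mass of the prefix {0, ..., i}.
        return self.cdf[i]

D = CumulativeDualOracle([0.1, 0.2, 0.3, 0.4])
mass = D.cumulative(1)   # D(0) + D(1), i.e. 0.3 up to floating point
x = D.sample()
```

A binary search over `cumulative` queries, for instance, locates quantiles of $D$ in $O(\log n)$ queries, which is one way such access can circumvent sampling-only lower bounds.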
Sampling Correctors
In many situations, sample data is obtained from a noisy or imperfect source.
In order to address such corruptions, this paper introduces the concept of a
sampling corrector. Such algorithms use structure that the distribution is
purported to have, in order to allow one to make "on-the-fly" corrections to
samples drawn from probability distributions. These algorithms then act as
filters between the noisy data and the end user.
We show connections between sampling correctors, distribution learning
algorithms, and distribution property testing algorithms. We show that these
connections can be utilized to expand the applicability of known distribution
learning and property testing algorithms as well as to achieve improved
algorithms for those tasks.
As a first step, we show how to design sampling correctors using proper
learning algorithms. We then focus on the question of whether algorithms for
sampling correctors can be more efficient in terms of sample complexity than
learning algorithms for the analogous families of distributions. When
correcting monotonicity, we show that this is indeed the case when also granted
query access to the cumulative distribution function. We also obtain sampling
correctors for monotonicity without this stronger type of access, provided that
the distribution is originally already very close to monotone. In addition, we
consider a restricted error model
that aims at capturing "missing data" corruptions. In this model, we show that
distributions that are close to monotone have sampling correctors that are
significantly more efficient than achievable by the learning approach.
We also consider the question of whether an additional source of independent
random bits is required by sampling correctors to implement the correction
process.
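As a toy illustration of the filter viewpoint (not the paper's monotonicity correctors), the sketch below corrects samples from a slightly corrupted source toward a purported uniform target by rejection, assuming query access to the source probabilities; the hypothetical distribution `P` stands in for the noisy data.

```python
import random

# Hypothetical noisy source: roughly uniform over {0,...,3}, mildly corrupted.
P = [0.28, 0.22, 0.26, 0.24]
noisy_sample = lambda: random.choices(range(4), weights=P)[0]

def corrected_sample(noisy_sample, noisy_prob, n):
    """One corrected draw: rejection-filter samples from the noisy source
    into exactly uniform samples over {0, ..., n-1}.
    Requires noisy_prob(i) > 0 for every i, so the loop terminates."""
    p_min = min(noisy_prob(i) for i in range(n))
    while True:
        i = noisy_sample()
        # Accept with probability p_min / P(i): the output density is then
        # proportional to P(i) * p_min / P(i) = p_min, i.e. uniform.
        if random.random() <= p_min / noisy_prob(i):
            return i

x = corrected_sample(noisy_sample, lambda i: P[i], 4)
```

The corrector sits between the noisy source and the end user exactly as described above; the open question in the abstract is when such filters can use fewer samples than learning the distribution outright.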
Adaptive Estimation in Weighted Group Testing
We consider a generalization of the problem of estimating the support size of a hidden subset $S$ of a universe $U$ from samples. This framework falls under the group testing [1] and conditional sampling models.