On the Power of Conditional Samples in Distribution Testing
In this paper we define and examine the power of the {\em
conditional-sampling} oracle in the context of distribution-property testing.
The conditional-sampling oracle for a discrete distribution $D$ takes as
input a subset $S \subseteq [n]$ of the domain, and outputs a random sample $i$ drawn according to $D$, conditioned on $S$ (and independently of all
prior samples). The conditional-sampling oracle is a natural generalization of
the ordinary sampling oracle, in which $S$ always equals $[n]$.
We show that with the conditional-sampling oracle, testing uniformity,
testing identity to a known distribution, and testing any label-invariant
property of distributions is easier than with the ordinary sampling oracle. On
the other hand, we also show that for some distribution properties the
sample-complexity remains near-maximal even with conditional sampling.
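To make the model concrete, here is a minimal sketch (an illustration, not any paper's construction) of how a conditional-sampling oracle for a distribution $D$ over $\{0, \dots, n-1\}$ behaves: given a query set $S$, it returns a draw from $D$ restricted to $S$ and renormalized.

```python
import random

def make_conditional_oracle(weights):
    """Simulate a conditional-sampling oracle for the distribution D
    proportional to `weights` over the domain {0, ..., n-1}."""
    def cond_sample(S):
        # Restrict D to the query set S and renormalize its mass.
        support = [i for i in S if weights[i] > 0]
        if not support:
            raise ValueError("conditioning set has zero mass under D")
        total = sum(weights[i] for i in support)
        return random.choices(support,
                              weights=[weights[i] / total for i in support])[0]
    return cond_sample

oracle = make_conditional_oracle([0.5, 0.25, 0.25, 0.0])
sample = oracle({0, 1})       # a draw from D conditioned on S = {0, 1}
ordinary = oracle(range(4))   # S = [n] recovers the ordinary sampling oracle
```

Note that querying the full domain recovers the ordinary sampling oracle, matching the generalization described above.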
The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling
We study a basic problem of approximating the size of an unknown set $S$ in a
known universe $U$. We consider two versions of the problem. In both versions
the algorithm can specify subsets $T \subseteq U$. In the first version, which
we refer to as the group query or subset query version, the algorithm is told
whether $S \cap T$ is non-empty. In the second version, which we refer to as the
subset sampling version, if $S \cap T$ is non-empty, then the algorithm receives
a uniformly selected element from $S \cap T$. We study the difference between
these two versions under different conditions on the subsets that the algorithm
may query/sample, both in the case that the algorithm is adaptive and in the
case where it is non-adaptive. In particular, we focus on a natural family of
allowed subsets, which corresponds to intervals, as well as variants of this
family.
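The two query types can be contrasted in a short sketch (a toy illustration with an explicit hidden set, not the paper's algorithms), restricted here to interval queries as in the family highlighted above.

```python
import random

def group_query(S, T):
    """Group/subset query: report only whether S ∩ T is non-empty."""
    return not S.isdisjoint(T)

def subset_sample(S, T):
    """Subset sampling: a uniform element of S ∩ T, or None if empty."""
    intersection = list(S & set(T))
    return random.choice(intersection) if intersection else None

# Hypothetical hidden set, queried with intervals of the universe {0,...,9}.
hidden = {3, 7, 8}
hit = group_query(hidden, set(range(0, 5)))      # interval [0, 5) meets S
element = subset_sample(hidden, set(range(5, 10)))  # uniform over {7, 8}
```

The subset sampling version is strictly more informative per query: a positive group query reveals one bit, while subset sampling additionally exposes a uniformly random witness from the intersection.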
Testing probability distributions underlying aggregated data
In this paper, we analyze and study a hybrid model for testing and learning
probability distributions. Here, in addition to samples, the testing algorithm
is provided with one of two different types of oracles to the unknown
distribution $D$ over $[n]$. More precisely, we define both the dual and
cumulative dual access models, in which the algorithm can both sample from
$D$ and, respectively, for any $i \in [n]$,
- query the probability mass $D(i)$ (query access); or
- get the total mass of $\{1,\dots,i\}$, i.e. $\sum_{j=1}^{i} D(j)$ (cumulative
access).
These two models, by generalizing the previously studied sampling and query
oracle models, allow us to bypass the strong lower bounds established for a
number of problems in these settings, while capturing several interesting
aspects of these problems -- and providing new insight on the limitations of
the models. Finally, we show that while the testing algorithms can be in most
cases strictly more efficient, some tasks remain hard even with this additional
power.
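As a sketch of the cumulative dual access model (hypothetical names; the class below is an illustration, not the paper's formalism), an oracle exposes both sampling from $D$ and prefix-mass queries, which can be simulated from a stored CDF:

```python
import bisect
import itertools
import random

class CumulativeDualOracle:
    """Cumulative dual access to a distribution D over {0, ..., n-1}:
    the tester may draw samples from D and query prefix masses
    D(0) + ... + D(i)."""

    def __init__(self, weights):
        # Precompute the cumulative distribution function once.
        self.cdf = list(itertools.accumulate(weights))

    def sample(self):
        # Standard inverse-CDF sampling from D.
        return bisect.bisect_left(self.cdf, random.random() * self.cdf[-1])

    def cumulative(self, i):
        # Total mass of the prefix {0, ..., i}.
        return self.cdf[i]

D = CumulativeDualOracle([0.1, 0.2, 0.3, 0.4])
mass = D.cumulative(1)   # D(0) + D(1), i.e. 0.3 up to floating point
x = D.sample()
```

A binary search over `cumulative` queries, for instance, locates quantiles of $D$ in $O(\log n)$ queries, which is one way such access can circumvent sampling-only lower bounds.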
Sampling Correctors
In many situations, sample data is obtained from a noisy or imperfect source.
In order to address such corruptions, this paper introduces the concept of a
sampling corrector. Such algorithms use structure that the distribution is
purported to have, in order to allow one to make "on-the-fly" corrections to
samples drawn from probability distributions. These algorithms then act as
filters between the noisy data and the end user.
We show connections between sampling correctors, distribution learning
algorithms, and distribution property testing algorithms. We show that these
connections can be utilized to expand the applicability of known distribution
learning and property testing algorithms as well as to achieve improved
algorithms for those tasks.
As a first step, we show how to design sampling correctors using proper
learning algorithms. We then focus on the question of whether algorithms for
sampling correctors can be more efficient in terms of sample complexity than
learning algorithms for the analogous families of distributions. When
correcting monotonicity, we show that this is indeed the case when also granted
query access to the cumulative distribution function. We also obtain sampling
correctors for monotonicity without this stronger type of access, provided that
the distribution is originally already very close to monotone. In addition, we
consider a restricted error model
that aims at capturing "missing data" corruptions. In this model, we show that
distributions that are close to monotone have sampling correctors that are
significantly more efficient than achievable by the learning approach.
We also consider the question of whether an additional source of independent
random bits is required by sampling correctors to implement the correction
process.
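As a toy illustration of the filter viewpoint (not the paper's monotonicity correctors), the sketch below corrects samples from a slightly corrupted source toward a purported uniform target by rejection, assuming query access to the source probabilities; the hypothetical distribution `P` stands in for the noisy data.

```python
import random

# Hypothetical noisy source: roughly uniform over {0,...,3}, mildly corrupted.
P = [0.28, 0.22, 0.26, 0.24]
noisy_sample = lambda: random.choices(range(4), weights=P)[0]

def corrected_sample(noisy_sample, noisy_prob, n):
    """One corrected draw: rejection-filter samples from the noisy source
    into exactly uniform samples over {0, ..., n-1}.
    Requires noisy_prob(i) > 0 for every i, so the loop terminates."""
    p_min = min(noisy_prob(i) for i in range(n))
    while True:
        i = noisy_sample()
        # Accept with probability p_min / P(i): the output density is then
        # proportional to P(i) * p_min / P(i) = p_min, i.e. uniform.
        if random.random() <= p_min / noisy_prob(i):
            return i

x = corrected_sample(noisy_sample, lambda i: P[i], 4)
```

The corrector sits between the noisy source and the end user exactly as described above; the open question in the abstract is when such filters can use fewer samples than learning the distribution outright.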
Adaptive Estimation in Weighted Group Testing
We consider a generalization of the problem of estimating the support size of a hidden subset $S$ of a universe $U$ from samples. This framework falls under the group testing [1] and conditional sampling models.