
    On the Power of Conditional Samples in Distribution Testing

    In this paper we define and examine the power of the conditional-sampling oracle in the context of distribution-property testing. The conditional-sampling oracle for a discrete distribution $\mu$ takes as input a subset $S \subset [n]$ of the domain and outputs a random sample $i \in S$ drawn according to $\mu$, conditioned on $S$ (and independently of all prior samples). The conditional-sampling oracle is a natural generalization of the ordinary sampling oracle, in which $S$ always equals $[n]$. We show that with the conditional-sampling oracle, testing uniformity, testing identity to a known distribution, and testing any label-invariant property of distributions are easier than with the ordinary sampling oracle. On the other hand, we also show that for some distribution properties the sample complexity remains near-maximal even with conditional sampling.
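
    As a rough illustration, the conditional-sampling oracle can be simulated when $\mu$ is known explicitly. This sketch is ours, not from the paper. It assumes a 0-indexed domain $\{0, \dots, n-1\}$, stores $\mu$ as a Python list, and uses the hypothetical name `cond_sample`. It also assumes that conditioning on a set of zero mass is an error, because the abstract does not say how that case is handled.

```python
import random
from typing import Sequence, Set


def cond_sample(mu: Sequence[float], S: Set[int], rng: random.Random) -> int:
    """Hypothetical conditional-sampling oracle for a known distribution mu.

    Returns i in S with probability mu[i] / mu(S), independently of earlier calls.
    Assumption: the domain is {0, ..., n-1} (the paper uses [n] = {1, ..., n}).
    """
    elems = sorted(S)
    weights = [mu[i] for i in elems]
    if sum(weights) <= 0:
        # Assumed convention for this sketch: conditioning on a zero-mass set is an error.
        raise ValueError("mu(S) = 0, so conditioning on S is undefined here")
    return rng.choices(elems, weights=weights, k=1)[0]


rng = random.Random(0)
mu = [0.1, 0.2, 0.3, 0.4]
print(cond_sample(mu, {1, 3}, rng))         # some element of {1, 3}
print(cond_sample(mu, set(range(4)), rng))  # ordinary sampling: S is the whole domain
```

    Passing the whole domain as `S` recovers the ordinary sampling oracle. This matches the abstract's description of conditional sampling as a generalization of ordinary sampling.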

    The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling

    We study a basic problem of approximating the size of an unknown set $S$ in a known universe $U$. We consider two versions of the problem. In both versions the algorithm can specify subsets $T \subseteq U$. In the first version, which we refer to as the group query or subset query version, the algorithm is told whether $T \cap S$ is non-empty. In the second version, which we refer to as the subset sampling version, if $T \cap S$ is non-empty, then the algorithm receives a uniformly selected element from $T \cap S$. We study the difference between these two versions under different conditions on the subsets that the algorithm may query or sample, both when the algorithm is adaptive and when it is non-adaptive. In particular, we focus on a natural family of allowed subsets, which correspond to intervals, as well as variants of this family.
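
    The two query types can be contrasted in a small simulation restricted to interval queries $T = \{lo, \dots, hi\}$. This is a minimal sketch under our own assumptions. The function names are hypothetical, the universe is taken to be the integers, and an empty intersection in the sampling version is reported as `None`.

```python
import random
from typing import Optional, Set


def group_query(S: Set[int], lo: int, hi: int) -> bool:
    """Group (subset) query on the interval T = {lo, ..., hi}: is T ∩ S non-empty?"""
    return any(lo <= s <= hi for s in S)


def subset_sample(S: Set[int], lo: int, hi: int, rng: random.Random) -> Optional[int]:
    """Subset-sampling query on T = {lo, ..., hi}: a uniform element of T ∩ S.

    Assumption of this sketch: an empty intersection is reported as None.
    """
    hits = sorted(s for s in S if lo <= s <= hi)
    return rng.choice(hits) if hits else None


S = {2, 5, 9}  # the hidden set (visible here only because we simulate it)
rng = random.Random(0)
print(group_query(S, 3, 4))         # False: no element of S lies in {3, 4}
print(subset_sample(S, 4, 6, rng))  # 5, the only element of S in {4, 5, 6}
```

    Any group query can be answered with one subset-sampling call by checking whether the result is `None`. On the same family of intervals, therefore, the sampling version is at least as powerful. The paper studies how the two versions differ.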

    Testing probability distributions underlying aggregated data

    In this paper, we study a hybrid model for testing and learning probability distributions. Here, in addition to samples, the testing algorithm is provided with one of two different types of oracles to the unknown distribution $D$ over $[n]$. More precisely, we define both the dual and the cumulative dual access models, in which the algorithm $A$ can both sample from $D$ and, respectively, for any $i \in [n]$:
    - query the probability mass $D(i)$ (query access); or
    - get the total mass of $\{1, \dots, i\}$, i.e. $\sum_{j=1}^{i} D(j)$ (cumulative access).
    These two models, by generalizing the previously studied sampling and query oracle models, allow us to bypass the strong lower bounds established for a number of problems in these settings, while capturing several interesting aspects of these problems and providing new insight into the limitations of the models. Finally, we show that while the testing algorithms can in most cases be strictly more efficient, some tasks remain hard even with this additional power.
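
    The dual and cumulative dual access models can be mimicked with a small simulator that stores $D$ explicitly, which a real tester would not have. This is a hypothetical sketch, not code from the paper. The class name `DualOracle` is ours, and sampling uses inverse-CDF lookup as one convenient choice.

```python
import bisect
import itertools
import random
from typing import Sequence


class DualOracle:
    """Hypothetical simulator of dual / cumulative dual access to D over {1, ..., n}."""

    def __init__(self, probs: Sequence[float], rng: random.Random):
        self.probs = list(probs)
        self.cdf = list(itertools.accumulate(self.probs))  # cdf[i-1] = D(1) + ... + D(i)
        self.rng = rng

    def sample(self) -> int:
        """Ordinary sample from D via inverse-CDF lookup; returns an element of {1, ..., n}."""
        u = self.rng.random() * self.cdf[-1]
        # min(...) guards against u rounding up to the total mass.
        return min(bisect.bisect_right(self.cdf, u) + 1, len(self.probs))

    def query(self, i: int) -> float:
        """Query access: the probability mass D(i)."""
        return self.probs[i - 1]

    def cumulative(self, i: int) -> float:
        """Cumulative access: D(1) + ... + D(i), taken to be 0 when i = 0."""
        return self.cdf[i - 1] if i > 0 else 0.0


oracle = DualOracle([0.1, 0.2, 0.3, 0.4], random.Random(0))
print(oracle.query(3))  # 0.3
```

    Because $D(i)$ equals `cumulative(i) - cumulative(i - 1)`, two cumulative queries can stand in for one probability query. In this floating-point simulation the difference is exact only up to rounding.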

    Sampling Correctors

    In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, this paper introduces the concept of a sampling corrector. Such algorithms use structure that the distribution is purported to have in order to make "on-the-fly" corrections to samples drawn from probability distributions. These algorithms then act as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms. We show that these connections can be used to expand the applicability of known distribution learning and property testing algorithms, as well as to achieve improved algorithms for those tasks. As a first step, we show how to design sampling correctors using proper learning algorithms. We then focus on whether sampling correctors can be more efficient in terms of sample complexity than learning algorithms for the analogous families of distributions. When correcting monotonicity, we show that this is indeed the case when the corrector is also granted query access to the cumulative distribution function. We also obtain sampling correctors for monotonicity without this stronger type of access, provided that the distribution is originally very close to monotone (namely, at distance $O(1/\log^2 n)$). In addition, we consider a restricted error model that aims to capture "missing data" corruptions. In this model, we show that distributions that are close to monotone have sampling correctors that are significantly more efficient than those achievable by the learning approach. We also consider whether sampling correctors require an additional source of independent random bits to implement the correction process.
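
    The abstract mentions building a sampling corrector from a proper learning algorithm. The sketch below illustrates that idea for non-increasing (monotone) distributions on a 0-indexed domain. The learner is one we chose for simplicity: it takes empirical masses on dyadic intervals (lengths 1, 2, 4, ...) and then applies pool-adjacent-violators, so the hypothesis is itself monotone. This is an illustration under our own assumptions, not the paper's construction, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Block = Tuple[int, int, float]  # (start, end, mass) for the half-open interval [start, end)


def dyadic_intervals(n: int) -> List[Tuple[int, int]]:
    """Split {0, ..., n-1} into consecutive intervals of lengths 1, 2, 4, ... (last one cut off)."""
    out, start, length = [], 0, 1
    while start < n:
        end = min(start + length, n)
        out.append((start, end))
        start, length = end, 2 * length
    return out


def learn_monotone(samples: List[int], n: int) -> List[Block]:
    """Hypothetical proper learner for non-increasing distributions on {0, ..., n-1}.

    Takes the empirical mass of each dyadic interval, then applies pool-adjacent-violators
    so that block densities (mass / length) are non-increasing, i.e. the output is monotone.
    """
    counts = [0] * n
    for s in samples:
        counts[s] += 1
    blocks: List[list] = []  # stack of [start, end, mass]
    for a, b in dyadic_intervals(n):
        blocks.append([a, b, sum(counts[a:b]) / len(samples)])
        # Merge while the newest block is denser than the block before it.
        while len(blocks) > 1:
            (a0, b0, m0), (a1, b1, m1) = blocks[-2], blocks[-1]
            if m1 * (b0 - a0) <= m0 * (b1 - a1):
                break
            blocks.pop()
            blocks[-1] = [a0, b1, m0 + m1]
    return [tuple(blk) for blk in blocks]


def make_corrector(noisy_sampler: Callable[[], int], n: int, m: int,
                   rng: random.Random) -> Callable[[], int]:
    """Hypothetical corrector: learn a monotone hypothesis from m noisy samples, then
    answer every request with a fresh sample drawn from that hypothesis."""
    hyp = learn_monotone([noisy_sampler() for _ in range(m)], n)
    masses = [mass for _, _, mass in hyp]

    def corrected_sample() -> int:
        a, b, _ = rng.choices(hyp, weights=masses, k=1)[0]  # pick a block by its mass
        return rng.randrange(a, b)                           # then a uniform point inside it

    return corrected_sample


rng = random.Random(0)
# Hypothetical source standing in for noisy data: geometric-shaped, truncated at 63.
corrector = make_corrector(lambda: min(int(rng.expovariate(0.5)), 63), 64, 2000, rng)
print([corrector() for _ in range(5)])
```

    Each call to `corrector()` draws fresh randomness from `rng`. Whether such extra random bits are necessary is one of the questions the abstract raises.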

    Adaptive Estimation in Weighted Group Testing

    We consider a generalization of the problem of estimating the support size of a hidden subset $S$ of a universe $U$ from samples. This framework falls under both the group testing [1] and the conditional sampling models.
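
    The abstract gives no algorithm, so the following is only a baseline for the unweighted special case. When samples are uniform over $S$, counting repeated samples yields an estimate of $|S|$. This collision-counting estimator is a standard technique swapped in for illustration, not the adaptive method of the paper, and the function name is hypothetical.

```python
import random
from collections import Counter
from math import comb
from typing import List, Optional


def estimate_support_size(samples: List[int]) -> Optional[float]:
    """Collision-counting estimate of |S| from i.i.d. uniform samples of S.

    Each of the C(k, 2) sample pairs collides with probability 1/|S|, so
    E[collisions] = C(k, 2) / |S|; solving for |S| gives the estimate below.
    """
    collisions = sum(comb(c, 2) for c in Counter(samples).values())
    if collisions == 0:
        return None  # no repeated element yet: too few samples to estimate |S|
    return comb(len(samples), 2) / collisions


rng = random.Random(0)
hidden = list(range(1000, 1050))  # hypothetical hidden set with |S| = 50
samples = [rng.choice(hidden) for _ in range(300)]
print(estimate_support_size(samples))  # should be roughly 50
```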