
    Tolerant Testers of Image Properties

    We initiate a systematic study of tolerant testers of image properties or, equivalently, algorithms that approximate the distance from a given image to a desired property (that is, the smallest fraction of pixels that need to change in the image to ensure that it satisfies the desired property). Image processing is a particularly compelling area of application for sublinear-time algorithms and, specifically, property testing. However, for testing algorithms to reach their full potential in image processing, they have to be tolerant, that is, resilient to noise. Prior to this work, only one tolerant testing algorithm for an image property (image partitioning) had been published. We design efficient approximation algorithms for the following fundamental questions: What fraction of pixels has to be changed in an image so that it becomes a half-plane? a representation of a convex object? a representation of a connected object? More precisely, our algorithms approximate the distance to three basic properties (being a half-plane, convexity, and connectedness) within a small additive error $\epsilon$, after reading a number of pixels polynomial in $1/\epsilon$ and independent of the size of the image. The running time of the testers for half-plane and convexity is also polynomial in $1/\epsilon$. Tolerant testers for these three properties were not investigated previously. For convexity and connectedness, even the existence of distance approximation algorithms with query complexity independent of the input size does not follow from previous work: it is not implied by VC-dimension bounds, since the VC dimension of convexity and connectedness, even in two dimensions, depends on the input size, and it does not follow from the existence of non-tolerant testers either. Our algorithms require very simple access to the input: uniform random samples for the half-plane property and convexity, and samples from uniformly random blocks for connectedness. However, the analysis of the algorithms, especially for convexity, requires many geometric and combinatorial insights. For example, in the analysis of the algorithm for convexity, we define a set of reference polygons $P_\epsilon$ such that (1) every convex image has a nearby polygon in $P_\epsilon$ and (2) one can use dynamic programming to quickly compute the smallest empirical distance to a polygon in $P_\epsilon$. This construction might be of independent interest.
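    The access pattern for the half-plane case can be made concrete with a short sketch. The Python sketch below is ours, not the paper's algorithm: it draws uniform pixel samples and brute-forces the half-plane, among those spanned by pairs of sampled points, that disagrees with the fewest samples. The $O(1/\epsilon^2)$ sample size is the generic uniform-convergence bound for a constant-VC-dimension class (half-planes have VC dimension 3), not the paper's tuned analysis.

```python
import itertools
import random

def estimate_halfplane_distance(image, n, eps, samples=None):
    """Estimate the fraction of pixels that must change for the n x n
    binary image (image[i][j] in {0, 1}) to become a half-plane.

    Sketch only: sample pixels uniformly, then brute-force the
    half-plane spanned by pairs of sampled points that disagrees with
    the fewest samples.
    """
    s = samples or int(10 / eps ** 2)
    pts = [(random.randrange(n), random.randrange(n)) for _ in range(s)]
    vals = {p: image[p[0]][p[1]] for p in pts}

    def disagreements(a, b, c):
        # Count sampled pixels whose value disagrees with membership
        # in the half-plane {(x, y) : a*x + b*y >= c}.
        return sum(vals[p] != (a * p[0] + b * p[1] >= c) for p in pts)

    # Degenerate candidates: the all-ones and all-zeros images.
    best = min(disagreements(0, 0, 0), disagreements(0, 0, 1))
    for (x1, y1), (x2, y2) in itertools.combinations(set(pts), 2):
        a, b = y2 - y1, x1 - x2   # normal of the line through the pair
        c = a * x1 + b * y1
        best = min(best, disagreements(a, b, c), disagreements(-a, -b, -c))
    return best / s
```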

    Deleting and Testing Forbidden Patterns in Multi-Dimensional Arrays

    Understanding the local behaviour of structured multi-dimensional data is a fundamental problem in various areas of computer science. As the amount of data is often huge, it is desirable to obtain sublinear-time algorithms, and specifically property testers, to understand local properties of the data. We focus on the natural local problem of testing pattern freeness: given a large $d$-dimensional array $A$ and a fixed $d$-dimensional pattern $P$ over a finite alphabet, we say that $A$ is $P$-free if it does not contain a copy of the forbidden pattern $P$ as a consecutive subarray. The distance of $A$ to $P$-freeness is the fraction of entries of $A$ that need to be modified to make it $P$-free. For any $\epsilon \in [0,1]$ and any large enough pattern $P$ over any alphabet, other than a very small set of exceptional patterns, we design a tolerant tester that distinguishes between the case that the distance is at least $\epsilon$ and the case that it is at most $a_d \epsilon$, with query complexity and running time $c_d \epsilon^{-1}$, where $a_d < 1$ and $c_d$ depend only on $d$. To analyze the testers we establish several combinatorial results, including the following $d$-dimensional modification lemma, which might be of independent interest: for any large enough pattern $P$ over any alphabet (excluding a small set of exceptional patterns for the binary case), and any array $A$ containing a copy of $P$, one can delete this copy by modifying one of its locations without creating new $P$-copies in $A$. Our results address an open question of Fischer and Newman, who asked whether there exist efficient testers for properties related to tight substructures in multi-dimensional structured data. They serve as a first step towards a general understanding of local properties of multi-dimensional arrays, as any such property can be characterized by a fixed family of forbidden patterns.
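    To make the notion of a copy concrete, here is a naive sketch for $d = 2$ (the function name and trial count are ours): it samples random windows of $A$ and reports whether any of them matches $P$ exactly. This is only the non-tolerant copy-detection skeleton; the paper's tolerant tester and the constants $a_d$, $c_d$ rest on the modification lemma, which is not reproduced here.

```python
import random

def naive_pattern_tester(A, P, eps):
    """Report False if a copy of the 2-D pattern P is found in a
    sampled window of the array A, True otherwise. Non-tolerant
    sketch: a real tester needs the paper's modification lemma to
    relate copy density to the distance to P-freeness.
    """
    n1, n2 = len(A), len(A[0])
    k1, k2 = len(P), len(P[0])
    trials = int(4 / eps)
    for _ in range(trials):
        i = random.randrange(n1 - k1 + 1)
        j = random.randrange(n2 - k2 + 1)
        if all(A[i + r][j + c] == P[r][c]
               for r in range(k1) for c in range(k2)):
            return False   # found a forbidden copy: A is not P-free
    return True            # no copy seen in the sampled windows
```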

    Testing probability distributions underlying aggregated data

    In this paper, we analyze and study a hybrid model for testing and learning probability distributions. Here, in addition to samples, the testing algorithm is provided with one of two different types of oracles to the unknown distribution $D$ over $[n]$. More precisely, we define both the dual and cumulative dual access models, in which the algorithm $A$ can both sample from $D$ and, respectively, for any $i \in [n]$, either query the probability mass $D(i)$ (query access) or get the total mass of $\{1,\dots,i\}$, i.e., $\sum_{j=1}^i D(j)$ (cumulative access). These two models, by generalizing the previously studied sampling and query oracle models, allow us to bypass the strong lower bounds established for a number of problems in these settings, while capturing several interesting aspects of these problems -- and providing new insight on the limitations of the models. Finally, we show that while the testing algorithms can in most cases be strictly more efficient, some tasks remain hard even with this additional power.
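    The two access models can be summarized as a small oracle interface. The sketch below (the method names sample/pmf/cdf are ours) backs the oracles with an explicit probability vector purely for illustration; a tester in these models is charged only for calls to these methods.

```python
import bisect
import random

class DualAccessOracle:
    """Oracle interface for a distribution D over [n] (0-indexed here).
    sample + pmf give the dual access model; sample + cdf give the
    cumulative dual access model. Backed by an explicit probability
    vector purely for illustration.
    """
    def __init__(self, probs):
        self.probs = probs
        self.cum = []
        total = 0.0
        for p in probs:
            total += p
            self.cum.append(total)

    def sample(self):
        # Draw i with probability D(i) (the standard sampling oracle).
        u = random.random() * self.cum[-1]
        return min(bisect.bisect_left(self.cum, u), len(self.probs) - 1)

    def pmf(self, i):
        # Query access: return the probability mass D(i).
        return self.probs[i]

    def cdf(self, i):
        # Cumulative access: return the total mass of {0, ..., i}.
        return self.cum[i]
```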

    Hard Properties with (Very) Short PCPPs and Their Applications

    We show that there exist properties that are maximally hard for testing, while still admitting PCPPs with a proof size very close to linear. Specifically, for every fixed $\ell$, we construct a property $P^{(\ell)} \subseteq \{0,1\}^n$ satisfying the following: any testing algorithm for $P^{(\ell)}$ requires $\Omega(n)$ many queries, and yet $P^{(\ell)}$ has a constant-query PCPP whose proof size is $O(n \cdot \log^{(\ell)} n)$, where $\log^{(\ell)}$ denotes the $\ell$-times iterated log function (e.g., $\log^{(2)} n = \log \log n$). The best previously known upper bound on the PCPP proof size for a maximally hard-to-test property was $O(n \cdot \mathrm{polylog}(n))$. As an immediate application, we obtain stronger separations between the standard testing model and both the tolerant testing model and the erasure-resilient testing model: for every fixed $\ell$, we construct a property that has a constant-query tester, but requires $\Omega(n/\log^{(\ell)} n)$ queries for every tolerant or erasure-resilient tester.
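    For intuition on the bounds, $\log^{(\ell)}$ grows extremely slowly, so the proof-size overhead over linear is tiny. A two-line helper pins the definition down (assuming base-2 logarithms; the choice of base only shifts constants):

```python
import math

def iterated_log(n, ell):
    """log^(ell) n with base-2 logs: iterated_log(n, 2) = log log n."""
    x = float(n)
    for _ in range(ell):
        x = math.log2(x)
    return x

# Even for astronomically large n the overhead factor is tiny:
# iterated_log(2 ** 64, 2) = 6.0, iterated_log(2 ** 64, 3) ~ 2.58.
```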

    Sampling Correctors

    In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, this paper introduces the concept of a sampling corrector. Such algorithms use the structure that the distribution is purported to have in order to allow one to make "on-the-fly" corrections to samples drawn from probability distributions. These algorithms then act as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms. We show that these connections can be utilized to expand the applicability of known distribution learning and property testing algorithms, as well as to achieve improved algorithms for those tasks. As a first step, we show how to design sampling correctors using proper learning algorithms. We then focus on the question of whether algorithms for sampling correctors can be more efficient in terms of sample complexity than learning algorithms for the analogous families of distributions. When correcting monotonicity, we show that this is indeed the case when we are also granted query access to the cumulative distribution function. We also obtain sampling correctors for monotonicity without this stronger type of access, provided that the distribution is originally very close to monotone (namely, at a distance $O(1/\log^2 n)$). In addition, we consider a restricted error model that aims at capturing "missing data" corruptions. In this model, we show that distributions that are close to monotone have sampling correctors that are significantly more efficient than those achievable by the learning approach. We also consider the question of whether an additional source of independent random bits is required by sampling correctors to implement the correction process.
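    As a loose illustration of the filter concept in the cdf-access setting, the sketch below builds a corrector from a classical Birgé-style oblivious flattening: partition the domain into intervals of geometrically growing lengths, measure each interval's exact mass with cumulative queries, and emit samples that are uniform within a mass-weighted interval. This is our illustration of the idea, not the paper's construction, and its guarantees hold only for distributions that are (close to) monotone.

```python
import bisect
import itertools
import random

def make_flattening_corrector(cdf, n, eps):
    """Return a 'corrected sampler' for a distribution D on
    {0, ..., n-1}, given cumulative query access cdf(i) = D(0)+...+D(i).

    Birge-style sketch: partition the domain into intervals of lengths
    roughly (1 + eps)^k, measure each interval's mass exactly with two
    cdf queries, and sample uniformly inside a mass-weighted interval.
    If D is (close to) monotone, the flattened output stays close to D
    while being (close to) monotone itself.
    """
    ends, pos, length = [], 0, 1.0
    while pos < n:
        pos = min(n, pos + max(1, int(length)))
        ends.append(pos)
        length *= 1 + eps
    starts = [0] + ends[:-1]
    masses = [cdf(e - 1) - (cdf(s - 1) if s > 0 else 0.0)
              for s, e in zip(starts, ends)]
    cum = list(itertools.accumulate(masses))

    def corrected_sample():
        # Pick an interval proportionally to its mass, then a
        # uniformly random element inside it.
        u = random.random() * cum[-1]
        k = min(bisect.bisect_left(cum, u), len(ends) - 1)
        return random.randrange(starts[k], ends[k])

    return corrected_sample
```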