11 research outputs found

    Proofs of Proximity for Distribution Testing

    Get PDF
    Distribution testing is an area of property testing that studies algorithms that receive few samples from a probability distribution D and decide whether D has a certain property or is far (in total variation distance) from all distributions with that property. Most natural properties of distributions, however, require a large number of samples to test, which motivates the question of whether there are natural settings wherein fewer samples suffice. We initiate a study of proofs of proximity for properties of distributions. In their basic form, these proof systems consist of a tester that not only has sample access to a distribution but also explicit access to a proof string that depends on the distribution. We refer to these as NP distribution testers, or MA distribution testers if the tester is a probabilistic algorithm. We also study the more general notion of IP distribution testers, in which the tester interacts with an all-powerful untrusted prover. We investigate the power and limitations of proofs of proximity for distributions and chart a landscape that, surprisingly, is significantly different from that of proofs of proximity for functions. Our main results include showing that MA distribution testers can be quadratically stronger than standard distribution testers, but no stronger than that; in contrast, IP distribution testers can be exponentially stronger than standard distribution testers, but when restricted to public coins they can be at best quadratically stronger

    An exponential separation between MA and AM proofs of proximity

    Get PDF
    Interactive proofs of proximity allow a sublinear-time verifier to check that a given input is close to the language, using a small amount of communication with a powerful (but untrusted) prover. In this work we consider two natural minimally interactive variants of such proofs systems, in which the prover only sends a single message, referred to as the proof. The first variant, known as MA-proofs of Proximity (MAP), is fully non-interactive, meaning that the proof is a function of the input only. The second variant, known as AM-proofs of Proximity (AMP), allows the proof to additionally depend on the verifier's (entire) random string. The complexity of both MAPs and AMPs is the total number of bits that the verifier observes - namely, the sum of the proof length and query complexity. Our main result is an exponential separation between the power of MAPs and AMPs. Specifically, we exhibit an explicit and natural property Pi that admits an AMP with complexity O(log n), whereas any MAP for Pi has complexity Omega~(n^{1/4}), where n denotes the length of the input in bits. Our MAP lower bound also yields an alternate proof, which is more general and arguably much simpler, for a recent result of Fischer et al. (ITCS, 2014). Lastly, we also consider the notion of oblivious proofs of proximity, in which the verifier's queries are oblivious to the proof. In this setting we show that AMPs can only be quadratically stronger than MAPs. As an application of this result, we show an exponential separation between the power of public and private coin for oblivious interactive proofs of proximity

    Sample-Based Proofs of Proximity

    Get PDF

    On Interactive Proofs of Proximity with Proof-Oblivious Queries

    Get PDF
    Interactive proofs of proximity (IPPs) offer ultra-fast approximate verification of assertions regarding their input, where ultra-fast means that only a small portion of the input is read and approximate verification is analogous to the notion of approximate decision that underlies property testing. Specifically, in an IPP, the prover can make the verifier accept each input in the property, but cannot fool the verifier into accepting an input that is far from the property (except for with small probability). The verifier in an IPP system engages in two very different types of activities: interacting with an untrusted prover, and querying its input. The definition allows for arbitrary coordination between these two activities, but keeping them separate is both conceptually interesting and necessary for important applications such as addressing temporal considerations (i.e., at what time is each of the services available) and facilitating the construction of zero-knowledge schemes. In this work we embark on a systematic study of IPPs with proof-oblivious queries, where the queries should not be affected by the interaction with the prover. We assign the query and interaction activities to separate modules, and consider different limitations on their coordination. The most strict limitation requires these activities to be totally isolated from one another; they just feed their views to a separate deciding module. We show that such systems can be efficiently emulated by standard testers. Going to the other extreme, we only disallow information to flow from the interacting module to the querying module, but allow free information flow in the other direction. We show that extremely efficient one-round (i.e., two-message) systems of such type can be used to verify properties that are extremely hard to test (without the help of a prover). That is, the complexity of verifying can be polylogarithmic in the complexity of testing. This stands in contrast the MAPs (viewed as 1/2-round systems) in which proof-oblivious queries are as limited as our isolated model. Our focus is on an intermediate model that allows shared randomness between the querying and interacting modules but no information flow between them. In this case we show that 1-round systems are efficiently emulated by standard testers but 3/2-round systems of extremely low complexity exist for properties that are extremely hard to test. One additional result about this model is that it can efficiently emulate any IPP for any property of low-degree polynomials

    PAC Verification of Statistical Algorithms

    Full text link
    Goldwasser et al. (2021) recently proposed the setting of PAC verification, where a hypothesis (machine learning model) that purportedly satisfies the agnostic PAC learning objective is verified using an interactive proof. In this paper we develop this notion further in a number of ways. First, we prove a lower bound of Ω(d/ε2)\Omega\left(\sqrt{d}/\varepsilon^2\right) i.i.d.\ samples for PAC verification of hypothesis classes of VC dimension dd. Second, we present a protocol for PAC verification of unions of intervals over R\mathbb{R} that improves upon their proposed protocol for that task, and matches our lower bound's dependence on dd. Third, we introduce a natural generalization of their definition to verification of general statistical algorithms, which is applicable to a wider variety of settings beyond agnostic PAC learning. Showcasing our proposed definition, our final result is a protocol for the verification of statistical query algorithms that satisfy a combinatorial constraint on their queries

    Interactive Proofs for Verifying Machine Learning

    Get PDF
    We consider the following question: using a source of labeled data and interaction with an untrusted prover, what is the complexity of verifying that a given hypothesis is "approximately correct"? We study interactive proof systems for PAC verification, where a verifier that interacts with a prover is required to accept good hypotheses, and reject bad hypotheses. Both the verifier and the prover are efficient and have access to labeled data samples from an unknown distribution. We are interested in cases where the verifier can use significantly less data than is required for (agnostic) PAC learning, or use a substantially cheaper data source (e.g., using only random samples for verification, even though learning requires membership queries). We believe that today, when data and data-driven algorithms are quickly gaining prominence, the question of verifying purported outcomes of data analyses is very well-motivated. We show three main results. First, we prove that for a specific hypothesis class, verification is significantly cheaper than learning in terms of sample complexity, even if the verifier engages with the prover only in a single-round (NP-like) protocol. Moreover, for this class we prove that single-round verification is also significantly cheaper than testing closeness to the class. Second, for the broad class of Fourier-sparse boolean functions, we show a multi-round (IP-like) verification protocol, where the prover uses membership queries, and the verifier is able to assess the result while only using random samples. Third, we show that verification is not always more efficient. Namely, we show a class of functions where verification requires as many samples as learning does, up to a logarithmic factor

    Delegating Data Collection in Decentralized Machine Learning

    Full text link
    Motivated by the emergence of decentralized machine learning ecosystems, we study the delegation of data collection. Taking the field of contract theory as our starting point, we design optimal and near-optimal contracts that deal with two fundamental machine learning challenges: lack of certainty in the assessment of model quality and lack of knowledge regarding the optimal performance of any model. We show that lack of certainty can be dealt with via simple linear contracts that achieve 1-1/e fraction of the first-best utility, even if the principal has a small test set. Furthermore, we give sufficient conditions on the size of the principal's test set that achieves a vanishing additive approximation to the optimal utility. To address the lack of a priori knowledge regarding the optimal performance, we give a convex program that can adaptively and efficiently compute the optimal contract

    Efficient Batch Verification for UP

    Get PDF
    Consider a setting in which a prover wants to convince a verifier of the correctness of k NP statements. For example, the prover wants to convince the verifier that k given integers N_1,...,N_k are all RSA moduli (i.e., products of equal length primes). Clearly this problem can be solved by simply having the prover send the k NP witnesses, but this involves a lot of communication. Can interaction help? In particular, is it possible to construct interactive proofs for this task whose communication grows sub-linearly with k? Our main result is such an interactive proof for verifying the correctness of any k UP statements (i.e., NP statements that have a unique witness). The proof-system uses only a constant number of rounds and the communication complexity is k^delta * poly(m), where delta>0 is an arbitrarily small constant, m is the length of a single witness, and the poly term refers to a fixed polynomial that only depends on the language and not on delta. The (honest) prover strategy can be implemented in polynomial-time given access to the k (unique) witnesses. Our proof leverages "interactive witness verification" (IWV), a new type of proof-system that may be of independent interest. An IWV is a proof-system in which the verifier needs to verify the correctness of an NP statement using: (i) a sublinear number of queries to an alleged NP witness, and (ii) a short interaction with a powerful but untrusted prover. In contrast to the setting of PCPs and Interactive PCPs, here the verifier only has access to the raw NP witness, rather than some encoding thereof

    Distribution-Free Proofs of Proximity

    Full text link
    Motivated by the fact that input distributions are often unknown in advance, distribution-free property testing considers a setting in which the algorithmic task is to accept functions f:[n]→{0,1}f : [n] \to \{0,1\} having a certain property Π\Pi and reject functions that are ϵ\epsilon-far from Π\Pi, where the distance is measured according to an arbitrary and unknown input distribution D∼[n]D \sim [n]. As usual in property testing, the tester is required to do so while making only a sublinear number of input queries, but as the distribution is unknown, we also allow a sublinear number of samples from the distribution DD. In this work we initiate the study of distribution-free interactive proofs of proximity (df-IPP) in which the distribution-free testing algorithm is assisted by an all powerful but untrusted prover. Our main result is a df-IPP for any problem Π∈NC\Pi \in NC, with O~(n)\tilde{O}(\sqrt{n}) communication, sample, query, and verification complexities, for any proximity parameter ϵ>1/n\epsilon>1/\sqrt{n}. For such proximity parameters, this result matches the parameters of the best-known general purpose IPPs in the standard uniform setting, and is optimal under reasonable cryptographic assumptions. For general values of the proximity parameter ϵ\epsilon, our distribution-free IPP has optimal query complexity O(1/ϵ)O(1/\epsilon) but the communication complexity is O~(ϵ⋅n+1/ϵ)\tilde{O}(\epsilon \cdot n + 1/\epsilon), which is worse than what is known for uniform IPPs when ϵ<1/n\epsilon<1/\sqrt{n}. With the aim of improving on this gap, we further show that for IPPs over specialised, but large distribution families, such as sufficiently smooth distributions and product distributions, the communication complexity can be reduced to ϵ⋅n⋅(1/ϵ)o(1)\epsilon\cdot n\cdot(1/\epsilon)^{o(1)} (keeping the query complexity roughly the same as before) to match the communication complexity of the uniform case

    Certifying Private Probabilistic Mechanisms

    Get PDF
    In past years, entire research communities have arisen to address concerns of privacy and fairness in data analysis. At present, however, the public must trust that institutions will re-implement algorithms voluntarily to account for these social concerns. Due to additional cost, widespread adoption is unlikely without effective legal enforcement. A technical challenge for enforcement is that the methods proposed are often probabilistic mechanisms, whose output must be drawn according to precise, and sometimes secret, distributions. The Differential Privacy (DP) case is illustrative: if a cheating curator answers queries according to an overly-accurate mechanism, privacy violations could go undetected. The need for effective enforcement raises the central question of our paper: Can we efficiently certify the output of a probabilistic mechanism enacted by an untrusted party? To this end: (1) We introduce two new notions: Certified Probabilistic Mechanisms (CPM) and Random Variable Commitment Schemes (RVCS). A CPM is an interactive protocol that forces a prover to enact a given probabilistic mechanism or be caught; importantly, the interaction does not reveal secret parameters of the mechanism. An RVCS—a key primitive for constructing CPMs—is a commitment scheme where the verifier is convinced that the commitment is to an RV sampled according to an agreed-upon distribution, but learns nothing else. (2) We instantiate the general notion of CPM for the special case of Certifying DP. We build a lightweight, doubly-efficent interactive proof system to certify arbitrary-predicate counting queries released via the DP Binomial mechanism. The construction relies on a commitment scheme with perfect hiding and additive homomorphic properties that can be used to release a broad class of queries about a committed database, which we construct on top of Pedersen commitments. (3) Finally, we demonstrate the immediate feasibility of Certified DP via a highly-efficient and scalable prototype implementation to answer counting queries of arbitrary predicates. The mechanism is composed of an offline and online stage, where the online phase allows for non-interactive certification of queries. For example, we show that CDP queries over a US Census Public Use Microdata Sample (PUMS) (n=7000n=7000) can be completed in only 1.6 ms and verified in just 38 μs\mu \text{s}. Our implementation is available in open source at https://github.com/jlwatson/certified-dp
    corecore