6 research outputs found

    Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing

    The framework of distribution testing is currently ubiquitous in the field of property testing. In this model, the input is a probability distribution accessible via independently drawn samples from an oracle. The testing task is to distinguish a distribution that satisfies some property from a distribution that is far, in some distance measure, from satisfying it. The task of tolerant testing imposes a further restriction, that distributions close to satisfying the property are also accepted. This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts. When limiting our scope to label-invariant (symmetric) properties of distributions, we prove that the gap is at most quadratic, ignoring poly-logarithmic factors. Conversely, the property of being the uniform distribution is indeed known to have an almost-quadratic gap. When moving to general, not necessarily label-invariant properties, the situation is more complicated, and we show some partial results. We show that if a property requires the distributions to be non-concentrated, that is, the probability mass of the distribution is sufficiently spread out, then it cannot be non-tolerantly tested with o(√n) many samples, where n denotes the universe size. Clearly, this implies at most a quadratic gap, because a distribution can be learned (and hence tolerantly tested against any property) using O(n) many samples. Being non-concentrated is a strong requirement on properties, as we also prove a close to linear lower bound against their tolerant tests. Apart from the case where the distribution is non-concentrated, we also show that if an input distribution is very concentrated, in the sense that it is mostly supported on a subset of size s of the universe, then it can be learned using only O(s) many samples. The learning procedure adapts to the input, and works without knowing s in advance.
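
    As an illustration of the learning-based route to tolerant testing mentioned in the abstract, the minimal Python sketch below (not taken from the paper; the uniformity property, thresholds, and sample size are illustrative assumptions) learns the empirical distribution from samples and compares it to the uniform distribution in total variation distance.

```python
# Minimal sketch (not the paper's algorithm): tolerant uniformity testing by
# learning. With on the order of n samples the empirical distribution is close
# to the true one in total variation, so comparing the learned distribution
# against the property suffices for tolerant testing.
import numpy as np

def empirical_distribution(samples, n):
    """Empirical probability mass function over a universe of size n."""
    counts = np.bincount(samples, minlength=n)
    return counts / len(samples)

def tolerant_uniformity_test(samples, n, eps1, eps2):
    """Accept if the input looks eps1-close to uniform, reject if eps2-far.

    Hypothetical thresholding at the midpoint (eps1 + eps2) / 2; the number of
    samples needed for this to be reliable grows roughly linearly in n, with
    factors depending on eps2 - eps1.
    """
    p_hat = empirical_distribution(samples, n)
    uniform = np.full(n, 1.0 / n)
    tv_distance = 0.5 * np.abs(p_hat - uniform).sum()
    return tv_distance <= (eps1 + eps2) / 2

# Example: samples drawn from a distribution that is exactly uniform.
rng = np.random.default_rng(0)
n = 100
samples = rng.integers(0, n, size=20 * n)
print(tolerant_uniformity_test(samples, n, eps1=0.1, eps2=0.3))  # expected: True
```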

    Improved quantum data analysis

    We provide more sample-efficient versions of some basic routines in quantum data analysis, along with simpler proofs. Particularly, we give a quantum "Threshold Search" algorithm that requires only O((log^2 m)/ε^2) samples of a d-dimensional state ρ. That is, given observables 0 ≤ A_1, A_2, ..., A_m ≤ 1 such that tr(ρ A_i) ≥ 1/2 for at least one i, the algorithm finds j with tr(ρ A_j) ≥ 1/2 − ε. As a consequence, we obtain a Shadow Tomography algorithm requiring only Õ((log^2 m)(log d)/ε^4) samples, which simultaneously achieves the best known dependence on each parameter m, d, ε. This yields the same sample complexity for quantum Hypothesis Selection among m states; we also give an alternative Hypothesis Selection method using Õ((log^3 m)/ε^2) samples.
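
    The toy Python sketch below only illustrates the Threshold Search task itself, not the quantum algorithm: it computes the traces tr(ρ A_i) exactly for a hypothetical 2-dimensional example, whereas the paper's contribution is solving the search problem from few copies of ρ.

```python
# Illustration of the Threshold Search task (classical, exact computation; not
# the sample-efficient quantum algorithm from the paper): given a density
# matrix rho and observables A_1..A_m with 0 <= A_i <= 1, find some j with
# tr(rho A_j) >= 1/2 - eps, under the promise that tr(rho A_i) >= 1/2 for at
# least one i.
import numpy as np

def threshold_search_exact(rho, observables, eps):
    """Return an index j with tr(rho A_j) >= 1/2 - eps, or None if none exists."""
    for j, A in enumerate(observables):
        if np.trace(rho @ A).real >= 0.5 - eps:
            return j
    return None

# Toy 2-dimensional example: rho = |0><0|, observables are projectors.
rho = np.array([[1.0, 0.0], [0.0, 0.0]])
A1 = np.array([[0.0, 0.0], [0.0, 1.0]])   # tr(rho A1) = 0
A2 = np.array([[1.0, 0.0], [0.0, 0.0]])   # tr(rho A2) = 1 >= 1/2
print(threshold_search_exact(rho, [A1, A2], eps=0.1))  # -> 1
```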

    Simple Binary Hypothesis Testing under Local Differential Privacy and Communication Constraints

    We study simple binary hypothesis testing under both local differential privacy (LDP) and communication constraints. We qualify our results as either minimax optimal or instance optimal: the former hold for the set of distribution pairs with prescribed Hellinger divergence and total variation distance, whereas the latter hold for specific distribution pairs. For the sample complexity of simple hypothesis testing under pure LDP constraints, we establish instance-optimal bounds for distributions with binary support; minimax-optimal bounds for general distributions; and (approximately) instance-optimal, computationally efficient algorithms for general distributions. When both privacy and communication constraints are present, we develop instance-optimal, computationally efficient algorithms that achieve the minimum possible sample complexity (up to universal constants). Our results on instance-optimal algorithms hinge on identifying the extreme points of the joint range set A of two distributions p and q, defined as A := {(Tp, Tq) | T ∈ C}, where C is the set of channels characterizing the constraints.
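
    As a small illustration of the channel picture, the sketch below (an assumed setup, not the paper's construction) applies a k-ary randomized-response channel T, a standard pure-LDP mechanism, to a pair of binary distributions p and q; the pair (Tp, Tq) is one point of the joint range set A, and the contraction of the distance between the hypotheses is what drives up the sample complexity.

```python
# Minimal sketch of the joint range picture for binary distributions under a
# pure-LDP channel. The channel T here is k-ary randomized response, used only
# as an illustration; (Tp, Tq) is one point of A = {(Tp, Tq) : T in C}.
import numpy as np

def randomized_response_channel(k, epsilon):
    """Column-stochastic k x k randomized-response channel satisfying epsilon-LDP."""
    e = np.exp(epsilon)
    T = np.full((k, k), 1.0 / (e + k - 1))
    np.fill_diagonal(T, e / (e + k - 1))
    return T

def total_variation(p, q):
    return 0.5 * np.abs(p - q).sum()

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])
T = randomized_response_channel(k=2, epsilon=1.0)
Tp, Tq = T @ p, T @ q
# Privatization contracts the distance between the two hypotheses.
print(total_variation(p, q), total_variation(Tp, Tq))
```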

    Private hypothesis selection

    We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution P and a set of m probability distributions H, the goal is to output, in an ε-differentially private manner, a distribution from H whose total variation distance to P is comparable to that of the best such distribution (which we denote by α). The sample complexity of our basic algorithm is O(log m/α^2 + log m/(αε)), representing a minimal cost for privacy when compared to the non-private algorithm. We can also handle infinite hypothesis classes H by relaxing to (ε, δ)-differential privacy. We apply our hypothesis selection algorithm to give learning algorithms for a number of natural distribution classes, including Gaussians, product distributions, sums of independent random variables, piecewise polynomials, and mixture classes. Our hypothesis selection procedure allows us to generically convert a cover for a class to a learning algorithm, complementing known learning lower bounds which are in terms of the size of the packing number of the class. As the covering and packing numbers are often closely related, for constant α, our algorithms achieve the optimal sample complexity for many classes of interest. Finally, we describe an application to private distribution-free PAC learning.
    https://arxiv.org/abs/1905.1322
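
    For illustration only, the sketch below performs private selection from a finite hypothesis class using the exponential mechanism with a log-likelihood score. This is not the paper's procedure, and the scoring rule and sensitivity bound are assumptions made for the example; it only shows the interface: samples from an unknown P go in, one hypothesis from H comes out, privately.

```python
# Minimal sketch of epsilon-differentially private selection from a finite set
# of hypotheses via the exponential mechanism. NOT the paper's algorithm;
# the log-likelihood score and its sensitivity bound are illustrative choices.
import numpy as np

def exponential_mechanism_select(scores, epsilon, sensitivity, rng):
    """Sample an index with probability proportional to exp(eps*score/(2*sens))."""
    logits = epsilon * np.asarray(scores) / (2.0 * sensitivity)
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), p=probs)

def private_hypothesis_selection(samples, hypotheses, epsilon, rng):
    """Privately pick the hypothesis with the highest average log-likelihood.

    Changing one sample moves each score by at most max|log h| / len(samples),
    which is the sensitivity bound used below.
    """
    scores, sensitivities = [], []
    for h in hypotheses:
        scores.append(np.log(h[samples]).mean())
        sensitivities.append(np.abs(np.log(h)).max() / len(samples))
    return exponential_mechanism_select(scores, epsilon, max(sensitivities), rng)

rng = np.random.default_rng(1)
H = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]   # two candidate pmfs over {0, 1}
samples = rng.choice(2, size=500, p=[0.88, 0.12])  # unknown P, close to H[1]
print(private_hypothesis_selection(samples, H, epsilon=1.0, rng=rng))  # likely 1
```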