Identity Testing for High-Dimensional Distributions via Entropy Tensorization
We present improved algorithms and matching statistical and computational
lower bounds for the problem of identity testing n-dimensional distributions.
In the identity testing problem, we are given as input an explicit distribution
μ, an ε > 0, and access to a sampling oracle for a hidden
distribution π. The goal is to distinguish whether the two distributions
μ and π are identical or are at least ε-far apart. When
there is only access to full samples from the hidden distribution π, it is
known that exponentially many samples may be needed, and hence previous works
have studied identity testing with additional access to various conditional
sampling oracles. We consider here a significantly weaker conditional sampling
oracle, called the Coordinate Oracle, and provide a fairly complete
computational and statistical characterization of the identity testing problem
in this new model.
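To make the access model concrete, here is a minimal sketch in Python, under the assumption that a Coordinate Oracle query (i, x) returns a fresh sample of coordinate i from the hidden distribution conditioned on the remaining coordinates of x. The toy hidden distribution below is a hypothetical product distribution over {+1, −1}^n, chosen only so the conditional is easy to sample; the paper's hidden distribution π is arbitrary.

```python
import random

# Toy hidden distribution: a product distribution over {+1, -1}^n where
# coordinate i is +1 with probability p[i]. (Illustrative choice only.)
p = [0.5, 0.7, 0.2]
n = len(p)

def coordinate_oracle(i, x):
    """Assumed oracle semantics: resample coordinate i of the hidden
    distribution conditioned on the other coordinates of x. For a
    product distribution the conditional is just the marginal of i."""
    return +1 if random.random() < p[i] else -1

# One query: fix a configuration x and resample its second coordinate.
x = [+1, -1, +1]
sample = coordinate_oracle(1, x)
assert sample in (+1, -1)
```

A tester in this model adaptively issues such single-coordinate queries rather than drawing full samples from π.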
We prove that if an analytic property known as approximate tensorization of
entropy holds for the visible distribution μ, then there is an efficient
identity testing algorithm for any hidden π that uses Õ(n/ε)
queries to the Coordinate Oracle. Approximate
tensorization of entropy is a classical tool for proving optimal mixing time
bounds of Markov chains for high-dimensional distributions, and recently has
been established for many families of distributions via spectral independence.
We complement our algorithmic result for identity testing with a matching
statistical lower bound of Ω̃(n/ε) for the number of queries under
the Coordinate Oracle. We also prove a computational phase transition: for
sparse antiferromagnetic Ising models over {+1, −1}^n, in the regime where
approximate tensorization of entropy fails, there is no efficient identity
testing algorithm unless RP = NP.
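For context, approximate tensorization of entropy (with a constant C ≥ 1) is usually stated as the following functional inequality; this is the standard form from the Markov-chain mixing literature, not a formula quoted from the abstract above:

```latex
% Approximate tensorization of entropy with constant C >= 1:
% for all functions f >= 0,
\mathrm{Ent}_{\mu}(f) \;\le\; C \sum_{i=1}^{n}
    \mu\!\left[\, \mathrm{Ent}_{\mu}\!\left(f \mid x_{-i}\right) \right],
\qquad
\mathrm{Ent}_{\mu}(f) = \mu[f \log f] - \mu[f] \log \mu[f].
```

Here Ent_μ(f | x_{−i}) denotes the entropy of f under the conditional distribution of coordinate i given the remaining coordinates, so the inequality bounds the global entropy by a sum of single-coordinate conditional entropies.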
Private Distribution Testing with Heterogeneous Constraints: Your Epsilon Might Not Be Mine
Private closeness testing asks to decide whether the underlying probability
distributions of two sensitive datasets are identical or differ significantly
in statistical distance, while guaranteeing (differential) privacy of the data.
As in most (if not all) distribution testing questions studied under privacy
constraints, however, previous work assumes that the two datasets are equally
sensitive, i.e., must be provided the same privacy guarantees. This is often an
unrealistic assumption, as different sources of data come with different
privacy requirements; as a result, known closeness testing algorithms might be
unnecessarily conservative, "paying" too high a privacy budget for half of the
data. In this work, we initiate the study of the closeness testing problem
under heterogeneous privacy constraints, where the two datasets come with
distinct privacy requirements.
We formalize the question and provide algorithms under the three most widely
used differential privacy settings, with a particular focus on the local and
shuffle models of privacy; and show that one can indeed achieve better sample
efficiency when taking into account the two different "epsilon" requirements.
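As a minimal illustration of heterogeneous privacy budgets (a sketch of the general idea, not the paper's algorithm): each dataset's histogram can be released with Laplace noise calibrated to its own ε, so the less sensitive dataset contributes less noise to the closeness statistic. The Laplace mechanism and sensitivity argument below are standard; the datasets and ε values are made up.

```python
import random, math

def laplace(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_histogram(samples, domain_size, epsilon):
    counts = [0] * domain_size
    for s in samples:
        counts[s] += 1
    # Laplace mechanism: swapping one sample changes two counts by 1 each,
    # so the L1 sensitivity is 2 and scale 2/epsilon gives epsilon-DP.
    return [c + laplace(2.0 / epsilon) for c in counts]

# Two datasets with *different* privacy requirements (made-up numbers).
hist_p = private_histogram([0, 1, 1, 2], domain_size=3, epsilon=0.5)  # more sensitive
hist_q = private_histogram([0, 0, 1, 2], domain_size=3, epsilon=2.0)  # less sensitive
stat = sum(abs(a - b) for a, b in zip(hist_p, hist_q))  # noisy L1 statistic
```

Because the second dataset tolerates ε = 2.0, its histogram is far less noisy than the first's; a tester aware of this asymmetry can exploit it rather than noising both datasets at the stricter ε = 0.5.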
Property Testing and Probability Distributions: New Techniques, New Models, and New Goals
In order to study the real world, scientists (and computer scientists) develop simplified models that attempt to capture the essential features of the observed system. Understanding the power and limitations of these models, when they apply or fail to fully capture the situation at hand, is therefore of utmost importance.
In this thesis, we investigate the role of some of these models in property testing of probability distributions (distribution testing), as well as in related areas. We introduce natural extensions of the standard model (which only allows access to independent draws from the underlying distribution), in order to circumvent some of its limitations or to draw new insights about the problems it aims to capture. Our results are organized in three main directions:
(i) We provide systematic approaches to tackle distribution testing questions. Specifically, we provide two general algorithmic frameworks that apply to a wide range of properties, and yield efficient and near-optimal results for many of them. We complement these by introducing two methodologies to prove information-theoretic lower bounds in distribution testing, which enable us to derive hardness results in a clean and unified way.
(ii) We introduce and investigate two new models of access to the unknown distributions, which both generalize the standard sampling model in different ways and allow testing algorithms to achieve significantly better efficiency. Our study of the power and limitations of algorithms in these models shows how these could lead to faster algorithms in practical situations, and yields a better understanding of the underlying bottlenecks in the standard sampling setting.
(iii) We then leave the field of distribution testing to explore areas adjacent to property testing. We define a new algorithmic primitive of sampling correction, which in some sense lies in between distribution learning and testing and aims to capture settings where data originates from imperfect or noisy sources. Our work sets out to model these situations in a rigorous and abstracted way, in order to enable the development of systematic methods to address these issues.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Space Programs Summary no. 37-38, Volume IV, for the period February 1, 1966 to March 31, 1966. Supporting research and advanced development
Supporting research in systems analysis, guidance and control, environmental simulation, space sciences, propulsion systems, and radio telecommunication.