Identity Testing for High-Dimensional Distributions via Entropy Tensorization
We present improved algorithms and matching statistical and computational
lower bounds for the problem of identity testing n-dimensional distributions.
In the identity testing problem, we are given as input an explicit distribution
μ, an ε > 0, and access to a sampling oracle for a hidden
distribution π. The goal is to distinguish whether the two distributions
μ and π are identical or are at least ε-far apart. When
there is only access to full samples from the hidden distribution π, it is
known that exponentially many samples may be needed, and hence previous works
have studied identity testing with additional access to various conditional
sampling oracles. We consider here a significantly weaker conditional sampling
oracle, called the Coordinate Oracle, and provide a fairly complete
computational and statistical characterization of the identity testing problem
in this new model.
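To make the access model concrete, here is a minimal sketch in Python, under the assumption that a Coordinate Oracle query (i, x) returns a fresh sample of coordinate i from the hidden distribution conditioned on the remaining coordinates of x. The toy hidden distribution below is a hypothetical product distribution over {+1, −1}^n, chosen only so the conditional is easy to sample; the paper's hidden distribution π is arbitrary.

```python
import random

# Toy hidden distribution: a product distribution over {+1, -1}^n where
# coordinate i is +1 with probability p[i]. (Illustrative choice only.)
p = [0.5, 0.7, 0.2]
n = len(p)

def coordinate_oracle(i, x):
    """Assumed oracle semantics: resample coordinate i of the hidden
    distribution conditioned on the other coordinates of x. For a
    product distribution the conditional is just the marginal of i."""
    return +1 if random.random() < p[i] else -1

# One query: fix a configuration x and resample its second coordinate.
x = [+1, -1, +1]
sample = coordinate_oracle(1, x)
assert sample in (+1, -1)
```

A tester in this model adaptively issues such single-coordinate queries rather than drawing full samples from π.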
We prove that if an analytic property known as approximate tensorization of
entropy holds for the visible distribution μ, then there is an efficient
identity testing algorithm for any hidden π that uses Õ(n/ε)
queries to the Coordinate Oracle. Approximate
tensorization of entropy is a classical tool for proving optimal mixing time
bounds of Markov chains for high-dimensional distributions, and recently has
been established for many families of distributions via spectral independence.
We complement our algorithmic result for identity testing with a matching
statistical lower bound of Ω̃(n/ε) for the number of queries under
the Coordinate Oracle. We also prove a computational phase transition: for
sparse antiferromagnetic Ising models over {+1, −1}^n, in the regime where
approximate tensorization of entropy fails, there is no efficient identity
testing algorithm unless RP = NP.
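For context, approximate tensorization of entropy (with a constant C ≥ 1) is usually stated as the following functional inequality; this is the standard form from the Markov-chain mixing literature, not a formula quoted from the abstract above:

```latex
% Approximate tensorization of entropy with constant C >= 1:
% for all functions f >= 0,
\mathrm{Ent}_{\mu}(f) \;\le\; C \sum_{i=1}^{n}
    \mu\!\left[\, \mathrm{Ent}_{\mu}\!\left(f \mid x_{-i}\right) \right],
\qquad
\mathrm{Ent}_{\mu}(f) = \mu[f \log f] - \mu[f] \log \mu[f].
```

Here Ent_μ(f | x_{−i}) denotes the entropy of f under the conditional distribution of coordinate i given the remaining coordinates, so the inequality bounds the global entropy by a sum of single-coordinate conditional entropies.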
Private Distribution Testing with Heterogeneous Constraints: Your Epsilon Might Not Be Mine
Private closeness testing asks to decide whether the underlying probability
distributions of two sensitive datasets are identical or differ significantly
in statistical distance, while guaranteeing (differential) privacy of the data.
As in most (if not all) distribution testing questions studied under privacy
constraints, however, previous work assumes that the two datasets are equally
sensitive, i.e., must be provided the same privacy guarantees. This is often an
unrealistic assumption, as different sources of data come with different
privacy requirements; as a result, known closeness testing algorithms might be
unnecessarily conservative, "paying" too high a privacy budget for half of the
data. In this work, we initiate the study of the closeness testing problem
under heterogeneous privacy constraints, where the two datasets come with
distinct privacy requirements.
We formalize the question and provide algorithms under the three most widely
used differential privacy settings, with a particular focus on the local and
shuffle models of privacy; and show that one can indeed achieve better sample
efficiency when taking into account the two different "epsilon" requirements.
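As a minimal illustration of heterogeneous privacy budgets (a sketch of the general idea, not the paper's algorithm): each dataset's histogram can be released with Laplace noise calibrated to its own ε, so the less sensitive dataset contributes less noise to the closeness statistic. The Laplace mechanism and sensitivity argument below are standard; the datasets and ε values are made up.

```python
import random, math

def laplace(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_histogram(samples, domain_size, epsilon):
    counts = [0] * domain_size
    for s in samples:
        counts[s] += 1
    # Laplace mechanism: swapping one sample changes two counts by 1 each,
    # so the L1 sensitivity is 2 and scale 2/epsilon gives epsilon-DP.
    return [c + laplace(2.0 / epsilon) for c in counts]

# Two datasets with *different* privacy requirements (made-up numbers).
hist_p = private_histogram([0, 1, 1, 2], domain_size=3, epsilon=0.5)  # more sensitive
hist_q = private_histogram([0, 0, 1, 2], domain_size=3, epsilon=2.0)  # less sensitive
stat = sum(abs(a - b) for a, b in zip(hist_p, hist_q))  # noisy L1 statistic
```

Because the second dataset tolerates ε = 2.0, its histogram is far less noisy than the first's; a tester aware of this asymmetry can exploit it rather than noising both datasets at the stricter ε = 0.5.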
Property Testing and Probability Distributions: New Techniques, New Models, and New Goals
In order to study the real world, scientists (and computer scientists) develop simplified models that attempt to capture the essential features of the observed system. Understanding the power and limitations of these models, when they apply or fail to fully capture the situation at hand, is therefore of utmost importance.
In this thesis, we investigate the role of some of these models in property testing of probability distributions (distribution testing), as well as in related areas. We introduce natural extensions of the standard model (which only allows access to independent draws from the underlying distribution), in order to circumvent some of its limitations or to draw new insights about the problems it aims to capture. Our results are organized in three main directions:
(i) We provide systematic approaches to tackle distribution testing questions. Specifically, we provide two general algorithmic frameworks that apply to a wide range of properties, and yield efficient and near-optimal results for many of them. We complement these by introducing two methodologies to prove information-theoretic lower bounds in distribution testing, which enable us to derive hardness results in a clean and unified way.
(ii) We introduce and investigate two new models of access to the unknown distributions, which both generalize the standard sampling model in different ways and allow testing algorithms to achieve significantly better efficiency. Our study of the power and limitations of algorithms in these models shows how these could lead to faster algorithms in practical situations, and yields a better understanding of the underlying bottlenecks in the standard sampling setting.
(iii) We then leave the field of distribution testing to explore areas adjacent to property testing. We define a new algorithmic primitive of sampling correction, which in some sense lies in between distribution learning and testing and aims to capture settings where data originates from imperfect or noisy sources. Our work sets out to model these situations in a rigorous and abstracted way, in order to enable the development of systematic methods to address these issues.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Space Programs Summary no. 37-38, Volume IV, for the period February 1, 1966 to March 31, 1966. Supporting research and advanced development
Supporting research in systems analysis, guidance and control, environmental simulation, space sciences, propulsion systems, and radio telecommunication.