Testing Conditional Independence of Discrete Distributions
We study the problem of testing \emph{conditional independence} for discrete
distributions. Specifically, given samples from a discrete random variable $(X, Y, Z)$ on domain $[\ell_1]\times[\ell_2]\times[n]$, we want to distinguish,
with probability at least $2/3$, between the case that $X$ and $Y$ are
conditionally independent given $Z$ from the case that $(X, Y, Z)$ is
$\epsilon$-far, in $\ell_1$-distance, from every distribution that has this
property. Conditional independence is a concept of central importance in
probability and statistics with a range of applications in various scientific
domains. As such, the statistical task of testing conditional independence has
been extensively studied in various forms within the statistics and
econometrics communities for nearly a century. Perhaps surprisingly, this
problem has not been previously considered in the framework of distribution
property testing and in particular no tester with sublinear sample complexity
is known, even for the important special case that the domains of $X$ and $Y$
are binary.
The main algorithmic result of this work is the first conditional
independence tester with {\em sublinear} sample complexity for discrete
distributions over $[\ell_1]\times[\ell_2]\times[n]$. To complement our upper
bounds, we prove information-theoretic lower bounds establishing that the
sample complexity of our algorithm is optimal, up to constant factors, for a
number of settings. Specifically, for the prototypical setting when $\ell_1, \ell_2 = O(1)$, we show that the sample complexity of testing conditional
independence (upper bound and matching lower bound) is
\[
\Theta\left({\max\left(n^{1/2}/\epsilon^2,\min\left(n^{7/8}/\epsilon,n^{6/7}/\epsilon^{8/7}\right)\right)}\right)\,.
\]
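To get a feel for how the three regimes in this bound trade off, the expression inside $\Theta(\cdot)$ can be evaluated numerically. The following is a minimal sketch only: the function name `ci_sample_bound` is hypothetical, the hidden constant factor is ignored, and the formula is taken as stated for the case where the domains of the two tested variables have constant size.

```python
def ci_sample_bound(n: int, eps: float) -> float:
    """Evaluate max(n^{1/2}/eps^2, min(n^{7/8}/eps, n^{6/7}/eps^{8/7})).

    This is the expression inside Theta(.) only; constant factors are
    ignored, so the value indicates growth rate, not an exact sample count.
    """
    if n < 1 or not (0 < eps <= 1):
        raise ValueError("expected n >= 1 and 0 < eps <= 1")
    term_a = n ** 0.5 / eps ** 2            # dominates for very small eps
    term_b = n ** (7 / 8) / eps
    term_c = n ** (6 / 7) / eps ** (8 / 7)
    return max(term_a, min(term_b, term_c))


if __name__ == "__main__":
    # Print the bound next to n for a few illustrative parameter choices.
    for n in (10**3, 10**6, 10**9):
        for eps in (1.0, 0.5, 0.1):
            print(f"n={n:>10} eps={eps:<4} bound~{ci_sample_bound(n, eps):.3e}")
```

For a fixed accuracy parameter the middle terms grow as $n^{6/7}$ or $n^{7/8}$, which is where the sublinear dependence on $n$ comes from.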
Sampling Correctors
In many situations, sample data is obtained from a noisy or imperfect source.
In order to address such corruptions, this paper introduces the concept of a
sampling corrector. Such algorithms use structure that the distribution is
purported to have, in order to allow one to make "on-the-fly" corrections to
samples drawn from probability distributions. These algorithms then act as
filters between the noisy data and the end user.
We show connections between sampling correctors, distribution learning
algorithms, and distribution property testing algorithms. We show that these
connections can be utilized to expand the applicability of known distribution
learning and property testing algorithms as well as to achieve improved
algorithms for those tasks.
As a first step, we show how to design sampling correctors using proper
learning algorithms. We then focus on the question of whether algorithms for
sampling correctors can be more efficient in terms of sample complexity than
learning algorithms for the analogous families of distributions. When
correcting monotonicity, we show that this is indeed the case when also granted
query access to the cumulative distribution function. We also obtain sampling
correctors for monotonicity without this stronger type of access, provided that
the distribution be originally very close to monotone (namely, at a distance
$O(1/\log^2 n)$). In addition to that, we consider a restricted error model
that aims at capturing "missing data" corruptions. In this model, we show that
distributions that are close to monotone have sampling correctors that are
significantly more efficient than achievable by the learning approach.
We also consider the question of whether an additional source of independent
random bits is required by sampling correctors to implement the correction
process.
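The "learn, then resample" idea mentioned in this abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it swaps in a simple stand-in learner (an empirical histogram projected onto non-increasing sequences with the pool-adjacent-violators method) and then draws fresh samples from the learned hypothesis. All names (`empirical_pmf`, `project_non_increasing`, `make_corrector`) are hypothetical, and samples are assumed to be integers in `range(n)`.

```python
import random
from collections import Counter


def empirical_pmf(samples, n):
    """Empirical probabilities over {0, ..., n-1}; samples assumed in range(n)."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    return [counts.get(i, 0) / len(samples) for i in range(n)]


def project_non_increasing(pmf):
    """Pool-adjacent-violators: least-squares fit by a non-increasing sequence.

    Merging blocks into their average preserves total mass and keeps
    every entry non-negative, so the output is again a distribution.
    """
    blocks = []  # each block is [total_mass, length]
    for p in pmf:
        blocks.append([p, 1])
        # Merge while the previous block's mean is below the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            total, length = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += length
    out = []
    for total, length in blocks:
        out.extend([total / length] * length)
    return out


def make_corrector(noisy_samples, n, rng=None):
    """Learn a monotone hypothesis once, then serve samples drawn from it."""
    rng = rng or random.Random()
    hypothesis = project_non_increasing(empirical_pmf(noisy_samples, n))

    def draw():
        return rng.choices(range(n), weights=hypothesis, k=1)[0]

    return draw, hypothesis


if __name__ == "__main__":
    draw, hyp = make_corrector([0, 0, 1, 3, 3, 2, 0, 1], n=4, rng=random.Random(0))
    print(hyp)
    print([draw() for _ in range(10)])
```

A corrector built this way needs as many samples as the learner does, which is the baseline the abstract asks whether one can beat.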
Chapter 37 Improv, Stand-Up, and Comedy
The idea of improvisation, broadly defined, has been integral to our imagination of the medieval musical past. It can be related to many elements of production: to the act of un-notated creation; to the manipulation and amplification of notated materials; to our observance of rigid rules and formulae; or to spontaneous freedom. Likely a product of the Carolingian Renaissance, this is the first medieval music treatise to address an aspect of chant performance that does not only relate to a memorized repertoire, but includes an unwritten practice of extemporizing an accompanying voice to a pre-given melody. The art of “coloration” or the ornamentation of a line, whether polyphonic or monophonic, had been an integral part of extemporization since at least the time of the Ad organum faciendum treatises. When planning ontological inquiries, authors would do well to remember the possible existence of creativity that is not inspired, or ephemerality that is not performer- or expression-centered.
Property Testing and Probability Distributions: New Techniques, New Models, and New Goals
In order to study the real world, scientists (and computer scientists) develop simplified models that attempt to capture the essential features of the observed system. Understanding the power and limitations of these models, when they apply or fail to fully capture the situation at hand, is therefore of utmost importance.
In this thesis, we investigate the role of some of these models in property testing of probability distributions (distribution testing), as well as in related areas. We introduce natural extensions of the standard model (which only allows access to independent draws from the underlying distribution), in order to circumvent some of its limitations or draw new insights about the problems they aim at capturing. Our results are organized in three main directions:
(i) We provide systematic approaches to tackle distribution testing questions. Specifically, we provide two general algorithmic frameworks that apply to a wide range of properties, and yield efficient and near-optimal results for many of them. We complement these by introducing two methodologies to prove information-theoretic lower bounds in distribution testing, which enable us to derive hardness results in a clean and unified way.
(ii) We introduce and investigate two new models of access to the unknown distributions, which both generalize the standard sampling model in different ways and allow testing algorithms to achieve significantly better efficiency. Our study of the power and limitations of algorithms in these models shows how these could lead to faster algorithms in practical situations, and yields a better understanding of the underlying bottlenecks in the standard sampling setting.
(iii) We then leave the field of distribution testing to explore areas adjacent to property testing. We define a new algorithmic primitive of sampling correction, which in some sense lies in between distribution learning and testing and aims to capture settings where data originates from imperfect or noisy sources. Our work sets out to model these situations in a rigorous and abstracted way, in order to enable the development of systematic methods to address these issues.