
853 research outputs found

Testing Conditional Independence of Discrete Distributions

Author: Acharya J.
Batu T.
Canonne C.
Canonne C. L.
Fisher R. A.
Hardt M.
Mantel N.
Publication date: 01/07/2018

We study the problem of testing \emph{conditional independence} for discrete distributions. Specifically, given samples from a discrete random variable $(X, Y, Z)$ on domain $[\ell_1]\times[\ell_2] \times [n]$, we want to distinguish, with probability at least $2/3$, between the case that $X$ and $Y$ are conditionally independent given $Z$ and the case that $(X, Y, Z)$ is $\epsilon$-far, in $\ell_1$-distance, from every distribution that has this property. Conditional independence is a concept of central importance in probability and statistics with a range of applications in various scientific domains. As such, the statistical task of testing conditional independence has been extensively studied in various forms within the statistics and econometrics communities for nearly a century. Perhaps surprisingly, this problem has not been previously considered in the framework of distribution property testing and in particular no tester with sublinear sample complexity is known, even for the important special case that the domains of $X$ and $Y$ are binary. The main algorithmic result of this work is the first conditional independence tester with \emph{sublinear} sample complexity for discrete distributions over $[\ell_1]\times[\ell_2] \times [n]$. To complement our upper bounds, we prove information-theoretic lower bounds establishing that the sample complexity of our algorithm is optimal, up to constant factors, for a number of settings. Specifically, for the prototypical setting when $\ell_1, \ell_2 = O(1)$, we show that the sample complexity of testing conditional independence (upper bound and matching lower bound) is \[ \Theta\left(\max\left(n^{1/2}/\epsilon^2,\min\left(n^{7/8}/\epsilon,n^{6/7}/\epsilon^{8/7}\right)\right)\right)\,. \]
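To get a feel for the bound stated at the end of this abstract, the following minimal sketch evaluates the expression inside the $\Theta(\cdot)$ for given $n$ and $\epsilon$ and reports which of the three terms dominates. It is only an illustration of the formula, not the tester from the paper: the function name `ci_testing_rate` and the term labels are hypothetical, and the hidden constant factors are ignored, so the returned number is a growth rate rather than an actual sample count.

```python
from typing import Tuple


def ci_testing_rate(n: int, eps: float) -> Tuple[float, str]:
    """Evaluate max(n^(1/2)/eps^2, min(n^(7/8)/eps, n^(6/7)/eps^(8/7))).

    Constant factors hidden by the Theta notation are ignored (assumption),
    so the value is only meaningful up to a multiplicative constant.
    Returns the value and a label for the term that attains it.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")

    # The three terms appearing in the bound.
    sqrt_term = n ** 0.5 / eps ** 2
    linear_term = n ** (7 / 8) / eps
    mixed_term = n ** (6 / 7) / eps ** (8 / 7)

    # Inner minimum of the last two terms.
    if linear_term <= mixed_term:
        inner, label = linear_term, "n^(7/8)/eps"
    else:
        inner, label = mixed_term, "n^(6/7)/eps^(8/7)"

    # Outer maximum against the first term.
    if sqrt_term >= inner:
        return sqrt_term, "n^(1/2)/eps^2"
    return inner, label


if __name__ == "__main__":
    n = 10 ** 8
    for eps in (0.5, 1e-2, 1e-4):
        value, term = ci_testing_rate(n, eps)
        print(f"n={n}, eps={eps}: rate ~ {value:.3g}, dominated by {term}")
```

Comparing the terms pairwise suggests three regimes: $n^{1/2}/\epsilon^2$ dominates when $\epsilon \le n^{-3/8}$, $n^{7/8}/\epsilon$ when $n^{-3/8} \le \epsilon \le n^{-1/8}$, and $n^{6/7}/\epsilon^{8/7}$ when $\epsilon \ge n^{-1/8}$. The three values of `eps` in the usage loop fall into one regime each for $n = 10^8$.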

arXiv.org e-Print Archive

Crossref

Sampling Correctors

Author: Canonne Clément
Gouleakis Themis
Rubinfeld Ronitt
Publication date: 31/03/2018

In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, this paper introduces the concept of a sampling corrector. Such algorithms use structure that the distribution is purported to have, in order to allow one to make "on-the-fly" corrections to samples drawn from probability distributions. These algorithms then act as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms. We show that these connections can be utilized to expand the applicability of known distribution learning and property testing algorithms as well as to achieve improved algorithms for those tasks. As a first step, we show how to design sampling correctors using proper learning algorithms. We then focus on the question of whether algorithms for sampling correctors can be more efficient in terms of sample complexity than learning algorithms for the analogous families of distributions. When correcting monotonicity, we show that this is indeed the case when also granted query access to the cumulative distribution function. We also obtain sampling correctors for monotonicity without this stronger type of access, provided that the distribution be originally very close to monotone (namely, at a distance $O(1/\log^2 n)$). In addition to that, we consider a restricted error model that aims at capturing "missing data" corruptions. In this model, we show that distributions that are close to monotone have sampling correctors that are significantly more efficient than achievable by the learning approach. We also consider the question of whether an additional source of independent random bits is required by sampling correctors to implement the correction process.

arXiv.org e-Print Archive

DSpace@MIT

Chapter 37 Improv, Stand-Up, and Comedy

Author: Canonne Clément
Publication venue: 'Informa UK Limited'
Publication date: 15/12/2021

The idea of improvisation, broadly defined, has been integral to our imagination of the medieval musical past. It can be related to many elements of production: to the act of un-notated creation; to the manipulation and amplification of notated materials; to our observance of rigid rules and formulae; or to spontaneous freedom. Likely a product of the Carolingian Renaissance, this is the first medieval music treatise to address an aspect of chant performance that does not only relate to a memorized repertoire, but includes an unwritten practice of extemporizing an accompanying voice to a pre-given melody. The art of “coloration” or the ornamentation of a line, whether polyphonic or monophonic, had been an integral part of extemporization since at least the time of the Ad organum faciendum treatises. When planning their ontological inquiries, authors would do well to remember the possible existence of creativity that is not inspired, or ephemerality that is not performer- or expression-centered.

Directory of Open Access Books (DOAB)

Study of the putative role of MPV17 in cancer cell proliferation

Author: Canonne Morgane
Publication date: 27/08/2020

Repository of the University of Namur


Property Testing and Probability Distributions: New Techniques, New Models, and New Goals

Author: Canonne Clement Louis
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2017

In order to study the real world, scientists (and computer scientists) develop simplified models that attempt to capture the essential features of the observed system. Understanding the power and limitations of these models, when they apply or fail to fully capture the situation at hand, is therefore of utmost importance. In this thesis, we investigate the role of some of these models in property testing of probability distributions (distribution testing), as well as in related areas. We introduce natural extensions of the standard model (which only allows access to independent draws from the underlying distribution), in order to circumvent some of its limitations or draw new insights about the problems they aim at capturing. Our results are organized in three main directions: (i) We provide systematic approaches to tackle distribution testing questions. Specifically, we provide two general algorithmic frameworks that apply to a wide range of properties, and yield efficient and near-optimal results for many of them. We complement these by introducing two methodologies to prove information-theoretic lower bounds in distribution testing, which enable us to derive hardness results in a clean and unified way. (ii) We introduce and investigate two new models of access to the unknown distributions, which both generalize the standard sampling model in different ways and allow testing algorithms to achieve significantly better efficiency. Our study of the power and limitations of algorithms in these models shows how these could lead to faster algorithms in practical situations, and yields a better understanding of the underlying bottlenecks in the standard sampling setting. (iii) We then leave the field of distribution testing to explore areas adjacent to property testing. We define a new algorithmic primitive of sampling correction, which in some sense lies in between distribution learning and testing and aims to capture settings where data originates from imperfect or noisy sources. Our work sets out to model these situations in a rigorous and abstracted way, in order to enable the development of systematic methods to address these issues.

Columbia University Academic Commons