05291 Abstracts Collection -- Sublinear Algorithms

Author: Czumaj Artur
Muthukrishnan S. Muthu
Rubinfeld Ronitt
Sohler Christian
Publication venue: Dagstuhl Seminar Proceedings. 05291 - Sublinear Algorithms
Publication date: 01/01/2006

From 17.07.05 to 22.07.05, the Dagstuhl Seminar 05291 "Sublinear Algorithms" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Dagstuhl Research Online Publication Server

Optimal testing for properties of distributions

Author: Acharya Jayadev
Daskalakis Konstantinos
Kamath Gautam Chetan
Publication venue: Neural Information Processing Systems Foundation
Publication date: 01/12/2015

Given samples from an unknown discrete distribution p, is it possible to distinguish whether p belongs to some class of distributions C versus p being far from every distribution in C? This fundamental question has received tremendous attention in statistics, focusing primarily on asymptotic analysis, as well as in information theory and theoretical computer science, where the emphasis has been on small sample size and computational complexity. Nevertheless, even for basic properties of discrete distributions such as monotonicity, independence, log-concavity, unimodality, and monotone hazard rate, the optimal sample complexity is unknown. We provide a general approach via which we obtain sample-optimal and computationally efficient testers for all these distribution families. At the core of our approach is an algorithm which solves the following problem: given samples from an unknown distribution p, and a known distribution q, are p and q close in $\chi^2$-distance, or far in total variation distance? The optimality of our testers is established by providing matching lower bounds, up to constant factors. Finally, a necessary building block for our testers and an important byproduct of our work are the first known computationally efficient proper learners for discrete log-concave and monotone hazard rate distributions.
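The two distances named in this abstract can be computed directly when both distributions are given explicitly. The following is a minimal sketch for illustration only, not the paper's tester (which sees only samples from p). The function names `tv_distance` and `chi2_distance` and the example vectors are assumptions; the code uses the standard definitions d_TV(p, q) = (1/2) Σ |p_i - q_i| and χ²(p, q) = Σ (p_i - q_i)² / q_i.

```python
from math import sqrt

def tv_distance(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def chi2_distance(p, q):
    """Chi-squared distance of p from q; requires q_i > 0 for every i."""
    if any(qi <= 0 for qi in q):
        raise ValueError("chi-squared distance needs q_i > 0 everywhere")
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

# Hypothetical example distributions over a 3-element domain.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
tv = tv_distance(p, q)
chi2 = chi2_distance(p, q)
```

By the Cauchy-Schwarz inequality, d_TV(p, q) ≤ (1/2)·sqrt(χ²(p, q)), so a small χ²-distance forces a small total variation distance; the example values satisfy this bound.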

DSpace@MIT

Testing probability distributions underlying aggregated data

Author: A. Blum
C. Dwork
C.L. Canonne
L. Birgé
L. Paninski
M. Parnas
P. Valiant
R. Rubinfeld
S. Chakraborty
S.K. Ma
T. Batu
Publication date: 01/01/2014

In this paper, we analyze and study a hybrid model for testing and learning probability distributions. Here, in addition to samples, the testing algorithm is provided with one of two different types of oracles to the unknown distribution $D$ over $[n]$. More precisely, we define both the dual and cumulative dual access models, in which the algorithm $A$ can both sample from $D$ and, respectively, for any $i\in[n]$: query the probability mass $D(i)$ (query access); or get the total mass of $\{1,\dots,i\}$, i.e. $\sum_{j=1}^i D(j)$ (cumulative access). These two models, by generalizing the previously studied sampling and query oracle models, allow us to bypass the strong lower bounds established for a number of problems in these settings, while capturing several interesting aspects of these problems -- and providing new insight on the limitations of the models. Finally, we show that while the testing algorithms can be in most cases strictly more efficient, some tasks remain hard even with this additional power.
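The three kinds of access described in this abstract can be pictured as methods on one object. The sketch below is a toy simulation under the assumption that the distribution is fully known to the oracle; the class name `DualAccessOracle` and its method names are hypothetical and do not come from the paper.

```python
import random
from itertools import accumulate

class DualAccessOracle:
    """Toy oracle for a distribution D over [n] = {1, ..., n}."""

    def __init__(self, probs, seed=None):
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")
        self.probs = list(probs)
        self.cdf = list(accumulate(self.probs))  # running totals of D
        self.rng = random.Random(seed)

    def sample(self):
        """Sampling access: draw i in [n] with probability D(i)."""
        domain = range(1, len(self.probs) + 1)
        return self.rng.choices(domain, weights=self.probs)[0]

    def query(self, i):
        """Query access: return the probability mass D(i)."""
        return self.probs[i - 1]

    def cumulative(self, i):
        """Cumulative access: return D(1) + ... + D(i)."""
        return self.cdf[i - 1]

# Hypothetical example distribution over [4].
oracle = DualAccessOracle([0.1, 0.2, 0.3, 0.4], seed=0)
draw = oracle.sample()
```

In this picture, a dual-access algorithm may call `sample` and `query`, while a cumulative dual-access algorithm may also call `cumulative`.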

arXiv.org e-Print Archive

CiteSeerX

Crossref

DSpace@MIT

Learning mixtures of structured distributions over discrete domains

Author: Chan Siu-on
Diakonikolas Ilias
Servedio Rocco A.
Sun Xiaorui
Publication date: 02/10/2012

Let $\mathfrak{C}$ be a class of probability distributions over the discrete domain $[n] = \{1,...,n\}$. We show that if $\mathfrak{C}$ satisfies a rather general condition -- essentially, that each distribution in $\mathfrak{C}$ can be well-approximated by a variable-width histogram with few bins -- then there is a highly efficient (both in terms of running time and sample complexity) algorithm that can learn any mixture of $k$ unknown distributions from $\mathfrak{C}$. We analyze several natural types of distributions over $[n]$, including log-concave, monotone hazard rate and unimodal distributions, and show that they have the required structural property of being well-approximated by a histogram with few bins. Applying our general algorithm, we obtain near-optimally efficient algorithms for all these mixture learning problems. Comment: preliminary full version of SODA'13 paper.
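The structural condition in this abstract can be illustrated by "flattening" a distribution over a set of bins: inside each bin, the mass is spread uniformly, which yields a variable-width histogram. The sketch below is only an illustration of that idea. The function `flatten`, the bin boundaries, and the example distribution are assumptions, and the code makes no attempt to choose good bins from samples, which is what a learning algorithm would have to do.

```python
def flatten(p, boundaries):
    """Replace p on each bin [lo, hi) by the bin's average mass.

    p lists the probabilities of 1, ..., n (stored 0-indexed).
    boundaries is an increasing list of 0-indexed cut points that
    starts at 0 and ends at len(p).
    """
    flat = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        avg = sum(p[lo:hi]) / (hi - lo)
        flat.extend([avg] * (hi - lo))
    return flat

def tv_distance(p, q):
    """Total variation distance: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical monotone distribution over [8].
p = [0.40, 0.20, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05]

hist4 = flatten(p, [0, 1, 2, 4, 8])  # four bins of widths 1, 1, 2, 4
hist2 = flatten(p, [0, 2, 8])        # two coarser bins
err4 = tv_distance(p, hist4)
err2 = tv_distance(p, hist2)
```

With the four variable-width bins the histogram reproduces this particular `p` exactly, while the two-bin version incurs a total variation error of 1/6.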

arXiv.org e-Print Archive

CiteSeerX

Crossref

Testing k-Modal Distributions: Optimal Algorithms via Reductions

Author: Daskalakis Konstantinos
Diakonikolas Ilias
Servedio Rocco A.
Valiant Gregory
Valiant Paul
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/10/2012

We give highly efficient algorithms, and almost matching lower bounds, for a range of basic statistical problems that involve testing and estimating the $L_1$ (total variation) distance between two k-modal distributions p and q over the discrete domain {1, …, n}. More precisely, we consider the following four problems, given sample access to an unknown k-modal distribution p. Testing identity to a known or unknown distribution: (1) determine whether p = q (for an explicitly given k-modal distribution q) versus p is ε-far from q; (2) determine whether p = q (where q is available via sample access) versus p is ε-far from q. Estimating $L_1$ distance ("tolerant testing") against a known or unknown distribution: (3) approximate $d_{TV}(p, q)$ to within additive ε where q is an explicitly given k-modal distribution; (4) approximate $d_{TV}(p, q)$ to within additive ε where q is available via sample access. For each of these four problems we give sub-logarithmic sample algorithms, and show that our algorithms have optimal sample complexity up to additive poly(k) and multiplicative polylog log n + polylog k factors. Our algorithms significantly improve the previous results of [BKR04], which were for testing identity of distributions (items (1) and (2) above) in the special cases k = 0 (monotone distributions) and k = 1 (unimodal distributions) and required $O((\log n)^3)$ samples. As our main conceptual contribution, we introduce a new reduction-based approach for distribution-testing problems that lets us obtain all the above results in a unified way. Roughly speaking, this approach enables us to transform various distribution testing problems for k-modal distributions over {1, …, n} to the corresponding distribution testing problems for unrestricted distributions over a much smaller domain {1, …, ℓ} where ℓ = O(k log n). National Science Foundation (U.S.) (CAREER Award CCF-0953960); Alfred P. Sloan Foundation (Fellowship)
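The domain reduction mentioned at the end of this abstract can be pictured as relabeling each sample by the interval that contains it. The sketch below assumes the partition of {1, …, n} into ℓ intervals is already given; how the paper constructs its partition is not shown here. The names `reduce_sample`, `reduced_distribution`, and the example interval starts are hypothetical.

```python
from bisect import bisect_right

def reduce_sample(x, starts):
    """Map x in {1, ..., n} to the index in {1, ..., l} of its interval.

    starts is the sorted list of interval left endpoints, starts[0] == 1.
    """
    return bisect_right(starts, x)

def reduced_distribution(p, starts):
    """Mass that each interval receives under p (p lists D(1), ..., D(n))."""
    masses = [0.0] * len(starts)
    for x, px in enumerate(p, start=1):
        masses[reduce_sample(x, starts) - 1] += px
    return masses

# Hypothetical partition of {1, ..., 10}: {1}, {2,3}, {4,...,7}, {8,9,10}.
starts = [1, 2, 4, 8]
uniform = [0.1] * 10
reduced = reduced_distribution(uniform, starts)
labels = [reduce_sample(x, starts) for x in (1, 3, 4, 7, 8, 10)]
```

A sample drawn from p and passed through `reduce_sample` is distributed according to `reduced_distribution(p, starts)`, so a tester for the small domain can be run on relabeled samples.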

DSpace@MIT

Learning $k$-Modal Distributions via Testing

Author: Daskalakis Constantinos
Diakonikolas Ilias
Servedio Rocco A.
Publication date: 14/09/2014

A $k$-modal probability distribution over the discrete domain $\{1,...,n\}$ is one whose histogram has at most $k$ "peaks" and "valleys." Such distributions are natural generalizations of monotone ($k=0$) and unimodal ($k=1$) probability distributions, which have been intensively studied in probability theory and statistics. In this paper we consider the problem of learning (i.e., performing density estimation of) an unknown $k$-modal distribution with respect to the $L_1$ distance. The learning algorithm is given access to independent samples drawn from an unknown $k$-modal distribution $p$, and it must output a hypothesis distribution $\widehat{p}$ such that with high probability the total variation distance between $p$ and $\widehat{p}$ is at most $\epsilon$. Our main goal is to obtain computationally efficient algorithms for this problem that use (close to) an information-theoretically optimal number of samples. We give an efficient algorithm for this problem that runs in time $\mathrm{poly}(k,\log(n),1/\epsilon)$. For $k \leq \tilde{O}(\log n)$, the number of samples used by our algorithm is very close (within an $\tilde{O}(\log(1/\epsilon))$ factor) to being information-theoretically optimal. Prior to this work computationally efficient algorithms were known only for the cases $k=0,1$ [Birge:87b, Birge:97]. A novel feature of our approach is that our learning algorithm crucially uses a new algorithm for property testing of probability distributions as a key subroutine. The learning algorithm uses the property tester to efficiently decompose the $k$-modal distribution into $k$ (near-)monotone distributions, which are easier to learn. Comment: 28 pages, full version of SODA'12 paper, to appear in Theory of Computing.
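The definition of a $k$-modal distribution at the start of this abstract can be checked directly on an explicitly given probability vector by counting how often the histogram changes direction. The sketch below is one reading of that definition, under which a monotone vector has 0 changes and a unimodal one has 1; the names `count_modes` and `is_k_modal` are hypothetical, and the code is unrelated to the paper's learning algorithm, which only sees samples.

```python
def count_modes(p):
    """Count peaks and valleys: direction changes in the histogram of p."""
    changes = 0
    prev_sign = 0
    for a, b in zip(p, p[1:]):
        diff = b - a
        if diff == 0:
            continue  # flat steps do not create a peak or a valley
        sign = 1 if diff > 0 else -1
        if prev_sign != 0 and sign != prev_sign:
            changes += 1
        prev_sign = sign
    return changes

def is_k_modal(p, k):
    """True if the histogram of p has at most k peaks and valleys."""
    return count_modes(p) <= k

# Hypothetical example vectors.
monotone = [0.4, 0.3, 0.2, 0.1]
unimodal = [0.1, 0.4, 0.3, 0.2]
wavy = [0.1, 0.3, 0.1, 0.3, 0.2]
```

Here `monotone` has no direction change, `unimodal` has one (a single peak), and `wavy` has three (peak, valley, peak).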

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Quantum algorithms for testing properties of distributions

Author: Aram W. Harrow
Avinatan Hassidim
Sergey Bravyi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009

Suppose one has access to oracles generating samples from two unknown probability distributions P and Q on some N-element set. How many samples does one need to test whether the two distributions are close or far from each other in the $L_1$-norm? This and related questions have been extensively studied in recent years in the field of property testing. In the present paper we study quantum algorithms for testing properties of distributions. It is shown that the $L_1$-distance between P and Q can be estimated with a constant precision using approximately $N^{1/2}$ queries in the quantum setting, whereas classical computers need $\Omega(N)$ queries. We also describe quantum algorithms for testing Uniformity and Orthogonality with query complexity $O(N^{1/3})$. The classical query complexity of these problems is known to be $\Omega(N^{1/2})$. Comment: 20 pages.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Dagstuhl Research Online Publication Server