05291 Abstracts Collection -- Sublinear Algorithms
From 17.07.05 to 22.07.05, the Dagstuhl Seminar
05291 "Sublinear Algorithms" was held
in the International Conference and Research Center (IBFI),
Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.
Optimal testing for properties of distributions
Given samples from an unknown discrete distribution p, is it possible to distinguish whether p belongs to some class of distributions C versus p being far from every distribution in C? This fundamental question has received tremendous attention in statistics, focusing primarily on asymptotic analysis, as well as in information theory and theoretical computer science, where the emphasis has been on small sample size and computational complexity. Nevertheless, even for basic properties of discrete distributions such as monotonicity, independence, log-concavity, unimodality, and monotone hazard rate, the optimal sample complexity is unknown. We provide a general approach via which we obtain sample-optimal and computationally efficient testers for all these distribution families. At the core of our approach is an algorithm which solves the following problem: given samples from an unknown distribution p, and a known distribution q, are p and q close in \chi^2-distance, or far in total variation distance? The optimality of our testers is established by providing matching lower bounds, up to constant factors. Finally, a necessary building block for our testers and an important byproduct of our work are the first known computationally efficient proper learners for discrete log-concave and monotone hazard rate distributions.
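As a rough illustration of the core subroutine described above, the following Python sketch computes a chi-squared-style statistic from samples of p against a known q. The exact form of the statistic, the function name `chi2_statistic`, and the assumption that every q_i is positive are illustrative choices; the abstract does not specify them. If the counts N_i were drawn as independent Poisson(m p_i) variables, the expectation of this statistic would be m times the \chi^2-distance between p and q. The sketch takes a fixed list of samples for simplicity.

```python
from collections import Counter

def chi2_statistic(samples, q, m):
    """Compute sum_i ((N_i - m*q_i)^2 - N_i) / (m*q_i).

    samples: draws from the unknown p, encoded as indices into q
    q:       known distribution as a list of probabilities (assumed all > 0)
    m:       (expected) number of samples
    """
    counts = Counter(samples)
    total = 0.0
    for i, q_i in enumerate(q):
        n_i = counts.get(i, 0)  # N_i: number of times element i was seen
        total += ((n_i - m * q_i) ** 2 - n_i) / (m * q_i)
    return total

# Samples that match q closely give a small (possibly negative) value;
# samples concentrated on one element give a larger one.
balanced = chi2_statistic([0, 0, 1, 1], [0.5, 0.5], 4)
skewed = chi2_statistic([0, 0, 0, 0], [0.5, 0.5], 4)
```

A tester built on this idea would compare the statistic against a threshold chosen from the target distance parameter; that threshold is not given in the abstract and is left out here.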
Testing probability distributions underlying aggregated data
In this paper, we analyze and study a hybrid model for testing and learning
probability distributions. Here, in addition to samples, the testing algorithm
is provided with one of two different types of oracles to the unknown
distribution D over [n]. More precisely, we define both the dual and
cumulative dual access models, in which the algorithm can both sample from
D and respectively, for any i in [n],
- query the probability mass D(i) (query access); or
- get the total mass of {1, ..., i}, i.e. \sum_{j=1}^{i} D(j) (cumulative
access).
These two models, by generalizing the previously studied sampling and query
oracle models, allow us to bypass the strong lower bounds established for a
number of problems in these settings, while capturing several interesting
aspects of these problems -- and providing new insight on the limitations of
the models. Finally, we show that while the testing algorithms can be in most
cases strictly more efficient, some tasks remain hard even with this additional
power.
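The two access models described above can be pictured as a single object that supports sampling plus one extra kind of query. The sketch below is a minimal Python mock-up under stated assumptions: the class name `DualOracle`, its method names, and the 1-indexed domain {1, ..., n} are hypothetical and are not taken from the paper.

```python
import bisect
import random
from itertools import accumulate

class DualOracle:
    """Mock oracle for a known-to-the-simulator distribution over {1, ..., n}."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)                  # probs[i-1] is the mass of element i
        self.cdf = list(accumulate(self.probs))   # running totals of the masses
        self.rng = random.Random(seed)

    def sample(self):
        """Sampling access: draw one element according to the distribution."""
        u = self.rng.random() * self.cdf[-1]
        idx = bisect.bisect_right(self.cdf, u)
        return min(idx, len(self.probs) - 1) + 1  # guard against rounding at the top end

    def query(self, i):
        """Query access: probability mass of element i."""
        return self.probs[i - 1]

    def cumulative(self, i):
        """Cumulative access: total mass of {1, ..., i}."""
        return self.cdf[i - 1]

oracle = DualOracle([0.1, 0.2, 0.3, 0.4])
draws = [oracle.sample() for _ in range(1000)]
```

In the dual model an algorithm would be allowed to call `sample` and `query`; in the cumulative dual model it would call `sample` and `cumulative`.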
Learning mixtures of structured distributions over discrete domains
Let C be a class of probability distributions over the discrete
domain {1, ..., n}. We show that if C satisfies a rather
general condition -- essentially, that each distribution in C can
be well-approximated by a variable-width histogram with few bins -- then there
is a highly efficient (both in terms of running time and sample complexity)
algorithm that can learn any mixture of k unknown distributions from C.
We analyze several natural types of distributions over {1, ..., n}, including
log-concave, monotone hazard rate and unimodal distributions, and show that
they have the required structural property of being well-approximated by a
histogram with few bins. Applying our general algorithm, we obtain
near-optimally efficient algorithms for all these mixture learning problems.
Comment: preliminary full version of SODA'13 paper
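To make the structural condition above concrete, the following Python sketch flattens a distribution into a variable-width histogram and measures how much is lost in L_1 distance. The helper names `flatten` and `l1_distance` and the way bin boundaries are supplied are assumptions for illustration only; the abstract does not say how the bins are chosen.

```python
def flatten(p, boundaries):
    """Replace p by its average value on each bin [boundaries[j], boundaries[j+1]).

    p:          list of probabilities (0-indexed domain)
    boundaries: increasing list of bin edges, starting at 0 and ending at len(p)
    """
    flat = [0.0] * len(p)
    for lo, hi in zip(boundaries, boundaries[1:]):
        avg = sum(p[lo:hi]) / (hi - lo)  # bin mass spread evenly over the bin
        for i in range(lo, hi):
            flat[i] = avg
    return flat

def l1_distance(p, q):
    """L_1 distance between two distributions on the same domain."""
    return sum(abs(a - b) for a, b in zip(p, q))

p = [0.4, 0.2, 0.1, 0.1, 0.1, 0.1]
three_bins = flatten(p, [0, 1, 2, 6])  # narrow bins where p changes, one wide bin on the flat tail
one_bin = flatten(p, [0, 6])           # a single bin, i.e. the uniform distribution
```

Here three well-placed bins reproduce p almost exactly, while a single bin does not, which is the sense in which a distribution can be well-approximated by a histogram with few bins of varying width.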
Testing k-Modal Distributions: Optimal Algorithms via Reductions
We give highly efficient algorithms, and almost matching lower bounds, for a range of basic statistical problems that involve testing and estimating the L_1 (total variation) distance between two k-modal distributions p and q over the discrete domain {1, ..., n}. More precisely, we consider the following four problems: given sample access to an unknown k-modal distribution p,
Testing identity to a known or unknown distribution:
1. Determine whether p = q (for an explicitly given k-modal distribution q) versus p is \epsilon-far from q;
2. Determine whether p = q (where q is available via sample access) versus p is \epsilon-far from q;
Estimating L_1 distance ("tolerant testing") against a known or unknown distribution:
3. Approximate d_{TV}(p, q) to within additive \epsilon where q is an explicitly given k-modal distribution;
4. Approximate d_{TV}(p, q) to within additive \epsilon where q is available via sample access.
For each of these four problems we give sub-logarithmic sample algorithms, and show that our algorithms have optimal sample complexity up to additive poly(k) and multiplicative polylog log n + polylog k factors. Our algorithms significantly improve the previous results of [BKR04], which were for testing identity of distributions (items (1) and (2) above) in the special cases k = 0 (monotone distributions) and k = 1 (unimodal distributions) and required O((log n)^3) samples.
As our main conceptual contribution, we introduce a new reduction-based approach for distribution-testing problems that lets us obtain all the above results in a unified way. Roughly speaking, this approach enables us to transform various distribution testing problems for k-modal distributions over {1, ..., n} to the corresponding distribution testing problems for unrestricted distributions over a much smaller domain {1, ..., \ell} where \ell = O(k log n).
National Science Foundation (U.S.) (CAREER Award CCF-0953960)
Alfred P. Sloan Foundation (Fellowship)
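The domain-reduction idea above can be sketched in Python by collapsing each of a set of intervals into a single element of a smaller domain. This is only an illustration under stated assumptions: the function names are hypothetical, and the interval boundaries are taken as given, whereas the paper's method for choosing them is not described in the abstract.

```python
def total_variation(p, q):
    """Total variation distance, i.e. half the L_1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def reduce_to_intervals(p, boundaries):
    """Map a distribution over n elements to one over len(boundaries) - 1 intervals.

    boundaries: increasing list of interval edges, starting at 0 and ending at len(p);
                interval j covers indices boundaries[j] .. boundaries[j+1] - 1.
    """
    return [sum(p[lo:hi]) for lo, hi in zip(boundaries, boundaries[1:])]

p = [0.5, 0.3, 0.1, 0.1]
q = [0.25, 0.25, 0.25, 0.25]
boundaries = [0, 2, 4]  # hypothetical partition into two intervals

p_small = reduce_to_intervals(p, boundaries)
q_small = reduce_to_intervals(q, boundaries)
```

Merging elements can never increase the total variation distance, so a reduction of this kind is only useful when the intervals are chosen so that the distance between the reduced distributions stays close to the original distance.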
Learning k-Modal Distributions via Testing
A k-modal probability distribution over the discrete domain {1, ..., n}
is one whose histogram has at most k "peaks" and "valleys." Such
distributions are natural generalizations of monotone (k = 0) and unimodal
(k = 1) probability distributions, which have been intensively studied in
probability theory and statistics.
In this paper we consider the problem of \emph{learning} (i.e., performing
density estimation of) an unknown k-modal distribution with respect to the
L_1 distance. The learning algorithm is given access to independent samples
drawn from an unknown k-modal distribution p, and it must output a
hypothesis distribution \hat{p} such that with high probability the total
variation distance between p and \hat{p} is at most \epsilon. Our
main goal is to obtain \emph{computationally efficient} algorithms for this
problem that use (close to) an information-theoretically optimal number of
samples.
We give an efficient algorithm for this problem that runs in time
. For , the
number of samples used by our algorithm is very close (within an
factor) to being information-theoretically
optimal. Prior to this work computationally efficient algorithms were known
only for the cases k = 0, 1 \cite{Birge:87b,Birge:97}.
A novel feature of our approach is that our learning algorithm crucially uses
a new algorithm for \emph{property testing of probability distributions} as a
key subroutine. The learning algorithm uses the property tester to efficiently
decompose the k-modal distribution into (near-)monotone distributions,
which are easier to learn.
Comment: 28 pages, full version of SODA'12 paper, to appear in Theory of Computing
Quantum algorithms for testing properties of distributions
Suppose one has access to oracles generating samples from two unknown
probability distributions P and Q on some N-element set. How many samples does
one need to test whether the two distributions are close or far from each other
in the L_1-norm? This and related questions have been extensively studied
in recent years in the field of property testing. In the present paper we
study quantum algorithms for testing properties of distributions. It is shown
that the L_1-distance between P and Q can be estimated with a constant
precision using approximately N^{1/2} queries in the quantum setting, whereas
classical computers need \Omega(N) queries. We also describe quantum algorithms
for testing Uniformity and Orthogonality with query complexity O(N^{1/3}). The
classical query complexity of these problems is known to be \Omega(N^{1/2}).
Comment: 20 pages