116 research outputs found

    05291 Abstracts Collection -- Sublinear Algorithms

    Get PDF
    From 17.07.05 to 22.07.05, the Dagstuhl Seminar 05291 ``Sublinear Algorithms\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

    Optimal testing for properties of distributions

    Get PDF
    Given samples from an unknown discrete distribution p, is it possible to distinguish whether p belongs to some class of distributions C versus p being far from every distribution in C? This fundamental question has received tremendous attention in statistics, focusing primarily on asymptotic analysis, as well as in information theory and theoretical computer science, where the emphasis has been on small sample size and computational complexity. Nevertheless, even for basic properties of discrete distributions such as monotonicity, independence, logconcavity, unimodality, and monotone-hazard rate, the optimal sample complexity is unknown. We provide a general approach via which we obtain sample-optimal and computationally efficient testers for all these distribution families. At the core of our approach is an algorithm which solves the following problem: Given samples from an unknown distribution p, and a known distribution q, are p and q close in x[superscript 2]-distance, or far in total variation distance? The optimality of our testers is established by providing matching lower bounds, up to constant factors. Finally, a necessary building block for our testers and an important byproduct of our work are the first known computationally efficient proper learners for discrete log-concave, monotone hazard rate distributions

    Testing probability distributions underlying aggregated data

    Full text link
    In this paper, we analyze and study a hybrid model for testing and learning probability distributions. Here, in addition to samples, the testing algorithm is provided with one of two different types of oracles to the unknown distribution DD over [n][n]. More precisely, we define both the dual and cumulative dual access models, in which the algorithm AA can both sample from DD and respectively, for any i∈[n]i\in[n], - query the probability mass D(i)D(i) (query access); or - get the total mass of {1,
,i}\{1,\dots,i\}, i.e. ∑j=1iD(j)\sum_{j=1}^i D(j) (cumulative access) These two models, by generalizing the previously studied sampling and query oracle models, allow us to bypass the strong lower bounds established for a number of problems in these settings, while capturing several interesting aspects of these problems -- and providing new insight on the limitations of the models. Finally, we show that while the testing algorithms can be in most cases strictly more efficient, some tasks remain hard even with this additional power

    Learning mixtures of structured distributions over discrete domains

    Full text link
    Let C\mathfrak{C} be a class of probability distributions over the discrete domain [n]={1,...,n}.[n] = \{1,...,n\}. We show that if C\mathfrak{C} satisfies a rather general condition -- essentially, that each distribution in C\mathfrak{C} can be well-approximated by a variable-width histogram with few bins -- then there is a highly efficient (both in terms of running time and sample complexity) algorithm that can learn any mixture of kk unknown distributions from C.\mathfrak{C}. We analyze several natural types of distributions over [n][n], including log-concave, monotone hazard rate and unimodal distributions, and show that they have the required structural property of being well-approximated by a histogram with few bins. Applying our general algorithm, we obtain near-optimally efficient algorithms for all these mixture learning problems.Comment: preliminary full version of soda'13 pape

    Testing k-Modal Distributions: Optimal Algorithms via Reductions

    Get PDF
    We give highly efficient algorithms, and almost matching lower bounds, for a range of basic statistical problems that involve testing and estimating the L[subscript 1] (total variation) distance between two k-modal distributions p and q over the discrete domain {1, 
, n}. More precisely, we consider the following four problems: given sample access to an unknown k-modal distribution p, Testing identity to a known or unknown distribution: 1. Determine whether p = q (for an explicitly given k-modal distribution q) versus p is e-far from q; 2. Determine whether p = q (where q is available via sample access) versus p is Δ-far from q; Estimating L[subscript 1] distance (“tolerant testing”) against a known or unknown distribution: 3. Approximate d[subscript TV](p, q) to within additive Δ where q is an explicitly given k-modal distribution q; 4. Approximate d[subscript TV] (p, q) to within additive Δ where q is available via sample access. For each of these four problems we give sub-logarithmic sample algorithms, and show that our algorithms have optimal sample complexity up to additive poly (k) and multiplicative polylog log n + polylogk factors. Our algorithms significantly improve the previous results of [BKR04], which were for testing identity of distributions (items (1) and (2) above) in the special cases k = 0 (monotone distributions) and k = 1 (unimodal distributions) and required O((log n)[superscript 3]) samples. As our main conceptual contribution, we introduce a new reduction-based approach for distribution-testing problems that lets us obtain all the above results in a unified way. Roughly speaking, this approach enables us to transform various distribution testing problems for k-modal distributions over {1, 
, n} to the corresponding distribution testing problems for unrestricted distributions over a much smaller domain {1, 
, ℓ} where ℓ = O(k log n).National Science Foundation (U.S.) (CAREER Award CCF-0953960)Alfred P. Sloan Foundation (Fellowship

    Learning kk-Modal Distributions via Testing

    Get PDF
    A kk-modal probability distribution over the discrete domain {1,...,n}\{1,...,n\} is one whose histogram has at most kk "peaks" and "valleys." Such distributions are natural generalizations of monotone (k=0k=0) and unimodal (k=1k=1) probability distributions, which have been intensively studied in probability theory and statistics. In this paper we consider the problem of \emph{learning} (i.e., performing density estimation of) an unknown kk-modal distribution with respect to the L1L_1 distance. The learning algorithm is given access to independent samples drawn from an unknown kk-modal distribution pp, and it must output a hypothesis distribution p^\widehat{p} such that with high probability the total variation distance between pp and p^\widehat{p} is at most Ï”.\epsilon. Our main goal is to obtain \emph{computationally efficient} algorithms for this problem that use (close to) an information-theoretically optimal number of samples. We give an efficient algorithm for this problem that runs in time poly(k,log⁥(n),1/Ï”)\mathrm{poly}(k,\log(n),1/\epsilon). For k≀O~(log⁥n)k \leq \tilde{O}(\log n), the number of samples used by our algorithm is very close (within an O~(log⁥(1/Ï”))\tilde{O}(\log(1/\epsilon)) factor) to being information-theoretically optimal. Prior to this work computationally efficient algorithms were known only for the cases k=0,1k=0,1 \cite{Birge:87b,Birge:97}. A novel feature of our approach is that our learning algorithm crucially uses a new algorithm for \emph{property testing of probability distributions} as a key subroutine. The learning algorithm uses the property tester to efficiently decompose the kk-modal distribution into kk (near-)monotone distributions, which are easier to learn.Comment: 28 pages, full version of SODA'12 paper, to appear in Theory of Computin

    Quantum algorithms for testing properties of distributions

    Get PDF
    Suppose one has access to oracles generating samples from two unknown probability distributions P and Q on some N-element set. How many samples does one need to test whether the two distributions are close or far from each other in the L_1-norm ? This and related questions have been extensively studied during the last years in the field of property testing. In the present paper we study quantum algorithms for testing properties of distributions. It is shown that the L_1-distance between P and Q can be estimated with a constant precision using approximately N^{1/2} queries in the quantum settings, whereas classical computers need \Omega(N) queries. We also describe quantum algorithms for testing Uniformity and Orthogonality with query complexity O(N^{1/3}). The classical query complexity of these problems is known to be \Omega(N^{1/2}).Comment: 20 page
    • 

    corecore