12 research outputs found

    Sample-Efficient Learning of Mixtures

    We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let $\mathcal{F}$ be an arbitrary class of probability distributions, and let $\mathcal{F}^k$ denote the class of $k$-mixtures of elements of $\mathcal{F}$. Assuming the existence of a method for learning $\mathcal{F}$ with sample complexity $m_{\mathcal{F}}(\epsilon)$, we provide a method for learning $\mathcal{F}^k$ with sample complexity $O(k \log k \cdot m_{\mathcal{F}}(\epsilon)/\epsilon^{2})$. Our mixture learning algorithm has the property that, if the $\mathcal{F}$-learner is proper/agnostic, then the $\mathcal{F}^k$-learner would be proper/agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $\widetilde{O}(kd/\epsilon^{4})$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $\widetilde{O}(kd^2/\epsilon^{4})$, which improves the previous known bounds of $\widetilde{O}(k^3d^2/\epsilon^{4})$ and $\widetilde{O}(k^4d^4/\epsilon^{2})$ in its dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $\widetilde{O}(d^{(d+5)/2}\epsilon^{-(d+9)/2}k)$ samples. Comment: A bug from the previous version, which appeared in AAAI 2018 proceedings, is fixed. 18 pages.
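
As a worked instance of the general bound, the two Gaussian results can be recovered by substitution. This sketch assumes the standard single-component rates, which the abstract does not state: $m_{\mathcal{F}}(\epsilon) = O(d/\epsilon^2)$ for one axis-aligned Gaussian and $m_{\mathcal{F}}(\epsilon) = O(d^2/\epsilon^2)$ for one general Gaussian in $\mathbb{R}^d$. The $\log k$ factor is absorbed into $\widetilde{O}$.

```latex
\begin{align*}
\text{axis-aligned:}\quad
O\!\left(\frac{k \log k \cdot m_{\mathcal{F}}(\epsilon)}{\epsilon^{2}}\right)
  &= O\!\left(\frac{k \log k}{\epsilon^{2}} \cdot \frac{d}{\epsilon^{2}}\right)
   = \widetilde{O}\!\left(\frac{kd}{\epsilon^{4}}\right), \\
\text{general:}\quad
O\!\left(\frac{k \log k \cdot m_{\mathcal{F}}(\epsilon)}{\epsilon^{2}}\right)
  &= O\!\left(\frac{k \log k}{\epsilon^{2}} \cdot \frac{d^{2}}{\epsilon^{2}}\right)
   = \widetilde{O}\!\left(\frac{kd^{2}}{\epsilon^{4}}\right).
\end{align*}
```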

    Improved quantum data analysis

    We provide more sample-efficient versions of some basic routines in quantum data analysis, along with simpler proofs. Particularly, we give a quantum "Threshold Search" algorithm that requires only $O((\log^2 m)/\epsilon^2)$ samples of a $d$-dimensional state $\rho$. That is, given observables $0 \le A_1, A_2, \ldots, A_m \le 1$ such that $\mathrm{tr}(\rho A_i) \ge 1/2$ for at least one $i$, the algorithm finds $j$ with $\mathrm{tr}(\rho A_j) \ge 1/2-\epsilon$. As a consequence, we obtain a Shadow Tomography algorithm requiring only $\tilde{O}((\log^2 m)(\log d)/\epsilon^4)$ samples, which simultaneously achieves the best known dependence on each parameter $m$, $d$, $\epsilon$. This yields the same sample complexity for quantum Hypothesis Selection among $m$ states; we also give an alternative Hypothesis Selection method using $\tilde{O}((\log^3 m)/\epsilon^2)$ samples.

    Minimum distance histograms with universal performance guarantees

    We present a data-adaptive multivariate histogram estimator of an unknown density f based on n independent samples from it. Such histograms are based on binary trees called regular pavings (RPs). RPs represent a computationally convenient class of simple functions that remain closed under addition and scalar multiplication. Unlike other density estimation methods, including various regularization and Bayesian methods based on the likelihood, the minimum distance estimate (MDE) is guaranteed to be within an $L_1$ distance bound from f for a given n, no matter what the underlying f happens to be, and is thus said to have universal performance guarantees (Devroye and Lugosi, Combinatorial methods in density estimation. Springer, New York, 2001). Using a form of tree..

    Private hypothesis selection

    We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution P and a set of m probability distributions H, the goal is to output, in an ε-differentially private manner, a distribution from H whose total variation distance to P is comparable to that of the best such distribution (which we denote by α). The sample complexity of our basic algorithm is O(log m/α^2 + log m/(αε)), representing a minimal cost for privacy when compared to the non-private algorithm. We can also handle infinite hypothesis classes H by relaxing to (ε, δ)-differential privacy. We apply our hypothesis selection algorithm to give learning algorithms for a number of natural distribution classes, including Gaussians, product distributions, sums of independent random variables, piecewise polynomials, and mixture classes. Our hypothesis selection procedure allows us to generically convert a cover for a class to a learning algorithm, complementing known learning lower bounds which are in terms of the size of the packing number of the class. As the covering and packing numbers are often closely related, for constant α, our algorithms achieve the optimal sample complexity for many classes of interest. Finally, we describe an application to private distribution-free PAC learning. https://arxiv.org/abs/1905.1322
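
For orientation, the non-private baseline that the abstract compares against can be sketched as a minimum distance estimate over Scheffé sets, in the style of Devroye and Lugosi. This is not the paper's private algorithm: it adds no noise and gives no privacy guarantee. It also assumes a finite domain {0, ..., n-1} with each hypothesis given as a probability vector; the name `select_hypothesis` is hypothetical.

```python
import numpy as np

def select_hypothesis(hypotheses, samples):
    """Non-private minimum distance estimate over a finite domain.

    hypotheses: (m, n) array-like; each row is a probability vector (assumed).
    samples:    1-D sequence of ints in {0, ..., n-1} drawn from the unknown P.
    Returns the index of the selected hypothesis.
    """
    H = np.asarray(hypotheses, dtype=float)
    m, n = H.shape
    samples = np.asarray(samples, dtype=int)
    # Empirical distribution of the sample.
    emp = np.bincount(samples, minlength=n) / len(samples)

    # For every ordered pair (i, j), the Scheffe set is {x : h_i(x) > h_j(x)}.
    # Score each hypothesis by its worst disagreement with the empirical
    # mass over all such sets, then pick the smallest score.
    worst = np.zeros(m)
    for i in range(m):
        for j in range(m):
            if i == j:
                continue
            A = H[i] > H[j]                    # boolean mask of the Scheffe set
            emp_mass = emp[A].sum()
            gaps = np.abs(H[:, A].sum(axis=1) - emp_mass)
            worst = np.maximum(worst, gaps)
    return int(np.argmin(worst))

# Usage: a sample whose empirical distribution matches the first hypothesis.
candidates = [[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.25, 0.25, 0.25, 0.25]]
data = [0] * 70 + [1] * 10 + [2] * 10 + [3] * 10
print(select_hypothesis(candidates, data))
```

The double loop costs on the order of m^2 set evaluations, which is acceptable for a small candidate list; a private variant would additionally need a randomized selection step, which is outside this sketch.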