23 research outputs found

    Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians

    We provide an algorithm for properly learning mixtures of two single-dimensional Gaussians without any separability assumptions. Given $\tilde{O}(1/\varepsilon^2)$ samples from an unknown mixture, our algorithm outputs a mixture that is $\varepsilon$-close in total variation distance, in time $\tilde{O}(1/\varepsilon^5)$. Our sample complexity is optimal up to logarithmic factors, and significantly improves upon both Kalai et al., whose algorithm has a prohibitive dependence on $1/\varepsilon$, and Feldman et al., whose algorithm requires bounds on the mixture parameters and depends pseudo-polynomially on these parameters. One of our main contributions is an improved and generalized algorithm for selecting a good candidate distribution from among competing hypotheses. Namely, given a collection of $N$ hypotheses containing at least one candidate that is $\varepsilon$-close to an unknown distribution, our algorithm outputs a candidate which is $O(\varepsilon)$-close to the distribution. The algorithm requires $O(\log N/\varepsilon^2)$ samples from the unknown distribution and $O(N \log N/\varepsilon^2)$ time, which improves previous such results (such as the Scheffé estimator) from a quadratic dependence of the running time on $N$ to quasilinear. Given the wide use of such results for the purpose of hypothesis selection, our improved algorithm implies immediate improvements to any such use. Comment: 31 pages, to appear in COLT 201
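    The hypothesis-selection subroutine described above builds on the classical Scheffé test, which the abstract cites as the quadratic-time baseline. The sketch below is that baseline (the standard pairwise test plus a round-robin tournament), not the paper's quasilinear algorithm; candidate densities are assumed to be given as vectorized callables, and probabilities are approximated by a Riemann sum over a user-supplied grid.

```python
import numpy as np

def scheffe_test(p, q, samples, grid):
    """Classical pairwise Scheffe test between candidate densities p and q.

    p, q: callables returning density values at an array of points.
    samples: i.i.d. draws from the unknown distribution.
    Returns the candidate whose mass on the Scheffe set {x : p(x) > q(x)}
    is closer to the empirical mass of that set.
    """
    A = p(grid) > q(grid)                      # Scheffe set, discretized on the grid
    dx = grid[1] - grid[0]
    mass_p = np.sum(p(grid)[A]) * dx           # approx. probability of A under p
    mass_q = np.sum(q(grid)[A]) * dx           # approx. probability of A under q
    emp = np.mean(p(samples) > q(samples))     # empirical probability of A
    return p if abs(mass_p - emp) <= abs(mass_q - emp) else q

def tournament(hypotheses, samples, grid):
    """Quadratic-time round-robin: each hypothesis collects a 'win' per test it passes.
    The paper's contribution is replacing this O(N^2) schedule with a quasilinear one."""
    wins = [0] * len(hypotheses)
    for i in range(len(hypotheses)):
        for j in range(i + 1, len(hypotheses)):
            winner = scheffe_test(hypotheses[i], hypotheses[j], samples, grid)
            wins[i if winner is hypotheses[i] else j] += 1
    return hypotheses[int(np.argmax(wins))]
```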

    Robust hypothesis testing and distribution estimation in Hellinger distance

    We propose a simple robust hypothesis test that has the same sample complexity as the optimal Neyman-Pearson test up to constants, but is robust to distribution perturbations under Hellinger distance. We discuss the applicability of such a robust test for estimating distributions in Hellinger distance. We empirically demonstrate the power of the test on canonical distributions.
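    As a point of reference for the quantities involved, here is a minimal sketch (not the paper's construction) of the Hellinger distance between discrete distributions and a plug-in two-hypothesis test that compares the empirical distribution to each hypothesis in that distance; the alphabet size `k` is an assumed input.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two distributions given as probability
    vectors over the same finite support."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def plug_in_test(samples, p0, p1, k):
    """Decide between hypotheses p0 and p1 by comparing the empirical
    distribution of the samples (symbols in {0, ..., k-1}) to each
    hypothesis in Hellinger distance; returns 0 or 1."""
    emp = np.bincount(samples, minlength=k) / len(samples)
    return 0 if hellinger(emp, p0) <= hellinger(emp, p1) else 1
```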

    Efficient Robust Proper Learning of Log-concave Distributions

    We study the {\em robust proper learning} of univariate log-concave distributions (over continuous and discrete domains). Given a set of samples drawn from an unknown target distribution, we want to compute a log-concave hypothesis distribution that is as close as possible to the target, in total variation distance. In this work, we give the first computationally efficient algorithm for this learning problem. Our algorithm achieves the information-theoretically optimal sample size (up to a constant factor), runs in polynomial time, and is robust to model misspecification with nearly-optimal error guarantees. Specifically, we give an algorithm that, on input $n = O(1/\epsilon^{5/2})$ samples from an unknown distribution $f$, runs in time $\widetilde{O}(n^{8/5})$, and outputs a log-concave hypothesis $h$ that (with high probability) satisfies $d_{\mathrm{TV}}(h, f) = O(\mathrm{opt}) + \epsilon$, where $\mathrm{opt}$ is the minimum total variation distance between $f$ and the class of log-concave distributions. Our approach to the robust proper learning problem is quite flexible and may be applicable to many other univariate distribution families.
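    For concreteness, the sketch below spells out, for the discrete case, the two ingredients the guarantee is stated in: the log-concavity condition a hypothesis pmf must satisfy, and the total variation distance being minimized. It is a checker under those definitions, not the paper's fitting algorithm.

```python
import numpy as np

def is_log_concave(p, tol=1e-12):
    """Check the discrete log-concavity condition p[i]^2 >= p[i-1] * p[i+1]
    for a pmf `p` given as a vector with contiguous support (no interior zeros)."""
    return all(p[i] ** 2 + tol >= p[i - 1] * p[i + 1] for i in range(1, len(p) - 1))

def total_variation(p, q):
    """Total variation distance between two pmfs on the same finite support."""
    return 0.5 * float(np.sum(np.abs(np.asarray(p) - np.asarray(q))))
```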

    Sparse Solutions to Nonnegative Linear Systems and Applications

    We give an efficient algorithm for finding sparse approximate solutions to linear systems of equations with nonnegative coefficients. Unlike most known results for sparse recovery, we do not require {\em any} assumption on the matrix other than non-negativity. Our algorithm is combinatorial in nature, inspired by techniques for the set cover problem, as well as the multiplicative weight update method. We then present a natural application to learning mixture models in the PAC framework. For learning a mixture of $k$ axis-aligned Gaussians in $d$ dimensions, we give an algorithm that outputs a mixture of $O(k/\epsilon^3)$ Gaussians that is $\epsilon$-close in statistical distance to the true distribution, without any separation assumptions. The time and sample complexity is roughly $O(kd/\epsilon^3)^{d}$. This is polynomial when $d$ is constant -- precisely the regime in which known methods fail to identify the components efficiently. Given that non-negativity is a natural assumption, we believe that our result may find use in other settings in which we wish to approximately explain data using a small number of components from a (large) candidate set. Comment: 22 pages
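    The paper's combinatorial algorithm is not spelled out in the abstract. As an illustrative stand-in (a different but standard technique, not the paper's method), the sketch below uses Frank-Wolfe over a scaled simplex to find sparse nonnegative approximate solutions to $Ax \approx b$: each iteration moves toward a single vertex, so at most one new column of $A$ enters the support per step. The simplex scale is an assumed tuning parameter.

```python
import numpy as np

def frank_wolfe_sparse_nonneg(A, b, iters=50, scale=1.0):
    """Illustrative stand-in (not the paper's algorithm): minimize ||Ax - b||_2^2
    over the scaled simplex {x >= 0, sum(x) = scale} with Frank-Wolfe.
    After t steps the iterate has at most t + 1 nonzero coordinates."""
    n, d = A.shape
    x = np.zeros(d)
    x[0] = scale                             # start at an arbitrary vertex
    for t in range(1, iters + 1):
        grad = 2.0 * A.T @ (A @ x - b)       # gradient of the quadratic loss
        j = int(np.argmin(grad))             # best simplex vertex for the linearized loss
        step = 2.0 / (t + 2.0)               # standard Frank-Wolfe step size
        vertex = np.zeros(d)
        vertex[j] = scale
        x = (1.0 - step) * x + step * vertex
    return x
```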

    Maximum Selection and Sorting with Adversarial Comparators and an Application to Density Estimation

    We study maximum selection and sorting of $n$ numbers using pairwise comparators that output the larger of their two inputs if the inputs are more than a given threshold apart, and output an adversarially-chosen input otherwise. We consider two adversarial models: a non-adaptive adversary that decides on the outcomes in advance based solely on the inputs, and an adaptive adversary that can decide on the outcome of each query depending on previous queries and outcomes. Against the non-adaptive adversary, we derive a maximum-selection algorithm that uses at most $2n$ comparisons in expectation, and a sorting algorithm that uses at most $2n \ln n$ comparisons in expectation. These numbers are within small constant factors of the best possible. Against the adaptive adversary, we propose a maximum-selection algorithm that uses $\Theta(n \log(1/\epsilon))$ comparisons to output a correct answer with probability at least $1-\epsilon$. The existence of this algorithm affirmatively resolves an open problem of Ajtai, Feldman, Hassidim, and Nelson. Our study was motivated by a density-estimation problem where, given samples from an unknown underlying distribution, we would like to find a distribution in a known class of $n$ candidate distributions that is close to the underlying distribution in $\ell_1$ distance. Scheffé's algorithm outputs a distribution at an $\ell_1$ distance at most 9 times the minimum and runs in time $\Theta(n^2 \log n)$. Using maximum selection, we propose an algorithm with the same approximation guarantee but a running time of $\Theta(n \log n)$.
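    The comparator model is easy to state in code. The sketch below illustrates the model only, not the paper's algorithms: a thresholded comparator with a random stand-in for the adversary, and a naive sequential scan whose answer can drift well below the true maximum when near-ties chain together -- the failure mode the paper's randomized $2n$-comparison algorithm is designed to avoid.

```python
import random

def adversarial_compare(a, b, threshold):
    """Comparator from the abstract: returns the larger input when the inputs
    are more than `threshold` apart, otherwise an arbitrary one of the two
    (here chosen at random as a stand-in for the adversary)."""
    if abs(a - b) > threshold:
        return max(a, b)
    return random.choice((a, b))

def naive_max(values, threshold):
    """Baseline sequential scan using n - 1 comparisons. Against an adversarial
    tie-breaker, repeated near-ties can each shave up to `threshold` off the
    running champion, so the output can end up far below the true maximum."""
    champ = values[0]
    for v in values[1:]:
        champ = adversarial_compare(champ, v, threshold)
    return champ
```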

    Properly Learning Poisson Binomial Distributions in Almost Polynomial Time

    We give an algorithm for properly learning Poisson binomial distributions. A Poisson binomial distribution (PBD) of order $n$ is the discrete probability distribution of the sum of $n$ mutually independent Bernoulli random variables. Given $\widetilde{O}(1/\epsilon^2)$ samples from an unknown PBD $\mathbf{p}$, our algorithm runs in time $(1/\epsilon)^{O(\log \log (1/\epsilon))}$, and outputs a hypothesis PBD that is $\epsilon$-close to $\mathbf{p}$ in total variation distance. The previously best known running time for properly learning PBDs was $(1/\epsilon)^{O(\log(1/\epsilon))}$. As one of our main contributions, we provide a novel structural characterization of PBDs. We prove that, for all $\epsilon > 0$, there exists an explicit collection $\mathcal{M}$ of $(1/\epsilon)^{O(\log \log (1/\epsilon))}$ vectors of multiplicities, such that for any PBD $\mathbf{p}$ there exists a PBD $\mathbf{q}$ with $O(\log(1/\epsilon))$ distinct parameters whose multiplicities are given by some element of $\mathcal{M}$, such that $\mathbf{q}$ is $\epsilon$-close to $\mathbf{p}$. Our proof combines tools from Fourier analysis and algebraic geometry. Our approach to the proper learning problem is as follows: starting with an accurate non-proper hypothesis, we fit a PBD to this hypothesis. More specifically, we essentially start with the hypothesis computed by the computationally efficient non-proper learning algorithm in our recent work~\cite{DKS15}. Our aforementioned structural characterization allows us to reduce the corresponding fitting problem to a collection of $(1/\epsilon)^{O(\log \log(1/\epsilon))}$ systems of low-degree polynomial inequalities. We show that each such system can be solved in time $(1/\epsilon)^{O(\log \log(1/\epsilon))}$, which yields the overall running time of our algorithm.
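    Since a PBD of order $n$ is simply the law of a sum of independent Bernoullis, its exact pmf can be computed by iterated convolution, which makes the total variation guarantee concrete. A short sketch:

```python
import numpy as np

def pbd_pmf(probs):
    """Exact pmf of a Poisson binomial distribution: the law of the sum of
    independent Bernoulli(p_i) variables, computed by iterative convolution."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

def pbd_tv_distance(probs_a, probs_b):
    """Total variation distance between two PBDs of (possibly) different orders."""
    a, b = pbd_pmf(probs_a), pbd_pmf(probs_b)
    n = max(len(a), len(b))
    a = np.pad(a, (0, n - len(a)))
    b = np.pad(b, (0, n - len(b)))
    return 0.5 * float(np.sum(np.abs(a - b)))
```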

    Algebraic and Analytic Approaches for Parameter Learning in Mixture Models

    We present two different approaches for parameter learning in several mixture models in one dimension. Our first approach uses complex-analytic methods and applies to Gaussian mixtures with shared variance, binomial mixtures with shared success probability, and Poisson mixtures, among others. An example result is that $\exp(O(N^{1/3}))$ samples suffice to exactly learn a mixture of $k < N$ Poisson distributions, each with integral rate parameters bounded by $N$. Our second approach uses algebraic and combinatorial tools and applies to binomial mixtures with shared trial parameter $N$ and differing success parameters, as well as to mixtures of geometric distributions. Again, as an example, for binomial mixtures with $k$ components and success parameters discretized to resolution $\epsilon$, $O(k^2 (N/\epsilon)^{8/\sqrt{\epsilon}})$ samples suffice to exactly recover the parameters. For some of these distributions, our results represent the first guarantees for parameter estimation. Comment: 22 pages, Accepted at Algorithmic Learning Theory (ALT) 202
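    To illustrate the flavor of one-dimensional parameter recovery from moments (a far simpler calculation than the paper's complex-analytic and algebraic machinery), the sketch below recovers the two rates of an equal-weight Poisson mixture from the first moment and the second factorial moment. The equal-weight assumption is mine, made only to keep the algebra to a quadratic.

```python
import numpy as np

def two_poisson_mixture_moments(samples):
    """Illustrative method of moments (not the paper's approach) for an
    equal-weight mixture 0.5*Poi(l1) + 0.5*Poi(l2). Uses
    E[X] = (l1 + l2)/2 and E[X(X-1)] = (l1^2 + l2^2)/2, then recovers
    l1, l2 as the roots of t^2 - (l1 + l2) t + l1*l2."""
    x = np.asarray(samples, dtype=float)
    m1 = x.mean()                       # estimates (l1 + l2) / 2
    f2 = (x * (x - 1.0)).mean()         # estimates (l1^2 + l2^2) / 2
    s = 2.0 * m1                        # l1 + l2
    prod = (s ** 2 - 2.0 * f2) / 2.0    # l1 * l2
    disc = max(s ** 2 - 4.0 * prod, 0.0)  # guard against sampling noise
    r = np.sqrt(disc)
    return (s - r) / 2.0, (s + r) / 2.0
```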

    Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication

    We study the problem of estimating $k$-ary distributions under $\varepsilon$-local differential privacy. $n$ samples are distributed across users who send privatized versions of their sample to a central server. All previously known sample-optimal algorithms require linear (in $k$) communication from each user in the high privacy regime ($\varepsilon = O(1)$), and run in time that grows as $n \cdot k$, which can be prohibitive for large domain size $k$. We propose Hadamard Response (HR), a local privatization scheme that requires no shared randomness and is symmetric with respect to the users. Our scheme has order-optimal sample complexity for all $\varepsilon$, a communication of at most $\log k + 2$ bits per user, and nearly linear running time of $\tilde{O}(n + k)$. Our encoding and decoding are based on Hadamard matrices and are simple to implement. The statistical performance relies on the coding-theoretic properties of Hadamard matrices, i.e., the large Hamming distance between the rows. An efficient implementation of the algorithm using the fast Walsh-Hadamard transform gives the computational gains. We compare our approach with Randomized Response (RR), RAPPOR, and subset-selection mechanisms (SS), both theoretically and experimentally. For $k = 10000$, our algorithm runs about 100x faster than SS and RAPPOR.
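    A simplified encoder/decoder in this spirit (close to the high-privacy regime scheme, but not the paper's full block construction, and using a direct O(nk) decoder rather than the fast Walsh-Hadamard transform) can be written as below. Here K is assumed to be the smallest power of two larger than k, so that each symbol x can be assigned the non-constant Hadamard row x+1; the unbiasedness formula in the decoder follows from the rows' Hamming-distance structure.

```python
import numpy as np

def hadamard(K):
    """Sylvester Hadamard matrix of order K (K must be a power of two)."""
    H = np.array([[1]])
    while H.shape[0] < K:
        H = np.block([[H, H], [H, -H]])
    return H

def hr_encode(x, H, eps, rng):
    """Privatize symbol x: with prob e^eps/(1+e^eps) report a uniform column
    from C_x = {j : H[x+1, j] = +1}, otherwise a uniform column from its complement.
    The output ratio between any two inputs is at most e^eps (eps-LDP)."""
    plus = np.flatnonzero(H[x + 1] == 1)
    minus = np.flatnonzero(H[x + 1] == -1)
    keep = rng.random() < np.exp(eps) / (1.0 + np.exp(eps))
    return int(rng.choice(plus if keep else minus))

def hr_decode(reports, H, k, eps):
    """Unbiased estimate of the k-ary distribution from privatized reports, using
    E[fraction of reports in C_x] = 1/2 + p(x) * (e^eps - 1) / (2 (e^eps + 1)).
    Entries may be slightly negative from noise; project to the simplex if needed."""
    reports = np.asarray(reports)
    scale = 2.0 * (np.exp(eps) + 1.0) / (np.exp(eps) - 1.0)
    est = np.empty(k)
    for x in range(k):
        q = np.mean(H[x + 1, reports] == 1)
        est[x] = (q - 0.5) * scale
    return est
```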

    Splintering with distributions: A stochastic decoy scheme for private computation

    Performing computations while maintaining privacy is an important problem in today's distributed machine learning solutions. Consider the following two setups between a client and a server. In setup i), the client has a public data vector $\mathbf{x}$, the server has a large private database of data vectors $\mathcal{B}$, and the client wants to find the inner products $\langle \mathbf{x}, \mathbf{y}_k \rangle$ for all $\mathbf{y}_k \in \mathcal{B}$. The client does not want the server to learn $\mathbf{x}$, while the server does not want the client to learn the records in its database. This is in contrast to setup ii), where the client would like to perform an operation solely on its own data, such as computing the inverse of its data matrix $\mathbf{M}$, but would like to use the superior computing ability of the server to do so without leaking $\mathbf{M}$ to the server. We present a stochastic scheme for splitting the client data into privatized shares that are transmitted to the server in such settings. The server performs the requested operations on these shares instead of on the raw client data. The obtained intermediate results are sent back to the client, where they are assembled to obtain the final result. Comment: 28 pages, 6 figures
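    The split-compute-assemble pattern for a linear query such as the batch of inner products can be sketched as below. This shows only the reconstruction arithmetic and is not the paper's splintering scheme: additive shares alone do not hide $\mathbf{x}$ from a server that receives all of them, which is precisely what the stochastic decoys are for.

```python
import numpy as np

def split_into_shares(x, num_shares, rng):
    """Split vector x into additive shares that sum back to x. On its own this
    is NOT private against a server holding every share -- the paper's scheme
    adds stochastic decoys; this only illustrates the reconstruction step."""
    shares = [rng.standard_normal(x.shape) for _ in range(num_shares - 1)]
    shares.append(x - sum(shares))
    return shares

def server_inner_products(shares, database):
    """Server side: inner product of every share with every database vector y_k."""
    return [np.array([share @ y for y in database]) for share in shares]

def client_assemble(partial_results):
    """Client side: because the query is linear, summing the per-share results
    recovers <x, y_k> for every y_k in the database."""
    return sum(partial_results)
```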

    A Nearly Optimal and Agnostic Algorithm for Properly Learning a Mixture of k Gaussians, for any Constant k

    Learning a Gaussian mixture model (GMM) is a fundamental problem in machine learning, learning theory, and statistics. One notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the density $f$ of the unknown distribution from which we draw samples. The distance between $\mathcal{M}$ and $f$ is typically measured in the total variation or $L_1$-norm. We give an algorithm for learning a mixture of $k$ univariate Gaussians that is nearly optimal for any fixed $k$. The sample complexity of our algorithm is $\tilde{O}(\frac{k}{\epsilon^2})$ and the running time is $(k \cdot \log\frac{1}{\epsilon})^{O(k^4)} + \tilde{O}(\frac{k}{\epsilon^2})$. It is well known that this sample complexity is optimal (up to logarithmic factors), and it was already achieved by prior work. However, the best known time complexity for properly learning a $k$-GMM was $\tilde{O}(\frac{1}{\epsilon^{3k-1}})$. In particular, the dependence between $\frac{1}{\epsilon}$ and $k$ was exponential. We significantly improve this dependence by replacing the $\frac{1}{\epsilon}$ term with a $\log\frac{1}{\epsilon}$ while only increasing the exponent moderately. Hence, for any fixed $k$, the $\tilde{O}(\frac{k}{\epsilon^2})$ term dominates our running time, and thus our algorithm runs in time which is nearly linear in the number of samples drawn. Achieving a running time of $\mathrm{poly}(k, \frac{1}{\epsilon})$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question. Moreover, our approach offers an agnostic learning guarantee: our algorithm returns a good GMM even if the distribution we are sampling from is not a mixture of Gaussians. To the best of our knowledge, our algorithm is the first agnostic proper learning algorithm for GMMs.
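    Since the guarantee is stated in total variation ($L_1$) distance between mixture densities, a quick numerical check of that distance for univariate GMMs is often handy. A minimal sketch using a Riemann sum on a bounded window (the window endpoints are an assumption of the sketch, to be chosen wide enough to cover nearly all of both mixtures):

```python
import numpy as np
from scipy.stats import norm

def gmm_pdf(x, weights, means, stds):
    """Density of a univariate Gaussian mixture evaluated at the points x."""
    x = np.atleast_1d(x)
    return sum(w * norm.pdf(x, mu, sd) for w, mu, sd in zip(weights, means, stds))

def gmm_tv_distance(gmm_a, gmm_b, lo, hi, points=200_000):
    """Total variation distance 0.5 * integral |f - g| between two univariate
    mixtures (each given as a (weights, means, stds) tuple), approximated by a
    Riemann sum on [lo, hi]."""
    grid = np.linspace(lo, hi, points)
    f = gmm_pdf(grid, *gmm_a)
    g = gmm_pdf(grid, *gmm_b)
    return 0.5 * float(np.sum(np.abs(f - g)) * (grid[1] - grid[0]))
```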