23 research outputs found

    Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians

    We provide an algorithm for properly learning mixtures of two single-dimensional Gaussians without any separability assumptions. Given $\tilde{O}(1/\varepsilon^2)$ samples from an unknown mixture, our algorithm outputs a mixture that is $\varepsilon$-close in total variation distance, in time $\tilde{O}(1/\varepsilon^5)$. Our sample complexity is optimal up to logarithmic factors, and significantly improves upon both Kalai et al., whose algorithm has a prohibitive dependence on $1/\varepsilon$, and Feldman et al., whose algorithm requires bounds on the mixture parameters and depends pseudo-polynomially on these parameters. One of our main contributions is an improved and generalized algorithm for selecting a good candidate distribution from among competing hypotheses. Namely, given a collection of $N$ hypotheses containing at least one candidate that is $\varepsilon$-close to an unknown distribution, our algorithm outputs a candidate which is $O(\varepsilon)$-close to the distribution. The algorithm requires $O(\log N/\varepsilon^2)$ samples from the unknown distribution and $O(N \log N/\varepsilon^2)$ time, which improves previous such results (such as the Scheffé estimator) from a quadratic dependence of the running time on $N$ to quasilinear. Given the wide use of such results for the purpose of hypothesis selection, our improved algorithm implies immediate improvements to any such use. Comment: 31 pages, to appear in COLT 201
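    The hypothesis-selection subroutine described above builds on the classical Scheffé test, which the abstract cites as the quadratic-time baseline. The sketch below is that baseline (the standard pairwise test plus a round-robin tournament), not the paper's quasilinear algorithm; candidate densities are assumed to be given as vectorized callables, and probabilities are approximated by a Riemann sum over a user-supplied grid.

```python
import numpy as np

def scheffe_test(p, q, samples, grid):
    """Classical pairwise Scheffe test between candidate densities p and q.

    p, q: callables returning density values at an array of points.
    samples: i.i.d. draws from the unknown distribution.
    Returns the candidate whose mass on the Scheffe set {x : p(x) > q(x)}
    is closer to the empirical mass of that set.
    """
    A = p(grid) > q(grid)                      # Scheffe set, discretized on the grid
    dx = grid[1] - grid[0]
    mass_p = np.sum(p(grid)[A]) * dx           # approx. probability of A under p
    mass_q = np.sum(q(grid)[A]) * dx           # approx. probability of A under q
    emp = np.mean(p(samples) > q(samples))     # empirical probability of A
    return p if abs(mass_p - emp) <= abs(mass_q - emp) else q

def tournament(hypotheses, samples, grid):
    """Quadratic-time round-robin: each hypothesis collects a 'win' per test it passes.
    The paper's contribution is replacing this O(N^2) schedule with a quasilinear one."""
    wins = [0] * len(hypotheses)
    for i in range(len(hypotheses)):
        for j in range(i + 1, len(hypotheses)):
            winner = scheffe_test(hypotheses[i], hypotheses[j], samples, grid)
            wins[i if winner is hypotheses[i] else j] += 1
    return hypotheses[int(np.argmax(wins))]
```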

    Robust hypothesis testing and distribution estimation in Hellinger distance

    We propose a simple robust hypothesis test that has the same sample complexity as the optimal Neyman-Pearson test up to constants, but is robust to distribution perturbations under Hellinger distance. We discuss the applicability of such a robust test for estimating distributions in Hellinger distance. We empirically demonstrate the power of the test on canonical distributions.
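    As a point of reference for the quantities involved, here is a minimal sketch (not the paper's construction) of the Hellinger distance between discrete distributions and a plug-in two-hypothesis test that compares the empirical distribution to each hypothesis in that distance; the alphabet size `k` is an assumed input.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two distributions given as probability
    vectors over the same finite support."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def plug_in_test(samples, p0, p1, k):
    """Decide between hypotheses p0 and p1 by comparing the empirical
    distribution of the samples (symbols in {0, ..., k-1}) to each
    hypothesis in Hellinger distance; returns 0 or 1."""
    emp = np.bincount(samples, minlength=k) / len(samples)
    return 0 if hellinger(emp, p0) <= hellinger(emp, p1) else 1
```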

    Efficient Robust Proper Learning of Log-concave Distributions

    We study the {\em robust proper learning} of univariate log-concave distributions (over continuous and discrete domains). Given a set of samples drawn from an unknown target distribution, we want to compute a log-concave hypothesis distribution that is as close as possible to the target, in total variation distance. In this work, we give the first computationally efficient algorithm for this learning problem. Our algorithm achieves the information-theoretically optimal sample size (up to a constant factor), runs in polynomial time, and is robust to model misspecification with nearly-optimal error guarantees. Specifically, we give an algorithm that, on input $n = O(1/\epsilon^{5/2})$ samples from an unknown distribution $f$, runs in time $\widetilde{O}(n^{8/5})$, and outputs a log-concave hypothesis $h$ that (with high probability) satisfies $d_{\mathrm{TV}}(h, f) = O(\mathrm{opt}) + \epsilon$, where $\mathrm{opt}$ is the minimum total variation distance between $f$ and the class of log-concave distributions. Our approach to the robust proper learning problem is quite flexible and may be applicable to many other univariate distribution families.
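    For concreteness, the sketch below spells out, for the discrete case, the two ingredients the guarantee is stated in: the log-concavity condition a hypothesis pmf must satisfy, and the total variation distance being minimized. It is a checker under those definitions, not the paper's fitting algorithm.

```python
import numpy as np

def is_log_concave(p, tol=1e-12):
    """Check the discrete log-concavity condition p[i]^2 >= p[i-1] * p[i+1]
    for a pmf `p` given as a vector with contiguous support (no interior zeros)."""
    return all(p[i] ** 2 + tol >= p[i - 1] * p[i + 1] for i in range(1, len(p) - 1))

def total_variation(p, q):
    """Total variation distance between two pmfs on the same finite support."""
    return 0.5 * float(np.sum(np.abs(np.asarray(p) - np.asarray(q))))
```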

    Sparse Solutions to Nonnegative Linear Systems and Applications

    We give an efficient algorithm for finding sparse approximate solutions to linear systems of equations with nonnegative coefficients. Unlike most known results for sparse recovery, we do not require {\em any} assumption on the matrix other than non-negativity. Our algorithm is combinatorial in nature, inspired by techniques for the set cover problem, as well as the multiplicative weight update method. We then present a natural application to learning mixture models in the PAC framework. For learning a mixture of $k$ axis-aligned Gaussians in $d$ dimensions, we give an algorithm that outputs a mixture of $O(k/\epsilon^3)$ Gaussians that is $\epsilon$-close in statistical distance to the true distribution, without any separation assumptions. The time and sample complexity is roughly $O(kd/\epsilon^3)^{d}$. This is polynomial when $d$ is constant -- precisely the regime in which known methods fail to identify the components efficiently. Given that non-negativity is a natural assumption, we believe that our result may find use in other settings in which we wish to approximately explain data using a small number of components from a (large) candidate set. Comment: 22 pages
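    The paper's combinatorial algorithm is not spelled out in the abstract. As an illustrative stand-in (a different but standard technique, not the paper's method), the sketch below uses Frank-Wolfe over a scaled simplex to find sparse nonnegative approximate solutions to $Ax \approx b$: each iteration moves toward a single vertex, so at most one new column of $A$ enters the support per step. The simplex scale is an assumed tuning parameter.

```python
import numpy as np

def frank_wolfe_sparse_nonneg(A, b, iters=50, scale=1.0):
    """Illustrative stand-in (not the paper's algorithm): minimize ||Ax - b||_2^2
    over the scaled simplex {x >= 0, sum(x) = scale} with Frank-Wolfe.
    After t steps the iterate has at most t + 1 nonzero coordinates."""
    n, d = A.shape
    x = np.zeros(d)
    x[0] = scale                             # start at an arbitrary vertex
    for t in range(1, iters + 1):
        grad = 2.0 * A.T @ (A @ x - b)       # gradient of the quadratic loss
        j = int(np.argmin(grad))             # best simplex vertex for the linearized loss
        step = 2.0 / (t + 2.0)               # standard Frank-Wolfe step size
        vertex = np.zeros(d)
        vertex[j] = scale
        x = (1.0 - step) * x + step * vertex
    return x
```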

    Maximum Selection and Sorting with Adversarial Comparators and an Application to Density Estimation

    We study maximum selection and sorting of $n$ numbers using pairwise comparators that output the larger of their two inputs if the inputs are more than a given threshold apart, and output an adversarially-chosen input otherwise. We consider two adversarial models: a non-adaptive adversary that decides on the outcomes in advance based solely on the inputs, and an adaptive adversary that can decide on the outcome of each query depending on previous queries and outcomes. Against the non-adaptive adversary, we derive a maximum-selection algorithm that uses at most $2n$ comparisons in expectation, and a sorting algorithm that uses at most $2n \ln n$ comparisons in expectation. These numbers are within small constant factors of the best possible. Against the adaptive adversary, we propose a maximum-selection algorithm that uses $\Theta(n \log(1/\epsilon))$ comparisons to output a correct answer with probability at least $1-\epsilon$. The existence of this algorithm affirmatively resolves an open problem of Ajtai, Feldman, Hassidim, and Nelson. Our study was motivated by a density-estimation problem where, given samples from an unknown underlying distribution, we would like to find a distribution in a known class of $n$ candidate distributions that is close to the underlying distribution in $\ell_1$ distance. Scheffé's algorithm outputs a distribution at an $\ell_1$ distance at most 9 times the minimum and runs in time $\Theta(n^2 \log n)$. Using maximum selection, we propose an algorithm with the same approximation guarantee but a running time of $\Theta(n \log n)$.
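    The comparator model is easy to state in code. The sketch below illustrates the model only, not the paper's algorithms: a thresholded comparator with a random stand-in for the adversary, and a naive sequential scan whose answer can drift well below the true maximum when near-ties chain together -- the failure mode the paper's randomized $2n$-comparison algorithm is designed to avoid.

```python
import random

def adversarial_compare(a, b, threshold):
    """Comparator from the abstract: returns the larger input when the inputs
    are more than `threshold` apart, otherwise an arbitrary one of the two
    (here chosen at random as a stand-in for the adversary)."""
    if abs(a - b) > threshold:
        return max(a, b)
    return random.choice((a, b))

def naive_max(values, threshold):
    """Baseline sequential scan using n - 1 comparisons. Against an adversarial
    tie-breaker, repeated near-ties can each shave up to `threshold` off the
    running champion, so the output can end up far below the true maximum."""
    champ = values[0]
    for v in values[1:]:
        champ = adversarial_compare(champ, v, threshold)
    return champ
```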

    Properly Learning Poisson Binomial Distributions in Almost Polynomial Time

    We give an algorithm for properly learning Poisson binomial distributions. A Poisson binomial distribution (PBD) of order $n$ is the discrete probability distribution of the sum of $n$ mutually independent Bernoulli random variables. Given $\widetilde{O}(1/\epsilon^2)$ samples from an unknown PBD $\mathbf{p}$, our algorithm runs in time $(1/\epsilon)^{O(\log \log (1/\epsilon))}$, and outputs a hypothesis PBD that is $\epsilon$-close to $\mathbf{p}$ in total variation distance. The previously best known running time for properly learning PBDs was $(1/\epsilon)^{O(\log(1/\epsilon))}$. As one of our main contributions, we provide a novel structural characterization of PBDs. We prove that, for all $\epsilon > 0$, there exists an explicit collection $\mathcal{M}$ of $(1/\epsilon)^{O(\log \log (1/\epsilon))}$ vectors of multiplicities, such that for any PBD $\mathbf{p}$ there exists a PBD $\mathbf{q}$ with $O(\log(1/\epsilon))$ distinct parameters whose multiplicities are given by some element of $\mathcal{M}$, such that $\mathbf{q}$ is $\epsilon$-close to $\mathbf{p}$. Our proof combines tools from Fourier analysis and algebraic geometry. Our approach to the proper learning problem is as follows: starting with an accurate non-proper hypothesis, we fit a PBD to this hypothesis. More specifically, we essentially start with the hypothesis computed by the computationally efficient non-proper learning algorithm in our recent work~\cite{DKS15}. Our aforementioned structural characterization allows us to reduce the corresponding fitting problem to a collection of $(1/\epsilon)^{O(\log \log(1/\epsilon))}$ systems of low-degree polynomial inequalities. We show that each such system can be solved in time $(1/\epsilon)^{O(\log \log(1/\epsilon))}$, which yields the overall running time of our algorithm.
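    Since a PBD of order $n$ is simply the law of a sum of independent Bernoullis, its exact pmf can be computed by iterated convolution, which makes the total variation guarantee concrete. A short sketch:

```python
import numpy as np

def pbd_pmf(probs):
    """Exact pmf of a Poisson binomial distribution: the law of the sum of
    independent Bernoulli(p_i) variables, computed by iterative convolution."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

def pbd_tv_distance(probs_a, probs_b):
    """Total variation distance between two PBDs of (possibly) different orders."""
    a, b = pbd_pmf(probs_a), pbd_pmf(probs_b)
    n = max(len(a), len(b))
    a = np.pad(a, (0, n - len(a)))
    b = np.pad(b, (0, n - len(b)))
    return 0.5 * float(np.sum(np.abs(a - b)))
```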

    Algebraic and Analytic Approaches for Parameter Learning in Mixture Models

    We present two different approaches for parameter learning in several mixture models in one dimension. Our first approach uses complex-analytic methods and applies to Gaussian mixtures with shared variance, binomial mixtures with shared success probability, and Poisson mixtures, among others. An example result is that $\exp(O(N^{1/3}))$ samples suffice to exactly learn a mixture of $k < N$ Poisson distributions, each with integral rate parameters bounded by $N$. Our second approach uses algebraic and combinatorial tools and applies to binomial mixtures with shared trial parameter $N$ and differing success parameters, as well as to mixtures of geometric distributions. Again, as an example, for binomial mixtures with $k$ components and success parameters discretized to resolution $\epsilon$, $O(k^2 (N/\epsilon)^{8/\sqrt{\epsilon}})$ samples suffice to exactly recover the parameters. For some of these distributions, our results represent the first guarantees for parameter estimation. Comment: 22 pages, Accepted at Algorithmic Learning Theory (ALT) 202
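    To illustrate the flavor of one-dimensional parameter recovery from moments (a far simpler calculation than the paper's complex-analytic and algebraic machinery), the sketch below recovers the two rates of an equal-weight Poisson mixture from the first moment and the second factorial moment. The equal-weight assumption is mine, made only to keep the algebra to a quadratic.

```python
import numpy as np

def two_poisson_mixture_moments(samples):
    """Illustrative method of moments (not the paper's approach) for an
    equal-weight mixture 0.5*Poi(l1) + 0.5*Poi(l2). Uses
    E[X] = (l1 + l2)/2 and E[X(X-1)] = (l1^2 + l2^2)/2, then recovers
    l1, l2 as the roots of t^2 - (l1 + l2) t + l1*l2."""
    x = np.asarray(samples, dtype=float)
    m1 = x.mean()                       # estimates (l1 + l2) / 2
    f2 = (x * (x - 1.0)).mean()         # estimates (l1^2 + l2^2) / 2
    s = 2.0 * m1                        # l1 + l2
    prod = (s ** 2 - 2.0 * f2) / 2.0    # l1 * l2
    disc = max(s ** 2 - 4.0 * prod, 0.0)  # guard against sampling noise
    r = np.sqrt(disc)
    return (s - r) / 2.0, (s + r) / 2.0
```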

    Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication

    We study the problem of estimating $k$-ary distributions under $\varepsilon$-local differential privacy. $n$ samples are distributed across users who send privatized versions of their sample to a central server. All previously known sample-optimal algorithms require linear (in $k$) communication from each user in the high privacy regime ($\varepsilon = O(1)$), and run in time that grows as $n \cdot k$, which can be prohibitive for large domain size $k$. We propose Hadamard Response (HR), a local privatization scheme that requires no shared randomness and is symmetric with respect to the users. Our scheme has order-optimal sample complexity for all $\varepsilon$, a communication of at most $\log k + 2$ bits per user, and nearly linear running time of $\tilde{O}(n + k)$. Our encoding and decoding are based on Hadamard matrices and are simple to implement. The statistical performance relies on the coding-theoretic properties of Hadamard matrices, i.e., the large Hamming distance between the rows. An efficient implementation of the algorithm using the fast Walsh-Hadamard transform gives the computational gains. We compare our approach with Randomized Response (RR), RAPPOR, and subset-selection mechanisms (SS), both theoretically and experimentally. For $k = 10000$, our algorithm runs about 100x faster than SS and RAPPOR.
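    A simplified encoder/decoder in this spirit (close to the high-privacy regime scheme, but not the paper's full block construction, and using a direct O(nk) decoder rather than the fast Walsh-Hadamard transform) can be written as below. Here K is assumed to be the smallest power of two larger than k, so that each symbol x can be assigned the non-constant Hadamard row x+1; the unbiasedness formula in the decoder follows from the rows' Hamming-distance structure.

```python
import numpy as np

def hadamard(K):
    """Sylvester Hadamard matrix of order K (K must be a power of two)."""
    H = np.array([[1]])
    while H.shape[0] < K:
        H = np.block([[H, H], [H, -H]])
    return H

def hr_encode(x, H, eps, rng):
    """Privatize symbol x: with prob e^eps/(1+e^eps) report a uniform column
    from C_x = {j : H[x+1, j] = +1}, otherwise a uniform column from its complement.
    The output ratio between any two inputs is at most e^eps (eps-LDP)."""
    plus = np.flatnonzero(H[x + 1] == 1)
    minus = np.flatnonzero(H[x + 1] == -1)
    keep = rng.random() < np.exp(eps) / (1.0 + np.exp(eps))
    return int(rng.choice(plus if keep else minus))

def hr_decode(reports, H, k, eps):
    """Unbiased estimate of the k-ary distribution from privatized reports, using
    E[fraction of reports in C_x] = 1/2 + p(x) * (e^eps - 1) / (2 (e^eps + 1)).
    Entries may be slightly negative from noise; project to the simplex if needed."""
    reports = np.asarray(reports)
    scale = 2.0 * (np.exp(eps) + 1.0) / (np.exp(eps) - 1.0)
    est = np.empty(k)
    for x in range(k):
        q = np.mean(H[x + 1, reports] == 1)
        est[x] = (q - 0.5) * scale
    return est
```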

    Splintering with distributions: A stochastic decoy scheme for private computation

    Performing computations while maintaining privacy is an important problem in today's distributed machine learning solutions. Consider the following two setups between a client and a server. In setup i), the client has a public data vector $\mathbf{x}$, the server has a large private database of data vectors $\mathcal{B}$, and the client wants to find the inner products $\langle \mathbf{x}, \mathbf{y}_k \rangle$ for all $\mathbf{y}_k \in \mathcal{B}$. The client does not want the server to learn $\mathbf{x}$, while the server does not want the client to learn the records in its database. This is in contrast to setup ii), where the client would like to perform an operation solely on its own data, such as computing the inverse of its data matrix $\mathbf{M}$, but would like to use the superior computing ability of the server to do so without leaking $\mathbf{M}$ to the server. We present a stochastic scheme for splitting the client data into privatized shares that are transmitted to the server in such settings. The server performs the requested operations on these shares instead of on the raw client data. The obtained intermediate results are sent back to the client, where they are assembled to obtain the final result. Comment: 28 pages, 6 figures
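    The split-compute-assemble pattern for a linear query such as the batch of inner products can be sketched as below. This shows only the reconstruction arithmetic and is not the paper's splintering scheme: additive shares alone do not hide $\mathbf{x}$ from a server that receives all of them, which is precisely what the stochastic decoys are for.

```python
import numpy as np

def split_into_shares(x, num_shares, rng):
    """Split vector x into additive shares that sum back to x. On its own this
    is NOT private against a server holding every share -- the paper's scheme
    adds stochastic decoys; this only illustrates the reconstruction step."""
    shares = [rng.standard_normal(x.shape) for _ in range(num_shares - 1)]
    shares.append(x - sum(shares))
    return shares

def server_inner_products(shares, database):
    """Server side: inner product of every share with every database vector y_k."""
    return [np.array([share @ y for y in database]) for share in shares]

def client_assemble(partial_results):
    """Client side: because the query is linear, summing the per-share results
    recovers <x, y_k> for every y_k in the database."""
    return sum(partial_results)
```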

    A Nearly Optimal and Agnostic Algorithm for Properly Learning a Mixture of k Gaussians, for any Constant k

    Learning a Gaussian mixture model (GMM) is a fundamental problem in machine learning, learning theory, and statistics. One notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the density $f$ of the unknown distribution from which we draw samples. The distance between $\mathcal{M}$ and $f$ is typically measured in the total variation or $L_1$-norm. We give an algorithm for learning a mixture of $k$ univariate Gaussians that is nearly optimal for any fixed $k$. The sample complexity of our algorithm is $\tilde{O}(\frac{k}{\epsilon^2})$ and the running time is $(k \cdot \log\frac{1}{\epsilon})^{O(k^4)} + \tilde{O}(\frac{k}{\epsilon^2})$. It is well known that this sample complexity is optimal (up to logarithmic factors), and it was already achieved by prior work. However, the best known time complexity for properly learning a $k$-GMM was $\tilde{O}(\frac{1}{\epsilon^{3k-1}})$. In particular, the dependence between $\frac{1}{\epsilon}$ and $k$ was exponential. We significantly improve this dependence by replacing the $\frac{1}{\epsilon}$ term with a $\log\frac{1}{\epsilon}$ while only increasing the exponent moderately. Hence, for any fixed $k$, the $\tilde{O}(\frac{k}{\epsilon^2})$ term dominates our running time, and thus our algorithm runs in time which is nearly linear in the number of samples drawn. Achieving a running time of $\mathrm{poly}(k, \frac{1}{\epsilon})$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question. Moreover, our approach offers an agnostic learning guarantee: our algorithm returns a good GMM even if the distribution we are sampling from is not a mixture of Gaussians. To the best of our knowledge, our algorithm is the first agnostic proper learning algorithm for GMMs.
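    Since the guarantee is stated in total variation ($L_1$) distance between mixture densities, a quick numerical check of that distance for univariate GMMs is often handy. A minimal sketch using a Riemann sum on a bounded window (the window endpoints are an assumption of the sketch, to be chosen wide enough to cover nearly all of both mixtures):

```python
import numpy as np
from scipy.stats import norm

def gmm_pdf(x, weights, means, stds):
    """Density of a univariate Gaussian mixture evaluated at the points x."""
    x = np.atleast_1d(x)
    return sum(w * norm.pdf(x, mu, sd) for w, mu, sd in zip(weights, means, stds))

def gmm_tv_distance(gmm_a, gmm_b, lo, hi, points=200_000):
    """Total variation distance 0.5 * integral |f - g| between two univariate
    mixtures (each given as a (weights, means, stds) tuple), approximated by a
    Riemann sum on [lo, hi]."""
    grid = np.linspace(lo, hi, points)
    f = gmm_pdf(grid, *gmm_a)
    g = gmm_pdf(grid, *gmm_b)
    return 0.5 * float(np.sum(np.abs(f - g)) * (grid[1] - grid[0]))
```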