Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians
We provide an algorithm for properly learning mixtures of two
single-dimensional Gaussians without any separability assumptions. Given
samples from an unknown mixture, our algorithm outputs a mixture that is
\eps-close in total variation distance. Our sample complexity is optimal up to
logarithmic factors, and significantly improves upon both Kalai et al., whose
algorithm has a prohibitive dependence on 1/\eps, and Feldman et al., whose
algorithm requires bounds on the mixture parameters and depends
pseudo-polynomially on these parameters.
One of our main contributions is an improved and generalized algorithm for
selecting a good candidate distribution from among competing hypotheses.
Namely, given a collection of N hypotheses containing at least one candidate
that is \eps-close to an unknown distribution, our algorithm outputs a
candidate which is O(\eps)-close to the distribution. The algorithm requires a
modest number of samples from the unknown distribution and runs in quasilinear
time, which improves previous such results (such as the Scheff\'e estimator)
from a quadratic dependence of the running time on N to quasilinear. Given the
wide use of such results for the purpose of hypothesis selection, our improved
algorithm implies immediate improvements to any such use.
Robust hypothesis testing and distribution estimation in Hellinger distance
We propose a simple robust hypothesis test that has the same sample
complexity as that of the optimal Neyman-Pearson test up to constants, but is
robust to distribution perturbations under Hellinger distance. We discuss the
applicability of such a robust test for estimating distributions in Hellinger
distance. We empirically demonstrate the power of the test on canonical
distributions.
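For reference, the Hellinger distance between two discrete distributions, the metric in which robustness is measured above, is easy to compute directly; this small sketch assumes finite-domain probability vectors and is not code from the paper.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as
    probability vectors over the same finite domain."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Example: two biased coins
print(hellinger([0.5, 0.5], [0.6, 0.4]))  # about 0.071
```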
Efficient Robust Proper Learning of Log-concave Distributions
We study the {\em robust proper learning} of univariate log-concave
distributions (over continuous and discrete domains). Given a set of samples
drawn from an unknown target distribution, we want to compute a log-concave
hypothesis distribution that is as close as possible to the target, in total
variation distance. In this work, we give the first computationally efficient
algorithm for this learning problem. Our algorithm achieves the
information-theoretically optimal sample size (up to a constant factor), runs
in polynomial time, and is robust to model misspecification with nearly-optimal
error guarantees.
Specifically, we give an algorithm that, on input n=O(1/\eps^{5/2}) samples
from an unknown distribution f, runs in polynomial time and
outputs a log-concave hypothesis h that (with high probability) satisfies
\dtv(h, f) = O(\opt)+\eps, where \opt is the minimum total variation
distance between f and the class of log-concave distributions. Our approach
to the robust proper learning problem is quite flexible and may be applicable
to many other univariate distribution families.
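The hypothesis class itself is easy to make concrete: a pmf with contiguous support over consecutive integers is log-concave exactly when p[i]^2 >= p[i-1]*p[i+1] at every interior point of the support. The sketch below checks this condition numerically; it only illustrates the class and is not the paper's learning algorithm.

```python
import numpy as np
from scipy.stats import binom

def is_log_concave(pmf, tol=1e-12):
    """Check whether a pmf over consecutive integers (with contiguous support)
    is log-concave, i.e. p[i]**2 >= p[i-1] * p[i+1] at interior points."""
    p = np.asarray(pmf, dtype=float)
    return bool(np.all(p[1:-1] ** 2 >= p[:-2] * p[2:] - tol))

# A binomial pmf is log-concave; a bimodal mixture typically is not.
x = np.arange(21)
print(is_log_concave(binom.pmf(x, 20, 0.3)))                      # True
print(is_log_concave(0.5 * binom.pmf(x, 20, 0.1)
                     + 0.5 * binom.pmf(x, 20, 0.9)))              # False
```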
Sparse Solutions to Nonnegative Linear Systems and Applications
We give an efficient algorithm for finding sparse approximate solutions to
linear systems of equations with nonnegative coefficients. Unlike most known
results for sparse recovery, we do not require {\em any} assumption on the
matrix other than non-negativity. Our algorithm is combinatorial in nature,
inspired by techniques for the set cover problem, as well as the multiplicative
weight update method.
We then present a natural application to learning mixture models in the PAC
framework. For learning a mixture of k axis-aligned Gaussians in d
dimensions, we give an algorithm that outputs a mixture of
Gaussians that is \eps-close in statistical distance to the true
distribution, without any separation assumptions. The time and sample
complexity are polynomial when k is constant -- precisely the regime in which
known methods fail to identify the components efficiently.
Given that non-negativity is a natural assumption, we believe that our result
may find use in other settings in which we wish to approximately explain data
using a small number of components from a (large) candidate set.
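As a rough illustration of how sparse nonnegative approximate solutions can be built up one column at a time, the sketch below runs a standard Frank-Wolfe-style greedy fit over the simplex; this is a generic stand-in chosen for brevity, not the paper's set-cover/multiplicative-weights algorithm, and it does not carry the paper's guarantees.

```python
import numpy as np

def sparse_nonneg_fit(A, b, iters=50):
    """Frank-Wolfe-style greedy fit of b by a sparse convex combination of the
    columns of A (all entries nonnegative). Each iteration touches one column,
    so after `iters` steps the solution has at most `iters` nonzero weights."""
    n, m = A.shape
    x = np.zeros(m)
    # start from the single best column
    j0 = np.argmin(((A - b[:, None]) ** 2).sum(axis=0))
    x[j0] = 1.0
    for t in range(1, iters):
        grad = 2.0 * A.T @ (A @ x - b)        # gradient of ||Ax - b||^2
        j = np.argmin(grad)                   # best vertex of the simplex
        gamma = 2.0 / (t + 2.0)               # standard Frank-Wolfe step size
        x = (1 - gamma) * x
        x[j] += gamma
    return x

# toy usage: b is a convex combination of 3 of the 100 candidate columns
rng = np.random.default_rng(0)
A = rng.random((30, 100))
true_x = np.zeros(100); true_x[[5, 42, 77]] = [0.5, 0.3, 0.2]
b = A @ true_x
x_hat = sparse_nonneg_fit(A, b, iters=50)
# residual shrinks as O(1/iters); at most `iters` of the 100 columns are used
print(np.linalg.norm(A @ x_hat - b), np.count_nonzero(x_hat > 1e-3))
```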
Maximum Selection and Sorting with Adversarial Comparators and an Application to Density Estimation
We study maximum selection and sorting of n numbers using pairwise
comparators that output the larger of their two inputs if the inputs are more
than a given threshold apart, and output an adversarially-chosen input
otherwise. We consider two adversarial models: a non-adaptive adversary that
decides on the outcomes in advance based solely on the inputs, and an adaptive
adversary that can decide on the outcome of each query depending on previous
queries and outcomes.
Against the non-adaptive adversary, we derive a maximum-selection algorithm
and a sorting algorithm whose expected numbers of comparisons are within
small constant factors of the best possible. Against the adaptive adversary,
we propose a maximum-selection algorithm that outputs a correct answer with
probability at least 1-\delta, for any desired error probability \delta. The
existence of this algorithm affirmatively resolves an open problem of Ajtai,
Feldman, Hassidim, and Nelson.
Our study was motivated by a density-estimation problem where, given samples
from an unknown underlying distribution, we would like to find a distribution
in a known class of candidate distributions that is close to the underlying
distribution in \ell_1 distance. Scheffe's algorithm outputs a distribution
at an \ell_1 distance at most 9 times the minimum, but its running time is
quadratic in the number of candidate distributions. Using maximum selection,
we propose an algorithm with the same approximation guarantee and a running
time that is nearly linear in the number of candidates.
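To make the comparator model concrete, the toy simulation below implements one fixed (non-adaptive) adversary and shows why the naive "keep the current winner" scan can return a value far below the true maximum, which is what motivates the more careful algorithms in the paper. The specific adversary and input values are illustrative assumptions.

```python
def adversarial_compare(a, b, threshold):
    """Comparator from the abstract's model: returns the larger input when the
    inputs are more than `threshold` apart; otherwise its answer is unreliable.
    Here the (non-adaptive) adversary always answers with the smaller input,
    a worst case for naive algorithms."""
    if abs(a - b) > threshold:
        return max(a, b)
    return min(a, b)

def naive_max(values, threshold):
    """Sequential 'keep the current winner' scan."""
    winner = values[0]
    for v in values[1:]:
        winner = adversarial_compare(winner, v, threshold)
    return winner

# Decreasing values with gaps below the threshold: every comparison is
# unreliable, so the running winner drifts from the true maximum (10.0)
# all the way down to the minimum (0.0).
values = [10.0 - 0.5 * i for i in range(21)]
print(naive_max(values, threshold=1.0), "vs true max", max(values))
```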
Properly Learning Poisson Binomial Distributions in Almost Polynomial Time
We give an algorithm for properly learning Poisson binomial distributions. A
Poisson binomial distribution (PBD) of order n is the discrete probability
distribution of the sum of n mutually independent Bernoulli random variables.
Given samples from an unknown PBD, our algorithm runs in almost polynomial
time and outputs a hypothesis PBD that is \eps-close to the unknown
distribution in total variation distance. This improves on the previously
best known running time for properly learning PBDs.
As one of our main contributions, we provide a novel structural
characterization of PBDs. We prove that, for all \eps > 0, there exists an
explicit collection of vectors of multiplicities such that, for any PBD,
there exists an \eps-close PBD with few distinct parameters whose
multiplicities are given by some element of the collection. Our proof
combines tools from Fourier analysis and algebraic geometry.
Our approach to the proper learning problem is as follows: starting with an
accurate non-proper hypothesis, we fit a PBD to this hypothesis. More
specifically, we essentially start with the hypothesis computed by the
computationally efficient non-proper learning algorithm in our recent
work~\cite{DKS15}. Our aforementioned structural characterization allows us to
reduce the corresponding fitting problem to a collection of systems of
low-degree polynomial inequalities. We show that each such system can be
solved efficiently, which yields the overall running time of our algorithm.
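For background on the object being learned, the pmf of a PBD of order n can be computed exactly by convolving the individual Bernoulli laws one at a time; this standard O(n^2) dynamic program is shown below and is unrelated to the paper's learning algorithm.

```python
import numpy as np

def pbd_pmf(probs):
    """Exact pmf of a Poisson binomial distribution: the law of a sum of
    independent Bernoulli(p_i) variables, computed by convolving one
    Bernoulli pmf at a time."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

# order-4 PBD with parameters (0.1, 0.5, 0.5, 0.9)
print(pbd_pmf([0.1, 0.5, 0.5, 0.9]))   # probabilities of totals 0..4
```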
Algebraic and Analytic Approaches for Parameter Learning in Mixture Models
We present two different approaches for parameter learning in several mixture
models in one dimension. Our first approach uses complex-analytic methods and
applies to Gaussian mixtures with shared variance, binomial mixtures with
shared success probability, and Poisson mixtures, among others. An example
result is that finitely many samples suffice to exactly learn a mixture of
Poisson distributions, each with an integral rate parameter bounded by a known value.
Our second approach uses algebraic and combinatorial tools and applies to
binomial mixtures with shared trial parameter and differing success
parameters, as well as to mixtures of geometric distributions. Again, as an
example, for binomial mixtures with k components and success parameters
discretized to a fixed resolution, finitely many samples suffice to exactly
recover the parameters. For some of these distributions, our results represent
the first guarantees for parameter estimation.
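As a naive illustration of why integral (or discretized) parameters make exact recovery possible, the sketch below does brute-force factorial-moment matching for a two-component Poisson mixture with integer rates; the weights, rates, and sample size are made up for the demo, and this is not the paper's complex-analytic or algebraic method.

```python
import numpy as np
from itertools import combinations

def factorial_moment(samples, r):
    """Empirical r-th factorial moment E[X(X-1)...(X-r+1)]."""
    x = np.asarray(samples, dtype=float)
    out = np.ones_like(x)
    for j in range(r):
        out *= (x - j)
    return out.mean()

def fit_two_poisson_mixture(samples, max_rate):
    """Brute-force moment matching for a 2-component Poisson mixture with
    integer rates bounded by max_rate: the r-th factorial moment of the
    mixture equals w*l1**r + (1-w)*l2**r, so three moments pin down (w, l1, l2)."""
    m = [factorial_moment(samples, r) for r in (1, 2, 3)]
    best, best_err = None, np.inf
    for l1, l2 in combinations(range(max_rate + 1), 2):
        w = (m[0] - l2) / (l1 - l2)          # solve the first-moment equation
        if not 0.0 <= w <= 1.0:
            continue
        err = sum((w * l1**r + (1 - w) * l2**r - m[r - 1]) ** 2 for r in (2, 3))
        if err < best_err:
            best, best_err = (w, l1, l2), err
    return best

rng = np.random.default_rng(0)
n = 200_000
samples = np.where(rng.random(n) < 0.3, rng.poisson(2, n), rng.poisson(7, n))
print(fit_two_poisson_mixture(samples, max_rate=10))   # roughly (0.3, 2, 7)
```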
Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication
We study the problem of estimating k-ary distributions under
\eps-local differential privacy. The n samples are distributed across n
users, who send privatized versions of their sample to a central server. All
previously known sample-optimal algorithms require linear (in k)
communication from each user in the high privacy regime, and run in time that
grows quickly with the domain size k, which can be prohibitive for large
domains.
We propose Hadamard Response (HR), a local privatization scheme that requires
no shared randomness and is symmetric with respect to the users. Our scheme has
order optimal sample complexity for all privacy levels \eps, a communication of
O(\log k) bits per user, and nearly linear running time.
Our encoding and decoding are based on Hadamard matrices, and are simple to
implement. The statistical performance relies on the coding-theoretic
properties of Hadamard matrices, i.e., the large Hamming distance between the
rows. An efficient implementation of the algorithm using the fast
Walsh-Hadamard transform gives the computational gains.
We compare our approach with Randomized Response (RR), RAPPOR, and
subset-selection (SS) mechanisms, both theoretically and experimentally. In
our experiments, our algorithm runs about 100x faster than SS and RAPPOR.
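A simplified sketch of the basic Hadamard Response encoder/decoder in the high-privacy regime is given below: each symbol is mapped to the +1 positions of a row of a Hadamard matrix, and the decoder inverts the known collision probabilities. The paper's actual scheme adds a block construction for larger \eps and uses the fast Walsh-Hadamard transform rather than materializing the matrix, so treat this only as an illustration of the idea.

```python
import numpy as np
from scipy.linalg import hadamard

def hr_encode(x, k, eps, rng):
    """Simplified Hadamard Response: symbol x is associated with the +1
    positions of row x+1 of a KxK Hadamard matrix (K = smallest power of two
    exceeding k); the user reports a uniform element of that set with
    probability e^eps/(e^eps+1), else a uniform element of its complement."""
    K = 1 << int(k).bit_length()            # smallest power of 2 >= k + 1
    H = hadamard(K)
    C_x = np.flatnonzero(H[x + 1] == 1)     # row 0 (all ones) is skipped
    if rng.random() < np.exp(eps) / (np.exp(eps) + 1.0):
        return rng.choice(C_x)
    return rng.choice(np.setdiff1d(np.arange(K), C_x))

def hr_decode(reports, k, eps):
    """Unbiased estimate of the input distribution: for each symbol x,
    P(report lands in C_x) = p(x)*a + (1-p(x))/2 with a = e^eps/(e^eps+1)."""
    K = 1 << int(k).bit_length()
    H = hadamard(K)
    a = np.exp(eps) / (np.exp(eps) + 1.0)
    reports = np.asarray(reports)
    p_hat = np.empty(k)
    for x in range(k):
        in_Cx = H[x + 1, reports] == 1
        p_hat[x] = (in_Cx.mean() - 0.5) / (a - 0.5)
    return p_hat

rng = np.random.default_rng(0)
k, eps, n = 8, 1.0, 50_000
p = np.arange(1, k + 1, dtype=float); p /= p.sum()
data = rng.choice(k, size=n, p=p)
reports = np.array([hr_encode(x, k, eps, rng) for x in data])
print(np.round(hr_decode(reports, k, eps), 3))   # approximately recovers p
```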
Splintering with distributions: A stochastic decoy scheme for private computation
Performing computations while maintaining privacy is an important problem in
today's distributed machine learning solutions. Consider the following two
setups between a client and a server. In setup i), the client has a private
data vector, the server has a large private database of data vectors, and the
client wants to find the inner products between its vector and the records in
the database. The client does not want the server to learn its vector, while
the server does not want the client to learn the records in its database. This
is in contrast to setup ii), where the client would like to perform an
operation solely on its own data, such as computing the inverse of its data
matrix, but would like to use the superior computing ability of the server to
do so without having to leak this matrix to the server.
We present a stochastic scheme for splitting the client data into privatized
shares that are transmitted to the server in such settings. The server performs
the requested operations on these shares instead of on the raw client data. The
obtained intermediate results are sent back to the client, where they are
assembled to obtain the final result.
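The "split into shares, compute on shares, reassemble at the client" workflow can be illustrated for the inner-product setup i) with plain additive sharing, since inner products are linear in the client's vector. This is only a toy stand-in: unlike the paper's single-server stochastic decoy scheme, additive sharing would need the shares to be processed by non-colluding parties to provide any privacy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Client's private query vector and a toy server-side database (3 records).
x = np.array([1.0, 2.0, 3.0, 4.0])
Y = rng.normal(size=(3, 4))

# Client: split x into two random additive shares, x = s1 + s2.
# Each share on its own looks like an unrelated random vector.
s1 = rng.normal(size=x.shape)
s2 = x - s1

# Server side: because the inner product is linear in x, each share can be
# processed independently against the database.
r1 = Y @ s1
r2 = Y @ s2

# Client: reassemble the partial results into the true inner products.
print(np.allclose(r1 + r2, Y @ x))   # True
```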
A Nearly Optimal and Agnostic Algorithm for Properly Learning a Mixture of k Gaussians, for any Constant k
Learning a Gaussian mixture model (GMM) is a fundamental problem in machine
learning, learning theory, and statistics. One notion of learning a GMM is
proper learning: here, the goal is to find a mixture of k Gaussians
that is close to the density of the unknown distribution from
which we draw samples. The distance between the two densities is typically
measured in the total variation or L_1-norm.
We give an algorithm for learning a mixture of k univariate Gaussians that
is nearly optimal for any fixed k. The sample complexity of our algorithm is
well known to be optimal (up to logarithmic factors), and it was already
achieved by prior work. However, in the best previously known time complexity
for properly learning a k-GMM, the dependence between 1/\eps and k was
exponential. We significantly improve this dependence while only increasing
the exponent moderately. Hence, for any fixed k, the term coming from the
sample complexity dominates our running time, and thus our algorithm runs in
time which is nearly linear in the number of samples drawn. Achieving a
running time polynomial in both k and 1/\eps for properly learning k-GMMs has
recently been stated as an open problem by multiple researchers, and we make
progress on this question.
Moreover, our approach offers an agnostic learning guarantee: our algorithm
returns a good GMM even if the distribution we are sampling from is not a
mixture of Gaussians. To the best of our knowledge, our algorithm is the first
agnostic proper learning algorithm for GMMs.