
    Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples

    We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small" cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).
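    For readability, the bound quoted above can be set out as a displayed equation. This is only a restatement of the abstract's claim; the reading of d as the ambient dimension is conventional and not stated explicitly there.

```latex
% Restatement of the sample-complexity bound from the abstract.
% k: number of mixture components, d: dimension (assumed ambient dimension),
% alpha: target total variation error, (epsilon, delta): privacy parameters.
n \;=\; \tilde{O}\!\left( \frac{k^{2} d^{4} \log(1/\delta)}{\alpha^{2} \varepsilon} \right)
```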

    A super-polynomial lower bound for learning nonparametric mixtures

    We study the problem of learning nonparametric distributions in a finite mixture, and establish a super-polynomial lower bound on the sample complexity of learning the component distributions in such models. Namely, we are given i.i.d. samples from $f$ where $f=\sum_{i=1}^k w_i f_i, \quad \sum_{i=1}^k w_i=1, \quad w_i>0$, and we are interested in learning each component $f_i$. Without any assumptions on $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_i)\cap \text{supp}(\nu_j)=\emptyset$ for $i \neq j$. Our main result shows that $\Omega((\frac{1}{\varepsilon})^{C\log\log \frac{1}{\varepsilon}})$ samples are required for estimating each $f_i$. The proof relies on a fast rate for approximation with Gaussians, which may be of independent interest. This result has important implications for the hardness of learning more general nonparametric latent variable models that arise in machine learning applications.
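    To make the data-generating model concrete, here is a minimal sampler for the mixture described above, under purely illustrative assumptions: each $\nu_i$ is taken to be uniform on a compact interval, the intervals are pairwise disjoint, and the Gaussian in the convolution is standard. The function name and parameter choices are hypothetical, not part of the paper.

```python
# Illustrative sampler for f = sum_i w_i f_i, where each f_i is the convolution
# of a Gaussian with a compactly supported density nu_i (here: uniform on a
# compact interval), and the intervals supp(nu_i) are pairwise disjoint.
import numpy as np

def sample_mixture(n, weights, supports, sigma=1.0, rng=None):
    """weights: mixing weights w_i (sum to 1); supports: disjoint (a_i, b_i) intervals."""
    rng = np.random.default_rng() if rng is None else rng
    comps = rng.choice(len(weights), size=n, p=weights)   # pick component i with prob w_i
    a = np.array([s[0] for s in supports])
    b = np.array([s[1] for s in supports])
    latent = rng.uniform(a[comps], b[comps])               # draw from nu_i (uniform here)
    return latent + sigma * rng.normal(size=n)             # convolve with a Gaussian

# Example: k = 2 components with disjoint supports [-3, -1] and [1, 3].
x = sample_mixture(10_000, weights=[0.4, 0.6], supports=[(-3, -1), (1, 3)])
```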

    Private hypothesis selection

    We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution P and a set of m probability distributions H, the goal is to output, in an ε-differentially private manner, a distribution from H whose total variation distance to P is comparable to that of the best such distribution (which we denote by α). The sample complexity of our basic algorithm is O(log m/α^2 + log m/(αε)), representing a minimal cost for privacy when compared to the non-private algorithm. We can also handle infinite hypothesis classes H by relaxing to (ε, δ)-differential privacy. We apply our hypothesis selection algorithm to give learning algorithms for a number of natural distribution classes, including Gaussians, product distributions, sums of independent random variables, piecewise polynomials, and mixture classes. Our hypothesis selection procedure allows us to generically convert a cover for a class to a learning algorithm, complementing known learning lower bounds which are in terms of the size of the packing number of the class. As the covering and packing numbers are often closely related, for constant α, our algorithms achieve the optimal sample complexity for many classes of interest. Finally, we describe an application to private distribution-free PAC learning. https://arxiv.org/abs/1905.1322
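    As a toy illustration of the generic idea (score each candidate against the data and select one privately), here is a minimal sketch using the exponential mechanism over a finite hypothesis set on a finite domain, with the negative empirical total variation distance as the score. This is an assumption made for the sketch, not the paper's actual algorithm, and the function name is hypothetical.

```python
# Minimal sketch: privately select, from a finite set of candidate pmfs over a
# finite domain, one that is close to the empirical distribution, via the
# exponential mechanism. Changing one sample moves the empirical pmf by at most
# 1/n in total variation, so the score has sensitivity 1/n and the mechanism
# below satisfies epsilon-DP.
import numpy as np

def private_hypothesis_selection(samples, hypotheses, epsilon, domain_size, rng=None):
    """samples: array of ints in [0, domain_size); hypotheses: list of pmfs of that length."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(samples)
    empirical = np.bincount(samples, minlength=domain_size) / n
    # Utility of each hypothesis: negative TV distance to the empirical pmf.
    scores = np.array([-0.5 * np.abs(empirical - h).sum() for h in hypotheses])
    sensitivity = 1.0 / n
    # Exponential mechanism: sample index with probability proportional to
    # exp(epsilon * score / (2 * sensitivity)).
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(len(hypotheses), p=probs)

# Toy usage: three candidate distributions over a domain of size 4.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_p = np.array([0.1, 0.2, 0.3, 0.4])
    data = rng.choice(4, size=2000, p=true_p)
    H = [np.full(4, 0.25), np.array([0.1, 0.2, 0.3, 0.4]), np.array([0.4, 0.3, 0.2, 0.1])]
    print(private_hypothesis_selection(data, H, epsilon=1.0, domain_size=4, rng=rng))
```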