
    Representation Learning for Clustering: A Statistical Framework

    We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which $k$-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity of a class of possible representations, in the spirit of the VC-dimension, showing that classes of representations that have finite such dimension can be successfully learned with sample size error bounds, and end our discussion with an analysis of that dimension for classes of representations induced by linear embeddings. Comment: To be published in Proceedings of UAI 201
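    A rough, runnable sketch of this protocol is given below. It is not the paper's algorithm: Linear Discriminant Analysis is used purely as a stand-in for the learner of a linear representation, and all data, dimensions, and cluster counts are hypothetical.

```python
# Sketch of the protocol: the user clusters a small random sample, a linear embedding is
# learned from that sample, and k-means is then run on the full data set in the learned
# representation. LDA is only a stand-in for the paper's representation learner.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical full (unlabeled) data set: three clusters in R^5, stretched along axis 0.
centers = 4.0 * rng.normal(size=(3, 5))
scales = np.array([3.0, 0.3, 0.3, 0.3, 0.3])
X = np.vstack([c + scales * rng.normal(size=(200, 5)) for c in centers])

# The user provides a clustering of a small random sample only.
sample_idx = rng.choice(len(X), size=60, replace=False)
user_labels = sample_idx // 200  # stand-in for the user's clustering of the sample

# The designer learns a linear embedding from the clustered sample ...
embedding = LinearDiscriminantAnalysis(n_components=2).fit(X[sample_idx], user_labels)

# ... and runs k-means on the full data set in that learned representation.
full_clustering = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding.transform(X))
```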

    Sample-Efficient Learning of Mixtures

    We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let $\mathcal F$ be an arbitrary class of probability distributions, and let $\mathcal{F}^k$ denote the class of $k$-mixtures of elements of $\mathcal F$. Assuming the existence of a method for learning $\mathcal F$ with sample complexity $m_{\mathcal{F}}(\epsilon)$, we provide a method for learning $\mathcal{F}^k$ with sample complexity $O(k\log k \cdot m_{\mathcal F}(\epsilon)/\epsilon^{2})$. Our mixture learning algorithm has the property that, if the $\mathcal F$-learner is proper/agnostic, then the $\mathcal{F}^k$-learner would be proper/agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $\widetilde{O}(kd/\epsilon^{4})$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $\widetilde{O}(kd^{2}/\epsilon^{4})$, which improves the previously known bounds of $\widetilde{O}(k^{3}d^{2}/\epsilon^{4})$ and $\widetilde{O}(k^{4}d^{4}/\epsilon^{2})$ in its dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $\widetilde{O}(d^{(d+5)/2}\epsilon^{-(d+9)/2}k)$ samples. Comment: A bug from the previous version, which appeared in AAAI 2018 proceedings, is fixed. 18 page
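    As a worked instance of how the general reduction recovers the first concrete bound above, assume the standard fact (not restated in the abstract) that a single axis-aligned Gaussian in $\mathbb{R}^d$ is learnable with $m_{\mathcal F}(\epsilon) = \widetilde{O}(d/\epsilon^{2})$ samples; plugging this into the reduction gives

```latex
% Instantiating the general mixture bound for axis-aligned Gaussians, assuming
% m_F(eps) = \widetilde{O}(d/\epsilon^2) for a single component.
m_{\mathcal{F}^k}(\epsilon)
  = O\!\left( \frac{k \log k \cdot m_{\mathcal F}(\epsilon)}{\epsilon^{2}} \right)
  = \widetilde{O}\!\left( \frac{k \log k \cdot d/\epsilon^{2}}{\epsilon^{2}} \right)
  = \widetilde{O}\!\left( \frac{k d}{\epsilon^{4}} \right),
```

    which matches the $\widetilde{O}(kd/\epsilon^{4})$ bound stated for mixtures of axis-aligned Gaussians.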

    On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences

    We consider the class of noisy multi-layered sigmoid recurrent neural networks with $w$ (unbounded) weights for classification of sequences of length $T$, where independent noise distributed according to $\mathcal{N}(0,\sigma^2)$ is added to the output of each neuron in the network. Our main result shows that the sample complexity of PAC learning this class can be bounded by $O(w\log(T/\sigma))$. For the non-noisy version of the same class (i.e., $\sigma=0$), we prove a lower bound of $\Omega(wT)$ for the sample complexity. Our results indicate an exponential gap in the dependence of sample complexity on $T$ for noisy versus non-noisy networks. Moreover, given the mild logarithmic dependence of the upper bound on $1/\sigma$, this gap still holds even for numerically negligible values of $\sigma$. Comment: arXiv admin note: text overlap with arXiv:2206.0719
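    For concreteness, here is a minimal numpy sketch of the model class described above (a single recurrent sigmoid layer with hypothetical dimensions; it only illustrates where the $\mathcal{N}(0,\sigma^2)$ noise enters, not the paper's sample-complexity analysis).

```python
# Noisy sigmoid RNN forward pass: independent N(0, sigma^2) noise is added to the output
# of every neuron at every time step. All dimensions below are hypothetical.
import numpy as np

def noisy_sigmoid_rnn(x_seq, W_in, W_rec, sigma, rng):
    """Return the final hidden state of a single-layer sigmoid RNN with per-neuron noise."""
    h = np.zeros(W_rec.shape[0])
    for x_t in x_seq:
        pre = W_in @ x_t + W_rec @ h
        h = 1.0 / (1.0 + np.exp(-pre))                 # sigmoid activation
        h = h + rng.normal(scale=sigma, size=h.shape)  # per-neuron Gaussian noise
    return h

rng = np.random.default_rng(0)
T, d_in, d_hid = 50, 8, 16                             # sequence length T and layer widths
x_seq = rng.normal(size=(T, d_in))
W_in = rng.normal(scale=0.3, size=(d_hid, d_in))
W_rec = rng.normal(scale=0.3, size=(d_hid, d_hid))
h_T = noisy_sigmoid_rnn(x_seq, W_in, W_rec, sigma=0.01, rng=rng)
# A sequence classifier would threshold a readout of h_T; setting sigma = 0 recovers the
# non-noisy class for which the Omega(wT) lower bound applies.
```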

    Adversarially Robust Learning with Tolerance

    We initiate the study of tolerant adversarial PAC-learning with respect to metric perturbation sets. In adversarial PAC-learning, an adversary is allowed to replace a test point $x$ with an arbitrary point in a closed ball of radius $r$ centered at $x$. In the tolerant version, the error of the learner is compared with the best achievable error with respect to a slightly larger perturbation radius $(1+\gamma)r$. This simple tweak helps us bridge the gap between theory and practice and obtain the first PAC-type guarantees for algorithmic techniques that are popular in practice. Our first result concerns the widely used "perturb-and-smooth" approach for adversarial learning. For perturbation sets with doubling dimension $d$, we show that a variant of these approaches PAC-learns any hypothesis class $\mathcal{H}$ with VC-dimension $v$ in the $\gamma$-tolerant adversarial setting with $O\left(\frac{v(1+1/\gamma)^{O(d)}}{\varepsilon}\right)$ samples. This is in contrast to the traditional (non-tolerant) setting in which, as we show, the perturb-and-smooth approach can provably fail. Our second result shows that one can PAC-learn the same class using $\widetilde{O}\left(\frac{d \cdot v\log(1+1/\gamma)}{\varepsilon^2}\right)$ samples even in the agnostic setting. This result is based on a novel compression-based algorithm and achieves a linear dependence on the doubling dimension as well as the VC-dimension. This is in contrast to the non-tolerant setting, where there is no known sample complexity upper bound that depends polynomially on the VC-dimension. Comment: The paper was accepted for ALT 202
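    The "perturb-and-smooth" idea can be pictured with the simplified sketch below, under assumptions not taken from the paper: $\ell_2$ balls as the perturbation set, a nearest-neighbor base learner, and Monte Carlo majority voting at test time. The paper's variant and its analysis are more careful; this is only an illustration.

```python
# Simplified "perturb-and-smooth" sketch in the gamma-tolerant setting (illustrative only).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def sample_ball(x, radius, rng):
    """Sample a point uniformly from the l2 ball of the given radius around x."""
    direction = rng.normal(size=x.shape)
    direction /= np.linalg.norm(direction)
    return x + radius * rng.uniform() ** (1.0 / len(x)) * direction

def perturb_and_smooth_fit(X, y, r, gamma, n_perturb, rng):
    # "Perturb": train on random points drawn from the enlarged balls of radius (1+gamma)r.
    Xp = np.vstack([[sample_ball(x, (1 + gamma) * r, rng) for _ in range(n_perturb)] for x in X])
    yp = np.repeat(y, n_perturb)
    return KNeighborsClassifier(n_neighbors=5).fit(Xp, yp)

def smooth_predict(clf, x, r, gamma, n_votes, rng):
    # "Smooth": majority vote of the base classifier over random perturbations of x.
    # Assumes nonnegative integer class labels.
    votes = clf.predict(np.array([sample_ball(x, (1 + gamma) * r, rng) for _ in range(n_votes)]))
    return int(np.bincount(votes.astype(int)).argmax())

# Example usage on hypothetical data: two well-separated classes in R^2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(100, 2)), rng.normal(loc=2.0, size=(100, 2))])
y = np.repeat([0, 1], 100)
clf = perturb_and_smooth_fit(X, y, r=0.5, gamma=0.5, n_perturb=10, rng=rng)
label = smooth_predict(clf, np.array([1.8, 2.1]), r=0.5, gamma=0.5, n_votes=25, rng=rng)
```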

    Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples

    We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small" cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).

    Polynomial Time and Private Learning of Unbounded Gaussian Mixture Models

    We study the problem of privately estimating the parameters of $d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components. For this, we develop a technique to reduce the problem to its non-private counterpart. This allows us to privatize existing non-private algorithms in a blackbox manner, while incurring only a small overhead in the sample complexity and running time. As the main application of our framework, we develop an $(\varepsilon, \delta)$-differentially private algorithm to learn GMMs using the non-private algorithm of Moitra and Valiant [MV10] as a blackbox. Consequently, this gives the first sample complexity upper bound and first polynomial time algorithm for privately learning GMMs without any boundedness assumptions on the parameters. As part of our analysis, we prove a tight (up to a constant factor) lower bound on the total variation distance of high-dimensional Gaussians, which can be of independent interest. Comment: Accepted in ICML 202