
    Representation Learning for Clustering: A Statistical Framework

    We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which $k$-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity of a class of possible representations, in the spirit of the VC-dimension, showing that classes of representations that have finite such dimension can be successfully learned with sample size error bounds, and end our discussion with an analysis of that dimension for classes of representations induced by linear embeddings. Comment: To be published in Proceedings of UAI 201
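    A rough, runnable sketch of this protocol is given below. It is not the paper's algorithm: Linear Discriminant Analysis is used purely as a stand-in for the learner of a linear representation, and all data, dimensions, and cluster counts are hypothetical.

```python
# Sketch of the protocol: the user clusters a small random sample, a linear embedding is
# learned from that sample, and k-means is then run on the full data set in the learned
# representation. LDA is only a stand-in for the paper's representation learner.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical full (unlabeled) data set: three clusters in R^5, stretched along axis 0.
centers = 4.0 * rng.normal(size=(3, 5))
scales = np.array([3.0, 0.3, 0.3, 0.3, 0.3])
X = np.vstack([c + scales * rng.normal(size=(200, 5)) for c in centers])

# The user provides a clustering of a small random sample only.
sample_idx = rng.choice(len(X), size=60, replace=False)
user_labels = sample_idx // 200  # stand-in for the user's clustering of the sample

# The designer learns a linear embedding from the clustered sample ...
embedding = LinearDiscriminantAnalysis(n_components=2).fit(X[sample_idx], user_labels)

# ... and runs k-means on the full data set in that learned representation.
full_clustering = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding.transform(X))
```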

    Sample-Efficient Learning of Mixtures

    We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let $\mathcal F$ be an arbitrary class of probability distributions, and let $\mathcal{F}^k$ denote the class of $k$-mixtures of elements of $\mathcal F$. Assuming the existence of a method for learning $\mathcal F$ with sample complexity $m_{\mathcal{F}}(\epsilon)$, we provide a method for learning $\mathcal{F}^k$ with sample complexity $O(k\log k \cdot m_{\mathcal F}(\epsilon)/\epsilon^{2})$. Our mixture learning algorithm has the property that, if the $\mathcal F$-learner is proper/agnostic, then the $\mathcal{F}^k$-learner would be proper/agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $\widetilde{O}(kd/\epsilon^{4})$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $\widetilde{O}(kd^{2}/\epsilon^{4})$, which improves the previously known bounds of $\widetilde{O}(k^{3}d^{2}/\epsilon^{4})$ and $\widetilde{O}(k^{4}d^{4}/\epsilon^{2})$ in its dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $\widetilde{O}(d^{(d+5)/2}\epsilon^{-(d+9)/2}k)$ samples. Comment: A bug from the previous version, which appeared in AAAI 2018 proceedings, is fixed. 18 page
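    As a worked instance of how the general reduction recovers the first concrete bound above, assume the standard fact (not restated in the abstract) that a single axis-aligned Gaussian in $\mathbb{R}^d$ is learnable with $m_{\mathcal F}(\epsilon) = \widetilde{O}(d/\epsilon^{2})$ samples; plugging this into the reduction gives

```latex
% Instantiating the general mixture bound for axis-aligned Gaussians, assuming
% m_F(eps) = \widetilde{O}(d/\epsilon^2) for a single component.
m_{\mathcal{F}^k}(\epsilon)
  = O\!\left( \frac{k \log k \cdot m_{\mathcal F}(\epsilon)}{\epsilon^{2}} \right)
  = \widetilde{O}\!\left( \frac{k \log k \cdot d/\epsilon^{2}}{\epsilon^{2}} \right)
  = \widetilde{O}\!\left( \frac{k d}{\epsilon^{4}} \right),
```

    which matches the $\widetilde{O}(kd/\epsilon^{4})$ bound stated for mixtures of axis-aligned Gaussians.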

    On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences

    We consider the class of noisy multi-layered sigmoid recurrent neural networks with $w$ (unbounded) weights for classification of sequences of length $T$, where independent noise distributed according to $\mathcal{N}(0,\sigma^2)$ is added to the output of each neuron in the network. Our main result shows that the sample complexity of PAC learning this class can be bounded by $O(w\log(T/\sigma))$. For the non-noisy version of the same class (i.e., $\sigma=0$), we prove a lower bound of $\Omega(wT)$ for the sample complexity. Our results indicate an exponential gap in the dependence of sample complexity on $T$ for noisy versus non-noisy networks. Moreover, given the mild logarithmic dependence of the upper bound on $1/\sigma$, this gap still holds even for numerically negligible values of $\sigma$. Comment: arXiv admin note: text overlap with arXiv:2206.0719
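    For concreteness, here is a minimal numpy sketch of the model class described above (a single recurrent sigmoid layer with hypothetical dimensions; it only illustrates where the $\mathcal{N}(0,\sigma^2)$ noise enters, not the paper's sample-complexity analysis).

```python
# Noisy sigmoid RNN forward pass: independent N(0, sigma^2) noise is added to the output
# of every neuron at every time step. All dimensions below are hypothetical.
import numpy as np

def noisy_sigmoid_rnn(x_seq, W_in, W_rec, sigma, rng):
    """Return the final hidden state of a single-layer sigmoid RNN with per-neuron noise."""
    h = np.zeros(W_rec.shape[0])
    for x_t in x_seq:
        pre = W_in @ x_t + W_rec @ h
        h = 1.0 / (1.0 + np.exp(-pre))                 # sigmoid activation
        h = h + rng.normal(scale=sigma, size=h.shape)  # per-neuron Gaussian noise
    return h

rng = np.random.default_rng(0)
T, d_in, d_hid = 50, 8, 16                             # sequence length T and layer widths
x_seq = rng.normal(size=(T, d_in))
W_in = rng.normal(scale=0.3, size=(d_hid, d_in))
W_rec = rng.normal(scale=0.3, size=(d_hid, d_hid))
h_T = noisy_sigmoid_rnn(x_seq, W_in, W_rec, sigma=0.01, rng=rng)
# A sequence classifier would threshold a readout of h_T; setting sigma = 0 recovers the
# non-noisy class for which the Omega(wT) lower bound applies.
```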

    Adversarially Robust Learning with Tolerance

    We initiate the study of tolerant adversarial PAC-learning with respect to metric perturbation sets. In adversarial PAC-learning, an adversary is allowed to replace a test point $x$ with an arbitrary point in a closed ball of radius $r$ centered at $x$. In the tolerant version, the error of the learner is compared with the best achievable error with respect to a slightly larger perturbation radius $(1+\gamma)r$. This simple tweak helps us bridge the gap between theory and practice and obtain the first PAC-type guarantees for algorithmic techniques that are popular in practice. Our first result concerns the widely used "perturb-and-smooth" approach for adversarial learning. For perturbation sets with doubling dimension $d$, we show that a variant of these approaches PAC-learns any hypothesis class $\mathcal{H}$ with VC-dimension $v$ in the $\gamma$-tolerant adversarial setting with $O\left(\frac{v(1+1/\gamma)^{O(d)}}{\varepsilon}\right)$ samples. This is in contrast to the traditional (non-tolerant) setting in which, as we show, the perturb-and-smooth approach can provably fail. Our second result shows that one can PAC-learn the same class using $\widetilde{O}\left(\frac{d \cdot v\log(1+1/\gamma)}{\varepsilon^2}\right)$ samples even in the agnostic setting. This result is based on a novel compression-based algorithm and achieves a linear dependence on the doubling dimension as well as the VC-dimension. This is in contrast to the non-tolerant setting, where there is no known sample complexity upper bound that depends polynomially on the VC-dimension. Comment: The paper was accepted for ALT 202
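    The "perturb-and-smooth" idea can be pictured with the simplified sketch below, under assumptions not taken from the paper: $\ell_2$ balls as the perturbation set, a nearest-neighbor base learner, and Monte Carlo majority voting at test time. The paper's variant and its analysis are more careful; this is only an illustration.

```python
# Simplified "perturb-and-smooth" sketch in the gamma-tolerant setting (illustrative only).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def sample_ball(x, radius, rng):
    """Sample a point uniformly from the l2 ball of the given radius around x."""
    direction = rng.normal(size=x.shape)
    direction /= np.linalg.norm(direction)
    return x + radius * rng.uniform() ** (1.0 / len(x)) * direction

def perturb_and_smooth_fit(X, y, r, gamma, n_perturb, rng):
    # "Perturb": train on random points drawn from the enlarged balls of radius (1+gamma)r.
    Xp = np.vstack([[sample_ball(x, (1 + gamma) * r, rng) for _ in range(n_perturb)] for x in X])
    yp = np.repeat(y, n_perturb)
    return KNeighborsClassifier(n_neighbors=5).fit(Xp, yp)

def smooth_predict(clf, x, r, gamma, n_votes, rng):
    # "Smooth": majority vote of the base classifier over random perturbations of x.
    # Assumes nonnegative integer class labels.
    votes = clf.predict(np.array([sample_ball(x, (1 + gamma) * r, rng) for _ in range(n_votes)]))
    return int(np.bincount(votes.astype(int)).argmax())

# Example usage on hypothetical data: two well-separated classes in R^2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(100, 2)), rng.normal(loc=2.0, size=(100, 2))])
y = np.repeat([0, 1], 100)
clf = perturb_and_smooth_fit(X, y, r=0.5, gamma=0.5, n_perturb=10, rng=rng)
label = smooth_predict(clf, np.array([1.8, 2.1]), r=0.5, gamma=0.5, n_votes=25, rng=rng)
```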

    Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples

    We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small" cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).

    Polynomial Time and Private Learning of Unbounded Gaussian Mixture Models

    We study the problem of privately estimating the parameters of $d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components. For this, we develop a technique to reduce the problem to its non-private counterpart. This allows us to privatize existing non-private algorithms in a blackbox manner, while incurring only a small overhead in the sample complexity and running time. As the main application of our framework, we develop an $(\varepsilon, \delta)$-differentially private algorithm to learn GMMs using the non-private algorithm of Moitra and Valiant [MV10] as a blackbox. Consequently, this gives the first sample complexity upper bound and first polynomial time algorithm for privately learning GMMs without any boundedness assumptions on the parameters. As part of our analysis, we prove a tight (up to a constant factor) lower bound on the total variation distance of high-dimensional Gaussians, which can be of independent interest. Comment: Accepted in ICML 202