
    Learning Mixtures of Gaussians in High Dimensions

    Full text link
    Efficiently learning mixtures of Gaussians is a fundamental problem in statistics and learning theory. Given samples, each drawn from one of k Gaussian distributions in R^n chosen at random, the learning problem asks to estimate the means and covariance matrices of these Gaussians. This problem arises in many areas ranging from the natural sciences to the social sciences, and has also found many machine learning applications. Unfortunately, learning mixtures of Gaussians is an information-theoretically hard problem: in the worst case, the number of samples required to learn the parameters up to reasonable accuracy is exponential in the number of Gaussian components. In this work, we show that in sufficiently high dimension, the class of Gaussian mixtures is learnable in its most general form under a smoothed analysis framework, where the parameters are randomly perturbed from an adversarial starting point. In particular, given samples from a mixture of Gaussians with randomly perturbed parameters, when n > \Omega(k^2), we give an algorithm that learns the parameters in polynomial time using a polynomial number of samples. The central algorithmic ideas are new ways to decompose the moment tensor of the Gaussian mixture by exploiting its structural properties. The symmetries of this tensor derive from the combinatorial structure of higher-order moments of Gaussian distributions (sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop new tools for bounding the smallest singular values of structured random matrices, which could be useful in other smoothed analysis settings.
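
    The combinatorial structure the abstract refers to is Isserlis' (Wick's) theorem: every higher-order moment of a zero-mean Gaussian is a sum, over all pairings of the indices, of products of covariance entries. A minimal sketch of the fourth-order case, checked against Monte Carlo samples (the function name and the example covariance are illustrative, not taken from the paper):

```python
import numpy as np

def fourth_moment(cov, i, j, k, l):
    """E[x_i x_j x_k x_l] for a zero-mean Gaussian via Isserlis'/Wick's
    theorem: the sum over the three pair partitions of {i, j, k, l}."""
    return (cov[i, j] * cov[k, l]
            + cov[i, k] * cov[j, l]
            + cov[i, l] * cov[j, k])

rng = np.random.default_rng(0)
cov = np.array([[2.0, 0.5, 0.3],
                [0.5, 1.0, 0.2],
                [0.3, 0.2, 1.5]])
samples = rng.multivariate_normal(np.zeros(3), cov, size=500_000)

exact = fourth_moment(cov, 0, 1, 2, 2)  # 0.5*1.5 + 2*(0.3*0.2) = 0.87
empirical = np.mean(samples[:, 0] * samples[:, 1] * samples[:, 2] ** 2)
print(exact, empirical)  # the two values should agree closely
```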

    Exocrine Pancreatic Insufficiency in Diabetes Mellitus: A Complication of Diabetic Neuropathy or a Different Type of Diabetes?

    Get PDF
    Pancreatic exocrine insufficiency is a frequently observed phenomenon in type 1 and type 2 diabetes mellitus, and alterations of exocrine pancreatic morphology are also found frequently in diabetic patients. Several hypotheses have been proposed to explain these findings, including the lack of insulin as a trophic factor for exocrine tissue, changes in the secretion and/or action of other islet hormones, and autoimmunity against common endocrine and exocrine antigens. Another explanation is that diabetes mellitus may itself be a consequence of an underlying pancreatic disease (e.g., chronic pancreatitis). A further pathophysiological concept regards the functional and morphological alterations as consequences of diabetic neuropathy. This paper discusses the currently available studies on this subject and provides an overview of current concepts of exocrine pancreatic insufficiency in diabetes mellitus.

    Paradoxes in Fair Computer-Aided Decision Making

    Full text link
    Computer-aided decision making, in which a human decision-maker is aided by a computational classifier, is becoming increasingly prevalent. For instance, judges in at least nine states use algorithmic tools that produce "recidivism risk scores" for criminal defendants in sentencing, parole, or bail decisions. A subject of much recent debate is whether such algorithmic tools are "fair" in the sense that they do not discriminate against certain groups (e.g., races) of people. Our main result shows that for "non-trivial" computer-aided decision making, either the classifier must be discriminatory, or a rational decision-maker using the output of the classifier is forced to be discriminatory. We further provide a complete characterization of the situations in which fair computer-aided decision making is possible.
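
    One familiar way to see this tension (a toy illustration under assumed numbers, not the paper's construction): if a classifier's score distribution is identical across groups but the groups have different base rates, a rational Bayesian decision-maker ends up with group-dependent beliefs from the very same score.

```python
# Illustrative numbers, not from the paper: two groups with different
# base rates, and a group-blind classifier whose score likelihoods are
# identical across groups.
base_rate = {"A": 0.3, "B": 0.6}   # assumed P(outcome = 1 | group)
p_high_given_1 = 0.8               # P(score = high | outcome = 1), both groups
p_high_given_0 = 0.2               # P(score = high | outcome = 0), both groups

def posterior(group):
    """Rational decision-maker's belief P(outcome = 1 | score = high, group),
    obtained from Bayes' rule with the group's base rate."""
    prior = base_rate[group]
    num = p_high_given_1 * prior
    return num / (num + p_high_given_0 * (1 - prior))

for g in ("A", "B"):
    print(g, round(posterior(g), 3))
# A 0.632, B 0.857: the same "high" score yields different beliefs, so a
# decision-maker thresholding the posterior (say at 0.7) treats the two
# groups differently even though the classifier itself is group-blind.
```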

    Evaluating the performance and scalability of the Ceph distributed storage system

    Get PDF

    Private Multiplicative Weights Beyond Linear Queries

    Full text link
    A wide variety of fundamental data analyses in machine learning, such as linear and logistic regression, require minimizing a convex function defined by the data. Since the data may contain sensitive information about individuals, and these analyses can leak that sensitive information, it is important to be able to solve convex minimization in a privacy-preserving way. A series of recent results shows how to accurately solve a single convex minimization problem in a differentially private manner. However, the same data are often analyzed repeatedly, and little is known about solving multiple convex minimization problems with differential privacy. For simpler data analyses, such as linear queries, there are remarkable differentially private algorithms, such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS 2010), that accurately answer exponentially many distinct queries. In this work, we extend these results to convex minimization and show how to give accurate and differentially private solutions to *exponentially many* convex minimization problems on a sensitive dataset.
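
    For context, the core loop of the multiplicative weights mechanism for linear queries maintains a public synthetic distribution over the data domain and nudges it toward the private data whenever a noisy answer reveals a large error. A heavily simplified sketch (the noise scale, threshold, and update rate here are illustrative; a real implementation needs careful privacy accounting, e.g. via the sparse vector technique):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy domain of 8 item types; `data` is the private, normalized histogram.
data = np.array([50, 10, 5, 5, 10, 10, 5, 5], dtype=float)
data /= data.sum()

def mw_mechanism(queries, eps_per_answer=0.5, eta=0.25, rounds=20):
    """Simplified multiplicative weights update in the spirit of
    Hardt-Rothblum (FOCS 2010). Each query q is a 0/1 vector over the
    domain; its true answer is the inner product <q, data>."""
    synth = np.full_like(data, 1.0 / len(data))  # public synthetic distribution
    for _ in range(rounds):
        for q in queries:
            noisy = q @ data + rng.laplace(scale=1.0 / eps_per_answer)
            est = q @ synth
            if abs(noisy - est) > 0.05:          # update only on large error
                sign = 1.0 if noisy > est else -1.0
                synth *= np.exp(sign * eta * q)  # multiplicative update
                synth /= synth.sum()             # re-normalize
    return synth

queries = [rng.integers(0, 2, size=8).astype(float) for _ in range(5)]
synth = mw_mechanism(queries)
for q in queries:
    print(round(q @ data, 3), round(q @ synth, 3))  # true vs. synthetic answers
```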

    Marginal Release Under Local Differential Privacy

    Full text link
    Many analysis and machine learning tasks require marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of the marginals compiled under each approach, perform an empirical evaluation to confirm these bounds, and evaluate the approaches on tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.
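
    For the simplest case, a one-way marginal over a single binary attribute, the standard local-DP building block is randomized response, whose known flip probability can be inverted to debias the estimate. A minimal sketch of that baseline (not the paper's Fourier-based algorithms):

```python
import numpy as np

rng = np.random.default_rng(2)

def randomized_response(bits, eps):
    """Each user reports their bit truthfully with probability
    e^eps / (e^eps + 1) and flips it otherwise -- the basic
    eps-locally-differentially-private primitive."""
    p_true = np.exp(eps) / (np.exp(eps) + 1.0)
    keep = rng.random(bits.shape) < p_true
    return np.where(keep, bits, 1 - bits)

def estimate_marginal(reports, eps):
    """Unbiased estimate of theta = P(bit = 1): if m is the mean report
    and p the truth probability, then m = (2p - 1) * theta + (1 - p)."""
    p = np.exp(eps) / (np.exp(eps) + 1.0)
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

eps = 1.0
true_bits = (rng.random(100_000) < 0.3).astype(int)  # true marginal: 0.3
reports = randomized_response(true_bits, eps)
print(estimate_marginal(reports, eps))  # should be close to 0.3
```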