Learning Mixtures of Gaussians in High Dimensions
Efficiently learning mixtures of Gaussians is a fundamental problem in
statistics and learning theory. Given samples, each drawn from one of k
Gaussian distributions in R^n chosen at random, the learning problem asks to estimate the means
and the covariance matrices of these Gaussians. This learning problem arises in
many areas ranging from the natural sciences to the social sciences, and has
also found many machine learning applications. Unfortunately, learning mixtures
of Gaussians is an information-theoretically hard problem: in order to learn
the parameters up to a reasonable accuracy, the number of samples required is
exponential in the number of Gaussian components in the worst case. In this
work, we show that provided we are in high enough dimensions, the class of
Gaussian mixtures is learnable in its most general form under a smoothed
analysis framework, where the parameters are randomly perturbed from an
adversarial starting point. In particular, given samples from a mixture of
Gaussians with randomly perturbed parameters, when n > \Omega(k^2), we give
an algorithm that learns the parameters in polynomial running time using
a polynomial number of samples. The central algorithmic ideas consist of new ways
to decompose the moment tensor of the Gaussian mixture by exploiting its
structural properties. The symmetries of this tensor are derived from the
combinatorial structure of higher order moments of Gaussian distributions
(sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop
new tools for bounding smallest singular values of structured random matrices,
which could be useful in other smoothed analysis settings.
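For readers unfamiliar with the moment structure mentioned above, the fourth-order case of Isserlis' theorem is a useful reference point. The statement below is the standard identity for a zero-mean jointly Gaussian vector with covariance entries \Sigma_{ij}; the notation is an assumption made here for illustration and is not taken from the paper.

```latex
% Isserlis' (Wick's) theorem, fourth-order case, for zero-mean jointly
% Gaussian (x_1, x_2, x_3, x_4) with \Sigma_{ij} = \mathbb{E}[x_i x_j]:
\mathbb{E}[x_1 x_2 x_3 x_4]
  = \Sigma_{12}\,\Sigma_{34}
  + \Sigma_{13}\,\Sigma_{24}
  + \Sigma_{14}\,\Sigma_{23}
```

Each term corresponds to one of the three ways of pairing up the four indices, which is the kind of combinatorial symmetry the abstract refers to.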
Exocrine Pancreatic Insufficiency in Diabetes Mellitus: A Complication of Diabetic Neuropathy or a Different Type of Diabetes?
Pancreatic exocrine insufficiency is a frequently observed phenomenon in type 1 and type 2 diabetes mellitus. Alterations of exocrine pancreatic morphology can also be found frequently in diabetic patients. Several hypotheses try to explain these findings, including lack of insulin as a trophic factor for exocrine tissue, changes in secretion and/or action of other islet hormones, and autoimmunity against common endocrine and exocrine antigens. Another explanation might be that diabetes mellitus could also be a consequence of underlying pancreatic diseases (e.g., chronic pancreatitis). Another pathophysiological concept proposes the functional and morphological alterations as a consequence of diabetic neuropathy. This paper discusses the currently available studies on this subject and tries to provide an overview of the current concepts of exocrine pancreatic insufficiency in diabetes mellitus.
Paradoxes in Fair Computer-Aided Decision Making
Computer-aided decision making--where a human decision-maker is aided by a
computational classifier in making a decision--is becoming increasingly
prevalent. For instance, judges in at least nine states make use of algorithmic
tools meant to determine "recidivism risk scores" for criminal defendants in
sentencing, parole, or bail decisions. A subject of much recent debate is
whether such algorithmic tools are "fair" in the sense that they do not
discriminate against certain groups (e.g., races) of people.
Our main result shows that for "non-trivial" computer-aided decision making,
either the classifier must be discriminatory, or a rational decision-maker
using the output of the classifier is forced to be discriminatory. We further
provide a complete characterization of situations where fair computer-aided
decision making is possible.
Private Multiplicative Weights Beyond Linear Queries
A wide variety of fundamental data analyses in machine learning, such as
linear and logistic regression, require minimizing a convex function defined by
the data. Since the data may contain sensitive information about individuals,
and these analyses can leak that sensitive information, it is important to be
able to solve convex minimization in a privacy-preserving way.
A series of recent results show how to accurately solve a single convex
minimization problem in a differentially private manner. However, the same data
is often analyzed repeatedly, and little is known about solving multiple convex
minimization problems with differential privacy. For simpler data analyses,
such as linear queries, there are remarkable differentially private algorithms
such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS
2010) that accurately answer exponentially many distinct queries. In this work,
we extend these results to the case of convex minimization and show how to give
accurate and differentially private solutions to *exponentially many* convex
minimization problems on a sensitive dataset.
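As background for the linear-query setting mentioned above, the following is a minimal, non-private sketch of the multiplicative weights update that the private mechanism builds on. It is not the algorithm from this paper or from Hardt and Rothblum: no noise is added, the worst query is selected exactly, and the function name, step size `eta`, and round count are assumptions chosen for illustration.

```python
import numpy as np

def multiplicative_weights(true_hist, queries, eta=0.5, rounds=500):
    """Fit a synthetic distribution to linear query answers (non-private sketch).

    true_hist: normalized histogram of the data (sums to 1).
    queries:   0/1 matrix, one linear (counting) query per row.
    """
    n = len(true_hist)
    synth = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(rounds):
        # Signed error of the synthetic answers on every query.
        errors = queries @ true_hist - queries @ synth
        worst = int(np.argmax(np.abs(errors)))
        # Raise the weight of elements the worst query under-counts,
        # lower it where the query over-counts, then renormalize.
        synth = synth * np.exp(eta * errors[worst] * queries[worst])
        synth /= synth.sum()
    return synth

true_hist = np.array([0.7, 0.1, 0.1, 0.1])
queries = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
synth = multiplicative_weights(true_hist, queries)
max_error = float(np.max(np.abs(queries @ true_hist - queries @ synth)))
```

A differentially private variant would, among other changes, add calibrated noise when selecting and answering queries; that part is deliberately omitted here.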
Marginal Release Under Local Differential Privacy
Many analysis and machine learning tasks require the availability of marginal
statistics on multidimensional datasets while providing strong privacy
guarantees for the data subjects. Applications for these statistics range from
finding correlations in the data to fitting sophisticated prediction models. In
this paper, we provide a set of algorithms for materializing marginal
statistics under the strong model of local differential privacy. We prove the
first tight theoretical bounds on the accuracy of marginals compiled under each
approach, perform empirical evaluation to confirm these bounds, and evaluate
them for tasks such as modeling and correlation testing. Our results show that
releasing information based on (local) Fourier transformations of the input is
preferable to alternatives based directly on (local) marginals.
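To make the local model concrete, here is a minimal sketch of classic randomized response for a single binary attribute, with the standard debiasing step to recover a one-way marginal. This is a textbook baseline and an assumption made for illustration; it is not one of the algorithms proposed in the paper, and the function names are hypothetical.

```python
import math
import random

def randomize_bit(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep else 1 - bit

def estimate_marginal(reports, epsilon):
    """Debias the observed fraction of 1s into an estimate of the true fraction."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - keep) + f * (2 * keep - 1), solved here for f.
    return (observed - (1.0 - keep)) / (2.0 * keep - 1.0)

rng = random.Random(0)
bits = [1] * 3000 + [0] * 7000  # true fraction of 1s is 0.3
reports = [randomize_bit(b, 1.0, rng) for b in bits]
estimate = estimate_marginal(reports, 1.0)
```

Each user perturbs their own bit before sending it, so the aggregator never sees raw data; the estimate is unbiased, and its variance grows as epsilon shrinks.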