Universal Private Estimators
We present \textit{universal} estimators for the statistical mean, variance,
and scale (in particular, the interquartile range) under pure differential
privacy. These estimators are universal in the sense that they work on an
arbitrary, unknown continuous distribution $P$ over $\mathbb{R}$,
while yielding strong utility guarantees except for ill-behaved $P$.
For certain distribution families like Gaussians or heavy-tailed distributions,
we show that our universal estimators match or improve upon existing
estimators, which are often designed specifically for the given family and
under \textit{a priori} boundedness assumptions on the mean and variance of $P$. This
is the first time these boundedness assumptions are removed under pure
differential privacy. The main technical tools in our development are
instance-optimal empirical estimators for the mean and quantiles over the
unbounded integer domain, which may be of independent interest.
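For context, here is a minimal sketch of the classical bounded-range baseline under pure differential privacy: a clipped mean released through the Laplace mechanism. The a priori bounds [lower, upper] are exactly the kind of assumption the universal estimators above remove; the function name and parameters are illustrative, and this is not the paper's algorithm.

```python
import numpy as np

def laplace_mean(data, epsilon, lower, upper, rng=None):
    """Pure eps-DP mean estimate via the Laplace mechanism.

    Illustrative baseline only: it *requires* a priori bounds
    [lower, upper] on the data, the assumption the paper removes.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(data, dtype=float), lower, upper)
    n = len(x)
    # One record can move the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    noise = rng.laplace(scale=sensitivity / epsilon)
    return x.mean() + noise

# Example: private mean of Gaussian samples, assuming bounds [-10, 10].
samples = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=10_000)
print(laplace_mean(samples, epsilon=1.0, lower=-10.0, upper=10.0))
```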
Randomized Algorithms for Computation of Tucker decomposition and Higher Order SVD (HOSVD)
Big data analysis has become a crucial part of new emerging technologies such
as the internet of things, cyber-physical analysis, deep learning, anomaly
detection, etc. Among many other techniques, dimensionality reduction plays a
key role in such analyses and facilitates feature selection and feature
extraction. Randomized algorithms are efficient tools for handling big data
tensors. They accelerate decomposing large-scale data tensors by reducing the
computational complexity of deterministic algorithms and the communication
among different levels of the memory hierarchy, which is the main bottleneck in
modern computing environments and architectures. In this paper, we review
recent advances in randomization for the computation of Tucker decomposition
and Higher Order SVD (HOSVD). We discuss random projection and sampling
approaches, as well as single-pass and multi-pass randomized algorithms, and how to
utilize them in the computation of the Tucker decomposition and the HOSVD.
Simulations on synthetic and real datasets are provided to compare the
performance of some of the best and most promising algorithms.
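As a concrete illustration of the random-projection approach discussed above, here is a minimal randomized HOSVD sketch that applies the randomized range finder to each mode unfolding. The oversampling parameter, truncation choice, and helper names are generic assumptions, not a specific algorithm from the review.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Mode-n product T x_n M, where M has shape (k, T.shape[mode])."""
    out = np.tensordot(M, T, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def randomized_hosvd(T, ranks, oversample=10, seed=0):
    """Randomized HOSVD via the randomized range finder per mode (a sketch)."""
    rng = np.random.default_rng(seed)
    factors = []
    for mode, r in enumerate(ranks):
        A = unfold(T, mode)                   # I_n x (prod of other dims)
        Omega = rng.standard_normal((A.shape[1], r + oversample))
        Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for the range
        factors.append(Q[:, :r])              # truncated factor matrix
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)  # project onto each factor
    return core, factors

# Example: compress a random 30 x 40 x 50 tensor to multilinear rank (5, 5, 5).
T = np.random.default_rng(1).standard_normal((30, 40, 50))
core, factors = randomized_hosvd(T, ranks=(5, 5, 5))
approx = core
for mode, U in enumerate(factors):
    approx = mode_product(approx, U, mode)
print("relative error:", np.linalg.norm(T - approx) / np.linalg.norm(T))
```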
Compressive Learning with Privacy Guarantees
This work addresses the problem of learning from large collections of data with privacy guarantees. The compressive learning framework proposes to deal with the large scale of datasets by compressing them into a single vector of generalized random moments, from which the learning task is then performed. We show that a simple perturbation of this mechanism with additive noise is sufficient to satisfy differential privacy, a well-established formalism for defining and quantifying the privacy of a random mechanism. We combine this with a feature subsampling mechanism, which reduces the computational cost without damaging privacy. The framework is applied to the tasks of Gaussian modeling, k-means clustering, and principal component analysis (PCA), for which sharp privacy bounds are derived. Empirically, the quality (for subsequent learning) of the compressed representation produced by our mechanism is strongly related to the induced noise level, for which we give analytical expressions.
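A toy illustration of the noisy-sketch idea: compute a vector of generalized random moments (here, random Fourier features averaged over the dataset) and release it with additive Laplace noise. The feature map, scaling, and noise calibration below are simplified assumptions, not the paper's exact mechanism or its sharp bounds.

```python
import numpy as np

def private_sketch(X, m, epsilon, seed=0):
    """Differentially private compressive sketch (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((d, m))     # random frequencies (placeholder draw)
    Z = np.exp(1j * (X @ W))            # per-sample features, modulus 1
    sketch = Z.mean(axis=0)             # vector of generalized random moments
    # One record changes each averaged feature by at most 2/n in modulus,
    # giving an L1 sensitivity of about 2*sqrt(2)*m/n over real and
    # imaginary parts (assumed calibration, not the paper's bound).
    scale = 2.0 * np.sqrt(2.0) * m / (n * epsilon)
    noise = rng.laplace(scale=scale, size=m) + 1j * rng.laplace(scale=scale, size=m)
    return sketch + noise

# Example: sketch 10k points in R^5 into m = 256 noisy moments.
X = np.random.default_rng(2).normal(size=(10_000, 5))
s = private_sketch(X, m=256, epsilon=1.0)
print(s.shape, s.dtype)
```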