1,613 research outputs found
Distributed Private Heavy Hitters
In this paper, we give efficient algorithms and lower bounds for solving the
heavy hitters problem while preserving differential privacy in the fully
distributed local model. In this model, there are n parties, each of which
possesses a single element from a universe of size N. The heavy hitters problem
is to find the identity of the most common element shared amongst the n
parties. In the local model, there is no trusted database administrator, and so
the algorithm must interact with each of the parties separately, using a
differentially private protocol. We give tight information-theoretic upper and
lower bounds on the accuracy to which this problem can be solved in the local
model (giving a separation between the local model and the more common
centralized model of privacy), as well as computationally efficient algorithms
even in the case where the data universe N may be exponentially large
Compressed Regression
Recent research has studied the role of sparsity in high dimensional
regression and signal reconstruction, establishing theoretical limits for
recovering sparse models from sparse data. This line of work shows that
-regularized least squares regression can accurately estimate a sparse
linear model from noisy examples in dimensions, even if is much
larger than . In this paper we study a variant of this problem where the
original input variables are compressed by a random linear transformation
to examples in dimensions, and establish conditions under which a
sparse linear model can be successfully recovered from the compressed data. A
primary motivation for this compression procedure is to anonymize the data and
preserve privacy by revealing little information about the original data. We
characterize the number of random projections that are required for
-regularized compressed regression to identify the nonzero coefficients
in the true model with probability approaching one, a property called
``sparsistence.'' In addition, we show that -regularized compressed
regression asymptotically predicts as well as an oracle linear model, a
property called ``persistence.'' Finally, we characterize the privacy
properties of the compression procedure in information-theoretic terms,
establishing upper bounds on the mutual information between the compressed and
uncompressed data that decay to zero.Comment: 59 pages, 5 figure, Submitted for revie
Random projection to preserve patient privacy
With the availability of accessible and widely used cloud services, it is natural that large components of healthcare systems migrate to them; for example, patient databases can be stored and processed in the cloud. Such cloud services provide enhanced flexibility and additional gains, such as availability, ease of data share, and so on. This trend poses serious threats regarding the privacy of the patients and the trust that an individual must put into the healthcare system itself. Thus, there is a strong need of privacy preservation, achieved through a variety of different approaches. In this paper, we study the application of a random projection-based approach to patient data as a means to achieve two goals: (1) provably mask the identity of users under some adversarial-attack settings, (2) preserve enough information to allow for aggregate data analysis and application of machine-learning techniques. As far as we know, such approaches have not been applied and tested on medical data. We analyze the tradeoff between the loss of accuracy on the outcome of machine-learning algorithms and the resilience against an adversary. We show that random projections proved to be strong against known input/output attacks while offering high quality data, as long as the projected space is smaller than the original space, and as long as the amount of leaked data available to the adversary is limited
Randomized Sketches of Convex Programs with Sharp Guarantees
Random projection (RP) is a classical technique for reducing storage and
computational costs. We analyze RP-based approximations of convex programs, in
which the original optimization problem is approximated by the solution of a
lower-dimensional problem. Such dimensionality reduction is essential in
computation-limited settings, since the complexity of general convex
programming can be quite high (e.g., cubic for quadratic programs, and
substantially higher for semidefinite programs). In addition to computational
savings, random projection is also useful for reducing memory usage, and has
useful properties for privacy-sensitive optimization. We prove that the
approximation ratio of this procedure can be bounded in terms of the geometry
of constraint set. For a broad class of random projections, including those
based on various sub-Gaussian distributions as well as randomized Hadamard and
Fourier transforms, the data matrix defining the cost function can be projected
down to the statistical dimension of the tangent cone of the constraints at the
original solution, which is often substantially smaller than the original
dimension. We illustrate consequences of our theory for various cases,
including unconstrained and -constrained least squares, support vector
machines, low-rank matrix estimation, and discuss implications on
privacy-sensitive optimization and some connections with de-noising and
compressed sensing
On Lightweight Privacy-Preserving Collaborative Learning for IoT Objects
The Internet of Things (IoT) will be a main data generation infrastructure
for achieving better system intelligence. This paper considers the design and
implementation of a practical privacy-preserving collaborative learning scheme,
in which a curious learning coordinator trains a better machine learning model
based on the data samples contributed by a number of IoT objects, while the
confidentiality of the raw forms of the training data is protected against the
coordinator. Existing distributed machine learning and data encryption
approaches incur significant computation and communication overhead, rendering
them ill-suited for resource-constrained IoT objects. We study an approach that
applies independent Gaussian random projection at each IoT object to obfuscate
data and trains a deep neural network at the coordinator based on the projected
data from the IoT objects. This approach introduces light computation overhead
to the IoT objects and moves most workload to the coordinator that can have
sufficient computing resources. Although the independent projections performed
by the IoT objects address the potential collusion between the curious
coordinator and some compromised IoT objects, they significantly increase the
complexity of the projected data. In this paper, we leverage the superior
learning capability of deep learning in capturing sophisticated patterns to
maintain good learning performance. Extensive comparative evaluation shows that
this approach outperforms other lightweight approaches that apply additive
noisification for differential privacy and/or support vector machines for
learning in the applications with light data pattern complexities.Comment: 12 pages,IOTDI 201
- …