Differentially private regression and classification with sparse Gaussian processes
A continuing challenge for machine learning is providing methods that compute on data while ensuring the data remains private. In this paper we build on the provable guarantees of differential privacy, which has previously been combined with Gaussian processes through the published \emph{cloaking method}. We address several shortcomings of that method, starting with the problem of predictions in regions of low data density. We experiment with the use of inducing points to provide a sparse approximation and show that these can deliver robust differential privacy in outlier areas and at higher dimensions. Turning to classification, we modify the Laplace approximation approach to provide differentially private predictions, then combine it with the sparse approximation and demonstrate the capability to perform classification in high dimensions. Finally, we explore the issue of hyperparameter selection and develop a method for selecting hyperparameters privately. This paper and its associated libraries provide a robust toolkit for combining differential privacy and GPs in a practical manner.
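The combination described above can be sketched in a few lines: an inducing-point (Nystrom-style) approximation gives the sparse GP predictive mean, and Gaussian-mechanism noise is added to the prediction. This is a simplified illustration of output perturbation, not the paper's exact cloaking mechanism; all parameter names (`sensitivity`, `epsilon`, `delta`) are assumptions for the sketch.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sparse_gp_dp_predict(X, y, Z, Xstar, noise=0.1, epsilon=1.0, delta=1e-5,
                         sensitivity=1.0, rng=None):
    """Sparse GP regression with inducing inputs Z, plus Gaussian-mechanism
    noise on the predictive mean. Illustrative only: the paper's cloaking
    method calibrates noise per-prediction rather than with a fixed
    sensitivity bound as done here."""
    rng = np.random.default_rng(rng)
    Kzz = rbf(Z, Z) + 1e-6 * np.eye(len(Z))   # jitter for stability
    Kxz = rbf(X, Z)
    Ksz = rbf(Xstar, Z)
    # Projected-process predictive mean:
    # m(x*) = K*z (Kzz + Kzx Kxz / s^2)^-1 Kzx y / s^2
    A = Kzz + Kxz.T @ Kxz / noise ** 2
    mean = Ksz @ np.linalg.solve(A, Kxz.T @ y) / noise ** 2
    # Gaussian mechanism scale for (epsilon, delta)-DP on the output
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return mean + rng.normal(0.0, sigma, size=mean.shape)
```

The inducing points keep the linear solve at the size of Z rather than the full data set, which is what makes the approach viable at higher dimensions.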
Generation of Differentially Private Heterogeneous Electronic Health Records
Electronic Health Records (EHRs) are commonly used by the machine learning
community for research on problems specifically related to health care and
medicine. EHRs have the advantage that they can be easily distributed and
contain many features useful for, e.g., classification problems. What makes EHR
data sets different from typical machine learning data sets is that they are
often very sparse, due to their high dimensionality, and often contain
heterogeneous (mixed) data types. Furthermore, the data sets deal with
sensitive information, which limits the distribution of any models learned
using them, due to privacy concerns. For these reasons, using EHR data in
practice presents a real challenge. In this work, we explore using Generative
Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of
using these synthetic records in place of existing data sets for downstream
classification tasks. We further explore applying differentially private (DP)
optimization in order to produce DP synthetic EHR data sets,
which provide rigorous privacy guarantees, and are therefore shareable and
usable in the real world. The performance (measured by AUROC, AUPRC and
accuracy) of our model's synthetic, heterogeneous data is very close to the
original data set (within 3-5% of the baseline) for the non-DP model when
tested in a binary classification task. Using strong DP, our
model still produces data useful for machine learning tasks, albeit incurring a
roughly 17% performance penalty in our tested classification task. We
additionally perform a sub-population analysis and find that, in terms of
classification performance, our model does not introduce any bias into the
synthetic EHR data relative to the baseline, in either the male/female
populations or the 0-18, 19-50 and 51+ age groups, for either the non-DP or DP
variant.
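The DP optimization the abstract refers to is typically the DP-SGD recipe: clip each example's gradient, average, and add Gaussian noise before the weight update. A minimal sketch of that sanitization step is below; the parameter names (`clip_norm`, `noise_multiplier`) follow common DP-SGD usage and are not taken from the paper.

```python
import numpy as np

def dp_sanitize_gradients(per_example_grads, clip_norm=1.0,
                          noise_multiplier=1.1, rng=None):
    """One DP-SGD sanitization step: clip each per-example gradient to
    clip_norm, average, then add Gaussian noise scaled to the clip bound.
    Illustrative sketch, not the paper's exact training configuration."""
    rng = np.random.default_rng(rng)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down (never up) so each example's contribution is bounded.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean.shape)
    return mean + noise
```

Because each example's influence on the update is bounded by `clip_norm`, the added Gaussian noise yields a per-step privacy guarantee that composes over training, which is what makes the resulting synthetic data shareable.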
On Lightweight Privacy-Preserving Collaborative Learning for IoT Objects
The Internet of Things (IoT) will be a main data generation infrastructure
for achieving better system intelligence. This paper considers the design and
implementation of a practical privacy-preserving collaborative learning scheme,
in which a curious learning coordinator trains a better machine learning model
based on the data samples contributed by a number of IoT objects, while the
confidentiality of the raw forms of the training data is protected against the
coordinator. Existing distributed machine learning and data encryption
approaches incur significant computation and communication overhead, rendering
them ill-suited for resource-constrained IoT objects. We study an approach that
applies independent Gaussian random projection at each IoT object to obfuscate
data and trains a deep neural network at the coordinator based on the projected
data from the IoT objects. This approach introduces light computation overhead
to the IoT objects and moves most workload to the coordinator that can have
sufficient computing resources. Although the independent projections performed
by the IoT objects address the potential collusion between the curious
coordinator and some compromised IoT objects, they significantly increase the
complexity of the projected data. In this paper, we leverage the superior
learning capability of deep learning in capturing sophisticated patterns to
maintain good learning performance. Extensive comparative evaluation shows that
this approach outperforms other lightweight approaches that apply additive
noisification for differential privacy and/or support vector machines for
learning in applications with light data pattern complexities.
Comment: 12 pages, IOTDI 201
Distributed Private Online Learning for Social Big Data Computing over Data Center Networks
With the rapid growth of Internet technologies, cloud computing and social
networks have become ubiquitous. An increasing number of people participate in
social networks, and massive amounts of online social data are obtained. To
exploit knowledge from these copious data and predict users' social behavior,
there is an urgent need to realize data mining in social networks. Almost all
online websites use cloud services to effectively process the large volumes of
social data gathered from distributed data centers. Because these data are
large-scale, high-dimensional and widely distributed, we propose a distributed
sparse online algorithm to handle them. Additionally, privacy protection is an
important concern in social networks: we should not compromise the privacy of
individuals while their social data are being mined. Thus we also consider the
privacy problem in this article. Our simulations show that an appropriate level
of data sparsity enhances the performance of our algorithm, and that the
privacy-preserving method does not significantly hurt its performance.
Comment: ICC201
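One common way to combine the three ingredients above (online learning, sparsity, and privacy) is a noisy gradient step followed by soft-thresholding. The sketch below is an illustration of that general pattern under squared loss with Laplace noise, not the paper's specific algorithm; every parameter name is an assumption.

```python
import numpy as np

def private_sparse_online_step(w, x, y, lr=0.1, l1=0.01, epsilon=1.0, rng=None):
    """One round of a private sparse online update (illustrative pattern,
    not the paper's exact method): take a gradient step on squared loss
    with Laplace noise added for privacy, then soft-threshold the weights
    to induce sparsity."""
    rng = np.random.default_rng(rng)
    grad = (w @ x - y) * x                                  # squared-loss gradient
    grad = grad + rng.laplace(0.0, 1.0 / epsilon, size=grad.shape)  # privacy noise
    w = w - lr * grad
    # Soft-thresholding: shrink toward zero, zeroing out small coordinates.
    return np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
```

Larger `epsilon` means less noise (weaker privacy, better accuracy), matching the abstract's observation that the privacy mechanism costs little when the noise is modest and the data are suitably sparse.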