8,203 research outputs found
Towards Differentially Private Text Representations
Most deep learning frameworks require users to pool their local data or model
updates to a trusted server to train or maintain a global model. The assumption
of a trusted server that has access to user information is ill-suited to many
applications. To tackle this problem, we develop a new deep learning framework
under an untrusted server setting, which includes three modules: (1) embedding
module, (2) randomization module, and (3) classifier module. For the
randomization module, we propose a novel local differential privacy (LDP)
protocol to reduce the impact of the privacy parameter on accuracy and to
provide enhanced flexibility in choosing randomization probabilities for LDP.
Analysis and experiments show that our framework delivers performance
comparable to, or even better than, the non-private framework and existing LDP
protocols, demonstrating the advantages of our LDP protocol.
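A minimal sketch of the kind of perturbation such an LDP randomization module could apply, assuming the embedding is binarized and each bit is perturbed with randomized response (the paper's actual protocol and its choice of randomization probabilities may differ; all names and parameters below are illustrative):

```python
import numpy as np

def ldp_randomize(bits: np.ndarray, epsilon: float, rng=None) -> np.ndarray:
    """Illustrative randomized response over a {0,1} vector: each bit is kept
    with probability p = e^eps / (e^eps + 1) and flipped otherwise, which
    satisfies epsilon-LDP per coordinate."""
    rng = rng or np.random.default_rng()
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    flip = rng.random(bits.shape) >= p_keep
    return np.where(flip, 1 - bits, bits)

# Example: binarize an embedding (sign of each coordinate), then perturb it
# locally before sending it to the untrusted server.
embedding = np.random.randn(8)
bits = (embedding > 0).astype(int)
noisy_bits = ldp_randomize(bits, epsilon=2.0)
```

In plain randomized response the keep/flip probabilities are tied directly to the privacy parameter, which is exactly the coupling the abstract says the proposed protocol relaxes.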
Auditing Data Provenance in Text-Generation Models
To help enforce data-protection regulations such as GDPR and detect
unauthorized uses of personal data, we develop a new \emph{model auditing}
technique that helps users check if their data was used to train a machine
learning model. We focus on auditing deep-learning models that generate
natural-language text, including word prediction and dialog generation. These
models are at the core of popular online services and are often trained on
personal data such as users' messages, searches, chats, and comments.
We design and evaluate a black-box auditing method that can detect, with very
few queries to a model, if a particular user's texts were used to train it
(among thousands of other users). We empirically show that our method can
successfully audit well-generalized models that are not overfitted to the
training data. We also analyze how text-generation models memorize word
sequences and explain why this memorization makes them amenable to auditing.
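As a rough illustration of how such a black-box audit could be scored, assuming the model exposes next-word probability distributions for a given context (a hypothetical interface for illustration, not the authors' exact method):

```python
import numpy as np

def audit_score(model_probs, user_texts, vocab_index):
    """Black-box audit sketch: for each (context, next_word) pair in the
    user's texts, record the rank of the true next word under the model's
    predicted distribution. Texts seen during training tend to receive
    better (lower) ranks than texts from non-members.

    model_probs(context) -> np.ndarray of next-word probabilities (assumed)
    user_texts           -> iterable of token lists
    vocab_index          -> dict mapping token -> vocabulary id
    """
    ranks = []
    for tokens in user_texts:
        for i in range(1, len(tokens)):
            probs = model_probs(tokens[:i])            # one black-box query
            target = vocab_index[tokens[i]]
            # rank of the true next token (0 = most likely)
            ranks.append(int((probs > probs[target]).sum()))
    return float(np.mean(ranks))

# A user would be flagged as a likely training member when the mean rank
# falls below a threshold calibrated on known non-member users.
```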
Deep Learning Towards Mobile Applications
Recent years have witnessed explosive growth in mobile devices, which now
permeate every aspect of our daily lives. With the increasing
usage of mobile devices and intelligent applications, there is a soaring demand
for mobile applications with machine learning services. Inspired by the
tremendous success achieved by deep learning in many machine learning tasks, it
becomes a natural trend to push deep learning towards mobile applications.
However, there exist many challenges to realizing deep learning in mobile
applications, including the conflict between the miniature nature of
mobile devices and the resource requirements of deep neural networks, the
privacy and security concerns about individuals' data, and so on. To resolve
these challenges, during the past few years, great leaps have been made in this
area. In this paper, we provide an overview of the current challenges and
representative achievements about pushing deep learning on mobile devices from
three aspects: training with mobile data, efficient inference on mobile
devices, and applications of mobile deep learning. The first two aspects cover
the primary tasks of deep learning. We then go through two of our recent
applications, which use data collected by mobile devices to infer mood
disturbance and to identify users. Finally, we conclude the paper with a
discussion of the future of this area.
An Overview of Privacy in Machine Learning
Over the past few years, providers such as Google, Microsoft, and Amazon have
started to provide customers with access to software interfaces allowing them
to easily embed machine learning tasks into their applications. Overall,
organizations can now use Machine Learning as a Service (MLaaS) engines to
outsource complex tasks, e.g., training classifiers, performing predictions,
clustering, etc. They can also let others query models trained on their data.
Naturally, this approach can also be used (and is often advocated) in other
contexts, including government collaborations, citizen science projects, and
business-to-business partnerships. However, if malicious users were able to
recover data used to train these models, the resulting information leakage
would create serious issues. Likewise, if the inner parameters of the model are
considered proprietary information, then access to the model should not allow
an adversary to learn such parameters. In this document, we set out to review
privacy challenges in this space, providing a systematic review of the relevant
research literature and exploring possible countermeasures. More
specifically, we provide ample background information on relevant concepts
around machine learning and privacy. Then, we discuss possible adversarial
models and settings, cover a wide range of attacks that relate to private
and/or sensitive information leakage, and review recent results attempting to
defend against such attacks. Finally, we conclude with a list of open problems
that require more work, including the need for better evaluations, more
targeted defenses, and the study of the relation to policy and data protection
efforts.
Learning Differentially Private Recurrent Language Models
We demonstrate that it is possible to train large recurrent language models
with user-level differential privacy guarantees with only a negligible cost in
predictive accuracy. Our work builds on recent advances in the training of deep
networks on user-partitioned data and privacy accounting for stochastic
gradient descent. In particular, we add user-level privacy protection to the
federated averaging algorithm, which makes "large step" updates from user-level
data. Our work demonstrates that given a dataset with a sufficiently large
number of users (a requirement easily met by even small internet-scale
datasets), achieving differential privacy comes at the cost of increased
computation, rather than decreased utility as in most prior work. We find
that our private LSTM language models are quantitatively and qualitatively
similar to un-noised models when trained on a large dataset.
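The core of user-level differentially private federated averaging can be sketched as follows, assuming each user's model update is clipped before the noised average is applied by the server (parameter names are illustrative; the published algorithm also covers weighted averaging and privacy accounting over rounds):

```python
import numpy as np

def dp_federated_average(user_updates, clip_norm, noise_multiplier, rng=None):
    """User-level DP aggregation sketch: clip each user's update to L2 norm
    `clip_norm`, average the clipped updates, and add Gaussian noise scaled
    to the per-user contribution clip_norm / n."""
    rng = rng or np.random.default_rng()
    n = len(user_updates)
    clipped = []
    for u in user_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / n   # noise std for the averaged update
    return avg + rng.normal(0.0, sigma, size=avg.shape)

# Each round, the server applies the noisy average to the global model;
# privacy loss across rounds is tracked with a moments/RDP accountant.
```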
Security and Privacy Issues in Deep Learning
With the development of machine learning (ML), expectations for artificial
intelligence (AI) technology have been increasing daily. In particular, deep
neural networks have shown outstanding performance in many fields. Many
applications that are deeply involved in our daily lives, such as making
significant decisions based on predictions or classifications, rely on deep
learning (DL) models. Hence, if a DL model produces mispredictions or
misclassifications due to malicious external influences, it can cause serious
difficulties in real life. Moreover, training DL models involves an enormous
amount of data, and the training data often include sensitive
information. Therefore, DL models should not expose the privacy of such data.
In this paper, we review the vulnerabilities and the defense methods that have
been developed for model security and data privacy, under the notion of secure
and private AI (SPAI). We also discuss current challenges and open issues.
Benchmarking Differential Privacy and Federated Learning for BERT Models
Natural Language Processing (NLP) techniques can be applied to help with the
diagnosis of medical conditions such as depression, using a collection of a
person's utterances. Depression is a serious medical illness that can have
adverse effects on how one feels, thinks, and acts, which can lead to emotional
and physical problems. Due to the sensitive nature of such data, privacy
measures need to be taken when handling it and when training models on it. In
this work, we study the effects that the application of Differential Privacy
(DP) has, in both a centralized and a Federated Learning (FL) setup, on
training contextualized language models (BERT, ALBERT, RoBERTa and DistilBERT).
We offer insights on how to privately train NLP models and what architectures
and setups provide more desirable privacy-utility trade-offs. We envisage this
work to be used in future healthcare and mental health studies to keep medical
history private. Therefore, we provide an open-source implementation of this
work.
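A minimal sketch of the DP-SGD update that underlies such differentially private training, written framework-agnostically (real experiments would use a DP library on top of the respective transformer implementation; names and parameters here are illustrative):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier,
                rng=None):
    """One DP-SGD step sketch: clip each example's gradient to `clip_norm`,
    sum the clipped gradients, add Gaussian noise with std
    noise_multiplier * clip_norm, and average over the batch."""
    rng = rng or np.random.default_rng()
    batch = len(per_example_grads)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / batch
```

In a federated setup the same clip-and-noise step is applied to per-client updates rather than per-example gradients, as in the user-level DP sketch above.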
Privacy-Preserving Graph Convolutional Networks for Text Classification
Graph convolutional networks (GCNs) are a powerful architecture for
representation learning on documents that naturally occur as graphs, e.g.,
citation or social networks. However, sensitive personal information, such as
documents containing people's profiles or relationships encoded as edges, is
prone to privacy leaks, as the trained model might reveal the original input. Although
differential privacy (DP) offers a well-founded privacy-preserving framework,
GCNs pose theoretical and practical challenges due to their training specifics.
We address these challenges by adapting differentially-private gradient-based
training to GCNs and conduct experiments using two optimizers on five NLP
datasets in two languages. We propose a simple yet efficient method based on
random graph splits that not only improves the baseline privacy bounds by a
factor of 2.7 while retaining competitive F1 scores, but also provides strong
privacy guarantees of epsilon = 1.0. We show that, under certain modeling
choices, privacy-preserving GCNs perform up to 90% of their non-private
variants, while formally guaranteeing strong privacy measures.
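One way to realize the random-graph-splits idea is to partition the nodes into disjoint groups and train on the induced subgraphs; a minimal sketch, assuming an adjacency-matrix representation (the paper's exact splitting and DP training procedure may differ):

```python
import numpy as np

def random_graph_splits(adjacency, num_splits, rng=None):
    """Partition nodes into `num_splits` disjoint groups and return the
    induced subgraph adjacency matrices. Training on disjoint subgraphs keeps
    each node in exactly one training example, which simplifies the
    sensitivity analysis for DP gradient-based training."""
    rng = rng or np.random.default_rng()
    n = adjacency.shape[0]
    perm = rng.permutation(n)
    groups = np.array_split(perm, num_splits)
    subgraphs = []
    for idx in groups:
        subgraphs.append((idx, adjacency[np.ix_(idx, idx)]))
    return subgraphs
```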
Towards Training Graph Neural Networks with Node-Level Differential Privacy
Graph Neural Networks (GNNs) have achieved great success in mining
graph-structured data. Despite the superior performance of GNNs in learning
graph representations, serious privacy concerns have been raised about the
trained models, which could expose sensitive information in the graphs. We
conduct the first formal study of training GNN models to ensure utility while
satisfying rigorous node-level differential privacy, which covers the private
information of both node features and edges. We adopt a training framework
that utilizes personalized PageRank to decouple the message-passing process from
feature aggregation during training GNN models and propose differentially
private PageRank algorithms to protect graph topology information formally.
Furthermore, we analyze the privacy degradation caused by the sampling process
dependent on the differentially private PageRank results during model training
and propose a differentially private GNN (DPGNN) algorithm to further protect
node features and achieve rigorous node-level differential privacy. Extensive
experiments on real-world graph datasets demonstrate the effectiveness of the
proposed algorithms for providing node-level differential privacy while
preserving good model utility.
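A rough sketch of a privacy-flavored personalized PageRank step, using simple output perturbation and top-k sparsification (illustrative only; the paper's differentially private PageRank algorithms and their formal guarantees are more involved, and the noise calibration is omitted here):

```python
import numpy as np

def noisy_personalized_pagerank(adjacency, source, alpha=0.15, iters=50,
                                noise_scale=0.01, top_k=32, rng=None):
    """Sketch: approximate the personalized PageRank vector by power
    iteration, perturb it with Laplace noise (output perturbation), and keep
    only the top-k entries as the node's neighborhood for feature
    aggregation."""
    rng = rng or np.random.default_rng()
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    transition = adjacency / deg                  # row-stochastic walk matrix
    p = np.zeros(n)
    p[source] = 1.0
    ppr = np.copy(p)
    for _ in range(iters):
        ppr = alpha * p + (1 - alpha) * transition.T @ ppr
    ppr = ppr + rng.laplace(0.0, noise_scale, size=n)   # hide individual edges
    keep = np.argsort(ppr)[-top_k:]                      # sparsify to top-k
    sparse = np.zeros(n)
    sparse[keep] = ppr[keep]
    return sparse
```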
Towards Federated Learning at Scale: System Design
Federated Learning is a distributed machine learning approach that enables
model training on a large corpus of decentralized data. We have built a
scalable production system for Federated Learning in the domain of mobile
devices, based on TensorFlow. In this paper, we describe the resulting
high-level design, sketch some of the challenges and their solutions, and touch
upon the open problems and future directions.
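For reference, one round of the underlying federated averaging procedure can be sketched as follows (a generic FedAvg round, not the production system described in the paper; the client and local-training interfaces are assumptions for illustration):

```python
import numpy as np

def federated_averaging_round(global_weights, clients, local_train,
                              fraction=0.1, rng=None):
    """One FedAvg round sketch: sample a fraction of clients, have each train
    locally starting from the global weights, then average the returned
    weights proportionally to each client's number of local examples.

    clients     -> list of opaque client handles
    local_train -> callable(client, weights) -> (new_weights, num_examples)
    """
    rng = rng or np.random.default_rng()
    k = max(1, int(fraction * len(clients)))
    selected = rng.choice(len(clients), size=k, replace=False)
    results = [local_train(clients[i], np.copy(global_weights))
               for i in selected]
    total = sum(n for _, n in results)
    return sum(w * (n / total) for w, n in results)
```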