51,556 research outputs found
NegDL: Privacy-Preserving Deep Learning Based on Negative Database
In the era of big data, deep learning has become an increasingly popular
topic. It has outstanding achievements in the fields of image recognition,
object detection, and natural language processing et al. The first priority of
deep learning is exploiting valuable information from a large amount of data,
which will inevitably induce privacy issues that are worthy of attention.
Presently, several privacy-preserving deep learning methods have been proposed,
but most of them suffer from a non-negligible degradation of either efficiency
or accuracy. Negative database (\textit{NDB}) is a new type of data
representation which can protect data privacy by storing and utilizing the
complementary form of original data. In this paper, we propose a
privacy-preserving deep learning method named NegDL based on \textit{NDB}.
Specifically, private data are first converted to \textit{NDB} as the input of
deep learning models by a generation algorithm called \textit{QK}-hidden
algorithm, and then the sketches of \textit{NDB} are extracted for training and
inference. We demonstrate that the computational complexity of NegDL is the
same as the original deep learning model without privacy protection.
Experimental results on Breast Cancer, MNIST, and CIFAR-10 benchmark datasets
demonstrate that the accuracy of NegDL could be comparable to the original deep
learning model in most cases, and it performs better than the method based on
differential privacy
Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking
Deep neural networks have become a primary tool for solving problems in many
fields. They are also used for addressing information retrieval problems and
show strong performance in several tasks. Training these models requires large,
representative datasets and for most IR tasks, such data contains sensitive
information from users. Privacy and confidentiality concerns prevent many data
owners from sharing the data, thus today the research community can only
benefit from research on large-scale datasets in a limited manner. In this
paper, we discuss privacy preserving mimic learning, i.e., using predictions
from a privacy preserving trained model instead of labels from the original
sensitive training data as a supervision signal. We present the results of
preliminary experiments in which we apply the idea of mimic learning and
privacy preserving mimic learning for the task of document re-ranking as one of
the core IR tasks. This research is a step toward laying the ground for
enabling researchers from data-rich environments to share knowledge learned
from actual users' data, which should facilitate research collaborations.Comment: SIGIR 2017 Workshop on Neural Information Retrieval
(Neu-IR'17)}{}{August 7--11, 2017, Shinjuku, Tokyo, Japa
Privacy-preserving scoring of tree ensembles : a novel framework for AI in healthcare
Machine Learning (ML) techniques now impact a wide variety of domains. Highly regulated industries such as healthcare and finance have stringent compliance and data governance policies around data sharing. Advances in secure multiparty computation (SMC) for privacy-preserving machine learning (PPML) can help transform these regulated industries by allowing ML computations over encrypted data with personally identifiable information (PII). Yet very little of SMC-based PPML has been put into practice so far. In this paper we present the very first framework for privacy-preserving classification of tree ensembles with application in healthcare. We first describe the underlying cryptographic protocols that enable a healthcare organization to send encrypted data securely to a ML scoring service and obtain encrypted class labels without the scoring service actually seeing that input in the clear. We then describe the deployment challenges we solved to integrate these protocols in a cloud based scalable risk-prediction platform with multiple ML models for healthcare AI. Included are system internals, and evaluations of our deployment for supporting physicians to drive better clinical outcomes in an accurate, scalable, and provably secure manner. To the best of our knowledge, this is the first such applied framework with SMC-based privacy-preserving machine learning for healthcare
Enabling Privacy-preserving Auctions in Big Data
We study how to enable auctions in the big data context to solve many
upcoming data-based decision problems in the near future. We consider the
characteristics of the big data including, but not limited to, velocity,
volume, variety, and veracity, and we believe any auction mechanism design in
the future should take the following factors into consideration: 1) generality
(variety); 2) efficiency and scalability (velocity and volume); 3) truthfulness
and verifiability (veracity). In this paper, we propose a privacy-preserving
construction for auction mechanism design in the big data, which prevents
adversaries from learning unnecessary information except those implied in the
valid output of the auction. More specifically, we considered one of the most
general form of the auction (to deal with the variety), and greatly improved
the the efficiency and scalability by approximating the NP-hard problems and
avoiding the design based on garbled circuits (to deal with velocity and
volume), and finally prevented stakeholders from lying to each other for their
own benefit (to deal with the veracity). We achieve these by introducing a
novel privacy-preserving winner determination algorithm and a novel payment
mechanism. Additionally, we further employ a blind signature scheme as a
building block to let bidders verify the authenticity of their payment reported
by the auctioneer. The comparison with peer work shows that we improve the
asymptotic performance of peer works' overhead from the exponential growth to a
linear growth and from linear growth to a logarithmic growth, which greatly
improves the scalability
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data
- …