90 research outputs found
Privacy-preserving scoring of tree ensembles : a novel framework for AI in healthcare
Machine Learning (ML) techniques now impact a wide variety of domains. Highly regulated industries such as healthcare and finance have stringent compliance and data governance policies around data sharing. Advances in secure multiparty computation (SMC) for privacy-preserving machine learning (PPML) can help transform these regulated industries by allowing ML computations over encrypted data with personally identifiable information (PII). Yet very little of SMC-based PPML has been put into practice so far. In this paper we present the very first framework for privacy-preserving classification of tree ensembles with application in healthcare. We first describe the underlying cryptographic protocols that enable a healthcare organization to send encrypted data securely to a ML scoring service and obtain encrypted class labels without the scoring service actually seeing that input in the clear. We then describe the deployment challenges we solved to integrate these protocols in a cloud based scalable risk-prediction platform with multiple ML models for healthcare AI. Included are system internals, and evaluations of our deployment for supporting physicians to drive better clinical outcomes in an accurate, scalable, and provably secure manner. To the best of our knowledge, this is the first such applied framework with SMC-based privacy-preserving machine learning for healthcare
Enabling Privacy-preserving Auctions in Big Data
We study how to enable auctions in the big data context to solve many
upcoming data-based decision problems in the near future. We consider the
characteristics of the big data including, but not limited to, velocity,
volume, variety, and veracity, and we believe any auction mechanism design in
the future should take the following factors into consideration: 1) generality
(variety); 2) efficiency and scalability (velocity and volume); 3) truthfulness
and verifiability (veracity). In this paper, we propose a privacy-preserving
construction for auction mechanism design in the big data, which prevents
adversaries from learning unnecessary information except those implied in the
valid output of the auction. More specifically, we considered one of the most
general form of the auction (to deal with the variety), and greatly improved
the the efficiency and scalability by approximating the NP-hard problems and
avoiding the design based on garbled circuits (to deal with velocity and
volume), and finally prevented stakeholders from lying to each other for their
own benefit (to deal with the veracity). We achieve these by introducing a
novel privacy-preserving winner determination algorithm and a novel payment
mechanism. Additionally, we further employ a blind signature scheme as a
building block to let bidders verify the authenticity of their payment reported
by the auctioneer. The comparison with peer work shows that we improve the
asymptotic performance of peer works' overhead from the exponential growth to a
linear growth and from linear growth to a logarithmic growth, which greatly
improves the scalability
Supporting Regularized Logistic Regression Privately and Efficiently
As one of the most popular statistical and machine learning models, logistic
regression with regularization has found wide adoption in biomedicine, social
sciences, information technology, and so on. These domains often involve data
of human subjects that are contingent upon strict privacy regulations.
Increasing concerns over data privacy make it more and more difficult to
coordinate and conduct large-scale collaborative studies, which typically rely
on cross-institution data sharing and joint analysis. Our work here focuses on
safeguarding regularized logistic regression, a widely-used machine learning
model in various disciplines while at the same time has not been investigated
from a data security and privacy perspective. We consider a common use scenario
of multi-institution collaborative studies, such as in the form of research
consortia or networks as widely seen in genetics, epidemiology, social
sciences, etc. To make our privacy-enhancing solution practical, we demonstrate
a non-conventional and computationally efficient method leveraging distributing
computing and strong cryptography to provide comprehensive protection over
individual-level and summary data. Extensive empirical evaluation on several
studies validated the privacy guarantees, efficiency and scalability of our
proposal. We also discuss the practical implications of our solution for
large-scale studies and applications from various disciplines, including
genetic and biomedical studies, smart grid, network analysis, etc
A Novel Privacy-Preserved Recommender System Framework based on Federated Learning
Recommender System (RS) is currently an effective way to solve information
overload. To meet users' next click behavior, RS needs to collect users'
personal information and behavior to achieve a comprehensive and profound user
preference perception. However, these centrally collected data are
privacy-sensitive, and any leakage may cause severe problems to both users and
service providers. This paper proposed a novel privacy-preserved recommender
system framework (PPRSF), through the application of federated learning
paradigm, to enable the recommendation algorithm to be trained and carry out
inference without centrally collecting users' private data. The PPRSF not only
able to reduces the privacy leakage risk, satisfies legal and regulatory
requirements but also allows various recommendation algorithms to be applied
- …