Privacy-preserving scoring of tree ensembles: a novel framework for AI in healthcare
Machine Learning (ML) techniques now impact a wide variety of domains. Highly regulated industries such as healthcare and finance have stringent compliance and data governance policies around data sharing. Advances in secure multiparty computation (SMC) for privacy-preserving machine learning (PPML) can help transform these regulated industries by allowing ML computations over encrypted data containing personally identifiable information (PII). Yet very little SMC-based PPML has been put into practice so far. In this paper we present the first framework for privacy-preserving classification of tree ensembles with application in healthcare. We first describe the underlying cryptographic protocols that enable a healthcare organization to send encrypted data securely to an ML scoring service and obtain encrypted class labels without the scoring service ever seeing that input in the clear. We then describe the deployment challenges we solved to integrate these protocols into a cloud-based, scalable risk-prediction platform with multiple ML models for healthcare AI. We include system internals and evaluations of our deployment, which supports physicians in driving better clinical outcomes in an accurate, scalable, and provably secure manner. To the best of our knowledge, this is the first such applied framework with SMC-based privacy-preserving machine learning for healthcare.
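The abstract does not spell out the cryptographic protocols, but SMC frameworks of this kind are typically built from primitives such as additive secret sharing, where a value is split into random shares that individually reveal nothing. The sketch below is an illustrative building block only; the modulus, share count, and function names are assumptions, not the paper's actual protocol:

```python
import random

MOD = 2 ** 32  # shares live in the ring of integers modulo MOD

def share(x, n=2):
    # Split integer x into n additive shares; any n-1 shares are
    # uniformly random and reveal nothing about x on their own.
    shares = [random.randrange(MOD) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    # Only the sum of all shares recovers the secret.
    return sum(shares) % MOD

def add_shares(a_shares, b_shares):
    # Secure addition: each party adds its own shares locally,
    # with no communication and no plaintext exposure.
    return [(a + b) % MOD for a, b in zip(a_shares, b_shares)]
```

Scoring a tree ensemble additionally requires secure comparison protocols on shared values, which are substantially more involved than the addition shown here.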
Privacy-preserving machine learning for healthcare: open challenges and future perspectives
Machine Learning (ML) has recently shown tremendous success in modeling
various healthcare prediction tasks, ranging from disease diagnosis and
prognosis to patient treatment. Due to the sensitive nature of medical data,
privacy must be considered along the entire ML pipeline, from model training to
inference. In this paper, we conduct a review of recent literature concerning
Privacy-Preserving Machine Learning (PPML) for healthcare. We primarily focus
on privacy-preserving training and inference-as-a-service, and perform a
comprehensive review of existing trends, identify challenges, and discuss
opportunities for future research directions. The aim of this review is to
guide the development of private and efficient ML models in healthcare, with
the prospects of translating research efforts into real-world settings.
Comment: ICLR 2023 Workshop on Trustworthy Machine Learning for Healthcare
(TML4H)
Location Privacy in the Era of Big Data and Machine Learning
Location data of individuals is one of the most sensitive sources of information; once revealed to ill-intended individuals or service providers, it can cause severe privacy concerns. In this thesis, we aim at preserving the privacy of users in telecommunication networks against untrusted service providers, as well as improving their privacy in the publication of location datasets. For improving the location privacy of users in telecommunication networks, we consider the movement of users along trajectories and investigate the threats that the query history may pose to location privacy. We develop an attack model based on the Viterbi algorithm, termed the Viterbi attack, which represents a realistic privacy threat in trajectories. Next, we propose a metric called transition entropy that helps evaluate the performance of dummy generation algorithms, followed by a robust dummy generation algorithm that can defend users against the Viterbi attack. We compare and evaluate our proposed algorithm and metric on a publicly available dataset published by Microsoft, the Geolife dataset. For privacy-preserving data publishing, we propose an enhanced framework for anonymization of spatio-temporal trajectory datasets, termed machine learning based anonymization (MLA). The framework consists of a robust alignment technique and a machine learning approach for clustering datasets. The framework and all the proposed algorithms are applied to the Geolife dataset, which includes GPS logs of over 180 users in Beijing, China.
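The Viterbi attack described above builds on the standard Viterbi algorithm for recovering the most likely hidden state sequence from noisy observations. A generic sketch follows; the toy cells, queries, and probabilities are illustrative assumptions, not values from the thesis:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = (best probability of any path ending in state s at
    # time t, predecessor state on that path).
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the most likely final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Toy setting: two location cells observed through noisy queries.
states = ("cell_A", "cell_B")
start_p = {"cell_A": 0.6, "cell_B": 0.4}
trans_p = {"cell_A": {"cell_A": 0.7, "cell_B": 0.3},
           "cell_B": {"cell_A": 0.4, "cell_B": 0.6}}
emit_p = {"cell_A": {"q1": 0.5, "q2": 0.4, "q3": 0.1},
          "cell_B": {"q1": 0.1, "q2": 0.3, "q3": 0.6}}
```

In the attack setting, the adversary plays the role of this decoder: real and dummy trajectories are the hidden states, and the query history supplies the observations, which is why dummy generation must keep the decoded path ambiguous.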
APPFLx: Providing Privacy-Preserving Cross-Silo Federated Learning as a Service
Cross-silo privacy-preserving federated learning (PPFL) is a powerful tool to
collaboratively train robust and generalized machine learning (ML) models
without sharing sensitive (e.g., healthcare or financial) local data. To ease
and accelerate the adoption of PPFL, we introduce APPFLx, a ready-to-use
platform that provides privacy-preserving cross-silo federated learning as a
service. APPFLx employs Globus authentication to allow users to easily and
securely invite trustworthy collaborators for PPFL, implements several
synchronous and asynchronous FL algorithms, streamlines the FL experiment
launch process, and enables tracking and visualizing the life cycle of FL
experiments, allowing domain experts and ML practitioners to easily orchestrate
and evaluate cross-silo FL under one platform. APPFLx is available online at
https://appflx.lin
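The synchronous FL algorithms mentioned above typically aggregate client updates by weighted averaging, as in federated averaging. The sketch below shows only that core aggregation step; the function name and plain-list parameter vectors are simplifications, not APPFLx's actual API:

```python
def fedavg(client_weights, client_sizes):
    # Weighted average of per-client parameter vectors, where each
    # client's contribution is proportional to its local dataset size.
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

In a cross-silo deployment, each silo trains locally and only these parameter vectors, never the raw records, reach the aggregation server.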
A Privacy-Preserving Outsourced Data Model in Cloud Environment
Nowadays, more and more machine learning services, such as medical
diagnosis, online fraud detection, and email spam filtering, are provided by
cloud computing. The cloud service provider collects data from the various
owners to train machine learning models or perform classification in the
cloud environment. However, multiple data owners may not fully trust a
cloud platform operated by a third party. Therefore, data security and privacy
problems are among the critical hindrances to using machine learning tools,
particularly with multiple data owners. In addition, unauthorized entities can
observe the statistical input data and infer the machine learning model
parameters. Therefore, a privacy-preserving model is proposed that protects
the privacy of the data without compromising machine learning efficiency. To
protect the data owners' data, epsilon-differential privacy is used, and fog
nodes are employed to address the lower bandwidth and latency of the cloud in
this proposed scheme. The noise is produced by the epsilon-differential
privacy mechanism and is then added to the data. Moreover, the noise is
injected at the data owner's site to protect the owners' data. Fog nodes
collect the noise-added data from the data owners and then transfer it to
the cloud platform for storage, computation, and classification tasks.
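The noise-injection step described above corresponds to the standard Laplace mechanism for epsilon-differential privacy, where noise scaled by sensitivity/epsilon is added before a record leaves the owner's site. A minimal sketch follows; the function names and parameters are illustrative, since the abstract does not give the scheme's actual sensitivity analysis:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from a Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize(value: float, sensitivity: float, epsilon: float) -> float:
    # The data owner adds Laplace(sensitivity / epsilon) noise locally,
    # before the record ever leaves its site for the fog node.
    return value + laplace_noise(sensitivity / epsilon)
```

Smaller epsilon means larger noise and stronger privacy; the fog node and cloud only ever see the noise-added values.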
Adversarial Learning of Privacy-Preserving and Task-Oriented Representations
Data privacy has emerged as an important issue as data-driven deep learning
has become an essential component of modern machine learning systems. For
instance, machine learning systems face a potential privacy risk from the
model inversion attack, whose goal is to reconstruct the input data from the
latent representation of deep networks. Our work aims at learning a
privacy-preserving and task-oriented representation to defend against such
model inversion attacks. Specifically, we propose an adversarial
reconstruction learning framework that prevents the latent representations
from being decoded into the original input data. By simulating the expected
behavior of the adversary, our framework is realized by minimizing the
negative pixel reconstruction loss or the negative feature reconstruction
(i.e., perceptual distance) loss. We validate the proposed method on face
attribute prediction, showing that it protects visual privacy with a small
decrease in utility performance. In addition, we show the utility-privacy
trade-off under different choices of the hyperparameter for the negative
perceptual distance loss at training, allowing service providers to determine
the right level of privacy protection for a given utility performance.
Moreover, we provide an extensive study with different selections of
features, tasks, and data to further analyze their influence on privacy
protection.
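The minimax structure of the framework, a simulated adversary minimizing reconstruction error while the encoder adds the negative reconstruction loss to its task objective, can be sketched numerically. The function names, plain-list "images", and loss weighting below are illustrative assumptions, not the paper's implementation:

```python
def mse(a, b):
    # Mean squared error, standing in for pixel reconstruction loss.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def adversary_objective(original, reconstructed):
    # The simulated adversary minimizes its reconstruction error.
    return mse(original, reconstructed)

def encoder_objective(task_loss, original, reconstructed, lam):
    # The encoder minimizes task loss while *maximizing* the
    # adversary's reconstruction error: it adds the negative
    # reconstruction loss, weighted by the hyperparameter lam that
    # controls the utility-privacy trade-off.
    return task_loss - lam * mse(original, reconstructed)
```

Raising lam pushes the encoder toward representations that are harder to invert, at the cost of some task utility, which is the trade-off the abstract describes.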