Differentially Private Regression for Discrete-Time Survival Analysis
In survival analysis, regression models are used to understand the effects of
explanatory variables (e.g., age, sex, weight) on the survival
probability. However, for sensitive survival data such as medical data, there
are serious concerns about the privacy of individuals in the data set when
medical data is used to fit the regression models. The closest work addressing
such privacy concerns is the work on Cox regression which linearly projects the
original data to a lower dimensional space. However, the weakness of this
approach is that there is no formal privacy guarantee for such projection. In
this work, we propose solutions for the regression problem in survival
analysis under the protection of differential privacy, which is a gold standard
of privacy protection in data privacy research. To this end, we extend the
Output Perturbation and Objective Perturbation approaches, which were originally
proposed to guarantee differential privacy for Empirical Risk Minimization
(ERM) problems. In addition, we propose a novel sampling approach based on
the Markov Chain Monte Carlo (MCMC) method to practically guarantee
differential privacy with better accuracy. We show that our proposed approaches
achieve good accuracy as compared to the non-private results while guaranteeing
differential privacy for individuals in the private data set.
Comment: 19 pages, CIKM1
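As a rough illustration of the Output Perturbation idea named in this abstract, the Python sketch below fits an L2-regularised logistic regression and then adds calibrated noise to the fitted weights. It is not the paper's survival-analysis algorithm. The logistic loss, the synthetic data, the helper names `fit_logreg` and `output_perturbation`, and the sensitivity bound 2/(n*lam) (which assumes feature vectors of norm at most 1 and an exact minimiser) are assumptions taken from the standard ERM setting.

```python
import numpy as np

def fit_logreg(X, y, lam, steps=2000, lr=0.5):
    """Gradient descent on (1/n) * sum(log(1 + exp(-y * Xw))) + (lam/2) * ||w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        # Gradient of the averaged logistic loss plus the L2 penalty.
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0) + lam * w
        w -= lr * grad
    return w

def output_perturbation(X, y, lam, epsilon, rng):
    """Return (noisy weights, non-private weights).

    Assumes ||x_i|| <= 1 and labels in {-1, +1}; under those assumptions the
    L2 sensitivity of the exact minimiser is 2 / (n * lam). Gradient descent
    only approximates that minimiser, so this is a sketch, not a proof.
    """
    n, d = X.shape
    w = fit_logreg(X, y, lam)
    # Noise density proportional to exp(-(n * lam * epsilon / 2) * ||b||):
    # the norm is Gamma(d, 2 / (n * lam * epsilon)), the direction is uniform.
    norm = rng.gamma(shape=d, scale=2.0 / (n * lam * epsilon))
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    return w + norm * direction, w

# Synthetic data (hypothetical), rows clipped to the unit ball.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
X = X / np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
w_true = np.array([2.0, -1.0, 0.5])
y = np.where(X @ w_true + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)

w_private, w_nonprivate = output_perturbation(X, y, lam=0.1, epsilon=1.0, rng=rng)
print("non-private:", w_nonprivate)
print("private:    ", w_private)
```

A smaller `epsilon` or a smaller `lam` increases the noise scale, which is one concrete form of the privacy and accuracy trade-off the abstract refers to.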
ModelChain: Decentralized Privacy-Preserving Healthcare Predictive Modeling Framework on Private Blockchain Networks
Cross-institutional healthcare predictive modeling can accelerate research
and facilitate quality improvement initiatives, and thus is important for
national healthcare delivery priorities. For example, a model that predicts
risk of re-admission for a particular set of patients will be more
generalizable if developed with data from multiple institutions. While
privacy-protecting methods to build predictive models exist, most are based on
a centralized architecture, which presents security and robustness
vulnerabilities such as single-point-of-failure (and single-point-of-breach)
and accidental or malicious modification of records. In this article, we
describe a new framework, ModelChain, to adapt Blockchain technology for
privacy-preserving machine learning. Each participating site contributes to
model parameter estimation without revealing any patient health information
(i.e., only model data, no observation-level data, are exchanged across
institutions). We integrate privacy-preserving online machine learning with a
private Blockchain network, apply transaction metadata to disseminate partial
models, and design a new proof-of-information algorithm to determine the order
of the online learning process. We also discuss the benefits and potential
issues of applying Blockchain technology to solve the privacy-preserving
healthcare predictive modeling task and to increase interoperability between
institutions, to support the Nationwide Interoperability Roadmap and national
healthcare delivery priorities such as Patient-Centered Outcomes Research
(PCOR).
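The core idea that only model parameters, and no observation-level data, move between sites can be sketched in a few lines of Python. This is a toy under stated assumptions: there is no blockchain or transaction metadata here, the fixed round-robin order is a simple stand-in for the proof-of-information algorithm (which the abstract does not specify), and `local_update`, `make_site`, and the synthetic data are hypothetical.

```python
import numpy as np

def local_update(w, X, y, lr=0.1):
    """One pass of online logistic-regression updates over a site's own records."""
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-(xi @ w)))
        w = w + lr * (yi - p) * xi
    return w

def make_site(rng, n, w_true):
    """Generate a synthetic local dataset (hypothetical) that stays at the site."""
    X = rng.normal(size=(n, w_true.size))
    p = 1.0 / (1.0 + np.exp(-(X @ w_true)))
    y = (rng.random(n) < p).astype(float)
    return X, y

rng = np.random.default_rng(0)
w_true = np.array([1.5, -2.0, 0.5])
sites = [make_site(rng, 200, w_true) for _ in range(3)]

# Only the weight vector w is handed from site to site.
w = np.zeros(3)
for _ in range(5):
    for X_site, y_site in sites:  # round-robin stand-in for the ordering rule
        w = local_update(w, X_site, y_site)

# Evaluation on the pooled data is for illustration only; a real deployment
# would not pool records.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
accuracy = float(np.mean(((X_all @ w) > 0) == (y_all == 1.0)))
print("weights:", w, "accuracy:", accuracy)
```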
A Critical Overview of Privacy-Preserving Approaches for Collaborative Forecasting
Cooperation between different data owners may lead to an improvement in
forecast quality - for instance by benefiting from spatial-temporal
dependencies in geographically distributed time series. Because of competitive
business factors and personal data protection concerns, these data owners
might be unwilling to share their data, which increases the interest in
collaborative privacy-preserving forecasting. This paper analyses the
state of the art and reveals several shortcomings of existing methods in
guaranteeing data privacy when employing Vector Autoregressive (VAR) models.
The paper also provides mathematical proofs and numerical analysis to evaluate
existing privacy-preserving methods, dividing them into three groups: data
transformation, secure multi-party computations, and decomposition methods. The
analysis shows that state-of-the-art techniques have limitations in preserving
data privacy, such as a trade-off between privacy and forecasting accuracy.
Moreover, in iterative model fitting processes in which intermediate results
are shared, the original data can be inferred after some iterations.
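To make the motivation for collaboration concrete, the following Python sketch fits a VAR(1) model by ordinary least squares on pooled series and compares it with separate AR(1) fits, one per data owner. It uses no privacy-preserving technique; the coefficient matrix `A_true`, the three synthetic series, and the one-lag order are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cross-dependencies: owner 0 depends on owner 1, owner 1 on owner 2.
A_true = np.array([[0.5, 0.4, 0.0],
                   [0.0, 0.5, 0.4],
                   [0.0, 0.0, 0.5]])
T = 2000
Y = np.zeros((T, 3))
for t in range(1, T):
    Y[t] = A_true @ Y[t - 1] + rng.normal(size=3)

past, future = Y[:-1], Y[1:]

# Collaborative VAR(1): future ~ past @ A.T, solved jointly by least squares.
B, *_ = np.linalg.lstsq(past, future, rcond=None)
A_hat = B.T
var_mse = float(np.mean((future - past @ B) ** 2))

# Non-collaborative baseline: each owner regresses only on its own lag.
a = (past * future).sum(axis=0) / (past * past).sum(axis=0)
ar_mse = float(np.mean((future - past * a) ** 2))

print("estimated A:\n", np.round(A_hat, 2))
print("in-sample MSE, VAR(1):", var_mse, "AR(1):", ar_mse)
```

The pooled fit needs every owner's lagged values in `past`, which is exactly the data the owners may be unwilling to share.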
Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy
BACKGROUND: Analysing distributed medical data is challenging because of data sensitivity and the various regulations on accessing and combining data. Some privacy-preserving methods are known for analysing horizontally-partitioned data, where different organisations have similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, dealing with data on overlapping sets of people. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained.
METHODS: We use a Newton-Raphson solver to securely train the CPH model with MPC, jointly with all data holders, without revealing any sensitive data. In order to securely compute the log-partial likelihood in each iteration, we run into several technical challenges to preserve the efficiency and security of our solution. To tackle these technical challenges, we generalize a cryptographic protocol for securely computing the inverse of the Hessian matrix and develop a new method for securely computing exponentiations. A theoretical complexity estimate is given to get insight into the computational and communication effort that is needed.
RESULTS: Our secure solution is implemented in a setting with three different machines, each representing a different data holder, which can communicate through the internet. The MPyC platform is used for implementing this privacy-preserving solution to obtain the CPH model. We test the accuracy and computation time of our methods on three standard benchmark survival datasets. We identify future work to make our solution more efficient.
CONCLUSIONS: Our secure solution is comparable with the standard, non-secure solver in terms of accuracy and convergence speed. The computation time is considerably larger, although the theoretical complexity is still cubic in the number of covariates and quadratic in the number of subjects. We conclude that this is a promising way of performing parametric survival analysis on vertically-distributed medical data, while realising a high level of security and privacy.
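For orientation, here is a minimal sketch of the plain, non-secure Newton-Raphson iteration for the Cox partial log-likelihood, the kind of solver the secure version is compared against. It does not use MPC or MPyC. It assumes no tied event times, and the function name `cox_newton` and the synthetic data are hypothetical. The exponentiations and the Hessian solve marked in the comments correspond to the steps the abstract says require dedicated secure protocols.

```python
import numpy as np

def cox_newton(X, time, event, iters=25, tol=1e-10):
    """Maximise the Cox partial log-likelihood by Newton-Raphson (no tied times)."""
    # Sort by decreasing time so that the risk set of row i is rows 0..i.
    order = np.argsort(-time)
    X, event = X[order], event[order]
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(iters):
        r = np.exp(X @ beta)                      # exponentiations
        S0 = np.cumsum(r)                         # sum of r over each risk set
        S1 = np.cumsum(r[:, None] * X, axis=0)    # sum of r * x over each risk set
        S2 = np.cumsum(r[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
        mean = S1 / S0[:, None]
        # Gradient: sum over events of (x_i - risk-set weighted mean of x).
        grad = (event[:, None] * (X - mean)).sum(axis=0)
        # Hessian: minus the sum over events of the risk-set weighted covariance.
        cov = S2 / S0[:, None, None] - mean[:, :, None] * mean[:, None, :]
        hess = -(event[:, None, None] * cov).sum(axis=0)
        step = np.linalg.solve(hess, grad)        # Hessian inversion step
        beta = beta - step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Synthetic survival data (hypothetical): hazard exp(x @ beta_true), random censoring.
rng = np.random.default_rng(0)
n = 500
beta_true = np.array([1.0, -0.5])
X = rng.normal(size=(n, 2))
T = rng.exponential(scale=np.exp(-(X @ beta_true)))
C = rng.exponential(scale=2.0, size=n)
time = np.minimum(T, C)
event = (T <= C).astype(float)

beta_hat = cox_newton(X, time, event)
print("estimated coefficients:", beta_hat)
```

In a vertically-partitioned setting the columns of `X` would be held by different parties, so none of the intermediate quantities above could be computed in the clear by a single party.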