10,135 research outputs found
Privacy-preserving Logistic Regression with Secret Sharing
BACKGROUND: Logistic regression (LR) is a widely used classification method for modeling binary outcomes in many medical data classification tasks. Researchers that collect and combine datasets from various data custodians and jurisdictions can greatly benefit from the increased statistical power to support their analysis goals. However, combining data from different sources creates serious privacy concerns that need to be addressed. METHODS: In this paper, we propose two privacy-preserving protocols for performing logistic regression with the Newton–Raphson method in the estimation of parameters. Our proposals are based on secure Multi-Party Computation (MPC) and tailored to the honest majority and dishonest majority security settings. RESULTS: The proposed protocols are evaluated against both synthetic and real-world datasets in terms of efficiency and accuracy, and a comparison is made with the ordinary logistic regression. The experimental results demonstrate that the proposed protocols are highly efficient and accurate. CONCLUSIONS: Our work introduces two iterative algorithms to enable the distributed training of a logistic regression model in a privacy-preserving manner. The implementation results show that our algorithms can handle large datasets from multiple sources
Differentially Private Empirical Risk Minimization
Privacy-preserving machine learning algorithms are crucial for the
increasingly common setting in which personal data, such as medical or
financial records, are analyzed. We provide general techniques to produce
privacy-preserving approximations of classifiers learned via (regularized)
empirical risk minimization (ERM). These algorithms are private under the
-differential privacy definition due to Dwork et al. (2006). First we
apply the output perturbation ideas of Dwork et al. (2006), to ERM
classification. Then we propose a new method, objective perturbation, for
privacy-preserving machine learning algorithm design. This method entails
perturbing the objective function before optimizing over classifiers. If the
loss and regularizer satisfy certain convexity and differentiability criteria,
we prove theoretical results showing that our algorithms preserve privacy, and
provide generalization bounds for linear and nonlinear kernels. We further
present a privacy-preserving technique for tuning the parameters in general
machine learning algorithms, thereby providing end-to-end privacy guarantees
for the training process. We apply these results to produce privacy-preserving
analogues of regularized logistic regression and support vector machines. We
obtain encouraging results from evaluating their performance on real
demographic and benchmark data sets. Our results show that both theoretically
and empirically, objective perturbation is superior to the previous
state-of-the-art, output perturbation, in managing the inherent tradeoff
between privacy and learning performance.Comment: 40 pages, 7 figures, accepted to the Journal of Machine Learning
Researc
Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond
We consider vertical logistic regression (VLR) trained with mini-batch
gradient descent -- a setting which has attracted growing interest among
industries and proven to be useful in a wide range of applications including
finance and medical research. We provide a comprehensive and rigorous privacy
analysis of VLR in a class of open-source Federated Learning frameworks, where
the protocols might differ between one another, yet a procedure of obtaining
local gradients is implicitly shared. We first consider the honest-but-curious
threat model, in which the detailed implementation of protocol is neglected and
only the shared procedure is assumed, which we abstract as an oracle. We find
that even under this general setting, single-dimension feature and label can
still be recovered from the other party under suitable constraints of batch
size, thus demonstrating the potential vulnerability of all frameworks
following the same philosophy. Then we look into a popular instantiation of the
protocol based on Homomorphic Encryption (HE). We propose an active attack that
significantly weaken the constraints on batch size in the previous analysis via
generating and compressing auxiliary ciphertext. To address the privacy leakage
within the HE-based protocol, we develop a simple-yet-effective countermeasure
based on Differential Privacy (DP), and provide both utility and privacy
guarantees for the updated algorithm. Finally, we empirically verify the
effectiveness of our attack and defense on benchmark datasets. Altogether, our
findings suggest that all vertical federated learning frameworks that solely
depend on HE might contain severe privacy risks, and DP, which has already
demonstrated its power in horizontal federated learning, can also play a
crucial role in the vertical setting, especially when coupled with HE or secure
multi-party computation (MPC) techniques
Efficient Privacy Preserving Logistic Regression Inference and Training
Recently, privacy-preserving logistic regression techniques on distributed data among several data owners drew attention in terms of their applicability in federated learning environment. Many of them have been built upon cryptographic primitives such as secure multiparty computations(MPC) and homomorphic encryptions(HE) to protect the privacy of data. The secure multiparty computation provides fast and secure unit operations for arithmetic and bit operations but they often does not scale with large data well enough due to large computation cost and communication overhead. From recent works, many HE primitives provide their operations in a batch sense so that the technique can be an appropriate choice in a big data environment. However computationally expensive operations such as ciphertext slot rotation or refreshment(so called bootstrapping) and large public key size are hurdles that hamper widespread of the technique in the industry-level environment.
In this paper, we provide a new hybrid approach of a privacy-preserving logistic regression training and a inference, which utilizes both MPC and HE techniques to provide efficient and scalable solution while minimizing needs of key management and complexity of computation in encrypted state. Utilizing batch sense properties of HE, we present a method to securely compute multiplications of vectors and matrices using one HE multiplication, compared to the naive approach which requires linear number of multiplications regarding to the size of input data. We also show how we used a 2-party additive secret sharing scheme to control noises of expensive HE operations such as bootstrapping efficiently
DPpack: An R Package for Differentially Private Statistical Analysis and Machine Learning
Differential privacy (DP) is the state-of-the-art framework for guaranteeing
privacy for individuals when releasing aggregated statistics or building
statistical/machine learning models from data. We develop the open-source R
package DPpack that provides a large toolkit of differentially private
analysis. The current version of DPpack implements three popular mechanisms for
ensuring DP: Laplace, Gaussian, and exponential. Beyond that, DPpack provides a
large toolkit of easily accessible privacy-preserving descriptive statistics
functions. These include mean, variance, covariance, and quantiles, as well as
histograms and contingency tables. Finally, DPpack provides user-friendly
implementation of privacy-preserving versions of logistic regression, SVM, and
linear regression, as well as differentially private hyperparameter tuning for
each of these models. This extensive collection of implemented differentially
private statistics and models permits hassle-free utilization of differential
privacy principles in commonly performed statistical analysis. We plan to
continue developing DPpack and make it more comprehensive by including more
differentially private machine learning techniques, statistical modeling and
inference in the future
- …