3,131 research outputs found

    Privacy-Preserving Ridge Regression on Distributed Data

    Get PDF
    Linear regression is an important statistical tool that models the relationship between some explanatory values and an outcome value using a linear function. In many current applications (e.g. predictive modelling in personalized healthcare), these values represent sensitive data owned by several different parties that are unwilling to share them. In this setting, training a linear regression model becomes challenging and needs specific cryptographic solutions. In this work, we propose a new system that can train a linear regression model with 2-norm regularization (i.e. ridge regression) on a dataset obtained by merging a finite number of private datasets. Our system is composed of two phases: The first one is based on a simple homomorphic encryption scheme and takes care of securely merging the private datasets. The second phase is a new ad-hoc two-party protocol that computes a ridge regression model solving a linear system where all coefficients are encrypted. The efficiency of our system is evaluated both on synthetically generated and real-world datasets

    Supporting Regularized Logistic Regression Privately and Efficiently

    Full text link
    As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Increasing concerns over data privacy make it more and more difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely-used machine learning model in various disciplines while at the same time has not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributing computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluation on several studies validated the privacy guarantees, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc
    • …
    corecore