18 research outputs found

    Differentially Private Bayesian Inference for Generalized Linear Models

    Generalized linear models (GLMs) such as logistic regression are among the most widely used tools in the data analyst's repertoire and are often applied to sensitive datasets. A large body of prior work investigating GLMs under differential privacy (DP) constraints provides only private point estimates of the regression coefficients and cannot quantify parameter uncertainty. In this work, with logistic and Poisson regression as running examples, we introduce a generic noise-aware DP Bayesian inference method for the GLM at hand, given a noisy sum of summary statistics. Quantifying uncertainty allows us to determine which regression coefficients are statistically significantly different from zero. We provide a tight privacy analysis and experimentally demonstrate that the posteriors obtained from our model, while adhering to strong privacy guarantees, are close to the non-private posteriors.
    Peer reviewed
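    The abstract contains no code; the following is a minimal, self-contained sketch of the noise-aware idea for one-dimensional logistic regression, under the assumption that the design x is public and only the binary labels are private (so the sum of x_i * y_i is a sufficient statistic). The statistic is released via the Gaussian mechanism and the posterior is computed on a grid using a CLT approximation; all names and constants are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D logistic regression: public design x, private binary labels y.
n, beta_true, C = 1000, 1.5, 1.0
x = np.clip(rng.normal(size=n), -C, C)  # clipping |x_i| <= C bounds the sensitivity
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-beta_true * x)))

# Release the sufficient statistic s = sum_i x_i * y_i with the Gaussian mechanism.
eps, delta = 1.0, 1e-5
sigma = C * np.sqrt(2.0 * np.log(1.25 / delta)) / eps  # flipping one y_i moves s by at most C
s_noisy = float(np.sum(x * y)) + rng.normal(0.0, sigma)

# Noise-aware posterior on a grid: by a CLT approximation, s | beta is roughly
# N(mu(beta), v(beta)); the DP release noise adds sigma^2 to the variance.
betas = np.linspace(-4.0, 4.0, 801)
p = 1.0 / (1.0 + np.exp(-np.outer(betas, x)))  # sigmoid(beta * x_i), shape (grid, n)
mu = (p * x).sum(axis=1)
var = (p * (1.0 - p) * x**2).sum(axis=1) + sigma**2
log_post = -0.5 * ((s_noisy - mu) ** 2 / var + np.log(var)) - 0.5 * betas**2 / 10.0
post = np.exp(log_post - log_post.max())
post /= post.sum()  # normalized grid posterior

mean = float((betas * post).sum())
sd = float(np.sqrt(((betas - mean) ** 2 * post).sum()))
print(f"noise-aware posterior: {mean:.2f} +/- {sd:.2f} (true beta = {beta_true})")
```

    The quantified uncertainty enables the significance test the abstract mentions: a coefficient whose posterior credible interval excludes zero is flagged as significantly different from zero.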

    SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation

    Gathering high-quality training data from multiple data controllers while preserving privacy is a key challenge in training high-quality machine learning models. Potential solutions could dramatically break down the barriers between isolated data corpora and consequently enlarge the range of data available for processing. To this end, both academic researchers and industry vendors have recently been strongly motivated to propose two mainstream families of solutions: 1) Secure Multi-party Learning (MPL for short); and 2) Federated Learning (FL for short). These two approaches have distinct advantages and limitations when evaluated in terms of privacy preservation, communication mechanisms, communication overhead, data format, accuracy of trained models, and application scenarios. To chart the research progress and offer insights into future directions, we thoroughly investigate the protocols and frameworks of both MPL and FL. First, we define the problem of Training machine learning Models over Multiple data sources with Privacy Preservation (TMMPP for short). Then, we compare recent TMMPP studies along the dimensions of technical route, number of parties supported, data partitioning, threat model, and supported machine learning models, to show their advantages and limitations. Next, we introduce state-of-the-art platforms that support online training over multiple data sources. Finally, we discuss potential directions for resolving the TMMPP problem.
    Comment: 17 pages, 4 figures
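    As a concrete point of contrast between the two families, the sketch below illustrates FedAvg-style aggregation, the canonical FL pattern in which only model updates, never raw data, leave each data controller. This is a minimal numpy illustration, not code from the paper; the client data, model, and hyperparameters are all invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def local_sgd(w, X, y, lr=0.1, steps=20):
    """A few local steps of least-squares SGD on one client's private data."""
    for _ in range(steps):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i]
        w = w - lr * grad
    return w

# Three data controllers with disjoint (horizontally partitioned) datasets.
d, w_true = 5, np.arange(5, dtype=float)
clients = []
for n in (200, 500, 300):
    X = rng.normal(size=(n, d))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=n)))

# FedAvg: each client trains locally and shares only its model update;
# the server averages updates weighted by local dataset size.
w_global = np.zeros(d)
for _ in range(30):
    updates = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    w_global = np.average(updates, axis=0, weights=sizes)

print("recovered weights:", np.round(w_global, 2))
```

    Sharing parameter updates keeps communication cheap, which is FL's main design choice; MPL solutions instead compute jointly on secret-shared or encrypted data, accepting higher communication overhead in exchange for protecting intermediate values as well.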

    Machine Learning Algorithms for Privacy-preserving Behavioral Data Analytics

    Get PDF
    PhD thesis. Behavioral patterns observed in data generated by mobile and wearable devices are used by many applications, such as wellness monitoring or service personalization. However, sensitive information may be inferred from these data when they are shared with cloud-based services. In this thesis, we propose machine learning algorithms for data transformations that allow the inference of information required for specific tasks while preventing the inference of privacy-sensitive information. Specifically, we focus on protecting the user's privacy when sharing motion-sensor data and web-browsing histories.

    Firstly, for human activity recognition using data from wearable sensors, we introduce two algorithms for training deep neural networks to transform motion-sensor data, with two objectives: (i) to prevent the inference of privacy-sensitive activities (e.g. smoking or drinking), and (ii) to protect the user's sensitive attributes (e.g. gender) and prevent user re-identification. We show how to combine these two algorithms and propose a compound architecture that protects both sensitive activities and attributes. Alongside the algorithmic contributions, we publish a motion-sensor dataset for human activity recognition.

    Secondly, to prevent the identification of users from their web-browsing behavior, we introduce an algorithm for privacy-preserving collaborative training of contextual bandit algorithms. The proposed method improves the accuracy of personalized recommendation agents that run locally on users' devices. We propose an encoding algorithm for the user's web-browsing data that preserves the information required to personalize future content while ensuring differential privacy for the participants in collaborative training.

    In addition, for processing multivariate sensor data, we show how to make neural network architectures adaptive to dynamic sampling rates and sensor selection, which handles situations in human activity recognition where the dimensions of the input data vary at inference time. Specifically, we introduce a customized pooling layer for neural networks and propose a customized training procedure that generalizes over a large number of feasible data dimensions; a sketch of this idea follows below. Using the proposed architectural improvement, we show how to convert existing non-adaptive deep neural networks into adaptive ones while keeping the same classification accuracy. We conclude the thesis by discussing open questions and potential future directions for research in this area.
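    The abstract does not specify the pooling layer, so the following is a minimal sketch of one plausible reading: a symmetric (mean/max) pooling over whichever sensor channels are present, so the downstream classifier sees a fixed-size representation regardless of how many sensors are selected at inference time. The encoder, shapes, and names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def adaptive_sensor_pool(per_sensor_features):
    """Pool a variable-length list of per-sensor feature vectors into one
    fixed-size representation (mean and max over the sensors present)."""
    stacked = np.stack(per_sensor_features)  # (num_sensors, feat_dim)
    return np.concatenate([stacked.mean(axis=0), stacked.max(axis=0)])

def shared_encoder(window, W):
    """Toy per-channel encoder applied identically to every sensor."""
    return np.tanh(window @ W)

# Two inference-time scenarios: 3 sensors available vs. only 1 sensor available.
W = rng.normal(size=(50, 8))  # shared encoder weights (hypothetical shapes)
for num_sensors in (3, 1):
    windows = [rng.normal(size=50) for _ in range(num_sensors)]
    z = adaptive_sensor_pool([shared_encoder(w, W) for w in windows])
    print(f"{num_sensors} sensor(s) -> pooled representation of shape {z.shape}")
```

    Because mean and max are permutation-invariant and defined for any number of inputs, the same network can serve any subset of sensors; the thesis's training procedure presumably samples such subsets during training so that accuracy holds across the feasible data dimensions.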