
    Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning

    Deep learning has recently become hugely popular in machine learning, providing significant improvements in classification accuracy in the presence of highly structured and large databases. Researchers have also considered the privacy implications of deep learning. Models are typically trained in a centralized manner, with all the data being processed by the same training algorithm. If the data is a collection of users' private data, including habits, personal pictures, geographical positions, interests, and more, the centralized server will have access to sensitive information that could potentially be mishandled. To tackle this problem, collaborative deep learning models have recently been proposed in which parties locally train their deep learning structures and share only a subset of the parameters in an attempt to keep their respective training sets private. Parameters can also be obfuscated via differential privacy (DP) to make information extraction even more challenging, as proposed by Shokri and Shmatikov at CCS'15. Unfortunately, we show that any privacy-preserving collaborative deep learning is susceptible to a powerful attack that we devise in this paper. In particular, we show that a distributed, federated, or decentralized deep learning approach is fundamentally broken and does not protect the training sets of honest participants. The attack we developed exploits the real-time nature of the learning process, which allows the adversary to train a Generative Adversarial Network (GAN) that generates prototypical samples of the targeted training set that was meant to be private (the samples generated by the GAN are intended to come from the same distribution as the training data). Interestingly, we show that record-level DP applied to the shared parameters of the model, as suggested in previous work, is ineffective (i.e., record-level DP is not designed to address our attack). Comment: ACM CCS'17, 16 pages, 18 figures.
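
    To make the attack mechanics concrete, below is a minimal, self-contained sketch of the GAN-based inference idea in a two-party collaborative setting. It is an illustration under simplifying assumptions (a synthetic Gaussian stand-in for the private data, a toy two-class shared model, and a plain training loop), not the paper's exact protocol or architectures.

```python
# Illustrative sketch of the GAN attack on collaborative learning.
# All names, shapes, and the synthetic "private" distribution are
# assumptions for demonstration; the paper targets real models and data.
import torch
import torch.nn as nn

torch.manual_seed(0)
DIM, LATENT = 32, 8

# Shared classifier: class 0 = the victim's private class,
# class 1 = an artificial class the adversary injects.
shared = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 2))
# The adversary's local generator, trained against the shared model.
gen = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, DIM))

opt_shared = torch.optim.SGD(shared.parameters(), lr=0.05)
opt_gen = torch.optim.Adam(gen.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for the victim's private data: a hidden Gaussian the
# adversary never observes directly.
private_mean = torch.randn(DIM)
def victim_batch(n=64):
    return private_mean + 0.1 * torch.randn(n, DIM)

zeros = torch.zeros(64, dtype=torch.long)
ones = torch.ones(64, dtype=torch.long)
for _ in range(200):
    # 1) Victim's local update: fit the shared model on its private class.
    loss = loss_fn(shared(victim_batch()), zeros)
    opt_shared.zero_grad(); loss.backward(); opt_shared.step()

    # 2) Adversary downloads the shared parameters and trains its generator
    #    so the shared model labels generated samples as the victim's class.
    g_loss = loss_fn(shared(gen(torch.randn(64, LATENT))), zeros)
    opt_gen.zero_grad(); g_loss.backward(); opt_gen.step()

    # 3) Adversary's "honest-looking" update: label its fakes as class 1,
    #    forcing the shared model to reveal finer detail about class 0.
    loss = loss_fn(shared(gen(torch.randn(64, LATENT)).detach()), ones)
    opt_shared.zero_grad(); loss.backward(); opt_shared.step()

# The generator's outputs drift toward the private distribution.
print(torch.norm(gen(torch.randn(256, LATENT)).mean(0) - private_mean).item())
```

    The point the sketch captures is that the adversary never sees the victim's data; it only needs the shared parameters each round, which is exactly what collaborative learning exposes.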

    From Causal History to Social Network in Distributed Social Semantic Software

    Web 2.0 raises the importance of collaboration powered by social software. Social software has clearly illustrated how a community of strangers can be converted into a community of collaborators producing valuable content together. However, collaboration is currently supported by collaboration providers such as Google, Yahoo, etc., following a "Collaboration as a Service" (CaaS) approach. This approach raises privacy and censorship issues: users have to trust CaaS providers both for the security of hosted data and for the usage of collected data. Alternative approaches, including private peer-to-peer networks, friend-to-friend networks, distributed version control systems, and distributed peer-to-peer groupware, support collaboration without requiring a collaboration provider; collaboration is powered by the resources provided by the users. While it is easy for a collaboration provider to extract the complete social network graph from the observed interactions, obtaining social network information in the distributed approach is more challenging. In fact, the distributed approach is designed to protect the privacy of users and thus makes extracting the whole social network difficult. In this paper, we show how it is possible to compute a local view of the social network on each site in a distributed collaborative system.
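
    As a rough illustration of the idea, the sketch below derives a weighted local view of the social network from one site's causal history. The representation (operations recorded as (op_id, author, causal dependencies) triples) and the edge heuristic (an edge when one user builds on another's operation) are assumptions for demonstration, not the paper's exact model.

```python
# Deriving a local social-network view from a site's causal history.
# The history format and edge heuristic are illustrative assumptions.
from collections import defaultdict

# (op_id, author, ids of operations this op causally depends on)
causal_history = [
    ("o1", "alice", []),
    ("o2", "bob",   ["o1"]),          # bob built on alice's operation
    ("o3", "carol", ["o2"]),
    ("o4", "alice", ["o2", "o3"]),
]

author_of = {op_id: author for op_id, author, _ in causal_history}

# Weighted adjacency: how often one user built on another's work.
graph = defaultdict(int)
for _, author, deps in causal_history:
    for dep in deps:
        other = author_of[dep]
        if other != author:
            graph[(author, other)] += 1

for (src, dst), weight in sorted(graph.items()):
    print(f"{src} -> {dst}: {weight}")
```

    Because each site only holds the causal history of the operations it has received, the resulting graph is inherently a local view rather than the global social network, matching the privacy goal of the distributed approach.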

    Protecting Patient Privacy: Strategies for Regulating Electronic Health Records Exchange

    The report offers policymakers 10 recommendations to protect patient privacy as New York state develops a centralized system for sharing electronic medical records. Those recommendations include:
    - Require that the electronic systems employed by health information exchanges (HIEs) have the capability to sort and segregate medical information in order to comply with the guaranteed privacy protections of New York and federal law. Presently, they do not.
    - Offer patients the right to opt out of the system altogether. Currently, people's records can be uploaded to the system without their consent.
    - Require that patient consent forms offer clear information-sharing options. The forms should give patients three options: to opt in and allow providers access to their electronic medical records, to opt out except in the event of a medical emergency, or to opt out altogether.
    - Prohibit and sanction the misuse of medical information. New York must protect patients from potential bad actors: that small minority of providers who may abuse information out of fear, prejudice, or malice.
    - Prohibit the health information-sharing networks from selling data. The State Legislature should pass legislation prohibiting the networks from selling patients' private health information.

    Supporting Regularized Logistic Regression Privately and Efficiently

    As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, the social sciences, information technology, and beyond. These domains often involve data on human subjects that are governed by strict privacy regulations. Increasing concerns over data privacy make it more and more difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work focuses on safeguarding regularized logistic regression, a machine learning model widely used across disciplines that has nevertheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as research consortia or networks, as widely seen in genetics, epidemiology, the social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluation on several studies validated the privacy guarantees, efficiency, and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications in various disciplines, including genetic and biomedical studies, the smart grid, network analysis, etc.
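
    The abstract does not spell out the protocol, but the flavor of combining distributed computing with cryptographic protection can be sketched generically: each institution computes the gradient of the regularized logistic loss on its own data and submits only an additively masked share, so the coordinator learns the aggregate gradient but no individual contribution. The pairwise-mask scheme, data, and hyperparameters below are illustrative assumptions, not the paper's actual method.

```python
# Generic secure-aggregation sketch for L2-regularized logistic regression.
# The masking scheme and synthetic data are illustrative assumptions; the
# paper's actual cryptographic protocol may differ.
import numpy as np

rng = np.random.default_rng(0)
D, LAM, LR, STEPS = 5, 0.1, 0.5, 50

def local_gradient(w, X, y):
    # Gradient of the average logistic loss plus the L2 penalty,
    # computed on one institution's private data.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y) + LAM * w

# Three institutions with private (here: synthetic) datasets.
datasets = [(rng.normal(size=(100, D)), rng.integers(0, 2, size=100))
            for _ in range(3)]

w = np.zeros(D)
for _ in range(STEPS):
    grads = [local_gradient(w, X, y) for X, y in datasets]
    # Each party adds its own mask and subtracts its neighbor's mask
    # (masks would be agreed pairwise in practice); the masks cancel in
    # the sum, so the aggregator sees only the total gradient.
    masks = [rng.normal(size=D) for _ in grads]
    shares = [g + masks[i] - masks[(i + 1) % len(grads)]
              for i, g in enumerate(grads)]
    total = np.sum(shares, axis=0)   # equals sum(grads); masks cancel
    w -= LR * total / len(datasets)

print(w)
```

    Additive masking keeps individual-level gradients hidden while leaving the aggregate exact (up to floating point), which is one standard way to reconcile cross-institution joint analysis with per-institution privacy.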