176 research outputs found

    Privacy in the Genomic Era

    Genome sequencing technology has advanced at a rapid pace, and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data have the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, and we contextualize these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.

    A Systematic Literature Review of Individuals' Perspectives on Privacy and Genetic Information in the United States

    Concerns about genetic privacy affect individuals' willingness to accept genetic testing in clinical care and to participate in genomics research. To learn what is already known about these views, we conducted a systematic review, which ultimately analyzed 53 studies involving the perspectives of 47,974 participants on real or hypothetical privacy issues related to human genetic data. Bibliographic databases included MEDLINE, Web of Knowledge, and Sociological Abstracts. Three investigators independently screened studies against predetermined criteria and assessed risk of bias. The picture of genetic privacy that emerges from this systematic literature review is complex and riddled with gaps. When asked specifically whether they were worried about genetic privacy, the general public, patients, and professionals frequently said yes. In many cases, however, that question was posed poorly or only in the most general terms. While many participants expressed concern that genomic and medical information would be revealed to others, respondents frequently seemed to conflate privacy, confidentiality, control, and security. People varied widely in how much control they wanted over the use of data. They were more concerned about use by employers, insurers, and the government than about use by researchers and commercial entities. In addition, people were often willing to give up some privacy to obtain other goods. Importantly, little attention was paid to understanding the factors (sociocultural, relational, and media-related) that influence people's opinions and decisions. Future investigations should explore in greater depth which concerns about genetic privacy are most salient to people and the social forces and contexts that influence those perceptions. It is also critical to identify the social practices that will make the collection and use of these data more trustworthy for participants, as well as to identify the circumstances that lead people to set aside worries and decide to participate in research.

    The Luminosity Function of Young Star Clusters In "The Antennae" Galaxies (NGC 4038/4039)

    The WFPC2 of the HST has been used to obtain high-resolution images of NGC 4038/4039 that go roughly 3 magnitudes deeper in V than previous observations made during Cycle 2 (-14 < M_V < -6). To first order the luminosity function (LF) is a power law, with exponent \alpha = -2.12 +/- 0.04. However, after decoupling the cluster and stellar LFs, which overlap in the range -9 < M_V < -6, we find an apparent bend in the young cluster LF at approximately M_V = -10.4. The LF has a power-law exponent of -2.6 +/- 0.2 on the brightward side of the bend and -1.7 +/- 0.2 on the faintward side. The bend corresponds to a mass of ~10^5 M_{\odot}, only slightly lower than the characteristic mass of globular clusters in the Milky Way (~2x10^5 M_{\odot}). The star clusters of the Antennae appear slightly resolved, with median effective radii of 4 +/- 1 pc, similar to or perhaps slightly larger than those of globular clusters in our Galaxy. However, the radial extents of some of the very young clusters (ages < 10 Myr) are much larger than those of old globular clusters. A combination of the UBVI colors, H\alpha morphology, and GHRS spectra enables us to age-date the clusters in different regions of the Antennae. We find two groups of young star clusters with ages <~ 20 Myr and ~100 Myr, as well as an intermediate-age group (~500 Myr) and a handful of old globular clusters from the progenitor galaxies. Age estimates derived from GHRS spectroscopy yield 3 +/- 1 Myr for Knot K (just south of the nucleus of NGC 4038) and 7 +/- 1 Myr for Knot S in the Western Loop, in good agreement with ages derived from the UBVI colors. Effective gas-outflow velocities from Knots S and K are estimated to be about 25-30 km/s. However, the measured widths of the interstellar absorption lines suggest dispersion velocities of ~400 km/s along the lines of sight to Knots S and K. Comment: 56 pages, 4 tables, and 23 figures; text in AAS style; to be published in A
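    The quoted slopes describe a broken (double) power law in luminosity, dN/dL proportional to L^\alpha, with the break at M_V ~ -10.4. The short sketch below illustrates that functional form using only the values given in the abstract; the normalization, evaluation magnitudes, and Python implementation are illustrative assumptions, not the authors' analysis.

```python
# Sketch of the broken power-law luminosity function implied by the abstract:
# dN/dL ~ L^alpha with alpha ~ -2.6 brightward of the bend and ~ -1.7 faintward,
# and the bend at M_V ~ -10.4. Normalization is arbitrary (set to 1 at the bend).
import numpy as np

M_BEND = -10.4                          # bend magnitude (M_V) from the abstract
ALPHA_BRIGHT, ALPHA_FAINT = -2.6, -1.7  # power-law exponents from the abstract

def luminosity_ratio(m_v, m_ref=M_BEND):
    """L / L_bend from the magnitude difference: L/L_ref = 10^(-0.4 (M - M_ref))."""
    return 10.0 ** (-0.4 * (np.asarray(m_v) - m_ref))

def lf_per_unit_luminosity(m_v):
    """dN/dL in arbitrary units, continuous across the bend."""
    L = luminosity_ratio(m_v)
    alpha = np.where(np.asarray(m_v) < M_BEND, ALPHA_BRIGHT, ALPHA_FAINT)
    return L ** alpha

# Per-magnitude form: dN/dM = dN/dL * |dL/dM|, and |dL/dM| is proportional to L.
for m in (-12.0, -10.4, -9.0, -7.0):
    dn_dm = lf_per_unit_luminosity(m) * luminosity_ratio(m)
    print(f"M_V = {m:5.1f}: dN/dM proportional to {float(dn_dm):.3g}")
```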

    Split Learning for Distributed Collaborative Training of Deep Learning Models in Health Informatics

    Deep learning continues to evolve rapidly and is now demonstrating remarkable potential for numerous medical prediction tasks. However, realizing deep learning models that generalize across healthcare organizations is challenging. This is due, in part, to the inherently siloed nature of these organizations and to patient privacy requirements. To address this problem, we illustrate how split learning can enable collaborative training of deep learning models across disparate and privately maintained health datasets, while keeping the original records and model parameters private. We introduce a new privacy-preserving distributed learning framework that offers a higher level of privacy compared to conventional federated learning. We use several biomedical imaging and electronic health record (EHR) datasets to show that deep learning models trained via split learning can achieve performance highly similar to that of their centralized and federated counterparts while greatly improving computational efficiency and reducing privacy risks.
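    As a concrete illustration of the split learning idea described above, the sketch below partitions a model at a "cut layer": the client (data holder) runs the first segment locally and exchanges only cut-layer activations and their gradients with the server. This is a minimal, self-contained PyTorch toy; the network sizes, optimizer settings, and label handling are illustrative assumptions, not the framework proposed in the paper.

```python
# Minimal split-learning sketch (illustrative toy, not the paper's framework).
# A "client" (e.g., a hospital) holds the raw records and the first model segment;
# a "server" holds the remaining segment. Only cut-layer activations and their
# gradients cross the boundary -- raw data and each party's weights do not.
import torch
import torch.nn as nn

torch.manual_seed(0)
N_FEATURES, N_CLASSES, CUT_DIM = 32, 2, 16   # hypothetical tabular-EHR dimensions

client_net = nn.Sequential(nn.Linear(N_FEATURES, CUT_DIM), nn.ReLU())  # stays with the client
server_net = nn.Sequential(nn.Linear(CUT_DIM, N_CLASSES))              # stays with the server

opt_client = torch.optim.SGD(client_net.parameters(), lr=0.1)
opt_server = torch.optim.SGD(server_net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Synthetic private batch held only by the client.
x_private = torch.randn(64, N_FEATURES)
y_private = torch.randint(0, N_CLASSES, (64,))

for step in range(100):
    opt_client.zero_grad()
    opt_server.zero_grad()

    # Client-side forward pass up to the cut layer.
    smashed = client_net(x_private)

    # Only the detached activations are "sent" across the boundary.
    smashed_sent = smashed.detach().requires_grad_(True)

    # Server-side forward/backward on its own segment. (In this vanilla setup the
    # labels go to the server; a U-shaped configuration would keep them client-side.)
    logits = server_net(smashed_sent)
    loss = loss_fn(logits, y_private)
    loss.backward()
    opt_server.step()

    # The server returns only the gradient w.r.t. the cut-layer activations,
    # which the client uses to finish backpropagation through its segment.
    smashed.backward(smashed_sent.grad)
    opt_client.step()

print(f"final training loss: {loss.item():.3f}")
```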

    Risk of re-identification for shared clinical speech recordings

    Large, curated datasets are required to leverage speech-based tools in healthcare. These are costly to produce, resulting in increased interest in data sharing. As speech can potentially identify speakers (i.e., voiceprints), sharing recordings raises privacy concerns. We examine the re-identification risk for speech recordings, without reference to demographic information or metadata, using a state-of-the-art speaker recognition system. We demonstrate that the risk is inversely related to the number of comparisons an adversary must consider, i.e., the search space. Risk is high for a small search space but drops as the search space grows (precision > 0.85 for fewer than 1x10^6 comparisons, falling as the search space reaches 3x10^6 comparisons). Next, we show that the nature of a speech recording influences re-identification risk, with non-connected speech (e.g., vowel prolongation) being harder to identify. Our findings suggest that speaker recognition systems can be used to re-identify participants in specific circumstances, but in practice, the re-identification risk appears low. Comment: 24 pages, 6 figures
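    The evaluation described above is essentially a 1:N identification task: embed a probe recording, compare it against a gallery of enrolled speakers, and check whether the top match is correct as the gallery (search space) grows. The sketch below simulates that setup with synthetic embeddings and cosine similarity; the embedding dimensionality, noise level, and gallery sizes are illustrative assumptions, and a real study would use embeddings from a trained speaker recognition model rather than random vectors.

```python
# Toy 1:N re-identification experiment (illustrative; embeddings are simulated,
# not produced by the speaker recognition system used in the paper).
import numpy as np

rng = np.random.default_rng(0)
DIM = 192      # assumed speaker-embedding dimensionality
NOISE = 3.0    # controls how different two recordings of the same speaker look

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def precision_at_search_space(n_speakers, n_probes=200):
    """Fraction of probe recordings whose top cosine match is the correct speaker."""
    enrolled = rng.normal(size=(n_speakers, DIM))                  # one embedding per candidate
    targets = rng.integers(n_speakers, size=n_probes)              # true identity of each probe
    probes = enrolled[targets] + NOISE * rng.normal(size=(n_probes, DIM))  # noisy second recordings
    predictions = np.argmax(normalize(probes) @ normalize(enrolled).T, axis=1)
    return float(np.mean(predictions == targets))

# In this simulation, top-1 precision falls as the adversary's search space grows.
for n in (10, 1_000, 100_000):
    print(f"search space {n:>7,}: precision ~ {precision_at_search_space(n):.2f}")
```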

    A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models

    Synthetic health data have the potential to mitigate privacy concerns when sharing data to support biomedical research and the development of innovative healthcare applications. Modern approaches for data generation based on machine learning, generative adversarial network (GAN) methods in particular, continue to evolve and demonstrate remarkable potential. Yet there is a lack of a systematic assessment framework for benchmarking methods as they emerge and determining which methods are most appropriate for which use cases. In this work, we introduce a generalizable benchmarking framework to appraise key characteristics of synthetic health data with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health record (EHR) data from two large academic medical centers with respect to several use cases. The results illustrate that there is a utility-privacy tradeoff for sharing synthetic EHR data. The results further indicate that no method is unequivocally the best on all criteria in every use case, which makes it evident why synthetic data generation methods need to be assessed in context.
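    To make the utility-privacy appraisal concrete, the sketch below computes one toy utility metric (agreement of per-code prevalence between real and synthetic records) and one toy privacy proxy (nearest-neighbor distance from synthetic to real records) for binary EHR-style data. Both metrics and the synthetic candidates are illustrative stand-ins chosen for this example; they are not the specific criteria or datasets used in the paper.

```python
# Illustrative utility/privacy checks for synthetic binary EHR-style matrices
# (rows = patients, columns = diagnosis codes). Not the paper's metric suite.
import numpy as np

rng = np.random.default_rng(0)

def dimension_wise_utility_gap(real, synthetic):
    """Mean absolute difference in per-code prevalence (lower = better utility)."""
    return float(np.mean(np.abs(real.mean(axis=0) - synthetic.mean(axis=0))))

def nearest_neighbor_privacy(real, synthetic):
    """Median Hamming distance from each synthetic record to its closest real record.
    Values near zero suggest near-copies of real patients (a privacy concern)."""
    distances = (synthetic[:, None, :] != real[None, :, :]).mean(axis=2)
    return float(np.median(distances.min(axis=1)))

# Toy data: 500 "real" patients with 40 binary codes, plus two synthetic candidates.
real = (rng.random((500, 40)) < 0.15).astype(int)
syn_independent = (rng.random((500, 40)) < 0.15).astype(int)  # resampled from the same marginals
syn_near_copy = real.copy()
syn_near_copy[:, :2] ^= 1                                     # real records with two codes flipped

for name, syn in [("independent", syn_independent), ("near-copy", syn_near_copy)]:
    print(f"{name:>11}: utility gap = {dimension_wise_utility_gap(real, syn):.3f}, "
          f"NN distance = {nearest_neighbor_privacy(real, syn):.3f}")
```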