39,805 research outputs found

    Privacy in the Genomic Era

    Get PDF
    Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly-detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy; notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While the computer scientists have addressed data privacy for various data types, there has been less attention dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state-of-the-art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward

    Formulating genome-scale kinetic models in the post-genome era.

    Get PDF
    The biological community is now awash in high-throughput data sets and is grappling with the challenge of integrating disparate data sets. Such integration has taken the form of statistical analysis of large data sets, or through the bottom-up reconstruction of reaction networks. While progress has been made with statistical and structural methods, large-scale systems have remained refractory to dynamic model building by traditional approaches. The availability of annotated genomes enabled the reconstruction of genome-scale networks, and now the availability of high-throughput metabolomic and fluxomic data along with thermodynamic information opens the possibility to build genome-scale kinetic models. We describe here a framework for building and analyzing such models. The mathematical analysis challenges are reflected in four foundational properties, (i) the decomposition of the Jacobian matrix into chemical, kinetic and thermodynamic information, (ii) the structural similarity between the stoichiometric matrix and the transpose of the gradient matrix, (iii) the duality transformations enabling either fluxes or concentrations to serve as the independent variables and (iv) the timescale hierarchy in biological networks. Recognition and appreciation of these properties highlight notable and challenging new in silico analysis issues

    Scalable Privacy-Preserving Data Sharing Methodology for Genome-Wide Association Studies

    Full text link
    The protection of privacy of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers following the publication of "an attack" on GWAS data by Homer et al. (2008) Traditional statistical methods for confidentiality and privacy protection of statistical databases do not scale well to deal with GWAS data, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach that provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, Uhler et al. (2013) proposed new methods to release aggregate GWAS data without compromising an individual's privacy. We extend the methods developed in Uhler et al. (2013) for releasing differentially-private χ2\chi^2-statistics by allowing for arbitrary number of cases and controls, and for releasing differentially-private allelic test statistics. We also provide a new interpretation by assuming the controls' data are known, which is a realistic assumption because some GWAS use publicly available data as controls. We assess the performance of the proposed methods through a risk-utility analysis on a real data set consisting of DNA samples collected by the Wellcome Trust Case Control Consortium and compare the methods with the differentially-private release mechanism proposed by Johnson and Shmatikov (2013).Comment: 28 pages, 2 figures, source code available upon reques

    Molecular Genetic Influences on Normative and Problematic Alcohol Use in a Population-Based Sample of College Students

    Get PDF
    Background: Genetic factors impact alcohol use behaviors and these factors may become increasingly evident during emerging adulthood. Examination of the effects of individual variants as well as aggregate genetic variation can clarify mechanisms underlying risk. Methods: We conducted genome-wide association studies (GWAS) in an ethnically diverse sample of college students for three quantitative outcomes including typical monthly alcohol consumption, alcohol problems, and maximum number of drinks in 24 h. Heritability based on common genetic variants (h2SNP) was assessed. We also evaluated whether risk variants in aggregate were associated with alcohol use outcomes in an independent sample of young adults. Results: Two genome-wide significant markers were observed: rs11201929 in GRID1 for maximum drinks in 24 h, with supportive evidence across all ancestry groups; and rs73317305 in SAMD12 (alcohol problems), tested only in the African ancestry group. The h2SNP estimate was 0.19 (SE = 0.11) for consumption, and was non-significant for other outcomes. Genome-wide polygenic scores were significantly associated with alcohol outcomes in an independent sample. Conclusions: These results robustly identify genetic risk for alcohol use outcomes at the variant level and in aggregate. We confirm prior evidence that genetic variation in GRID1impacts alcohol use, and identify novel loci of interest for multiple alcohol outcomes in emerging adults. These findings indicate that genetic variation influencing normative and problematic alcohol use is, to some extent, convergent across ancestry groups. Studying college populations represents a promising avenue by which to obtain large, diverse samples for gene identification

    TrAp: a Tree Approach for Fingerprinting Subclonal Tumor Composition

    Full text link
    Revealing the clonal composition of a single tumor is essential for identifying cell subpopulations with metastatic potential in primary tumors or with resistance to therapies in metastatic tumors. Sequencing technologies provide an overview of an aggregate of numerous cells, rather than subclonal-specific quantification of aberrations such as single nucleotide variants (SNVs). Computational approaches to de-mix a single collective signal from the mixed cell population of a tumor sample into its individual components are currently not available. Herein we propose a framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of the underlying cell subpopulations of a tumor. The method is based on the plausible biological assumption that tumor progression is an evolutionary process where each individual aberration event stems from a unique subclone and is present in all its descendants subclones. We have developed an efficient algorithm (TrAp) for solving this mixture problem. In silico analyses show that TrAp correctly deconvolves mixed subpopulations when the number of subpopulations and the measurement errors are moderate. We demonstrate the applicability of the method using tumor karyotypes and somatic hypermutation datasets. We applied TrAp to SNV frequency profile from Exome-Seq experiment of a renal cell carcinoma tumor sample and compared the mutational profile of the inferred subpopulations to the mutational profiles of twenty single cells of the same tumor. Despite the large experimental noise, specific co-occurring mutations found in clones inferred by TrAp are also present in some of these single cells. Finally, we deconvolve Exome-Seq data from three distinct metastases from different body compartments of one melanoma patient and exhibit the evolutionary relationships of their subpopulations

    WormBase 2012: more genomes, more data, new website

    Get PDF
    Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community
    corecore