Privacy in the Genomic Era
Genome sequencing technology has advanced at a rapid pace and it is now
possible to generate highly-detailed genotypes inexpensively. The collection
and analysis of such data has the potential to support various applications,
including personalized medical services. While the benefits of the genomics
revolution are trumpeted by the biomedical community, the increased
availability of such data has major implications for personal privacy; notably
because the genome has certain essential features, which include (but are not
limited to) (i) an association with traits and certain diseases, (ii)
identification capability (e.g., forensics), and (iii) revelation of family
relationships. Moreover, direct-to-consumer DNA testing increases the
likelihood that genome data will be made available in less regulated
environments, such as the Internet and for-profit companies. The problem of
genome data privacy thus resides at the crossroads of computer science,
medicine, and public policy. While computer scientists have addressed data
privacy for various data types, there has been less attention dedicated to
genomic data. Thus, the goal of this paper is to provide a systematization of
knowledge for the computer science community. In doing so, we address some of
the (sometimes erroneous) beliefs of this field and we report on a survey we
conducted about genome data privacy with biomedical specialists. Then, after
characterizing the genome privacy problem, we review the state-of-the-art
regarding privacy attacks on genomic data and strategies for mitigating such
attacks, as well as contextualizing these attacks from the perspective of
medicine and public policy. This paper concludes with an enumeration of the
challenges for genome data privacy and presents a framework to systematize the
analysis of threats and the design of countermeasures as the field moves
forward
Recommended from our members
Mitigation of off-target toxicity in CRISPR-Cas9 screens for essential non-coding elements.
Pooled CRISPR-Cas9 screens are a powerful method for functionally characterizing regulatory elements in the non-coding genome, but off-target effects in these experiments have not been systematically evaluated. Here, we investigate Cas9, dCas9, and CRISPRi/a off-target activity in screens for essential regulatory elements. The sgRNAs with the largest effects in genome-scale screens for essential CTCF loop anchors in K562 cells were not single guide RNAs (sgRNAs) that disrupted gene expression near the on-target CTCF anchor. Rather, these sgRNAs had high off-target activity that, while only weakly correlated with absolute off-target site number, could be predicted by the recently developed GuideScan specificity score. Screens conducted in parallel with CRISPRi/a, which do not induce double-stranded DNA breaks, revealed that a distinct set of off-targets also cause strong confounding fitness effects with these epigenome-editing tools. Promisingly, filtering of CRISPRi libraries using GuideScan specificity scores removed these confounded sgRNAs and enabled identification of essential regulatory elements
Transcript-indexed ATAC-seq for precision immune profiling.
T cells create vast amounts of diversity in the genes that encode their T cell receptors (TCRs), which enables individual clones to recognize specific peptide-major histocompatibility complex (MHC) ligands. Here we combined sequencing of the TCR-encoding genes with assay for transposase-accessible chromatin with sequencing (ATAC-seq) analysis at the single-cell level to provide information on the TCR specificity and epigenomic state of individual T cells. By using this approach, termed transcript-indexed ATAC-seq (T-ATAC-seq), we identified epigenomic signatures in immortalized leukemic T cells, primary human T cells from healthy volunteers and primary leukemic T cells from patient samples. In peripheral blood CD4+ T cells from healthy individuals, we identified cis and trans regulators of naive and memory T cell states and found substantial heterogeneity in surface-marker-defined T cell populations. In patients with a leukemic form of cutaneous T cell lymphoma, T-ATAC-seq enabled identification of leukemic and nonleukemic regulatory pathways in T cells from the same individual by allowing separation of the signals that arose from the malignant clone from the background T cell noise. Thus, T-ATAC-seq is a new tool that enables analysis of epigenomic landscapes in clonal T cells and should be valuable for studies of T cell malignancy, immunity and immunotherapy
Common DNA sequence variation influences 3-dimensional conformation of the human genome.
BACKGROUND: The 3-dimensional (3D) conformation of chromatin inside the nucleus is integral to a variety of nuclear processes including transcriptional regulation, DNA replication, and DNA damage repair. Aberrations in 3D chromatin conformation have been implicated in developmental abnormalities and cancer. Despite the importance of 3D chromatin conformation to cellular function and human health, little is known about how 3D chromatin conformation varies in the human population, or whether DNA sequence variation between individuals influences 3D chromatin conformation. RESULTS: To address these questions, we perform Hi-C on lymphoblastoid cell lines from 20 individuals. We identify thousands of regions across the genome where 3D chromatin conformation varies between individuals and find that this variation is often accompanied by variation in gene expression, histone modifications, and transcription factor binding. Moreover, we find that DNA sequence variation influences several features of 3D chromatin conformation including loop strength, contact insulation, contact directionality, and density of local cis contacts. We map hundreds of quantitative trait loci associated with 3D chromatin features and find evidence that some of these same variants are associated at modest levels with other molecular phenotypes as well as complex disease risk. CONCLUSION: Our results demonstrate that common DNA sequence variants can influence 3D chromatin conformation, pointing to a more pervasive role for 3D chromatin conformation in human phenotypic variation than previously recognized
Formulating genome-scale kinetic models in the post-genome era.
The biological community is now awash in high-throughput data sets and is grappling with the challenge of integrating disparate data sets. Such integration has taken the form of statistical analysis of large data sets or of bottom-up reconstruction of reaction networks. While progress has been made with statistical and structural methods, large-scale systems have remained refractory to dynamic model building by traditional approaches. The availability of annotated genomes enabled the reconstruction of genome-scale networks, and now the availability of high-throughput metabolomic and fluxomic data along with thermodynamic information opens the possibility to build genome-scale kinetic models. We describe here a framework for building and analyzing such models. The mathematical analysis challenges are reflected in four foundational properties: (i) the decomposition of the Jacobian matrix into chemical, kinetic and thermodynamic information, (ii) the structural similarity between the stoichiometric matrix and the transpose of the gradient matrix, (iii) the duality transformations enabling either fluxes or concentrations to serve as the independent variables, and (iv) the timescale hierarchy in biological networks. Recognition and appreciation of these properties highlight notable and challenging new in silico analysis issues
Scalable Privacy-Preserving Data Sharing Methodology for Genome-Wide Association Studies
The protection of privacy of individual-level information in genome-wide
association study (GWAS) databases has been a major concern of researchers
following the publication of "an attack" on GWAS data by Homer et al. (2008).
Traditional statistical methods for confidentiality and privacy protection of
statistical databases do not scale well to deal with GWAS data, especially in
terms of guarantees regarding protection from linkage to external information.
The more recent concept of differential privacy, introduced by the
cryptographic community, is an approach that provides a rigorous definition of
privacy with meaningful privacy guarantees in the presence of arbitrary
external information, although the guarantees may come at a serious price in
terms of data utility. Building on such notions, Uhler et al. (2013) proposed
new methods to release aggregate GWAS data without compromising an individual's
privacy. We extend the methods developed in Uhler et al. (2013) for releasing
differentially-private χ²-statistics by allowing for an arbitrary number of
cases and controls, and for releasing differentially-private allelic test
statistics. We also provide a new interpretation by assuming the controls' data
are known, which is a realistic assumption because some GWAS use publicly
available data as controls. We assess the performance of the proposed methods
through a risk-utility analysis on a real data set consisting of DNA samples
collected by the Wellcome Trust Case Control Consortium and compare the methods
with the differentially-private release mechanism proposed by Johnson and
Shmatikov (2013). Comment: 28 pages, 2 figures, source code available upon request
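The general idea behind releasing a differentially-private statistic can be sketched with the standard Laplace mechanism. This is a minimal illustration only, not the method of the paper or of Uhler et al. (2013): the function names are hypothetical, and the sensitivity of the statistic is treated as an input that the caller must derive for their own study design, since the abstract does not give it.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent Exponential(1) draws follows a
    Laplace(0, 1) distribution, so scaling it gives Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def release_private(statistics, sensitivity, epsilon, rng=None):
    """Add Laplace noise with scale sensitivity / epsilon to each statistic.

    `sensitivity` is an assumed, caller-supplied bound on how much one
    statistic can change when a single individual's data changes.
    `epsilon` is the privacy budget spent on EACH released statistic.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [s + laplace_noise(scale, rng) for s in statistics]


if __name__ == "__main__":
    # Made-up test statistics for three SNPs, purely for illustration.
    example_statistics = [12.4, 3.1, 27.8]
    noisy = release_private(example_statistics, sensitivity=4.0,
                            epsilon=1.0, rng=random.Random(0))
    print(noisy)
```

Under basic sequential composition, releasing k statistics this way spends a total budget of k times epsilon, which is one reason the utility cost noted in the abstract grows quickly at genome-wide scale.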
Molecular Genetic Influences on Normative and Problematic Alcohol Use in a Population-Based Sample of College Students
Background: Genetic factors impact alcohol use behaviors and these factors may become increasingly evident during emerging adulthood. Examination of the effects of individual variants as well as aggregate genetic variation can clarify mechanisms underlying risk.
Methods: We conducted genome-wide association studies (GWAS) in an ethnically diverse sample of college students for three quantitative outcomes including typical monthly alcohol consumption, alcohol problems, and maximum number of drinks in 24 h. Heritability based on common genetic variants (h2SNP) was assessed. We also evaluated whether risk variants in aggregate were associated with alcohol use outcomes in an independent sample of young adults.
Results: Two genome-wide significant markers were observed: rs11201929 in GRID1 for maximum drinks in 24 h, with supportive evidence across all ancestry groups; and rs73317305 in SAMD12 (alcohol problems), tested only in the African ancestry group. The h2SNP estimate was 0.19 (SE = 0.11) for consumption, and was non-significant for other outcomes. Genome-wide polygenic scores were significantly associated with alcohol outcomes in an independent sample.
Conclusions: These results robustly identify genetic risk for alcohol use outcomes at the variant level and in aggregate. We confirm prior evidence that genetic variation in GRID1 impacts alcohol use, and identify novel loci of interest for multiple alcohol outcomes in emerging adults. These findings indicate that genetic variation influencing normative and problematic alcohol use is, to some extent, convergent across ancestry groups. Studying college populations represents a promising avenue by which to obtain large, diverse samples for gene identification
TrAp: a Tree Approach for Fingerprinting Subclonal Tumor Composition
Revealing the clonal composition of a single tumor is essential for
identifying cell subpopulations with metastatic potential in primary tumors or
with resistance to therapies in metastatic tumors. Sequencing technologies
provide an overview of an aggregate of numerous cells, rather than
subclonal-specific quantification of aberrations such as single nucleotide
variants (SNVs). Computational approaches to de-mix a single collective signal
from the mixed cell population of a tumor sample into its individual components
are currently not available. Herein we propose a framework for deconvolving
data from a single genome-wide experiment to infer the composition, abundance
and evolutionary paths of the underlying cell subpopulations of a tumor. The
method is based on the plausible biological assumption that tumor progression
is an evolutionary process where each individual aberration event stems from a
unique subclone and is present in all its descendant subclones. We have
developed an efficient algorithm (TrAp) for solving this mixture problem. In
silico analyses show that TrAp correctly deconvolves mixed subpopulations when
the number of subpopulations and the measurement errors are moderate. We
demonstrate the applicability of the method using tumor karyotypes and somatic
hypermutation datasets. We applied TrAp to the SNV frequency profile from an Exome-Seq
experiment of a renal cell carcinoma tumor sample and compared the mutational
profile of the inferred subpopulations to the mutational profiles of twenty
single cells of the same tumor. Despite the large experimental noise, specific
co-occurring mutations found in clones inferred by TrAp are also present in
some of these single cells. Finally, we deconvolve Exome-Seq data from three
distinct metastases from different body compartments of one melanoma patient
and exhibit the evolutionary relationships of their subpopulations
WormBase 2012: more genomes, more data, new website
Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community