11 research outputs found

    Tools and statistical approaches for integrating DNA sequencing into clinical care

    The discovery of DNA fundamentally changed the world, revolutionizing our understanding of life and the practice of medicine. After a century of studying DNA, medicine entered a new frontier with the completion of the nearly 20-year, billion-dollar effort to sequence the first human genome. We can now sequence a human genome in a matter of days for hundreds (not billions) of dollars. Technological advances and medical geneticists’ robust efforts to interpret human variation have led to exponential growth in clinical sequencing. The medical genetics community currently faces three primary challenges: (1) variant interpretation; (2) overcoming difficult detection problems (e.g., structural variants and low-frequency variants); (3) moving beyond a linear, poorly representative reference genome. The work herein addresses how to overcome two specific detection problems. First, I present a novel approach for detecting exon-level copy number variation using exome sequencing. The vast majority of available sequencing data lacks the power to detect small copy number variants, leading to a significant blind spot in our understanding of genetic variation. I demonstrate how modifying the exome capture step to capture multiple samples simultaneously significantly reduces the inter-sample variance and improves copy number discrimination. I then demonstrate the utility of a novel statistical algorithm designed specifically for multiplexed-capture exome sequencing. Second, I outline the shortcomings of noninvasive exome sequencing in prenatal genetics. Utilizing cell-free fetal DNA in maternal circulation, we can diagnose a wide range of genetic conditions noninvasively. Previous efforts have suggested the possibility of noninvasive fetal genome and exome sequencing. However, to date, no one has demonstrated accurate fetal genotyping purely from cell-free DNA. I use probability theory to demonstrate why these efforts have failed, and suggest a path forward for noninvasive fetal genotyping. Finally, I briefly outline my ongoing work in prenatal genetics and propose a validation study to further interrogate exon-level copy number variation.
    Doctor of Philosophy

    Projected t-SNE for batch correction

    Biomedical research often produces high-dimensional data confounded by batch effects such as systematic experimental variations, different protocols and subject identifiers. Without proper correction, low-dimensional representations of high-dimensional data might encode and reproduce the same systematic variations observed in the original data, and compromise the interpretation of the results. In this article, we propose a novel procedure to remove batch effects from low-dimensional embeddings obtained with t-SNE dimensionality reduction. The proposed methods are based on linear algebra and constrained optimization, leading to efficient algorithms and fast computation in many high-dimensional settings. Results on artificial single-cell transcription profiling data show that the proposed procedure successfully removes multiple batch effects from t-SNE embeddings, while retaining fundamental information on cell types. When applied to single-cell gene expression data to investigate mouse medulloblastoma, the proposed method successfully removes batch effects related to mouse identifiers and the date of the experiment, while preserving clusters of oligodendrocytes, astrocytes, endothelial cells and microglia, which are expected to lie in the stroma within or adjacent to the tumors.
    Comment: 16 pages, 3 figures

    Integrated Model of Chemical Perturbations of a Biological Pathway Using 18 In Vitro High Throughput Screening Assays for the Estrogen Receptor

    We demonstrate a computational network model that integrates 18 in vitro, high-throughput screening assays measuring estrogen receptor (ER) binding, dimerization, chromatin binding, transcriptional activation and ER-dependent cell proliferation. The network model uses activity patterns across the in vitro assays to predict whether a chemical is an ER agonist or antagonist, or is otherwise influencing the assays through a manner dependent on the physics and chemistry of the technology platform (“assay interference”). The method is applied to a library of 1812 commercial and environmental chemicals, including 45 ER positive and negative reference chemicals. Among the reference chemicals, the network model correctly identified the agonists and antagonists with the exception of very weak compounds whose activity was outside the concentration range tested. The model agonist score also correlated with the expected potency class of the active reference chemicals. Of the 1812 chemicals evaluated, 111 (6.1%) were predicted to be strongly ER active in agonist or antagonist mode. This dataset and model were also used to begin a systematic investigation of assay interference. The most prominent cause of false-positive activity (activity in an assay that is likely not due to interaction of the chemical with ER) is cytotoxicity. The model provides the ability to prioritize a large set of important environmental chemicals with human exposure potential for additional in vivo endocrine testing. Finally, this model is generalizable to any molecular pathway for which there are multiple upstream and downstream assays available

    Falciparum malaria from coastal Tanzania and Zanzibar remains highly connected despite effective control efforts on the archipelago

    Background: Tanzania's Zanzibar archipelago has made significant gains in malaria control over the last decade and is a target for malaria elimination. Despite consistent implementation of effective tools since 2002, elimination has not been achieved. Importation of parasites from outside of the archipelago is thought to be an important cause of malaria's persistence, but this paradigm has not been studied using modern genetic tools. Methods: Whole-genome sequencing (WGS) was used to investigate the impact of importation, employing population genetic analyses of Plasmodium falciparum isolates from both the archipelago and mainland Tanzania. Ancestry, levels of genetic diversity and differentiation, patterns of relatedness, and patterns of selection between these two populations were assessed by leveraging recent advances in deconvolution of genomes from polyclonal malaria infections. Results: Significant decreases in the effective population sizes were inferred in both populations that coincide with a period of decreasing malaria transmission in Tanzania. Identity by descent analysis showed that parasites in the two populations shared long segments of their genomes, on the order of 5 cM, suggesting shared ancestry within the last 10 generations. Even with limited sampling, two of isolates between the mainland and Zanzibar were identified that are related at the expected level of half-siblings, consistent with recent importation. Conclusions: These findings suggest that importation plays an important role for malaria incidence on Zanzibar and demonstrate the value of genomic approaches for identifying corridors of parasite movement to the island

    Predictive Endocrine Testing in the 21st Century Using in Vitro Assays of Estrogen Receptor Signaling Responses

    Thousands of environmental chemicals are subject to regulatory review for their potential to be endocrine disruptors (ED). In vitro high-throughput screening (HTS) assays have emerged as a potential tool for prioritizing chemicals for ED-related whole-animal tests. In this study, 1814 chemicals including pesticide active and inert ingredients, industrial chemicals, food additives, and pharmaceuticals were evaluated in a panel of 13 in vitro HTS assays. The panel of in vitro assays interrogated multiple end points related to estrogen receptor (ER) signaling, namely binding, agonist, antagonist, and cell growth responses. The results from the in vitro assays were used to create an ER Interaction Score. For 36 reference chemicals, an ER Interaction Score >0 showed 100% sensitivity and 87.5% specificity for classifying potential ER activity. The magnitude of the ER Interaction Score was significantly related to the potency classification of the reference chemicals (p < 0.0001). ERα/ERβ selectivity was also evaluated, but relatively few chemicals showed significant selectivity for a specific isoform. When applied to a broader set of chemicals with in vivo uterotrophic data, the ER Interaction Scores showed 91% sensitivity and 65% specificity. Overall, this study provides a novel method for combining in vitro concentration response data from multiple assays and, when applied to a large set of ER data, accurately predicted estrogenic responses and demonstrated its utility for chemical prioritization.