635 research outputs found

    A cautionary tale of low-pass sequencing and imputation with respect to haplotype accuracy

    BACKGROUND: Low-pass whole-genome sequencing and imputation offer significant cost savings, enabling substantial increases in sample size and statistical power. This approach is particularly promising in livestock breeding, providing an affordable means of screening individuals for deleterious alleles or calculating genomic breeding values. Consequently, it may also be of value in companion animal genomics to support pedigree breeding. We sought to evaluate in dogs the impact of low-coverage sequencing and reference-guided imputation on genotype concordance and association analyses.
    RESULTS: DNA isolated from the saliva of 30 Labrador retrievers was sequenced at low (0.9X and 3.8X) and high (43.5X) coverage, and the 43.5X data were down-sampled to 9.6X and 17.4X. Genotype imputation was performed using a diverse reference panel (1021 dogs) and two subsets of that panel (256 dogs each), one of which had an excess of Labrador retrievers relative to other breeds. We observed little difference in imputed genotype concordance between reference panels. Association analyses for a locus acting as a disease proxy were performed using single-marker (GEMMA) and haplotype-based (XP-EHH) tests. GEMMA results were highly correlated (r ≥ 0.97) between 43.5X and depths of coverage ≥ 3.8X, while for 0.9X the correlation was lower (r ≤ 0.8). XP-EHH results were less well correlated, with r ranging from 0.58 (0.9X) to 0.88 (17.4X). Across a random sample of 10,000 genomic regions averaging 17 kb in size, we observed a median of three haplotypes per dog across the sequencing depths, with 5% of the regions returning more than eight haplotypes. Inspection of one such region revealed genotype and phasing inconsistencies across sequencing depths.
    CONCLUSIONS: We demonstrate that saliva-derived canine DNA is suitable for whole-genome sequencing, highlighting the feasibility of client-based sampling. Low-pass sequencing and imputation require caution, because incorrect alleles are assigned when the subject carries alleles that are absent from the reference panel. Larger panels have the capacity for greater allelic diversity, which should reduce the potential for imputation error. Although low-pass sequencing can accurately impute allele dosage, we highlight issues with phasing accuracy that affect haplotype-based analyses. Consequently, if accurately phased genotypes are required for analyses, we advocate sequencing at high depth (> 20X).
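The concordance, correlation and haplotype-count measures in this abstract can be sketched in plain Python. This is a minimal illustration, not the study's pipeline: the function names, the 0/1/2 alt-allele dosage encoding, and the reading of "haplotypes per dog across the sequencing depths" as the number of distinct haplotypes one dog shows when its phased calls from every depth are pooled are all assumptions.

```python
from math import sqrt, nan


def genotype_concordance(truth, test):
    """Fraction of sites where two dosage calls (0, 1 or 2 alt alleles) agree.

    None marks a missing call; sites missing in either vector are skipped.
    """
    pairs = [(a, b) for a, b in zip(truth, test) if a is not None and b is not None]
    if not pairs:
        return nan
    return sum(a == b for a, b in pairs) / len(pairs)


def pearson_r(x, y):
    """Pearson correlation, e.g. between per-SNP test statistics at two depths."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan  # undefined when either vector is constant
    return sxy / sqrt(sxx * syy)


def distinct_haplotypes_across_depths(phased_by_depth):
    """Count distinct haplotypes one dog carries in one region across depths.

    phased_by_depth maps a depth label to that depth's (hap1, hap2) pair,
    each haplotype a string of alleles over the region's sites. A diploid dog
    has at most two true haplotypes, so a count above two signals genotype or
    phasing inconsistencies between depths.
    """
    return len({hap for pair in phased_by_depth.values() for hap in pair})


print(genotype_concordance([0, 1, 2, 1, 0], [0, 1, 1, 1, None]))  # 0.75
calls = {"43.5X": ("0101", "1100"), "17.4X": ("1100", "0101"), "0.9X": ("0111", "1000")}
print(distinct_haplotypes_across_depths(calls))  # 4
```

Swapping the order of a dog's two haplotypes does not inflate the count, because the pooled haplotypes go into a set; only calls that actually differ in allele content do.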

    Computational pan-genomics: status, promises and challenges

    Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from representing reference genomes as strings to representing them as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
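The shift from string to graph references can be illustrated with a toy variation graph. Nodes carry sequence fragments, edges join fragments that may be adjacent, and each source-to-sink path spells one haplotype, so a linear reference is the special case with a single path. The class and method names below are hypothetical, and the sketch assumes an acyclic graph. It is not a representation taken from the article or from any pan-genome toolkit.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Toy sequence graph (hypothetical): node id -> sequence, plus directed edges."""
    seqs: dict[str, str] = field(default_factory=dict)
    edges: dict[str, list[str]] = field(default_factory=dict)

    def add_node(self, nid: str, seq: str) -> None:
        self.seqs[nid] = seq
        self.edges.setdefault(nid, [])

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)

    def spell_paths(self, start: str, end: str) -> list[str]:
        """Return the sequence spelled by every start-to-end path (graph must be acyclic)."""
        spelled: list[str] = []

        def walk(node: str, prefix: str) -> None:
            prefix += self.seqs[node]
            if node == end:
                spelled.append(prefix)
                return
            for nxt in self.edges[node]:
                walk(nxt, prefix)

        walk(start, "")
        return spelled


# Reference ACGTTA with a G/C SNP at the third base and a deletion of that base.
g = VariationGraph()
for nid, seq in [("n1", "AC"), ("n2", "G"), ("n3", "C"), ("n4", "TT"), ("n5", "A")]:
    g.add_node(nid, seq)
for src, dst in [("n1", "n2"), ("n1", "n3"), ("n1", "n4"), ("n2", "n4"), ("n3", "n4"), ("n4", "n5")]:
    g.add_edge(src, dst)
print(g.spell_paths("n1", "n5"))  # ['ACGTTA', 'ACCTTA', 'ACTTA']
```

Three sequences share one structure here, whereas a string-based reference would need either a single consensus (losing the variants) or three separate copies of the shared flanks.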

    Demographic and Population Separation History Inference Based on Whole Genome Sequences.

    Patterns of DNA sequence variation among present-day individuals contain rich information about past population history. The recent availability of whole genome sequences provides challenges and opportunities for developing computational methods to infer detailed models of population history. The goal of this thesis is to extend current methodology and apply available techniques to answer questions about population history in human, gorilla and canine species. Recent methodologies based on the sequentially Markovian coalescent model permit the inference of population history using single or several whole genome sequences. However, these approaches fail to generate parametric estimates for split times, which are confounded by subsequent migration. Additionally, the effect of switch errors resulting from statistical phasing on split time estimation is largely unknown. We reconstructed phased haplotypes of nine individuals from diverse populations using fosmid pool sequencing. We analyzed population size and separation history using the Pairwise Sequentially Markovian Coalescent model (PSMC) and the Multiple Sequentially Markovian Coalescent model (MSMC), and found that, owing to switch errors, applying MSMC to statistically phased haplotypes yields more recent split time estimates than applying it to physically phased haplotypes. We further extended PSMC with Approximate Bayesian Computation to infer split time and migration rates under a standard isolation-with-migration model. We dated several key events in human separation history using these methods. Gorillas are humans' closest living relatives other than chimpanzees. We analyzed whole genome sequencing data from thirteen gorilla individuals and applied GPhoCS, a Bayesian coalescent-based approach, to infer ancestral population sizes, divergence times and migration rates among three gorilla subspecies, shedding light on the evolutionary forces that have uniquely influenced patterns of gorilla genetic variation.
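A switch error is a point where an estimated haplotype stops following one of the true haplotypes and jumps to the other. The function below is a minimal sketch of the usual way to count switch errors against a trusted phasing, with the trusted phasing standing in for the fosmid-based physical phasing described above. The function name and input encoding are illustrative assumptions, not code from the thesis.

```python
def switch_errors(true_hap, est_hap):
    """Count switch errors between a reference phasing and an estimated one.

    Both arguments list one haplotype's alleles (0/1) at the same ordered
    heterozygous sites. At a heterozygous site the second haplotype is the
    complement of the first, so one strand per phasing is enough.
    Returns (number of switches, switch error rate over adjacent site pairs).
    """
    if len(true_hap) != len(est_hap):
        raise ValueError("phasings must cover the same heterozygous sites")
    # True where the estimate follows the same strand as the reference phasing.
    same_strand = [t == e for t, e in zip(true_hap, est_hap)]
    switches = sum(a != b for a, b in zip(same_strand, same_strand[1:]))
    pairs = len(same_strand) - 1
    return switches, (switches / pairs if pairs > 0 else 0.0)


physical = [0, 1, 1, 0, 1, 0]     # e.g. from fosmid-pool phasing
statistical = [0, 1, 0, 1, 0, 0]  # e.g. from statistical phasing
print(switch_errors(physical, statistical))  # (2, 0.4)
```

Under this definition, a flip confined to a single interior site counts as two switches, and an estimate that is globally inverted relative to the reference counts as zero, because only changes of strand between adjacent sites are scored.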
The origins and dynamics of dog domestication have long been a controversial and intriguing problem. We analyzed two ancient dog genomes from the Neolithic and over 100 contemporary canine genomes. Although both ancient dogs show signatures of admixture, they predominantly share ancestry with modern European dogs, contradicting the late Neolithic population replacement suggested by mitochondrial studies. By calibrating the mutation rate using our oldest dog, we narrowed the timing of dog domestication to a window of 20-40 kyr ago.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133341/1/songsy_1.pd
