116 research outputs found

    Haplotype estimation in polyploids using DNA sequence data

    Polyploid organisms possess more than two copies of their core genome and therefore contain k>2 haplotypes for each set of ordered genomic variants. Polyploidy is common in the plant kingdom, including in important crops such as potato (k=4) and wheat (k=6). Current sequencing technologies enable us to read the DNA and detect genomic variants, but cannot distinguish between the copies of the genome, each inherited from one of the parents. To detect inheritance patterns in populations, it is necessary to know the haplotypes, as alleles that are linked on the same chromosome tend to be inherited together. In this work, we develop mathematical optimisation algorithms to indirectly estimate haplotypes by looking into overlaps between the sequence reads of an individual, as well as into the expected inheritance of the alleles in a population. These algorithms deal with sequencing errors and random variations in the counts of reads observed from each haplotype. These methods are therefore of high importance for studying the genetics of polyploid crops.

    Haplotype-aware Diplotyping from Noisy Long Reads


    Genome Diversity and the Origin of the Arabian Horse

    The Arabian horse, one of the world's oldest breeds of any domesticated animal, is characterized by natural beauty, graceful movement, athletic endurance, and, as a result of its development in the arid Middle East, the ability to thrive in a hot, dry environment. Here we studied 378 Arabian horses from 12 countries using equine single nucleotide polymorphism (SNP) arrays and whole-genome re-sequencing to examine hypotheses about genomic diversity, population structure, and the relationship of the Arabian to other horse breeds. We identified a high degree of genetic variation and complex ancestry in Arabian horses from the Middle East region. Also, contrary to popular belief, we could detect no significant genomic contribution of the Arabian breed to the Thoroughbred racehorse, including Y chromosome ancestry. However, we found strong evidence for recent interbreeding of Thoroughbreds with Arabians used for flat-racing competitions. Genetic signatures suggestive of selective sweeps across the Arabian breed contain candidate genes for combating oxidative damage during exercise, and within the Straight Egyptian subgroup, for facial morphology. Overall, our data support an origin of the Arabian horse in the Middle East, provide no evidence of reduced global genetic diversity across the breed, and point to unique genetic adaptations for both physiology and conformation.

    Computational haplotyping : theory and practice

    Genomics has paved a new way to comprehend life and its evolution, and also to investigate causes of diseases and their treatment. One of the important problems in genomic analyses is haplotype assembly. Constructing complete and accurate haplotypes plays an essential role in understanding population genetics and how species evolve. In this thesis, we focus on computational approaches to haplotype assembly from third-generation sequencing technologies. This involves huge amounts of sequencing data, and such data contain errors due to the single-molecule sequencing protocols employed. Combinatorial formulations help to correct these errors and so solve the haplotyping problem. Various computational techniques such as dynamic programming, parameterized algorithms, and graph algorithms are used to solve this problem. This thesis presents several contributions to the area of haplotyping. First, a novel algorithm based on dynamic programming is proposed to provide approximation guarantees for phasing a single individual. Second, an integrative approach is introduced for combining multiple sequencing datasets to generate complete and accurate haplotypes. The effectiveness of this integrative approach is demonstrated on a real human genome. Third, we provide a novel efficient approach to phasing pedigrees and demonstrate its advantages in comparison to phasing a single individual. Fourth, we present a generalized graph-based framework for performing haplotype-aware de novo assembly. Specifically, this generalized framework consists of a hybrid pipeline for generating accurate and complete haplotypes from data stemming from multiple sequencing technologies, one that provides accurate reads and another that provides long reads.
Genomics has opened new avenues for understanding the evolution of living organisms, for investigating the causes of numerous diseases, and for developing new therapies. An important problem is the assembly of an individual's haplotypes. This reconstruction of haplotypes plays a central role in understanding population genetics and the evolution of a species. This thesis presents algorithms for haplotype assembly based on third-generation sequencing data. This requires large amounts of data, which in turn contain errors introduced by the underlying sequencing protocols. Combinatorial formulations of the problem nevertheless make haplotype reconstruction possible, because errors can be corrected successfully. Various computational methods, such as dynamic programming, parameterized algorithms, and graph algorithms, can be used to solve this problem. This thesis presents several approaches to haplotype reconstruction. First, a novel algorithm is presented that, based on the principle of dynamic programming, provides approximation guarantees for haplotyping a single individual. Second, an integrative approach is presented for combining multiple sequencing datasets and thereby generating accurate haplotypes. The effectiveness of this method is demonstrated on a real human dataset. Third, a new, efficient algorithm is described for simultaneously constructing the haplotypes of related individuals, and its advantages over considering individuals separately are shown. Fourth, we present a graph-based method for performing de novo assembly using haplotype information. This method combines data from different sequencing technologies that deliver either accurate or long sequencing reads.

    Benchmarking phasing software with a whole-genome sequenced cattle pedigree.

    Peer reviewed. BACKGROUND: Accurate haplotype reconstruction is required in many applications in quantitative and population genomics. Different phasing methods are available, but their accuracy must be evaluated for samples with different properties (population structure, marker density, etc.). We took advantage of whole-genome sequence data available for a Holstein cattle pedigree containing 264 individuals, including 98 trios, to evaluate several population-based phasing methods. These data represent a typical example of a livestock population, with low effective population size, high levels of relatedness and long-range linkage disequilibrium. RESULTS: After stringent filtering of our sequence data, we evaluated several population-based phasing programs including one or more versions of AlphaPhase, ShapeIT, Beagle, Eagle and FImpute. To that end we used 98 individuals having both parents sequenced for validation. Their haplotypes, reconstructed based on Mendelian segregation rules, were considered the gold standard to assess the performance of population-based methods in two scenarios. In the first one, only these 98 individuals were phased, while in the second one, all 264 sequenced individuals were phased simultaneously, ignoring the pedigree relationships. We assessed phasing accuracy based on switch error counts (SEC) and rates (SER), lengths of correctly phased haplotypes, and the probability that there is no phasing error between a pair of SNPs as a function of their distance. For most evaluated metrics or scenarios, the best software was either ShapeIT4.1 or Beagle5.2, both methods resulting in particularly high phasing accuracies. For instance, ShapeIT4.1 achieved a median SEC of 50 per individual and a mean haplotype block length of 24.1 Mb (scenario 2). These statistics are remarkable since the methods were evaluated with a map of 8,400,000 SNPs, and this corresponds to only one switch error every 40,000 phased informative markers. When more relatives were included in the data (scenario 2), FImpute3.0 reconstructed extremely long segments without errors. CONCLUSIONS: We report extremely high phasing accuracies in a typical livestock sample. ShapeIT4.1 and Beagle5.2 proved to be the most accurate, particularly for phasing long segments and in the first scenario. Nevertheless, most tools achieved high accuracy at short distances and would be suitable for applications requiring only local haplotypes.
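    As a rough illustration of the switch error metrics named in this abstract, the sketch below counts switch errors between a gold-standard phasing and an inferred one. The function names, the 0/1 allele encoding, and the restriction to heterozygous sites are assumptions made for this example; it is not the evaluation code used in the benchmark.

```python
def switch_errors(truth, inferred):
    """Count switch errors between two phasings of the same heterozygous sites.

    Each argument lists the allele (0 or 1) carried by the first haplotype at
    consecutive heterozygous sites; the second haplotype is the complement.
    """
    if len(truth) != len(inferred):
        raise ValueError("both phasings must cover the same sites")
    # True where inferred haplotype 1 agrees with gold-standard haplotype 1.
    orientation = [t == p for t, p in zip(truth, inferred)]
    # A switch error is a change of orientation between adjacent sites.
    return sum(a != b for a, b in zip(orientation, orientation[1:]))


def switch_error_rate(truth, inferred):
    """Switch errors divided by the number of adjacent site pairs."""
    pairs = len(truth) - 1
    if pairs < 1:
        return 0.0
    return switch_errors(truth, inferred) / pairs


if __name__ == "__main__":
    gold = [0, 1, 0, 1, 1, 0]
    call = [0, 1, 1, 0, 0, 0]
    print(switch_errors(gold, call))      # 2
    print(switch_error_rate(gold, call))  # 0.4
```

    A phasing that is the exact complement of the gold standard yields zero switch errors here, because the labels "haplotype 1" and "haplotype 2" are arbitrary and only changes of orientation along the chromosome are counted.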

    Direct chromosome-length haplotyping by single-cell sequencing

    Haplotypes are fundamental to fully characterizing the diploid genome of an individual, yet methods to directly chart the unique genetic makeup of each parental chromosome are lacking. Here we introduce single-cell DNA template strand sequencing (Strand-seq) as a novel approach to phasing diploid genomes along the entire length of all chromosomes. We demonstrate this by building a complete haplotype for a HapMap individual (NA12878) at high accuracy (concordance 99.3%), without using generational information or statistical inference. Using this approach, we mapped all meiotic recombination events in a family trio with high resolution (median range ∼14 kb) and phased larger structural variants like deletions, indels, and balanced rearrangements like inversions. Lastly, the single-cell resolution of Strand-seq allowed us to observe loss of heterozygosity regions in a small number of cells, a significant advantage for studies of heterogeneous cell populations, such as cancer cells. We conclude that Strand-seq is a unique and powerful approach to completely phase individual genomes and map inheritance patterns in families, while preserving haplotype differences between single cells.

    Genomic insights into fine-scale recombination variation in adaptively diverging threespine stickleback fish (Gasterosteus aculeatus)

    Meiotic recombination is one of the major molecular mechanisms generating genetic diversity and influencing genome evolution. By shuffling allelic combinations, it can directly influence the patterns and efficacy of natural selection. Studies in various organisms have shown that the rate and placement of recombination vary substantially within the genome, among individuals, between sexes and among different species. It is hypothesized that this variation plays an important role in genome evolution. In this PhD thesis, I investigated the extent and molecular basis of recombination variation in adaptively diverging threespine stickleback fish (Gasterosteus aculeatus) to further understand its evolutionary implications. I used both ChIP-sequencing and whole genome sequencing of pedigrees to empirically identify and quantify double strand breaks (DSBs) and meiotic crossovers (COs). Whole genome sequencing of large nuclear families was performed to identify meiotic crossovers in 36 individuals of diverging marine and freshwater ecotypes and their hybrids. This produced the first genome-wide high-resolution sex-specific and ecotype-specific map of contemporary recombination events in sticklebacks. The results show striking differences in crossover number and placement between sexes. Females recombine nearly 1.76 times more than males, and their COs are distributed all over the chromosome, while male COs predominantly occur near the chromosomal periphery. When compared among ecotypes, a significant reduction in overall recombination rate was observed in hybrid females compared to pure forms. Even though the known loci underlying marine-freshwater adaptive divergence tend to fall in regions of low recombination, considerable female recombination is observed in the regions between adaptive loci. This suggests that the sexual dimorphism in recombination phenotype may have important evolutionary implications.
At the fine scale, COs and male DSBs are nonrandomly distributed, involving ‘semi-hot’ hotspots and coldspots of recombination. I report a significant association of male DSBs and COs with functionally active open chromatin regions like gene promoters, whereas female COs did not show an association beyond that expected by chance. However, a considerable number of COs and DSBs lie away from any of the tested open chromatin marks, which suggests the possibility of additional novel mechanisms of recombination regulation in sticklebacks. In addition, we developed a novel method for constructing individualized recombination maps from pooled gamete DNA using linked read sequencing technology by 10X Genomics®. We tested the method by contrasting recombination profiles of gametic and somatic tissue from a hybrid mouse and stickleback fish. Our pipeline faithfully detects previously described recombination hotspots in mice at high resolution and identifies many novel hotspots across the genome in both species, thereby demonstrating the efficiency of the novel method. This method could be employed for large-scale QTL mapping studies to further understand the genetic basis of the recombination variation reported in this thesis. By bridging the gap between natural populations and lab organisms with large clutch sizes and tractable genetic tools, this work shows the utility of the stickleback system and provides important groundwork for further studies of heterochiasmy and divergence in recombination during adaptation to differing environments.