854 research outputs found

    Analysis of genome-wide structure, diversity and fine mapping of Mendelian traits in traditional and village chickens

    Extensive phenotypic variation is a common feature among village chickens found throughout much of the developing world, and in traditional chicken breeds that have been artificially selected for traits such as plumage variety. We present here an assessment of traditional and village chicken populations, for fine mapping of Mendelian traits using genome-wide single-nucleotide polymorphism (SNP) genotyping while providing information on their genetic structure and diversity. Bayesian clustering analysis reveals two main genetic backgrounds in traditional breeds, Kenyan, Ethiopian and Chilean village chickens. Analysis of linkage disequilibrium (LD) reveals useful LD (r² ≥ 0.3) in both traditional and village chickens at pairwise marker distances of ∼10 Kb, while haplotype block analysis indicates a median block size of 11–12 Kb. Association mapping yielded refined mapping intervals for duplex comb (Gga 2:38.55–38.89 Mb) and rose comb (Gga 7:18.41–22.09 Mb) phenotypes in traditional breeds. Combined mapping information from traditional breeds and Chilean village chickens allows the oocyan phenotype to be fine mapped to two small regions (Gga 1:67.25–67.28 Mb, Gga 1:67.28–67.32 Mb) totalling ∼75 Kb. Mapping the unmapped earlobe pigmentation phenotype supports previous findings that the trait is sex-linked and polygenic. A critical assessment of the number of SNPs required to map simple traits indicates that between 90 and 110K SNPs are required for full genome-wide analysis of haplotype block structure/ancestry, and for association mapping in both traditional and village chickens. Our results demonstrate the importance and uniqueness of phenotypic diversity and genetic structure of traditional chicken breeds for fine-scale mapping of Mendelian traits in the species, with village chicken populations providing further opportunities to enhance mapping resolutions.
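
    The LD threshold quoted in this abstract (r² ≥ 0.3) refers to the squared correlation between alleles at two loci. As a rough illustration only, the sketch below computes r² for one pair of biallelic markers from phased haplotypes. The function name `ld_r2` and the 0/1 allele encoding are assumptions made for this example, not the study's pipeline.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation (r^2) between two biallelic loci.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
    where a and b are 0/1 alleles at the first and second locus.
    (Hypothetical input format chosen for this sketch.)
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    linked = [(0, 0), (0, 0), (1, 1), (1, 1)]        # alleles always co-occur
    unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]      # alleles independent
    print(ld_r2(linked), ld_r2(unlinked))
```

    In practice r² is usually estimated from unphased genotypes with dedicated software; the phased version is shown here only because it makes the definition explicit.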

    Read-based Phasing of Related Individuals

    Motivation: Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information—reads and pedigree—has the potential to deliver results better than each individually. Results: We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging reads of related individuals jointly in this way yields more phased variants and at a higher accuracy than when phased separately, both in simulated and real data. Coverages as low as 2× for each member of a trio yield haplotypes that are as accurate as when analyzed separately at 15× coverage per individual. Availability and Implementation: https://bitbucket.org/whatshap/whatshap Contact: [email protected]

    HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data

    As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet computational investigations into polyploid genomes carry great importance, impacting plant, yeast and fish genomics, as well as the studies of the evolution of modern-day eukaryotes and (epi)genetic interactions between copies of genes. In this paper, we describe a novel maximum-likelihood estimation framework, HapTree, for polyploid haplotype assembly of an individual genome using NGS read datasets. We evaluate the performance of HapTree on simulated polyploid sequencing read data modeled after Illumina sequencing technologies. For triploid and higher ploidy genomes, we demonstrate that HapTree substantially improves haplotype assembly accuracy and efficiency over the state-of-the-art; moreover, HapTree is the first scalable polyplotyping method for higher ploidy. As a proof of concept, we also test our method on real sequencing data from NA12878 (1000 Genomes Project) and evaluate the quality of assembled haplotypes with respect to trio-based diplotype annotation as the ground truth. The results indicate that HapTree significantly improves the switch accuracy within phased haplotype blocks as compared to existing haplotype assembly methods, while producing comparable minimum error correction (MEC) values. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5. National Science Foundation (U.S.) (NSF/NIH BIGDATA Grant R01GM108348-01); National Science Foundation (U.S.) (Graduate Research Fellowship); Simons Foundatio
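
    The two evaluation measures named in this abstract, switch accuracy and minimum error correction (MEC), can be illustrated for the simple diploid case. The sketch below is not HapTree's implementation and does not handle polyploids. The function names, the 0/1 allele encoding, and the read format (a dict mapping site index to observed allele) are assumptions made for this example.

```python
def switch_errors(truth, predicted):
    """Count switch errors between a true and a predicted diploid haplotype.

    Both arguments are equal-length sequences of 0/1 alleles over the
    heterozygous sites of one phased block. A switch is counted wherever
    the agreement between the two sequences flips from one site to the next,
    so a prediction that equals the complement of the truth has no switches.
    """
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")
    mismatch = [t != p for t, p in zip(truth, predicted)]
    return sum(1 for i in range(1, len(mismatch)) if mismatch[i] != mismatch[i - 1])


def mec_score(reads, haplotype):
    """MEC score of a diploid phasing for a set of reads.

    reads: list of dicts {site_index: allele}, one per read fragment.
    haplotype: list of 0/1 alleles; the second haplotype is its complement.
    Each read is assigned to whichever haplotype it disagrees with least,
    and the disagreements are summed over all reads.
    """
    complement = [1 - allele for allele in haplotype]
    total = 0
    for read in reads:
        d0 = sum(1 for site, allele in read.items() if haplotype[site] != allele)
        d1 = sum(1 for site, allele in read.items() if complement[site] != allele)
        total += min(d0, d1)
    return total


if __name__ == "__main__":
    print(switch_errors([0, 1, 0, 1, 1], [0, 1, 1, 0, 0]))
    print(mec_score([{0: 0, 1: 1}, {1: 0, 2: 1}, {0: 0, 1: 1, 2: 1}], [0, 1, 0]))
```

    Dividing the switch count by the number of adjacent site pairs gives a switch error rate; switch accuracy is then one minus that rate.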

    Haplotype-aware Diplotyping from Noisy Long Reads

    Islands of linkage in an ocean of pervasive recombination reveals two-speed evolution of human cytomegalovirus genomes

    Human cytomegalovirus (HCMV) infects most of the population worldwide, persisting throughout the host's life in a latent state with periodic episodes of reactivation. While typically asymptomatic, HCMV can cause fatal disease among congenitally infected infants and immunocompromised patients. These clinical issues are compounded by the emergence of antiviral resistance and the absence of an effective vaccine, the development of which is likely complicated by the numerous immune evasins encoded by HCMV to counter the host's adaptive immune responses, a feature that facilitates frequent super-infections. Understanding the evolutionary dynamics of HCMV is essential for the development of effective new drugs and vaccines. By comparing viral genomes from uncultivated or low-passaged clinical samples of diverse origins, we observe evidence of frequent homologous recombination events, both recent and ancient, and no structure of HCMV genetic diversity at the whole-genome scale. Analysis of individual gene-scale loci reveals a striking dichotomy: while most of the genome is highly conserved, recombines essentially freely and has evolved under purifying selection, 21 genes display extreme diversity, structured into distinct genotypes that do not recombine with each other. Most of these hyper-variable genes encode glycoproteins involved in cell entry or escape of host immunity. Evidence that half of them have diverged through episodes of intense positive selection suggests that rapid evolution of hyper-variable loci is likely driven by interactions with host immunity. It appears that this process is enabled by recombination unlinking hyper-variable loci from strongly constrained neighboring sites. 
    It is conceivable that viral mechanisms facilitating super-infection have evolved to promote recombination between diverged genotypes, allowing the virus to continuously diversify at key loci to escape immune detection, while maintaining a genome optimally adapted to its asymptomatic infectious lifecycle.

    VeChat: correcting errors in long reads using variation graphs

    Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in haplotypes of lower frequency in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so they do not mistakenly mask true variants as errors. We present VeChat as an approach to implement this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use open-source tool and publicly available at https://github.com/HaploKit/vechat.

    Haplotype estimation in polyploids using DNA sequence data

    Polyploid organisms possess more than two copies of their core genome and therefore contain k>2 haplotypes for each set of ordered genomic variants. Polyploidy occurs often within the plant kingdom, among others in important crops such as potato (k=4) and wheat (k=6). Current sequencing technologies enable us to read the DNA and detect genomic variants, but cannot distinguish between the copies of the genome, each inherited from one of the parents. To detect inheritance patterns in populations, it is necessary to know the haplotypes, as alleles that are in linkage over the same chromosome tend to be inherited together. In this work, we develop mathematical optimisation algorithms to indirectly estimate haplotypes by looking into overlaps between the sequence reads of an individual, as well as into the expected inheritance of the alleles in a population. These algorithms deal with sequencing errors and random variations in the counts of reads observed from each haplotype. These methods are therefore of high importance for studying the genetics of polyploid crops.

    USE OF MOLECULAR GENETICS TO INVESTIGATE POPULATION STRUCTURE AND SWAYBACK IN HORSES

    The present research incorporated molecular genetic methods to 1) investigate the genetic basis of Juvenile Onset Lordosis or Swayback in the American Saddlebred horses; and 2) conduct a population genetic study to compare the Persian Kurdish, Persian Arabian and American Thoroughbred horse populations. Juvenile-onset lordosis, or swayback, is a condition in horses where the conformational topline back curvature drops significantly within the first two years of life. The trait has a higher prevalence in Saddlebreds (5%). Prior research on them quantified the trait using a Measurement of Back Contour (MBC), defining an MBC of >7.0 centimeters as swayback, and8.0) MBC horses suggested a single recessive variant on chromosome 20 to be associated with the trait. The present research aimed to find the causal mutation on chr20 using Whole-Genome Sequencing, testing a hypothesis that a single recessive variant on chr20 causes high-MBC. Eleven Saddlebreds were Whole-Genome Sequenced in two experiments. Experiment 1 involved 3 high-MBC horses and 3 low-MBC horses with various haplotype structures on chr20. No variants were found on chr20 to support the hypothesis, suggesting more than one major variant to be involved in swayback. Re-evaluation of the association on chr20 was performed via genotyping for tag markers on a chr20 haplotype in 34 high-MBC versus 75 low-MBC Saddlebreds, where a chi-square comparison confirmed that chr20 has a significant impact on high-MBC and that the earlier GWAS association was not a statistical artifact. We then evaluated all the genomic variation in the target region of chr20:41,000,000-44,000,000 to identify the best candidate to influence high-MBC in Saddlebreds. A total of 9,691 variant loci were detected that make 21,463 transcript variations. Of these, 599 made coding sequence variations, including 315 synonymous, 250 missense, 14 frameshift, 9 in-frame deletion, 7 in-frame insertion, and 2 splice-donor and 2 start-loss variants. 
The strongest candidate seems to be a 7 bp frameshift deletion in exon 1 of the MDFI gene, at 20:41873061-41873068. MDFI-knockout mice show defects in the formation of thoracic vertebrae and ribs, which restrains fusion of the spinous processes. The last part of the dissertation characterizes the Persian Kurdish horse population relative to the Persian Arabian and American Thoroughbred populations using genome-wide SNP data. Fifty-eight Kurdish, 38 Persian Arabian and 83 Thoroughbred horses were genotyped across 670,796 markers. The Kurdish horses were generally distinguished from the Persian Arabian and Thoroughbred samples by all analyses, including Principal Component Analyses, cluster analyses and calculation of pairwise FST. These results together identify the Kurdish horse population as having a unique, uniform genetic structure.
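
    The pairwise FST mentioned at the end of this abstract measures how much allele frequencies differ between two populations. The abstract does not say which estimator was used, so the sketch below assumes the basic Wright-style definition at a single biallelic SNP with equal population weights and no sample-size correction. The function name `fst_two_pops` is hypothetical.

```python
def fst_two_pops(p1, p2):
    """Wright-style FST for one biallelic SNP in two populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Populations are weighted equally and no sample-size correction is made
    (a simplifying assumption for this sketch).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population value
    if h_total == 0:
        raise ValueError("FST is undefined when the SNP is fixed in both populations")
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_two_pops(0.2, 0.8))   # strongly differentiated SNP
    print(fst_two_pops(0.3, 0.3))   # identical frequencies
```

    Genome-wide studies typically combine many SNPs with an estimator that corrects for sample size, such as Weir and Cockerham's or Hudson's, rather than averaging this per-SNP quantity.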

    Genome Diversity and the Origin of the Arabian Horse

    The Arabian horse, one of the world's oldest breeds of any domesticated animal, is characterized by natural beauty, graceful movement, athletic endurance, and, as a result of its development in the arid Middle East, the ability to thrive in a hot, dry environment. Here we studied 378 Arabian horses from 12 countries using equine single nucleotide polymorphism (SNP) arrays and whole-genome re-sequencing to examine hypotheses about genomic diversity, population structure, and the relationship of the Arabian to other horse breeds. We identified a high degree of genetic variation and complex ancestry in Arabian horses from the Middle East region. Also, contrary to popular belief, we could detect no significant genomic contribution of the Arabian breed to the Thoroughbred racehorse, including Y chromosome ancestry. However, we found strong evidence for recent interbreeding of Thoroughbreds with Arabians used for flat-racing competitions. Genetic signatures suggestive of selective sweeps across the Arabian breed contain candidate genes for combating oxidative damage during exercise, and within the Straight Egyptian subgroup, for facial morphology. Overall, our data support an origin of the Arabian horse in the Middle East, no evidence for reduced global genetic diversity across the breed, and unique genetic adaptations for both physiology and conformation.