53 research outputs found

    Doctor of Philosophy

    Genotype Phenotype Association (GPA) is a means to identify candidate genes and genetic variants that may contribute to phenotypic variation. Technological advances in DNA sequencing continue to improve the efficiency and accuracy of GPA. Currently, High Throughput Sequencing (HTS) is the preferred method for GPA as it is fast and economical. HTS allows for the population-level characterization of genetic variation that GPA studies require. Despite the potential power of using HTS in GPA studies, there are technical hurdles that must be overcome. For instance, the excessive error rate in HTS data and the sheer size of population-level data can hinder GPA studies. To overcome these challenges, I have written two software toolkits for HTS GPA. The first toolkit, GPAT++, is designed to detect GPA using small genetic variants. Unlike previous software, GPAT++'s association test models the inherent errors in HTS, preventing many spurious associations. The second toolkit, Whole Genome Alignment Metrics (WHAM), was designed for GPA using large genetic variants (structural variants). By integrating both structural variant identification and association testing, WHAM can identify shared structural variants associated with a phenotype. Both GPAT++ and WHAM have been successfully applied to real-world GPA studies.

    Discovery and genotyping of structural variation from long-read haploid genome sequence data

    In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels, resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.

    Epistatic and Combinatorial Effects of Pigmentary Gene Mutations in the Domestic Pigeon

    Understanding the molecular basis of phenotypic diversity is a critical challenge in biology, yet we know little about the mechanistic effects of different mutations and epistatic relationships among loci that contribute to complex traits. Pigmentation genetics offers a powerful model for identifying mutations underlying diversity and for determining how additional complexity emerges from interactions among loci. Centuries of artificial selection in domestic rock pigeons (Columba livia) have cultivated tremendous variation in plumage pigmentation through the combined effects of dozens of loci. The dominance and epistatic hierarchies of key loci governing this diversity are known through classical genetic studies [1–6], but their molecular identities and the mechanisms of their genetic interactions remain unknown. Here we identify protein-coding and cis-regulatory mutations in Tyrp1, Sox10, and Slc45a2 that underlie classical color phenotypes of pigeons and present a mechanistic explanation of their dominance and epistatic relationships. We also find unanticipated allelic heterogeneity at Tyrp1 and Sox10, indicating that color variants evolved repeatedly through mutations in the same genes. These results demonstrate how a spectrum of coding and regulatory mutations in a small number of genes can interact to generate substantial phenotypic diversity in a classic Darwinian model of evolution [7].

    Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

    The sequencing and assembly of human genomes using long-read sequencing technologies have revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.

    A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar.

    Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. The VCF format can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformations of files and variants. These tools run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem and we highlight best practices. We briefly discuss the design of VCF, lessons learnt, and how we can address more complex variation, which cannot easily be represented by the VCF format, through pangenome graph formats.

    Antimicrobial Functions of Lactoferrin Promote Genetic Conflicts in Ancient Primates and Modern Humans

    Lactoferrin is a multifunctional mammalian immunity protein that limits microbial growth through sequestration of nutrient iron. Additionally, lactoferrin possesses cationic protein domains that directly bind and inhibit diverse microbes. The implications of these dual functions for lactoferrin evolution and genetic conflicts with microbes remain unclear. Here we show that lactoferrin has been subject to recurrent episodes of positive selection during primate divergence, predominantly at antimicrobial peptide surfaces, consistent with long-term antagonism by bacteria. An abundant lactoferrin polymorphism in human populations and Neanderthals also exhibits signatures of positive selection across primates, linking ancient host-microbe conflicts to modern human genetic variation. Rapidly evolving sites in lactoferrin further correspond to molecular interfaces with opportunistic bacterial pathogens causing meningitis, pneumonia, and sepsis. Because microbes actively target lactoferrin to acquire iron, we propose that the emergence of antimicrobial activity provided a pivotal mechanism of adaptation sparking evolutionary conflicts via acquisition of new protein functions.

    Model of lactoferrin evolution and genetic conflicts with pathogens.

    Following a duplication of the transferrin gene in the ancestor of eutherian mammals, interactions between the transferrin (yellow) C lobe and bacterial transferrin receptors such as TbpA (green) led to the emergence of a molecular arms race. In contrast, while lactoferrin has likely also been engaged in evolutionary conflicts with pathogen iron acquisition receptors like LbpA (purple), the emergence of antimicrobial peptide activity in the N lobe would have provided novel defense activity against pathogens targeting lactoferrin as an iron source. This function would have led to the emergence of pathogen inhibitors of lactoferrin antimicrobial peptide activity (such as PspA or LbpB), which have dominated subsequent evolutionary conflicts localized to the lactoferrin N lobe.