1,139 research outputs found

    Identification and validation of genetic variants predictive of gait in standardbred horses

    Several horse breeds have been specifically selected for the ability to exhibit alternative patterns of locomotion, or gaits. A premature stop codon in the gene DMRT3 is permissive for “gaitedness” across breeds. However, this mutation is nearly fixed in both American Standardbred trotters and pacers, which perform a diagonal and a lateral gait, respectively, during harness racing. This suggests that modifying alleles must influence the preferred gait at racing speeds in these populations. A genome-wide association analysis for the ability to pace was performed in 542 Standardbred horses (n = 176 pacers, n = 366 trotters) with genotype data imputed to ~74,000 single nucleotide polymorphisms (SNPs). Nineteen SNPs on nine chromosomes (ECA1, 2, 6, 9, 17, 19, 23, 25, 31) reached genome-wide significance (p < 1.44 × 10⁻⁶). Variant discovery in regions of interest was carried out via whole-genome sequencing. A set of 303 variants from 22 chromosomes with putative modifying effects on gait was genotyped in 659 Standardbreds (n = 231 pacers, n = 428 trotters) using a high-throughput assay. Random forest classification analysis resulted in an out-of-bag error rate of 0.61%. A conditional inference tree algorithm containing seven SNPs predicted status as a pacer or trotter with 99.1% accuracy and subsequently performed with 99.4% accuracy in an independently sampled population of 166 Standardbreds (n = 83 pacers, n = 83 trotters). This highly accurate algorithm could be used by owners and trainers to identify Standardbred horses with the potential to race as pacers or as trotters, according to the genotype identified, before training begins, and would enable fine-tuning of breeding programs with designed matings. Additional work is needed to determine both the algorithm’s utility in other gaited breeds and whether any of the predictive SNPs play a physiologically functional role in the tendency to pace or instead tag true functional alleles.

    Enhancing genetic discoveries with population-specific reference panels

    An approach known as the genome-wide association study (GWAS) opened a new era in genetics research around ten years ago, shedding light on the previously largely unknown genetic components underlying complex traits and diseases. Statistical inference methods were key ingredients of this success, allowing researchers to incorporate external data into their studies and so maximize information at no additional experimental cost. Technology has continued to improve: while initially fewer than 1 million points of the DNA (genetic variants) could be assessed in a person, the entire genome (3 billion points) can now be characterized with next-generation sequencing machines. The cost of sequencing is still impractical for GWASs, because several thousand individuals are needed to ensure reproducible findings. With statistical methods, however, full genomes can be inferred if a reduced number of genetic variants is characterized in the study’s volunteers and a reference set of independent genomes is available. An international effort, the 1000 Genomes Project, has generated public reference sets by sequencing ~2,500 representatives of the world’s populations. In this thesis, we evaluated the benefits of a population-specific reference set for Sardinians by sequencing 2,120 volunteers and subsequently incorporating the set in GWASs. We show how the accuracy of inferred genomes improves compared with using the 1000 Genomes set, and we identified novel genetic components for several complex traits that could not have been discovered otherwise. Similar efforts are ongoing in other populations, including the Dutch, and we discuss their design and results in this thesis.
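
The idea of inferring untyped variants from a reference set can be illustrated with a deliberately simplified sketch. It is an assumption-laden toy, not the method used in the thesis: real imputation tools rely on probabilistic haplotype models, whereas the hypothetical `impute_from_panel` function below simply copies missing alleles from the single reference haplotype that agrees best with the typed sites. The panel and study haplotype are made-up data.

```python
from typing import List, Optional

# A haplotype is a list of 0/1 alleles; None marks a site that was not typed.
Haplotype = List[Optional[int]]


def impute_from_panel(study: Haplotype, panel: List[List[int]]) -> List[int]:
    """Fill untyped sites by copying from the best-matching reference haplotype."""
    typed = [i for i, allele in enumerate(study) if allele is not None]

    def mismatches(ref: List[int]) -> int:
        # Count disagreements between a reference haplotype and the typed sites.
        return sum(ref[i] != study[i] for i in typed)

    best = min(panel, key=mismatches)  # ties go to the first panel entry
    return [best[i] if allele is None else allele for i, allele in enumerate(study)]


# Made-up reference panel of three haplotypes over five sites.
panel = [
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]
study = [1, None, 0, 1, None]  # sites 1 and 4 were not typed
print(impute_from_panel(study, panel))
```

In this toy setting, a panel that contains haplotypes close to those of the study population yields better matches, which is the intuition behind a population-specific reference set.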

    Genome-wide Genotype Imputation-Aspects of Quality, Performance and Practical Implementation

    Finding a relation between a particular phenotype and genotype is one of the central themes in medical genetics. Single-nucleotide polymorphisms (SNPs) are easily assessable markers that allow genome-wide association (GWA) studies and meta-analyses. Hundreds of such analyses have been performed in recent decades. Even though several tools for such analyses are available, an efficient SNP-data transformation tool was still needed. We developed a data management tool, fcGENE, which allows easy transformation of genetic data into the different formats required by different GWA tools. Genotype imputation, a common technique in GWA studies, allows us to study the relationship between a phenotype and markers that are missing or even completely untyped. Moreover, this technique helps to infer both common and rare variants that are not directly typed. We studied different aspects of the imputation process, focussing especially on its accuracy. More specifically, our focus lay on the impact of pre-imputation filtering on the accuracy of imputation results. To measure imputation accuracy, we defined two new statistical scores that allow a direct comparison between imputed and true genotypes. This direct comparison showed that strict quality filtering of SNPs prior to imputation may be detrimental. We further studied the impact on imputation quality of differently selected reference panels from publicly available projects such as HapMap and the 1000 Genomes Project. More specifically, we analysed the relationship between the genetic distance of the reference panel and the resulting imputation quality. For this purpose, we considered different summary statistics of population differentiation (e.g. Reich’s, Nei’s and other modified scores) between the study data set and the reference panel used for imputation.
In the third analysis, we compared two basic strategies for using reference panels in imputation: (1) use of the genetically best-matched reference panel, and (2) use of an admixed reference panel that combines individual reference panels from all possible types of populations and lets the software itself select the optimal references, either piece-wise or as complete SNP sequences, for each individual separately. We analysed in detail the performance of different imputation software and the accuracy of imputation in both cases. We found that the current trend of using software with an admixed reference panel in all cases is not always the best strategy. Phasing of study data sets with an external reference panel prior to imputation is also a common practice, especially when imputing large data sets. We studied the performance of different imputation frameworks with and without pre-phasing. It turned out that pre-phasing clearly reduces the imputation quality for medium-sized data sets.
Table of Contents: List of Tables; List of Figures; 1 Overview of the Thesis (1.1 Abstract; 1.2 Outlines); 2 Introduction (2.1 Basics of genetics: 2.1.1 Phenotype, genotype and haplotype, 2.1.2 Hardy-Weinberg law, 2.1.3 Linkage disequilibrium, 2.1.4 Genome-wide association analysis; 2.2 Phasing of Genotypes; 2.3 Genotype imputation: 2.3.1 Tools for Imputing genotype data, 2.3.2 Reference panels); 3 Results (3.1 Detailed Abstracts: 3.1.1 First Research Paper, 3.1.2 Second Research Paper, 3.1.3 Third Research Paper, 3.1.4 Fourth Research Paper; 3.2 Discussion and Conclusion); 4 Published Articles (4.1 First Research Paper, 4.2 Second Research Paper, 4.3 Third Research Paper, 4.4 Fourth Research Paper, each with Supplementary Information); 5 Summary of the Thesis; 6 Bibliography; 7 Own Publications; 8 Statement of Own Contribution (8.1 First Research Paper; 8.2 Second Research Paper; 8.3 Third Research Paper; 8.4 Fourth Research Paper); 9 Declaration of Independent Authorship; 10 Acknowledgements; 11 Curriculum Vitae
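
A direct comparison between imputed and true genotypes can be sketched as follows. The abstract does not define the two new scores developed in the thesis, so this example assumes two widely used stand-ins instead: genotype concordance and the squared Pearson correlation between true genotypes and imputed dosages. The function names and the data are hypothetical.

```python
from typing import Sequence


def concordance(true_gt: Sequence[int], imputed_gt: Sequence[int]) -> float:
    """Fraction of samples whose best-guess imputed genotype equals the truth."""
    matches = sum(t == g for t, g in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def dosage_r2(true_gt: Sequence[int], dosage: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes (0/1/2) and dosages."""
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_d = sum(dosage) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_gt, dosage))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return float("nan")  # undefined for monomorphic markers
    return cov * cov / (var_t * var_d)


# Made-up genotypes for one SNP in six samples (allele counts 0, 1 or 2).
true_gt = [0, 1, 2, 1, 0, 2]
best_guess = [0, 1, 2, 0, 0, 2]
dosages = [0.1, 0.9, 1.8, 0.4, 0.0, 2.0]

print(concordance(true_gt, best_guess))
print(dosage_r2(true_gt, dosages))
```

Concordance tends to look optimistic for rare variants, because always guessing the major homozygote already scores well; a correlation-based measure is less affected by this, which is one reason both are commonly reported together.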

    Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants

    Imputation of high-density genotypes to whole-genome sequences (WGS) is a cost-effective method to increase the density of available markers within a population. Imputed genotypes have been successfully used for genomic selection and discovery of variants associated with traits of interest for the population. To allow for the use of imputed genotypes for genomic analyses, accuracy of imputation must be high. Accuracy of imputation is influenced by multiple factors, such as size and composition of the reference group, and the allele frequency of variants included. Understanding the use of imputed WGSs prior to the generation of the reference population is important, as accurate imputation might be more focused, for instance, on common or on rare variants. The aim of this study was to present and evaluate new methods to select animals for sequencing relying on a previously genotyped population. The Genetic Diversity Index method optimizes the number of unique haplotypes in the future reference population, while the Highly Segregating Haplotype selection method targets haplotype alleles found throughout the majority of the population of interest. First the WGSs of a dairy cattle population were simulated. The simulated sequences mimicked the linkage disequilibrium level and the variants’ frequency distribution observed in currently available Holstein sequences. Then, reference populations of different sizes, in which animals were selected using both novel methods proposed here as well as two other methods presented in previous studies, were created. Finally, accuracies of imputation obtained with different reference populations were compared against each other. The novel methods were found to have overall accuracies of imputation of more than 0.85. Accuracies of imputation of rare variants reached values above 0.50. 
In conclusion, if imputed sequences are to be used to discover novel associations between variants and traits of interest in the population, animals carrying novel information should be selected, and the Genetic Diversity Index method proposed here may therefore be used. If sequences are to be used to impute the overall genotyped population, a reference population consisting of carriers of common haplotypes, selected using the proposed Highly Segregating Haplotype method, is recommended.
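
The goal of maximizing the number of unique haplotypes in a reference population can be illustrated with a greedy sketch. This is not the Genetic Diversity Index method itself, whose details the abstract does not give; it is a plain greedy coverage heuristic under the assumption that each candidate animal is described by the set of haplotype alleles it carries. The function name, animal IDs, and haplotype labels are hypothetical.

```python
from typing import Dict, List, Set


def select_for_diversity(carriers: Dict[str, Set[str]], n_select: int) -> List[str]:
    """Greedily pick animals that each add the most not-yet-covered haplotypes."""
    chosen: List[str] = []
    covered: Set[str] = set()
    candidates = dict(carriers)  # shallow copy so the caller's dict is untouched
    for _ in range(min(n_select, len(candidates))):
        # Sorting the IDs makes ties deterministic (first ID alphabetically wins).
        best = max(sorted(candidates), key=lambda a: len(candidates[a] - covered))
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen


# Made-up haplotype carriage for four genotyped animals.
animals = {
    "A": {"h1", "h2"},
    "B": {"h2", "h3", "h4"},
    "C": {"h1", "h2"},
    "D": {"h5", "h6"},
}
print(select_for_diversity(animals, 2))
```

A selection aimed at common haplotypes would instead rank animals by how frequent their haplotypes are in the genotyped population, which mirrors the contrast the abstract draws between the two proposed methods.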