
    PAG XXVI - HapDab: A haplotype map and imputation resource in the horse

    The recently developed equine genotyping array (MNEc670k) was designed for genotype imputation. Variants (SNPs) on the array were chosen to differentiate among patterns of linkage disequilibrium (LD) between diverse horse populations. This enables a smaller set of SNPs (~670k) to be imputed to a higher density (~2M) using haplotypes characterized in a reference population. In addition, LD decays at different rates throughout the genome as well as between breeds, resulting in haplotypes of differing lengths. This matters for association studies, where the genotyped markers are unlikely to be functional themselves and are instead in LD with the causal variant.
    Using a reference population of 485 horses representing 15 breeds, we calculated breed-specific haplotypes throughout the genome around each of 2M core SNPs. We cataloged these haplotypes, along with LD decay, in a relational database to enable data-driven SNP-to-gene mapping for association studies. Using the same reference population, we have implemented an equine imputation resource: input samples genotyped on the MNEc670k array (or previous-generation arrays) are phased and phylogenetically clustered to determine an imputation population.
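    The idea of an LD-defined haplotype window around a core SNP can be illustrated with a short Python sketch. It is not HapDab's implementation, whose details are not given here; it assumes phased haplotypes from a single breed coded as 0/1 alleles, measures LD as r² between the core SNP and each neighbour, and grows the window outward until r² drops below a hypothetical cutoff (`min_r2=0.2`). The function names, gene names, and coordinates are illustrative only.

```python
def ld_r2(haps, i, j):
    """r^2 between SNPs i and j, estimated from phased haplotypes coded 0/1."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n
    p_j = sum(h[j] for h in haps) / n
    p_ij = sum(h[i] * h[j] for h in haps) / n  # frequency of the 1-1 haplotype
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    return d * d / denom


def ld_window(haps, positions, core, min_r2=0.2):
    """Grow a window from the core SNP until r^2 with the core falls below min_r2."""
    left = core
    while left > 0 and ld_r2(haps, core, left - 1) >= min_r2:
        left -= 1
    right = core
    while right < len(positions) - 1 and ld_r2(haps, core, right + 1) >= min_r2:
        right += 1
    return positions[left], positions[right]


def genes_in_window(genes, window):
    """Names of genes, given as (name, start, end), that overlap the window."""
    start, end = window
    return [name for name, g_start, g_end in genes if g_start <= end and g_end >= start]


# Six phased haplotypes at five SNPs; SNP 2 (position 3000) is the core SNP.
haps = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
]
positions = [1000, 2000, 3000, 4000, 5000]
genes = [("GENE_A", 500, 1500), ("GENE_B", 1800, 2500), ("GENE_C", 3900, 6000)]

window = ld_window(haps, positions, core=2)
print(window, genes_in_window(genes, window))  # (2000, 4000) ['GENE_B', 'GENE_C']
```

    Because LD decays at different rates between breeds, running the same core SNP against each breed's haplotypes would generally yield different windows, and therefore different candidate genes.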

    ISAG - Design and use of the MNEc670k SNP array for precision SNP imputation to millions of markers in 15 horse breeds

    Single nucleotide polymorphism (SNP) genotyping arrays containing 54 to 74 thousand (K) markers for the horse have enabled genome-wide association studies of disease and performance traits, as well as quantitation of variation within and between populations. We recently designed the MNEc670k array for denser genotyping as well as for genotype imputation, the statistical inference of sample haplotypes at higher density from a smaller set of markers. As part of this design, a cohort of 485 horses with either 2 million (M) SNP genotypes (n = 332) or whole-genome sequence (WGS; n = 153) was used to select “tagging SNPs” informative for differentiating haplotypes both across and within 15 breed tagging groups. Across all breed tagging groups, 355,903 SNPs were needed to reconstruct haplotypes with minor allele frequency (MAF) > 0.01 at r² > 0.99. Within each of the 15 breed tagging groups, between 144,175 and 387,279 SNPs were required for haplotype reconstruction. All SNPs that were informative across breed groups, as well as SNPs informative in five or more breed tagging groups, were included on the MNEc670k SNP array.
    The performance of the MNEc670k array for SNP imputation was assessed in several scenarios. In a random one-third subset of individuals in each of the 15 breed tagging groups, the genotypes of horses with 2M or WGS data were masked down to MNEc670k density and to legacy 54/75K array density. After the imputation targets were removed, the remaining horses were used as the reference population. Imputation concordance from 54/65K SNPs to 2M SNPs ranged from 82% to 96% depending on breed group, whereas concordance from 670K to 2M SNPs ranged from 97% to 99%. Imputation from 670K to 14M SNPs (WGS) was assessed in a cohort of 38 Standardbred and 20 Thoroughbred horses, yielding concordances of 96% and 97%, respectively.
    We also report the gains in imputation accuracy from breed-specific haplotype and recombination maps, which improve SNP accuracy in scenarios where breed-specific parameters can be reliably estimated. A preprint describing this work in more detail is available at https://doi.org/10.1101/112979
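    Two steps in this design lend themselves to a short sketch: choosing tag SNPs so that every candidate SNP is captured at a given r² threshold, and scoring imputation by genotype concordance. The Python below uses a generic greedy pairwise tagging heuristic, which may differ from the procedure actually used for MNEc670k; the function names, r² matrix, and genotype calls are hypothetical.

```python
def greedy_tag_snps(r2, threshold=0.99):
    """Greedy pairwise tag-SNP selection.

    r2[i][j] is the LD (r^2) between candidate SNPs i and j, with r2[i][i] == 1.0 so
    every SNP can at least tag itself. Returns the indices of the selected tag SNPs.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs (ties: lowest index).
        best = max(range(len(r2)),
                   key=lambda s: (sum(r2[s][t] >= threshold for t in untagged), -s))
        tags.append(best)
        untagged -= {t for t in untagged if r2[best][t] >= threshold}
    return tags


def concordance(true_calls, imputed_calls):
    """Fraction of masked genotypes (0/1/2 allele dosages) that imputation recovered exactly."""
    return sum(t == i for t, i in zip(true_calls, imputed_calls)) / len(true_calls)


# Hypothetical LD among five candidate SNPs: SNPs 0-2 form one block, SNPs 3-4 another.
r2 = [
    [1.0, 1.0, 1.0, 0.1, 0.1],
    [1.0, 1.0, 1.0, 0.1, 0.1],
    [1.0, 1.0, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.995],
    [0.1, 0.1, 0.1, 0.995, 1.0],
]
print(greedy_tag_snps(r2, 0.99))   # [0, 3]
print(greedy_tag_snps(r2, 0.999))  # [0, 3, 4]

true_calls = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]     # masked genotypes
imputed_calls = [0, 1, 2, 1, 0, 2, 1, 0, 0, 2]  # imputed at the same positions
print(concordance(true_calls, imputed_calls))   # 0.9
```

    The stricter threshold needs an extra tag because SNPs 3 and 4 no longer stand in for each other; in the same way, groups whose LD blocks are shorter generally need more tags to reach a fixed r² target.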

    Additional file 2: of eQTL discovery and their association with severe equine asthma in European Warmblood horses

    Linear mixed models jointly modeling MCK and HDE. Linear mixed models with a random intercept for each individual model the association between the top fifteen RAO (recurrent airway obstruction)-associated SNPs that were also eSNPs in either MCK or HDE (chr13.32843309, chr13.32844446, chr13.33460982, chr13.33502488, chr28.3692072, chr21.52625145) and the expression of the genes they regulate (DEXI, NSUN2, ATF7IP2, GLIPR1L2), fitted by restricted maximum likelihood (REML) and maximum likelihood (ML). The R markdown document that generated this HTML file is available on GitHub: https://github.com/VCMason (HTML 5545 kb)
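    For a balanced design, where every horse is measured once in each condition, a random-intercept model of this kind has closed-form REML and ML estimates, which makes the difference between the two fitting criteria easy to see. The Python sketch below assumes exactly that design, a single SNP coded as a 0/1/2 dosage, and a fixed condition effect for HDE versus MCK; it is not the authors' R code, and the function name and expression values are hypothetical. REML divides each residual sum of squares by the degrees of freedom left after the fixed effects, whereas ML divides by the number of observations, which is why ML variance-component estimates are biased downward.

```python
from statistics import fmean


def fit_random_intercept(genotype, y_mck, y_hde, reml=True):
    """Closed-form fit of a random-intercept model for a balanced two-condition design.

    Model (hypothetical sketch): y_ij = b0 + b1*g_i + gamma*[j == HDE] + u_i + e_ij,
    with u_i ~ N(0, s2_u) per horse and e_ij ~ N(0, s2_e). Each horse i has exactly
    one MCK and one HDE measurement; g_i is a 0/1/2 SNP dosage.
    """
    a = len(genotype)
    # Within-horse contrast (HDE - MCK) identifies gamma and s2_e; u_i cancels out.
    d = [h - m for m, h in zip(y_mck, y_hde)]
    gamma = fmean(d)
    sse = sum((di - gamma) ** 2 for di in d) / 2
    # Between-horse means identify b0, b1 and s2_u + s2_e / 2 (OLS = GLS when balanced).
    ybar = [(m + h) / 2 for m, h in zip(y_mck, y_hde)]
    g_mean, y_mean = fmean(genotype), fmean(ybar)
    sxx = sum((g - g_mean) ** 2 for g in genotype)
    b1 = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotype, ybar)) / sxx
    b0 = y_mean - b1 * g_mean - gamma / 2  # intercept for the MCK condition
    rss_m = sum((y - y_mean - b1 * (g - g_mean)) ** 2 for g, y in zip(genotype, ybar))
    # REML uses residual degrees of freedom after the fixed effects; ML does not.
    s2_e = sse / (a - 1 if reml else a)       # 1 within-horse fixed effect: gamma
    lam = 2 * rss_m / (a - 2 if reml else a)  # lam = s2_e + 2 * s2_u; 2 between: b0, b1
    s2_u = (lam - s2_e) / 2
    if s2_u < 0:
        # Boundary solution: with s2_u = 0 the model is ordinary least squares on all 2a points.
        s2_u = 0.0
        s2_e = (sse + 2 * rss_m) / (2 * a - 3 if reml else 2 * a)
    return {"b0": b0, "b1": b1, "gamma": gamma, "s2_u": s2_u, "s2_e": s2_e}


genotype = [0, 1, 2, 0, 1, 2]
y_mck = [5.0, 4.9, 6.9, 3.95, 6.05, 6.2]
y_hde = [6.0, 6.1, 7.7, 5.05, 6.95, 7.2]
reml_fit = fit_random_intercept(genotype, y_mck, y_hde, reml=True)
ml_fit = fit_random_intercept(genotype, y_mck, y_hde, reml=False)
print(reml_fit["s2_u"], ml_fit["s2_u"])  # ML gives the smaller between-horse variance
```

    In this balanced case the fixed effects (`b0`, `b1`, `gamma`) are identical under both criteria; only the variance components change, which is what fitting the same model by REML and ML exposes.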

    Additional file 4: of eQTL discovery and their association with severe equine asthma in European Warmblood horses

    Association of ATF7IP2 and GLIPR1L2 gene expression with RAO disease status. HTML output of an R markdown document. The file contains one multiple logistic regression for each gene, ATF7IP2 and GLIPR1L2, quantifying the association between that gene's expression and disease status. Four significant surrogate variables were calculated for the HDE treatment by surrogate variable analysis (SVA) and therefore none were included in the model. The R markdown document that generated this HTML file is available on GitHub: https://github.com/VCMason (HTML 866 kb)

    Additional file 3: of eQTL discovery and their association with severe equine asthma in European Warmblood horses

    Association of DEXI and NSUN2 gene expression with RAO disease status. HTML output of an R markdown document. The file contains two multiple logistic regressions and one simple logistic regression showing the association between DEXI gene expression and disease status. The multiple logistic regressions include known confounders as independent variables, whereas the simple logistic regression has only the independent variable of interest (DEXI or NSUN2 gene expression) as its single covariate. The R markdown document that generated this HTML file is available on GitHub: https://github.com/VCMason (HTML 844 kb)
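    The simple and multiple models differ only in their design matrix: the simple model has an intercept and the expression value, while the multiple model adds confounder columns. A minimal Python sketch fitting both by Newton-Raphson maximum likelihood is below; it is not the authors' R code, and the confounder (`age`), expression values, and case/control labels are hypothetical. `math.exp(beta[1])` is the odds ratio per unit of expression, adjusted for the confounder in the multiple model.

```python
import math


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_fit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Each row of X must start with 1.0 (intercept); y holds 0/1 disease status.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]
        score = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = solve(info, score)  # Newton step: (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data: expression of one gene, a confounder (age), and case (1) / control (0).
expression = [1.2, 2.3, 0.8, 3.1, 2.9, 1.5, 2.0, 3.4, 0.9, 2.6]
age = [8, 12, 7, 15, 13, 9, 9, 14, 6, 13]
status = [0, 1, 0, 1, 0, 0, 1, 1, 0, 1]

X_simple = [[1.0, e] for e in expression]                          # expression only
X_multi = [[1.0, e, float(a)] for e, a in zip(expression, age)]   # plus confounder
beta_simple = logistic_fit(X_simple, status)
beta_multi = logistic_fit(X_multi, status)
print(math.exp(beta_simple[1]), math.exp(beta_multi[1]))  # unadjusted vs adjusted odds ratios
```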

    Additional file 1: Table S1. of Identification and validation of risk loci for osteochondrosis in standardbreds

    Named genes located within the top regions of association on ECA14 from the GWA analysis. Table S2. Haplotype analysis within the top regions of association on ECA14 from the GWA analysis. Table S3. Top GWA SNPs from GEMMA mixed-model analysis of data imputed to the 670K and 2M SNP lists. Table S4. Summary of variants by type and region. Table S5. Regions of interest for which detailed annotation of SNPs was performed. Table S6. Putative risk variants for OC that were selected for inclusion in the custom Sequenom genotyping assay (n = 240). Table S7. Frequency of the alternate allele in cases and controls for each SNP on the Sequenom platform that was successfully genotyped in the discovery or validation populations. (DOCX 93 kb)

    PAG XXVI - Camoco: identifying high priority candidate genes from GWAS using co-expression networks

    Camoco is a fully featured computational framework for building, analyzing, and integrating gene co-expression networks with loci identified in genome-wide association studies (GWAS). Hundreds of links between genetic markers (SNPs) and agro-economically important traits have been identified by GWAS. Yet the causal gene or allele often remains unknown, because many genes can be in linkage disequilibrium (LD) with each of potentially dozens of genetic markers. Co-expression networks identify genes that share similar patterns of expression response, making them a powerful tool for inferring the biological function of under-characterized genes. In the right biological context, sets of causal genes related to a GWAS trait will exhibit strong co-expression, while inconsequential genes in LD with the marker exhibit random patterns of co-expression.
    Camoco features methods to build, analyze, and explore co-expression networks using either microarray or RNA-Seq data. Once a network is built, Camoco establishes its biological context by evaluating its ability to recapitulate previously described ontologies (e.g. GO, KEGG, or MapMan). Vetted networks are then used to determine subsets of genes near GWAS loci that are strongly co-expressed. GWAS SNPs are mapped to genes with a SNP-to-gene mapping algorithm that uses user-defined or map-based haplotype windows, and high-priority candidate genes are identified by evaluating gene-specific co-expression among the candidates. Demonstrations will be shown using GWAS datasets and co-expression networks generated in both plants and animals. Camoco is free and open-source software, available at http://github.com/LinkageIO/Camoco.
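    A toy version of the last two steps, mapping SNPs to nearby genes and ranking those candidates by how strongly each is co-expressed with the others, is sketched below in Python. It is not Camoco's API or scoring method: the fixed flanking window, the Fisher z-transformed Pearson correlation as the co-expression measure, and all function names, gene names, coordinates, and expression values are assumptions for illustration. A real analysis would also compare the scores against randomly drawn gene sets.

```python
import math
from itertools import combinations
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two expression profiles (both must vary)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def candidate_genes(snp_positions, genes, flank):
    """Genes (name, start, end) overlapping +/- flank bp around any SNP on one chromosome."""
    return sorted({name for name, start, end in genes for p in snp_positions
                   if start <= p + flank and end >= p - flank})


def coexpression_scores(expr, candidates):
    """Mean Fisher z-transformed correlation of each candidate with the other candidates."""
    z = {}
    for a, b in combinations(candidates, 2):
        r = max(-0.999999, min(0.999999, pearson(expr[a], expr[b])))  # keep atanh finite
        z[a, b] = z[b, a] = math.atanh(r)
    return {g: fmean(z[g, h] for h in candidates if h != g) for g in candidates}


genes = [("G1", 8000, 9000), ("G2", 14000, 16000), ("G3", 52000, 53000),
         ("G4", 44000, 46000), ("G5", 30000, 31000)]
expr = {  # hypothetical expression across six samples
    "G1": [1, 2, 3, 4, 5, 6],
    "G2": [2, 4, 5, 8, 10, 13],
    "G3": [1, 3, 2, 5, 4, 6],
    "G4": [5, 1, 4, 2, 6, 3],
    "G5": [3, 3, 4, 2, 5, 1],
}
candidates = candidate_genes([10000, 50000], genes, flank=5000)
scores = coexpression_scores(expr, candidates)
ranked = sorted(scores, key=scores.get, reverse=True)
print(candidates)  # ['G1', 'G2', 'G3', 'G4']
print(ranked)      # ['G2', 'G1', 'G3', 'G4']
```

    Here G4 falls inside a SNP window but is barely correlated with the other candidates, so it ranks last, mirroring the "inconsequential gene in LD with the marker" case described above.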