10,174 research outputs found

    Wolf outside, dog inside? The genomic make-up of the Czechoslovakian Wolfdog

    Background: Genomic methods can provide extraordinary tools to explore the genetic background of wild species and domestic breeds, optimize breeding practices, monitor and limit the spread of recessive diseases, and discourage illegal crossings. In this study we analysed a panel of 170k single nucleotide polymorphisms (SNPs) with a combination of multivariate, Bayesian and outlier gene approaches to examine the genome-wide diversity and inbreeding levels in a recent wolf x dog cross-breed, the Czechoslovakian Wolfdog, which is becoming increasingly popular across Europe.

    Results: Pairwise FST values, multivariate and assignment procedures indicated that the Czechoslovakian Wolfdog was significantly differentiated from all the other analysed breeds and also well distinguished from both parental populations (Carpathian wolves and German Shepherds). Consistent with the low number of founders involved in the breed selection, the individual inbreeding levels calculated from homozygosity regions were relatively high and comparable with those derived from the pedigree data. In contrast, the coefficient of relatedness between individuals estimated from the pedigrees often underestimated the identity-by-descent scores determined using genetic profiles. The timing of the admixture and the effective population size trends estimated from the LD patterns reflected the documented history of the breed. Ancestry reconstruction methods identified more than 300 genes with an excess of wolf ancestry compared to random expectations, mainly related to key morphological features, and more than 2000 genes with an excess of dog ancestry. The latter play important roles in lipid metabolism, in the regulation of circadian rhythms, in learning and memory processes, and in sociability; they include the COMT gene, which has been described as a candidate gene for sociability in dogs.

    Conclusions: In this study we successfully applied genome-wide procedures to reconstruct the history of the Czechoslovakian Wolfdog, assess individual wolf ancestry proportions and, thanks to the availability of a well-annotated reference genome, identify possible candidate genes for wolf-like and dog-like phenotypic traits typical of this breed, including commonly inherited disorders. Moreover, through the identification of ancestry-informative markers, these genomic approaches could provide tools for forensic applications to unmask illegal crossings with wolves and uncontrolled trades of recent and undeclared wolfdog hybrids.
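    The abstract above mentions inbreeding levels calculated from homozygosity regions. As a rough illustration of that idea, and not the study's actual pipeline, the sketch below computes a runs-of-homozygosity inbreeding coefficient (often written F_ROH) for one chromosome. The function names, the 0/1/2 genotype coding (1 = heterozygous), and the thresholds are assumptions made for this example; real analyses typically rely on dedicated tools such as PLINK, which also tolerate occasional heterozygous or missing calls.

```python
def roh_segments(positions, genotypes, min_snps=3, min_length=0):
    """Return (start, end) base-pair spans of runs of homozygous SNPs.

    positions: sorted SNP coordinates in base pairs.
    genotypes: alternate-allele counts (0, 1 or 2); 1 means heterozygous.
    A run is kept only if it has at least `min_snps` SNPs and spans at
    least `min_length` base pairs (both thresholds are illustrative).
    """
    segments = []
    run_start, run_end, run_count = None, None, 0

    def close_run():
        # Store the current run if it passes both thresholds.
        if run_start is not None and run_count >= min_snps \
                and run_end - run_start >= min_length:
            segments.append((run_start, run_end))

    for pos, g in zip(positions, genotypes):
        if g != 1:  # homozygous call extends (or opens) a run
            if run_start is None:
                run_start, run_count = pos, 0
            run_end = pos
            run_count += 1
        else:       # heterozygous call ends the current run
            close_run()
            run_start, run_end, run_count = None, None, 0
    close_run()     # handle a run that reaches the last SNP
    return segments


def f_roh(positions, genotypes, genome_length, **thresholds):
    """Fraction of `genome_length` covered by runs of homozygosity."""
    segments = roh_segments(positions, genotypes, **thresholds)
    return sum(end - start for start, end in segments) / genome_length


# Toy data: two homozygous runs separated by one heterozygous SNP.
toy_positions = [100, 200, 300, 400, 500, 600, 700, 800]
toy_genotypes = [0, 0, 2, 1, 2, 2, 2, 0]
print(f_roh(toy_positions, toy_genotypes, genome_length=1000))  # 0.5
```

    A value close to 1 would indicate that most of the scanned region lies in long homozygous stretches, which is the pattern expected in a breed descended from few founders.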

    Applications and efficiencies of the first cat 63K DNA array


    Statistical Methods For Detecting Genetic Risk Factors of a Disease with Applications to Genome-Wide Association Studies

    This thesis aims to develop statistical methods for analysing data derived from genome-wide association studies (GWAS). A GWAS typically involves genotyping human genetic variation in thousands of individuals using high-throughput genome-wide single nucleotide polymorphism (SNP) arrays, and testing for association between those variants and a given disease under the common disease/common variant assumption. Although GWAS have identified many potential genetic factors that affect the risk of complex diseases, much of the genetic heritability remains unexplained. The power to detect new genetic risk variants can be improved by considering multiple genetic variants simultaneously with novel statistical methods. Improving the analysis of GWAS data has received much attention from statisticians and other scientific researchers over the past decade. Several challenges arise in analysing GWAS data. First, determining the risk SNPs can be difficult because non-random correlation between SNPs can inflate type I and type II errors in statistical inference. Second, when a group of SNPs is considered together in the context of haplotypes/genotypes, the distribution of the haplotypes/genotypes is sparse, which makes it difficult to detect risk haplotypes/genotypes in terms of disease penetrance.

    In this work, we proposed four new methods to identify risk haplotypes/genotypes based on their frequency differences between cases and controls. To evaluate the performance of our methods, we simulated datasets under a wide range of scenarios according to both retrospective and prospective designs. In the first method, we first reconstruct haplotypes from unphased genotypes, then cluster and threshold the inferred haplotypes into risk and non-risk groups with a two-component binomial-mixture model. In this method, the parameters were estimated with a modified Expectation-Maximization algorithm in which the maximisation step was replaced by posterior sampling of the component parameters. We also elucidated the relationships between risk and non-risk haplotypes under different modes of inheritance and genotypic relative risk. In the second method, we fitted a three-component mixture model to genotype data directly, followed by odds-ratio thresholding. In the third method, we combined the existing haplotype reconstruction software PHASE with a permutation method to infer risk haplotypes. In the fourth method, we proposed a new way to score the genotypes by clustering and combined it with a logistic regression approach to infer risk haplotypes. The simulation studies showed that the first three methods outperformed the multiple testing method of Zhu (2010) in terms of average specificity and sensitivity (AVSS) in all scenarios considered. The proposed logistic regression method also outperformed the standard logistic regression method. We applied our methods to two GWAS datasets on coronary artery disease (CAD) and hypertension (HT), detecting several new risk haplotypes and recovering a number of the disease-associated genetic variants already reported in the literature.
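    To make the two-component binomial-mixture idea from the first method more concrete, here is a minimal sketch using the standard Expectation-Maximization algorithm. It is not the thesis's modified algorithm (which replaces the maximisation step with posterior sampling), and the data layout is an assumption for illustration: each haplotype contributes a count of copies seen in cases out of its total copies, and the two components stand for "non-risk" (low case proportion) and "risk" (high case proportion). All names are hypothetical.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def fit_binomial_mixture(counts, trials, n_iter=200, eps=1e-9):
    """Fit a two-component binomial mixture by standard EM.

    counts[i]: copies of haplotype i observed in cases (assumed layout).
    trials[i]: total copies of haplotype i in cases plus controls.
    Returns (weights, probs, responsibilities); component 0 starts at the
    lowest observed proportion and component 1 at the highest.
    """
    def clamp(x):
        return min(max(x, eps), 1 - eps)

    props = [k / n for k, n in zip(counts, trials)]
    probs = [clamp(min(props)), clamp(max(props))]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component per haplotype.
        resp = []
        for k, n in zip(counts, trials):
            logs = [math.log(weights[j]) + binom_logpmf(k, n, probs[j])
                    for j in range(2)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: closed-form updates of mixing weights and success rates.
        for j in range(2):
            mass = sum(r[j] for r in resp)
            weights[j] = clamp(mass / len(counts))
            num = sum(r[j] * k for r, k in zip(resp, counts))
            den = sum(r[j] * n for r, n in zip(resp, trials))
            if den > 0:
                probs[j] = clamp(num / den)
    return weights, probs, resp


# Toy data: five haplotypes rare in cases, three enriched in cases.
toy_counts = [1, 2, 1, 0, 2, 18, 17, 19]
toy_trials = [20] * 8
w, p, r = fit_binomial_mixture(toy_counts, toy_trials)
risk_flags = [row[1] > 0.5 for row in r]  # threshold on posterior membership
print(p, risk_flags)
```

    Thresholding the posterior membership at 0.5 is one simple way to split haplotypes into the two groups; the thesis may use a different thresholding rule.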

    Sparse Probit Linear Mixed Model

    Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they make it possible to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), where we generalize the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that in the setup of binary labels, our algorithm leads to better prediction accuracies and also selects features which show less correlation with the confounding factors. Comment: Published version, 21 pages, 6 figures.
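    The remark about the missing closed-form likelihood can be illustrated with a short worked formulation. This is a sketch under assumed notation, not necessarily the paper's: X is the feature matrix with rows x_i, w the weight vector, and Σ a noise covariance that absorbs the confounders. Thresholding the latent Gaussian variable of an ordinary LMM turns the likelihood into the Gaussian probability of an orthant, which does not factorize over samples when Σ is not diagonal.

```latex
% Ordinary LMM (continuous phenotype): the likelihood is Gaussian.
\[
  y = Xw + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \Sigma)
  \quad\Longrightarrow\quad y \sim \mathcal{N}(Xw, \Sigma).
\]

% Probit variant (binary phenotype): threshold the same latent variable.
\[
  y_i = \operatorname{sign}\bigl(x_i^{\top} w + \varepsilon_i\bigr),
  \qquad y_i \in \{-1, +1\}.
\]

% The likelihood is the Gaussian mass of the orthant selected by y.
\[
  P(y \mid X, w) =
  \int_{\{ z \,:\, y_i z_i > 0 \ \forall i \}}
  \mathcal{N}(z;\, Xw,\, \Sigma)\, dz .
\]
```

    With a correlated Σ this integral has no closed form, which is why an approximate inference scheme is needed. Sparsity in w would typically be encouraged by an additional penalty or prior, for example an ℓ1 term; that choice is an assumption here.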