513 research outputs found

    Dissecting Trait Heterogeneity: a Comparison of Three Clustering Methods Applied to Genotypic Data

    Background: Trait heterogeneity, which exists when a trait has been defined with insufficient specificity such that it is actually two or more distinct traits, has been implicated as a confounding factor in traditional statistical genetics of complex human disease. In the absence of detailed phenotypic data collected consistently in combination with genetic data, unsupervised computational methodologies offer the potential for discovering underlying trait heterogeneity. The performance of three such methods appropriate for categorical data (Bayesian Classification, Hypergraph-Based Clustering, and Fuzzy k-Modes Clustering) was compared. Also tested was the ability of these methods to detect trait heterogeneity in the presence of locus heterogeneity and/or gene-gene interaction, which are two other complicating factors in discovering genetic models of complex human disease. To determine the efficacy of applying the Bayesian Classification method to real data, the reliability of its internal clustering metrics at finding good clusterings was evaluated using permutation testing. Results: Bayesian Classification outperformed the other two methods, with the exception that Fuzzy k-Modes Clustering performed best on the most complex genetic model. Bayesian Classification achieved excellent recovery for 75% of the datasets simulated under the simplest genetic model, while it achieved moderate recovery for 56% of datasets with a sample size of 500 or more (across all simulated models) and for 86% of datasets with 10 or fewer nonfunctional loci (across all simulated models). Neither Hypergraph Clustering nor Fuzzy k-Modes Clustering achieved good or excellent cluster recovery for a majority of datasets even under a restricted set of conditions. When using the average log of class strength as the internal clustering metric, the false positive rate was controlled very well, at three percent or less for all three significance levels (0.01, 0.05, 0.10), and the false negative rate was acceptably low (18 percent) for the least stringent significance level of 0.10. Conclusion: Bayesian Classification shows promise as an unsupervised computational method for dissecting trait heterogeneity in genotypic data. Its control of false positive and false negative rates lends confidence to the validity of its results. Further investigation of how different parameter settings may improve the performance of Bayesian Classification, especially under more complex genetic models, is ongoing.

    Replication in Genome-Wide Association Studies

    Replication helps ensure that a genotype-phenotype association observed in a genome-wide association (GWA) study represents a credible association and is not a chance finding or an artifact due to uncontrolled biases. We discuss prerequisites for exact replication, issues of heterogeneity, advantages and disadvantages of different methods of data synthesis across multiple studies, frequentist vs. Bayesian inferences for replication, and challenges that arise from multi-team collaborations. While consistent replication can greatly improve the credibility of a genotype-phenotype association, it may not eliminate spurious associations due to biases shared by many studies. Conversely, lack of replication in well-powered follow-up studies usually invalidates the initially proposed association, although occasionally it may point to differences in linkage disequilibrium or effect modifiers across studies. Comment: Published at http://dx.doi.org/10.1214/09-STS290 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Two-gene genetic interaction models for phenotypes

    In previous work, several statistical models for penetrance were proposed to infer the interaction of two diallelic genes in the construction of complex binary phenotypes: independent action models, inhibition models, and minimum number of alleles models. These models are based on a decomposition of the penetrance through the allelic penetrance approach, which allowed the Mendelian concepts of allelic dominance and recessiveness to be included in the modelling. Here we present the most recent advances in modelling genetic interaction, introducing a new decomposition of the penetrance and a new mathematical formulation of dominance and recessiveness. Bayesian tools are also applied to fit the genetic interaction models to experimental data using Gibbs sampling. The whole methodology is illustrated on a data set from a study of susceptibility to cerebral malaria in mice.

    Genetic linkage analysis in the age of whole-genome sequencing

    For many years, linkage analysis was the primary tool used for the genetic mapping of Mendelian and complex traits with familial aggregation. Linkage analysis was largely supplanted by the wide adoption of genome-wide association studies (GWASs). However, with the recent increased use of whole-genome sequencing (WGS), linkage analysis is again emerging as an important and powerful analysis method for the identification of genes involved in disease aetiology, often in conjunction with WGS filtering approaches. Here, we review the principles of linkage analysis and provide practical guidelines for carrying out linkage studies using WGS data.

    A Multiparent Advanced Generation Inter-Cross to Fine-Map Quantitative Traits in Arabidopsis thaliana

    Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have some limitations; therefore, alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. Here, we present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary, depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.

    Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk

    Coding variants represent many of the strongest associations between genotype and phenotype; however, they exhibit interindividual differences in effect, termed 'variable penetrance'. Here, we study how cis-regulatory variation modifies the penetrance of coding variants. Using functional genomic and genetic data from the Genotype-Tissue Expression Project (GTEx), we observed that in the general population, purifying selection has depleted haplotype combinations predicted to increase pathogenic coding variant penetrance. Conversely, in cancer and autism patients, we observed an enrichment of penetrance-increasing haplotype configurations for pathogenic variants in disease-implicated genes, providing evidence that regulatory haplotype configuration of coding variants affects disease risk. Finally, we experimentally validated this model by editing a Mendelian single-nucleotide polymorphism (SNP) using CRISPR/Cas9 on distinct expression haplotypes with the transcriptome as a phenotypic readout. Our results demonstrate that joint regulatory and coding variant effects are an important part of the genetic architecture of human traits and contribute to modified penetrance of disease-causing variants. Peer reviewed.

    Bayesian QTL Mapping in Inbred and Outbred Experimental Designs


    Pharmacogenomics of multiple sclerosis: methods and applications

    The field of genetics is rapidly expanding and evolving. As more and more is understood about the genetics of complex human traits, a natural question arises as to how these findings can be translated into everyday medical practice. While a little more than a decade ago sequencing the entire human genome was achieved by the largest international scientific collaboration ever undertaken in biology, today it is not far-fetched to expect that in the near future obtaining the genetic profile of each patient may become routine medical practice. Pharmacogenomics, a blend of pharmacology and genomics, aims to determine the most suitable treatment for each patient as a function of his or her genetic makeup. Pharmacogenomic studies have increasingly provided evidence that there are gains to be achieved by incorporating genetic information when determining the optimal treatment choice for a patient. The case of warfarin, an anticoagulant, has often been considered one of the most motivating success stories for pursuing this type of study. The success of such studies, as well as the need for them, however, depends on a multitude of factors and varies greatly across traits. The objective of this thesis is to evaluate the current state of the art for multiple sclerosis (MS), a debilitating neurological disorder affecting primarily young adults. To date, no cure exists for MS, but a number of disease-modifying therapies have been approved with varying degrees of efficacy and toxicity. So far, little is known about the genetic factors that influence response to treatment in MS patients. Moreover, even if such factors are known a priori, evaluating and proving their utility at the clinical level is not as straightforward as one may be inclined to think. In this thesis, we highlight why the road to translating such findings into medical practice remains rough and challenging. In particular, relying on the association and prediction studies that we have conducted, we expose the design and limitations of each and discuss model choice in each context. Specifically, we conducted a single-marker association analysis of response to interferon-beta in MS patients. We compared single-marker to multi-marker models in the context of association and also in that of prediction, using both real and simulated datasets. Different approaches to multi-marker modeling exist. We focused on polygenic score analyses and Bayesian estimation methods and evaluated several of the properties of these modeling approaches. Our findings showed that, in the context of genome-wide association, the more complex and computationally heavy multi-marker models that have recently been advocated may offer little, if any, benefit over classical single-marker association analysis. On the other hand, multi-marker models that take into account the effect of many markers simultaneously clearly appear better suited to predicting genetic risk. Nevertheless, focusing on polygenic score analyses, we demonstrated that many factors, such as the study sample size and the heritability of the trait, influence the predictive performance of a model. Pharmacogenomic studies may revolutionize patient care. However, beyond the excitement of the promise that they hold, in the concluding part of this thesis we also address the social, ethical and economic issues that they raise.