453 research outputs found

    Biological networks and epistasis in genome-wide association studies

    Over the last few years, technological improvements have made it possible to genotype hundreds of thousands of SNPs, enabling whole-genome association studies. The first genome-wide association studies have recently been completed to detect causal variants for complex traits. Although increasing evidence suggests that interactions between loci, such as epistasis between two loci, should be considered, most of these studies consider each SNP independently. One reason for this choice is that examining all pairs of SNPs dramatically increases the number of tests (approximately 50 billion tests for a data set of 300,000 SNPs), which runs into computational limits and requires a strong multiple-testing correction.
We propose to reduce the number of tests by focusing on pairs of SNPs that belong to genes known to interact in some metabolic network. Although some interactions might be missed, these pairs of genes are good candidates for epistasis. Furthermore, the use of protein interaction databases (such as the STRING database) may reduce the number of tests by a factor of 5,000.
Results using this approach will be presented on simulated data sets and on public data sets.
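The orders of magnitude quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the Bonferroni correction and the 0.05 significance level are assumptions chosen for the example, since the abstract does not say which multiple-testing correction is used.

```python
from math import comb

n_snps = 300_000

# Every unordered pair of SNPs gives one interaction test.
all_pairs = comb(n_snps, 2)          # 44,999,850,000, i.e. roughly 50 billion

# Restricting to pairs in interacting genes, with the factor quoted in the abstract.
reduction_factor = 5_000
network_pairs = all_pairs // reduction_factor   # 8,999,970, i.e. about 9 million

# Assumption: Bonferroni correction at a family-wise level of 0.05.
alpha = 0.05
threshold_all_pairs = alpha / all_pairs
threshold_network_pairs = alpha / network_pairs

print(f"all pairs:     {all_pairs:,} tests, per-test threshold {threshold_all_pairs:.2e}")
print(f"network pairs: {network_pairs:,} tests, per-test threshold {threshold_network_pairs:.2e}")
```

Under these assumptions the per-test threshold becomes 5,000 times less stringent, which is where the gain in power would come from.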

    SMILE: A novel dissimilarity-based procedure for detecting sparse-specific profiles in sparse contingency tables

    A novel statistical procedure for clustering individuals characterized by sparse-specific profiles is introduced in the context of data summarized in sparse contingency tables. The proposed procedure relies on single-linkage clustering based on a new dissimilarity measure designed to give equal influence to the sparsity and the specificity of profiles. Theoretical properties of the new dissimilarity are derived by characterizing single-linkage clustering using Minimum Spanning Trees. This characterization makes it possible to describe the situations in which the proposed dissimilarity outperforms competing dissimilarities. Simulation examples demonstrate the strength of the new dissimilarity compared with 11 other methods. The analysis of a genomic data set dedicated to the study of molecular signatures of selection illustrates the efficiency of the proposed method in a real situation.
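The link between single-linkage clustering and Minimum Spanning Trees mentioned in this abstract can be sketched as follows: cutting every MST edge longer than a chosen height leaves connected components that are exactly the single-linkage clusters at that height. This is a minimal sketch, not the SMILE procedure itself. The abstract does not define the new dissimilarity, so the matrix `toy` below is a made-up placeholder and the function names are hypothetical.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric dissimilarity matrix.

    Returns the MST as a list of (weight, i, j) edges.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known link from each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((dist[u][parent[u]], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def single_linkage_clusters(dist, height):
    """Keep MST edges no longer than `height`; components are the clusters."""
    n = len(dist)
    label = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while label[i] != i:
            label[i] = label[label[i]]
            i = label[i]
        return i

    for weight, i, j in minimum_spanning_tree(dist):
        if weight <= height:
            label[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Placeholder dissimilarities between four individuals (illustrative values only).
toy = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
clusters = single_linkage_clusters(toy, height=0.5)
print(clusters)
```

Any dissimilarity can be plugged in through the matrix, which is why a property proved on the MST carries over to the resulting clusters.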

    Gene-Based Methods to Detect Gene-Gene Interaction in R: The GeneGeneInteR Package

    GeneGeneInteR is an R package dedicated to detecting an association between a case-control phenotype and the interaction between two sets of biallelic markers (single nucleotide polymorphisms, or SNPs) in case-control genome-wide association studies. The development of statistical procedures for detecting gene-gene interaction at the SNP-set level has recently grown in popularity, as these methods offer advantages in both statistical power and biological interpretation. However, all of these methods have been implemented in in-house software that is, for the most part, available only on request from the authors and at best has a web interface. Since implementing these methods is not straightforward, there is a need for a user-friendly tool to perform gene-based gene-gene interaction analysis. The purpose of GeneGeneInteR is to provide a collection of tools for all the steps involved in gene-based gene-gene interaction testing in case-control association studies. Illustrated by an example data set related to rheumatoid arthritis, this paper details the implementation of the functions available in GeneGeneInteR to analyze a collection of SNP sets. Such an analysis covers the complete statistical pipeline, from data import to visualization of the results, by way of data manipulation and statistical analysis.

    Décorrélation adaptative pour la prédiction en grande dimension (Adaptive decorrelation for high-dimensional prediction)

    In large-scale significance analysis, whether or not to ignore dependence is a core issue, which has led to many recent results on the impact of decorrelating the pointwise test statistics. Yet, for the estimation of a prediction model, decorrelating large profiles of predicting variables is not as clearly questioned, although many comparative studies have reported the superiority of so-called naive methods, which ignore dependence. Under the usual Gaussian mixture model assumption of Linear Discriminant Analysis, we show that, for a given dependence structure, the classification performance of methods that do or do not ignore dependence may be markedly different, depending on the pattern of the association signal between the predicting variables and the response. In order to minimize the largest probability of misclassification, we propose a method that handles dependence adaptively. A simulation study shows that the performance of the present method is at least as good as the best of the methods that either ignore dependence or rely on a complete decorrelation of the predicting variables.
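How strongly the pattern of the association signal matters can be illustrated in two dimensions. For two equiprobable Gaussian classes with common covariance and mean difference delta, a linear rule with direction w has misclassification probability Phi(-w'delta / (2 sqrt(w' Sigma w))). The sketch below compares the naive direction, which uses only the diagonal of the covariance, with the fully decorrelated direction. It is not the adaptive method of the abstract, and it assumes the covariance is known exactly, so it cannot reproduce the estimation effects under which naive methods may win in high dimension. The correlation of 0.8 and the two signal patterns are illustrative choices.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def error_rate(w, delta, sigma):
    """Misclassification probability of the linear rule with direction w,
    for two equiprobable Gaussian classes with mean difference delta
    and common 2x2 covariance sigma."""
    signal = w[0] * delta[0] + w[1] * delta[1]
    variance = (w[0] * (sigma[0][0] * w[0] + sigma[0][1] * w[1])
                + w[1] * (sigma[1][0] * w[0] + sigma[1][1] * w[1]))
    return normal_cdf(-signal / (2.0 * sqrt(variance)))


def naive_direction(delta, sigma):
    """Ignores dependence: uses only the variances on the diagonal."""
    return [delta[0] / sigma[0][0], delta[1] / sigma[1][1]]


def full_direction(delta, sigma):
    """Complete decorrelation: inverse covariance times delta."""
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    return [(sigma[1][1] * delta[0] - sigma[0][1] * delta[1]) / det,
            (sigma[0][0] * delta[1] - sigma[1][0] * delta[0]) / det]


rho = 0.8  # illustrative correlation between the two predictors
sigma = [[1.0, rho], [rho, 1.0]]

results = {}
for delta in [(1.0, 1.0), (1.0, 0.0)]:
    naive = error_rate(naive_direction(delta, sigma), delta, sigma)
    full = error_rate(full_direction(delta, sigma), delta, sigma)
    results[delta] = (naive, full)
    print(f"delta={delta}: naive error {naive:.3f}, decorrelated error {full:.3f}")
```

With these values the two rules coincide when the signal is shared equally by both predictors, and differ clearly when only one predictor carries signal, even though the dependence structure is the same in both cases.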