
    Regions of the genome exhibiting moderate to strong evidence for CD, RA and T1D risk factors.

    <p>For all CD and RA loci in this table, there is at least a 0.5 probability that one or more SNPs in the region are included in the multi-marker disease model (); for all T1D loci, . Support for disease associations is conditioned on enrichment of pathways in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen-1003770-g001" target="_blank">Figure 1</a>. Rows marked with * are selected only after accounting for pathway enrichment, or show substantial increase in support due to feedback from enrichment. Right-most column cites published GWAS findings that corroborate majority of * rows. In this column, ** indicates that validation is not required as disease association is already strongly supported without pathways; these rows recapitulate the strongest associations reported in the original study <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Wellcome1" target="_blank">[65]</a> (see Supplementary Materials). Genes in enriched pathways are written in bold. 
Table columns from left to right are: (1) disease; (2) chromosomal locus; (3) region most likely containing the risk-conferring variant(s), in Megabases (Mb); (4) posterior probability that one or more SNPs in region are included in model under null, and (5) under enrichment hypothesis; (6) posterior probability that two or more SNPs are included under null, and (7) under enrichment hypothesis; (8) smallest trend <i>p</i>-value in region from original analysis <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Wellcome1" target="_blank">[65]</a>, when available (some of these <i>p</i>-values are derived from imputed SNPs, and are not available in our data); (9) established genes in disease pathogenesis, or most credible genes of interest based on prior studies, corresponding to locus (when the most credible gene differs from gene assigned to pathway, pathway gene is shown in parentheses); (10) refSNP identifier of SNP in critical region with largest PIP (this SNP is likely in linkage disequilibrium with the causal variant rather than being causal itself, and may not match SNP reported in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Wellcome1" target="_blank">[65]</a> with smallest <i>p</i>-value); (11) the PIP of this SNP; (12) posterior mean of log-odds ratio (additive effect of minor allele count on log-odds of disease) given SNP that is included in multi-marker disease model; (13) 95% credible interval of effect size, ; (14) frequency of minor allele for SNP in controls, and (15) in cases. Bold numbers in and columns highlight appreciable increase in support for disease associations within region after feedback from enriched pathway. Credible interval is smallest interval about posterior mean that contains with 95% posterior probability. 
The “critical region” at each locus is estimated by inspecting single-SNP BFs <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Servin1" target="_blank">[79]</a>, and bounding the region by areas of high recombination rate, inferred using data from Phase I, release 16a of the HapMap study <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-McVean1" target="_blank">[185]</a>, and visualized in UCSC Genome Browser <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Dreszer1" target="_blank">[186]</a>. Note that statistic for critical region may differ slightly from that for overlapping segment shown in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen-1003770-g003" target="_blank">Figure 3</a> because segments and critical regions contain different numbers of SNPs. All SNP information and genomic positions are based on Human Genome Assembly 17 (NCBI build 35).</p>

    Diseases show a wide range of support for enrichment of disease associations in pathways.

    <p>Each row shows the pathway with the largest BF for enrichment of disease associations among 3158 candidate gene sets. Columns left to right: (1) disease; (2) enriched pathway; (3) pathway database, and repository where pathway is retrieved if different from database; (4) BF for hypothesis that disease associations are enriched among SNPs assigned to pathway; (5) posterior probability of enrichment hypothesis; (6) number of genes assigned to pathway; (7) number of SNPs near these genes. Abbreviations used in figure: PID = NCI Nature Pathway Interaction Database <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Schaefer1" target="_blank">[163]</a>, BS = NCBI BioSystems <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Geer1" target="_blank">[164]</a>, PC = Pathway Commons <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Cerami1" target="_blank">[165]</a>. Databases and database identifiers for pathways listed here: “Transport of connexons to the plasma membrane” (Reactome 11050, PC); “Tumor suppressor Arf inhibits ribosomal biogenesis” (BioCarta); “Cytokine signaling in immune system” (Reactome 75790, BS 366171); “Alanine biosynthesis” (PANTHER P02724); “Measles” (KEGG hsa05162, BS 213306); “IL2-mediated signaling events” (PID il2_1pathway, BS 137976); “Incretin synthesis, secretion, and inactivation” (Reactome 23974, PC). *Null and enrichment hypotheses for RA and T1D include enrichment of disease associations in MHC, in which SNPs within MHC are enriched at a different level than non-MHC SNPs in pathway; and 4.6 for RA and T1D, respectively. Number of genes/SNPs for RA and T1D count only non-MHC genes assigned to pathway. **Illustrative posterior probability assuming a “conservative” prior (see text).</p>
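Columns (4) and (5) of this caption are related by Bayes' rule: posterior odds equal the Bayes factor times the prior odds. The sketch below illustrates that conversion only. The function name and the prior values are hypothetical; the “conservative” prior mentioned in the caption is defined in the paper's text and is not reproduced here.

```python
def posterior_from_bf(bf: float, prior: float) -> float:
    """Posterior probability of the enrichment hypothesis.

    bf    -- Bayes factor for enrichment versus the null hypothesis
    prior -- prior probability of enrichment (an assumed, illustrative value)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = bf * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Even prior odds: a BF of 99 gives a posterior probability of 0.99.
print(posterior_from_bf(99.0, 0.5))
# A more sceptical (hypothetical) prior lowers the posterior for the same BF.
print(posterior_from_bf(99.0, 0.01))
```

With a fixed BF, a smaller prior always gives a smaller posterior, which is why a posterior computed under a conservative prior is reported as illustrative.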

    Enrichment hypotheses with multiple enriched pathways show increased support from data.

    <p>Each row gives pathway, or combination of 2 or 3 pathways, with largest BF for enrichment of disease associations. See <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen-1003770-g001" target="_blank">Figure 1</a> for legend and abbreviations used. All enrichment hypotheses for RA and T1D shown here also include enrichment of the MHC, allowing for a different level of enrichment within the MHC. Unlike the BFs in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen-1003770-g001" target="_blank">Figures 1</a> and <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen-1003770-g002" target="_blank">2</a>, BFs here are all defined relative to null hypothesis of no enrichment, so that they can be easily compared. Counts of genes and SNPs only include those that are not already assigned to other enriched pathways; for example, 37 genes belong to the IL-23 pathway, and of those 15 are already cytokine signaling genes, so inclusion of IL-23 signaling adds 22 more genes. Databases and database identifiers for pathways in this figure: “IL2-mediated signaling events” (PID il2_1pathway, BS 137976); “ErbB receptor signaling network” (PID erbb_network_pathway, BS 138016); “Inositol pyrophosphates biosynthesis” (HumanCyc 6369, PC); “Measles” (KEGG hsa05162, BS 213306); “Wnt” (Cancer Cell Map, PC); “Cytokine signaling in immune system” (Reactome 75790, BS 366171); “IL23-mediated signaling events” (PID il23pathway, BS 138000); “Methionine salvage pathway” (Reactome 75881, BS 366245).</p>
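The counting rule in this caption (only genes not already assigned to another enriched pathway are counted) is a set difference. The sketch below reproduces the caption's arithmetic, 37 - 15 = 22, with placeholder gene identifiers; the identifiers and the function name are hypothetical and stand in for the real pathway memberships.

```python
def added_genes(new_pathway, already_enriched):
    """Genes in new_pathway that belong to none of the already enriched pathways."""
    assigned = set().union(*already_enriched)
    return set(new_pathway) - assigned


# Placeholder identifiers, sized to match the caption's example.
cytokine = {f"cyt{i}" for i in range(100)}         # stand-in for cytokine signaling genes
il23 = ({f"cyt{i}" for i in range(15)}             # 15 genes shared with cytokine signaling
        | {f"il23_{i}" for i in range(22)})        # 22 genes unique to IL-23 signaling

print(len(il23))                                   # 37
print(len(added_genes(il23, [cytokine])))          # 22
```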

    Integrated Enrichment Analysis of Variants and Pathways in Genome-Wide Association Studies Indicates Central Role for IL-2 Signaling Genes in Type 1 Diabetes, and Cytokine Signaling Genes in Crohn's Disease

    <div><p>Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are <i>enriched</i> for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and “Measles” pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. 
Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes <i>RAF1</i>, <i>MAPK14</i>, and <i>FYN</i>) constitute novel putative T1D loci for further study.</p></div>

    Scatterplots showing , posterior probability that region contains disease risk variants, given different enrichment hypotheses.

    <p>Each point corresponds to a small region of the genome containing 50 SNPs. Posterior probabilities on vertical axis for CD, RA and T1D are conditioned on enrichment of pathway with largest BF (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen-1003770-g001" target="_blank">Figure 1</a>). For T2D, since no single pathway stands out in ranking (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen-1003770-g002" target="_blank">Figures 2</a> and <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770.s001" target="_blank">S1</a>), along vertical axis is obtained by averaging over top 5 pathways (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#s4" target="_blank">Methods</a>). Points highlighted in red correspond to segments overlapping SNPs assigned to the enriched pathway (for T2D, at least 1 out of 5 top pathways). In RA and T1D, 50-SNP segments overlapping the MHC are drawn as open circles (SNPs in these segments are not assigned to the pathway). Overlapping segments sharing the same association signal are not shown. Some segments are labeled by gene(s) in pathway and/or most credible gene of interest based on prior studies (most credible gene is shown in parentheses if different from pathway gene). Asterisk (*) indicates an appreciable increase in the probability of a disease association, and this association is validated by other GWAS for same disease (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen-1003770-t001" target="_blank">Table 1</a>).</p>

    Top-ranked candidate pathways for enrichment of disease associations in CD, RA, T1D and T2D.

    <p>Refer to <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen-1003770-g001" target="_blank">Figure 1</a> for legend, abbreviations, and meaning of asterisk (*). Two right-most columns show posterior mean and 95% credible interval of genome-wide log-odds () and log-fold enrichment () given that pathway is enriched (). Note that enrichment level is defined on log-scale (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770.e015" target="_blank">eq. 2</a>), so indicates enrichment. Credible interval is smallest interval about mean that contains parameter with 95% posterior probability, calculated to nearest 0.1 using a numerical approximation. Database identifiers for pathways not previously mentioned: “IL23-mediated signaling events” (PID il23pathway, PC); “IL12-mediated signaling events” (PID il12_2pathway, PC); “Immune system” (Reactome 6900, BS 106386); “Release of eIF4E” (Reactome 6836, PC); “Synthesis, secretion, and inactivation of glucagon-like peptide-1” (Reactome 24019, PC); “Id signaling pathway” (WikiPathways WP53 <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Kandasamy1" target="_blank">[166]</a>, BS 198871). See <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770.s001" target="_blank">Figure S1</a> for more gene set enrichment results.</p>
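The credible interval described in this caption (smallest interval about the mean holding 95% posterior probability, resolved to 0.1) can be approximated on a grid. The sketch below is one possible reading, not the paper's procedure: it assumes the posterior is available as weights on a grid of parameter values, and that the interval is symmetric about the mean and widened in steps of 0.1. The function name and the example posterior are hypothetical.

```python
import math


def credible_interval(grid, weights, level=0.95, step=0.1):
    """Interval centred on the posterior mean, widened in increments of
    `step` until it holds at least `level` of the posterior mass.
    Returns (mean, lower, upper)."""
    total = sum(weights)
    probs = [w / total for w in weights]               # normalise the weights
    mean = sum(x * p for x, p in zip(grid, probs))
    max_half = max(abs(x - mean) for x in grid)        # widest useful half-width
    k = 0
    while True:
        k += 1
        half = k * step
        mass = sum(p for x, p in zip(grid, probs)
                   if abs(x - mean) <= half + 1e-9)    # tolerance for rounding
        if mass >= level or half >= max_half:
            return mean, mean - half, mean + half


# Illustrative posterior: standard normal shape on a grid over [-3, 3].
grid = [i / 100 for i in range(-300, 301)]
weights = [math.exp(-0.5 * x * x) for x in grid]
mean, lower, upper = credible_interval(grid, weights)
print(round(mean, 3), round(lower, 1), round(upper, 1))
```

Because the half-width moves in steps of 0.1, the reported interval is slightly conservative: it is the first grid-aligned interval that reaches the target mass.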

    Variants in non-MHC disease regions revealed by enriched pathways have smaller effects on disease risk.

    <p>Each point in scatterplot corresponds to a 50-SNP segment outside the MHC for which . Filled circles correspond to selected regions containing disease risk factors without feedback from enriched pathways (); open circles correspond to selected regions conditioned on enrichment ( and ). For each segment, minor allele frequency and posterior mean additive effect of minor allele count on log-odds of disease (“log-odds ratio”) are taken from SNP in segment with highest probability of being included in multi-marker model.</p>

    Top four BFs in CD for each setting of .

    <p>In each case, the 3 largest BFs correspond, in order, to Cytokine signaling in immune system, IL23-mediated signaling events, and IL12-mediated signaling events (these are the top 3 pathways for CD in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen-1003770-g002" target="_blank">Figure 2</a>). Pathway with fourth largest BF differs across settings of .</p>

    Summary of pathways used in the analysis.

    <p>Chart on left shows number of unique gene sets obtained from the following pathway databases, included in this order: Reactome <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Croft1" target="_blank">[167]</a>, Kyoto Encyclopedia of Genes and Genomes (KEGG) <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Kanehisa1" target="_blank">[146]</a>, BioCarta (<a href="http://www.biocarta.com" target="_blank">www.biocarta.com</a>), HumanCyc <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Caspi1" target="_blank">[147]</a>, <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Romero1" target="_blank">[168]</a>, NCI Nature Pathway Interaction Database (PID) <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Schaefer1" target="_blank">[163]</a>, WikiPathways <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Pico1" target="_blank">[169]</a>, <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Kelder1" target="_blank">[170]</a>, PANTHER <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Mi1" target="_blank">[171]</a> and Cancer Cell Map (<a href="http://cancer.cellmap.org" target="_blank">cancer.cellmap.org</a>). The majority of these pathways are retrieved from the Pathway Commons (PC) <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Cerami1" target="_blank">[165]</a> and NCBI BioSystems <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Geer1" target="_blank">[164]</a> repositories. We include gene sets from both repositories when gene sets from same pathway differ (see Supplementary Materials). 
We include two additional gene sets for “classical” and “extended” MHC <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-MHC1" target="_blank">[90]</a>, <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770-Horton1" target="_blank">[91]</a>. Right-hand chart shows gains in gene coverage by including additional databases in the analysis, where “gene coverage” is defined as the fraction of genes in reference sequence that are assigned to at least one pathway. From the total of 3160 gene sets (including MHC and xMHC), we achieve coverage of 39% of genes in reference sequence (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003770#pgen.1003770.s006" target="_blank">Figure S6</a>).</p>
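The right-hand chart described above tracks how gene coverage grows as databases are added in order. A minimal sketch of that bookkeeping follows, using toy gene sets; the database names, gene symbols, and function name are placeholders, not the data used in the analysis.

```python
def cumulative_coverage(databases, reference_genes):
    """Fraction of reference genes assigned to at least one pathway,
    reported after each database is added in the given order.

    databases       -- list of (name, list of gene sets) pairs
    reference_genes -- set of all genes in the reference sequence
    """
    covered = set()
    gains = []
    for name, gene_sets in databases:
        for genes in gene_sets:
            # Genes absent from the reference do not count towards coverage.
            covered |= set(genes) & reference_genes
        gains.append((name, len(covered) / len(reference_genes)))
    return gains


# Toy example: 10 reference genes and two hypothetical databases.
reference = set("ABCDEFGHIJ")
databases = [
    ("db1", [{"A", "B"}, {"B", "C"}]),
    ("db2", [{"C", "D"}, {"X"}]),   # "X" is not in the reference
]
coverage = cumulative_coverage(databases, reference)
print(coverage)
```

Each database can only add to the covered set, so the reported fractions never decrease, and the order of inclusion determines how much gain is attributed to each database.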

    Mean computation time, in hours, of various methods for the mouse dataset.

    <p>Values in parentheses are standard deviations. Means and standard deviations are calculated based on 2.1 million MCMC iterations in 120 replicates: 20 intra-family and 20 inter-family splits for three phenotypes. Computations were performed on a single core of an Intel Xeon L5420 2.50 GHz CPU. Since computing times for many methods will vary with number of iterations used, and we did not undertake a comprehensive evaluation of how many iterations suffice for each algorithm, these results provide only a very rough guide to the relative computational burden of different methods.</p>