10 research outputs found

    Using Prior Information from the Medical Literature in GWAS of Oral Cancer Identifies Novel Susceptibility Variant on Chromosome 4 - the AdAPT Method

    Background: Genome-wide association studies (GWAS) require large sample sizes to obtain adequate statistical power, but it may be possible to increase the power by incorporating complementary data. In this study we investigated the feasibility of automatically retrieving information from the medical literature and leveraging this information in GWAS.
    Methods: We developed a method that searches through PubMed abstracts for pre-assigned keywords and key concepts, and uses this information to assign prior probabilities of association for each single nucleotide polymorphism (SNP) with the phenotype of interest: the Adjusting Association Priors with Text (AdAPT) method. Association results from a GWAS can subsequently be ranked in the context of these priors using the Bayes False Discovery Probability (BFDP) framework. We initially tested AdAPT by comparing rankings of known susceptibility alleles in a previous lung cancer GWAS, and subsequently applied it in a two-phase GWAS of oral cancer.
    Results: Known lung cancer susceptibility SNPs were consistently ranked higher by AdAPT BFDPs than by p-values. In the oral cancer GWAS, we sought to replicate the top five SNPs as ranked by AdAPT BFDPs, of which rs991316, located in the ADH gene region of 4q23, displayed a statistically significant association with oral cancer risk in the replication phase (per-rare-allele log-additive p-value [p(trend)] = 2.5 × 10⁻³). The combined OR for having one additional rare allele was 0.83 (95% CI: 0.76-0.90), and this association was independent of previously identified susceptibility SNPs that are associated with overall UADT cancer in this gene region. We also investigated whether rs991316 was associated with other cancers of the upper aerodigestive tract (UADT), but no additional association signal was found.
    Conclusion: This study highlights the potential utility of systematically incorporating prior knowledge from the medical literature in genome-wide analyses using the AdAPT methodology. AdAPT is available online (url: http://services.gate.ac.uk/lld/gwas/service/config)
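
    The BFDP ranking described in the abstract can be illustrated with a minimal sketch. It assumes the standard approximate Bayes factor formulation of the BFDP (null versus alternative, with a normal prior on the log odds ratio); the prior variance `w` and the three `prior_h1` values are illustrative assumptions, not values from the study, and the function names are hypothetical. Only the combined OR and its 95% CI are taken from the abstract.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of log(OR), recovered from a 95% CI given on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def approximate_bayes_factor(odds_ratio, se, prior_var):
    """Approximate Bayes factor for H0 (no association) versus H1."""
    v = se ** 2                           # sampling variance of log(OR)
    z = math.log(odds_ratio) / se         # Wald statistic
    shrink = prior_var / (v + prior_var)
    return math.sqrt((v + prior_var) / v) * math.exp(-0.5 * z * z * shrink)


def bfdp(abf, prior_h1):
    """Probability of the null given the data, for a prior Pr(H1)."""
    prior_odds_h0 = (1 - prior_h1) / prior_h1
    return abf * prior_odds_h0 / (1 + abf * prior_odds_h0)


# Combined estimate for rs991316 as reported in the abstract.
se = se_from_ci(0.76, 0.90)

# Assumption: prior variance chosen so that 95% of true ORs lie below 1.5.
w = (math.log(1.5) / 1.96) ** 2
abf = approximate_bayes_factor(0.83, se, w)

# Assumption: three illustrative priors standing in for low/mid/high categories.
for prior_h1 in (1e-4, 1e-3, 1e-2):
    print(f"Pr(H1) = {prior_h1:g}  BFDP = {bfdp(abf, prior_h1):.4f}")
```

    With the association evidence held fixed, a higher literature-derived prior yields a lower BFDP, which is how a text-based prior can move a SNP up the ranking relative to ordering by p-value alone.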

    The 12p13.33/RAD52 Locus and Genetic Susceptibility to Squamous Cell Cancers of Upper Aerodigestive Tract

    Genetic variants located within the 12p13.33/RAD52 locus have been associated with lung squamous cell carcinoma (LUSC). Here, within 5,947 UADT cancers and 7,789 controls from 9 different studies, we found rs10849605, a common intronic variant in RAD52, to be also associated with upper aerodigestive tract (UADT) squamous cell carcinoma cases (OR = 1.09, 95% CI: 1.04-1.15, p = 6 × 10⁻⁴). We additionally identified rs10849605 as a RAD52 cis-eQTL in UADT (p = 1 × 10⁻³) and LUSC (p = 9 × 10⁻⁴) tumours, with the UADT/LUSC risk allele correlated with increased RAD52 expression levels. The 12p13.33 locus, encompassing rs10849605/RAD52, was identified as a significant somatic focal copy number amplification in UADT (n = 374, q-value = 0.075) and LUSC (n = 464, q-value = 0.007) tumors and correlated with higher RAD52 tumor expression levels (p = 6 × 10⁻⁴⁸ and p = 3 × 10⁻²⁹ in UADT and LUSC, respectively). In combination, these results implicate increased RAD52 expression in both genetic susceptibility and tumorigenesis of UADT and LUSC tumors.

    Forest plot showing overall and stratified association results of the rs991316 SNP with oral cancer (oral cavity and oropharyngeal cancer).

    a) Apart from the ORs for CT heterozygotes and TT homozygotes, which were estimated relative to the major CC homozygotes, all ORs and 95% CIs were estimated using the log-additive model, adjusting for age, gender and center. All subjects from the genome-wide and replication phases with available covariates were included in this analysis (not generic controls). The overall OR for cancers of the oral cavity and oropharynx is shown by the dotted vertical line.
    b) P for heterogeneity indicates differences in OR between strata and was derived from Cochran's Q test.
    c) Never drinkers were subjects who either reported 0 g alcohol intake per day or reported being never drinkers; light drinkers consumed >0 and <6.06 g alc./day, intermediate drinkers consumed >6.06 and <46.3 g alc./day, and heavy drinkers consumed >46.3 g alc./day.
    d) Hypopharynx, larynx, and esophagus cases were not included in the analyses above.

    Association results of the SNPs included in the GWAS of oral cancer (by p-values), pair-wise r² estimates with rs991316, and recombination rates, for SNPs in the ADH gene region on 4q23.

    P-values indicating the strength of association for each SNP in the GWAS with oral cancer are shown on the −log10 scale (left y-axis), against their positions on chromosome 4 (Build 36.3). The color of each point represents the degree of linkage disequilibrium (r²) between that SNP and rs991316 according to HapMap phase II CEU data. Highlighted in the figure are rs1229984, rs1789924 and rs971074, which have previously been reported to be associated with UADT cancers, as well as rs991316, which was found to be associated specifically with oral cancer in the current study. rs1229984 was neither genotyped nor tagged by a proxy variant on the HumanHap300 BeadChip, but was genotyped by TaqMan assay in the same samples from the Central Europe and ARCAGE studies as included in the discovery phase of the current GWAS; r² between rs1229984 and rs991316 was estimated in the 3,513 controls from the Central European and ARCAGE studies. Recombination rates across the region are shown by the light blue line plotted against the right y-axis. Genes in the region are represented with arrowheads indicating the direction of transcription.

    Summary results for the six SNPs selected for replication in oral cancer GWAS. Ranking was based on the Bayesian False Discovery Probability (BFDP).

    a) Total number of cases and controls included in the final GWA analysis (Table S2: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0036888#pone.0036888.s004).
    b) Total number of cases and controls included in the replication analysis.
    c) Major and minor alleles, with corresponding allele frequencies in controls.
    d) OR, 95% CI and p-values were estimated for the per-rare-allele log-additive genetic model by unconditional logistic regression, adjusting for sex and country (see methods).
    e) Prior probability of association (prior for the alternative hypothesis H1) based on the AdAPT literature search (see methods).
    f) GWAS ranking based on p-values.
    g) The Bayesian False Discovery Probability (BFDP) was estimated based on the association results and the prior probability of association (see methods). The point BFDP estimate corresponds to 100 true susceptibility SNPs assumed to be included in the dataset and evenly distributed across the prior categories. The range refers to a sensitivity analysis of the BFDP in which the assumed number of true susceptibility SNPs in the dataset was varied. The lower and upper boundaries were estimated by assuming 500 and 50 true susceptibility SNPs, respectively.
    h) GWAS ranking based on BFDP estimates.
    i) OR, 95% CI and p-values were estimated for the per-rare-allele log-additive genetic model by unconditional logistic regression, adjusting for sex and study center (see methods).
    j) P-heterogeneity indicates differences in OR between the discovery and replication phases, and was derived from Cochran's Q test.

    Comparison of GWAS ranking of validated lung cancer susceptibility SNPs by p-values and BFDPs for known susceptibility loci of lung cancer.

    1) Odds ratios were estimated based on the complete dataset (100%).
    2) Pr(H1) refers to the prior probability of the alternative hypothesis (prior probability of the SNP being associated with lung cancer) and was calculated by running the AdAPT web service on Entrez Gene GeneRIF texts and PubMed abstracts, respectively. The priors were calculated in three categories (low/mid/high). See the Methods section for further details on the statistical framework for these calculations.
    3) 50% and 75% of the data were randomly sampled from the complete dataset 100 times.
    4) For each randomly sampled sub-dataset we performed logistic regression and estimated odds ratios along with 95% confidence intervals, p-values and approximate Bayes factors. These were subsequently used to estimate BFDPs in order to compare the ranking of known lung cancer susceptibility SNPs under two ranking methods: by p-values and by BFDPs.
    5) P-values were estimated using logistic regression models.
    6) Median and mean rankings were based on the results from 100 randomly sampled datasets; Δ indicates the change in ranking compared to p-value-based ranking; the range refers to the highest and lowest rankings observed, respectively.
    7) BFDPs (PubMed abstracts) were calculated using priors that were estimated by running the AdAPT web service on PubMed abstracts (published before January 2008).

    Comparison of the statistical power for three categories of prior odds for the null hypothesis when evaluating the noteworthiness of SNPs by BFDP.

    These power calculations assume an evaluation of 300,000 SNPs, of which 100 are truly associated with the outcome and distributed evenly across three prior categories. The overall distribution of SNPs across the three prior categories is assumed to be [87.5%; 10%; 2.5%]. Flat PO assumes one single prior category.
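
    The per-category priors implied by these assumptions follow from simple arithmetic, sketched below. The SNP count, the number of true associations and the category shares are taken from the caption; the labels low/mid/high and their mapping to the three shares are an assumption, as is the function name.

```python
def category_priors(n_snps, n_true, category_share):
    """Prior Pr(H1) per category when true SNPs are split evenly across categories."""
    true_per_category = n_true / len(category_share)
    return {
        name: true_per_category / (n_snps * share)
        for name, share in category_share.items()
    }


n_snps, n_true = 300_000, 100
# Assumption: the shares map to low/mid/high prior categories in this order.
shares = {"low": 0.875, "mid": 0.10, "high": 0.025}

priors = category_priors(n_snps, n_true, shares)
flat_prior = n_true / n_snps  # a single prior category ("Flat PO")

for name, prior_h1 in priors.items():
    prior_odds_h0 = (1 - prior_h1) / prior_h1
    print(f"{name:>4}: Pr(H1) = {prior_h1:.2e}, prior odds for H0 = {prior_odds_h0:,.0f}")
print(f"flat: Pr(H1) = {flat_prior:.2e}")
```

    Because the same number of true SNPs is spread over far fewer candidates in the smallest category, its prior probability of association is higher than the flat prior, while the largest category falls below it.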