
    Comparison of Pathway Analysis Approaches Using Lung Cancer GWAS Data Sets

    Pathway analysis has been proposed as a complement to single-SNP analyses in GWAS. This study compared pathway analysis methods using two lung cancer GWAS data sets based on four studies: one a combined data set from Central Europe and Toronto (CETO), the other a combined data set from Germany and MD Anderson (GRMD). We searched the literature for pathway analysis methods that were widely used, representative of other methods, and supported by available software. We selected EASE, which uses a modified Fisher's exact test to test for pathway associations; GenGen, a version of Gene Set Enrichment Analysis (GSEA), which uses a Kolmogorov-Smirnov-like running-sum statistic as the test statistic; and SLAT, which uses a p-value combination approach. We also included a modified version of the SUMSTAT method (mSUMSTAT), which tests for association by averaging χ² statistics from genotype association tests. After mapping more than 300,000 SNPs from each data set, nearly 18,000 genes were available for analysis; these were mapped to 421 GO level 4 gene sets for pathway analysis. Among the methods designed to be robust to biases related to gene size and pathway SNP correlation (GenGen, mSUMSTAT and SLAT), mSUMSTAT identified the most significant pathways (8 in CETO and 1 in GRMD). These included a highly plausible association for the acetylcholine receptor activity pathway in both CETO (FDR ≤ 0.001) and GRMD (FDR = 0.009), although two strong association signals at a single gene cluster (CHRNA3-CHRNA5-CHRNB4) drive this result, complicating its interpretation. Few other replicated associations were found with any of these methods. Difficulty in replicating associations hindered our comparison, but the results suggest that mSUMSTAT has advantages over the other approaches and may be a useful pathway analysis tool alongside other methods, such as the commonly used GSEA (GenGen) approach.
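    The three kinds of test statistic named above can be illustrated in a few lines. The following is a minimal Python sketch, not the EASE, GenGen or mSUMSTAT implementations: the EASE adjustment is assumed to be the commonly described jackknife-style penalty (one significant gene removed from the pathway's hit count before a one-sided Fisher's exact test), the running sum follows a weighted GSEA-style enrichment score, and the mSUMSTAT-style score simply averages gene-level χ² statistics. All function names are hypothetical, and significance assessment (for example by permutation) is not shown.

```python
import math
from typing import Dict, Sequence


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: in pathway / not in pathway; columns: significant gene / not.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = math.comb(n, row1)
    return sum(
        math.comb(col1, x) * math.comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    ) / denom


def ease_score(a: int, b: int, c: int, d: int) -> float:
    """Assumed EASE-style penalty: drop one significant in-pathway gene, then test."""
    if a == 0:
        return 1.0
    return fisher_greater(a - 1, b, c, d)


def running_sum_es(scores: Sequence[float], in_set: Sequence[bool], p: float = 1.0) -> float:
    """Weighted Kolmogorov-Smirnov-like enrichment score (GSEA style)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_hit = sum(1 for flag in in_set if flag)
    n_miss = len(scores) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a non-empty proper subset")
    hit_total = sum(abs(scores[i]) ** p for i in order if in_set[i])
    running, best = 0.0, 0.0
    for i in order:
        # Step up for pathway genes (weighted by score), down for all other genes.
        if in_set[i]:
            running += abs(scores[i]) ** p / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


def msumstat_score(gene_chi2: Dict[str, float], gene_set: Sequence[str]) -> float:
    """mSUMSTAT-style pathway score: mean of gene-level chi-square statistics."""
    return sum(gene_chi2[g] for g in gene_set) / len(gene_set)
```

Because the penalised EASE score is always at least as large as the unadjusted Fisher p-value, pathways supported by only one or two significant genes are pushed towards non-significance.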

    Cross-Cancer Genome-Wide Analysis of Lung, Ovary, Breast, Prostate, and Colorectal Cancer Reveals Novel Pleiotropic Associations

    Identifying genetic variants with pleiotropic associations can uncover common pathways influencing multiple cancers. We took a two-stage approach to conduct genome-wide association studies for lung, ovary, breast, prostate, and colorectal cancer from the GAME-ON/GECCO Network (61,851 cases, 61,820 controls) to identify pleiotropic loci. Findings were replicated in independent association studies (55,789 cases, 330,490 controls). We identified a novel pleiotropic association at 1q22 involving breast and lung squamous cell carcinoma, with eQTL analysis showing an association with ADAM15/THBS3 gene expression in lung. We also identified a known breast cancer locus CASP8/ALS2CR12 associated with prostate cancer, a known cancer locus at CDKN2B-AS1 with different variants associated with lung adenocarcinoma and prostate cancer, and confirmed the associations of a breast BRCA2 locus with lung and serous ovarian cancer. This is the largest study to date examining pleiotropy across multiple cancer-associated loci, identifying common mechanisms of cancer development and progression. Cancer Res; 76(17); 5103-14. ©2016 AACR

    Genetic Variation at the Insulin-like Growth Factor 1 Gene and Association with Breast Cancer, Breast Density and Anthropometric Measures

    Background and objectives: Evidence suggests that circulating IGF-I levels increase mammographic density (a breast cancer risk factor) and breast cancer risk in premenopausal women. The objective of this thesis was to examine the association of genetic variation at the IGF1 gene with IGF-I concentration, mammographic density, breast cancer risk, and related anthropometric measures in premenopausal women. Methods: Three IGF1 CA repeat polymorphisms (at the 5′ and 3′ ends, and in intron 2) were genotyped. A cross-sectional design was used to investigate their associations with IGF-I levels, mammographic density, BMI, weight, and height. Families from registries in Ontario and Australia were used to investigate associations with breast cancer risk, and also with BMI, weight and height. Results: In the cross-sectional study, a greater number of copies of the 5′ 19 allele was associated with lower circulating IGF-I levels. A greater number of copies of the 3′ 185 allele was associated with greater percentage breast density, a smaller amount of non-dense tissue, and lower BMI. Including BMI in regression models removed the association of the 3′ 185 allele with percentage breast density. In the family-based study, nominally significant associations with breast cancer risk were observed (5′ 21 allele, intron 2 212 allele, intron 2 216 allele), but significance was lost after adjustment for multiple comparisons. The association between the intron 2 216 allele and risk was stronger under a recessive model, and 5′ allele groupings of 18 to 20 repeats and of 20 or more repeats produced significant positive and negative associations, respectively. These associations were not strongly supported in analyses stratified by registry. Results from the family-based study did not support an association between genetic variation at IGF1 and BMI, weight or height. Conclusions: No single IGF1 variant was associated with all of circulating IGF-I levels, mammographic density, and breast cancer risk. 
The failure to replicate the association of the 3′ 185 allele with BMI in the family-based study suggests that the association of the 3′ 185 allele with percentage breast density is spurious, since this association was mediated through the relationship with BMI (suggesting IGF-I action on body fat). Evidence for an association between IGF1 and breast cancer risk was limited.
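    The allele-specific analyses above require coding each multi-allelic CA-repeat genotype relative to one allele of interest. A minimal Python sketch, assuming genotypes are stored as pairs of allele labels as reported (e.g. 19 for the 5′ repeat or 185 for the 3′ repeat); the function names are hypothetical, and the thesis's exact coding is not described here beyond copy counts and a recessive model.

```python
from typing import Tuple

# A genotype is a pair of allele labels, e.g. (185, 189) for the 3' CA repeat.
Genotype = Tuple[int, int]


def copies(genotype: Genotype, allele: int) -> int:
    """Additive coding: number of copies (0, 1 or 2) of the allele of interest."""
    return sum(1 for a in genotype if a == allele)


def recessive(genotype: Genotype, allele: int) -> int:
    """Recessive coding: 1 only for carriers of two copies of the allele."""
    return int(copies(genotype, allele) == 2)


genotypes = [(185, 185), (185, 189), (187, 189)]
print([copies(g, 185) for g in genotypes])     # additive: [2, 1, 0]
print([recessive(g, 185) for g in genotypes])  # recessive: [1, 0, 0]
```

Either coded variable can then be entered as the exposure in a regression model for the outcome of interest.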

    A Two-Dimensional Pooling Strategy for Rare Variant Detection on Next-Generation Sequencing Platforms

    We describe a method for pooling and sequencing DNA from a large number of individual samples while preserving information about sample identity. DNA from 576 individuals was arranged into four matrices of 12 rows by 12 columns and then pooled by row and by column, giving 96 pools with 12 individuals in each pool. Pooling was two-dimensional: DNA from each individual is present in exactly one row pool and exactly one column pool. By considering the variants observed in the rows and columns of a matrix, we can trace rare variants back to the specific individuals who carry them. The pooled DNA samples were enriched for a 250 kb region previously identified by GWAS as significantly predisposing individuals to lung cancer. All 96 pools (12 row and 12 column pools from each of 4 matrices) were barcoded and sequenced on an Illumina HiSeq 2000 instrument with an average depth of coverage greater than 4,000×. Verification by Ion PGM sequencing confirmed the presence of 91.4% of the confidently classified SNVs assayed. Because each individual sample is sequenced in multiple pools, variant calling is more accurate than with a single pool or a multiplexed approach. This provides a powerful method for rare variant detection in regions of interest at reduced cost to the researcher.
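    The two-dimensional design can be sketched as an index mapping. The Python below assumes, purely for illustration, that samples 0 to 575 fill the four 12 × 12 matrices in row-major order; the helper names are hypothetical. It checks that the design yields 96 pools of 12 samples each and shows how variant-positive row and column pools in one matrix narrow the candidate carriers to their intersections.

```python
from collections import Counter
from itertools import product

N_MATRICES = 4
SIDE = 12  # each matrix is 12 rows by 12 columns


def position(sample: int) -> tuple:
    """Map a sample index (0-575) to (matrix, row, column), assuming row-major filling."""
    matrix, offset = divmod(sample, SIDE * SIDE)
    row, col = divmod(offset, SIDE)
    return matrix, row, col


def pools(sample: int) -> tuple:
    """Each sample goes into exactly one row pool and one column pool."""
    matrix, row, col = position(sample)
    return (matrix, "row", row), (matrix, "col", col)


def candidate_carriers(matrix: int, pos_rows: set, pos_cols: set) -> list:
    """Samples at the intersections of variant-positive rows and columns."""
    return sorted(matrix * SIDE * SIDE + r * SIDE + c
                  for r, c in product(pos_rows, pos_cols))


pool_sizes = Counter(p for s in range(N_MATRICES * SIDE * SIDE) for p in pools(s))
print(len(pool_sizes), set(pool_sizes.values()))  # 96 {12}
print(candidate_carriers(1, {2}, {5}))            # [173]
```

With one positive row and one positive column, the carrier is identified uniquely; with several positive rows and columns, the intersections multiply and the carriers become ambiguous.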

    Alcohol and lung cancer risk among never smokers: A pooled analysis from the International Lung Cancer Consortium and the SYNERGY study

    It is not clear whether alcohol consumption is associated with lung cancer risk. The relationship is likely confounded by smoking, complicating the interpretation of previous studies. We examined the association of alcohol consumption and lung cancer risk in a large pooled international sample, minimizing potential confounding by tobacco consumption by restricting analyses to never smokers. Our study included 22 case-control and cohort studies with a total of 2548 never-smoking lung cancer patients and 9362 never-smoking controls from North America, Europe and Asia within the International Lung Cancer Consortium (ILCCO) and SYNERGY Consortium. Alcohol consumption was categorized by amount consumed (grams per day) and also modelled as a continuous variable using restricted cubic splines to allow for potential non-linearity. Analyses by histologic subtype were included, and associations by type of alcohol consumed (wine, beer and liquor) were also investigated. Alcohol consumption was inversely associated with lung cancer risk, with evidence most strongly supporting lower risk for light and moderate drinkers relative to non-drinkers (>0-4.9 g per day: OR = 0.80, 95% CI = 0.70-0.90; 5-9.9 g per day: OR = 0.82, 95% CI = 0.69-0.99; 10-19.9 g per day: OR = 0.79, 95% CI = 0.65-0.96). Inverse associations were found for consumption of wine and liquor, but not beer. The results indicate that alcohol consumption is inversely associated with lung cancer risk, particularly among subjects with low to moderate consumption levels and among wine and liquor drinkers, but not beer drinkers. Although our results should be free of relevant bias from confounding by smoking, we cannot exclude that confounding by other factors contributed to the observed associations. Confounding in relation to the non-drinker reference category may be of particular importance.
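    Restricted cubic splines let the log-odds vary smoothly with grams per day while remaining linear below the first knot and beyond the last. A minimal NumPy sketch of the truncated-power basis in Harrell's parameterization (nonlinear terms scaled by the squared knot range); the knot locations used in the checks are hypothetical, since the abstract does not report the study's knots. The resulting columns would be entered into a logistic model alongside the other covariates.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis: x plus k-2 nonlinear columns for k knots."""
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos_cube(u):
        # Truncated cubic (u)_+^3
        return np.clip(u, 0.0, None) ** 3

    scale = (t[-1] - t[0]) ** 2
    cols = [x]
    for j in range(k - 2):
        term = (pos_cube(x - t[j])
                - pos_cube(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                + pos_cube(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
        cols.append(term / scale)
    return np.column_stack(cols)


# Hypothetical knots in grams of alcohol per day.
X = rcs_basis([0.0, 2.5, 7.5, 15.0, 30.0], knots=[0.0, 5.0, 10.0, 20.0])
```

The two subtracted and added truncated cubics are chosen so that the cubic and quadratic terms cancel beyond the last knot, which is what restricts the fitted curve to be linear in the tails.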

    META-GSA: Combining Findings from Gene-Set Analyses across Several Genome-Wide Association Studies.

    Gene-set analysis (GSA) methods are used as complementary approaches to genome-wide association studies (GWASs). The single-marker association estimates of a predefined set of genes are contrasted either with those of all remaining genes or with a null, non-associated background. To pool the p-values from several GSAs, it is important to take into account the concordance of the observed patterns resulting from single-marker association point estimates across any given gene set. Here we propose META-GSA, an enhanced version of Fisher's inverse χ² method that weights each study to account for imperfect correlation between association patterns. We investigated the performance of META-GSA by simulating GWASs with 500 cases and 500 controls at 100 diallelic markers in 20 different scenarios, simulating relative risks between 1 and 1.5 in gene sets of 10 genes. Wilcoxon's rank-sum test was applied as the GSA for each study. META-GSA had greater power to discover truly associated gene sets than simple pooling of the p-values (e.g. 59% versus 37% when the true relative risk for 5 of 10 genes was assumed to be 1.5). Under the null hypothesis of no difference in the true association pattern between the gene set of interest and the set of remaining genes, the results of both approaches are almost uncorrelated. We recommend not relying on p-values alone when combining the results of independent GSAs. We applied META-GSA to pool the results of four case-control GWASs of lung cancer risk (Central European Study and Toronto/Lunenfeld-Tanenbaum Research Institute Study; German Lung Cancer Study and MD Anderson Cancer Center Study), which had already been analyzed separately with four different GSA methods (EASE, SLAT, mSUMSTAT and GenGen). This application revealed the pathway GO0015291 "transmembrane transporter activity" as significantly enriched with associated genes (GSA method: EASE; p = 0.0315, corrected for multiple testing). 
Similar results were found for GO0015464 "acetylcholine receptor activity", but only without correction for multiple testing (all GSA methods applied; p ≈ 0.02).
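    META-GSA builds on Fisher's inverse χ² method, which combines k independent p-values through the statistic -2 Σ ln p_i, compared against a χ² distribution with 2k degrees of freedom. The sketch below implements only this unweighted baseline; META-GSA's study weights are not reproduced, and the function name is hypothetical. For even degrees of freedom the χ² upper tail has a closed form, so no statistics library is needed.

```python
import math
from typing import Sequence, Tuple


def fisher_combine(p_values: Sequence[float]) -> Tuple[float, float]:
    """Fisher's inverse chi-square method (unweighted).

    Returns (statistic, combined p-value), where statistic = -2 * sum(ln p_i)
    follows a chi-square distribution with 2k degrees of freedom under H0.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # Chi-square survival function for 2k df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        tail += term
    return stat, math.exp(-half) * tail


# Four hypothetical study-level p-values for one gene set.
stat, combined_p = fisher_combine([0.04, 0.10, 0.30, 0.50])
```

The unweighted version treats every study as equally informative regardless of how well the per-marker association patterns agree, which is the limitation the weighting in META-GSA is designed to address.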

    Bar graph of variant classes by frequency of observation.

    A breakdown of the classifications of variants observed in exactly 1, 2, 3 or 4 of the 4 matrices. Variants that were rare within a matrix (and thus labeled Pinnable or Singleton) were predominantly seen in only 1 of the 4 matrices. Similarly, variants that were common within a matrix (Multiple) were also common across the 4 matrices.

    Definition of variant classes.

    Variant calls are classified by their relationship to the pooled individuals into three classes: Pinnable, Multiple and Singleton. (A) Pinnable variants were those whose carriers could be identified, because the variant was present in exactly one row pool or exactly one column pool and in at least one intersecting pool of the other orientation. (B) Multiple variants were observed in more than one row and more than one column, so it was not possible to determine precisely which individuals carried the variant. (C) Singletons were calls observed only in row pools or only in column pools, but not both.
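    The three classification rules reduce to a small decision on how many row pools and column pools of a matrix carry the call. A minimal Python sketch; the function name and set-based inputs are hypothetical.

```python
from typing import Set


def classify_call(rows: Set[int], cols: Set[int]) -> str:
    """Classify a variant call from the row and column pools it was seen in."""
    if not rows and not cols:
        raise ValueError("variant not observed in any pool")
    if not rows or not cols:
        return "Singleton"  # seen only in rows, or only in columns
    if len(rows) == 1 or len(cols) == 1:
        return "Pinnable"   # carriers lie at intersections with the single pool
    return "Multiple"       # several rows and several columns: carriers ambiguous


print(classify_call({3}, {1, 7}))  # Pinnable
```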