Region-based and pathway-based QTL mapping using a p-value combination method
Quantitative trait locus (QTL) mapping using deep DNA sequencing data is a challenging task. In this study we performed region-based and pathway-based QTL mappings using a p-value combination method to analyze the simulated quantitative traits Q1 and Q4 and the exome sequencing data. The aims were to evaluate the performance of the QTL mapping approaches that were used and to suggest plausible strategies for QTL mapping of DNA sequencing data. We conducted single-locus QTL mappings using a linear regression model with adjustments for age and smoking status, and we also conducted region-based and pathway-based QTL mappings using a truncated product method for combining p-values from the single-locus QTL mapping. To account for the features of rare variants and common single-nucleotide polymorphisms (SNPs), we considered independently rare-variant-only, common-SNP-only, and combined analyses. An analysis of 200 simulated replications showed that the three region-based methods reasonably controlled type I error, whereas the combined analysis yielded the greatest statistical power. Rare-variant-only, common-SNP-only, and combined analyses were also applied to pathway-based QTL mappings. We found that pathway-based QTL mappings had a power of approximately 100% when the significance of the vascular endothelial growth factor pathway was evaluated, but type I errors were slightly inflated. Our approach complements single-locus QTL mapping. An integrated approach using single-locus, combined region-based, and combined pathway-based analyses should yield promising results for QTL mapping of DNA sequencing data
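The p-value combination step described above can be sketched as follows. This is a minimal illustration of a truncated product statistic with a Monte Carlo null distribution, not the authors' implementation: the function name, the threshold `tau`, the number of simulations, and the seed are all assumptions, and the simulation treats the single-locus p-values as independent, which linked variants in a real region generally are not.

```python
import math
import random


def truncated_product_pvalue(pvalues, tau=0.05, n_sim=10_000, seed=0):
    """Monte Carlo p-value for the product of the p-values at or below tau.

    Assumes the input p-values are independent under the null hypothesis.
    tau, n_sim and seed are illustrative defaults, not values from the study.
    """
    def log_product(ps):
        # Log of the product of the p-values <= tau (0 if none qualify).
        return sum(math.log(p) for p in ps if p <= tau)

    observed = log_product(pvalues)
    rng = random.Random(seed)
    n = len(pvalues)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        simulated = log_product(1.0 - rng.random() for _ in range(n))
        if simulated <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


# Hypothetical single-locus p-values for one region.
print(truncated_product_pvalue([1e-6, 1e-5, 0.3, 0.8]))
```

A region-level or pathway-level analysis would call this once per set of single-locus p-values; handling correlation between loci (for example by permuting phenotypes instead of drawing uniform p-values) is left out of the sketch.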
Use of the gamma method for self-contained gene-set analysis of SNP data
Gene-set analysis (GSA) evaluates the overall evidence of association between a phenotype and all genotyped single nucleotide polymorphisms (SNPs) in a set of genes, as opposed to testing for association between a phenotype and each SNP individually. We propose using the Gamma Method (GM) to combine gene-level P-values for assessing the significance of GS association. We performed simulations to compare the GM with several other self-contained GSA strategies, including both one-step and two-step GSA approaches, in a variety of scenarios. We denote a 'one-step' GSA approach to be one in which all SNPs in a GS are used to derive a test of GS association without consideration of gene-level effects, and a 'two-step' approach to be one in which all genotyped SNPs in a gene are first used to evaluate association of the phenotype with all measured variation in the gene and then the gene-level tests of association are aggregated to assess the GS association with the phenotype. The simulations suggest that, overall, two-step methods provide higher power than one-step approaches and that combining gene-level P-values using the GM with a soft truncation threshold between 0.05 and 0.20 is a powerful approach for conducting GSA, relative to the competing approaches assessed. We also applied all of the considered GSA methods to data from a pharmacogenomic study of cisplatin, and obtained evidence suggesting that the glutathione metabolism GS is associated with cisplatin drug response.
Haplotype Estimation from Fuzzy Genotypes Using Penalized Likelihood
The Composite Link Model is a generalization of the generalized linear model in which expected values of observed counts are constructed as a sum of generalized linear components. When combined with penalized likelihood, it provides a powerful and elegant way to estimate haplotype probabilities from observed genotypes. Uncertain (“fuzzy”) genotypes, like those resulting from AFLP scores, can be handled by adding an extra layer to the model. We describe the model and the estimation algorithm. We apply it to a data set of accurate human single nucleotide polymorphism (SNP) and to a data set of fuzzy tomato AFLP scores
Multiethnic Genetic Association Studies Improve Power for Locus Discovery
To date, genome-wide association studies have focused almost exclusively on populations of European ancestry. These studies continue with the advent of next-generation sequencing, designed to systematically catalog and test low-frequency variation for a role in disease. A complementary approach would be to focus further efforts on cohorts of multiple ethnicities. This leverages the idea that population genetic drift may have elevated some variants to higher allele frequency in different populations, boosting statistical power to detect an association. Based on empirical allele frequency distributions from eleven populations represented in HapMap Phase 3 and the 1000 Genomes Project, we simulate a range of genetic models to quantify the power of association studies in multiple ethnicities relative to studies that exclusively focus on samples of European ancestry. In each of these simulations, a first phase of GWAS in exclusively European samples is followed by a second GWAS phase in any of the other populations (including a multiethnic design). We find that nontrivial power gains can be achieved by conducting future whole-genome studies in worldwide populations, where, in particular, African populations contribute the largest relative power gains for low-frequency alleles (<5%) of moderate effect that suffer from low power in samples of European descent. Our results emphasize the importance of broadening genetic studies to worldwide populations to ensure efficient discovery of genetic loci contributing to phenotypic trait variability, especially for those traits for which large numbers of samples of European ancestry have already been collected and tested
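The link between allele frequency and power mentioned above can be illustrated with a textbook approximation. The sketch below is not the simulation framework used in the study: it assumes a standardized quantitative trait, an additive per-allele effect `beta`, a 1-df test, and a non-centrality parameter of 2·N·f·(1−f)·β²; the function name and all numeric inputs are hypothetical.

```python
from statistics import NormalDist


def association_power(n, maf, beta, alpha=5e-8):
    """Approximate power of a two-sided 1-df additive association test.

    Assumes a trait with unit variance and uses the approximation
    NCP = 2 * n * maf * (1 - maf) * beta**2, where beta is the per-allele
    effect in trait standard deviations.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (2 * n * maf * (1 - maf)) ** 0.5 * abs(beta)
    # Probability that the test statistic falls in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Same hypothetical effect size and sample size, different allele frequencies.
for maf in (0.01, 0.05, 0.10):
    print(f"MAF {maf:.2f}: power {association_power(20_000, maf, 0.1):.3f}")
```

Under these assumptions, a variant that is rare in one population but more common in another is easier to detect in the second population at the same sample size, which is the intuition behind the multiethnic design.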
Is FKBP5 a genetic marker of affective psychosis? A case control study and analysis of disease related traits
BACKGROUND: A dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis has been proposed as an important pathogenic factor in depression. Genetic variants of FKBP5, a protein of the HPA system modulating the glucocorticoid receptor, have been reported to be genetically associated with improved response to medical treatment and an increase of depressive episodes. METHODS: We examined three single nucleotide polymorphisms (SNPs) in FKBP5, rs4713916 in the proposed promoter region, rs1360780 in the second intron and rs3800373 in the 3'-untranslated region (3'-UTR), in a case-control study of Caucasian origin (affective psychosis: n = 248; controls: n = 188) for genetic association and association with disease related traits. RESULTS: Allele and genotype frequencies of rs4713916, rs1360780 and rs3800373 were not significantly different between cases and controls. Two three-locus haplotypes, G-C-T and A-T-G, accounted for 86.2% in controls. Odds ratios were not increased between cases and controls, except the rare haplotype G-C-G (OR 6.81), representing 2.1% of cases and 0.3% of controls. The frequency of rs4713916AG in patients deviated from expected Hardy-Weinberg equilibrium, the genotype AA at rs4713916 in monopolar depression (P = 0.011), and the two-locus haplotype rs1360780T – rs3800373T in the total sample (overall P = 0.045) were nominally associated with longer continuance of disease. CONCLUSION: Our data do not support a significant genetic contribution of FKBP5 polymorphisms and haplotypes to affective psychosis, and the findings are inconclusive regarding their contribution to disease-related traits
Assessing Significance in High-Throughput Experiments by Sequential Goodness of Fit and q-Value Estimation
We developed a new multiple hypothesis testing adjustment called SGoF+ implemented as a sequential goodness of fit metatest which is a modification of a previous algorithm, SGoF, taking advantage of the information of the distribution of p-values in order to fix the rejection region. The new method uses a discriminant rule based on the maximum distance between the uniform distribution of p-values and the observed one, to set the null for a binomial test. This new approach shows a better power/pFDR ratio than SGoF. In fact SGoF+ automatically sets the threshold leading to the maximum power and the minimum false non-discovery rate inside the SGoF' family of algorithms. Additionally, we suggest combining the information provided by SGoF+ with the estimate of the FDR that has been committed when rejecting a given set of nulls. We study different positive false discovery rate, pFDR, estimation methods to combine q-value estimates jointly with the information provided by the SGoF+ method. Simulations suggest that the combination of SGoF+ metatest with the q-value information is an interesting strategy to deal with multiple testing issues. These techniques are provided in the latest version of the SGoF+ software freely available at http://webs.uvigo.es/acraaj/SGoF.htm
Common ataxia telangiectasia mutated haplotypes and risk of breast cancer: a nested case–control study
INTRODUCTION: The ataxia telangiectasia mutated (ATM) gene is a tumor suppressor gene with functions in cell cycle arrest, apoptosis, and repair of DNA double-strand breaks. Based on family studies, women heterozygous for mutations in the ATM gene are reported to have a fourfold to fivefold increased risk of breast cancer compared with noncarriers of the mutations, although not all studies have confirmed this association. Haplotype analysis has been suggested as an efficient method for investigating the role of common variation in the ATM gene and breast cancer. Five biallelic haplotype tagging single nucleotide polymorphisms are estimated to capture 99% of the haplotype diversity in Caucasian populations. METHODS: We conducted a nested case–control study of breast cancer within the Nurses' Health Study cohort to address the role of common ATM haplotypes and breast cancer. Cases and controls were genotyped for five haplotype tagging single nucleotide polymorphisms. Haplotypes were predicted for 1309 cases and 1761 controls for which genotype information was available. RESULTS: Six unique haplotypes were predicted in this study, five of which occur at a frequency of 5% or greater. The overall distribution of haplotypes was not significantly different between cases and controls (χ² = 3.43, five degrees of freedom, P = 0.63). CONCLUSION: There was no evidence that common haplotypes of ATM are associated with breast cancer risk. Extensive single nucleotide polymorphism detection using the entire genomic sequence of ATM will be necessary to rule out less common variation in ATM and sporadic breast cancer risk.
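As a quick arithmetic check of the reported test, the upper-tail probability of a chi-square statistic of 3.43 with five degrees of freedom can be computed directly. The sketch below uses the closed-form tail expression for odd degrees of freedom with the standard library only; the helper name is an assumption and the code is not taken from the study.

```python
import math


def chi2_sf_odd_df(x, df):
    """Upper-tail probability of a chi-square variable with odd df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("df must be a positive odd integer")
    # Closed form for odd df: an erfc term plus a finite series.
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    total = math.erfc(math.sqrt(x / 2))
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    return total


print(round(chi2_sf_odd_df(3.43, 5), 2))  # prints 0.63
```

The result agrees with the P = 0.63 given in the abstract for the overall comparison of haplotype distributions.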
An Open Access Database of Genome-wide Association Results
BACKGROUND: The number of genome-wide association studies (GWAS) is growing rapidly leading to the discovery and replication of many new disease loci. Combining results from multiple GWAS datasets may potentially strengthen previous conclusions and suggest new disease loci, pathways or pleiotropic genes. However, no database or centralized resource currently exists that contains anywhere near the full scope of GWAS results. METHODS: We collected available results from 118 GWAS articles into a database of 56,411 significant SNP-phenotype associations and accompanying information, making this database freely available here. In doing so, we met and describe here a number of challenges to creating an open access database of GWAS results. Through preliminary analyses and characterization of available GWAS, we demonstrate the potential to gain new insights by querying a database across GWAS. RESULTS: Using a genomic bin-based density analysis to search for highly associated regions of the genome, positive control loci (e.g., MHC loci) were detected with high sensitivity. Likewise, an analysis of highly repeated SNPs across GWAS identified replicated loci (e.g., APOE, LPL). At the same time we identified novel, highly suggestive loci for a variety of traits that did not meet genome-wide significant thresholds in prior analyses, in some cases with strong support from the primary medical genetics literature (SLC16A7, CSMD1, OAS1), suggesting these genes merit further study. Additional adjustment for linkage disequilibrium within most regions with a high density of GWAS associations did not materially alter our findings. Having a centralized database with standardized gene annotation also allowed us to examine the representation of functional gene categories (gene ontologies) containing one or more associations among top GWAS results. Genes relating to cell adhesion functions were highly over-represented among significant associations (p < 4.6 × 10^-14), a finding which was not perturbed by a sensitivity analysis. CONCLUSION: We provide access to a full gene-annotated GWAS database which could be used for further querying, analyses or integration with other genomic information. We make a number of general observations. Of reported associated SNPs, 40% lie within the boundaries of a RefSeq gene and 68% are within 60 kb of one, indicating a bias toward gene-centricity in the findings. We found considerable heterogeneity in information available from GWAS suggesting the wider community could benefit from standardization and centralization of results reporting.
Integrated genomics and proteomics define huntingtin CAG length-dependent networks in mice.
To gain insight into how mutant huntingtin (mHtt) CAG repeat length modifies Huntington's disease (HD) pathogenesis, we profiled mRNA in over 600 brain and peripheral tissue samples from HD knock-in mice with increasing CAG repeat lengths. We found repeat length-dependent transcriptional signatures to be prominent in the striatum, less so in cortex, and minimal in the liver. Coexpression network analyses revealed 13 striatal and 5 cortical modules that correlated highly with CAG length and age, and that were preserved in HD models and sometimes in patients. Top striatal modules implicated mHtt CAG length and age in graded impairment in the expression of identity genes for striatal medium spiny neurons and in dysregulation of cyclic AMP signaling, cell death and protocadherin genes. We used proteomics to confirm 790 genes and 5 striatal modules with CAG length-dependent dysregulation at the protein level, and validated 22 striatal module genes as modifiers of mHtt toxicities in vivo
Germline polymorphisms in SIPA1 are associated with metastasis and other indicators of poor prognosis in breast cancer
INTRODUCTION: There is growing evidence that heritable genetic variation modulates metastatic efficiency. Our previous work using a mouse mammary tumor model has shown that metastatic efficiency is modulated by the GTPase-activating protein encoded by Sipa1 ('signal-induced proliferation-associated gene 1'). The aim of this study was to determine whether single nucleotide polymorphisms (SNPs) within the human SIPA1 gene are associated with metastasis and other disease characteristics in breast cancer. METHOD: The study population (n = 300) consisted of randomly selected non-Hispanic Caucasian breast cancer patients identified from a larger population-based series. Genomic DNA was extracted from peripheral leukocytes. Three previously described SNPs within SIPA1 (one within the promoter [-313G>A] and two exonic [545C>T and 2760G>A]) were characterized using SNP-specific PCR. RESULTS: The variant 2760G>A and the -313G>A allele were associated with lymph node involvement (P = 0.0062 and P = 0.0083, respectively), and the variant 545C>T was associated with estrogen receptor negative tumors (P = 0.0012) and with progesterone negative tumors (P = 0.0339). Associations were identified between haplotypes defined by the three SNPs and disease progression. Haplotype 3 defined by variants -313G>A and 2760G>A was associated with positive lymph node involvement (P = 0.0051), and haplotype 4 defined by variant 545C>T was associated with estrogen receptor and progesterone receptor negative status (P = 0.0053 and P = 0.0199, respectively). CONCLUSION: Our findings imply that SIPA1 germline polymorphisms are associated with aggressive disease behavior in the cohort examined. If these results hold true in other populations, then knowledge of SIPA1 SNP genotypes could potentially enhance current staging protocols