The impact of adjusting for baseline in pharmacogenomic genome-wide association studies of quantitative change.
In pharmacogenomic studies of quantitative change, any association between genetic variants and the pretreatment (baseline) measurement can bias the estimated effect of those variants on drug response. A putative solution is to adjust for baseline. We conducted a series of genome-wide association studies (GWASs) for low-density lipoprotein cholesterol (LDL-C) response to statin therapy in 34,874 participants of the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort as a case study to investigate the impact of baseline adjustment on results generated from pharmacogenomic studies of quantitative change. Across phenotypes of statin-induced LDL-C change, baseline-adjusted analyses identified variants from six loci meeting genome-wide significance (SORT/CELSR2/PSRC1, LPA, SLCO1B1, APOE, APOB, and SMARCA4/LDLR). In contrast, baseline-unadjusted analyses yielded variants from only three loci meeting genome-wide significance (LPA, APOE, and SLCO1B1). As the definitive test of the true effect of genetic variants on statin-induced LDL-C change, we performed a genome-wide heterogeneity test of baseline versus statin on-treatment LDL-C levels. Its results were generally consistent with the models not adjusting for baseline, indicating that the genome-wide significant hits found only in baseline-adjusted analyses (SORT/CELSR2/PSRC1, APOB, SMARCA4/LDLR) were likely biased. We then comprehensively reviewed published GWASs of drug-induced quantitative change and found that more than half (59%) inappropriately adjusted for baseline. Altogether, we demonstrate that (1) baseline adjustment introduces bias in pharmacogenomic studies of quantitative change and (2) this erroneous methodology is highly prevalent. We conclude that it is critical to avoid this common statistical approach in future pharmacogenomic studies of quantitative change.
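Why can baseline adjustment create a spurious genotype effect? A small simulation helps, under assumed conditions that are not taken from GERA: the variant shifts an individual's underlying LDL-C level equally before and during treatment, the drug response is independent of genotype, and each measurement carries independent noise. Under these assumptions the true genotype effect on change is zero. The sketch below (Python standard library only; every parameter is hypothetical) fits the unadjusted model Δ ~ G and the baseline-adjusted model Δ ~ G + Y0, the latter through Frisch-Waugh-Lovell residualization.

```python
import random

def ols_fit(x, y):
    """Simple least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def residualize(y, x):
    """Residuals of y after regressing it on x (with intercept)."""
    a, b = ols_fit(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def simulate(n=20000, beta=1.0, maf=0.3, var_u=1.0, var_e=1.0, seed=1):
    """Hypothetical data-generating model: genotype affects level, not change."""
    rng = random.Random(seed)
    g, y0, delta = [], [], []
    for _ in range(n):
        gi = int(rng.random() < maf) + int(rng.random() < maf)  # additive dosage 0/1/2
        level = beta * gi + rng.gauss(0, var_u ** 0.5)   # stable underlying level
        base = level + rng.gauss(0, var_e ** 0.5)         # baseline measurement
        drug = rng.gauss(1.0, 0.5)                        # response, independent of genotype
        treated = level - drug + rng.gauss(0, var_e ** 0.5)  # on-treatment measurement
        g.append(gi)
        y0.append(base)
        delta.append(treated - base)
    return g, y0, delta

g, y0, delta = simulate()
_, unadjusted = ols_fit(g, delta)
# Frisch-Waugh-Lovell: the coefficient on G in delta ~ G + Y0 equals the slope
# of residualized delta on residualized G (both residualized on Y0).
_, adjusted = ols_fit(residualize(g, y0), residualize(delta, y0))
print(f"unadjusted: {unadjusted:.3f}  baseline-adjusted: {adjusted:.3f}")
```

Under this model the unadjusted slope stays near zero, while the adjusted coefficient converges to β·σe²/(σu² + σe²), here 0.5. The baseline measurement error enters Y0 and Δ with opposite signs, so conditioning on Y0 links G to Δ even though genotype has no effect on change.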
An Integrative Phenotype-Genotype Approach Using Phenotypic Characteristics from the UAE National Diabetes Study Identifies HSD17B12 as a Candidate Gene for Obesity and Type 2 Diabetes
The United Arab Emirates National Diabetes and Lifestyle Study (UAEDIAB) has identified obesity, hypertension, obstructive sleep apnea, and dyslipidemia as common phenotypic characteristics correlated with diabetes mellitus status. As these phenotypes are usually linked with genetic variants, we hypothesized that they share single nucleotide polymorphism (SNP) clusters that can be used to identify causal genes for diabetes. We explored the National Human Genome Research Institute-European Bioinformatics Institute Catalog of Published Genome-Wide Association Studies (NHGRI-EBI GWAS Catalog) to list SNPs with documented associations with the UAEDIAB phenotypes as well as with diabetes. The shared chromosomal regions affected by these SNPs were identified, intersected, and subjected to enriched ontology clustering. The potential SNP clusters were validated using targeted DNA next-generation sequencing (NGS) in two Emirati diabetic patients. RNA sequencing of human pancreatic islets was used to study the expression of the identified genes in diabetic and non-diabetic donors. Eight chromosomal regions containing 46 SNPs were identified in at least four of the five UAEDIAB phenotypes. These SNPs were found to affect 34 genes. Targeted NGS in the two Emirati patients confirmed that the identified genes carry similar SNP clusters. ASAH1, LRP4, FES, and HSD17B12 showed the highest SNP rates among the identified genes. RNA-seq analysis revealed that HSD17B12 is highly expressed in human islets and upregulated in type 2 diabetes (T2D) donors. Our integrative phenotype-genotype approach is a novel, simple, and powerful tool to identify clinically relevant potential biomarkers in diabetes. HSD17B12 is a novel candidate gene for pancreatic β-cell function.
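The region-sharing step can be sketched as counting, for each chromosomal region, how many phenotypes have at least one catalog-associated SNP there, then keeping the regions shared by at least four of five phenotypes. This is a minimal illustration under stated assumptions: the phenotype labels, region keys and SNP IDs below are hypothetical placeholders, and the sketch does not reproduce the study's region definitions or its ontology clustering.

```python
from collections import defaultdict

def shared_regions(assoc, min_phenotypes=4):
    """assoc maps phenotype -> {region: set of SNP IDs}.
    Returns {region: (sorted phenotypes, sorted SNP IDs)} for every region
    with associated SNPs in at least `min_phenotypes` phenotypes."""
    phenos_by_region = defaultdict(set)
    snps_by_region = defaultdict(set)
    for pheno, regions in assoc.items():
        for region, snps in regions.items():
            if snps:  # only count a phenotype if it has SNPs in this region
                phenos_by_region[region].add(pheno)
                snps_by_region[region] |= snps
    return {
        region: (sorted(phenos), sorted(snps_by_region[region]))
        for region, phenos in phenos_by_region.items()
        if len(phenos) >= min_phenotypes
    }

# Hypothetical toy input: five phenotypes, placeholder region and SNP names.
assoc = {
    "pheno_A": {"region1": {"snp01", "snp02"}, "region2": {"snp10"}},
    "pheno_B": {"region1": {"snp02"}, "region3": {"snp20"}},
    "pheno_C": {"region1": {"snp03"}},
    "pheno_D": {"region1": {"snp01"}, "region2": {"snp11"}},
    "pheno_E": {"region2": {"snp12"}, "region3": {"snp21"}},
}
print(shared_regions(assoc))
```

Only `region1` passes here: it is hit by four phenotypes, whereas `region2` and `region3` are hit by three and two.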
Expression Quantitative Trait Locus Mapping in Pulmonary Arterial Hypertension.
Expression quantitative trait loci (eQTL) can link disease susceptibility variants discovered by genetic association studies to the underlying biology. To date, eQTL mapping studies have been conducted primarily in healthy individuals from population-based cohorts. Genetic effects are known to be context-specific and to vary with changing environmental stimuli. We conducted a transcriptome- and genome-wide eQTL mapping study in a cohort of patients with idiopathic or heritable pulmonary arterial hypertension (PAH) using RNA sequencing (RNAseq) data from whole blood. We sought confirmation from three published population-based eQTL studies, including the GTEx Project, and followed up potentially novel eQTL not observed in the general population. In total, we identified 2314 eQTL, of which 90% were cis-acting and 75% were confirmed by at least one of the published studies. While we observed a higher overall GWAS trait colocalization rate among confirmed eQTL, for GWAS traits related to lung phenotypes the colocalization rate of novel eQTL was twice as high as that of confirmed eQTL. Functional enrichment analysis of genes with novel eQTL in PAH highlighted immune-related processes, which are suspected contributors to PAH. These potentially novel eQTL, specific to or active in PAH, could be useful in understanding genetic risk factors for other diseases that share common mechanisms with PAH.
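Whether an eQTL counts as cis- or trans-acting is usually decided by the distance between the variant and the gene it is associated with. A minimal sketch follows, assuming a 1 Mb window around the gene's transcription start site (a common convention; the window used in this study is not stated here). All coordinates and gene names are hypothetical.

```python
from dataclasses import dataclass

CIS_WINDOW = 1_000_000  # assumption: variants within 1 Mb of the TSS are cis

@dataclass
class EQTL:
    variant_chrom: str
    variant_pos: int
    gene: str
    gene_chrom: str
    gene_tss: int

def is_cis(e: EQTL, window: int = CIS_WINDOW) -> bool:
    """Cis if the variant lies on the gene's chromosome within `window` bp of its TSS."""
    return e.variant_chrom == e.gene_chrom and abs(e.variant_pos - e.gene_tss) <= window

def cis_fraction(eqtls) -> float:
    """Fraction of eQTL classified as cis-acting."""
    return sum(is_cis(e) for e in eqtls) / len(eqtls)

eqtls = [
    EQTL("chr1", 1_500_000, "GENE1", "chr1", 1_200_000),    # 300 kb away: cis
    EQTL("chr1", 5_000_000, "GENE2", "chr1", 1_200_000),    # 3.8 Mb away: trans
    EQTL("chr2", 800_000, "GENE3", "chr7", 900_000),        # other chromosome: trans
    EQTL("chr3", 10_000_000, "GENE4", "chr3", 10_900_000),  # 900 kb away: cis
]
print(f"cis fraction: {cis_fraction(eqtls):.2f}")  # cis fraction: 0.50
```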
Processing genome-wide association studies within a repository of heterogeneous genomic datasets
Background
Genome-wide association studies (GWAS) are based on the observation of genome-wide sets of genetic variants (typically single-nucleotide polymorphisms, SNPs) in different individuals that are associated with phenotypic traits. Research efforts have so far been directed at improving GWAS techniques rather than at making GWAS results interoperable with other genomic signals; such interoperability is currently hindered by heterogeneous formats and uncoordinated experiment descriptions.
Results
To facilitate integrative use in practice, we propose including GWAS datasets in the META-BASE repository, exploiting an integration pipeline previously developed for other genomic datasets, which brings several heterogeneous data types into the same format, queryable from the same systems. We represent GWAS SNPs and metadata by means of the Genomic Data Model and include metadata within a relational representation by extending the Genomic Conceptual Model with a dedicated view. To further reduce the gap with the descriptions of other signals in the repository, we perform a semantic annotation of phenotypic traits. Our pipeline is demonstrated on two important data sources, originally organized according to different data models: the NHGRI-EBI GWAS Catalog and FinnGen (University of Helsinki). This integration effort allows us to use these datasets in multisample processing queries that address important biological questions. The datasets are then usable for multi-omic studies together with, e.g., somatic and reference mutation data, genomic annotations, and epigenetic signals.
Conclusions
As a result of our work on GWAS datasets, we enable 1) their interoperable use with several other homogenized and processed genomic datasets in the context of the META-BASE repository; and 2) their big data processing by means of the GenoMetric Query Language and its associated system. Future large-scale tertiary data analysis may benefit extensively from the addition of GWAS results to inform several different downstream analysis workflows.
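The conversion into a region-plus-metadata representation can be sketched as follows. This is a hypothetical simplification, not the META-BASE pipeline's code: the `Region` and `Sample` classes only mimic the Genomic Data Model's split between region data (coordinates plus attributes) and free key-value metadata. The input column names are assumed to follow the GWAS Catalog associations download (`CHR_ID`, `CHR_POS`, `SNPS`, `P-VALUE`). The 0-based half-open coordinates and the `*` unstranded marker are also assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    """GDM-style region: coordinates plus attribute values (assumed layout)."""
    chrom: str
    left: int   # assumption: 0-based, half-open [left, right)
    right: int
    strand: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Sample:
    """One dataset sample: its regions plus free key-value metadata."""
    regions: list
    metadata: dict

def catalog_rows_to_sample(rows, study_meta):
    """Convert GWAS Catalog-like association rows (dicts) into one sample."""
    regions = []
    for row in rows:
        if not row.get("CHR_ID") or not row.get("CHR_POS"):
            continue  # skip associations without a mapped genomic position
        pos = int(row["CHR_POS"])  # 1-based SNP position
        regions.append(Region(
            chrom="chr" + row["CHR_ID"],
            left=pos - 1,
            right=pos,
            strand="*",  # assumption: SNP regions treated as unstranded
            attributes={"snp": row["SNPS"], "p_value": float(row["P-VALUE"])},
        ))
    regions.sort(key=lambda r: (r.chrom, r.left))
    return Sample(regions=regions, metadata=dict(study_meta))

# Hypothetical rows; the last one lacks a mapped position and is skipped.
rows = [
    {"CHR_ID": "2", "CHR_POS": "5000", "SNPS": "snp_x", "P-VALUE": "2e-9"},
    {"CHR_ID": "1", "CHR_POS": "1000", "SNPS": "snp_y", "P-VALUE": "4e-8"},
    {"CHR_ID": "", "CHR_POS": "", "SNPS": "snp_z", "P-VALUE": "1e-10"},
]
sample = catalog_rows_to_sample(rows, {"source": "hypothetical study"})
print(sample.regions[0], sample.metadata)
```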
The impact of short tandem repeat variation on gene expression.
Short tandem repeats (STRs) have been implicated in a variety of complex traits in humans. However, genome-wide studies of the effects of STRs on gene expression have thus far had limited power to detect associations and provide insights into putative mechanisms. Here, we leverage whole-genome sequencing and expression data for 17 tissues from the Genotype-Tissue Expression Project to identify more than 28,000 STRs for which repeat number is associated with expression of nearby genes (eSTRs). We use fine-mapping to quantify the probability that each eSTR is causal and characterize the top 1,400 fine-mapped eSTRs. We identify hundreds of eSTRs linked with published genome-wide association study signals and implicate specific eSTRs in complex traits, including height, schizophrenia, inflammatory bowel disease and intelligence. Overall, our results support the hypothesis that eSTRs contribute to a range of human phenotypes, and our data should serve as a valuable resource for future studies of complex traits.
Allele-specific miRNA-binding analysis identifies candidate target genes for breast cancer risk
Most breast cancer (BC) risk-associated single-nucleotide polymorphisms (raSNPs) identified in genome-wide association studies (GWAS) are believed to cis-regulate the expression of genes. We hypothesise that cis-regulatory variants contributing to disease risk may affect microRNA (miRNA) genes and/or miRNA binding. To test this, we adapted two miRNA-binding prediction algorithms, TargetScan and miRanda, to perform allele-specific queries, and integrated differential allelic expression (DAE) and expression quantitative trait loci (eQTL) data to query 150 genome-wide significant (P ≤ 5×10⁻⁸) raSNPs, plus proxies. We found that no raSNP mapped to a miRNA gene, suggesting that altered miRNA targeting is an unlikely mechanism involved in BC risk. In addition, 6 of the 52 raSNPs (11.5%) located in 3'-untranslated regions of putative miRNA target genes were predicted to alter miRNA::mRNA (messenger RNA) pair binding stability in five candidate target genes. Of these, we propose RNF115, at locus 1q21.1, as a strong novel target gene associated with BC risk, and reinforce the role of miRNA-mediated cis-regulation at locus 19p13.11. We believe that integrating allele-specific querying in miRNA-binding prediction with data supporting cis-regulation of expression improves the identification of candidate target genes in BC risk, as well as in other common cancers and complex diseases.
Funding Agency
Portuguese Foundation for Science and Technology
CRESC ALGARVE 2020
European Union (EU)
303745
Maratona da Saude Award
DL 57/2016/CP1361/CT0042
SFRH/BPD/99502/2014
CBMR-UID/BIM/04773/2013
POCI-01-0145-FEDER-022184
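In the breast cancer study above, the core of an allele-specific miRNA-binding query is checking whether a variant creates or destroys a miRNA target site in a 3'UTR. The sketch below checks only the canonical 7mer-m8 seed site, the reverse complement of miRNA nucleotides 2-8. It is a deliberately simplified stand-in, not TargetScan's or miRanda's scoring: those tools also weigh other site types, sequence context, alignment and binding energy. The miRNA and UTR sequences are hypothetical and given 5'→3' as RNA.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna: str) -> str:
    """7mer-m8 target site (UTR orientation, 5'->3') for a miRNA given 5'->3'."""
    seed = mirna[1:8]  # miRNA nucleotides 2-8
    return seed.translate(COMPLEMENT)[::-1]

def apply_variant(utr: str, pos: int, ref: str, alt: str) -> str:
    """Substitute a single-nucleotide variant at 0-based position `pos`."""
    if utr[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {utr[pos]} != {ref}")
    return utr[:pos] + alt + utr[pos + 1:]

def allele_specific_site(utr, pos, ref, alt, mirna):
    """Return (site present with ref allele, site present with alt allele)."""
    site = seed_site(mirna)
    return site in utr, site in apply_variant(utr, pos, ref, alt)

mirna = "UCCAGUCAAGGAUCAACGGAUU"  # hypothetical miRNA
utr = "AAAUGACUGGAAA"             # hypothetical UTR fragment containing the site
print(seed_site(mirna), allele_specific_site(utr, 6, "C", "A", mirna))
```

A result of `(True, False)` means the alternative allele abolishes the seed match, the kind of allele-specific difference the study screens for before weighing it against DAE and eQTL evidence.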
Incorporating Sex Chromosomes in Transcriptome Prediction Models and Improving Cross-Population Prediction Performance
Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to other populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that multivariate adaptive shrinkage may improve cross-population transcriptome prediction, as it leverages effect size estimates across different conditions (in this case, different populations). To test this hypothesis, we built transcriptome prediction models for use in transcriptome-wide association studies (TWAS) using different methods (Elastic Net, Matrix eQTL and Multivariate Adaptive Shrinkage in R (MASHR)) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWAS, we integrated publicly available multi-ancestry genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology Study (PAGE) and Pan-UK Biobank (PanUKBB) with our transcriptome prediction models. In terms of transcriptome prediction accuracy, MASHR models performed similarly to the other methods when the training population ancestry closely matched the test population, but outperformed them in cross-population predictions. Furthermore, in multi-ancestry TWAS, MASHR models yielded the most discoveries replicating in both PAGE and PanUKBB of all methods analyzed, including loci previously mapped in GWAS and new loci not previously found in GWAS. Overall, we demonstrate the importance of using methods that incorporate effect size estimates from multiple populations in order to improve TWAS for multi-ancestry or underrepresented populations.
Statistical correction of the Winner’s Curse explains replication variability in quantitative trait genome-wide association studies
Genome-wide association studies (GWAS) have identified hundreds of SNPs associated with variation in human quantitative traits. However, genome-wide significant associations often fail to replicate across independent cohorts, seemingly at odds with their apparently strong effects in discovery cohorts. This limited replication success raises pervasive questions about the utility of the GWAS field. We identify all 332 studies of quantitative traits in the NHGRI-EBI GWAS Catalog with attempted replication. We find that the majority of studies provide insufficient data to evaluate replication rates. The remaining papers replicate significantly worse than expected (p < 10⁻¹⁴), even when adjusting for the regression to the mean of effect sizes between discovery and replication cohorts known as the Winner's Curse (p < 10⁻¹⁶). We show that this is due in part to misreporting the replication cohort size as a maximum number rather than per locus. In 39 studies accurately reporting per-locus cohort size for attempted replication of 707 loci in samples with similar ancestry, the replication rate matched expectation (predicted 458, observed 457, p = 0.94). In contrast, ancestry differences between replication and discovery cohorts (13 studies, 385 loci) cause the most highly powered decile of loci to replicate worse than expected, owing to differences in linkage disequilibrium.
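The expected number of replications can be reasoned about per locus in three steps. First, shrink each discovery z-score to correct for the Winner's Curse. Then rescale it to that locus's replication sample size. Finally, sum the resulting powers. The sketch below uses a conditional maximum-likelihood correction (the likelihood of the observed z-score given that it passed the significance threshold), found by grid search. This is one common approach and not necessarily the correction used in the study. It also assumes that the non-centrality scales with the square root of sample size, which requires similar allele frequencies, trait variance and linkage disequilibrium across cohorts. Those are exactly the conditions the abstract reports failing under ancestry differences.

```python
import math
from statistics import NormalDist

STD = NormalDist()
GW_Z = STD.inv_cdf(1 - 5e-8 / 2)  # two-sided genome-wide threshold (about 5.45)

def corrected_z(z_obs, c=GW_Z, step=0.001):
    """Conditional MLE of the true z-score given that |Z| exceeded c.
    The conditional likelihood peaks in [0, |z_obs|], so only that range is searched."""
    z = abs(z_obs)
    best_mu, best_ll = 0.0, -math.inf
    for i in range(int(z / step) + 1):
        mu = i * step
        selection_prob = STD.cdf(mu - c) + STD.cdf(-mu - c)  # P(|Z| > c | mu)
        ll = -0.5 * (z - mu) ** 2 - math.log(selection_prob)
        if ll > best_ll:
            best_mu, best_ll = mu, ll
    return math.copysign(best_mu, z_obs)

def replication_power(z_disc, n_disc, n_rep, alpha=0.05):
    """Power to replicate in the same direction at one-sided level alpha,
    assuming the non-centrality scales with sqrt(sample size)."""
    mu_rep = abs(corrected_z(z_disc)) * math.sqrt(n_rep / n_disc)
    return 1 - STD.cdf(STD.inv_cdf(1 - alpha) - mu_rep)

def expected_replications(loci):
    """loci: iterable of (z_disc, n_disc, n_rep), with n_rep given per locus."""
    return sum(replication_power(z, nd, nr) for z, nd, nr in loci)

# Hypothetical loci: (discovery z, discovery n, per-locus replication n).
loci = [(6.0, 10_000, 5_000), (8.0, 10_000, 8_000), (12.0, 10_000, 3_000)]
print(f"expected replications: {expected_replications(loci):.2f} of {len(loci)}")
```

Because power depends on each locus's own replication sample size, substituting a study-wide maximum for the per-locus size inflates the expected count. This is the misreporting the abstract identifies.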
- …