Idéfix: identifying accidental sample mix-ups in biobanks using polygenic scores
MOTIVATION: Identifying sample mix-ups in biobanks is essential to allow the repurposing of genetic data for clinical pharmacogenetics. Pharmacogenetic advice based on the genetic information of another individual is potentially harmful. Existing methods for identifying mix-ups are limited to datasets in which additional omics data (e.g. gene expression) are available. Cohorts lacking such data can only use sex, which can reveal only half of the mix-ups. Here, we describe Idéfix, a method for the identification of accidental sample mix-ups in biobanks using polygenic scores. RESULTS: In the Lifelines population-based biobank, we calculated polygenic scores (PGSs) for 25 traits for 32 786 participants. We then applied Idéfix to compare the actual phenotypes to the PGSs, exploiting the discordance that is expected for mix-ups relative to correct samples. In a simulation, using induced mix-ups, Idéfix reaches an AUC of 0.90 using 25 polygenic scores and sex. This is a substantial improvement over using only sex, which has an AUC of 0.75. Subsequent simulations present Idéfix’s potential in varying datasets with more powerful PGSs. This suggests its performance will likely improve as more highly powered GWASs for commonly measured traits become available. Idéfix can be used to identify a set of high-quality participants who are very unlikely to reflect sample mix-ups, and for these participants we can use genetic data for clinical purposes, such as pharmacogenetic profiles. For instance, in Lifelines, we can select 34.4% of participants, reducing the sample mix-up rate from 0.15% to 0.01%. AVAILABILITY AND IMPLEMENTATION: Idéfix is freely available at https://github.com/molgenis/systemsgenetics/wiki/Idefix. The individual-level data that support the findings were obtained from the Lifelines biobank under project application number ov16_0365. 
Data is made available upon reasonable request submitted to the LifeLines Research office ([email protected], https://www.lifelines.nl/researcher/how-to-apply/apply-here). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online
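As an illustration only (this is not the Idéfix implementation), the general idea in the abstract above, scoring each sample by how discordant its phenotypes are with its polygenic scores and measuring performance by AUC on induced mix-ups, can be sketched in Python. The function names, the assumed correlation r = 0.3 between each PGS and its trait, the 5% induced mix-up rate, and the squared-residual score are all assumptions made for this sketch; sex is not modelled.

```python
import random

def auc(scores, labels):
    """Probability that a random mix-up scores higher than a random correct sample (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def discordance(phenotypes, pgs, r):
    """Sum over traits of squared residuals between a z-scored phenotype and r * PGS (assumed score)."""
    return sum((y - r * g) ** 2 for y, g in zip(phenotypes, pgs))

def simulate(n=2000, n_traits=25, r=0.3, mix_rate=0.05, seed=1):
    """Simulate z-scored PGSs and phenotypes, induce mix-ups, and return the AUC of the discordance score."""
    rng = random.Random(seed)
    noise_sd = (1 - r * r) ** 0.5  # keeps each simulated phenotype at unit variance
    pgs = [[rng.gauss(0, 1) for _ in range(n_traits)] for _ in range(n)]
    phen = [[r * g + rng.gauss(0, noise_sd) for g in row] for row in pgs]
    labels = [rng.random() < mix_rate for _ in range(n)]
    scores = []
    for i in range(n):
        if labels[i]:
            # Induced mix-up: pair this genotype with another individual's phenotypes.
            j = rng.randrange(n - 1)
            j = j + 1 if j >= i else j
            observed = phen[j]
        else:
            observed = phen[i]
        scores.append(discordance(observed, pgs[i], r))
    return auc(scores, labels)

if __name__ == "__main__":
    print(f"AUC with 25 simulated traits: {simulate():.2f}")
```

In this toy setting, raising `r` or `n_traits` separates mix-ups from correct samples more clearly, which mirrors the abstract's expectation that more powerful PGSs improve detection.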
Bayesian test for colocalisation between pairs of genetic association studies using summary statistics.
Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/). 
Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways
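One way to make the shared-causal-variant test in the abstract above concrete is the following Python sketch, which works from per-SNP effect estimates and standard errors only. It assumes at most one causal variant per trait in the region, uses Wakefield-style approximate Bayes factors, and compares five hypotheses (H0 no association, H1 trait 1 only, H2 trait 2 only, H3 two distinct causal variants, H4 one shared causal variant). The prior values `p1`, `p2`, `p12`, the prior effect standard deviation, and all function names are illustrative assumptions not given in the abstract, and this is not necessarily the exact computation the authors implemented.

```python
import math

def log_abf(beta, se, prior_sd=0.15):
    """Log approximate Bayes factor for association at one SNP, from its effect estimate and standard error."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1 - r) + r * z * z)

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a non-empty list."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def coloc_posteriors(stats1, stats2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 given per-SNP (beta, se) pairs for two traits over the same SNPs."""
    l1 = [log_abf(b, s) for b, s in stats1]
    l2 = [log_abf(b, s) for b, s in stats2]
    shared = [a + b for a, b in zip(l1, l2)]
    # H3 sums over all pairs of different SNPs (quadratic in region size, fine for a sketch).
    distinct = [a + b for i, a in enumerate(l1) for j, b in enumerate(l2) if i != j]
    logs = [
        0.0,                                                   # H0
        math.log(p1) + logsumexp(l1),                          # H1
        math.log(p2) + logsumexp(l2),                          # H2
        math.log(p1) + math.log(p2) + logsumexp(distinct) if distinct else float("-inf"),  # H3
        math.log(p12) + logsumexp(shared),                     # H4
    ]
    total = logsumexp(logs)
    return [math.exp(x - total) for x in logs]

if __name__ == "__main__":
    # Hypothetical three-SNP region in which both traits peak at the same SNP.
    trait1 = [(0.0, 0.02), (0.2, 0.02), (0.0, 0.02)]
    trait2 = [(0.0, 0.02), (0.2, 0.02), (0.0, 0.02)]
    for name, p in zip(["H0", "H1", "H2", "H3", "H4"], coloc_posteriors(trait1, trait2)):
        print(name, round(p, 4))
```

A high posterior for H4 is the kind of evidence the abstract describes as consistent with a shared causal variant, whereas a high posterior for H3 points to distinct variants behind the two signals.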
An integrative systems genetics approach reveals potential causal genes and pathways related to obesity
Background: Obesity is a multi-factorial health problem in which genetic factors play an important role. Limited results have been obtained in single-gene studies using either genomic or transcriptomic data. RNA sequencing technology has shown its potential in gaining accurate knowledge about the transcriptome, and may reveal novel genes affecting complex diseases. Integration of genomic and transcriptomic variation (expression quantitative trait loci [eQTL] mapping) has identified causal variants that affect complex diseases. We integrated transcriptomic data from adipose tissue and genomic data from a porcine model to investigate the mechanisms involved in obesity using a systems genetics approach. Methods: Using a selective gene expression profiling approach, we selected 36 animals based on a previously created genomic Obesity Index for RNA sequencing of subcutaneous adipose tissue. Differential expression analysis was performed using the Obesity Index as a continuous variable in a linear model. eQTL mapping was then performed to integrate 60 K porcine SNP chip data with the RNA sequencing data. Results were restricted based on genome-wide significant single nucleotide polymorphisms, detected differentially expressed genes, and previously detected co-expressed gene modules. Further data integration was performed by detecting co-expression patterns among eQTLs and integration with protein data. Results: Differential expression analysis of RNA sequencing data revealed 458 differentially expressed genes. The eQTL mapping resulted in 987 cis-eQTLs and 73 trans-eQTLs (false discovery rate <0.05), of which the cis-eQTLs were associated with metabolic pathways. We reduced the eQTL search space by focusing on differentially expressed and co-expressed genes and disease-associated single nucleotide polymorphisms to detect obesity-related genes and pathways. 
Building a co-expression network using eQTLs resulted in the detection of a module strongly associated with lipid pathways. Furthermore, we detected several obesity candidate genes, for example, ENPP1, CTSL, and ABHD12B. Conclusions: To our knowledge, this is the first study to perform an integrated genomics and transcriptomics (eQTL) study using, and modeling, genomic and subcutaneous adipose tissue RNA sequencing data on obesity in a porcine model. We detected several pathways and potential causal genes for obesity. Further validation and investigation may reveal their exact function and association with obesity
Translational insights from single-cell technologies across the cardiovascular disease continuum
Cardiovascular disease is the leading cause of death worldwide. The societal health burden it represents can be reduced by taking preventive measures and developing more effective therapies. Reaching these goals, however, requires a better understanding of the pathophysiological processes leading to and occurring in the diseased heart. In the last 5 years, several biological advances applying single-cell technologies have enabled researchers to study cardiovascular diseases with unprecedented resolution. This has produced many new insights into how specific cell types change their gene expression level, activation status and potential cellular interactions with the development of cardiovascular disease, but a comprehensive overview of the clinical implications of these findings is lacking. In this review, we summarize and discuss these recent advances and the promise of single-cell technologies from a translational perspective across the cardiovascular disease continuum, covering both animal and human studies, and explore the future directions of the field
Correction for both common and rare cell types in blood is important to identify genes that correlate with age
Background: Aging is a multifactorial process that affects multiple tissues and is characterized by changes in homeostasis over time, leading to increased morbidity. Whole blood gene expression signatures have been associated with aging and have been used to gain information on its biological mechanisms, which are still not fully understood. However, blood is composed of many cell types whose proportions vary with age. As a result, previously observed associations between gene expression levels and aging might be driven by cell type composition rather than intracellular aging mechanisms. To overcome this, previous aging studies already accounted for major cell types, but the possibility that the reported associations are false positives driven by less prevalent cell subtypes remains. Results: Here, we compared the regression model from our previous work to an extended model that corrects for 33 additional white blood cell subtypes. Both models were applied to whole blood gene expression data from 3165 individuals belonging to the general population (age range of 18-81 years). We found that the new model is a better fit for the data and that it identified fewer genes associated with aging (625, compared to the 2808 of the initial model;
Ciliary Genes Are Down-Regulated in Bronchial Tissue of Primary Ciliary Dyskinesia Patients
Primary ciliary dyskinesia (PCD) is a rare, genetically heterogeneous disease characterized by recurrent respiratory tract infections, sinusitis, bronchiectasis and male infertility. The pulmonary phenotype in PCD is caused by the impaired motility of cilia in the respiratory epithelium, due to ultrastructural defects of these organelles. We hypothesized that defects of multi-protein ciliary complexes should be reflected by gene expression changes in the respiratory epithelium. We have previously found that a large group of genes functionally related to cilia shares a highly correlated expression pattern in PCD bronchial tissue. Here we performed an explorative analysis of differential gene expression in the bronchial tissue from six PCD patients and nine non-PCD controls, using Illumina HumanRef-12 Whole Genome BeadChips. We observed 1323 genes with at least a 2-fold difference in the mean expression level between the two groups (t-test p-value <0.05). Annotation analysis showed that the genes down-regulated in PCD biopsies (602) were significantly enriched for terms related to cilia, whereas the up-regulated genes (721) were significantly enriched for terms related to cell cycle and mitosis. We assembled a list of human genes predicted to encode ciliary proteins, components of outer dynein arms, inner dynein arms, radial spokes, and intraflagellar transport proteins. A significant down-regulation of the expression of genes from all four groups was observed in PCD, compared to non-PCD biopsies. Our data suggest that a coordinated down-regulation of the ciliome genes plays an important role in the molecular pathomechanism of PCD
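The selection rule described in the abstract above (at least a 2-fold difference in mean expression together with a t-test p-value below 0.05) can be sketched per gene in Python as follows. This is a simplified illustration under stated assumptions: expression values are taken to be on a linear intensity scale, the test is a pooled-variance Student's t-test, and the p < 0.05 cut-off is applied through the two-sided critical value 2.160 for 13 degrees of freedom, which holds only for the 6 versus 9 sample design. The function names and example intensities are hypothetical, and the authors' actual pipeline may differ.

```python
import math
from statistics import mean, variance

# Two-sided 5% critical value of Student's t with 6 + 9 - 2 = 13 degrees of freedom.
T_CRIT_DF13 = 2.160

def student_t(a, b):
    """Pooled-variance two-sample t statistic for groups a and b."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def classify(pcd, control, min_fold=2.0, t_crit=T_CRIT_DF13):
    """Label one gene 'up', 'down' or 'unchanged' in PCD relative to controls."""
    fold = mean(pcd) / mean(control)
    if abs(student_t(pcd, control)) <= t_crit:
        return "unchanged"  # not significant at the 5% level
    if fold >= min_fold:
        return "up"
    if fold <= 1 / min_fold:
        return "down"
    return "unchanged"  # significant, but less than a 2-fold change

if __name__ == "__main__":
    # Hypothetical intensities for one gene: six PCD samples and nine controls.
    pcd = [100, 110, 90, 105, 95, 100]
    control = [400, 420, 380, 410, 390, 400, 405, 395, 400]
    print(classify(pcd, control))
```

Applying `classify` to every gene and counting the labels would give the sizes of the up-regulated and down-regulated sets that feed into an enrichment analysis.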
Limited evidence for blood eQTLs in human sexual dimorphism
The genetic underpinning of sexual dimorphism is very poorly understood. The prevalence of many diseases differs between men and women, which could be in part caused by sex-specific genetic effects. Nevertheless, only a few published genome-wide association studies (GWAS) were performed separately in each sex. The reported enrichment of expression quantitative trait loci (eQTLs) among GWAS-associated SNPs suggests a potential role of sex-specific eQTLs in the sex-specific genetic mechanism underlying complex traits.
To explore this scenario, we combined sex-specific whole blood RNA-seq eQTL data from 3447 European individuals included in the BIOS Consortium with GWAS data from the UK Biobank. Next, to test for sex-biased causal effects of gene expression on complex traits, we performed sex-specific transcriptome-wide Mendelian randomization (TWMR) analyses on the two most sexually dimorphic traits, waist-to-hip ratio (WHR) and testosterone levels. Finally, we performed a power analysis to calculate the GWAS sample size needed to observe sex-specific trait associations driven by sex-biased eQTLs.
Among 9 million SNP-gene pairs showing sex-combined associations, we found 18 genes with significant sex-biased cis-eQTLs (FDR 5%). Our phenome-wide association study of the 18 top sex-biased eQTLs on >700 traits revealed that these eQTLs do not systematically translate into detectable sex-biased trait associations. In addition, we observed that sex-specific causal effects of gene expression on complex traits are not driven by sex-specific eQTLs. Power analyses using real eQTL and causal-effect sizes showed that millions of samples would be necessary to observe sex-biased trait associations that are fully driven by sex-biased cis-eQTLs. Compensatory effects may further hamper their detection.
Our results suggest that sex-specific eQTLs in whole blood do not translate to detectable sex-specific trait associations of complex diseases, and vice versa that the observed sex-specific trait associations cannot be explained by sex-specific eQTLs
Using symptom-based case predictions to identify host genetic factors that contribute to COVID-19 susceptibility
Epidemiological and genetic studies on COVID-19 are currently hindered by inconsistent and limited testing policies to confirm SARS-CoV-2 infection. Recently, it was shown that it is possible to predict COVID-19 cases using cross-sectional self-reported disease-related symptoms. Here, we demonstrate that this COVID-19 prediction model has reasonable and consistent performance across multiple independent cohorts and that our attempt to improve upon this model did not result in improved predictions. Using the existing COVID-19 prediction model, we then conducted a GWAS on the predicted phenotype using a total of 1,865 predicted cases and 29,174 controls. While we did not find any common, large-effect variants that reached genome-wide significance, we do observe suggestive genetic associations at two SNPs (rs11844522, p = 1.9 × 10⁻⁷; rs5798227, p = 2.2 × 10⁻⁷). Explorative analyses furthermore suggest that genetic variants associated with other viral infectious diseases do not overlap with COVID-19 susceptibility and that severity of COVID-19 may have a different genetic architecture compared to COVID-19 susceptibility. This study represents a first effort that uses a symptom-based predicted phenotype as a proxy for COVID-19 in our pursuit of understanding the genetic susceptibility of the disease. We conclude that the inclusion of symptom-based predicted cases could be a useful strategy in a scenario of limited testing, either during the current COVID-19 pandemic or any future viral outbreak
Feasibility of predicting allele specific expression from DNA sequencing using machine learning
Allele-specific expression (ASE) refers to divergent expression levels of alternative alleles and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA sequencing and therefore neglects gene expression regulation such as ASE. To take advantage of ASE in the absence of RNA sequencing, it must be predicted using only DNA variation. We have constructed ASE models from BIOS (n = 3432) and GTEx (n = 369) that predict ASE using DNA features. These models are highly reproducible and comprise many different feature types, highlighting the complex regulation that underlies ASE. We applied the BIOS-trained model to population variants in three genes in which ASE plays a clinically relevant role: BRCA2, RET and NF1. This resulted in predicted ASE effects for 27 variants, of which 10 were known pathogenic variants. We demonstrated that ASE can be predicted from DNA features using machine learning. Future efforts may improve sensitivity and translate these models into a new type of genome diagnostic tool that prioritizes candidate pathogenic variants or regulators thereof for follow-up validation by RNA sequencing. All code and machine learning models used are available at GitHub and Zenodo