Modular analysis of gene expression data with R
Summary: Large sets of data, such as expression profiles from many samples, require analytic tools to reduce their complexity. The Iterative Signature Algorithm (ISA) is a biclustering algorithm. It was designed to decompose a large set of data into so-called ‘modules’. In the context of gene expression data, these modules consist of subsets of genes that exhibit a coherent expression profile only over a subset of microarray experiments. Genes and arrays may be attributed to multiple modules and the level of required coherence can be varied resulting in different ‘resolutions’ of the modular mapping. In this short note, we introduce two BioConductor software packages written in GNU R: The isa2 package includes an optimized implementation of the ISA and the eisa package provides a convenient interface to run the ISA, visualize its output and put the biclusters into biological context. Potential users of these packages are all R and BioConductor users dealing with tabular (e.g. gene expression) data. Availability: http://www.unil.ch/cbg/ISA Contact: [email protected]
Comparative modular analysis of gene expression in vertebrate organs
ABSTRACT:
BACKGROUND: The degree of conservation of gene expression between homologous organs largely remains an open question. Several recent studies reported some evidence in favor of such conservation. Most studies compute organs' similarity across all orthologous genes, whereas the expression levels of many genes are not informative about organ specificity.
RESULTS: Here, we use a modularization algorithm to overcome this limitation through the identification of inter-species co-modules of organs and genes. We identify such co-modules using mouse and human microarray expression data. They are functionally coherent both in terms of genes and of organs from both organisms. We show that a large proportion of genes belonging to the same co-module are orthologous between mouse and human. Moreover, their zebrafish orthologs also tend to be expressed in the corresponding homologous organs. Notable exceptions to the general pattern of conservation are the testis and the olfactory bulb. Interestingly, some co-modules consist of single organs, while others combine several functionally related organs. For instance, amygdala, cerebral cortex, hypothalamus and spinal cord form a clearly discernible unit of expression, both in mouse and human.
CONCLUSIONS: Our study provides a new framework for comparative analysis which will be applicable also to other sets of large-scale phenotypic data collected across different species
Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases
Mapping perturbed molecular circuits that underlie complex diseases remains a great challenge. We developed a comprehensive resource of 394 cell type– and tissue-specific gene regulatory networks for human, each specifying the genome-wide connectivity among transcription factors, enhancers, promoters and genes. Integration with 37 genome-wide association studies (GWASs) showed that disease-associated genetic variants—including variants that do not reach genome-wide significance—often perturb regulatory modules that are highly specific to disease-relevant cell types or tissues. Our resource opens the door to systematic analysis of regulatory programs across hundreds of human cell types and tissues
Participation bias in the UK Biobank distorts genetic associations and downstream analyses
While volunteer-based studies such as the UK Biobank have become the cornerstone of genetic epidemiology, the participating individuals are rarely representative of their target population. To evaluate the impact of selective participation, here we derived UK Biobank participation probabilities on the basis of 14 variables harmonized across the UK Biobank and a representative sample. We then conducted weighted genome-wide association analyses on 19 traits. Comparing the output from weighted genome-wide association analyses (n_effective = 94,643 to 102,215) with that from standard genome-wide association analyses (n = 263,464 to 283,749), we found that increasing representativeness led to changes in SNP effect sizes and identified novel SNP associations for 12 traits. While heritability estimates were less impacted by weighting (maximum change in h², 5%), we found substantial discrepancies for genetic correlations (maximum change in r_g, 0.31) and Mendelian randomization estimates (maximum change in β_STD, 0.15) for socio-behavioural traits. We urge the field to increase representativeness in biobank samples, especially when studying genetic correlates of behaviour, lifestyles and social outcomes
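The gap between the weighted and unweighted sample sizes quoted above can be illustrated with a small sketch. The abstract does not say how the effective sample size was computed; the code below assumes Kish's approximation, (Σw)² / Σw², which is a common choice for weighted samples. The function names and the toy weights are hypothetical and are not taken from the study.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2).

    Assumption: this is one common definition, not necessarily the one
    used in the study summarized above.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


def weighted_mean(values, weights):
    """Mean of `values` where each observation counts `weights[i]` times."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


if __name__ == "__main__":
    # Toy data: the last participant comes from an under-represented group
    # and is therefore up-weighted (hypothetical inverse-probability weight).
    trait = [1.0, 2.0, 3.0, 4.0]
    weights = [1.0, 1.0, 1.0, 3.0]
    print(effective_sample_size(weights))  # smaller than len(weights)
    print(weighted_mean(trait, weights))   # pulled towards the up-weighted value
```

With equal weights the effective sample size equals the number of participants; the more unequal the weights, the further it drops below that number, which is consistent with the weighted analyses above having a much smaller n_effective than n.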
ExpressionView—an interactive viewer for modules identified in gene expression data
Summary: ExpressionView is an R package that provides an interactive graphical environment to explore transcription modules identified in gene expression data. A sophisticated ordering algorithm is used to present the modules with the expression in a visually appealing layout that provides an intuitive summary of the results. From this overview, the user can select individual modules and access biologically relevant metadata associated with them. Availability: http://www.unil.ch/cbg/ExpressionView. Screenshots, tutorials and sample data sets can be found on the ExpressionView web site. Contact: [email protected]
Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits
Genome-wide association studies (GWAS) have identified thousands of variants associated with complex traits, but their biological interpretation often remains unclear. Most of these variants overlap with expression QTLs, indicating their potential involvement in regulation of gene expression. Here, we propose a transcriptome-wide summary statistics-based Mendelian Randomization approach (TWMR) that uses multiple SNPs as instruments and multiple gene expression traits as exposures, simultaneously. Applied to 43 human phenotypes, it uncovers 3,913 putatively causal gene-trait associations, 36% of which have no genome-wide significant SNP nearby in previous GWAS. Using independent association summary statistics, we find that the majority of these loci were missed by GWAS due to power issues. Noteworthy among these links is educational attainment-associated BSCL2, known to carry mutations leading to a Mendelian form of encephalopathy. We also find pleiotropic causal effects suggestive of mechanistic connections. TWMR better accounts for pleiotropy and has the potential to identify biological mechanisms underlying complex traits
FADS3 is a Δ14Z sphingoid base desaturase that contributes to gender differences in the human plasma sphingolipidome
Sphingolipids (SL) are structurally diverse lipids that are defined by the presence of a long chain base (LCB) backbone. Typically, LCBs contain a single Δ4E double bond (DB) (mostly d18:1), whereas the dienic LCB sphingadienine (d18:2) contains a second DB at the Δ14Z position. The enzyme introducing the Δ14Z DB is unknown. We analyzed the LCB plasma profile in a gender-, age-, and BMI-matched subgroup of the CoLaus cohort (n = 658). Sphingadienine levels showed a significant association with gender, being on average ~30% higher in females. A genome-wide association study (GWAS) revealed variants in the fatty-acid desaturase 3 (FADS3) gene to be significantly associated with the plasma d18:2/d18:1 ratio (-log10 p = 7.9). Metabolic labeling assays, FADS3 overexpression and knockdown approaches, as well as plasma LCB profiling in FADS3-deficient mice confirmed that FADS3 is a bona fide LCB desaturase and required for the introduction of the Δ14Z double bond. Moreover, we showed that FADS3 is required for the conversion of the atypical cytotoxic 1-deoxysphinganine (1-deoxySA, m18:0) to 1-deoxysphingosine (1-deoxySO, m18:1). HEK293 cells overexpressing FADS3 were more resistant to m18:0 toxicity than WT cells. In summary, using a combination of metabolic profiling and GWAS, we identified FADS3 to be essential for forming Δ14Z DB containing LCBs, such as d18:2 and m18:1. Our results unravel FADS3 as a Δ14Z LCB desaturase, thereby disclosing the last missing enzyme of the SL de novo synthesis pathway
Approaches to detect genetic effects that differ between two strata in genome-wide meta-analyses: Recommendations based on a systematic evaluation.
Genome-wide association meta-analyses (GWAMAs) conducted separately in two strata have identified differences in genetic effects between strata, such as sex differences for body fat distribution. However, there are several approaches to identify such differences, and it is uncertain which approach to use. Assuming the availability of stratified GWAMA results, we compare various approaches to identify between-strata differences in genetic effects. We evaluate type I error and power via simulations and analytical comparisons for different scenarios of strata designs and for different types of between-strata differences. For strata of equal size, we find that the genome-wide test for difference without any filtering is the best approach to detect stratum-specific genetic effects with opposite directions, while filtering for overall association followed by the difference test is best to identify effects that are predominant in one stratum. When there is no a priori hypothesis on the type of difference, a combination of both approaches can be recommended. Some approaches violate type I error control when conducted in the same data set. For strata of unequal size, the best approach depends on whether the genetic effect is predominant in the larger or in the smaller stratum. Based on real data from GIANT (>175 000 individuals), we exemplify the impact of the approaches on the detection of sex differences for body fat distribution (identifying up to 10 loci). Our recommendations provide tangible guidelines for future GWAMAs that aim at identifying between-strata differences. A better understanding of such effects will help pinpoint the underlying mechanisms
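To make the two building blocks named above concrete, here is a minimal sketch of a difference test and an overall-association test computed from per-stratum summary statistics. The abstract gives no formulas, so the code assumes the standard forms: a z-test on the difference of the two effect estimates, and a fixed-effect inverse-variance meta-analysis for the overall association. Both assume the strata are independent (no shared individuals). Function names and the example numbers are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def difference_test(beta1, se1, beta2, se2):
    """z-test for beta1 != beta2, assuming independent strata."""
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, two_sided_p(z)


def overall_association(beta1, se1, beta2, se2):
    """Fixed-effect inverse-variance meta-analysis of the two strata."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    beta = (beta1 * w1 + beta2 * w2) / (w1 + w2)
    se = math.sqrt(1.0 / (w1 + w2))
    z = beta / se
    return z, two_sided_p(z)


if __name__ == "__main__":
    # Opposite effects of equal size in two equally sized strata:
    # the overall association cancels out, the difference test does not.
    print(difference_test(0.1, 0.02, -0.1, 0.02))
    print(overall_association(0.1, 0.02, -0.1, 0.02))
```

In this toy case the overall z-score is zero while the difference z-score is large, which illustrates why a filter on overall association would discard variants with opposite effects before the difference test is ever applied.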
Composite trait Mendelian randomization reveals distinct metabolic and lifestyle consequences of differences in body shape
Obesity is a major risk factor for a wide range of cardiometabolic diseases; however, the impact of specific aspects of body morphology remains poorly understood. We combined the GWAS summary statistics of fourteen anthropometric traits from UK Biobank through principal component analysis to reveal four major independent axes: body size, adiposity, predisposition to abdominal fat deposition, and lean mass. Mendelian randomization analysis showed that although body size and adiposity both contribute to the consequences of BMI, many of their effects are distinct, such as body size increasing the risk of cardiac arrhythmia (b = 0.06, p = 4.2 × 10^-17) while adiposity instead increased that of ischemic heart disease (b = 0.079, p = 8.2 × 10^-21). The body mass-neutral component predisposing to abdominal fat deposition, likely reflecting a shift from subcutaneous to visceral fat, exhibited health effects that were weaker but specifically linked to lipotoxicity, such as ischemic heart disease (b = 0.067, p = 9.4 × 10^-14) and diabetes (b = 0.082, p = 5.9 × 10^-19). Combining their independent predicted effects significantly improved the prediction of obesity-related diseases (p < 10^-10). The presented decomposition approach sheds light on the biological mechanisms underlying the heterogeneity of body morphology and its consequences on health and lifestyle