Rat cluster 1 shows co-expression between Hsp and cardiomyopathy genes which is conserved with human heart and liver tissues

Author: Aida Moreno-Moral (505400)
Enrico Petretto (204195)
Leonardo Bottolo (179493)
Maxime Rotival (133720)
Xiaolin Xiao (505399)

(A) Network of 135 annotated rat genes identified by C3D as co-expressed in heart, aorta, liver and skeletal muscle tissues. In each tissue we selected the top 5% of edges based on the (absolute) covariance between gene expression profiles and then calculated the average covariance across the four tissues. Edges are represented by lines connecting nodes (genes), and the thickness of each line is proportional to the average covariance value. Within the network, heat shock protein (Hsp) and cardiomyopathy genes are highlighted in blue and red, respectively. The Kendall correlations between the expression profiles of Hsp and cardiomyopathy genes are graphically represented as sub-networks separately for each tissue. Line thickness is proportional to the value of the Kendall correlation. (B) Enrichment for functional categories (full list in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004006#pgen.1004006.s006" target="_blank">Table S2</a>) and for disease association (adjusted p-value, details in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004006#pgen.1004006.s007" target="_blank">Table S3</a>). (C) Significant protein-protein interaction (PPI) network where the Hsp and cardiomyopathy genes showing conserved PPI are highlighted (blue and red circles). (D) Conserved co-expression network detected in heart tissue samples from patients with advanced idiopathic or ischemic cardiomyopathy. The network includes all human orthologous genes of the genes in rat cluster 1 that have significant edges by covariance selection. (E) Conserved co-expression network detected in liver tissue samples from healthy volunteers. The network includes all human orthologous genes of the genes in rat cluster 1 that have significant edges by covariance selection.
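
The edge selection described in panel (A) can be sketched as follows. This is a minimal illustration, not the authors' C3D code: the function names are hypothetical, and the caption does not say how per-tissue selections are combined, so the sketch assumes an edge is kept only if it is in the top fraction in every tissue and that the signed covariance is averaged.

```python
from itertools import combinations
from math import ceil


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def top_edges(expr, fraction=0.05):
    """Return {edge: covariance} for the top `fraction` of gene pairs,
    ranked by absolute covariance, within one tissue.

    `expr` maps a gene name to its expression profile (list of floats).
    """
    cov = {
        (g, h): covariance(expr[g], expr[h])
        for g, h in combinations(sorted(expr), 2)
    }
    keep = max(1, ceil(fraction * len(cov)))
    ranked = sorted(cov, key=lambda e: abs(cov[e]), reverse=True)
    return {e: cov[e] for e in ranked[:keep]}


def average_network(tissues, fraction=0.05):
    """Average covariance of edges selected in every tissue.

    Assumption (not stated in the caption): per-tissue selections are
    combined by intersection, and the signed covariance is averaged.
    """
    selected = [top_edges(expr, fraction) for expr in tissues.values()]
    shared = set.intersection(*(set(s) for s in selected))
    return {e: sum(s[e] for s in selected) / len(selected) for e in shared}
```

In a drawing of the resulting network, the averaged value would set the line thickness of each edge.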

FigShare

Performance comparison for C3D, WGCNA and DiffCoEx methods

Author: Aida Moreno-Moral (505400)
Enrico Petretto (204195)
Leonardo Bottolo (179493)
Maxime Rotival (133720)
Xiaolin Xiao (505399)

Top, three cluster types (“common”, “nested” and “overlapping”) were simulated in conditions where the cluster size is reported for both the intersection and union part of the clusters. Bottom, for each method the average TPR and FPR across 20 replicated datasets were calculated and reported for the simulated cluster densities. For C3D analysis (blue lines) we required each cluster to be detected with a misclassification error rate (MER) of 5% or 20%. For WGCNA (red line) and DiffCoEx (green line) we considered two “default values” for the cut-off threshold, which were chosen according to the WGCNA guidelines (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004006#pgen.1004006.s010" target="_blank">Text S1</a> for details).
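
For readers unfamiliar with the two rates, the averaging over replicated datasets can be sketched as below. This is a generic illustration under stated assumptions: the function names are hypothetical, and a cluster is treated simply as a set of node labels, which may differ from how the original study scored detections.

```python
def tpr_fpr(true_members, predicted_members, all_nodes):
    """True-positive and false-positive rates for one detected cluster.

    TPR = detected true members / all true members
    FPR = detected non-members / all non-members
    """
    true_members = set(true_members)
    predicted_members = set(predicted_members)
    negatives = set(all_nodes) - true_members
    tp = len(true_members & predicted_members)
    fp = len(predicted_members & negatives)
    tpr = tp / len(true_members) if true_members else 0.0
    fpr = fp / len(negatives) if negatives else 0.0
    return tpr, fpr


def average_rates(replicates):
    """Average (TPR, FPR) over replicates.

    Each replicate is a tuple (true_members, predicted_members, all_nodes).
    """
    rates = [tpr_fpr(*rep) for rep in replicates]
    n = len(rates)
    return sum(t for t, _ in rates) / n, sum(f for _, f in rates) / n
```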

FigShare

Human co-expression cluster 1.

Author: Aida Moreno-Moral (505400)
Enrico Petretto (204195)
Leonardo Bottolo (179493)
Maxime Rotival (133720)
Xiaolin Xiao (505399)

Top left, each node in the network represents a gene and, in keeping with <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004006#pgen.1004006-Liu1" target="_blank">[61]</a>, for each gene we highlight significant up-regulation in VZ (red) or CP (green) as compared with the other neocortex regions. Genes that were not differentially expressed between neocortex regions are coloured in grey. Genes present in relevant KEGG pathways (p53 signaling, ECM-receptor interaction, Cell cycle and DNA replication) are extracted from the main network and highlighted. Top right, functional annotation for the network: top five significant GO biological processes and KEGG pathways (full list in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004006#pgen.1004006.s007" target="_blank">Table S3</a>). Bottom left, summary of cell-type enrichment analysis (Benjamini and Hochberg (BH)-adjusted p-value, Cten analysis). Bottom right, graph with the significant protein-protein interactions (PPI), including the overall significance of the directed PPI network (DAPPLE analysis). The colour scale on the right indicates the significance of the detected PPI.
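
The Benjamini and Hochberg adjustment mentioned for the cell-type enrichment can be sketched as below. This is the standard step-up procedure written from its textbook definition, not the Cten implementation; the function name `bh_adjust` is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the p-value of rank r (1 = smallest) among m tests, the adjusted
    value is min over ranks k >= r of p_(k) * m / k, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then called significant at a chosen FDR level when its adjusted value falls below that level.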

FigShare

Co-expression clusters identified in all rat tissues.

Author: Aida Moreno-Moral (505400)
Enrico Petretto (204195)
Leonardo Bottolo (179493)
Maxime Rotival (133720)
Xiaolin Xiao (505399)

For each rat cluster detected in all seven tissues we report the number of probe sets, the top five functional categories and their statistical significance (full list in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004006#pgen.1004006.s006" target="_blank">Table S2</a>), the summary of cell-type enrichment statistics (Benjamini and Hochberg (BH)-adjusted p-value, Cten analysis) and the graph with the significant protein-protein interactions (PPI), including the overall significance of the directed PPI network (DAPPLE analysis). The colour scale on the right indicates the significance of the detected PPI.

FigShare

Multi-tissue Analysis of Co-expression Networks by Higher-Order Generalized Singular Value Decomposition Identifies Functionally Coherent Transcriptional Modules

Author: Aida Moreno-Moral (505400)
Enrico Petretto (204195)
Leonardo Bottolo (179493)
Maxime Rotival (133720)
Xiaolin Xiao (505399)
Publication date: 22/10/2013

Recent high-throughput efforts such as ENCODE have generated a large body of genome-scale transcriptional data in multiple conditions (e.g., cell-types and disease states). Leveraging these data is especially important for network-based approaches to human disease, for instance to identify coherent transcriptional modules (subnetworks) that can inform functional disease mechanisms and pathological pathways. Yet, genome-scale network analysis across conditions is significantly hampered by the paucity of robust and computationally efficient methods. Building on the Higher-Order Generalized Singular Value Decomposition, we introduce a new algorithmic approach for efficient, parameter-free and reproducible identification of network-modules simultaneously across multiple conditions. Our method can accommodate weighted (and unweighted) networks of any size and can similarly use co-expression or raw gene expression input data, without hinging upon the definition and stability of the correlation used to assess gene co-expression. In simulation studies, our method showed distinct advantages over existing methods: it accurately recovered both common and condition-specific network-modules without the ad-hoc input parameters required by other approaches. We applied our method to genome-scale and multi-tissue transcriptomic datasets from rats (microarray-based) and humans (mRNA-sequencing-based) and identified several common and tissue-specific subnetworks with functional significance, which were not detected by other methods. In humans we recapitulated the crosstalk between cell-cycle progression and cell-extracellular matrix interaction processes in ventricular zones during neocortex expansion and, further, we uncovered previously unappreciated pathways related to the development of later cognitive functions in the cortical plate of the developing brain.
Analyses of seven rat tissues identified a multi-tissue subnetwork of co-expressed heat shock protein (Hsp) and cardiomyopathy genes (Bag3, Cryab, Kras, Emd, Plec), which was significantly replicated using separate failing heart and liver gene expression datasets in humans, thus revealing a conserved functional role for Hsp genes in cardiovascular disease.

Directory of Open Access Journals

PubMed Central

Spiral - Imperial College Digital Repository

FigShare

Description of the cluster structures used in the simulation studies.

Author: Aida Moreno-Moral (505400)
Enrico Petretto (204195)
Leonardo Bottolo (179493)
Maxime Rotival (133720)
Xiaolin Xiao (505399)

We simulated three cluster types: “common” (Cluster pattern 1), “nested” (Cluster pattern 2) and “overlapping” (Cluster pattern 3) that are shared across three or more conditions. For Cluster pattern 2 and Cluster pattern 3, the “intersection cluster” is defined by the nodes in common to all conditions (red square) whereas the “union cluster” is defined by the nodes in common to all conditions plus the nodes present in individual conditions (black square).

FigShare

Comparison of the marginal phenotype-SNP associations provided by GUESS, SNPTEST and piMASS in the single trait analysis of TG.

(To increase readability, the log10(BFs) are truncated at 20). (A) Genome-wide log10(BF) obtained from GUESS. Significant SNPs found associated at an FDR of 5% are depicted by black dots (with the SNP's name) whereas significant SNPs that are also in the top Best Model Visited are represented by red dots (also with the SNP's name). (B) Genome-wide log10(BF) obtained from SNPTEST. The horizontal dashed line indicates the level of log10(BF) that provides strong evidence of a phenotype-SNP association with Marginal Posterior Probability of inclusion close to 1. For comparison purposes, SNPs detected by GUESS are highlighted (their name is printed). SNPs found by SNPTEST with log10(BF)>5 are colour coded according to the level of pairwise Pearson correlation with the closest significant GUESS SNP (see colour bar for correlation scale). (C) Genome-wide log10(BF) obtained from piMASS. The horizontal dashed line indicates the level of log10(BF) that provides strong evidence for a phenotype-SNP association. (D) log10(BF) signals obtained from SNPTEST in a region of chromosome 11 spanning nearly 500 Kb (116,519,739–116,845,104 bp). The horizontal dashed line and colour code used to identify relevant SNPs are the same as defined in (B). Top bars indicate the position of genes in the region retrieved from Ensembl R66. (E) Scatterplot of genome-wide log10(BF) of TG obtained from GUESS and SNPTEST. The colour code used to identify relevant SNPs and the horizontal dashed line are as defined in (A) and (B). (F) Scatterplot of genome-wide log10(BF) of TG obtained from GUESS and piMASS. The colour code used to identify relevant SNPs and the horizontal dashed line are as defined in (A) and (B).

FigShare

Comparison of the marginal phenotype-SNP associations provided by GUESS and SNPTEST in the multiple traits analysis of TG-LDL-APOB.

(To increase readability, the log10(BFs) are truncated at 20). (A) Genome-wide log10(BF) obtained from GUESS. Significant SNPs found associated at 5% FDR are depicted by black dots (with the SNP's name) whereas significant SNPs that are also in the top Best Model Visited are represented by red dots (with the SNP's name). (B) Genome-wide log10(BF) obtained from SNPTEST. The horizontal dashed line indicates the level of log10(BF) that provides strong evidence of a phenotype-SNP association with Marginal Posterior Probability of inclusion close to 1. For comparison purposes, SNPs found by GUESS are highlighted (their name is printed). SNPs with log10(BF)>5 are colour coded according to the level of pairwise Pearson correlation with the closest significant GUESS SNP (see colour bar for correlation scale). (C) log10(BF) signal obtained from SNPTEST in a region of chromosome 11 spanning nearly 500 Kb (116,519,739–116,845,104 bp). The horizontal dashed line and colour code used to identify relevant SNPs are as defined in (B). Top bars indicate the position of genes in the region retrieved from Ensembl R66. (D) Scatterplot of genome-wide log10(BF) of TG-LDL-APOB obtained from GUESS and SNPTEST. The colour code used to identify relevant SNPs and the horizontal dashed line are as defined in (A) and (B).

FigShare

Schematic representation of the analysis of single and multiple phenotypes using GUESS.

(A–B) Given a group of single traits (APOA1, APOB, HDL, LDL and TG), we constructed two top-down trees (green and blue colour coded) made by biologically driven combinations of phenotypes and centred on the pathways of LDL (A) and HDL (B). Each branch of the trees was regressed on the whole set of tagged SNPs (∼273K SNPs) using GUESS and adjusting for sex, age and body mass index. (C) Output from GUESS is used to derive the Best Models Visited (BMV), i.e. the most supported multivariate models, and their Model Posterior Probability (MPP), i.e. the fraction of the model space explained by the BMV (MPP of the top BMV and the cumulative MPP of the top five BMV are indicated in the first two columns, respectively). Based on an empirical FDR procedure, we selected a parsimonious set of significant SNPs (indicated on the top of the table with the associated locus) that explains the variation of each branch of the two trees. Merging this information with the list of SNPs in the top BMV allowed us to highlight a robust subset of significant SNPs that repeatedly contribute to the top supported model (significant SNPs are depicted in black whereas significant SNPs that are also in the top BMV are indicated in red). For each SNP, the marginal strength of association can be compared across different combinations of traits using a new rescaled measure of marginal phenotype-SNP association, the Ratio of Bayes Factors (RBF) (phenotype-SNP log10(RBF) is truncated at 20 to increase readability). Based on Ensembl R66 annotation, each locus is classified as: (1) intronic, (2) 3′UTR, (3) downstream, (4) previously associated and (5) a tagSNP of a previously associated SNP. The name of the locus is also reported on the right of each branch of the two trees with the same colour code used in the table: black if the locus is associated with the phenotypes with FDR<5%, red if the locus is also in the top BMV with FDR<5%.

FigShare

Receiver Operating Characteristic (ROC) curves of SNPTEST (black), SPLS (blue), MLASSO (dark green), (M)ANOVA (purple), piMASS (green) and GUESS (red) for multiple traits and single trait simulated datasets.

For GUESS, ROC curves are obtained using the top Best Model Visited (BMV) (red star) and the Marginal Posterior Probability of Inclusion (MPPI) (solid red line). For SNPTEST, the ROC curve is calculated using log10(BF) while for piMASS ROC curves are obtained using MPPI. The (average) numbers of SNPs retained by SPLS and MLASSO under different levels of penalization are indicated (A–B). For MANOVA Wilks (A–B) and ANOVA Kruskal (C–D), the ROC curve is derived using the SNPs declared significant over a range of FDR levels. The number of false positives (x-axis) is indicated at the top of the figure while the proportion of false positives is presented at the bottom. Given the large number of predictors (273,294), false positives are truncated at 10⁻⁴, at which level a large number already occurs (27.5).

FigShare