New Method for Joint Network Analysis Reveals Common and Different Coexpression Patterns among Genes and Proteins in Breast Cancer
We focus on characterizing common and different coexpression patterns among RNAs and proteins in breast cancer tumors. To address this problem, we introduce Joint Random Forest (JRF), a novel nonparametric algorithm to simultaneously estimate multiple coexpression networks by effectively borrowing information across protein and gene expression data. The performance of JRF was evaluated through extensive simulation studies using different network topologies and data distribution functions. Advantages of JRF over other algorithms that estimate class-specific networks separately were observed across all simulation settings. JRF also outperformed a competing method based on Gaussian graphical models. We then applied JRF to simultaneously construct gene and protein coexpression networks based on protein and RNAseq data from the CPTAC-TCGA breast cancer study. We identified interesting common and differential coexpression patterns among genes and proteins. This information can help to cast light on the potential disease mechanisms of breast cancer.
Additional file 4: Table S3. of Inter-tissue coexpression network analysis reveals DPP4 as an important gene in heart to blood communication
Number of significant gene-modules identified for each tissue pair. (PDF 42 kb)
Additional file 2: of Inter-tissue coexpression network analysis reveals DPP4 as an important gene in heart to blood communication
Supporting notes. Figure S1. The optimal numbers of principal components (PCs) to correct in each tissue. Figure S2. Histograms of correlation coefficients between sample ischemic time and RINs with gene expression profiles in nine tissues. Red lines are for correlation with RINs, and blue lines are for correlation with sample ischemic time. Solid lines are for empirical gene expression profiles in the study, dashed lines are for permuted data. (DOCX 500 kb)
Sample alignment with MODMatcher.
Initial labels of samples are used to determine cis pairs, which are then used to calculate similarity scores. Based on the similarity scores determined with three data types, the molecular data are matched with each other (1) by gender, (2) by cis-eSNPs, (3) by cis-mSNPs, (4) by cis mRNA-methylation pairs, and (5) by all trio mapping. Then, updated sample pairs are used to calculate new cis pairs for another round of alignment. Rounds of alignment are repeated until there are no further changes.
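The repeat-until-stable structure of this alignment can be sketched as a fixed-point loop. This is only an illustration of the control flow described in the caption, not the MODMatcher implementation: the function name `align_until_stable`, the three callables passed in, and the `max_rounds` safety cap are all hypothetical.

```python
def align_until_stable(initial_pairs, find_cis_pairs, score_similarity,
                       rematch, max_rounds=50):
    """Repeat alignment rounds until the sample pairing stops changing.

    All three callables are hypothetical placeholders for this sketch:
      find_cis_pairs(pairs)   -> cis pairs derived from the current pairing
      score_similarity(cis)   -> similarity scores based on those cis pairs
      rematch(scores)         -> updated pairing as a dict
    """
    pairs = dict(initial_pairs)
    for _ in range(max_rounds):  # assumed safety cap, not from the source
        cis = find_cis_pairs(pairs)
        scores = score_similarity(cis)
        new_pairs = rematch(scores)
        if new_pairs == pairs:
            # No further changes: the alignment has converged.
            return pairs
        pairs = new_pairs
    return pairs
```

Passing the steps in as callables keeps the loop independent of how cis pairs and similarity scores are actually computed, which the caption does not specify in detail.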
MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis
Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling errors is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets.
Examples of sample alignment in the TCGA BRCA data set.
(A) A similarity score distribution of a correctly labeled profile. The red star indicates the similarity score between self-matched profile pairs (gene expression and methylation data profiles are labeled as pertaining to the same sample). (B) Similarity scores of self-matched pairs (red stars) between gene expression and methylation profiles for two samples are lower than the similarity scores of cross-matched pairs (blue stars).
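The comparison in panel (B) can be illustrated with a minimal sketch that flags samples whose self-matched score is not the highest in their row. The matrix layout, the function name `flag_cross_matches`, and the toy scores are assumptions made for illustration; the caption does not describe how the scores are computed.

```python
def flag_cross_matches(sim):
    """Return (i, j) pairs where profile i matches profile j better than itself.

    sim[i][j] is assumed to hold the similarity between gene expression
    profile i and methylation profile j, with i == j being the self-match.
    """
    flagged = []
    for i, row in enumerate(sim):
        best_j = max(range(len(row)), key=row.__getitem__)
        if best_j != i:
            flagged.append((i, best_j))
    return flagged


# Toy scores (hypothetical): samples 1 and 2 look swapped.
toy_scores = [
    [0.9, 0.1, 0.2],
    [0.1, 0.2, 0.8],
    [0.2, 0.7, 0.1],
]
swapped = flag_cross_matches(toy_scores)
```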
Gender prediction based on expression of the Y-chromosome specific gene <i>RPS4Y1</i>.
The log2-transformed values of <i>RPS4Y1</i> expression level are clearly separated between male and female samples in both the CTRL and COPD sets (>10 in male samples and <10 in female samples). There were no gender-mismatched samples in the CTRL set and 5 mismatched samples (2 in females and 3 in males) in the COPD set (error rate of 1.5%).
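A minimal sketch of this threshold check is shown below. The cutoff of 10 on the log2 scale comes from the caption; the function names, the tuple layout, and the example values are hypothetical.

```python
def predict_gender(log2_rps4y1, threshold=10.0):
    """Predict gender from log2 RPS4Y1 expression using the caption's cutoff."""
    return "male" if log2_rps4y1 > threshold else "female"


def find_gender_mismatches(samples, threshold=10.0):
    """Return samples whose annotated gender disagrees with the prediction.

    samples: iterable of (sample_id, annotated_gender, log2_rps4y1) tuples
    (an assumed layout for this sketch).
    """
    mismatches = []
    for sample_id, annotated, expr in samples:
        predicted = predict_gender(expr, threshold)
        if predicted != annotated:
            mismatches.append((sample_id, annotated, predicted))
    return mismatches


# Made-up example values, not data from the study.
example = [("a", "male", 11.2), ("b", "female", 8.1), ("c", "female", 12.0)]
flagged = find_gender_mismatches(example)
```

Flagged samples would still need to be checked against other evidence before being relabeled, since a single-gene threshold only signals a likely mismatch.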
Relationship between metabolites and genes linked to eQTL hot spot 2 on Chromosome V.
(A) De novo pyrimidine biosynthesis pathway; (B) orotic acid and dihydroorotic acid concentrations are linked to the <i>URA3</i> locus; (C) <i>URA3</i> is predicted as the causal regulator for genes and metabolites linked to the eQTL hot spot. Red nodes are genes or metabolites whose variations are linked to the Chromosome V locus. The shapes of the nodes follow the convention described in <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001301#pbio-1001301-g003" target="_blank">Figure 3</a>.
Overview of the experimental design.
Gene expression was profiled in a cross between laboratory (BY) and wild (RM) strains of <i>S. cerevisiae</i> <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001301#pbio.1001301-Brem1" target="_blank">[11]</a>. Metabolites were profiled under the same conditions. These data were then integrated with genotype data and information from public databases to derive a Bayesian network (BN). The derived network was used to analyze how cells are regulated.
Genes and metabolites linked to eQTL hot spot 3 on Chromosome XIII.
(A) Variations of the metabolites isoleucine and threonine are linked to this locus. (B) These two subnetworks comprise genes and metabolites enriched for linking to the Chromosome XIII locus. The larger network consists of both gene expression and metabolite nodes enriched for the GO biological process nitrogen compound metabolism. The smaller network is enriched for the GO biological process de novo IMP biosynthetic process. Red nodes are genes with eQTLs linked to the Chromosome XIII locus. (C) Expression levels of eight genes (in red) are different between <i>VPS9</i> knockout and the wild-type strains. The shapes of the nodes follow the convention described in <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001301#pbio-1001301-g003" target="_blank">Figure 3</a>.