20 research outputs found

    Additional file 5: Figure S2. of A pilot systematic genomic comparison of recurrence risks of hepatitis B virus-associated hepatocellular carcinoma with low- and high-degree liver fibrosis

    Differentially expressed gene signatures. (A) Differentially expressed genes between the low and high liver fibrosis groups are shown in a heatmap. (B) Heatmap of the 186 prognostic signature genes from Hoshida et al. [38]. (TIF 1610 kb)

    Additional file 17: Figure S7. of A pilot systematic genomic comparison of recurrence risks of hepatitis B virus-associated hepatocellular carcinoma with low- and high-degree liver fibrosis

    Gene expression influenced by HBV integration. For the recurrent host genes, gene expression is compared between samples with and without integration. Two recurrent host genes, (A) KMT2B and (B) ARAP2, show gene expression changes induced by HBV integration. P values were calculated with Student's t-test. (C) Differentially expressed genes between tumors with and without HBV-KMT2B integration. A total of 139 genes were over-expressed in the tumors with HBV-KMT2B integration, while 32 were under-expressed. The top 20 enriched GO terms for the over-expressed gene set (red) and the top 5 for the under-expressed gene set (green) are shown. (TIF 1420 kb)

    Sample similarity measurement based on cis methylation-mRNA pairs.

    After cis methylation-mRNA pairs are identified, the methylation and gene expression levels are rank-transformed. In this figure, there are M samples and i cis pairs. The Pearson correlation between one methylation profile and each gene expression profile is then calculated and used as the sample similarity. If a methylation profile and a gene expression profile come from the same individual, their self-self correlation coefficient is expected to be significantly higher than the correlation coefficients with other samples.
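The similarity calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the MODMatcher implementation. It assumes that ranks are taken across samples for each cis pair, that every cis pair is positively correlated (or has already been sign-flipped), and that the toy values and the function names `rank_across_samples`, `pearson`, and `similarity_scores` are hypothetical.

```python
from math import sqrt


def rank_across_samples(profiles):
    """Rank each cis pair's values across samples (1 = lowest, ties averaged).

    profiles maps sample id -> list of values, one value per cis pair.
    """
    ids = list(profiles)
    n_pairs = len(profiles[ids[0]])
    ranked = {s: [0.0] * n_pairs for s in ids}
    for p in range(n_pairs):
        ordered = sorted(ids, key=lambda s: profiles[s][p])
        start = 0
        while start < len(ordered):
            end = start
            # Extend the block while the next sample ties with the current one.
            while (end + 1 < len(ordered)
                   and profiles[ordered[end + 1]][p] == profiles[ordered[start]][p]):
                end += 1
            average_rank = (start + end) / 2 + 1
            for s in ordered[start:end + 1]:
                ranked[s][p] = average_rank
            start = end + 1
    return ranked


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def similarity_scores(methylation, expression):
    """Return scores[m][e]: correlation of ranked profiles over the cis pairs."""
    meth_ranks = rank_across_samples(methylation)
    expr_ranks = rank_across_samples(expression)
    return {m: {e: pearson(meth_ranks[m], expr_ranks[e]) for e in expr_ranks}
            for m in meth_ranks}


# Hypothetical toy data: 3 samples, 4 cis pairs, labels assumed correct.
meth = {"s1": [0.1, 0.9, 0.5, 0.2],
        "s2": [0.5, 0.1, 0.9, 0.8],
        "s3": [0.9, 0.5, 0.1, 0.5]}
expr = {"s1": [2.0, 8.0, 5.0, 1.0],
        "s2": [4.0, 3.0, 9.0, 7.0],
        "s3": [6.0, 5.0, 2.0, 4.0]}
scores = similarity_scores(meth, expr)
best = {m: max(row, key=row.get) for m, row in scores.items()}
```

With correctly labeled data, each methylation profile's highest-scoring expression profile should be its own sample. A profile whose top hit is a different sample is a candidate labeling error.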

    Examples of sample alignment in the TCGA BRCA data set.

    (A) A similarity score distribution of a correctly labeled profile. The red star indicates the similarity score between self-matched profile pairs (gene expression and methylation data profiles are labeled as pertaining to the same sample). (B) Similarity scores of self-matched pairs (red stars) between gene expression and methylation profiles for two samples are lower than the similarity scores of cross-matched pairs (blue stars).

    Sample alignment with MODMatcher.

    Initial labels of samples are used to determine cis pairs, which are then used to calculate similarity scores. Based on the similarity scores determined with three data types, the molecular data are matched with each other (1) by gender, (2) by cis-eSNPs, (3) by cis-mSNPs, (4) by cis mRNA-methylation pairs, and (5) by all trio mapping. Then, updated sample pairs are used to calculate new cis pairs for another round of alignment. Rounds of alignment are repeated until there are no further changes.
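The repeat-until-stable loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the published procedure. It handles only two data types, it accepts a re-pairing only when two profiles are each other's top hit (an assumed acceptance rule), and the five matching steps listed above are collapsed into a single caller-supplied `score_fn`. All function names are hypothetical.

```python
def reciprocal_best_matches(scores):
    """Return {a: b} for pairs where a's top hit is b and b's top hit is a.

    scores[a][b] is the similarity between profile a and profile b.
    """
    best_for_a = {a: max(row, key=row.get) for a, row in scores.items()}
    best_for_b = {}
    for a, row in scores.items():
        for b, score in row.items():
            if b not in best_for_b or score > scores[best_for_b[b]][b]:
                best_for_b[b] = a
    return {a: b for a, b in best_for_a.items() if best_for_b[b] == a}


def align_until_stable(labels, score_fn, max_rounds=20):
    """Re-pair samples round by round until the pairing stops changing.

    labels maps each profile of one data type to its paired profile of the
    other type. score_fn(labels) must recompute the similarity scores from
    the cis pairs implied by the current labels.
    Returns the final labels and the number of rounds run.
    """
    labels = dict(labels)
    for rounds in range(1, max_rounds + 1):
        updated = dict(labels)
        updated.update(reciprocal_best_matches(score_fn(labels)))
        if updated == labels:
            return labels, rounds
        labels = updated
    return labels, max_rounds


# Hypothetical example: two samples whose expression labels were swapped.
# A fixed score table stands in for recomputing cis pairs each round.
fixed_scores = {"m1": {"e1": 0.2, "e2": 0.9},
                "m2": {"e1": 0.8, "e2": 0.1}}
final_labels, rounds_run = align_until_stable(
    {"m1": "e1", "m2": "e2"}, lambda current: fixed_scores)
```

The `max_rounds` cap is a safeguard added for this sketch so that a pairing that oscillates cannot loop forever.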

    MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis

    Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling error is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets.

    Gender prediction based on expression of the Y-chromosome specific gene RPS4Y1.

    The log2-transformed RPS4Y1 expression levels are clearly separated between male and female samples, both in the CTRL set and in patients with COPD (>10 in male samples and <10 in female samples). There were no gender-mismatched samples in the CTRL set and 5 mismatched samples (2 in females and 3 in males) in the COPD set (error rate of 1.5%).
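A threshold check of this kind can be sketched as below. The cutoff of 10 on the log2 scale comes from the caption. The sample records, the function names, and the choice to call a value of exactly 10 female are assumptions made for illustration.

```python
def predict_sex(log2_rps4y1, threshold=10.0):
    """Call a sample male when log2 RPS4Y1 expression exceeds the threshold."""
    return "male" if log2_rps4y1 > threshold else "female"


def find_mismatches(samples, threshold=10.0):
    """Return ids of samples whose annotated sex disagrees with the prediction.

    samples is a list of (sample_id, annotated_sex, log2_rps4y1) tuples.
    """
    return [sample_id
            for sample_id, annotated_sex, value in samples
            if predict_sex(value, threshold) != annotated_sex]


# Hypothetical records; the expression values are made up.
records = [("c01", "male", 12.3),
           ("c02", "female", 6.1),
           ("c03", "female", 11.8),  # annotated female, expression looks male
           ("c04", "male", 5.9)]     # annotated male, expression looks female
mismatched = find_mismatches(records)
error_rate = len(mismatched) / len(records)
```

Samples flagged this way are candidates for relabeling or exclusion before any integrative analysis.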

    Gender prediction based on methylation intensity.

    The raw intensity of a Y-chromosome methyl probe corresponding to FAM197Y2P is clearly different between genders. One error was identified in the CTRL and 15 errors were identified in the COPD set (6 in females, 9 in males) (error rate of 6.4%).