121,298 research outputs found

    Classification between normal and tumor tissues based on the pair-wise gene expression ratio

    BACKGROUND: Precise classification of cancer types is critically important for early cancer diagnosis and treatment. Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. However, reliable cancer-related signals are generally lacking. METHOD: Using recent datasets on colon and prostate cancer, a data transformation procedure from single gene expression to pair-wise gene expression ratio is proposed. Making use of the internal consistency of each expression profiling dataset, this transformation improves the signal-to-noise ratio of the dataset and uncovers new relevant cancer-related signals (features). The efficiency of using the transformed dataset to perform normal/tumor classification was investigated using feature partitioning with informative features (gene annotation) as discriminating axes (single gene expression or pair-wise gene expression ratio). Classification results were compared to those from the original datasets for classifiers with up to 10 features. RESULTS: 82 and 262 genes that have high correlation to tissue phenotype were selected from the colon and prostate datasets, respectively. Remarkably, transforming the highly noisy expression data lowered the coefficient of variation (CV) of the within-class samples and improved the correlation with tissue phenotypes. The transformed dataset exhibited lower CV than single gene expression. In the colon cancer set, the minimum CV decreased from 45.3% to 16.5%. In prostate cancer, comparable CV was achieved with and without transformation. This improvement in CV, coupled with the improved correlation between the pair-wise gene expression ratio and tissue phenotypes, yielded higher classification efficiency, especially with the colon dataset, where it rose from 87.1% to 93.5%. Over 90% of the top ten discriminating axes in both datasets showed significant improvement after data transformation. The high classification efficiency achieved suggests that some cancer-related signals exist in the form of pair-wise gene expression ratios.
CONCLUSION: The results from this study indicate that: 1) when the pair-wise expression ratio transformation achieves lower CV and higher correlation to tissue phenotypes, better classification of tissue type follows; 2) the comparable classification accuracy achieved after data transformation suggests that the pair-wise gene expression ratio between some pairs of genes can identify reliable markers for cancer.
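
The effect of the ratio transformation on the within-class CV can be illustrated with a small sketch. The gene values below are hypothetical toy numbers, not data from the study, and the helper names are made up for illustration. The sketch assumes that part of the noise is a per-sample factor shared by both genes, which is one plausible reading of "internal consistency"; the abstract does not spell out the mechanism.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def pairwise_ratio(gene_a, gene_b):
    """Per-sample expression ratio of gene A to gene B."""
    return [a / b for a, b in zip(gene_a, gene_b)]

# Hypothetical expression values for two genes in four samples of one class.
# Both genes rise and fall together across samples (shared per-sample factor).
gene_a = [10.2, 19.6, 5.1, 14.8]
gene_b = [4.0, 8.1, 2.0, 6.1]

cv_a = coefficient_of_variation(gene_a)
cv_b = coefficient_of_variation(gene_b)
cv_ratio = coefficient_of_variation(pairwise_ratio(gene_a, gene_b))

print(f"CV gene A: {cv_a:.1f}%  CV gene B: {cv_b:.1f}%  CV ratio A/B: {cv_ratio:.1f}%")
```

In this toy case the shared factor cancels in the ratio, so the ratio varies far less across samples than either gene alone. Real data will only show this when the two genes actually co-vary within a class.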

    Analysis of cell proliferation and tissue remodelling uncovers a KLF4 activity score associated with poor prognosis in colorectal cancer

    Human cancers can be classified based on gene signatures quantifying the degree of cell proliferation and tissue remodelling (PR). However, the specific factors that drive the increased tissue remodelling in tumours are not fully understood. Here we address this question using colorectal cancer as a case study. We reanalysed a reported cohort of colorectal cancer patients. The patients were stratified based on gene signatures of cell proliferation and tissue remodelling. Putative transcription factor activity was inferred using gene expression profiles and annotations of transcription factor targets as input. We demonstrate that the PR classification performs better than the currently adopted consensus molecular subtyping (CMS). Although CMS classification differentiates patients with a mesenchymal signature, it cannot distinguish the remaining patients based on survival. We demonstrate that the missing factor is cell proliferation, which is indicative of good prognosis. We also uncover a KLF4 transcription factor activity score associated with the tissue remodelling gene signature. We further show that the KLF4 activity score is significantly higher in colorectal tumours with predicted infiltration of cells from the myeloid lineage. The KLF4 activity score is associated with tissue remodelling, myeloid cell infiltration and poor prognosis in colorectal cancer.

    Inferring Pathway Activity toward Precise Disease Classification

    The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in complex diseases due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease
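
The idea of summarizing a pathway by a discriminative subset of its genes can be sketched as follows. The abstract does not say how the CORG subset is searched for or how discriminative power is scored, so everything specific here is an assumption made for illustration: z-scoring each gene, combining genes as a sum divided by the square root of the subset size, scoring with an absolute Welch t-statistic, and growing the subset greedily. Gene names and values are hypothetical.

```python
import math
import statistics

def z_scores(values):
    """Standardize one gene across all samples."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def activity(gene_rows):
    """Combine z-scored genes into one activity value per sample."""
    k = len(gene_rows)
    return [sum(column) / math.sqrt(k) for column in zip(*gene_rows)]

def t_score(values, labels):
    """Absolute Welch t-statistic between label-1 and label-0 samples."""
    g1 = [v for v, y in zip(values, labels) if y == 1]
    g0 = [v for v, y in zip(values, labels) if y == 0]
    se = math.sqrt(statistics.variance(g1) / len(g1)
                   + statistics.variance(g0) / len(g0))
    return abs(statistics.mean(g1) - statistics.mean(g0)) / se

def greedy_corgs(pathway, labels):
    """Greedy subset search (assumed): add genes while the score improves."""
    z = {gene: z_scores(values) for gene, values in pathway.items()}
    ranked = sorted(z, key=lambda g: t_score(z[g], labels), reverse=True)
    chosen = [ranked[0]]
    best = t_score(activity([z[g] for g in chosen]), labels)
    for gene in ranked[1:]:
        trial = chosen + [gene]
        score = t_score(activity([z[g] for g in trial]), labels)
        if score > best:
            chosen, best = trial, score
    return chosen, best

# Hypothetical pathway: three genes, three controls (0) and three cases (1).
labels = [0, 0, 0, 1, 1, 1]
pathway = {
    "geneA": [1.0, 1.2, 0.9, 2.0, 2.3, 1.9],
    "geneB": [0.8, 1.1, 1.0, 1.9, 1.6, 2.2],
    "geneC": [1.5, 0.7, 1.3, 1.0, 1.4, 0.9],  # unrelated to phenotype
}

chosen, best = greedy_corgs(pathway, labels)
print(chosen, round(best, 2))
```

On this toy input the two phenotype-related genes are kept and the unrelated one is left out, because adding it lowers the combined score. Because the z-scores are summed with their sign, genes that move in opposite directions would cancel under this simple scheme.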

    Delineation of prognostic biomarkers in prostate cancer

    Prostate cancer is the most frequently diagnosed cancer in American men(1,2). Screening for prostate-specific antigen (PSA) has led to earlier detection of prostate cancer(3), but elevated serum PSA levels may be present in non-malignant conditions such as benign prostatic hyperplasia (BPH). Characterization of gene-expression profiles that molecularly distinguish prostatic neoplasms may identify genes involved in prostate carcinogenesis, elucidate clinical biomarkers, and lead to an improved classification of prostate cancer(4-6). Using microarrays of complementary DNA, we examined gene-expression profiles of more than 50 normal and neoplastic prostate specimens and three common prostate-cancer cell lines. Signature expression profiles of normal adjacent prostate (NAP), BPH, localized prostate cancer, and metastatic, hormone-refractory prostate cancer were determined. Here we establish many associations between genes and prostate cancer. We assessed two of these genes, hepsin (a transmembrane serine protease) and pim-1 (a serine/threonine kinase), at the protein level using tissue microarrays consisting of over 700 clinically stratified prostate-cancer specimens. Expression of hepsin and pim-1 proteins was significantly correlated with measures of clinical outcome. Thus, the integration of cDNA microarray, high-density tissue microarray, and linked clinical and pathology data is a powerful approach to molecular profiling of human cancer. Peer Reviewed http://deepblue.lib.umich.edu/bitstream/2027.42/62849/1/412822a0.pd

    Classification of Genes and Putative Biomarker Identification Using Distribution Metrics on Expression Profiles

    BACKGROUND: Identification of genes with switch-like properties will facilitate discovery of regulatory mechanisms that underlie these properties, and will provide knowledge for the appropriate application of Boolean networks in gene regulatory models. As switch-like behavior is likely associated with tissue-specific expression, these gene products are expected to be plausible candidates as tissue-specific biomarkers. METHODOLOGY/PRINCIPAL FINDINGS: In a systematic classification of genes and search for biomarkers, gene expression profiles (GEPs) of more than 16,000 genes from 2,145 mouse array samples were analyzed. Four distribution metrics (mean, standard deviation, kurtosis and skewness) were used to classify GEPs into four categories: predominantly-off, predominantly-on, graded (rheostatic), and switch-like genes. The arrays under study were also grouped and examined by tissue type. For example, arrays were categorized as 'brain group' and 'non-brain group'; the Kolmogorov-Smirnov distance and Pearson correlation coefficient were then used to compare GEPs between brain and non-brain for each gene. We were thus able to identify tissue-specific biomarker candidate genes. CONCLUSIONS/SIGNIFICANCE: The methodology employed here may be used to facilitate disease-specific biomarker discovery
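
The four distribution metrics and the Kolmogorov-Smirnov distance named above can be computed as in the sketch below. It uses population moments and excess kurtosis, which is a choice made here rather than something the abstract specifies. The expression values and the brain/non-brain split are hypothetical, and no category thresholds are shown because the abstract does not give them.

```python
import bisect
import math

def distribution_metrics(values):
    """Mean, standard deviation, skewness and excess kurtosis (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis: 0 for a normal distribution
    return mean, sd, skewness, kurtosis

def ks_distance(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical gene that is "on" in brain arrays and "off" elsewhere.
brain = [9.0, 9.0, 9.0, 9.0]
non_brain = [1.0, 1.0, 1.0, 1.0]

mean, sd, skewness, kurtosis = distribution_metrics(brain + non_brain)
distance = ks_distance(brain, non_brain)
print(mean, sd, skewness, kurtosis, distance)
```

For this two-level toy profile the excess kurtosis is strongly negative, as expected for a bimodal distribution, and the KS distance between the two groups reaches its maximum of 1.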

    Multiclass Sequential Feature Selection and Classification Method for Genomic Data

    This paper presents an efficient multiclass sequential feature selection and classification (mk-SS) method using gene expression signatures. The development of this method employs 10-fold cross-validation to ensure stability. The efficiency of this method is assessed through the misclassification error rate and some other performance measures. The performance of mk-SS was compared with the classification results of Support Vector Machines (SVM) over five published multiclass microarray datasets. The results showed that the mk-SS method efficiently selects the informative gene biomarkers for proper classification of the biological groups of the tissue samples. This method competes favourably with SVM in terms of prediction accuracy, outperforming SVM in 80% of the cases considered. The quality of the features selected by the mk-SS algorithm was validated by hybridizing the feature selection scheme of mk-SS into the standard SVM algorithm, which significantly improves the predictive power of the standard SVM method. This work has shown that classification of various cancer types using gene expression profiles is feasible, especially when the endpoints are multi-category. Keywords: k-SS, mk-SS, Support Vector Machines, Microarray, Misclassification error rate

    Application of Gene Shaving and Mixture Models to Cluster Microarray Gene Expression Data

    Researchers are frequently faced with the analysis of microarray data of a relatively large number of genes using a small number of tissue samples. We examine the application of two statistical methods for clustering such microarray expression data: EMMIX-GENE and GeneClust. EMMIX-GENE is a mixture-model based clustering approach, designed primarily to cluster tissue samples on the basis of the genes. GeneClust is an implementation of the gene shaving methodology, motivated by research to identify distinct sets of genes for which variation in expression could be related to a biological property of the tissue samples. We illustrate the use of these two methods in the analysis of Affymetrix oligonucleotide arrays of well-known data sets from colon tissue samples with and without tumors, and of tumor tissue samples from patients with leukemia. Although the two approaches have been developed from different perspectives, the results demonstrate a clear correspondence between gene clusters produced by GeneClust and EMMIX-GENE for the colon tissue data. It is demonstrated, for the case of ribosomal proteins and smooth muscle genes in the colon data set, that both methods can classify genes into co-regulated families. It is further demonstrated that tissue types (tumor and normal) can be separated on the basis of subtle distributed patterns of genes. Application to the leukemia tissue data produces a division of tissues corresponding closely to the external classification, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), for both methods. In addition, we identify genes specific for the subgroup of ALL-Tcell samples.
Overall, we find that the gene shaving method produces gene clusters at great speed; allows variable cluster sizes and can incorporate partial or full supervision; and finds clusters of genes in which the gene expression varies greatly over the tissue samples while maintaining a high level of coherence between the gene expression profiles. The intent of the EMMIX-GENE method is to cluster the tissue samples. It performs a filtering step that results in a subset of relevant genes, followed by gene clustering, and then tissue clustering, and is favorable in its accuracy of ranking the clusters produced

    Expression cartography of human tissues using self organizing maps

    Background: The availability of parallel, high-throughput microarray and sequencing experiments poses the challenge of how best to arrange and analyze the resulting mass of multidimensional data in a concerted way. Self organizing maps (SOM), a machine learning method, enable a parallel sample- and gene-centered view of the data combined with strong visualization and second-level analysis capabilities. The paper addresses aspects of the method with practical impact in the context of expression analysis of complex data sets.
Results: The method was applied to generate a SOM characterizing the whole genome expression profiles of 67 healthy human tissues selected from ten tissue categories (adipose, endocrine, homeostasis, digestion, exocrine, epithelium, sexual reproduction, muscle, immune system and nervous tissues). SOM mapping reduces the dimension of expression data from tens of thousands of genes to a few thousand metagenes, where each metagene acts as a representative of a minicluster of co-regulated single genes. Tissue-specific and common properties shared between groups of tissues emerge as a handful of localized spots in the tissue maps collecting groups of co-regulated and co-expressed metagenes. The functional context of the spots was discovered using overrepresentation analysis with respect to pre-defined gene sets of known functional impact. We found that tissue-related spots typically contain enriched populations of gene sets that correspond well to molecular processes in the respective tissues. Analysis techniques normally used at the gene level, such as two-way hierarchical clustering, provide a better signal-to-noise ratio and better representativeness if applied to the metagenes. Metagene-based clustering analyses aggregate the tissues into essentially three clusters containing nervous, immune system and the remaining tissues.
Conclusions: The global view on the behavior of a few well-defined modules of correlated and differentially expressed genes is more intuitive and more informative than the separate discovery of the expression levels of hundreds or thousands of individual genes. The metagene approach is less sensitive to a priori selection of genes. It can detect a coordinated expression pattern whose components would not pass single-gene significance thresholds and it is able to extract context-dependent patterns of gene expression in complex data sets.