3,223 research outputs found

    Mining expressed sequence tags identifies cancer markers of clinical interest

    BACKGROUND: Gene expression data are a rich source of information about the transcriptional dysregulation of genes in cancer. Genes that display differential regulation in cancer are a subtype of cancer biomarkers. RESULTS: We present an approach to mine expressed sequence tags to discover cancer biomarkers. A false discovery rate analysis suggests that the approach generates less than 22% false discoveries when applied to combined human and mouse whole genome screens. With this approach, we identify the 200 genes most consistently differentially expressed in cancer (called HM200) and proceed to characterize these genes. When used for prediction in a variety of cancer classification tasks (in 24 independent cancer microarray datasets, 59 classifications total), we show that HM200 and the shorter gene list HM100 are very competitive cancer biomarker sets. Indeed, when compared to 13 published cancer marker gene lists, HM200 achieves the best or second best classification performance in 79% of the classifications considered. CONCLUSION: These results indicate the existence of at least one general cancer marker set whose predictive value spans several tumor types and classification types. Our comparison with other marker gene lists shows that HM200 markers are mostly novel cancer markers. We also identify the previously published Pomeroy-400 list as another general cancer marker set. Strikingly, Pomeroy-400 has 27 genes in common with HM200. Our data suggest that a core set of genes is responsive to the deregulation of pathways involved in tumorigenesis in a variety of tumor types and that these genes could serve as transcriptional cancer markers in applications of clinical interest. Finally, our study suggests new strategies to select and evaluate cancer biomarkers in microarray studies.

    What can digital transcript profiling reveal about human cancers?

    Important biological and clinical features of malignancy are reflected in its transcript pattern. Recent advances in gene expression technology and informatics have provided a powerful new means to obtain and interpret these expression patterns. A comprehensive approach to expression profiling is serial analysis of gene expression (SAGE), which provides digital information on transcript levels. SAGE works by counting transcripts and storing these digital values electronically, providing absolute gene expression levels that make historical comparisons possible. SAGE produces a comprehensive profile of gene expression and can be used to search for candidate tumor markers or antigens in a limited number of samples. The Cancer Genome Anatomy Project has created a SAGE database of human gene expression levels for many different tumors and normal reference tissues and provides online tools for viewing, comparing, and downloading expression profiles. Digital expression profiling using SAGE and informatics has been useful for identifying genes that have a role in tumor invasion and other aspects of tumor progression. Affiliations: Universidade Federal de São Paulo (UNIFESP), Departamento de Medicina, Divisão de Endocrinologia; Duke University Medical Center; Instituto Ludwig de Pesquisa sobre o Câncer.

    Bioinformatic identification of proteins with tissue-specific expression for biomarker discovery

    Background: There is an important need for the identification of novel serological biomarkers for the early detection of cancer. Current biomarkers suffer from a lack of tissue specificity, rendering them vulnerable to non-disease-specific increases. The present study details a strategy to rapidly identify tissue-specific proteins using bioinformatics. Methods: Previous studies have focused on either gene or protein expression databases for the identification of candidates. We developed a strategy that mines six publicly available gene and protein databases for tissue-specific proteins, selects proteins likely to enter the circulation, and integrates proteomic datasets enriched for the cancer secretome to prioritize candidates for further verification and validation studies. Results: Using colon, lung, pancreatic and prostate cancer as case examples, we identified 48 candidate tissue-specific biomarkers, of which 14 have been previously studied as biomarkers of cancer or benign disease. Twenty-six candidate biomarkers for these four cancer types are proposed. Conclusions: We present a novel strategy using bioinformatics to identify tissue-specific proteins that are potential cancer serum biomarkers. Investigation of the 26 candidates in disease states of the organs is warranted.

    hSAGEing: An Improved SAGE-Based Software for Identification of Human Tissue-Specific or Common Tumor Markers and Suppressors

    SAGE (serial analysis of gene expression) is a powerful method of analyzing gene expression for the entire transcriptome. There are currently many well-developed SAGE tools. However, the cross-comparison of different tissues is seldom addressed, thus limiting the identification of common and tissue-specific tumor markers. To improve the SAGE mining methods, we propose a novel function for cross-tissue comparison of SAGE data by combining mathematical set theory and logic with a unique “multi-pool method” that analyzes multiple pools of pair-wise case controls individually. When all the settings are in “inclusion”, the common SAGE tag sequences are mined. When one tissue type is in “inclusion” and the other types of tissues are not in “inclusion”, the selected tissue-specific SAGE tag sequences are generated. They are displayed in tags-per-million (TPM) and fold values, as well as visually displayed in four kinds of scales in a color gradient pattern. In the fold visualization display, the top scores of the SAGE tag sequences are provided, along with cluster plots. A user-defined matrix file is designed for cross-tissue comparison by selecting libraries from publicly available databases or user-defined libraries.
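The inclusion and exclusion logic described in this abstract can be illustrated with plain set operations. The following is only a minimal sketch, not the hSAGEing implementation: the function names, the tag strings, and the tissue pools are hypothetical, and the TPM formula (count divided by library total, times one million) is the usual definition assumed here.

```python
from typing import Dict, Set


def tags_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize raw tag counts of one library to tags-per-million (TPM)."""
    total = sum(counts.values())
    if total == 0:
        return {tag: 0.0 for tag in counts}
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}


def common_tags(pools: Dict[str, Set[str]]) -> Set[str]:
    """All pools set to 'inclusion': keep tags present in every tissue pool."""
    if not pools:
        return set()
    return set.intersection(*pools.values())


def tissue_specific_tags(pools: Dict[str, Set[str]], tissue: str) -> Set[str]:
    """One pool in 'inclusion', the rest excluded: tags found only in `tissue`."""
    others = set().union(
        *(tags for name, tags in pools.items() if name != tissue)
    )
    return pools[tissue] - others


# Hypothetical pools of differentially expressed tags, one per tissue.
pools = {
    "colon": {"AAA", "CCC", "GGG"},
    "lung": {"AAA", "CCC", "TTT"},
    "breast": {"AAA", "TTT"},
}

shared = common_tags(pools)                          # tags seen in all tissues
colon_only = tissue_specific_tags(pools, "colon")    # tags seen only in colon
tpm = tags_per_million({"AAA": 30, "CCC": 10, "GGG": 10})
```

Keeping each tissue as its own pool and combining pools only at the set level mirrors the "multi-pool" idea of analyzing each case-control pair individually before comparing across tissues.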

    Bioinformatics approach to mRNA markers discovery for detection of circulating tumor cells in patients with gastrointestinal cancer

    Background: Detection of tumor cells in the blood, or minimal deposits in distant organs such as bone marrow, could be important to identify cancer patients at high risk of relapse or disease progression. Quantitative polymerase chain reaction (PCR) amplification of tissue- or tumor-selective mRNA is the most powerful tool for the detection of these circulating or occult metastatic cells. Our study aims to identify novel gastrointestinal cancer-specific markers for circulating tumor cell detection. Method: A phase I preclinical study was performed by means of computational tools for expression analysis. In silico data were used to identify and prioritize molecular markers highly expressed in gastrointestinal cancers but absent in hematopoietic-derived libraries. Selected genes were evaluated by means of qRT-PCR in gastrointestinal cancer and hematopoietic cell lines, normal human bone marrow and blood, tumor tissue, and blood from cancer patients. Results: Novel and known mRNA markers for circulating tumor cell detection in gastrointestinal cancer have been identified. Among all the genes assessed, PKP3, AGR2, S100A16, S100A6, LGALS4, and CLDN3 were selected and assays based on blood qRT-PCR were developed. Reliable qRT-PCR assays for the novel targets plakophilin 3 (PKP3) and anterior gradient-2 (AGR2) to identify blood-borne cells in cancer patients were developed. Conclusions: Novel and known gastrointestinal-specific mRNA markers for circulating tumor cells have been identified through in silico analysis and validated in clinical material. qRT-PCR assays targeting PKP3 and AGR2 mRNAs might be helpful to detect circulating tumor cells in patients with gastrointestinal cancer.

    Using Serial Analysis of Gene Expression to Identify Tumor Markers and Antigens


    Integrative Genomic Data Mining for Discovery of Potential Blood-Borne Biomarkers for Early Diagnosis of Cancer

    Background: With the arrival of the postgenomic era, there is increasing interest in the discovery of biomarkers for the accurate diagnosis, prognosis, and early detection of cancer. Blood-borne cancer markers are favored by clinicians, because blood samples can be obtained and analyzed with relative ease. We have used a combined mining strategy based on an integrated cancer microarray platform, Oncomine, and the biomarker module of the Ingenuity Pathways Analysis (IPA) program to identify potential blood-based markers for six common human cancer types. Methodology/Principal Findings: In the Oncomine platform, the genes overexpressed in cancer tissues relative to their corresponding normal tissues were filtered by Gene Ontology keywords, with the extracellular environment stipulated and a corrected Q value (false discovery rate) cut-off implemented. The identified genes were imported to the IPA biomarker module to separate out those genes encoding putative secreted or cell-surface proteins as blood-borne (blood/serum/plasma) cancer markers. The filtered potential indicators were ranked and prioritized according to normalized absolute Student t values. The retrieval of numerous marker genes that are already clinically useful or under active investigation confirmed the effectiveness of our mining strategy. To identify the biomarkers that are unique for each cancer type, the upregulated marker genes that are in common between each two tumor types across the six human tumors were also analyzed by the IPA biomarker comparison function. Conclusion/Significance: The upregulated marker genes shared among the six cancer types may serve as a molecular tool to complement histopathologic examination, and the combination of the commonly upregulated and unique biomarkers may serve as differentiating markers for a specific cancer. This approach will be increasingly useful to discover diagnostic signatures as the mass of microarray data continues to grow in the ‘omics’ era
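The filter-then-rank step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the Oncomine or IPA workflow: the `Candidate` record, the `prioritize` function, the placeholder gene names, and the 0.05 cut-off are all hypothetical, and ranking uses the raw absolute t value because the abstract does not say how the values were normalized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    gene: str            # placeholder gene identifier
    t_value: float       # Student t statistic, cancer vs. normal tissue
    q_value: float       # corrected Q value (false discovery rate)
    extracellular: bool  # annotated with an extracellular Gene Ontology term


def prioritize(candidates: List[Candidate], q_cutoff: float = 0.05) -> List[Candidate]:
    """Keep extracellular genes passing the Q cut-off, ranked by |t| descending.

    The default cut-off of 0.05 is an assumption made for this sketch.
    """
    kept = [c for c in candidates if c.extracellular and c.q_value <= q_cutoff]
    return sorted(kept, key=lambda c: abs(c.t_value), reverse=True)


# Made-up records used only to exercise the function.
candidates = [
    Candidate("GENE_A", 4.2, 0.01, True),
    Candidate("GENE_B", -6.5, 0.001, True),
    Candidate("GENE_C", 9.0, 0.2, True),    # dropped: Q value above the cut-off
    Candidate("GENE_D", 7.1, 0.01, False),  # dropped: not extracellular
]

ranked = [c.gene for c in prioritize(candidates)]
```

Ranking on the absolute value treats strong statistics of either sign alike; since the abstract restricts the analysis to overexpressed genes, a sign filter on `t_value` could be added before the sort.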

    Proteomic analysis and translational perspective of hepatocellular carcinoma: Identification of diagnostic protein biomarkers by an onco-proteogenomics approach

    Hepatocellular carcinoma (HCC) has been ranked as the third leading cause of cancer-related mortality worldwide. Typically, patients are already in advanced stages of liver cirrhosis at the time of HCC diagnosis. Because HCC is often detected at a late stage and is highly aggressive, noninvasive biomarkers are urgently needed for early diagnosis. Recent advances in gene-expression profiling technologies have enabled molecular classification of HCC into defined subclasses that provide a firm basis for further study of potential mechanisms and biomarkers underlying the development of HCC. This study applied an integrated onco-proteogenomics approach to identify and characterize HCC biomarkers. Specifically, this study integrated proteomic, genomic, and transcriptomic methods to obtain protein expression profiles of urine and tissue samples from HCC patients and from normal controls. Two mediators of inflammation were positively identified: S100A9 and granulin protein markers, which belong to the cytoplasmic alarmin family of the host innate immune system. These HCC-associated cancer-specific biomarkers may have contributing roles not only in the dysregulated processes associated with various inflammatory and autoimmune conditions, but also in tumorigenesis and cancer metastasis.

    Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data

    The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today. We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets. Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.

    Identification of potential biomarkers in lung cancer as possible diagnostic agents using bioinformatics and molecular approaches

    Magister Scientiae - MSc. Lung cancer remains the leading cause of cancer deaths worldwide, with the majority of cases attributed to non-small cell lung carcinomas. At the time of diagnosis, a large percentage of patients present with an advanced stage of disease, ultimately resulting in a poor prognosis. The identification of circulatory markers overexpressed by the tumour tissue could facilitate the discovery of an early, specific, non-invasive diagnostic tool as well as improve prognosis and treatment protocols. The aim was to analyse gene expression data from both microarray and RNA sequencing platforms, using bioinformatics and statistical analysis tools. Enrichment analysis sought to identify genes that were differentially expressed (p 2) and had the potential to be secreted into the extracellular circulation, by using Gene Ontology terms of the Cellular Component. Results identified 1 657 statistically significant genes between normal and early lung cancer tissue, with only 1 gene differentially expressed (DE) between early and late stage disease. Following statistical analysis, 171 DE genes were selected as potential early stage biomarkers. The overall sensitivity of RNAseq, in comparison to arrays, enabled the identification of 57 potential serum markers. These genes of interest were all downregulated in the tumour tissue, and while they did not facilitate the discovery of an ideal diagnostic marker based on the set criteria in this study, their roles in disease initiation and progression require further analysis.