16 research outputs found

    A filter-based feature selection approach for identifying potential biomarkers for lung cancer

    Get PDF
    Background: Lung cancer is the leading cause of death from cancer in the world and its treatment is dependant on the type and stage of cancer detected in the patient. Molecular biomarkers that can characterize the cancer phenotype are thus a key tool in planning a therapeutic response. A common protocol for identifying such biomarkers is to employ genomic microarray analysis to find genes that show differential expression according to disease state or type. Data-mining techniques such as feature selection are often used to isolate, from among a large manifold of genes with differential expression, those specific genes whose differential expression patterns are of optimal value in phenotypic differentiation. One such technique, Biomarker Identifier (BMI), has been developed to identify features with the ability to distinguish between two data groups of interest, which is thus highly applicable for such studies. Results: Microarray data with validated genes was used to evaluate the utility of BMI in identifying markers for lung cancer. This data set contains a set of 129 gene expression profiles from large-airway epithelial cells (60 samples from smokers with lung cancer and 69 from smokers without lung cancer) and 7 genes from this data have been confirmed to be differentially expressed by quantitative PCR. Using this data set, BMI was compared with various well-known feature selection methods and was found to be more successful than other methods in finding useful genes to classify cancerous samples. Also it is evident that genes selected by BMI (given the same number of genes and classification algorithms) showed better discriminative power than those from the original study. After pathway analysis on the selected genes by BMI, we have been able to correlate the selected genes with well-known cancer-related pathways. Conclusions: Our results show that BMI can be used to analyze microarray data and to find useful genes for classifying samples. Pathway analysis suggests that BMI is successful in identifying biomarker-quality cancer-related genes from the data

    GlycomicsDB - A Data Integration Platform for Glycans and their Strucutres

    Get PDF
    Glycomics is a discipline of biology that deals with the structure and function of glycans (or carbohydrates). Analytical techniques such as mass spectrometry (MS) and nuclear magnetic resonance (NMR) are having a significant impact on the field of glycomics. However, effective progress in glycomics research requires collaboration between laboratories to share experimental data, structural information of glycans, and simulation results. Herein we report the development of a web-based data management system that can incorporate large volumes of data from disparate sources and organize them into a uniform format for users to store and access. This system enables participating laboratories to set up a shared data repository which members of interdisciplinary teams can access. The system is able to manage and share raw MS data and structural information of glycans

    LipidomeDB Data Calculation Environment: Online Processing of Direct-Infusion Mass Spectral Data for Lipid Profiles

    Get PDF
    The final publication is available at Springer via http://dx.doi.org/10.1007/s11745-011-3575-8.LipidomeDB Data Calculation Environment (DCE) is a web application to quantify complex lipids by processing data acquired after direct infusion of a lipid-containing biological extract, to which a cocktail of internal standards has been added, into an electrospray source of a triple quadrupole mass spectrometer. LipidomeDB DCE is located on the public Internet at http://lipidome.bcf.ku.edu:9000/Lipidomics. LipidomeDB DCE supports targeted analyses; analyte information can be entered, or pre-formulated lists of typical plant or animal polar lipid analytes can be selected. LipidomeDB DCE performs isotopic deconvolution and quantification in comparison to internal standard spectral peaks. Multiple precursor or neutral loss spectra from up to 35 samples may be processed simultaneously with data input as Excel files and output as tables viewable on the web and exportable in Excel. The pre-formulated compound lists and web access, used with direct-infusion mass spectrometry, provide a simple approach to lipidomic analysis, particularly for new users

    Discovery and Validation of Barrett's Esophagus MicroRNA Transcriptome by Next Generation Sequencing

    Get PDF
    Objective: Barrett's esophagus (BE) is transition from squamous to columnar mucosa as a result of gastroesophageal reflux disease (GERD). The role of microRNA during this transition has not been systematically studied. Design: For initial screening, total RNA from 5 GERD and 6 BE patients was size fractionated. RNA <70 nucleotides was subjected to SOLiD 3 library preparation and next generation sequencing (NGS). Bioinformatics analysis was performed using R package “DEseq”. A p value<0.05 adjusted for a false discovery rate of 5% was considered significant. NGS-identified miRNA were validated using qRT-PCR in an independent group of 40 GERD and 27 BE patients. MicroRNA expression of human BE tissues was also compared with three BE cell lines. Results: NGS detected 19.6 million raw reads per sample. 53.1% of filtered reads mapped to miRBase version 18. NGS analysis followed by qRT-PCR validation found 10 differentially expressed miRNA; several are novel (-708-5p, -944, -224-5p and -3065-5p). Up- or down- regulation predicted by NGS was matched by qRT-PCR in every case. Human BE tissues and BE cell lines showed a high degree of concordance (70–80%) in miRNA expression. Prediction analysis identified targets that mapped to developmental signaling pathways such as TGFβ and Notch and inflammatory pathways such as toll-like receptor signaling and TGFβ. Cluster analysis found similarly regulated (up or down) miRNA to share common targets suggesting coordination between miRNA. Conclusion: Using highly sensitive next-generation sequencing, we have performed a comprehensive genome wide analysis of microRNA in BE and GERD patients. Differentially expressed miRNA between BE and GERD have been further validated. Expression of miRNA between BE human tissues and BE cell lines are highly correlated. These miRNA should be studied in biological models to further understand BE development

    Systematic data integration platform for functional glycomics

    No full text
    Click on the DIO link to access this article (may not be free)Glycomics is a discipline of biology that deals with the structure and function of glycans (or carbohydrates). Analytical techniques such as mass spectrometry (MS) and nuclear magnetic resonance (NMR) are having a significant impact on the field of glycomics. However, effective progress in glycomics research requires collaboration between laboratories to share experimental data, structural information of glycans, and simulation results. Herein we report the development of a web-based data management system that can incorporate large volumes of data from disparate sources and organize them into a uniform format for users to store and access. This system enables participating laboratories to set up a shared data repository which members of interdisciplinary teams can access. The system is able to manage and share raw MS data and structural information of glycans. The web-based interface will be available at http://www.glycomics.bcf.ku.edu/ (public release projected for xx/xx/2010)

    A multi-tier data mining workflow to analyze the age related shift from diglycosylated- to tetra-glycosylated-FSH secretion by the anterior pituitary

    No full text
    Click on the DOI link to access this conference paper (may not be free)FSH is a glycoprotein hormone secreted as two major glycosylation variants by the anterior pituitary, which regulates reproduction in adults. As FSH consists of two functionally significant glycoforms, differentially expressed genes related to FSH biosynthesis in the anterior pituitary can help us to understand implications of changes in their relative abundance at the genomic level. Mapping these kinds of biomarker genes and their corresponding pathways is a key technology for studying the elaboration of FSH variants that affect the reproductive system. In this paper we use a multiple tier data mining work flow to identify FSH biosynthesis-related genes in the anterior pituitary. Our methodology combines different filterbased feature selection mechanisms like Linear Regression (LR), Z-Score statistics and the Biomarker Identifier (BMI). Consequently, we identified differentially expressed genes in response to the synthetic estrogen, diethylstilbestrol (DES), treatment in male rats. As a next step, we performed pathway analysis to identify the most relevant metabolic pathways associated with a set of identified genes in a pathway. Finally, we applied Mutual Information (MI) to calculate the measure of association between differentially expressed genes and several biosynthetic and signaling pathways of interest
    corecore