A filter-based feature selection approach for identifying potential biomarkers for lung cancer
Background: Lung cancer is the leading cause of death from cancer in the world, and its treatment is dependent on the type and stage of cancer detected in the patient. Molecular biomarkers that can characterize the cancer phenotype are thus a key tool in planning a therapeutic response. A common protocol for identifying such biomarkers is to employ genomic microarray analysis to find genes that show differential expression according to disease state or type. Data-mining techniques such as feature selection are often used to isolate, from among a large set of genes with differential expression, those specific genes whose differential expression patterns are of optimal value in phenotypic differentiation. One such technique, Biomarker Identifier (BMI), has been developed to identify features with the ability to distinguish between two data groups of interest, and is thus highly applicable to such studies.
Results: Microarray data with validated genes were used to evaluate the utility of BMI in identifying markers for lung cancer. The data set contains 129 gene expression profiles from large-airway epithelial cells (60 samples from smokers with lung cancer and 69 from smokers without lung cancer), and 7 genes from these data have been confirmed to be differentially expressed by quantitative PCR. Using this data set, BMI was compared with various well-known feature selection methods and was found to be more successful than the other methods in finding useful genes to classify cancerous samples. Genes selected by BMI (given the same number of genes and classification algorithms) also showed better discriminative power than those from the original study. Pathway analysis of the BMI-selected genes allowed us to correlate them with well-known cancer-related pathways.
Conclusions: Our results show that BMI can be used to analyze microarray data and to find useful genes for classifying samples. Pathway analysis suggests that BMI is successful in identifying biomarker-quality cancer-related genes from the data.
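As an illustration of what a filter-based feature selection step looks like in general, the following minimal sketch scores each gene independently by how well it separates two sample groups and keeps the top-ranked genes. It uses Welch's t-statistic as the score, which is an assumption made for illustration; it is not the BMI scoring function, which the abstract does not describe. The gene names, expression values, and the helper names `welch_t` and `rank_genes` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances allowed)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(expr, labels, top_k):
    """Rank genes by |t| between the two groups and return the top_k names.

    expr   -- dict mapping gene name to a list of expression values
    labels -- list of 0/1 group labels, one per sample (1 = case, 0 = control)
    """
    scores = {}
    for gene, values in expr.items():
        case = [v for v, y in zip(values, labels) if y == 1]
        ctrl = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(case, ctrl))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data (hypothetical): three genes measured in three cases and three controls.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [5.1, 5.3, 4.9, 2.0, 2.2, 1.9],
    "geneB": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],
    "geneC": [1.0, 1.4, 1.2, 2.0, 2.5, 2.2],
}
top = rank_genes(expr, labels, top_k=2)
print(top)
```

Because each gene is scored on its own, without training a classifier, this kind of filter scales to the thousands of genes on a microarray; the selected genes would then be passed to a separate classification algorithm.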
GlycomicsDB - A Data Integration Platform for Glycans and their Structures
Glycomics is a discipline of biology that deals with the structure and function of glycans (or carbohydrates). Analytical techniques such as mass spectrometry (MS) and nuclear magnetic resonance (NMR) are having a significant impact on the field of glycomics. However, effective progress in glycomics research requires collaboration between laboratories to share experimental data, structural information of glycans, and simulation results. Herein we report the development of a web-based data management system that can incorporate large volumes of data from disparate sources and organize them into a uniform format for users to store and access. This system enables participating laboratories to set up a shared data repository which members of interdisciplinary teams can access. The system is able to manage and share raw MS data and structural information of glycans.
LipidomeDB Data Calculation Environment: Online Processing of Direct-Infusion Mass Spectral Data for Lipid Profiles
The final publication is available at Springer via http://dx.doi.org/10.1007/s11745-011-3575-8.
LipidomeDB Data Calculation Environment (DCE) is a web application to quantify complex lipids by processing data acquired after direct infusion of a lipid-containing biological extract, to which a cocktail of internal standards has been added, into an electrospray source of a triple quadrupole mass spectrometer. LipidomeDB DCE is located on the public Internet at http://lipidome.bcf.ku.edu:9000/Lipidomics. LipidomeDB DCE supports targeted analyses; analyte information can be entered, or pre-formulated lists of typical plant or animal polar lipid analytes can be selected. LipidomeDB DCE performs isotopic deconvolution and quantification in comparison to internal standard spectral peaks. Multiple precursor or neutral loss spectra from up to 35 samples may be processed simultaneously with data input as Excel files and output as tables viewable on the web and exportable in Excel. The pre-formulated compound lists and web access, used with direct-infusion mass spectrometry, provide a simple approach to lipidomic analysis, particularly for new users.
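The idea of quantification in comparison to an internal standard peak can be sketched as a simple ratio: the analyte amount is estimated as the analyte signal divided by the standard signal, multiplied by the known amount of standard added. This is a minimal sketch under that assumption only; it omits isotopic deconvolution and any response-factor correction, and it is not taken from the LipidomeDB DCE code. The function name `quantify`, the units, and the numbers are hypothetical.

```python
def quantify(analyte_intensity, standard_intensity, standard_amount_nmol):
    """Estimate the analyte amount (nmol) from a single internal standard peak.

    Assumes the analyte and the standard respond equally in the detector,
    which is a simplification made for this illustration.
    """
    if standard_intensity <= 0:
        raise ValueError("internal standard intensity must be positive")
    return analyte_intensity / standard_intensity * standard_amount_nmol


# Hypothetical peak intensities and 0.6 nmol of internal standard added.
amount = quantify(analyte_intensity=1500.0,
                  standard_intensity=3000.0,
                  standard_amount_nmol=0.6)
print(f"{amount:.3f} nmol")
```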
Discovery and Validation of Barrett's Esophagus MicroRNA Transcriptome by Next Generation Sequencing
Objective: Barrett's esophagus (BE) is a transition from squamous to columnar mucosa as a result of gastroesophageal reflux disease (GERD). The role of microRNA during this transition has not been systematically studied.
Design: For initial screening, total RNA from 5 GERD and 6 BE patients was size fractionated. RNA <70 nucleotides was subjected to SOLiD 3 library preparation and next generation sequencing (NGS). Bioinformatics analysis was performed using the R package “DESeq”. A p value < 0.05 adjusted for a false discovery rate of 5% was considered significant. NGS-identified miRNA were validated using qRT-PCR in an independent group of 40 GERD and 27 BE patients. MicroRNA expression of human BE tissues was also compared with three BE cell lines.
Results: NGS detected 19.6 million raw reads per sample. 53.1% of filtered reads mapped to miRBase version 18. NGS analysis followed by qRT-PCR validation found 10 differentially expressed miRNA; several are novel (-708-5p, -944, -224-5p and -3065-5p). Up- or down-regulation predicted by NGS was matched by qRT-PCR in every case. Human BE tissues and BE cell lines showed a high degree of concordance (70–80%) in miRNA expression. Prediction analysis identified targets that mapped to developmental signaling pathways such as TGFβ and Notch and inflammatory pathways such as toll-like receptor signaling and TGFβ. Cluster analysis found that similarly regulated (up or down) miRNA share common targets, suggesting coordination between miRNA.
Conclusion: Using highly sensitive next-generation sequencing, we have performed a comprehensive genome-wide analysis of microRNA in BE and GERD patients. Differentially expressed miRNA between BE and GERD have been further validated. Expression of miRNA in BE human tissues and BE cell lines is highly correlated. These miRNA should be studied in biological models to further understand BE development.
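The significance criterion in the Design section (a p value below 0.05 after adjustment for false discovery rate) can be illustrated with the Benjamini-Hochberg procedure, a commonly used FDR adjustment. The abstract does not state which adjustment method was applied, so the choice of Benjamini-Hochberg here is an assumption, and the p-values below are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five miRNA.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
padj = benjamini_hochberg(pvals)
significant = [p < 0.05 for p in padj]
print(padj)
print(significant)
```

In this toy example the third and fourth raw p-values are below 0.05 but no longer pass after adjustment, which is the effect the correction is meant to have when many miRNA are tested at once.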
Systematic data integration platform for functional glycomics
Glycomics is a discipline of biology that deals with the structure
and function of glycans (or carbohydrates). Analytical techniques
such as mass spectrometry (MS) and nuclear magnetic resonance
(NMR) are having a significant impact on the field of glycomics.
However, effective progress in glycomics research requires
collaboration between laboratories to share experimental data,
structural information of glycans, and simulation results. Herein
we report the development of a web-based data management
system that can incorporate large volumes of data from disparate
sources and organize them into a uniform format for users to store
and access. This system enables participating laboratories to set
up a shared data repository which members of interdisciplinary
teams can access. The system is able to manage and share raw MS
data and structural information of glycans. The web-based
interface will be available at http://www.glycomics.bcf.ku.edu/
(public release projected for xx/xx/2010).
A multi-tier data mining workflow to analyze the age related shift from diglycosylated- to tetra-glycosylated-FSH secretion by the anterior pituitary
FSH is a glycoprotein hormone secreted as two major
glycosylation variants by the anterior pituitary, which regulates
reproduction in adults. As FSH consists of two functionally
significant glycoforms, differentially expressed genes related to
FSH biosynthesis in the anterior pituitary can help us to
understand implications of changes in their relative abundance
at the genomic level. Mapping these kinds of biomarker genes
and their corresponding pathways is a key technology for
studying the elaboration of FSH variants that affect the
reproductive system. In this paper we use a multi-tier data
mining workflow to identify FSH biosynthesis-related genes in
the anterior pituitary. Our methodology combines different filter-based feature selection mechanisms such as Linear Regression (LR),
Z-Score statistics and the Biomarker Identifier (BMI).
Consequently, we identified genes that are differentially expressed in
response to treatment with the synthetic estrogen
diethylstilbestrol (DES) in male rats. As a next step, we performed pathway
analysis to identify the most relevant metabolic pathways
associated with the set of identified genes. Finally,
we applied Mutual Information (MI) to calculate the measure of
association between differentially expressed genes and several
biosynthetic and signaling pathways of interest.
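The final step, measuring association with Mutual Information, can be sketched for two discrete variables as follows. The abstract does not say how expression values were discretized or how pathway membership was encoded, so the "up"/"down" gene states, the 0/1 pathway indicator, and the function name `mutual_information` are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical data: a gene state per sample and a pathway activity indicator.
pathway = [1, 1, 0, 0]
mi_dependent = mutual_information(["up", "up", "down", "down"], pathway)
mi_independent = mutual_information(["up", "down", "up", "down"], pathway)
print(mi_dependent, mi_independent)
```

A value near zero indicates that the gene state carries no information about the pathway indicator, while larger values indicate a stronger association.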