    PLS dimension reduction for classification of microarray data

    PLS dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, PLS is compared with some of the best state-of-the-art classification methods. In addition, a simple procedure for choosing the number of components is suggested. The connection between PLS dimension reduction and gene selection is examined, and a property of the first PLS component for binary classification is proven. PLS can also be used as a visualization tool for high-dimensional data in the classification framework. The whole study is based on nine real microarray cancer data sets.
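    To make the link between the first PLS component and gene selection concrete, here is a minimal sketch (not the paper's code) of the first step of standard PLS1 on a samples-by-genes matrix with 0/1 labels. With centred data, the unnormalised first weight vector is X_c^T y_c, which for binary y works out to (n0*n1/n) times the difference of the class means of each gene, so ranking genes by |weight| ranks them by mean difference. The function names and toy data are illustrative assumptions.

```python
import math


def first_pls_component(X, y):
    """Return (weights, scores) of the first PLS component.

    X: list of samples, each a list of gene expression values.
    y: list of 0/1 class labels, one per sample.
    """
    n, n_genes = len(X), len(X[0])
    # Centre every gene (column) and the response.
    means = [sum(row[j] for row in X) / n for j in range(n_genes)]
    Xc = [[row[j] - means[j] for j in range(n_genes)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Unnormalised weight of gene j = covariance-type sum of gene j with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(n_genes)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("no gene covaries with the labels")
    w = [v / norm for v in w]
    # Scores: projection of each centred sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(n_genes)) for i in range(n)]
    return w, t


def rank_genes(w):
    """Gene indices ordered by decreasing absolute first-component weight."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


# Toy data: 4 samples x 3 genes; gene 0 separates the classes best.
X = [[1.0, 5.0, 2.0], [2.0, 5.0, 2.5], [4.0, 5.5, 2.0], [5.0, 5.5, 2.5]]
y = [0, 0, 1, 1]
w, t = first_pls_component(X, y)
print(rank_genes(w))  # prints [0, 1, 2]
```

    Here the class-mean differences are 3.0, 0.5 and 0.0, and the weights come out in exactly that ratio; the scores t also separate the two classes by sign, which is what makes the first component usable as a one-dimensional visualization.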

    Information visualization for DNA microarray data analysis: A critical review

    Graphical representation may provide an effective means of making sense of the complexity and sheer volume of data produced by DNA microarray experiments, which monitor the expression patterns of thousands of genes simultaneously. The ability to use “abstract” graphical representations to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, would enable biologists to move from a large amount of data to the particular records they are interested in and thereby gain deeper insight into microarray experiment results. This paper starts by providing background on microarray experiments, then explains how graphical representation can be applied to this problem domain in general, and then explores the role of visualization in gene expression data analysis. Having set the scene, the paper examines various multivariate data visualization techniques that have been applied to microarray data analysis. These techniques are critically reviewed, and the strengths and weaknesses of each are tabulated. Finally, several key problem areas and possible solutions to them are discussed as directions for future work.

    Identification of an Efficient Gene Expression Panel for Glioblastoma Classification

    We present here a novel genetic algorithm-based random forest (GARF) modeling technique that reduces large, complex gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method reduced the 840-gene Verhaak et al. gene panel (the standard in the field) to a 48-gene classifier while retaining 90.91% classification accuracy and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel that gives better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu
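    The genetic-algorithm half of such a method can be sketched as a search over binary gene masks. This is a minimal illustration, not the GARF implementation: the random forest is replaced by a caller-supplied fitness function (in practice one might plug in something like cross-validated classifier accuracy minus a panel-size penalty), and the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter defaults are assumptions.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=40,
              mutation_rate=0.02, init_prob=0.1, seed=0):
    """Evolve a binary feature mask that maximizes fitness(subset).

    fitness: callable taking a frozenset of selected feature indices and
    returning a float (higher is better). Hypothetical stand-in for a
    random-forest score. Requires pop_size >= 3 and n_features >= 2.
    """
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(frozenset(i for i, on in enumerate(mask) if on))
        return cache[key]

    def tournament(pop):
        # Pick the fittest of three randomly drawn masks.
        return max(rng.sample(pop, 3), key=score)

    # Sparse random initial masks: each gene is switched on with init_prob.
    population = [[rng.random() < init_prob for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_gen = [ranked[0], ranked[1]]  # elitism: keep the two best
        while len(next_gen) < pop_size:
            a, b = tournament(ranked), tournament(ranked)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return frozenset(i for i, on in enumerate(best) if on), score(best)


# Toy fitness (hypothetical): reward hitting genes 3, 7, 11; penalize extras.
target = frozenset({3, 7, 11})


def toy_fitness(subset):
    return len(subset & target) - 0.1 * len(subset - target)


panel, value = ga_select(20, toy_fitness, seed=1)
print(sorted(panel), value)
```

    Elitism guarantees that the best fitness never decreases between generations, and caching matters in practice because a real fitness evaluation (training a forest) is far more expensive than the genetic operators.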

    Principal manifolds and graphs in practice: from molecular biology to dynamical systems

    We present several applications of non-linear data modeling using principal manifolds and principal graphs constructed with the metaphor of elasticity (the elastic principal graph approach). These approaches are generalizations of Kohonen's self-organizing maps, a class of artificial neural networks. Using several examples, we show the advantages of non-linear objects over linear ones for data approximation. We propose four numerical criteria for comparing linear and non-linear mappings of datasets into lower-dimensional spaces. The examples are taken from comparative political science, from the analysis of high-throughput data in molecular biology, and from the analysis of dynamical systems.
    Comment: 12 pages, 9 figures
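    For orientation, the "elasticity" metaphor is usually expressed as an energy that the graph's node positions minimize. The form below follows the general elastic principal graph literature, with a single stretching coefficient λ and bending coefficient μ assumed for simplicity; it is not necessarily the exact parametrization used in this paper. Here x_i are the N data points, y_j the graph nodes, K_j the set of points whose nearest node is y_j, E the set of edges, and R the set of "ribs" (pairs of adjacent edges sharing a middle node).

```latex
U(Y) = U_Y + U_E + U_R, \qquad
U_Y = \frac{1}{N} \sum_{j} \sum_{x_i \in K_j} \lVert x_i - y_j \rVert^2,
\\
U_E = \lambda \sum_{(j,l) \in E} \lVert y_j - y_l \rVert^2, \qquad
U_R = \mu \sum_{(l,j,m) \in R} \lVert y_l + y_m - 2\,y_j \rVert^2
```

    U_Y measures approximation error, U_E penalizes stretching of edges and U_R penalizes bending at ribs. Fitting typically alternates between reassigning points to their nearest nodes (updating K_j) and minimizing U over the node positions, which is a quadratic problem while the K_j are held fixed.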

    Partial Least Squares: A Versatile Tool for the Analysis of High-Dimensional Genomic Data

    Partial Least Squares (PLS) is a highly efficient statistical regression technique that is well suited to the analysis of high-dimensional genomic data. In this paper we review the theory and applications of PLS from both methodological and biological points of view. Focusing on microarray expression data, we provide a systematic comparison of the PLS approaches currently employed and discuss problems as diverse as tumor classification, identification of relevant genes, survival analysis and modeling of gene networks.
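    As a reference point for the PLS variants such a review compares, here is a minimal sketch of classic PLS1 fitted with the NIPALS algorithm (single response, several components), written in plain Python for readability rather than speed. It is one common textbook formulation, not code from the paper; the function name and toy data are assumptions.

```python
import math


def pls1_nipals(X, y, n_components):
    """PLS1 via NIPALS: returns (weights, scores), one list per component.

    X: samples x genes (list of lists); y: numeric response per sample.
    """
    n, n_genes = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(n_genes)]
    E = [[row[j] - means[j] for j in range(n_genes)] for row in X]  # residual X
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]  # residual y
    weights, scores = [], []
    for _ in range(n_components):
        # Weight: direction of maximal covariance between residual X and y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(n_genes)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:  # y is fully explained; no further components
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(n_genes)) for i in range(n)]
        tt = sum(v * v for v in t)
        # Loadings: regress residual X and residual y on the scores t.
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt
                  for j in range(n_genes)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflation: remove what this component explains.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(n_genes)]
             for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        scores.append(t)
    return weights, scores


X = [[1, 2, 0, 3], [2, 1, 1, 0], [3, 4, 1, 2],
     [4, 3, 2, 5], [5, 6, 2, 1], [6, 5, 3, 4]]
y = [1.0, 0.5, 2.0, 2.5, 3.5, 3.0]
W, T = pls1_nipals(X, y, 2)
```

    Because each step deflates X by the current component, successive score vectors are mutually orthogonal; this is what lets a handful of components summarize thousands of correlated genes without redundancy. For real data, an established implementation such as scikit-learn's PLSRegression is the practical choice.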

    Simple and Effective Visual Models for Gene Expression Cancer Diagnostics

    In this paper, we show that diagnostic classes in cancer gene expression data sets, which typically include thousands of features (genes), can be effectively separated with simple two-dimensional plots such as scatterplots and radviz graphs. The principal innovation proposed is a method called VizRank, which scores candidate projections and identifies the best among possibly millions of them. Compared with techniques widely applied in cancer genomics in recent years, including neural networks, support vector machines and various ensemble-based approaches, VizRank is fast and finds visualization models that domain experts can easily examine and interpret. Our experiments on a number of gene expression data sets show that VizRank was always able to find visualizations with excellent class separation using a small number of genes (two to seven). In addition to providing grounds for gene expression cancer diagnosis, VizRank and its visualizations also identify small sets of relevant genes, uncover interesting gene interactions, and point to outliers and potential misclassifications in cancer data sets.
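    The idea of scoring projections can be sketched for the simplest case, two-gene scatterplots. This is a simplified stand-in, not VizRank itself: it assumes each plot is scored by leave-one-out k-nearest-neighbour accuracy on min-max scaled genes, and it enumerates every gene pair exhaustively. VizRank's actual scoring criterion, its radviz projections and its heuristic search over very large candidate sets are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def min_max_scale(values):
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant gene: avoid division by zero
    return [(v - lo) / span for v in values]


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out k-nearest-neighbour accuracy in a 2-D projection."""
    correct = 0
    for i, (xi, yi) in enumerate(points):
        # Squared distances to all other points, with their indices.
        neighbours = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, j)
            for j, (xj, yj) in enumerate(points) if j != i)
        votes = Counter(labels[j] for _, j in neighbours[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)


def rank_scatterplots(X, labels, top=5, k=3):
    """Score every two-gene scatterplot; return the best (pair, score) tuples."""
    n_genes = len(X[0])
    columns = [min_max_scale([row[g] for row in X]) for g in range(n_genes)]
    scored = []
    for a, b in combinations(range(n_genes), 2):
        points = list(zip(columns[a], columns[b]))
        scored.append(((a, b), knn_loo_accuracy(points, labels, k)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


# Toy data: genes 0 and 1 separate classes A and B; gene 2 is noise.
X = [[0.1, 0.2, 5], [0.2, 0.1, 1], [0.15, 0.15, 3],
     [0.9, 0.8, 4], [0.8, 0.9, 2], [0.85, 0.85, 6]]
labels = ["A", "A", "A", "B", "B", "B"]
ranking = rank_scatterplots(X, labels, top=3)
```

    Scoring a plot by how well a simple local classifier separates the classes in it favours projections where the classes form visibly distinct clusters, which is the property a domain expert looks for when reading the plot.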

    Integrative analysis of large-scale biological data sets

    We present two novel web applications for microarray and gene/protein set analysis, ArrayMining.net and TopoGSA. These bioinformatics tools use integrative analysis methods, including ensemble and consensus machine learning techniques as well as modular combinations of different analysis types, to extract new biological insights from experimental transcriptomics and proteomics data. They enable researchers to combine related algorithms and datasets to increase the robustness and accuracy of statistical analyses and to exploit synergies between different computational methods, ranging from statistical learning to optimization and topological network analysis.
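    One common way to form a consensus from several classifiers is a plain majority vote over their per-sample predictions. The sketch below shows that idea with a deterministic tie-break (the earliest-listed method wins among tied labels); it is an illustrative assumption, not necessarily the consensus scheme that ArrayMining.net implements.

```python
from collections import Counter


def consensus_vote(predictions):
    """Majority-vote consensus over several methods.

    predictions: list of per-method label lists, all of the same length.
    Ties are broken in favour of the label from the earliest-listed method.
    """
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all methods must predict the same number of samples")
    result = []
    for i in range(n):
        votes = Counter(p[i] for p in predictions)
        top = max(votes.values())
        # Walk the methods in order and take the first label with top votes.
        for p in predictions:
            if votes[p[i]] == top:
                result.append(p[i])
                break
    return result


# Hypothetical predictions from three methods for three samples.
preds = [["A", "B", "A"], ["A", "A", "B"], ["B", "B", "C"]]
print(consensus_vote(preds))  # prints ['A', 'B', 'A']
```

    A vote like this tends to be more robust than any single method when the methods make different kinds of errors, which is the motivation for combining algorithms given in the abstract.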

    Neuroblastoma patient outcomes, tumor differentiation, and ERK activation are correlated with expression levels of the ubiquitin ligase UBE4B

    Background: UBE4B is an E3/E4 ubiquitin ligase whose gene is located on chromosome 1p36.22. We analyzed the associations of UBE4B gene and protein expression with neuroblastoma patient outcomes and with tumor prognostic features and histology.
    Methods: We evaluated the association of UBE4B gene expression with neuroblastoma patient outcomes using the R2 Platform. We screened neuroblastoma tumor samples for UBE4B protein expression using immunohistochemistry. FISH for UBE4B and 1p36 deletion was performed on tumor samples. We then evaluated UBE4B expression for associations with prognostic factors and with levels of phosphorylated ERK in neuroblastoma tumors and cell lines.
    Results: Low UBE4B gene expression is associated with poor outcomes in patients with neuroblastoma and with worse outcomes in all patient subgroups. UBE4B protein expression was associated with neuroblastoma tumor differentiation, and decreased UBE4B protein levels were associated with high-risk features. UBE4B protein levels were also associated with levels of phosphorylated ERK.
    Conclusions: We have demonstrated associations between UBE4B gene expression and neuroblastoma patient outcomes and prognostic features. Reduced UBE4B protein expression in neuroblastoma tumors was associated with high-risk features, a lack of differentiation, and ERK activation. These results suggest that UBE4B may contribute to the poor prognosis of neuroblastoma tumors with 1p36 deletions and that UBE4B expression may mediate neuroblastoma differentiation.