
    Joint mapping of genes and conditions via multidimensional unfolding analysis

    Abstract

    Background: Microarray compendia profile the expression of genes across a number of experimental conditions. Such compendia are useful not only for grouping genes and conditions by the similarity of their overall expression profiles, but also for uncovering more subtle relations between genes and conditions. A clear visual overview of all these patterns in a single, easy-to-grasp representation is a useful preliminary analysis step. For this purpose, we propose an advanced exploratory method called multidimensional unfolding.

    Results: We present a novel algorithm for multidimensional unfolding that overcomes both general problems and problems specific to the analysis of gene expression data sets. Applying the algorithm to two publicly available microarray compendia illustrates its power as a tool for exploratory data analysis. The unfolding analysis of the first data set yielded a two-dimensional representation that clearly reveals temporal regulation patterns for the genes and a meaningful structure for the time points, while the analysis of the second data set showed that the algorithm can go beyond merely identifying the genes that discriminate between different patient or tissue types.

    Conclusion: Multidimensional unfolding offers a useful tool for preliminary exploration of microarray data. Because it relies on an easy-to-grasp, low-dimensional geometric framework, it represents relations among genes, among conditions, and between genes and conditions simultaneously and accessibly, which may reveal interesting patterns in the data. An additional advantage is that the method can be applied to the raw data, without requiring a choice of suitable gene-wise transformations.
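The abstract does not describe the authors' algorithm, so the following is only a rough illustration of what "unfolding" means: a naive metric unfolding by plain gradient descent on raw stress. It assumes a hypothetical input `delta`, a genes × conditions matrix of non-negative dissimilarities (how expression values would be turned into such dissimilarities is not specified here), and places one point per gene and one per condition so that gene-to-condition distances approximate `delta`. All function names are hypothetical, and the sketch has no safeguard against degenerate solutions, a well-known difficulty for unfolding methods in general.

```python
import math
import random


def distance(p, q):
    """Euclidean distance between two points given as equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def raw_stress(delta, rows, cols):
    """Sum of squared misfits between delta[i][j] and the distance
    from row point i (a gene) to column point j (a condition)."""
    return sum(
        (delta[i][j] - distance(rows[i], cols[j])) ** 2
        for i in range(len(rows))
        for j in range(len(cols))
    )


def unfold(delta, dim=2, steps=500, lr=0.01, seed=0):
    """Naive metric unfolding (hypothetical helper, not the paper's method).

    Only row-to-column distances enter the loss; that is what
    distinguishes unfolding from ordinary MDS.
    """
    rng = random.Random(seed)
    n, m = len(delta), len(delta[0])
    rows = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    cols = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(m)]
    for _ in range(steps):
        g_rows = [[0.0] * dim for _ in range(n)]
        g_cols = [[0.0] * dim for _ in range(m)]
        for i in range(n):
            for j in range(m):
                d = distance(rows[i], cols[j])
                if d < 1e-12:
                    continue  # gradient undefined when the two points coincide
                # d/dx_i of (delta_ij - d)^2 is -2 (delta_ij - d) (x_i - y_j) / d;
                # the gradient with respect to y_j has the opposite sign.
                coef = -2.0 * (delta[i][j] - d) / d
                for k in range(dim):
                    diff = rows[i][k] - cols[j][k]
                    g_rows[i][k] += coef * diff
                    g_cols[j][k] -= coef * diff
        for i in range(n):
            for k in range(dim):
                rows[i][k] -= lr * g_rows[i][k]
        for j in range(m):
            for k in range(dim):
                cols[j][k] -= lr * g_cols[j][k]
    return rows, cols


# Toy input: dissimilarities generated from a known 2-D configuration.
true_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
true_cols = [[2.0, 2.0], [-1.0, 0.0], [0.0, -2.0], [1.0, 1.0]]
delta = [[distance(r, c) for c in true_cols] for r in true_rows]
start = unfold(delta, steps=0)  # same seed, so this is the initial layout
fit = unfold(delta)
print(raw_stress(delta, *start), raw_stress(delta, *fit))
```

On the step size: with non-negative dissimilarities the −2·δ·d part of the stress is concave, so a fixed step below 1/(n + m) (n genes, m conditions) is enough for no step to increase the stress. The default `lr=0.01` suits this toy example but would have to shrink for a real compendium with thousands of genes.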

    Temporal patterns of gene expression via nonmetric multidimensional scaling analysis

    Motivation: Microarray experiments produce large-scale data sets that require extensive mining and refinement to extract useful information. We have been developing an efficient novel algorithm for nonmetric multidimensional scaling (nMDS) of very large data sets, intended as a maximally unsupervised data mining tool, and we wish to demonstrate its usefulness in bioinformatics. We also aim to show that intrinsically nonlinear methods are generally advantageous in data mining. Results: The Pearson correlation distance is used to measure the dissimilarity of gene activities in the transcriptional response of cell cycle-synchronized human fibroblasts to serum [Iyer et al., Science vol. 283, p. 83 (1999)]. Analyzing these dissimilarity data with our nMDS algorithm produces an almost circular arrangement of the genes, along which the temporal expression patterns of the genes rotate. Linear methods such as principal component analysis (PCA) can achieve reasonable results if an appropriate preprocessing procedure is applied to the original data set, but without preprocessing they cannot produce a useful picture. Furthermore, even with appropriate preprocessing, the outcomes of linear procedures are not as clear-cut as those of nMDS without preprocessing. Comment: 11 pages, 6 figures + 2 online-only color figures, submitted to Bioinformatics
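The abstract names the Pearson correlation distance as the dissimilarity fed to nMDS but does not give its exact form. A common convention, assumed here, is d = 1 − r: it is 0 for profiles that rise and fall together, 2 for perfectly anti-correlated ones, and it ignores differences in overall level and scale between two genes' time courses. The helper names below are hypothetical, and the nMDS step itself is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile: correlation is undefined")
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """Assumed form of the Pearson correlation distance: 1 - r, in [0, 2]."""
    return 1.0 - pearson_r(x, y)


def distance_matrix(profiles):
    """Symmetric gene-by-gene dissimilarity matrix, the input an nMDS run would take."""
    n = len(profiles)
    return [
        [correlation_distance(profiles[i], profiles[j]) for j in range(n)]
        for i in range(n)
    ]


profiles = [
    [1.0, 2.0, 3.0],     # rising
    [10.0, 20.0, 30.0],  # rising, different scale: distance 0 to the first
    [3.0, 2.0, 1.0],     # falling: distance 2 to the first
]
for row in distance_matrix(profiles):
    print(row)
```

The scale invariance is the point of this choice for time courses: two genes with the same temporal shape but very different expression levels end up close together.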

    Random matrix analysis for gene interaction networks in cancer cells

    Investigating the topological uniqueness of gene interaction networks in cancer cells is essential for understanding this disease. Based on random matrix theory, we study the distribution of the nearest-neighbor level spacings P(s) of interaction matrices for gene networks in human cancer cells. The interaction matrices are computed using The Cancer Network Galaxy (TCNG) database, a repository of gene interactions inferred by a Bayesian network model. Gene expression data from 256 NCBI GEO entries on human cancer cells were selected for the Bayesian network calculations in TCNG. We observe the Wigner distribution of P(s) when the gene networks are dense, with more than ~38,000 edges. Conversely, when the networks have fewer edges, P(s) follows the Poisson distribution. We investigate how P(s) relates both to the size of the networks and to edge frequencies, which reflect the reliability of the inferred gene interactions. Comment: 22 pages, 7 figures
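The comparison described above can be sketched, under stated assumptions, as follows: take the eigenvalues of an interaction matrix, compute nearest-neighbor spacings rescaled to unit mean, and see whether their histogram sits closer to the Wigner surmise or to the Poisson law. Assumptions: the Wigner curve used is the GOE form P(s) = (πs/2)·exp(−πs²/4), which applies to real symmetric matrices (the abstract does not say which ensemble the authors compared against); dividing by the global mean gap is a crude stand-in for proper spectral unfolding; the eigenvalues are taken as given rather than computed; and the least-squares "closer" test and all helper names are hypothetical.

```python
import math
import random


def normalized_spacings(eigenvalues):
    """Nearest-neighbor spacings of the sorted spectrum, rescaled to unit mean.

    Crude stand-in for spectral unfolding: dividing by the global mean gap
    assumes the eigenvalue density is roughly uniform.
    """
    ev = sorted(eigenvalues)
    if len(ev) < 2:
        raise ValueError("need at least two eigenvalues")
    gaps = [b - a for a, b in zip(ev, ev[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap == 0:
        raise ValueError("all eigenvalues are identical")
    return [g / mean_gap for g in gaps]


def wigner_surmise(s):
    """GOE Wigner surmise; P(0) = 0 expresses level repulsion."""
    return (math.pi * s / 2.0) * math.exp(-math.pi * s * s / 4.0)


def poisson_spacing(s):
    """Spacing density for uncorrelated levels: P(s) = exp(-s)."""
    return math.exp(-s)


def histogram_density(values, bin_width=0.25, s_max=4.0):
    """Empirical density of `values` on equal bins covering [0, s_max)."""
    nbins = int(round(s_max / bin_width))
    counts = [0] * nbins
    for v in values:
        k = int(v // bin_width)
        if 0 <= k < nbins:
            counts[k] += 1
    norm = len(values) * bin_width
    centers = [(k + 0.5) * bin_width for k in range(nbins)]
    return centers, [c / norm for c in counts]


def closer_reference(spacings):
    """'Wigner' or 'Poisson', whichever curve is closer (least squares) to the histogram."""
    centers, heights = histogram_density(spacings)
    err_w = sum((h - wigner_surmise(s)) ** 2 for s, h in zip(centers, heights))
    err_p = sum((h - poisson_spacing(s)) ** 2 for s, h in zip(centers, heights))
    return "Wigner" if err_w < err_p else "Poisson"


# Independent, uniformly scattered levels: spacings should look Poisson-like.
rng = random.Random(1)
levels = [rng.uniform(0.0, 1.0) for _ in range(2000)]
print(closer_reference(normalized_spacings(levels)))
```

The diagnostic feature is the behavior near s = 0: the Wigner curve vanishes there (eigenvalues repel), while the Poisson curve is largest there (eigenvalues cluster freely).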

    Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data

    Abstract

    Background: Visualizing DNA microarray data in two- or three-dimensional spaces is an important exploratory analysis step for detecting quality issues or generating new hypotheses. Principal Component Analysis (PCA) is a widely used linear method for defining the mapping between the high-dimensional data and its low-dimensional representation. Many new nonlinear dimension reduction methods have been proposed during the last decade, but it is still unclear how well they capture the underlying structure of microarray gene expression data. In this study, we assessed the performance of PCA and of six nonlinear dimension reduction methods, namely Kernel PCA, Locally Linear Embedding, Isomap, Diffusion Maps, Laplacian Eigenmaps and Maximum Variance Unfolding, for the visualization of microarray data.

    Results: A systematic benchmark consisting of Support Vector Machine classification, cluster validation and noise evaluations was applied to ten microarray datasets and several simulated datasets. Significant differences between PCA and most of the nonlinear methods were observed in two- and three-dimensional target spaces. As the number of dimensions and the number of differentially expressed genes increased, all methods showed similar performance. PCA and Diffusion Maps were less sensitive to noise than the other nonlinear methods.

    Conclusions: Locally Linear Embedding and Isomap showed superior performance on all datasets. In very low-dimensional representations and with few differentially expressed genes, these two methods preserve more of the underlying structure of the data than PCA, and are thus favorable alternatives for the visualization of microarray data.
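To make the "linear mapping" that PCA defines concrete, here is a minimal dependency-free sketch: center the samples, form the sample covariance matrix, find leading eigenvectors by power iteration with deflation, and project each sample onto them. This is an illustration only, not the benchmark's code; the helper names are hypothetical, power iteration converges slowly when the top eigenvalues are close, and in practice one would use a linear-algebra library (scikit-learn provides `PCA` in `sklearn.decomposition` and `Isomap` and `LocallyLinearEmbedding` in `sklearn.manifold`). The nonlinear methods compared in the study are not shown.

```python
import math
import random


def center(data):
    """Subtract each feature's mean from every sample (rows = samples)."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered):
    """Sample covariance matrix (p x p) of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]


def mat_vec(m, v):
    return [sum(m_ab * v_b for m_ab, v_b in zip(row, v)) for row in m]


def top_component(cov, iters=500, seed=0):
    """Leading eigenvector and eigenvalue of a symmetric PSD matrix (power iteration)."""
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(len(cov))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v, 0.0  # no variance left; any direction will do
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, lam


def pca_project(data, k=2):
    """Scores of each sample on the top-k principal components."""
    centered = center(data)
    cov = covariance(centered)
    comps = []
    for _ in range(k):
        v, lam = top_component(cov)
        comps.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        cov = [[c - lam * va * vb for c, vb in zip(row, v)] for row, va in zip(cov, v)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in centered]


samples = [[2.0, 0.1], [0.0, 1.0], [-2.0, -0.1], [0.1, -1.0]]
for scores in pca_project(samples, k=2):
    print(scores)
```

Because every output coordinate is a fixed linear combination of the input features, PCA cannot bend to follow curved structure in the data, which is the limitation the nonlinear methods in the study are meant to address.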

    ClgR regulation of chaperone and protease systems is essential for Mycobacterium tuberculosis parasitism of the macrophage

    Chaperone and protease systems play essential roles in cellular homeostasis and have vital functions in controlling the abundance of specific cellular proteins involved in processes such as transcription, replication, metabolism and virulence. Bacteria have evolved accurate regulatory systems to control the expression and function of chaperones and potentially destructive proteases. Here, we have used a combination of transcriptomics, proteomics and targeted mutagenesis to reveal that the clp gene regulator (ClgR) of Mycobacterium tuberculosis activates the transcription of at least ten genes, including four that encode protease systems (ClpP1/C, ClpP2/C, PtrB and HtrA-like protease Rv1043c) and three that encode chaperones (Acr2, ClpB and the chaperonin Rv3269). Thus, M. tuberculosis ClgR controls a larger network of protein homeostatic and regulatory systems than ClgR in any other bacterium studied to date. We demonstrate that ClgR-regulated transcriptional activation of these systems is essential for M. tuberculosis to replicate in macrophages. Furthermore, we observe that this defect is manifest early in infection, as M. tuberculosis lacking ClgR is deficient in the ability to control phagosome pH 1 h post-phagocytosis.

    An Overview of the Use of Neural Networks for Data Mining Tasks

    In recent years, the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research activity, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies whose classification, prediction and pattern recognition capabilities have been used successfully in many areas, including science, engineering, medicine, business, banking and telecommunication, among many others. This paper highlights, from a data mining perspective, the implementation of NN using supervised and unsupervised learning for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their use in bioinformatics and financial data analysis tasks.